C memory model
From Seo Wiki - Search Engine Optimization and Programming Languages
The 16-bit x86 segmented memory architecture uses four registers to refer to four segments: DS (data segment), CS (code segment), SS (stack segment), and ES (extra segment). A logical address on this platform is written segment:offset, in hexadecimal. In real mode, the physical address of a byte of memory is calculated by shifting the contents of the appropriate segment register left by 4 bits and then adding the offset.
For example, the logical address 7522:F139 yields the 20-bit physical address 75220 + F139 = 84359 (hexadecimal).
Note that this process leads to aliasing of memory, such that any given physical address may have multiple logical representations. This makes comparison of pointers difficult.
Pointers can be near, far, or huge. Near pointers refer to the current segment, so neither DS nor CS needs to be modified to dereference them. They are the fastest pointers, but can only address the 64 kilobytes of the current segment.
Far pointers contain the new value of DS or CS within them. To use them, the register must be changed, the memory dereferenced, and then the register restored. They may reference up to 1 megabyte of memory. Note that pointer arithmetic (addition and subtraction) does not modify the segment portion of the pointer, only its offset. Operations that would take the offset below zero or above 65535 (0xFFFF) wrap around modulo 64K, just like any other 16-bit operation.
For example, the code below will wrap around and overwrite itself:
    char far* myfarptr = (char far*) 0x50000000L;
    unsigned long counter;
    for (counter = 0; counter < 128*1024; counter++)  // access 128K memory
        *(myfarptr + counter) = 7;                    // write all 7s into it
When counter reaches 0x10000, the offset wraps to zero and the resulting address rolls over to 5000:0000, so the second 64 kilobytes of writes land on top of the first.
Huge pointers are essentially far pointers, but are normalized every time they are modified so that they have the highest possible segment for that address. This is very slow, but it allows the pointer to span multiple segments and allows for accurate pointer comparisons, as if the platform had a flat memory model. Normalization rules out the aliasing of memory described above, so two huge pointers that reference the same memory location are always equal.
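The memory models are:
Model | Data pointers | Code pointers
Tiny* | near | near
Small | near** | near
Medium | near** | far
Compact | far | near
Large | far | far
Huge | huge | far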
* In the Tiny model, all four segment registers point to the same segment.
** In all models with near data pointers, SS equals DS.
In protected mode, a segment cannot be both writable and executable. Therefore, when implementing the Small and Tiny memory models, the code segment register must point to the same physical address and have the same limit as the data segment register. This defeats one of the features of the 80286, which ensures that data segments are never executable and code segments are never writable (so self-modifying code is never allowed). On the 80386, however, with its flat memory model, it is possible to protect individual memory pages against writing.
Memory models are not limited to 16-bit programs. Segmentation can be used in 32-bit protected mode as well (resulting in 48-bit pointers), and some C compilers support it. However, segmentation in 32-bit mode does not give access to a larger address space than a single segment would cover, unless some segments are not always present in memory and the linear address space is used merely as a cache over a larger segmented virtual space. Its main benefit is finer-grained protection of access to individual objects (areas up to 1 megabyte long can benefit from 1-byte access protection granularity, versus the coarse 4 KiB granularity offered by paging alone), so it is used only in specialized applications, such as telecommunications software. Technically, the "flat" 32-bit address space is a "tiny" memory model for the segmented address space. In both cases, all four segment registers contain one and the same value.
On the x86-64 platform, a total of seven memory models exist, because the majority of symbol references are only 32 bits wide and because the models distinguish code whose addresses are known at link time from position-independent code. This does not affect the pointers used, which are always flat 64-bit pointers, but only how values that have to be accessed via symbols can be placed.
- Turbo C++ Version 3.0 User's Guide. Borland International, Copyright 1992.