Union (computer science)
From Seo Wiki - Search Engine Optimization and Programming Languages
| This article does not cite any references or sources.
Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed. (August 2009)
In computer science, a union is a value that may have any of several representations or formats; or a data structure that consists of a variable which may hold such a value. Some programming languages support special data types, called (somewhat confusingly) union types, to describe such values and variables. In type theory a union has a sum type.
Depending on the language and type, a union value may be used in some operations, such as assignment and comparison for equality, without knowing its specific type. Other operations may require that knowledge — either by some external information, or by the use of a tagged union.
Note: The remainder of this article refers strictly to primitive untagged unions, as opposed to tagged unions.
Because of the limitations of their use, untagged unions are generally only provided in untyped languages or in an unsafe way (as in C). They have the advantage over simple tagged unions of not requiring space to store the tag.
The name "union" stems from the type's formal definition. If one sees a type as the set of all values that that type can take on, a union type is simply the mathematical union of its constituting types, since it can take on any value any of its fields can. Also, because a mathematical union discards duplicates, if more than one fields of the union can take on a single common value, it is impossible to tell from the value alone which field was last written.
Unions in various programming languages
In C and C++, untagged unions are expressed nearly exactly like structures (structs), except that each data member begins at the same location in memory. The data members, as in structures, need not be primitive values, and in fact may be structures or even other unions. However, C++ does not allow for a data member to be any type that has a full fledged constructor/destructor and/or copy constructor, or a non-trivial copy assignment operator. In particular, it is impossible to have the standard C++ string as a member of a union. The union object occupies as much space as the largest member, whereas structures require space equal to at least the sum of the size of its members. This gain in space efficiency, while valuable in certain circumstances, comes at a great cost of safety: the program logic must ensure that it only reads the field most recently written along all possible execution paths.
The primary usefulness of a union is to conserve space, since it provides a way of letting many different types be stored in the same space. Unions also provide crude polymorphism. However, there is no checking of types, so it is up to the programmer to be sure that the proper fields are accessed in different contexts. The relevant field of a union variable is typically determined by the state of other variables, possibly in an enclosing struct.
One common C programming idiom uses unions to perform what C++ calls a reinterpret_cast, by assigning to one field of a union and reading from another, as is done in code which depends on the raw representation of the values. This is not, however, a safe use of unions in general.
Structure and union specifiers have the same form. [ . . . ] The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.—ANSI/ISO 9899:1990 (the ANSI C standard) Section 22.214.171.124
In COBOL, union data items are defined in two ways. The first uses the RENAMES (66 level) keyword, which effectively maps a second alphanumeric data item on top of the same memory location as a preceding data item. In the example code below, data item PERSON-REC is defined as a group containing another group and a numeric data item. PERSON-DAT is defined as an alphanumeric data item that renames PERSON-REC, treating the data bytes continued within it as character data.
01 PERSON-REC. 05 PERSON-NAME. 10 PERSON-NAME-LAST PIC X(12). 10 PERSON-NAME-FIRST PIC X(16). 10 PERSON-NAME-MID PIC X. 05 PERSON-ID PIC 9(9) PACKED-DECIMAL. 01 PERSON-DATA RENAMES PERSON-REC.
The second way to define a union type is by using the REDFINES keyword. In the example code below, data item VERS-NUM is defined as a 2-byte binary integer containing a version number. A second data item VERS-BYTES is defined as a two-character alphanumeric variable. Since the second item is redefined over the first item, the two items share the same address in memory, and therefore share the same underlying data bytes. The first item interprets the two data bytes as a binary value, while the second item interprets the bytes as character values.
01 VERS-INFO. 05 VERS-NUM PIC S9(4) COMP. 05 VERS-BYTES PIC X(2) REDEFINES VERS-NUMBER.
- boost::variant, a type-safe alternative to C++ unions