This document describes a set of naming conventions used by the Applications Development group. These conventions commonly go by the name “Hungarian”, referring both to the nationality of their original developer, Charles Simonyi, and also to the fact that to an uninitiated programmer they are somewhat confusing. Once you have gained familiarity with Hungarian, however, we believe that you will find that the clarity of code is enhanced. For convenience, this memo first describes how to use Hungarian, and then describes why it is useful; the general approach is from a programming viewpoint, rather than a mathematical one.
For a more theoretical approach, you are invited to read Chapter 2 of Simonyi’s “Meta-Programming” thesis.
2. The Rules
Hungarian is largely language independent; it is equally applicable to a microprocessor assembly language and to a fourth-generation database application language (and has been used in both). However, there is a little flavor of C, in that arrays and pointers to arrays are not clearly distinguished. While this may sound confusing, in practice there is little ambiguity.
2.1. Variables
The most common type of identifier is a variable name.
All variable names are composed of three elements: prefixes, base type, and qualifier. (These are also referred to as constructors, tag, and qualifier.) Not all elements are present in all variable names; the only part that is always present is the base type. This type should not be confused with the types supported directly by the programming language; most types are application specific. For example, an lbl type could refer to a structure containing symbol information; a co could be a value specifying a color.
2.1.1. Base types (tags)
As the above examples indicate, tags should be short (typically two or three letters) and somewhat mnemonic.
Because of the brevity, the mnemonic value will be useful only as a reminder to someone who knows the application, and has been told what the basic types are; the name will not be sufficient to inform (by itself) a casual viewer what is being referred to. For example, a co could just as easily refer to a geometric coordinate, or to a commanding officer. Within the context of a given application, however, a co would always have a specific meaning; all co’s would refer to the same type of object, and all references to such an object would use the term co.
One should resist the natural first impulse to use a short descriptive generic English term as a type name. This is almost always a mistake. One should not preempt the most useful English phrases for the provincial purposes of any given version of a given program. Chances are that the same generic term could be equally applicable to many more types in the same program. How will we know which is the one with the pretty “logical” name, and which have the more arbitrary variants typically obtained by omitting various vowels or by other disfigurement?
Also, in communicating with other programmers, how do we distinguish the generic use of the common term from the reserved technical usage? In practice, it seems best to use some abbreviated form of the generic term, or perhaps an acronym. In speech, the tag may be spelled out, or a pronounceable nickname may be used. In time, the exact derivation of the tag may be forgotten, but its meaning will still be clear. As is probably obvious from the above, it is essential that all tags used in a given application be clearly documented.
This is extremely useful in helping a new programmer learn the code; it not only enables him (or her) to decode the otherwise cryptic names, but it also serves to describe the underlying concepts of the program, since the data types tend to determine how the program works. It is also worth pointing out that this is not nearly as onerous as it sounds; while there may be tens of thousands of variables in a program, the number of types is likely to be quite small. Although most types are particular to a given application, there are a few standard ones that appear in many different applications; synonyms for these types should never be used:
f    a flag (boolean, logical).
The qualifier (see below) should describe the condition that will cause this flag to be set (e.g. fError would be clear if there were no error, set if one exists). This tag may refer to a single bit, a byte, or a word; often it will be an object of type BOOL (defined by the application, usually as int). Usually the object referred to will contain either 1 (fTrue, TRUE) or 0 (fFalse, FALSE). In some instances, other values may be used, either for efficiency or historical reasons; such a use usually indicates that another type may be more appropriate.
ch    a one-byte character.
Note that this is not adequate for Kanji.
st    a Pascal-type string (first byte is count, remainder is the actual characters). Typically refers to a pointer to the actual memory. This should be the most common type of string used in the Applications group; it is more efficient than an sz (below).
sz    a zero-terminated string, or a pointer to it. These are most often used to interface to an operating system (or equivalent) that requires them; for most other uses, an st is preferable. Unfortunately, C string constants are normally zero-terminated, so it takes a little more effort to use st’s; the effort is worth it.
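As a rough illustration of the difference between an st and an sz, the following C sketch converts one to the other. The helper names (CchStFromSz, CchOfSt) and the 255-character limit are assumptions made for this example only; they are not taken from any Applications code.

```c
#include <string.h>

/* Hypothetical helpers, for illustration only. */

/* Build an st (count byte followed by the characters) from an sz.
   The st buffer must hold at least 256 bytes.
   Returns the count of characters copied (a cch). */
static int CchStFromSz(unsigned char *st, const char *sz)
{
    size_t cch = strlen(sz);      /* an sz must be scanned to find its length */

    if (cch > 255)
        cch = 255;                /* the count has to fit in the first byte */
    st[0] = (unsigned char)cch;   /* first byte is the count */
    memcpy(st + 1, sz, cch);      /* remainder is the actual characters */
    return (int)cch;
}

/* The length of an st is available without scanning the characters. */
static int CchOfSt(const unsigned char *st)
{
    return st[0];
}
```

The sketch shows where the efficiency claim comes from: finding the length of an sz costs a scan of the whole string, while an st carries its length in its first byte.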
The Applications Development compiler provides ways to make string constants st’s.
fn    a function. Since about the only thing you can do with a function is take its address, this almost always has a “p” prefix (see below). For this reason, in some applications fn is itself used to mean pointer to a function.
There are some more types that appear in many applications; they should only be used for the most generic purposes:
w    a word (typically 16 bits). For most purposes, this is an incorrect usage, since the usage of the word is specific to a particular type of word, and should be so distinguished.
Correct usages are generally limited to generic subroutines (e.g. sort an array of words) that can deal with a number of different types; another common use is in conjunction with the prefix c (see below), to produce a count of words (the size) for some object. The exact meaning of w is also somewhat loose; it sometimes means a signed quantity and sometimes unsigned.
b    a byte (typically 8 bits). The same warnings apply to this as to w.
l    a long (typically 32 bits). The same warnings apply to this as to w.
u    an unsigned word (typically 16 bits).
The same as w, except this is always unsigned.
bit    a single bit. Typically used to specify bits used within other types. This concept is usually better handled with the “f” and “sh” prefixes (see below).
v    a void. This corresponds to the C definition of void, meaning that the type is not specified. This type will never be used without a “p” prefix since it is not possible to have an unspecified type for a variable; conceivably there are additional prefixes (e.g. ppv), but such a usage is unlikely. It is perfectly valid to assign a pv to a pointer of any other type, or vice versa.
The major use of this type is for generic subroutines (such as allocate and free) which return or take as arguments pointers of various types.
There are a few types that are used widely within the applications group, but may not be applicable to others:
env    an environment. Used to implement non-local goto’s (SetJmp and DoJmp). The exact format of an env (including size) varies from system to system.
sb    a segment base. The part of a segmented pointer that determines the segment. The exact implementation varies from system to system.
These are used directly in some applications for efficiency; the same results can be obtained (less efficiently) through the use of far or huge pointers.
ib    an offset. The part of a pointer that determines the offset within a segment. These are used directly in some applications for efficiency; the same results can be obtained (less efficiently) through the use of far or huge pointers. For the literal-minded, ib is not really a new type at all; it is simply the prefix i (index) applied to the type b (byte), with the viewpoint that a segment is just an array of bytes.
Many people prefer to consider it a true indivisible base type.
There are a number of other basic types used widely for applications that run under Windows or the Macintosh: [This list has not been compiled yet]
2.1.2. Prefixes (constructors)
Base types are not by themselves sufficient to fully describe the type of a variable, since variables often refer to more complex items. The more complex items are always derived from some combination of simple items, with a few operations. For example, there may be a pointer to an lbl, or an array of them, or a count of co’s.
These operations are represented in Hungarian by prefixes; the combination of the prefixes and the base type represents the complete type of an entity. Note that a type may consist of multiple prefixes in addition to the base type (e.g. a pointer to a count of co’s); the prefixes are read right to left, with each prefix applying to the remainder of the type (see examples below). The term constructor is used because a new type is constructed from the combination of the operation and the base type. In theory, new prefixes can be created, just as new types are routinely created for each application.
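The way prefixes stack on a base type can be sketched in C. The example below uses the p, rg, i and c prefixes from the standard list, applied to the co type. The representation of CO as an int, the routine name FilterRgco, and the qualifiers Src, Dst and Skip are assumptions made for illustration.

```c
typedef int CO;   /* assumption: a co (a color value) is stored as an int */

/* Copies every co that differs from coSkip from rgcoSrc into rgcoDst.
   pcco is a pointer to a count of co's: p applies to "cco", and
   c applies to "co". The number of co's kept is returned through it. */
static void FilterRgco(const CO *rgcoSrc, int ccoSrc, CO coSkip,
                       CO *rgcoDst, int *pcco)
{
    int ico;              /* i + co: index into an array of co's */
    CO *pco = rgcoDst;    /* p + co: pointer to a co */

    for (ico = 0; ico < ccoSrc; ico++)
        if (rgcoSrc[ico] != coSkip)
            *pco++ = rgcoSrc[ico];

    *pcco = (int)(pco - rgcoDst);   /* store the count through the pointer */
}
```

Reading each name from the base type outward gives its full type: rgcoSrc is an array of co's, ccoSrc is a count of co's, and pcco is a pointer to such a count.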
In practice, very few new prefixes have been created over the years, as the set that already exists is rather comprehensive for operations likely to be applied to types. Prefixes that have been added tend to deal with the specifics of machine architecture, and are variations on existing prefixes (i.e. different flavors of pointers). One can go overboard in refusing to create a new prefix, however; some new concepts really are logically expressed as prefixes, not types. A couple of the examples of incorrect usage in the list below derive from the reluctance to create a new prefix.
The standard prefixes are:
p    a pointer.
Within the Applications group, this is usually used to refer to a near pointer (16 bit). Note that a pointer is not itself a type; it is an operation applied to a type. For example, a pch is a pointer to a character.
lp    a far pointer. This is a 32 bit direct pointer; the actual implementation is machine-dependent, and represents the native pointer format for that machine. These usages are becoming rarer as we deal more and more with moveable heaps and true segmentation. Some applications use q instead of lp; a few decided (erroneously) that this was a type rather than an operation, and used pa as the type tag. l can itself be viewed as a prefix (applied in this case to the “p” prefix); it can only be applied to pointer prefixes or types. For example, lrg is a far array and lst is a far pointer to a Pascal-type string.
hp    a huge pointer. This is a 32 bit pointer, composed of an sb and an ib. Some level of machine- and environment-dependent indirection is used to handle segmentation. These are now used extensively for non-local data structures. Some older applications use k instead of hp; Mac Excel decided (erroneously) that this was a type rather than an operation, and used ptr as the type tag.
As with l, h can be viewed as a prefix, though ambiguity occurs when h is also being used as a prefix meaning handle (see below).
np    a near pointer. This is a 16 bit pointer. This prefix is not used within the Applications Development Group; p refers to near pointers. This prefix would be used to explicitly refer to a near pointer when compiling a large model program with a conventional compiler. As with l, n can be viewed as a prefix.
rg    an array, or a pointer to it. The name comes from a mathematical viewpoint of an array as the range of a function (see mp and dn below).
For example, an rgch is an array of characters; a pch could point to one of the characters in the array. Note that it is perfectly reasonable to assign an rgch to a pch; pch points to the first character in the array.
i    an index into an array. For example, an ich is used to index an rgch.
c    a count. For example, the first byte of an st is a count of characters, or a cch.
d    a difference between two instances of a type. This is often confused with a count, but is in reality quite separate. For example, a cch could refer to the number of characters in a string, whereas a dch could refer to the difference between the values ‘a’ and ‘A’.
The confusion arises when dealing with indices; a dich (difference between indices into a character array) is equivalent to a cch (count of characters); which one to use depends on the viewpoint. This gets most confusing when dealing with base types that are in effect indices, though not specifically labelled as such. For example, a spreadsheet could have a rw type that indicates a row in the spreadsheet; it does not contain the actual data for the row, but is simply a one-word integer specifying the row number.
A type specifying a count of rows (not rw’s) would correctly be a drw (difference between row numbers), not a crw (count of row numbers).
h    a handle. This is often a pointer to a pointer (used to allow moveable heap objects). The types of the pointers may vary among applications; the two most common cases are a near pointer to a near pointer (h is equivalent to pp) and a far pointer to a far pointer (h is equivalent to lplp). Most commonly used for interface to an operating system; within applications, moveable objects can be handled through huge pointers. In some systems (e.g. Windows) a handle is not a pointer to a pointer. To avoid confusion it may be best to use pp (or lplp) as prefixes when the application is actually going to do the indirection, and reserve h for instances in which the handle is just passed on to the system. Doing this prevents the most common misuse of h in defining a handle to an array (or other implicit pointer type); uses of hsz to imply two indirections to obtain a character are incorrect. This should properly be done as a psz or, if h must be used, as an hasz (see ‘a’ prefix below).
hh    a huge handle.
This is a huge pointer to a 16-bit pointer within the same segment pointed to by the huge pointer; it is useful for managing heaps. This can be viewed as an hpib, where the sb to be used with the ib is obtained from the hp.
gr    a group, or a pointer to it. This is similar to an rg, but is used for variable size objects. In this case an index (i) is not particularly useful, since it cannot be used directly to obtain an object (one can, of course, write a routine that will take the gr and i, walk through the data in a type-specific manner, and derive a pointer to the object desired).
This is a rarely used prefix, and in some code, grp has been used instead of gr.
b    an offset. This is typically used in conjunction with a gr, in place of an i, in order to get around the problem mentioned above. This offset is in terms of bytes, so pfoo=(BYTE *)grfoo+bfoo. As with gr, this is a somewhat rare usage in current code. b originally stood for base-relative pointer, but should really be considered to be an offset within a data structure; true base-relative pointers are just near pointers (p); the base is the segment they are within.
mp    an array.
This prefix is followed by two types, rather than the standard one, and represents the most general case of an array. From a mathematical viewpoint, an array is simply a function mapping the index to the value stored in the array (hence mp as an abbreviation of map). In the construct mpxy, x is the type of the index and y is the type of the value stored in the array. In most cases, the only type that is important is the type of the value; the index is always an integer with no other meaning. In this case, an rg is used; this means that an rgx is equivalent to an mpixx. (This also explains the weird prefix rg; it is an abbreviation for range.)
dn    an array. This is used in the rare case that the important part of the array mapping is the index, not the value. dn is an abbreviation for domain. Only a few of these are used in the entire Applications group; an example of a plausible use is given in the discussion of e, below.
e    an element of an array. This is used in conjunction with a dn (and is thus just as rare); it is the type of the value stored in a dn. Just as rgx is equivalent to mpixx, dnx is equivalent to mpxex.
An example of use is in the native code generation part of the CS compiler; there is a type vr (an acronym for virtual register). A vr is just a simple integer, specifying which register to use for various pieces of code output. However, there is quite a bit more information than just a number that is associated with each register. This additional data is stored in a structure called an evr; there is an array of them called dnvr. Thus, the information for a given register can be found with the expression dnvr[vr].
f    a bit within a type.
This is a new prefix that is currently used only by a few projects, but is now the approved method for dealing with bits. It is typically used for overloading an integer type with one or more bit flags, in otherwise unused portions of the integer. This should not be confused with the f type, in which the entire value is used to contain the flag. An example is a scan mode (type sm), with possible values smForward and smBackwards. Since the basic mode only requires a few bits (in this case only one bit), the remainder of a word can be used to encode other information. One bit is used for fsmWrap, another for fsmCaseInsens.
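The scan-mode example might look like this in C. This is only a sketch. The SM typedef, the value chosen for fsmCaseInsens, the smModeMask constant and the two helper routines are assumptions made for illustration, while the 4000 hex value of fsmWrap follows the value quoted in the discussion of shift amounts.

```c
typedef unsigned int SM;        /* assumption: an sm fits in one word */

#define smForward      0x0000u  /* basic mode values need only the low bit */
#define smBackwards    0x0001u
#define fsmWrap        0x4000u  /* f + sm: a single-bit flag stored in an sm */
#define fsmCaseInsens  0x2000u  /* assumed value; any unused bit would do */
#define smModeMask     0x0001u  /* hypothetical mask isolating the basic mode */

/* Strips the flag bits, leaving only the basic scan mode. */
static SM SmBasic(SM sm)
{
    return sm & smModeMask;
}

/* Returns 1 if the wrap flag is set in sm, 0 otherwise. */
static int FWrapSm(SM sm)
{
    return (sm & fsmWrap) != 0;
}
```

A caller would combine a mode and its flags with bitwise OR, for instance smBackwards | fsmWrap, and recover each part with the helpers above.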
In fsmWrap and fsmCaseInsens, the f is a prefix to the sm type, specifying that only a single bit is used.
sh    a shift amount. This is another new prefix used to deal with bits within other types (complementing the “f” prefix); it specifies the location within the type by a bit number (rather than the bit mask which the “f” prefix specifies). It actually is followed by two types; the first type is the type being shifted (almost always an f), and the second type is the type the bits are stored within. Continuing the above example of scan modes, if fsmWrap has a value of 4000 hex, shfsmWrap would have the value of 14.
u    a union. This is a rarely used prefix; it is used for variables that can hold one of several types. In practice this becomes unwieldy. An example is a urwcol, which can hold either a rw type or a col type.
a    an allocation. This is a rarely used prefix; it is used to distinguish between an array and a pointer to it. Thus, sz is a pointer to a null-terminated string and asz is the actual allocated space. a is almost invariably used in conjunction with a pointer-type prefix, in order to allow the pointer to be explicit (rather than implicit, as with an sz).
It is essentially the inverse of a p prefix, so pasz is equivalent to sz. Its best use is with the h prefix; hasz is a handle to a null-terminated string. Most of the current Applications code (incorrectly) omits the a.
v    a global. This is really not a correct usage of Hungarian, but you may see it used in some applications anyway. It really should be a qualifier (see below), if it is present at all. Except in extremely bizarre cases, it must be the first prefix.
2.1.2.1. Some examples
Since the prefixes and base types both appear in lower case, with no separating punctuation, ambiguity can arise.
Is pfc a tag of its own (e.g. for a private first class), or is it a pointer to an fc? Such questions can be answered only if one is familiar with the specific types used in a program. To avoid problems like this it is often wise to avoid creating base type names that begin with any of the common prefixes. In practice, ambiguity does not seem to be a problem. The idea of additional punctuation to remove the ambiguity has been shown to be impractical.
The following list contains both common and rarer usages:
pch    a pointer to a character.
ich    an index into an array of characters.
rgst    an array of Pascal-type strings. Hungarian is not sufficient in itself to indicate whether this is an array of characters or an array of pointers; since strings are usually variable length, it is probably a safe bet that this is an array of pointers to the actual characters.
grst    a group of Pascal-type strings.
As with the above example, this could be either an array of characters or of pointers; since it is a gr, not an rg, it is probably safe to assume that it is an array of characters.
bst    an offset to a particular Pascal-type string in a grst.
phpx    a near pointer to a huge pointer to an object of type x.
pich    a near pointer to an index into a character array. A common use for something like this is passing a pointer as a parameter to a function so that a return value can be stored through the pointer; pich would be extremely unlikely to be used in an expression without indirection (pich+=2 is probably gibberish; (*pich)+=2 may well be meaningful).
en    probably a base type (such as an entry). Conceivably it is an element of an array indexed by an n; only knowledge of the application can tell for certain.
hrgn    handle to a region.
Again there is ambiguity; this could be interpreted as a handle to an array of n’s or a huge pointer to an array of n’s.
dx    length of a horizontal line (difference between x coordinates).
rgrgx    a two-dimensional array of x’s (an array of arrays of x’s).
mpmipfn    an array of pointers to functions, indexed by mi’s. For example, an mi could be a menu item, and this array could be used for a command dispatch. Again, context makes the parsing clear; this could equally well be interpreted as an array of fn’s (perhaps friendly nukes), indexed by mip’s (perhaps missile placements).
pv    pointer to a void.
Could be used as an argument to Free.
hrgch    huge pointer to an array of characters. Could instead be interpreted as a handle to an array of characters, depending on the application.
2.1.3. Qualifiers
While the prefixes and base type are sufficient to fully specify the type of a variable, this may not be sufficient to distinguish the variable. If there are two variables of the same type within the same context, further specification is required to disambiguate. This is done with qualifiers. A qualifier is a short descriptive word (or facsimile; good English is not required) that indicates what the variable is used for.
In some cases, multiple words may be used. Some distinctive punctuation should be used to separate the qualifier from the type; in C and other languages that support it, this is done by making the first letter of the qualifier upper-case. (If multiple words are used, the first letter of each should be upper-case; the remainder of the name, both type and qualifier, is always lower-case. There is one special case to watch out for; defined constants specifying the size of a type are often of the form cbFOO or cwFOO, where foo is the type.
Strictly speaking only the F in FOO should be capitalized, but the incorrect usage is fairly common.)
Exactly what constitutes a naming context is language specific; within C the contexts are individual blocks (compound statements), procedures, data structures (for naming fields), or the entire program (globals). As a matter of good programming style, it is not recommended that hiding of names be used; this means that any context should be considered to include all of its subcontexts. (In other words, don’t give a local the same name as a global.)
If there is no conflict within a given context (only one variable of a given type), it is not necessary to use a qualifier; the type alone serves to identify the variable. In small contexts (data structures or small procedures), a qualifier should not be used except in case of conflict; in larger contexts it is often a good idea to use a qualifier even when not necessary, since later modification of the code may make it necessary. In cases of ambiguity, one of the variables may be left with no qualifier; this should only be done if it is clearly more important than the other variables of the same type (no qualifier implies primary usage).
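A short C sketch of these rules follows. The structure LINE, the routine CopyRgch and the qualifiers Src and Dst are hypothetical names chosen for this illustration; only the pch, cch and ich types come from the lists above.

```c
/* A small structure is its own naming context. With only one variable
   of each type, the type alone identifies each field. */
struct LINE                /* hypothetical structure, for illustration */
{
    char *pch;             /* the only pointer to a character here */
    int   cch;             /* the only count of characters here */
};

/* This procedure has two pch's in one context, so the qualifiers Src
   and Dst (first letter upper-case) are needed to tell them apart.
   cch and ich are each the only variable of their type, so neither
   needs a qualifier. */
static void CopyRgch(const char *pchSrc, char *pchDst, int cch)
{
    int ich;

    for (ich = 0; ich < cch; ich++)
        pchDst[ich] = pchSrc[ich];
}
```

If the procedure later grew a second count, say a capacity for the destination, both counts would then need qualifiers, which is why larger contexts often use one from the start.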
Since many uses of variables fall into the same basic categories, there are several standard qualifiers. If applicable, one of these should be used, since they specify meaning with no chance of confusion. In the case of multiple word qualifiers, the order of the words is not crucial, and should be chosen for clarity; if one of the words is a standard qualifier, it should probably come last (unfortunately, this suggestion is by no means uniformly followed). The standard qualifiers are:
First    the first element in a set. This is usually used with an index or a pointer (e.g. pchFirst), referring to the first element of an array to be dealt with. The index may be an implied index (as with a rw type in a spreadsheet).
Last    the last element in a set. This is usually used with an index or a pointer (e.g. pchLast), referring to the last element of an array to be dealt with. Both First and Last represent valid values (compare with Lim below); they are often paired, as in this common loop: