The ANSI Standard: A Summary for the C Programmer

By Eric Giguere
December 18, 1987
 
The following summary of ANSI C was originally written for inclusion with the Waterloo C compiler for VM/CMS, a compiler developed by the Computer Systems Group (CSG) at the University of Waterloo. At the time, I was working as a co-op student at CSG, working on Waterloo C. It was also published as an article in the Transactor for the Amiga, Volume 1, Number 3. It is presented here primarily for historical interest — I doubt it will be that useful to anyone! Although it predates the final standard, it presents the major changes quite nicely. You can also check out the second edition of K&R for more information.

Introduction

Programming languages are constantly evolving and diversifying. The C language is no exception, especially due to its increased popularity in recent years. The original specification document for C, The C Programming Language by Brian Kernighan and Dennis Ritchie, commonly referred to as K&R, is now almost ten years old. K&R has served as the C programmer's "bible", the de facto standard for C. But as the language has evolved, the need for a formal language standard has become apparent.

Under the auspices of the International Standards Organization (ISO), the American National Standards Institute (ANSI) began the preparation of such a standard through the leadership of the X3J11 Technical Committee. The proposed standard is now in final draft form and is expected to be approved by ANSI and ISO in 1988.

This document is a summary of the proposed standard's major changes to the language as it pertains to the C programmer. All information is drawn from official X3J11 documents: the Draft Proposed American National Standard for Information Systems - Programming Language C and the accompanying Rationale for Draft Proposed National Standard for Information Systems - Programming Language C. (These publications will be referred to as the Standard and the Rationale, respectively.)

Purpose of the Summary

Few programmers have either the time or the interest to wade through the actual text of the draft Standard in its entirety. What interests them are the major points of the Standard and the changes it makes to what was defined in K&R and how it affects their current programs. By summarizing these changes, this document is intended to provide a quick reference that the average C programmer can read and understand in one session. In keeping with this goal, the Standard Library is only briefly mentioned. Readers interested in the specifics of the Library should consult the Standard itself or the documentation accompanying any ANSI-conforming compiler.

This document is not a criticism or a justification of the Standard, only a commentary. Nor is it a tutorial on the C programming language. Readers should also be aware that changes may occur to the Standard before its final acceptance by ANSI.

What to expect from the Standard

The Standard is not creating a new language definition. Its purpose, to quote the Rationale, is to "codify common existing practice". This means that the fundamental structure and syntax of the language as described in K&R has been left unchanged. The Standard has instead tried to unify the diverse extensions and dialects that have grown over the years (the existing practice) into a single cohesive language definition. Existing practice is often inconsistent, however, so many compromises have had to be made.

Perhaps the most important thing to remember about the Standard is that it is not intended to invalidate existing C code. Existing programs should compile with only minor changes when using an ANSI-conformant compiler.

Common Terms

Throughout this document the word implementation will be used to refer to any particular implementation of an ANSI-conformant C language interpreter or compiler.

1. Reserved Identifiers

The following keywords have been added to the language:

  • const
  • enum
  • signed
  • void
  • volatile

Explanations for each keyword follow in Sections 2 and 4. In addition, the identifier "entry" has been deleted from the list of reserved identifiers as it was never implemented by K&R or subsequent versions of C.

Each keyword is a reserved identifier; programs that currently use these keywords as variable names must be changed to compile under an ANSI-conformant implementation.

2. Data Types

Major language changes occur with respect to data types. The trend in the Standard has been to provide the language with stronger typing facilities.

Integers

The list of available integer types has been expanded to include "unsigned char" and the following variations:

  • signed char
  • signed int
  • signed long
  • signed short
  • unsigned int
  • unsigned long
  • unsigned short

long int and short int may also be used as variations of long and short, respectively. As would be expected, the declarations:

    signed x;
    unsigned y;

may be used as shorthand for signed int and unsigned int. All logical combinations for integer types are now allowed.

Whether or not a simple char is considered to be signed or unsigned is left up to the implementation.
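
Since the signedness of a plain char is left to the implementation, portable code can test for it rather than assume it. The following is a minimal sketch, assuming the CHAR_MIN macro from <limits.h>; the function name plain_char_is_signed is hypothetical and used only for illustration:

```c
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise.
   CHAR_MIN is 0 when plain char is unsigned and negative when it is signed. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Code that depends on a particular signedness is usually better off declaring signed char or unsigned char explicitly.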

While int is still the default type for variables and functions, at least one storage class (auto, register, static, extern) or type specifier must be present when declaring a variable. A declaration of the form:

    x;

is no longer allowed and must be replaced with:

    int x;

to compile.

Floats

The new type long double has been added for more precision. But like any long type, an object of this type is only guaranteed to be at least as large as a double.

The type long float (a previous synonym for double) is now invalid. The only acceptable floating-point types are float, double and long double.

Structures and Unions

Member name spaces are now unique within structures and unions. That is, two different structures or unions may contain members with the same name without fear of conflict.

Structures and unions may now:

  • be assigned to another of the same type
  • be initialized when declared with the auto storage class
  • be passed as function parameters and return values
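
The three capabilities listed above can be sketched together in one short example. The type struct point and the functions translate and struct_demo are hypothetical names chosen for illustration:

```c
struct point {
    int x;
    int y;
};

/* Structures are passed and returned by value, so the caller's copy
   is left untouched. */
struct point translate(struct point p, int dx, int dy)
{
    p.x += dx;
    p.y += dy;
    return p;
}

int struct_demo(void)
{
    struct point origin = { 1, 2 };   /* initialized auto structure */
    struct point moved;

    moved = origin;                   /* structure assignment */
    moved = translate(moved, 10, 20); /* pass and return by value */

    /* origin is still {1, 2}; moved is now {11, 22} */
    return (origin.x == 1 && origin.y == 2) ? moved.x + moved.y : -1;
}
```

Passing large structures by value copies every member, so a pointer parameter may still be the better choice where speed matters.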

Enumerations

Already available in most compilers, enumerations have been added to the language. An enumeration is a way of declaring a set of integer constants. The declaration:

    enum colours { RED, BLUE, GREEN };

would declare colours as an enumeration tag representing the integer constants RED, BLUE and GREEN. These enumeration constants are given integer values starting at 0 and increasing by 1 with each identifier.

An enumeration constant may be used wherever an integer is expected. The following is equivalent to the above enumerated type:

    #define RED   0
    #define BLUE  1
    #define GREEN 2

Enumeration constants are not restricted to upper case, but upper case is a widely recognized convention for constants.

Variables may be declared to have enumeration type. The declaration:

    enum colours x, y;

declares x and y to be integer variables capable of holding an enumeration constant of type colours. In practice, little or no checking is done to make sure enumeration constants are used, so the following assignments are equivalent:

    x = BLUE;
    x = 1;   /* defeats purpose of enum */

Constant values may be directly assigned within an enumeration as well:

    enum relation { EQUAL = 1, LESS_THAN = 2,
                    GREATER_THAN = 4 };

If no value is specified for a given identifier, the constant is taken to have the value of the previous constant plus one.

The size of an enumeration type has been left unspecified; the implementation is free to store it in the most optimal fashion, providing that it always behaves like an int.
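
As a short sketch of the numbering rules, the following mixes explicit and implicit values. The enumeration weekday and both functions are hypothetical; relation repeats the declaration shown earlier:

```c
/* TUESDAY is 2 and WEDNESDAY is 3 (previous constant plus one);
   SATURDAY follows the explicit FRIDAY = 5 and is therefore 6. */
enum weekday { MONDAY = 1, TUESDAY, WEDNESDAY, FRIDAY = 5, SATURDAY };

enum relation { EQUAL = 1, LESS_THAN = 2, GREATER_THAN = 4 };

/* Enumeration constants are plain integers, so they can be OR-ed
   together to form a mask. */
int less_or_equal_mask(void)
{
    return EQUAL | LESS_THAN;
}

/* An object of enumeration type can be used wherever an int is expected. */
int days_until_saturday(enum weekday today)
{
    return SATURDAY - today;
}
```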

Void Type

The void data type has been added to indicate that an expression has no value. No variables can be declared with such a type, but expressions may be cast to void.

For example, the following statement:

    (void) printf( "hello world" );

specifically indicates to the compiler that the return value from printf (an integer) is to be ignored. As such, the following statement is illegal:

    a = (void) func(); /* illegal! */

since the assignment operator expects a value to be returned for assignment.

Void pointers and void functions are discussed below and in Section 4.

Pointers and Arrays

Pointers are no longer synonymous with the int type. Pointers may only be compared with or assigned:

  • the integer value 0 (used to define a null pointer)
  • a pointer of the same type
  • a pointer of generic type (a void pointer)

Any other use of a pointer will generate a warning message upon compilation. Many assignment statements will require explicit casting of the right-hand values to avoid generating these messages.

A void pointer is a pointer that has no base type — that is, it points to a type of unknown specification — and is declared using the syntax:

    void *ptr;

Indirection through a void pointer is not allowed; it must be cast to an appropriate pointer type first. Its main use is as a generic pointer.
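
A sketch of the generic-pointer idea follows. The functions copy_bytes and copy_demo are hypothetical, written only to show a void pointer being cast before indirection:

```c
#include <stddef.h>

/* Copies n bytes from src to dst. The void pointers accept the address
   of any object type; they are cast to unsigned char * before use. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *) dst;
    const unsigned char *s = (const unsigned char *) src;

    while (n-- > 0)
        *d++ = *s++;
}

int copy_demo(void)
{
    int from = 1234;
    int to = 0;

    /* an int * converts to void * without an explicit cast */
    copy_bytes(&to, &from, sizeof from);
    return to;
}
```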

Arrays with storage class auto may now be initialized. If specified, the size of an array must be an integral expression greater than zero.

Special Modifiers

The Standard makes available the two attributes const and volatile for use as type modifiers.

An object declared to be const cannot be modified (assigned to, incremented or decremented) by a program. Thus the following code is invalid:

    const int x;
    /* ..... */
    x = 2;        /* illegal! */

Initialization, however, is allowed:

    const unsigned char masks[] = { 0x00, 0xff };

A const object (if it is of static storage duration) may be data that is put into read-only memory. Declaring such data with the const attribute allows the compiler to diagnose any attempts at modifying the data. Function parameters may also be declared as const to indicate that they are not modified by the function. This provides both extra documentation and, when function prototypes (described in Section 4) are used properly, consistent error-checking.
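
A const function parameter might look like the following sketch, where count_char is a hypothetical function that promises not to modify the string it is given:

```c
#include <stddef.h>

/* Counts occurrences of c in the string s. The const qualifier
   documents that the characters pointed to are never modified. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;

    while (*s != '\0') {
        if (*s == c)
            n++;
        s++;            /* the pointer itself may be changed... */
        /* *s = 'x';       ...but writing through it would be diagnosed */
    }
    return n;
}
```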

A volatile object is one that may be modified outside of program control. Memory-mapped I/O ports are a typical example. Declaring an object as volatile indicates that the compiler should always generate code to fetch the object's value from its actual memory location — it may have changed since the last access by the program. (This disallows optimizations which could load the value into a register and possibly return erroneous results.)

    volatile char *port1 = (volatile char *) 0x00f3; /* ptr to I/O port */

    while( *port1 & DATA_FLAG ) /* needs to be volatile */
        clear_io();

The const and volatile modifiers may be used (singly or together) in combination with any other valid type specifiers.

Pointers may also be declared to be const or volatile through the use of special syntax:

    int const *a;
    int *const b;
    int *const *c;

In the example, a is declared to be a pointer to a const integer, whereas b is declared to be a const pointer to an integer. The distinction lies in the placement of the const attribute. The declaration for c is even more confusing: it declares a pointer to a const pointer to an integer. Consider the following statements:

    a = NULL;  /* ok */
    *a = 0;    /* error */
    b = NULL;  /* error */
    *b = 0;    /* ok */
    c = &b;    /* ok */
    *c = NULL; /* error */
    **c = 0;   /* ok */

Because a is a pointer to a const int, the value it points to may not be changed. Similarly, because b is a const pointer to an int, it may not be modified, though the value it points to may. The pointer c may be modified, but not the pointer *c, though **c (an integer) is modifiable itself.

The volatile modifier may also be used with pointers in conjunction with or separate from const.

Bit-fields

Bit-fields may now be of type int, unsigned int or signed int. Whether or not the high-order bit of an int bit-field is to be considered a sign bit is implementation-defined.

Vacuous Definitions

A vacuous definition consisting only of a struct or union specifier with a tag name is now allowed. Its purpose is to hide any outer declaration of the same name in the current block, as the definitions for struct a demonstrate in this example:

    struct a {
        int x;
    };

    int func(){
        struct a st1;   /* struct defined above */

        struct a;       /* vacuous definition:  it "clears"
                           the current defn of struct a to
                           make way for a new one */

        /* references to struct a now refer to the new
           definition within this block */

        struct b {
            struct a *y;    /* refers to NEXT struct */
        } st2;

        struct a {
            struct b *z;
        } st3;

        st1.x = 1;
        st2.y = &st3;       /* &st1 would give warning */
        st3.z = &st2;
    }

    /* old struct defn now back in scope */

Here the member y of st2 is a pointer to the second struct a, which is defined below it. If the vacuous definition

    struct a;

had not been present, y would instead have been a pointer to the struct a defined in the previous (enclosing) scope level outside the function.

Conversions and Promotions

The Standard defines the integral promotions as follows: the char, short or bit-field types (with or without the signed or unsigned modifiers) may be used wherever an int is expected. The values will be converted to int if possible; otherwise they will be converted to unsigned int.

The usual arithmetic conversions used with most binary operators have been modified to reflect the new types available to the programmer. Of particular note: expressions of type float are no longer automatically converted to double for arithmetic purposes; such arithmetic may now be performed less accurately.

The Standard also specifies other rules regarding conversions. Where signed and unsigned integer values are concerned, the Standard now advocates value preserving as opposed to unsigned preserving rules: unsigned values are promoted to signed int if possible, otherwise they are promoted to unsigned int. Floating-point values must now truncate towards zero when converted to integral types. No rounding need occur when a double is demoted to float. Otherwise, the rules in K&R are unchanged.

Any program comparing or performing arithmetic on values of different types should be closely screened for possible changes in behaviour.
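
A few of these rules can be observed directly. The sketch below uses hypothetical function names and assumes an implementation in which int can represent every unsigned char value, which is the usual case:

```c
/* Integral promotion: the unsigned char is promoted to int (value
   preserving), so 255 + 1 does not wrap around to 0. */
int promoted_sum(void)
{
    unsigned char c = 255;
    return c + 1;
}

/* Floating-point to integer conversion truncates toward zero. */
int truncate_toward_zero(double d)
{
    return (int) d;
}

/* Usual arithmetic conversions: the int operand is converted to
   unsigned int, so -1 becomes a very large value and the test fails. */
int minus_one_less_than_one_unsigned(void)
{
    return -1 < 1u;
}
```

The last function is the kind of comparison worth screening for, since its result surprises many readers.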

Minimum Type Limits

Any compiler conforming to the Standard must also respect the following limits with respect to the range of values any particular type may accept. Note that these are lower limits: an implementation is free to exceed any or all of these. Note also that the minimum range for a char is dependent on whether or not a char is considered to be signed or unsigned.

Type                 Minimum Range
signed char          -127 to +127
unsigned char        0 to 255
short int            -32767 to +32767
unsigned short int   0 to 65535
int                  -32767 to +32767
unsigned int         0 to 65535
long int             -2147483647 to +2147483647
unsigned long int    0 to 4294967295

Type                 Minimum Precision
float                6 digits
double               10 digits
long double          10 digits

The Standard also specifies that these limits should be present as preprocessor macros in the header file <limits.h>.
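
The following sketch checks an implementation against the minimums in the tables above. The function names are hypothetical. Note one detail that goes beyond the text: in the final standard the floating-point precision macros (FLT_DIG, DBL_DIG, LDBL_DIG) are found in <float.h> rather than <limits.h>.

```c
#include <limits.h>
#include <float.h>

/* Returns 1 if every integer type meets the minimum range. */
int integer_limits_ok(void)
{
    return SCHAR_MAX >= 127
        && UCHAR_MAX >= 255
        && SHRT_MAX  >= 32767
        && USHRT_MAX >= 65535u
        && INT_MAX   >= 32767
        && UINT_MAX  >= 65535u
        && LONG_MAX  >= 2147483647L
        && ULONG_MAX >= 4294967295uL;
}

/* Returns 1 if every floating-point type meets the minimum precision. */
int float_precision_ok(void)
{
    return FLT_DIG >= 6 && DBL_DIG >= 10 && LDBL_DIG >= 10;
}
```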

3. Data Objects

Changes in this area have occurred mainly with respect to variable (object) linkage and initialization.

Initialization

Objects declared as either static or auto may be initialized by following the declaration with an equals sign, '=', and an initialization expression. External (inter-module) objects are discussed below.

If no initialization is given for a static object, all arithmetic types in the object are assigned 0 and all the pointers are set to NULL. If no initialization is given to an auto object, its initial value is undefined. These rules are unchanged from K&R.

All initializers for either static objects or auto arrays, unions and structures must be constant expressions.

Unions may now be initialized: the initialization value is assigned to the first member of the union.

The initialization expression for a scalar (integral, floating-point or pointer) object may optionally be enclosed in braces. Braces must enclose the initialization expressions for arrays, structures and unions. There can be no more initializers in an initialization list than there are objects to be initialized (there may be fewer, though, and any remaining uninitialized objects are handled as described above).

An array of char or pointer to char may be initialized with a string constant.
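
These initialization rules can be sketched in a few declarations. All names here (number, counters, init_demo) are hypothetical:

```c
#include <string.h>

union number {
    int    i;   /* first member: the one an initializer sets */
    double d;
};

/* Fewer initializers than elements: counters[2] and counters[3] are 0. */
static int counters[4] = { 7, 8 };

int init_demo(void)
{
    union number n = { 42 };   /* initializes n.i */
    int scalar = { 5 };        /* braces are optional for a scalar */
    char word[] = "abc";       /* sized from the string: 4 bytes with '\0' */

    return n.i + scalar
         + counters[0] + counters[1] + counters[2] + counters[3]
         + (int) sizeof word + (int) strlen(word);
}
```

Here init_demo returns 42 + 5 + 7 + 8 + 0 + 0 + 4 + 3, or 69.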

Linkage

In this section object refers to an object declared outside of any function.

The linkage of an object determines its scope within the program. An object with external linkage is known to all files in a program. An object with internal linkage is known only to the file in which it is declared. Current C compilers often differentiate between the two in incompatible ways, an issue which the Standard resolves.

An object is said to be defined if it includes an initializer. A defined object has internal linkage if the storage class static is specified; otherwise it has external linkage. An object can only be defined once.

Any object declaration without the extern modifier and without an initializer constitutes what is known as a tentative definition of the object. If an actual definition for the object is encountered in the same file, all tentative definitions are considered to be simple declarations referring to that object. Otherwise the first tentative definition is considered to be an actual definition with initializer equal to 0.

    /* example drawn from the Standard */

    int i1 = 1;         /* definition, external linkage */
    static int i2 = 2;  /* definition, internal linkage */
    extern int i3 = 3;  /* definition, external linkage */
    int i4;             /* tentative definition */
    static int i5;      /* tentative definition */

    int i1; /* tentative def., refers to previous */
    int i2; /* invalid -- linkage disagreement */
    int i3; /* tentative def., refers to previous */
    int i4; /* tentative def., refers to previous */
    int i5; /* invalid -- linkage disagreement */

    extern int i1;  /* these are all valid references */
    extern int i2;
    extern int i3;
    extern int i4;
    extern int i5;

These complex rules provide the most flexibility and allow the majority of current C code to be compatible with the Standard.

The simplest way to declare an externally-linked object is to define it in one file (with or without initializer) and reference it in all others through the use of an appropriate extern declaration.

4. Functions

Additions to the language definition occur in the area of function declarations, function definitions and variable parameter lists.

Function Definitions

The Standard now allows the types of formal parameters to be specified within the actual function declaration at the start of the function definition. This new-style definition form more closely resembles languages such as Pascal and Modula-2:

    int main( int argc, char *argv[] ){
    /* ... */
    }

If this style is used, a type must be specified separately for each formal parameter in the argument list. Mixing the new-style with the K&R-style in the same definition is not allowed.

This style is intended by ANSI to become the favored style, and a future Standard may disallow the K&R-style definition. For the moment, however, both styles may be used interchangeably.

Functions may also be defined as explicitly having no return values. Such functions are called void functions and are defined using the type void:

    void func( int a ){
        /* ... */
        return;
    }

Though the use of the return statement is allowed, such functions must not return an expression. If no type is explicitly specified, the function return type still defaults to int to retain compatibility with K&R.

Function Declarations and Prototypes

Function type declarations are also consistent with the new-style function definitions and may include a list of formal parameters. These parameters consist of type declarations with or without identifiers. Identifiers are cosmetic only and need only be included for readability. Some examples are:

    int main( int, char *[] );
    extern char *strcpy( char *dst, const char *src );

Note that declarations must be consistent as they will be checked by the compiler. Each declaration of a function should agree with all previous declarations in both the number and types of parameters.

The following declarations illustrate two special cases:

    extern int func1( void );
    extern int func2();

The first case explicitly declares that the function func1 does not take any parameters; that is, its parameter list is empty (void).

The second declaration states that no information is known on the number and types of any formal parameter. This is to provide compatibility with K&R.

A function declaration that provides the number and types of parameters is called a function prototype. The addition of prototypes to C allows for stricter type-checking by the compiler. When a prototype for a function has been declared, each subsequent call to that function is checked to make sure that the correct number of arguments has been supplied. As well, the type of each argument is compared with what was declared in the prototype. If different, the argument is converted to the required type as if it had been assigned to an object of that type. The default argument promotions (char and short to int, float to double) are not performed when a prototype has been declared. (Note: The default argument promotions are separate from the usual arithmetic conversions.)

If a function prototype occurs in the same file as the definition of that function, both the prototype and the definition must agree exactly if the definition is of the new style. In K&R-style definitions, the formal parameters are first widened by the default argument promotions and then compared to the prototype(s). If no prototype occurs in the file, the function definition itself serves as a prototype for the code following it.
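
The argument conversion performed under a prototype can be seen in a small sketch. The functions half and half_of_three are hypothetical:

```c
/* Prototype: the compiler now knows half() takes exactly one double. */
double half(double value);

double half_of_three(void)
{
    /* The int argument 3 is converted to double, as if by assignment,
       because the prototype is in scope. */
    return half(3);
}

double half(double value)
{
    return value / 2.0;
}
```

Without the prototype in scope, the call half(3) would pass an int to a function expecting a double, and the result would be unpredictable.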

Variable Parameter Lists

Certain C functions are designed to take a variable number of parameters. Unfortunately, some compilers use different schemes for handling such situations and what works in one implementation may not work elsewhere. The Standard therefore provides for the explicit declaration of such functions and portable facilities for handling them. A function that takes a variable number of parameters is defined by ending the parameter list (new-style only) with an ellipsis:

    int printf( const char *format, ... ){
        /* ... */
    }

Thus the only thing known about printf is that it takes at least one parameter, the type of which is a pointer to const char. Prototypes may also be declared in this fashion:

    extern sprintf( char *dest, const char *format, ... );

The compiler will then make sure that each call to sprintf has at least two arguments, both of which are pointers to char.

The arguments themselves are accessed through the use of special macro facilities defined in the header file <stdarg.h>, part of the ANSI Standard Library.
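
As an illustration of those facilities, here is a minimal sketch of a function that sums a variable number of int arguments using the va_list type and the va_start, va_arg and va_end macros from <stdarg.h>. The function sum_ints and its count convention are assumptions made for the example:

```c
#include <stdarg.h>

/* Sums the `count` int arguments that follow the fixed parameter. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;
    int i;

    va_start(args, count);          /* begin after the last named parameter */
    for (i = 0; i < count; i++)
        total += va_arg(args, int); /* the caller must really pass ints */
    va_end(args);

    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) yields 6. The compiler cannot check the variable arguments, so the function needs some convention (here, the leading count) to know how many there are.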

5. The Preprocessor

The C preprocessor, long since recognized as an integral part of the language, has benefitted from a number of additions and clarifications in the Standard.

New Directives

The #elif directive has been added as a shorthand form of the #else #if preprocessor sequence.

The identifier defined is reserved during an #if or #elif so that:

    #if defined( NULL )
    #if !defined( TRUE )

are equivalent to:

    #ifdef NULL
    #ifndef TRUE

Also new on the list are the directives #error and #pragma. The former produces an error message at compile-time; the latter is implementation-defined in its use and effects.
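
The new directives might be combined as in the following sketch. BUFFER_KB is a hypothetical build-time setting, and BUFFER_BYTES and buffer_bytes are likewise invented for the example:

```c
/* BUFFER_KB is a hypothetical build-time setting; uncomment to try it. */
/* #define BUFFER_KB 8 */

#if defined( BUFFER_KB ) && BUFFER_KB > 64
#error "BUFFER_KB must not exceed 64"
#elif defined( BUFFER_KB )
#define BUFFER_BYTES ( BUFFER_KB * 1024 )
#else
#define BUFFER_BYTES 4096   /* default when nothing is configured */
#endif

int buffer_bytes(void)
{
    return BUFFER_BYTES;
}
```

With BUFFER_KB left undefined, as written, buffer_bytes returns the default of 4096.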

File Inclusion

Besides the two allowable forms:

    #include <fname1>
    #include "fname2"

a third form:

    #include fname3

is acceptable, provided that fname3 is a macro which expands into one of the other two forms.
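
A brief sketch of the third form, where IO_HEADER is a hypothetical macro that expands to a standard header name:

```c
#define IO_HEADER <stdio.h>   /* hypothetical macro naming a header */
#include IO_HEADER

/* Writes "42" into buffer and returns the number of characters written. */
int format_answer(char *buffer)
{
    return sprintf(buffer, "%d", 42);   /* declared by <stdio.h> */
}
```

This form is mostly useful when the header to include is selected by earlier conditional directives.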

Macro Operators

Two new operators have been added for use within a macro replacement string. The ## (concat) operator concatenates two adjacent preprocessor tokens (a preprocessor token is any consecutive series of non-blank characters). The # (stringize) operator places the parameter following it in string form. For example, consider the following definition:

    #define debug( s )  printf( "x" # s " = %d\n", x ## s )

The following macro call:

    debug( 1 );

expands to:

    printf( "x" "1" " = %d\n", x1 );

which after string concatenation (see Section 7) gives the final result:

    printf( "x1 = %d\n", x1 );

Program debugging through the use of macros has thus been made simpler.
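
The two operators can also be tried in isolation. In this sketch the macros STRINGIZE and PASTE and the function macro_demo are hypothetical names:

```c
#include <string.h>

#define STRINGIZE( token )   # token
#define PASTE( left, right ) left ## right

int macro_demo(void)
{
    int counter1 = 10;
    int counter2 = 20;

    /* PASTE( counter, 2 ) becomes the identifier counter2. */
    int picked = PASTE( counter, 2 );

    /* STRINGIZE( counter1 ) becomes the string "counter1". */
    const char *name = STRINGIZE( counter1 );

    return picked + counter1 + (int) strlen( name );
}
```

Here macro_demo returns 20 + 10 + 8, or 38.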

Predefined Macros

Five new macros are predefined in the Standard, all of which are expanded to their appropriate values upon file compilation.

Macro       Expands To
__DATE__    current date
__TIME__    current time
__FILE__    current file name
__LINE__    current line number
__STDC__    non-zero value

The definition of the __STDC__ macro indicates an ANSI-conformant compiler.

None of these macros may be redefined by a program.
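
A sketch of how a program might use these macros follows. The three function names are hypothetical:

```c
/* Returns the number of the source line holding the return statement. */
int current_line(void)
{
    return __LINE__;
}

/* Returns the name of the source file being compiled. */
const char *current_file(void)
{
    return __FILE__;
}

/* Returns 1 when the compiler claims conformance, 0 otherwise. */
int is_standard_c(void)
{
#if defined( __STDC__ )
    return 1;
#else
    return 0;
#endif
}
```

__FILE__ and __LINE__ are most often seen in diagnostic messages, where they identify the place in the source that reported a problem.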

6. The C Library

A standardized library of routines aids the programmer and enhances portability. The Standard defines such a library, which is too large to describe here in any detail. The Standard Library is based on the library compiled by /usr/group, a UNIX users' group, with all the UNIX dependencies deleted.

The Standard Library also provides a set of standard library headers. These headers provide function prototypes for the set of routines that make up the library and define commonly-used macros. As well, the functions and their prototypes have been changed so as to be invariant to the default promotions — all are declared using promoted types (such as int and double) for parameters. Thus parameters passed to a library function will always be of the same type, regardless of whether a prototype is in scope or not.

Macros may also be defined in a header file to take the place of actual calls to library routines. However, the library routines themselves must exist as the macros may be subjected to an #undef directive by the user at any time.

Among the most notable additions to the library are variable argument handling, numeric limits information, and locale (the current environment) information.

K&R library functions have also been converted to the new style and syntax, so that malloc, for example, now returns a void * as opposed to a char *.
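
Because malloc now returns a void pointer, its result can be assigned to any object pointer without a cast. A minimal sketch, with make_counts as a hypothetical helper:

```c
#include <stdlib.h>

/* Allocates an array of n ints, all set to zero.
   Returns NULL if the allocation fails. */
int *make_counts(size_t n)
{
    size_t i;
    int *counts = malloc(n * sizeof *counts);  /* no cast needed */

    if (counts == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        counts[i] = 0;
    return counts;
}
```

The caller is responsible for passing the returned pointer to free when it is no longer needed.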

7. Miscellaneous

Numerous other minor changes and additions have occurred throughout the language:

  • The escape characters '\a' and '\v' have been added for alarm (bell) and vertical tab.
  • A series of special trigraph characters has been added as equivalents to ASCII characters which may not appear in the character sets of some countries; a trigraph is a three-character sequence starting with "??":
    Character   Trigraph
    #           ??=
    [           ??(
    \           ??/
    ]           ??)
    ^           ??'
    {           ??<
    |           ??!
    }           ??>
    ~           ??-
  • The suffixes u and l may be used with integer constants to specify unsigned and long values; both may be used together to specify unsigned long.
  • The suffixes f and l may be used with floating-point constants to specify float and long double values.
  • If the high bit of an octal or hex constant is set, it is considered to be unsigned.
  • Adjacent strings separated only by white space are concatenated.
  • The unary plus ('+') has been added to force the evaluation of an arithmetic expression to occur before any others.
  • A function can be called through a pointer using either the K&R-style syntax (*fp)() or the new-style fp().
  • External identifier length significance is still 6 characters with no case sensitivity (for compatibility with existing linkers).
  • Internal identifier length significance is a minimum of 31 characters (case sensitive).
  • Each macro function call is expanded only once, which prevents the definition of recursive macros.
  • External identifiers beginning with an underscore are reserved for library usage.
  • Identifiers beginning with an underscore followed either by a capital letter or another underscore are reserved for use as predefined macro names.
  • Multi-byte character constants are allowed, though their values are implementation-defined.
  • Hexadecimal character constants may be specified using \x followed by a series of hexadecimal digits (e.g., '\xff').
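
Several of the items above can be sketched in one function. The names twice and misc_demo are hypothetical, and the example assumes 8-bit characters:

```c
#include <string.h>

static int twice(int n)
{
    return 2 * n;
}

int misc_demo(void)
{
    int (*fp)(int) = twice;
    const char *joined = "ANSI" " " "C";  /* adjacent strings: "ANSI C" */
    unsigned long big = 4000000000ul;     /* u and l suffixes combined */
    unsigned char all_ones = '\xff';      /* hexadecimal escape */

    return (*fp)(1) + fp(2)               /* both call styles: 2 + 4 */
         + (int) strlen(joined)           /* 6 */
         + (big > 2147483647ul)           /* 1 */
         + (all_ones == 255);             /* 1 */
}
```

Here misc_demo returns 2 + 4 + 6 + 1 + 1, or 14.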

Identifiers in current programs that are now reserved by the Standard will have to be altered to be portable across compilers.

References

American National Standards Institute, Inc.
Draft Proposed American National Standard for Information Systems - Programming Language C, ANSI X3J11/87-221 (November 9, 1987).
American National Standards Institute, Inc.
Rationale for Draft Proposed American National Standard for Information Systems - Programming Language C, ANSI X3J11/87-219 (November 6, 1987).
Kernighan, Brian W., and Dennis M. Ritchie
The C Programming Language, Prentice-Hall, Englewood Cliffs, NJ (1978).
Plum, Thomas
Notes on The Draft C Standard, Plum Hall Inc., Cardiff, NJ (1987).

User groups have permission to reprint this article for free as described on the copyrights page.
