Preface
This document is intended to introduce pointers to beginning
programmers in the C programming language. Over several years of
reading and contributing to various conferences on C including
those on the FidoNet and UseNet, I have noted that a large number of
newcomers to C appear to have a difficult time in grasping the
fundamentals of pointers. I therefore undertook the task of
trying to explain them in plain language with lots of examples.
The first version of this document was placed in the public
domain, as is this one. It was picked up by Bob Stout who
included it as a file called PTR-HELP.TXT in his widely
distributed collection of SNIPPETS. Since that release, I have
added a significant amount of material and made some minor
corrections in the original work.
Acknowledgements:
There are so many people who have unknowingly contributed to
this work because of the questions they have posed in the FidoNet
C Echo, or the UseNet Newsgroup comp.lang.c, or several other
conferences in other networks, that it would be impossible to
list them all. Special thanks go to Bob Stout who was kind
enough to include the first version of this material in his
SNIPPETS file.
About the Author:
Ted Jensen is a retired Electronics Engineer who worked as a
hardware designer or manager of hardware designers in the field
of magnetic recording. Programming has been a hobby of his off
and on since 1968 when he learned how to keypunch cards for
submission to be run on a mainframe. (The mainframe had 64K of
magnetic core memory!)
Use of this Material:
Everything contained herein is hereby released to the Public
Domain. Any person may copy or distribute this material in any
manner they wish. The only thing I ask is that if this material
is used as a teaching aid in a class, I would appreciate it if it
were distributed in its entirety, i.e. including all chapters,
the preface and the introduction. I would also appreciate it if
under such circumstances the instructor of such a class would
drop me a note at one of the addresses below informing me of
this. I have written this with the hope that it will be useful
to others and since I'm not asking any financial remuneration,
the only way I know that I have at least partially reached that
goal is via feedback from those who find this material useful.
By the way, you needn't be an instructor or teacher to
contact me. I would appreciate a note from _anyone_ who finds
the material useful, or who has constructive criticism to offer.
I'm also willing to answer questions submitted by mail.
Ted Jensen
Introduction
If one is to be proficient in the writing of code in the C
programming language, one must have a thorough working knowledge
of how to use pointers. Unfortunately, C pointers appear to
represent a stumbling block to newcomers, particularly those
coming from other computer languages such as Fortran, Pascal or
Basic.
To aid those newcomers in the understanding of pointers I have
written the following material. To get the maximum benefit from
this material, I feel it is important that the user be able to
run the code in the various listings contained in the article. I
have attempted, therefore, to keep all code ANSI compliant so
that it will work with any ANSI compliant compiler. And I have
tried to carefully block the code within the text so that with
the help of an ASCII text editor one can copy a given block of
code to a new file and compile it on their system. I recommend
that readers do this as it will help in understanding the
material.
Chapter 1: What is a pointer?
One of the things beginners in C find most difficult to
understand is the concept of pointers. The purpose of this
document is to provide an introduction to pointers and their use
to these beginners.
I have found that often the main reason beginners have a
problem with pointers is that they have a weak or minimal feeling
for variables (as they are used in C). Thus we start with a
discussion of C variables in general.
A variable in a program is something with a name, the value
of which can vary. The way the compiler and linker handle this
is that they assign a specific block of memory within the computer
to hold the value of that variable. The size of that block
depends on the range over which the variable is allowed to vary.
For example, on older 16-bit PCs the size of an integer variable
is 2 bytes, and that of a long integer is 4 bytes; on most
current systems an integer occupies 4 bytes. In C the size of a
variable type such as an integer need not be the same on all
types of machines.
When we declare a variable we inform the compiler of two
things, the name of the variable and the type of the variable.
For example, we declare a variable of type integer with the name
k by writing:
int k;
On seeing the "int" part of this statement the compiler sets
aside 2 bytes (on a 16-bit PC) of memory to hold the value of the
integer. It also sets up a symbol table, and in that table it
adds the symbol k and the relative address in memory where those
2 bytes were set aside.
Thus, later if we write:
k = 2;
at run time we expect that the value 2 will be placed in that
memory location reserved for the storage of the value of k. In C
we refer to a variable such as the integer k as an "object".
In a sense there are two "values" associated with the object
k, one being the value of the integer stored there (2 in the
above example) and the other being the "value" of the memory
location where it is stored, i.e. the address of k. Some texts
refer to these two values with the nomenclature rvalue (right
value, pronounced "are value") and lvalue (left value, pronounced
"el value") respectively.
In some languages, the lvalue is the value permitted on the
left side of the assignment operator '=' (i.e. the address where
the result of evaluation of the right side ends up). The rvalue
is that which is on the right side of the assignment statement,
the '2' above. Rvalues cannot be used on the left side
of the assignment statement. Thus: 2 = k; is illegal.
Actually, the above definition of "lvalue" is somewhat
modified for C. According to K&R-2 (page 197): [1]
"An _object_ is a named region of storage; an _lvalue_ is an
expression referring to an object."
However, at this point, the definition originally cited above is
sufficient. As we become more familiar with pointers we will go
into more detail on this.
Okay, now consider:
int j, k;
k = 2;
j = 7; <-- line 1
k = j; <-- line 2
In the above, the compiler interprets the j in line 1 as the
address of the variable j (its lvalue) and creates code to copy
the value 7 to that address. In line 2, however, the j is
interpreted as its rvalue (since it is on the right hand side of
the assignment operator '='). That is, here the j refers to the
value _stored_ at the memory location set aside for j, in this
case 7. So, the 7 is copied to the address designated by the
lvalue of k.
In all of these examples, we are using 2 byte integers so all
copying of rvalues from one storage location to the other is done
by copying 2 bytes. Had we been using long integers, we would be
copying 4 bytes.
Now, let's say that we have a reason for wanting a variable
designed to hold an lvalue (an address). The size required to
hold such a value depends on the system. On older desktop
computers with 64K of memory total, the address of any point in
memory can be contained in 2 bytes. Computers with more memory
would require more bytes to hold an address. Some computers,
such as the IBM PC, might require special handling to hold a
segment and offset under certain circumstances. The actual size
required is not too important so long as we have a way of
informing the compiler that what we want to store is an address.
Such a variable is called a "pointer variable" (for reasons
which hopefully will become clearer a little later). In C when
we define a pointer variable we do so by preceding its name with
an asterisk. In C we also give our pointer a type which, in this
case, refers to the type of data stored at the address we will be
storing in our pointer. For example, consider the variable
declaration:
int *ptr;
ptr is the _name_ of our variable (just as 'k' was the name
of our integer variable). The '*' informs the compiler that we
want a pointer variable, i.e. to set aside however many bytes are
required to store an address in memory. The "int" says that we
intend to use our pointer variable to store the address of an
integer. Such a pointer is said to "point to" an integer.
However, note that when we wrote "int k;" we did not give k a value.
If this definition is made outside of any function, ANSI compliant
compilers will initialize it to zero. Similarly, ptr has no value, that is
we haven't stored an address in it in the above declaration. In
this case, again if the declaration is outside of any function,
it is initialized to a value that is guaranteed not to point to
any object or function. A pointer holding this value is called a
null pointer. (Inside a function, neither k nor ptr would be
initialized automatically.) In source code a null pointer is
written using the macro NULL, which is #defined in standard
headers such as <stddef.h> and <stdio.h>. While in most cases
NULL is #defined as zero, it may instead be #defined as zero cast
to a void pointer; different compilers handle this differently.
Likewise, the bit pattern actually stored in a null pointer need
not be all zeros. However, the value that a null pointer has
internally is of little consequence to the programmer since at
the source code level NULL == 0 is guaranteed to evaluate to
true regardless of that internal value.
But, back to using our new variable ptr. Suppose now that we
want to store in ptr the address of our integer variable k. To
do this we use the unary '&' operator and write:
ptr = &k;
The '&' operator retrieves the lvalue (address) of k, even
though k is on the right hand side of the assignment operator
'=', and the assignment copies that address into our pointer ptr.
Now, ptr is said to "point to" k. Bear with us now, there is
only one more operator we need to discuss.
The "dereferencing operator" is the asterisk and it is used
as follows:
*ptr = 7;
will copy 7 to the address pointed to by ptr. Thus if ptr
"points to" (contains the address of) k, the above statement will
set the value of k to 7. That is, when we use the '*' this way
we are referring to the value of that which ptr is pointing
to, not the value of the pointer itself.
Similarly, we could write:
printf("%d\n",*ptr);
to print to the screen the integer value stored at the address
pointed to by "ptr".
One way to see how all this stuff fits together would be to
run the following program and then review the code and the output
carefully.
#include <stdio.h>

int j, k;
int *ptr;

int main(void)
{
    j = 1;
    k = 2;
    ptr = &k;    /* ptr now holds the address of k */
    printf("\n");
    /* %p expects a pointer to void, hence the (void *) casts */
    printf("j has the value %d and is stored at %p\n", j, (void *)&j);
    printf("k has the value %d and is stored at %p\n", k, (void *)&k);
    printf("ptr has the value %p and is stored at %p\n",
           (void *)ptr, (void *)&ptr);
    printf("The value of the integer pointed to by ptr is %d\n",
           *ptr);
    return 0;
}
To review:
- A variable is declared by giving it a type and a name (e.g. int k;)
- A pointer variable is declared by giving it a type and a name
(e.g. int *ptr) where the asterisk tells the compiler that
the variable named ptr is a pointer variable and the type
tells the compiler what type the pointer is to point to
(integer in this case).
- Once a variable is declared, we can get its address by
preceding its name with the unary '&' operator, as in &k.
- We can "dereference" a pointer, i.e. refer to the value of
that which it points to, by using the unary '*' operator as
in *ptr.
- The "lvalue" of a variable is the value of its address, i.e.
where it is stored in memory. The "rvalue" of a variable is
the value stored in that variable (at that address).
References in Chapter 1:
[1] "The C Programming Language" 2nd Edition
B. Kernighan and D. Ritchie
Prentice Hall
ISBN 0-13-110362-8