CS13002 Programming and Data Structures | Spring semester |
Structures
Now it is time to combine heterogeneous data to form a named collection. For example, think of a student's record that might comprise a name, a roll number, a height and a CGPA. A name and a roll number are strings, a height (in cms, rounded to the nearest integer) is an integer and a CGPA is a floating point value. A structure can be used to combine these different types of data into a single item. Moreover, each constituent field in the composite data is made individually accessible. What we benefit from using structures is a convenient and logical way of looking at and arranging data. That's the basic motivation behind every abstraction.
Defining structures
Structures can be defined by the struct keyword. For example, a student's record can be defined as:
#define MAXLEN 100 struct stud { char name[MAXLEN]; char roll[MAXLEN]; int height; float cgpa; };This declaration gives a user-defined data of type struct stud that has four members: two character arrays of names name and roll, an integer named height and a floating point value named cgpa. The struct declaration only defines a data type but no instances of data of this type. In order to declare specific instances of structure data, one should employ the usual variable declaration procedure.
struct stud thatStudent, FBStudents[60], *studPointer;This declaration defines a structure with the name thatStudent, an array FBStudents consisting of 60 student records and a pointer studPointer to a student record. A single instance of struct stud is depicted in the following figure.
Figure : Example of a simple structureA second example is provided by complex numbers which can be represented as pairs of real numbers. One can use the following structure:
struct comp { double real; double imag; };Here the two fields of the structure are of the same data type and so one can in fact use a double array of size 2 to represent a complex number. However, the above definition enhances readability and highlights the logical (mathematical) structure of a complex number.
One then uses the declaration
struct comp z, z1, z2;to obtain specific instances of complex numbers.
Type definitions
The typedef declarations are used to rename data types in C. For example, if one plans to work with unsigned long long int variables, but plans not to write that big a name, one may define the following short-cut:
typedef unsigned long long int ull;After this definition, the data type unsigned long long int can be called also as ull, i.e., one can declare variables as:
ull n, array[100], *ptr;One can also typedef pointers and arrays:
typedef ull *ullPointer; typedef ull ullArray[128];Here we assume that unsigned long long int is already typedef'd as ull. We use this definition to typedef two other data types. First, ullPointer is defined to be a pointer to an unsigned long long int, whereas ullArray is defined to be an array of 128 unsigned long long int data. One can instantiate data of these types in the usual way:
ullPointer p;defines a pointer variable p, whereas
ullArray A;defines an array A of 128 unsigned long long int data.
In an analogous way, one can use typedef's to give short single-word names to structure data types. For example, the declarations
typedef struct stud student; typedef struct comp complex;give names student and complex to the user-defined data types struct stud and struct comp. The following variable declarations are legal after these typedef's:
student thatStudent, FBStudents[60], *studPointer; complex z, z1, z2;One can combine a struct declaration and a subsequent typedef as follows:
typedef struct stud { char name[MAXLEN]; char roll[MAXLEN]; int height; float cgpa; } student; typedef struct { double real; double imag; } complex;Since the new struct data type is now given an explicit name (like student or complex), it is not necessary to give any tag after the keyword struct. This is what we have done for the complex data type. Notice, however, that using a tag after struct is not prohibited (see the definition of student) and is indeed essential in a particular situation that we will describe towards the end of this chapter.
Initializing structures
Structures can be initialized much in the same way as arrays can be -- by a curly-braced comma-separated list of initializing constant values for the individual members. For example, the above student record can be initialized as:
struct stud thatStudent = { "Foolan Barik", "03FB1331", 175, 9.81 };or with the typedef'd name as:
student thatStudent = { "Foolan Barik", "03FB1331", 175, 9.81 };Initializing values populate the members of the variable in the same order as they appear in the struct declaration. For the above example, the string name receives the value "Foolan Barik", roll the value "03FB1331", height the value 175 and cgpa the value 9.81.
Accessing members of structures
Accessing individual members of a structure is different from what is done with arrays. Now one should write the name of a structure variable followed by a dot (.) and then by the formal name given to the member. For example, if thatStudent is initialized as above, thatStudent.name refers to the string "Foolan Barik", thatStudent.roll refers to the string "03FB1331", thatStudent.height refers to the integer value 175 and thatStudent.cgpa to the floating point value 9.81.
If we have an array of structures, one first uses square brackets to refer to an element of the array and then uses dot and a member name to access the corresponding member of the structure, For example, FBStudents[5].height refers to the height field of the element at index 5 in the array FBStudents.
If we have a pointer to a structure, we first dereference the pointer in order to obtain the structure and then write dot and the member name. For example, if studPointer is a pointer to a struct stud, the notation (*studPointer).roll refers to the string holding the roll number of the student whose records are pointed to by the pointer studPointer. There is an alternative way of writing the same thing: studPointer->roll. The dereferencing * and the dot . are combined to the symbol -> which C designers deemed to be an intuitive and natural notation.
Example: The following function computes the average CGPA of the students of the Department of Foobarnautic Engineering:
float avCGPA ( struct stud FBStudents[] , int n ) { float sum = 0; int i; for (i=0; i<n; ++i) sum += FBStudents[i].cgpa; return (sum/(float)n); }Here is how you can do the same with pointers:
float avCGPA2 ( struct stud FBStudents[] , int n ) { float sum = 0; int i; struct stud *p; p = FBStudents; for (i=0; i<n; ++i) { sum += p->cgpa; ++p; } return (sum/(float)n); }Passing structures to functions
Syntactically, structures are treated analogously as normal (built-in) variables. Passing structure variables to and from functions follows the call-by-value mechanism explained earlier.
Example: Let us write a function that accepts two complex structures as arguments and returns the structure representing the product of these two arguments.
#include <stdio.h> struct comp { float real; float imag; }; struct comp cmul ( struct comp z1 , struct comp z2 ) { struct comp z; z.real = (z1.real) * (z2.real) - (z1.imag) * (z2.imag); z.imag = (z1.real) * (z2.imag) + (z1.imag) * (z2.real); return z; } int main () { struct comp a = {1.1,2.2}, b = {2.4,-3.6}, c; c = cmul(a,b); printf("product = (%f)+i(%f)\n", c.real, c.imag); }This program outputs:
product = (10.560000)+i(1.320000)
Animation example : passing structures to functions
What requires explanation is what is meant by call-by-value in connection with structure arguments. A structure requires some amount of memory to accommodate all the defining members. This size (in bytes) can be accessed by the sizeof call, like:
printf("Size of struct stud = %d\n", sizeof(struct stud));In fact, the sizeof statement can be used for any data type, built-in or user-defined. For example, sizeof(int) typically returns 4, sizeof(double) returns 8, sizeof(char) returns 1 and so on. For structures, the value returned by the sizeof statement is dependent on the compiler and the architecture of the underlying machine.
When a structure is passed to a function, the corresponding sizeof() bytes are copied to the formal argument of the function. For example, in my machine sizeof(struct stud) is 208. This includes locations to store the arrays name and roll, the integer height and the floating point number cgpa. When a struct stud variable is passed to a function, these 208 bytes are copied to the argument. This, in particular, implies that changes in the members of the argument are not visible outside the function. This also includes changes in the arrays name and roll.
When a struct stud value is returned from a function and assigned to a variable in the caller function, 208 bytes are copied from the returned value to the variable.
Let us now define a structure with pointers:
struct stud2 { char *name; char *roll; int height; float cgpa; };
Figure : Example of a structure with pointersNow sizeof(struct stud2) is 16. This is what is needed to store two pointers, one integer and one floating point number. These pointers may point to arrays (or may be allocated memory dynamically), but the memory for these arrays lies outside the structure variable. When we pass a struct stud2 variable to a function, only 16 bytes are copied. That includes the pointers name and roll, but not the arrays which they point to. Any change in the arrays pointed to by these pointers is now visible to the caller function.
Arrays and pointers are similar, but not the same thing!
Structures with self-referencing pointers
A structure with pointer(s) to structure(s) of the same type turns out to be very useful for representing many interesting objects. The following figure illustrates how such structures form the basic building block (a node) for representing a list and a tree. We will see later how such objects can be dynamically created and maintained. For the time being, let us focus on how a structure representing a node in a list or tree can be defined.
Figure : Example of structures with self-referencing pointers
(a) List structure (b) Tree structureFirst consider a node in a list. Let us assume that we are dealing with a list of integers. In order to create the linked structure of the above figure, we need a node to contain a pointer to another node of the same type. In practice, a node may contain data other than an integer and a pointer. For simplicity here we restrict the members of a node to only these two fields.
struct _listnode { int data; struct _listnode *next; };One can also use type definitions:
typedef struct _listnode { int data; struct _listnode *next; } listnode;An important thing to note here is that the formal tag after the struct keyword (_listnode in the last example) was absolutely necessary for these declarations, even when the new structure is typedef'd. There is nothing other than this formal name that can specify the type of the pointer next. It is only after the part inside curly braces can be defined properly, when the typedef makes sense.
After these definitions we can use individual variables and pointers. The declaration
listnode mynode, *head;defines a structure mynode of type listnode and a pointer head to a structure of this type.
A node in a (binary) tree consists of two pointers, the first for pointing to the left child and the second for pointing to the right child.
typedef struct _treenode { int data; struct _treenode *left; struct _treenode *right; } treenode;After this definition one can declare individual nodes like:
treenode thatNode, leaf[100];One can declare pointers to nodes in the usual way:
treenode *root;or by using other type definitions:
typedef treenode *tnptr; tnptr root;We will shortly use such linked structures with dynamic memory allocation for realizing several useful (abstract) objects.
Unions
Suppose we want to make a list of nodes. Each node in the list may be one of two possible types: a data node and a control node. Suppose further that a data node stores an int, whereas a control node stores a control information that can be specified by a 16-character string. A structure like the following can be used:
struct foonode { int data; char control[16]; } thisNode, fooArray[1024];The problem with this is that irrespective of whether a node is a control node or a data node, the structure requires space for both the data and the control string. A data node does not use the control string at all, and similarly a control node does not require the data. That leads to unnecessary waste of space. In order to reduce the space requirement of each node, we should use a union instead of a struct.
union barnode { int data; char control[16]; } thisNode, barArray[1024];In this case the compiler reserves the space that is sufficient to store the biggest of the individual members. For example, the int member requires 4 bytes, whereas the control string requires 16 bytes. For the struct foonode the compiler uses 20 bytes of memory. For the union barnode, on the other hand, a memory of only 16 bytes is allocated. That memory (more correctly, a part of it) can be used as an integer variable or as a character string. In other words, the members of a union occupy overlapping space. When we say thatNode.data or barArray[51].data, the content of the memory is interpreted as an integer, whereas thatNode.control or barArray[51].control refers to a character string.
This may seem confusing initially, because it is not clear what data is actually stored in the memory. Interpreting a character string as an integer need not always make sense, and vice versa. The information regarding what kind of data a union stores is to be maintained externally, i.e., outside the union. One possibility is to use unions in conjunction with structures.
#define DATA_NODE 0 #define CONTROL_NODE 1 struct foobarnode { int what; /* can be either DATA_NODE or CONTROL_NODE */ union { int data; char control[16]; } info; } thatNode, foobarArray[1024];This structure stores the type of the node and then the union of an integer and a character string. Depending on the value of what, the programmer is to interpret the type of the node. If what is set to DATA_NODE, one should use the union info as an integer data and access this as thatNode.info.data or as foobarArray[131].info.data. On the other hand, if what is set to CONTROL_NODE, one should use the union as a character string that can be accessed as thatNode.info.control or as foobarArray[131].info.control.
Here is another example, in which a node contains a union of three different kinds of data.
#include <stdio.h> typedef struct _foostruct { int intArray[512]; double dblArray[128]; char chrArray[1024]; struct _foostruct *next; } foostruct; typedef struct _barstruct { int type; union { int intArray[512]; double dblArray[128]; char chrArray[1024]; } data; struct _barstruct *next; } barstruct; int main () { printf("sizeof(foostruct) = %d\n", sizeof(foostruct)); printf("sizeof(barstruct) = %d\n", sizeof(barstruct)); }In my machine, this program outputs:
sizeof(foostruct) = 4100 sizeof(barstruct) = 2056Look at the space saving effected by using the union. Note also that the next pointer should be there in every node irrespective of its type. That is why this pointer should be declared outside the union.