Pointers

Pointers are a fundamental part of C programming, and they are needed in almost any non-trivial program. The ability to operate directly with pointers and memory is one of the factors that differentiate C from many other (higher-level) programming languages. Unfortunately, direct access to (virtual) memory also introduces many ways to make errors that are sometimes hard to trace.

Background

Each variable declared in a program occupies memory, and the amount depends on the type of the variable. For example, a char variable uses one byte (8 bits) of memory, and an int (usually) uses four bytes (32 bits). For local variables, the system allocates the needed memory automatically when the variable is declared, and releases it when execution leaves the program block or function where the variable was declared. Outside the block (the section enclosed in curly brackets), the compiler does not allow using the variables declared inside it, and reports an error if the program tries to do so.

Consider the following simple example:

int main(void)
{
    char a = 10;
    char b = 12;
    int c = 123456;
    char *d = &a;
}

The above function does nothing except declare a few variables. After the four variable declarations, the content of memory looks roughly like this:

[Figure: memory layout of variables a, b, c and d, with imaginary memory addresses]

Variables a and b use one byte of memory each, and variable c takes 4 bytes. The initial values of the variables are set as they are declared. Without the initializers, the memory would be allocated in the same way, but its content would be unspecified. Variable d is a pointer to a char-type value, and it is initialized to point to the location of variable a using the address operator (&). Even though it points to a char-type value, variable d takes four bytes of memory, because it is a pointer: its value is a memory address (in this example we assume that addresses are 32 bits long). The picture also shows imaginary memory addresses allocated for the variables. Normally the programmer does not need to know numeric addresses, but they are shown here to clarify how pointers work. Despite this example, one should not make specific assumptions about how variables are placed in memory, because compilers are free to arrange them differently.

When a function exits, the memory allocated for its local variables is released. This is not a problem for variables accessed directly, because the compiler reports an error if the program tries to use a variable outside the scope in which it was declared. However, when a variable is accessed through a pointer, the compiler cannot protect the programmer from referring to an invalid memory location. Therefore careless use of pointers may cause problems, for example when referring to memory that has already been released for other use.

Like other local variables, an uninitialized pointer has an unknown value. Dereferencing such a pointer is undefined behavior: it will likely cause the program to crash, or silently corrupt memory.

Pointer variable basics

The above example showed how a pointer variable is declared and initialized for the char data type, by writing * after the data type. (In a declaration, * marks the variable as a pointer; in expressions, the same symbol is the dereference operator, described below.) The same form can be used with other data types, for example int *e or float *f. Despite the different data types, on common platforms all these variables occupy the same amount of memory, because their content is a memory address that points to a location containing a value of the given data type.

C allows different spacing when declaring pointers: char* d;, char * d;, and char *d; are all valid and equivalent notations. It is good coding style to use one of them consistently throughout the program.

Let's extend the first example by a few lines and add one more pointer, e:

#include <stdio.h>

int main(void)
{
    char a = 10;
    char b = 12;
    int c = 123456;
    char *d = &a;
    char *e;

    e = d;
    printf("*e: %d   e: %p\n", *e, (void *)e);

    *d = 13;
    printf("*d: %d   d: %p   *e: %d   a: %d\n", *d, (void *)d, *e, a);
    if (*d > b)
        printf("New value is greater than b!\n");
}

Pointer variables can be assigned like any other variables, and such an assignment (e = d; above) copies the address. After the assignment, both 'e' and 'd' point to the same location (the address of variable 'a').

The value referenced by a pointer can be accessed using the * (dereference) operator, as in the first printf call, which passes *e as an argument. The dereference operator can be used in any expression (function call arguments, assignments, comparisons, ...). Because pointer 'e' refers to variable 'a', the printf call outputs 10 in the first field. The latter field ('%p') prints the value of the pointer itself, i.e., the address; its exact format is implementation-defined, but it is usually hexadecimal (this is not very often needed). The %p conversion expects a void * argument, which is why the pointers are cast to (void *) in the calls.

The statement *d = 13; uses the dereference operator to modify the value pointed to by 'd'. Because pointer 'd' refers to variable 'a', this changes the value of variable a.

It may sometimes be difficult to distinguish the dereference operator from multiplication: the difference is that multiplication is a binary operator, with operands on both sides, while dereference is unary and works with a single operand.

Assuming the imaginary addresses shown in the picture, the above function will output the following (note the new printf conversion used above: %p for addresses):

*e: 10   e: 0x1000
*d: 13   d: 0x1000   *e: 13   a: 13
New value is greater than b!

At the end of the main function, the contents of the variables look like this:

[Figure: contents of variables a, b, c, d and e at the end of main]

The C compiler does not detect invalid memory accesses at compile time, so their effects can only be seen when running the program. Therefore pointers make it possible to write programming errors that may be difficult to track down. Let's modify the example further and add a few incorrect lines that will cause an error. This illustrates a very common class of C programming errors: invalid memory references.

int main(void)
{
    char a = 10;
    char b = 12;
    long int c = 123456;
    char *d = &a;

    *d = 13;
    printf("*d: %d   d: %p   a: %d\n", *d, d, a);
    if (*d > b)
        printf("New value is greater than b!\n");
    d = 14;
    printf("d: %d\n", d);
    *d = 15;
    printf("bye bye!\n");
}

When compiling this program, the following output appears:

$ gcc -Wall -std=c99 -pedantic testi2.c
testi2.c: In function ‘main’:
testi2.c:12: warning: assignment makes pointer from integer without a cast
testi2.c:13: warning: format ‘%d’ expects type ‘int’, but argument 2 has type ‘char *’

The compiler produces an executable binary file, but shows two warnings while compiling the program. Warnings usually indicate a programming error, and when they show up, one should fix the code accordingly. Here the first warning appears because an integer is incorrectly assigned to a pointer variable on line 12. The second warning says that the '%d' format specifier should not be used with a pointer on line 13. On line 13, printf("d: %d\n", *d); would have matched the format, because dereferencing gives the type behind the pointer (here char, which is promoted to int when passed to printf). At this point, however, dereferencing d would itself be an invalid memory access, as the run below shows.

Line 12 changes the value of the pointer to 14, which is an inaccessible address for the program, and therefore causes a failure when the program is run. Printing the value of 'd' in the second printf call is also incorrect, because 'd' is not an integer: the %p conversion, which is used to show addresses stored in pointers, should have been used instead.

Despite the warnings, an executable is created. When it is executed, the following appears:

*d: 13   d: 1000   a: 13
d: 14
Segmentation fault: 11

The system does its best to execute the program. The first line shows that the value of 'a' has changed. The line also demonstrates how addresses can be shown using the '%p' conversion (although here we use a very small address space for the example). The next line shows that the assignment seemingly works, and the value of pointer 'd' can be printed despite the compiler warnings. When the program then tries to modify the content behind pointer 'd', it is terminated immediately by a system signal. "Segmentation fault" means that the program tried to access an invalid memory address; the operating system prevents this and terminates the program. Therefore line 15 is never executed, and "bye bye!" never appears on the screen.

The early part of the example also used the address operator &. It is a unary operator that returns the address of the object (variable, array element, ...) that follows it; this address can be assigned to a pointer variable.

Pointers in functions

Pointers can be used much like other variables: they can be passed as function parameters and returned as function return values. The scanf function takes pointers as parameters, because it needs to store the value typed by the user in the caller's variables, and a pointer is practically the only way to do that. If a basic (non-pointer) data type is used directly as a function parameter, the function cannot modify the value in the caller's context, because parameters are passed by value, i.e., copied for the function's local use.

Below is a short example: function 'my_readint' reads a single character from the user, converts it to an integer if it is a digit, and stores the result in the location indicated by the pointer. The function returns 1 if it changed the pointed value, or 0 if it did not get a valid digit and therefore did not change the content of memory.

#include <stdio.h>

int my_readint(int *value)
{
    char c;
    int ret;
    ret = scanf("%c", &c);
    if (ret == 1 && c >='0' && c <= '9') {
        int num = c - '0';   // converting the character to integer
        *value = num; // write value where pointer refers to
        return 1;
    }
    return 0;
}

int main(void)
{
    int a;
    int *ptr_a = &a;
    if (my_readint(ptr_a))
        printf("reading succeeded: %d\n", a);

    // here is another way to use the function
    my_readint(&a);
}

There are two calls to my_readint from the main function (lines 20 and 24). In both calls the parameter points to variable 'a', but the address is given in two alternative ways (the latter shows that the address operator can be used as part of other expressions, such as function call arguments).

Inside the function, parameter 'value' is a pointer to an integer, set by the caller of the function. The function reads a character from the user, checks that it is a digit, and converts the ASCII digit into an integer by subtracting the ASCII code of '0' from it (line 9). The resulting integer value is assigned to the memory location pointed to by 'value' (line 10), which in this case is variable 'a' in the main function. With pointers, a function can modify variables and other memory outside the function, which would not be possible otherwise.

Pointer parameters are also useful for returning values from a function by other means than the return value. This is needed if a function must return more than one value to the caller.

A pointer can also be used as the return type of a function, as in the example below. The example also illustrates the NULL pointer, a special pointer value used to indicate an error, an unused pointer, or some other special case. Dereferencing a NULL pointer is invalid (undefined behavior, which on typical systems crashes the program), but it is fine to assign NULL to a pointer variable, or to compare a pointer with NULL in conditions to trigger some special action. NULL is not built into the language: it is a macro defined in the stddef.h header (and also in other standard headers such as stdio.h and stdlib.h), so you need to include one of them in order to use it.

Below is an example of a function that reads an integer from the user into a given address, but only if the given pointer is not NULL. During normal operation, the function returns the pointer it got as a parameter. A NULL pointer is returned if the function was unable to read the integer. One error case is that the given pointer is NULL. This needs to be checked, because passing a NULL pointer to scanf would cause an invalid memory access and crash the program. Another error case is that scanf fails to read the number, for example due to invalid user input. Both cases result in the function returning NULL.

#include <stdio.h>  /* scanf; stdio.h also defines NULL */

int *read_int(int *number)
{
    int ret;
    if (number == NULL) { // Check the pointer before calling scanf
        return NULL; // If the pointer is invalid, the function exits
    }
    ret = scanf("%d", number);
    if (ret != 1) { // Check if an error occurred during reading
        return NULL;
    }
    return number;
}

Task 02_basics_1: Number Swap (1 pts)

Objective: A simple basic example to practice the use of pointers.

Implement function number_swap(int *a, int *b) that gets two pointers to integers as parameters. The function should swap the contents of the pointed integers. For example, after the following code, val1 should be 5 and val2 should be 4.

int val1 = 4; int val2 = 5;
number_swap(&val1, &val2);
if (val1 == 5 && val2 == 4) {
    printf("Great, it worked!\n");
}

Address arithmetics

The plus and minus operators can also be used on pointers. Adding an integer n to a pointer moves it n objects forward, and subtracting n moves it n objects backward, where the object size is the size of the data type the pointer refers to. Therefore the change in the numeric address depends on the type behind the pointer: for an int pointer, p + 1 points sizeof(int) bytes (typically 4) further than p.

Here is an example that demonstrates address arithmetics in different formats:

#include <stdio.h>

int main(void)
{
    int array[50];  // allocate space for 50 consecutive integers
    int *intPtr = array;  // points to the beginning of array
    int i;
    i = 50;
    while (i > 0) {
        *intPtr = i * 2;  // the latter * is the ordinary multiplication operator
        intPtr++;  // move the pointer forward to next object in array
        i--;
    }

    intPtr = array;  // reset intPtr to the beginning of array

    for (i = 0; i < 50; i++) {
        int value = *(intPtr + i);  // dereference the i-th object in array
        printf("%d ", value);
    }
    printf("\n");
}

The function starts by allocating space for 50 integers (arrays are described later on this page), and setting pointer 'intPtr' to the beginning of this space (in an expression, the name of an array converts to a pointer to its first element). In the while loop, the function stores the value of integer variable i multiplied by two into the location referred to by intPtr (line 10). intPtr++ on line 11 moves the pointer to the next integer in the array. We can repeat this 50 times, because we allocated space for 50 integers.

After the array is filled, 'intPtr' is reset to the beginning of the array (line 15), and a different form of loop is applied. Here we read the value of the i-th object in the array (line 18): adding 'i' to a pointer moves it 'i' objects forward in memory. The dereference operator can be combined with arithmetic on pointers, as shown here, but the parentheses are important because of the precedence rules. As a result, this function outputs the even numbers from 100 down to 2, separated by spaces.

For address arithmetic to work correctly, the pointer must have the correct data type. The compiler warns you if incompatible pointer types are mixed.

Pointers can be used in comparisons like other variables, but note that then the addresses are compared, not the values behind the pointers. For example, the following code reports that the pointers are different, but the values behind them are the same.

int a = 5;
int b = 5;
int *pa = &a;
int *pb = &b;

if (pa == pb)
    printf("Pointers are same\n");
else
    printf("Pointers are different\n");

if (*pa == *pb)
    printf("Values are same\n");

Task 02_basics_2: Array Sum (1 pts)

Objective: Practice address arithmetics with pointers.

You should implement function int array_sum(int *array, int count) that gets a pointer to the beginning of an array of integers stored in consecutive memory slots, and calculates the sum of the integers. The number of integers to be summed is given in parameter count. The sum is returned as the return value of the function.

For example, the following code:

int valarray[] = { 10, 100, 1000 };
int ret = array_sum(valarray, 3);

should set ret to 1110.

Arrays

Basics

Multiple values of a given data type stored in consecutive memory slots form an array. When an array is declared, its size is given so that the system can allocate enough memory for it. In a variable declaration, the array size is indicated using square brackets.

Below is a simple example of how to use arrays:

short apples = 10;
short slots[4];
short oranges = 20;

int i;
for (i = 0; i < 4; i++) {  /* Here we initialize the array */
    slots[i] = i + 1;
}

for (i = 0; i < 4; i++) {  /* Output the values in array */
    printf("array element %d is %d\n", i, *(slots + i));
}

The program defines three variables: 'apples' and 'oranges' are normal short integers, but 'slots' is an array of four elements. The array size is given as a constant (C99 also allows variable-length arrays for local variables, but they are not used here). As with other local variables in C, the elements of an array are not initialized by default, so their initial values are unknown. After the three declarations, the variables are placed in memory (roughly) in the following way:

[Figure: memory layout of 'apples', the four elements of 'slots', and 'oranges']

In memory, the array takes the given number of consecutive slots, each of a size that depends on the data type. A short integer typically takes 2 bytes (16 bits), so in total the four-element array uses eight bytes.

After the variable declarations, the array is initialized in a for loop (line 7). The example shows the array notation (slots[i]) for accessing a particular array element. The first element always has index 0. Any expression can be used as an index (variable, constant, arithmetic operation, ...).

Arrays and pointers are closely related, as the last part of the above example shows. The name 'slots' can be used as a pointer to the first element of the array, and the elements can be accessed using pointer arithmetic. The printf call on line 11 that outputs the contents of the array is an example of this. Using slots[i] is equivalent to *(slots + i). The program would print the following:

array element 0 is 1
array element 1 is 2
array element 2 is 3
array element 3 is 4

The C compiler does not check at compile time whether an array index is within the array bounds: in the 'slots' example, the valid indexes are 0 to 3, but the compiler also accepts index 4 and beyond. Such errors can be detected only when running the program, and even then they do not always show up clearly. If the for loop had been wrong and processed indexes of 4 or higher, it would have overwritten other content in memory. The following small modification of the 'slots' code (the for loops erroneously go through 6 items):

short apples = 10;
short slots[4];
short oranges = 20;

int i;
for (i = 0; i < 6; i++) {  /* Here we initialize the array */
    slots[i] = i + 1;
}

for (i = 0; i < 6; i++) {  /* Output the values in array */
    printf("array element %d is %d\n", i, *(slots + i));
}

caused the value of 'apples' to become 5 in my test run. The compiler typically gives no warning, and the program seems to work unless one pays careful attention to the variable values. However, the memory next to the array is silently modified (formally, the behavior is undefined). Because they are nearly invisible, errors of this kind have commonly been exploited by attacks that turn programming errors into security vulnerabilities. (Heartbleed is a well-known example of a severe security vulnerability caused by a buffer over-read bug, although in a somewhat different context.)

Like other variables, an array can also be initialized in its declaration. Here are two ways of initializing an array:

short slots[5] = { 3, 7, 2, 123, 45 };
int numbers[] = { 67, 12, 34 };

In the first line above, the array length is explicitly specified. When an array is defined together with an initialization list, the length does not need to be given separately, because the length of the list determines the length of the array. In that case, empty square brackets are enough, as with 'numbers' above. If the initialization list is shorter than the explicitly given array length, the remaining elements are initialized to 0. This happens only when an initialization list is given (in standard C before C23, the list must contain at least one element); a local array declared without any initializer is not zeroed.

Arrays in functions

Arrays can be passed as function arguments, but array notation is usually not used in the parameter declaration. Instead, arrays are passed using a pointer type, where the pointer refers to the first element of the array (even a parameter written as short a[] is treated as a pointer). The length of the array cannot be determined from the pointer parameter, so it needs to be indicated by some other means. Two common solutions are to pass the length as another function parameter, or to mark the end of the array with a special value after the last element. In the latter case the array needs extra space for the end marker.

A function cannot return an array (at least not without dynamic memory, which is introduced in module 3). If a function should produce an array, the space for it needs to be allocated either outside the function, by the caller, or dynamically.

Here is an example of using an array with a function. It also shows how the sizeof operator can be used to determine the total size of the array in bytes (see the section on sizeof later on this page for more information). One way to get the number of elements in the array is to divide its total size by the size of one element, as done in this example. The example shows once more how a pointer-type parameter is used to access the array inside the function.

#include <stdio.h>   /* printf; stdio.h also defines size_t */

void show_table(short *a, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        // print the table using pointer arithmetic:
        printf("%d ", *(a + i));

        // Also this would produce the same result:
        // printf("%d ", a[i]);

        // We could also do this for the same effect, but pointer 'a' is modified.
        // Therefore, this cannot be used together with indexing, as above.
        // Modification of pointer 'a' is not visible outside the function.
        //printf("%d ", *a++);
    }
    printf("\n");
}

int main(void)
{
    short table[] = { 1, 4, 6, 8};
    printf("size: %zu\n", sizeof(table)); /* print array size for fun */

    /* below is one way to get the number of elements */
    // sizeof(table) is 4 * sizeof(short), typically 8
    show_table(table, sizeof(table)/sizeof(short));

    // in this case the above would be equivalent to:
    show_table(table, 4);
}

Task 02_basics_3: Array Reader (1 pts)

Objective: Practice use of arrays and array notation together with scanf.

Implement function int array_reader(int *vals, int n) that reads integer values using scanf into a pre-allocated array ('vals'). The numbers in the input are separated by whitespace (space, tab, newline, ...), which is the default field separator for scanf, i.e., you should be able to use the basic scanf format string for decimal numbers. Parameter 'n' gives the maximum length of the array, and the maximum number of values to be read. If the user does not give a valid integer (as can be seen from the return value of scanf), reading stops, even if the maximum size was not yet reached. The function returns the final size of the array, which can be smaller than the incoming 'n' parameter if the user finished the input with an invalid integer.

Below is an example how this function can be tested:

int array[10];
int n = array_reader(array, 10);
printf("%d numbers read\n", n);
int i;
for (i = 0; i < n; i++) {
    printf("%d ", array[i]);
}

For example, the following input should cause the first four array elements to become 5, 8, 2, and 7, and then terminate because the fifth field read is not a decimal number:

5 8 2 7 -

Task 03_mastermind: Mastermind (1 pts)

Objective: Practice manipulation of arrays.

Implement function void mastermind(const int *solution, const int *guess, char *result, unsigned int len) that compares integer array 'guess' to array 'solution'. Both arrays contain 'len' integers from 0 to 9. The function outputs character array 'result', that also has 'len' characters, in the following way:

If arrays 'solution' and 'guess' have same number in Nth array location, character array 'result' will have '+' in Nth location.
If array 'guess' has number in Nth location that exists in array 'solution', but in different location, character array 'result' will have '*' in Nth location.
If array 'guess' has number in Nth location that does not exist at all in array 'solution', character array 'result' will have '-' in that location.

Note that arrays 'solution' and 'guess' are input parameters that you should not modify, and array 'result' may not have any sensible content when the function is called, so you need to set every element of it in the function.

For example, when 'len' is 6, 'solution' is { 2, 6, 6, 3, 5, 3} and 'guess' is {4, 5, 6, 1, 8, 9}, the function sets 'result' to {'-', '*', '+', '-', '-', '-'}.

The main function in main.c implements a simple Mastermind game you can use to test your function.

Task 04_sort: Sort (1 pts)

Objective: Further practice of array manipulation, this time with a bit more complicated case.

Write function void sort(int *start, int size) that sorts the integers in the given array into ascending order (from smallest to largest). You can use the selection sort algorithm: first find the smallest number in the array, and swap it with the first element of the array. Then do the same starting from the second element of the array, moving on to the third, fourth, etc. until the whole array is processed. Test the function with arrays of different sizes.

Strings

Basics

The C language does not have a dedicated string type: strings in C are just char arrays that end in the '\0' character (which has the numeric value 0). The terminating 0-character is not visible when the string is printed, but it still uses one byte of memory at the end of the string.

String constants can be specified inside quotes, for example in the following way:

char *string_A = "This is first string";
char string_B[] = "Another string";
char string_C[] = { 'O','n','e',' ','m','o','r','e','\0' };

The first and second definitions above are not identical: the first is a pointer to a string constant that must not be modified (it is typically placed in a read-only section of memory, and modifying it is undefined behavior). The second and third definitions specify an array that can be modified later. When using double quotes, as in the first two cases, the final '\0' is added to the string implicitly, but in the last alternative the '\0' character needs to be added explicitly, because the string is specified as an array of characters. The difference is illustrated below: string_A is a pointer that refers to a read-only part of memory, while the other two strings are allocated directly in the local memory of the function where they are declared. This also makes a difference when using the sizeof operator on these variables.

[Figure: string_A as a pointer to a read-only string constant; string_B and string_C as character arrays in the function's local memory]

Strings can be output using %s formatting specification in printf. For example:

char string_B[] = "another string";
printf("My string is %s\n", string_B);

In functions, strings are represented as a pointer to the beginning of the character array, just like any other arrays. For example, here is a function that "encrypts" a string by incrementing each character value by one:

#include <stdio.h>

void encode(char *str)
{
    while (*str) {  /* Terminates when we come to \0 */
        *str = *str + 1;  // modify the character behind pointer
        str++;  // move the pointer to next character
    }
}

int main(void)
{
    char message[] = "It is going to rain tomorrow";
    encode(message);
    printf("encoded: %s\n", message);
}

The while loop above repeats as long as the character referred to by the pointer is something other than '\0' (remember: in conditional statements 0 means false, and everything else is true).

Strings cannot be copied with a simple assignment. To copy a whole string, it must be copied character by character from one array to another. The library function strcpy performs such a copy (see details below).

Task 06_strbasic_1: Count Alpha (1 pts)

Objective: Get familiar with operating on a string, character by character until the end of the string.

Write function int count_alpha(const char *str) that counts the number of alphabetic characters in the given string. You can use the function int isalpha(int character), declared in the ctype.h header, to check whether a single character is alphabetic (i.e., you need to add the correct #include directive at the beginning of your source file). isalpha returns non-zero if the given character is alphabetic, or zero if it is not. The function should return the number of alphabetic characters.

Helpful Qualifiers and Operators

Size of variable or data type

The sizeof operator can be used to query the size of a data type or variable (the size of same data type may differ between different system architectures). Here is a short example illustrating the use of sizeof:

short a;
short *pa;
printf("size a: %zu\n", sizeof(a));
printf("size pa: %zu\n", sizeof(pa));

On my Mac the above code shows that sizeof(a) is 2 bytes, as expected for a short integer, and sizeof(pa) is 8 bytes, because it is a pointer on a 64-bit architecture. Note the format specifier in the above printf calls: sizeof yields a value of type size_t (an unsigned integer type, see below), and %zu is the C99 printf conversion for size_t.

Declaring new data types

Sometimes it is inconvenient to write long data type specifications repeatedly. With a typedef declaration, the program can define new names for data types. The C library also defines some commonly used type aliases for specific uses, such as size_t, an unsigned integer type used for representing the size of data objects in C. The sizeof operator yields a value of type size_t. An alternative "mySize" type could be defined and used in the following way:

typedef unsigned long mySize;
long b;
mySize a = sizeof(b);

Typedef may become useful especially later with data structures and abstract data types.

Constant parameters and variables

Parameters and variables can be declared as constant by the const qualifier. Such variables can be read, but they cannot be written to.

Using the const qualifier is useful for documenting function interfaces: it tells that the function is not going to modify a particular parameter. It is not mandatory, but it helps the programmer defend against possible programming errors. Below is an example of an (erroneous) function that tries to modify a constant parameter:

void a_func(const int *param)
{
     int a = *param;
     a = a + 1; /* This is ok, because original parameter is not modified */
     *param = *param + 1; /* This is NOT ok, because of const qualifier */
}

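The declaration const int *param indicates that the integer pointed to by "param" is not to be modified inside the function: the const qualifier prevents modifying the value referred to by the pointer. (The pointer variable itself could still be changed; a pointer that cannot be changed would be declared as int *const param.) Therefore the compiler reports an error on the line that tries to modify the value behind the "param" pointer.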

The const qualifier can also be used with variables. This makes the variable constant, so that it cannot be changed during its lifetime. Note that in this case the variable needs to be initialized in its declaration. For example:

#include <stddef.h>  /* size_t */

const size_t maxSize = 10; /* global variable, can be used anywhere */

int main(void)
{
    size_t i;
    for (i = 0; i < maxSize; i++) {
        /* do something */
    }
}

Precedence

Understanding the precedence of different operators is important, because it affects the outcome of expressions. If the precedence rules are not properly understood, tracking down invalid behavior is difficult. For example, in pointer arithmetic a forgotten pair of parentheses may lead to an invalid calculation and an invalid memory reference.

An (incomplete) list of operators in order of precedence is given below. Table 2-1 on page 53 of the K&R book gives a complete list of all the C operators and their precedence. When an expression is evaluated, the operators higher in the list are applied to their operands before the operators below them (some of the operators are introduced only in the next module).

() (function call) , [] , -> , .
++ , -- , * (pointer dereference) , & (address-of) , (TYPE) (type cast) , sizeof (associativity: right to left)
* , / , % (arithmetic operators)
+ , - (arithmetic operators)
< , <= , > , >=
== , !=
&& (logical AND)
|| (logical OR)

In most cases operators of the same precedence level are associated from left to right, but group 2 above is an exception: its operators are associated from right to left.
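The right-to-left association of group 2 is what makes *p++ mean *(p++). The following minimal sketch, with hypothetical helper names, contrasts it with (*p)++; it assumes the array passed in has at least two elements.

```c
/* Hypothetical helper: *p++ is parsed as *(p++), so it reads the
   current element and then advances the pointer. */
int deref_then_advance(const int *arr)
{
    const int *p = arr;
    int first = *p++;   /* first = arr[0]; p now points to arr[1] */
    return first + *p;  /* arr[0] + arr[1]; the array itself is unchanged */
}

/* Hypothetical helper: with parentheses, (*p)++ increments the value
   that p points to instead of moving the pointer. */
int increment_first(int *arr)
{
    int *p = arr;
    (*p)++;             /* arr[0] becomes arr[0] + 1 */
    return *p;          /* the new value of arr[0] */
}
```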

The precedence order explains why we needed parentheses when accessing an array element by means of pointer arithmetic:

int arr[10];
int b;
b = *(arr + 2);

accesses the third element in array 'arr', but

int arr[10];
int b;
b = *arr + 2;

assigns to 'b' the value of the first element plus 2 (the array itself is not modified), because the dereference operator has higher precedence than addition. When unsure, an extra pair of parentheses does no harm. To keep your code clear and understandable, it is a good idea to avoid writing code that depends excessively on precedence rules: it is usually possible to split a complex expression into multiple parts, even if that makes the code slightly longer.
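The difference between the two expressions can be checked with a pair of small helpers. This is a sketch; the function names are hypothetical and the array is assumed to have at least three elements.

```c
/* Hypothetical helper: the pointer is moved first, then dereferenced. */
int third_element(const int *arr)
{
    return *(arr + 2);   /* same as arr[2] */
}

/* Hypothetical helper: the dereference binds tighter than +. */
int first_plus_two(const int *arr)
{
    return *arr + 2;     /* parsed as (*arr) + 2, i.e. arr[0] + 2 */
}
```

With int arr[10] = {5, 7, 9}, third_element(arr) returns 9 while first_plus_two(arr) returns 7.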

It is also useful to know that C generally does not specify the order in which the operands of an operator are evaluated. For example, in int val = funcA() + funcB(); the two functions could be called in either order. If the two functions have mutual dependencies, for example if they modify a common variable, the outcome of the statement is unpredictable and can differ between environments.
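To make the point concrete, here is a sketch in which funcA and funcB are given hypothetical bodies that both modify a hypothetical shared variable 'counter'. Splitting the expression into separate statements, as suggested above, fixes the order of the calls.

```c
/* Hypothetical shared state modified by both functions. */
static int counter = 0;

/* Hypothetical body: doubles the counter. */
static int funcA(void)
{
    counter = counter * 2;
    return counter;
}

/* Hypothetical body: increments the counter. */
static int funcB(void)
{
    counter = counter + 1;
    return counter;
}

/* With counter = 1, funcA() + funcB() could give 2 + 3 = 5 (funcA first)
   or 4 + 2 = 6 (funcB first). Separate statements make funcA run first. */
int sum_in_defined_order(void)
{
    int a, b;
    counter = 1;
    a = funcA();    /* counter 1 -> 2, a = 2 */
    b = funcB();    /* counter 2 -> 3, b = 3 */
    return a + b;   /* always 5 */
}
```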

Strings

Functions for string handling

The standard library contains functions that are helpful in operating with strings. They are declared in the header <string.h>, so if you want to use them, add #include <string.h> in your program. Detailed descriptions of the functions can be found in the Unix manual pages, which can be accessed with the "man" command in the command line shell (for example, man strlen), or in the online versions of the manual pages.

strlen returns the length of a given string. The exact form of the function is size_t strlen(const char *s), i.e., it takes a pointer to a string as a parameter, and returns the number of characters in the string before the terminating null character.
strcmp compares two null-terminated strings. The exact form is int strcmp(const char *s1, const char *s2). The function returns 0 if the strings are the same, and non-zero if they are different: a negative value if s1 sorts before s2, and a positive value if s1 sorts after s2.
strcpy and strncpy copy a string to another location. The exact form is char *strcpy(char *dst, const char *src) or char *strncpy(char *dst, const char *src, size_t n), where the string pointed to by 'src' is copied to the location pointed to by 'dst'. The difference with the latter is that it copies at most 'n' characters, even if the original string was longer. This is useful to protect against overflow of the destination buffer, but note that if 'src' has 'n' or more characters, strncpy does not write a terminating null character to 'dst', so the programmer must add it. The destination buffer needs to be properly allocated before copying. The functions return the destination pointer.
strcat and strncat append one string to the end of another, to make them a single concatenated string. The exact form is char *strcat(char *s1, const char *s2) or char *strncat(char *s1, const char *s2, size_t n), where the string pointed to by 's2' is appended to the end of string 's1'. Again, care should be taken to ensure that the destination buffer in 's1' has enough space. The latter form of the function helps with that, because the 'n' parameter gives the upper limit to the number of characters to be appended. Unlike strncpy, strncat always adds a terminating null character, so in the worst case 's1' needs room for strlen(s1) + n + 1 bytes.
strchr finds the first instance of a given character in a string. The exact form is char *strchr(const char *s, int c), where 'c' is the character that is sought from string 's'. The function returns a pointer to the first instance of the given character, or NULL if the character was not found in the string.
strstr tries to find the first instance of a substring in another string. The exact form is char *strstr(const char *s1, const char *s2), where 's2' is the string that is sought from within string 's1'. The function returns a pointer to the start of the first instance of the substring within 's1', or NULL if no fully matching substring was found.
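A few of the functions above can be tried out with small helpers. This is a minimal sketch; safe_copy, index_of and contains are hypothetical names, not library functions. safe_copy shows one common way to make strncpy always produce a terminated string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (dst_size bytes) and always
   null-terminates, because strncpy does not add a terminator when src
   has at least dst_size - 1 characters. */
char *safe_copy(char *dst, const char *src, size_t dst_size)
{
    if (dst_size == 0)
        return dst;                      /* no room even for '\0' */
    strncpy(dst, src, dst_size - 1);     /* copies at most dst_size - 1 chars */
    dst[dst_size - 1] = '\0';            /* guarantee termination */
    return dst;
}

/* Hypothetical helper: index of the first c in s (via strchr), or -1. */
long index_of(const char *s, int c)
{
    const char *p = strchr(s, c);
    return p != NULL ? (long)(p - s) : -1;
}

/* Hypothetical helper: nonzero if sub occurs in s (via strstr). */
int contains(const char *s, const char *sub)
{
    return strstr(s, sub) != NULL;
}
```

For example, with char buf[5], safe_copy(buf, "Kekkonen", sizeof(buf)) leaves "Kekk" in buf, and index_of("Koivisto", 'v') returns 3.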

Below is an example that demonstrates the above functions, and a few other aspects related to strings in practice.

#include <stdio.h>
#include <string.h>

int main(void)
{
    char buffer[40];
    const char *strings[] = { "Paasikivi", "Kekkonen", "Koivisto",
                              "Ahtisaari", "Halonen" };
    int left, i;

    strcpy(buffer, strings[0]);
    left = sizeof(buffer) - strlen(strings[0]);
    i = 1;
    while (left > 0 && i < 5) {
        strncat(buffer, strings[i], left - 1);
        left = left - (int)strlen(strings[i]);
        i++;
    }
    printf("buffer: %s, length: %zu\n", buffer, strlen(buffer));
    return 0;
}

On line 6, variable 'buffer' allocates a character array of 40 bytes, which can fit a string of 39 characters plus the terminating null character. 'strings' (lines 7-8) is an array of strings, i.e., each element in the array is a pointer to a constant string literal (a character pointer). Multidimensional arrays will be discussed in more detail in module 4.

Line 11 copies the first string ("Paasikivi") to 'buffer' (including the terminating null). Variable 'left' tracks the remaining space in the buffer: sizeof(buffer) is 40, and strlen(strings[0]) is 9 (the terminating null is not included), so 'left' starts at 31. sizeof(buffer) is always 40, regardless of the string content currently in the buffer.

Starting from line 14, the code iterates through the remaining 4 strings in the 'strings' array in a while loop. Line 15 appends the next string after the current content of 'buffer', replacing the earlier null terminator in the buffer and adding a new null terminator at the end of the combined string. strncat is used together with the 'left' variable to avoid writing past the end of the 40-byte buffer (left - 1 is used to leave space for the terminating null character). After each concatenation, 'left' is decreased by the length of that string (line 16). When no space remains in the buffer, the while condition (left > 0) causes the loop to terminate.

Finally, on line 19 the resulting string in 'buffer' is printed, along with its length. The output looks like this:

buffer: PaasikiviKekkonenKoivistoAhtisaariHalon, length: 39

The last name did not fully fit into the buffer, but because we used strncat, we did not write past the end of the buffer. Had we used strcat without the 'left' counter, we would have written beyond the end of the buffer (a buffer overflow), which is undefined behavior and likely corrupts memory.

Task 06_strbasic_2: Count Substring (1 pts)

Objective: Practice the use of string functions (although the exercise could be done without them as well).

Write function int count_substr(const char *str, const char *sub) that counts how many times string 'sub' occurs in string 'str', and returns the count. For example, the call count_substr("one two one twotwo three", "two") should return 3, because "two" occurs three times in the longer string. Note that the spaces do not have any special role in these string manipulation operations; they are just normal characters like everything else.

Hint: Function strstr might be helpful here. It is also useful to observe that you can process partial strings by using a pointer to the middle of string (or any array in general). In such case the function ignores the characters before the pointer, and continues processing from the pointed location.
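A minimal sketch of the idea in the hint, assuming a non-empty 'sub' (an empty substring would be found at every position): the hypothetical helper after_first returns a pointer into the middle of 'str', just past the first match, and that pointer can be passed to strstr again to search only the rest of the string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer just past the first occurrence
   of sub in str, or NULL if sub does not occur. Assumes sub is non-empty.
   The result points into the middle of str, so a later strstr call on
   it ignores everything before that position. */
const char *after_first(const char *str, const char *sub)
{
    const char *hit = strstr(str, sub);
    if (hit == NULL)
        return NULL;
    return hit + strlen(sub);   /* skip over the matched characters */
}
```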

Task 07_altstring: New String (4) (4 pts)

Objective: How do the string functions really work? This exercise might help understanding them.

For this task we assume a new kind of string that does not end at '\0' like the normal strings. Instead, the new string terminator is hash mark '#'. Therefore we need to re-implement some of the common string processing functions following the new string specifications.

Note that the above-discussed string functions defined in string.h do not work with this exercise! The char arrays given to the functions do not necessarily contain the usual '\0' terminator.
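To see roughly what the library functions do internally, below is a sketch of strlen- and strcpy-like functions for ordinary '\0'-terminated strings. The names my_strlen and my_strcpy are hypothetical, and the real library implementations may differ (they are often heavily optimized). These versions rely on '\0', so they are not solutions to the '#'-terminated tasks as such.

```c
#include <stddef.h>

/* Hypothetical strlen-like function: counts characters before '\0'. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')   /* stop at the terminator */
        n++;
    return n;              /* the terminator itself is not counted */
}

/* Hypothetical strcpy-like function: copies src, then terminates dst. */
char *my_strcpy(char *dst, const char *src)
{
    char *start = dst;
    while (*src != '\0')
        *dst++ = *src++;   /* copy one character, advance both pointers */
    *dst = '\0';           /* the copy needs its own terminator */
    return start;          /* like strcpy, return the destination */
}
```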

2.9.a Print string

Implement function void es_print(const char *s) that outputs string s until the first '#' character, which terminates the string. However, the hash character should not be printed. For example, if the function gets the following standard C string as input:

char *str = "Auto ajoi#kilparataa";

it will output:

Auto ajoi

2.9.b String length

Implement function unsigned int es_length(const char *s) that returns the number of characters in array s before the terminating '#'. The hash character should not be included in count.

2.9.c String copy

Implement function int es_copy(char *dst, const char *src) that copies string 'src' to the location pointed by 'dst'. The function should return the number of characters copied, excluding the hash character. The function must copy characters only until the first hash character, and remember that the destination string must also terminate with '#'. (Hint: you can test that the destination string looks correct by using the es_print function)

2.9.d String tokenizer

Implement a string tokenizer that can be used to split the given string into substrings, separated by a given character. The function format is char *es_token(char *s, char c), where 's' points to the string, and 'c' is the character that splits the string. When the character in parameter 'c' is found, it is replaced by '#', and the function returns a pointer to the position that follows the just-replaced character. If the character in parameter 'c' is not found before the terminating '#', the function returns NULL (note that NULL is defined in the stddef.h header). The token replacement happens on the original string; the function should not copy the string.

For example, calling es_token(str, ',') on the following string should change the string:

char str[] = "aaa,bbb,ccc#ddd,eee"; /* an array, so es_token may modify it */

to become:

"aaa#bbb,ccc#ddd,eee"

after the first call to es_token, and the return value should point to where "bbb" starts. Note that tokens after the '#' sign are not replaced, because the tokenizer stops at the end of the string.

Task 08_korso: Korsoraattori (1 pts)

Objective: Get more familiar with character-by-character string manipulation.

(To non-Finnish students: with this exercise you will also get to practice modern Finnish language. This exercise is a tribute to the original Korsoraattori service.)

Implement function void korsoroi(char *dest, const char *src) that "korsorizes" ("korsoroi" in Finnish) the string given in parameter 'src' and writes the resulting string to the location pointed by 'dest'. The string must be modified in the following way:

Every instance of "ks" should be changed to "x".
Every instance of "ts" should be changed to "z".
After every third word in the original string there should be additional word "niinku" in the destination string.
After every fourth word in the original string there should be additional word "totanoin" in the destination string.

You can recognize the end of a word from the space character (' '). You do not need to add anything after the last word. You can assume that there is enough space at address dest to store the resulting string. You can assume that all letters are in lower case.

For example, string "yksi auto valui itsekseen ilman kuljettajaa mäkeä alas" will become "yxi auto valui niinku izexeen totanoin ilman kuljettajaa niinku mäkeä alas".

Task 09_stringarray: String array (2 pts)

Objective: Take an advance peek into arrays of strings (which are essentially two-dimensional arrays).

Strings can also be placed in arrays. Because a string is an array of characters, an array of strings is an array whose elements each refer to an array of characters. This exercise assumes an array of strings where the end of the array is indicated by a NULL pointer.

a) Print string array

Implement function void print_strarray(char *array[]) that prints all strings in the array, each on a separate line (that is, a newline character is printed after each string). The function argument notation may seem new: it represents an array that is composed of char *-typed elements. Therefore, you can use each array member like any normal string in expressions. Remember that the end of the array is marked by a NULL pointer.

b) Convert string into array

Implement function void str_to_strarray(char* string, char** arr) that takes a string as a parameter and turns it into an array of strings (arr). The original string may contain multiple words separated by spaces, and the function places each space-separated word into its own array member. Remember that each string in the array must end with the '\0' character, and the array must end with a NULL pointer.

We have not yet covered two-dimensional arrays, but when arr[i] is a string in an array as described above, you can access its j'th character with the notation arr[i][j], either for reading or writing.
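As an illustration of the arr[i][j] notation on a NULL-terminated string array, the sketch below counts all characters in such an array. The helper name count_chars is hypothetical; it only reads the strings, so it also works on arrays of string literals.

```c
#include <stddef.h>

/* Hypothetical helper: counts all characters in a NULL-terminated array
   of strings. arr[i] is the i'th string and arr[i][j] its j'th character. */
size_t count_chars(char *arr[])
{
    size_t total = 0;
    size_t i, j;
    for (i = 0; arr[i] != NULL; i++) {       /* NULL marks the end of the array */
        for (j = 0; arr[i][j] != '\0'; j++)  /* '\0' marks the end of each string */
            total++;
    }
    return total;
}
```

For example, for the array { "yksi", "auto", NULL } the function returns 8.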