Creative Commons License

Input/Output

In C, input and output happen through I/O streams. An I/O stream is an abstraction for delivering an ordered sequence of unstructured data (bytes). So far we have used two streams that are open by default in C programs: standard output, for printing text on the screen (the printf() function), and standard input, for reading input from the user. In addition to these, streams can be opened to other devices, for example to read and write files, or to communicate between devices or hosts. The main operations on an I/O stream are reading and writing bytes. In addition, a stream needs to be opened before use and closed after use; the exceptions are stdin, stdout and the standard error stream stderr, which are opened automatically when the program starts.

Stream input and output is buffered. A buffer is a memory area that temporarily stores bytes before delivering them forward. Buffers are commonly used to improve performance, because moving data in larger blocks is cheaper than handling each byte separately. New input is first written to a buffer, from which it is delivered to the user program when the program reads the stream. Likewise, data that a program outputs to a stream first goes into a buffer, until it is delivered to the I/O resource (such as the terminal window, when printing text to the standard output stream). Because of the buffering, there may be a delay between the time the program calls the output operation and the time the output actually shows up.

An I/O stream is accessed through a pointer of type FILE *. FILE is an abstract data type that is used only through the functions of the C library. A new FILE pointer is obtained with the fopen function, which opens a named I/O stream (e.g. a file) and returns a FILE * pointer for accessing it. After that the stream can be accessed using the I/O functions. When the stream is no longer needed, it should be closed with the fclose function, which disassociates the FILE pointer from the I/O stream and releases the resources used for managing the stream. I/O streams can be associated with any kind of I/O resource, but in the following we mostly work with files.

Here are the basic functions for using a stream. They are declared in the stdio.h header. More information about each function can be found in its man page.

  • FILE *fopen(const char *path, const char *mode) opens the given file for input and/or output. The path parameter is a string that contains the name of the file to be opened (possibly including a directory path). mode is also a string, consisting of letters that define how the file is opened: "r" opens the file for reading; "w" opens the file for writing, erasing its previous content (the file is created if it does not exist); "r+" opens the file for both reading and writing. In these cases file operations start from the beginning of the file. With "a" mode the file is opened for writing at the end of the file, for example to append new content after the existing content. "a+" does the same, but also allows reading the file. The function returns a pointer to a new FILE object that the following functions use to access the file, or NULL if opening the file did not succeed.

  • int fclose(FILE *fp) closes the given file, after which it cannot be accessed anymore. The return value is 0 if closing was successful, or the constant EOF (a negative value, typically -1) on failure.

  • int fgetc(FILE *stream) reads one character from the file stream and returns it as the return value. If the stream is at the end, or some other exceptional condition occurs, the function returns the special value EOF (end-of-file). EOF is a macro constant that the different I/O functions use for this purpose. The return type is int instead of char, because fgetc must be able to return every 8-bit value from 0 to 0xff as a legitimate character (it returns the byte as an unsigned char converted to int), and in addition it must be able to return EOF, which is negative. For the same reason, the result of fgetc should be stored in an int variable before it is compared with EOF.

  • int fputc(int c, FILE *stream) writes character c (converted to unsigned char) to the file stream. The function returns the character written, or EOF if there was an error.

The following example shows an implementation of a writeString function (very similar to the fputs function) that writes the given string to the indicated I/O stream. Writing ends when the end of the string is reached. The main function opens file "testfile" for writing ("w" mode) and writes the string mystring to the file. Finally the I/O stream is closed.

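#include <stdlib.h>
#include <stdio.h>

int writeString(FILE *fp, const char *str) {
    while (*str) {
        // write characters until the end of string
        if (fputc(*str, fp) == EOF) {
            return -1;  // error in writing
        }
        str++;
    }
    return 0;
}

int main(void) {
    const char *mystring = "One line written to file\n";

    // open 'testfile' for writing (remove previous content)
    FILE *f = fopen("testfile", "w");
    if (!f) {
        fprintf(stderr, "Opening file failed\n");
        exit(EXIT_FAILURE);
    }
    int status = EXIT_SUCCESS;
    if (writeString(f, mystring) != 0) {
        fprintf(stderr, "Writing to file failed\n");
        status = EXIT_FAILURE;
    }
    // fclose flushes the buffer, so it can also report a write error
    if (fclose(f) == EOF) {
        fprintf(stderr, "Closing file failed\n");
        status = EXIT_FAILURE;
    }
    return status;
}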

An I/O stream can carry any binary data, not just text. In an ASCII-encoded (or otherwise encoded) text file, the content is assumed to consist of printable, readable characters that can be read into a string and shown as such. A text file can be opened and read with any text editor, but opening a binary file in a text editor is not useful, because the file contains bytes that do not correspond to printable characters. Other tools, such as hexdump, should be used for investigating binary files. A binary file can also contain the byte 0 with a legitimate meaning (not as a string terminator), and the byte 10 ('\n' in ASCII) likely means something other than the start of a new line.

In addition to the basic functions, there are advanced functions for reading and writing data, as follows. Some of them are intended for text data, while others work with binary data.

  • size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream) reads nmemb elements of size bytes each from file stream into the buffer pointed to by ptr. This function is also suitable for reading binary content: it does not do any special processing for text characters such as line feeds. The function returns the number of complete elements read. If the file ended before the given number of elements could be read, or if there was an error, the return value is smaller than nmemb. The feof and ferror functions (see below) can be used to find out why the read completed prematurely.

  • size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream) writes nmemb elements of size bytes each from the buffer pointed to by ptr to file stream. Like fread(), this function is suitable for writing binary content. The function returns the number of elements written. If there was an error, the return value is smaller than nmemb.

  • char *fgets(char *s, int size, FILE *stream) reads at most size-1 characters from the I/O stream stream into the buffer pointed to by s. The function reads at most one line: it stops after a newline character, which is stored in the buffer. fgets always appends '\0' after the characters read, so the buffer contains a valid string. In other words, fgets is intended for text files whose content can be handled as strings. The function returns s, or NULL if the end of file was reached before any characters were read, or if reading failed.

  • int fputs(const char *s, FILE *stream) writes string s to file stream. Writing stops at the terminating '\0' character of the string, which is not written to the file. This function is suitable for working with strings and text files. The function returns a non-negative number if it is successful, or EOF on failure.

  • long ftell(FILE *stream) returns the current position in the stream, as bytes from the beginning of the stream, or -1 on error. As data is read from or written to the stream, the position indicator moves forward.

  • int fseek(FILE *stream, long offset, int whence) sets the file position indicator to the given position (offset), counted in bytes. If whence is SEEK_SET, the offset is counted from the beginning of the file; if it is SEEK_CUR, from the current position; and if it is SEEK_END, from the end of the file. The function returns 0 on success and a non-zero value on failure. Setting the position works on files, but may not work on other types of streams (for example when stdin or stdout is connected to a terminal).

  • int fprintf(FILE *stream, const char *format, ...) works like the printf function, but takes an additional parameter, stream, and writes the formatted output to the given stream instead of the standard output stream that is typically shown on the screen. The function returns the number of characters printed if writing was successful, or a negative value if there was an error.

  • int fscanf(FILE *stream, const char *format, ...) is similar to the scanf function, except that it reads the input from file stream instead of standard input. The function returns the number of input fields that were successfully read and assigned, which can be smaller than requested (even zero) if the input does not match the format, or EOF if an error occurred or the end of file was reached before the first field could be read.

  • int feof(FILE *stream) returns non-zero if the file is at the end (and no more reading can be done), or zero if the file is not yet at the end. Note: the end-of-file state is only set after an attempt to read "past" the end of file. Therefore, if you have read all content of the file, but have not tried to read any further, feof still returns 0.

  • int ferror(FILE *stream) returns non-zero if an error has occurred in an earlier I/O operation, or zero if no error has happened.

  • int fflush(FILE *stream) writes any buffered data of an output stream to the underlying file or device. It returns 0 on success, or EOF on failure.

For standard input and standard output there are pre-defined streams stdin and stdout. For example, calling

fprintf(stdout, "%d\n", an_int)

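is equivalent to calling printf with the same format string and parameters. In addition, there is a third stream that is open by default, stderr, which is conventionally used for printing error messages from programs. Keeping error messages in a separate stream means that they still appear on the terminal even when standard output is redirected to a file.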

A text file can be read as follows. The program reads file "test.c" line by line, and shows each line on the standard output. It also uses the standard error stream for error messages.

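#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *f;
    char buffer[100];

    f = fopen("test.c", "r"); // open file for reading
    if (!f) {
        fprintf(stderr, "Opening file failed\n");
        return EXIT_FAILURE;
    }
    // fgets reads at most one line (or sizeof(buffer) - 1 characters)
    while (fgets(buffer, sizeof(buffer), f) != NULL) {
        if (fputs(buffer, stdout) == EOF) {
            fprintf(stderr, "Error writing to stdout\n");
            fclose(f);
            return EXIT_FAILURE;
        }
    }
    // fgets returned NULL: check whether it was end of file or an error
    if (ferror(f)) {
        fprintf(stderr, "Error reading file\n");
        fclose(f);
        return EXIT_FAILURE;
    }
    fclose(f);
    return EXIT_SUCCESS;
}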

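For binary files, fread and fwrite should be used for reading and writing. These functions do not give any special treatment to null ('\0') characters or newlines. On systems that distinguish between text and binary files (such as Windows), the file should also be opened in binary mode by adding "b" to the mode string, as in "wb" and "rb"; on POSIX systems the "b" has no effect. Below is an example of writing an integer array of 10 numbers in binary format, followed by reading the array back from disk. The example also demonstrates the use of the feof and ferror indicators.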

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    int numbers[10] = { 1, 0, -2, 3, 10, 4, 3, 2, 3, 9 };
    FILE *fp = fopen("intarray", "wb");  // "b": binary mode
    if (!fp) {
        fprintf(stderr, "Could not open file\n");
        return EXIT_FAILURE;
    }
    size_t n = fwrite(numbers, sizeof(int), 10, fp);
    if (ferror(fp)) {
        fprintf(stderr, "Error occurred\n");
        fclose(fp);
        return EXIT_FAILURE;
    }
    fprintf(stdout, "%zu items written\n", n);  // same as printf; %zu is for size_t
    fclose(fp);

    // re-open file for reading, and read the integers
    fp = fopen("intarray", "rb");
    if (!fp) {
        fprintf(stderr, "Could not open file\n");
        return EXIT_FAILURE;
    }
    int *num2 = malloc(10 * sizeof(int));
    if (!num2) {
        fprintf(stderr, "Out of memory\n");
        fclose(fp);
        return EXIT_FAILURE;
    }
    n = fread(num2, sizeof(int), 10, fp);

    // feof indicator should not be set yet, because we did not read
    // past the end of file
    if (feof(fp)) {
        fprintf(stderr, "prematurely reached end of file\n");
        fclose(fp);
        free(num2);
        return EXIT_FAILURE;
    } else if (ferror(fp)) {
        fprintf(stderr, "error occurred\n");
        fclose(fp);
        free(num2);
        return EXIT_FAILURE;
    }
    fprintf(stdout, "%zu items read\n", n);

    // should not read anything, because we should be at the end of file
    n = fread(num2, sizeof(int), 10, fp);
    if (feof(fp)) {
        fprintf(stdout, "%zu items read, EOF indicator is set\n", n);
    }

    fclose(fp);
    free(num2);
    return EXIT_SUCCESS;
}

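When int is 32 bits, as on most current platforms, this code creates a file of 40 bytes (10 integers of 4 bytes each). The file is binary and cannot be meaningfully shown by a text editor, but hexdump shows its content. Running the program and then hexdump gives the following output: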

$ ./a.out
10 items written
10 items read
0 items read, EOF indicator is set

$ hexdump -C intarray 
00000000  01 00 00 00 00 00 00 00  fe ff ff ff 03 00 00 00  |................|
00000010  0a 00 00 00 04 00 00 00  03 00 00 00 02 00 00 00  |................|
00000020  03 00 00 00 09 00 00 00                           |........|
00000028

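Note that each integer takes four bytes, stored in little-endian byte order (as on x86 processors): the least significant byte comes first, so 1 is stored as 01 00 00 00, and -2 (in two's complement representation) as fe ff ff ff. Because fwrite copies the in-memory representation as such, a file written this way can be read back correctly only on a platform with the same int size and byte order.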

Task 01_filedump: File dump (2 pts)

Objective: Practice basic file reading.

Implement functions textdump and hexdump that read the given file (whose name is given in the filename parameter) and print its contents to the screen. Both functions return the number of bytes read, or -1 if there was an error in opening the file. You will need to implement two output formats as follows:

In Exercise (a) the file is output as text characters. If the read character is printable (as determined by the isprint function call), it should be printed as is. If the character is not printable, '?' should be printed instead.

In Exercise (b) the file content should be printed as a hexdump. Each byte is printed as a hexadecimal number that uses exactly two characters on the screen: if the number is less than 0x10, a leading zero is added. Each hexadecimal number is followed by a space. A line can have at most 16 hexadecimal numbers; after that a new line is started. The number at the end of a line is also followed by a trailing space. Here is an example output:

0e 54 65 65 6d 75 20 54 65 65 6b 6b 61 72 69 30 
30 30 30 30 41 00 00 14 45 4c 45 43 2d 41 31 31 
30 30 00 00 00 00 00 00 00 

Task 02_stats: File statistics (2 pts)

Objective: More practice on file processing

Implement functions to calculate the following metrics from a given file:

(a) Line count

Implement function int line_count(const char *filename) that calculates the number of lines in the given file, and returns the line count. If there is an error, the function should return -1. An empty file is considered to have no lines. If the last line of the file is not empty, it should be counted as a line even if it does not end in a newline character.

(b) Word count

Implement function int word_count(const char *filename) that calculates the number of words in the given file and returns the word count. In this exercise we define a word as follows: a word is a substring that contains at least one alphabetic character, and two words are separated by one or more whitespace characters. If there is an error, the function should return -1. (Note that the shell command 'wc -w' defines a "word" differently, and cannot be used to check the results of this function.)

Task 03_base64: Base64 (2 pts)

Objective: Practice file input and output, with some bitwise operations.

Note: This may be the hardest task in module 5. If you are unsure how to approach the task, you may want to try exercises 5.4 and 5.5 first, and then come back to this exercise.

Base64 encoding is applied when binary content needs to be converted into a printable format, for example for transmission as an E-mail attachment. This exercise requires the functions to_base64 (a), which reads a file and writes it to another file in Base64 encoded format, and from_base64 (b), which does the reverse operation, i.e., reads a Base64 encoded file and writes it as a decoded binary file. In other words, when you apply to_base64() and from_base64() successively, the latter should produce a file that is an exact copy of the original file passed to to_base64().

The idea of Base64 is that the input file (or generally any byte string) is represented as 6-bit units that are encoded as printable characters (A-Z, a-z, 0-9, +, /). This is done by processing the source file in units of three 8-bit bytes (24 consecutive bits), and converting each unit into four 6-bit numbers (the same 24 bits). Each 6-bit number is represented by a character according to the Base64 index table shown on the Wikipedia page about Base64. The Wikipedia page also has useful diagrams that illustrate the idea of the encoding.

It is possible that the source file length is not divisible by 3. In that case the remaining bits are set to 0, and each 6-bit character that was not used is replaced by a padding character ('='). The Wikipedia page gives examples of how this is done.

The Wikipedia page also gives additional information, background and examples about the encoding. There are different variants of the encoding, but we will apply the original RFC 1421 format, in other words:

  • Each encoded line must be 64 characters long, except the last line, which can be shorter. I.e., after every 64 characters there must be a newline ('\n') character. Unlike many other implementations, which use CRLF line endings, this exercise uses a plain newline. The last line does not end in a newline character.

  • All line lengths must be divisible by 4 (not counting the newline character). If needed, padding characters ('=') must be appended to make the line length of the last line correct.

Both functions, to_base64() and from_base64(), return the number of bytes read from the source file, or -1 if there was an error (file not found, etc.)

Hints for implementing to_base64():

  • It is recommended that you start testing your implementation with small source files containing simple strings (such as "Man", as used in the Wikipedia example), and then gradually expand your tests before submitting your code to TMC.

  • Start by implementing the bitwise operations (at least bit shifts are needed) that convert 3 incoming 8-bit units into 4 outgoing 6-bit units (e.g., if you use 'char' type variables, there will be 4 'char's whose highest two bits are always zero).

  • You will need to convert the four 6-bit values into printable characters, i.e. each number value needs to be mapped to a character. The task template contains the Base64 characters in the correct order in the string 'encoding' (which can be seen as an array of 64 characters). It is most likely useful for you.

  • When bit manipulations and conversion into printable character seem to work, finally add padding and division into 64-character lines.

As always, you can use src/main.c for testing, and modify that file as needed.

Preprocessor

The preprocessor processes the C code before passing it to the actual C compiler, which produces the binary object code. The preprocessor, for example, removes the comments from the code and executes preprocessor directives, such as #include, which includes the definitions from another header file as part of the compilation. The output of the preprocessor is still source code in text format. The preprocessor output can be examined by running the gcc compiler with the -E flag.

In the following we take a look at some of the most common preprocessor directives.

Basics

Preprocessor directives begin with a hash character (#) and usually contain some parameters. So far we have mainly seen one preprocessor directive, #include, which inserts the content of another file into the C source file. In principle #include could be used with any file, but it is meant to be used with header files that contain only declarations, such as data type definitions, constants and function prototypes, and that do not produce program code themselves. After the preprocessing phase all preprocessor directives have been processed, and the output is plain C code that the actual compiler can compile.

A preprocessor directive begins at the start of the hash-marked line and ends at the end of the line: each directive takes exactly one line, and there is no trailing semicolon as in normal C statements. However, a long directive can be split across lines by placing a backslash (\) character at the end of the line, which means that the directive continues on the next line.

One of the most common preprocessor directives is #define, which defines a macro: a name that will be replaced in the source code with the text given in the #define directive. The format of the declaration is #define NAME some text, which replaces all following instances of NAME with "some text" before the code is compiled. A common use for #define is to define named constants, for example for numbers that are used elsewhere in the code:

1
2
3
4
5
6
7
8
#include <string.h>

#define MAXSTRING 80

int main(void) {
    char str[MAXSTRING];
    strncpy(str, "string", MAXSTRING - 1);
}

In the above example the preprocessor replaces the MAXSTRING labels on lines 6 and 7 with the number 80 before passing the code to the actual C compiler. It is a common convention (but not mandatory) to use upper case names for constants and macros defined with #define, to distinguish them from variables in the code. In other cases a macro could expand to a string literal, or to a small piece of C code. The preprocessor only does text replacement and is not concerned with the types of the variables.

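(Alternatively, in the above case MAXSTRING could have been defined as a constant global variable: const int MAXSTRING = 80;. The difference is that the type of the value is then clearly defined, and the value is handled by the C compiler, not by the preprocessor. Note, however, that in C a const variable is not a constant expression: inside main, char str[MAXSTRING] would then declare a variable-length array, and at file scope it would not compile.)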

A #define declaration can be removed with #undef NAME. After the #undef directive, NAME is no longer replaced in the code that follows.

The preprocessor supports the conditional directive #if, which covers a section of code up to the matching #endif. In addition, there is #elif for "else if", and #else. These conditions behave much like normal C conditional statements, but they are evaluated in the preprocessing phase, and the code in the branches not taken is never seen by the actual compiler. The condition must be an integer constant expression that can use the usual C comparison and logical operators; an identifier that is not defined as a macro evaluates to 0. Below is an example of using these directives.

#if (VERSION == 1)
#include "hdr_ver1.h"
#elif (VERSION == 2)
#include "hdr_ver2.h"
#else
#error "Unknown version"
#endif

The #error directive shown above raises a compile error with the given message, and the compilation fails at this point. Note that the conditions are evaluated before compilation: if VERSION is 1 or 2 (for example when it is defined on the gcc command line with -DVERSION=1), the #error directive is skipped and never affects the compiled program.

A #define declaration can also be given without a value, just to tell the preprocessor that a particular name is defined. This is commonly used in include guards. The purpose of an include guard is to ensure that a C source file does not include the same header definitions multiple times, which could cause compile errors (for example, defining the same struct type twice is an error). This can happen when there are nested include dependencies between multiple header files. The #ifdef directive tests whether a particular name has been defined, regardless of its value. #ifndef is the opposite test, and is true if a name has not been defined. Here is an example:

#ifndef SOME_HEADER_H  // at the beginning of file
#define SOME_HEADER_H

// some header content

#endif // at the end of the file

The above #ifndef condition is true the first time a particular header file is included in the C source (by an #include directive). If the same header is included another time, SOME_HEADER_H is already defined, and the header content is skipped. In large programs it is not unusual that a header is included multiple times, because nested header inclusions can create complicated dependencies.

The preprocessor also has some predefined macros that can be used in C code. These can be useful for debugging purposes:

  • __DATE__ is substituted with the current date at compile time. Because it is evaluated at compile time, the date does not change until the program is recompiled.

  • __TIME__ is substituted with the time of compilation, with behavior as above.

  • __FILE__ is substituted with the name of the C source file where the macro is located. This can be used, for example, in implementing a common debugging macro (such as assert).

  • __LINE__ is substituted with the line number where the macro is located. Again, this is useful in conjunction with a debugging macro.

A simple test program could use these in the following way:

1
2
3
4
5
6
#include <stdio.h>

int main(void) {
    printf("This file was compiled on %s at %s\n", __DATE__, __TIME__);
    printf("We are in source file %s on line %d\n", __FILE__, __LINE__);
}

to output:

This file was compiled on Mar 12 2014 at 01:36:14
We are in source file testi2.c on line 5

Macros with parameters

#define macros can also take parameters. When such a macro is used and expanded in code, the given arguments are substituted into the expanded code. For example, we could have the following macro definition:

#define GROW_MEM(Var, Size) Var = realloc(Var, Size)

that could be used in the following way:

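#include <stdlib.h>

#define GROW_MEM(Var, Size) Var = realloc(Var, Size)

int main(void)
{
    char *p = malloc(100);
    GROW_MEM(p, 200);  // expands to: p = realloc(p, 200)
    free(p);
}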

Whenever the GROW_MEM macro is used, the preprocessor replaces it with the given realloc statement, substituting the given arguments for Var and Size. As this is done by the preprocessor, it is pure text replacement, and no type checking happens at this point. If invalid arguments are given, the error is detected only when the C compiler processes the expanded code. (Also note that assigning the result of realloc directly back to the same pointer loses the original memory block if realloc fails and returns NULL.)

Task 04_arraytool: Array tool (3 pts)

Objective: Practice use of parametrized macros, for operating with arrays (of generic type).

The only .c file in this exercise is main.c (in addition to the test sources in the test directory). Instead, the code you need to implement goes in the src/arraytool.h header, where you need to place the following three macros:

Exercise (a): CHECK(cond, msg) that will check logical condition cond, and if the condition fails, output string msg to standard output stream. This is like the assert macro, but does not terminate the execution of the program if condition fails. Example: CHECK(5 > 10, "5 > 10 failed\n");

Exercise (b): MAKE_ARRAY(type, n) that will create a dynamically allocated array that contains n objects of type type. The macro evaluates to a pointer to the allocated memory. Example: void *ptr = MAKE_ARRAY(int, 10);

Exercise (c): ARRAY_IDX(type, array, i) that will access the given array at index i (count starting from 0, as always), with given type. Example: ARRAY_IDX(int, ptr, i) = i * 2;

When the three macros are correctly implemented, src/main.c should allocate an int array for 10 members, initialize it, and print its contents. The main function also demonstrates the use of CHECK macro with failing condition.

Variable length argument lists

Usually the number and types of the arguments of a C function are fixed. However, there are situations where the exact number of parameters or their types cannot be determined in advance. The most common example is printf, along with other functions that use format specifiers to describe the remaining arguments. The C language contains a mechanism for these cases.

A function can be defined with variable length parameter list with the following notation:

int printf(const char *fmt, ... )

The above is (a simplified form of) the declaration of the printf function. As we know by now, the first parameter is always a format string. After it there is a variable number of other parameters, whose types and number are not specified in the function declaration. The implementation of printf determines the number and types of the parameters from the format specifiers included in the format string.

The parameter list will be processed using the va_list data type, and using macros va_start, va_arg and va_end. These are defined in the stdarg.h header.

Below is an example of a function that calculates the average of a varying number of floating point numbers.

#include <stdio.h>
#include <stdarg.h>

double average(int n, ... )
{
    va_list args;
    double sum = 0;
    va_start(args, n);
    for (int i = 0; i < n; i++) {
        sum += va_arg(args, double);
    }
    va_end(args);
    return sum / n;
}

int main(void)
{
    printf("average: %f\n", average(4, 1.0, 10.0, 0.1, 0.2));
    printf("another: %f\n", average(2, 0.1, 0.3));
}

va_start initializes the processing of the argument list; its second argument is the last fixed parameter, after which the variable arguments begin. Therefore, in C standards before C23, a function with a variable argument list always needs at least one fixed parameter.

The function picks the arguments one at a time with the va_arg macro. The macro takes the va_list instance as the first parameter and the expected data type as the second parameter, so the application logic needs some way to determine the type of each argument. In our example this is easy, because we know that all numbers are of type double. The printf function, on the other hand, determines the type of the next argument from the format specifiers in the format string. Note that the types must match after the default argument promotions: float arguments are passed as double, and char and short as int, so calling average(2, 1, 2) with int arguments would be undefined behavior. After all arguments have been processed, the va_list state needs to be cleaned up with the va_end macro.

The main function shows that we can now call the average function with a different number of arguments each time, as long as the first argument tells the function how many numbers follow.

Task 05_myprint: Integer printer (1 pts)

Objective: Learn variable length argument lists

Implement function myprint that prints a variable number of integers to standard output stream following the format indicated by a given format string. The function can take variable number of arguments: the first argument is always a (constant) format string (as in printf), but the number of other arguments depends on the format string. Our output function is a simpler version of printf: it can only print integers. myprint differs from traditional printf by using character & as the format specifier that should be substituted by the integer indicated at the respective position in the argument list. Because we only output integers, this simple format specifier will suffice.

For example, this is one valid way of calling the function: myprint("Number one: &, number two: &\n", 120, 1345);

The function should return an integer, that indicates how many format specifiers (and integer arguments) were included in the given format string.

In this exercise you will get just empty C source files to fill in. Read the above description (and main.c) carefully to figure out what the function prototype should look like. You'll need to modify both myprint.c and myprint.h.

If your implementation works correctly, the default main function (in main.c) should output the following:

Hello!
Number: 5
Number one: 120, number two: 1345
Three numbers: 12 444 5555
I just printed 3 integers

Hint: As a reminder, strchr will return pointer to the next instance of given character from a string, and fputc will output one character at a time. You may or may not want to use these functions.

Function pointers

So far we have considered functions as static entities that have a fixed name and a specified functionality. Because all code is located in its dedicated (read-only) area of virtual memory, every function can be referred to by an address. In fact, a function name used in an expression is converted into a pointer to the beginning of the function's code. Pointers to functions can also be stored in variables of a specific kind. Such variables are called function pointers, and despite their special syntax, they can be assigned, passed as arguments and compared much like other pointers (although pointer arithmetic is not allowed on them).

Below are a few examples of function pointer definitions and the syntax used:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
#include <stdlib.h>

int funcAdd(int a)
{
    return a + 1;
}

int main(void)
{
    // The following declares four variables for function pointers
    int (*add_one)(int) = funcAdd;
    void* (*varaa)(size_t);
    void (*vapauta)(void *);
    void* (*varaa_uudestaan)(void *, size_t);
    // the last three pointers are still uninitialized

    int b = add_one(1);

    // set the pointers to the addresses of functions in C library
    varaa = malloc;
    vapauta = free;
    varaa_uudestaan = realloc;
}

Line 11 declares the function pointer 'add_one', which refers to a function that returns an int value and takes one argument of type int. In addition, add_one is initialized to point to function funcAdd, which must have the same return type and argument types as the function pointer.

Similarly, line 12 defines function pointer 'varaa', which returns a generic void pointer and takes a size_t-typed value as its argument; line 13 defines function pointer 'vapauta'; and line 14 defines function pointer 'varaa_uudestaan', which takes two arguments. The latter three function pointers are not yet initialized, so their values are indeterminate and they must not be called before they have been assigned.

Line 17 shows an example of how a function pointer is used to call a function. Here calling 'add_one' executes function 'funcAdd', which sets integer 'b' to 2.

Assignment to function pointers works as with any other pointer, as long as the pointed-to function has a type (both return type and argument types) consistent with the respective function pointer. On lines 20-22, the function pointers are assigned values: 'varaa' is set to point to the malloc() implementation (which has matching return and argument types), and 'vapauta' and 'varaa_uudestaan' are set to point to the free() and realloc() implementations. These functions are not part of the programmer's own source file, but because their types match and they are linked into the program from the C library, the assignment is valid.

After the function pointers have been set, the functions can be called through them just like with normal function calls. Below is an extension to the above example, where we change our mind about the 'vapauta' function pointer and make it point to our own function 'just_kidding' instead of the C library implementation of free(). Note that the two functions have the same argument and return types. After this we call each of the three functions: in practice we first allocate 100 bytes of memory, then enlarge the memory block to 200 bytes by reallocation, and finally try to free it. Freeing does not work, however, because 'vapauta' points to our mock implementation that just prints a message instead of freeing anything, so the program leaks the memory block.

#include <stdlib.h>
#include <stdio.h>

void just_kidding(void *ptr)
{
    printf("Did not release the memory block starting at %p\n", ptr);
}

int main(void) {
    void* (*varaa)(size_t);
    void (*vapauta)(void *);
    void* (*varaa_uudestaan)(void *, size_t);
    // above pointers are now uninitialized

    // set the pointers to the addresses of functions in C library
    varaa = malloc;
    vapauta = free;
    varaa_uudestaan = realloc;

    vapauta = just_kidding;

    void *os = varaa(100);  // i.e., malloc
    os = varaa_uudestaan(os, 200);  // i.e., realloc
    vapauta(os);  // i.e., free ..umm.. or actually not
}

Function pointers are like any other data type: they can be passed as function parameters or used as part of data structures. When used as function parameters, function pointers allow a new degree of flexibility: in addition to passing static values to a function, we can pass a specific functionality as a parameter or structure field. One example of this is the generic qsort function (declared in stdlib.h) that sorts a given array. qsort is given a pointer to a function that compares two values, as needed by the sort operation. This way the same sort algorithm can be used with different data types. The qsort function is declared as follows:

void qsort (void *base, size_t nmemb, size_t size,
            int (*compar)(const void *, const void *));

qsort() operates on the array starting at address base, having nmemb members. Each array element is size bytes long (the element size must be given, because the function is generic and uses a void pointer). The fourth parameter, compar, is a pointer to a function that compares two values. A function pointer declaration in a parameter list contains the return type, the name of the parameter (used inside the function implementation), and the parameter types of the pointed function. qsort calls this function to put the array elements in order: the comparison function must return a negative value if the first element should come before the second, zero if they are equal, and a positive value if the first should come after the second. The array can contain any data type, as long as a function implementing the comparison criteria is defined (and pointed to by the 'compar' parameter). Below is an example that uses qsort to reorder an array of names into alphabetical order. The ordering is based on last name; if the last names are the same, the first name decides the order.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>  // strcmp

struct name {
    char *last;
    char *first;
};

int name_compare(const void *a, const void *b)
{
    const struct name *name_a = a;
    const struct name *name_b = b;

    // first compare the last names
    int res = strcmp(name_a->last, name_b->last);
    if (res != 0)
        return res;
    else
        // if last names are same, first names decide order
        return strcmp(name_a->first, name_b->first);
}

int main(void) {
    struct name array[4] = {
        {"Kimalainen", "Kalle"},
        {"Mehilainen", "Maija"},
        {"Ampiainen", "Kerttu"},
        {"Ampiainen", "Antti"}
    };
    qsort(array, 4, sizeof(struct name), name_compare);
    for (size_t i = 0; i < 4; i++) {
        printf("%s, %s\n", array[i].last, array[i].first);
    }
}

A function pointer can be included as a structure member as well. After the previous examples this holds no surprises (structure 'monster' contains a function pointer member 'attack'):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
#include <stdlib.h>
#include <stdio.h>

struct monster {
    char *name;
    int hitpoints;
    int (*attack)(struct monster *, struct monster *); // function pointer
};

int punch(struct monster *me, struct monster *target) {
    int damage = rand() % 5;
    printf("%s punches %s with %d damage\n", me->name, target->name, damage);
    target->hitpoints -= damage;
    return damage;
}

int bite(struct monster *me, struct monster *target) {
    int damage = 20;
    printf("%s bites %s with %d damage\n", me->name, target->name, damage);
    target->hitpoints -= damage;
    return damage;
}

int main(void) {
    struct monster goblin = { "goblin", 20, punch };
    struct monster vampire = { "vampire", 10, bite };

    vampire.attack(&vampire, &goblin);
    goblin.attack(&goblin, &vampire);

    // goblin starts biting as well
    goblin.attack = bite;
    goblin.attack(&goblin, &vampire);
}

As with the local variables in the earlier examples, the different functions can be called through the function pointers (on lines 28, 29 and 33 above). Only the call notation is slightly different, because the function pointer is a member of a structure. Otherwise the function pointer works similarly to any other type, for example in structure initialization (lines 25 and 26) or in assignment (line 32).

Task 05_sheet: Spreadsheet (4 pts)

Objective: Practice use of function pointers. This exercise also revisits various other recent topics such as typedefs, unions, two-dimensional arrays, etc.

This exercise implements a two-dimensional spreadsheet. Each cell in the spreadsheet can be in one of three states: a) unspecified, i.e., the cell has no defined content; b) it holds a static value of type double; or c) it holds a function that performs a calculation over a specified area of the spreadsheet.

The "function" is specified as a function pointer to your code that performs the intended calculation. All such functions take two coordinates as their arguments: the location of the upper left corner of an area, and the location of the lower right corner of the area. The functions return a double-typed result of their calculation, which is shown as the value of the corresponding cell in the spreadsheet.

The task template contains some ready-made functions that are used only from src/main. It is recommended that you use src/main to test your code before passing it to the TMC tests.

  • parse_command reads a short command from the user that sets a given sheet location either to a static value or to one of the three functions given in the template. The coordinates are represented as two letters at the beginning of the command. For example, "AA 6" sets the upper left cell to value 6, and "BA sum CC EE" sets location (1,0) to show the sum over the 3x3 area between coordinates (2,2) and (4,4).

  • print_sheet outputs the content of the current spreadsheet.

Note that for either of the above functions to work correctly, you'll need to implement a few other functions they depend on.

You'll need to implement this exercise in the following phases (in this order):

(a) Creating and releasing the spreadsheet

Implement the following functions:

  • create_sheet that allocates the memory needed for the spreadsheet, i.e., the Sheet structure, and the two-dimensional array.

  • free_sheet that releases all memory allocated by create_sheet()

  • get_cell that accesses the cell with the given coordinates in the spreadsheet and returns a pointer to it. get_cell() should be safe against indexing out of bounds: if invalid coordinates are given, it should return NULL (instead of accessing invalid memory, which Valgrind would report as an error).

Note that the tests use these functions in all of the following parts, so they need to be implemented first.

(b) Set content for the spreadsheet

Implement the following functions:

  • set_value that sets the given location in the spreadsheet to a double-typed constant value.

  • set_func that sets the given location in the spreadsheet to contain the given function and its parameters.

Both functions should be safe against indexing out of bounds. In such a case, they should do nothing.

(c) Evaluate cell content

Implement function eval_cell that returns a double-typed value based on the cell content. If the cell type is VALUE, the function should return this value. If the cell type is FUNC, the function should call the function associated with the cell and return the value returned by that function. If the cell type is UNSPEC, or if the caller is indexing out of bounds, the function should return the constant NAN (not-a-number), which is defined in the 'math.h' header. (Note: if you need to test whether a value is NAN, you should use the isnan() macro, because NAN compares unequal to every value, including itself.)

(d) Three spreadsheet functions

Implement functions for calculating the maximum value over an area, a sum over an area, and count of specified cells over an area as follows:

  • maxfunc will return the largest value inside an area with upper left corner and lower right corner as given by arguments. If the area contains unspecified cells, or out-of-bound coordinates, they should be ignored.

  • sumfunc will return the sum of values inside an area with upper left corner and lower right corner as given by arguments. If the area contains unspecified cells, or out-of-bound coordinates, they should be ignored.

  • countfunc will return the number of cells within the given area that contain specified content (a value or a function).

These functions are called by the eval_cell() function as needed based on spreadsheet content.

Note that in all three above functions the target area may contain either values or functions. For cells that contain functions, the result of the given function needs to be used in calculation. In other words, the functions themselves may need to use eval_cell() and subsequently call other functions.

Some useful functions

Operations on characters

The ctype.h header contains functions for testing and handling characters. The functions can be used, for example, for testing whether a character is alphabetic, a digit, or a whitespace character. The ctype.h functions can also be used to convert characters between uppercase and lowercase. The functions also take the localization settings (locales) into account, and can handle characters outside the traditional ASCII character set, such as Scandinavian umlauts, when a suitable single-byte locale has been set (multi-byte encodings such as UTF-8 are not handled by these functions). A selection of functions for testing various conditions on a character is listed below; you can find essentially the same descriptions, along with some additional functions, in the related man pages. All functions take one character as an int parameter, and return a non-zero (true) value if the character belongs to the tested group of characters, or zero (false) if it does not. The argument must be representable as unsigned char (or be EOF), so a plain char that may be negative should be cast to unsigned char first.

  • isalpha tests whether a character is an alphabetic character.

  • isdigit tests whether a character is a digit (from 0 to 9).

  • isblank tests whether a character is space or tab.

  • isspace tests whether a character is space, tab, or any of the other whitespace characters, such as newline.

  • isalnum tests whether a character is alphabetic or digit (isalpha || isdigit).

  • islower tests whether a character is a lower case alphabetic character.

  • isupper tests whether a character is an upper case alphabetic character.

Below is an example of the use of isalpha function. The other functions listed above work similarly.

#include <stdio.h>
#include <ctype.h>

unsigned int countLetters(const char *string)
{
    unsigned int count = 0;
    do {
        if (isalpha((unsigned char)*string))  // cast avoids undefined behavior for negative char values
            count++;
    } while (*string++);
    return count;
}

int main(void)
{
    char *str = "abc 123 DEF";
    printf("letters: %u\n", countLetters(str));
}

In addition, there are functions to convert a character to upper or lower case. toupper(int ch) converts an alphabetic character to upper case and returns the converted character; if conversion is not possible (e.g., for a non-alphabetic character), it returns the character unchanged. tolower(int ch) converts a character to lower case with similar logic.

Operations on strings

We have used printf and scanf since the first module, and this module showed that these functions are just special cases of fprintf and fscanf, operating specifically on the stdout and stdin I/O streams. With fprintf and fscanf, formatted output and input can be directed to any I/O stream or file.

Yet another variant of formatted input and output is the pair sscanf and sprintf, which take formatted input from a given string or produce formatted output into a given string. Their interfaces, along with two related conversion functions, are as follows:

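  • int sprintf(char *str, const char *format, ...) is similar to the other printf variants, but the first parameter str is an array of characters allocated by the caller. The formatted output is written to this buffer instead of an output I/O stream. The caller must make sure that the buffer is large enough; snprintf, which additionally takes the buffer size, is a safer alternative.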

  • int sscanf(const char *str, const char *format, ...) reads formatted input from string str, instead of an input I/O stream, and sets the given arguments accordingly, as with the other scanf variants. This function could also be used to convert strings to integers (with '%d' format specification).

  • long int strtol(const char *nptr, char **endptr, int base) (declared in stdlib.h) converts string nptr to an integer that is returned by the function. Leading whitespace and an optional sign are accepted, and parsing stops at the first character that is not a valid digit. endptr is a pointer to a char * variable: when the function returns, this variable points to the first character in the string that was not part of the number. If *endptr points to the same location as nptr, the string did not contain a valid number. base gives the numeric system to be read: for (the most common) decimal numbers it should be 10.

  • atoi is a simpler function for converting strings to decimal numbers. Its use is not recommended, however, because there is no way to detect if reading a number failed.

Mathematical functions

Many mathematical functions are declared in the math.h header. The functions operate on double-typed floating point numbers. To use these functions, the -lm option usually needs to be added to the gcc command line (after the source or object files), which links the code with the mathematical library; this can be done, for example, in the Makefile used for building the executable. Below is a selection of functions with brief descriptions. More detailed information can be found, again, in the man pages.

  • round rounds a number to the nearest integer (halfway cases are rounded away from zero).

  • ceil rounds a number up to the smallest integer that is not less than it.

  • floor rounds a number down to the largest integer that is not greater than it.

  • pow raises its first argument to the power given by its second argument.

  • sqrt calculates the square root of a value.

  • fabs returns the absolute value of a given value.

  • exp calculates the base-e exponential function, i.e., e raised to the given power.

  • log calculates the natural logarithm of a given value.

  • cos calculates the cosine of a given value (in radians).

  • sin calculates the sine of a given value (in radians).

  • tan calculates the tangent of a given value (in radians).

The stdlib.h header also contains some mathematical functions for integers, such as abs. In addition, it contains functions for generating pseudorandom numbers: rand() returns a pseudo-random integer between 0 and RAND_MAX (inclusive; RAND_MAX is a large number, at least 32767). The pseudorandom generator can be initialized with a given seed using the srand(seed) call; the same seed always produces the same sequence of numbers.