New: sample
final exam, extra notes on numerical modelling of
data, etc. A program showing linear
fitting to data.
Example mid-session
questions. Solutions to Q11, 12,
and 13.
Zipfile
containing all the on-line notes (last updated: 13 Jun 2004) (you
can download this to your computer, and browse the contents off-line).
Student feedback April 2004,
and
my responses.
These notes are a work in progress, and will be updated regularly. The
complete notes will be handed out in
hardcopy form in week 14.
The course syllabus, assessment details, etc are available
on-line.
Please have a look at the preliminary marks to date, and make sure that there is nothing missing. Contact Michael Ashley ASAP if there is.
IMPORTANT: if you are finding the course difficult, see me (Michael Ashley, Room 129 Old Main Building, phone 9385 5465, e-mail m.ashley@unsw.edu.au) for assistance. If there are concepts you are having difficulty with, please let me know as soon as possible so that I can cover them in lectures and provide additional notes on the web.
Also, to gain maximum benefit from the course, you should TURN UP TO LECTURES! It is NOT sufficient to simply download these notes from the web. The lectures contain additional value. I also spend considerable time answering questions from the students, covering exam questions, giving hints on the assignments, and reviewing the answers to past assignments.
Lectures often have an "aside" where I talk for 10-15 minutes on some interesting (to me, hopefully to you as well) non-examinable topic related to the use of computers in physics. Some of these asides are mentioned below.
In 2004 the lectures were as follows:
Week Date Topics
----------------------------------------------------------------------------
1 Mar Introduction to the various ways that physicists use
computers. Overview of computer languages, including C, C++, C#,
FORTRAN, perl, python, Java, IDL, Mathematica, Maple.
5 Mar GNU/Linux, bit/bytes/words, binary/octal/hexadecimal, I/O redirection
8 Mar Assignments, how they work, what is expected
12 Mar cygwin, online manual pages, more on assignments
15 Mar floating point, operations, AND, OR, XOR
19 Mar operations, assignment operators (e.g., +=)
22 Mar integer arithmetic, arrays, functions
26 Mar (Aside: first results from MASS) for loops, while loops, printf,
static/automatic variables, scope of variables.
5 Apr No lecture.
19 Apr (Aside: seeing from Dome C), random numbers - why they are useful -
how to generate them, structures, using the gdb debugger.
23 Apr The Mandelbrot set, complex numbers, using Imagemagick.
26 Apr Public holiday.
30 Apr Solutions to the example mid-session exam.
3 May Mid-session exam.
7 May Quasi-random numbers, the Sobel sequence, Gnuplot,
sorting numbers, qsort.
10 May Answers to assignments 5 & 6. Bug in original sobel.c.
The effect of array-out-of-bounds errors. John Horton
Conway's Game of Life. Life32.
14 May Answers to the mid-session exam questions.
17 May 1D cellula automata.
21 May Command-line arguments.
24 May Gravity.
28 May Gravity; Numerical integration; midpoint method; 4th order Runge-Kutta.
31 May Statistical analysis of data; moments of distributions; means,
variance, standard-deviations, average deviation; skew, kurtosis;
rounding errors when calculating the variance.
4 Jun Answer to assignment 9; API - application programming interface;
dynamic allocation of memory at run-time - malloc; more on pointers.
7 Jun Median; chi-square minimisation; least-square; linear
fitting; polynomial fitting; generalised fitting.
11 Jun Notes handed out; GNU readline; exam discussion.
In 2003 the lectures were as follows:
Week Date Topics
----------------------------------------------------------------------------
1 3 Mar Introduction to the various ways that physicists use
computers. Overview of computer languages, including C, C++, C#,
FORTRAN, perl, python, Java, IDL, Mathematica, Maple.
7 Mar hello.c, compiling, make, printf.
2 10 Mar Introduction to PHYS2020 assignments, C data types: integers,
floating point, chars, character strings. C operators: unary and
binary.
14 Mar for, while, loops. The circle program.
3 17 Mar (Aside: the AASTINO experiment). Bytes, words, floating point
representation.
21 Mar Mandelbrot program. Floating-point "for" loops. Infinite loops.
The math library, libm.a.
4 24 Mar Arrays.
28 Mar Pointers. GNU debugger.
5 31 Mar (Aside: gamma ray bursts). GRB030329. Image analysis.
4 Apr scanf, finding the min/max of an array.
6 7 Apr GNU readline, assert, random numbers,
11 Apr Random numbers, sorting, qsort.
7 14 Apr Gnuplot and tutorial with Anthony Frith.
8 28 Apr Numerical integration, 2-body gravitation, C structures,
example questions for the mid-session exam.
2 May Mid-session exam.
9 5 May (Aside: the evaporation of globular clusters). Numerical
integration, 2-body gravitation.
9 May Numerical integration, 2-body gravitation,
Euler and midpoint approximations.
10 12 May 1D cellular automata, Stephen Wolfram's A New Kind of Science,
command-line arguments.
16 May Mid-term exam solutions Q1-Q3.
11 19 May (Aside: NOAA weather satellite imagery). Mid-term exam
solutions Q4-Q9.
23 May Mid-term exam solutions Q10. 2D cellular automata,
Conway's Game of Life.
12 26 May Life32, different rules for 2D cellular automata,
(Aside: Turing machines, artificial intelligence),
speeding-up our version of Conway's Game of Life,
solution to Assignment 9.
30 May (Aside: Tierra artificial life program by Thomas Ray),
finding gliders automatically in the Game of Life, the roots of a
quadratic equation: subtle issues with rounding errors and
handling special cases.
13 2 Jun Subtleties in calculating standard deviations; average deviations;
the switch statement; allocating memory.
6 Jun Simulations of gas particles, xgas, writing our own code to
track the motion of particles hitting walls.
14 9 Jun Public holiday
C evolved from a language called B, written by Ken Thompson at Bell Labs in 1970. Ken used B to write one of the first implementations of UNIX. B in turn was a decendant of the language BCPL (developed at Cambridge (UK) in 1967), with most of its instructions removed.
In fact, so many instructions were removed in going from BCPL to B, that Dennis Ritchie of Bell Labs put some back in (in 1972), and called the language C.
The famous book The C Programming Language was written by Kernighan and Ritchie in 1978, and has remained the definitive reference book on C.
The original C was still too limiting, and not standardized, and so in 1983 an ANSI committee was established to formalise the language definition.
By 1993 the ANSI standard became well accepted and almost
universally supported by compilers. The GNU C compiler, gcc, has
become, in many ways, the standard implementation of C.
C++ is a superset of C, incorporating many object-oriented programming ideas. However, it is not particularly widely used. Java is a better choice in most cases.
C# is a Java-like language developed by Microsoft. Enough said.
For coverage of typical problems encountered with C, have a look at the Comp.lang.c FAQ.
For a rather good tutorial introduction to C, check out this document by Gordon Dodrill.
Before reading further, have a look at the generic Hello World, program written in
C.
But before we get too excited and start to write really useful
programs, there are a few fundamental concepts that you will need to
understand first.
The following table compares the various numbering systems (note that you can't directly specify a binary constant in the C programming language); octal constants are distinguished by a leading zero; hexadecimal constants by a leading "0x"):
Decimal Binary Octal Hexadecimal
--------------------------------
0 0000 000 0x0
1 0001 001 0x1
2 0010 002 0x2
3 0011 003 0x3
4 0100 004 0x4
5 0101 005 0x5
6 0110 006 0x6
7 0111 007 0x7
8 1000 010 0x8
9 1001 011 0x9
10 1010 012 0xa
11 1011 013 0xb
12 1100 014 0xc
13 1101 015 0xd
14 1110 016 0xe
15 1111 017 0xf
Conversion between binary, octal, and hexadecimal is fairly trivial using the above table. For example, to convert the binary number 11100101010 to octal, break the number into groups of three bits starting from the least significant bit (i.e., 11 100 101 010), then convert each three-bit group to octal (i.e., 3452) and add a leading 0 to tell C it is octal (i.e., 03452). To convert the number to hexadecimal, break it into groups of four bits (i.e., 111 0010 1010), convert each group (i.e., 72a) and add a leading "0x" (i.e., 0x72a).
To convert between octal and hexadecimal, it is probably easiest to go via binary.
To convert from binary to decimal, note that n-th bit from the least-significant is worth 2 to the power n. So,
11100101010
\\\\\\\\\\\_ 0 x 1 = 0
\\\\\\\\\\_ 1 x 2 = 2
\\\\\\\\\_ 0 x 4 = 0
\\\\\\\\_ 1 x 8 = 8
\\\\\\\_ 0 x 16 = 0
\\\\\\_ 1 x 32 = 32
\\\\\_ 0 x 64 = 0
\\\\_ 0 x 128 = 0
\\\_ 1 x 256 = 256
\\_ 1 x 512 = 512
\_ 1 x 1024 = 1024
Total = 1832
./a.out | standard input comes from the keyboard |
./a.out < file | standard input comes from a file called "file" |
command | ./a.out | standard input comes from the standard output of "command" |
./a.out | standard output appears on the computer screen as text |
./a.out > file | standard output is sent to the file called "file" |
./a.out | command | standard output is sent to the standard input of "command" |
./a.out | standard error appears on the
computer screen as text, intermingled with the standard output |
./a.out 2> file | standard error is sent to the file called "file" |
./a.out 2>&1 | command | standard error (and standard
output) is sent to the standard input of "command" |
./a.out
2.1 < data.txt > results.txt |
"2.1" is an argument, standard
input comes from "data.txt", standard output goes to "results.txt" |
./a.out
> results.txt 2> error.txt |
there are no arguments, standard
input is from the keyboard, standard output to "results.txt" and standard error to "error.txt" |
A character constant in C is a single character (or an escape
sequence such as \n) enclosed in single quotes, e.g., 'A'. It
occupies one byte of storage.
The value of a character constant is the numeric value of the character in the computer's character set (e.g., 'A' has the value 65). In 99.99% of cases this is the ASCII character set, but this is not defined by the C standard!
But how do you represent a character such as a single quote itself, or one that doesn't print (such as the ASCII BEL character)? The answer is to use an escape sequence.
For reference, here is a program to print out all the special escape sequences. Have a look at it now.
In addition, you can specify any 8-bit ASCII character using either \ooo or \xhh where `ooo' is an octal number (with from 1 to 3 digits), and 'xhh' is a hexadecimal number (with 1 or 2 digits). For example, \x20 is the ASCII character for SPACE, so ' ' and '\x20' are identical.
String constants are a sequence of zero or more characters, enclosed in double quotes. For example, "test", "", "this is an invalid string" are all valid strings (you can't always believe what a string tells you!). String constants are stored in the computer's memory as a sequence of numbers (usually from the ASCII character set), and are terminated by a null byte (\0). So, "test" would appear in memory as the numbers 116, 110, 115, 116, 0.
A consequence of the null byte terminator is that a C character string can never contain a null byte. In practice, this isn't often a limitation.
We have already used some examples of strings in our programs, e.g, "hello world\n" was a null-terminated character string that we sent to the printf function above.
The above program also shows how to add comments to your C program.
To make your program more readable, it is good practice to include comments. These do not influence the operation of the program, nor are they printed out when the program runs, they are simply to document things that you think are important.
Comments are delimited by `/*' and `*/', or from '//' to the end of the line. Any text between the first `/*' and the next `*/' is ignored by the compiler, irrespective of the position of line breaks. Note that this means that comments do not nest. Here are some examples of the valid use of comments:
/* This is a comment */
// and so is this
/* here is another one
that spans two lines */
i = /* a big number */ 123456;
Here are some problems that can arise when using comments:
i = 123456; /* a comment starts here
i = i + 2; this statement is also part of the comment */
/* this is a comment /* and so is this */ but this will generate an error */
The fact that comments don't nest is a real nuisance if you want to comment-out a whole section of code. In this case, it is simplest to use pre-processor directives to do the job. For example:
i = 123456;
#if 0 // The next two lines are ignored by the compiler.
i = i + 2;
j = i * i;
#endif
C source files contain the C program that you write. The files invariably end with the suffix ".c". You compile them with the C compiler to produce...
Have a ".o" suffix, and are the result of compiling a source file. They contain the machine language equivalent of the C program, but they are not ready to execute yet, since they need to be linked against libraries of other C object files containing any functions that are used, but not coded for, in the source file (e.g., the printf function). Linking can be done for you by the C compiler, or can be a separate step if you wish. After linking, you have an...
This file is ready to run, by typing its name at the command prompt.
The idea behind make is that you should be able to compile a program simply by typing make (followed by ENTER). For this to work, you have to tell the computer how to build your program, by storing instructions in a Makefile.
While this step takes some effort, you are amply repaid by the time savings later on. make is particularly useful for large programs that consist of numerous source files: make only recompiles the file(s) that need recompiling (it works out which ones to process by looking at the dates of last modification).
Here is an example of a simple Makefile:
test: test.o; gcc test.o -o testNotes:
OBJS = main.o sub1.o sub2.oIt is well worth learning a little bit about make since it can save a lot of time!
main: $(OBJS); gcc $(OBJS) -o main
cd
mkdir hello
cd hello
cat > hello.c <<EOF
#include <stdio.h>
int main(void) {
printf("hello world\n");
return 0;
}
EOF
cat > Makefile <<EOF
hello: hello.o; gcc hello.o -o hello
EOF
make
./hello
NOTE: the use of "cat > hello.c <<EOF" above is purely a quick way of entering the source code into the file hello.c. You would normally do this using an editor (such as emacs, vi, pico, nano, nedit, xedit, gedit).
/* A program to print out various machine-dependent constants */
/* Michael Ashley / UNSW / 03-Mar-2003 */
#include <stdio.h> /* for printf definition */
#include <limits.h> /* for CHAR_MIN, CHAR_MAX, etc */
#include <float.h> /* for FLT_DIG, DBL_DIG, etc */
int main(void) {
printf("char %d bytes %d to %d \n", sizeof(char ), CHAR_MIN, CHAR_MAX );
printf("unsigned char %d bytes %d to %d \n", sizeof(unsigned char ), 0 , UCHAR_MAX );
printf("short %d bytes %hi to %hi \n", sizeof(short ), SHRT_MIN, SHRT_MAX );
printf("unsigned short %d bytes %hu to %hu \n", sizeof(unsigned short), 0 , USHRT_MAX );
printf("int %d bytes %i to %i \n", sizeof(int ), INT_MIN , INT_MAX );
printf("unsigned int %d bytes %u to %u \n", sizeof(unsigned int ), 0 , UINT_MAX );
printf("long %d bytes %li to %li \n", sizeof(long ), LONG_MIN, LONG_MAX );
printf("unsigned long %d bytes %lu to %lu \n", sizeof(unsigned long ), 0 , ULONG_MAX );
printf("float %d bytes %e to %e \n", sizeof(float ), FLT_MIN , FLT_MAX );
printf("double %d bytes %e to %e \n", sizeof(double ), DBL_MIN , DBL_MAX );
printf("precision of float %d digits\n", FLT_DIG);
printf("precision of double %d digits\n", DBL_DIG);
return 0;
}
char 1 bytes -128 to 127
unsigned char 1 bytes 0 to 255
short 2 bytes -32768 to 32767
unsigned short 2 bytes 0 to 65535
int 4 bytes -2147483648 to 2147483647
unsigned int 4 bytes 0 to 4294967295
long 4 bytes -2147483648 to 2147483647
unsigned long 4 bytes 0 to 4294967295
float 4 bytes 1.175494e-38 to 3.402823e+38
double 8 bytes 2.225074e-308 to 1.797693e+308
precision of float 6 digits
precision of double 15 digits
We have already seen how to write character constants and strings. Let's now look at other types of constants:
int 123, -1, 2147483647, 040 (octal), 0xab (hexadecimal)
unsigned int 123u, 2147483648, 040U (octal), 0X02 (hexadecimal)
long 123L, 0XFFFFl (hexadecimal)
unsigned long 123ul, 0777UL (octal)
float 1.23F, 3.14e+0f
double 1.23, 2.718281828
long double 1.23L, 9.99E-9L
Note:
It pays to be very careful when specifying numbers, to make sure that you do it correctly, particularly when dealing with issues of precision. For example, have a look at this program:
#include <stdio.h>
int main(void) {
double r;
r = 1.0 + 0.2F;
r = r - 1.2F;
printf("%22.16e\n", r);
return 0;
}
The program gives at output of "-4.4703483581542969e-08", not zero as you might expect.
The lesson to be learnt here is when writing constants, always think carefully about what type you want them to be, and use the suffixes "U", "L", and "F" to be explicit about it. It is not a good idea to rely on the compiler to do what you expect. Don't be surprised if different machines/compilers give different answers if you program sloppily.
In C, as in all computer languages, there are rules that the compiler uses when a program mixes integers and floating point numbers in the same expression.
Let's look at what happens if you assign a floating point number to an integer variable:
#include <stdio.h>
int main(void) {
int i, j;
i = 1.99;
j = -1.99;
printf("i = %d; j = %d\n", i, j);
return 0;
}
This program produces the result "i = 1; j = -1". Note that the floating point numbers have been truncated and not rounded.
When converting integers to floating-point, be aware that a "float" has fewer digits of precision than an "int", even though they both use 4 bytes of storage (on normal PCs). This can result in some strange behaviour, e.g.,
#include <stdio.h>
int main(void) {
unsigned int i;
float f;
i = 4294967295; /* the largest unsigned int */
f = i; /* convert it to a float */
printf("%u %20.13e %20.13e\n", i, f, f - i);
return 0;
}
This program produces the following output when compiled with "gcc":
4294967295 4.2949672960000e+09 1.0000000000000e+00
Rather than rely on automatic type-conversion, you can be explicit about it by using a type-cast operator, e.g.,
f = (float)i;
This converts the number or variable or parethesised expression immediately to its right, to the indicated type. It is a good idea to use type-casting to ensure that you leave nothing to chance.
Some of the operators we have already seen (e.g., "sizeof()"), others are very simple (e.g., +, -), others are really neat (e.g., ~, !), others are useful for adding/subtracting 1 automatically (e.g., ++i, --i, i++, i--), and the rest involve pointers and addressing, which will be covered in detail later.
sizeof(i) the number of bytes of storage allocated to i
+123 positive 123
-123 negative 123
~i one's complement (bitwise complement)
!i logical negation (i.e., 1 if i is zero, 0 otherwise)
*i returns the value stored at the address pointed to by i
&i returns the address in memory of i
++i adds one to i, and returns the new value of i
--i subtracts one from i, and returns the new value of i
i++ adds one to i, and returns the old value of i
i-- subtracts one from i, and returns the old value of i
i[j] array indexing
i (j) calling the function i with argument j
i.j returns member j of structure i
i->j returns member j of structure pointed to by i
Here is a list. All the usual operators that you would expect are there, with a whole bunch of interesting new ones.
+ additionNote:
- subtraction
* multiplication
/ division
% remainder (e.g., 2%3 is 2), also called 'modulo'
<< left-shift (e.g., i<<j is i shifted to the left by j bits)
>> right-shift
& bitwise AND
| bitwise OR
^ bitwise exclusive-OR (XOR)
&& logical AND (returns 1 if both operands are non-zero; else 0)
|| logical OR (returns 1 if either operand is non-zero; else 0)
< less than (e.g., i<j returns 1 iff i is less than j)
> greater than
<= less than or equal
>= greater than or equal
== equals
!= does not equal
? conditional operator, explained later...
Left argument |
Right
argument |
AND |
OR |
XOR |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
1 |
1 |
1 |
0 |
0 |
1 |
1 |
1 |
1 |
1 |
1 |
0 |
Left argument |
Right
argument |
& |
| |
^ |
&& |
|| |
0011 |
0101 |
0001 |
0111 |
0110 |
1 |
1 |
1110 |
1010 |
1010 |
1110 |
0100 |
1 |
1 |
0000 |
1111 |
0000 |
1111 |
1111 |
0 |
1 |
= assignment
+= addition assignment
-= subtraction assignment
*= multiplication assignment
/= division assignment
%= remainder/modulus assignment
&= bitwise AND assignment
|= bitwise OR assignment
^= bitwise exclusive OR assignment
<<= left shift assignment
>>= right shift assignment
So, for example, "i += j" is equivalent to "i = i + j". The advantage of the assignment operators is that they can reduce the amount of typing that you have to do, and make the meaning clearer. This is particularly noticeable when, instead of a simple variable such as "i", you have something complicated such as "position[wavelength + correction_factor * 2]";
The thing to the left of the assignment operator has to be something where a result can be stored, and is known as an "lvalue" (i.e., something that can be on the left of an "="). Valid "lvalues" include variables such as "i", and array expressions. It doesn't make sense, however, to use constants or expressions to the left of an equals sign, so these are not "lvalues".
For example,
Example Equivalent to
------- -------------
i = ((j = 2), 3); i = 3; j = 2;
myfunct(i, (j = 2, j + 1), 1); j = 2; myfunct(i, 3, 1);
The comma operator has the lowest precedence, so it is always executed last when evaluating an expression. Note that in the example given comma is used in two distinct ways inside an argument list for a function.
Both the above examples are artificial, and not very useful. The comma operator can be useful when used in "for" and "while" loops as we will see later.
The precedence of an operator gives the order in which operators are applied in expressions: the highest precedence operator is applied first, followed by the next highest, and so on.
The associativity of an operator gives the order in which expressions involving operators of the same precedence are evaluated.
The following table lists all the operators, in order of precedence, with their associativity:
Operator Associativity
-------- -------------
() [] ->> . left-to-right
- + ++ -- ! ~ * & sizeof (type) right-to-left (note: unary +, -, *, and &)
* / % left-to-right
+ - left-to-right
<< >> left-to-right
< <= > >= left-to-right
== != left-to-right
& left-to-right
^ left-to-right
| left-to-right
&& left-to-right
|| left-to-right
?: right-to-left
= += -= *= /= %= &= ^= |= <<= >>= right-to-left
, left-to-right
Note: the unary operators + (introducing a positive number), - (introducing a negative number), * (pointer dereferencing), and & (address of); appear twice in the above table. The unary forms (on the second line) have higher precedence that the binary forms.
Operators on the same line have the same precedence, and are evaluated in the order given by their associativity.
To specify a different order of evaluation you can use parentheses. In fact, it is often good practice to use parentheses to guard against making mistakes in difficult cases, or to make your meaning clear.
It is possible to write C expressions that give different answers on different machines, since some aspects of expression-evaluation are not defined by the ANSI standard. This is deliberate since it gives the compiler writers the ability to choose different evaluation orders depending on the underlying machine architecture. You, the programmer, should avoid writing expressions with side effects.
Here are some examples:
myfunc(j, ++j); /* the arguments may be the same, or differ by one */
array[j] = j++; /* is j incremented before being used as an index? */
i = f1() + f2(); /* the order of evaluation of the two functions
is not defined. If one function affects the
results of the other, then side effects will
result */
A useful aspect of C is that it guarantees the order of evaluation of expressions containing the logical AND (&&) and OR (||) operators: it is always left-to-right, and stops when the outcome is known. For example, in the expression "1 || f()", the function "f()" will not be called since the truth of the expression is known regardless of the value returned by "f()".
It is worth keeping this in mind. You can often speed up programs by rearranging logical tests so that the outcome of the test can be determined as soon as possible.
Another good example is an expression such as "i >= 0 &&
i <n && array[i] == 0". This will guarantee that
the index into "array" is within legal bounds (assuming the array has
"n" elements).
Incidentally, the "bash" shell
uses "&&" and "||" in similar way to C. For example, the shell
command line:
cd /home/fred && ./a.out > output && mv output /home/fred/results/will first change the current working directory to "/home/fred". If (and only if) this succeeds, the program "./a.out" would be run, sending its output to the file "output". If (and only if) the program was successful (i.e., it had an exit code of zero), the output file would be moved to the directory "/home/fred/results/".
auto double int struct
break else long switch
case enum register typedef
char extern return union
const float short unsigned
continue for signed void
default goto sizeof volatile
do if static while
Prototypes for the math functions are in the system include-file "math.h", so you should put the line
#include <math.h>in any C source file that calls one of them.
Here is a list of the math functions defined by the ANSI standard:
sin(x) sine of xIn the above table, "x", and "y" are of type "double", and "n" is an "int". All the above functions return "double" results.
cos(x) cosine of x
tan(x) tan of x
asin(x) arcsine of x, result between -pi/2 and +pi/2
acos(x) arccosine of x, result between 0 and +pi
atan(x) arctan of x, result between -pi/2 and +pi/2
atan2(y,x) arctan of (y/x), result between -pi and +pi
hsin(x) hyperbolic sine of x
hcos(x) hyperbolic cosine of x
htan(x) hyperbolic tan of x
exp(x) exponential function
log(x) natural logarithm
log10(x) logarithm to base 10
pow(x,y) x to the power of y (x**y in FORTRAN)
sqrt(x) the square root of x (x must not be negative)
ceil(x) ceiling; the smallest integer not less than x
floor(x) floor; the largest integer not greater than x
fabs(x) absolute value of x
ldexp(x,n) x times 2**n
frexp(x, int *exp) returns x normalized between 0.5 and 1; the exponent of 2 is in *exp
modf(x, double *ip) returns the fractional part of x; the integral part goes to *ip
fmod(x,y) returns the floating-point remainder of x/y, with the sign of x
C libraries may also include "float" versions of the above. For
example, "fsin(x)" takes a float argument and returns a float result.
The basic looping construct in C is the "for" loop.
Here is the syntax of the "for" statement:
for (initial_expression; loop_condition; loop_expression) statement;
and this is precisely equivalent to the following code fragment:
initial_expression;
while (loop_condition) {
statement;
loop_expression;
}
An example will, hopefully, clear this up:
for (i = 0; i < 100; i++) printf("%i\n", i);
which simply prints the first 100 integers. If you want to include more that one statement in the loop, use curly brackets to delimit the body of the loop, e.g.,
for (i = 0; i < 100; i++) {
j = i * i;
printf("i = %i; j = %i\n", i, j);
}
And here is a more complex example, illustrating nested loops, with an if thrown in for good measure:
ASCII circle printer.And here is another program illustrating the power of loops: Mandelbrot image generator
gcc -o prog prog.c -lm
The "-l" switch stands for "library", which means that the specified library of pre-compiled C routines is searched in order to satisfy any external references from your program "prog.c". The library that is searched in this case is "libm.a" and the path that is used for the search is the default library search path, which includes the directory "/usr/lib", where "libm.a" is found.
To use a library in a directory that is not part of the default library search path, you use the "-L" switch. For example, to search the library "/usr/users/smith/libnano.a", you would use
gcc -o prog prog.c -L/usr/users/smith -lnano
Note: the order of the switches is important. External references are only searched for in libraries to the right of the reference. So, if you have two libraries that call each other, then you need to do something like the following:
gcc -o prog prog.c -L/usr/users/smith -llib1 -llib2 -llib1
Here is a simple example of calling the math library:
#include <stdio.h>
#include <math.h>
int main(void) {
const double pi = 3.14159265358979323846;
double e, d = pi/2;
e = sin(d);
printf("The sine of %f is %f\n", d, e);
return 0;
}
This program produces the result:
The sine of 1.570796 is 1.000000
Aside: the file /usr/include/math.h defines M_PI to be the double precision value of pi. So, use this rather than typing the value of pi yourself.
The last statement in the body of statements in a "for" loop (or, in fact, in any other compound statement) must be terminated with a semicolon.
For example,
for (i = 0; i < 10; i++) {
x = i * i;
x += 2; /* the semicolon is required here */
} /* do not use a semicolon here */
You can create variables that are local to a compound statement by declaring the variables immediately after the leading curly bracket.
Variables in C belong to one of two fundamental storage classes: "static" or "automatic".
A static variable is stored at a fixed memory location in the computer, and is created and initialised once when the program is first started. Such a variable maintains its value between calls to the block (a function, or compound statement) in which it is defined.
An automatic variable is created, and initialised, each time the block is entered (if you jump in half-way through a block, the creation still works, but the variable is not initialised). The variable is destroyed when the block is exited.
Variables can be explicitly declared as "static" or "auto" by using these keywords before the data-type definition. If you don't use one of these keywords, the default is "static" for variables defined outside any block, and "auto" for those inside a block.
Actually, there is another storage class: "register". This is like "auto" except that it asks the compiler to try and store the variable in one of the CPU's fast internal registers. In practice, it is usually best not to use the "register" type since compilers are now so smart that they can do a better job of deciding which variables to place in fast storage than you can.
Variables can be qualified as "const" to indicate that they are really constants that can be initialised, but not altered.
Variables can also be termed "volatile" to indicate that their value may change unexpectedly during the execution of the program (e.g., they may be hardware registers on a PC, able to be altered by external events). By using the "volatile" qualifier, you prevent the compiler from optimising the variable out of loops.
Variables (and functions) can also be classified as "extern", which means that they are defined external to the current block (or even to the current source file). An "extern" variable must be defined once (and only once) without the "extern" qualifier.
As an example of an "extern" function, all the functions in "libm.a" (the math library) are external to the source file that calls them.
#include <stdio.h>
int i; /* i is static, and visible to the entire program */
extern j; /* j is static, and visible to the entire program */
static int j; /* k is static, and visible to the routines in this source file */
void func(void) { /* i.e., a function that takes no arguments, and
doesn't return a value */
int m = 1; /* m is automatic, local to this function, and initialised
each time the function is called */
auto int n = 2; /* n is automatic, local to this function, and initialised
each time the function is called */
static int p = 3; /* p is static, local to this function, and initialised
once when the program is first started up */
extern int q; /* q is static, and defined in some external module */
for (i = 0; i < 10; i++) {
int m = 10; /* m is automatic, local to this block, and initialised
each time the block is entered */
printf("m = %i\n", m);
}
}
A variable is initialised by equating it to a constant expression on the line in which it is defined. For example
int i = 0;
"static" variables are initialised once (to zero if not explicitly initialised), "automatic" variables are initialised when the block in which they are defined is entered (and to an undefined value if not explicitly initialised).
The "constant expression" can contain combinations of any type of constant, and any operator (except assignment, incrementing, decrementing, function call, or the comma operator), including the ability to use the unary "&" operator to find the address of static variables.
Here are some valid examples:
#include <stdio.h>
#include <math.h>
int i = 0;
int j = 2 + 2;
int k = 2 * (3 << 8) / 3;
int m = (int)(&i + 2);
int p = sizeof(float) * 2;
int q = sizeof(p);
float r = (float)(2 * 3);
int main(void) {
printf("i = %i\n", i);
printf("j = %i\n", j);
printf("k = %i\n", k);
printf("m = %i\n", m);
printf("p = %i\n", p);
printf("q = %i\n", q);
printf("r = %f\n", r);
for (r = 0.0; r < 1.0; r += 0.1) {
double s = sin(r);
printf("The sine of %f is %f\n", r, s);
}
return 0;
}
Notes:
if (expression)
statement
else if (expression)
statement
else if (expression)
statement
else
statement
Where "statement" is a simple C statement ending in a semicolon, or a compound statement ending in a curly bracket.
Some examples will help:
if (i == 6) x = 3;
if (i == 6) {
x = 3;
}
if (i) x = 3;
if (i == 2)
x = 3;
else if (i == 1)
if (j)
y = 2;
else /* NOTE: possible ambiguity here, the compiler uses */
y = 3; /* the closest if statement that does not have */
else { /* an else clause */
x = 4;
y = 5;
}
"break" exits the innermost current loop.
"continue" starts the next iteration of the loop.
Here are three ways of writing an infinite loop, choose whichever one you like:
for (;;;) {
statement;
}
while (1) {
statement;
}
do {
statement;
} while (1);
Infinite loops can be useful. They are normally terminated using a conditional test with a "break" or "return" statement. For example:
while (1) {
statement;
if (logical_condition) break;
}
C does have a "goto" statement, but you normally don't need it. Using "goto" is usually a result of bad programming (although in some circumstances it can be a sensible choice).
The switch statement allows you to choose between multiple paths of execution depending on the value of an expression. It could be replaced by a series of if statements, but switch can make the flow of the program easier to understand.
switch (expression) {
case const-expression: statements case const-expression: statements default: statements }
Beware that control will flow from one case to the next unless you
explicitly include a break statement to cause control to skip
to the end of the switch statement.
Here is an example of using switch:
switch (i) {
case 0: printf("i had the value zero\n");
break;
case 1:
case 2: printf("i was either one or two\n");
break;
default: printf("i was outside the range 0 to 2\n");
}
The format string given to the "printf" function may contain both ordinary characters (which are simply printed out) and conversion characters (beginning with a percent symbol, %, these define how the value of an internal variable is to be converted into a character string for output).
Here is the syntax of a conversion specification:
%{flags: - + space 0 #}{minimum field width}{.}{precision}{length modifier}{conversion character}
Character Type Result
d,i int signed decimal integer
o int unsigned octal (no leading zero)
x, X int unsigned hex (no leading 0x or 0X)
u int unsigned decimal integer
c int single character
s char * characters from a string
f double floating point [-]dddd.pppp
e, E double exponential [-]dddd.pppp e[=/-]xx
g, G double floating if the exponent less than -4 or >= precision
else exponential
p void * pointer
n int * the number of characters written so far by printf
is stored into the argument (i.e., not printed)
% print %
Here is an example program to show some of these effects (try running the program yourself):
#include <stdio.h>
int main(void) {
int i = 123;
double f = 3.14159265358979323846;
printf("i = %i\n", i);
printf("i = %o\n", i);
printf("i = %x\n", i);
printf("i = %X\n", i);
printf("i = %+i\n", i);
printf("i = %8i\n", i);
printf("i = %08i\n", i);
printf("i = %+08i\n", i);
printf("f = %f\n", f);
printf("f = %10.3f\n", f);
printf("f = %+10.3f\n", f);
printf("f = %g\n", f);
printf("f = %10.6g\n", f);
printf("f = %10.6e\n", f);
return 0;
}
Notes:
Input in C is similar to output: the same conversion characters are used. The main difference is that you use a routine called "scanf", and you must pass the addresses of the variables you want to change, not their values.
For example:
scanf("%d", &i); /* reads an integer into `i' */
scanf("%i", &i); /* reads an integer (or octal, or hex) into `i' */
scanf("%f %i", &f, &i); /* reads a double followed by an integer */
scanf is actually an integer function, which returns the number of input items assigned (or the integer constant EOF (defined in stdio.h) if the end-of-file is reached or an error occurred).
The ampersand character "&" is a unary operator that returns the address of the thing to its right. Recall that a C function can not alter the value of its arguments (but there is nothing stopping it altering the value that is pointed to by one of its arguments!).
Input data |
scanf function call to read the data |
123 |
scanf("%i", &i); |
123 0x456 |
scanf("%i %i", &i, &j); |
123. 456. |
scanf("%i. %i.", &i, &j); |
123.1 456.1 |
scanf("%f %f", &x, &y); |
e=2.71 pi=3.14 |
scanf("e=%f pi=%f", &x, &y); |
apples red |
scanf("%s %s", fruit, colour); |
Guido ate 2 apples |
scanf("Guido ate %i %s", &i, fruit); |
FILE *fp;
float x, y;
fp = fopen("input.dat", "r");
fscanf(fp, "%f %f", &x, &y);
char input="1.23 4.56";
sscanf(input, "%f %f", &x, &y);
While it is possible to use "scanf" for reading input typed by a person, you can greatly improve the user experience by using the GNU readline package.
The GNU readline package provides some very convenient features such as line editing and history. This means, e.g., that you can use the arrow keys to move within a line, and between previously entered lines. You can also use, e.g., CTRL-A to go to the beginning of the line, CTRL-E to go to the end, and CNTRL-R to search backwards for particular strings. See "man readline" (on a GNU/Linux computer) for a complete description.
Here is an example program showing the use of GNU readline.
#include <assert.h>If the expression inside parentheses after "assert" is true, the program continues as normal, if the expression is false (i.e, zero), then the program aborts with an error message.
#include <stdio.h>
int main(void) {
int i;
printf("enter a positive integer > ");
assert(1 == scanf("%i", &i));
assert(i > 0);
printf("you successfully entered %i\n", i);
return 0;
}
double myAsin(double x) {Also, you can use assert to check for unexpected conditions such as failure to open a file, or failure to allocate memory, e.g.,
assert(fabs(x) < 1.0);
...
}
assert(NULL != (fp = fopen("data.dat", "r")));
assert(NULL != (p = malloc(1024));
Random numbers are used in all sorts of ways in mathematics and
physics. They are a fundamental tool that is often useful when
conducting numerical simulations of physical systems with a computer.
Writing a computer program to generate random numbers requires understanding some fairly subtle concepts.
There are several different algorithms for generating a new random number given an initial random number. One of the simplest (and best) algorithms for generating random integers is a multiplicative congruential algorithm: you simply multiply the current random number generator by a constant, and then take the modulus. The choice of the constant and the modulus is critical. A good choice is 16807 and 2147483647 (see Numerical Recipes in C, Press et al, 2nd Ed, page 278).
But how do you generate the initial random number, the so-called "seed", for the algorithm?
One technique is to use the time (in milli or micro-seconds since
the last integral second). This is easy to program, but has a few
disadvantages, e.g, there are only 1000 or 1000000 different seeds, and
these numbers are not necessarily equally likely to occur (e.g., the
operating system might tend to start programs at particular times - see here for more details).
Another technique on GNU/Linux systems is to read from the file /dev/random. /dev/random contains random numbers generated from "random" events such as the arrival time of internet packets, the movement of the mouse, the timing of user keystrokes, etc. If you "od -h /dev/random" ("od -h" prints out the file as hexadecimal characters) you will find that it will eventually stop, waiting for new events to generate randomness - move the mouse and it will continue. An alternative is /dev/urandom, which will continuously generate random numbers, using an algorithm if events are too infrequent.
If you require very high quality random numbers, then you should read about the subject, perhaps using Numerical Recipies in C as a starting point.
The C libraries provide the functions "srandom" and "random" to
generate random integers between 0 and RAND_MAX (defined in
"stdlib.h") inclusive. Use the on-line manual pages to learn about
"srandom" and "random" (i.e., type "man srandom" at the
prompt). Here is an example program
showing their usage, and seeding them with the time in
microseconds.
A frequent requirement is to sort quantities (numbers, character strings, or more complicated structures) into some order (ascending, descending, or something more complex). The C library provides the function "qsort", an implementation of the famous quicksort algorithm. It is usually best to use "qsort" than write your own sorting algorithm.
Here are two example programs showing the use of qsort.
Here are Anthony Frith's notes on Gnuplot.
Here is an example of using numerical integration to follow the motion of a planet around the Sun:
Reasons for worrying about how fast a program is include:
Hints on improving speed:
In 1970, the mathematician John Horton Conway published the description of a simple 2D cellular automata which he called the Game of Life. Cells in a 2D grid were either "alive" or "dead", and their state in the next generation was determined by their current state and the number of nearest neighbours (from 0 to 8, inclusive). The rules were chosen to simulate some properties of biological systems (e.g., a cell would die of "overcrowding" if too many of its neighbours were alive, and die of "lack of support" if too few of its neighbours were alive).
Here is a "simple" example of how to program the Game of Life on a computer.
If we are interested in following the evolution of the game for many thousands of generations, it is advantageous to think of ways of speeding up our simple implementation. The first thing to try is to turn on full optimisation during the compilation, using the "-O4" switch. The next step is to make the critical functions "inline" (see here for the code). We can also think about the algorithm, and realise that a lot of our calculation of the number of nearest neighbours was involved in handling the special case of being on the boundary; we can re-write our program to separate out the boundary case, thereby allowing a simplified (faster) function to do most of the neighbour calculations, at the expense of increased complexity in handling the evolution from one generation to the next. Finally, we can try to use the fact that much of the Life "universe" tends to be sparsely populated, allowing us to produce a second neighbourhood-calculating function for the special case that the preceeding cell had no nearest neighbours. These are by no means the only speed-ups that are possible, but they give you some flavour of what is possible.
The following table gives the time taken to follow one million generations using a 850MHz Pentium III computer and the various tricks from the last paragraph.
Time Technique
[seconds]
--------------------------------------------------
330 Original program, no optimisation
205 Original program, -O4 optimisation
113 Inline function calls 45 Boundary case handled separately 37 Sparse population handled separately
So, we have managed almost a factor of ten improvement in speed over our original attempt.
To show why this might be useful, let's now alter our program to automatically identify glider patterns (those that can self-propagate). We will choose a brute-force technique where we randomly seed the centre of our rectangular "universe" with alive/dead cells, and then evolve the system until one of the boundary cells is hit. When this happens we stop the evolution and print a 21x21 grid around the cell, hopefully capturing a glider in full flight! Here is the code, and you can see that it is a straightforward variation on the earlier program. It typically finds a glider within 2 seconds on a 850MHz Pentium III computer.
Use "man -k keyword" to search for information about "keyword". For example, suppose you want to know something about random number generators: "man -k random" on a GNU/Linux system will return output containing lines like these:
ppmspread (1) - displace a portable pixmap's pixels by a random amount
rand (3) - random number generator
RAND_bytes (3ssl) - generate random data
random (3) - random number generator
random (4) - kernel random number source devices
RAND_pseudo_bytes [RAND_bytes] (3ssl) - generate random data
rand [sslrand] (1ssl) - generate pseudo-random bytes
rand [sslrand] (3ssl) - pseudo-random number generator
seed48 [drand48] (3) - generate uniformly distributed pseudo-random numbers
The number in parentheses is called the section number (e.g., section 3 is reserved for C functions, section 4 for low-level kernel interfaces), and you need to specify this if there are entries in multiple volumes. So, e.g., to read about the kernel random number source devices, you would type "man 4 random".
When debugging a program, a common technique is to insert "print"
statements to display the values of variables. A much more powerful
technique is to use the GNU C debugger, "gdb". Other C compilers will
have debuggers as well.
Firstly, you must compile your program with the debug switch (-g). It is also a good idea to disable optimisation with -O0.
gcc -g -O0 main.c
Then run your program using gdb:
gdb a.out
The following are some of the more common commands that are available:
help - obtain help
help {arg} - obtain help on topic {arg}
list - list 10 lines around current position
list {line} - list 10 lines around line number {line}
list {beg},{end} - list lines from line number {beg} to {end}
run arg1 arg2 ... - begin execution of the program, with the arguments
break line - suspend execution at the given line number
step - steps (and executes) one line in your program
print {exp} ... - print the value of the given expressions
printf "string", exp, ... - print expressions using format string(C)
quit - quit gdb
<ENTER> - repeats the last command (very useful with "step")
In mathematics and physics you often need to group numbers together
into arrays, i.e., 1, 2, or
higher-dimensional structures.
Arrays are declared in C as follows (for example):
int counts[100];
float temperature[1024];
double mat[2][3];
In this example, "count" is an array that can hold 100 integers,
"temperature" is an array that can hold 1024 floats, and "mat" is a 2x3
array (or matrix) of doubles.
Note carefully that the first element of an array in C is element number 0 (rather than 1 as in FORTRAN). While this may be confusing to die-hard FORTRAN programmers, it is really a more natural choice. A side-effect of this choice is that the last element in an array has an index that is one less that the declared size of the array. This is a source of some confusion, and something to watch out for.
So, "counts[0]" is the first element of the array "counts", defined above. And "counts[99]" is the last element. There are 100 elements in total.
To initialise an array to have particular numerical values, specify the initial values in a list within curly brackets. For example:
int primes[100] =
{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107,
109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181,
191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263,
269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349,
353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433,
439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521,
523, 541};
float temp[1024] = {5.0F, 2.3F};
double trouble[] = {1.0, 2.0, 3.0};
In this example, "primes" is initialised with the values of the first 100 primes (trust me!), and the first two elements (temp[0] and temp[1]) of "temp" are initialised to 5.0F and 2.3F respectively. The remaining elements of "temp" are set to 0.0F. Note that we use the trailing "F" on these numbers to indicate that they are floats, not doubles.
The array "trouble" in the above example contains three double numbers. Note that its length is not explicitly declared, C is smart enough to calculate the length for you.
Static arrays that are not explicitly initialised are set to zero. Automatic arrays that are not explicitly initialised, have undefined values.
Multidimensional arrays are declared and referenced in C by using multiple sets of square brackets. For example,
int table[2][3][4];
"table" is a 2x3x4 array, with the rightmost array subscript changing most rapidly as you move through memory (the first element is "table[0][0][0]", the next element is "table[0][0][1]", and the last element is "table[1][2][3]").
When writing programs that use large arrays (say, more than a few megabytes), you should be very careful to ensure that array references in your program are as close as possible to being consecutive (otherwise you may get severe swapping problems, i.e., the memory is swapped to hard disk, resulting in more than a factor of 1000 slow-down).
This is best demonstrated with an example:
int mat[3][4] = {
{ 0, 1, 2, 23},
{ 3, 4, 5, 24},
{ 6, 7, 8, 25}
}
int a0[10], a1[10], a2[10];Even something as simple as setting one array to have the same value as another has to be done with a "for" loop. E.g. :
for (int i = 0; i < 10; i++) {
a0[i] = a1[i] + a2[i];
}
int a0[10], a1[10];
for (int i = 0; i < 10; i++) {
a0[i] = a1[i];
}
Arrays can be of any type. In C, you manipulate character strings as arrays of characters, and operations on the strings (such as concatenation, searching) are done by calling special libary functions (e.g., strcat, strcmp).
Note that when calling a string-manipulation function, the end of the string is taken as the position of the first NUL character (which has a numerical value of zero) in the string.
Character arrays can be initialised in the following ways:
char str[] = {'a', 'b', 'c'};
char prompt[] = "please enter a number";
In the example, "str" has length 3 bytes, and "prompt" has length 22 bytes, which is one more than the number of characters in "please enter a number", the extra byte is used to store a NUL character (zero) as an indication of the end of the string. "str" does not having a trailing NUL.
If there is one thing that sets C apart from many other languages,
it is the pervasive use of pointers.
To understand pointers, it helps to have a mental image of the way in which the memory of a computer is organised. e.g., imagine that your computer has 128MB of memory; then you could label each one of these 128 million bytes of memory with a number: byte 0 would be the first byte in memory, byte 1 the second, and so on up to 128 million. These numbers are referred to as the addresses of the memory cells (note: I'm actually simplifying the true situation; in a modern computer each separate process has it's own virtual address-space, and there are several layers of indirection involved in going from the addresses that C uses, to the actual physical piece of memory on the computer motherboard). A pointer is then simply a variable containing a memory address. This is a very natural concept when you think in terms of the hardware of the computer, which is a viewpoint that C encourages.
Suppose that "i" is declared as "int i". Then the address in memory
at which "i" is stored is given by "&i", where "&" is the unary address operator. One says
that "&i" is a pointer to "i". On a 32-bit computer such as a
typical PC, a pointer is 4 bytes in length (therefore allowing up to
4GB of memory to be addressed; this is an uncomfortably small number).
Pointers are declared by specifying the type of thing they point at. For example,
int *p;
defines "p" as a pointer to an int (so, therefore, "*p" is an int, hence the form of the declaration).
Note carefully, then by declaring "p" as in the above example, the compiler simply allocates space for the pointer (4 bytes for 32-bit computers, 8 bytes for 64-bit computers), not for the variable that the pointer points to! This is a very important point, and is often overlooked. Before using "*p" in an expression, you have to ensure that "p" is set to point to a valid int. This "int" must have had space allocated for it, either statically, automatically, or by dynamically allocating memory at run-time (e.g., using the "malloc" function, see later).
Here is a rather convoluted example showing some of the uses of pointers:
#include <stdio.h>
int main(void) {
int m = 0, n = 1, k = 2;
int *p;
char msg[] = "hello";
char *cp;
/* Line 7 */
p = &m; /* Line 8: p now points to m */
*p = 1; /* Line 9: m now equals 1 */
k = *p; /* Line 10: k now equals 1 */
cp = msg; /* Line 11: cp points to the first character of msg */
*cp = 'H'; /* Line 12: change the case of the 'h' in msg */
cp = &msg[4]; /* Line 13: cp points to the 'o' */
*cp = 'O'; /* Line 14: change its case */
printf("m = %d, n = %d, k = %d\nmsg = \"%s\"\n", m, n, k, msg);
return 0;
}
By using the GNU debugger, gdb, we can identify the actual memory locations that the C compiler uses to store the variable in the above program. It is then possible to draw up a table (see below) showing the detailed operation of the program. You should follow this example very carefully so that you understand what is going on.
Variable Starting Value after the specified line has executed
location 7 8 9 10 11 12 13 14
------------------------------------------------------------------
m 0xbffff98c 0 0 1 1 1 1 1 1
n 0xbffff988 1 1 1 1 1 1 1 1
k 0xbffff984 2 2 2 1 1 1 1 1
p 0xbffff980 ? 98c 98c 98c 98c 98c 98c 98c
msg 0xbffff970 hello hello hello hello hello Hello Hello HellO
cp 0xbffff96c ? ? ? ? 970 970 976 976
The memory addresses are shown in hexadecimal, and only the rightmost three digits are shown (e.g., 98c) for convenience in displaying the table. Memory addresses are always given in units of bytes. The above table shows, for example, that the variable "m" occupies the memory addresses 0xbffff98c, 0xbffff98d, 0xbffff98e, and 0xbffff98f. You will note that 16 bytes are reserved for "msg", even though "msg" only needs 6 bytes (5 for the characters in "hello" and one for the trailing null byte). The C compiler won't guarantee any particular placement in memory, and may leave gaps, particularly to align numbers with 4-byte boundaries (i.e., so that their memory addresses are divisable by 4).
Note the very important point that the name of an array ("msg" in the above example), if used without an index, is considered to be a pointer to the first element of the array. In fact, an array name followed by an index is exactly equivalent to a pointer followed by an offset. For example,
#include <stdio.h>
int main(void) {
char msg[] = "hello world";
char *cp;
cp = msg;
cp[0] = 'H';
*(msg+6) = 'W';
printf("%s\n", msg);
printf("%s\n", &msg[0]);
printf("%s\n", cp);
printf("%s\n", &cp[0]);
return 0;
}
Note, however, that "cp" is a variable, and can be changed, whereas "msg" is a constant, and is not an lvalue (i.e., it can not be used on the left of an equals sign).
You can add and subtract numbers to/from pointers, but the results may surprise you. For example, if "p" is a pointer to an "int" (i.e., "int *p"), then "p++" actually adds 4 to "p"! The logic behind this is that adding one to "p" should make it point to the next integer, which has a memory address that is 4 bytes greater.
We have already seen that C functions can not alter their
arguments. The simple reason for this is that when you call a function,
the compiler passes copies of the
values of the arguments to the function. For example, if you
have a function as follow:
void myfunc(int n) {
n = 6;
}
and you call it with
int i = 0;Then "myfunc" is passed the number 0. "myfunc" knows nothing whatsoever about where this number came from. The zero is stored in the local variable n. "myfunc" changed n to 6, but this doesn't have any effect on i in the main program.
myfunc(i);
void myfunc(int *n) {
*n = 6;
}
and you call it with
int i = 0;
myfunc(&i);
Now, the value of i in the main program will be altered by the call to
"myfunc".
Here is an example of a function that swaps the values of its two arguments, which must be pointers to integers:
void swap_args(int *pi, int *pj) {
int temp;
temp = *pi;
*pi = *pj;
*pj = temp;
}
If all the asterisks were left out of this routine, then it would
still compile OK, but its only effect would be to swap local copies
of "pi" and "pj".
int a[10];Then, as we have seen, a[0] is the first element of the array, and a[9] is the last. The address of the first element is "&a[0]", and by convention this can also be written simply as "a".
Any programming language worth its salt has to allow you to manipulate more complex data types than simply ints, floats, and arrays. You need to have structures of some sort. This is best illustrated by example:
To define a structure, use the following syntax:
struct time {
int hour;
int minute;
float second;
};
This defines a new data type, called "time", which
contains
3 fields (stored in
consecutive locations in the computer's memory).
Logically, it makes sense to associate the various fields of "time"
together, since they are part of one physical entity: the time.
To declare a variable of type "time", you could do the following:
struct time t1[10], t2 = {12, 0, 0.0F};This defines "t1" to be an array of 10 "time" structures. The variable "t2" is also a "time" structure, and is initialised to 12 hours, 0 minutes, 0.0 seconds.
For an example program using structures, see the solution to Q12 of the sample mid-session test here.
To refer to an individual element of a structure, you use the following syntax:
t1[0].hour = 12;
t1[0].minutes = 0;
t1[0].second = t2.second;
t1[1] = t2;
Alternatively, and somewhat confusingly, if "p", say, is a pointer
to a structure, then you can access the elements of the structure using
notation like "p->hour". For example:
struct time *p; // p is a pointer to a time structure
struct time t = {1,2,3.0}; // t is a time structure
p = &t; // p now points to t
p->hour = 6; // this statement is exactly equivalent to...
t.hour = 6; // ...this one
Structures can contain any type, including arrays and other structures. A common use is to create a linked list by having a structure contain a pointer to a structure of the same type, for example,
struct person {where "person.next_person" is actually a pointer to a "person" struct. Sounds complicated, but it is simple when you get the hang of it.
char *name[80];
char *address[256];
struct person *next_person;
};
struct particle {We could then define an array of 1000 particles as simply as:
double mass;
double position[3];
double velocity[3];
};
struct particle p[1000];You can take this one step further and define your own type (like "int", "double") by doing:
typedef struct {In which case you can define the array of 1000 particles as:
double mass;
double position[3];
double velocity[3];
} particle;
particle p[1000];
The standard C libraries include a bunch of functions for manipulating strings (i.e., arrays of chars).
Note that before using any of these functions, you should include the line "#include <string.h>" in your program. This include-file defines prototypes for the following functions, as well as defining special types such as "size_t", which are operating-system dependent.
char *strcat(s1, s2) Concatenates s2 onto s1. null-terminates s1.
Returns s1.
char *strchr(s, c) Searches string s for character c. Returns a
pointer to the first occurence, or null pointer
if not.
int strcmp(s1, s2) Compares strings s1 and s2. Returns an integer
that is less than 0 is s1 < s2; zero if s1 = s2;
and greater than zero is s1 > s2.
char *strcpy (s1, s2) Copies s2 to s1, returning s1.
size_t strlen(s) Returns the number of characters in s, excluding
the terminating null.
char *strncat(s1, s2, n) Concatenates s2 onto s1, stopping after `n'
characters or the end of s2, whichever occurs
first. Returns s1.
int strncmp(s1, s2, n) Like strcmp, except at most `n' characters are
compared.
char *strncpy(s1, s2) Like strcpy, except at most `n' characters are
copied.
char *strrchr(s, c) Like strchr, except searches from the end of
the string.
int strstr(s1, s2) Searches for the string s2 in s1. Returns a
pointer if found, otherwise the null-pointer.
size_r strspn(s1, s2) Returns the number of consecutive characters
in s1, starting at the beginning of the string,
that are contained within s2.
size_r strcspn(s1, s2) Returns the number of consecutive characters
in s1, starting at the beginning of the string,
that are not contained within s2.
char *strpbrk(s1, s2) Returns a pointer to the first character in s1
that is present in s2 (or NULL if none).
char *strerror(n) Returns a pointer to a string describing the
error code `n'.
char *strtok(s1, s2) Searches s1 for tokens delimited by characters
from s1. The first time it is called with a
non-null s1, it writes a NULL into s1 at the
first position of a character from s2, and
returns a pointer to the beginning of s1. When
strtok is then called with a null s1, it finds
the next token in s1, starting at one position
beyond the previous null.
<stdio> defines a number of functions that are used for accessing files. Before using a file, you have to declare a pointer to it, so that it can be referred to with the functions. You do this with, for example,
FILE *in_file;
FILE *out_file;
where "FILE" is a type defined in <stdio> (it is usually a complicated structure of some sort). To open a file, you do, for example:
in_file = fopen("input_file.dat", "r");
out_file = fopen("output_file.dat", "w");
Note the use of "r" to indicate read access, and "w" to indicate write access. The following modes are available:
"r" read
"w" write (destroys any existing file with the same name)
"rb" read a binary file
"wb" write a binary file (overwriting any existing file)
"r+" opens an existing file for random read access, and
writing to its end
"w+" opens a new file for random read access, and writing to
its end (destroys any existing file with the same name)
Once the file is opened, you can use the following functions to read/write it:
int fread(void *ptr, int size, int nelm, FILE *fp) Reads nelm elements
of data, each size bytes long, storing them
at the location pointed to by ptr. Returns
the number of elements read.
int fwrite(void *ptr, int size, int nelm, FILE *fp) Writes nelm elements
of data, each size bytes long, from the
location pointed to by ptr. Returns
the number of elements written.
int getc(FILE *fp) Returns the next character from `fp', or
EOF on error or end-of-file.
int putc(int c, FILE *fp) Write the character c to the file `fp',
returning the character written, or EOF on
error.
int fscanf(FILE *fp, char *format, ...) Like scanf, except input is taken
from the file fp.
int fprintf(FILE *fp, char *format, ...) Like printf, except output is written
to the file fp.
char *fgets(char *line, int n, FILE *fp) Gets the next line of input from the
file fp, up to `n-1' characters in length.
The newline character is included at the end
of the string, and a null is appended. Returns
`line' if successful, else NULL if end-of-file
or other error.
int fputs(char *line, FILE *fp) Outputs the string `line' to the file fp.
Returns zero is successful, EOF if not.
When you have finished with a file, you should close it with 'fclose':
int fclose(FILE *fp) Closes the file 'fp', after flushing any
buffers. This function is automatically called
for any open files by the operating
system at the end of a program.
It can also be useful to flush any buffers associated with a file, to guarantee that the characters that you have written have actually been sent to the file:
int fflush(FILE *fp) Flushes any buffers associated with the
file 'fp'.
To check the status of a file, the following functions can be called:
int feof(FILE *fp) Returns non-zero when an end-of-file is read.
int ferror(FILE *fp) Returns non-zero when an error has occurred,
unless cleared by clearerr.
void clearerr(FILE *fp) Resets the error and end-of-file statuses.
int fileno(FILE *fp) Returns the integer file descriptor associated
with the file (useful for low-level I/O).
Note that the above "functions" are actually defined as pre-processor macros in C. This is quite a common thing to do.
As an example of how to program in C, let's explore the topic of multiple precision arithmetic. All the hard work will be done by GNU's Multiple Precision Arithmetic Library (GNU MP), written by Torbjorn Granlund.
Multiple precision arithmetic allows you to perform calculations to a greater precision than the host computer will normally allow. The penalty you pay is that the operations are slower, and that you have to call subroutines to do all the calculations.
GNU MP can perform operations on integers and rational numbers. It uses preprocessor macros (defined in gmp.h) to define special data types for storing these numbers. MP_INT is an integer, and MP_RAT is a rational number. However, since a multiple precision number may occupy an arbitrarily large amount of memory, it is not sufficient to allocate memory for each number at compile time. GNU MP copes with this problem by dynamically allocating more memory when necessary.
To begin with you need to initialise each variable that you are going to use. When you are finished using a variable, you should free the space it uses by calling a special function.
MP_INT x, y;
mpz_init(&x);
mpz_init(&y);
/* operations on x and y */
mpz_clear(&x);
mpz_clear(&y);
Let's now try a full program. For example, calculating the square root of two to about 200 decimal places:
#include <gmp.h>
#include <stdio.h>
int main(void) {
char two[450], guess[225];
int i;
MP_INT c, x, temp, diff;
two[0] = '2';
for (i = 1; i < sizeof(two)-1; i++) {
two[i] = '0';
}
two[i] = 0;
mpz_init_set_str(&c, two, 10);
guess[0] = '1';
for (i = 1; i < sizeof(guess)-1; i++) {
guess[i] = '0';
}
guess[i] = 0;
mpz_init_set_str(&x, guess, 10);
mpz_init(&temp);
mpz_init(&diff);
do {
mpz_div(&temp, &c, &x);
mpz_sub(&diff, &x, &temp);
mpz_abs(&diff, &diff);
mpz_add(&x, &temp, &x);
mpz_div_ui(&x, &x, 2U);
} while (mpz_cmp_ui (&diff, 10U) > 0);
printf("the square root of two is approximately ");
mpz_out_str(stdout, 10, &x);
printf("\n");
return 0;
}
To compile, link, and run the program, use
gcc -o two two.c -lgmp
./two
Up until now, our "main" function has always been declared as "int main(void)". We have seen that the integer return value is available to the UNIX "shell" in the "$?" environment variable, and this can be used to alter the execution of scripts. It turns out that "main" can actually take arguments, and these are traditionally given as "int argc, char *argv[]", where argc is one plus the number of command-line arguments provided to the program, and argv is an array of character strings containing the arguments. argv[0] is usually the name of the program itself. An example will hopefully clarify this:
#include <stdio.h>
int main(int argc, char *argv[]) {
printf("this program is called '%s'\n", argv[0]);
if (argc == 1) {
printf("it was called without any arguments\n");
} else {
int i;
printf("it was called with %d arguments\n", argc - 1);
for (i = 1; i < argc; i++) {
printf("argument number %d was <%s>\n", i, argv[i]);
}
}
exit(argc);
}
echo $?
The situation often arises that you don't know the size of a data
structure in your program at compile-time. For example, suppose you
are writing a program to read an 2D image from a file into the
computer's memory. If the size of the image can vary at run-time, it
would be nice not to have to pre-define the size of the memory array
in your program. C allows you to do this through the concept of
allocating and deallocating memory.
Some important points to remember when using malloc:
Here is an example to get you started:
#include <stdio.h>C programming course, School of Physics, UNSW / Michael Ashley / mcba@newt.phys.unsw.edu.au
#include <stdlib.h>
#include <assert.h>
/* Generates a two dimensional matrix, and fills it randomly with
zeroes and ones. */
int main(void) {
int xdim, ydim;
int i, j;
int *p, *q;
// Obtain the dimensions of the matrix from the user.
printf("x dimension of matrix? > ");
scanf("%d", &xdim);
printf("y dimension of matrix? > ");
scanf("%d", &ydim);
// malloc the memory needed for the matrix. Note the use of "sizeof(int)"
// which allows the program to run on machines with a variety of word sizes.
assert(NULL != (p = (int *)malloc(xdim * ydim * sizeof(int))));
// Fill the matrix with random 0s and 1s.
for (i = 0; i < xdim * ydim; i++) {
if (random() > RAND_MAX/2) {
*(p+i) = 1;
} else {
*(p+i) = 0;
}
}
// Now print the matrix out, showing the use of pointer arithmetic.
for (i = 0; i < ydim; i++) {
q = p + i * xdim;
for (j = 0; j < xdim; j++) {
printf("%1d", *(q++));
}
printf("\n");
}
// Free the memory we had allocated.
free((void *)p);
// And remove all information about where the memory used to be.
// This is overkill in this example, since the program immediately
// terminates, but it is good practice, in order to avoid
// accidentally using free'ed memory.
p = q = NULL;
return 0;
}