Introduction to Computer Organization

What happens when you call gcc?

C preprocessor

  • Handles preprocessor directives, including macros:
    • #define
    • #ifdef
    • #if

For #define, this is usually a simple find-and-replace operation: the preprocessor finds each use of the macro and replaces it with the definition you provided.

gcc -E foo.c > foo.i

(foo.i contains preprocessed source code)
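As a small illustration of the find-and-replace behavior, here is a sketch with two versions of the same macro. The names SQUARE, UNSAFE_SQUARE, LOG, DEBUG, safe_result, and unsafe_result are made up for this example. Because the substitution is purely textual, the version without parentheses expands into an expression with a different meaning.

```c
#include <stdio.h>

#define SQUARE(x) ((x) * (x))    // parenthesized function-like macro
#define UNSAFE_SQUARE(x) x * x   // same idea, but without parentheses

#ifdef DEBUG
#define LOG(msg) printf("debug: %s\n", (msg))
#else
#define LOG(msg) ((void)0)       // expands to a no-op when DEBUG is not defined
#endif

int safe_result(void) {
    LOG("computing safe_result");
    return SQUARE(3 + 1);        // expands to ((3 + 1) * (3 + 1)), which is 16
}

int unsafe_result(void) {
    return UNSAFE_SQUARE(3 + 1); // expands to 3 + 1 * 3 + 1, which is 7
}
```

Running gcc -E on a file like this shows the expanded text of both functions, with the macro names gone.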

Compiler

Converts the preprocessed source code into an assembly program. This is a fairly involved process, and a lot of effort goes into the conversion.

  • Able to handle any amount of whitespace between tokens

gcc -S foo.c

(foo.s contains textual assembly)

Assembler

Takes the assembly code and produces machine code (ones and zeroes). The output is called an "object file".

as foo.s -o foo.o

or

gcc -c foo.s

Linker

Takes all the object files and links them into one executable. This is not reserved for very complex programs; it happens all the time. Whenever you use a library, the library has to be linked in.

ld foo.o bar.o _bunch_of_other_stuff -o a.out

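Common examples of linker errors:

  • Defining two main functions. The linker sees two definitions of the same symbol and cannot tell which one to use.
  • Forgetting to link a library whose functions you call (for example, leaving out -lm on systems that need it for sin()). Forgetting the #include, by contrast, is diagnosed by the compiler rather than the linker.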

You can run gcc -v to run gcc in verbose mode, which prints the commands it executes for each of these stages.

Memory Layout

Once again, a program has four segments: the code segment, the static data segment, the heap, and the stack. The code segment and static data segment are stored in the object file, while the stack and heap are created dynamically at run time.
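To connect the segments to ordinary C, here is a minimal sketch that marks where each kind of object typically lives. The names global_counter, add_one, and make_heap_int are invented for illustration, and the exact placement is up to the compiler and operating system.

```c
#include <stdlib.h>

int global_counter = 42;         // static data segment (global)

int add_one(int value) {         // the function's instructions: code segment
    static int calls = 0;        // static data segment (static local)
    int local = value + 1;       // stack (ordinary local variable)
    calls++;
    return local;
}

int *make_heap_int(int value) {
    int *p = malloc(sizeof *p);  // the allocated int lives on the heap
    if (p != NULL) {
        *p = value;
    }
    return p;                    // the caller is responsible for free(p)
}
```

Only add_one, make_heap_int, global_counter, and calls have a place in the object file. The variable local and the block returned by malloc exist only while the program runs.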

Linker

If you have a file hello.o and a file stdio.o, then each file has its own code segment, C1 and C2, and data segment, D1 and D2. How can these two be linked together?

Here is one idea: store C1 and C2 sequentially, and D1 and D2 sequentially. The problem is, in the new merged object file, almost all of the addresses in the code and data segments have shifted! This is clearly a problem.

  • Suppose foo() is at address 0 in C1 and bar() is at address 0 in C2. After merging, a call to bar() that still uses address 0 will actually call foo().

The question is, how do you keep track of the changed addresses of instructions and data?

This affects:

  • Loads and stores. These access data, so you need to know the location of the data.
  • Branches. You need to be able to specify the location of the instruction to branch to.

Useful Data Structures

Here is a useful example:

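extern double sin(double); // math.h
extern int printf(const char *, ...), scanf(const char *, ...); // stdio.h

int main(void) {
    double x, result;
    printf("Type number:");
    scanf("%lf", &x); // %lf reads a double; %f expects a float *
    result = sin(x);
    printf("Sine is %f\n", result);
    return 0;
}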

If you compile main.c, you produce a main.o object file. It has a text section containing the code (including the function calls), a data section (globals, static locals, and structs), a symbol table, and a relocation table.

Relocation Table

Lists the subset of instructions that the linker has to patch: those that refer to an address that can move when the files are merged.

For example, let's say that the first printf call is in the text section at location 30. Then the relocation table could contain the entry:

printf  T[30]
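One way to picture such a table in C is as an array of small records. This is only a sketch: the struct layout, the names relocation, main_relocations, and count_relocations, and every offset except the printf entry at T[30] are assumptions made for illustration, not a real object file format.

```c
#include <stddef.h>
#include <string.h>

// Which section of the object file an offset refers to.
enum section { SECTION_TEXT, SECTION_DATA };

// One relocation entry: the instruction at `offset` within `section`
// refers to `symbol` and must be patched by the linker.
struct relocation {
    const char *symbol;
    enum section section;
    size_t offset;
};

// Hypothetical relocation table for main.o. Only the first entry
// (printf at T[30]) comes from the notes; the other offsets are invented.
static const struct relocation main_relocations[] = {
    { "printf", SECTION_TEXT, 30 },
    { "scanf",  SECTION_TEXT, 42 },
    { "sin",    SECTION_TEXT, 54 },
    { "printf", SECTION_TEXT, 66 },
};

// Count how many instructions need patching for the given symbol.
size_t count_relocations(const struct relocation *table, size_t length,
                         const char *symbol) {
    size_t count = 0;
    for (size_t i = 0; i < length; i++) {
        if (strcmp(table[i].symbol, symbol) == 0) {
            count++;
        }
    }
    return count;
}
```

In this sketch main.o calls printf twice, so the table holds two printf entries, one per call site.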

Symbol Table

Works like the relocation table, but for names rather than instructions: it lists the symbols (such as variables and functions) whose addresses can move, along with their offsets.

What the linker does now

You have three object files: main.o, stdio.o, and math.o. The merged code segment is composed of the text from main.o, stdio.o, and math.o, and it is followed by the data from main.o, stdio.o, and math.o.

In each symbol table and relocation table, the offsets are relative to the base of that file's own section, so they do not change when the files are merged. To compute an absolute address, add the offset to the section's new base, which is the combined length of everything placed before it.
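The address computation can be sketched in a few lines of C, assuming the merged image starts at address 0 and holds all text sections followed by all data sections. The struct object_file and the functions text_base and data_base are hypothetical names chosen for this example.

```c
#include <stddef.h>

// Hypothetical section sizes for one object file, in bytes.
struct object_file {
    const char *name;
    size_t text_length;
    size_t data_length;
};

// Start of file `index`'s text in the merged image: the sum of the
// text lengths of every file placed before it.
size_t text_base(const struct object_file *files, size_t index) {
    size_t base = 0;
    for (size_t i = 0; i < index; i++) {
        base += files[i].text_length;
    }
    return base;
}

// Start of file `index`'s data: all text sections come first,
// followed by the data sections of the earlier files.
size_t data_base(const struct object_file *files, size_t count, size_t index) {
    size_t base = text_base(files, count); // total length of all text
    for (size_t i = 0; i < index; i++) {
        base += files[i].data_length;
    }
    return base;
}
```

With invented sizes of 100/40 bytes (text/data) for main.o, 500/60 for stdio.o, and 200/20 for math.o, a symbol at text offset 12 in stdio.o lands at 100 + 12 = 112, and a variable at data offset 8 in math.o lands at 800 + 40 + 60 + 8 = 908.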

Note that nothing on the stack or heap appears in the relocation table or symbol table, because those regions are created at run time and are not part of the object file's data segment.