Toolchain: Basics¶
GCC - GNU Compiler Collection¶
GCC - GNU Compiler Collection
Started by Richard Stallman in the 1980’s
Part of the GNU Project, along with other basic technology
Frontends for a number of programming languages
⟶ most popular: C and C++
Supports a large number of architectures
Many tiny to middle-sized embedded processors
Alternative to GCC: LLVM
Backed by Apple to a certain extent
Modern design ⟶ modular, clean interfaces, etc.
⟶ Rust (e.g.) is based on it
All-In-One Usage: Single File¶
“Monolithic” program
#include <stdio.h> int main(void) { printf("Hello World\n"); return 0; }
All-in-one: convert C to executable (seemingly directly)
$ gcc hello-single.c
Produces an executable program
$ ls -l a.out -rwxrwxr-x. 1 jfasch jfasch 24360 Mar 25 11:14 a.out $ ./a.out Hello World
Changing the output file’s name (
a.out
is not very expressive)$ gcc -o hello-single hello-single.c $ ls -l hello-single -rwxrwxr-x. 1 jfasch jfasch 24360 Mar 25 11:39 hello-single $ ./hello-single Hello World
A lot going on behind the scenes!
All-In-One Usage: Multiple Files¶
“Modular” program
Main
“Modularized” out
#include "hello.h" int main(void) { hello(); return 0; }
#ifndef HELLO_H #define HELLO_H void hello(void); #endif
#include "hello.h" #include <stdio.h> void hello(void) { printf("Hello World\n"); }
All-in-one: convert multiple C files to executable ⟶ simply list them along with the main file
$ gcc -o hello-modular hello-main.c hello.c
Output as before …
$ ls -l hello-modular -rwxrwxr-x. 1 jfasch jfasch 24416 Mar 25 11:42 hello-modular $ ./hello-modular Hello World
This Is Not As Simple As It Seems!¶
Linux executables are in Executable and Linkable Format
⟶ Complicated but extensible and flexible
“Sections”
Linux kernel starts programs on behalf of a user
User (the shell, for example, when you type the program’s name on the commandline) calls system call
execve()
Kernel: “Ah yes, someone wants me to run a program” 👍
Kernel starts the dynamic loader instead, telling it to start the program to be exec’d
Dynamic Loader (commonly called
ld.so
)The program that interprets the program’s content and sets up the address space
$ ls -l /lib64/ld-linux-x86-64.so.2 lrwxrwxrwx. 1 root root 10 Jan 26 02:53 /lib64/ld-linux-x86-64.so.2 -> ld-2.33.so
Prior to starting the program, shared (dynamic) libraries must be found and loaded
And much much more ⟶ man -s 8 ld.so
Program Loading (Short Version)¶
Few programs are really self-contained
Basic dependency: C runtime (
libc
)Determine program’s dependencies ⟶ must be loaded in dependency order.
$ ldd hello-modular linux-vdso.so.1 (0x00007ffc44130000) libc.so.6 => /lib64/libc.so.6 (0x00007fdd8ea64000) /lib64/ld-linux-x86-64.so.2 (0x00007fdd8ec5c000)
Program loader in action
$ strace ./hello-modular execve("./hello-modular", ["./hello-modular"], 0x7fffea18b920 /* 47 vars */) = 0 brk(NULL) = 0x67e000 arch_prctl(0x3001 /* ARCH_??? */, 0x7fff6d3dd240) = -1 EINVAL (Invalid argument) access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=80987, ...}, AT_EMPTY_PATH) = 0 mmap(NULL, 80987, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f1721afc000 close(3) = 0 openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \203\2\0\0\0\0\0"..., 832) = 832 pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784 newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=2420152, ...}, AT_EMPTY_PATH) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1721afa000 pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784 mmap(NULL, 1973104, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f1721918000 mmap(0x7f172193e000, 1441792, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x26000) = 0x7f172193e000 mmap(0x7f1721a9e000, 319488, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x186000) = 0x7f1721a9e000 mmap(0x7f1721aec000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d3000) = 0x7f1721aec000 mmap(0x7f1721af2000, 31600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f1721af2000 close(3) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1721916000 arch_prctl(ARCH_SET_FS, 0x7f1721afb680) = 0 set_tid_address(0x7f1721afb950) = 471382 set_robust_list(0x7f1721afb960, 24) = 0 rseq(0x7f1721afbfa0, 0x20, 0, 0x53053053) = 0 mprotect(0x7f1721aec000, 16384, PROT_READ) = 0 mprotect(0x403000, 4096, PROT_READ) = 0 mprotect(0x7f1721b42000, 8192, PROT_READ) = 0 prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0 munmap(0x7f1721afc000, 80987) = 0 newfstatat(1, "", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0xb), ...}, AT_EMPTY_PATH) = 0 getrandom("\xcd\x7c\xc9\x3b\xd6\xc3\xf2\x44", 8, GRND_NONBLOCK) = 8 brk(NULL) = 0x67e000 brk(0x69f000) = 0x69f000 write(1, "Hello World\n", 12Hello World ) = 12 exit_group(0) = ? +++ exited with 0 +++
What’s In A Program? ⟶ Symbols (nm
)¶
Symbols
Global variables
Functions
Contributed from a very wide variety of programs and precompiled object code
$ nm hello-single
...
0000000000401040 T _start
0000000000401000 T _init
00000000004011b8 T _fini
...
Overview: Where Do Which Symbols Come From (⟶ The Toolchain)¶
Symbol |
Where from |
Purpose |
Source |
---|---|---|---|
|
$ ls -l /usr/lib64/crt*.o
-rw-r--r--. 1 root root 18384 Jan 26 02:55 /usr/lib64/crt1.o
-rw-r--r--. 1 root root 1800 Jan 26 02:55 /usr/lib64/crti.o
-rw-r--r--. 1 root root 1536 Jan 26 02:55 /usr/lib64/crtn.o
|
Startup code |
Contributed by OS; remains the same over ages. Calls |
|
Generated |
Used for relocation of shared libraries into the process address space. |
Linker. Part of toolchain; binds objects and libraries
together. Resolves open references (e.g. $ ls -l /usr/bin/ld
-rwxr-xr-x. 1 root root 1762320 Sep 16 2021 /usr/bin/ld
|
|
Generated binary code (“Text”, hence the |
|
Compiler $ ls -l /usr/bin/gcc
-rwxr-xr-x. 3 root root 1224008 Jan 27 12:29 /usr/bin/gcc
|
|
Unresolved symbol, generated by linker who has found it in C
library. Obviously |
Reference for the loader, who will find it in |
Will be resolved by the loader when it finds and loads
$ ls -l /lib64/libc.so.6
lrwxrwxrwx. 1 root root 12 Jan 26 02:53 /lib64/libc.so.6 -> libc-2.33.so
|
Recap: Toolchain¶
The seemingly simple command “Build be an executable hello-single
from hello-single.c
” does a number of separate things under the
hood:
Compile
hello-single.c
to an object file (an ELF format file containing only the machine code of functionhello()
)One does not see that file as it is only kept temporarily. Such object files usually carry the
.o
extension.Multiple intermediate steps are hidden under the name compilation as well, producing more temporary files:
Running the C preprocessor on the source file
Compiling what the preprocessor left, into machine specific (but human readable) assembly code
Running the assembler to turn that into machine code, finally.
Link
hello-single.o
(lets give the temporary file a name) together with the following artifacts, and finally produce the executablehello-single
:The OS specific startup code
The shared C runtime - the “C library” (that library defines a large number of functions like
printf()
,malloc()
,free()
, …)