██████╗  ██████╗ ██╗  ██╗ █████╗ 
╚════██╗██╔═══██╗██║  ██║██╔══██╗
 █████╔╝██║   ██║███████║███████║
 ╚═══██╗██║   ██║██╔══██║██╔══██║
██████╔╝╚██████╔╝██║  ██║██║  ██║
╚═════╝  ╚═════╝ ╚═╝  ╚═╝╚═╝  ╚═╝

Welcome to 3OHA, a place for random notes, thoughts, and factoids that I want to share or remember.



21 April 2022

Static linking with the C standard library in Linux

A student asked me today: why would you want to statically link with libc? There are several cases where static linking might be the preferred option. One example is hardening binaries that need to run in a hostile environment. A hardened binary should not trust any shared library in the system, least of all libc, because doing so enables some trivial attacks against it (think LD_PRELOAD, which tells the dynamic linker to load a library of the attacker's choosing before any other). Static linking is by no means a silver bullet, but it raises the bar one notch.
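
To make the LD_PRELOAD remark concrete, here is a minimal sketch of an interposer. It only assumes a dynamically linked target program that calls getuid(); the file name fakeuid.c used below is made up for illustration.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Same name and signature as the libc function. When this file is built
 * as a shared object and listed in LD_PRELOAD, the dynamic linker finds
 * this definition before the one in libc.so, so the target program gets
 * this answer instead of the real one.
 */
uid_t getuid(void)
{
    return 0; /* always claim to be root */
}
```

Built with something like gcc -shared -fPIC -o fakeuid.so fakeuid.c and run as LD_PRELOAD=./fakeuid.so ./target, a dynamically linked target that trusts getuid() would be fooled. A statically linked target resolves getuid() at link time and never looks at the preloaded object.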

glibc is not meant to be statically linked

There is one caveat, though: the C standard library that ships with modern systems, such as glibc (the GNU C Library) in Linux, is not meant to be statically linked. If you want to statically link against libc, you need a static version of the library. Most Linux distros provide one (libc.a) alongside the standard libc.so. Yet linking against libc.a might give you a broken or unstable binary, or one that still depends on libc.so at run time. There are two key reasons for this:
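
A quick way to see what the linker actually produced is to look for a PT_INTERP program header, which is how an ELF executable asks for a dynamic loader. The following is only a sketch: the function name needs_dynamic_loader is mine, and it assumes a 64-bit ELF file with the same byte order as the host.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if the ELF file at `path` has a PT_INTERP segment (it asks
 * for a dynamic loader), 0 if it has none, and -1 if the file cannot be
 * read or is not a 64-bit ELF file.
 */
int needs_dynamic_loader(const char *path)
{
    Elf64_Ehdr eh;
    int result = -1;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;

    /* Read and sanity-check the ELF header. */
    if (fread(&eh, sizeof eh, 1, f) != 1)
        goto out;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr))
        goto out;

    /* Walk the program header table looking for PT_INTERP. */
    result = 0;
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * sizeof ph);

        if (fseek(f, off, SEEK_SET) != 0 ||
            fread(&ph, sizeof ph, 1, f) != 1) {
            result = -1;
            break;
        }
        if (ph.p_type == PT_INTERP) {
            result = 1;
            break;
        }
    }

out:
    fclose(f);
    return result;
}
```

Calling needs_dynamic_loader("/proc/self/exe") from main() reports on the running program itself. Note that a result of 0 only says that no loader is requested at startup; it says nothing about shared objects that the program may load later through dlopen(), which is precisely the first problem discussed next.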

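  1. Loading shared objects with dependencies on glibc. glibc uses dlopen() a lot to load other modules. A quick grep over the source code of glibc will give you tons of examples. Many of the hits are in its test suite, but the library itself relies on the same mechanism at run time, for instance to load NSS modules and libidn2: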
    ./dlfcn/tst-dlinfo.c:  void *handle = dlopen ("glreflib3.so", RTLD_NOW);
    ./dlfcn/bug-atexit3.c:  void *handle = dlopen ("$ORIGIN/bug-atexit3-lib.so", RTLD_LAZY);
    ./dlfcn/bug-dl-leaf-lib.c:  hdl = dlopen ("bug-dl-leaf-lib-cb.so", RTLD_GLOBAL | RTLD_LAZY);
    ./dlfcn/tststatic4.c:  global_handle = dlopen ("modstatic3.so", RTLD_LAZY | RTLD_GLOBAL);
    ./dlfcn/errmsg1.c:  h = dlopen ("errmsg1mod.so", RTLD_NOW);
    ./dlfcn/tst-dladdr.c:  handle = dlopen ("glreflib1.so", RTLD_NOW);
    ./dlfcn/modstatic2.c:  void *handle = dlopen ("modstatic2-nonexistent.so", RTLD_LAZY);
    ./dlfcn/modstatic2.c:  handle = dlopen ("modstatic2.so", RTLD_LAZY);
    ./dlfcn/tststatic5.c:  handle = dlopen ("modstatic5.so", RTLD_LAZY | RTLD_LOCAL);
    
    [...]
    
    ./resolv/tst-resolv-canonname.c:  void *nss_dns_handle = dlopen (LIBNSS_DNS_SO, RTLD_LAZY);
    ./resolv/tst-resolv-ai_idn.c:  void *handle = dlopen (LIBIDN2_SONAME, RTLD_LAZY);
    ./resolv/tst-resolv-ai_idn-latin1.c:  void *handle = dlopen (LIBIDN2_SONAME, RTLD_LAZY);
    
    [...]
    
    ./elf/unload6mod3.c:  h = dlopen ("unload6mod1.so", RTLD_LAZY);
    ./elf/tst-tls15.c:  void *h = dlopen ("tst-tlsmod15a.so", RTLD_NOW);
    ./elf/tst-tls15.c:  h = dlopen ("tst-tlsmod15b.so", RTLD_NOW);
    ./elf/tst-debug1.c:  void *h = dlopen ("tst-debug1mod1.so", RTLD_LAZY);
    ./elf/dblunload.c:  p1 = dlopen ("dblloadmod1.so", RTLD_LAZY);
    ./elf/dblunload.c:  p2 = dlopen ("dblloadmod2.so", RTLD_LAZY);
    ./elf/neededtest2.c:  obj2 = dlopen ("neededobj2.so", RTLD_LAZY);
    ./elf/neededtest2.c:  obj3[1] = dlopen ("neededobj3.so", RTLD_LAZY);
    ./elf/tst-unique2.c:  void *h = dlopen ("tst-unique2mod2.so", RTLD_LAZY);
    
    [...]
    
    Some of these shared objects contain calls to C library functions. If your statically linked program happens to hit a function that triggers a dlopen() call, it will very likely also need to dynamically load libc.so to satisfy the shared object being loaded, because that object's functions call into the C library. The overall result is that your program still needs libc.so in the system, plus whatever other shared objects are loaded. Making the symbols these libraries need available from your static binary is not easy, so you will end up with a second, dynamically linked copy of glibc in the same address space. This is certainly not what you had in mind when statically linking your program.

  2. Non-isolation from changes in the userland/kernel-land boundary. One of the key reasons for using the C standard library is that the program/libc interface is standardized. If the kernel undergoes substantial changes and some system calls are removed or changed, you only need a new libc version that works with the new kernel. Your program will still call, say, fopen(), and this will work with whatever binary interface the new kernel offers. You give up this benefit when you statically link with libc.a, and your program is no longer insulated from changes in the kernel interface. This reason led Sun to stop providing libc.a in Solaris 10 (circa 2004), in an attempt to stop developers from producing statically linked binaries. Rod Evans wrote about it in this post nearly 20 years ago.
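
The first reason is easy to run into in practice, because name resolution is one of the places where glibc loads modules at run time. The sketch below wraps getaddrinfo() in a small helper; the name first_ipv4 is mine and the function is only meant as an illustration.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Resolves `name` and writes its first IPv4 address, in dotted form,
 * to `out`. Returns 0 on success, or the error code reported by
 * getaddrinfo() or getnameinfo().
 */
int first_ipv4(const char *name, char *out, size_t outlen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM; /* one entry per address */

    /* With glibc, a non-numeric name goes through NSS modules here. */
    rc = getaddrinfo(name, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    /* Format the first result without a reverse lookup. */
    rc = getnameinfo(res->ai_addr, res->ai_addrlen,
                     out, (socklen_t)outlen, NULL, 0, NI_NUMERICHOST);
    freeaddrinfo(res);
    return rc;
}
```

When a program using this helper is linked with gcc -static against glibc, the linker typically warns that using getaddrinfo in statically linked applications requires, at run time, the shared libraries from the glibc version used for linking. Passing a numeric address such as "127.0.0.1" should not need any lookup module, whereas a host name does.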

Alternative C standard libraries

There are a few alternatives to glibc available for Linux that are a better option when the goal is to produce statically linked binaries. Folks who are familiar with certain types of UNIX malware might recognize some of them, such as uClibc (and its newer fork, uClibc-ng) or musl. Google's Bionic for Android is another popular libc implementation, though to be precise Bionic is not only libc but also libm, libdl, and the dynamic linker.
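
If you need to know which of these libraries a program is being compiled against, the preprocessor can usually tell you. This is a best-effort sketch with a made-up function name, libc_name. It relies on the macros __GLIBC__, __UCLIBC__ and __BIONIC__, and on the fact that musl deliberately defines no identifying macro, so musl can only be guessed by elimination.

```c
#include <stdio.h> /* any libc header pulls in the identifying macros */

/* Returns a human-readable guess of the C library used at compile time. */
const char *libc_name(void)
{
#if defined(__UCLIBC__)
    /* Checked first: uClibc headers also define __GLIBC__. */
    return "uClibc or uClibc-ng";
#elif defined(__GLIBC__)
    return "glibc";
#elif defined(__BIONIC__)
    return "Bionic";
#else
    return "unknown (possibly musl, which defines no identifying macro)";
#endif
}
```

A one-line main() that prints libc_name() is enough to check which library a given toolchain or compiler wrapper is really using.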

The author of musl maintains a thorough comparison of C standard libraries for Linux.



© 2022 Juan Tapiador