ELF Output Generation

Description

Egalito supports parsing ELF files, transforming them, and generating new ELF output files. As of this writing, ELF generation of userspace binaries has been tested only on x86_64. To generate an output binary with no transformations, try:

$ ./etelf -m ../src/ex/hello hello && ./hello

To additionally harden the binary, try:

$ ./etharden -m --cfi ../src/ex/hello hello && ./hello

Egalito has two main ELF generation modes: one-to-one mirrorgen (-m argument) and uniongen (-u argument). Mirrorgen reads in one ELF (a single Module) and produces a corresponding output. Uniongen recursively reads in an ELF and all of its dependencies (like parse2 in etshell), and merges them into a single output. Conceptually, uniongen turns a dynamically linked executable into a statically linked one (for technical reasons, however, the output still uses ld.so).

Mirrorgen is generally the more reliable of the two modes and can be applied to executables or shared libraries. Since we do not generate symbol version structures, an executable that uses a transformed shared library may also have to be transformed. (We also have a helper program, rmver, which can modify the dynamic section of an executable to remove references to versions.) Uniongen can produce programs with better performance than the original, and its outputs are self-contained, but it does not support dlopen (e.g. libc opening libnss).

Output Addresses

Egalito encodes a fair amount of meaning in the addresses in an output ELF. Here is an example of the layout of a mirrorgen ELF:

$ readelf -SW hello-mirrorgen
Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  1
  [ 1] .interp           PROGBITS        0000000000200040 000040 00001c 00      0   0  1
  [ 2] .init_array       INIT_ARRAY      000000000020005c 00005c 000010 00  WA  0   0  1
  [ 3] .strtab           STRTAB          0000000000000000 00029c 0000a2 00      0   0  1
  [ 4] .shstrtab         STRTAB          0000000000000000 00033e 00009d 00      0   0  1
  [ 5] .dynstr           STRTAB          0000000000400000 001000 000091 00      0   0  1
  [ 6] .symtab           SYMTAB          0000000000400091 001091 000348 18      3  29  8
  [ 7] .dynsym           DYNSYM          00000000004003d9 0013d9 0000c0 18   A  5   1  8
  [ 8] .gnu.hash         GNU_HASH        0000000000400499 001499 0017dc 00   A  7   0  8
  [ 9] .rela.dyn         RELA            0000000000401c75 002c75 000108 18      7   0  8
  [10] .dynamic          DYNAMIC         0000000000401d7d 002d7d 0000f0 00      5   0  1
  [11] .g.got.plt        PROGBITS        0000000000500000 003000 000028 00  WA  0   0  1
  [12] .rela.plt         RELA            0000000000500028 003028 000018 18  AI  7   0  8
  [13] .plt              PROGBITS        0000000000600000 004000 000030 00  AX  0   0  1
  [14] .rodata           PROGBITS        0000000010000750 005750 000012 00   A  0   0  1
  [15] .got              PROGBITS        0000000010200fd0 005fd0 000030 00  WA  0   0  1
  [16] .got.plt          PROGBITS        0000000010201000 006000 000020 00  WA  0   0  1
  [17] .data             PROGBITS        0000000010201020 006020 000010 00  WA  0   0  1
  [18] .bss              PROGBITS        0000000010201030 006030 000008 00  WA  0   0  1
  [19] .text             PROGBITS        0000000040000000 007000 0001de 00  AX  0   0  1

All sections with addresses less than 0x10000000 are auto-generated from scratch. For example, we re-create .init_array, .strtab, .gnu.hash, and .dynamic, along with a new global offset table and procedure linkage table in .g.got.plt and .plt. In uniongen, these sections contain information from all Modules. In both output modes, we place all code into a new .text at 0x40000000 (original code normally sits at addresses that also begin with 0x4 but have fewer digits).
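
As a rough illustration of these address ranges, the following Python sketch classifies a section address taken from readelf output. The function name and the labels are hypothetical and not part of Egalito. The 0x10000000 limit comes from the text; treating everything at or above the new .text address shown in the listing as code is an assumption.

```python
GENERATED_LIMIT = 0x10000000  # below this, sections are generated from scratch
NEW_TEXT_BASE = 0x40000000    # address of the new .text in the listing (assumed lower bound for code)


def classify_address(address: int) -> str:
    """Return a hypothetical label for an address in an Egalito output ELF."""
    if address < GENERATED_LIMIT:
        return "generated"
    if address >= NEW_TEXT_BASE:
        return "new-text"
    # Everything in between is treated as data carried over from an input ELF.
    return "carried-over"


if __name__ == "__main__":
    # Addresses taken from the mirrorgen listing.
    samples = [(".dynstr", 0x400000), (".rodata", 0x10000750), (".text", 0x40000000)]
    for name, address in samples:
        print(f"{name:8} {address:#010x} {classify_address(address)}")
```

Non-allocated sections such as .strtab have address 0 and therefore also fall into the generated range in this sketch.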

All sections whose addresses have the form 0x10XXXXXX come from the first ELF parsed, where they were originally at address 0xXXXXXX. Similarly, in uniongen, anything at 0x11XXXXXX is from the second ELF parsed, while 0x12XXXXXX is from the third:

$ readelf -SW hello-uniongen
Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  1
  [ 1] .interp           PROGBITS        0000000000200040 000040 00001c 00      0   0  1
  [ 2] .init_array       INIT_ARRAY      000000000020005c 00005c 000020 00  WA  0   0  1
  [ 3] .strtab           STRTAB          0000000000000000 000354 0117f5 00      0   0  1
  [ 4] .shstrtab         STRTAB          0000000000000000 011b49 00014b 00      0   0  1
  [ 5] .dynstr           STRTAB          0000000000400000 012000 000076 00      0   0  1
  [ 6] .symtab           SYMTAB          0000000000400076 012076 01ca40 18      3 2775  8
  [ 7] .dynsym           DYNSYM          000000000041cab6 02eab6 000090 18   A  5   1  8
  [ 8] .rela.dyn         RELA            000000000041cb46 02eb46 0001c8 18      7   0  8
  [ 9] .dynamic          DYNAMIC         000000000041cd0e 02ed0e 0000e0 00      5   0  1
  [10] .g.got.plt        PROGBITS        0000000000500000 02f000 000028 00  WA  0   0  1
  [11] .rela.plt         RELA            0000000000500028 02f028 000030 18  AI  7   0  8
  [12] .plt              PROGBITS        0000000000600000 030000 000030 00  AX  0   0  1
  [13] .rodata           PROGBITS        0000000010000750 031750 000012 00   A  0   0  1
  [14] .got              PROGBITS        0000000010200fd0 031fd0 000030 00  WA  0   0  1
  [15] .got.plt          PROGBITS        0000000010201000 032000 000020 00  WA  0   0  1
  [16] .data             PROGBITS        0000000010201020 032020 000010 00  WA  0   0  1
  [17] .bss              PROGBITS        0000000010201030 032030 000008 00  WA  0   0  1
  [18] __libc_freeres_fn PROGBITS        00000000111493e0 0333e0 000e08 00   A  0   0  1
  [19] __libc_thread_freeres_fn PROGBITS        000000001114a1f0 0341f0 000212 00   A  0   0  1
  [20] .rodata           PROGBITS        000000001114a420 034420 020a20 00   A  0   0  1
  [21] .tdata            PROGBITS        00000000113957c8 0557c8 000010 00  WA  0   0  1
  [22] __libc_subfreeres PROGBITS        00000000113957e0 0557e0 0000f8 00  WA  0   0  1
  [23] __libc_atexit     PROGBITS        00000000113958d8 0558d8 000008 00  WA  0   0  1
  [24] __libc_thread_subfreeres PROGBITS        00000000113958e0 0558e0 000020 00  WA  0   0  1
  [25] __libc_IO_vtables PROGBITS        0000000011395900 055900 000d68 00  WA  0   0  1
  [26] .data.rel.ro      PROGBITS        0000000011396680 056680 002520 00  WA  0   0  1
  [27] .got              PROGBITS        0000000011398d80 058d80 000280 00  WA  0   0  1
  [28] .got.plt          PROGBITS        0000000011399000 059000 000080 00  WA  0   0  1
  [29] .data             PROGBITS        0000000011399080 059080 001680 00  WA  0   0  1
  [30] .bss              PROGBITS        000000001139a700 05a700 004260 00  WA  0   0  1
  [31] .tdata            PROGBITS        00000000113957c8 05f7c8 000010 00 WAT  0   0  1
  [32] .tbss             PROGBITS        00000000113957d8 05f7d8 000000 00 WAT  0   0  1
  [33] .text             PROGBITS        0000000040000000 060000 125df0 00  AX  0   0  1

In this example, the 0x11XXXXXX range is libc. Since we construct addresses in this way, addresses of global variables or data sections can be easily mapped back to the original ELFs. For example, this uniongen output contains a relocation at 0x11398df8 for _rtld_global. If we look at address 0x398df8 in the original libc, we find the same relocation. We completely regenerate the relocation list, but since data layout is normally unchanged, the parallel addressing helps when comparing an original and transformed ELF.
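
The mapping between output addresses and original addresses can be sketched in a few lines of Python. This is only an illustration of the addressing pattern described here, not code from Egalito. The function names are hypothetical, module indices are counted from 0 in parse order, and the sketch assumes that every original address fits in the six hex digits shown and that per-module data stays below the new .text address.

```python
from typing import Tuple

MODULE_BASE = 0x10000000    # first parsed ELF occupies 0x10XXXXXX
MODULE_STRIDE = 0x01000000  # each later ELF is placed 0x01000000 higher
NEW_TEXT_BASE = 0x40000000  # assumed upper bound for per-module data


def to_original(address: int) -> Tuple[int, int]:
    """Split an output address into (module index, original address)."""
    if not MODULE_BASE <= address < NEW_TEXT_BASE:
        raise ValueError(f"{address:#x} is not in a per-module data range")
    offset = address - MODULE_BASE
    return offset // MODULE_STRIDE, offset % MODULE_STRIDE


def to_output(module_index: int, original: int) -> int:
    """Inverse of to_original: compute where an original address should land."""
    if module_index < 0 or not 0 <= original < MODULE_STRIDE:
        raise ValueError("module index or original address out of range")
    address = MODULE_BASE + module_index * MODULE_STRIDE + original
    if address >= NEW_TEXT_BASE:
        raise ValueError("result would overlap the new .text region")
    return address


if __name__ == "__main__":
    # The _rtld_global relocation from the uniongen example.
    module, original = to_original(0x11398df8)
    print(f"module {module}, original address {original:#x}")
```

With the relocation from the example, to_original(0x11398df8) yields module index 1 (libc, the second ELF parsed) and original address 0x398df8.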

Note that Egalito can generate position-dependent outputs (uniongen only) or position-independent outputs (default for mirrorgen and uniongen). The addressing is normally the same in both cases.