Look at me, I'm the loader now!

The idea behind this post comes from a challenge I created for the reverse engineering category of the Jeanne d’Hack CTF 2026. The goal was a set of challenges with a retro video game theme that would be accessible to newcomers while having a nice-looking UI (based on Ncurses) to stand out from the traditional crackme.

I wanted to hide the UI implementation from players so they wouldn’t waste time reversing pointless rendering code and could instead focus on the “real” challenge. My initial idea was to ship the UI as a separate shared library, but that has two problems: players can trivially reverse the .so file (yes, reverse engineers can be stubborn sometimes), and they need the right version of libncurses.so installed on their system. Compiling everything into a single static binary solves both, but then the binary is bloated with library code that drowns the actual challenge logic.

My final design was to build an engine: a static binary embedding all dependencies that dynamically loads each challenge level (a .so file) at runtime and exposes a set of UI functions to it. The levels call functions like window_msg or window_prompt, but those symbols do not exist in any shared library on the system. They live inside the engine itself. The only way to make this work is to act as the loader: find where the level expects those functions to be patched in, and write our own pointers there.

But how does a loader actually work?

To become the loader you have to understand the loader

Programs come in two forms: static and dynamic. In a static binary, every function, library code included, is linked into the executable itself and calls are resolved at link time. This produces fast, self-contained executables at the cost of size. If every program on a system statically linked libc, its code would be duplicated thousands of times on disk and in memory.

Dynamic executables solve this by splitting code into shared libraries (.so files). At load time, the dynamic loader (ld-linux.so) maps those libraries into the process and fixes up any unresolved references. To avoid the cost of resolving every symbol upfront, binaries can use lazy binding: a function symbol is only resolved the first time it is called. This is orchestrated by two structures in the binary:

PLT (Procedure Linkage Table): a small trampoline stub for each imported function.
GOT (Global Offset Table): a table of pointers, one per imported function, initially pointing back to the PLT resolver.

Here is how the first call to an imported function is resolved:

sequenceDiagram
    participant Code as Caller
    participant PLT  as PLT stub
    participant GOT  as GOT entry
    participant Ld   as ld-linux.so
    participant Lib  as libc.so
    Note over GOT: Initially points back to PLT resolver
    Code->>PLT:  call printf@plt
    PLT->>GOT:   jmp *GOT[printf]
    GOT-->>PLT:  (redirects to resolver on first call)
    PLT->>Ld:    push reloc_index<br/>jmp _dl_runtime_resolve
    Ld->>Lib:    search for "printf" in loaded libraries
    Ld->>GOT:    write address of printf into GOT[printf]
    Ld->>Lib:    jmp printf
    Note over GOT: Subsequent calls jump directly to printf

On all subsequent calls, the PLT stub jumps through the GOT directly to the resolved function and the loader is never involved again.
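To make the sequence above concrete, here is a toy model of lazy binding in plain C. It is only a sketch: a single function pointer plays the role of the GOT entry, and every name in it (`got_slot`, `resolver_stub`, `call_via_plt`, `real_double`) is made up for illustration rather than taken from the real loader.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*fn_t)(int);

int resolve_count = 0; /* how many times the fake "loader" had to run */

static int real_double(int x) { return 2 * x; } /* plays the role of printf */
static int resolver_stub(int x);

/* The "GOT entry": it starts out pointing back at the resolver. */
static fn_t got_slot = resolver_stub;

/* The "PLT stub": an indirect jump through the GOT entry, nothing more. */
int call_via_plt(int x) {
  assert(got_slot != NULL);
  return got_slot(x);
}

/* Reached on the first call only: resolve, patch the slot, run the target. */
static int resolver_stub(int x) {
  resolve_count++;
  got_slot = real_double; /* the write that the real resolver does to the GOT */
  return got_slot(x);
}
```

Calling `call_via_plt` twice runs the resolver once; the second call goes straight to `real_double`. Anything else that writes to `got_slot` before the first call decides where that call lands.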

Beware: The linker and the loader are two different things.

The linker (ld) runs at build time to produce the binary and set up the PLT/GOT structures, while the loader (ld-linux.so) runs at runtime to fill those structures in with actual addresses.

This mechanism allows patching the behavior of a running program simply by writing a new function pointer into a GOT entry, which is exactly what we are going to do.

Becoming the loader

Okay, what’s the plan? The engine needs to intercept symbol resolution for the level’s API calls and redirect them to its own implementations, before the level ever executes.

Here are the steps:

1. Parse the ELF file on disk to extract the list of dynamic relocations (.rela.plt) and their associated symbol names.
2. Load the library with dlopen(RTLD_LAZY). The RTLD_LAZY flag is critical: it tells the loader not to resolve function symbols at load time. Our API functions don’t exist in any library, so RTLD_NOW would fail immediately with “symbol not found”.
3. Find the runtime address of .got.plt using dl_iterate_phdr. We can’t use the address stored in the ELF file as-is because the library’s load base is only chosen at runtime (and randomized by ASLR).
4. Patch each GOT entry for the API functions by writing our engine function pointers directly into the table.
5. Call enter_level and hand control to the challenge.

Implementation

All the code described here is available under the AGPL license in this repository, under reverse/jdhack-rpg/src/level_loader.c.

Parsing the ELF file

Before loading the library, we open it as a plain file and manually parse the sections we care about. The ELF format organizes a binary into sections, each described by an entry in the section header table. The sections relevant to us are:

graph TD
    ELF[ELF File] --> EH["ELF Header\n(e_shoff -> section table\ne_shstrndx -> name strings)"]
    EH --> SHT[Section Header Table]
    SHT --> SHSTR[".shstrtab\nsection name strings"]
    SHT --> DYNSYM[".dynsym\ndynamic symbol table"]
    SHT --> DYNSTR[".dynstr\nsymbol name strings"]
    SHT --> RELAPLT[".rela.plt\nPLT relocations"]
    SHT --> GOTPLT[".got.plt\nGlobal Offset Table"]
    DYNSYM -->|"sh_link (index)"| DYNSTR
    RELAPLT -->|"r_info[sym] (index)"| DYNSYM

The parsing starts by reading the ELF header and section header table:

static Elf64_Ehdr *read_elf_header(int fd) { ... }
static Elf64_Shdr *read_section_table(int fd, Elf64_Ehdr *hdr) { ... }
static char *read_section(int32_t fd, Elf64_Shdr *sh) { ... }

These are straightforward read() calls into the file at the offsets given by the header. read_section in particular is a generic helper that allocates a buffer and reads any section by its Elf64_Shdr descriptor.
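As a rough illustration of what such helpers do, the sketch below reads the ELF header, the section header table, and .shstrtab, then looks a section up by name. It is not the code from the repository: `read_at` and `elf_has_section` are hypothetical names, it uses pread() rather than read(), and it assumes a 64-bit ELF file on Linux.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for pread */
#endif
#include <assert.h>
#include <elf.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes at file offset off into a fresh buffer (NULL on error). */
static void *read_at(int fd, off_t off, size_t len) {
  void *buf = malloc(len ? len : 1);
  if (buf == NULL) return NULL;
  if (pread(fd, buf, len, off) != (ssize_t)len) {
    free(buf);
    return NULL;
  }
  return buf;
}

/* Returns 1 if path is a 64-bit ELF with a section named wanted,
 * 0 if the section is missing, -1 on I/O or format errors. */
int elf_has_section(const char *path, const char *wanted) {
  assert(path != NULL && wanted != NULL);
  Elf64_Ehdr *eh = NULL;
  Elf64_Shdr *sh = NULL;
  char *names = NULL;
  size_t names_len = 0;
  int result = -1;

  int fd = open(path, O_RDONLY);
  if (fd < 0) return -1;

  /* The ELF header sits at offset 0; validate magic, class and indices. */
  eh = read_at(fd, 0, sizeof(*eh));
  if (eh == NULL || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
      eh->e_ident[EI_CLASS] != ELFCLASS64 || eh->e_shstrndx >= eh->e_shnum)
    goto out;

  /* e_shoff locates the section header table, e_shstrndx the name strings. */
  sh = read_at(fd, (off_t)eh->e_shoff, (size_t)eh->e_shnum * sizeof(*sh));
  if (sh == NULL) goto out;
  names_len = sh[eh->e_shstrndx].sh_size;
  names = read_at(fd, (off_t)sh[eh->e_shstrndx].sh_offset, names_len);
  if (names == NULL || names_len == 0 || names[names_len - 1] != '\0')
    goto out;

  result = 0;
  for (size_t i = 0; i < eh->e_shnum; i++) {
    /* sh_name is an offset into .shstrtab, not a pointer. */
    if (sh[i].sh_name < names_len &&
        strcmp(names + sh[i].sh_name, wanted) == 0) {
      result = 1;
      break;
    }
  }

out:
  free(names);
  free(sh);
  free(eh);
  close(fd);
  return result;
}
```

For instance, `elf_has_section("/proc/self/exe", ".text")` checks the running program's own file. The bounds checks on `e_shstrndx` and `sh_name` matter even in a CTF engine, since the level files are exactly the kind of input someone might tamper with.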

The interesting work happens in get_dyn_relocations:

static LinkedList *get_dyn_relocations(int fd, Elf64_Ehdr *eh,
                                       Elf64_Shdr *sh_table) {
  uint64_t got_plt_loadaddr = get_got_plt_loadaddr(fd, eh, sh_table);

  // 1. Find .dynsym and its linked string table (.dynstr)
  Elf64_Sym *sym_tbl = NULL;
  char *str_tbl = NULL;
  for (uint32_t i = 0; i < eh->e_shnum; i++) {
    if (sh_table[i].sh_type == SHT_DYNSYM) {
      sym_tbl = (Elf64_Sym *)read_section(fd, &sh_table[i]);
      str_tbl = read_section(fd, &sh_table[sh_table[i].sh_link]);
      break;
    }
  }

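  // 2. Find .rela.plt by name; section names live in .shstrtab
  char *sh_str = read_section(fd, &sh_table[eh->e_shstrndx]);
  Elf64_Rela *rela = NULL;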
  Elf64_Shdr *relaplt = NULL;
  for (uint32_t i = 0; i < eh->e_shnum; i++) {
    if (!strcmp(".rela.plt", sh_str + sh_table[i].sh_name)) {
      rela = (Elf64_Rela *)read_section(fd, &sh_table[i]);
      relaplt = &sh_table[i];
      break;
    }
  }

  // 3. For each relocation entry, extract the symbol name and GOT offset
  for (size_t j = 0; j < relaplt->sh_size / sizeof(Elf64_Rela); j++) {
    char *name = str_tbl + sym_tbl[ELF64_R_SYM(rela[j].r_info)].st_name;
    uint64_t offset = rela[j].r_offset - got_plt_loadaddr;
    list_add(relocs, create_reloc(name, j, offset, rela[j].r_info));
  }
  ...
}

A few things worth noting:

Resolving symbol names: Each .rela.plt entry stores r_info, which encodes a symbol index via the ELF64_R_SYM macro. That index points into .dynsym, whose st_name field is an offset into .dynstr (the string table). Chaining these two indirections gives us the symbol name as a plain C string.
The GOT offset trick: r_offset in a relocation entry is the link-time virtual address of the GOT slot. Since the library’s load base is only chosen at runtime (and randomized by ASLR), this value cannot be used as-is. Instead, we compute r_offset - got_plt_loadaddr, where got_plt_loadaddr is the link-time address of .got.plt; the result is the byte offset of the slot within .got.plt. At runtime we add the actual runtime address of .got.plt to recover the slot’s address.
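The two points above can be checked on made-up data without touching a real file. The sketch below builds a tiny fake .dynstr, .dynsym and .rela.plt in memory, then resolves each relocation to a name and a slot offset. `SlotInfo`, `resolve_slots`, `find_slot` and the `demo_*` tables are hypothetical, and the addresses (0x4000 for .got.plt) are invented for the example.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result record: symbol name plus slot offset in .got.plt. */
typedef struct {
  const char *name; /* points into the .dynstr buffer */
  uint64_t got_off; /* byte offset of the slot inside .got.plt */
} SlotInfo;

/* Made-up tables: two imports, .got.plt linked at 0x4000. On x86-64 the
 * first three 8-byte slots are reserved, so function slots start at 0x18. */
const char demo_strtab[] = "\0window_msg\0window_prompt";
const Elf64_Sym demo_symtab[] = {
    {.st_name = 0},  /* index 0 is the reserved null symbol */
    {.st_name = 1},  /* "window_msg" */
    {.st_name = 12}, /* "window_prompt" */
};
const Elf64_Rela demo_rela[] = {
    {.r_offset = 0x4018, .r_info = ELF64_R_INFO(1, R_X86_64_JUMP_SLOT)},
    {.r_offset = 0x4020, .r_info = ELF64_R_INFO(2, R_X86_64_JUMP_SLOT)},
};

/* r_info -> .dynsym index -> st_name -> string in .dynstr, and
 * r_offset minus the link-time .got.plt address -> slot offset. */
size_t resolve_slots(const Elf64_Rela *rela, size_t n, const Elf64_Sym *symtab,
                     const char *strtab, uint64_t got_plt_vaddr,
                     SlotInfo *out) {
  assert(rela != NULL && symtab != NULL && strtab != NULL && out != NULL);
  for (size_t j = 0; j < n; j++) {
    uint64_t sym_idx = ELF64_R_SYM(rela[j].r_info);
    out[j].name = strtab + symtab[sym_idx].st_name;
    out[j].got_off = rela[j].r_offset - got_plt_vaddr;
  }
  return n;
}

/* Linear lookup by symbol name; NULL when the name is not imported. */
const SlotInfo *find_slot(const SlotInfo *slots, size_t n, const char *name) {
  for (size_t j = 0; j < n; j++)
    if (strcmp(slots[j].name, name) == 0) return &slots[j];
  return NULL;
}
```

With these tables, window_msg resolves to offset 0x4018 - 0x4000 = 0x18 and window_prompt to 0x20. If .got.plt ended up at, say, 0x7f0000004000 at runtime, the first slot would be at 0x7f0000004018.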

Loading the library with RTLD_LAZY

void *handle = dlopen(level, RTLD_LAZY);

RTLD_LAZY defers function symbol resolution until each function is actually called. This is not just a performance choice here; it is a requirement. Our API functions (window_msg, window_prompt, etc.) do not exist in any shared library on the system. With RTLD_NOW, the loader would attempt to resolve all symbols immediately and fail with an error. With RTLD_LAZY, the GOT entries are initially set to point back to the PLT resolver stub, giving us the window we need to overwrite them ourselves before any call is made. Note that this relies on the level itself not being linked with -z now, which forces immediate binding regardless of the dlopen flag.

Finding the GOT at runtime

After dlopen, the library is mapped somewhere in the engine’s address space, but the exact address depends on ASLR. We need the runtime address of the library’s .got.plt section. The dl_iterate_phdr function (a widely available extension, not part of POSIX) lets us walk all currently loaded shared objects and inspect their program headers:

static int dl_callback(struct dl_phdr_info *info, size_t size, void *data) {
  dl_iterator_t *res = (dl_iterator_t *)data;

  if (strcmp(info->dlpi_name, res->library) != 0)
    return 0;  // not the library we're looking for

  for (size_t j = 0; j < info->dlpi_phnum; j++) {
    if (info->dlpi_phdr[j].p_type == PT_DYNAMIC) {
      ElfW(Dyn) *dyn =
          (ElfW(Dyn) *)(info->dlpi_addr + info->dlpi_phdr[j].p_vaddr);
      while (dyn->d_tag != DT_NULL) {
        if (dyn->d_tag == DT_PLTGOT) {
          res->ptr = dyn->d_un.d_ptr;  // runtime address of .got.plt
          return 1;
        }
        dyn++;
      }
    }
  }
  return 0;
}

static uint64_t get_got_plt_runtime_addr(const char *libname) {
  dl_iterator_t res = { .library = libname, .ptr = 0 };
  dl_iterate_phdr(dl_callback, &res);
  return res.ptr;
}

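The callback matches the library by name, then walks the PT_DYNAMIC segment, a list of (tag, value) pairs describing the library’s dynamic linking metadata. The DT_PLTGOT entry holds exactly what we need: the address of .got.plt. On glibc for x86-64, the loader has already relocated this entry in place by the time we read it, so d_ptr is the runtime address; other C libraries may leave the link-time value there, in which case info->dlpi_addr has to be added.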
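To experiment with this without a level file, the sketch below walks every object loaded into the current process and records whether it exposes a DT_PLTGOT entry. `WalkStats`, `walk_cb` and `walk_loaded_objects` are hypothetical names. The "add the base if the value looks unrelocated" step is a heuristic added here as an assumption, not something the engine's code does.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* dl_iterate_phdr is an extension */
#endif
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  size_t objects;     /* loaded objects visited */
  size_t with_pltgot; /* objects that expose a DT_PLTGOT entry */
  uintptr_t last_got; /* GOT address found for the last such object */
} WalkStats;

static int walk_cb(struct dl_phdr_info *info, size_t size, void *data) {
  (void)size;
  assert(info != NULL && data != NULL);
  WalkStats *st = data;
  st->objects++;

  for (size_t j = 0; j < info->dlpi_phnum; j++) {
    if (info->dlpi_phdr[j].p_type != PT_DYNAMIC) continue;
    /* p_vaddr is relative to the load base, so add dlpi_addr. */
    const ElfW(Dyn) *dyn =
        (const ElfW(Dyn) *)(info->dlpi_addr + info->dlpi_phdr[j].p_vaddr);
    for (; dyn->d_tag != DT_NULL; dyn++) {
      if (dyn->d_tag != DT_PLTGOT) continue;
      uintptr_t got = (uintptr_t)dyn->d_un.d_ptr;
      /* Heuristic (assumption): a value below the load base has not been
       * relocated by the loader, so add the base ourselves. */
      if (got < (uintptr_t)info->dlpi_addr) got += (uintptr_t)info->dlpi_addr;
      st->with_pltgot++;
      st->last_got = got;
      break;
    }
  }
  return 0; /* 0 keeps the iteration going; non-zero stops it */
}

WalkStats walk_loaded_objects(void) {
  WalkStats st = {0, 0, 0};
  dl_iterate_phdr(walk_cb, &st);
  return st;
}
```

Printing `info->dlpi_name` inside the callback is a quick way to see which string a given dlopen path ends up being registered under, which is what the strcmp in the real callback depends on.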

Patching the GOT

With the runtime base of .got.plt and the per-symbol offsets we computed during parsing, we can now overwrite each GOT entry. The patching loop iterates over API_functions, a static array defined in API_export.h and generated at compile time by scripts, mapping each function name to its pointer inside the engine:

uint64_t addr = get_got_plt_runtime_addr(level);

for (size_t i = 0; i < sizeof(API_functions) / sizeof(*API_functions); ++i) {
  Reloc *r = list_search(relocs, (char *)API_functions[i].name,
                         (int (*)(void *, void *))compare_reloc);
  if (r != NULL) {
    uint64_t *gotaddr = (uint64_t *)(addr + r->offset);
    *gotaddr = (uint64_t)API_functions[i].func;
  }
}

addr + r->offset reconstructs the exact address of the GOT slot using the runtime base from dl_iterate_phdr and the relative offset we saved during ELF parsing. Writing our function pointer here is a direct memory write. No loader magic, just a uint64_t assignment to the right address.
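The same base-plus-offset write can be tried on a fake table. In the sketch below, an ordinary array stands in for .got.plt and two small functions stand in for the engine API. `ApiEntry`, `RelocEntry`, `patch_got`, `fake_got` and the `engine_*` functions are all invented for this example, and it assumes 8-byte function pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int (*api_fn)(int);
_Static_assert(sizeof(api_fn) == 8, "sketch assumes 64-bit pointers");

/* Hypothetical engine functions standing in for window_msg and friends. */
static int engine_msg(int x) { return x + 1; }
static int engine_prompt(int x) { return x * 10; }
static int unresolved(int x) { (void)x; return -1; } /* "resolver" placeholder */

typedef struct { const char *name; api_fn func; } ApiEntry;
typedef struct { const char *name; uint64_t offset; } RelocEntry;

static const ApiEntry api_table[] = {
    {"window_msg", engine_msg},
    {"window_prompt", engine_prompt},
};

/* A fake .got.plt: three reserved slots, then one slot per import. */
api_fn fake_got[6] = {0, 0, 0, unresolved, unresolved, unresolved};
const RelocEntry fake_relocs[] = {
    {"window_msg", 0x18}, {"puts", 0x20}, {"window_prompt", 0x28},
};

/* Overwrite every slot whose symbol name appears in the API table.
 * Returns the number of slots written. */
size_t patch_got(uintptr_t got_base, const RelocEntry *relocs, size_t nrelocs) {
  assert(got_base != 0 && relocs != NULL); /* a failed lookup must not be patched */
  size_t patched = 0;
  for (size_t i = 0; i < sizeof(api_table) / sizeof(*api_table); i++) {
    for (size_t j = 0; j < nrelocs; j++) {
      if (strcmp(relocs[j].name, api_table[i].name) != 0) continue;
      api_fn *slot = (api_fn *)(got_base + relocs[j].offset);
      *slot = api_table[i].func; /* the whole trick: one pointer store */
      patched++;
    }
  }
  return patched;
}
```

After `patch_got((uintptr_t)fake_got, fake_relocs, 3)`, slots 3 and 5 point at the engine functions while the `puts` slot is left alone, which mirrors how imports that are not part of the API still go through the normal resolver. The assert on `got_base` is a reminder that a lookup returning 0 should stop the patching rather than be dereferenced.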

Putting it all together

The full run_level function assembles all the pieces above:

int run_level(const char *level, Player *player) {
  // 1. Parse ELF relocations before loading
  LinkedList *relocs = get_relocations_from_file(level);
  if (relocs == NULL) {
    LOG_ERROR("Failed to parse relocations");
    return -1;
  }

  // 2. Load with lazy binding so unresolved symbols don't crash yet
  void *handle = dlopen(level, RTLD_LAZY);
  if (!handle) {
    LOG_ERROR("Error when loading: %s\n", dlerror());
    list_map(relocs, (void *(*)(void *))dispose_reloc);
    list_dispose(relocs);
    return -1;
  }

  // 3. Find .got.plt at runtime and patch all API entries
  uint64_t addr = get_got_plt_runtime_addr(level);
  for (size_t i = 0; i < sizeof(API_functions) / sizeof(*API_functions); ++i) {
    Reloc *r = list_search(relocs, (char *)API_functions[i].name,
                           (int (*)(void *, void *))compare_reloc);
    if (r != NULL) {
      uint64_t *gotaddr = (uint64_t *)(addr + r->offset);
      *gotaddr = (uint64_t)API_functions[i].func;
    }
  }

  // 4. Run the level
  int result = -1;
  int (*enter_level)(Player *) = dlsym(handle, "enter_level");
  if (enter_level != NULL) {
    result = enter_level(player);
  }
  int (*leave_level)(void) = dlsym(handle, "leave_level");
  if (leave_level != NULL) {
    leave_level();
  }

  list_map(relocs, (void *(*)(void *))dispose_reloc);
  list_dispose(relocs);
  dlclose(handle);
  return result;
}

Conclusion

Even though the use case is somewhat niche, I found it very instructive to reimplement parts of the loader from scratch. It forces you to understand the ELF format concretely rather than just knowing that “the loader resolves symbols somehow.”

I also find it funny that GOT overwrites are a classic primitive in CTF PWN challenges. In an exploit, an attacker patches a GOT entry to redirect a call to system() and get a shell. Here, we use the exact same technique constructively, writing our own function pointers into the level’s GOT to make it call back into the engine. The mechanics are identical; only the intent differs. This time it’s the CTF author using it against the players.

Anyway, I hope you learned something from this article, and if you didn’t, I at least hope you enjoyed reading it. I’m sure there are better ways to solve my original problem, so feel free to share your feedback and ideas!
