The idea behind this post comes from a challenge I created for the reverse category of the Jeanne d’Hack CTF 2026. The goal was a set of challenges with a retro video game theme that would be accessible to newcomers while having a nice-looking UI (based on Ncurses) to stand out from the traditional crackme.
I wanted to hide the UI implementation from players so they wouldn’t waste time reversing
pointless rendering code and could instead focus on the “real” challenge. My initial idea
was to ship the UI as a separate shared library, but that has two problems: players can
trivially reverse the .so file (yes, reverse engineers can be stubborn sometimes), and
they need the right version of libncurses.so installed on their system.
Compiling everything into a single static binary solves both, but then the binary is
bloated with library code that drowns the actual challenge logic.
My final design was to build an engine: a static binary embedding all dependencies that
dynamically loads each challenge level (a .so file) at runtime and exposes a set of
UI functions to it. The levels call functions like window_msg or window_prompt,
but those symbols do not exist in any shared library on the system. They live inside
the engine itself. The only way to make this work is to act as the loader: find where
the level expects those functions to be patched in, and write our own pointers there.
But how does a loader actually work?
To become the loader you have to understand the loader#
Programs come in two forms: static and dynamic. In a static binary, every function
is compiled into the same address space and calls are resolved at link time. This produces
fast, self-contained executables at the cost of size. If every program on a system
statically linked libc, the code would be duplicated thousands of times on disk and in
memory.
Dynamic executables solve this by splitting code into shared libraries (.so files).
At load time, the dynamic loader (ld-linux.so) maps those libraries into the process and
fixes up any unresolved references. To avoid the cost of resolving every symbol upfront,
most binaries use lazy binding: a symbol is only resolved the first time it is called.
This is orchestrated by two structures in the binary:
- PLT (Procedure Linkage Table): a small trampoline stub for each imported function.
- GOT (Global Offset Table): a table of pointers, one per imported function, initially pointing back to the PLT resolver.
Here is how the first call to an imported function is resolved:
sequenceDiagram
participant Code as Caller
participant PLT as PLT stub
participant GOT as GOT entry
participant Ld as ld-linux.so
participant Lib as libc.so
Note over GOT: Initially points back to PLT resolver
Code->>PLT: call printf@plt
PLT->>GOT: jmp *GOT[printf]
GOT-->>PLT: (redirects to resolver on first call)
PLT->>Ld: push reloc_index, jmp _dl_runtime_resolve
Ld->>Lib: search for "printf" in loaded libraries
Ld->>GOT: write address of printf into GOT[printf]
Ld->>Lib: jmp printf
Note over GOT: Subsequent calls jump directly to printf
On all subsequent calls, the PLT stub jumps through the GOT directly to the resolved function and the loader is never involved again.
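To make the mechanism concrete, here is a minimal simulation of lazy binding in plain C. It is only a model of the idea, not glibc's actual code: a single function pointer plays the role of one GOT slot, and every name in it (got_slot, resolve_and_call, call_via_plt, real_double) is hypothetical.

```c
/* Hypothetical names throughout: this models the idea, not glibc's code. */
typedef int (*fn_t)(int);

static int resolver_calls = 0;

/* The "library" function we eventually want to reach. */
static int real_double(int x) { return 2 * x; }

static int resolve_and_call(int x);

/* One simulated GOT slot; it starts out pointing at the resolver. */
static fn_t got_slot = resolve_and_call;

/* Plays the role of _dl_runtime_resolve: look the symbol up,
 * patch the slot, then hand control to the real function. */
static int resolve_and_call(int x) {
    resolver_calls++;
    got_slot = real_double;
    return got_slot(x);
}

/* Plays the role of the PLT stub: always an indirect call through the slot. */
static int call_via_plt(int x) { return got_slot(x); }
```

The first call to call_via_plt goes through the resolver, which rewrites the slot. Every later call lands directly in real_double, and nothing stops other code from writing a different pointer into got_slot first.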
The linker (ld) runs at build time to produce the binary and set up the PLT/GOT
structures, while the loader (ld-linux.so) runs at runtime to fill those structures
in with actual addresses.
This mechanism makes it possible to change the behavior of a running program simply by writing a new function pointer into a GOT entry, which is exactly what we are going to do.
Becoming the loader#

Okay, what’s the plan? The engine needs to intercept symbol resolution for the level’s API calls and redirect them to its own implementations, before the level ever executes.
Here are the steps:
1. Parse the ELF file on disk to extract the list of dynamic relocations (.rela.plt) and their associated symbol names.
2. Load the library with dlopen(RTLD_LAZY). The RTLD_LAZY flag is critical: it tells the loader not to resolve symbols at load time. Our API functions don’t exist in any library, so RTLD_NOW would fail immediately with “symbol not found”.
3. Find the runtime address of .got.plt using dl_iterate_phdr. We can’t use the static address from the ELF because ASLR randomizes where the library is mapped.
4. Patch each GOT entry for the API functions by writing our engine function pointers directly into the table.
5. Call enter_level and hand control to the challenge.
Implementation#
All the code described here is available under the AGPL license in this repository, under reverse/jdhack-rpg/src/level_loader.c.
Parsing the ELF file#
Before loading the library, we open it as a plain file and manually parse the sections we care about. The ELF format organizes a binary into sections, each described by an entry in the section header table. The sections relevant to us are:
graph TD
ELF[ELF File] --> EH["ELF Header\n(e_shoff -> section table\ne_shstrndx -> name strings)"]
EH --> SHT[Section Header Table]
SHT --> SHSTR[".shstrtab\nsection name strings"]
SHT --> DYNSYM[".dynsym\ndynamic symbol table"]
SHT --> DYNSTR[".dynstr\nsymbol name strings"]
SHT --> RELAPLT[".rela.plt\nPLT relocations"]
SHT --> GOTPLT[".got.plt\nGlobal Offset Table"]
DYNSYM -->|"sh_link (index)"| DYNSTR
RELAPLT -->|"r_info[sym] (index)"| DYNSYM
The parsing starts by reading the ELF header and section header table:
static Elf64_Ehdr *read_elf_header(int fd) { ... }
static Elf64_Shdr *read_section_table(int fd, Elf64_Ehdr *hdr) { ... }
static char *read_section(int32_t fd, Elf64_Shdr *sh) { ... }

These are straightforward read() calls into the file at the offsets given by the header.
read_section in particular is a generic helper that allocates a buffer and reads any
section by its Elf64_Shdr descriptor.
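Since the bodies above are elided, here is one possible shape for the first two helpers. This is a sketch under assumptions, not the code from the repository: the _sketch names and the read_exact helper are hypothetical, and it uses pread() rather than seek-then-read so that the file offset never has to be tracked.

```c
#define _POSIX_C_SOURCE 200809L
#include <elf.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly `len` bytes at offset `off`; 0 on success, -1 otherwise. */
static int read_exact(int fd, void *buf, size_t len, off_t off) {
    char *p = buf;
    while (len > 0) {
        ssize_t n = pread(fd, p, len, off);
        if (n <= 0)
            return -1; /* I/O error or unexpected end of file */
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical stand-in for read_elf_header; the caller frees the result. */
static Elf64_Ehdr *read_elf_header_sketch(int fd) {
    Elf64_Ehdr *eh = malloc(sizeof *eh);
    if (eh == NULL)
        return NULL;
    /* Reject anything that is not a 64-bit ELF with the expected layout. */
    if (read_exact(fd, eh, sizeof *eh, 0) != 0 ||
        memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
        eh->e_ident[EI_CLASS] != ELFCLASS64 ||
        eh->e_shentsize != sizeof(Elf64_Shdr)) {
        free(eh);
        return NULL;
    }
    return eh;
}

/* Hypothetical stand-in for read_section_table; the caller frees the result. */
static Elf64_Shdr *read_section_table_sketch(int fd, const Elf64_Ehdr *eh) {
    size_t len = (size_t)eh->e_shnum * sizeof(Elf64_Shdr);
    Elf64_Shdr *tbl = malloc(len);
    if (tbl != NULL && read_exact(fd, tbl, len, (off_t)eh->e_shoff) != 0) {
        free(tbl);
        tbl = NULL;
    }
    return tbl;
}
```

Validating the magic bytes and the class up front keeps later casts to Elf64_* structures from silently reading garbage when the file is not what was expected.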
The interesting work happens in get_dyn_relocations:
static LinkedList *get_dyn_relocations(int fd, Elf64_Ehdr *eh,
                                       Elf64_Shdr *sh_table) {
    uint64_t got_plt_loadaddr = get_got_plt_loadaddr(fd, eh, sh_table);
    // Section name strings (.shstrtab), needed to find .rela.plt by name
    char *sh_str = read_section(fd, &sh_table[eh->e_shstrndx]);

    // 1. Find .dynsym and its linked string table (.dynstr)
    Elf64_Sym *sym_tbl = NULL;
    char *str_tbl = NULL;
    for (uint32_t i = 0; i < eh->e_shnum; i++) {
        if (sh_table[i].sh_type == SHT_DYNSYM) {
            sym_tbl = (Elf64_Sym *)read_section(fd, &sh_table[i]);
            str_tbl = read_section(fd, &sh_table[sh_table[i].sh_link]);
            break;
        }
    }

    // 2. Find .rela.plt
    Elf64_Rela *rela = NULL;
    Elf64_Shdr *relaplt = NULL;
    for (uint32_t i = 0; i < eh->e_shnum; i++) {
        if (!strcmp(".rela.plt", sh_str + sh_table[i].sh_name)) {
            rela = (Elf64_Rela *)read_section(fd, &sh_table[i]);
            relaplt = &sh_table[i];
            break;
        }
    }

    // 3. For each relocation entry, extract the symbol name and GOT offset
    //    (relocs is the LinkedList being built; its creation is elided here)
    for (size_t j = 0; j < relaplt->sh_size / sizeof(Elf64_Rela); j++) {
        char *name = str_tbl + sym_tbl[ELF64_R_SYM(rela[j].r_info)].st_name;
        uint64_t offset = rela[j].r_offset - got_plt_loadaddr;
        list_add(relocs, create_reloc(name, j, offset, rela[j].r_info));
    }
    ...
}

A few things worth noting:
- Resolving symbol names: Each .rela.plt entry stores r_info, which encodes a symbol index via the ELF64_R_SYM macro. That index points into .dynsym, whose st_name field is an offset into .dynstr (the string table). Chaining these two indirections gives us the symbol name as a plain C string.
- The GOT offset trick: r_offset in a relocation entry is the link-time virtual address of the GOT slot. Since ASLR randomizes the load address, this value is useless at runtime. Instead, we compute r_offset - got_plt_loadaddr, which is the byte offset of the slot within .got.plt. At runtime we will add the actual runtime base of .got.plt to recover the correct address.
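The two points above boil down to a pointer chase and a subtraction. The sketch below isolates them into two small helpers that work on tables already in memory; the helper names are hypothetical, while the types and the ELF64_R_SYM macro come from <elf.h>.

```c
#include <elf.h>
#include <stdint.h>

/* Follow .rela.plt -> .dynsym -> .dynstr to recover the symbol name.
 * dynsym and dynstr are the section contents already loaded in memory. */
static const char *reloc_symbol_name(const Elf64_Rela *rel,
                                     const Elf64_Sym *dynsym,
                                     const char *dynstr) {
    return dynstr + dynsym[ELF64_R_SYM(rel->r_info)].st_name;
}

/* Byte offset of the relocated slot inside .got.plt, given the
 * link-time address of that section. */
static uint64_t reloc_got_offset(const Elf64_Rela *rel,
                                 uint64_t got_plt_vaddr) {
    return rel->r_offset - got_plt_vaddr;
}
```

For instance, if .got.plt is linked at 0x4000 and a relocation has r_offset 0x4018, the helper returns 0x18. On x86-64 that is the first slot after the three reserved entries at the start of .got.plt.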
Loading the library with RTLD_LAZY#
void *handle = dlopen(level, RTLD_LAZY);

RTLD_LAZY defers function symbol resolution until each function is actually called. This is not
just a performance choice here, it is a requirement. Our API functions (window_msg,
window_prompt, etc.) do not exist in any shared library on the system. With RTLD_NOW, the
loader would attempt to resolve all symbols immediately and fail with an error. With
RTLD_LAZY, the GOT entries are initially set to point back to the PLT resolver stub,
giving us the window we need to overwrite them ourselves before any call is made.
Note that this relies on the level not being linked with -z now and on LD_BIND_NOW being
unset, since either one forces immediate binding regardless of the flag passed to dlopen.
Finding the GOT at runtime#
After dlopen, the library is mapped somewhere in the engine’s address space but the
exact address depends on ASLR. We need the runtime address of the library’s .got.plt
section. dl_iterate_phdr (a glibc extension, not part of POSIX) lets us walk all currently loaded shared objects
and inspect their program headers:
static int dl_callback(struct dl_phdr_info *info, size_t size, void *data) {
    dl_iterator_t *res = (dl_iterator_t *)data;
    if (strcmp(info->dlpi_name, res->library) != 0)
        return 0; // not the library we're looking for
    for (size_t j = 0; j < info->dlpi_phnum; j++) {
        if (info->dlpi_phdr[j].p_type == PT_DYNAMIC) {
            ElfW(Dyn) *dyn =
                (ElfW(Dyn) *)(info->dlpi_addr + info->dlpi_phdr[j].p_vaddr);
            while (dyn->d_tag != DT_NULL) {
                if (dyn->d_tag == DT_PLTGOT) {
                    // already a runtime address: glibc relocates this entry in place
                    res->ptr = dyn->d_un.d_ptr;
                    return 1;
                }
                dyn++;
            }
        }
    }
    return 0;
}

static uint64_t get_got_plt_runtime_addr(const char *libname) {
    dl_iterator_t res = { .library = libname, .ptr = 0 };
    dl_iterate_phdr(dl_callback, &res);
    return res.ptr;
}

The callback matches the library by name, then walks the PT_DYNAMIC segment, a list of
(tag, value) pairs describing the library’s dynamic linking metadata. The DT_PLTGOT
entry holds exactly what we need: the runtime address of .got.plt. This works because
glibc adds the load base to that entry when it maps the library; with a loader that leaves
the dynamic section untouched, dlpi_addr would have to be added by hand.
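One fragile spot in the lookup is the exact strcmp on dlpi_name: the loader records the path it actually used, and the main program is reported with an empty name. As a hedged alternative, the sketch below matches on a path suffix instead and reports the load base of the matching object. The find_ctx struct and the function names are hypothetical and are not part of the article's code.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result holder; not the article's dl_iterator_t. */
typedef struct {
    const char *suffix; /* match objects whose path ends with this */
    int seen;           /* number of objects visited */
    int found;          /* set to 1 when a match is found */
    ElfW(Addr) base;    /* load base (dlpi_addr) of the match */
} find_ctx;

static int ends_with(const char *s, const char *suffix) {
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

static int find_cb(struct dl_phdr_info *info, size_t size, void *data) {
    (void)size;
    find_ctx *ctx = data;
    ctx->seen++;
    /* dlpi_name is "" for the main program, so skip empty names. */
    if (info->dlpi_name[0] != '\0' &&
        ends_with(info->dlpi_name, ctx->suffix)) {
        ctx->found = 1;
        ctx->base = info->dlpi_addr;
        return 1; /* a non-zero return stops the iteration */
    }
    return 0;
}

static find_ctx find_loaded_object(const char *suffix) {
    find_ctx ctx = { .suffix = suffix };
    dl_iterate_phdr(find_cb, &ctx);
    return ctx;
}
```

Checking the found flag before using the result also avoids computing slot addresses from a zero base when the library was not located.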
Patching the GOT#
With the runtime base of .got.plt and the per-symbol offsets we computed during parsing,
we can now overwrite each GOT entry. The patching loop iterates over API_functions,
a static array defined in API_export.h and generated at compile time by scripts,
mapping each function name to its pointer inside the engine:
uint64_t addr = get_got_plt_runtime_addr(level);
for (size_t i = 0; i < (sizeof(API_functions) / sizeof(*API_functions)); ++i) {
    Reloc *r = list_search(relocs, (char *)API_functions[i].name,
                           (int (*)(void *, void *))compare_reloc);
    if (r != NULL) {
        uint64_t *gotaddr = (uint64_t *)(addr + r->offset);
        *gotaddr = (uint64_t)API_functions[i].func;
    }
}

addr + r->offset reconstructs the exact address of the GOT slot using the runtime base
from dl_iterate_phdr and the relative offset we saved during ELF parsing. Writing our
function pointer here is a direct memory write. No loader magic, just a uint64_t
assignment to the right address.
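The same loop can be tried in isolation against an ordinary array standing in for .got.plt. This is a self-contained model, not the engine's code: api_entry, reloc_entry, demo_api and patch_slots are hypothetical miniature versions of API_functions, Reloc and the patching loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical miniature versions of the engine's tables. */
typedef struct { const char *name; void (*func)(void); } api_entry;
typedef struct { const char *name; uint64_t offset; } reloc_entry;

static void demo_window_msg(void) {}
static void demo_window_prompt(void) {}

static const api_entry demo_api[] = {
    { "window_msg", demo_window_msg },
    { "window_prompt", demo_window_prompt },
};

/* Write every known API pointer into its slot.
 * Returns how many slots were patched. */
static size_t patch_slots(uintptr_t got_base, const reloc_entry *relocs,
                          size_t nrelocs) {
    size_t patched = 0;
    for (size_t i = 0; i < sizeof demo_api / sizeof *demo_api; i++) {
        for (size_t j = 0; j < nrelocs; j++) {
            if (strcmp(relocs[j].name, demo_api[i].name) == 0) {
                /* base + offset is the address of the slot to overwrite */
                uintptr_t *slot = (uintptr_t *)(got_base + relocs[j].offset);
                *slot = (uintptr_t)demo_api[i].func;
                patched++;
                break;
            }
        }
    }
    return patched;
}
```

Relocations whose names are not in the API table (libc imports, for example) are left untouched, so the real loader still resolves them lazily on first call.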
Putting it all together#
The full run_level function assembles all the pieces above:
int run_level(const char *level, Player *player) {
    // 1. Parse ELF relocations before loading
    LinkedList *relocs = get_relocations_from_file(level);
    if (relocs == NULL) {
        LOG_ERROR("Failed to parse relocations");
        return -1;
    }

    // 2. Load with lazy binding so unresolved symbols don't crash yet
    void *handle = dlopen(level, RTLD_LAZY);
    if (!handle) {
        LOG_ERROR("Error when loading: %s\n", dlerror());
        list_map(relocs, (void *(*)(void *))dispose_reloc);
        list_dispose(relocs);
        return -1;
    }

    // 3. Find .got.plt at runtime and patch all API entries
    uint64_t addr = get_got_plt_runtime_addr(level);
    if (addr == 0) {
        // Library not matched by name: patching would write near address 0
        LOG_ERROR("Failed to locate .got.plt");
        list_map(relocs, (void *(*)(void *))dispose_reloc);
        list_dispose(relocs);
        dlclose(handle);
        return -1;
    }
    for (size_t i = 0; i < (sizeof(API_functions) / sizeof(*API_functions)); ++i) {
        Reloc *r = list_search(relocs, (char *)API_functions[i].name,
                               (int (*)(void *, void *))compare_reloc);
        if (r != NULL) {
            uint64_t *gotaddr = (uint64_t *)(addr + r->offset);
            *gotaddr = (uint64_t)API_functions[i].func;
        }
    }

    // 4. Run the level
    int result = -1;
    int (*enter_level)(Player *) = dlsym(handle, "enter_level");
    if (enter_level != NULL) {
        result = enter_level(player);
    }
    int (*leave_level)(void) = dlsym(handle, "leave_level");
    if (leave_level != NULL) {
        leave_level();
    }

    list_map(relocs, (void *(*)(void *))dispose_reloc);
    list_dispose(relocs);
    dlclose(handle);
    return result;
}

Conclusion#
Even though the use case is somewhat niche, I found it very instructive to reimplement parts of the loader from scratch. It forces you to understand the ELF format concretely rather than just knowing that “the loader resolves symbols somehow.”
I also find it funny that GOT overwrites are a classic primitive in CTF PWN challenges.
In an exploit, an attacker patches a GOT entry to redirect a call to system() and get a
shell. Here, we use the exact same technique constructively, writing our own function
pointers into the level’s GOT to make it call back into the engine. The mechanics are
identical; only the intent differs. This time it’s the CTF author using it against the players.
Anyway, I hope you learned something from this article, and if you didn’t, I at least hope you enjoyed reading it. I’m sure there are better ways to solve my original problem, so feel free to share your feedback and ideas!
