Static pie linking a nolibc Rust binary
Something has been bugging me for a while with tiny-std,
if I try to compile executables created with them as -C target-feature=+crt-static
(statically link the C
-runtime),
it segfaults.
The purpose of creating tiny-std
was to avoid C
, but to get Rust
to link a binary statically, that flag needs
to be passed. -C target-feature=+crt-static -C relocation-model=static
does produce a valid binary though.
The default relocation-model for static binaries is -C relocation-model=pie
,
(at least for the target x86_64-unknown-linux-gnu
) so something about PIE
-executables created with tiny-std
fails,
in this writeup I'll go into the solution for that.
Static pie linking
Static pie linking is a combination of two concepts.
- Static linking, putting everything in the same place at compile time. As opposed to dynamic linking, where library dependencies can be found and used at runtime. Statically linking an executable gives it the property that it can be run on any system that can handle the executable type, i.e. I can start a statically linked elf-executable on any platform that can run elf-executables. Whereas a dynamically linked executable will not start if its dynamic dependencies cannot be found at application start.
- Position-independent code is able to run properly regardless of where in memory is placed. The benefit, as I understand it, is security, and platform compatibility-related.
When telling rustc
to create a static-pie linked executable through -C target-feature=+crt-static -C relocation-model=pie
(relocation-model defaults to pie, could be omitted), it creates an elf-executable which has a header that marks it as
DYN
. Here's what an example readelf -h
looks like:
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Position-Independent Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x24b8
Start of program headers: 64 (bytes into file)
Start of section headers: 1894224 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 9
Size of section headers: 64 (bytes)
Number of section headers: 32
Section header string table index: 20
This signals to the OS that the executable can be run position-independently, but since tiny-std
assumes that
memory addresses are absolute, the ones they were when compiled, the executable segfaults as soon as it tries to get
the address of any symbols, like functions or static variables, since those have been moved.
Where are my symbols?
This seems like a tricky problem, as a programmer, I have a bunch of variable and function calls, some that the
Rust
-language emits for me, now each of the addresses for those variables and functions are in another place in memory.
Before using any of them I need to remap them, which means that I need to have remapping code before using any
function calls (kinda).
The start function
The executable enters through the _start
function, this is defined in asm
for tiny-std
:
// Binary entrypoint
#[cfg(all(feature = "symbols", feature = "start", target_arch = "x86_64"))]
core::arch::global_asm!(
".text",
".global _start",
".type _start,@function",
"_start:",
"xor rbp,rbp", // Zero the stack-frame pointer
"mov rdi, rsp", // Move the stack pointer into rdi, c-calling convention arg 1
".weak _DYNAMIC", // Elf dynamic symbol
".hidden _DYNAMIC",
"lea rsi, [rip + _DYNAMIC]", // Load the dynamic address off the next instruction to execute incremented by _DYNAMIC into rsi
"and rsp,-16", // Align the stack pointer
"call __proxy_main" // Call our rust start function
);
The assembly prepares the stack by aligning it, putting the stack pointer into arg1 for the coming function-call,
then adds the offset off _DYNAMIC
to the special purpose rip
-register address, and puts that in rsi
which becomes
our called function's arg 2.
After that __proxy_main
is called, the signature looks like this:
unsafe extern "C" fn __proxy_main(stack_ptr: *const u8, dynv: *const usize)
It takes the stack_ptr
and the dynv
-dynamic vector as arguments, which were provided in
the above assembly.
I wrote more about the _start
-function in pgwm03 and fasterthanli.me
wrote more about it at their great blog, but in short:
Before running the user's main
some setup is required, like arguments, environment variables, aux-values,
map in faster functions from the vdso (see pgwm03 for more on that), and set up some thread-state,
see the thread writeup for that.
All these variables come off the executable's stack, which is why stack pointer needs to be passed as an argument to our setup-function, so that it can be used before the stack is polluted by the setup function.
The first extraction looks like this:
#[no_mangle]
#[cfg(all(feature = "symbols", feature = "start"))]
unsafe extern "C" fn __proxy_main(stack_ptr: *const u8, dynv: *const usize) {
// Fist 8 bytes is a u64 with the number of arguments
let argc = *(stack_ptr as *const u64);
// Directly followed by those arguments, bump pointer by 8 bytes
let argv = stack_ptr.add(8) as *const *const u8;
let ptr_size = core::mem::size_of::<usize>();
// Directly followed by a pointer to the environment variables, it's just a null terminated string.
// This isn't specified in Posix and is not great for portability, but this isn't meant to be portable outside of Linux.
let env_offset = 8 + argc as usize * ptr_size + ptr_size;
// Bump pointer by combined offset
let envp = stack_ptr.add(env_offset) as *const *const u8;
let mut null_offset = 0;
loop {
let val = *(envp.add(null_offset));
if val as usize == 0 {
break;
}
null_offset += 1;
}
// We now know how long the envp is
// ...
}
This works all the same as a pie
because:
Prelude, inline
There will be trouble when trying to find a symbol contained in the binary, such as a function call.
Up to here, that hasn't been a problem because even though ptr::add()
and core::mem:size_of::<T>()
is invoked,
no addresses are needed for those. This is because of inlining.
Looking at core::mem::size_of<T>()
:
#[inline(always)]
#[must_use]
#[stable(feature = "rust1", since = "1.0.0")]
#[rustc_promotable]
#[rustc_const_stable(feature = "const_mem_size_of", since = "1.24.0")]
#[cfg_attr(not(test), rustc_diagnostic_item = "mem_size_of")]
pub const fn size_of<T>() -> usize {
intrinsics::size_of::<T>()
}
It has the #[inline(always)]
attribute, the same goes for ptr::add()
. Since that code is inlined,
an address to a function isn't necessary, and therefore it works even though all of the addresses are off.
To be able to debug, I would like to be able to print variables, since I haven't been able to hook a debugger up
to tiny-std
executables yet. But, printing to the terminal requires code, code that usually isn't #[inline(always)]
.
So I wrote a small print:
#[inline(always)]
unsafe fn print_labeled(msg: &[u8], val: usize) {
print_label(msg);
print_val(val);
}
#[inline(always)]
unsafe fn print_label(msg: &[u8]) {
syscall!(WRITE, 1, msg.as_ptr(), msg.len());
}
#[inline(always)]
unsafe fn print_val(u: usize) {
syscall!(WRITE, 1, num_to_digits(u).as_ptr(), 21);
}
#[inline(always)]
unsafe fn num_to_digits(mut u: usize) -> [u8; 22] {
let mut base = *b"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\n";
let mut ind = base.len() - 2;
if u == 0 {
base[ind] = 48;
}
while u > 0 {
let md = u % 10;
base[ind] = md as u8 + 48;
ind -= 1;
u = u / 10;
}
base
}
Printing to the terminal can be done through the syscall WRITE
on fd
1
(STDOUT
).
It takes a buffer of bytes and a length. The call through syscall!()
is always inlined.
Since I primarily need look at addresses, I just print usize
, and I wrote a beautifully stupid number to digits function.
Since the max digits of a usize
on a 64-bit machine is 21, I allocate a slice on the stack filled with
null
-bytes, these won't be displayed. Then add digit by digit, which means that the number is formatted without leading or
trailing zeroes.
Invoking it looks like this:
fn test() {
print_labeled(b"My msg as bytes: ", 15);
}
Relocation
Now that basic debug-printing is possible work to relocate the addresses can begin.
I previously had written some code the extract aux
-values, but now that code needs to run without using any
non-inlined functions or variables.
Aux values
A good description of aux-values comes from the docs here,
in short the kernel puts some data in the memory of a program when it's loaded.
This data points to other data that is needed to do relocation. It also has an insane layout for reasons that
I haven't yet been able to find any motivation for.
A pointer to the aux-values are put after the envp
on the stack.
The aux-values were collected and stored pretty sloppily as a global static variable before implementing this change, this time it needs to be collected onto the stack, used for finding the dynamic relocation addresses, and then it could be put into a static variable after that (since the address of the static variable can't be found before remapping).
The dyn
-values are also required, which are essentially the same as aux-values, provided for DYN
-objects.
In musl, the aux-values that are put on the stack looks like this:
size_t i, aux[AUX_CNT], dyn[DYN_CNT];
So I replicated the aux-vec on the stack like this:
// There are 32 aux values.
let mut aux: [0usize; 32];
And then initialize it, with the aux
-pointer provided by the OS.
The OS-supplies some values in the aux
-vector more info here
the necessary ones for remapping are:
AT_BASE
the base address of the program interpreter, 0 if no interpreter (static-pie).AT_PHNUM
, the number of program headers.AT_PHENT
, the size of one program header entry.AT_PHDR
, the address of the program headers in the executable.
First a virtual address found at the program header that has the dynamic
type must be found.
The program header is laid out in memory as this struct:
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct elf64_phdr {
pub p_type: Elf64_Word,
pub p_flags: Elf64_Word,
pub p_offset: Elf64_Off,
pub p_vaddr: Elf64_Addr,
pub p_paddr: Elf64_Addr,
pub p_filesz: Elf64_Xword,
pub p_memsz: Elf64_Xword,
pub p_align: Elf64_Xword,
}
The address of the AT_PHDR
can be treated as an array declared as:
let phdr: &[elf64_phdr; AT_PHNUM] = ...
That array can be walked until finding a program header struct with p_type
= PT_DYNAMIC
,
that program header holds an offset at p_vaddr
that can be subtracted from the dynv
pointer to get
the correct base
address.
Initialize the dyn section
The dynv
pointer supplied by the os, as previously stated, is analogous to the aux
-pointer but
trying to stack allocate its value mappings like this:
let dyn_values = [0usize; 37];
Will cause a segfault.
SYMBOLS!!!
It took me a while to figure out what's happening, a zeroed array is allocated in rust, and
that array is larger than [0usize; 32]
(256 bytes of zeroes seems to be the exact breakpoint)
rustc
instead of using sse
instructions, uses memset
to zero the memory it just took off the stack.
The asm will look like this:
...
mov edx, 296
mov rdi, rbx
xor esi, esi
call qword ptr [rip + memset@GOTPCREL]
...
Accessing that memset symbol is what causes the segfault.
I tried a myriad of ways to get the compiler to not emit that symbol, among
posting this
help request.
It seems that there is no reliable way to avoid rustc
emitting unwanted symbols without doing it all in assembly,
and since that seems a bit much, at least right now, I opted to instead restructure the code. Unpacking both
the aux and dyn values and just keeping what tiny-std
needs.
The unpacked aux values now look like this:
/// Some selected aux-values, needs to be kept small since they're collected
/// before symbol relocation on static-pie-linked binaries, which means rustc
/// will emit memset on a zeroed allocation of over 256 bytes, which we won't be able
/// to find and thus will result in an immediate segfault on start.
/// See [docs](https://man7.org/linux/man-pages/man3/getauxval.3.html)
#[derive(Debug)]
pub(crate) struct AuxValues {
/// Base address of the program interpreter
pub(crate) at_base: usize,
/// Real group id of the main thread
pub(crate) at_gid: usize,
/// Real user id of the main thread
pub(crate) at_uid: usize,
/// Address of the executable's program headers
pub(crate) at_phdr: usize,
/// Size of program header entry
pub(crate) at_phent: usize,
/// Number of program headers
pub(crate) at_phnum: usize,
/// Address pointing to 16 bytes of a random value
pub(crate) at_random: usize,
/// Executable should be treated securely
pub(crate) at_secure: usize,
/// Address of the vdso
pub(crate) at_sysinfo_ehdr: usize,
}
It only contains the aux-values that are actually used by tiny-std
.
The dyn-values are only used for relocations so far, so they were packed into this much smaller struct:
pub(crate) struct DynSection {
rel: usize,
rel_sz: usize,
rela: usize,
rela_sz: usize,
}
Now that rustc
's memset emissions has been sidestepped, the DynSection
struct can be filled with the values from the
dynv
-pointer, and then finally the symbols can be relocated:
#[inline(always)]
pub(crate) unsafe fn relocate(&self, base_addr: usize) {
// Relocate all rel-entries
for i in 0..(self.rel_sz / core::mem::size_of::<Elf64Rel>()) {
let rel_ptr = ((base_addr + self.rel) as *const Elf64Rel).add(i);
let rel = ptr_unsafe_ref(rel_ptr);
if rel.0.r_info == relative_type(REL_RELATIVE) {
let rel_addr = (base_addr + rel.0.r_offset as usize) as *mut usize;
*rel_addr += base_addr;
}
}
// Relocate all rela-entries
for i in 0..(self.rela_sz / core::mem::size_of::<Elf64Rela>()) {
let rela_ptr = ((base_addr + self.rela) as *const Elf64Rela).add(i);
let rela = ptr_unsafe_ref(rela_ptr);
if rela.0.r_info == relative_type(REL_RELATIVE) {
let rel_addr = (base_addr + rela.0.r_offset as usize) as *mut usize;
*rel_addr = base_addr + rela.0.r_addend as usize;
}
}
// Skip implementing relr-entries for now
}
After the relocate
-section runs, symbols
can again be used, and tiny-std
can continue with the setup.
Outro
The commit that added the functionality can be found here.
Thanks for reading!