Compiling Functions to Jumps

by Vanessa McHale | 2023-08-04 20:28

Functions are compiled to machine code by a convention of jumps and registers; one calls a function by jumping to its location in memory. These jumps are relative, and a particular function (say, malloc) may be loaded at different memory locations, so the machine code for a function cannot be pinned down: it is contingent on the location of every function that it calls.
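
To make the location dependence concrete, here is a minimal sketch assuming x86-64, where the usual direct call is the 5-byte call rel32 (opcode E8 followed by a signed 32-bit displacement measured from the end of the instruction). The helper name encode_call_rel32 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Encode an x86-64 `call rel32` (opcode E8) located at address `site`
   that targets address `target`. The displacement is relative to the
   end of the 5-byte instruction, so the same call to the same function
   yields different bytes depending on where the call is placed.
   Returns 0 on success, -1 if the target is out of rel32 range. */
static int encode_call_rel32(uint64_t site, uint64_t target, uint8_t out[5]) {
    int64_t disp = (int64_t)(target - (site + 5));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1; /* too far away: an indirect call would be needed */

    uint32_t rel = (uint32_t)(int32_t)disp;
    out[0] = 0xE8;
    /* x86 is little-endian: least significant byte first */
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(rel >> (8 * i));
    return 0;
}
```

Encoding a call from 0x1000 to 0x2000 gives E8 FB 0F 00 00, while the same target called from 0x3000 gives E8 FB EF FF FF: identical assembly, different machine code.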

Storing compiled code persistently (and then reassembling it for the processor) is involved (see Ian Lance Taylor's blog series for details), but these details are unavoidable for language implementers, particularly for JIT compilation.

JITs

JIT compilation is left out of compiler textbooks, but the translation from assembly to machine code is not trivial.

On OS X or Linux, assembling a call malloc instruction takes three steps:

1. Allocate memory for the assembled function.
2. Call dlopen on libc to get a handle and pass that to dlsym to get a pointer to malloc. This relies on the operating system to load compiled code, and on us knowing that malloc is defined in libc (libc.so.6 on glibc-based Linux, where libc.so itself is usually a linker script rather than a loadable library).
3. Calculate the relative offset between malloc and the call instruction, and use it to assemble the instruction.
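
The three steps can be sketched in C. This is a minimal sketch assuming x86-64 Linux with glibc (so libc is opened by its soname, libc.so.6) and the System V calling convention; on OS X the library name and the rules for executable memory differ. jit_call_malloc and alloc_fn are hypothetical names, and the generated function simply forwards its argument to malloc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Signature of the generated function: the same as malloc. */
typedef void *(*alloc_fn)(size_t);

/* Assemble, at run time, an x86-64 function equivalent to
   void *f(size_t n) { return malloc(n); }
   Returns NULL on failure. */
static alloc_fn jit_call_malloc(void) {
    const size_t len = 4096;

    /* Step 1: allocate writable (not yet executable) memory. */
    uint8_t *code = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (code == MAP_FAILED)
        return NULL;

    /* Step 2: find malloc through the dynamic loader. The handle is never
       closed, since malloc must stay mapped while the JIT code exists. */
    void *libc = dlopen("libc.so.6", RTLD_NOW);
    void *target = libc ? dlsym(libc, "malloc") : NULL;
    if (target == NULL) {
        munmap(code, len);
        return NULL;
    }

    /* Step 3: the call sits at offset 4 and is 5 bytes long, so its
       displacement is measured from code + 9. */
    int64_t disp = (int64_t)((uintptr_t)target - (uintptr_t)(code + 9));
    if (disp < INT32_MIN || disp > INT32_MAX) {
        /* Too far for rel32; a real JIT would fall back to an indirect
           call through a register (mov rax, imm64; call rax). */
        munmap(code, len);
        return NULL;
    }
    int32_t rel = (int32_t)disp;

    uint8_t insns[] = {
        0x48, 0x83, 0xEC, 0x08, /* sub rsp, 8: keep rsp 16-byte aligned */
        0xE8, 0, 0, 0, 0,       /* call rel32 (patched below)           */
        0x48, 0x83, 0xC4, 0x08, /* add rsp, 8                           */
        0xC3                    /* ret (malloc's result is in rax)      */
    };
    assert(sizeof insns <= len);
    memcpy(insns + 5, &rel, sizeof rel); /* x86-64 is little-endian */
    memcpy(code, insns, sizeof insns);

    /* Make the page executable and no longer writable (W^X). */
    if (mprotect(code, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(code, len);
        return NULL;
    }
    return (alloc_fn)code;
}
```

Calling the returned pointer with 32 behaves like malloc(32). Note that the bytes written in step 3 are only valid for this particular placement of code and of malloc, which is exactly why the result cannot be stored and reused elsewhere.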

For full details see my own simple JIT here.

Thus, a compiled function that calls other functions cannot be canonicalized into machine code; it depends on the memory locations of the functions it calls and on its own location in memory. In practice, finding the locations of other functions depends on system facilities, including the file system. As a result, compiling on one machine and distributing to others is quite fraught with existing operating systems and toolchains.

Assembly

Suppose one has an assembly function:

ncdf:
    ...
    call erf
    ...

This calls erf, which is defined in libm on Linux. We can assemble ncdf to produce an object file and then make it into a shared library, directing the linker to libm (by passing -lm on the command line). Notably, this means that call erf is not an instruction that stands on its own: it is also an expectation that the pieces will be in place at execution time so that the intended erf from libm is called. That expectation extends to the build as well, since -lm must be passed on the command line whenever the object file is linked.
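
The expectation can be made explicit. As a minimal sketch, assuming glibc on Linux (where libm's soname is libm.so.6), the following C code looks erf up at run time with dlopen and dlsym, doing by hand what the dynamic linker does for call erf. The C ncdf here is a hypothetical counterpart to the assembly above, assuming ncdf computes the standard normal CDF; resolve_erf and erf_fn are illustrative names.

```c
#define _GNU_SOURCE /* for M_SQRT1_2 and dl* declarations */
#include <assert.h>
#include <dlfcn.h>
#include <math.h>
#include <stdio.h>

typedef double (*erf_fn)(double);

/* Resolve erf by hand instead of relying on -lm at link time.
   Assumes glibc, where the math library is libm.so.6. */
static erf_fn resolve_erf(void) {
    void *libm = dlopen("libm.so.6", RTLD_NOW);
    if (libm == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }
    /* The handle is deliberately never closed: erf must stay mapped. */
    return (erf_fn)dlsym(libm, "erf");
}

/* Hypothetical C ncdf, assuming the standard normal CDF:
   Phi(x) = (1 + erf(x / sqrt(2))) / 2. */
static double ncdf(erf_fn erf_ptr, double x) {
    return 0.5 * (1.0 + erf_ptr(x * M_SQRT1_2));
}
```

Because no math function is called directly, this links without -lm (on glibc older than 2.34, -ldl is still needed for dlopen); the dependency on libm has moved from the build command into the program.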

C programming is one of the few places where programmers consider it acceptable to intervene along the way; one can even change how functions are looked up when invoking an executable, for instance with the LD_PRELOAD environment variable on Linux.
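
As an illustration of such an intervention, here is a minimal sketch assuming Linux with glibc: a file that defines its own erf, counts calls, and forwards each one to the real erf found with dlsym(RTLD_NEXT, ...). The counter, the reporting function, and the file names below are hypothetical, and the sketch is not thread-safe.

```c
#define _GNU_SOURCE /* for RTLD_NEXT */
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

static double (*real_erf)(double);
static unsigned long erf_calls;

/* Built as a shared object and loaded with LD_PRELOAD, this erf is found
   before libm's, so every dynamically resolved call to erf lands here. */
double erf(double x) {
    if (real_erf == NULL) {
        /* The next definition in lookup order, normally libm's erf. */
        real_erf = (double (*)(double))dlsym(RTLD_NEXT, "erf");
        if (real_erf == NULL) {
            /* Fallback if libm is not in the lookup order (glibc soname). */
            void *libm = dlopen("libm.so.6", RTLD_NOW);
            if (libm != NULL)
                real_erf = (double (*)(double))dlsym(libm, "erf");
        }
        assert(real_erf != NULL);
    }
    erf_calls++;
    return real_erf(x);
}

/* Report the count at exit (a GCC/Clang extension). */
__attribute__((destructor)) static void report_erf_calls(void) {
    fprintf(stderr, "erf called %lu times\n", erf_calls);
}
```

Built with something like gcc -shared -fPIC -o liberfcount.so erfcount.c (adding -ldl on glibc older than 2.34) and run as LD_PRELOAD=./liberfcount.so ./prog, it reports how often prog called erf, without relinking prog.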

Conclusion

First, the pipeline model of compilers, taking functions to assembly and then to machine code, is inaccurate. Indeed, making an executable often involves a build system; cabalized Haskell code, for instance, uses the build system to bundle the compiler's call instructions with the appropriate linker flags.

Second, JIT compilation is unduly left out of compiler material; implementing a JIT differs substantially from compiling to assembly and invoking assemblers/linkers to produce an executable.
