Functions are compiled to machine code by a convention of jumps and registers; one calls a function by jumping to its location in memory. These jumps are relative, and a particular function (say, malloc) may be loaded at different memory locations, so the machine code for a function cannot be pinned down; in fact, it is contingent on every function that it calls.
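To make the location dependence concrete, here is a minimal sketch, assuming x86-64, of encoding a near call ("call rel32", opcode 0xE8). The function name encode_call_rel32 is hypothetical. The 32-bit displacement is measured from the end of the 5-byte instruction, so the same call to the same target yields different bytes depending on where the call itself sits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode an x86-64 near call, "call rel32" (opcode 0xE8), into buf.
 * call_addr: the address at which these 5 bytes will be executed.
 * target:    the address of the callee.
 * The displacement is measured from the end of the instruction,
 * i.e. from call_addr + 5. Returns the number of bytes written (5),
 * or 0 if the target is more than +/-2 GiB away. */
static int encode_call_rel32(uint8_t buf[5], uint64_t call_addr, uint64_t target)
{
    assert(buf != NULL);
    int64_t rel = (int64_t)(target - (call_addr + 5));
    if (rel < INT32_MIN || rel > INT32_MAX)
        return 0;
    uint32_t u = (uint32_t)(int32_t)rel;
    buf[0] = 0xE8;
    /* x86 stores the displacement little-endian; writing it byte by
     * byte keeps the result independent of the host's byte order. */
    for (int i = 0; i < 4; i++)
        buf[1 + i] = (uint8_t)(u >> (8 * i));
    return 5;
}
```

For example, a call at 0x1000 to a function at 0x2000 encodes as E8 FB 0F 00 00, while the identical call placed at 0x1800 encodes as E8 FB 07 00 00.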
Storing compiled code persistently (and then reassembling it for the processor) is involved (see Ian Lance Taylor's blog series on linkers for details), but these details are unavoidable for language implementers, particularly those writing JIT compilers.
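One way stored code defers the problem: the object file holds the call with a zero placeholder for the displacement, plus a relocation record telling the linker or loader how to patch it once addresses are known. Below is a minimal sketch of applying an x86-64 PC-relative relocation, whose value the x86-64 System V ABI gives as S + A - P (symbol address, addend, address of the patched field). The struct pc32_reloc and the function apply_pc32 are hypothetical simplifications, not the real Elf64_Rela from <elf.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A simplified relocation record, loosely modelled on ELF's Elf64_Rela
 * (hypothetical type for illustration only). */
struct pc32_reloc {
    size_t  offset;  /* where in the section the 4-byte field lives */
    int64_t addend;  /* A: usually -4 for a call, since the CPU measures
                        the displacement from the end of the field */
};

/* Apply an R_X86_64_PC32-style relocation: write S + A - P into the
 * 4-byte field, where S is the symbol's address and P is the address
 * of the field once the section is placed at section_addr.
 * Returns 1 on success, 0 if the value does not fit in 32 bits. */
static int apply_pc32(uint8_t *section, uint64_t section_addr,
                      const struct pc32_reloc *r, uint64_t symbol_addr)
{
    assert(section != NULL && r != NULL);
    uint64_t p = section_addr + r->offset;
    int64_t value = (int64_t)(symbol_addr + (uint64_t)r->addend - p);
    if (value < INT32_MIN || value > INT32_MAX)
        return 0;  /* a real linker would report an overflow error */
    uint32_t u = (uint32_t)(int32_t)value;
    for (int i = 0; i < 4; i++)
        section[r->offset + i] = (uint8_t)(u >> (8 * i));  /* little-endian */
    return 1;
}
```

With the placeholder at offset 1 of a call instruction placed at 0x1000, an addend of -4, and a callee (hypothetically) at 0x2000, the patch writes 0xFFB, the same displacement the CPU expects for a direct call from 0x1000 to 0x2000.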
JITs
JIT compilation is left out of compiler textbooks, but the translation from assembly to machine code is not trivial.
On OS X or Linux, to assemble a call malloc instruction:
1. Allocate memory for the assembled function.
2. Call dlopen on libc.so to get a handle and pass that to dlsym to get a pointer to malloc. This relies on the operating system to load compiled code, and on us knowing that malloc is defined in libc.so.
3. Calculate the relative offset between malloc and the call instruction and use this to assemble.
For full details see my own simple JIT here.
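The three steps can be sketched in C as follows, assuming x86-64 and the POSIX mmap and dlopen/dlsym interfaces. The helper names jit_alloc, resolve and emit_call are hypothetical. Two details go beyond the list above and are assumptions of this sketch: on glibc the runtime file is libc.so.6 (plain libc.so is usually a linker script, not a loadable library), so the sketch passes NULL to dlopen to search the running program's global scope instead; and because mmap may place the buffer more than 2 GiB from libc, it falls back to an absolute call through a register when the relative offset does not fit. The code only emits bytes; it does not execute them.

```c
#define _GNU_SOURCE  /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Step 1: allocate memory for the assembled function. It is mapped
 * read+write; a real JIT would mprotect it to read+execute after
 * emitting code, since many systems refuse writable+executable pages. */
static uint8_t *jit_alloc(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (uint8_t *)p;
}

/* Step 2: look up a symbol at run time. lib == NULL asks for the
 * running program's global symbols (which include libc's); on glibc
 * one could instead pass "libc.so.6". The handle is kept open. */
static void *resolve(const char *lib, const char *name)
{
    void *handle = dlopen(lib, RTLD_LAZY);
    if (handle == NULL)
        return NULL;
    return dlsym(handle, name);
}

/* Step 3: emit "call target" at buf. Uses the 5-byte rel32 form when
 * the target is within +/-2 GiB of the instruction; otherwise falls
 * back to "movabs rax, imm64; call rax". Returns bytes written. */
static size_t emit_call(uint8_t *buf, void *target)
{
    assert(buf != NULL);
    uint64_t here = (uint64_t)(uintptr_t)buf;
    uint64_t dest = (uint64_t)(uintptr_t)target;
    int64_t rel = (int64_t)(dest - (here + 5));  /* relative to next instruction */
    if (rel >= INT32_MIN && rel <= INT32_MAX) {
        uint32_t u = (uint32_t)(int32_t)rel;
        buf[0] = 0xE8;                           /* call rel32 */
        for (int i = 0; i < 4; i++)
            buf[1 + i] = (uint8_t)(u >> (8 * i));
        return 5;
    }
    buf[0] = 0x48;                               /* REX.W */
    buf[1] = 0xB8;                               /* movabs rax, imm64 */
    for (int i = 0; i < 8; i++)
        buf[2 + i] = (uint8_t)(dest >> (8 * i));
    buf[10] = 0xFF;                              /* call rax */
    buf[11] = 0xD0;
    return 12;
}
```

A caller would combine them as emit_call(jit_alloc(4096), resolve(NULL, "malloc")). On glibc older than 2.34 the program also needs -ldl at link time.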
Thus, a compiled function which calls other functions cannot be canonicalized into machine code; it depends on the locations in memory of the called functions and on its own location in memory. In practice, getting the locations of other functions depends on system facilities, including the file system. Consequently, compiling on one machine and distributing to others is quite fraught with existing operating systems and toolchains.
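The dependence on the file system can be observed directly. The sketch below, assuming glibc or macOS, uses dladdr (a widely available extension, not part of POSIX) to ask which file on disk supplied the code at a given address and where that file was mapped in the current process. The helper name defining_object is hypothetical.

```c
#define _GNU_SOURCE  /* glibc only declares dladdr and Dl_info with this */
#include <assert.h>
#include <dlfcn.h>
#include <stdlib.h>

/* Return the path of the shared object (or executable) containing
 * addr, or NULL if the dynamic linker does not know it. If base is
 * non-NULL, also report the address at which that file was loaded. */
static const char *defining_object(const void *addr, void **base)
{
    Dl_info info;
    if (dladdr(addr, &info) == 0)
        return NULL;
    if (base != NULL)
        *base = info.dli_fbase;
    return info.dli_fname;
}
```

Calling defining_object((const void *)malloc, &base) typically reports the C library's path, and with address space layout randomization the base address usually differs from one run to the next.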
Assembly
Suppose one has an assembly function:
```asm
ncdf:
    ...
    call erf
    ...
```
This calls erf, which is defined in libm on Linux. We can assemble ncdf to produce an object file and then make this into a shared library, directing the linker to libm (by passing -lm on the command line). Notably, this means that call erf isn't just an instruction that stands on its own: it also carries an expectation that the pieces will be in order at execution time, so that the expected erf code from libm is called (including an expectation that -lm be passed on the command line when dealing with the object file).
C programming is one of the few places where programmers consider it admissible to intervene along the way; one can even tweak function lookup when invoking an executable (for example, with LD_PRELOAD on Linux).
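As an illustration of tweaking lookup at launch, here is a minimal interposition shim, assuming Linux with glibc (macOS's DYLD_INSERT_LIBRARIES follows different rules). Built as a shared object and named in LD_PRELOAD, it is searched before libm, so the program's calls to erf land here first; the shim counts them and forwards to the next definition found via dlsym(RTLD_NEXT, ...). The counter erf_call_count is a hypothetical name for illustration.

```c
#define _GNU_SOURCE  /* RTLD_NEXT is a GNU extension on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <math.h>
#include <stdio.h>

unsigned long erf_call_count = 0;  /* how many calls were intercepted */

/* Same signature as libm's erf, so the dynamic linker binds the
 * program's erf calls to this definition when it is preloaded. */
double erf(double x)
{
    /* Lazily resolved and cached; not thread-safe, as befits a sketch. */
    static double (*real_erf)(double) = NULL;
    erf_call_count++;
    if (real_erf == NULL) {
        /* RTLD_NEXT: the next "erf" in search order after this object,
         * normally the one in libm. */
        real_erf = (double (*)(double))dlsym(RTLD_NEXT, "erf");
        if (real_erf == NULL) {
            fprintf(stderr, "shim: no later definition of erf (is libm loaded?)\n");
            return NAN;
        }
        /* RTLD_DEFAULT would find this very function and recurse forever. */
        assert(real_erf != erf);
    }
    return real_erf(x);
}
```

One might build it with cc -shared -fPIC erf_shim.c -o erf_shim.so -ldl and run LD_PRELOAD=./erf_shim.so ./prog (file names hypothetical); the program itself is neither recompiled nor relinked.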
Conclusion
First, the pipeline model of compilers, taking functions to assembly and then to machine code, is inaccurate. Indeed, making an executable often involves a build system; cabalized Haskell code, for instance, uses the build system to bundle the compiler's call instructions with appropriate linker flags.
Second, JIT compilation is unduly left out of material on compilers; implementing a JIT differs substantially from compiling to assembly and invoking assemblers and linkers to produce an executable.