Writing a practical JIT is somewhat complicated and in fact depends on the assembler; here I present a full example in Haskell. Notably this JIT/assembler is capable of calling procedures in system libraries (i.e. malloc, free)

First, we need to be able to allocate executable memory. Eli Bendersky has an explanation on how to do so in C; from there it is a straightforward translation to Haskell using the C FFI.

(This uses hsc2hs; it handles C macro constants better than c2hs)

module Hs.FFI ( bsFp , allocNear , allocExec , finish ) where

import Data.Bits ((.|.)) import Foreign.C.Types (CInt (..), CSize (..), CChar) import Foreign.Ptr (FunPtr, IntPtr (..), castPtrToFunPtr, Ptr, intPtrToPtr, nullPtr) import qualified Data.ByteString as BS import qualified Data.ByteString.Unsafe as BS import System.Posix.Types (COff (..))

#include <sys/mman.h>

allocNear :: Int -> CSize -> IO (Ptr a) allocNear i sz = mmap (intPtrToPtr (IntPtr$i+61024104)) sz #{const PROT_WRITE} (#{const MAP_PRIVATE} .|. #{const MAP_ANONYMOUS}) (-1) 0 -- libc.so is 2.1MB, libm is 918kB

allocExec :: CSize -> IO (Ptr a) allocExec sz = mmap nullPtr sz #{const PROT_WRITE} (#{const MAP_32BIT} .|. #{const MAP_PRIVATE} .|. #{const MAP_ANONYMOUS}) (-1) 0

finish :: BS.ByteString -> Ptr CChar -> IO (FunPtr a) finish bs fAt = BS.unsafeUseAsCStringLen bs $ \(b, sz) -> do let sz' = fromIntegral sz _ <- memcpy fAt b sz' _ <- mprotect fAt sz' #{const PROT_EXEC} pure (castPtrToFunPtr fAt)

bsFp :: BS.ByteString -> IO (FunPtr a, CSize) bsFp bs = BS.unsafeUseAsCStringLen bs $ \(bytes, sz) -> do let sz' = fromIntegral sz fAt <- {-# SCC "mmap" #-} allocExec sz' _ <- {-# SCC "memcpy" #-} memcpy fAt bytes sz' _ <- {-# SCC "mprotect" #-} mprotect fAt sz' #{const PROT_EXEC} pure (castPtrToFunPtr fAt, sz')

foreign import ccall mmap :: Ptr a -> CSize -> CInt -> CInt -> CInt -> COff -> IO (Ptr a) foreign import ccall mprotect :: Ptr a -> CSize -> CInt -> IO CInt foreign import ccall memcpy :: Ptr a -> Ptr a -> CSize -> IO (Ptr a)

This is a translation of the abovelinked blog post except for allocNear. This calls Linux's mmap, requesting that allocated memory be near some address. As we shall see, this is useful when assembling call instructions to system libraries.

The assembler design is standard; it has one pass to calculate addresses of labels and another to translates to machine code. Avoid the tardis monad approach; it is slow and our simple-minded approach fits better when we will need to call into system libraries.

The JIT assembler is somewhat intricate, but trivial:

  1. Do a first pass over all instructions, noting the relative offset of any labels, and calculating the total length of the machine code. Also note any calls to libraries, e.g. malloc.

  2. Open the system libraries if necessary. malloc, for instance, is in libc.so, so we use dlopen to get a handle and then use dlsym to get a pointer to malloc.

  3. Allocate memory using allocNear; request that the memory be near malloc and use the size that we computed with the first pass. This function returns the absolute address of what will be the function itself.

  4. Assemble to machine code in a second pass. Since we have the absolute address of our own procedure and the absolute address of malloc (returned by dlsym), we can compute the relative offset at any call instruction.

    We also fill in jmps with relative addresses in this step.

To get addresses to malloc etc. using dlopen, dlsym:

{-# LANGUAGE OverloadedStrings #-}

module Sys.DL ( libc, mem' ) where

import Data.Functor (($>)) import Foreign.C.Types (CSize) import Foreign.Ptr (FunPtr, IntPtr (..), Ptr, castFunPtrToPtr, ptrToIntPtr) import System.Posix.DynamicLinker.ByteString (DL, RTLDFlags (RTLD_LAZY), dlclose, dlopen, dlsym)

#include <gnu/lib-names.h>

mem' :: IO (Int, Int) mem' = do {(m,f) <- mem; pure (g m, g f)} where g = ((IntPtr i) -> i) . ptrToIntPtr . castFunPtrToPtr

mem :: IO (FunPtr (CSize -> IO (Ptr a)), FunPtr (Ptr a -> IO ())) mem = do {c <- libc; m <- dlsym c "malloc"; f <- dlsym c "free"; dlclose c$>(m, f)}

ll p = dlopen p [RTLD_LAZY]

libc :: IO DL libc = ll {# const LIBC_SO #}

(This uses c2hs; hsc2hs doesn't work here)

Now consider:

mkIx :: Int -> [X86 X86Reg FX86Reg a] -> (Int, M.Map Label Int) mkIx ix (Label _ l:asms) = second (M.insert l ix) $ mkIx ix asms ...

This is the first pass described above, it notes offsets of labels and gives the length of the machine code.

Then:

asm :: Int -> (Int, Maybe (Int, Int), M.Map Label Int) -> [X86 X86Reg FX86Reg a] -> [Word8] asm ix st (J _ l:asms) = let lIx = get l st instr = let offs = lIx-ix-5 in 0xe9:cd (fromIntegral offs :: Int32) in (instr ++) $ asm (ix+5) st asms asm ix st@(self, Just (m, _), _) (Call _ Malloc:asms) | Just i32 <- mi32 (m-(self+ix+5)) = let instr = 0xe8:le i32 in instr ++ asm (ix+5) st asms ...

This is the second pass, it takes offsets computed in the first phase and absolute addresses of malloc, free and returns machine code.

Putting it all together:

pI :: Ptr a -> Int pI = ((IntPtr i) -> i) . ptrToIntPtr

hasMa :: [X86 reg freg a] -> Bool hasMa = any g where g Call{} = True g _ = False

prepAddrs :: [X86 reg freg a] -> IO (Maybe (Int, Int)) prepAddrs ss = if hasMa ss then Just <$> mem' else pure Nothing

allFp :: [X86 X86Reg FX86Reg a] -> IO (FunPtr b) allFp instrs = do let (sz, lbls) = mkIx 0 instrs (fn, p) <- do res <- prepAddrs instrs case res of Just (m, _) -> (res,) <$> allocNear m (fromIntegral sz) _ -> (res,) <$> allocExec (fromIntegral sz) let b = BS.pack$asm 0 (pI p, fn, lbls) instrs finish b p

See the Apple compiler for full details.

Coda

To call the assembled function in Haskell, we can generate FFI wrappers with GHC:

foreign import ccall "dynamic" ff :: FunPtr (Double -> Double) -> Double -> Double

However, given that we are writing a JIT, we may wish to have more flexibility; in particular, we would like to be able to call functions whose type is unknown when we compile the compiler. Fortunately libffi solves exactly this problem and it has Haskell bindings.

Suppose we have machine code for a procedure that returns a double-precision float, stored in a function pointer fp. Then:

callFFI fp retCDouble []