A Proper X86 Assembler in Haskell Using the Escardó-Oliva Functional

by Vanessa McHale | 2025-01-18 21:46

Writing an assembler turns out to be an interesting example: one needs to calculate distances between jumps and their target labels, and the target may appear after the label. It turns out that one can write a one-pass assembler using the tardis monad or in Curry, using logic programming.

The tardis-based assembler doesn't work for X86, however—jump instructions themselves may take 2 or 5 bytes depending on whether the distance to the label is within -128 to 127 bytes. Thus we cannot calculate the offsets of a jump and its target until we know the encoded size of all the jump instructions in between, which are picked based on the distances.

This gives us an opportunity to show off a "big hammer" from functional programming---the Escardó-Oliva functional.

The Escardó-Oliva functional^[1] has type

bigotimes :: [[x] -> (x -> r) -> x] -> ([x] -> r) -> [x]

and computes the optimal moves (here, [x]) for a game whose outcome is of type r, from selection functions of type (x -> r) -> x (which we must supply) with access to the history of moves (of type [x]).

As an example, this interface can be used to write something like an SAT solver that will pick n Boolean variables satisfying a predicate, when possible:

solve :: Int -> ([Bool] -> Bool) -> [Bool]
solve n = bigotimes (replicate n (\_ -> ε))
  where ε = \p -> p True

ghci> solve 3 (\[a,b,c] -> a && b && not c)
[True,True,False]

This may be opaque, but it is not hard to work with. In our case the problem can be thought of as a single-player game; the outcome is the number of bytes required to assemble the code, which we would like to minimize. Each move is an encoding of a jump instruction using 2 or 5 bytes. A move encoding a jump to a label 250 bytes away as a short jump (say) is forbidden by the rules of the game.

The simplified X86 assembly:

type Label = Int
type JIx = Int
data X86 a = J a !Label
           | L !Label
           | Xor -- 3 bytes
ij :: [X86 a] -> [X86 JIx]
ij = go 0 where
    go _ []           = []
    go i (J _ l:isns) = J i l:go (i+1) isns
    go i (L l:isns)   = L l:go i isns
    go i (Xor:isns)   = Xor:go i isns
data Enc = Short | Near deriving (Show, Eq)
bytes :: Enc -> Int
bytes Short = 2
bytes Near  = 5

ij labels each jump for lookup.

The game:

type R = Int
type Move = (JIx, Enc)

The optimal play (here, the assignment of jumps to encodings leading to the smallest code) is provided by

optimalPlay :: [Move]
optimalPlay = bigotimes εs p

where

p :: [Move] -> R

is a predicate defining the outcome (in our case, the number of bytes required) and

εs :: [[Move] -> J R Move]

is a list of selection functions given past moves. J is the selection monad, viz.

type J r x = (x -> r) -> x

For a player trying to minimize R, there is the helper provided by Escardó and Oliva:

arginf :: [x] -> J R x
arginf []       = \ -> undefined
arginf [x]      = \ -> x
arginf (x:y:zs) = \p -> if p x < p y then arginf (x:zs) p else arginf (y:zs) p

which returns an appropriate selection function given potential moves. Our problem is a single-player game so this will suffice to define εs.

At is the offset of the machine code in memory.

type At = Int
assembly :: [X86 ()]
assembly = [J () 1, J () 0] ++ replicate 40 Xor ++ [L 1, L 0]
assemblyj :: [X86 JIx]
assemblyj = ij assembly

The next part is fairly standard---we collect offsets of labels---with the subtlety that we can assume all jumps are short if they have not been specified yet; invalid assignments will be pruned later.

ixes :: [Move] -> ([(Label, At)], [(JIx, At)])
ixes ms = mkIx ms 0 assemblyj
-- location of each instruction given putative jump encodings
mkIx :: [Move] -> At -> [X86 JIx] -> ([(Label, At)], [(JIx, At)])
mkIx ms ix (isn@Xor:isns) = mkIx ms (ix+3) isns
mkIx ms ix (isn@(L l):isns) =
    let (ls, isns') = mkIx ms ix isns
    in ((l,ix):ls, isns')
mkIx ms ix (isn@(J j l):isns) =
    let enc = lookup j ms
        -- if encoding isn't in Move history, we optimistically assume it's short
        -- if this causes contradictions later it will be pruned
        (ls, isns') = mkIx ms (ix+maybe 2 bytes enc) isns
    in (ls, (j, ix):isns')

At any point in history, the below will distinguish inconsistent encodings:

valid :: [Move] -> Bool
valid ms = all g assemblyj
  where g Xor{}   = True
        g L{}     = True
        g (J j l) =
            let enc = lookup j ms
            in case enc of
                Just Short -> within j l (\i -> i >= -128 && i < 127)
                Just Near  -> within j l (const True)
                -- hasn't been specified at this point in history, okay so far
                Nothing    -> True
    within j l clamp =
        let Just at0 = lookup j jAt
            Just at1 = lookup l lAt
        in clamp (at1-at0)

    (lAt, jAt) = ixes ms

Stitching this together with the framework provided by Escardó and Oliva:

εs :: [[Move] -> J R Move]
εs = replicate n ε
  where ε h = let o = allj `setMinus` map fst h
              in arginf ([(j, Short) | j <- o , valid ((j, Short):h)] ++ [(j, Near) | j <- o])
        allj = collectj (assemblyj)
        n = length allj
p :: [Move] -> R
p ms = sum [ bytes e | (_, e) <- ms ]
optimalPlay :: [Move]
optimalPlay = bigotimes εs p

where

setMinus :: Ord a => [a] -> [a] -> [a]

works on ordered lists with no duplicates.

The replicate n creates a selection function for each jump.

Now we have the optimal encoding of jump instructions. See what happens when we change the size of the inner loop:

assembly :: [X86 ()]
assembly = [J () 1, J () 0] ++ replicate 40 Xor ++ [L 0, L 1]
ghci> optimalPlay
[(1,Short),(0,Short)]

assembly :: [X86 ()]
assembly = [J () 1, J () 0] ++ replicate 41 Xor ++ [L 0, L 1]
ghci> optimalPlay
[(0,Near),(1,Short)]

assembly :: [X86 ()]
assembly = [J () 1, J () 0] ++ replicate 42 Xor ++ [L 0, L 1]
ghci> optimalPlay
[(1,Near),(0,Near)]

That is, it picks the 2-byte encoding for inner and outer loops when possible (which depends on the inner jump being encoded in 2 bytes), and rejects invalid encodings.

For reference:

type K r x = (x -> r) -> r
overline :: J r x -> K r x
overline e = \p -> p (e p)
bigotimes :: [[x] -> J r x] -> J r [x]
bigotimes []     = \_ -> []
bigotimes (m:ms) = m [] . (\x -> bigotimes [ \xs -> d (x:xs) | d <- ms ])
  where
(*.*) :: J r x -> (x -> J r [x]) -> J r [x]
(*.*) m ind = \p ->
  let x = m (\x0 -> overline (ind x0) (\xs0 -> p (x0:xs0)))
      xs = ind x (\xs0 -> p (x:xs0))
  in x:xs

-- sorted list with no duplicates
delete :: Ord a => a -> [a] -> [a]
delete x [] = []
delete x (vs@(y:ys))
    | x == y = ys
    | x < y = vs
    | otherwise = y : delete x ys
setMinus :: Ord a => [a] -> [a] -> [a]
setMinus xs []     = xs
setMinus xs (y:ys) = setMinus (delete y xs) ys

Code is available here

[1] This is the history-dependent version.

return

blog

A Proper X86 Assembler in Haskell Using the Escardó-Oliva Functional