Haskell is a wonderfully abstract language, with a community that backs it up - many of the most successful libraries are based on relatively recent papers that heavily incorporate algebra and category theory.
As such, you can also do some very abstract things, and you can write code that is difficult to understand, which is precisely what we're going to do here. Brainfuck is already an intentionally esoteric language; we're going to write an interpreter that's as laconic as possible, bringing in lenses, recursion schemes, monad transformers, and monadic parser combinators, hopefully learning something along the way.
Here's the core code (omitting imports + command-line parser):
type St a = StateT IndexArr IO a
type IndexArr = (V.Vector Int, Int)
-- | Syntax tree for brainfuck
data Syntax a = Loop (Syntax a)
| Seq [Syntax a]
| Token a deriving (Show)
makeBaseFunctor ''Syntax
-- | Map a char to its action in the St
monad
toAction :: Char -> St ()
toAction = maybe (error mempty) id . flip M.lookup keys
where modifyVal f = flip modifyByIndex f . snd =<< get
modifyByIndex i = modifyState (_1 . sliced i 1 . forced) . fmap
modifyState lens = (lens %%=) . (pure .)
readChar = get >>= ((_,i) -> modifyByIndex i . const =<< (liftIO . (fmap fromEnum)) getChar)
displayChar = get >>= ((arr,i) -> liftIO . putChar . toEnum . (V.! i) $ arr)
keys = M.fromList [ ('.', displayChar)
, (',', readChar)
, ('+', modifyVal (+1))
, ('-', modifyVal (subtract 1))
, ('>', modifyState _2 (+1))
, ('<', modifyState _2 (subtract 1)) ]
-- | Parse to syntax tree
brainheck :: Parser (Syntax Char)
brainheck = Seq <$> many (Seq . (fmap Token) <$> (some . oneOf) "+-.,<>"
<|> Loop <$> between (char '[') (char ']') brainheck)
algebra :: Base (Syntax Char) (St ()) -> St ()
algebra (TokenF x) = toAction x
algebra (SeqF x) = foldr (>>) (pure ()) x
algebra l@(LoopF x) = check >>= (\bool -> if bool then pure () else x >> algebra l)
where check = get >>= ((arr,i) -> pure . (==0) . (V.! i) $ arr)
-- | Evaluate syntax tree
run :: (Syntax Char) -> IO ()
run parsed = fst <$> runStateT (cata algebra parsed) (V.replicate 30000 0, 0)
-- | Parse and return an error or a syntax tree
parseBrainheck :: FilePath -> T.Text -> Either (ParseError (Token T.Text) Dec) (Syntax Char)
parseBrainheck filepath = (parse (brainheck) filepath) . (T.filter (elem
"[]+-.,<>"))
The full code is available in the brainheck package, or via github.
We start off with a monad transformer, giving us St a
as our underlying monad:
this allows us to keep a global state and perform IO and have it all end up
in the right order without too much worry.
type St a = StateT IndexArr IO a
type IndexArr = (V.Vector Int, Int)
Brainfuck demands a 30000-cell array
initialized to zero, so our state will be a boxed (i.e., fixed-size) vector of
Int
s, but an Int
to index it (like a pointer).
Next is our syntax tree, for which we use an Algebraic Data Type (ADT) - the
"morally correct" way to parse syntax, though in this case we only have three
constructors: Loop (Syntax a)
, which means we process the contents
until we escape the loop, Seq [Syntax a]
,
corresponding to a sequence of actions, and Token a
, corresponding to an
action.
data Syntax a = Loop (Syntax a)
| Seq [Syntax a]
| Token a deriving (Show)
Note that this is parametrized by an underlying type, which might seem like overkill. But, by making it functorial, we will get the full advantages of recursion schemes later on!
With our data type set, we can think about parsing. Brainfuck is simple enough that we only need an applicative parser plus combinators. It ends up being a one-liner if we take advantage of recursion:
brainheck :: Parser (Syntax Char)
brainheck = Seq <$> many (Seq . (fmap Token) <$> (some . oneOf) "+-.,<>"
<|> Loop <$> between (char '[') (char ']') brainheck)
Now we need turn to the tree of characters into something useful - i.e. the IO
that the user should see when they run your program. Each character corresponds
to a particular action in the St
monad (brainfuck is not very complicated), so
we start with a function toChar :: Char -> St ()
to do just that:
toAction = maybe (error mempty) id . flip M.lookup keys
where modifyVal f = flip modifyByIndex f . snd =<< get
modifyByIndex i = modifyState (_1 . sliced i 1 . forced) . fmap
modifyState lens = (lens %%=) . (pure .)
readChar = get >>= ((_,i) -> modifyByIndex i . const =<< (liftIO . (fmap fromEnum)) getChar)
displayChar = get >>= ((arr,i) -> liftIO . putChar . toEnum . (V.! i) $ arr)
keys = M.fromList [ ('.', displayChar)
, (',', readChar)
, ('+', modifyVal (+1))
, ('-', modifyVal (subtract 1))
, ('>', modifyState _2 (+1))
, ('<', modifyState _2 (subtract 1)) ]
The meat of this is M.lookup
, which basically says "turn a list of tuples
into a function" - keys
being the list of tuples. This will return an error
type if we try to feed it any other Char
, but we've already filtered those out
since they have no effect in brainfuck.
modifyVal
and modifyState
each take a lens looking into our underlying state
and use that to modify the state. %%=
is an infix function that basically just
allows lenses to be applied to the underlying state rather than the value passed
into the function. Also note the use of sliced i 1
(a lens looking into our
underlying array) and forced
, a lens which forces the operations to be
performed with copying (increasing speed).
But of course, our program is not a Char
, nor is it even a list of Char
s:
our parser returns a Syntax Char
, so we're going to need something to evaluate
it. We could use pattern matching and recursion, but there's a more general
(and sometimes faster) way: recursion schemes!
algebra :: Base (Syntax Char) (St ()) -> St ()
algebra (TokenF x) = toAction x
algebra (SeqF x) = foldr (>>) (pure ()) x
algebra l@(LoopF x) = check >>= (\bool -> if bool then pure () else x >> algebra l)
where check = get >>= ((arr,i) -> pure . (==0) . (V.! i) $ arr)
-- | Evaluate syntax tree
run :: (Syntax Char) -> IO ()
run parsed = fst <$> runStateT (cata algebra parsed) (V.replicate 30000 0, 0)
The first function is our (Base t)
-algebra, and the second is the actual
catamorphism. Because of our earlier
makeBaseFunctor ''Syntax
we don't have to do anything fancy in terms of checking exactly what can be
folded or not - we can define all our behavior on a SeqF x
by treating it like a
list of St ()
s, and we can similarly define our behavior on a LoopF x
by
treating x
like an St ()
- check
is in fact just our function to check, in
the St
monad, whether we enter the loop again (in brainfuck, loops are entered
only when the value at the pointer is nonzero).
And that's basically it. parseBrainheck
is now a function accepting text
(i.e. your program) as input. Of course, you probably want to
wrap it with something nicer to read from a file, say, but as it stands,
42 lines + imports to get a Turing-Complete language really isn't that bad!