My Apple compiler started as an experiment in what I would implement as a compiler for an array system. Several parts did not pan out, so I offer my warnings and advice to other array-language implementers.
Typed Array Programming
Apple combines a REPL and type inference:
> :ty (⊲)
(a → (Arr (i `Cons` Nil) a → Arr ((i + 1) `Cons` Nil) a))
This turns out to be quite good for the style of exploratory programming that J encourages. Such a type system is hardly new: the Apple type system is based on Repa and Accelerate, but they do not emphasize interactivity or terseness.
APL descendants look peculiar in light of their lack of static typing; a dialect that added shape types would immediately earn its place.
Compiler-as-a-Library
I wanted to have a function:
compile_src : String -> Bytes
which took source code to machine code.
With such a function one could compile to machine code on one machine and distribute it to others for execution, modify a text editor to execute code directly, or write a compiler plugin to extend GCC (say) with the ability to compile another language, e.g.:
#include <stdio.h>
#include <stdlib.h>

double ff(double) {apple|
λz.
{
  ffact ← [(*)/ₒ 1 (𝒻 1 x (⌊x))];
  Σ ← λN.λa. (+)/ₒ 0 (a'(⍳ 0 N 1));
  (2%√𝜋)Σ 30 (λn. {nf⟜ℝn; ((_1^n)z^(2n+1))%((ffact nf)(2nf+1))})
}
|};

int main(int argc, char *argv[]) {
  printf("%f\n", ff(2.0));
}
However, this is not so easy—foreign function calls cannot be compiled to machine code and must bundle context information.
There appears to be little precedent for writing compilers that can be threaded through one another, but an array language would be a good candidate: one can write a full compiler whose only foreign calls are to malloc and free.
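To make this concrete, here is a rough sketch in plain C, with an invented array layout rather than Apple's actual ABI, of the kind of self-contained code such a compiler could emit: everything is straight-line arithmetic and loops, and the only foreign symbol is malloc.
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical rank-1 array of doubles; illustrative layout, not Apple's ABI. */
typedef struct { int64_t n; double xs[]; } vec;

/* A compiled "double every element" function: the only foreign call is malloc. */
vec *scale2(const vec *a) {
  vec *r = malloc(sizeof *r + a->n * sizeof(double));
  r->n = a->n;
  for (int64_t i = 0; i < a->n; i++)
    r->xs[i] = 2.0 * a->xs[i];
  return r;
}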
Embedding
When the compiler is structured as a library function, one can easily embed it in other languages. The hope was that this would make Apple a replacement for NumPy.
An embedded language can perform deforestation that NumPy cannot; consider
np.sum(np.arange(0, 10000, 1, dtype=np.int64))
This builds up an array in memory and then folds over it. In Apple, the compiler can analyze the whole expression
(+)/o 0 (irange 0 9999 1)
and compile it to a loop with no allocations:
> :asm (+)/o 0 (irange 0 9999 1)
sub rsp, 8
xor r10, r10
xor rax, rax
mov rdx, 9999
apple_0:
cmp r10, rdx
jg apple_1
mov rcx, r10
add rax, rcx
add r10, 1
jmp apple_0
apple_1:
add rsp, 8
ret
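The fused loop is roughly equivalent to the following C (a sketch for comparison, not compiler output):
#include <stdint.h>
#include <stdio.h>

int main(void) {
  /* Sum 0..9999 directly; the range is never materialized in memory. */
  int64_t acc = 0;
  for (int64_t i = 0; i <= 9999; i++)
    acc += i;
  printf("%lld\n", (long long)acc);
  return 0;
}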
Jupyter Notebooks
Jupyter notebooks are quite nice for calculations; the combination of interactivity and record-keeping is missing from the ML side of things (Futhark, Accelerate, Repa). Such languages have found great success in building systems but perhaps calculation has been forgotten.
JIT
Writing a full compiler backend was half the effort of writing the interpreter, at once perfunctory and fickle.
Unfortunately, this was necessary: other array languages have found great success with tree-walking interpreters written in C, but that approach would not allow us to compile down to object files.
There is one advantage, which is that the compiler can be used from the REPL; one can inspect the assembly to see if SIMD instructions are used:
> :disasm [(⋉)Λₒ 0 (x::Arr (i `Cons` Nil) float)]
⋮
xor r10, r10 4d 31 d2
cmp r10, rcx 49 39 ca
jge apple_1 0f 8d 1b 00 00 00
apple_0:
movq [rax+8*r10+16], xmm3 66 4a 0f 7e 5c d0 10
vmaxsd xmm3, xmm3, [rdx+8*r10+16] c4 a1 e3 5f 5c d2 10
add r10, 1 49 83 c2 01
cmp r10, rcx 49 39 ca
jl apple_0 0f 8c e5 ff ff ff
apple_1:
add rsp, 8 48 83 c4 08
ret c3
A backend-as-a-library would suit array languages well: calculations and array operations map nicely to assembly, and the array language community cares about SIMD instructions.
Unfortunately, LLVM is poorly documented and fast-moving, and GCC is simply not structured suitably.
IR
The Apple compiler has only one IR, which goes straight from expressions to jumps and addresses.
Successful interpreters for other array languages have been written in C; an IR like C (with for loops) would make translating rank less tortuous.
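As a sketch of why: applying a rank-1 verb over the leading axis of a matrix is just a nested loop in a C-like IR, whereas an IR of jumps and addresses has to spell it out with labels and branches. Names here are illustrative, not Apple's internals.
#include <stdint.h>

/* C-like IR sketch for a rank construct: apply a rank-1 verb (row → row)
   over each row of an m×n array. */
void rank1_over_rows(void (*verb)(const double *, double *, int64_t),
                     const double *in, double *out, int64_t m, int64_t n) {
  for (int64_t i = 0; i < m; i++)
    verb(in + i * n, out + i * n, n); /* one cell at a time */
}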