My Apple compiler started as an experiment in what I would implement as a compiler for an array system. Several parts did not pan out, so I offer my warnings and advice to other array-language implementers.
Typed Array Programming
Apple combines a REPL and type inference:
> :ty (⊲)
(a → (Arr (i `Cons` Nil) a → Arr ((i + 1) `Cons` Nil) a))
This turns out to be quite good for the style of exploratory programming that J encourages. Such a type system is hardly new: the Apple type system is based on Repa and Accelerate, but they do not emphasize interactivity or terseness.
APL descendants look peculiar in light of their lack of static typing; a dialect that added shape types would immediately earn its place.
Compiler-as-a-Library
I wanted to have a function:
compile_src : String -> Bytes
which took source code to machine code.
With such a function one could compile to machine code on one machine and distribute it to others for execution, modify a text editor to execute code directly, or write a compiler plugin to extend GCC (say) with the ability to compile another language, e.g.:
#include <stdio.h>
#include <stdlib.h>

double ff(double) {apple|
λz.
{
  ffact ← [(*)/ₒ 1 (𝒻 1 x (⌊x))];
  Σ ← λN.λa. (+)/ₒ 0 (a'(⍳ 0 N 1));
  (2%√𝜋)Σ 30 (λn. {nf⟜ℝn; ((_1^n)z^(2n+1))%((ffact nf)(2nf+1))})
}
|};

int main(int argc, char *argv[]) {
  printf("%f\n", ff(2.0));
}
However, this is not so easy—foreign function calls cannot be compiled to machine code and must bundle context information.
There appears to be little precedent for writing compilers that can be threaded through one another, but an array language would be a good candidate: one can write a full compiler whose only foreign calls are to malloc and free.
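To make this concrete, here is a rough sketch in plain C, with an invented array layout rather than Apple's actual ABI, of the kind of self-contained code such a compiler could emit: everything is straight-line arithmetic and loops, and the only foreign symbol is malloc.
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical rank-1 array of doubles; illustrative layout, not Apple's ABI. */
typedef struct { int64_t n; double xs[]; } vec;

/* A compiled "double every element" function: the only foreign call is malloc. */
vec *scale2(const vec *a) {
  vec *r = malloc(sizeof *r + a->n * sizeof(double));
  r->n = a->n;
  for (int64_t i = 0; i < a->n; i++)
    r->xs[i] = 2.0 * a->xs[i];
  return r;
}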
Embedding
When the compiler is structured as a library function, one can easily embed it in other languages. The hope was that this would make Apple a replacement for NumPy.
An embedded language can perform deforestation that NumPy cannot; consider
np.sum(np.arange(0, 10000, 1, dtype=np.int64))
This builds up an array in memory and then folds over it. In Apple, the compiler can analyze the whole expression
(+)/o 0 (irange 0 9999 1)
and compile it to a loop with no allocations:
> :asm (+)/o 0 (irange 0 9999 1)
sub rsp, 8
xor r10, r10
xor rax, rax
mov rdx, 9999
apple_0:
cmp r10, rdx
jg apple_1
mov rcx, r10
add rax, rcx
add r10, 1
jmp apple_0
apple_1:
add rsp, 8
ret
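The fused loop is roughly equivalent to the following C (a sketch for comparison, not compiler output):
#include <stdint.h>
#include <stdio.h>

int main(void) {
  /* Sum 0..9999 directly; the range is never materialized in memory. */
  int64_t acc = 0;
  for (int64_t i = 0; i <= 9999; i++)
    acc += i;
  printf("%lld\n", (long long)acc);
  return 0;
}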
Jupyter Notebooks
Jupyter notebooks are quite nice for calculations; the combination of interactivity and record-keeping is missing from the ML side of things (Futhark, Accelerate, Repa). Such languages have found great success in building systems but perhaps calculation has been forgotten.
JIT
Writing a full compiler backend was half the effort of writing the interpreter, at once perfunctory and fickle.
Unfortunately, this was necessary: other array languages have found great success with tree-walking interpreters written in C, but that approach would not allow us to compile down to object files.
There is one advantage, which is that the compiler can be used from the REPL; one can inspect the assembly to see if SIMD instructions are used:
> :disasm [(⋉)Λₒ 0 (x::Arr (i `Cons` Nil) float)]
⋮
xor r10, r10 4d 31 d2
cmp r10, rcx 49 39 ca
jge apple_1 0f 8d 1b 00 00 00
apple_0:
movq [rax+8*r10+16], xmm3 66 4a 0f 7e 5c d0 10
vmaxsd xmm3, xmm3, [rdx+8*r10+16] c4 a1 e3 5f 5c d2 10
add r10, 1 49 83 c2 01
cmp r10, rcx 49 39 ca
jl apple_0 0f 8c e5 ff ff ff
apple_1:
add rsp, 8 48 83 c4 08
ret c3
A backend-as-a-library would suit array languages well: calculations and array operations map nicely to assembly, and the array language community cares about SIMD instructions.
Unfortunately, LLVM is poorly documented and fast-moving, and GCC is simply not structured suitably.
IR
The Apple compiler has only one IR, which goes straight from expressions to jumps and addresses.
Successful interpreters for other array languages have been written in C; an IR like C (with for loops) would make translating rank less tortuous.
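As a sketch of why: applying a rank-1 verb over the leading axis of a matrix is just a nested loop in a C-like IR, whereas an IR of jumps and addresses has to spell it out with labels and branches. Names here are illustrative, not Apple's internals.
#include <stdint.h>

/* C-like IR sketch for a rank construct: apply a rank-1 verb (row → row)
   over each row of an m×n array. */
void rank1_over_rows(void (*verb)(const double *, double *, int64_t),
                     const double *in, double *out, int64_t m, int64_t n) {
  for (int64_t i = 0; i < m; i++)
    verb(in + i * n, out + i * n, n); /* one cell at a time */
}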