I just finished adding another mid-end to my Apple JIT compiler, motivated by getting rank facilities right. However, significant shortfalls remain.

Hardware Sizes

Apple implements only 64-bit floats and 64-bit integers, in order to simplify the implementation.

Famously, half-precision floats (16-bit) are superior for neural nets. However, 32-bit and smaller integers are also essential: copying such an array is twice as fast as copying 64-bit integers (I believe I learned this from a remark by Marshall Lochbaum).

Such situations arise constantly in array programming: arrays are frequently smaller than 32,767 elements, so an array of indices can use 16-bit integers.
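
As a sketch of the bandwidth argument (plain C, nothing Apple-specific): for large arrays, memcpy is memory-bound, so narrowing the element width cuts the bytes moved, and hence the copy time, proportionally.

    /* Illustrative only: the memory-traffic argument for narrow types.
       Indices into an array of fewer than 32768 elements fit in int16_t,
       so the same index vector occupies a quarter of the bytes of an
       int64_t one. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define N 10000 /* small enough that 16-bit indices suffice */

    int main(void) {
        static int64_t wide[N], wide_dst[N];
        static int16_t narrow[N], narrow_dst[N];
        for (int i = 0; i < N; i++) { wide[i] = i; narrow[i] = (int16_t)i; }

        memcpy(wide_dst, wide, sizeof wide);       /* 80000 bytes moved */
        memcpy(narrow_dst, narrow, sizeof narrow); /* 20000 bytes moved */

        printf("64-bit: %zu bytes, 16-bit: %zu bytes\n",
               sizeof wide, sizeof narrow);
        return 0;
    }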

Library Calls

Part of writing Apple was exploring compiler-as-a-library. Array languages can do nontrivial things with only malloc, free, and CPU instructions; this simplifies JIT linking and would be a boon when sending machine code across a network.
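
A minimal sketch (C, POSIX, x86-64; not Apple's actual implementation) of why such code is trivial to link: when the generated bytes reference no external symbols, loading them is just a copy into executable memory, and heap-using code would only need the addresses of malloc and free handed to it.

    /* Minimal JIT sketch: machine code with no relocations or external
       symbols, so "linking" is a memcpy. macOS arm64 would additionally
       need MAP_JIT and pthread_jit_write_protect_np. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        static const uint8_t code[] = {
            0xb8, 0x2a, 0x00, 0x00, 0x00, /* mov eax, 42 */
            0xc3                          /* ret         */
        };
        void *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED) return 1;
        memcpy(mem, code, sizeof code);
        if (mprotect(mem, 4096, PROT_READ | PROT_EXEC) != 0) return 1;

        int (*f)(void) = (int (*)(void))mem;
        printf("%d\n", f()); /* prints 42 */
        munmap(mem, 4096);
        return 0;
    }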

However, without optimized matrix multiplication from libraries, Apple is limited in ways that array languages shouldn't be.

Canonical Status

One of the aims of Apple was to make it easy to call from other languages. One could write a 7-day sliding average in Apple:

([((+)/x)%ℝ(:x)]`7)

which is then available in Python, R, C, &c.
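
The mechanism, roughly: the JIT produces machine code obeying the platform's C ABI, so any language with a C FFI can invoke it through a function pointer. The type and names below are illustrative assumptions, not Apple's actual API or array representation.

    /* Hypothetical call site: vec and sliding_mean_fn are assumed,
       illustrative types, not Apple's real array ABI. The point is only
       that the JIT'd code is an ordinary C-callable function. */
    #include <stddef.h>

    typedef struct { size_t n; double *xs; } vec; /* assumed layout */
    typedef vec (*sliding_mean_fn)(vec);

    vec call_sliding_mean(sliding_mean_fn f, double *data, size_t n) {
        vec in = { n, data };
        return f(in); /* f points at the code JIT'd for the expression above */
    }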

However, I have not implemented SIMD. For the dot product, viz.

[(+)/ ((*)((x::Arr (i `Cons` Nil) float)) y)]

the performance is inferior to NumPy[1].
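
For reference, this is the kind of kernel that SIMD lowering would emit for the expression above; a sketch in C with AVX2/FMA intrinsics (x86-64, compile with -mfma). NumPy reaches comparable code through its BLAS backend.

    /* Vectorized dot product: 4 doubles per fused multiply-add,
       plus a scalar tail for lengths not divisible by 4. */
    #include <immintrin.h>
    #include <stddef.h>

    double dot(const double *x, const double *y, size_t n) {
        __m256d acc = _mm256_setzero_pd();
        size_t i = 0;
        for (; i + 4 <= n; i += 4)
            acc = _mm256_fmadd_pd(_mm256_loadu_pd(x + i),
                                  _mm256_loadu_pd(y + i), acc);
        double lanes[4];
        _mm256_storeu_pd(lanes, acc);
        double s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
        for (; i < n; i++) /* scalar tail */
            s += x[i] * y[i];
        return s;
    }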

Implementing the Apple JIT compiler as a library was a significant effort, but without going all the way and matching the performance of the libraries it would replace, one cannot achieve canonical status.

[1] NumPy is not entirely satisfactory itself: it must have a special procedure for dot product; being a library rather than a language, it cannot perform fusion/deforestation.