Apple, being a JIT compiler with shape types, is able to do a number of optimizations based on inferred dimension (and rank). Rank is almost always known in practice, so such optimizations are pertinent.
This information comes from type inference:
from an expression known to have type Arr (i × j) a
, we know that it has
rank 2, from an expression of type Arr (6 × 128) a
we know it has dimension 6 × 128
.
Strides in Registers (Rank)
To write a procedure that accepts arbitrary rank arrays, we would have to account for dimensions (and strides) as an array, loading from memory in order to index into an array. A compiler can dispatch such procedures for any rank, while generating more efficient code that stores indices in registers.
Shifts for Strides
Suppose we have an m × n × l
array A
. Then an element A[i][j][k]
will be
at offset \( i \cdot mn + j \cdot m + k \). If \( m \) and/or \( mn \) is a power of 2, we
can optimize the multiplication into a shift, which is far more efficient
on hardware.
Loops
Suppose we wish to sum a vector:
> :disasm [(+)/ₒ 0 (x::Vec n float)]
mov x3, x0 e3 03 00 aa
eor x1, x1, x1 21 00 01 ca
fmov d0, x1 20 00 67 9e
ldr x1, [x3, #0x8] 61 04 40 f9
eor x2, x2, x2 42 00 02 ca
cmp x2, x1 5f 00 01 eb
b.GE apple1 ea 00 00 54
apple0:
add x4, x2, #0x2 44 08 00 91
ldr d2, [x3, x4, LSL #3] 62 78 64 fc
fadd d0, d0, d2 00 28 62 1e
add x2, x2, #0x1 42 04 00 91
cmp x2, x1 5f 00 01 eb
b.LT apple0 6b ff ff 54
apple1:
ret c0 03 5f d6
If we know that the vector is nonempty, we can skip the branch instruction in the prologue, viz.
> :disasm [(+)/ₒ 0 (x::Vec 4 float)]
mov x3, x0 e3 03 00 aa
eor x1, x1, x1 21 00 01 ca
fmov d0, x1 20 00 67 9e
ldr x1, [x3, #0x8] 61 04 40 f9
eor x2, x2, x2 42 00 02 ca
apple0:
add x4, x2, #0x2 44 08 00 91
ldr d2, [x3, x4, LSL #3] 62 78 64 fc
fadd d0, d0, d2 00 28 62 1e
add x2, x2, #0x1 42 04 00 91
cmp x2, x1 5f 00 01 eb
b.LT apple0 6b ff ff 54
ret c0 03 5f d6
This doesn't add much in terms of efficiency but makes things slightly easier to follow when debugging the compiler.
When computing a 7-day moving average, this does give some advantage; since the window array is known to have size 7, we can skip the prologue in the inner loop.
> :disasm ([((+)/x)%ℝ(:x)]`7)
⋮
apple2:
add x7, x4, #0x2 87 08 00 91
ldr d4, [x6, x7, LSL #3] c4 78 67 fc
fadd d2, d2, d4 42 28 64 1e
add x4, x4, #0x1 84 04 00 91
cmp x4, x3 9f 00 03 eb
b.LT apple2 6b ff ff 54
fmul d2, d2, d3 42 08 63 1e
add x3, x2, #0x2 43 08 00 91
str d2, [x0, x3, LSL #3] 02 78 23 fc
add x2, x2, #0x1 42 04 00 91
cmp x2, x1 5f 00 01 eb
b.LT apple0 8b fc ff 54
apple1:
add sp, sp, #0x60 ff 83 01 91
ret c0 03 5f d6
See how dimension information is present after the AST has been annotated with the type of each expression:
> :ann ([((+)/x)%ℝ(:x)]\7)
((λ(x : Vec 7 float). (((((+ : float → float → float) (/ : (float → float → float) → Vec 7 float → float)) (x : Vec 7 float)) (% : float → float → float)) ((ℝ : int → float) ((: : Vec 7 float → int) (x : Vec 7 float))))) (\\
7 : (Vec 7 float → float) → Vec (i + 7) float → Vec i float))