Apple, being a JIT compiler with shape types, is able to do a number of optimizations based on inferred dimension (and rank). Rank is almost always known in practice, so such optimizations are pertinent.

This information comes from type inference:
from an expression known to have type `Arr (i × j) a`

, we know that it has
rank 2, from an expression of type `Arr (6 × 128) a`

we know it has dimension `6 × 128`

.

# Strides in Registers (Rank)

To write a procedure that accepts arbitrary rank arrays, we would have to account for dimensions (and strides) as an array, loading from memory in order to index into an array. A compiler can dispatch such procedures for any rank, while generating more efficient code that stores indices in registers.

# Shifts for Strides

Suppose we have an `m × n × l`

array `A`

. Then an element `A[i][j][k]`

will be
at offset \( i \cdot mn + j \cdot m + k \). If \( m \) and/or \( mn \) is a power of 2, we
can optimize the multiplication into a shift, which is far more efficient
on hardware.

# Loops

Suppose we wish to sum a vector:

> :disasm [(+)/ₒ 0 (x::Vec n float)]
mov x3, x0 e3 03 00 aa
eor x1, x1, x1 21 00 01 ca
fmov d0, x1 20 00 67 9e
ldr x1, [x3, #0x8] 61 04 40 f9
eor x2, x2, x2 42 00 02 ca
cmp x2, x1 5f 00 01 eb
b.GE apple*1 ea 00 00 54
apple*0:
add x4, x2, #0x2 44 08 00 91
ldr d2, [x3, x4, LSL #3] 62 78 64 fc
fadd d0, d0, d2 00 28 62 1e
add x2, x2, #0x1 42 04 00 91
cmp x2, x1 5f 00 01 eb
b.LT apple*0 6b ff ff 54
apple*1:
ret c0 03 5f d6

If we know that the vector is nonempty, we can skip the branch instruction in the prologue, viz.

> :disasm [(+)/ₒ 0 (x::Vec 4 float)]
mov x3, x0 e3 03 00 aa
eor x1, x1, x1 21 00 01 ca
fmov d0, x1 20 00 67 9e
ldr x1, [x3, #0x8] 61 04 40 f9
eor x2, x2, x2 42 00 02 ca
apple*0:
add x4, x2, #0x2 44 08 00 91
ldr d2, [x3, x4, LSL #3] 62 78 64 fc
fadd d0, d0, d2 00 28 62 1e
add x2, x2, #0x1 42 04 00 91
cmp x2, x1 5f 00 01 eb
b.LT apple*0 6b ff ff 54
ret c0 03 5f d6

This doesn't add much in terms of efficiency but makes things slightly easier to follow when debugging the compiler.

When computing a 7-day moving average, this does give some advantage; since the window array is known to have size 7, we can skip the prologue in the inner loop.

> :disasm ([((+)/x)%ℝ(:x)]`7)
⋮
apple*2:
add x7, x4, #0x2 87 08 00 91
ldr d4, [x6, x7, LSL #3] c4 78 67 fc
fadd d2, d2, d4 42 28 64 1e
add x4, x4, #0x1 84 04 00 91
cmp x4, x3 9f 00 03 eb
b.LT apple*2 6b ff ff 54
fmul d2, d2, d3 42 08 63 1e
add x3, x2, #0x2 43 08 00 91
str d2, [x0, x3, LSL #3] 02 78 23 fc
add x2, x2, #0x1 42 04 00 91
cmp x2, x1 5f 00 01 eb
b.LT apple*0 8b fc ff 54
apple*1:
add sp, sp, #0x60 ff 83 01 91
ret c0 03 5f d6

See how dimension information is present after the AST has been annotated with the type of each expression:

> :ann ([((+)/x)%ℝ(:x)]`7)
((λ(x : Vec 7 float). (((((+ : float → float → float) (/ : (float → float → float) → Vec 7 float → float)) (x : Vec 7 float)) (% : float → float → float)) ((ℝ : int → float) ((: : Vec 7 float → int) (x : Vec 7 float))))) (`7 : (Vec 7 float → float) → Vec (i + 7) float → Vec i float))