Apple, being a JIT compiler with shape types, is able to do a number of optimizations based on inferred dimension (and rank). Rank is almost always known in practice, so such optimizations are pertinent.

This information comes from type inference: from an expression known to have type Arr (i × j) a, we know that it has rank 2; from an expression of type Arr (6 × 128) a, we know that its dimensions are 6 × 128.

Strides in Registers (Rank)

To write a procedure that accepts arrays of arbitrary rank, we would have to store the dimensions (and strides) in an array, loading them from memory in order to index into the array. Because rank is known from the type, the compiler can instead dispatch a specialized procedure for each rank, generating more efficient code that keeps the strides in registers.
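
As a rough sketch in C (the function names and the double element type are illustrative, not Apple's runtime representation), the difference is between looping over strides held in memory and taking the strides as scalar arguments that stay in registers:

    #include <stddef.h>

    /* Rank known only at runtime: strides live in an array and each one
     * must be loaded from memory while computing the offset. */
    double index_any_rank(const double *data, const size_t *strides,
                          const size_t *ix, size_t rank) {
        size_t off = 0;
        for (size_t d = 0; d < rank; d++)
            off += ix[d] * strides[d];   /* loads strides[d] from memory */
        return data[off];
    }

    /* Rank known to be 2: the strides are ordinary scalar arguments,
     * so the whole offset computation stays in registers. */
    double index_rank2(const double *data, size_t s0, size_t s1,
                       size_t i, size_t j) {
        return data[i * s0 + j * s1];
    }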

Shifts for Strides

Suppose we have an m × n × l array A, stored in row-major order. Then an element A[i][j][k] will be at offset \( i \cdot nl + j \cdot l + k \). If \( l \) and/or \( nl \) is a power of 2, we can optimize the corresponding multiplication into a shift, which is far more efficient on hardware.
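
A small C sketch of the idea, assuming a row-major layout and illustrative dimensions (the function name is made up):

    /* Indexing an m × 4 × 8 row-major array of doubles: the strides are
     * 4*8 = 32 and 8, both powers of two, so i*32 + j*8 + k can be
     * computed with shifts instead of multiplies. */
    double get_m_4_8(const double *a, long i, long j, long k) {
        return a[(i << 5) + (j << 3) + k];   /* == a[i*32 + j*8 + k] */
    }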

Loops

Suppose we wish to sum a vector:

> :disasm [(+)/ₒ 0 (x::Vec n float)]

    mov x3, x0                  e3 03 00 aa
    eor x1, x1, x1              21 00 01 ca
    fmov d0, x1                 20 00 67 9e
    ldr x1, [x3, #0x8]          61 04 40 f9
    eor x2, x2, x2              42 00 02 ca
    cmp x2, x1                  5f 00 01 eb
    b.GE apple1                 ea 00 00 54
apple0:
    add x4, x2, #0x2            44 08 00 91
    ldr d2, [x3, x4, LSL #3]    62 78 64 fc
    fadd d0, d0, d2             00 28 62 1e
    add x2, x2, #0x1            42 04 00 91
    cmp x2, x1                  5f 00 01 eb
    b.LT apple0                 6b ff ff 54
apple1:
    ret                         c0 03 5f d6

If we know that the vector is nonempty, we can skip the branch instruction in the prologue, viz.

> :disasm [(+)/ₒ 0 (x::Vec 4 float)]

    mov x3, x0                  e3 03 00 aa
    eor x1, x1, x1              21 00 01 ca
    fmov d0, x1                 20 00 67 9e
    ldr x1, [x3, #0x8]          61 04 40 f9
    eor x2, x2, x2              42 00 02 ca
apple0:
    add x4, x2, #0x2            44 08 00 91
    ldr d2, [x3, x4, LSL #3]    62 78 64 fc
    fadd d0, d0, d2             00 28 62 1e
    add x2, x2, #0x1            42 04 00 91
    cmp x2, x1                  5f 00 01 eb
    b.LT apple0                 6b ff ff 54
    ret                         c0 03 5f d6

This doesn't add much in terms of efficiency but makes things slightly easier to follow when debugging the compiler.
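
In C terms, the transformation is roughly the difference between a while loop, which tests the bound before the first iteration, and a do-while loop, which is only safe when the vector is known to be nonempty. A sketch (not the compiler's actual output; the function names are hypothetical):

    /* Length unknown (Vec n float): the bound must be tested before the
     * first iteration, mirroring the b.GE apple1 in the first listing. */
    double sum_any(const double *x, long n) {
        double s = 0.0;
        long i = 0;
        while (i < n) { s += x[i]; i++; }
        return s;
    }

    /* Length known to be nonzero (e.g. Vec 4 float): the entry test is
     * dropped and only the back-edge b.LT remains. */
    double sum_nonempty(const double *x, long n) {
        double s = 0.0;
        long i = 0;
        do { s += x[i]; i++; } while (i < n);
        return s;
    }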

When computing a 7-day moving average, this does give some advantage; since the window array is known to have size 7, we can skip the prologue in the inner loop.

> :disasm ([((+)/x)%ℝ(:x)]`7)

    ⋮
apple2:
    add x7, x4, #0x2            87 08 00 91
    ldr d4, [x6, x7, LSL #3]    c4 78 67 fc
    fadd d2, d2, d4             42 28 64 1e
    add x4, x4, #0x1            84 04 00 91
    cmp x4, x3                  9f 00 03 eb
    b.LT apple2                 6b ff ff 54
    fmul d2, d2, d3             42 08 63 1e
    add x3, x2, #0x2            43 08 00 91
    str d2, [x0, x3, LSL #3]    02 78 23 fc
    add x2, x2, #0x1            42 04 00 91
    cmp x2, x1                  5f 00 01 eb
    b.LT apple0                 8b fc ff 54
apple1:
    add sp, sp, #0x60           ff 83 01 91
    ret                         c0 03 5f d6
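
A rough C analogue of the windowed mean (the function name is hypothetical and this is not the code Apple emits, merely the same loop shape): because the inner trip count is the constant 7, the inner loop needs no entry check and could even be unrolled completely.

    /* 7-day moving average over x[0..n), one output per full window. */
    void moving_avg7(const double *x, double *out, long n) {
        for (long i = 0; i + 7 <= n; i++) {
            double s = 0.0;
            for (long k = 0; k < 7; k++)   /* trip count is a compile-time constant */
                s += x[i + k];
            out[i] = s / 7.0;
        }
    }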

Note how the dimension information is present once the AST has been annotated with the type of each expression:

> :ann ([((+)/x)%ℝ(:x)]`7)

    ((λ(x : Vec 7 float). (((((+ : float → float → float) (/ : (float → float → float) → Vec 7 float → float)) (x : Vec 7 float)) (% : float → float → float)) ((ℝ : int → float) ((: : Vec 7 float → int) (x : Vec 7 float))))) (`7 : (Vec 7 float → float) → Vec (i + 7) float → Vec i float))