Optimizing Array Code With Inferred Type Information

by Vanessa McHale | 2024-07-18 18:11

Apple, being a JIT compiler with shape types, is able to do a number of optimizations based on inferred dimension (and rank). Rank is almost always known in practice, so such optimizations are pertinent.

This information comes from type inference: from an expression known to have type Arr (i × j) a, we know that it has rank 2, from an expression of type Arr (6 × 128) a we know it has dimension 6 × 128.

Strides in Registers (Rank)

To write a procedure that accepts arbitrary rank arrays, we would have to account for dimensions (and strides) as an array, loading from memory in order to index into an array. A compiler can dispatch such procedures for any rank, while generating more efficient code that stores indices in registers.

Shifts for Strides

Suppose we have an m × n × l array A. Then an element A[i][j][k] will be at offset \( i \cdot mn + j \cdot m + k \). If \( m \) and/or \( mn \) is a power of 2, we can optimize the multiplication into a shift, which is far more efficient on hardware.

Loops

Suppose we wish to sum a vector:

 > :disasm [(+)/ₒ 0 (x::Vec n float)]
    mov x3, x0                               e3 03 00 aa
    eor x1, x1, x1                           21 00 01 ca
    fmov d0, x1                              20 00 67 9e
    ldr x1, [x3, #0x8]                       61 04 40 f9
    eor x2, x2, x2                           42 00 02 ca
    cmp x2, x1                               5f 00 01 eb
    b.GE apple1                             ea 00 00 54
apple0:
    add x4, x2, #0x2                         44 08 00 91
    ldr d2, [x3, x4, LSL #3]                 62 78 64 fc
    fadd d0, d0, d2                          00 28 62 1e
    add x2, x2, #0x1                         42 04 00 91
    cmp x2, x1                               5f 00 01 eb
    b.LT apple0                             6b ff ff 54
apple1:
    ret                                      c0 03 5f d6

If we know that the vector is nonempty, we can skip the branch instruction in the prologue, viz.

 > :disasm [(+)/ₒ 0 (x::Vec 4 float)]
    mov x3, x0                               e3 03 00 aa
    eor x1, x1, x1                           21 00 01 ca
    fmov d0, x1                              20 00 67 9e
    ldr x1, [x3, #0x8]                       61 04 40 f9
    eor x2, x2, x2                           42 00 02 ca
apple0:
    add x4, x2, #0x2                         44 08 00 91
    ldr d2, [x3, x4, LSL #3]                 62 78 64 fc
    fadd d0, d0, d2                          00 28 62 1e
    add x2, x2, #0x1                         42 04 00 91
    cmp x2, x1                               5f 00 01 eb
    b.LT apple0                             6b ff ff 54
    ret                                      c0 03 5f d6

This doesn't add much in terms of efficiency but makes things slightly easier to follow when debugging the compiler.

When computing a 7-day moving average, this does give some advantage; since the window array is known to have size 7, we can skip the prologue in the inner loop.

 > :disasm ([((+)/x)%ℝ(:x)]`7)
    ⋮
apple2:
    add x7, x4, #0x2                         87 08 00 91
    ldr d4, [x6, x7, LSL #3]                 c4 78 67 fc
    fadd d2, d2, d4                          42 28 64 1e
    add x4, x4, #0x1                         84 04 00 91
    cmp x4, x3                               9f 00 03 eb
    b.LT apple2                             6b ff ff 54
    fmul d2, d2, d3                          42 08 63 1e
    add x3, x2, #0x2                         43 08 00 91
    str d2, [x0, x3, LSL #3]                 02 78 23 fc
    add x2, x2, #0x1                         42 04 00 91
    cmp x2, x1                               5f 00 01 eb
    b.LT apple0                             8b fc ff 54
apple1:
    add sp, sp, #0x60                        ff 83 01 91
    ret                                      c0 03 5f d6

See how dimension information is present after the AST has been annotated with the type of each expression:

 > :ann ([((+)/x)%ℝ(:x)]\7)
((λ(x : Vec 7 float). (((((+ : float → float → float) (/ : (float → float → float) → Vec 7 float → float)) (x : Vec 7 float)) (% : float → float → float)) ((ℝ : int → float) ((: : Vec 7 float → int) (x : Vec 7 float))))) (\\7 : (Vec 7 float → float) → Vec (i + 7) float → Vec i float))

return

blog

Optimizing Array Code With Inferred Type Information

Strides in Registers (Rank)

Shifts for Strides

Loops