For an array with dimensions \(n_1,n_2,\ldots,n_r\) stored in column-major order, an element with indices \(a_1,a_2,\ldots,a_r\) is located at offset
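As a sketch of the standard column-major rule, assuming zero-based indices (the text above does not state the index base), the offset is \(\sum_{k=1}^{r} a_k \prod_{l=1}^{k-1} n_l\), so the first index varies fastest. The helper name `column_major_offset` below is hypothetical and only illustrates the arithmetic.

```python
from typing import Sequence


def column_major_offset(dims: Sequence[int], idx: Sequence[int]) -> int:
    """Offset of a zero-based index tuple in a column-major array.

    Assumption: indices are zero-based and the offset counts elements,
    not bytes.
    """
    if len(dims) != len(idx):
        raise ValueError("dims and idx must have the same rank")
    offset = 0
    stride = 1  # product of the dimensions before the current axis
    for n, a in zip(dims, idx):
        if not 0 <= a < n:
            raise IndexError(f"index {a} out of range for dimension {n}")
        offset += a * stride
        stride *= n
    return offset


# A 2x3 array: element (1, 2) sits at 1 + 2*2 = 5, the last of 6 slots.
print(column_major_offset((2, 3), (1, 2)))
```

Accumulating the stride as the loop advances avoids recomputing each partial product, which is the same trick as evaluating the nested form \(a_1 + n_1(a_2 + n_2(\cdots))\).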
C (C++) is used to write performant software; however, it is ill-suited to SIMD. In particular, its compilation of stepped reduction with lexical scoping opposes parallel execution.
One motivation for Apple was demonstrating typed array programming. Shape types are rich; we can use types as witnesses as in QuickCheck, generating test cases that are shape-correct.
It is quite possible to beat established (array) implementations with special cases and bit twiddling.
Consider a softmax layer from Aditya Srinivas Menon's tutorial: