It is quite possible to beat established (array) implementations with special cases and bit-twiddling.
Consider a softmax layer from Aditya Srinivas Menon's tutorial:
Apple, being a JIT compiler with shape types, can perform a number of optimizations based on inferred dimensions (and rank). Rank is almost always known in practice, so such optimizations are pertinent.
In Haskell, one can present a streaming compression/decompression API with lazy bytestrings.
I just finished adding another mid-end to my Apple JIT compiler, motivated by getting rank facilities right. However, significant shortfalls remain.