Consider a softmax layer from Aditya Srinivas Menon's tutorial:
import numpy as np

def softmax(x):
    # subtract the maximum for numerical stability before exponentiating
    exp_element = np.exp(x - x.max())
    return exp_element / np.sum(exp_element, axis=0)
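For concreteness, here is a minimal sketch of how the implicit broadcasting plays out on a hypothetical 2-D batch (the shape and variable names are mine, not the tutorial's):

import numpy as np

x = np.random.randn(4, 3)                # hypothetical batch: 4 classes x 3 samples

exp_element = np.exp(x - x.max())        # the scalar max broadcasts over the whole (4, 3) array
sums = np.sum(exp_element, axis=0)       # axis 0 collapses, leaving shape (3,)
s = exp_element / sums                   # (4, 3) / (3,): the vector is stretched back to (4, 3)
print(s.sum(axis=0))                     # each column sums to 1.0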
The definition relies on NumPy's broadcasting facilities ("spooky action at a distance"); it is instructive to compare Apple's rank facility, which is entirely explicit (to a fault).
λxs.
{ m ⟜ (⋉)/* _1 xs; a ⟜ [e:(x-m)]`{0} xs
; n ⟜ ((+)/)`{1} (a::M float)
; ⍉([(%x)'y]`{0,1} n a)
}
((+)/)`{1} sums over the array's 1-cells, iterating over the leading axis by default.
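In NumPy terms (my own analogy, not code from either source), this amounts to applying a sum to each row while looping over the leading axis:

import numpy as np

a = np.exp(np.random.randn(4, 3))
n = np.array([row.sum() for row in a])   # apply (+)/ to each 1-cell, iterating over axis 0
assert np.allclose(n, a.sum(axis=1))     # agrees with the broadcast-style reduction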
Rank in Apple was designed with type inference in mind; one has
> :ty \f.\x.\y. f`{0,1} x y
(a → Vec i b → c) → Arr sh a → Arr (i `Cons` sh) b → Arr sh c
([(%x)'y]`{0,1} n a) selects 0-cells (floats) from n and 1-cells from a, and normalizes each 1-cell. A transpose (⍉) is necessary to massage the dimensions at the end, which suggests that the design of Apple's rank facility was ill-considered.
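As a cross-check on the reading above, here is a rough NumPy transliteration of the Apple pipeline (my own sketch; the 2-D shape, names, and explicit loops are assumptions meant to mirror each rank application):

import numpy as np

def softmax_apple_style(xs):
    m = xs.max()                                   # (⋉)/* _1 xs : maximum over the whole array
    a = np.exp(xs - m)                             # [e:(x-m)]`{0} : exponentiate each 0-cell
    n = np.array([row.sum() for row in a])         # ((+)/)`{1} : sum each 1-cell (row)
    # `{0,1} : pair each 0-cell of n with the corresponding 1-cell of a
    normalized = np.array([row / s for s, row in zip(n, a)])
    return normalized.T                            # ⍉ : the final transpose

out = softmax_apple_style(np.random.randn(4, 3))
print(out.shape)          # (3, 4)
print(out.sum(axis=0))    # each column of the transposed result sums to 1.0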