C (C++) is used to write performant software; however, it is ill-suited to SIMD. In particular, its compilation of stepped reduction with lexical scoping opposes parallel execution.
As an example, consider the sigmoid,
\( x \mapsto \displaystyle \frac{1}{1+e^{-x}} \)
In Apple, one can map over an array with
λxs. [1%(1+e:(_x))]`{0} xs
One might want to compute \(e^x\) with the system function exp. Arm and x86 have vector instructions for addition, division, &c. but calling a function entails switching the (single) thread of execution, so function calls cannot be vectorized[1]. C functions are no longer composable.
λx. x*2 and λx. if x≥1 then sqrt x else x are both functions in structured and functional programming but must be handled differently when compiling. The elegance of C's compilation model falls down.
In C, one writes a function, and it is exported in an object file. To appreciate why this is special, consider sum :: Num a => [a] -> a in Haskell. This function exists only in the context of GHC. One could export this as a function taking a record of functions as an argument, viz.
sum ::
{ (+) :: ∀ a. a -> a -> a
, (-) :: ∀ a. a -> a -> a
...
}
-> [a]
-> a
However, this is less efficient than inlining (+), so really one still depends on sum as it exists in the context of the compiler. Moreover, this discards type instance resolution information.
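For comparison, the record-of-functions version of sum can be approximated in C. This is only a sketch: num_dict, sum_with and num_double are hypothetical names, and the struct carries just enough of Num to write sum. It can be compiled once and exported in an object file, but every addition becomes an indirect call through the record.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the record of functions. */
typedef struct {
    size_t size;                           /* size of one element in bytes */
    void (*zero)(void *acc);               /* acc = 0 */
    void (*add)(void *acc, const void *x); /* acc = acc + x */
} num_dict;

/* sum, compiled once for every element type. */
void sum_with(const num_dict *d, const void *xs, size_t n, void *out) {
    assert(d != NULL && out != NULL);
    const unsigned char *p = xs;
    d->zero(out);
    for (size_t i = 0; i < n; i++) {
        d->add(out, p + i * d->size); /* indirect call per element */
    }
}

/* One instance of the record, for double. */
static void zero_double(void *acc) { *(double *)acc = 0.0; }
static void add_double(void *acc, const void *x) {
    *(double *)acc += *(const double *)x;
}
const num_dict num_double = { sizeof(double), zero_double, add_double };
```

The caller has to pick num_double by hand, which is the C analogue of losing instance resolution, and the loop in sum_with cannot use a vector add because it never learns what add does.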
Returning to SIMD: map (*2) cannot depend on (*2) as a sequence of assembly instructions. Rather, it must treat (*2) like Haskell handles sum, an entity that exists only in the context of the compiler.
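The closest C analogue is probably a macro. In the sketch below (DEFINE_MAP and map_twice are hypothetical names, and this is not a claim about how any array-language compiler works), the mapped expression is pasted into the loop body at compile time, so it never exists as a separate sequence of instructions in an object file.

```c
#include <assert.h>
#include <stddef.h>

/* Generate a map whose loop body is the expression itself, with x bound
   to the current element. The expression exists only at compile time. */
#define DEFINE_MAP(name, expr)                              \
    void name(const float *xs, float *ys, size_t n) {       \
        assert(n == 0 || (xs != NULL && ys != NULL));       \
        for (size_t i = 0; i < n; i++) {                    \
            float x = xs[i];                                \
            ys[i] = (expr);                                 \
        }                                                   \
    }

/* map (*2): the multiplication is visible inside the loop. */
DEFINE_MAP(map_twice, x * 2.0f)
```

Only map_twice is exported. The expression x * 2.0f is gone by the time an object file is written, so a second map over the same expression in another translation unit has to restate it.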
To some extent this percolates through compiler textbooks. Array languages naturally express SIMD calculations; perhaps there are more fluent methods for compilation (and better data structures for export à la object files).
[1] OS X provides
vFloat vexpf(vFloat x);
⋮
void vvexp(double *y, const double *x, const int *n);
to address this.