C (and C++) is used to write performant software, but it is ill-suited to SIMD. In particular, it compiles stepped reduction with lexical scoping, which opposes parallel execution.

As an example, consider the sigmoid,

\( x \mapsto \displaystyle \frac{1}{1+e^{-x}} \)

In Apple, one can map over an array with

λxs. [1%(1+e:(_x))]`{0} xs

One might want to compute \(e^x\) with the system function exp. Arm and x86 have vector instructions for addition, division, &c., but calling a function entails switching the (single) thread of execution, so function calls cannot be vectorized[1]. Thus C functions are no longer composable.

λx. x*2 and λx. if x≥1 then sqrt x else x are both functions in structured and functional programming, but they must be handled differently when compiling for SIMD: the first maps to a single vector multiply, while the second involves a branch. Here the elegance of C's compilation model breaks down.

In C, one writes a function, and it is exported in an object file. To appreciate why this is special, consider sum :: Num a => [a] -> a in Haskell. This function exists only in the context of GHC. One could export it as a function taking a record of functions as an argument, viz.

sum :: ∀ a. { (+) :: a -> a -> a , (-) :: a -> a -> a , ... } -> [a] -> a

However, this is less efficient than inlining (+), so really one still depends on sum as it exists in the context of the compiler. Moreover, this discards typeclass instance resolution information.
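The record-of-functions export can be rendered in C as a struct of function pointers, specialized here to double for concreteness (all names are illustrative):

```c
#include <stddef.h>

/* The Num "record of functions" as a dictionary struct. */
typedef struct {
    double (*add)(double, double);
    double (*neg)(double);
    /* ... */
} NumDict;

static double add_double(double x, double y) { return x + y; }
static double neg_double(double x) { return -x; }

/* sum, parameterized by the dictionary: every (+) is now an
 * indirect call through d->add, which the compiler can neither
 * inline nor vectorize. */
double sum(const NumDict *d, const double *xs, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc = d->add(acc, xs[i]);
    return acc;
}
```

The indirect call per element is exactly the inefficiency noted above: the exported sum cannot see through the dictionary.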

Returning to SIMD: map (*2) cannot depend on (*2) as a sequence of assembly instructions. Rather, it must treat (*2) the way Haskell treats sum: as an entity that exists only in the context of the compiler.
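The contrast can be sketched in C (names illustrative): a map over a function pointer, where (*2) is just an address, versus the same loop with (*2) known to the compiler.

```c
#include <stddef.h>

static double twice(double x) { return x * 2.0; }

/* map via a function pointer: (*2) is a black box, so every
 * element costs an indirect call and the loop cannot vectorize. */
void map_fn(double *dst, const double *src, size_t n,
            double (*f)(double)) {
    for (size_t i = 0; i < n; i++)
        dst[i] = f(src[i]);
}

/* The same map with (*2) visible to the compiler: the body is a
 * single multiply and the loop vectorizes. In C this specialization
 * must be written (or macro-generated) by hand for each function. */
void map_twice(double *dst, const double *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * 2.0;
}
```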

To some extent this has percolated into compilers textbooks. Array languages naturally express SIMD calculations; perhaps there are more fluent methods of compilation (and better data structures for export, à la object files).

[1] OS X provides

 vFloat vexpf(vFloat x);
 ⋮
 void vvexp(double *y, const double *x, const int *n);

to address this.