A question in compiler design is: what optimizations should a given compiler perform? Optimizations for functional languages in particular are not well-known; it is not obvious which optimizations will provide the greatest speedup on user code.

# Code

In our example, we count lengths of Collatz sequences, computed via Elgot algebras.

import Control.RecursioncollatzLength :: Integral a => a -> a
collatzLength = elgot algebra elgotCoalgebra
where algebra Nil        = 0
algebra (Cons _ x) = x + 1
elgotCoalgebra 1 = Left 1
elgotCoalgebra n
| n mod 2 == 0 = Right $Cons n (div n 2) | otherwise = Right$ Cons n (3 * n + 1)

And here is the same function implemented in ATS:

staload "\$PATSHOMELOCS/either-0.2.4/either.sats"datatype list0f(a: t@ype, x: t@ype+) =
| list0_consf of (a, x)
| list0_nilf of ()abstype functor_type(a: t@ype, x: t@ype+) = ptrtypedef functor(a: t@ype, x: t@ype+) = functor_type(a, x)extern
fun {a:t@ype}{b:t@ype}{t:t@ype} map (a -<cloref1> b, functor(t,a)) : functor(t, b)typedef algebra(a: t@ype, x: t@ype) = functor(a, x) -<cloref1> x
typedef elgot_coalgebra(a: t@ype, b: t@ype, x: t@ype) = x -<cloref1> either(b, functor(a,x))fun {a:t@ype}{b:t@ype}{x:t@ype} elgot(f : algebra(x, a), g : elgot_coalgebra(x, a, b), x : b) : a =
either_(lam x0 =<cloref1> x0, lam x0 =<cloref1> f(map(lam y0 => elgot(f, g, y0), x0)), g(x))absimpl functor_type(a, x) = list0f(a, x)implement {a}{b}{t} map (f, x) =
case+ x of
| list0_consf (x, xs) => list0_consf(x, f(xs))
| list0_nilf() => list0_nilffn collatz_length(n : intGte(0)) : int =
let
fn alg(x : list0f(int, int)) :<> int =
case+ x of
| list0_nilf() => 0
| list0_consf (_, x) => 1 + xfn elgot_coalg(n : int) :<> either(int, list0f(int,int)) =
case n of
| 1 => left(1)
| n => if n % 2 = 0 then
right(list0_consf(n, n / 2))
else
right(list0_consf(n, 3 * n + 1))  in
elgot(lam x0 =<cloref1> alg(x0), lam x0 =<cloref1> elgot_coalg(x0), n)
end

This is essentially the same thing, but we implement some library functionality ourselves (functors, recursion schemes).

# Results

We could not benchmark elgot_length(837799) in ATS because it segfaulted.

Input Language Time
2223 Haskell (-O0) 20.45 μs
2223 ATS (gcc) 19.57 μs
2223 ATS (clang) 18.59 μs
2223 ATS (icc) 18.83 μs
2223 ATS (tcc) 38.97 μs
10971 Haskell (-O0) 30.27 μs
10971 ATS (gcc) 35.68 μs
10971 ATS (clang) 25.36 μs
10971 ATS (icc) 34.76 μs
10971 ATS (tcc) 62.02 μs
106239 Haskell (-O0) 40.25 μs
106239 ATS (gcc) 35.32 μs
106239 ATS (clang) 34.57 μs
106239 ATS (icc) 35.46 μs
106239 ATS (tcc) 66.62 μs

You can find the Haskell and ATS benchmarking setups here.

# Conclusions

Because ATS performs no optimizations aside optimizing tail-recursion, this is in effect a comparison of GCC and GHC (ATS compiles to C).

It is widely accepted that GCC's optimizations are best-in-class; GCC regularly appears at the top of the computer language benchmarks game. However, this provides a compelling demonstration that the language benchmarks game fails to cover functional techniques.

Of course, it is easy to write a sensible implementation of collatz_length in ATS or C; what this demonstrates is that higher-level techniques that are feasible via GHC become infeasible when using GCC: the Haskell code is around 56 times faster; compiling the Haskell code with -O0 confirms that it is in fact GHC's optimization passes that speed it up.

Thus, though optimizations may not significantly improve the performance of adroitly written code, higher-level optimizations are essential to making functional techniques feasible.