A question in compiler design is: what optimizations should a given compiler perform? Optimizations for functional languages in particular are not well-known; it is not obvious which optimizations will provide the greatest speedup on user code.

Code

In our example, we count lengths of Collatz sequences, computed via Elgot algebras.

Here is the Haskell function:

import Control.Recursion

collatzLength :: Integral a => a -> a collatzLength = elgot algebra elgotCoalgebra where algebra Nil = 0 algebra (Cons _ x) = x + 1 elgotCoalgebra 1 = Left 1 elgotCoalgebra n | n `mod` 2 == 0 = Right $ Cons n (div n 2) | otherwise = Right $ Cons n (3 * n + 1)

And here is the same function implemented in ATS:

staload "$PATSHOMELOCS/either-0.2.4/either.sats"

datatype list0f(a: t@ype, x: t@ype+) = | list0_consf of (a, x) | list0_nilf of ()

abstype functor_type(a: t@ype, x: t@ype+) = ptr

typedef functor(a: t@ype, x: t@ype+) = functor_type(a, x)

extern fun {a:t@ype}{b:t@ype}{t:t@ype} map (a -<cloref1> b, functor(t,a)) : functor(t, b)

typedef algebra(a: t@ype, x: t@ype) = functor(a, x) -<cloref1> x typedef elgot_coalgebra(a: t@ype, b: t@ype, x: t@ype) = x -<cloref1> either(b, functor(a,x))

fun {a:t@ype}{b:t@ype}{x:t@ype} elgot(f : algebra(x, a), g : elgot_coalgebra(x, a, b), x : b) : a = either_(lam x0 =<cloref1> x0, lam x0 =<cloref1> f(map(lam y0 => elgot(f, g, y0), x0)), g(x))

absimpl functor_type(a, x) = list0f(a, x)

implement {a}{b}{t} map (f, x) = case+ x of | list0_consf (x, xs) => list0_consf(x, f(xs)) | list0_nilf() => list0_nilf

fn collatz_length(n : intGte(0)) : int = let fn alg(x : list0f(int, int)) :<> int = case+ x of | list0_nilf() => 0 | list0_consf (_, x) => 1 + x

fn elgot_coalg(n : int) :<> either(int, list0f(int,int)) =
  case n of
    | 1 => left(1)
    | n => if n % 2 = 0 then
      right(list0_consf(n, n / 2))
    else
      right(list0_consf(n, 3 * n + 1))

in elgot(lam x0 =<cloref1> alg(x0), lam x0 =<cloref1> elgot_coalg(x0), n) end

This is essentially the same thing, but we implement some library functionality ourselves (functors, recursion schemes).

Results

We could not benchmark elgot_length(837799) in ATS because it segfaulted.

Input Language Time
2223 Haskell 286.5 ns
2223 Haskell (LLVM) 293.9 ns
2223 Haskell (-O0) 20.45 μs
2223 ATS (gcc) 19.57 μs
2223 ATS (clang) 18.59 μs
2223 ATS (icc) 18.83 μs
2223 ATS (tcc) 38.97 μs
10971 Haskell 427.5 ns
10971 Haskell (LLVM) 443.3 ns
10971 Haskell (-O0) 30.27 μs
10971 ATS (gcc) 35.68 μs
10971 ATS (clang) 25.36 μs
10971 ATS (icc) 34.76 μs
10971 ATS (tcc) 62.02 μs
106239 Haskell 571.4 ns
106239 Haskell (LLVM) 685.1 ns
106239 Haskell (-O0) 40.25 μs
106239 ATS (gcc) 35.32 μs
106239 ATS (clang) 34.57 μs
106239 ATS (icc) 35.46 μs
106239 ATS (tcc) 66.62 μs

You can find the Haskell and ATS benchmarking setups here.

Conclusions

Because ATS performs no optimizations aside optimizing tail-recursion, this is in effect a comparison of GCC and GHC (ATS compiles to C).

It is widely accepted that GCC's optimizations are best-in-class; GCC regularly appears at the top of the computer language benchmarks game. However, this provides a compelling demonstration that the language benchmarks game fails to cover functional techniques.

Of course, it is easy to write a sensible implementation of collatz_length in ATS or C; what this demonstrates is that higher-level techniques that are feasible via GHC become infeasible when using GCC: the Haskell code is around 56 times faster; compiling the Haskell code with -O0 confirms that it is in fact GHC's optimization passes that speed it up.

Thus, though optimizations may not significantly improve the performance of adroitly written code, higher-level optimizations are essential to making functional techniques feasible.