A question in compiler design is: what optimizations should a given compiler perform? Optimizations for functional languages in particular are not well-known; it is not obvious which optimizations will provide the greatest speedup on user code.
Code
In our example, we count lengths of Collatz sequences, computed via Elgot algebras.
Here is the Haskell function:
import Control.Recursion
collatzLength :: Integral a => a -> a
collatzLength = elgot algebra elgotCoalgebra
where algebra Nil = 0
algebra (Cons _ x) = x + 1
elgotCoalgebra 1 = Left 1
elgotCoalgebra n
| n `mod` 2 == 0 = Right $ Cons n (div n 2)
| otherwise = Right $ Cons n (3 * n + 1)
And here is the same function implemented in ATS:
staload "$PATSHOMELOCS/either-0.2.4/either.sats"
datatype list0f(a: t@ype, x: t@ype+) =
| list0_consf of (a, x)
| list0_nilf of ()
abstype functor_type(a: t@ype, x: t@ype+) = ptr
typedef functor(a: t@ype, x: t@ype+) = functor_type(a, x)
extern
fun {a:t@ype}{b:t@ype}{t:t@ype} map (a -<cloref1> b, functor(t,a)) : functor(t, b)
typedef algebra(a: t@ype, x: t@ype) = functor(a, x) -<cloref1> x
typedef elgot_coalgebra(a: t@ype, b: t@ype, x: t@ype) = x -<cloref1> either(b, functor(a,x))
fun {a:t@ype}{b:t@ype}{x:t@ype} elgot(f : algebra(x, a), g : elgot_coalgebra(x, a, b), x : b) : a =
either_(lam x0 =<cloref1> x0, lam x0 =<cloref1> f(map(lam y0 => elgot(f, g, y0), x0)), g(x))
absimpl functor_type(a, x) = list0f(a, x)
implement {a}{b}{t} map (f, x) =
case+ x of
| list0_consf (x, xs) => list0_consf(x, f(xs))
| list0_nilf() => list0_nilf
fn collatz_length(n : intGte(0)) : int =
let
fn alg(x : list0f(int, int)) :<> int =
case+ x of
| list0_nilf() => 0
| list0_consf (_, x) => 1 + x
fn elgot_coalg(n : int) :<> either(int, list0f(int,int)) =
case n of
| 1 => left(1)
| n => if n % 2 = 0 then
right(list0_consf(n, n / 2))
else
right(list0_consf(n, 3 * n + 1))
in
elgot(lam x0 =<cloref1> alg(x0), lam x0 =<cloref1> elgot_coalg(x0), n)
end
This is essentially the same thing, but we implement some library functionality ourselves (functors, recursion schemes).
Results
We could not benchmark elgot_length(837799)
in ATS because it segfaulted.
Input | Language | Time |
---|---|---|
2223 | Haskell | 286.5 ns |
2223 | Haskell (LLVM) | 293.9 ns |
2223 | Haskell (-O0 ) |
20.45 μs |
2223 | ATS (gcc) | 19.57 μs |
2223 | ATS (clang) | 18.59 μs |
2223 | ATS (icc) | 18.83 μs |
2223 | ATS (tcc) | 38.97 μs |
10971 | Haskell | 427.5 ns |
10971 | Haskell (LLVM) | 443.3 ns |
10971 | Haskell (-O0 ) |
30.27 μs |
10971 | ATS (gcc) | 35.68 μs |
10971 | ATS (clang) | 25.36 μs |
10971 | ATS (icc) | 34.76 μs |
10971 | ATS (tcc) | 62.02 μs |
106239 | Haskell | 571.4 ns |
106239 | Haskell (LLVM) | 685.1 ns |
106239 | Haskell (-O0 ) |
40.25 μs |
106239 | ATS (gcc) | 35.32 μs |
106239 | ATS (clang) | 34.57 μs |
106239 | ATS (icc) | 35.46 μs |
106239 | ATS (tcc) | 66.62 μs |
You can find the Haskell and ATS benchmarking setups here.
Conclusions
Because ATS performs no optimizations aside optimizing tail-recursion, this is in effect a comparison of GCC and GHC (ATS compiles to C).
It is widely accepted that GCC's optimizations are best-in-class; GCC regularly appears at the top of the computer language benchmarks game. However, this provides a compelling demonstration that the language benchmarks game fails to cover functional techniques.
Of course, it is easy to write a sensible implementation of collatz_length
in
ATS or C; what this demonstrates is that higher-level techniques that are
feasible via GHC become infeasible when using GCC: the Haskell code is around 56
times faster; compiling the Haskell code with -O0
confirms that it is
in fact GHC's optimization passes that speed it up.
Thus, though optimizations may not significantly improve the performance of adroitly written code, higher-level optimizations are essential to making functional techniques feasible.