Compiler Optimizations for Functional Languages

by Vanessa McHale | 2019-04-07 10:32

A question in compiler design is: what optimizations should a given compiler perform? Optimizations for functional languages in particular are not well-known; it is not obvious which optimizations will provide the greatest speedup on user code.

Code

In our example, we count lengths of Collatz sequences, computed via Elgot algebras.

Here is the Haskell function:

import Control.Recursion
collatzLength :: Integral a => a -> a
collatzLength = elgot algebra elgotCoalgebra
    where algebra Nil        = 0
          algebra (Cons _ x) = x + 1
          elgotCoalgebra 1 = Left 1
          elgotCoalgebra n
             | n `mod` 2 == 0 = Right $ Cons n (div n 2)
             | otherwise = Right $ Cons n (3 * n + 1)

And here is the same function implemented in ATS:

staload "$PATSHOMELOCS/either-0.2.4/either.sats"
datatype list0f(a: t@ype, x: t@ype+) =
  | list0_consf of (a, x)
  | list0_nilf of ()
abstype functor_type(a: t@ype, x: t@ype+) = ptr
typedef functor(a: t@ype, x: t@ype+) = functor_type(a, x)
extern
fun {a:t@ype}{b:t@ype}{t:t@ype} map (a -<cloref1> b, functor(t,a)) : functor(t, b)
typedef algebra(a: t@ype, x: t@ype) = functor(a, x) -<cloref1> x
typedef elgot_coalgebra(a: t@ype, b: t@ype, x: t@ype) = x -<cloref1> either(b, functor(a,x))
fun {a:t@ype}{b:t@ype}{x:t@ype} elgot(f : algebra(x, a), g : elgot_coalgebra(x, a, b), x : b) : a =
  either_(lam x0 =<cloref1> x0, lam x0 =<cloref1> f(map(lam y0 => elgot(f, g, y0), x0)), g(x))
absimpl functor_type(a, x) = list0f(a, x)
implement {a}{b}{t} map (f, x) =
  case+ x of
    | list0_consf (x, xs) => list0_consf(x, f(xs))
    | list0_nilf() => list0_nilf
fn collatz_length(n : intGte(0)) : int =
  let
    fn alg(x : list0f(int, int)) :<> int =
      case+ x of
        | list0_nilf() => 0
        | list0_consf (_, x) => 1 + x
fn elgot_coalg(n : int) :<> either(int, list0f(int,int)) =
  case n of
    | 1 => left(1)
    | n => if n % 2 = 0 then
      right(list0_consf(n, n / 2))
    else
      right(list0_consf(n, 3 * n + 1))
  in
    elgot(lam x0 =<cloref1> alg(x0), lam x0 =<cloref1> elgot_coalg(x0), n)
  end

This is essentially the same thing, but we implement some library functionality ourselves (functors, recursion schemes).

Results

We could not benchmark elgot_length(837799) in ATS because it segfaulted.

Input	Language	Time
2223	Haskell	286.5 ns
2223	Haskell (LLVM)	293.9 ns
2223	Haskell (`-O0`)	20.45 μs
2223	ATS (gcc)	19.57 μs
2223	ATS (clang)	18.59 μs
2223	ATS (icc)	18.83 μs
2223	ATS (tcc)	38.97 μs
10971	Haskell	427.5 ns
10971	Haskell (LLVM)	443.3 ns
10971	Haskell (`-O0`)	30.27 μs
10971	ATS (gcc)	35.68 μs
10971	ATS (clang)	25.36 μs
10971	ATS (icc)	34.76 μs
10971	ATS (tcc)	62.02 μs
106239	Haskell	571.4 ns
106239	Haskell (LLVM)	685.1 ns
106239	Haskell (`-O0`)	40.25 μs
106239	ATS (gcc)	35.32 μs
106239	ATS (clang)	34.57 μs
106239	ATS (icc)	35.46 μs
106239	ATS (tcc)	66.62 μs

You can find the Haskell and ATS benchmarking setups here.

Conclusions

Because ATS performs no optimizations aside optimizing tail-recursion, this is in effect a comparison of GCC and GHC (ATS compiles to C).

It is widely accepted that GCC's optimizations are best-in-class; GCC regularly appears at the top of the computer language benchmarks game. However, this provides a compelling demonstration that the language benchmarks game fails to cover functional techniques.

Of course, it is easy to write a sensible implementation of collatz_length in ATS or C; what this demonstrates is that higher-level techniques that are feasible via GHC become infeasible when using GCC: the Haskell code is around 56 times faster; compiling the Haskell code with -O0 confirms that it is in fact GHC's optimization passes that speed it up.

Thus, though optimizations may not significantly improve the performance of adroitly written code, higher-level optimizations are essential to making functional techniques feasible.

return

blog

Compiler Optimizations for Functional Languages

Code

Results

Conclusions