A question in compiler design is: what optimizations should a given compiler perform? Optimizations for functional languages in particular are not well-known; it is not obvious which optimizations will provide the greatest speedup on user code.

# Code

In our example, we count lengths of Collatz sequences, computed via Elgot algebras.

Here is the Haskell function:

import Control.Recursion

collatzLength :: Integral a => a -> a
collatzLength = elgot algebra elgotCoalgebra
where algebra Nil = 0
algebra (Cons _ x) = x + 1
elgotCoalgebra 1 = Left 1
elgotCoalgebra n
| n `mod` 2 == 0 = Right $ Cons n (div n 2)
| otherwise = Right $ Cons n (3 * n + 1)

And here is the same function implemented in ATS:

staload "$PATSHOMELOCS/either-0.2.4/either.sats"

datatype list0f(a: t@ype, x: t@ype+) =
| list0_consf of (a, x)
| list0_nilf of ()

abstype functor_type(a: t@ype, x: t@ype+) = ptr

typedef functor(a: t@ype, x: t@ype+) = functor_type(a, x)

extern
fun {a:t@ype}{b:t@ype}{t:t@ype} map (a -<cloref1> b, functor(t,a)) : functor(t, b)

typedef algebra(a: t@ype, x: t@ype) = functor(a, x) -<cloref1> x
typedef elgot_coalgebra(a: t@ype, b: t@ype, x: t@ype) = x -<cloref1> either(b, functor(a,x))

fun {a:t@ype}{b:t@ype}{x:t@ype} elgot(f : algebra(x, a), g : elgot_coalgebra(x, a, b), x : b) : a =
either_(lam x0 =<cloref1> x0, lam x0 =<cloref1> f(map(lam y0 => elgot(f, g, y0), x0)), g(x))

absimpl functor_type(a, x) = list0f(a, x)

implement {a}{b}{t} map (f, x) =
case+ x of
| list0_consf (x, xs) => list0_consf(x, f(xs))
| list0_nilf() => list0_nilf

fn collatz_length(n : intGte(0)) : int =
let
fn alg(x : list0f(int, int)) :<> int =
case+ x of
| list0_nilf() => 0
| list0_consf (_, x) => 1 + x

```
fn elgot_coalg(n : int) :<> either(int, list0f(int,int)) =
case n of
| 1 => left(1)
| n => if n % 2 = 0 then
right(list0_consf(n, n / 2))
else
right(list0_consf(n, 3 * n + 1))
```

in
elgot(lam x0 =<cloref1> alg(x0), lam x0 =<cloref1> elgot_coalg(x0), n)
end

This is essentially the same thing, but we implement some library functionality ourselves (functors, recursion schemes).

# Results

We could not benchmark `elgot_length(837799)`

in ATS because it segfaulted.

Input | Language | Time |
---|---|---|

2223 | Haskell | 286.5 ns |

2223 | Haskell (LLVM) | 293.9 ns |

2223 | Haskell (`-O0` ) |
20.45 μs |

2223 | ATS (gcc) | 19.57 μs |

2223 | ATS (clang) | 18.59 μs |

2223 | ATS (icc) | 18.83 μs |

2223 | ATS (tcc) | 38.97 μs |

10971 | Haskell | 427.5 ns |

10971 | Haskell (LLVM) | 443.3 ns |

10971 | Haskell (`-O0` ) |
30.27 μs |

10971 | ATS (gcc) | 35.68 μs |

10971 | ATS (clang) | 25.36 μs |

10971 | ATS (icc) | 34.76 μs |

10971 | ATS (tcc) | 62.02 μs |

106239 | Haskell | 571.4 ns |

106239 | Haskell (LLVM) | 685.1 ns |

106239 | Haskell (`-O0` ) |
40.25 μs |

106239 | ATS (gcc) | 35.32 μs |

106239 | ATS (clang) | 34.57 μs |

106239 | ATS (icc) | 35.46 μs |

106239 | ATS (tcc) | 66.62 μs |

You can find the Haskell and ATS benchmarking setups here.

# Conclusions

Because ATS performs no optimizations aside optimizing tail-recursion, this is in effect a comparison of GCC and GHC (ATS compiles to C).

It is widely accepted that GCC's optimizations are best-in-class; GCC regularly appears at the top of the computer language benchmarks game. However, this provides a compelling demonstration that the language benchmarks game fails to cover functional techniques.

Of course, it is easy to write a sensible implementation of `collatz_length`

in
ATS or C; what this demonstrates is that higher-level techniques that are
feasible via GHC become infeasible when using GCC: the Haskell code is around 56
times faster; compiling the Haskell code with `-O0`

confirms that it is
in fact GHC's optimization passes that speed it up.

Thus, though optimizations may not significantly improve the performance of adroitly written code, higher-level optimizations are essential to making functional techniques feasible.