Here I would like to present benchmarks associated with my past post comparing different methods of summing the first $$n$$ numbers. In each case, we benchmarked sum(200), that is, $$\sum_{i=1}^{200} i$$.

Unfortunately, gcc and clang optimized away the actual benchmark, so I had to use CompCert to get accurate results; as such some of the examples may be slower than one would otherwise expect.

# Implementations

## Recursion

fun sum {m:nat} .<m>. (x : int(m)) :<> int =
case+ x of
| 0 => 0
| n =>> x + sum(x - 1)

Time: 409.0 ns

## Recursion + Call-by-Reference

fn sum {m:nat}(x : int(m)) :<!wrt> int =
let
fun loop {n:nat} .<n>. (y : int(n), res : &int? >> int) :<!wrt> void =
case+ y of
| 0 => res := 0
| n =>> (loop(y - 1, res) ; res := res + y)
var res: int
val () = loop(x, res)
in
res
end

Time: 620.4 ns

## C FFI

%{
int sum(int n) {
int i, sum = 0;
for (i = 1; i <= n; i++) {
sum += i;
}
return sum;
}
%}extern
fn sum(intGte(0)) :<> intGte(0) =
"mac#"

Time: 141.2 ns

## For Loop

fn sum {m:nat}(x : int(m)) :<!wrt> int =
let
var res: int = 0
var i: int
val () = for* { i : nat | i <= m } .<i>. (i : int(i)) =>
(i := x ; i > 0 ; i := i - 1)
(res := res + i)
in
res
end

Time: 250.1 ns

## Recursion + Stack-Allocated Variable

fn sum {m:nat}(x : int(m)) :<!wrt> int =
let
fun loop {n:nat}{l:addr} .<n>. (pf : !int @ l | n : int(n), res : ptr(l)) :<!wrt> void =
case+ n of
| 0 => ()
| n =>> let
val () = !res := !res + n
in
loop(pf | n - 1, res)
end
var ret: int with pf = 0
val () = loop(pf | x, addr@ret)
in
ret
end

Time: 77.00 ns

## Linear Streams

fn sum {m:nat}(x : int(m)) :<!ntm> int =
let
fun build_stream {i:nat} .<i>. (y : int(i)) :<> stream_vt(int) =
case+ y of
| 0 => $ldelay(stream_vt_nil) | n =>>$ldelay(stream_vt_cons(y, build_stream(y - 1)))
fun fold_stream(xs : stream_vt(int)) :<!ntm> int =
case+ !xs of
| ~stream_vt_nil() => 0
| ~stream_vt_cons (y, ys) => y + fold_stream(ys)
in
fold_stream(build_stream(x))
end

Time: 12.01 μs

## Hylomorphism

datatype list0f(a: t@ype, x: t@ype+) =
| list0_consf of (a, x)
| list0_nilf of ()abstype functor_type(a: t@ype, x: t@ype+) = ptrtypedef functor(a: t@ype, x: t@ype+) = functor_type(a, x)extern
fun {a:t@ype}{b:t@ype}{t:t@ype} map (a -<cloref1> b, functor(t,a)) : functor(t, b)typedef algebra(a: t@ype, x: t@ype) = functor(a, x) -<cloref1> x
typedef coalgebra(a: t@ype, x: t@ype) = x -<cloref1> functor(a, x)fun {b:t@ype}{a:t@ype}{t:t@ype} hylo(f : algebra(t, b), g : coalgebra(t, a), x : a) : b =
f(map(lam x0 => hylo(f, g, x0), g(x)))absimpl functor_type(a, x) = list0f(a, x)implement {a}{b}{t} map (f, x) =
case+ x of
| list0_consf (x, xs) => list0_consf(x, f(xs))
| list0_nilf() => list0_nilffn sum (x : intGte(0)) : int =
let
fn alg(xs : list0f(int, int)) : int =
case+ xs of
| list0_consf (x, xs) => x + xs
| list0_nilf() => 0
fn coalg(x : int) : list0f(int, int) =
case+ x of
| 0 => list0_nilf
| x => list0_consf(x, x - 1)
in
hylo(lam x0 => alg(x0), lam x1 => coalg(x1), x)
end

Time: 8.496 μs

You can view the full benchmarking apparatus here.

# Conclusions

## C vs. ATS

First, note that (somewhat surprisingly) the fastest example was ATS, not C. The C presented here is by no means unidiomatic; one can find similar code on Stack Overflow.

Some of this may be due to the fact that CompCert is not as aggressive as gcc at certain optimizations (here, reversing the loop), but I think this nonetheless presents a good argument that C is not a high-level language.

## Hylomorphism vs. Stream

Though the implementation using a hylomorphism was quite slow, it still outperformed the implementation using a linear stream, which was a little surprising.

In general, I think this shows the importance of an optimizing compiler when doing functional programming. Though ATS has advanced facilities for safe imperative programming, its lack of an optimizer can be problematic for programmers used to GHC.