Here I would like to present benchmarks associated with my past post comparing different methods of summing the first \( n \) numbers. In each case, we benchmarked sum(200), that is, \( \sum_{i=1}^{200} i \).

Unfortunately, gcc and clang optimized away the actual benchmark, so I had to use CompCert to get accurate results; as such some of the examples may be slower than one would otherwise expect.

Implementations

Recursion

fun sum {m:nat} .<m>. (x : int(m)) :<> int = case+ x of | 0 => 0 | n =>> x + sum(x - 1)

Time: 409.0 ns

Recursion + Call-by-Reference

fn sum {m:nat}(x : int(m)) :<!wrt> int = let fun loop {n:nat} .<n>. (y : int(n), res : &int? >> int) :<!wrt> void = case+ y of | 0 => res := 0 | n =>> (loop(y - 1, res) ; res := res + y) var res: int val () = loop(x, res) in res end

Time: 620.4 ns

C FFI

%{ int sum(int n) { int i, sum = 0; for (i = 1; i <= n; i++) { sum += i; } return sum; } %}

extern fn sum(intGte(0)) :<> intGte(0) = "mac#"

Time: 141.2 ns

For Loop

fn sum {m:nat}(x : int(m)) :<!wrt> int = let var res: int = 0 var i: int val () = for* { i : nat | i <= m } .<i>. (i : int(i)) => (i := x ; i > 0 ; i := i - 1) (res := res + i) in res end

Time: 250.1 ns

Recursion + Stack-Allocated Variable

fn sum {m:nat}(x : int(m)) :<!wrt> int = let fun loop {n:nat}{l:addr} .<n>. (pf : !int @ l | n : int(n), res : ptr(l)) :<!wrt> void = case+ n of | 0 => () | n =>> let val () = !res := !res + n in loop(pf | n - 1, res) end var ret: int with pf = 0 val () = loop(pf | x, addr@ret) in ret end

Time: 77.00 ns

Linear Streams

fn sum {m:nat}(x : int(m)) :<!ntm> int = let fun build_stream {i:nat} .<i>. (y : int(i)) :<> stream_vt(int) = case+ y of | 0 => $ldelay(stream_vt_nil) | n =>> $ldelay(stream_vt_cons(y, build_stream(y - 1))) fun fold_stream(xs : stream_vt(int)) :<!ntm> int = case+ !xs of | ~stream_vt_nil() => 0 | ~stream_vt_cons (y, ys) => y + fold_stream(ys) in fold_stream(build_stream(x)) end

Time: 12.01 μs

Hylomorphism

datatype list0f(a: t@ype, x: t@ype+) = | list0_consf of (a, x) | list0_nilf of ()

abstype functor_type(a: t@ype, x: t@ype+) = ptr

typedef functor(a: t@ype, x: t@ype+) = functor_type(a, x)

extern fun {a:t@ype}{b:t@ype}{t:t@ype} map (a -<cloref1> b, functor(t,a)) : functor(t, b)

typedef algebra(a: t@ype, x: t@ype) = functor(a, x) -<cloref1> x typedef coalgebra(a: t@ype, x: t@ype) = x -<cloref1> functor(a, x)

fun {b:t@ype}{a:t@ype}{t:t@ype} hylo(f : algebra(t, b), g : coalgebra(t, a), x : a) : b = f(map(lam x0 => hylo(f, g, x0), g(x)))

absimpl functor_type(a, x) = list0f(a, x)

implement {a}{b}{t} map (f, x) = case+ x of | list0_consf (x, xs) => list0_consf(x, f(xs)) | list0_nilf() => list0_nilf

fn sum (x : intGte(0)) : int = let fn alg(xs : list0f(int, int)) : int = case+ xs of | list0_consf (x, xs) => x + xs | list0_nilf() => 0 fn coalg(x : int) : list0f(int, int) = case+ x of | 0 => list0_nilf | x => list0_consf(x, x - 1) in hylo(lam x0 => alg(x0), lam x1 => coalg(x1), x) end

Time: 8.496 μs

You can view the full benchmarking apparatus here.

Conclusions

C vs. ATS

First, note that (somewhat surprisingly) the fastest example was ATS, not C. The C presented here is by no means unidiomatic; one can find similar code on Stack Overflow.

Some of this may be due to the fact that CompCert is not as aggressive as gcc at certain optimizations (here, reversing the loop), but I think this nonetheless presents a good argument that C is not a high-level language.

Hylomorphism vs. Stream

Though the implementation using a hylomorphism was quite slow, it still outperformed the implementation using a linear stream, which was a little surprising.

In general, I think this shows the importance of an optimizing compiler when doing functional programming. Though ATS has advanced facilities for safe imperative programming, its lack of an optimizer can be problematic for programmers used to GHC.