Here I would like to present benchmarks associated with my past
post comparing different methods of
summing the first \( n \) numbers. In each case, we benchmarked sum(200)
,
that is, \( \sum_{i=1}^{200} i \).
Unfortunately, gcc
and clang
optimized away the actual benchmark, so I had
to use CompCert to get accurate results; as such some of the examples may
be slower than one would otherwise expect.
Implementations
Recursion
fun sum {m:nat} .<m>. (x : int(m)) :<> int =
case+ x of
| 0 => 0
| n =>> x + sum(x - 1)
Time: 409.0 ns
Recursion + Call-by-Reference
fn sum {m:nat}(x : int(m)) :<!wrt> int =
let
fun loop {n:nat} .<n>. (y : int(n), res : &int? >> int) :<!wrt> void =
case+ y of
| 0 => res := 0
| n =>> (loop(y - 1, res) ; res := res + y)
var res: int
val () = loop(x, res)
in
res
end
Time: 620.4 ns
C FFI
%{
int sum(int n) {
int i, sum = 0;
for (i = 1; i <= n; i++) {
sum += i;
}
return sum;
}
%}
extern
fn sum(intGte(0)) :<> intGte(0) =
"mac#"
Time: 141.2 ns
For Loop
fn sum {m:nat}(x : int(m)) :<!wrt> int =
let
var res: int = 0
var i: int
val () = for* { i : nat | i <= m } .<i>. (i : int(i)) =>
(i := x ; i > 0 ; i := i - 1)
(res := res + i)
in
res
end
Time: 250.1 ns
Recursion + Stack-Allocated Variable
fn sum {m:nat}(x : int(m)) :<!wrt> int =
let
fun loop {n:nat}{l:addr} .<n>. (pf : !int @ l | n : int(n), res : ptr(l)) :<!wrt> void =
case+ n of
| 0 => ()
| n =>> let
val () = !res := !res + n
in
loop(pf | n - 1, res)
end
var ret: int with pf = 0
val () = loop(pf | x, addr@ret)
in
ret
end
Time: 77.00 ns
Linear Streams
fn sum {m:nat}(x : int(m)) :<!ntm> int =
let
fun build_stream {i:nat} .<i>. (y : int(i)) :<> stream_vt(int) =
case+ y of
| 0 => $ldelay(stream_vt_nil)
| n =>> $ldelay(stream_vt_cons(y, build_stream(y - 1)))
fun fold_stream(xs : stream_vt(int)) :<!ntm> int =
case+ !xs of
| ~stream_vt_nil() => 0
| ~stream_vt_cons (y, ys) => y + fold_stream(ys)
in
fold_stream(build_stream(x))
end
Time: 12.01 μs
Hylomorphism
datatype list0f(a: t@ype, x: t@ype+) =
| list0_consf of (a, x)
| list0_nilf of ()
abstype functor_type(a: t@ype, x: t@ype+) = ptr
typedef functor(a: t@ype, x: t@ype+) = functor_type(a, x)
extern
fun {a:t@ype}{b:t@ype}{t:t@ype} map (a -<cloref1> b, functor(t,a)) : functor(t, b)
typedef algebra(a: t@ype, x: t@ype) = functor(a, x) -<cloref1> x
typedef coalgebra(a: t@ype, x: t@ype) = x -<cloref1> functor(a, x)
fun {b:t@ype}{a:t@ype}{t:t@ype} hylo(f : algebra(t, b), g : coalgebra(t, a), x : a) : b =
f(map(lam x0 => hylo(f, g, x0), g(x)))
absimpl functor_type(a, x) = list0f(a, x)
implement {a}{b}{t} map (f, x) =
case+ x of
| list0_consf (x, xs) => list0_consf(x, f(xs))
| list0_nilf() => list0_nilf
fn sum (x : intGte(0)) : int =
let
fn alg(xs : list0f(int, int)) : int =
case+ xs of
| list0_consf (x, xs) => x + xs
| list0_nilf() => 0
fn coalg(x : int) : list0f(int, int) =
case+ x of
| 0 => list0_nilf
| x => list0_consf(x, x - 1)
in
hylo(lam x0 => alg(x0), lam x1 => coalg(x1), x)
end
Time: 8.496 μs
You can view the full benchmarking apparatus here.
Conclusions
C vs. ATS
First, note that (somewhat surprisingly) the fastest example was ATS, not C. The C presented here is by no means unidiomatic; one can find similar code on Stack Overflow.
Some of this may be due to the fact that CompCert is not as aggressive as gcc
at certain optimizations (here, reversing the loop), but I think this
nonetheless presents
a good argument that C is not a high-level language.
Hylomorphism vs. Stream
Though the implementation using a hylomorphism was quite slow, it still outperformed the implementation using a linear stream, which was a little surprising.
In general, I think this shows the importance of an optimizing compiler when doing functional programming. Though ATS has advanced facilities for safe imperative programming, its lack of an optimizer can be problematic for programmers used to GHC.