Performance Implications of FP in ATS

by Vanessa McHale | 2018-08-21 17:40

Here I would like to present benchmarks associated with my past post comparing different methods of summing the first \( n \) numbers. In each case, we benchmarked sum(200), that is, \( \sum_{i=1}^{200} i \).

Unfortunately, gcc and clang optimized away the actual benchmark, so I had to use CompCert to get accurate results; as such some of the examples may be slower than one would otherwise expect.

Implementations

Recursion

fun sum {m:nat} .<m>. (x : int(m)) :<> int =
  case+ x of
    | 0 => 0
    | n =>> x + sum(x - 1)

Time: 409.0 ns

Recursion + Call-by-Reference

fn sum {m:nat}(x : int(m)) :<!wrt> int =
  let
    fun loop {n:nat} .<n>. (y : int(n), res : &int? >> int) :<!wrt> void =
      case+ y of
        | 0 => res := 0
        | n =>> (loop(y - 1, res) ; res := res + y)
    var res: int
    val () = loop(x, res)
  in
    res
  end

Time: 620.4 ns

C FFI

%{
int sum(int n) {
    int i, sum = 0;
    for (i = 1; i <= n; i++) {
        sum += i;
    }
    return sum;
}
%}
extern
fn sum(intGte(0)) :<> intGte(0) =
  "mac#"

Time: 141.2 ns

For Loop

fn sum {m:nat}(x : int(m)) :<!wrt> int =
  let
    var res: int = 0
    var i: int
    val () = for* { i : nat | i <= m } .<i>. (i : int(i)) =>
        (i := x ; i > 0 ; i := i - 1)
        (res := res + i)
  in
    res
  end

Time: 250.1 ns

Recursion + Stack-Allocated Variable

fn sum {m:nat}(x : int(m)) :<!wrt> int =
  let
    fun loop {n:nat}{l:addr} .<n>. (pf : !int @ l | n : int(n), res : ptr(l)) :<!wrt> void =
      case+ n of
        | 0 => ()
        | n =>> let
          val () = !res := !res + n
        in
          loop(pf | n - 1, res)
        end
    var ret: int with pf = 0
    val () = loop(pf | x, addr@ret)
  in
    ret
  end

Time: 77.00 ns

Linear Streams

fn sum {m:nat}(x : int(m)) :<!ntm> int =
  let
    fun build_stream {i:nat} .<i>. (y : int(i)) :<> stream_vt(int) =
      case+ y of
        | 0 => $ldelay(stream_vt_nil)
        | n =>> $ldelay(stream_vt_cons(y, build_stream(y - 1)))
    fun fold_stream(xs : stream_vt(int)) :<!ntm> int =
      case+ !xs of
        | ~stream_vt_nil() => 0
        | ~stream_vt_cons (y, ys) => y + fold_stream(ys)
  in
    fold_stream(build_stream(x))
  end

Time: 12.01 μs

Hylomorphism

datatype list0f(a: t@ype, x: t@ype+) =
  | list0_consf of (a, x)
  | list0_nilf of ()
abstype functor_type(a: t@ype, x: t@ype+) = ptr
typedef functor(a: t@ype, x: t@ype+) = functor_type(a, x)
extern
fun {a:t@ype}{b:t@ype}{t:t@ype} map (a -<cloref1> b, functor(t,a)) : functor(t, b)
typedef algebra(a: t@ype, x: t@ype) = functor(a, x) -<cloref1> x
typedef coalgebra(a: t@ype, x: t@ype) = x -<cloref1> functor(a, x)
fun {b:t@ype}{a:t@ype}{t:t@ype} hylo(f : algebra(t, b), g : coalgebra(t, a), x : a) : b =
  f(map(lam x0 => hylo(f, g, x0), g(x)))
absimpl functor_type(a, x) = list0f(a, x)
implement {a}{b}{t} map (f, x) =
  case+ x of
    | list0_consf (x, xs) => list0_consf(x, f(xs))
    | list0_nilf() => list0_nilf
fn sum (x : intGte(0)) : int =
  let
    fn alg(xs : list0f(int, int)) : int =
      case+ xs of
        | list0_consf (x, xs) => x + xs
        | list0_nilf() => 0
    fn coalg(x : int) : list0f(int, int) =
      case+ x of
        | 0 => list0_nilf
        | x => list0_consf(x, x - 1)
  in
    hylo(lam x0 => alg(x0), lam x1 => coalg(x1), x)
  end

Time: 8.496 μs

You can view the full benchmarking apparatus here.

Conclusions

C vs. ATS

First, note that (somewhat surprisingly) the fastest example was ATS, not C. The C presented here is by no means unidiomatic; one can find similar code on Stack Overflow.

Some of this may be due to the fact that CompCert is not as aggressive as gcc at certain optimizations (here, reversing the loop), but I think this nonetheless presents a good argument that C is not a high-level language.

Hylomorphism vs. Stream

Though the implementation using a hylomorphism was quite slow, it still outperformed the implementation using a linear stream, which was a little surprising.

In general, I think this shows the importance of an optimizing compiler when doing functional programming. Though ATS has advanced facilities for safe imperative programming, its lack of an optimizer can be problematic for programmers used to GHC.

return

blog