As ATS remains a somewhat obscure language, I figured I'd write up a recent success and give some commentary on some advanced features of ATS.
Our task is fairly simple: we wish to implement the Levenshtein distance for two strings. This is not entirely easy in ATS, and in fact I had to ask on the mailing list when I encountered difficulties in getting performance on par with C.
Our first sample is the "moral" implementation. This is standard ATS with safe
array and string accesses, providing better safety than the equivalent C. Note
also that loop2
and inner_loop
have a termination metric and hence
termination is guaranteed at compile time.
fn levenshtein { m, n : nat }(s1 : string(m), s2 : string(n)) : int =
let
val s1_l: size_t(m) = length(s1)
val s2_l: size_t(n) = length(s2)
val column = arrayptr_make_uninitized(s1_l + 1)
val () = arrayptr_initize(column, s1_l + 1) where
{ implement array_initize$init (i, x) =
x := sz2i(i) }
val column = arrayptr_refize(column)
fun loop2 { i : nat | i > 0 && i <= n+1 } .<n-i+1>. (x : int(i)) : void =
if x <= sz2i(s2_l) then
{
val () = column[0] := x
val () = let
fun inner_loop { j : nat | j > 0 && j <= m+1 } .<m-j+1>. (y : int(j), last_diag : int) : void =
if y <= sz2i(s1_l) then
let
fn min_3(x : int, y : int, z : int) : int =
min(x, (min(y, z)))
fn bool2int(x : char, y : char) : int =
if x = y then
0
else
1
var old_diag = column[y]
val () = column[y] := min_3( column[y] + 1
, column[y - 1] + 1
, last_diag + bool2int(s1[y - 1], s2[x - 1])
)
in
inner_loop(y + 1, old_diag)
end
in
inner_loop(1, x - 1)
end
val () = loop2(x + 1)
}
val () = loop2(1)
in
column[s1_l]
end
The second sample is be the "immoral" implementation. As you might guess, it is faster than the previous attempt. It required even more help from the mailing list, but it is significantly faster than the C implementation.
extern
fun alloca {n:int}(bsz : size_t(n)) :<!wrt> [l:agz] (b0ytes(n) @ l | ptr(l)) =
"mac#"
extern
castfn arrayptr_alloca_encode : {a:vt@ype} {l:addr} {n:int} (array_v(INV(a), l, n) | ptr(l)) -<0> arrayptr(a, l, n)
extern
fun {a:vt@ype} array_ptr_alloca {n:int} (asz : size_t(n)) :<!wrt> [l:agz] (array_v(a?, l, n) | ptr(l))
implement {a} array_ptr_alloca {n} (asz) =
let
val [l:addr](pf | p) = alloca(asz * sizeof<a>)
prval pf = __assert(pf) where
{ extern
praxi __assert(pf : b0ytes(n*sizeof(a)) @ l) : array_v(a?, l, n) }
in
(pf | p)
end
fn levenshtein { m, n : nat }(s1 : string(m), s2 : string(n)) : int =
let
val s1_l
: size_t(m) = length(s1)
val s2_l: size_t(n) = length(s2)
fn array_initialize() : arrayref(int, m+1) =
let
val (pf_arr | p_arr) = array_ptr_alloca<int>(succ(s1_l))
var p1_arr = ptr1_succ<int>(p_arr)
prval (pf1_at, pf_arr) = array_v_uncons{int?}(pf_arr)
val () = ptr_set<int>(pf1_at | p_arr, 0)
var i: int = 0
prval [l:addr]EQADDR () = eqaddr_make_ptr(p1_arr)
var p = p1_arr
prvar pf0 = array_v_nil{int}()
prvar pf1 = pf_arr
val () = while* { i : nat | i <= m } .<m-i>. ( i : int(i)
, p : ptr(l+i*sizeof(int))
, pf0 : array_v(int, l, i)
, pf1 : array_v(int?, l+i*sizeof(int), m-i)
) : ( pf0 : array_v(int, l, m)
, pf1 : array_v(int?, l+i*sizeof(int), 0)
) =>
(i < sz2i(s1_l))
{
prval (pf_at, pf1_res) = array_v_uncons{int?}(pf1)
prval () = pf1 := pf1_res
var c = g0ofg1(i)
val () = ptr_set<int>(pf_at | p, c)
val () = p := ptr1_succ<int>(p)
prval () = pf0 := array_v_extend{int}(pf0, pf_at)
val () = i := i + 1
}
prval () = pf_arr := pf0
prval () = array_v_unnil{int?}(pf1)
prval pf_arr = array_v_cons{int}(pf1_at, pf_arr)
var res = arrayptr_alloca_encode(pf_arr | p_arr)
in
arrayptr_refize(res)
end
val column = array_initialize()
fun loop2 { i : nat | i > 0 && i <= n+1 } .<n-i+1>. (x : int(i)) : void =
if x <= sz2i(s2_l) then
{
val () = column[0] := x
val () = let
fun inner_loop { j : nat | j > 0 && j <= m+1 } .<m-j+1>. (y : int(j), last_diag : int) : void =
if y <= sz2i(s1_l) then
let
fn min_3(x : int, y : int, z : int) : int =
min(x, (min(y, z)))
fn bool2int(x : char, y : char) : int =
if x = y then
0
else
1
var old_diag = column[y]
val () = column[y] := min_3( column[y] + 1
, column[y - 1] + 1
, last_diag + bool2int(s1[y - 1], s2[x - 1])
)
in
inner_loop(y + 1, old_diag)
end
in
inner_loop(1, x - 1)
end
val () = loop2(x + 1)
}
val () = loop2(1)
in
column[s1_l]
end
There are several things to note here. The first is the use of alloca
in place
of malloc
- this segfaults when compiled with icc
but seems to be fine with
gcc
and clang
0.
The second is the while*
loop, complete with a termination metric. ATS
allows us to use while
loops (to squeeze out the last bit of
performance) and still guarantee termination as we would with a tail-recursive
function.
This code makes heavy use of proof-level array manipulations.
ATS gives the ability to (safely) perform some of
the pointer arithmetic that C programmers regularly rely on, in this case the
ability to form a sub-array without any copying with array_v_uncons
. This is a deep
topic on its own, so I for now will simply say that this is what's responsible for the
prvar
and prval
s.
Acknowledgements
Thanks to Hongwei Xi and Artyom Shalkhakov for substantial performance improvements to my original code and help understanding ATS.
0: I also managed to get a bus error withicc