I stumbled across a comment by András Kovács on compiler performance, which brought up some of the difficulties writing adequate compilers in lower-level languages such as C++ or Rust.
Here, I'd like to report my experience with my Kempe toy compiler, which is written in Haskell.
I benchmarked compilation of the splitmix pseudorandom number generator:
typedef unsigned long int __uint64_t;
typedef __uint64_t uint64_t;
// modified to have ""multiple return"" (destination-passing style)
uint64_t next(uint64_t x, uint64_t* y) {
uint64_t z = (x += 0x9e3779b97f4a7c15);
z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9;
z = (z ^ (z >> 27)) * 0x94d049bb133111eb;
*y = x;
return z ^ (z >> 31);
}
First, the C compilers (alongside kc
, the Kempe compiler):
benchmarking bench/gcc-8 -fsyntax-only benchmarks/splitmix64.c
time 11.16 ms (9.301 ms .. 13.16 ms)
0.796 R² (0.644 R² .. 0.902 R²)
mean 11.37 ms (10.04 ms .. 12.50 ms)
std dev 3.028 ms (2.518 ms .. 3.741 ms)
variance introduced by outliers: 89% (severely inflated)
benchmarking bench/clang-11 -fsyntax-only benchmarks/splitmix64.c
time 14.26 ms (14.03 ms .. 14.53 ms)
0.999 R² (0.998 R² .. 1.000 R²)
mean 14.34 ms (14.25 ms .. 14.49 ms)
std dev 285.7 μs (198.2 μs .. 404.4 μs)
benchmarking bench/kc typecheck examples/splitmix.kmp
time 4.915 ms (4.687 ms .. 5.105 ms)
0.964 R² (0.904 R² .. 0.993 R²)
mean 4.597 ms (4.321 ms .. 4.721 ms)
std dev 537.5 μs (210.8 μs .. 1.013 ms)
variance introduced by outliers: 71% (severely inflated)
benchmarking bench/icc -O0 -c benchmarks/splitmix64.c
time 24.55 ms (20.47 ms .. 28.48 ms)
0.902 R² (0.786 R² .. 0.968 R²)
mean 31.23 ms (28.70 ms .. 35.11 ms)
std dev 6.442 ms (4.370 ms .. 10.15 ms)
variance introduced by outliers: 75% (severely inflated)
benchmarking bench/gcc-8 -O0 -c benchmarks/splitmix64.c
time 20.61 ms (17.52 ms .. 24.59 ms)
0.903 R² (0.833 R² .. 0.970 R²)
mean 19.76 ms (18.48 ms .. 21.07 ms)
std dev 3.250 ms (2.695 ms .. 4.119 ms)
variance introduced by outliers: 70% (severely inflated)
benchmarking bench/clang-11 -O0 -c benchmarks/splitmix64.c
time 16.24 ms (16.12 ms .. 16.36 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 16.25 ms (16.19 ms .. 16.33 ms)
std dev 168.8 μs (125.5 μs .. 228.4 μs)
benchmarking bench/kc examples/splitmix.kmp splitmix.o
time 9.441 ms (6.886 ms .. 12.19 ms)
0.711 R² (0.521 R² .. 0.884 R²)
mean 10.18 ms (8.969 ms .. 11.18 ms)
std dev 3.025 ms (2.601 ms .. 3.538 ms)
variance introduced by outliers: 93% (severely inflated)
On the frontend, the C compilers are unimpressive. Kempe does not have operator
overloading, but that is not enough to explain why gcc
and clang
take 2-3
times as long. icc
is questionable; icc -fsyntax-only
isn't any faster
than icc -O0
.
On the backend, kc
holds its own: it's clearly faster than gcc
, icc
, and clang
;
the consolation is that the C compilers hopefully produce better
code. In any case it seems plausible that a compiler written in a Haskell could
compete with compilers written in C++.
benchmarking bench/rustc --crate-type=lib --emit=dep-info,metadata benchmarks/splitmix.rs
time 77.11 ms (76.79 ms .. 77.53 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 77.01 ms (76.82 ms .. 77.26 ms)
std dev 370.4 μs (236.4 μs .. 540.1 μs)
benchmarking bench/~/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --crate-type=lib -Z no-codegen benchmarks/splitmix.rs
time 23.78 ms (20.66 ms .. 26.90 ms)
0.914 R² (0.831 R² .. 0.974 R²)
mean 23.52 ms (22.12 ms .. 25.99 ms)
std dev 4.073 ms (2.490 ms .. 5.833 ms)
variance introduced by outliers: 72% (severely inflated)
benchmarking bench/rustc --crate-type=lib benchmarks/splitmix.rs
time 80.87 ms (79.95 ms .. 81.43 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 81.10 ms (80.76 ms .. 82.41 ms)
std dev 984.7 μs (249.9 μs .. 1.660 ms)
benchmarking bench/rustc --crate-type=cdylib benchmarks/splitmix.rs
time 157.2 ms (156.4 ms .. 158.9 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 157.6 ms (157.0 ms .. 158.6 ms)
std dev 1.229 ms (704.6 μs .. 1.798 ms)
variance introduced by outliers: 12% (moderately inflated)
rustc
presumably does more work than the C compilers, but we do
see that creating dynamic libraries (--crate-type=cdylib
) takes too long.
Also, for some reason it's impossible to only do typechecking without nightly.
Conclusion
The Kempe toy compiler is around 4000 lines, and depends mainly on the containers library, i.e. off-the-shelf data structures.
There's a lot of low-hanging fruit for compilers written in C++. My hope would be that we see more use of functional languages in the field of compilers, and more demand for performant compilers.
Appendix
Here is the Kempe source code:
; given a seed, return a random value and the new seed
next : Word -- Word Word
=: [ 0x9e3779b97f4a7c15u +~ dup
dup 30i8 >>~ xoru 0xbf58476d1ce4e5b9u *~
dup 27i8 >>~ xoru 0x94d049bb133111ebu *~
dup 31i8 >>~ xoru
]
%foreign kabi next
and the Rust source code:
#[no_mangle]
pub extern "C" fn next(x: u64) -> (u64, u64) {
let next_seed = x + 0x9e3779b97f4a7c15;
let mut z = next_seed;
z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9;
z = (z ^ (z >> 27)) * 0x94d049bb133111eb;
return (z ^ (z >> 31), next_seed);
}