Awk, Mawk, Gawk Performance

by Vanessa McHale | 2022-01-25 14:31

Suppose one would like to process compiler output so that it includes spans (column ranges); vim uses an awk script (mve.awk) to do this.

There is a performant script to do so in awk:

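BEGIN { FS="|" }                  # split fields on a literal "|"
/\|/ {                            # only lines that contain "|"
    p=match($2, /\^+/)            # first run of carets in field 2
    if(p) {
        colstart=RSTART-1         # zero-based start column within field 2
        col=colstart+RLENGTH      # exclusive end column
        printf("%d-%d\n", colstart, col)
    }
}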

and the equivalent in Jacinda:

:set fs:=/\|/;
fn printSpan(str) :=
  (sprintf '%i-%i')"(match str /\^+/);
printSpan:?{% /\|/}{`2}
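
To make the awk script's behaviour concrete, here is a small sketch that runs it on a made-up diagnostic. The sample input and the span_of wrapper are assumptions for illustration, not the benchmark data; the awk program follows the script above, writing the caret pattern as a regex literal.

```shell
# span_of: hypothetical wrapper around the awk script from the post.
span_of() {
  awk '
    BEGIN { FS = "|" }                 # fields are separated by a literal "|"
    /\|/ {                             # only consider lines containing "|"
      if (match($2, /\^+/)) {          # first run of carets in the second field
        colstart = RSTART - 1          # RSTART is 1-based; make it 0-based
        col = colstart + RLENGTH       # exclusive end column
        printf("%d-%d\n", colstart, col)
      }
    }'
}

# Made-up compiler output; only the caret line produces a span.
printf '%s\n' \
  '  |' \
  '3 |     foo bar' \
  '  |         ^^^' | span_of
# prints 9-12
```

The columns are offsets within the second field, that is, counted from the character after the first "|".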

For benchmarking, I replicated the compiler output 10000 times with perl -0777pe '$_=$_ x 10000' ...

benchmarking bench/ja run examples/span2.jac -i bench/data/span.txt
time                 53.73 ms   (53.55 ms .. 53.97 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 53.71 ms   (53.58 ms .. 53.82 ms)
std dev              232.8 μs   (152.9 μs .. 321.4 μs)
benchmarking bench/original-awk -f examples/span2.awk bench/data/span.txt
time                 69.90 ms   (69.78 ms .. 70.03 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 69.91 ms   (69.87 ms .. 70.00 ms)
std dev              107.7 μs   (51.12 μs .. 173.9 μs)
benchmarking bench/gawk -f examples/span2.awk bench/data/span.txt
time                 86.10 ms   (85.86 ms .. 86.31 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 86.17 ms   (86.06 ms .. 86.39 ms)
std dev              258.3 μs   (123.0 μs .. 393.5 μs)
benchmarking bench/mawk -f examples/span2.awk bench/data/span.txt
time                 18.03 ms   (18.00 ms .. 18.06 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 18.00 ms   (17.97 ms .. 18.03 ms)
std dev              78.42 μs   (47.70 μs .. 128.8 μs)
benchmarking bench/busybox awk -f examples/span2.awk bench/data/span.txt
time                 112.5 ms   (110.8 ms .. 114.5 ms)
                     0.999 R²   (0.997 R² .. 1.000 R²)
mean                 112.6 ms   (112.0 ms .. 114.0 ms)
std dev              1.342 ms   (508.4 μs .. 2.101 ms)
variance introduced by outliers: 11% (moderately inflated)

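One must credit Rust's regex library for Jacinda's performance, but in any case it stands toe-to-toe with several prominent awk implementations: on this input it is faster than original-awk, gawk, and busybox awk, though mawk is about three times faster.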
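
The relative timings can be worked out from the "time" estimates in the benchmark output. A minimal sketch of the arithmetic follows; the figures are copied by hand from the output above, and the ratios wrapper is a hypothetical name.

```shell
# ratios: hypothetical helper; divides each awk's time by Jacinda's time.
ratios() {
  awk 'BEGIN {
    ja = 53.73                                    # bench/ja, in ms
    printf("original-awk %.2f\n", 69.90 / ja)
    printf("gawk         %.2f\n", 86.10 / ja)
    printf("mawk         %.2f\n", 18.03 / ja)
    printf("busybox-awk  %.2f\n", 112.5 / ja)
  }'
}

ratios
```

A ratio above 1 means Jacinda was faster on this input; a ratio below 1 means that awk was faster. These are single-benchmark figures, so they say nothing about other workloads.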
