As you may know, I have been working on polyglot for some time now. It is now the second-most popular ATS project on GitHub and has reached a state of relative maturity.
With that in mind, I would like to make a pitch for `polyglot` by proving the following:

- `polyglot` is the fastest code-counting tool available
- `polyglot` is the only fast tool that correctly disambiguates file extensions when they conflict

Ambiguous Extensions
Unfortunately, of the many source-counting tools tested, nearly every one fell victim to the same bug: it has no mechanism to distinguish languages that share a file extension. These collisions do happen in practice; `.v`, for example, is used by both Coq and Verilog. `polyglot`, `linguist`, `enry`, and `cloc` were the only tools to handle this issue correctly.
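To make the problem concrete, here is a minimal Python sketch of one way a tool might tell the two `.v` languages apart by looking at file contents. The keyword lists and the function name `guess_v_language` are assumptions made up for illustration; this is not how `polyglot` or any of the other tools actually does it.

```python
import re

# Hypothetical keyword heuristics, chosen for illustration only.
# Each pattern matches a keyword at the start of a line.
COQ_HINTS = re.compile(
    r"^\s*(Theorem|Lemma|Proof|Qed|Require|Definition|Inductive|Fixpoint)\b",
    re.MULTILINE,
)
VERILOG_HINTS = re.compile(
    r"^\s*(module|endmodule|always|assign|wire|reg|initial)\b",
    re.MULTILINE,
)


def guess_v_language(source: str) -> str:
    """Guess whether the contents of a `.v` file are Coq or Verilog."""
    coq_score = len(COQ_HINTS.findall(source))
    verilog_score = len(VERILOG_HINTS.findall(source))
    if coq_score > verilog_score:
        return "Coq"
    if verilog_score > coq_score:
        return "Verilog"
    return "Unknown"  # no evidence either way, or a tie


if __name__ == "__main__":
    print(guess_v_language("Theorem t : True.\nProof. exact I. Qed.\n"))
    print(guess_v_language("module m(input clk);\n  reg q;\nendmodule\n"))
```

A tool that maps extensions straight to languages has no step like this, so it has to attribute every `.v` file to the same language.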
Performance
Our first benchmark is against the `rust` source repo:
Tool | Language | Time |
---|---|---|
`polyglot` | ATS | 143.2 ms |
`loc` | Rust | 171.8 ms |
`tokei` | Rust | 304.6 ms |
`scc` | Go | 471.1 ms |
`gocloc` | Go | 839.8 ms |
`cloc` | Perl | 5.052 s |
`enry` | Go | 5.440 s |
`linguist` | Ruby | 17.46 s |
Second, we look at the `go` source repo:
Tool | Language | Time |
---|---|---|
`polyglot` | ATS | 152.5 ms |
`loc` | Rust | 177.3 ms |
`tokei` | Rust | 299.1 ms |
`scc` | Go | 502.7 ms |
`gocloc` | Go | 1.201 s |
`enry` | Go | 1.758 s |
`linguist` | Ruby | 13.42 s |
`cloc` | Perl | 17.16 s |
Third, the Linux source tree:
Tool | Language | Time |
---|---|---|
`polyglot` | ATS | 1.113 s |
`loc` | Rust | 2.034 s |
`tokei` | Rust | 3.088 s |
`scc` | Go | 5.841 s |
`gocloc` | Go | 13.68 s |
`cloc` | Perl | 2m 3.9s |
`enry` | Go | 2m 12.9s |
`linguist` | Ruby | 3m 11.3s |
Finally, the `OpenBLAS` source tree:
Tool | Language | Time |
---|---|---|
`polyglot` | ATS | 164.7 ms |
`loc` | Rust | 273.7 ms |
`tokei` | Rust | 373.6 ms |
`scc` | Go | 633.3 ms |
`gocloc` | Go | 1.501 s |
`enry` | Go | 5.633 s |
`cloc` | Perl | 24.17 s |
`linguist` | Ruby | 29.72 s |
As you can see, `polyglot` is the fastest tool in every benchmark. Moreover, it beats `cloc`, `enry`, and `linguist` by at least an order of magnitude, making it by far the fastest tool with any claim to correctness.
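The order-of-magnitude claim can be checked directly from the tables. The short Python sketch below copies the timings for `polyglot` and the three other tools that handle ambiguous extensions, converts them to seconds, and finds the narrowest margin. The names `TIMINGS` and `speedups` are just illustrative.

```python
# Timings in seconds, copied from the benchmark tables above
# (e.g. 2m 12.9s = 132.9 s).
TIMINGS = {
    "rust": {"polyglot": 0.1432, "cloc": 5.052, "enry": 5.440, "linguist": 17.46},
    "go": {"polyglot": 0.1525, "enry": 1.758, "linguist": 13.42, "cloc": 17.16},
    "linux": {"polyglot": 1.113, "cloc": 123.9, "enry": 132.9, "linguist": 191.3},
    "OpenBLAS": {"polyglot": 0.1647, "enry": 5.633, "cloc": 24.17, "linguist": 29.72},
}


def speedups(timings):
    """Yield (repo, tool, ratio), where ratio = tool time / polyglot time."""
    for repo, times in timings.items():
        baseline = times["polyglot"]
        for tool, seconds in times.items():
            if tool != "polyglot":
                yield repo, tool, seconds / baseline


if __name__ == "__main__":
    repo, tool, ratio = min(speedups(TIMINGS), key=lambda r: r[2])
    print(f"smallest speedup: {ratio:.1f}x over {tool} on the {repo} repo")
```

The narrowest gap is against `enry` on the `go` repo, at roughly 11.5x; every other pairing is wider.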