It is quite possible to beat established (array) implementations with special cases and bittwiddling.
Rosetta
\( n \mapsto (-1)^n \) is \(1\) or \(-1\) depending on whether \(n\) is even or odd.
Apple
> :bench (_1.0^)'irange 0 999999 1
benchmarking...
time 1.486 ms (1.477 ms .. 1.494 ms)
0.999 R² (0.999 R² .. 1.000 R²)
mean 1.512 ms (1.507 ms .. 1.522 ms)
std dev 20.39 μs (11.74 μs .. 36.11 μs)
Apple recognizes \(n \mapsto (-1)^n \) as an idiom.
BQN
)time:1000 ¯1⋆↕999999
4.487ms
Perhaps this is faster than the others because BQN uses 32-bit integers.
J
1000 (6!:2) '_1^(i.999999)'
0.0119482
APL
¯1*⍳999999
NumPy/Python
%timeit (-1)**np.arange(0,999999)
10.2 ms ± 85.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Conclusion
Apple leaves some performance on the table; we could use a conditional negate
instruction and avoid the need to put a constant -1
in d2
.
> :disasm ((_1.0)^)
movz x1, #0x3ff0, LSL #48 01 fe e7 d2
fmov d0, x1 20 00 67 9e
movz x1, #0xbff0, LSL #48 01 fe f7 d2
fmov d2, x1 22 00 67 9e
tst x0, 0b1 1f 00 40 f2
fcsel d0, d2, d0, NE 40 1c 60 1e
ret c0 03 5f d6
In fact the -1
is not even hoisted out of the loop when we map \((-1)^n\)
over a vector:
> :disasm (((1.0)^)')
⋮
apple0:
mov x3, #0x10 03 02 80 d2
add x3, x3, x1, LSL #3 63 0c 01 8b
ldr x4, [x6, x3, LSL #0] c4 68 63 f8
movz x3, #0x3ff0, LSL #48 03 fe e7 d2
fmov d3, x3 63 00 67 9e
movz x3, #0xbff0, LSL #48 03 fe f7 d2
fmov d2, x3 62 00 67 9e
tst x4, 0b1 9f 00 40 f2
fcsel d3, d2, d3, NE 43 1c 63 1e
add x3, x1, #0x2 23 08 00 91
str d3, [x0, x3, LSL #3] 03 78 23 fc
add x1, x1, #0x1 21 04 00 91
cmp x1, x2 3f 00 02 eb
b.LT apple_0 6b fe ff 54
In this example, it's hard to get high-level languages to perform. There's plenty of room at the bottom!