Hi, this is a follow-up to the Zulip topic "#cranelift > Deoptimizing ISLE rules?".
As mentioned there, I suspect these ISLE rules can degrade performance, since they can increase the number of instructions and thus the overall computation cost.
I evaluated the impact of the rules on the sightglass benchmark suite.
Starting from the main branch of Cranelift, I removed the rules to create a no-demorgan variant,
and compared execution and compilation performance, measured in CPU cycles with sightglass-cli.
The averages over 10 repetitions are presented in the table below.
My machine runs x86-64 Linux with 64 cores and 512 GB of memory.
For shootout-keccak, removing the rules gives a 22.82% execution speedup and reduces compilation time by 13.34% (speedup and overhead as defined below); the impact is negligible for the other benchmarks.
speedup = (main - nodemorgan) / nodemorgan

overhead = (nodemorgan - main) / main
The rules exist for normalization: they push bnot instructions down the expression tree so that other simplification rules and GVN can exploit the normalized form.
However, this data suggests that the normalization is under-exploited and can degrade performance for keccak.
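To illustrate why pushing `bnot` down can add instructions, here is a minimal Python sketch. It assumes the rules have the usual De Morgan shape, `bnot(band(x, y))` to `bor(bnot(x), bnot(y))` and its dual. The `Var`, `Op`, `demorgan`, `count_ops`, and `evaluate` names are hypothetical helpers for this example only, not Cranelift or ISLE APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Op:
    opcode: str  # one of "bnot", "band", "bor"
    args: tuple


def demorgan(e):
    """Apply the assumed De Morgan rewrite at the root of the expression only."""
    if isinstance(e, Op) and e.opcode == "bnot":
        (inner,) = e.args
        if isinstance(inner, Op) and inner.opcode in ("band", "bor"):
            dual = "bor" if inner.opcode == "band" else "band"
            x, y = inner.args
            return Op(dual, (Op("bnot", (x,)), Op("bnot", (y,))))
    return e


def count_ops(e):
    """Count operator nodes in the tree (a rough proxy for instruction count)."""
    if isinstance(e, Var):
        return 0
    return 1 + sum(count_ops(a) for a in e.args)


def evaluate(e, env, mask=0xFF):
    """Evaluate the expression on 8-bit values to check that the rewrite is sound."""
    if isinstance(e, Var):
        return env[e.name] & mask
    vals = [evaluate(a, env, mask) for a in e.args]
    if e.opcode == "bnot":
        return ~vals[0] & mask
    if e.opcode == "band":
        return vals[0] & vals[1]
    if e.opcode == "bor":
        return vals[0] | vals[1]
    raise ValueError(f"unknown opcode: {e.opcode}")


before = Op("bnot", (Op("band", (Var("x"), Var("y"))),))
after = demorgan(before)

print(count_ops(before))  # 2 operators: band, bnot
print(count_ops(after))   # 3 operators: bnot, bnot, bor
```

Unless a later rule or GVN removes one of the new `bnot` nodes (for example because `bnot(x)` already exists elsewhere), the rewritten form has one more operation than the original.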
| Benchmark | Execution (main) | Execution (no-demorgan) | Speedup | Compilation (main) | Compilation (no-demorgan) | Overhead |
|---|---|---|---|---|---|---|
| blake3-scalar | 821,287 | 820,526 | 0.09% | 334,025,595 | 335,494,881 | 0.44% |
| blake3-simd | 904,391 | 902,968 | 0.16% | 215,000,060 | 216,887,637 | 0.88% |
| bz2 | 123,375,319 | 123,972,237 | -0.48% | 323,600,329 | 325,059,640 | 0.45% |
| pulldown-cmark | 7,527,276 | 7,528,312 | -0.01% | 685,357,632 | 687,241,047 | 0.27% |
| regex | 287,113,532 | 287,013,209 | 0.03% | 1,623,122,606 | 1,628,150,390 | 0.31% |
| shootout-ackermann | 7,766,207 | 7,769,915 | -0.05% | 98,442,831 | 99,049,461 | 0.62% |
| shootout-base64 | 377,876,186 | 377,986,725 | -0.03% | 94,119,875 | 94,616,896 | 0.53% |
| shootout-ctype | 796,212,604 | 796,195,661 | 0.00% | 90,769,795 | 90,728,785 | -0.05% |
| shootout-ed25519 | 11,062,252,786 | 11,041,529,973 | 0.19% | 505,160,708 | 511,230,333 | 1.20% |
| shootout-fib2 | 2,991,817,344 | 2,991,783,776 | 0.00% | 67,992,267 | 68,207,110 | 0.32% |
| shootout-gimli | 5,143,297 | 5,153,384 | -0.20% | 5,846,157 | 5,843,358 | -0.05% |
| shootout-heapsort | 2,374,978,997 | 2,375,615,158 | -0.03% | 29,560,353 | 29,690,677 | 0.44% |
| shootout-keccak | 48,797,823 | 39,731,401 | 22.82% | 292,241,108 | 253,254,413 | -13.34% |
| shootout-matrix | 697,653,531 | 697,060,503 | 0.09% | 93,389,415 | 93,786,432 | 0.43% |
| shootout-memmove | 37,572,864 | 37,679,507 | -0.28% | 95,438,341 | 95,867,972 | 0.45% |
| shootout-minicsv | 1,239,552,532 | 1,241,534,405 | -0.16% | 15,630,009 | 15,681,735 | 0.33% |
| shootout-nestedloop | 645 | 621 | 3.93% | 66,921,453 | 67,062,411 | 0.21% |
| shootout-random | 439,552,157 | 439,582,225 | -0.01% | 67,809,477 | 68,091,002 | 0.42% |
| shootout-ratelimit | 50,251,247 | 50,384,183 | -0.26% | 92,983,888 | 92,922,059 | -0.07% |
| shootout-seqhash | 15,249,759,981 | 15,255,584,809 | -0.04% | 126,530,505 | 127,360,907 | 0.66% |
| shootout-sieve | 844,263,240 | 844,508,149 | -0.03% | 67,092,099 | 67,628,053 | 0.80% |
| shootout-switch | 153,597,929 | 153,627,912 | -0.02% | 144,493,947 | 144,955,423 | 0.32% |
| shootout-xblabla20 | 4,924,967 | 4,926,304 | -0.03% | 96,463,543 | 96,976,115 | 0.53% |
| shootout-xchacha20 | 6,468,729 | 6,467,746 | 0.02% | 96,534,043 | 96,825,264 | 0.30% |
| spidermonkey | 742,879,491 | 744,941,257 | -0.28% | 23,687,766,691 | 23,749,945,287 | 0.26% |
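
As a sanity check, the shootout-keccak row can be recomputed from the two formulas above. This is a small Python sketch; the function names are chosen for this example, and the inputs are the averaged cycle counts copied from the table.

```python
def speedup(main, nodemorgan):
    """Execution speedup of no-demorgan relative to main: (main - nodemorgan) / nodemorgan."""
    return (main - nodemorgan) / nodemorgan


def overhead(main, nodemorgan):
    """Compilation overhead of no-demorgan relative to main: (nodemorgan - main) / main."""
    return (nodemorgan - main) / main


# shootout-keccak, CPU cycles averaged over 10 repetitions (from the table)
exec_main, exec_nodemorgan = 48_797_823, 39_731_401
comp_main, comp_nodemorgan = 292_241_108, 253_254_413

print(f"speedup:  {speedup(exec_main, exec_nodemorgan):.2%}")   # 22.82%
print(f"overhead: {overhead(comp_main, comp_nodemorgan):.2%}")  # -13.34%

# The same execution data expressed as a reduction relative to main
reduction = (exec_main - exec_nodemorgan) / exec_main
print(f"execution cycles saved vs. main: {reduction:.2%}")      # 18.58%
```

Note that the speedup is normalized by the no-demorgan time, so 22.82% is not the same as the fraction of execution cycles saved relative to main, which is about 18.58%.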