Speedup write_moves using _mm512_mask_compressstoreu_epi16 #6233

KierenP · 2025-08-16T14:04:05Z

We can use the _mm512_mask_compressstoreu_epi16 intrinsic, which does the compress and the store in one instruction rather than two. It's part of the same AVX512_VBMI2 instruction set extension.
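For readers unfamiliar with the two forms, here is a minimal sketch of the difference. It is not the actual write_moves code: the function names (compress_store_scalar, compress_then_store, compress_store_fused, compress_store) are hypothetical, and the data is treated as plain 16-bit lanes. The intrinsic paths are only compiled when the compiler targets AVX512_VBMI2; otherwise a scalar reference with the same result is used.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__AVX512VBMI2__)
    #include <immintrin.h>
#endif

// Portable reference: copy the 16-bit lanes of src whose mask bit is set,
// packed contiguously into dst. Returns the number of lanes written.
inline std::size_t
compress_store_scalar(std::uint16_t* dst, const std::uint16_t* src, std::uint32_t mask) {
    assert(dst != nullptr && src != nullptr);
    std::size_t n = 0;
    for (unsigned i = 0; i < 32; ++i)
        if (mask & (std::uint32_t{1} << i))
            dst[n++] = src[i];
    return n;
}

#if defined(__AVX512VBMI2__)
// Two-step form: compress inside the register, then do a full 64-byte store.
// Note: this always writes all 32 lanes (unselected tail lanes are zeroed),
// so dst must have room for 32 elements.
inline std::size_t
compress_then_store(std::uint16_t* dst, const std::uint16_t* src, std::uint32_t mask) {
    const __m512i lanes  = _mm512_loadu_si512(src);
    const __m512i packed = _mm512_maskz_compress_epi16(static_cast<__mmask32>(mask), lanes);
    _mm512_storeu_si512(dst, packed);
    return std::bitset<32>(mask).count();
}

// Fused form: compress and store in a single instruction.
// Only the selected lanes are written to memory.
inline std::size_t
compress_store_fused(std::uint16_t* dst, const std::uint16_t* src, std::uint32_t mask) {
    const __m512i lanes = _mm512_loadu_si512(src);
    _mm512_mask_compressstoreu_epi16(dst, static_cast<__mmask32>(mask), lanes);
    return std::bitset<32>(mask).count();
}
#endif

// Hypothetical dispatcher: fused form when available, scalar otherwise.
inline std::size_t
compress_store(std::uint16_t* dst, const std::uint16_t* src, std::uint32_t mask) {
#if defined(__AVX512VBMI2__)
    return compress_store_fused(dst, src, mask);
#else
    return compress_store_scalar(dst, src, mask);
#endif
}
```

Both intrinsic forms leave the same packed prefix in dst; they differ in how many bytes are written and in how the hardware executes them, so which one is faster has to be measured per microarchitecture.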

Baseline:

$ ./stockfish_master.exe speedtest
Stockfish dev-20250816-169737a9 by the Stockfish developers (see AUTHORS file)
info string Using 32 threads
Warmup position 3/3
Position 258/258
===========================
Version                    : Stockfish dev-20250816-169737a9
Compiled by                : g++ (GNUC) 15.1.0 on MinGW64
Compilation architecture   : x86-64-avx512icl
Compilation settings       : 64bit AVX512ICL VNNI AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT
Compiler __VERSION__ macro : 15.1.0
Large pages                : no
User invocation            : speedtest
Filled invocation          : speedtest 32 4096 150
Available processors       : 0-31
Thread count               : 32
Thread binding             : none
TT size [MiB]              : 4096
Hash max, avg [per mille]  :
    single search          : 48, 23
    single game            : 647, 456
Total nodes searched       : 5322766982
Total search time [s]      : 153.515
Nodes/second               : 34672618

With my change:

$ ./stockfish.exe speedtest
Stockfish dev-20250816-169737a9 by the Stockfish developers (see AUTHORS file)
info string Using 32 threads
Warmup position 3/3
Position 258/258
===========================
Version                    : Stockfish dev-20250816-169737a9
Compiled by                : g++ (GNUC) 15.1.0 on MinGW64
Compilation architecture   : x86-64-avx512icl
Compilation settings       : 64bit AVX512ICL VNNI AVX512 BMI2 AVX2 SSE41 SSSE3 SSE2 POPCNT
Compiler __VERSION__ macro : 15.1.0
Large pages                : no
User invocation            : speedtest
Filled invocation          : speedtest 32 4096 150
Available processors       : 0-31
Thread count               : 32
Thread binding             : none
TT size [MiB]              : 4096
Hash max, avg [per mille]  :
    single search          : 47, 22
    single game            : 655, 448
Total nodes searched       : 5403131641
Total search time [s]      : 153.519
Nodes/second               : 35195198

Speedup: 1.51%
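The figure follows from the two Nodes/second values above. As a small sketch of the arithmetic (the helper name speedup_percent is hypothetical, not part of Stockfish):

```cpp
#include <cstdint>

// Relative speedup of the patched build over the baseline, in percent:
// (patched / baseline - 1) * 100.
constexpr double speedup_percent(std::uint64_t baseline_nps, std::uint64_t patched_nps) {
    return (static_cast<double>(patched_nps) / static_cast<double>(baseline_nps) - 1.0) * 100.0;
}
```

Calling speedup_percent(34672618, 35195198) with the numbers from the two runs gives roughly 1.51. This compares a single run of each build, so it does not account for run-to-run noise.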

Bench: 2996176

mstembera · 2025-08-16T21:37:00Z

See #6153 (comment)

mstembera · 2025-08-16T21:39:30Z

We should probably put a comment here, just like the one in Stockfish/src/nnue/layers/affine_transform_sparse_input.h (line 105 in 169737a):

// Avoid _mm512_mask_compressstoreu_epi16() as it's 256 uOps on Zen4

KierenP · 2025-08-17T02:44:40Z

Ah thanks, I didn't realise. That being said, it is a speedup on Zen5 so in the future maybe we can have a target taking advantage of this.

Disservin · 2025-08-21T16:11:55Z

How much of a speedup is this for Zen5? If it is only minor, I'd simply avoid this and not introduce a new target, nor start parsing cpuid.

KierenP · 2025-08-28T13:04:24Z

It's 1.5% (as seen in the above speedtest comparison). That being said, the difference would be bigger if I also applied _mm512_mask_compressstoreu_epi16 in the NNUE inference.

This PR can probably be closed.

Speedup write_moves

8c10502

Bench: 2996176

vondele added the discussion label Aug 17, 2025
