Some WIP's and ideas for SIMD #7
base: main
Conversation
|
This is super interesting! I'll need to look over this and see what the performance difference will be. Thank you for the ideas! |
|
Also, I added some micro-optimizations here: https://github.com/delneg/FSharpPerformance/tree/wip/optimize-v10 It's a little bit faster for me and uses 31 KB less memory. |
|
Interesting, on my machine it's slower. I have an AMD 3900X, which is Zen 2. Are you running on an Intel?
It is interesting to see the performance difference between .NET 6 and .NET 7. Version_10 is ~50 µs faster with .NET 7. |
|
Yep, I'm testing on an x64 macOS machine. I've also tried it on an M1 Mac, with interesting results. Both runs used .NET 7, and both were taken after commenting out the value option ValuesCount caching (in the last commit, just pushed). I commented it out because it was actually making things slower. So maybe what I did was a no-op after all. |
|
Super interesting. This is why I've been tempted to build an Intel machine along with my AMD. I knew these performance differences were possible but that seems pretty dramatic. I wonder what the root cause is for the performance difference between our different CPUs. |
|
Well, ARM64 is certainly very different from x64, and it got many improvements in .NET 7, so it's no wonder. P.S. I'd say it's tempting to have x86_64, ARM64, and RISC-V platforms for testing, not only AMD and Intel. |
|
It seems that I've managed to make it faster by making Graph a struct and passing it by inref. Update: by moving all the EdgeTracker code to static inline members and passing the remainingEdges array around directly as a Span, I was able to get this (measured on the M1 because its results are more consistent).
Hi,
I've studied version10 of Topological Sort a bit, and I see that there are some operations that are done once per loop iteration. I had an idea that they could be parallelized.
Parallelization via Tasks can bring a lot of overhead, but one could probably use SIMD in this case for the primitive operations.
I've added an example of Edge.BatchAdd and some comments. I can't get it fully working yet, but I wanted to share the general idea.
P.S. Also, .NET 7.0 is needed because the vector ShiftRight / ShiftLeft operations were only added in this new version.
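
To make the general idea concrete, here is a minimal sketch of batching a shift-and-OR operation with `System.Numerics.Vector<T>`. It is not the `Edge.BatchAdd` code from this PR. It assumes, purely for illustration, that an edge is an `int64` with the source node in the high 32 bits and the target node in the low 32 bits, and that node ids are already stored as non-negative `int64` values. The names `packScalar` and `packBatch` are hypothetical.

```fsharp
open System.Numerics

// Assumed layout (illustration only): source in the high 32 bits,
// target in the low 32 bits of an int64.
let packScalar (source: int64) (target: int64) : int64 =
    (source <<< 32) ||| target

// Packs many (source, target) pairs at once.
// Vector.ShiftLeft requires .NET 7 or later.
let packBatch (sources: int64[]) (targets: int64[]) : int64[] =
    if sources.Length <> targets.Length then
        invalidArg (nameof targets) "sources and targets must have the same length"

    let result = Array.zeroCreate<int64> sources.Length
    let width = Vector<int64>.Count // lanes per vector, hardware dependent
    let mutable i = 0

    // Vectorized part: process `width` pairs per iteration.
    while i <= sources.Length - width do
        let s = Vector<int64>(sources, i)
        let t = Vector<int64>(targets, i)
        let packed = Vector.BitwiseOr(Vector.ShiftLeft(s, 32), t)
        packed.CopyTo(result, i)
        i <- i + width

    // Scalar tail: whatever did not fill a whole vector.
    while i < sources.Length do
        result.[i] <- packScalar sources.[i] targets.[i]
        i <- i + 1

    result
```

Whether this beats the scalar loop depends on the batch size and the hardware vector width, so it would need to be benchmarked on each machine before drawing conclusions.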