(voodoo) Patch _small_filt_fir! performance for 1.12
#642
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #642      +/-   ##
==========================================
- Coverage   98.13%   98.11%   -0.03%
==========================================
  Files          19       19
  Lines        3277     3289      +12
==========================================
+ Hits         3216     3227      +11
- Misses         61       62       +1
703eb74 to 4345cab
These commits should fix the […]
4345cab to 6937e9d
move store around, depending on N
1df88f1 to b5b8f5d
This reverts commit be26c64.
Have mostly fixed the performance issues for stateful multi-dimensional arrays by […]. This PR doesn't really affect LTS Julia.
PR
julia> using DSP, BenchmarkTools
julia> dims = (10_000,); out = zeros(dims);
julia> for N in (8, 18, 64)
           display(
               @benchmark filt!($out, b, a, x) setup = begin
                   x = rand(Float64, dims)
                   b = rand($N)
                   a = 1.0
               end
           )
           display(
               @benchmark filt!($out, f, x) setup = begin
                   x = rand(Float64, dims)
                   b = rand($N)
                   a = 1.0
                   f = DF2TFilter(PolynomialRatio(b, a), dims[2:end])
               end
           )
       end
BenchmarkTools.Trial: 10000 samples with 7 evaluations per sample.
Range (min … max): 4.671 μs … 14.414 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 4.714 μs ┊ GC (median): 0.00%
Time (mean ± σ): 4.901 μs ± 607.822 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▃ ▁ ▆▄ ▁ ▁
███████████▇█▆▆▆▆▇▆▆▅▅▄▄▄▃▃▄█▅▃▅▄▃▁▄▃▃▁▁▃▃▄▁▄▁▃▄▇▃▃▁▄▃▁▃▅▄▅ █
4.67 μs Histogram: log(frequency) by time 8.77 μs <
Memory estimate: 112 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 7 evaluations per sample.
Range (min … max): 4.700 μs … 15.429 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 4.729 μs ┊ GC (median): 0.00%
Time (mean ± σ): 4.892 μs ± 679.674 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▅ ▅▃▃ ▁
███▇▇▆▆▅▇███▇▇▆▇▇▇▇▆██▅▅▆▆▆▅▅▅▃▄▅▄▅▅▃▃▄▅▄▅▁▄▁▃▁▃▄▁▄▄▃▃▁▄▄▁▄ █
4.7 μs Histogram: log(frequency) by time 7.11 μs <
Memory estimate: 224 bytes, allocs estimate: 5.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 16.800 μs … 67.900 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 17.000 μs ┊ GC (median): 0.00%
Time (mean ± σ): 17.590 μs ± 2.496 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▂ ▁▆▁ ▁
██▆▄▁████▇▇▇▇▄▅▇▅▄▃▅▄▄▅▅▅▆▅▆▅▅▅▅▄▅▅▄▄▁▅▄▄▃▃▁▁▄▁▄▁▁▁▁▃▁▁▃▃▁█ █
16.8 μs Histogram: log(frequency) by time 30.3 μs <
Memory estimate: 192 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 16.800 μs … 76.700 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 17.000 μs ┊ GC (median): 0.00%
Time (mean ± σ): 17.723 μs ± 2.856 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▃ ▃▆ ▁
██▇▅███▇▇▇▆█▅▆▃▄▃▃▄▄▆▅▆▄▆▅▄▆▅▆▅▅▃▅▅▅▄▄▄▅▁▁▃▄▁▁▃▁▁▄▁▄▃▁▄▁▄▃█ █
16.8 μs Histogram: log(frequency) by time 31.2 μs <
Memory estimate: 304 bytes, allocs estimate: 5.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 29.500 μs … 128.500 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 29.600 μs ┊ GC (median): 0.00%
Time (mean ± σ): 31.022 μs ± 6.218 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▆▅▁▄▁▁ ▁
████████▇█▇▇▆▅▅▅█▇▄▅▄▁▄▃▃▃▁▄▁▄▄▁▃▃▃▁▁▁▁▃▇▄▃▁▃▃▁▁▁▃▁▁▃▁▁▁▁▁▁█ █
29.5 μs Histogram: log(frequency) by time 64 μs <
Memory estimate: 576 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 29.600 μs … 87.700 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 29.700 μs ┊ GC (median): 0.00%
Time (mean ± σ): 30.707 μs ± 4.272 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▁▅ ▂ ▄▁ ▁ ▁
███▇█▇████▆▇██▇▆▇▆▅▅▅▅▄▃▄▄▃▇▄▃▃▃▁▄▆▇▅▅▄▄▁▃▄▁▁▃▁▁▃▁▁▁▁▃▁▃▃▃▇ █
29.6 μs Histogram: log(frequency) by time 53.1 μs <
Memory estimate: 672 bytes, allocs estimate: 5.
julia> dims = (10_000, 1); out = zeros(dims);
julia> for N in (8, 18, 64)
           display(
               @benchmark filt!($out, b, a, x) setup = begin
                   x = rand(Float64, dims)
                   b = rand($N)
                   a = 1.0
               end
           )
           display(
               @benchmark filt!($out, f, x) setup = begin
                   x = rand(Float64, dims)
                   b = rand($N)
                   a = 1.0
                   f = DF2TFilter(PolynomialRatio(b, a), dims[2:end])
               end
           )
       end
BenchmarkTools.Trial: 10000 samples with 6 evaluations per sample.
Range (min … max): 5.133 μs … 19.450 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 5.167 μs ┊ GC (median): 0.00%
Time (mean ± σ): 5.432 μs ± 805.603 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▁▆▂▂▁ ▁
██▆▆██████▇▇███▇▅▅▅▅▅▅▅▄▅▅▄▅▃▄▃▃▁▅▃▄▃▄▄▃▁▃▄▁▄▃▃▃▄▃▅▅▄▅▅▇▇▇▇ █
5.13 μs Histogram: log(frequency) by time 10.1 μs <
Memory estimate: 112 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 7 evaluations per sample.
Range (min … max): 4.929 μs … 28.986 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 4.957 μs ┊ GC (median): 0.00%
Time (mean ± σ): 5.196 μs ± 1.027 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▁▁ ▅▂▂▂ ▁
█████████▇█▇▇█▆▅▅▄▄▅▅▄▅▅▃▄▄▅▄▄▄▄▁▆▆▃▅▄▃▃▃▃▄▄▄▄▄▅▅▄▄▅▅▅▅▅▄▆ █
4.93 μs Histogram: log(frequency) by time 10.4 μs <
Memory estimate: 224 bytes, allocs estimate: 5.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 17.800 μs … 73.200 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 18.000 μs ┊ GC (median): 0.00%
Time (mean ± σ): 18.932 μs ± 3.812 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▄▆ ▁ ▁
█▅████▆▅▄▇▇▇▇▆▅▄▅▅▃▄▃▁▄▄▃▅▅▄▃█▅▅▆▃▃▃▃▃▃▃▃▄▃▁▁▃▃▁▁▁▁▁▃▁▁▁▁▁▅ █
17.8 μs Histogram: log(frequency) by time 49.1 μs <
Memory estimate: 192 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 17.300 μs … 89.100 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 17.800 μs ┊ GC (median): 0.00%
Time (mean ± σ): 18.762 μs ± 3.971 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▇▅▄▇▄▂▁▂ ▂
██████████▅▅▆██▇▆▆▆▆▆▆▅▅▄▄▄▇▇▅▄▃▄▄▅▃▄▄▆▅▇▆▆▅▅▇▅▄▆▁▃▁▃▃▁▁▃▁▅ █
17.3 μs Histogram: log(frequency) by time 39.4 μs <
Memory estimate: 304 bytes, allocs estimate: 5.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 29.500 μs … 120.200 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 29.700 μs ┊ GC (median): 0.00%
Time (mean ± σ): 31.136 μs ± 5.727 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▆▂ ▁ ▁
█▆▇████▇▇█▇▇▆▆█▅▅▅▅▅▅█▄▄▅▅▅▅▅▄▁▄▄▁▁▁▃▄▃▁█▃▁▁▄▁▁▁▁▃▁▁▁▁▁▁▁▁▁▇ █
29.5 μs Histogram: log(frequency) by time 60.7 μs <
Memory estimate: 576 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 29.600 μs … 735.700 μs ┊ GC (min … max): 0.00% … 91.90%
Time (median): 29.700 μs ┊ GC (median): 0.00%
Time (mean ± σ): 30.503 μs ± 7.515 μs ┊ GC (mean ± σ): 0.22% ± 0.92%
█▆▃▁ ▃▅▂▁ ▁
█████▄▅▆▆▇▅▅▁▅▆▆▆▇▇██████▇▇▇▆█▆▆▅▅▆▆▆▆▆▇▆▅▆▅▆▄▅▄▅▅▅▅▅▅▄▄▄▅▃▅ █
29.6 μs Histogram: log(frequency) by time 36.8 μs <
Memory estimate: 672 bytes, allocs estimate: 5.
master
julia> using DSP, BenchmarkTools
julia> dims = (10_000,); out = zeros(dims);
julia> for N in (8, 18, 64)
           display(
               @benchmark filt!($out, b, a, x) setup = begin
                   x = rand(Float64, dims)
                   b = rand($N)
                   a = 1.0
               end
           )
           display(
               @benchmark filt!($out, f, x) setup = begin
                   x = rand(Float64, dims)
                   b = rand($N)
                   a = 1.0
                   f = DF2TFilter(PolynomialRatio(b, a), dims[2:end])
               end
           )
       end
BenchmarkTools.Trial: 10000 samples with 7 evaluations per sample.
Range (min … max): 4.671 μs … 13.857 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 4.700 μs ┊ GC (median): 0.00%
Time (mean ± σ): 4.839 μs ± 396.348 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▇█▅ ▅▁ ▅▃▃▄▂ ▁ ▁ ▂
█████▄▄▁▁▄██▇▆▇▄▆▇█████▆▆▆▆███▇▇▇▅▆▄▅▄▆▆█▇▆▅▅▁▅▄▆▄▆▄▆▁▅▃▄▃▄ █
4.67 μs Histogram: log(frequency) by time 5.93 μs <
Memory estimate: 112 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 6 evaluations per sample.
Range (min … max): 5.717 μs … 19.750 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 5.783 μs ┊ GC (median): 0.00%
Time (mean ± σ): 6.051 μs ± 928.442 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▃▃▃▃▆▂▁▁ ▁ ▁
████████████▆▆▆▅▆▄▅▆▅▄▄▆▅▅▅▅▄▄▃▄▃▄▃▄▁▁▃▄▃▃▄▃▁▁▄▄▃▃▄▄▄▁▁▅▄▁█ █
5.72 μs Histogram: log(frequency) by time 11.6 μs <
Memory estimate: 624 bytes, allocs estimate: 18.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 16.800 μs … 80.800 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 17.000 μs ┊ GC (median): 0.00%
Time (mean ± σ): 17.416 μs ± 1.715 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▇▂ ▁▄▆▃ ▂
████▅▅▁▁▁▃▃▃█████▇▆▆▅▆▅▆▅▅▆█▆▄▃▃▃▃▄▄▁▄▄▄▃▁▄▄▅▅▄▄▃▄▄▅▅▅▅▅▄▅▅ █
16.8 μs Histogram: log(frequency) by time 22.9 μs <
Memory estimate: 192 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 18.300 μs … 135.500 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 18.500 μs ┊ GC (median): 0.00%
Time (mean ± σ): 19.611 μs ± 4.046 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▂▁▆▄▁▁ ▁
███████▇▇▅▆▇▇▆▆▆▇█▆▅▅▅▅▅▅▄▄▇▅▅▄▃▄▃▃▇▅▄▃▅▇▆▆▁▃▄▁▁▁▃▃▁▁▁▁▃▃▃▄▇ █
18.3 μs Histogram: log(frequency) by time 44.5 μs <
Memory estimate: 1.00 KiB, allocs estimate: 28.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 153.600 μs … 680.200 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 155.100 μs ┊ GC (median): 0.00%
Time (mean ± σ): 158.710 μs ± 20.466 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▅█▅▂▄▂▁▁▁ ▄▂ ▁
█████████████████▇▆▆▆▆▆▆▅▅▁▆▆▅▅▄▅▅▆▃▄▅▄▄▁▄▄▄▄▄▄▃▃▃▃▃▁▄▄▁▁▃▁▃▄ █
154 μs Histogram: log(frequency) by time 219 μs <
Memory estimate: 576 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 140.000 μs … 608.200 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 140.600 μs ┊ GC (median): 0.00%
Time (mean ± σ): 143.728 μs ± 19.290 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▄ ▁▄▃▁▁▁ ▂▂ ▁
██▇██████████▇▇▇▇▆▆▅▅▆▆▅▆▅▅▅▄▅▅▄▁▅▄▅▃▅▅▄▄▄▅▄▁▅▃▄▅▁▁▃▄▁▅▄▃▃▃▄▃ █
140 μs Histogram: log(frequency) by time 199 μs <
Memory estimate: 2.81 KiB, allocs estimate: 74.
julia> dims = (10_000, 1); out = zeros(dims);
julia> for N in (8, 18, 64)
           display(
               @benchmark filt!($out, b, a, x) setup = begin
                   x = rand(Float64, dims)
                   b = rand($N)
                   a = 1.0
               end
           )
           display(
               @benchmark filt!($out, f, x) setup = begin
                   x = rand(Float64, dims)
                   b = rand($N)
                   a = 1.0
                   f = DF2TFilter(PolynomialRatio(b, a), dims[2:end])
               end
           )
       end
BenchmarkTools.Trial: 10000 samples with 6 evaluations per sample.
Range (min … max): 5.133 μs … 23.550 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 5.183 μs ┊ GC (median): 0.00%
Time (mean ± σ): 5.654 μs ± 1.219 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▁ ▆▅▄▂ ▂▂▁ ▁
██▆████▇███▇▇▆▆▆▅▆▅▅▅▄▅▇▆▄▃▄▃▄▃▅▃▄▅▄▅▅▆▇▇▆▆▇▆▆▆▄▅▄▃▃▃▃▃▃▃▂ █
5.13 μs Histogram: log(frequency) by time 11.7 μs <
Memory estimate: 112 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 6 evaluations per sample.
Range (min … max): 6.050 μs … 22.733 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 6.117 μs ┊ GC (median): 0.00%
Time (mean ± σ): 6.431 μs ± 986.213 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▄▁ ▁▅▆▃▃▁ ▂▁ ▁ ▁
███████████████▇▆▇▆▆▆▆▆▆▄▅▅▅▅▅▅▄▄▅▅▁▄▁▄▃▅▃▃▃▅▁▄▄▄▁▄▄▃▄▁▁▃▁▆ █
6.05 μs Histogram: log(frequency) by time 11.4 μs <
Memory estimate: 640 bytes, allocs estimate: 19.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 17.700 μs … 142.200 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 18.000 μs ┊ GC (median): 0.00%
Time (mean ± σ): 18.552 μs ± 2.931 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▃█▃ ▂▃▄ ▁
███▇▅▄▄████▇▇▇▇█▅▇▅▅▄▄▃▅▅▅▅▅▅▅▅▅▅▄▅▅▆▅▄▅▁▄▃▄▃▁▅█▆▄▁▄▄▁▄▄▄▃▃▃ █
17.7 μs Histogram: log(frequency) by time 28.6 μs <
Memory estimate: 192 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 18.800 μs … 118.700 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 19.800 μs ┊ GC (median): 0.00%
Time (mean ± σ): 20.226 μs ± 2.819 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▆
▅█▆█▆▃▅▅▃▅▄█▆▃▃▂▂▂▂▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂▂▂▂▂ ▃
18.8 μs Histogram: frequency by time 28.6 μs <
Memory estimate: 1.02 KiB, allocs estimate: 29.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 29.500 μs … 150.100 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 29.600 μs ┊ GC (median): 0.00%
Time (mean ± σ): 30.295 μs ± 2.750 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▆▃▁ ▁▁ ▄▅▂▁ ▁
████▇▆▄██▆▃▃▁▄▅▆▅▆▆▇▆█████▇▇▄█▆▆▆▅▆▅▅▅▄▆▅▆▄▆▆▆▅▅▅▅▅▅▃▅▄▅▄▆▅▆ █
29.5 μs Histogram: log(frequency) by time 36.1 μs <
Memory estimate: 576 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 40.700 μs … 173.100 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 40.900 μs ┊ GC (median): 0.00%
Time (mean ± σ): 42.346 μs ± 6.501 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▅▃▁▁▁▃▄▂▁▁ ▁
████████████▇▇▆▆▇▆▅▅▅▄▅▃▅▃▄▃▃▄▁▃▅▃▄▄▁▄▁▁▁▃▃▁▃▁▁▄▁▁▁▄▇▆▅▅▅▆▅▅ █
40.7 μs Histogram: log(frequency) by time 70.6 μs <
Memory estimate: 2.83 KiB, allocs estimate: 75.
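To make the two sets of results easier to compare, here is a small sketch that tabulates the median times reported above and prints the master/PR ratio for each case. The numbers are copied by hand from the output in this thread. The names `medians` and `speedup` are made up for this illustration, and the comparison assumes both runs were made on the same machine and Julia version, which the thread does not state explicitly. "stateless" refers to the `filt!(out, b, a, x)` call and "stateful" to the `filt!(out, f, x)` call with a `DF2TFilter`.

```julia
using Printf

# Median times in μs, copied by hand from the BenchmarkTools output in this thread.
# Each row: (size of x, call form, N, median with this PR, median on master).
medians = [
    ("(10_000,)",   "stateless",  8,  4.714,   4.700),
    ("(10_000,)",   "stateful",   8,  4.729,   5.783),
    ("(10_000,)",   "stateless", 18, 17.000,  17.000),
    ("(10_000,)",   "stateful",  18, 17.000,  18.500),
    ("(10_000,)",   "stateless", 64, 29.600, 155.100),
    ("(10_000,)",   "stateful",  64, 29.700, 140.600),
    ("(10_000, 1)", "stateless",  8,  5.167,   5.183),
    ("(10_000, 1)", "stateful",   8,  4.957,   6.117),
    ("(10_000, 1)", "stateless", 18, 18.000,  18.000),
    ("(10_000, 1)", "stateful",  18, 17.800,  19.800),
    ("(10_000, 1)", "stateless", 64, 29.700,  29.600),
    ("(10_000, 1)", "stateful",  64, 29.700,  40.900),
]

# Ratio of master to PR median; a value above 1 means the PR is faster for that row.
speedup(row) = row[5] / row[4]

for row in medians
    shape, call, N, pr, master = row
    @printf("%-12s %-9s N=%-2d  PR %7.3f  master %7.3f  master/PR %.2f\n",
            shape, call, N, pr, master, speedup(row))
end
```

On these medians the largest gaps are the two 1-D N = 64 cases (155.1 μs and 140.6 μs on master against 29.6 μs and 29.7 μs with the PR). The stateless N = 8 and N = 18 cases are essentially unchanged.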
Benchmarks show that the previous hack for larger values of N (19 <= N <= 66), effective in Julia 1.9 and 1.10 onwards, fails again in 1.12, but it appears that this can be salvaged again by moving the stores around. This mostly restores the previous level of performance for stateless filts of vectors, but the stateful and array versions may still be left slightly worse in 1.12.
Also adjusts the SMALL_FILT_VECT_CUTOFF, reduced from 19 to 18.
Benchmarks included below:
I suppose we should bump the version for this patch?
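For readers who have not looked at the kernel being patched, the following is a minimal sketch of a direct-form FIR loop whose tap count N is a compile-time constant, which is the general kind of code where the position of the store to the output matters to the compiler. This is not DSP.jl's `_small_filt_fir!` and it does not reproduce the store reordering from this PR. The function name `small_fir_sketch!` and the choice to pass the coefficients as an `NTuple` are assumptions made purely for illustration.

```julia
# Hypothetical illustration only; not the implementation used in DSP.jl.
# Direct-form FIR filter with the number of taps N known from the tuple type.
function small_fir_sketch!(out::AbstractVector{T}, b::NTuple{N,T},
                           x::AbstractVector{T}) where {N,T}
    length(out) == length(x) ||
        throw(DimensionMismatch("out and x must have equal length"))
    n = length(x)
    # Warm-up: the first N-1 outputs only see the samples available so far
    # (zero initial state).
    for i in 1:min(N - 1, n)
        acc = zero(T)
        for k in 1:i
            acc = muladd(b[k], x[i - k + 1], acc)
        end
        out[i] = acc
    end
    # Steady state: the inner loop has a fixed trip count N, so the compiler
    # is free to unroll it; there is exactly one store per output sample.
    @inbounds for i in N:n
        acc = zero(T)
        for k in 1:N
            acc = muladd(b[k], x[i - k + 1], acc)
        end
        out[i] = acc
    end
    return out
end

# Usage: a 3-tap smoothing filter applied to a ramp.
b = (0.25, 0.5, 0.25)
x = collect(1.0:6.0)
out = small_fir_sketch!(similar(x), b, x)
```

For a zero initial state and these coefficients, the result should agree with `filt(collect(b), 1.0, x)` from DSP.jl. How the real kernel arranges its loads and stores for different N, and how SMALL_FILT_VECT_CUTOFF selects between code paths, is defined in the DSP.jl source rather than in this sketch.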