(voodoo) Patch `_small_filt_fir!` performance for 1.12 #642

wheeheee · 2025-09-18T06:14:56Z

Benchmarks show that the previous hack for larger filter lengths (19 <= N <= 66), effective in Julia 1.9 and 1.10 onwards, fails again in 1.12, but it appears that it can be salvaged by moving the stores around. This mostly restores the previous level of performance for stateless `filt` calls on vectors, but the stateful and array versions may still be slightly slower in 1.12.

~~Also adjusts the SMALL_FILT_VECT_CUTOFF, reduced from 19 to 18.~~
Benchmarks included below:

julia> using DSP, BenchmarkTools

julia> out = zeros(10_000);

julia> @benchmark filt!($out, b, a, x) setup=(x = rand(10_000); b = rand(15); a = 1)
### PR ###

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  12.600 μs … 105.800 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     12.800 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.801 μs ±   3.269 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄ ▂▆▂  ▁       ▁▁▁▁                                         ▁
  ██▆████▆█▇▆▇▆▆▆▇█████▇█▇▇▆▆▆▄▄▅▄▅▃▃▆▇█▆▆▆▇▆▇▆▅▆▅▆▅▁▄▄▁▃▄▁▆▁▆ █
  12.6 μs       Histogram: log(frequency) by time      28.9 μs <

 Memory estimate: 176 bytes, allocs estimate: 2.

### master ###

 Range (min … max):  12.500 μs … 103.800 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     13.000 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.971 μs ±   3.395 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇▃  █▄▁  ▂          ▂▁▁   ▁▁                                ▂
  ███▆▄███▇▇██▅▆▆▅▆▅▆▇▇███▇▇▇███▇▇▆▆▆▅▅▄▅▃▁▄▅▇▅▅▆▆█▇▅▅▄▆█▇██▇▆ █
  12.5 μs       Histogram: log(frequency) by time      25.6 μs <

 Memory estimate: 176 bytes, allocs estimate: 2.

julia> @benchmark filt!($out, b, a, x) setup=(x = rand(10_000); b = rand(30); a = 1)
### PR ###

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  17.200 μs … 130.300 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     17.400 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   18.544 μs ±   3.109 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ██▂     ▆▅ ▁  ▃▅        ▃             ▁  ▁                   ▂
  ███▆▆▆▁▅█████▇██▇▇▆▅▄▅▆▅██▅▇█▇▇██▇██▇███▇██▇█▇▆▆▆▆▇▇▇▅▄▄▆▄▆▇ █
  17.2 μs       Histogram: log(frequency) by time      27.4 μs <

 Memory estimate: 288 bytes, allocs estimate: 2.

### master ###

 Range (min … max):  40.100 μs … 140.900 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     40.400 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   42.599 μs ±   5.875 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █  ▁ ▆▅▃▁▁  ▃▁                                               ▁
  ██▅███████▇████▇▆▆▅▆▆▅▅▅▅▄▄▁▅▅▄▄▁▁▃▁▁▃▁▁▁▅▃▁▅▄▃▃▃▁▄▁▁▄▄▃▁▄██ █
  40.1 μs       Histogram: log(frequency) by time      75.2 μs <

 Memory estimate: 288 bytes, allocs estimate: 2.

julia> @benchmark filt!($out, b, a, x) setup=(x = rand(10_000); b = rand(50); a = 1)
### PR ###

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  26.200 μs … 147.000 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     26.300 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   27.524 μs ±   4.005 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▁      ▄▅     ▂▂▃▁                                          ▁
  ████▄▅▆▆███▇█▇██████▇▇▅▇▆▇██▇▇▅▅▆▄▅▅▄▄▄▄▃▃▃▄▄▄▄▁▃▃▄▁█▃▁▃▄▄▇▆ █
  26.2 μs       Histogram: log(frequency) by time      40.2 μs <

 Memory estimate: 480 bytes, allocs estimate: 2.

### master ###

 Range (min … max):  94.300 μs … 324.500 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     95.200 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   98.593 μs ±  10.699 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▃▁▁▇▃▁▁▁▄▁  ▃▁                                              ▁
  ████████████▇███▇▆▇▆▆▅▅▇▆▅▃▃▅▅▄▄▃▃▄▂▄▃▃▃▄▄▇▅▄▄▃▃▆▄▄▄▆▄▃▄▂▄▅▅ █
  94.3 μs       Histogram: log(frequency) by time       145 μs <

 Memory estimate: 480 bytes, allocs estimate: 2.

julia> @benchmark filt!($out, b, a, x) setup=(x = rand(10_000); b = rand(66); a = 1)
### PR ###

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  26.300 μs … 198.400 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     26.800 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   28.160 μs ±   4.524 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█▆▂▃▂▂▃▁▅▆▃▁▁ ▂▂▃▃▂▁       ▁                                ▂
  █████████████████████▇██▇▆████▆▇▅▇▅▆▅▂▄▅▃▅▄▄▅▄▄▅▄▄▄▆▅▅▃▂▂▄▄▃ █
  26.3 μs       Histogram: log(frequency) by time      40.6 μs <

 Memory estimate: 576 bytes, allocs estimate: 2.

### master ###

 Range (min … max):  157.500 μs …  2.324 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     164.300 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   168.404 μs ± 27.859 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▄█▆█▅▄▂▂▂▄▂▄▂▁▁                                             ▂
  ██████████████████▆▆▆▆▆▇▆▅▆▆▄▆▃▄▆▅▄▄▄▄▄▄▄▁▁▄▄▅▆▆▆██▆▆▆▅▇▆▆█▆ █
  158 μs        Histogram: log(frequency) by time       254 μs <

 Memory estimate: 576 bytes, allocs estimate: 2.

julia> versioninfo()
Julia Version 1.12.0-rc2
Commit 72cbf019d0 (2025-09-06 12:00 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, tigerlake)
  GC: Built with stock GC
Threads: 8 default, 1 interactive, 8 GC (on 8 virtual cores)
Environment:
  JULIA_CONDAPKG_BACKEND = Null
  JULIA_DEPOT_PATH = Q:\.julia;
  JULIA_NUM_THREADS = auto
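
As a rough summary of the stateless runs above, the snippet below divides the master median by the PR median for each filter length. The numbers are copied by hand from the benchmark output in this comment; the variable names are only illustrative.

```julia
# Median times in μs, copied from the output above: N => (PR, master)
medians = [15 => (12.8, 13.0), 30 => (17.4, 40.4), 50 => (26.3, 95.2), 66 => (26.8, 164.3)]

for (N, (pr, master)) in medians
    # Ratio > 1 means the PR branch is faster than master for this N
    println("N = ", N, ": ", round(master / pr; digits = 1), "x")
end
# N = 15: 1.0x
# N = 30: 2.3x
# N = 50: 3.6x
# N = 66: 6.1x
```

So N = 15 is unchanged within noise, while the larger lengths are roughly 2x to 6x faster with this PR on this machine.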

I suppose we should bump the version for this patch?

codecov · 2025-09-18T06:20:53Z

Codecov Report

❌ Patch coverage is 95.45455% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 98.11%. Comparing base (094c4fd) to head (3565e77).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/Filters/filt.jl | 93.75% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #642      +/-   ##
==========================================
- Coverage   98.13%   98.11%   -0.03%     
==========================================
  Files          19       19              
  Lines        3277     3289      +12     
==========================================
+ Hits         3216     3227      +11     
- Misses         61       62       +1


wheeheee · 2025-09-19T10:40:47Z

These commits should fix the StoreSI version. They use `VERSION > v"1.12-"` to facilitate testing with 1.12 release candidates.
So far, it seems to work fine with Julia 1.10.
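
A minimal sketch of why the check compares against `v"1.12-"` rather than `v"1.12"`: in Julia's `VersionNumber` ordering, a trailing `-` marks the lowest possible prerelease, so release candidates sort after `v"1.12-"` but before `v"1.12"`. The helper name `uses_112_path` is hypothetical and only illustrates the comparison; it is not a function from this PR.

```julia
# Hypothetical helper mirroring the `VERSION > v"1.12-"` check
uses_112_path(v::VersionNumber) = v > v"1.12-"

uses_112_path(v"1.12.0-rc2")  # true: release candidates sort after v"1.12-"
uses_112_path(v"1.12.0")      # true
uses_112_path(v"1.10.10")     # false: older versions keep the previous code path

# A plain comparison against v"1.12" would exclude the release candidates:
v"1.12.0-rc2" >= v"1.12"      # false
```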


wheeheee · 2025-10-09T09:08:38Z

I have mostly fixed the performance issues for stateful multi-dimensional arrays by wrapping the columns of the state array. I reverted the commit removing `@inbounds` because I had accidentally tested it with `PolynomialRatio`s instead of `DF2TFilter`s...
The Codecov failure is due to issues with missing `@static` coverage.

This PR doesn't really affect LTS Julia.
More benchmarks (with Julia 1.12.0):

PR

julia> using DSP, BenchmarkTools

julia> dims = (10_000,); out = zeros(dims);

julia> for N in (8, 18, 64)
           display(
               @benchmark filt!($out, b, a, x) setup = begin
                   x = rand(Float64, dims)
                   b = rand($N)
                   a = 1.0
               end
           )
           display(
               @benchmark filt!($out, f, x) setup = begin
                   x = rand(Float64, dims)
                   b = rand($N)
                   a = 1.0
                   f = DF2TFilter(PolynomialRatio(b, a), dims[2:end])
               end
           )
       end
BenchmarkTools.Trial: 10000 samples with 7 evaluations per sample.
 Range (min … max):  4.671 μs …  14.414 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.714 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.901 μs ± 607.822 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▃ ▁ ▆▄     ▁                                               ▁
  ███████████▇█▆▆▆▆▇▆▆▅▅▄▄▄▃▃▄█▅▃▅▄▃▁▄▃▃▁▁▃▃▄▁▄▁▃▄▇▃▃▁▄▃▁▃▅▄▅ █
  4.67 μs      Histogram: log(frequency) by time      8.77 μs <

 Memory estimate: 112 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 7 evaluations per sample.
 Range (min … max):  4.700 μs …  15.429 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.729 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.892 μs ± 679.674 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▅       ▅▃▃                                                ▁
  ███▇▇▆▆▅▇███▇▇▆▇▇▇▇▆██▅▅▆▆▆▅▅▅▃▄▅▄▅▅▃▃▄▅▄▅▁▄▁▃▁▃▄▁▄▄▃▃▁▄▄▁▄ █
  4.7 μs       Histogram: log(frequency) by time      7.11 μs <

 Memory estimate: 224 bytes, allocs estimate: 5.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  16.800 μs … 67.900 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     17.000 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   17.590 μs ±  2.496 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▂   ▁▆▁                                                    ▁
  ██▆▄▁████▇▇▇▇▄▅▇▅▄▃▅▄▄▅▅▅▆▅▆▅▅▅▅▄▅▅▄▄▁▅▄▄▃▃▁▁▄▁▄▁▁▁▁▃▁▁▃▃▁█ █
  16.8 μs      Histogram: log(frequency) by time      30.3 μs <

 Memory estimate: 192 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  16.800 μs … 76.700 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     17.000 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   17.723 μs ±  2.856 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▃   ▃▆                                                     ▁
  ██▇▅███▇▇▇▆█▅▆▃▄▃▃▄▄▆▅▆▄▆▅▄▆▅▆▅▅▃▅▅▅▄▄▄▅▁▁▃▄▁▁▃▁▁▄▁▄▃▁▄▁▄▃█ █
  16.8 μs      Histogram: log(frequency) by time      31.2 μs <

 Memory estimate: 304 bytes, allocs estimate: 5.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  29.500 μs … 128.500 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     29.600 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   31.022 μs ±   6.218 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▆▅▁▄▁▁                                                      ▁
  ████████▇█▇▇▆▅▅▅█▇▄▅▄▁▄▃▃▃▁▄▁▄▄▁▃▃▃▁▁▁▁▃▇▄▃▁▃▃▁▁▁▃▁▁▃▁▁▁▁▁▁█ █
  29.5 μs       Histogram: log(frequency) by time        64 μs <

 Memory estimate: 576 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  29.600 μs … 87.700 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     29.700 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   30.707 μs ±  4.272 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▁▅ ▂ ▄▁     ▁                                              ▁
  ███▇█▇████▆▇██▇▆▇▆▅▅▅▅▄▃▄▄▃▇▄▃▃▃▁▄▆▇▅▅▄▄▁▃▄▁▁▃▁▁▃▁▁▁▁▃▁▃▃▃▇ █
  29.6 μs      Histogram: log(frequency) by time      53.1 μs <

 Memory estimate: 672 bytes, allocs estimate: 5.

julia> dims = (10_000, 1); out = zeros(dims);

julia> for N in (8, 18, 64)
           display(
               @benchmark filt!($out, b, a, x) setup = begin
                   x = rand(Float64, dims)
                   b = rand($N)
                   a = 1.0
               end
           )
           display(
               @benchmark filt!($out, f, x) setup = begin
                   x = rand(Float64, dims)
                   b = rand($N)
                   a = 1.0
                   f = DF2TFilter(PolynomialRatio(b, a), dims[2:end])
               end
           )
       end
BenchmarkTools.Trial: 10000 samples with 6 evaluations per sample.
 Range (min … max):  5.133 μs …  19.450 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.167 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.432 μs ± 805.603 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █   ▁▆▂▂▁                                                   ▁
  ██▆▆██████▇▇███▇▅▅▅▅▅▅▅▄▅▅▄▅▃▄▃▃▁▅▃▄▃▄▄▃▁▃▄▁▄▃▃▃▄▃▅▅▄▅▅▇▇▇▇ █
  5.13 μs      Histogram: log(frequency) by time      10.1 μs <

 Memory estimate: 112 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 7 evaluations per sample.
 Range (min … max):  4.929 μs … 28.986 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.957 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.196 μs ±  1.027 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▁▁ ▅▂▂▂                                                   ▁
  █████████▇█▇▇█▆▅▅▄▄▅▅▄▅▅▃▄▄▅▄▄▄▄▁▆▆▃▅▄▃▃▃▃▄▄▄▄▄▅▅▄▄▅▅▅▅▅▄▆ █
  4.93 μs      Histogram: log(frequency) by time     10.4 μs <

 Memory estimate: 224 bytes, allocs estimate: 5.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  17.800 μs … 73.200 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     18.000 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   18.932 μs ±  3.812 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █ ▄▆ ▁                                                      ▁
  █▅████▆▅▄▇▇▇▇▆▅▄▅▅▃▄▃▁▄▄▃▅▅▄▃█▅▅▆▃▃▃▃▃▃▃▃▄▃▁▁▃▃▁▁▁▁▁▃▁▁▁▁▁▅ █
  17.8 μs      Histogram: log(frequency) by time      49.1 μs <

 Memory estimate: 192 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  17.300 μs … 89.100 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     17.800 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   18.762 μs ±  3.971 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇▅▄▇▄▂▁▂                                                   ▂
  ██████████▅▅▆██▇▆▆▆▆▆▆▅▅▄▄▄▇▇▅▄▃▄▄▅▃▄▄▆▅▇▆▆▅▅▇▅▄▆▁▃▁▃▃▁▁▃▁▅ █
  17.3 μs      Histogram: log(frequency) by time      39.4 μs <

 Memory estimate: 304 bytes, allocs estimate: 5.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  29.500 μs … 120.200 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     29.700 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   31.136 μs ±   5.727 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █   ▆▂   ▁                                                   ▁
  █▆▇████▇▇█▇▇▆▆█▅▅▅▅▅▅█▄▄▅▅▅▅▅▄▁▄▄▁▁▁▃▄▃▁█▃▁▁▄▁▁▁▁▃▁▁▁▁▁▁▁▁▁▇ █
  29.5 μs       Histogram: log(frequency) by time      60.7 μs <

 Memory estimate: 576 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  29.600 μs … 735.700 μs  ┊ GC (min … max): 0.00% … 91.90%
 Time  (median):     29.700 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   30.503 μs ±   7.515 μs  ┊ GC (mean ± σ):  0.22% ±  0.92%

  █▆▃▁               ▃▅▂▁                                      ▁
  █████▄▅▆▆▇▅▅▁▅▆▆▆▇▇██████▇▇▇▆█▆▆▅▅▆▆▆▆▆▇▆▅▆▅▆▄▅▄▅▅▅▅▅▅▄▄▄▅▃▅ █
  29.6 μs       Histogram: log(frequency) by time      36.8 μs <

 Memory estimate: 672 bytes, allocs estimate: 5.

master

julia> using DSP, BenchmarkTools

julia> dims = (10_000,); out = zeros(dims);

julia> for N in (8, 18, 64)
           display(
               @benchmark filt!($out, b, a, x) setup = begin
                   x = rand(Float64, dims)
                   b = rand($N)
                   a = 1.0
               end
           )
           display(
               @benchmark filt!($out, f, x) setup = begin
                   x = rand(Float64, dims)
                   b = rand($N)
                   a = 1.0
                   f = DF2TFilter(PolynomialRatio(b, a), dims[2:end])
               end
           )
       end
BenchmarkTools.Trial: 10000 samples with 7 evaluations per sample.
 Range (min … max):  4.671 μs …  13.857 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.700 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.839 μs ± 396.348 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█▅       ▅▁      ▅▃▃▄▂     ▁           ▁                   ▂
  █████▄▄▁▁▄██▇▆▇▄▆▇█████▆▆▆▆███▇▇▇▅▆▄▅▄▆▆█▇▆▅▅▁▅▄▆▄▆▄▆▁▅▃▄▃▄ █
  4.67 μs      Histogram: log(frequency) by time      5.93 μs <

 Memory estimate: 112 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 6 evaluations per sample.
 Range (min … max):  5.717 μs …  19.750 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.783 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.051 μs ± 928.442 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▃▃▃▃▆▂▁▁ ▁                                                 ▁
  ████████████▆▆▆▅▆▄▅▆▅▄▄▆▅▅▅▅▄▄▃▄▃▄▃▄▁▁▃▄▃▃▄▃▁▁▄▄▃▃▄▄▄▁▁▅▄▁█ █
  5.72 μs      Histogram: log(frequency) by time      11.6 μs <

 Memory estimate: 624 bytes, allocs estimate: 18.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  16.800 μs … 80.800 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     17.000 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   17.416 μs ±  1.715 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇▂         ▁▄▆▃                                            ▂
  ████▅▅▁▁▁▃▃▃█████▇▆▆▅▆▅▆▅▅▆█▆▄▃▃▃▃▄▄▁▄▄▄▃▁▄▄▅▅▄▄▃▄▄▅▅▅▅▅▄▅▅ █
  16.8 μs      Histogram: log(frequency) by time      22.9 μs <

 Memory estimate: 192 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  18.300 μs … 135.500 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     18.500 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.611 μs ±   4.046 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▂▁▆▄▁▁                                                      ▁
  ███████▇▇▅▆▇▇▆▆▆▇█▆▅▅▅▅▅▅▄▄▇▅▅▄▃▄▃▃▇▅▄▃▅▇▆▆▁▃▄▁▁▁▃▃▁▁▁▁▃▃▃▄▇ █
  18.3 μs       Histogram: log(frequency) by time      44.5 μs <

 Memory estimate: 1.00 KiB, allocs estimate: 28.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  153.600 μs … 680.200 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     155.100 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   158.710 μs ±  20.466 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅█▅▂▄▂▁▁▁   ▄▂                                                ▁
  █████████████████▇▆▆▆▆▆▆▅▅▁▆▆▅▅▄▅▅▆▃▄▅▄▄▁▄▄▄▄▄▄▃▃▃▃▃▁▄▄▁▁▃▁▃▄ █
  154 μs        Histogram: log(frequency) by time        219 μs <

 Memory estimate: 576 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  140.000 μs … 608.200 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     140.600 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   143.728 μs ±  19.290 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄ ▁▄▃▁▁▁  ▂▂                                                 ▁
  ██▇██████████▇▇▇▇▆▆▅▅▆▆▅▆▅▅▅▄▅▅▄▁▅▄▅▃▅▅▄▄▄▅▄▁▅▃▄▅▁▁▃▄▁▅▄▃▃▃▄▃ █
  140 μs        Histogram: log(frequency) by time        199 μs <

 Memory estimate: 2.81 KiB, allocs estimate: 74.

julia> dims = (10_000, 1); out = zeros(dims);

julia> for N in (8, 18, 64)
           display(
               @benchmark filt!($out, b, a, x) setup = begin
                   x = rand(Float64, dims)
                   b = rand($N)
                   a = 1.0
               end
           )
           display(
               @benchmark filt!($out, f, x) setup = begin
                   x = rand(Float64, dims)
                   b = rand($N)
                   a = 1.0
                   f = DF2TFilter(PolynomialRatio(b, a), dims[2:end])
               end
           )
       end
BenchmarkTools.Trial: 10000 samples with 6 evaluations per sample.
 Range (min … max):  5.133 μs … 23.550 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.183 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.654 μs ±  1.219 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▁ ▆▅▄▂ ▂▂▁                                                ▁
  ██▆████▇███▇▇▆▆▆▅▆▅▅▅▄▅▇▆▄▃▄▃▄▃▅▃▄▅▄▅▅▆▇▇▆▆▇▆▆▆▄▅▄▃▃▃▃▃▃▃▂ █
  5.13 μs      Histogram: log(frequency) by time     11.7 μs <

 Memory estimate: 112 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 6 evaluations per sample.
 Range (min … max):  6.050 μs …  22.733 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.117 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.431 μs ± 986.213 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄▁ ▁▅▆▃▃▁ ▂▁ ▁                                             ▁
  ███████████████▇▆▇▆▆▆▆▆▆▄▅▅▅▅▅▅▄▄▅▅▁▄▁▄▃▅▃▃▃▅▁▄▄▄▁▄▄▃▄▁▁▃▁▆ █
  6.05 μs      Histogram: log(frequency) by time      11.4 μs <

 Memory estimate: 640 bytes, allocs estimate: 19.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  17.700 μs … 142.200 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     18.000 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   18.552 μs ±   2.931 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃█▃    ▂▃▄                                                   ▁
  ███▇▅▄▄████▇▇▇▇█▅▇▅▅▄▄▃▅▅▅▅▅▅▅▅▅▅▄▅▅▆▅▄▅▁▄▃▄▃▁▅█▆▄▁▄▄▁▄▄▄▃▃▃ █
  17.7 μs       Histogram: log(frequency) by time      28.6 μs <

 Memory estimate: 192 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  18.800 μs … 118.700 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.800 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.226 μs ±   2.819 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   █         ▆
  ▅█▆█▆▃▅▅▃▅▄█▆▃▃▂▂▂▂▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂▂▂▂▂ ▃
  18.8 μs         Histogram: frequency by time         28.6 μs <

 Memory estimate: 1.02 KiB, allocs estimate: 29.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  29.500 μs … 150.100 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     29.600 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   30.295 μs ±   2.750 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▆▃▁   ▁▁            ▄▅▂▁                                    ▁
  ████▇▆▄██▆▃▃▁▄▅▆▅▆▆▇▆█████▇▇▄█▆▆▆▅▆▅▅▅▄▆▅▆▄▆▆▆▅▅▅▅▅▅▃▅▄▅▄▆▅▆ █
  29.5 μs       Histogram: log(frequency) by time      36.1 μs <

 Memory estimate: 576 bytes, allocs estimate: 2.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  40.700 μs … 173.100 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     40.900 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   42.346 μs ±   6.501 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▅▃▁▁▁▃▄▂▁▁                                                  ▁
  ████████████▇▇▆▆▇▆▅▅▅▄▅▃▅▃▄▃▃▄▁▃▅▃▄▄▁▄▁▁▁▃▃▁▃▁▁▄▁▁▁▄▇▆▅▅▅▆▅▅ █
  40.7 μs       Histogram: log(frequency) by time      70.6 μs <

 Memory estimate: 2.83 KiB, allocs estimate: 75.

wheeheee changed the title ~~(voodoo) Patch 1.12 performance for _small_filt_fir!~~ (voodoo) Patch _small_filt_fir! performance for 1.12 Sep 18, 2025

wheeheee force-pushed the filt_perf_hack branch from 703eb74 to 4345cab Compare September 19, 2025 10:36

wheeheee force-pushed the filt_perf_hack branch from 4345cab to 6937e9d Compare September 20, 2025 04:55

wheeheee added 7 commits October 2, 2025 13:49

voodoo to fix small filt vectorization for 1.12

a7e72df

move store around, depending on N

_small_filt_fir_storesi!

42d65e9

fix and multi-version storesi sff

ee53126

remove default StoreSi argument from generated _filt_fir!

fd8d9fe

_filt!: restrict type of col argument to CartesianIndex

6d26a37

remove inbounds (needed on 1.11 only)

be26c64

use Base.wrap and memoryref for GC safety

b5b8f5d

wheeheee force-pushed the filt_perf_hack branch from 1df88f1 to b5b8f5d Compare October 9, 2025 03:22

Revert "remove inbounds (needed on 1.11 only)"

3565e77

This reverts commit be26c64.

wheeheee requested a review from martinholters October 9, 2025 09:09
