The CUDA implementation of filtfilt is slower than the MATLAB/CPU implementation (for zero-phase high-pass filtering)