Filter bank specifications
Length of the input signal (compulsory)
Before running setup, it is compulsory to fill the size field with
opts{1}.time.size = length(signal);
The size must be a power of 2, which is the case in which fast Fourier transforms are most efficient.
If you have K signals of the same size N, consider stacking them into a KxN matrix. This will automatically vectorize the computation and avoid high-level loop overhead.
A more general architecture, which automates padding to the next power of 2 and adapts to all sizes, is under development.
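Until automatic padding is available, a signal of arbitrary length can be padded by hand before filling the size field. The following is a minimal sketch, assuming that zero-padding at the end is acceptable for your application. The padding strategy and the variable names other than opts{1}.time.size are illustrative assumptions, not something the toolbox prescribes.

```matlab
% Hypothetical input: a column vector whose length is not a power of 2.
signal = randn(3000, 1);
original_length = length(signal);

% nextpow2 returns the smallest exponent p such that 2^p >= original_length.
padded_length = 2^nextpow2(original_length);

% Assumption: append zeros at the end of the signal (other padding
% schemes, such as symmetric extension, may suit your data better).
padded_signal = [signal; zeros(padded_length - original_length, 1)];

% Fill the compulsory size field with the padded length.
opts{1}.time.size = length(padded_signal);
```

Keep original_length around if you need to crop the output back to the unpadded duration.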
Amount of invariance to translation
The integer T is the amount of invariance to translation that you require, expressed in samples. It must also be a power of 2.
A typical value for second-order scattering of audio is T=8192, that is, about 370 ms at a sample rate of 22 kHz. A smaller T will not integrate full musical notes or full phonemes; conversely, a bigger T will blur different notes or phonemes together.
The number of octaves in the filter bank is equal to J = log2(T).
By default, T is set equal to size, which means that the corresponding scattering representation S will be fully translation-invariant.
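To choose T from a duration expressed in seconds, one option is to round to the nearest power of 2 on a logarithmic scale. The sketch below reproduces the figures quoted above. The variable names are illustrative, and the sample rate of 22050 Hz is an assumption standing in for "22 kHz".

```matlab
sample_rate = 22050;       % Hz (assumed value for "22 kHz")
target_duration = 0.37;    % desired invariance, in seconds

% Round log2 of the duration in samples to get a power of 2.
T = 2^round(log2(target_duration * sample_rate));

% Number of octaves in the filter bank, as stated in the text.
J = log2(T);

% Duration actually covered by T samples, in seconds.
actual_duration = T / sample_rate;
```

With these values, T is 8192 samples, J is 13 octaves, and the actual duration is roughly 0.37 s.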
Quality factor
The quality factor max_Q of a band-pass filter is defined as the ratio of its center frequency to its bandwidth. Consequently, for a given center frequency, increasing the quality factor decreases the bandwidth proportionally, yielding a “sharper” band-pass filter in the frequency domain. This increase in frequency sharpness comes at the cost of a longer filter support in the time domain, which may prevent the representation from distinguishing consecutive events.
All the wavelets in a filter bank share the same quality factor: this is why we refer to it as a constant-Q filter bank. Note that this toolbox also allows variable-Q filter banks in order to cope with time support limitations (see the section on maximum scale below). This is why the quality factor is named max_Q: it is the largest quality factor reached in the filter bank.
Typical values for the first order in audio range from 4 to 16. Typical values for the second order along time are 1 or 2. In the context of multivariable scattering, the value 1 is strongly recommended for any derived variable.
A quality factor of 1, corresponding to the so-called ‘dyadic’ filter bank, is the default.
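The trade-off between frequency sharpness and time support can be made concrete with a few numbers. The sketch below applies the definition of the quality factor to a hypothetical center frequency of 440 Hz. The inverse bandwidth is used only as a rough order of magnitude for the time support; the exact constant depends on the wavelet and is not specified here.

```matlab
center_frequency_hz = 440;          % hypothetical center frequency
quality_factors = [1 4 12 16];      % values discussed in this section

% Bandwidth follows from Q = center frequency / bandwidth.
bandwidths_hz = center_frequency_hz ./ quality_factors;

% Rough order of magnitude of the time support (assumption: ~1/bandwidth).
time_supports_ms = 1000 ./ bandwidths_hz;

for k = 1:numel(quality_factors)
    fprintf('Q = %2d: bandwidth = %6.1f Hz, time support ~ %5.1f ms\n', ...
        quality_factors(k), bandwidths_hz(k), time_supports_ms(k));
end
```

As Q grows, the bandwidth shrinks and the time support lengthens in the same proportion.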
Maximum scale
A potential drawback of the constant-Q filter bank is that the time support of the filters is unbounded at low frequencies. In audio, it is undesirable that acoustic events more than 100 ms apart fall into the same first-order time bin. To address this issue, this toolbox provides a bound max_scale that restricts the time support, at the cost of locally decreasing the quality factor.
For instance, for max_Q = 12 and a sample rate of 22 kHz, setting max_scale = 2048 (about 93 ms) will provide constant-Q filters for frequencies above max_Q/max_scale (about 130 Hz once converted from cycles per sample to Hz) and constant-bandwidth filters below that limit.
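The two figures in this example can be checked with a short computation. This is only a sketch of the arithmetic; the variable names other than max_Q and max_scale are illustrative, and 22050 Hz is an assumed value for "22 kHz".

```matlab
sample_rate = 22050;   % Hz (assumed value for "22 kHz")
max_Q = 12;
max_scale = 2048;      % in samples

% Duration of the maximum time support, in milliseconds.
max_scale_ms = 1000 * max_scale / sample_rate;

% max_Q / max_scale is in cycles per sample; multiply by the
% sample rate to express the transition frequency in Hz.
transition_frequency_hz = max_Q / max_scale * sample_rate;

fprintf('max_scale = %.1f ms, transition at %.1f Hz\n', ...
    max_scale_ms, transition_frequency_hz);
```

Above the transition frequency the filters are constant-Q; below it they are constant-bandwidth.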
Setting max_scale = Inf will remove the upper bound on the time support and will guarantee that the quality factor is indeed constant throughout the whole frequency range.
By default, max_scale is set to size, which means that the time support is only limited by the size of the whole signal.
Number of filters per octave
The integer nFilters_per_octave specifies the rational quantization of the log-scale variable gamma. In order to cover the whole frequency axis, it is compulsory to have
nFilters_per_octave > max_Q
The number of filters in the filter bank is equal to nFilters_per_octave * log2(T). Hence, the computational cost is linear in the number of filters per octave of each filter bank.
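The filter count formula can be evaluated before building the filter bank to anticipate its cost. The sketch below uses T = 8192 and max_Q = 12 from the earlier examples; the value 16 for nFilters_per_octave is a hypothetical choice that satisfies the constraint stated above.

```matlab
T = 8192;
max_Q = 12;
nFilters_per_octave = 16;   % hypothetical value, chosen greater than max_Q

% Check the coverage constraint stated in the text.
assert(nFilters_per_octave > max_Q);

% Number of octaves and total number of filters.
nOctaves = log2(T);
nFilters = nFilters_per_octave * nOctaves;

% Doubling the number of filters per octave doubles the filter count,
% which illustrates the linear growth of the computational cost.
nFilters_doubled = (2 * nFilters_per_octave) * nOctaves;
```

Here nFilters is 208, and nFilters_doubled is 416.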