Shane Chu

Convolutional dictionary learning

So we saw convolutions. Given a filter $\bm d$ and a sparse code $\bm x$, an important insight is that the sparse code plays the role of an "indicator": it indicates where, and by how much, the filter $\bm d$ should appear in the convolution $\bm d * \bm x$.
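To make the "indicator" picture concrete, here is a minimal NumPy sketch. The filter values, the code length, and the positions of the nonzeros are made up purely for illustration, and I'm assuming a "full" 1-D convolution (the default of `np.convolve`).

```python
import numpy as np

# Hypothetical filter and sparse code, chosen only for illustration.
d = np.array([1.0, 2.0, 1.0])   # the filter
x = np.zeros(10)                # the sparse code: mostly zeros
x[2] = 1.0                      # "put one copy of d starting at index 2"
x[6] = -0.5                     # "put a copy of d, scaled by -0.5, at index 6"

# Full convolution: output length is len(x) + len(d) - 1 = 12.
y = np.convolve(d, x)
print(y)
```

Each nonzero entry of `x` stamps a scaled copy of `d` into the output at the corresponding position, and everything else stays zero.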

Let's leverage this intuition a bit further. What if we have a signal $\bm s$ and we try to "approximate" $\bm s$ using a sum of convolutions, i.e.,

$$
\bm d_1 * \bm x_1 + \bm d_2 * \bm x_2 + \cdots + \bm d_M * \bm x_M \approx \bm s \tag{1}
$$

which we can visualize as

[Figure: Sum of convolutions]

where the orange vector on the right is the signal $\bm s$. Let's formalize this as an optimization problem (single signal case):

\begin{align*}
\argmin_{\{ \bm d_m:\,\|\bm d_m\|_2=1\},\, \,\{\bm x_m\}}\,\,\, \frac{1}{2} \left\| \sum_m \bm d_m * \bm x_m - \bm s \right\|^2_2 + \lambda\sum_{m}\|\bm x_m\|_1 \tag{2}
\end{align*}

i.e., we want to find the filters $\{\bm d_m\}$ and the sparse codes $\{\bm x_m\}$ that best represent the signal $\bm s$. The hyperparameter $\lambda$ balances the trade-off between the data-fitting loss and the sparsity of $\{\bm x_m\}$. We enforce the constraint $\|\bm d_m\|_2=1$ to remove the scaling ambiguity between the filters and the sparse codes: without it, scaling $\bm d_m$ up and $\bm x_m$ down by the same factor leaves $\bm d_m * \bm x_m$ unchanged while shrinking the $\ell_1$ penalty. We can easily extend this to the case where we have multiple signals:

\begin{align*}
\argmin_{\{ \bm d_m:\,\|\bm d_m\|_2=1\},\, \,\{\bm x_{mn}\}}\,\,\, \frac{1}{2} \sum_n\left\| \sum_m \bm d_m * \bm x_{mn} - \bm s_n \right\|^2_2 + \lambda\sum_{n,m}\|\bm x_{mn}\|_1 \tag{3}
\end{align*}
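As a sanity check on the objective, here is a small sketch that just evaluates the CDL loss for given filters and codes (it does not solve the optimization problem). The function name `cdl_objective`, the data layout `codes[n][m]`, and the toy numbers are my own assumptions for illustration. I'm also assuming "full" 1-D convolutions and filters of equal length. The single-signal case is simply a list with one signal.

```python
import numpy as np

def cdl_objective(filters, codes, signals, lam):
    """Evaluate the CDL objective: data-fitting term plus l1 penalty.

    filters: list of M 1-D arrays d_m (all of the same length)
    codes:   codes[n][m] is the 1-D sparse code x_{mn}
    signals: list of N 1-D arrays s_n, each of length
             len(x_{mn}) + len(d_m) - 1 ("full" convolution)
    """
    total = 0.0
    for s_n, codes_n in zip(signals, codes):
        # Sum of convolutions for signal n.
        recon = sum(np.convolve(d, x) for d, x in zip(filters, codes_n))
        total += 0.5 * np.sum((recon - s_n) ** 2)
        total += lam * sum(np.abs(x).sum() for x in codes_n)
    return total

# Toy data: two unit-norm filters, one signal built exactly from them.
rng = np.random.default_rng(0)
filters = [rng.standard_normal(3) for _ in range(2)]
filters = [d / np.linalg.norm(d) for d in filters]

codes = [[np.zeros(8), np.zeros(8)]]
codes[0][0][1] = 2.0
codes[0][1][5] = -1.0
signals = [sum(np.convolve(d, x) for d, x in zip(filters, codes[0]))]

lam = 0.1
base = cdl_objective(filters, codes, signals, lam)

# Scaling ambiguity: multiply the filters by c and divide the codes by c.
# The reconstruction is unchanged, but the l1 penalty shrinks by a factor c.
c = 10.0
scaled_filters = [c * d for d in filters]
scaled_codes = [[x / c for x in codes_n] for codes_n in codes]
scaled = cdl_objective(scaled_filters, scaled_codes, signals, lam)

print(base, scaled)
```

The second evaluation drops the unit-norm constraint on purpose: the fit is just as good, yet the objective is smaller, which is exactly the degeneracy the constraint $\|\bm d_m\|_2=1$ rules out.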

The problem formulation of (2) and (3) is called a convolutional dictionary learning (CDL) problem. So what's the advantage of formulating the problem as a CDL?

Equation (1) also shows that CDL is a distributed representation. There's a particular name for this representation: it is a sparse representation, in which the model describes each signal in the dataset as a sparse linear combination of shifted copies of the filters, with the sparse codes as the coefficients.
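One way to see the "sparse linear combination" claim is to write the sum of convolutions as a single matrix-vector product. The sketch below is only an illustration under my own assumptions: the helper `shift_matrix`, the two toy filters, and the code values are all hypothetical, and the convolution is the "full" 1-D one.

```python
import numpy as np

def shift_matrix(d, code_len):
    """Matrix whose k-th column is the filter d shifted down by k entries."""
    D = np.zeros((code_len + len(d) - 1, code_len))
    for k in range(code_len):
        D[k:k + len(d), k] = d
    return D

# Two hypothetical filters and their sparse codes.
d1 = np.array([1.0, -1.0])
d2 = np.array([0.5, 0.5])
x1 = np.zeros(6)
x1[1] = 3.0
x2 = np.zeros(6)
x2[4] = -2.0

# Stack all shifted copies of both filters into one dictionary.
D = np.hstack([shift_matrix(d1, 6), shift_matrix(d2, 6)])   # shape (7, 12)
x = np.concatenate([x1, x2])

lhs = np.convolve(d1, x1) + np.convolve(d2, x2)   # sum of convolutions
rhs = D @ x                                       # linear combination of columns
n_active = np.count_nonzero(x)

print(np.allclose(lhs, rhs), n_active, D.shape[1])
```

Only `n_active` of the columns of `D` are actually used, so the signal is built from a handful of shifted filters out of all the available ones.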

This work is licensed under CC BY-SA 4.0. Last modified: August 26, 2024.