Soft-thresholding operator with [0,1] constraint

Proximal gradient descent

I often forget how to derive proximal gradient descent. So here we go.

Proximal gradient descent solves problems of the form

\min_{\bm z} f(\bm z)+g(\bm z)

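where $f$ is differentiable but $g$ is not. The idea is to replace $f$ by a quadratic approximation around some point $\bm x$, keep $g$ as it is, and minimize the sum. In the second-order Taylor expansion of $f$, we replace the Hessian $\nabla^2 f(\bm x)$ by $\frac{1}{\eta}I$ for a step size $\eta>0$. We want to solve the following (the second line follows by completing the square and dropping terms that do not depend on $\bm z$):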

\begin{align*} &\argmin_{\bm z}\, f(\bm x)+\nabla f(\bm x)^T(\bm z-\bm x)+\frac{1}{2\eta}\|\bm z-\bm x\|^2_2 + g(\bm z)\\ =&\argmin_{\bm z}\, \frac{1}{2\eta}\|\bm z-(\bm x-\eta\nabla f(\bm x))\|^2_2 + g(\bm z) \end{align*}
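A quick numerical sanity check of the completing-the-square step, written as a Python/NumPy sketch (my choice of language; the vectors and constants are arbitrary test values, not from the derivation). The two objectives should differ by $f(\bm x)-\frac{\eta}{2}\|\nabla f(\bm x)\|_2^2$ for every $\bm z$, a constant, so they have the same minimizer.

```python
import numpy as np

def linearized_objective(z, x, grad, eta, fx):
    """Quadratic model of f at x: f(x) + grad^T (z - x) + ||z - x||^2 / (2 eta)."""
    return fx + grad @ (z - x) + np.sum((z - x) ** 2) / (2 * eta)

def completed_square(z, x, grad, eta):
    """Rewritten form: ||z - (x - eta * grad)||^2 / (2 eta)."""
    return np.sum((z - (x - eta * grad)) ** 2) / (2 * eta)

# Arbitrary test values standing in for x, grad f(x), eta, and f(x).
rng = np.random.default_rng(0)
x = rng.normal(size=3)
grad = rng.normal(size=3)
eta, fx = 0.5, 1.7

# The gap should be the same constant for every z: f(x) - (eta / 2) * ||grad||^2.
expected_gap = fx - 0.5 * eta * np.dot(grad, grad)
for _ in range(5):
    z = rng.normal(size=3)
    gap = linearized_objective(z, x, grad, eta, fx) - completed_square(z, x, grad, eta)
    assert np.isclose(gap, expected_gap)
```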

The proximal operator $\text{prox}_{g,\eta}(\bm x)$ is defined as follows:

\text{prox}_{g,\eta}(\bm x) = \argmin_{\bm z} \frac{1}{2\eta}\|\bm z - \bm x\|^2_2 + g(\bm z)
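To make the definition concrete, here is a small Python/NumPy sketch for the scalar case $g(z)=\lambda|z|$. It evaluates $\text{prox}_{g,\eta}(x)$ by brute force over a dense grid and compares it with the well-known closed form, the soft-thresholding operator $\text{sign}(x)\max\{|x|-\eta\lambda,0\}$. The helper names, grid range, and test values are my own assumptions.

```python
import numpy as np

def prox_brute_force(g, x, eta, grid):
    """Approximate prox_{g,eta}(x) for scalar x by minimizing the prox objective over a grid."""
    objective = (grid - x) ** 2 / (2 * eta) + g(grid)
    return grid[np.argmin(objective)]

def soft_threshold(x, t):
    """Closed-form prox of t * |z| (here t = eta * lam): shrink x toward 0 by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

lam, eta = 0.8, 0.5
grid = np.linspace(-5, 5, 200001)  # grid spacing 5e-5
for x in [-3.0, -0.2, 0.0, 0.35, 2.5]:
    approx = prox_brute_force(lambda z: lam * np.abs(z), x, eta, grid)
    assert abs(approx - soft_threshold(x, eta * lam)) < 1e-3
```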

Proximal gradient descent then picks an initial point $\bm x^{(0)}$ and runs the following iteration, where $\eta^k$ is the step size at iteration $k$ (a superscript index, not a power):

\bm x^{(k)} = \text{prox}_{g,\eta^k}(\bm x^{(k-1)}-\eta^k\nabla f(\bm x^{(k-1)})),\quad k=1,2,3,\dots
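The iteration above can be sketched in a few lines of Python/NumPy, assuming the caller supplies the gradient of $f$ and the proximal operator of $g$ as functions. The names `proximal_gradient_descent`, `grad_f`, `prox_g`, and `step_sizes` are hypothetical, not from any library. With $g=0$ the proximal operator is the identity and the loop reduces to plain gradient descent.

```python
import numpy as np

def proximal_gradient_descent(grad_f, prox_g, x0, step_sizes):
    """Run x^(k) = prox_{g,eta_k}(x^(k-1) - eta_k * grad_f(x^(k-1))) for each eta_k in step_sizes.

    grad_f(x) returns the gradient of the smooth part f at x;
    prox_g(v, eta) returns prox_{g,eta}(v).
    """
    x = np.asarray(x0, dtype=float)
    for eta in step_sizes:
        x = prox_g(x - eta * grad_f(x), eta)
    return x

# Usage: f(x) = 0.5 * ||x - c||^2 and g = indicator of {x >= 0},
# whose prox is the projection max(v, 0) regardless of eta.
c = np.array([1.0, -2.0])
x = proximal_gradient_descent(
    grad_f=lambda x: x - c,
    prox_g=lambda v, eta: np.maximum(v, 0.0),
    x0=np.zeros(2),
    step_sizes=[0.5] * 50,
)
# x is approximately [1, 0]: the closest nonnegative point to c.
```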

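There are many theoretical results showing that this behaves much like gradient descent; for example, with a fixed step size $\eta\le 1/L$, where $L$ is the Lipschitz constant of $\nabla f$, it converges at the same $O(1/k)$ rate for convex problems. I'll skip the details for now.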

Soft-thresholding operator with [0,1] constraint

The problem I'm interested in is

\min_{\bm z} f(\bm z)+g(\bm z)

where $f(\bm z)=\frac{1}{2}\|\bm A\bm z-\bm b\|^2_2$ and $g(\bm z)=\lambda\|\bm z\|_1+\Pi(\bm z)$. Here $\Pi$ is the indicator function of the hypercube $[0,1]^n$: it is $0$ if every $z_i\in[0,1]$ and $\infty$ otherwise.
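For reference, the two parts of the objective written as plain Python/NumPy functions. This is a sketch; the names `f`, `g`, and `lam` are my own. Returning `np.inf` outside the box mirrors the indicator $\Pi$.

```python
import numpy as np

def f(z, A, b):
    """Smooth part: 0.5 * ||A z - b||_2^2."""
    r = A @ z - b
    return 0.5 * r @ r

def g(z, lam):
    """Nonsmooth part: lam * ||z||_1 plus the indicator of the hypercube [0,1]^n."""
    if np.any(z < 0) or np.any(z > 1):
        return np.inf  # the indicator Pi is infinite outside the box
    return lam * np.sum(np.abs(z))
```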

Since $\nabla f(\bm z)=\bm A^T(\bm A\bm z-\bm b)$, the proximal gradient update for this problem is

\bm z^{k+1}=\text{prox}_{g,\eta^k}(\bm z^k-\eta^k\bm A^T(\bm A\bm z^k-\bm b))
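The $\bm A^T(\bm A\bm z^k-\bm b)$ term in the update is $\nabla f(\bm z^k)$. A small central-difference check in Python/NumPy confirms this gradient formula; the random test data and the helper name `grad_f` are my own assumptions.

```python
import numpy as np

def f(z, A, b):
    """f(z) = 0.5 * ||A z - b||^2."""
    r = A @ z - b
    return 0.5 * r @ r

def grad_f(z, A, b):
    """Gradient of f: A^T (A z - b)."""
    return A.T @ (A @ z - b)

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))
b = rng.normal(size=4)
z = rng.uniform(size=3)

# Central finite differences along each coordinate direction e.
h = 1e-6
fd = np.array([(f(z + h * e, A, b) - f(z - h * e, A, b)) / (2 * h) for e in np.eye(3)])
assert np.allclose(fd, grad_f(z, A, b), atol=1e-6)
```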

and the proximal step amounts to the following box-constrained problem:

\begin{align*} \text{prox}_{g,\eta}(\bm x)&=\argmin_{\bm z}\, \lambda\|\bm z\|_1+\Pi(\bm z) + \frac{1}{2\eta}\|\bm z-\bm x\|^2_2 \\ &=\argmin_{\bm z\in [0,1]^n}\, \lambda\|\bm z\|_1+ \frac{1}{2\eta}\|\bm z-\bm x\|^2_2 \end{align*}

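Since both the objective and the box constraint are separable across coordinates, we can solve for each component independently. On $[0,1]$ we also have $|z_i|=z_i$, so the $\ell_1$ term becomes linear:

[\text{prox}_{g,\eta}(\bm x)]_i = \argmin_{z_i\in [0,1]} \lambda z_i + \frac{1}{2\eta}(z_i-x_i)^2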
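This scalar problem is easy to check numerically. The Python/NumPy sketch below minimizes it by brute force over a fine grid of $[0,1]$ and compares the result with clipping the unconstrained minimizer $x_i-\eta\lambda$ to $[0,1]$. The helper names and test values are hypothetical.

```python
import numpy as np

def scalar_objective(z, x, eta, lam):
    """Per-coordinate objective: lam * z + (z - x)^2 / (2 eta)."""
    return lam * z + (z - x) ** 2 / (2 * eta)

def argmin_on_grid(x, eta, lam, num_points=100001):
    """Brute-force minimizer over an evenly spaced grid of [0, 1]."""
    grid = np.linspace(0.0, 1.0, num_points)
    return grid[np.argmin(scalar_objective(grid, x, eta, lam))]

eta, lam = 0.5, 0.6  # arbitrary test values, so eta * lam = 0.3
for x in [-0.5, 0.1, 0.3, 0.8, 1.2, 1.5]:
    # Unconstrained minimizer x - eta * lam, clipped to [0, 1].
    closed_form = min(max(0.0, x - eta * lam), 1.0)
    assert abs(argmin_on_grid(x, eta, lam) - closed_form) < 1e-4
```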

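The objective is a convex quadratic in $z_i$; setting its derivative $\lambda+\frac{1}{\eta}(z_i-x_i)$ to zero gives the unconstrained minimizer $z_i=x_i-\eta\lambda$. The minimizer of a one-dimensional convex quadratic over an interval is its unconstrained minimizer clipped to that interval, so the solution to this proximal operator problem is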

\min\{\,\max \{0,\, x_i-\eta\lambda\, \},\, 1\}

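for each component $x_i$. Equivalently (for $\lambda\ge 0$), this is the usual soft-thresholding operator $\text{sign}(x_i)\max\{|x_i|-\eta\lambda,0\}$ followed by clipping to $[0,1]$.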
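Putting the pieces together, here is a Python/NumPy sketch of the whole solver. The names `prox_box_l1` and `solve_box_lasso` are hypothetical, and the fixed step size $\eta=1/\|\bm A\|_2^2$ (that is, $1/L$ with $L$ the Lipschitz constant of $\bm A^T(\bm A\bm z-\bm b)$) is my assumption; the derivation above allows any step-size schedule $\eta^k$.

```python
import numpy as np

def prox_box_l1(x, eta, lam):
    """prox of lam * ||z||_1 + indicator([0,1]^n): shift down by eta * lam, then clip to [0, 1]."""
    return np.clip(x - eta * lam, 0.0, 1.0)

def solve_box_lasso(A, b, lam, num_iters=500, eta=None, z0=None):
    """Proximal gradient descent for min 0.5 * ||A z - b||^2 + lam * ||z||_1 over z in [0,1]^n.

    Assumption: a fixed step eta = 1 / ||A||_2^2 unless one is given.
    """
    if eta is None:
        eta = 1.0 / np.linalg.norm(A, 2) ** 2  # ord=2 gives the largest singular value
    z = np.zeros(A.shape[1]) if z0 is None else np.asarray(z0, dtype=float)
    for _ in range(num_iters):
        z = prox_box_l1(z - eta * A.T @ (A @ z - b), eta, lam)
    return z

# Usage: with A = I the problem decouples, so the answer is clip(b - lam, 0, 1).
A = np.eye(4)
b = np.array([2.0, 0.3, -1.0, 0.7])
z = solve_box_lasso(A, b, lam=0.2)
# z is approximately [1.0, 0.1, 0.0, 0.5]
```

Because the per-coordinate prox is just a shift and a clip, each iteration costs two matrix-vector products, the same as a plain gradient step.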

This work is licensed under CC BY-SA 4.0. Last modified: January 15, 2025.
Website built with Franklin.jl and the Julia language.