Shane Chu

Proximal gradient descent

I often forget how to derive proximal gradient descent. So here we go.

Proximal gradient descent is for solving the problem of this form:

minzf(z)+g(z)\min_{\bm z} f(\bm z)+g(\bm z)

where ff is a differentiable but gg is not. The idea of proximal operator is to do a quadratic approximation of ff at some point x\bm x and solve that with gg. We replace the hessian 2f\nabla^2 f by 1ηI\frac{1}{\eta}I. We want to solve:

arg minz f(x)+f(x)T(zx)+12ηzx22+g(z)=arg minz 12ηz(xηf(x))22+g(z) \begin{align*} &\argmin_{\bm z}\, f(\bm x)+\nabla f(\bm x)^T(\bm z-\bm x)+\frac{1}{2\eta}\|\bm z-\bm x\|^2_2 + g(\bm z)\\ =&\argmin_{\bm z}\, \frac{1}{2\eta}\|\bm z-(\bm x-\eta\nabla f(\bm x))\|^2_2 + g(\bm z) \end{align*}

The proximal operator proxg,η(z)\text{prox}_{g,\eta}(\bm z) is defined as the following:

proxg,η(x)=arg minz12ηzx22+g(z) \text{prox}_{g,\eta}(\bm x) = \argmin_{\bm z} \frac{1}{2\eta}\|\bm z - \bm x\|^2_2 + g(\bm z)

And proximal gradient descent is done by choosing an initial point x(0)\bm x^{(0)} and execute the following iterative procedure:

x(k)=proxg,ηk(x(k1)ηkf(x(k1))),k=1,2,3... \bm x^{(k)} = \text{prox}_{g,\eta^k}(\bm x^{(k-1)}-\eta^k\nabla f(\bm x^{(k-1)})),\quad k=1,2,3...

There are many theoretical properties to show that this basically behaves like gradient descent. I'd skip that for now.

Soft-thresholding operator with [0,1] constraint

The problem I'm interested in is

minzf(z)+g(z)\min_{\bm z} f(\bm z)+g(\bm z)

where f(z)=12Azb22f(\bm z)=\frac{1}{2}\|\bm A\bm z-\bm b\|^2_2 and g(z)=λz1+Π(z)g(\bm z)=\lambda\|\bm z\|_1+\Pi(\bm z). Here Π\Pi is an indicator function on [0,1]n[0,1]^n hypercube that returns \infty if any (z1,...,zn)(z_1,...,z_n) falls outside of it.

One can see that the proximal operator for this is

zk+1=proxg,ηk(zkAT(Azkb))\bm z^{k+1}=\text{prox}_{g,\eta^k}(\bm z^k-\bm A^T(\bm A\bm z^k-\bm b))

and the problem is

proxg,η(x)=arg minz λz1+Π(z)+12ηzx22=arg minz[0,1]n λz1+12ηzx22\begin{align*} \text{prox}_{g,\eta}(\bm x)&=\argmin_{\bm z}\, \lambda\|\bm z\|_1+\Pi(\bm z) + \frac{1}{2\eta}\|\bm z-\bm x\|^2_2 \\ &=\argmin_{\bm z\in [0,1]^n}\, \lambda\|\bm z\|_1+ \frac{1}{2\eta}\|\bm z-\bm x\|^2_2 \end{align*}

since this is a separable problem, we can focus on the individual components:

proxg,η(xi)=arg minzi[0,1]λzi+12η(zixi)2 \text{prox}_{g,\eta}(x_i) = \argmin_{z_i\in [0,1]} \lambda z_i + \frac{1}{2\eta}(z_i-x_i)^2

and hence the solution to this proximal operator problem is

min{ max{0, xiηλ }, 1} \min\{\,\max \{0,\, x_i-\eta\lambda\, \},\, 1\}

for each component xix_i.

Contact me by E-mail | Github | Linkedin
This work is licensed under CC BY-SA 4.0. Last modified: January 15, 2025.
Website built with Franklin.jl and the Julia language.