I often forget how to derive proximal gradient descent. So here we go.
Proximal gradient descent is for solving the problem of this form:
zminf(z)+g(z) where f is a differentiable but g is not. The idea of proximal operator is to do a quadratic approximation of f at some point x and solve that with g. We replace the hessian ∇2f by η1I. We want to solve:
=zargminf(x)+∇f(x)T(z−x)+2η1∥z−x∥22+g(z)zargmin2η1∥z−(x−η∇f(x))∥22+g(z) The proximal operator proxg,η(z) is defined as the following:
proxg,η(x)=zargmin2η1∥z−x∥22+g(z) And proximal gradient descent is done by choosing an initial point x(0) and execute the following iterative procedure:
x(k)=proxg,ηk(x(k−1)−ηk∇f(x(k−1))),k=1,2,3... There are many theoretical properties to show that this basically behaves like gradient descent. I'd skip that for now.
The problem I'm interested in is
zminf(z)+g(z) where f(z)=21∥Az−b∥22 and g(z)=λ∥z∥1+Π(z). Here Π is an indicator function on [0,1]n hypercube that returns ∞ if any (z1,...,zn) falls outside of it.
One can see that the proximal operator for this is
zk+1=proxg,ηk(zk−AT(Azk−b)) and the problem is
proxg,η(x)=zargminλ∥z∥1+Π(z)+2η1∥z−x∥22=z∈[0,1]nargminλ∥z∥1+2η1∥z−x∥22 since this is a separable problem, we can focus on the individual components:
proxg,η(xi)=zi∈[0,1]argminλzi+2η1(zi−xi)2 and hence the solution to this proximal operator problem is
min{max{0,xi−ηλ},1} for each component xi.