# Proximal operator or mapping

## What do you need to know to understand this topic?

• Basic calculus and linear algebra (gradients, norms)

## What is the proximal operator?

The proximal operator is used to approximate a value while making a compromise between the accuracy of the approximation and a cost associated with the new value. Say we have the following minimization problem: $$\underset{\mathbf{u}}{\min } \frac{1}{2}\left\Vert \mathbf{u} - \mathbf{x}\right\Vert _2^2.$$ It is clear that the best approximation to $\mathbf{x}$ is $\mathbf{x}$ itself ($\mathbf{u} = \mathbf{x}$). Now we add a cost to our choice of $\mathbf{u}$: $$\textrm{prox}_h(\mathbf{x}) = \underset{\mathbf{u}}{\textrm{argmin }} \frac{1}{2}\left\Vert \mathbf{u} - \mathbf{x}\right\Vert _2^2 + h(\mathbf{u}).$$ Now the best approximation depends on how costly it is to choose $\mathbf{u} = \mathbf{x}$, given $h(\mathbf{u})$, and a compromise must be made. The first problem corresponds to the case $h(\mathbf{u}) = 0$. The proximal operator $\textrm{prox}_h(\mathbf{x})$ is the solution to this minimization problem.

### Examples

• Consider $h(\mathbf{u}) = \lambda \left\Vert \mathbf{u} \right\Vert_1$. The function to minimize is $$\frac{1}{2}\left\Vert\mathbf{u} - \mathbf{x}\right\Vert _2^2 + \lambda \left\Vert \mathbf{u} \right\Vert_1.$$ To find the minimum we set the gradient of this function to zero. Note that both terms are separable in $\mathbf{u}$, so we can minimize over each element of $\mathbf{u}$ individually. This reduces the problem to: $$\frac{1}{2} (u_i - x_i)^2 + \lambda |u_i|, \forall i.$$ Since $|u_i|$ is not differentiable at zero, we consider two cases: $u_i \gt 0$ and $u_i \lt 0$. For $u_i \gt 0$, setting the derivative to zero gives: $$u_i - x_i + \lambda = 0$$ $$u_i = x_i - \lambda.$$ Since $u_i \gt 0$, this is only valid for $x_i \gt \lambda$. The same logic applies for $u_i \lt 0$, with which we get: $$u_i = x_i + \lambda,$$ valid only for $x_i \lt -\lambda$. For $-\lambda \le x_i \le \lambda$, the minimum is at $u_i = 0$, where the subderivative of $\lambda|u_i|$ can take any value between $-\lambda$ and $\lambda$. Summing up: $$u_{i}=\begin{cases} x_{i}-\lambda & x_{i}>\lambda\\ x_{i}+\lambda & x_{i} \lt -\lambda\\ 0 & -\lambda \le x_{i} \le \lambda\end{cases}$$ This is equivalent to the expression: $$u_i = \textrm{prox}_h(x_i) = \max(|x_i| - \lambda, 0) \textrm{sign}(x_i)$$ This particular case of the proximal operator is known as the shrinkage operator and is used in iterative soft thresholding (IST) algorithms.
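The closed-form expression above can be sketched in a few lines of NumPy (the function name is our own choice):

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of lam * ||u||_1 (the shrinkage operator).

    Applies max(|x_i| - lam, 0) * sign(x_i) to each element of x.
    """
    return np.maximum(np.abs(x) - lam, 0.0) * np.sign(x)

x = np.array([3.0, -0.5, 1.5, -2.0])
# With lam = 1: elements with |x_i| <= 1 are set to zero,
# the rest are pulled toward zero by 1.
print(soft_threshold(x, 1.0))
```

Note how the piecewise definition collapses into a single vectorized expression: `np.maximum(np.abs(x) - lam, 0.0)` handles the dead zone $[-\lambda, \lambda]$, and multiplying by the sign restores the direction.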
• Consider $h(\mathbf{u}) = \lambda \left\Vert \Sigma\mathbf{u} \right\Vert_1$, where $\Sigma$ is a diagonal matrix with elements $\sigma_i$. The function to minimize is $$\frac{1}{2}\left\Vert\mathbf{u} - \mathbf{x}\right\Vert _2^2 + \lambda \left\Vert \Sigma \mathbf{u} \right\Vert_1.$$ To find the minimum we again set the gradient to zero. Since the matrix $\Sigma$ is diagonal, the terms are separable and we can minimize over each element of $\mathbf{u}$ individually. This reduces the problem to: $$\frac{1}{2} (u_i - x_i)^2 + \lambda \sigma_i|u_i|, \forall i.$$ As before, we consider the two cases $u_i \gt 0$ and $u_i \lt 0$. For $u_i \gt 0$, setting the derivative to zero gives: $$u_i - x_i + \lambda \sigma_i= 0$$ $$u_i = x_i - \lambda \sigma_i.$$ Since $u_i \gt 0$, this is only valid for $x_i \gt \lambda \sigma_i$. The same logic applies for $u_i \lt 0$, with which we get: $$u_i = x_i + \lambda \sigma_i,$$ valid only for $x_i \lt -\lambda \sigma_i$. For $-\lambda \sigma_i \le x_i \le \lambda \sigma_i$, the minimum is at $u_i = 0$, where the subderivative of $\lambda\sigma_i|u_i|$ can take any value between $-\lambda \sigma_i$ and $\lambda \sigma_i$. Summing up: $$u_{i}=\begin{cases} x_{i}-\lambda \sigma_i & x_{i}>\lambda \sigma_i\\ x_{i}+\lambda\sigma_i & x_{i} \lt -\lambda\sigma_i\\ 0 & -\lambda\sigma_i \le x_{i} \le \lambda\sigma_i \end{cases}$$ This is equivalent to the expression: $$u_i = \textrm{prox}_h(x_i) = \max(|x_i| - \lambda \sigma_i, 0) \textrm{sign}(x_i)$$ This expression may be useful when you want to give a different cost to each element of $\mathbf{x}$.
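A sketch of the weighted variant, passing the diagonal of $\Sigma$ as a vector so each element gets its own threshold $\lambda\sigma_i$ (function and argument names are our own):

```python
import numpy as np

def weighted_soft_threshold(x, lam, sigma):
    """Proximal operator of lam * ||Sigma u||_1 for diagonal Sigma.

    sigma holds the diagonal entries sigma_i, so element i is
    thresholded at lam * sigma_i instead of a single global lam.
    """
    t = lam * sigma  # per-element thresholds
    return np.maximum(np.abs(x) - t, 0.0) * np.sign(x)

x = np.array([3.0, -0.5, 1.5])
sigma = np.array([0.5, 2.0, 1.0])
# With lam = 1 the thresholds are [0.5, 2.0, 1.0].
print(weighted_soft_threshold(x, 1.0, sigma))
```

Setting `sigma` to a vector of ones recovers the plain shrinkage operator from the previous example.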
• Consider $h(\mathbf{u}) = I_C(\mathbf{u})$, where: $$I_C(\mathbf{u}) = \begin{cases} 0 & \mathbf{u} \in C\\ +\infty & \mathbf{u} \notin C\end{cases}$$ It is easy to see that if $\mathbf{u}$ does not belong to a certain set $C$, the cost is infinite and therefore it is not an option. With this cost function, the proximal operator becomes: $$\textrm{prox}_h(\mathbf{x}) = \underset{\mathbf{u}}{\textrm{argmin }} \frac{1}{2}\left\Vert \mathbf{u} - \mathbf{x}\right\Vert _2^2, \mathbf{u} \in C$$ The closest $\mathbf{u}$ to $\mathbf{x}$ that belongs to the set $C$ is simply the projection of $\mathbf{x}$ onto $C$.
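As a concrete illustration, take $C$ to be the nonnegative orthant $\{\mathbf{u} : u_i \ge 0\}$ (our choice of set, not one fixed by the text); the projection then just zeroes out the negative entries:

```python
import numpy as np

def project_nonnegative(x):
    """prox of the indicator of C = {u : u >= 0}.

    The Euclidean projection onto the nonnegative orthant: each
    negative entry is moved to 0, the closest point in C along
    that coordinate; nonnegative entries are already in C.
    """
    return np.maximum(x, 0.0)

print(project_nonnegative(np.array([1.0, -2.0, 3.0])))
```

Any closed convex set works the same way in principle; only the projection formula changes.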
• Consider $h(\mathbf{u}) = I_{\left\Vert \mathbf{u} \right\Vert_\infty \le \lambda}(\mathbf{u})$, where: $$I_{\left\Vert \mathbf{u} \right\Vert_\infty \le \lambda}(\mathbf{u}) = \begin{cases} 0 & \left\Vert\mathbf{u}\right\Vert_\infty = \max_i |u_i| \le \lambda\\ +\infty & \textrm{otherwise}\end{cases}$$ and $\max_i |u_i|$ is the largest element of $\mathbf{u}$ in magnitude. This constraint ensures that no element of the approximation $\mathbf{u}$ exceeds $\lambda$ in magnitude; otherwise the cost is infinite. The proximal operator for this constraint is: $$u_i = \textrm{prox}_h(x_i) = \min\left(1, \frac{\lambda}{|x_i|}\right) x_i, x_i = \textrm{each element in }\mathbf{x}$$ Intuitively, the proximal operator leaves any element with $|x_i| \le \lambda$ unchanged ($\lambda / |x_i| \ge 1$) and shrinks any element with $|x_i| \gt \lambda$ to magnitude $\lambda$.
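Since $\min(1, \lambda/|x_i|)\,x_i$ amounts to clipping each element to $[-\lambda, \lambda]$, a sketch is one call to `np.clip` (function name is our own; clipping also avoids dividing by zero when $x_i = 0$):

```python
import numpy as np

def project_linf_ball(x, lam):
    """prox of the indicator of {u : ||u||_inf <= lam}.

    Equivalent to min(1, lam/|x_i|) * x_i elementwise: entries with
    |x_i| <= lam pass through, larger ones are clipped to +/- lam.
    """
    return np.clip(x, -lam, lam)

print(project_linf_ball(np.array([0.5, 3.0, -4.0]), 2.0))
```

This projection appears, for instance, as the dual update in some first-order methods for $\ell_1$-regularized problems, since the $\ell_\infty$ ball is the dual-norm ball of the $\ell_1$ norm.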