# Legendre and Legendre-Fenchel transforms

## What do you need to know to understand this topic?

• Derivatives of a function
• What are convex and non-convex functions

## What is the Legendre Transform?

The Legendre transform is a transformation from a convex differentiable function $f(\mathbf{x})$ to a function that depends on the family of tangents $\mathbf{s}=\nabla_\mathbf{x} f(\mathbf{x})$. To see how it works, you must first realize that for a convex and smooth function there is a one-to-one correspondence between $\mathbf{x}$ and $\mathbf{s}(\mathbf{x})$, because $\mathbf{s}(\mathbf{x})$ is a monotonic function of $\mathbf{x}$, i.e., the derivative is always increasing with $\mathbf{x}$. You can see this notion in the plot below. In some situations, it is more meaningful or convenient to work with a function of $\mathbf{s}$ than with the original function $f(\mathbf{x})$. That's when the Legendre transform comes in handy. Let's go into the math. At any point $\mathbf{x^\star}$, we have the tangent with slope: $$\mathbf{s}=\left.\frac{df(\mathbf{x})}{d\mathbf{x}}\right|_{\mathbf{x}=\mathbf{x}^{\star}}$$ If we want a function of $\mathbf{s}$, we could first find the inverse of $\mathbf{s}(\mathbf{x})$, say $\mathbf{x}(\mathbf{s})$, and apply it to $f(\mathbf{x})$. We would then get $f(\mathbf{x}(\mathbf{s}))$. Of course there are many ways to get a function of $\mathbf{s}$, not necessarily the one presented right now. Instead, the Legendre transform is the expression: $$f^\star(\mathbf{s})=\mathbf{s}^{T}\mathbf{x}(\mathbf{s})-f(\mathbf{x}(\mathbf{s}))$$ To find out how this expression is derived, let's write the full expression of a tangent (see the picture below): $$y=\mathbf{s}^{T}\mathbf{x}+b,$$ where $b$ represents the value of the tangent when $\mathbf{x=0}$. We know that at $\mathbf{x}^{\star}$ the tangent touches $f(\mathbf{x})$, therefore: $$f(\mathbf{x}^{\star}) = \mathbf{s}^{T}\mathbf{x}^{\star}+b$$ $$-b = \mathbf{s}^{T}\mathbf{x}^{\star}-f(\mathbf{x}^{\star})$$ Let's call the term $-b$, $f^{\star}(\mathbf{s})$. The reason for it is that this term changes with $\mathbf{x}^{\star}$ and therefore with $\mathbf{s}$. Then, for any point $\mathbf{x}$: $$f^{\star}(\mathbf{s}) = \mathbf{s}^{T}\mathbf{x}-f(\mathbf{x})$$ In words, for any point $\mathbf{x}=\mathbf{x}^{\star}$, the transform is the negative of parameter $b$ of the tangent line at that point (the value of the tangent line when $\mathbf{x}=\mathbf{0}$). Graphically, this is depicted in figure below: In a geometric sense, there is a triangle shown. In it, the slope of the tangent ($s$) times the horizontal side ($x^\star$) equals the vertical side, which is the sum of $f(x)$ and $f^{\star}(s)$.

Although this seems all a bit random, this transform has some nice properties. For one, we can apply it again to the transform to get back the original function: $$\begin{eqnarray*} f^{\star\star}(\mathbf{x}) & = & \mathbf{x}^{T}\mathbf{s}-f^{\star}(\mathbf{s}) \\ & = & \mathbf{x}^{T}\mathbf{s}-\left(\mathbf{s}^{T}\mathbf{x}-f(\mathbf{x})\right) \\ & = & f(\mathbf{x}) \end{eqnarray*}$$ In the same way that the slope of $f(\mathbf{x})$ is $\mathbf{s}$, we have that the slope of $f^{\star}(\mathbf{s})$ is $\mathbf{x}$: $$\frac{df^{\star}(\mathbf{s})}{d\mathbf{s}}=\frac{d\mathbf{s}^{T}\mathbf{x}-f(\mathbf{x})}{d\mathbf{s}}=\mathbf{x}$$ Say that we want to find the minimum of $f(\mathbf{x})$ and we have $f^{\star}(\mathbf{s})$. Suffice to say that at the minimum, the slope of the tangent is zero. Therefore: $$\begin{eqnarray*} f^{\star}(\mathbf{0}) & = &\mathbf{0}^{T}\mathbf{x}_{min}-f(\mathbf{x}_{min}) \\ f(\mathbf{x}_{min}) & = & -f^{\star}(\mathbf{0}) \end{eqnarray*}$$ Pretty cool, uh? For some examples of the Legendre transform in the domains of physics and mechanics, check out this educational paper.

## What is the Legendre-Fenchel transform?

The Legendre transform is pretty useful on its own, but it is limited to convex and differentiable functions. If any of these properties fail, the transform cannot be used. Consider the case $f(x)=|x|$. The slope at $x=0$ is not defined. All we know is that, at point $x=0$, the subdifferential is between -1 and 1 ($s \in [-1,1]$). Also, non-convex functions have the same derivative for different values of $\mathbf{x}$, which makes it impossible to form a unique correspondence between $\mathbf{x}$ and $\mathbf{s}$. Here comes a nifty trick to generalize the Legendre transform to this type of functions. As you now know, the value $f^{\star}(\mathbf{s})$ comes from finding $b$ of the tangent, which is the smallest $b$ term for a line (hyperplane for several dimensions) with slope $\mathbf{s}$ that crosses the convex funcion $f(\mathbf{x})$. As shown in the picture below, the smallest $b$ that crosses the $f(\mathbf{x})$ is the one from the tangent. Smaller than that and the line will not even touch $f(\mathbf{x})$. So, in a more general sense, we are finding all values of $b$ with which lines with slope $\mathbf{s}$ cross $f(\mathbf{x})$ and then find the minimum $b$. This more general rule applies to non-differentiable or non-convex functions. So, when a line with slope $\mathbf{s}$ crosses $f(\mathbf{x})$, we have: $$\begin{eqnarray*} \mathbf{s}^{T}\mathbf{x}+b & = & f(\mathbf{x}) \\ b & = & f(\mathbf{x})-\mathbf{s}^{T}\mathbf{x} \end{eqnarray*}$$ and we want the smallest value of $b$ for all $\mathbf{x}$. Then: $$\begin{eqnarray*} b & = & \underset{\mathbf{x}}{\inf}\left(f(\mathbf{x})-\mathbf{s}^{T}\mathbf{x}\right) \\ & = & \underset{\mathbf{x}}{\inf}\left(-\left(\mathbf{s}^{T}\mathbf{x}-f(\mathbf{x})\right)\right) \\ & = & -\underset{\mathbf{x}}{\sup}\left(\mathbf{s}^{T}\mathbf{x}-f(\mathbf{x})\right) \end{eqnarray*}$$ where we used the property $\inf\left(\mathbf{x}\right)=-\sup\left(-\mathbf{x}\right)$. Since $f^{\star}(\mathbf{s})=-b$, we finally have: $$f^{\star}(\mathbf{s}) =-b= \underset{\mathbf{x}}{\sup}\left(\mathbf{s}^{T}\mathbf{x}-f(\mathbf{x})\right)$$ This is the Legendre-Fenchel transform, also known as convex conjugate. Note that now the transform is not reversible, i.e., you cannot get the original function by applying the transform to the transform. On the other hand, the transform of the transform is convex, even if the original function is not.

### Examples

#### Linear function

For the linear function: $$f(\mathbf{x})=\mathbf{c}^{T}\mathbf{x}-b$$ the Legendre transform is $$\sup \mathbf{s}^{T}\mathbf{x}-f(\mathbf{x})$$ $$\sup \left(\mathbf{s}^{T}-\mathbf{c}^{T}\right)\mathbf{x}+b$$ For $\mathbf{s}^{T}\ne\mathbf{c}^{T}$, the supremum is found for infinite values of $\mathbf{x}$. Therefore, the condition $\mathbf{s}^{T}=\mathbf{c}^{T}$ is necessary to have a bounded conjugate: $$f^{\star}(\mathbf{s})=\begin{cases} b & \mathbf{s}^{T}=\mathbf{c}^{T}\\ \infty & \mbox{otherwise} \end{cases}$$

#### Weighted Least squares

The weighted least squares problem is defined as: $$f(\mathbf{x})=\frac{1}{2}\left\Vert \b{A} (\b{y}-\b{x})\right\Vert _{2}^{2}$$ The Legendre transform is then: $$\sup_{\mathbf{x}} \mathbf{s}^{T}\mathbf{x}-\frac{1}{2}\left\Vert \b{A}(\mathbf{y}-\mathbf{x})\right\Vert_{2}^{2}$$ To find the value of $\mathbf{x}$ that maximizes the above expression, we equate to zero its derivative w.r.t. $\mathbf{x}$: $$\mathbf{s}+\b{A}^T \b{A}(\mathbf{y}-\mathbf{x}) = \mathbf{0}$$ $$\mathbf{x} = \left(\b{A}^T \b{A}\right)^{-1}\mathbf{s}+\mathbf{y}$$ Now we replace $\mathbf{x}$ in the original transform: $$\sup_{\mathbf{x}}\mathbf{s}^{T}\mathbf{x}-\frac{1}{2}\left\Vert \b{A}(\mathbf{y}-\mathbf{x})\right\Vert _{2}^{2}$$ $$\mathbf{s}^{T}\left(\left(\b{A}^T \b{A}\right)^{-1}\mathbf{s}+\mathbf{y}\right)-\frac{1}{2}\left\Vert \b{A}\left(\mathbf{y}-\left(\b{A}^T \b{A}\right)^{-1}\mathbf{s}-\mathbf{y}\right)\right\Vert _{2}^{2}$$ $$\mathbf{s}^{T}\left(\left(\b{A}^T \b{A}\right)^{-1}\mathbf{s}+\mathbf{y}\right)-\frac{1}{2}\left\Vert \b{A}\left(\b{A}^T \b{A}\right)^{-1}\mathbf{s}\right\Vert _{2}^{2}$$ $$\b{s}^{T}\left(\b{A}^T \b{A}\right)^{-1}\b{s}+\b{s}^{T}\b{y}-\frac{1}{2}\mathbf{s}^{T}\left(\b{A}^T \b{A}\right)^{-1}\mathbf{s}$$ $$\frac{1}{2}\b{s}^{T}\left(\b{A}^T \b{A}\right)^{-1}\b{s}+\b{s}^{T}\b{y}$$ The common case when $\b{A}$ is a diagonal matrix $\b{\Lambda}$, the above expression can be simplified: $$\frac{1}{2}\b{s}^{T}\left(\b{\Lambda}\right)^{-2}\b{s}+\b{s}^{T}\b{y}=\frac{1}{2}\left\Vert \b{\Lambda}^{-1}\b{s}\right\Vert+\b{s}^{T}\b{y}$$ When $\b{A}$ is the identity matrix, it becomes: $$\frac{1}{2}\left\Vert \mathbf{s}\right\Vert _{2}^{2}+\mathbf{s}^T\mathbf{y}$$

#### $\ell_1$-norm

For the $\ell_1$-norm: $$f(\mathbf{x})=a\left\Vert \mathbf{x}\right\Vert _{1}$$ the Legendre transform is $$\sup \mathbf{s}^{T}\mathbf{x}-a\left\Vert \mathbf{x}\right\Vert _{1}$$ The transform can be written as the sum of elements of $\mathbf{s}$ and $\mathbf{x}$ as: $$\sup\sum\left(s_{i}x_{i}-a\left|x_{i}\right|\right)=\sup\sum\left(\begin{cases} \left(s_{i}-a\right)x_{i} & ,x_{i}\ge 0 \\ \left(s_{i}+a\right)x_{i} & ,x_{i}\lt 0 \end{cases}\right)$$ This way is easy to see that if any element $s_{i}>a$, then any vector $\mathbf{x}$ with $x_{i}\rightarrow\infty$ will push the transform to infinity. Likewise, if any element $s_{i}\lt-a$, then any vector $\mathbf{x}$ with $x_{i}\rightarrow-\infty$ will push the transform to $\infty$ too. For any $-a\le s_{i}\le a$, $\left(s_{i}-a\right)\le 0$ and $\left(s_{i}+a\right)\ge 0$. Then, under this condition, the expressions above are always lower than zero because they always multiply terms with opposite signs. That in turn, means that the supremum is found for $x_{i}=0$. In conclusion, the transform is: $$f^{\star}(\mathbf{s})=\underset{\mathbf{x}}{\sup}\left(\mathbf{s}^{T}\mathbf{x}-a\left\Vert \mathbf{x}\right\Vert _{1}\right)=\left\Vert \mathbf{s}\right\Vert _{\infty}^{a}=\begin{cases} 0 & \left|s_{i}\right|\le a,\forall i\\ +\infty & \mbox{otherwise} \end{cases}$$

#### Weighted $\ell_1$-norm

For the weighted $\ell_1$-norm case: $$f(\mathbf{x})=a\left\Vert \mathbf{W^{-1}}\mathbf{x}\right\Vert _{1}$$ the Legendre transform is $$\sup \mathbf{s}^{T}\mathbf{x}-a\left\Vert \mathbf{W}^{-1}\mathbf{x}\right\Vert _{1}$$ $$\sup \left(\mathbf{s}^{T}\mathbf{W}\right)\mathbf{z}-a\left\Vert \mathbf{z}\right\Vert _{1}$$ with $\mathbf{z}=\mathbf{W}^{-1}\mathbf{x}$ and assuming matrix $\mathbf{W}$ can be inverted. Following a similar approach to the previous example, the values for $s_{i}$ are limited, but now each is scaled differently: $$\sup\sum\left(\mathbf{s}^{T}\mathbf{w}_{i}z_{i}-a\left|z_{i}\right|\right)=\sup\sum\left(\begin{cases} \left(\mathbf{s}^{T}\mathbf{w}_{i}-a\right)z_{i} & ,z_{i}\ge 0\\ \left(\mathbf{s}_{i}^{T}\mathbf{w}_{i}-a\right)z_{i} & ,z_{i} \lt 0 \end{cases}\right)$$ where $\mathbf{w}_{i}$ is the $i^{th}$ column of $\mathbf{W}$. For the supremum to be bounded, we must have that each $-a\lt\mathbf{s}^{T}\mathbf{w}_{i}\lt a$. In matrix notation, we get: $$f^{\star}(\mathbf{s})=\underset{\mathbf{z}}{\sup}\left(\mathbf{s}^{T}\mathbf{W}\mathbf{z}-a\left\Vert \mathbf{z}\right\Vert _{1}\right)=\left\Vert \mathbf{s}^{T}\mathbf{W}\right\Vert _{\infty}^{a}=\left\Vert \mathbf{W}^{T}\mathbf{s}\right\Vert _{\infty}^{a}=\begin{cases} 0 & \left|\mathbf{s}^{T}\mathbf{w}_{i}\right|\le a,\forall i\\ +\infty & \mbox{otherwise} \end{cases}$$ We would reach the same result if we simply noticed that compared to the previous example, the variable $\mathbf{s}$ is now replaced by $\mathbf{W}^T\mathbf{s}$, so that's what goes inside the $\left\Vert \cdot \right\Vert_\infty^a$.

#### Difference $\ell_1$-norm

For the difference $\ell_1$-norm: $$f(\mathbf{x})=a\left\Vert \mathbf{x} - \b{y} \right\Vert _{1}$$ the Legendre transform is $$\sup_\b{x} \mathbf{s}^{T}\mathbf{x}-a\left\Vert \mathbf{x} - \b{y}\right\Vert _{1}$$ The transform can be written as the sum of elements of $\mathbf{s}$ and $\mathbf{x}$ as: $$\sup\sum\left(s_{i}x_{i}-a\left|x_{i} - y_i\right|\right)=\sup\sum\left(\begin{cases} \left(s_{i}-a\right)x_{i} +a\cdot y_i & ,x_{i} - y_i \ge 0 \\ \left(s_{i}+a\right)x_{i} -a\cdot y_i & ,x_{i} - y_i \lt 0 \end{cases}\right)$$ This way is easy to see that if any element $s_{i}>a$, then any vector $\mathbf{x}$ with $x_{i}\rightarrow\infty$ will push the transform to infinity. Likewise, if any element $s_{i}\lt-a$, then any vector $\mathbf{x}$ with $x_{i}\rightarrow-\infty$ will push the transform to $\infty$ too. For any $-a\le s_{i}\le a$, $\left(s_{i}-a\right)\le 0$ and $\left(s_{i}+a\right)\ge 0$. Then, under this condition, the expressions $(s_i - a)x_i$ and $(s_i + a)x_i$ are always lower than zero because they always multiply terms with opposite signs. That in turn, means that the supremum is found for $x_{i}=y_i$, which leads to: $$\sup_{x_i} (s_i x_i - a|x_i - y_i| ) = \begin{cases} (s_i - a)y_i + a \cdot y_i = s_i y_i & x_i - y_i \ge 0\\ (s_i + a)y_i - a \cdot y_i = s_i y_i & x_i - y_i \lt 0\\ \end{cases}$$ In conclusion, the transform is: $$f^{\star}(\mathbf{s})=\underset{\mathbf{x}}{\sup}\left(\mathbf{s}^{T}\mathbf{x}-a\left\Vert \mathbf{x} - \b{y} \right\Vert _{1}\right)=\left\Vert \mathbf{s}\right\Vert _{\infty}^{a}=\begin{cases} \b{s}^T\b{y} & \left|s_{i}\right|\le a,\forall i\\ +\infty & \mbox{otherwise} \end{cases}$$ When $\b{y = 0}$, the above expression reverts to the $\ell_1$-norm case.