# Important mathematical statements

## What do you need to know to understand this topic?

That really depends from statement to statement. In most of them, a strong mathematical background is required.

## Descent Lemma (Proposition A.24 from Bertsekas 1999)

Let $f:\mathbb{R}^n \rightarrow \mathbb{R}$ be continuously differentiable, and let $\mathbf{x}$ and $\mathbf{y}$ be two vectors in $\mathbb{R}^n$. Suppose that $$\left\Vert \nabla f(\mathbf{x}+t\mathbf{y})- \nabla f(\mathbf{x}) \right\Vert \le Lt\left\Vert \mathbf{y} \right\Vert,\qquad \forall t \in [0,1]$$ where L is some scalar. Then $$f(\mathbf{x}+\mathbf{y}) \le f(\mathbf{x}) + \mathbf{y}^T\nabla f(\mathbf{x}) + \frac{L}{2}\left\Vert \mathbf{y} \right\Vert_2^2$$

### Proof

$$f(\mathbf{x}+\mathbf{y})-f(\mathbf{x}) = \left.f(\mathbf{x}+t\mathbf{y})\right|_{t=1}-\left.f(\mathbf{x}+t\mathbf{y})\right|_{t=0}$$ The two terms can be seen as the limits of an integral: $$\left.f(\mathbf{x}+t\mathbf{y})\right|_{t=1}-\left.f(\mathbf{x}+t\mathbf{y})\right|_{t=0} = \int_{0}^{1}\frac{df(\mathbf{x}+t\mathbf{y})}{dt}dt=\int_{0}^{1}\mathbf{y}^{T}\nabla f(\mathbf{x}+t\mathbf{y})dt$$ Let's add positive and negative versions of the same term: $$\int_{0}^{1}y^{T}\nabla f(\mathbf{x}+t\mathbf{y})dt = \int_{0}^{1}\mathbf{y}^{T}\nabla f(\mathbf{x}+t\mathbf{y}) - \mathbf{y}^{T}\nabla f(\mathbf{x}) dt + \int_{0}^{1} \mathbf{y}^{T}\nabla f(\mathbf{x}) dt$$ and take the norm of the first term, which guarantees the result is equal or greater than the original: $$f(\mathbf{x}+\mathbf{y}) - f(\mathbf{x}) \le \int_{0}^{1}\left\Vert\mathbf{y}^{T}(\nabla f(\mathbf{x}+t\mathbf{y}) - \nabla f(\mathbf{x})) \right\Vert_2dt + \int_{0}^{1} \mathbf{y}^{T}\nabla f(\mathbf{x}) dt$$ Now we make use of the Hölder's inequality $\left\Vert \mathbf{x}^T\mathbf{y} \right\Vert \le \left\Vert \mathbf{x} \right\Vert \left\Vert \mathbf{y} \right\Vert$: $$f(\mathbf{x}+\mathbf{y}) - f(\mathbf{x}) \le \left\Vert\mathbf{y}\right\Vert_2 \int_{0}^{1}\left\Vert\nabla f(\mathbf{x}+t\mathbf{y}) - \nabla f(\mathbf{x})\right\Vert_2 dt + \mathbf{y}^{T}\nabla f(\mathbf{x}) \int_{0}^{1} dt$$ Now we make use of the condition in the lemma to simplify the inequality: $$f(\mathbf{x}+\mathbf{y}) - f(\mathbf{x}) \le \left\Vert\mathbf{y}\right\Vert_2 L \int_{0}^{1} t\left\Vert \mathbf{y} \right\Vert_2 dt + y^{T}\nabla f(\mathbf{x}) = \frac{L}{2}\left\Vert\mathbf{y}\right\Vert_2^2 + y^{T}\nabla f(\mathbf{x})$$ QED.

If I helped you in some way, please help me back by liking this website on the bottom of the page or clicking on the link below. It would mean the world to me!

 Tweet