Important mathematical statements

What do you need to know to understand this topic?

That really depends from statement to statement. In most of them, a strong mathematical background is required.


Descent Lemma (Proposition A.24 from Bertsekas 1999)

Let $f:\mathbb{R}^n \rightarrow \mathbb{R}$ be continuously differentiable, and let $\mathbf{x}$ and $\mathbf{y}$ be two vectors in $\mathbb{R}^n$. Suppose that $$\left\Vert \nabla f(\mathbf{x}+t\mathbf{y})- \nabla f(\mathbf{x}) \right\Vert \le Lt\left\Vert \mathbf{y} \right\Vert,\qquad \forall t \in [0,1]$$ where L is some scalar. Then $$f(\mathbf{x}+\mathbf{y}) \le f(\mathbf{x}) + \mathbf{y}^T\nabla f(\mathbf{x}) + \frac{L}{2}\left\Vert \mathbf{y} \right\Vert_2^2$$


$$f(\mathbf{x}+\mathbf{y})-f(\mathbf{x}) = \left.f(\mathbf{x}+t\mathbf{y})\right|_{t=1}-\left.f(\mathbf{x}+t\mathbf{y})\right|_{t=0}$$ The two terms can be seen as the limits of an integral: $$ \left.f(\mathbf{x}+t\mathbf{y})\right|_{t=1}-\left.f(\mathbf{x}+t\mathbf{y})\right|_{t=0} = \int_{0}^{1}\frac{df(\mathbf{x}+t\mathbf{y})}{dt}dt=\int_{0}^{1}\mathbf{y}^{T}\nabla f(\mathbf{x}+t\mathbf{y})dt$$ Let's add positive and negative versions of the same term: $$\int_{0}^{1}y^{T}\nabla f(\mathbf{x}+t\mathbf{y})dt = \int_{0}^{1}\mathbf{y}^{T}\nabla f(\mathbf{x}+t\mathbf{y}) - \mathbf{y}^{T}\nabla f(\mathbf{x}) dt + \int_{0}^{1} \mathbf{y}^{T}\nabla f(\mathbf{x}) dt$$ and take the norm of the first term, which guarantees the result is equal or greater than the original: $$f(\mathbf{x}+\mathbf{y}) - f(\mathbf{x}) \le \int_{0}^{1}\left\Vert\mathbf{y}^{T}(\nabla f(\mathbf{x}+t\mathbf{y}) - \nabla f(\mathbf{x})) \right\Vert_2dt + \int_{0}^{1} \mathbf{y}^{T}\nabla f(\mathbf{x}) dt$$ Now we make use of the Hölder's inequality $\left\Vert \mathbf{x}^T\mathbf{y} \right\Vert \le \left\Vert \mathbf{x} \right\Vert \left\Vert \mathbf{y} \right\Vert$: $$f(\mathbf{x}+\mathbf{y}) - f(\mathbf{x}) \le \left\Vert\mathbf{y}\right\Vert_2 \int_{0}^{1}\left\Vert\nabla f(\mathbf{x}+t\mathbf{y}) - \nabla f(\mathbf{x})\right\Vert_2 dt + \mathbf{y}^{T}\nabla f(\mathbf{x}) \int_{0}^{1} dt$$ Now we make use of the condition in the lemma to simplify the inequality: $$f(\mathbf{x}+\mathbf{y}) - f(\mathbf{x}) \le \left\Vert\mathbf{y}\right\Vert_2 L \int_{0}^{1} t\left\Vert \mathbf{y} \right\Vert_2 dt + y^{T}\nabla f(\mathbf{x}) = \frac{L}{2}\left\Vert\mathbf{y}\right\Vert_2^2 + y^{T}\nabla f(\mathbf{x})$$ QED.