I am taking Convex Optimization this semester (2021 Spring). The notes are based on my understanding of the course materials provided by Prof. Zhouchen Lin. The notes only offer concise definition and formulas with few proofs and examples.

In this chapter, we introduce convexity. We define convex sets first and will discuss convex functions in the next chapter. We go from affine sets to convex sets and cones. Later on, we will talk about some operations that could preserve convexity. Also, we will discuss about how to separate two convex sets.

Convex Sets#

Table of Contents:

Affine and convex sets
- Affine
- Convex
- Cone
Operators that preserve convexity
Separating and supporting hyperplanes

Affine and convex sets#

Affine#

Line segment

The line passing through two given points $ \mathbf{x}{1}, \mathbf{x}{2}$ can be described parametrically by $\theta$ :

$\mathbf{y}=\theta \mathbf{x}_{1}+(1-\theta) \mathbf{x}_{2} = \mathbf{x}_{2}+\theta( \mathbf{x}_{1}- \mathbf{x}_{2})$
Affine sets

A set is affine if for any $ \mathbf{x}{1}, \mathbf{x}{2}\in {C}, \theta\in\mathbb{R}$, we have $\theta \mathbf{x}_{1}+(1-\theta) \mathbf{x}_{2}\in{C}$ .

e.g. The solution set of a linear equation is an affine set:

$\mathbf{Ax}_1=\mathbf{b}, \mathbf{Ax}_2=\mathbf{b}\Rightarrow \mathbf{A}(\theta \mathbf{x}_{1}+(1-\theta) \mathbf{x}_{2})=\mathbf{b}$

In reverse, every affine set can be expressed as the solution set of a system of linear equations.
Affine hull

Given a set $\mathcal{C}$ , generalize to finite(to ensure the affine combination to be finite) multiple points and we have its affine hull:

$\operatorname{aff} C=\left\{\theta_{1} \mathbf{x}_{1}+\ldots+\theta_{k} \mathbf{x}_{k} \mid \mathbf{x}_{1}, \ldots, \mathbf{x}_{k} \in C, \theta_{1}+\ldots+\theta_{k}=1\right\}$

We can define relative interior and relative boundary of an affine set. By relative, we mean that when $\text{dim} {C} < \text{dim} {B}(\mathbf{x}, r)$ , only $B(\mathbf{x}, r) \cap \operatorname{aff} {C} \subseteq {C}$ is required (not the whole ball).

Convex#

Convex sets

A set is convex if for any $\mathbf{x}_{1}, \mathbf{x}_{2}\in {C}, 0\le\theta\le 1$ , we have $\theta \mathbf{x}_{1}+(1-\theta) \mathbf{x}_{2}\in{C}$ . (i.e. line segment between any 2 points lies in the set)
- Not necessary to be closed
- $\empty$ is also convex
Practice. How to determine whether a set is convex?
1. By definition
  
  Example: $\left\{\mathbf{x} \in \mathbb{R}^{n} \mid \alpha \leq \mathbf{a}^{\top} \mathbf{x} \leq \beta\right\}$ is convex:
  
  $\mathbf{a}^\top(\theta \mathbf{x}_{1}+(1-\theta) \mathbf{x}_{2}) = \theta\mathbf{a^\top}\mathbf{x}_{1}+(1-\theta)\mathbf{a^\top}\mathbf{x}_{2}\\\theta\alpha\le \theta\mathbf{a^\top}\mathbf{x}_{1}\le \theta\beta,\quad(1-\theta)\alpha \le (1-\theta)\mathbf{a^\top}\mathbf{x}_{2}\le (1-\theta)\beta\\\Rightarrow \alpha \le \mathbf{a}^\top(\theta \mathbf{x}_{1}+(1-\theta) \mathbf{x}_{2})\le \beta$
2. By constructing the condition of the set to be the form of a convex set.
  
  Example: $\left\{\mathbf{x} \mid\|\mathbf{x}-\mathbf{a}\|_{2} \leq \theta\|\mathbf{x}-\mathbf{b}\|_{2}\right\}(\mathbf{a} \neq \mathbf{b},0\le\theta\le 1)$ is convex.
  
  $\begin{aligned}\|\mathbf{x}-\mathbf{a}\|_{2} \leq& \theta\|\mathbf{x}-\mathbf{b}\|_{2}\\(\mathbf{x}-\mathbf{a})^\top (\mathbf{x}-\mathbf{a})\le& \theta^2(\mathbf{x}-\mathbf{b})^\top (\mathbf{x}-\mathbf{b})\\(1-\theta^2)\mathbf{x}^\top \mathbf{x}-2(\mathbf{a}^\top-\theta^2\mathbf{b}^\top)\mathbf{x}\le &\theta^2\mathbf{b}^\top \mathbf{b}-\mathbf{a}^\top \mathbf{a}\\\mathbf{x}^\top \mathbf{x}-\frac{2(\mathbf{a}^\top-\theta^2\mathbf{b}^\top)\mathbf{x}}{1-\theta^2}\le &\frac{\theta^2\mathbf{b}^\top \mathbf{b}-\mathbf{a}^\top \mathbf{a}}{1-\theta^2}\\\left\|\mathbf{x}-\frac{\mathbf{a}-\theta^2\mathbf{b}}{1-\theta^2}\right\|_2^2\le& \frac{\theta^2(\mathbf{a}-\mathbf{b})^2}{(1-\theta^2)^2}\\\left\|\mathbf{x}-\frac{\mathbf{a}-\theta^2\mathbf{b}}{1-\theta^2}\right\|_2\le&\frac{\theta\|\mathbf{a}-\mathbf{b}\|_2}{1-\theta^2}\\\end{aligned}$
  
  The set is a n-dimensional ball, which is convex.
Convex hull

Convex hull of a set $C$ is the set of all convex combinations of points in $C$ .

Examples:
- $\operatorname{conv}\{\mathbf{e}_i\mathbf{e}_j^\top\} = x_{ij}\ge 0,\sum_{ij}x_{ij}=1$
- $\operatorname{conv}\{\mathbf{u}\mathbf{u}^\top\mid\|\mathbf{u}\|=1 \}$
  
  By definition of convex hull, $S = \{\sum_{i}\lambda_i\mathbf{u}_i\mathbf{u}_i^\top\mid \lambda_i\ge0,\sum_i\lambda_i=1\}$ . If $\mathbf{X}\in S$ , then $\mathbf{X}\succeq 0, \operatorname{tr}(\mathbf{X})=1$ .
  1. Sum of semidefinite matrices is still semidefinite.
  2. $\operatorname{tr}\left(\sum_{i}\lambda_i\mathbf{u}_i\mathbf{u}_i^\top\right) = \sum_i\lambda_i\operatorname{tr}\left(\mathbf{u}_i\mathbf{u}_i^\top\right)=\sum_{i}\lambda_i=1$ .
  Now we have another set $\tilde{S}=\{\mathbf{X}\mid\mathbf{X}\succeq 0, \operatorname{tr}(\mathbf{X})=1 \}$ . We have already proved that $S\subset\tilde{S}$ .
  
  Suppose that $\mathbf{X}\in\tilde{S}$ . By eigenvalue decomposition, $\mathbf{X} = \sum_i{\lambda_i}\mathbf{u}_i\mathbf{u}_i^\top$ . $\sum_i\lambda_i = \operatorname{tr}(\mathbf{X}) = 1$ . Thus, $\tilde{S}\subset S$ , $\tilde{S}= S$ .
- $\operatorname{conv}\{\mathbf{u}\mathbf{v}^\top\mid\|\mathbf{u}\|=\|\mathbf{v}\|=1 \}$
  
  $S = \{\sum_{i}\sigma\mathbf{u}_i\mathbf{v}_i^\top\mid \sigma\ge0,\sum_i\sigma_i=1\}$ . $\tilde{S} = \tilde{S}=\{\mathbf{X}\mid\|\mathbf{X}\|_*\le 1\}$ .
  1. $S\subset \tilde{S}$
    
    $\begin{aligned} \|\mathbf{X}\|_* =&\|\sum_i\sigma\mathbf{u}_i\mathbf{v}_i^\top\|_*\quad\text{by triangle inequality of norm,}\\ \le& \sum_i\sigma_i \|\mathbf{u}_i\mathbf{v}_i^\top\|_*\quad \text{the rank is 1, so its SVD=itself, eigenvalue is }(1,0,\cdots)\\ =&\sum_i\sigma_i = 1 \end{aligned}$
  2. $\tilde{S}\subset S$
    
    $\|\mathbf{X}\|=\mathbf{U\Sigma V}^\top = \sum_i^p\sigma_i\mathbf{u}_i\mathbf{v}_i^\top, \quad p=\min(m,n)$
    
    Since $\sigma_i$ is singular value, it satisfies $\sigma_i\ge 0$ . Let $\delta = 1-\sum_{i=1}^p\sigma_i$ . Take $\sigma_{p+1}=\sigma_{p+2} = \delta/2$ , take $\mathbf{u}_{p+2}=-\mathbf{u}_{p+1}, \mathbf{v}_{p+1}=\mathbf{v}_{p+1}$ . Now we have:
    
    $\mathbf{X}=\sum_{i=1}^p\sigma_i\mathbf{u}_{i}\mathbf{v}_{i}^\top+\frac{\delta}{2}\mathbf{u}_{p+1}\mathbf{v}_{p+1}^\top-\frac{\delta}{2}\mathbf{u}_{p+1}\mathbf{v}_{p+1}^\top$

Cone#

A set $C$ is a cone if for every $\mathbf{x}\in C,\theta>0$ , we have $\theta\mathbf{x}\in C$ .

A set $C$ is a convex cone if for any $\mathbf{x}_1,\mathbf{x}_2\in C, \theta_1,\theta_2>0$ , we have $\theta_1\mathbf{x}_1+\theta_2\mathbf{x}_2\in C$ .

The conic hull of a set $C$ is the set of all conic combinations of points in $C$ .

$\left\{\theta_{1} \mathbf{x}_{1}+\ldots+\theta_{k} \mathbf{x}_{k} \mid \mathbf{x}_{i} \in C, \theta_{i} \geq 0, i=1, \ldots, k\right\}$

Dual cone

Let $K$ be a cone, its dual cone $K^*$ satisfies:

$K^* = \{\mathbf{y\mid \mathbf{x}^\top \mathbf{y}\ge 0, \forall\mathbf{x}\in K}\}$

$K^*$ is always closed convex. $K_1\subseteq K_2$ implies $K_2^*\subseteq K_1^*$ . $K^{**}$ is the closure of the convex hull of $K$ .

Examples:
- Subspace. $K=\{\mathbf{x}\mid \mathbf{Ax}=\mathbf{0}\}$ . $K^*=\{\mathbf{y}\mid \mathbf{(A^\perp )^\top y}=\mathbf{0}\}$ . Because $\mathbf{x}=\mathbf{A^\perp z}$ .
- Positive semidefinite cone. Dual cone is itself. $\operatorname{tr}(\mathbf{X Y}) \geq 0 \text { for all } \mathbf{X} \succeq \mathbf{0} \Longleftrightarrow \mathbf{Y} \succeq \mathbf{0}$
  1. $\Rightarrow$ .
    
    If $\mathbf{Y\notin}S_+^n$ , then there exists $\mathbf{q}$ that $\mathbf{q}^\top \mathbf{Y}\mathbf{q}=\operatorname{tr}(\mathbf{q}\mathbf{q}^\top\mathbf{Y} )<0$ . Let $\mathbf{X}=\mathbf{q}\mathbf{q}^\top\succeq0$ , we have $\operatorname{tr}(\mathbf{X Y})<0$ . Thus, $\operatorname{tr}(\mathbf{X Y}) \geq 0 \text { for all } \mathbf{X} \succeq \mathbf{0} \Longrightarrow \mathbf{Y} \succeq \mathbf{0}$
  2. $\Leftarrow$ .
    
    $\operatorname{tr}(\mathbf{Y X})=\operatorname{tr}\left(\mathbf{Y} \sum_{i=1}^{n} \lambda_{i} \mathbf{q}_{i} \mathbf{q}_{i}{ }^{T}\right)=\sum_{i=1}^{n} \lambda_{i} \mathbf{q}_{i}{ }^{T} \mathbf{Y} \mathbf{q}_{i} \geq 0$
- Norm cone. $K=\left\{(\mathbf{x}, t) \in \mathbb{R}^{n+1} \mid\|\mathbf{x}\| \leq t\right\}$ . Dual cone: $K^*=\left\{(\mathbf{u}, v) \in \mathbb{R}^{n+1} \mid\|\mathbf{u}\|^* \leq v\right\}$
  
  We need to prove that $\mathbf{x}^\top \mathbf{u}+tv\ge 0$ for all $\|\mathbf{x}\|\le t\Longleftrightarrow \|\mathbf{u}\|^*\le v$ .
  1. $\Leftarrow$ .
    
    For some $t>0,$ $\|\mathbf{x}\|\le t$ . By definition of dual norm, $\mathbf{u}^\top (-\mathbf{x}/t)\le \|\mathbf{u}\|^* \le v$ , and thus $\mathbf{u}^\top\mathbf{x} +tv\ge 0$ .
  2. $\Rightarrow$ .
    
    If $\|\mathbf{u}\|^*>v$ , then by definition of dual norm, there exists an $\mathbf{x}$ that $\|\mathbf{x}\|\le 1$ and $\mathbf{x^\top u}>v$ .
    
    Then we have $\|-\mathbf{x}\| = \|\mathbf{x}\|\le 1$ and take $t=1$ : $\mathbf{u}^\top(-\mathbf{x}) +v< 0$ , which contradicts with the lefthand side(assumption). Thus, $\|\mathbf{u}\|^*\le v$ .
Given $K$ , we usually guess a dual cone $K^*$ . If it is hard to prove that $\mathbf{x^\top y}\ge 0,\forall \mathbf{x}\in K\Leftrightarrow \mathbf{y}\in K^*$ , we can prove $\mathbf{x^\top y}\ge 0,\forall \mathbf{y}\in K^*\Leftrightarrow \mathbf{x}\in K$ instead.

Operators that preserve convexity#

Intersection#

$S_\alpha$ is convex $\forall \alpha\in\mathcal{A}\Rightarrow\bigcap_{\alpha\in\mathcal{A}}S_\alpha$ is convex (can be intersection of an infinite number of sets)

Affine function#

$S\subseteq \R^n$ is convex, $f(\mathbf{x})= \mathbf{Ax}+\mathbf{b}:\R^n\to \R^m\Rightarrow f(S) =\{f(\mathbf{x})\mid \mathbf{x}\in S\}$ is convex

Similarly, the inverse image of $S$ under $f$ : $f^{-1}(S)=\{\mathbf{x}\mid f(\mathbf{x})\in S\}$ is also convex.

Sum and Cartesian product#

$S_1,S_2$ are convex, then $S_1+S_2=\{\mathbf{x}+\mathbf{y}\mid \mathbf{x}\in S_1,\mathbf{y}\in S_2 \}$ and $S_1\times S_2=\{(\mathbf{x}_1,\mathbf{x}_2)\mid \mathbf{x}_1\in S_1,\mathbf{x}_2\in S_2 \}$ are convex.

Examples:

Polyhedron

$\{\mathbf{x}\mid\mathbf{Ax}\preceq\mathbf{b},\mathbf{Cx}=\mathbf{d} \}=\{\mathbf{x}\mid f(\mathbf{x})\in\R_+^m\times\{0\} \}, f(\mathbf{x})=(\mathbf{b}-\mathbf{Ax},\mathbf{d}-\mathbf{Cx})$
Solution set of $A(\mathbf{x})=\sum_{i=1}^n\mathbf{x}_i\mathbf{A}_i\preceq\mathbf{B}$

the inverse image of positive semidefinite cone under $f(\mathbf{x})=\mathbf{B}-A(\mathbf{x})$
Hyperbolic cone $\{\mathbf{x}\mid\mathbf{x}^\top \mathbf{Px}\le (\mathbf{c}^\top\mathbf{x})^2,\mathbf{c}^\top\mathbf{x}\ge 0 \}$ , $\mathbf{P}\in\mathbb{S}^n_+$

the inverse image of the second order cone $\{(\mathbf{z},t)\mid \mathbf{z}^\top \mathbf{z}\le t^2.t\ge 0 \}$ under $f(\mathbf{x})=(\mathbf{P}^{1/2}\mathbf{x}, \mathbf{c}^\top\mathbf{x})$
Ellipsoid $\{\mathbf{x}\mid(\mathbf{x}-\mathbf{x}_c)^\top \mathbf{P}^{-1}(\mathbf{x}-\mathbf{x}_c)\le 1\}$

The image of Euclidean ball $\{\mathbf{u}\mid \|\mathbf{u}\|_2\le 1\}$ under $\mathbf{x}=f(\mathbf{u}) = \mathbf{P}^{1/2}\mathbf{u}+\mathbf{x}_c$

The inverse image of the unit ball under $\mathbf{u}=g(\mathbf{x} )=\mathbf{P}^{-1/2}(\mathbf{x}-\mathbf{x}_c)$

Perspective function#

$P:\R^{n+1}\to\R^n, P(\mathbf{z},t)=\mathbf{z}/t,\mathbf{z}\in\R^n,t>0$ . The perspective function actually scales the vector so that the last component becomes 1, and can then be dropped.

If $C$ Is convex, then $P(C)=\{P(\mathbf{x})\mid \mathbf{x}\in C\}$ and $P^{-1}(C)=\{(\mathbf{x},t)\in\R^{n+1}\mid \mathbf{x}/t\in C, t>0\}$ are convex.

Linear-fractional functions#

Applying perspective function to affine functions. If $g(\mathbf{x})=\left[\begin{array}{l} \mathbf{A} \\ \mathbf{c}^{T} \end{array}\right] \mathbf{x}+\left[\begin{array}{l} \mathbf{b} \\ d \end{array}\right]$ , then $f(\mathbf{x}) = (\mathbf{Ax}+\mathbf{b})/ (\mathbf{c^\top x}+d), \textbf{dom} f=\{\mathbf{x}\mid \mathbf{c^\top x}+d>0\}$ .

Separating and supporting hyperplanes#

Separating hyperplane theorem#

Two convex sets $C\cap D=\empty \Leftrightarrow \exist \mathbf{a\ne0}, b$ that $\mathbf{a^\top x}\le b,\forall \mathbf{x}\in C,\mathbf{a^\top x}\ge b,\forall \mathbf{x}\in D$ .

Example: Separation of an affine and a convex set

$C$ Is convex and $D=\{\mathbf{Fu+g}\mid \mathbf{u}\in\R^m\}$ is affine. Suppose that they are disjoint, then there exist $\mathbf{a\ne 0}$ and $b$ such that $\mathbf{a^\top x}\ge b,\forall \mathbf{x}\in D$ . Thus, $\mathbf{a^\top Fu}\ge b-\mathbf{a^\top g}$ . Since the righthand side is a number, the linear function $\mathbf{a^\top Fu}$ is bounded below iff $\mathbf{a^\top F=0}$ .

Thus, there exist $\mathbf{a\ne 0}$ such that $\mathbf{F^\top a=0}, \mathbf{a^\top x}\le \mathbf{a^\top g}, \forall \mathbf{x}\in C$ .

Strict separation#

Two convex sets $C\cap D=\empty \Leftrightarrow \exist \mathbf{a\ne0}, b$ that $\mathbf{a^\top x}< b,\forall \mathbf{x}\in C,\mathbf{a^\top x}> b,\forall \mathbf{x}\in D$ .

Example: Separation of a point and a closed convex set

That is to say, the two sets $C, B(\mathbf{x}_0,\epsilon)$ do not intersect for some $\epsilon > 0$ . Since $B(\mathbf{x}_0,\epsilon)=\{\mathbf{x}_0+\mathbf{u}\mid \|\mathbf{u}\mid_2\le \epsilon \}$ , we have

$\mathbf{a^\top}\mathbf{x}\le b,\forall \mathbf{x}\in C\\ \mathbf{a^\top}(\mathbf{x}_0+\mathbf{u})\ge b, \forall \|\mathbf{u}\|_2\le \epsilon$

The $\mathbf{u}$ minimizes the lefthand side is $\mathbf{u}=\epsilon \mathbf{a}/\|\mathbf{a}\|_2$ , so $\mathbf{a}^\top \mathbf{x}_0-\epsilon\|\mathbf{a}\|_2\ge b$ . Therefore, $f(\mathbf{x}) = \mathbf{a}^\top \mathbf{x}-\epsilon\|\mathbf{a}\|_2/2- b$ is negative on $C$ and positive at $\mathbf{x}_0$ .

Corollary: A closed convex set is the intersection of all half spaces that contain it.

Obviously if a point is in the closed convex, it is in all the half spaces that contain it.

In reverse, if there exists a point $\mathbf{x}$ that is of the intersection $S$ but not in the convex set $C$ , then by strict separation, there exists a hyperplane that separates $\mathbf{x}$ and the $C$ . Thus, this half space contains $C$ but not $\mathbf{x}$ , so $\mathbf{x}\notin S$ .

Converse separating hyperplane theorems#

Any two convex sets $C, D$ , at least one of which is open, are disjoint iff there exists a separating plane.

Farkas Lemma

For $\mathbf{A}\in\R^{m\times n}, \mathbf{b}\in\R^m$ , the following are strong alternatives:

$\exist \mathbf{x}\in \R_+^n, \mathbf{Ax=b}$
$\exist\mathbf{y}\in\R^m, \mathbf{A^\top y}\ge \mathbf{0}, \mathbf{b^\top y}<0$

Proof.

$1\Rightarrow\neg 2$ : $\mathbf{b^\top y}=(\mathbf{x^\top A^\top})\mathbf{y} \ge 0$

$\neg1 \Rightarrow 2$ : $C=cone(\mathbf{A})$ is a closed convex cone that does not contain $\mathbf{b}$ . Hence by separating hyperplane theorem, there exists a $\mathbf{y}\in\R^m$ that $\langle \mathbf{y}, \mathbf{x}\rangle \ge 0 > \langle\mathbf{y},\mathbf{b}\rangle, \forall \mathbf{x}\in C$ , i.e. $\mathbf{A^\top y}\ge \mathbf{0}, \mathbf{b^\top y}<0$ .

Supporting hyperplanes#

If $\mathbf{a\ne 0}$ satisfies $\mathbf{a^\top x}\le \mathbf{a^\top x}_0, \forall \mathbf{x}\in C$ , then the hyperplane $\{\mathbf{x}\mid \mathbf{a^\top x} = \mathbf{a^\top x}_0\}$ is the supporting hyperplane to $C$ at point $\mathbf{x}_0$ . Geometrically, the hyperplane is tangent to $C$ at point $\mathbf{x}_0$ .

Supporting hyperplane theorem

For any nonempty convex set $C$ and any $\mathbf{x}_0\in\partial C$ , there exists a supporting hyperplane to $C$ at point $\mathbf{x}_0$ .