I am taking Convex Optimization this semester (2021 Spring). The notes are based on my understanding of the course materials provided by Prof. Zhouchen Lin. The notes only offer concise definitions and formulas, with few proofs and examples.

This chapter is the last of the basics; in the next chapter we will start to introduce specific algorithms. We first define convex functions, discuss how to judge convexity, and cover properties related to convexity. Next, we discuss operations that preserve convexity, just as in the previous chapter. Finally, we introduce the conjugate function, the envelope function, and the proximal mapping, three important concepts for future use.

Convex Function#


Basic definitions and examples#

Convex function#

A function $f:\R^n\to\R$ is convex if $\operatorname{dom} f$ is convex and for all $\mathbf{x},\mathbf{y}\in\operatorname{dom} f$ and $0\le\theta\le 1$,

$$f(\theta\mathbf{x}+(1-\theta)\mathbf{y})\le\theta f(\mathbf{x})+(1-\theta)f(\mathbf{y})$$

A function is strictly convex if strict inequality holds whenever $\mathbf{x}\ne\mathbf{y}$ and $0<\theta<1$.

A function is $\mu$-strongly convex if

$$f(\theta\mathbf{x}+(1-\theta)\mathbf{y})\le\theta f(\mathbf{x})+(1-\theta)f(\mathbf{y})-\frac{\theta(1-\theta)\mu}{2}\|\mathbf{y}-\mathbf{x}\|^2,\quad\forall\theta\in[0,1]
$$
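
These inequalities are easy to sanity-check numerically. Below is a minimal numpy sketch (my own illustration, not from the course materials), using $f(x)=e^x$ for plain convexity and $g(x)=x^2$, which is $2$-strongly convex since $g''=2$:

```python
import numpy as np

rng = np.random.default_rng(0)

f = np.exp              # convex on R
g = lambda x: x**2      # 2-strongly convex on R (g'' = 2)
mu = 2.0

for _ in range(1000):
    x, y = rng.normal(size=2)
    t = rng.uniform()
    # convexity: f(t x + (1-t) y) <= t f(x) + (1-t) f(y)
    assert f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) + 1e-9
    # mu-strong convexity: extra quadratic slack on the right-hand side
    slack = t*(1-t)*mu/2 * (y - x)**2
    assert g(t*x + (1-t)*y) <= t*g(x) + (1-t)*g(y) - slack + 1e-9
```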

Extended-value extensions#

Corollary. A convex function is differentiable almost everywhere on the relative interior of its domain.

Define the extended-value extension of a convex function $f$ as $\tilde{f}:\R^n\to\R\cup\{\infty\}$,

$$\tilde{f}(\mathbf{x})=\begin{cases}f(\mathbf{x}),&\mathbf{x}\in\operatorname{dom}f\\ \infty,&\mathbf{x}\notin\operatorname{dom}f\end{cases}$$

  • Indicator function of a convex set $C$:

    $$\tilde{I}_C(\mathbf{x})=\begin{cases}0,&\mathbf{x}\in C\\ \infty,&\mathbf{x}\notin C\end{cases}$$

    Now we have:

    $$\min_\mathbf{x}f(\mathbf{x})\ \ \text{s.t.}\ \ \mathbf{x}\in C\quad\Longleftrightarrow\quad\min_\mathbf{x}f(\mathbf{x})+\tilde{I}_C(\mathbf{x})$$

Proper function#

$f$ is proper if $f(\mathbf{x})<\infty$ for at least one $\mathbf{x}$ and $f(\mathbf{x})>-\infty$ for all $\mathbf{x}$.

A function is proper iff its epigraph is nonempty and does not contain a vertical line.

First-order conditions (to prove convexity)#

Suppose $f$ is differentiable; then $f$ is convex iff $\operatorname{dom}f$ is convex and for all $\mathbf{x},\mathbf{y}\in\operatorname{dom}f$,

$$f(\mathbf{y})\ge f(\mathbf{x})+\langle\nabla f(\mathbf{x}),\mathbf{y}-\mathbf{x}\rangle$$


This indicates that from local information of a convex function, we can derive some global information (a global under-estimator). For instance, if $\nabla f(\mathbf{x})=\mathbf{0}$, then $\mathbf{x}$ is a global minimizer.

Proof.

  1. $\Rightarrow$ Given $f$ is convex.

    $$\begin{aligned}f((1-\alpha)\mathbf{x}+\alpha\mathbf{y})&\le(1-\alpha)f(\mathbf{x})+\alpha f(\mathbf{y})\\ f(\mathbf{y})&\ge f(\mathbf{x})+\frac{f(\mathbf{x}+\alpha(\mathbf{y}-\mathbf{x}))-f(\mathbf{x})}{\alpha}\\ f(\mathbf{y})&\ge f(\mathbf{x})+\langle\nabla f(\mathbf{x}),\mathbf{y}-\mathbf{x}\rangle,\quad\alpha\to 0^+\end{aligned}$$

  2. $\Leftarrow$ Given the inequality $f(\mathbf{y})\ge f(\mathbf{x})+\langle\nabla f(\mathbf{x}),\mathbf{y}-\mathbf{x}\rangle$.

    Apply it with the base point $\alpha\mathbf{x}+(1-\alpha)\mathbf{y}$, once toward $\mathbf{x}$ and once toward $\mathbf{y}$:

    $$\begin{array}{l}f(\alpha\mathbf{x}+(1-\alpha)\mathbf{y})\le f(\mathbf{x})-(1-\alpha)\langle\nabla f(\alpha\mathbf{x}+(1-\alpha)\mathbf{y}),\mathbf{x}-\mathbf{y}\rangle\\ f(\alpha\mathbf{x}+(1-\alpha)\mathbf{y})\le f(\mathbf{y})+\alpha\langle\nabla f(\alpha\mathbf{x}+(1-\alpha)\mathbf{y}),\mathbf{x}-\mathbf{y}\rangle\end{array}$$

    Multiplying the first inequality by $\alpha$, the second by $1-\alpha$, and adding them gives $f(\alpha\mathbf{x}+(1-\alpha)\mathbf{y})\le\alpha f(\mathbf{x})+(1-\alpha)f(\mathbf{y})$.

$f$ is strictly convex iff, for all $\mathbf{x}\ne\mathbf{y}$,

$$f(\mathbf{y})>f(\mathbf{x})+\langle\nabla f(\mathbf{x}),\mathbf{y}-\mathbf{x}\rangle$$

Note that in the proof of $\Rightarrow$, when we take the limit $\alpha\to 0^+$ we can only get $f(\mathbf{y})\ge f(\mathbf{x})+\langle\nabla f(\mathbf{x}),\mathbf{y}-\mathbf{x}\rangle$, not $>$. So the proof should be altered a little:

$$\begin{aligned}f((1-\alpha)\mathbf{x}+\alpha\mathbf{y})&<(1-\alpha)f(\mathbf{x})+\alpha f(\mathbf{y})\\ f(\mathbf{y})&>f(\mathbf{x})+\frac{f(\mathbf{x}+\alpha(\mathbf{y}-\mathbf{x}))-f(\mathbf{x})}{\alpha}\\ \text{Since }f(\mathbf{x}+\alpha(\mathbf{y}-\mathbf{x}))&\ge f(\mathbf{x})+\langle\nabla f(\mathbf{x}),\alpha(\mathbf{y}-\mathbf{x})\rangle\text{ by convexity,}\\ f(\mathbf{y})&>f(\mathbf{x})+\langle\nabla f(\mathbf{x}),\mathbf{y}-\mathbf{x}\rangle\end{aligned}$$

$f$ is $\mu$-strongly convex iff $f(\mathbf{x})-\frac{\mu}{2}\|\mathbf{x}\|^2$ is convex, i.e.

$$f(\mathbf{y})\ge f(\mathbf{x})+\langle\nabla f(\mathbf{x}),\mathbf{y}-\mathbf{x}\rangle+\frac{\mu}{2}\|\mathbf{y}-\mathbf{x}\|^2$$
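
A quick numerical check of the first-order characterization (a sketch with my own choice of test function; log-sum-exp is convex and its gradient is the softmax):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    return np.log(np.sum(np.exp(x)))

def grad_f(x):
    z = np.exp(x)
    return z / z.sum()   # softmax = gradient of log-sum-exp

for _ in range(1000):
    x, y = rng.normal(size=(2, 5))
    # global under-estimator: f(y) >= f(x) + <grad f(x), y - x>
    assert f(y) >= f(x) + grad_f(x) @ (y - x) - 1e-9
```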

Second-order conditions#

Suppose that $f$ is twice differentiable; then $f$ is convex iff $\operatorname{dom}f$ is convex and its Hessian is positive semidefinite, i.e. for all $\mathbf{x}\in\operatorname{dom}f$,

$$\nabla^2 f(\mathbf{x})\succeq 0$$

Note that without convexity, $\nabla f(\mathbf{x})=\mathbf{0}$ does not imply a minimizer; a saddle point also satisfies $\nabla f=\mathbf{0}$.

Strictly convex: $\nabla^2 f(\mathbf{x})\succ 0$ (sufficient but not necessary; $f(x)=x^4$ is strictly convex with $f''(0)=0$)

Strongly convex: $\nabla^2 f(\mathbf{x})\succeq\mu\mathbf{I}$

🌟 Why do we need the condition “$\operatorname{dom}f$ is convex”?

e.g. $f(x)=1/x^2$ with $\operatorname{dom}f=\{x\in\R\mid x\ne 0\}$ satisfies $f''(x)>0$ for all $x\in\operatorname{dom}f$, but it is not a convex function.

Examples#

Functions on $\R$#

  • Exponential: $e^{ax}$
  • Powers: $x^a$ on $\R_{++}$ is convex when $a\in(-\infty,0]\cup[1,+\infty)$, concave when $0\le a\le 1$
  • Powers of absolute value: $|x|^p$, $p\ge 1$
  • Logarithm: $\log x$ is concave on $\R_{++}$
  • Negative entropy: $x\log x$ is convex on $\R_+$ (define $f(0)=0$)

Functions on $\R^n$#

  • Norms

    Can be proved by the triangle inequality (together with homogeneity, for $0\le\theta\le 1$):

    $$f(\theta\mathbf{x}+(1-\theta)\mathbf{y})\le f(\theta\mathbf{x})+f((1-\theta)\mathbf{y})=\theta f(\mathbf{x})+(1-\theta)f(\mathbf{y})$$

  • Max function: $f(\mathbf{x})=\max\{x_1,\cdots,x_n\}$. Prove by definition.

  • Quadratic over linear: $f(x,y)=x^2/y$, $\operatorname{dom}f=\R\times\R_{++}$

    $$\nabla^2 f(x,y)=\frac{2}{y^3}\begin{bmatrix}y^2&-xy\\-xy&x^2\end{bmatrix}=\frac{2}{y^3}\begin{bmatrix}y\\-x\end{bmatrix}\begin{bmatrix}y\\-x\end{bmatrix}^T\succeq\mathbf{0}$$

  • Log-sum-exp: $f(\mathbf{x})=\log(e^{x_1}+\cdots+e^{x_n})$

    A differentiable approximation of the max function (a numerical check follows this list):

    $$\max\{x_1,\cdots,x_n\}\le f(\mathbf{x})\le\max\{x_1,\cdots,x_n\}+\log n$$

    Prove by $\nabla^2 f(\mathbf{x})\succeq 0$:

    $$\nabla^2 f(\mathbf{x})=\frac{1}{\langle\mathbf{1},\mathbf{z}\rangle^2}\left(\langle\mathbf{1},\mathbf{z}\rangle\operatorname{diag}(\mathbf{z})-\mathbf{z}\mathbf{z}^T\right),\quad\mathbf{z}=(e^{x_1},\ldots,e^{x_n})$$

    $$\forall\mathbf{v},\quad\mathbf{v}^T\nabla^2 f(\mathbf{x})\mathbf{v}=\frac{1}{\langle\mathbf{1},\mathbf{z}\rangle^2}\left(\left(\sum_{i=1}^n z_i\right)\left(\sum_{i=1}^n v_i^2 z_i\right)-\left(\sum_{i=1}^n v_i z_i\right)^2\right)\ge 0$$

    The last inequality is the Cauchy-Schwarz inequality $\langle\mathbf{a},\mathbf{a}\rangle\langle\mathbf{b},\mathbf{b}\rangle\ge\langle\mathbf{a},\mathbf{b}\rangle^2$ with $a_i=\sqrt{z_i}$, $b_i=v_i\sqrt{z_i}$.

  • Geometric mean: $f(\mathbf{x})=\left(\prod_{i=1}^n x_i\right)^{1/n}$ is concave on $\operatorname{dom}f=\R_{++}^n$

  • Log-determinant: $f(\mathbf{X})=\log\det\mathbf{X}$ is concave on $\operatorname{dom}f=\mathbb{S}_{++}^n$

    Consider an arbitrary line $g(t)=f(\mathbf{Z}+t\mathbf{V})$ with $\mathbf{Z}\succ 0$, $\mathbf{V}\in\mathbb{S}^n$, and $t$ restricted to values such that $\mathbf{X}=\mathbf{Z}+t\mathbf{V}\succ 0$. Let $\lambda_1,\ldots,\lambda_n$ be the eigenvalues of $\mathbf{Z}^{-1/2}\mathbf{V}\mathbf{Z}^{-1/2}$:

    $$\begin{aligned}g(t)&=\log\det(\mathbf{Z}+t\mathbf{V})\\&=\log\det\left(\mathbf{Z}^{1/2}\left(\mathbf{I}+t\mathbf{Z}^{-1/2}\mathbf{V}\mathbf{Z}^{-1/2}\right)\mathbf{Z}^{1/2}\right)\\&=\sum_{i=1}^n\log(1+t\lambda_i)+\log\det\mathbf{Z}\end{aligned}$$

    $$g'(t)=\sum_{i=1}^n\frac{\lambda_i}{1+t\lambda_i},\qquad g''(t)=-\sum_{i=1}^n\frac{\lambda_i^2}{(1+t\lambda_i)^2}\le 0$$
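
Here is the numerical check of the log-sum-exp bounds promised above (a sketch; the test distribution is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8

for _ in range(1000):
    x = rng.normal(scale=3.0, size=n)
    lse = np.log(np.sum(np.exp(x)))
    m = x.max()
    # max(x) <= log-sum-exp(x) <= max(x) + log(n)
    assert m - 1e-9 <= lse <= m + np.log(n) + 1e-9
```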

Basic properties#

Sublevels#

The $\alpha$-sublevel set of a function $f:\R^n\to\R$ is defined as

$$C_\alpha=\{\mathbf{x}\in\operatorname{dom}f\mid f(\mathbf{x})\le\alpha\}$$

Sublevel sets of a convex function are convex for any $\alpha$. The converse is not true; for example, the concave function $f(x)=-e^x$ has convex sublevel sets.

To establish convexity of a set, we can express it as a sublevel set of a convex function, or as the superlevel set $\{\mathbf{x}\in\operatorname{dom}f\mid f(\mathbf{x})\ge\alpha\}$ of a concave function.

Example:

$$G(\mathbf{x})=\left(\prod_{i=1}^n x_i\right)^{1/n},\qquad A(\mathbf{x})=\frac{1}{n}\sum_{i=1}^n x_i$$

$$\{\mathbf{x}\in\R_+^n\mid G(\mathbf{x})\ge\alpha A(\mathbf{x})\}$$

It is the 0-sublevel set of $\alpha A(\mathbf{x})-G(\mathbf{x})$, which is convex since $A$ is linear and $G$ is concave. So the set is convex.

Epigraph#

$$\operatorname{epi}f=\{(\mathbf{x},t)\mid\mathbf{x}\in\operatorname{dom}f,\ f(\mathbf{x})\le t\}$$

A function is convex iff its epigraph is a convex set.

Interpreting the first-order condition geometrically via the epigraph:

If $(\mathbf{y},t)\in\operatorname{epi}f$, then

$$t\ge f(\mathbf{y})\ge f(\mathbf{x})+\langle\nabla f(\mathbf{x}),\mathbf{y}-\mathbf{x}\rangle$$

$$(\mathbf{y},t)\in\operatorname{epi}f\Longrightarrow\begin{bmatrix}\nabla f(\mathbf{x})\\-1\end{bmatrix}^T\left(\begin{bmatrix}\mathbf{y}\\t\end{bmatrix}-\begin{bmatrix}\mathbf{x}\\f(\mathbf{x})\end{bmatrix}\right)\le 0$$

Thus, the hyperplane with normal $(\nabla f(\mathbf{x}),-1)$ is a supporting hyperplane of the epigraph at the boundary point $(\mathbf{x},f(\mathbf{x}))$.


Jensen’s Inequality#

$$f(\theta\mathbf{x}+(1-\theta)\mathbf{y})\le\theta f(\mathbf{x})+(1-\theta)f(\mathbf{y})$$

$$f(\theta_1\mathbf{x}_1+\cdots+\theta_k\mathbf{x}_k)\le\theta_1 f(\mathbf{x}_1)+\cdots+\theta_k f(\mathbf{x}_k)$$

$$f\left(\int_S p(\mathbf{x})\mathbf{x}\,d\mathbf{x}\right)\le\int_S f(\mathbf{x})p(\mathbf{x})\,d\mathbf{x}$$

$$f(\mathbb{E}\mathbf{x})\le\mathbb{E}f(\mathbf{x})$$

Thus, adding a zero-mean random vector $\mathbf{z}$ will not decrease the value of a convex function on average:

$$\mathbb{E}f(\mathbf{x}+\mathbf{z})\ge f(\mathbf{x})$$
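
A Monte Carlo illustration of this consequence (a sketch; $f$ is taken to be the Euclidean norm, and the inequality shows up as a strict gap here):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=4)

# zero-mean perturbation z; f = Euclidean norm, which is convex
z = rng.normal(size=(100_000, 4))
f_x = np.linalg.norm(x)
E_f_xz = np.linalg.norm(x + z, axis=1).mean()
print(f_x, E_f_xz)   # empirically E f(x+z) >= f(x)
```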

Inequalities#

Many inequalities can be derived by applying Jensen’s Inequality to some convex function.

  • Arithmetic-geometric mean inequality ($f(x)=-\log x$, $\theta=1/2$)

    $$\sqrt{ab}\le(a+b)/2$$

  • Hölder’s Inequality

    $$\sum_{i=1}^n x_i y_i\le\left(\sum_{i=1}^n|x_i|^p\right)^{1/p}\left(\sum_{i=1}^n|y_i|^q\right)^{1/q}$$

    Proof.

    Still by $f(x)=-\log x$, convexity gives $a^\theta b^{1-\theta}\le\theta a+(1-\theta)b$. Take

    $$a=\frac{|x_i|^p}{\sum_{j=1}^n|x_j|^p},\quad b=\frac{|y_i|^q}{\sum_{j=1}^n|y_j|^q},\quad\theta=1/p:$$

    $$\left(\frac{|x_i|^p}{\sum_{j=1}^n|x_j|^p}\right)^{1/p}\left(\frac{|y_i|^q}{\sum_{j=1}^n|y_j|^q}\right)^{1/q}\le\frac{|x_i|^p}{p\sum_{j=1}^n|x_j|^p}+\frac{|y_i|^q}{q\sum_{j=1}^n|y_j|^q}$$

    Summing over $i$ makes the right-hand side $1/p+1/q=1$, which yields the inequality.
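
A numerical spot-check of Hölder’s inequality (a sketch; $p=3$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)
p = 3.0
q = p / (p - 1)   # conjugate exponent: 1/p + 1/q = 1

for _ in range(1000):
    x, y = rng.normal(size=(2, 6))
    lhs = np.sum(x * y)
    rhs = np.sum(np.abs(x)**p)**(1/p) * np.sum(np.abs(y)**q)**(1/q)
    assert lhs <= rhs + 1e-9
```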

Bregman distance#

Let $f:C\to\R$ be a differentiable strictly convex function, where $C\subset\R^n$ is a convex set. Define

$$B_f(\mathbf{y},\mathbf{x})=f(\mathbf{y})-f(\mathbf{x})-\langle\nabla f(\mathbf{x}),\mathbf{y}-\mathbf{x}\rangle$$

By the first-order condition, we know that $B_f(\mathbf{y},\mathbf{x})\ge 0$, but it may not be symmetric: $B_f(\mathbf{y},\mathbf{x})\ne B_f(\mathbf{x},\mathbf{y})$ in general.

🌟 Subgradient#

$$\partial f(\mathbf{x})=\{\mathbf{g}\mid f(\mathbf{y})\ge f(\mathbf{x})+\langle\mathbf{g},\mathbf{y}-\mathbf{x}\rangle,\ \forall\mathbf{y}\in\operatorname{dom}f\}$$

Generalization of the gradient: each subgradient $\mathbf{g}$ defines a non-vertical supporting hyperplane of the epigraph. The directional derivative is the support function of the subdifferential:

$$f'(\mathbf{x};\mathbf{u})=\max_{\mathbf{g}\in\partial f(\mathbf{x})}\langle\mathbf{u},\mathbf{g}\rangle$$

Examples#

  • $|x|$

    $$\partial|x|=\begin{cases}1,&x>0\\-1,&x<0\\ [-1,1],&x=0\end{cases}$$

  • $f(x)=\max\{0,\frac{1}{2}(x^2-1)\}$

    $$\partial f(x)=\begin{cases}x,&|x|>1\\0,&|x|<1\\ [-1,0],&x=-1\\ [0,1],&x=1\end{cases}$$

  • Indicator function $I_C(\mathbf{x})=\begin{cases}0,&\mathbf{x}\in C\\+\infty,&\mathbf{x}\notin C\end{cases}$ of a convex set $C$

    For $\mathbf{x}\in C$, the subgradient inequality $I_C(\mathbf{y})\ge I_C(\mathbf{x})+\langle\mathbf{g},\mathbf{y}-\mathbf{x}\rangle$ reduces to $0\ge\langle\mathbf{g},\mathbf{y}-\mathbf{x}\rangle$ for all $\mathbf{y}\in C$.

    When $\mathbf{x}\in C^\circ$, $0\ge\langle\mathbf{g},\mathbf{y}-\mathbf{x}\rangle$ in all directions $\Rightarrow\mathbf{g}=\mathbf{0}$.

    When $\mathbf{x}\in\partial C$, the directions $\mathbf{y}-\mathbf{x}$ form the tangent cone, so $\partial I_C(\mathbf{x})$ is its normal cone.

Properties#

  • Continuity (the graph of $\partial f$ is closed)

    $$\left.\begin{array}{l}x_k\to x^*\\ g_k\in\partial f(x_k),\ g_k\to g^*\end{array}\right\}\Rightarrow g^*\in\partial f(x^*)$$

  • Chain Rule

    $$F(\mathbf{x})=f(\mathbf{A}\mathbf{x})\Rightarrow\partial F(\mathbf{x})=\mathbf{A}^\top\partial f(\mathbf{A}\mathbf{x})$$

    and, for a composition $F=h\circ f$, the directional derivatives satisfy $F'(\mathbf{x};\mathbf{y})=h'(f(\mathbf{x}))f'(\mathbf{x};\mathbf{y})$.

  • Subgradient of norms (a worked $\ell_1$ instance follows this list)

    For a real Hilbert space endowed with an inner product, a norm $\|\cdot\|$, and its dual norm $\|\cdot\|_*$, we have

    $$\partial\|\mathbf{x}\|=\{\mathbf{y}\mid\langle\mathbf{y},\mathbf{x}\rangle=\|\mathbf{x}\|,\ \|\mathbf{y}\|_*\le 1\}$$

    Proof. Let $S=\{\mathbf{y}\mid\langle\mathbf{y},\mathbf{x}\rangle=\|\mathbf{x}\|,\ \|\mathbf{y}\|_*\le 1\}$.

    1. For every $\mathbf{y}\in\partial\|\mathbf{x}\|$, we have

      $$\|\mathbf{w}\|-\|\mathbf{x}\|\ge\langle\mathbf{y},\mathbf{w}-\mathbf{x}\rangle,\quad\forall\mathbf{w}$$

      Choosing $\mathbf{w}=\mathbf{0}$ and $\mathbf{w}=2\mathbf{x}$ gives $\|\mathbf{x}\|=\langle\mathbf{y},\mathbf{x}\rangle$. Then

      $$\|\mathbf{w}-\mathbf{x}\|\ge\|\mathbf{w}\|-\|\mathbf{x}\|\ge\langle\mathbf{y},\mathbf{w}-\mathbf{x}\rangle,\quad\forall\mathbf{w}$$

      $$\left\langle\mathbf{y},\frac{\mathbf{w}-\mathbf{x}}{\|\mathbf{w}-\mathbf{x}\|}\right\rangle\le 1,\quad\forall\mathbf{w}\ne\mathbf{x}$$

      Therefore $\|\mathbf{y}\|_*\le 1$, so $\partial\|\mathbf{x}\|\subset S$.

    2. For every $\mathbf{y}\in S$,

      $$\langle\mathbf{y},\mathbf{w}-\mathbf{x}\rangle=\langle\mathbf{y},\mathbf{w}\rangle-\langle\mathbf{y},\mathbf{x}\rangle=\langle\mathbf{y},\mathbf{w}\rangle-\|\mathbf{x}\|\le\|\mathbf{y}\|_*\|\mathbf{w}\|-\|\mathbf{x}\|\le\|\mathbf{w}\|-\|\mathbf{x}\|,\quad\forall\mathbf{w}$$

      Therefore $\mathbf{y}\in\partial\|\mathbf{x}\|$, so $S\subset\partial\|\mathbf{x}\|$.

  • Danskin’s Theorem

    Let $\mathcal{Z}$ be a compact subset of $\R^m$, let $\phi:\R^n\times\mathcal{Z}\to\R$ be continuous, and let $\phi(\cdot,\mathbf{z}):\R^n\to\R$ be convex for each $\mathbf{z}\in\mathcal{Z}$. Define

    $$f(\mathbf{x})=\max_{\mathbf{z}\in\mathcal{Z}}\phi(\mathbf{x},\mathbf{z}),\qquad\mathcal{Z}(\mathbf{x})=\arg\max_{\mathbf{z}\in\mathcal{Z}}\phi(\mathbf{x},\mathbf{z})$$

    Theorem: if $\phi(\cdot,\mathbf{z})$ is differentiable for all $\mathbf{z}$ and $\nabla_\mathbf{x}\phi(\mathbf{x},\cdot)$ is continuous on $\mathcal{Z}$, then

    $$\partial f(\mathbf{x})=\operatorname{conv}\{\nabla_\mathbf{x}\phi(\mathbf{x},\mathbf{z})\mid\mathbf{z}\in\mathcal{Z}(\mathbf{x})\}$$
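
As the worked instance promised above for the norm-subgradient formula: for the $\ell_1$ norm the dual norm is $\ell_\infty$, so $\|\mathbf{y}\|_*\le 1$ means $|y_i|\le 1$, and $\langle\mathbf{y},\mathbf{x}\rangle=\|\mathbf{x}\|_1$ forces $y_i=\operatorname{sign}(x_i)$ wherever $x_i\ne 0$:

$$\partial\|\mathbf{x}\|_1=\{\mathbf{y}\mid y_i=\operatorname{sign}(x_i)\text{ if }x_i\ne 0,\ y_i\in[-1,1]\text{ if }x_i=0\}$$

At $\mathbf{x}=\mathbf{0}$ this is the whole unit $\ell_\infty$ ball, matching the $[-1,1]$ subdifferential of $|x|$ computed earlier.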

Operations that preserve convexity#

Nonnegative weighted sums#

$$f=w_1 f_1+\cdots+w_m f_m,\quad w_i\ge 0$$

Proof. A sum of convex functions is convex directly from the definition, and for scaling by $w>0$, since the image of a convex set under a linear mapping is convex, we have

$$\operatorname{epi}(wf)=\begin{bmatrix}\mathbf{I}&\mathbf{0}\\\mathbf{0}^\top&w\end{bmatrix}\operatorname{epi}f$$

Affine mapping#

If $f$ is convex, then $g(\mathbf{x})=f(\mathbf{A}\mathbf{x}+\mathbf{b})$ is also convex.

Pointwise maximum and supremum#

$$f(\mathbf{x})=\max\{f_1(\mathbf{x}),f_2(\mathbf{x})\},\qquad\operatorname{dom}f=\operatorname{dom}f_1\cap\operatorname{dom}f_2$$

The max function can be extended to $f_1,\cdots,f_m$.

  • Sum of $r$ largest components: with $x_{[1]}\ge\cdots\ge x_{[n]}$,

    $$f(\mathbf{x})=\sum_{i=1}^r x_{[i]}$$

    As for the proof, the function is the pointwise maximum of the $\binom{n}{r}$ linear functions of the form “sum of $r$ components”.

  • Support function of a set

    $S_C(\mathbf{x})=\sup\{\langle\mathbf{x},\mathbf{y}\rangle\mid\mathbf{y}\in C\}$ is the pointwise supremum of a family of linear functions, hence convex. It measures how far the supporting hyperplane of $C$ with normal $\mathbf{x}$ is from the origin.

  • Distance to the farthest point of a set

    $f(\mathbf{x})=\sup_{\mathbf{y}\in C}\|\mathbf{x}-\mathbf{y}\|$ (pointwise supremum of the convex functions $\|\mathbf{x}-\mathbf{y}\|$).

  • Least-squares cost

    $g(\mathbf{w})=\inf_\mathbf{x}\sum_{i=1}^n w_i(\langle\mathbf{a}_i,\mathbf{x}\rangle-b_i)^2$ is the infimum of a family of functions linear in $\mathbf{w}$, hence concave.

  • Norm of a matrix (a numerical check follows this list)

    $$f(\mathbf{X})=\sup\{\mathbf{u}^\top\mathbf{X}\mathbf{v}\mid\|\mathbf{u}\|_2=1,\|\mathbf{v}\|_2=1\}$$
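
The numerical check just mentioned (a sketch): the supremum above is the spectral norm, i.e. the largest singular value, and convexity can be observed along any segment:

```python
import numpy as np

rng = np.random.default_rng(5)
X, Y = rng.normal(size=(2, 4, 3))   # two random 4x3 matrices

# f(X) = sup { u^T X v : ||u||_2 = ||v||_2 = 1 } = largest singular value
f = lambda A: np.linalg.norm(A, 2)

for t in np.linspace(0.0, 1.0, 11):
    # convexity along the segment from Y to X
    assert f(t*X + (1-t)*Y) <= t*f(X) + (1-t)*f(Y) + 1e-9
```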

Composition#

For the composition $f(\mathbf{x})=h(g(\mathbf{x}))$:

  • $f$ is convex if $h$ is convex, $\tilde{h}$ is nondecreasing, and $g$ is convex.
  • $f$ is convex if $h$ is convex, $\tilde{h}$ is nonincreasing, and $g$ is concave.

Note that we need the extended-value extension $\tilde{h}$ ($\infty$ for points not in $\operatorname{dom}h$) to be nondecreasing or nonincreasing.

Minimization#

If $f$ is convex and $C$ is a nonempty convex set, then $g(\mathbf{x})=\inf_{\mathbf{y}\in C}f(\mathbf{x},\mathbf{y})$ is convex.

Proof. $\operatorname{epi}g=\{(\mathbf{x},t)\mid(\mathbf{x},\mathbf{y},t)\in\operatorname{epi}f,\ \mathbf{y}\in C\}$ is convex because it is the projection of the convex set $\operatorname{epi}f$ (intersected with $\{\mathbf{y}\in C\}$) onto the $(\mathbf{x},t)$ coordinates.

Example: $\operatorname{dist}(\mathbf{x},S)=\inf_{\mathbf{y}\in S}\|\mathbf{x}-\mathbf{y}\|$ is convex if $S$ is convex, because $\|\mathbf{x}-\mathbf{y}\|$ is convex in $(\mathbf{x},\mathbf{y})$.

Perspective of a function#

For $f:\R^n\to\R$, the perspective of $f$ is $g(\mathbf{x},t)=tf(\mathbf{x}/t)$ with $\operatorname{dom}g=\{(\mathbf{x},t)\mid\mathbf{x}/t\in\operatorname{dom}f,\ t>0\}$.

The perspective preserves convexity, because $(\mathbf{x},t,s)\in\operatorname{epi}g$ means $tf(\mathbf{x}/t)\le s$, i.e. $f(\mathbf{x}/t)\le s/t$, which is $(\mathbf{x}/t,s/t)\in\operatorname{epi}f$; so $\operatorname{epi}g$ is the inverse image of the convex set $\operatorname{epi}f$ under this transformation.

Example: the perspective of the convex function $f(x)=-\log x$ on $\R_{++}$ is $g(x,t)=-t\log(x/t)=t\log t-t\log x$, which is convex on $\R_{++}^2$. $g$ is called the relative entropy.

🌟 Conjugate function#

For $f:\R^n\to\R$,

$$f^*(\mathbf{y})=\sup_{\mathbf{x}\in\operatorname{dom}f}(\langle\mathbf{y},\mathbf{x}\rangle-f(\mathbf{x}))=\sup_{\mathbf{x}\in\operatorname{dom}f}\begin{pmatrix}\mathbf{y}\\-1\end{pmatrix}^T\begin{pmatrix}\mathbf{x}\\f(\mathbf{x})\end{pmatrix}=\sup_{(\mathbf{x},t)\in\operatorname{epi}f}\begin{pmatrix}\mathbf{y}\\-1\end{pmatrix}^T\begin{pmatrix}\mathbf{x}\\t\end{pmatrix}$$

It is the support function of the epigraph evaluated at $(\mathbf{y},-1)$ (or, viewed as a function of $\mathbf{y}$, a supremum of affine functions), so it is always convex. The domain is where the supremum is finite.


The conjugate function $f^*(\mathbf{y})$ is the maximum gap between $\langle\mathbf{y},\mathbf{x}\rangle$ and $f(\mathbf{x})$. If $f$ is differentiable, the supremum is attained at a point $x^*$ with $f'(x^*)=y$.

Recipe for examples: 1. determine the domain of $y$ (where the supremum is finite); 2. take the derivative to find the maximizing $x$.

  • $f(x)=e^x$

    $xy-e^x$ is bounded above only when $y\ge 0$.

    Setting the derivative to zero gives $y-e^x=0$, i.e. $x=\log y$. Substituting, $f^*(y)=y\log y-y$.

    Note that in the $y=0$ case, $\sup_x(-e^x)=0$, so we can interpret $0\log 0=0$.
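
Following the recipe, a grid-based numerical check that $\sup_x(yx-e^x)$ indeed equals $y\log y-y$ (a sketch; the grid limits are arbitrary):

```python
import numpy as np

# f(x) = e^x; conjugate f*(y) = sup_x (y x - e^x)
xs = np.linspace(-10.0, 10.0, 200_001)
for y in [0.5, 1.0, 2.0, 5.0]:
    numeric = np.max(y*xs - np.exp(xs))
    closed_form = y*np.log(y) - y
    print(numeric, closed_form)   # agree up to grid resolution
```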

Some special conjugate functions#

Indicator function#

The indicator function of any set $S$ is $I_S(\mathbf{x})=0$ on $\operatorname{dom}I_S=S$. Its conjugate is $I_S^*(\mathbf{y})=\sup_{\mathbf{x}\in S}\langle\mathbf{y},\mathbf{x}\rangle$, which is the support function of $S$.

If $S$ is a closed convex set, then the conjugate of the support function is the indicator function of $S$.

Log Determinant#

$f(\mathbf{X})=\log\det\mathbf{X}^{-1}$ on $\mathbb{S}_{++}^n$, so $f^*(\mathbf{Y})=\sup_{\mathbf{X}\succ 0}\left(\operatorname{tr}(\mathbf{Y}^\top\mathbf{X})+\log\det\mathbf{X}\right)$.

We first show that the domain is $\mathbf{Y}\prec 0$. If $\mathbf{Y}\not\prec 0$, then it has an eigenvector $\mathbf{v}$, $\|\mathbf{v}\|_2=1$, with eigenvalue $\lambda\ge 0$. Take $\mathbf{X}=\mathbf{I}+t\mathbf{v}\mathbf{v}^\top$:

$$\operatorname{tr}(\mathbf{Y}^\top\mathbf{X})+\log\det\mathbf{X}=\operatorname{tr}(\mathbf{Y})+t\lambda+\log(1+t)\to+\infty$$

Now take the derivative:

$$\nabla_\mathbf{X}\left(\operatorname{tr}(\mathbf{Y}^\top\mathbf{X})+\log\det\mathbf{X}\right)=\mathbf{Y}+\mathbf{X}^{-1}=0\Rightarrow\mathbf{X}=-\mathbf{Y}^{-1}$$

Substituting back, $\operatorname{tr}(\mathbf{Y}^\top(-\mathbf{Y}^{-1}))=-n$, so $f^*(\mathbf{Y})=\log\det(-\mathbf{Y}^{-1})-n$ with $\operatorname{dom}f^*=-\mathbb{S}_{++}^n$.

Norm#

The conjugate of $f(\mathbf{x})=\|\mathbf{x}\|$ is $f^*(\mathbf{y})=I_B(\mathbf{y})$, where $B=\{\mathbf{y}\mid\|\mathbf{y}\|_*\le 1\}$ is the unit ball of the dual norm.

Fenchel’s Inequality#

$$f(\mathbf{x})+f^*(\mathbf{y})\ge\langle\mathbf{x},\mathbf{y}\rangle,\quad\forall\mathbf{x},\mathbf{y}$$

Example. $\frac{1}{2}\|\mathbf{x}\|^2+\frac{1}{2}\|\mathbf{y}\|_*^2\ge\langle\mathbf{x},\mathbf{y}\rangle$ (apply Fenchel’s inequality to $f=\frac{1}{2}\|\cdot\|^2$, whose conjugate is $\frac{1}{2}\|\cdot\|_*^2$).

Conjugate of conjugate#

  • If $f$ is proper, convex, and closed, then $f^{**}=f$.
  • For any $f$, $f^{**}$ is the largest convex function not exceeding $f$, i.e. the convex envelope of $f$.

  • If $f$ is convex and differentiable, then $\mathbf{x}^*$ maximizes $\langle\mathbf{y},\mathbf{x}\rangle-f(\mathbf{x})$ iff $\mathbf{y}=\nabla f(\mathbf{x}^*)$.

    Hence, if we can solve $\mathbf{y}=\nabla f(\mathbf{z})$ for $\mathbf{z}$, then we can compute $f^*(\mathbf{y})=\langle\mathbf{z},\mathbf{y}\rangle-f(\mathbf{z})=\langle\mathbf{z},\nabla f(\mathbf{z})\rangle-f(\mathbf{z})$.

  • Affine: for $a>0$ and $b\in\R$, the conjugate of $g(x)=af(x)+b$ is $g^*(y)=af^*(y/a)-b$.

    Example. For invertible $\mathbf{A}$, $g(\mathbf{x})=f(\mathbf{A}\mathbf{x}+\mathbf{b})$ has $g^*(\mathbf{y})=f^*(\mathbf{A}^{-\top}\mathbf{y})-\mathbf{b}^\top\mathbf{A}^{-\top}\mathbf{y}$.

  • Separable: if $f(\mathbf{u},\mathbf{v})=f_1(\mathbf{u})+f_2(\mathbf{v})$ where $f_1$ and $f_2$ are convex, then $f^*(\mathbf{w},\mathbf{z})=f_1^*(\mathbf{w})+f_2^*(\mathbf{z})$.

Envelope function and Proximal mapping#

Let $f:\R^n\to(-\infty,+\infty]$ be a closed proper function. For a scalar $c>0$, define:

$$\begin{aligned}\operatorname{Env}_c f(\mathbf{x})&=\inf_\mathbf{w}\left\{f(\mathbf{w})+\frac{1}{2c}\|\mathbf{w}-\mathbf{x}\|^2\right\}\\ \operatorname{Prox}_c f(\mathbf{x})&=\underset{\mathbf{w}}{\operatorname{Argmin}}\left\{f(\mathbf{w})+\frac{1}{2c}\|\mathbf{w}-\mathbf{x}\|^2\right\}\end{aligned}$$

Motivation. We are attempting to reduce the value of $f$ without straying too far from $\mathbf{x}$. You can imagine that this would be a useful sub-step in an optimization algorithm. The parameter $c$ can be viewed as a “step size” that controls how far from $\mathbf{x}$ we are allowed to move. When $c$ is very small, the penalty for moving away is severe.


Caption (of the omitted figure). $\operatorname{Prox}_c f(\mathbf{x}_k)=\mathbf{x}_{k+1}$. At the point $x=x_{k+1}$ we have $f(x)+\frac{1}{2c_k}\|x-x_k\|^2=\gamma_k=\operatorname{Env}_c f(x_k)$. (Starting from the parabola $-\frac{1}{2c_k}\|x-x_k\|^2$, the least distance $\gamma_k$ it must be moved upward to touch $f(x)$ is the envelope.) And the slope at the touching point is exactly a subgradient.

Infimal convolution (note the analogy with ordinary convolution):

$$(f\boxtimes g)(\mathbf{x})=\inf_{\mathbf{y}+\mathbf{z}=\mathbf{x}}f(\mathbf{y})+g(\mathbf{z}),\qquad(f\otimes g)(\mathbf{x})=\iint_{\mathbf{y}+\mathbf{z}=\mathbf{x}}f(\mathbf{y})g(\mathbf{z})\,d\mathbf{y}\,d\mathbf{z}$$

In this notation, $\operatorname{Env}_c f=f\boxtimes\frac{1}{2c}\|\cdot\|^2$.

  • Taking $\mathbf{w}=\mathbf{x}$ shows that $\operatorname{Env}_c f(\mathbf{x})\le f(\mathbf{x})$.
  • $\mathbf{u}=\operatorname{Prox}_c f(\mathbf{x})\Leftrightarrow\mathbf{x}-\mathbf{u}\in c\,\partial f(\mathbf{u})$, by the optimality condition $\mathbf{0}\in\partial f(\mathbf{u})+\frac{1}{c}(\mathbf{u}-\mathbf{x})$ and the definition of the subgradient.

Usage of proximal mapping#

  1. Let $T=I-\alpha\nabla f$ and $\tilde{T}=(I+\alpha\nabla f)^{-1}$. We have updates in two ways:

    $$\text{Forward: }\mathbf{x}_{k+1}=T(\mathbf{x}_k)=\mathbf{x}_k-\alpha\nabla f(\mathbf{x}_k)$$

    $$\text{Backward: }\mathbf{x}_{k+1}=\tilde{T}(\mathbf{x}_k)=\arg\min_\mathbf{x}f(\mathbf{x})+\frac{1}{2\alpha}\|\mathbf{x}-\mathbf{x}_k\|^2,\qquad\mathbf{x}_k-\mathbf{x}_{k+1}\in\alpha\,\partial f(\mathbf{x}_{k+1})$$

  2. Sufficient descent, which encourages convergence (see the sketch after this list):

    $$f(\mathbf{x}_{k+1})-f(\mathbf{x}_k)\le-\frac{1}{2\alpha}\|\mathbf{x}_{k+1}-\mathbf{x}_k\|^2\Rightarrow\sum_{k=0}^{+\infty}\|\mathbf{x}_{k+1}-\mathbf{x}_k\|^2<+\infty$$

    By the KL (Kurdyka–Łojasiewicz) condition, we further have $\sum_{k=0}^{+\infty}\|\mathbf{x}_{k+1}-\mathbf{x}_k\|<+\infty$.
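
A minimal sketch of the backward (proximal point) update for $f(x)=|x|$, whose proximal mapping is soft-thresholding; each step satisfies the sufficient-descent inequality above:

```python
import numpy as np

def prox_abs(x, a):
    # Prox_a of |.| : soft-thresholding with threshold a
    return np.sign(x) * max(abs(x) - a, 0.0)

x, a = 5.0, 1.0
for k in range(8):
    x_new = prox_abs(x, a)
    # sufficient descent: f(x_+) - f(x) <= -(1/2a) |x_+ - x|^2
    assert abs(x_new) - abs(x) <= -(x_new - x)**2 / (2*a) + 1e-12
    x = x_new
print(x)   # hits the minimizer 0 after a few iterations
```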

Example#

  • $f(\mathbf{x})=I_C(\mathbf{x})$

    $$\operatorname{Prox}_c f(\mathbf{x})=\arg\min_\mathbf{y}f(\mathbf{y})+\frac{1}{2c}\|\mathbf{y}-\mathbf{x}\|^2=\arg\min_{\mathbf{y}\in C}\|\mathbf{y}-\mathbf{x}\|^2=P_C(\mathbf{x})$$

    (Implementations of the three projections below follow this list.)

    • Halfspace $C=\{\mathbf{x}\mid\mathbf{a}^\top\mathbf{x}\le b\}$

      If $\mathbf{x}\notin C$, the projection lies on the boundary hyperplane $\mathbf{a}^\top\mathbf{y}=b$:

      $$\min_\mathbf{y}\frac{1}{2}\|\mathbf{y}-\mathbf{x}\|^2\quad\text{s.t. }\mathbf{a}^\top\mathbf{y}=b$$

      $$L(\mathbf{y},\lambda)=\frac{1}{2}\|\mathbf{y}-\mathbf{x}\|^2+\lambda(\mathbf{a}^\top\mathbf{y}-b)$$

      $$\left\{\begin{array}{l}(\mathbf{y}-\mathbf{x})+\lambda\mathbf{a}=0\\ \mathbf{a}^\top\mathbf{y}=b\end{array}\right.\Rightarrow\mathbf{y}=\mathbf{x}-\lambda\mathbf{a},\quad\lambda=\frac{\mathbf{a}^T\mathbf{x}-b}{\|\mathbf{a}\|^2}$$

      If $\mathbf{x}\in C$, take $\mathbf{y}=\mathbf{x}$. Altogether:

      $$P_C(\mathbf{x})=\begin{cases}\mathbf{x}+\frac{b-\mathbf{a}^T\mathbf{x}}{\|\mathbf{a}\|^2}\mathbf{a},&\text{if }\mathbf{a}^T\mathbf{x}>b\\ \mathbf{x},&\text{if }\mathbf{a}^T\mathbf{x}\le b\end{cases}$$

    • Hyperbox $C=\{\mathbf{x}\mid\mathbf{l}\le\mathbf{x}\le\mathbf{u}\}$

      $$(P_C(\mathbf{x}))_i=\begin{cases}l_i,&\text{if }x_i\le l_i\\ x_i,&\text{if }l_i\le x_i\le u_i\\ u_i,&\text{if }x_i\ge u_i\end{cases}$$

    • Nonnegative orthant: $C=\R_+^n$

      $$P_C(\mathbf{x})=\max(\mathbf{x},\mathbf{0})$$
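
The promised implementations: all three projections are one-liners (a sketch; the function names are my own):

```python
import numpy as np

def proj_halfspace(x, a, b):
    # P_C for C = {x : a^T x <= b}
    r = a @ x - b
    return x if r <= 0 else x - (r / (a @ a)) * a

def proj_box(x, l, u):
    # P_C for C = {x : l <= x <= u}: componentwise clipping
    return np.clip(x, l, u)

def proj_nonneg(x):
    # P_C for C = R^n_+
    return np.maximum(x, 0.0)

x = np.array([2.0, -1.0, 3.0])
print(proj_halfspace(x, np.array([1.0, 1.0, 1.0]), 1.0))  # [ 1. -2.  2.]
print(proj_box(x, -1.0, 1.0))                             # [ 1. -1.  1.]
print(proj_nonneg(x))                                     # [2. 0. 3.]
```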

Properties#

  • Separable

    If $f(\mathbf{u},\mathbf{v})=f_1(\mathbf{u})+f_2(\mathbf{v})$, then $\operatorname{Prox}_c f(\mathbf{u},\mathbf{v})=(\operatorname{Prox}_c f_1(\mathbf{u}),\operatorname{Prox}_c f_2(\mathbf{v}))$.

  • Scaling

    For $a>0$, $b\in\R$, and $g(x)=f(ax+b)$,

    $$\operatorname{Prox}_c g(x)=a^{-1}\left(\operatorname{Prox}_{ca^2}f(ax+b)-b\right)$$

    Suppose $g(\mathbf{x})=f(\mathbf{A}\mathbf{x}+\mathbf{b})$, where $\mathbf{A}\in\R^{n\times m}$ satisfies $\mathbf{A}\mathbf{A}^T=\lambda^{-1}\mathbf{I}$ ($\lambda>0$) and $\mathbf{b}\in\R^n$. Then

    $$\operatorname{Prox}_c g(\mathbf{x})=\left(\mathbf{I}-\lambda\mathbf{A}^T\mathbf{A}\right)\mathbf{x}+\lambda\mathbf{A}^T\left(\operatorname{Prox}_{c\lambda^{-1}}f(\mathbf{A}\mathbf{x}+\mathbf{b})-\mathbf{b}\right)$$

  • Gradient

    For $f$ closed, proper, and convex, and $c>0$,

    $$\nabla\operatorname{Env}_c f(\mathbf{x})=\frac{1}{c}\left(\mathbf{x}-\operatorname{Prox}_c f(\mathbf{x})\right)$$

  • 🌟 Moreau Decomposition

    $$\mathbf{x}=\operatorname{Prox}_c f(\mathbf{x})+c\operatorname{Prox}_{c^{-1}}f^*(c^{-1}\mathbf{x})$$

    Proof. Let $\mathbf{u}=\operatorname{Prox}_c f(\mathbf{x})$. Then

    $$\begin{aligned}\mathbf{u}=\operatorname{Prox}_c f(\mathbf{x})&\Longleftrightarrow-c^{-1}(\mathbf{u}-\mathbf{x})\in\partial f(\mathbf{u})\\&\Longleftrightarrow\mathbf{u}\in\partial f^*\left(-c^{-1}(\mathbf{u}-\mathbf{x})\right)\end{aligned}$$

    using the fact that $\mathbf{g}\in\partial f(\mathbf{u})\Leftrightarrow\mathbf{u}\in\partial f^*(\mathbf{g})$ for closed proper convex $f$. Let $\mathbf{z}=-c^{-1}(\mathbf{u}-\mathbf{x})$. Then $\mathbf{u}=\mathbf{x}-c\mathbf{z}$ and thus

    $$\begin{aligned}&\Longleftrightarrow\mathbf{x}-c\mathbf{z}\in\partial f^*(\mathbf{z})\\&\Longleftrightarrow\mathbf{0}\in\partial f^*(\mathbf{z})+c\left(\mathbf{z}-c^{-1}\mathbf{x}\right)\\&\Longleftrightarrow\mathbf{z}=\operatorname{Prox}_{c^{-1}}f^*\left(c^{-1}\mathbf{x}\right)\end{aligned}$$

    So $\mathbf{x}=\mathbf{u}+c\mathbf{z}=\operatorname{Prox}_c f(\mathbf{x})+c\operatorname{Prox}_{c^{-1}}f^*\left(c^{-1}\mathbf{x}\right)$.

    • Example: proximal mapping of a norm

      We know that if $f(\mathbf{x})=\|\mathbf{x}\|$, then $f^*(\mathbf{y})=I_\mathcal{B}(\mathbf{y})$, where $\mathcal{B}$ is the unit ball of the dual norm $\|\cdot\|_*$. Then by the Moreau decomposition:

      $$\begin{aligned}\operatorname{Prox}_c f(\mathbf{x})&=\mathbf{x}-c\operatorname{Prox}_{c^{-1}}f^*(\mathbf{x}/c)\\&=\mathbf{x}-cP_\mathcal{B}(\mathbf{x}/c)\\&=\mathbf{x}-P_{c\mathcal{B}}(\mathbf{x})\end{aligned}$$
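
For instance, for $f=\|\cdot\|_1$ the dual-norm unit ball $\mathcal{B}$ is the $\ell_\infty$ ball, $P_{c\mathcal{B}}$ is componentwise clipping to $[-c,c]$, and $\mathbf{x}-P_{c\mathcal{B}}(\mathbf{x})$ is exactly soft-thresholding (a numeric sketch):

```python
import numpy as np

c = 0.7
x = np.array([2.0, -0.3, 1.0, -1.5])

# Prox_c ||.||_1 (x) = x - P_{cB}(x), with B the unit l-infinity ball
prox = x - np.clip(x, -c, c)
soft = np.sign(x) * np.maximum(np.abs(x) - c, 0.0)
print(prox)   # [ 1.3  0.   0.3 -0.8]
print(soft)   # same values: the two formulas agree
```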