
## Convex Geometry

Csaba Vincze (2013)

University of Debrecen

## 1.6 Convex functions

In what follows let K be a non-empty open convex subset in the coordinate space of dimension n and consider a function

 $f:K\to R.$ (1.33)

Definition The function f is convex if for any points p and q in K

 $f\left(\left(1-\lambda \right)p+\lambda q\right)\le \left(1-\lambda \right)f\left(p\right)+\lambda f\left(q\right),$ (1.34)

for every λ in [0,1]. The function f is concave if -f is convex.
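The definition can be probed numerically. The following is a minimal sketch, not part of the original text: the test function (the squared Euclidean norm), the sampling cube and the helper names `convexity_gap` and `violates_convexity` are our own assumptions. A sampled check of this kind can only refute convexity, never prove it.

```python
import random

def f(x):
    """Squared Euclidean norm, used here as an assumed example of a convex function."""
    return sum(c * c for c in x)

def convexity_gap(f, p, q, lam):
    """Right-hand side minus left-hand side of inequality 1.34."""
    mix = [(1 - lam) * a + lam * b for a, b in zip(p, q)]
    return (1 - lam) * f(p) + lam * f(q) - f(mix)

def violates_convexity(f, dim, trials=1000, seed=0, tol=1e-12):
    """Return True if some sampled triple (p, q, lam) violates inequality 1.34."""
    rng = random.Random(seed)
    for _ in range(trials):
        p = [rng.uniform(-1, 1) for _ in range(dim)]
        q = [rng.uniform(-1, 1) for _ in range(dim)]
        lam = rng.random()
        if convexity_gap(f, p, q, lam) < -tol:
            return True
    return False

# The squared norm passes the sampled test; its negative (a concave function) fails it.
print(violates_convexity(f, dim=3))
print(violates_convexity(lambda x: -f(x), dim=3))
```

For the squared norm the gap equals λ(1-λ)||p-q||², which is why no violation can occur apart from rounding errors absorbed by `tol`.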

The geometric meaning of inequality 1.34 is that the chord joining any two points of the graph lies on or above the graph. This can be expressed in terms of the so-called epigraph, as the following proposition shows.

Figure 6: The epigraph of a function.

Proposition 1.6.1 The function f is convex if and only if its epigraph

 $\mathrm{epi}\ f:=\left\{\left(p,t\right)\in K\times R\ |\ f\left(p\right)\le t\right\}$

is a convex subset in the coordinate space of dimension n+1.

Proof Let f be a convex function and suppose that (p,t) and (q,s) are in epi f. Then

 $\left(1-\lambda \right)\left(p,t\right)+\lambda \left(q,s\right)=\left(\left(1-\lambda \right)p+\lambda q,\left(1-\lambda \right)t+\lambda s\right),$

where v:=(1 - λ)p+λq is in K because of its convexity and the scalar "coordinate" satisfies the inequalities

 $\left(1-\lambda \right)t+\lambda s\ge \left(1-\lambda \right)f\left(p\right)+\lambda f\left(q\right)\ge f\left(v\right)$

because of the convexity of the function. Therefore epi f is convex (as a set). Conversely, suppose that epi f is a convex set. Since the points (p,f(p)) and (q,f(q)) belong to epi f, so does their convex combination ((1-λ)p+λq, (1-λ)f(p)+λf(q)), and inequality 1.34 follows immediately from the definition of the epigraph. ▮

Proposition 1.6.2 (Jensen, Johan) The function f is convex if and only if

 $f\left({\lambda }_{1}{v}_{1}+\mathrm{\dots }+{\lambda }_{k}{v}_{k}\right)\le {\lambda }_{1}f\left({v}_{1}\right)+\mathrm{\dots }+{\lambda }_{k}f\left({v}_{k}\right)$ (1.35)

for any convex combination of elements from K.

Proof Inequality 1.35 gives the definition of convex functions under the special choice k=2. Conversely, if a function is convex then inequality 1.35 is satisfied in the case k=2 because of the definition of convex functions (if k=1 then there is nothing to prove). For convex combinations involving more than two terms the proof is a simple induction. Suppose that 1.35 is true for convex combinations containing at most k-1 vectors and consider the convex combination

 $v:={\lambda }_{1}{v}_{1}+\mathrm{\dots }+{\lambda }_{k}{v}_{k}$

of the elements $v_1,\dots,v_k$ in K. Because at least one of the coefficients must be different from 1 we can suppose, renumbering the terms if necessary, that $\lambda_k\ne 1$ and write

 $v=\left(1-{\lambda }_{k}\right)\left(\frac{{\lambda }_{1}}{1-{\lambda }_{k}}{v}_{1}+\mathrm{\dots }+\frac{{\lambda }_{k-1}}{1-{\lambda }_{k}}{v}_{k-1}\right)+{\lambda }_{k}{v}_{k},$

where

 $w:=\frac{{\lambda }_{1}}{1-{\lambda }_{k}}{v}_{1}+\mathrm{\dots }+\frac{{\lambda }_{k-1}}{1-{\lambda }_{k}}{v}_{k-1}$

is in K because of its convexity and

 $v=\left(1-{\lambda }_{k}\right)w+{\lambda }_{k}{v}_{k}.$

Then

 $f\left(v\right)\le \left(1-{\lambda }_{k}\right)f\left(w\right)+{\lambda }_{k}f\left({v}_{k}\right)$ (1.36)

and, by the inductive hypothesis,

 $f\left(w\right)\le \frac{{\lambda }_{1}}{1-{\lambda }_{k}}f\left({v}_{1}\right)+\mathrm{\dots }+\frac{{\lambda }_{k-1}}{1-{\lambda }_{k}}f\left({v}_{k-1}\right).$ (1.37)

Relations 1.36 and 1.37 give that

 $f\left(v\right)\le {\lambda }_{1}f\left({v}_{1}\right)+\mathrm{\dots }+{\lambda }_{k-1}f\left({v}_{k-1}\right)+{\lambda }_{k}f\left({v}_{k}\right)$

as was to be proved. ▮
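The regrouping used in the induction step and inequality 1.35 itself can be checked on sample data. This is a small sketch under our own assumptions: the points are real numbers (n=1), the convex function is the exponential function, and the helper names `jensen_gap`, `random_convex_weights` and `peel_last` are hypothetical.

```python
import math
import random

def jensen_gap(f, points, weights):
    """Right-hand side minus left-hand side of inequality 1.35 (scalar points)."""
    combo = sum(w * v for w, v in zip(weights, points))
    return sum(w * f(v) for w, v in zip(weights, points)) - f(combo)

def random_convex_weights(k, rng):
    """Nonnegative coefficients normalized so that they sum to 1."""
    raw = [rng.random() for _ in range(k)]
    total = sum(raw)
    return [x / total for x in raw]

def peel_last(points, weights):
    """Regroup v = (1 - lam_k) * w + lam_k * v_k as in the proof (needs lam_k != 1)."""
    lam_k = weights[-1]
    inner = [w / (1 - lam_k) for w in weights[:-1]]
    w_point = sum(c * v for c, v in zip(inner, points[:-1]))
    return w_point, lam_k

rng = random.Random(1)
points = [rng.uniform(-2, 2) for _ in range(5)]
weights = random_convex_weights(5, rng)

v = sum(w * x for w, x in zip(weights, points))
w_point, lam_k = peel_last(points, weights)
regrouped = (1 - lam_k) * w_point + lam_k * points[-1]

print(abs(v - regrouped))                     # regrouping error, rounding only
print(jensen_gap(math.exp, points, weights))  # nonnegative for a convex function
```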

Figure 7: Johan Jensen, 1859-1925.

Proposition 1.6.3 Let K be a non-empty open convex set. If the function

 $f:K\to R$

is convex then it is continuous at any point in K.

Proof Let p in K be a given point (recall that K is a non-empty open convex subset). Without loss of generality we can suppose that p is just the origin 0. As the first step we are going to prove that f is locally bounded. Consider an open box R of dimension n centered at the origin whose closure is contained in K. Since the elements in R can be expressed as convex combinations of the vertices

 ${v}_{1},\mathrm{\dots },{v}_{m}\left(m={2}^{n}\right)$

we have that for any v in R

 $f\left(v\right)\le {\lambda }_{1}f\left({v}_{1}\right)+\mathrm{\dots }+{\lambda }_{m}f\left({v}_{m}\right)\le M\left({\lambda }_{1}+\mathrm{\dots }+{\lambda }_{m}\right)=M,$

where

 $M:=\max\left\{f\left({v}_{1}\right),\dots,f\left({v}_{m}\right)\right\}.$

On the other hand

 $0=\frac{1}{2}v+\frac{1}{2}\left(-v\right),$

where -v is in R because the origin is the center of the box. Using the upper bound M and the convexity of the function

 $f\left(0\right)\le \frac{1}{2}f\left(v\right)+\frac{1}{2}f\left(-v\right)\le \frac{1}{2}f\left(v\right)+\frac{1}{2}M$

and, consequently,

 $m:=2f\left(0\right)-M\le f\left(v\right)$

is a lower bound. Therefore

 $|f\left(v\right)|\le C:=\max\left\{|m|,|M|\right\}\quad\left(v\in R\right).$

Figure 8: The proof of Proposition 1.6.3.

In the second step we claim that f is locally Lipschitzian. Consider an open ball B centered at p with radius r such that 2B is contained in the box R. Then for each q in B different from p we have a point z not in B but in R such that q lies on the line segment joining p and z. Explicitly

 $q=\left(1-\lambda \right)p+\lambda z,$

where

 $\lambda =\frac{||q-p||}{||z-p||}$

is the simple ratio among the points. Using the convexity of the function we have that

 $f\left(q\right)\le \left(1-\lambda \right)f\left(p\right)+\lambda f\left(z\right)$

and, consequently,

 $\frac{f\left(q\right)-f\left(p\right)}{||q-p||}\le \frac{f\left(z\right)-f\left(p\right)}{||z-p||}\le \frac{2C}{||z-p||}\le \frac{2C}{r}$ (1.38)

because ||z-p|| ≥ r (z is not in B) and |f| ≤ C on R (z is in R). Therefore

 $f\left(q\right)-f\left(p\right)\le \frac{2C}{r}||q-p||.$ (1.39)

Using the same argument as above for the triplet q, p and u:=-z (recall that p is the origin, so u is in R by symmetry and p lies between q and u) we have that

 $\frac{f\left(p\right)-f\left(q\right)}{||p-q||}\le \frac{f\left(u\right)-f\left(q\right)}{||u-q||}$ (1.40)

and, consequently,

 $\frac{f\left(q\right)-f\left(u\right)}{||u-q||}\le \frac{f\left(q\right)-f\left(p\right)}{||p-q||}.$

The last inequality allows us to present a lower estimate

 $-\frac{2C}{r}\le -\frac{2C}{||u-q||}\le \frac{f\left(q\right)-f\left(u\right)}{||u-q||}\le \frac{f\left(q\right)-f\left(p\right)}{||p-q||}.$ (1.41)

Therefore

 $-\frac{2C}{r}||p-q||\le f\left(q\right)-f\left(p\right).$ (1.42)

Inequalities 1.39 and 1.42 say that

 $|f\left(q\right)-f\left(p\right)|\le \frac{2C}{r}||p-q||,$

i.e. the function is locally Lipschitzian and, consequently, it is continuous at p as was to be proved. ▮
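The constants of the proof can be computed for a concrete function. The sketch below is only an illustration under our own assumptions: K is the whole plane, f(x1, x2) = |x1| + x2², the box R is (-1, 1)², p is the origin and r = 1/2, so that 2B fits inside R. It evaluates M, m and C as in the first step and compares sampled difference quotients with the Lipschitz constant 2C/r.

```python
import itertools
import math
import random

def f(x):
    """Assumed convex test function on the plane: |x1| + x2^2."""
    return abs(x[0]) + x[1] ** 2

p = (0.0, 0.0)

# First step: the 2^n vertices of the box give the upper bound M, then m and C.
vertices = list(itertools.product([-1.0, 1.0], repeat=2))
M = max(f(v) for v in vertices)
m = 2 * f(p) - M
C = max(abs(m), abs(M))

# Second step: ball B of radius r around p; 2B has radius 1 and lies in the box.
r = 0.5
lipschitz_bound = 2 * C / r

rng = random.Random(0)
worst = 0.0
for _ in range(2000):
    q = (rng.uniform(-r, r), rng.uniform(-r, r))
    dist = math.hypot(*q)
    if 0.0 < dist < r:  # keep only sample points inside B, different from p
        worst = max(worst, abs(f(q) - f(p)) / dist)

print(M, m, C, lipschitz_bound)
print(worst <= lipschitz_bound)
```

The bound 2C/r is far from sharp for this function; the proof only needs some finite constant.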

Figure 9: Rudolf Lipschitz, 1832-1903.

Inequalities 1.38 and 1.42 imply more than the continuity: the existence of the one-sided directional derivatives

 ${D}_{v}^{+}f\left(p\right)=\lim_{t\to {0}^{+}}\frac{f\left(p+tv\right)-f\left(p\right)}{t}$

at each point in each direction. Indeed, consider the function

 $h\left(t\right):=\frac{f\left(p+tv\right)-f\left(p\right)}{t}\quad\left(0<t<\varepsilon\right)$

defined on a sufficiently small open interval. Using the notation q=p+tv, inequality 1.42 says that h is bounded from below. Taking t < s and z=p+sv, inequality 1.38 shows that h is monotone increasing. Therefore its infimum M*=inf h is finite and

 $\lim_{t\to {0}^{+}}\frac{f\left(p+tv\right)-f\left(p\right)}{t}={M}^{*}.$
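The monotonicity of the difference quotient is easy to observe numerically. In the following sketch the function f(x1, x2) = |x1| + x2², the point p = (0, 0) and the direction v = (1, 1) are our own choices for illustration; in this case h(t) = 1 + t, so the values decrease to the one-sided directional derivative 1 as t tends to 0 from the right.

```python
def f(x):
    """Assumed convex test function on the plane: |x1| + x2^2."""
    return abs(x[0]) + x[1] ** 2

def h(t, p, v):
    """Difference quotient (f(p + t v) - f(p)) / t for t > 0."""
    q = [a + t * b for a, b in zip(p, v)]
    return (f(q) - f(p)) / t

p = [0.0, 0.0]
v = [1.0, 1.0]

ts = [2.0 ** (-k) for k in range(1, 11)]  # step sizes decreasing toward 0
values = [h(t, p, v) for t in ts]

for t, value in zip(ts, values):
    print(t, value)
```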

For further regularity properties of convex functions see Lebesgue's theorem and [8]. Figure 10 shows why it is important for the point p to be in the interior of the domain.

Figure 10: Discontinuity on the boundary of the domain.

Definition The element w is called a subgradient of the function f at the point p in K if the inequality

 $〈w,q-p〉\le f\left(q\right)-f\left(p\right)$ (1.43)

holds for any point q in K. The subdifferential of the function f at the point p is the set of its subgradients at p.

For the geometric description of a subgradient vector write inequality 1.43 in the form

 $〈\left(w,-1\right),\left(q,f\left(q\right)\right)-\left(p,f\left(p\right)\right)〉\le 0$

to express that the graph of the function lies entirely on or above the hyperplane

 $〈\left(w,-1\right),\left(x,t\right)-\left(p,f\left(p\right)\right)〉=0$ (1.44)

passing through the point (p, f(p)) in the coordinate space of dimension n+1. The vector (w,-1) plays the role of the normal vector to the hyperplane 1.44.
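Inequality 1.43 can be tested on sample points. The sketch below assumes the one-dimensional case with f(x) = |x|; the helper name `is_subgradient_on` and the sample grid are hypothetical. Passing the test on finitely many points is only a necessary condition, whereas a single failing point certifies that w is not a subgradient.

```python
def is_subgradient_on(f, w, p, samples, tol=1e-12):
    """Check inequality 1.43, w * (q - p) <= f(q) - f(p), on the given sample points q."""
    return all(w * (q - p) <= f(q) - f(p) + tol for q in samples)

samples = [i / 10 for i in range(-50, 51)]  # grid on [-5, 5]

# At the kink p = 0 every slope between -1 and 1 passes, steeper slopes fail.
print(is_subgradient_on(abs, 0.3, 0.0, samples))
print(is_subgradient_on(abs, -1.0, 0.0, samples))
print(is_subgradient_on(abs, 1.5, 0.0, samples))

# At p = 1 the function is differentiable and only the slope 1 passes.
print(is_subgradient_on(abs, 1.0, 1.0, samples))
print(is_subgradient_on(abs, 0.5, 1.0, samples))
```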

The subgradient involves a global property whereas the derivative has a local character. Nevertheless the convexity of the function allows us to describe the set of subgradients locally in terms of the directional derivative.

Proposition 1.6.4 (Local characterization.) Let K be a non-empty open convex set and consider a convex function

 $f:K\to R.$

The element w is a subgradient at the point p in K if and only if the inequality

 $〈w,v〉\le {D}_{v}^{+}f\left(p\right)$

holds for any element v in the coordinate space.

Proof Suppose that w is a subgradient of the function f at the point p and let us choose the point q in the special form

 $q:=p+tv,$

where v is a nonzero vector and t is a positive real number which is small enough for q to be in K. Then the relation

 $〈w,v〉\le \frac{f\left(p+tv\right)-f\left(p\right)}{t}$

follows immediately from the definition of the subgradient. Therefore

 $〈w,v〉\le \underset{t\to {0}^{+}}{\mathrm{l}\mathrm{i}\mathrm{m}}\frac{f\left(p+tv\right)-f\left(p\right)}{t}={D}_{v}^{+}f\left(p\right).$

In order to see the converse statement let q be an arbitrary point in K and consider the line segment

 $c\left(t\right):=\left(1-t\right)p+tq=p+tv$

joining p and q. Since the function is convex, the formula

 $f\left(c\left(t\right)\right)\le \left(1-t\right)f\left(p\right)+tf\left(q\right)$

holds for any parameter t between 0 and 1. Therefore

 ${D}_{v}^{+}f\left(p\right)=\left(f\circ c\right)'\left(0\right)=\lim_{t\to {0}^{+}}\frac{f\left(c\left(t\right)\right)-f\left(c\left(0\right)\right)}{t}\le$

 $\lim_{t\to {0}^{+}}\frac{\left(1-t\right)f\left(p\right)+tf\left(q\right)-f\left(p\right)}{t}=f\left(q\right)-f\left(p\right).$

This means that

 $〈w,q-p〉=〈w,v〉\le {D}_{v}^{+}f\left(p\right)$

implies that

 $〈w,q-p〉\le f\left(q\right)-f\left(p\right)$

as was to be proved. ▮
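As a worked illustration (our own example, not taken from the source), let n=1, let K be the whole real line, f(x)=|x| and p=0. The one-sided directional derivatives are

 $D_{v}^{+}f\left(0\right)=\lim_{t\to 0^{+}}\frac{|tv|-|0|}{t}=|v|,$

so the condition of Proposition 1.6.4 reads

 $wv\le |v|\quad \left(v\in R\right),$

which holds if and only if |w| ≤ 1. This agrees with the definition: inequality 1.43 at p=0 is wq ≤ |q| for any q, and it is satisfied exactly by the elements w of the interval [-1,1].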

Corollary 1.6.5 Let K be a non-empty open convex set and consider a convex function

 $f:K\to R.$

The following conditions are equivalent:

• The point p in K is a global minimizer.

• The zero vector 0 belongs to the subdifferential of f at p.

• For any element v

 $0\le {D}_{v}^{+}f\left(p\right).$

Proof If p is a global minimizer then for any q in K

 $0\le f\left(q\right)-f\left(p\right)$

showing that 0 is one of the subgradients at p. If 0 is one of the subgradients at p then, by definition,

 $0=〈0,q-p〉\le f\left(q\right)-f\left(p\right)$

and we have that p is a global minimizer. The equivalence of the second and the third conditions is a direct consequence of the local characterization of the subgradient vectors (Proposition 1.6.4). ▮
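The third condition suggests a simple numerical test for minimizers. The following sketch rests on our own assumptions: the test function f(x1, x2) = (x1 - 1)² + |x2| with global minimizer (1, 0), a fixed step size t, and unit directions sampled on the circle. Since the difference quotient is monotone increasing in t, it is an upper estimate of the one-sided directional derivative: a negative quotient certifies that the point is not a minimizer, while nonnegative quotients are only consistent with minimality.

```python
import math

def f(x):
    """Assumed convex test function with unique global minimizer (1, 0)."""
    return (x[0] - 1.0) ** 2 + abs(x[1])

def quotient(p, v, t=1e-6):
    """Difference quotient (f(p + t v) - f(p)) / t, an upper estimate of D_v^+ f(p)."""
    q = (p[0] + t * v[0], p[1] + t * v[1])
    return (f(q) - f(p)) / t

def min_quotient(p, directions=360):
    """Smallest difference quotient over equally spaced unit directions."""
    return min(
        quotient(p, (math.cos(2 * math.pi * k / directions),
                     math.sin(2 * math.pi * k / directions)))
        for k in range(directions)
    )

print(min_quotient((1.0, 0.0)))  # nonnegative at the minimizer
print(min_quotient((0.0, 0.0)))  # negative: a descent direction exists
```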

Figure 12: The zero vector as a subgradient.

Definition Suppose that

 $f:K\to R$

is differentiable at the point p. The gradient vector is defined in terms of the usual partial derivatives:

 $\mathrm{grad}\ {f}_{p}:=\left({D}_{1}f\left(p\right),\dots,{D}_{n}f\left(p\right)\right).$

Actually it is a special notation for the Jacobian matrix at the point p.

For the sake of simplicity we restrict ourselves to the coordinate plane to present the geometric characterization of the gradient vector. We will use the standard symbols x and y for the coordinates of the points in the plane. Let U be a non-empty open subset and consider a (not necessarily convex) function

 $f:U\to R.$

Suppose that f is continuously differentiable, i.e. it is differentiable everywhere and the partial derivatives are continuous. Let p be a point in U with a non-zero gradient vector. This means, for example, that the partial derivative with respect to the second coordinate at p is different from zero:

 ${D}_{2}f\left(p\right)\ne 0.$

Let us define the mapping

 $\mathrm{\Phi }:U\to {E}^{2},\left(x,y\right)↦\mathrm{\Phi }\left(x,y\right)=\left(x,f\left(x,y\right)\right).$

The Jacobian

 $\det J=\det\left(\begin{array}{ll}1& 0\\ {D}_{1}f& {D}_{2}f\end{array}\right)={D}_{2}f$

is different from zero at p.

Figure 13: The inverse mapping theorem.

Using the inverse mapping theorem we have an inverse function defined on an open neighbourhood Φ(V) of Φ(p), where V is a suitable open neighbourhood of p. We are going to give a local parameterization for the level curve

 $f\left(x,y\right)={c}_{0}$ (1.45)

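passing through the point p, i.e. $c_0=f(p)$. Let r be a sufficiently small positive real number such that

 $v\left(t\right):=\mathrm{\Phi }\left(p\right)+t\left(1,0\right)\quad \left(-r<t<r\right)$

is a parametrization of a horizontal segment passing through Φ(p) and contained in Φ(V). Then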

 $w\left(t\right):={\mathrm{\Phi }}^{-1}\left(v\left(t\right)\right)$

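is just a local parametrization of the level curve 1.45 because

 $\left({w}_{1}\left(t\right),f\left(w\left(t\right)\right)\right)=\mathrm{\Phi }\left(w\left(t\right)\right)=v\left(t\right)\quad \text{implies}\quad f\left(w\left(t\right)\right)=f\left(p\right)={c}_{0}.$

Therefore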

 $0=\left(f\circ w\right)'={w}_{1}'{D}_{1}f\left(w\right)+{w}_{2}'{D}_{2}f\left(w\right)$ (1.46)

which means that the gradient vector field along the level curves is orthogonal to the tangent lines represented by the derivative vector w'.
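Relation 1.46 can be verified for a concrete level curve. The sketch below is an illustration under our own assumptions: f(x, y) = x² + y², p = (0.6, 0.8), so that the second partial derivative at p is nonzero and the inverse of Φ can be written explicitly near Φ(p) by solving f(x, y) = c for y > 0. The derivative w' is approximated by central finite differences; the function names are hypothetical.

```python
import math

def f(x, y):
    """Assumed continuously differentiable test function."""
    return x * x + y * y

def grad_f(x, y):
    """Gradient of f computed by hand: (D_1 f, D_2 f)."""
    return (2 * x, 2 * y)

p = (0.6, 0.8)
c0 = f(*p)  # level of the curve passing through p

def w(t):
    """w(t) = Phi^{-1}(Phi(p) + t(1, 0)): first coordinate p_x + t, second solved from f = c0."""
    x = p[0] + t
    return (x, math.sqrt(c0 - x * x))

def w_prime(t, eps=1e-6):
    """Central finite-difference approximation of the derivative vector w'(t)."""
    a, b = w(t - eps), w(t + eps)
    return ((b[0] - a[0]) / (2 * eps), (b[1] - a[1]) / (2 * eps))

def gradient_dot_tangent(t):
    """Right-hand side of relation 1.46 evaluated at the parameter t."""
    x, y = w(t)
    gx, gy = grad_f(x, y)
    dx, dy = w_prime(t)
    return gx * dx + gy * dy

for t in (-0.1, 0.0, 0.1):
    print(t, f(*w(t)) - c0, gradient_dot_tangent(t))
```

Both printed quantities vanish up to rounding and discretization errors: the curve stays on the level set and its derivative vector is orthogonal to the gradient.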