

Chapter III: Fitting Tractable Convex Sets to Support Function Evaluations

3.2 Projections of Structured Convex Sets

To illustrate the relationship between this formulation and our setup (with C = ∆_q), suppose we specify the linear map A ∈ R^{d×q} in (3.2) in terms of its columns as A = [a_1 | ⋯ | a_q]. Then the optimization problem (3.2) can be reformulated as

argmin_{{a_j}_{j=1}^q ⊂ R^d}  Σ_{i=1}^n ( y^{(i)} − max_j ⟨a_j, u^{(i)}⟩ )^2.   (3.3)

One can view (3.3) as a problem of computing a partition of the collection of support function measurements into q disjoint clusters, with each cluster being assigned to an extreme point that minimizes the squared-loss error. In Section 3.4 we highlight the analogies between our algorithms for computing K̂ and the widely used Lloyd's algorithm for K-means clustering [93].
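To make the clustering analogy concrete, the following is a minimal alternating-minimization sketch for (3.3) in the spirit of Lloyd's algorithm. All names here (`fit_polytope_support`, its parameters) are illustrative and not the algorithm of Section 3.4; the update step uses an ordinary least-squares fit per cluster.

```python
import numpy as np

def fit_polytope_support(U, y, q, n_iters=50, seed=0):
    """Alternating minimization for (3.3): fit q extreme points {a_j}
    to support measurements y_i ~ max_j <a_j, u_i>.  Illustrative sketch."""
    rng = np.random.default_rng(seed)
    n, d = U.shape
    A = rng.standard_normal((q, d))          # rows are the candidate a_j
    for _ in range(n_iters):
        # Assignment step (cf. Lloyd's): each measurement is assigned to
        # the extreme point attaining the maximal inner product.
        labels = np.argmax(U @ A.T, axis=1)
        # Update step: least-squares refit of a_j on its own cluster,
        # minimizing sum_i (y_i - <a_j, u_i>)^2 over that cluster.
        for j in range(q):
            mask = labels == j
            if mask.sum() >= d:              # skip degenerate clusters
                A[j], *_ = np.linalg.lstsq(U[mask], y[mask], rcond=None)
    return A
```

As with K-means, the procedure decreases the objective at each step but is sensitive to initialization, so in practice one would restart from several random seeds.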

3.1.4 Outline

In Section 3.2 we discuss the geometric and algebraic aspects of the optimization problem (3.2), which serve as the foundation for the subsequent statistical analysis in Section 3.3. Throughout both of these sections, we describe several examples that provide additional insight into the mathematical development. We describe algorithms for (3.2) in Section 3.4, and we demonstrate the application of these methods in a range of numerical experiments in Section 3.5. We conclude with a discussion of future directions in Section 3.6.

Probabilistic Model for Support Function Measurements. We assume that the support function evaluations are measurement pairs of the form (u, y) ∈ S^{d−1} × R, where u denotes the direction at which the support function is evaluated and y denotes the noisy support function evaluation, and that these pairs are independently and identically distributed according to the model:

P_{K⋆} :  y = h_{K⋆}(u) + ε.

Here, u ∈ S^{d−1} is a random vector distributed uniformly at random (u.a.r.) over the unit sphere, ε is a centered random variable with variance σ^2 (i.e., E[ε] = 0 and E[ε^2] = σ^2), and u and ε are independently distributed.
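As an illustration, synthetic measurements from P_{K⋆} can be generated as below. Gaussian noise is one convenient choice satisfying the mean-zero, variance-σ^2 assumption (the model itself does not require Gaussianity), and the helper name is ours.

```python
import numpy as np

def sample_support_measurements(n, d, h_K, sigma, seed=0):
    """Draw n pairs (u, y) from P_K*: u uniform on S^{d-1},
    y = h_K(u) + eps, with eps ~ N(0, sigma^2) as an illustrative
    centered noise distribution."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # uniform on the sphere
    y = np.array([h_K(u) for u in U]) + sigma * rng.standard_normal(n)
    return U, y

# Example: K* the unit ell_2-ball, so h_K*(u) = ||u||_2 = 1 on the sphere.
U, y = sample_support_measurements(500, 3, lambda u: 1.0, sigma=0.1)
```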

In the following, we describe the geometrical, algebraic, and analytical aspects of our procedure. The proofs of all results we state are straightforward. To simplify the exposition in this section, we defer these proofs to the Appendix.

3.2.1 Geometrical Aspects of Our Method

We quantify the difference between pairs of convex sets in terms of the L^p metric applied to their respective support functions. Let K_1 and K_2 be a pair of compact convex sets in R^d, and let h_{K_1}(·) and h_{K_2}(·) be the respective support functions of these sets. We denote

ρ_p(K_1, K_2) = ( ∫_{S^{d−1}} |h_{K_1}(u) − h_{K_2}(u)|^p μ(du) )^{1/p},  1 ≤ p < ∞,   (3.4)

where the integral is performed with respect to the Lebesgue measure over S^{d−1}; as usual, we denote ρ_∞(K_1, K_2) = max_{u ∈ S^{d−1}} |h_{K_1}(u) − h_{K_2}(u)|. In Section 3.3.1, we prove our convergence guarantees in terms of the L^p-metric.

The ρ_p-metric represents an important class of distance measures over convex sets.

For instance, it features prominently in the literature on approximating convex sets by polytopes [25]. In addition, the specific instance p = ∞ coincides with the Hausdorff distance [124].

A basic question we seek to address in Section 3.3.1 is to characterize the settings in which the sequences of estimators obtained using our method converge, and the limit to which these sequences converge. As the estimators we consider are computed by minimizing an empirical loss function, a natural strategy is to understand the minimizers of the loss function at the population level. Let Q be a probability measure over the pairs (u, y) ∈ S^{d−1} × R. We denote the loss function with respect to Q as follows:

Φ_C(A, Q) := E_Q{ (h_C(A′u) − y)^2 }.

First, we note that Φ_C(·, Q) is continuous if y is Q-integrable:

Proposition 3.2.1 Let Q be any probability distribution over the measurement pairs (u, y) satisfying E_Q[|y|] < ∞. Then the function A ↦ Φ_C(A, Q) is continuous at every A.
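For the concrete case C = ∆_q, the loss Φ_C admits a simple empirical analogue: the support function of the simplex evaluates to the largest entry of its argument, so h_C(A′u) = max_j ⟨a_j, u⟩ with a_j the columns of A. A minimal sketch (the function name is ours):

```python
import numpy as np

def empirical_loss_simplex(A, U, y):
    """Empirical analogue of Phi_C(A, Q) for C = Delta_q, where
    h_C(A'u) = max_j <a_j, u> and the a_j are the columns of A.
    U has the directions u_i as rows; y holds the measurements."""
    h = (U @ A).max(axis=1)          # h_{Delta_q}(A'u_i) for each row u_i
    return np.mean((h - y) ** 2)
```

For instance, with A the identity and noiseless measurements of the corresponding simplex, the loss evaluates to zero.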

Next, we denote the set of minimizers of the loss function at the population level Φ_C(·, P_{K⋆}) by M_{K⋆,C}:

M_{K⋆,C} := argmin_A Φ_C(A, P_{K⋆}).

The following result states a series of properties about the set M_{K⋆,C}. Crucially, it shows that M_{K⋆,C} characterizes optimal approximations of K⋆ as projections of C:

Proposition 3.2.2 The set M_{K⋆,C} is compact and non-empty. Moreover, we have

Â ∈ M_{K⋆,C}  ⇔  Â ∈ argmin_{A ∈ L(R^q, R^d)} ρ_2(A(C), K⋆).

It follows from Proposition 3.2.2 that an optimal approximation of K⋆ as the projection of C always exists. In Section 3.3.1, we show that the estimators obtained using our method converge to such a set if it is also unique.

Example. Suppose K⋆ is the regular q-gon in R^2, and C is the free spectrahedron O_2. Then M_{K⋆,C} uniquely specifies an ℓ_2-ball.

Example. Suppose K⋆ is the unit ℓ_2-ball in R^2, and C is the simplex ∆_q. Then M_{K⋆,C} specifies a centered regular q-gon, but with an unspecified rotation.

In light of our previous remark, a natural question to consider is to obtain a full characterization of the settings in which M_{K⋆,C} defines a unique set. Unfortunately, such an undertaking is difficult, as the set M_{K⋆,C} is highly dependent on the sets K⋆ and C. Based on Proposition 3.2.2, we can provide a simple sufficient condition under which M_{K⋆,C} defines a unique set:

Corollary 3.2.3 Suppose we have K⋆ = A⋆(C) for some A⋆ ∈ L(R^q, R^d). Then the set of minimizers M_{K⋆,C} uniquely defines K⋆; i.e., K⋆ = A(C) for all A ∈ M_{K⋆,C}.

In our earlier example where K⋆ is the unit ℓ_2-ball in R^2 and C is the simplex ∆_q, the set M_{K⋆,C} contains multiple orbits because of non-trivial symmetries in K⋆. Many sets that one encounters in practice do not contain such symmetries, in which case we expect M_{K⋆,C} to define a unique set.

3.2.2 Algebraic Aspects of Our Method

Next, we introduce the necessary tools to view our geometric problem of reconstructing a set as an algebraic task of recovering a linear map.

We begin by describing the identifiability issues that arise with such an approach.

Given a compact convex set C, let g be a linear transformation that preserves C; i.e., g(C) = C. Then the linear map Ag specifies the same convex set:

[Ag](C) = A(g(C)) = A(C) = K.

As such, every projection map A ∈ L(R^q, R^d) is a member of the equivalence class defined by the relation:

A ∼ Ag,  g ∈ Aut(C).   (3.5)

Here Aut(C) denotes the set of linear transformations that preserve C. Since the cone generated by C is full-dimensional, every element of Aut(C) must be an invertible matrix, and hence Aut(C) forms a subgroup of GL(q, R). In particular, the equivalence class O_C(A) := {Ag : g ∈ Aut(C)} specified by (3.5) is also the orbit of A ∈ L(R^q, R^d) under the group action of Aut(C). A further consequence of C being compact and convex is that Aut(C) forms a compact matrix Lie group (see, e.g., Corollary 3.6 of [73]), and hence O_C(A) is also a smooth manifold.

It follows that the space of projection maps L(R^q, R^d) can be viewed as a union of orbits O_C(A). An important property of the Hausdorff distance is that it defines a metric over collections of non-empty compact sets, and hence the collection of all orbits {O_C(A)}_{A ∈ L(R^q, R^d)} endowed with the Hausdorff distance defines a metric space. In our subsequent analysis in Section 3.3, it is useful to view our set regression instance as one of recovering an orbit.

Notice that the function Φ_C(·, P_{K⋆}) is invariant over orbits of A: for every g ∈ Aut(C), we have h_C(A′u) = h_C((Ag)′u). It follows immediately that the set of minimizers M_{K⋆,C} must also be a union of orbits. In Section 3.3.1, we show that the sequence of orbits corresponding to the minimizers of the empirical loss approaches M_{K⋆,C} as the number of measurements increases. Our results in Sections 3.3.2 and 3.3.3 consider the specialized setting where M_{K⋆,C} is a single orbit, in which case the sequence of orbits also converges to M_{K⋆,C}. It is straightforward to see that such a condition is satisfied if K⋆ is a polytope with q extreme points and we choose the lifting set C to be the simplex ∆_q in q dimensions. The situation for projections of the free spectrahedron is more delicate, as the number of extreme points may be infinite. One simple instance in which M_{K⋆,C} is a single orbit is when K⋆ is the projection of O_q under a linear map A, and A maps O_q bijectively to K⋆. The following result provides a procedure to construct further examples of convex sets that are representable as projections of the free spectrahedron, and where the set of minimizers M_{K⋆,C} is a single orbit:

Proposition 3.2.4 Let {K_i}_{i=1}^k ⊂ R^d be a collection of sets with the property that each K_i is expressible as the projection of O_{q_i}, and that M_{K_i, O_{q_i}} is a single orbit. Let K = conv(∪_{i=1}^k K_i). Suppose that the K_i are exposed faces of K. Then K is expressible as the projection of O_q, where q = Σ_i q_i. In addition, M_{K, O_q} is a single orbit.

3.2.3 Analytical Aspects of Our Method

Third, we state the main derivative computations in our paper. These are useful for describing our examples in Section 3.3.2, and our algorithms in Section 3.4.

Given a compact convex set C, the support function h_C(·) is differentiable at x if and only if argmax_{g ∈ C} ⟨g, x⟩ is a singleton, in which case the derivative is given by the unique maximizer (see Corollary 1.7.3 in [124]). We denote the derivative of h_C at x by e_C(x) := ∇_x h_C(x).

Example. Suppose C = ∆_q ⊂ R^q is the simplex. The function h_C(·) is the maximum entry of the input vector, and is differentiable if and only if the maximum is unique, in which case the derivative e_C(·) is the corresponding standard basis vector.

Example. Suppose C = O_p ⊂ S^p is the free spectrahedron. The function h_C(·) is the largest eigenvalue of the input matrix, and is differentiable if and only if the largest eigenvalue has multiplicity one, in which case the derivative e_C(·) is the corresponding rank-one unit-norm positive semidefinite matrix.
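Both derivative maps are straightforward to realize numerically (assuming the differentiability conditions above hold at the input; the helper names are ours):

```python
import numpy as np

def e_simplex(x):
    """e_C for C = Delta_q at x, assuming a unique maximal entry:
    the standard basis vector at the argmax."""
    e = np.zeros_like(x, dtype=float)
    e[np.argmax(x)] = 1.0
    return e

def e_spectrahedron(X):
    """e_C for C = O_p at symmetric X, assuming the top eigenvalue is
    simple: the rank-one PSD matrix v v' for the top unit eigenvector v."""
    w, V = np.linalg.eigh(X)      # eigenvalues in ascending order
    v = V[:, -1]                  # unit eigenvector of the largest eigenvalue
    return np.outer(v, v)         # sign of v is immaterial for v v'
```

For instance, e_spectrahedron applied to diag(1, 3) returns the projector onto the second coordinate, a rank-one matrix of unit trace.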

The following result computes the first derivative of Φ_C(·, P_{K⋆}).

Proposition 3.2.5 Let Q be a probability distribution over the measurement pairs (u, y), and suppose that E_Q{y^2} < ∞. Suppose that A ∈ L(R^q, R^d) is a linear map such that h_C(·) is differentiable at A′u for Q-a.e. u. Then A ↦ Φ_C(A, Q) is differentiable at A with derivative 2 E_Q{ (h_C(A′u) − y) u ⊗ e_C(A′u) }.