Homology Theory - Multimedia Big Data: Content Analysis and Retrieval

4.3 Homology Theory

The fundamental starting point of TDA is the definition and identification of appropriate homology groups [4].

Homology groups are algebraic objects that quantify specific topological properties of a space. They do not capture every topological aspect of a space: two spaces with the same homology groups need not be topologically equivalent. However, two spaces that are topologically equivalent must have isomorphic homology groups. Loosely speaking, homology formalises the properties of groups that are relevant according to specific rules. Furthermore, an important aspect of homology theory is that it provides a theoretical description that can be expressed via computationally efficient techniques, with a variety of data science applications.

The main motivation of homology is the identification of objects that follow specific invariant rules, very much like in the above example. More specifically, assume we have a set of data represented by points in an n-dimensional vector space. How can we extract information in a meaningful way, whilst ensuring both efficiency and accuracy?

Consider, for example, a ball of radius $r$ around each point in the dataset. If $r$ is too small, each ball contains only a single point, and the resulting topology consists of disjoint balls. On the other hand, if $r$ is too large, the induced topology is just one big ball containing all the points. Therefore, we need a radius that avoids such extreme cases and yields enough information to capture some of the geometric structure of the underlying object. Furthermore, every dataset contains some level of noise, which should be addressed, and perhaps ignored, in order to extract only relevant information. Therefore, when choosing the “right” radius, it is important to define balls that capture “good” information. Whatever properties are associated with the corresponding topology, the power of such an approach is that the topology reveals geometric features of the dataset that are independent of how it is represented in lower dimensions, whilst minimising the impact of noise.
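The effect of the radius can be illustrated with a minimal sketch that counts the connected components of the union of balls. The function name `count_components` and the sample points are hypothetical, and the sketch assumes closed Euclidean balls, so two balls overlap exactly when their centres are at most $2r$ apart.

```python
from itertools import combinations
from math import dist


def count_components(points, r):
    """Count connected components of the union of closed balls of radius r."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        # Two closed balls of radius r overlap when the centres are within 2r.
        if dist(points[i], points[j]) <= 2 * r:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


# Hypothetical data: two well-separated clusters of three points each.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
          (10.0, 0.0), (11.0, 0.0), (10.0, 1.0)]
components_by_radius = {r: count_components(points, r) for r in (0.1, 1.0, 10.0)}
```

With these points, a very small radius leaves every point isolated, a very large radius merges everything into one component, and only the intermediate radius recovers the two clusters.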

4.3.1 Simplicial Complexes

The fundamental components of homology are simplicial complexes, which consist of collections of simplices [3]. These are based on specific triangulations of a space. This chapter does not discuss triangulations in detail; for a full description, please refer to [3]. In a nutshell, triangulation is the process of covering a shape with joined, non-overlapping polyhedra, which can be viewed as a “shape approximation”.

Strictly speaking, triangulation does not refer only to 2-dimensional objects: in general, the pieces that make up a triangulation are polyhedra.

Fig. 4.2 An example of triangulation and mesh

Fig. 4.3 A convex hull

Figure 4.2 depicts an example of triangulation, which shows that the process allows an effective approximation of the corresponding shape. Clearly, the quality of the approximation depends on how the polyhedra are defined.

The triangulation of a space $S$ is defined in terms of convex combinations of the points in $S$. A convex combination is an affine combination (the weights sum to one) in which all the weights are non-negative, i.e. $w_i \geq 0$ for all $i$.

The convex hull of $S$, denoted conv $S$, is the set of all convex combinations of points in $S$, as shown in Fig. 4.3.
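As a small illustration of these two definitions, the sketch below computes a convex combination and the convex hull of a planar point set. The function names and sample points are hypothetical, and the hull is computed with Andrew's monotone chain algorithm, which is one standard choice rather than a method prescribed by the text.

```python
def convex_combination(points, weights):
    """Return sum_i w_i * p_i, requiring w_i >= 0 and sum_i w_i = 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    dim = len(points[0])
    return tuple(sum(w * p[k] for w, p in zip(weights, points)) for k in range(dim))


def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Vertices of conv S in counter-clockwise order (2D, monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical point set: a square plus two interior points.
S = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
hull = convex_hull(S)
centre = convex_combination([(0, 0), (4, 0), (4, 4), (0, 4)], [0.25] * 4)
```

Here the interior points are discarded from the hull, and the equal-weight combination of the four corners is the centre of the square, which lies inside conv $S$ as expected.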

Tetrahedra, triangles, edges and vertices are all instances of simplices, since triangulation applies in any dimension. For example, a 0-simplex is a vertex, a 1-simplex is an edge, and a 2-simplex is a triangle, as depicted in Fig. 4.4.

Fig. 4.4 A depiction of a 0-dimensional (a), 1-dimensional (b) and 2-dimensional (c) simplex

Recall that a point set $C$ is convex if, for every pair of points $a, b$ belonging to $C$, the line segment joining $a$ and $b$ is included in $C$ [3].

Simplices and convex polyhedra, in particular, are convex hulls of finite point sets, where $k$-simplices are the “simplest” possible $k$-dimensional polyhedra.

A simplicial complex is defined as a collection of simplices that contains every face of every simplex in it, and in which the intersection of any two simplices is either empty or a face belonging to both of them.

In particular, a $k$-simplex $s$ is said to have dimension $k$. A face of $s$ is a simplex that is the convex hull of a non-empty subset of $P$, the set of vertices of $s$. Faces of $s$ may have any dimension from zero (i.e. vertices) to $k$, as $s$ is itself a face of $s$. Furthermore, the $(k-1)$-faces of $s$ are called facets of $s$, so that $s$ has $k+1$ facets. For instance, the facets of a tetrahedron are its four triangular faces.
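The face and facet counts above can be checked with a short sketch that treats a simplex abstractly as a set of vertex labels. The function names and labels are hypothetical. Only the closure condition (every face is present) is tested, since for abstract vertex sets the intersection of two simplices is automatically a face of both whenever it is non-empty.

```python
from itertools import combinations


def faces(simplex):
    """All non-empty faces of a simplex given by its vertex labels."""
    verts = sorted(simplex)
    return {frozenset(c)
            for size in range(1, len(verts) + 1)
            for c in combinations(verts, size)}


def facets(simplex):
    """The (k-1)-faces of a k-simplex: drop exactly one vertex."""
    verts = sorted(simplex)
    return {frozenset(c) for c in combinations(verts, len(verts) - 1)}


def is_simplicial_complex(simplices):
    """Abstract closure check: every face of every simplex is in the collection."""
    collection = {frozenset(s) for s in simplices}
    return all(faces(s) <= collection for s in collection)


tetrahedron = ("a", "b", "c", "d")          # a 3-simplex
triangle_with_faces = [("a",), ("b",), ("c",),
                       ("a", "b"), ("a", "c"), ("b", "c"),
                       ("a", "b", "c")]
triangle_alone = [("a", "b", "c")]          # missing its edges and vertices
```

A tetrahedron has $3 + 1 = 4$ facets and $2^4 - 1 = 15$ faces in total, and a lone triangle without its edges and vertices fails the closure check.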

4.3.2 Voronoi Diagrams and Delaunay Triangulations

Voronoi diagrams and Delaunay triangulations are specific examples of triangulations [3]. They are based on the concept of distance between points in a space.

In particular, the notion of neighbourhood plays a crucial role. This is, as the word suggests, the set of points within a certain distance of a specific point.

Voronoi diagrams and Delaunay triangulations provide a method to approximate shapes based on the concept of neighbourhood in the discrete domain. More specifically, a Voronoi diagram consists of a collection of cells, or Voronoi cells.

The Voronoi cell $V_x$ of a data point $x$ is the set of points for which no other data point is closer than $x$; each Voronoi cell is a convex polygon. Delaunay triangulations follow from Voronoi diagrams, as they are defined by joining the centres of neighbouring Voronoi cells, as depicted in Fig. 4.5.

Even though Fig. 4.5 refers to 2-dimensional Voronoi diagrams, the same definition applies in any dimension. In such cases, a Delaunay triangulation consists of polyhedra.
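A minimal planar sketch of both constructions follows. The function names and sample sites are hypothetical. Voronoi cell membership is decided by the nearest site, and the Delaunay triangles are found by brute force using the empty-circumcircle criterion, an equivalent characterisation for points in general position that is swapped in here instead of explicitly joining neighbouring cells. The cubic cost is acceptable only for very small inputs.

```python
from itertools import combinations
from math import dist


def voronoi_site(q, sites):
    """Index of the site whose Voronoi cell contains q (ties go to the lowest index)."""
    return min(range(len(sites)), key=lambda i: dist(q, sites[i]))


def circumcircle(a, b, c):
    """Centre and radius of the circle through a, b, c, or None if collinear."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
    uy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
    return (ux, uy), dist((ux, uy), a)


def delaunay_triangles(sites):
    """Triangles (index triples) whose circumcircle has no other site strictly inside."""
    triangles = []
    for i, j, k in combinations(range(len(sites)), 3):
        circle = circumcircle(sites[i], sites[j], sites[k])
        if circle is None:
            continue
        centre, radius = circle
        others = (m for m in range(len(sites)) if m not in (i, j, k))
        if all(dist(centre, sites[m]) >= radius - 1e-9 for m in others):
            triangles.append((i, j, k))
    return triangles


# Hypothetical sites: a triangle with one extra site in its interior.
sites = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0), (2.0, 1.0)]
cell_of_query = voronoi_site((3.5, 0.5), sites)
triangles = delaunay_triangles(sites)
```

For these sites the large outer triangle is rejected because the interior site lies inside its circumcircle, leaving the three triangles that fan out from the interior site.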

Fig. 4.5 A Voronoi and Delaunay triangulation

Fig. 4.6 An example of a Čech complex

4.3.3 Vietoris and Čech Complexes

Another important example of triangulation is the Vietoris complex, a simplicial complex built from a distance $d$ by forming a simplex for every finite set of points that has diameter at most $d$.

In other words, whenever the distance between every pair of points in a finite set is at most $d$, that set spans a simplex of the complex.

A Čech complex, on the other hand, is defined by a set of balls of a specific radius centred at the points. A set of points spans a simplex if the corresponding balls have a non-empty common intersection, as depicted in Fig. 4.6.

Fig. 4.7 An example of a graph-induced complex

The main difference is that a Vietoris complex only considers whether the balls intersect pairwise, whereas a Čech complex requires all the balls of a simplex to share a common point.
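The difference between the two complexes can be seen on three points in the plane. The sketch below is restricted to simplices of dimension at most 2 and uses hypothetical function names and coordinates. It assumes the usual pairing $d = 2r$ between the Vietoris distance and the Čech radius, and it tests the common-intersection condition through the smallest enclosing ball: balls of radius $r$ share a point exactly when their centres fit inside a ball of radius $r$.

```python
from itertools import combinations
from math import dist, sqrt


def vietoris_complex(points, d, max_dim=2):
    """Index tuples of all simplices whose vertex set has diameter at most d."""
    simplices = []
    for size in range(1, max_dim + 2):
        for idx in combinations(range(len(points)), size):
            if all(dist(points[i], points[j]) <= d for i, j in combinations(idx, 2)):
                simplices.append(idx)
    return simplices


def min_enclosing_radius(pts):
    """Radius of the smallest ball enclosing 1, 2 or 3 planar points."""
    if len(pts) == 1:
        return 0.0
    if len(pts) == 2:
        return dist(pts[0], pts[1]) / 2
    a, b, c = sorted((dist(pts[0], pts[1]), dist(pts[1], pts[2]), dist(pts[0], pts[2])))
    if a * a + b * b <= c * c:
        # Right, obtuse or degenerate triangle: the longest edge is a diameter.
        return c / 2
    s = (a + b + c) / 2
    area = sqrt(s * (s - a) * (s - b) * (s - c))   # Heron's formula
    return a * b * c / (4 * area)                  # circumradius of an acute triangle


def cech_complex(points, r):
    """Simplices (up to triangles) whose closed balls of radius r share a point."""
    simplices = []
    for size in (1, 2, 3):
        for idx in combinations(range(len(points)), size):
            if min_enclosing_radius([points[i] for i in idx]) <= r:
                simplices.append(idx)
    return simplices


# Hypothetical points forming a nearly equilateral triangle.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.8)]
r = 1.05
vietoris = vietoris_complex(points, d=2 * r)
cech = cech_complex(points, r)
```

With this radius every pair of balls overlaps, so the Vietoris complex contains the filled triangle, whereas the three balls have no common point and the Čech complex contains only the hollow triangle.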

4.3.4 Graph-Induced Complexes

Although a Vietoris complex tends to be simple to compute and provides an efficient tool for extracting the topology of sampled spaces, its size tends to be very large. In [5], the graph-induced complex is introduced. This approach provides an improvement, as it works on a subsample but retains the same power to capture the topology as the Vietoris complex. Its main advantage is that it only requires a graph connecting the original sample points, from which it defines a complex on the subsample. As a consequence, the overall computation is much more efficient (Fig. 4.7).

4.3.5 Chains

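An important concept in homology is a chain. More specifically, a $p$-chain $c$ in a simplicial complex $K$ is a formal sum of $p$-simplices with coefficients, that is, $c = \sum_i a_i s_i$, where the $s_i$ are the $p$-simplices and the $a_i$ are the coefficients. In particular, if $s = \{v_0, \ldots, v_p\}$, we define its boundary as

$$\partial_p s = \sum_{i=0}^{p} \{v_0, \ldots, \hat{v}_i, \ldots, v_p\},$$

where the hat indicates that the $i$-th vertex of $s$ is omitted (the formula is written here for coefficients in $\mathbb{Z}_2$; for general coefficients the $i$-th term carries the sign $(-1)^i$).

Extending $\partial_p$ to a $p$-chain, we obtain a $(p-1)$-chain, that is, $\partial_p : C_p \to C_{p-1}$. The property that the chain groups exhibit is that they are linked by the boundary maps in the sequence

$$C_p \to C_{p-1} \to C_{p-2} \to \cdots \to C_0 \to C_{-1} \to 0.$$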
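The boundary operator is easy to experiment with when coefficients are taken in $\mathbb{Z}_2$, which is an assumption of this sketch rather than something fixed by the text. A chain is then just a set of simplices, and adding chains is the symmetric difference of sets. The function names and vertex labels are hypothetical.

```python
def boundary_of_simplex(simplex):
    """The (p-1)-faces of a p-simplex, each obtained by omitting one vertex."""
    verts = tuple(sorted(simplex))
    if len(verts) == 1:
        return set()  # the boundary of a vertex is the empty chain
    return {frozenset(verts[:i] + verts[i + 1:]) for i in range(len(verts))}


def boundary(chain):
    """Boundary of a Z_2 chain represented as a set of simplices (frozensets)."""
    result = set()
    for simplex in chain:
        # Addition modulo 2 is the symmetric difference of sets.
        result ^= boundary_of_simplex(simplex)
    return result


triangle = frozenset({"v0", "v1", "v2"})
neighbour = frozenset({"v1", "v2", "v3"})   # shares the edge {v1, v2}

edges = boundary({triangle})                 # the three edges of the triangle
twice = boundary(edges)                      # each vertex appears twice and cancels
two_triangles = boundary({triangle, neighbour})
```

Applying the boundary twice gives the empty chain, and in the chain made of two adjacent triangles the shared edge cancels, so only the four outer edges remain.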

From the document Big Data Analytics and Cloud Computing (pages 70-75)