Mining Strength and Weakness Rules of Cricket Players in the Presence of External Factors
6.2 Learning Strength and Weakness Rules in the Presence of External Factors
In this section, we propose a computational method that identifies the strength and weakness rules of an individual player in the presence of external features. In Section 4.2, the rule learner (the CA method) finds directions in an unconstrained manner: CA's objective is to minimize the sum of squared χ² distances under no constraints. In the presence of external features, the objective is to minimize this χ² distance along weighted linear combinations of the external features. By doing so, the method not only captures the dependency between batting and bowling features as given in Definitions 4.1 to 4.5, but also constrains the search space to weighted linear combinations of external features as given in Definitions 6.1 to 6.5. Redundancy Analysis (RDA) [144] and Canonical Correspondence Analysis (CCA) [28] are the candidate methods for this task. RDA is an extension of PCA that includes external features; similarly, CCA is an extension of CA that includes external features. As we have previously ruled out PCA for our analysis (batting features and bowling features are discrete random variables), we cannot use RDA. CCA is therefore employed to incorporate external features and refine rule learning. CCA projects the data onto a subspace defined by the external features and then performs CA. The steps of CCA are presented separately for batting analysis and bowling analysis in the following sections.
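To make the unconstrained CA objective concrete, the following minimal sketch computes the standardized residuals of a contingency table and checks that their total sum of squares (the total inertia) equals the Pearson χ² statistic divided by the table total. The table `N_demo` and the function name `total_inertia` are hypothetical and serve only as an illustration; they are not taken from the data used in this work.

```python
import numpy as np

def total_inertia(N):
    """Sum of squared standardized residuals of a contingency table."""
    N = np.asarray(N, dtype=float)
    n = N.sum()
    P = N / n                       # correspondence matrix
    r = P.sum(axis=1)               # row masses
    c = P.sum(axis=0)               # column masses
    expected = np.outer(r, c)       # independence model r c^T
    A = (P - expected) / np.sqrt(expected)
    return float((A ** 2).sum())

# Hypothetical 3 x 3 table (rows: bowling features, columns: batting features).
N_demo = np.array([[20, 5, 3],
                   [4, 18, 6],
                   [2, 7, 15]], dtype=float)

inertia = total_inertia(N_demo)

# Pearson chi-square statistic computed directly from expected counts.
n = N_demo.sum()
expected_counts = np.outer(N_demo.sum(axis=1), N_demo.sum(axis=0)) / n
chi2 = float(((N_demo - expected_counts) ** 2 / expected_counts).sum())

print(inertia, chi2 / n)  # the two values agree up to rounding
```

CCA decomposes only the part of this inertia that lies in the subspace spanned by the external features, so the constrained inertia can never exceed the total computed here.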
6.2.1 Batting Analysis through CCA
Refer to Figure 6.1 for the steps involved in batting analysis through CCA. For batting analysis, the relationships between batting features and external features are obtained through the bowling features. To achieve this, the TCM of bowling features × batting features (denoted as N) and the ECM of bowling features × external features (denoted as X) are constructed. CCA first obtains the residual matrix A from N. The projection matrix Q is obtained from X. Then A is projected with Q to get the projected matrix A∗ (= QA), which is the part of A that can be explained by the external features. SVD is applied to A∗ to obtain the bowling principal components (F) and batting principal components (G)
Figure 6.1: Batting Analysis through CCA. (Flow diagram. Inputs: ECM X (n × e, bowling features × external features) and TCM N (n × m, bowling features × batting features). From these, the projection matrix Q (n × n), the residual matrix A (n × m), and the projected matrix A∗ = QA (n × m) are formed. SVD of A∗ yields the row PCs F (n × r, bowling features), the column PCs G (m × r, batting features), and the regression coefficients E of the CCA dimensions on the external features; the first two dimensions of each, F (n × 2), G (m × 2), and E (e × 2), are retained.)
Algorithm 8 CCA Algorithm (Batting Analysis)
Require: $TCM_{bowl \times bat}$ ($N_{I \times J}$) and $ECM_{bowl \times ext}$ ($X_{I \times M}$)
1: Matrix sum: $n = \sum_{i=1}^{I} \sum_{j=1}^{J} N_{ij}$
2: Row masses ($r$): $r_i = N_{i\cdot} / n$, $i = 1, 2, \dots, I$
3: Diagonal matrix: $D_r = \mathrm{diag}(r_1, r_2, \dots, r_I)$
4: Column masses ($c$): $c_j = N_{\cdot j} / n$, $j = 1, 2, \dots, J$
5: Diagonal matrix: $D_c = \mathrm{diag}(c_1, c_2, \dots, c_J)$
6: Correspondence matrix: $P = \frac{1}{n} N$
7: Standardized residuals: $A = D_r^{-1/2} (P - r c^T) D_c^{-1/2}$
8: $I \times I$ projection matrix: $Q = D_r^{1/2} X (X^T D_r X)^{-1} X^T D_r^{1/2}$
9: Project $A$ with $Q$ to obtain the constrained correspondence matrix: $A^* = QA$
10: Singular value decomposition: $A^* = U \Sigma V^T$
11: Principal components of rows: $F = D_r^{-1/2} U \Sigma$
12: Principal components of columns: $G = D_c^{-1/2} V \Sigma$
13: return $F$ and $G$
in the constrained space of external features. Additionally, coordinates for the external features (E) are obtained; these are the standardized regression coefficients from the weighted least-squares regression of the external features on the two principal axes. The CCA algorithm for batting analysis is presented in Algorithm 8. To plot all three feature sets in a two-dimensional plot, a triplot [28] is used, where bowling features and batting features are represented by points and external features are represented by arrows. For the bowling features and batting features, the biplot interpretation holds: the closer two points lie, the more strongly the corresponding features depend on each other. The relationship between the batting features and external features is through the bowling features they have in common.
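The steps of Algorithm 8 can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the implementation used in this work: the function name `cca`, the demo matrices, and the choice to center and scale X by the row masses are all hypothetical. The projection is written in the weighted least-squares form Q = D_r^{1/2} X (X^T D_r X)^{-1} X^T D_r^{1/2}, and E is computed under one reading of the regression described above, namely each standardized external feature regressed on the retained axes with the row masses as weights.

```python
import numpy as np

def cca(N, X, n_dims=2):
    """Sketch of canonical correspondence analysis of table N constrained by X.

    N : (I, J) contingency table, e.g. bowling features x batting features.
    X : (I, M) external features measured on the same rows as N.
    Returns row coordinates F, column coordinates G, external-feature
    coordinates E, and the projection matrix Q.
    """
    N = np.asarray(N, dtype=float)
    X = np.asarray(X, dtype=float)
    P = N / N.sum()                                # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)            # row and column masses
    sr, sc = np.sqrt(r), np.sqrt(c)
    A = (P - np.outer(r, c)) / np.outer(sr, sc)    # standardized residuals

    # Assumption: center and scale X using the row masses as weights.
    Z = X - r @ X
    Z = Z / np.sqrt(r @ Z**2)
    Zw = sr[:, None] * Z                           # D_r^{1/2} Z

    # Q = D_r^{1/2} Z (Z^T D_r Z)^{-1} Z^T D_r^{1/2};
    # pinv is used so that collinear external features do not break the sketch.
    Q = Zw @ np.linalg.pinv(Zw.T @ Zw) @ Zw.T
    U, s, Vt = np.linalg.svd(Q @ A, full_matrices=False)
    U, s, V = U[:, :n_dims], s[:n_dims], Vt[:n_dims].T

    F = U * s / sr[:, None]                        # row principal coordinates
    G = V * s / sc[:, None]                        # column principal coordinates

    # Weighted least-squares regression of each standardized external feature
    # on the retained axes (rows in standard coordinates U / sqrt(r)).
    E = np.linalg.lstsq(U, Zw, rcond=None)[0].T
    return F, G, E, Q

# Hypothetical data: 4 bowling features x 3 batting features, 2 external features.
N_demo = np.array([[20, 5, 3], [4, 18, 6], [2, 7, 15], [9, 9, 9]])
X_demo = np.array([[1.0, 0.2], [0.0, 0.5], [1.0, 0.9], [0.0, 0.4]])
F, G, E, Q = cca(N_demo, X_demo)
```

In a triplot, the rows of `F` and `G` would be drawn as points and the rows of `E` as arrows. The signs of the axes returned by the SVD are arbitrary, so only relative positions and directions are meaningful.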
6.2.2 Bowling Analysis through CCA
Refer to Figure 6.2 for the steps involved in bowling analysis through CCA. For bowling analysis, the relationships between bowling features and external features are obtained through the batting features. To achieve this, the TCM of batting features × bowling features (denoted as N) and the ECM of batting features × external features (denoted as X) are constructed. CCA first obtains the residual matrix A from N. The projection matrix Q is obtained from X. Then A is projected with Q to get the projected matrix A∗ (= QA), which is the part of A that can be explained by the external features. SVD is applied to A∗ to obtain the batting principal components (F) and bowling principal components (G) in the constrained space of external features. Additionally, coordinates for the external features (E) are obtained; these are the standardized regression coefficients from the weighted least-squares regression of the external features on the two principal axes. The CCA algorithm for bowling analysis is presented in Algorithm 9. To plot all three feature sets in a two-dimensional plot, a triplot [28] is used, where bowling features and batting features are represented by points and external features are represented by arrows. For the
Figure 6.2: Bowling Analysis through CCA. (Flow diagram. Inputs: ECM X (m × e, batting features × external features) and TCM N (m × n, batting features × bowling features). From these, the projection matrix Q (m × m), the residual matrix A (m × n), and the projected matrix A∗ = QA (m × n) are formed. SVD of A∗ yields the row PCs F (m × r, batting features), the column PCs G (n × r, bowling features), and the regression coefficients E of the CCA dimensions on the external features; the first two dimensions of each, F (m × 2), G (n × 2), and E (e × 2), are retained.)
Algorithm 9 CCA Algorithm (Bowling Analysis)
Require: $TCM_{bat \times bowl}$ ($N_{I \times J}$) and $ECM_{bat \times ext}$ ($X_{I \times M}$)
1: Matrix sum: $n = \sum_{i=1}^{I} \sum_{j=1}^{J} N_{ij}$
2: Row masses ($r$): $r_i = N_{i\cdot} / n$, $i = 1, 2, \dots, I$
3: Diagonal matrix: $D_r = \mathrm{diag}(r_1, r_2, \dots, r_I)$
4: Column masses ($c$): $c_j = N_{\cdot j} / n$, $j = 1, 2, \dots, J$
5: Diagonal matrix: $D_c = \mathrm{diag}(c_1, c_2, \dots, c_J)$
6: Correspondence matrix: $P = \frac{1}{n} N$
7: Standardized residuals: $A = D_r^{-1/2} (P - r c^T) D_c^{-1/2}$
8: $I \times I$ projection matrix: $Q = D_r^{1/2} X (X^T D_r X)^{-1} X^T D_r^{1/2}$
9: Project $A$ with $Q$ to obtain the constrained correspondence matrix: $A^* = QA$
10: Singular value decomposition: $A^* = U \Sigma V^T$
11: Principal components of rows: $F = D_r^{-1/2} U \Sigma$
12: Principal components of columns: $G = D_c^{-1/2} V \Sigma$
13: return $F$ and $G$
bowling features and batting features, the biplot interpretation holds: the closer two points lie, the more strongly the corresponding features depend on each other. The relationship between the bowling features and external features is through the batting features they have in common.
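The batting and bowling analyses follow the same steps with the roles of rows and columns exchanged. The short sketch below illustrates this under the assumption that the TCM for bowling analysis is the transpose of the TCM for batting analysis; the table values and the helper name `standardized_residuals` are hypothetical. It checks that the unconstrained residual matrices of the two analyses are transposes of each other, so the two analyses differ only in which side is constrained by its ECM.

```python
import numpy as np

def standardized_residuals(N):
    """Standardized residuals D_r^{-1/2} (P - r c^T) D_c^{-1/2} of a table N."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    return (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# Hypothetical TCM for batting analysis: bowling features x batting features.
tcm_bowl_bat = np.array([[20, 5, 3], [4, 18, 6], [2, 7, 15], [9, 9, 9]])

# Assumption: the TCM for bowling analysis is its transpose (batting x bowling).
tcm_bat_bowl = tcm_bowl_bat.T

A_batting = standardized_residuals(tcm_bowl_bat)   # shape (4, 3)
A_bowling = standardized_residuals(tcm_bat_bowl)   # shape (3, 4)

same_up_to_transpose = np.allclose(A_bowling, A_batting.T)
print(same_up_to_transpose)
```

Under this assumption a single routine can serve both analyses: it is called once with the bowling × batting TCM and the bowling × external ECM, and once with the transposed TCM and the batting × external ECM.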