
Mining Strength and Weakness Rules of Cricket Players

4.6 Baseline Comparison

square superimposition of one biplot onto a reference biplot. The lower the sum of squared residuals, the greater the similarity between the two biplots. The biplots obtained from the training data and the test data are compared using the Procrustes test. When the two biplots are similar, the proposed method is considered reliable.
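The superimposition described above can be illustrated with a minimal sketch for two-dimensional point sets, such as the coordinates of matching points in two biplots. It assumes the standard Procrustes procedure (centering, scaling to unit norm, then the best rotation or reflection and scaling), which may differ in detail from the test used in this work. The function names and the toy coordinates are hypothetical.

```python
import math


def _standardize(points):
    """Center a 2-D point set on the origin and scale it to unit Frobenius norm."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))
    if norm == 0:
        raise ValueError("point set must contain at least two distinct points")
    return [(x / norm, y / norm) for x, y in centered]


def procrustes_ss(reference, target):
    """Sum of squared residuals after superimposing `target` on `reference`.

    Both inputs are equal-length lists of (x, y) pairs in matching order.
    The result lies in [0, 1]; 0 means the shapes are identical up to
    translation, scaling, rotation, and reflection.
    """
    if len(reference) != len(target):
        raise ValueError("point sets must have the same number of points")
    p = _standardize(reference)
    q = _standardize(target)
    # Entries of the 2x2 cross-product matrix between the two point sets.
    m11 = sum(a[0] * b[0] for a, b in zip(p, q))
    m12 = sum(a[0] * b[1] for a, b in zip(p, q))
    m21 = sum(a[1] * b[0] for a, b in zip(p, q))
    m22 = sum(a[1] * b[1] for a, b in zip(p, q))
    # Squared optimal fit for a pure rotation and for a rotation plus reflection.
    rotation = (m11 + m22) ** 2 + (m21 - m12) ** 2
    reflection = (m11 - m22) ** 2 + (m21 + m12) ** 2
    return max(0.0, 1.0 - max(rotation, reflection))


# Toy coordinates (not taken from the thesis data).
train_biplot = [(0, 0), (1, 0), (1, 1), (0, 1)]
# The same square rotated by 90 degrees, doubled in size, and shifted.
test_biplot = [(3, 5), (3, 7), (1, 7), (1, 5)]
# A rectangle, which no rotation or scaling can map onto the square.
other_biplot = [(0, 0), (2, 0), (2, 1), (0, 1)]

same_shape = procrustes_ss(train_biplot, test_biplot)
different_shape = procrustes_ss(train_biplot, other_biplot)
```

Here `same_shape` is zero up to floating-point error, while `different_shape` is strictly positive, mirroring the reading that a lower value indicates more similar biplots.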

Table 4.6 presents the sum of squared residuals for each year ($\Delta^{2}_{12}$ year) and the average sum of squared residuals ($\Delta^{2}_{12}$ avg) for eight batsmen. The lowest average sum of squared residuals is 0.15, for batsman Joe Root, implying that his training and test data sets are the most similar. The highest average sum of squared residuals is 0.44, for batsman David Warner, implying that his training and test data sets are the most dissimilar.

In both extrinsic and intrinsic validation, the derived rules score highly, which suggests that the proposed method is accurate in mining an individual player's strength and weakness rules. The data and results generated during the validation process can be accessed at https://www.dropbox.com/sh/sa4xmz3pqf6y9np/AABf5TzE1cbU_RgrFj7pxG3pa?dl=0.

(a) Strength Word Cloud. (b) Weakness Word Cloud.

Figure 4.9: Bigram Word Clouds for Batsman Steve Smith.

Though the obtained rules are interpretable, contradictions are observed in the strength and weakness rules constructed with the word cloud based visualization method. For instance, the first strength rule and the first weakness rule for Steve Smith are identical, yet the same rule cannot be interpreted as a strength and a weakness simultaneously. It is well known that, in cricket, bowlers frequently bowl outside the off stump. This makes the constructed rule trivial, and the confidence in such rules will be low in practice, which limits the use of word cloud based visualization methods. Such contradictions are not observed when the proposed method is followed for rule construction, as is evident from the strength and weakness rules presented for Steve Smith through the biplot. Bigram word clouds for a few other players are provided at https://www.dropbox.com/s/srim227j149njjp/Bigram%20Wordclouds.zip?dl=0.

4.6.2 Strength and Weakness Association Rules

We apply the Association Rule Mining (ARM) technique to construct rules that account for a player’s strengths and weaknesses using short text commentary data. We analyze the association of strength/weakness exhibited by batsmen with the type of delivery they have faced. In essence, we investigate the bowling features that may be associated with the batsman’s batting features.

Agrawal et al. [123] introduced ARM to discover interesting co-occurrences between products in supermarket data (market basket analysis). ARM extracts frequent sets of items that are purchased together and generates association rules of the form A ⇒ B, where A and B are disjoint sets of items and B is likely to be purchased whenever A is purchased [124, 125, 126]. ARM is widely used in many domains, such as health care [127, 128, 129], financial transactions [130, 131], and retail [132, 133]. ARM has been applied in the sports domain as well [134, 135, 136]. In cricket, Raj et al. [137] used ARM to find associations between match factors, such as toss outcome and playing conditions, and the game's outcome. UmaMaheswari et al. [138] modeled an automated framework to identify correlations among play patterns in cricket. Using their model, they learned association rules for an individual player and generic rules for all players. To our knowledge, ARM has not been applied to sports text commentary to identify player-specific rules.

For player-specific analysis, we extract a subset of short text commentaries using the filter tuple ⟨Player, Opponent Player, Time, Type⟩. As discussed in Chapter 3, each delivery is represented as a set of extracted bowling and batting features, similar to the set of items representing a transaction in ARM (example: fullLength legStump fast attacked). This is the input for ARM.
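As an illustration of this step, the following minimal sketch filters deliveries with the tuple ⟨Player, Opponent Player, Time, Type⟩ and returns one item set per delivery. The `Delivery` record, its field names, the reading of Time as a calendar year, and the toy data are assumptions made for the example and are not the data structures used in this work.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Delivery:
    """One delivery with its extracted features (hypothetical record layout)."""
    batsman: str
    bowler: str
    year: int
    match_type: str
    features: FrozenSet[str]


def select_transactions(
    deliveries: Iterable[Delivery],
    player: str,
    opponent: Optional[str] = None,
    year: Optional[int] = None,
    match_type: Optional[str] = None,
) -> List[FrozenSet[str]]:
    """Apply the filter tuple for a batsman; None means 'no restriction'."""
    selected = []
    for d in deliveries:
        if d.batsman != player:
            continue
        if opponent is not None and d.bowler != opponent:
            continue
        if year is not None and d.year != year:
            continue
        if match_type is not None and d.match_type != match_type:
            continue
        # Each delivery becomes one ARM transaction: a set of feature items.
        selected.append(d.features)
    return selected


# Toy deliveries with placeholder player names (not real commentary data).
deliveries = [
    Delivery("Batsman X", "Bowler A", 2016, "Test",
             frozenset({"fullLength", "legStump", "fast", "attacked"})),
    Delivery("Batsman X", "Bowler B", 2017, "Test",
             frozenset({"goodLength", "offStump", "swing", "beaten"})),
    Delivery("Batsman X", "Bowler A", 2017, "ODI",
             frozenset({"shortLength", "slow", "attacked"})),
    Delivery("Batsman Y", "Bowler A", 2017, "Test",
             frozenset({"fullLength", "offStump", "defended"})),
]

test_transactions = select_transactions(deliveries, "Batsman X", match_type="Test")
versus_bowler_a = select_transactions(deliveries, "Batsman X", opponent="Bowler A")
```

Leaving a component of the tuple unset corresponds to an analysis against all opponents, all years, or all match types.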

We provide computational definitions of the strength and weakness rules and then use ARM to construct player-specific rules given the definitions.

Definition 4.6 Rule. An association rule A ⇒ B in which A comprises a set of bowling features and B comprises a batting feature.

Definition 4.7 Strength Rule of Batsman. A rule (Definition 4.6) in which B, the batting feature of the player (batsman), corresponds to attacked.

Definition 4.8 Weakness Rule of Batsman. A rule (Definition 4.6) in which B, the batting feature of the player (batsman), corresponds to beaten.

Definition 4.9 Strength Rule of Bowler. A rule (Definition 4.6) in which B, the batting feature of the opponent players (batsmen), corresponds to beaten.

Definition 4.10 Weakness Rule of Bowler. A rule (Definition 4.6) in which B, the batting feature of the opponent players (batsmen), corresponds to attacked.
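Definitions 4.7 to 4.10 amount to a lookup on the consequent B and on whose perspective is analyzed. The sketch below encodes that lookup; the function name, the perspective labels, and the "other" category for consequents such as footwork are assumptions introduced for illustration.

```python
def classify_rule(consequent: str, perspective: str) -> str:
    """Label a rule A => B by its consequent B.

    perspective is "batsman" or "bowler"; the return value is
    "strength", "weakness", or "other".
    """
    if perspective not in ("batsman", "bowler"):
        raise ValueError("perspective must be 'batsman' or 'bowler'")
    if consequent == "attacked":
        # The batsman attacking is good for the batsman, bad for the bowler.
        return "strength" if perspective == "batsman" else "weakness"
    if consequent == "beaten":
        # The batsman being beaten is bad for the batsman, good for the bowler.
        return "weakness" if perspective == "batsman" else "strength"
    # Any other batting feature (for example footwork) is neither.
    return "other"


labels = {
    ("attacked", "batsman"): classify_rule("attacked", "batsman"),
    ("beaten", "batsman"): classify_rule("beaten", "batsman"),
    ("beaten", "bowler"): classify_rule("beaten", "bowler"),
    ("attacked", "bowler"): classify_rule("attacked", "bowler"),
}
```

The symmetry is visible in `labels`: the same consequent yields opposite labels for the batsman and for the bowler.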

To construct the strength and weakness association rules, we use the Apriori algorithm [123].
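For readers unfamiliar with Apriori, the following is a minimal, unoptimized sketch of its level-wise search for frequent item sets. It is not the implementation used in this work; the function name and the toy transactions are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    if n == 0:
        return {}
    frequent = {}
    # Level 1 candidates: every individual item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


# Toy transactions (not real commentary data).
toy = [
    {"shortlength", "slow", "attacked"},
    {"shortlength", "slow", "attacked"},
    {"goodlength", "swing", "beaten"},
    {"shortlength", "fast", "defended"},
]
frequent_itemsets = apriori(toy, min_support=0.5)
```

Rules are then formed by splitting each frequent item set into an antecedent of bowling features and a consequent batting feature.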

The strength of the association A ⇒ B depends on three parameters: (i) Support indicates how frequently A and B appear together in the dataset; (ii) Confidence indicates how often the rule is true, i.e., the conditional probability of B given A; and (iii) Lift is the ratio of the probability of B when A is known to be present to the probability of B without any knowledge of A. A lift value greater than 1 signifies a positive association between A and B. In this work, the support threshold is varied from 0.001 to 0.1 and the confidence threshold is set at 0.5. The resulting rules give insights into the player's strengths and weaknesses.
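The three measures can be computed directly from transaction counts: support is the fraction of transactions containing both A and B, confidence is that count divided by the count of transactions containing A, and lift is confidence divided by the fraction of transactions containing B. The sketch below does this on toy data; the function name and the transactions are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent => consequent."""
    a = frozenset(antecedent)
    b = frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)         # transactions with A
    n_b = sum(1 for t in transactions if b <= t)         # transactions with B
    n_ab = sum(1 for t in transactions if (a | b) <= t)  # transactions with A and B
    support = n_ab / n if n else 0.0
    confidence = n_ab / n_a if n_a else 0.0
    lift = confidence / (n_b / n) if n_b else 0.0
    return support, confidence, lift


# Toy transactions (not real commentary data).
toy = [
    frozenset({"shortlength", "slow", "attacked"}),
    frozenset({"shortlength", "slow", "attacked"}),
    frozenset({"shortlength", "slow", "defended"}),
    frozenset({"goodlength", "swing", "beaten"}),
    frozenset({"fulllength", "fast", "attacked"}),
]

support, confidence, lift = rule_metrics(toy, {"shortlength", "slow"}, {"attacked"})
# Keep the rule only if it clears the chosen thresholds.
keep_rule = support >= 0.1 and confidence >= 0.5
```

In this toy example the rule holds in 2 of 5 transactions (support 0.4), in 2 of the 3 transactions containing the antecedent (confidence of about 0.67), and its lift is 10/9, slightly above 1.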

The results of the strength and weakness analysis for batsman Steve Smith against all bowlers in Test matches are presented in Table 4.7. The strength rule of Steve Smith is: Smith attacks slow, short-length deliveries. The weakness rule of Steve Smith is: Smith gets beaten on good-length, swinging deliveries. Both the strength rule and the weakness rule are similar to the rules obtained using the proposed method.

Similarly, we can obtain rules other than strengths and weaknesses by choosing other batting features, such as footwork, shot area, and outcome, as the consequent of the association rule. We present these rules for batsman Steve Smith in Table 4.7. Similar strength and weakness analyses can be performed for bowlers as well. The code and results of the ARM analysis for more than 250 players are provided at https://www.dropbox.com/sh/dq981ub7gdh3n04/AADIhE6cph8sVxXgL-e6Bt6ba?dl=0.

| Association Rule (A ⇒ B) | Support (%) | Confidence (%) | Lift |
|---|---|---|---|
| {shortlength, slow} ⇒ {attacked} | 2.6 | 72.1 | 1.7 |
| {goodlength, swing} ⇒ {beaten} | 0.1 | 37.9 | 3.9 |
| {offstump} ⇒ {defended} | 26.8 | 59.8 | 1.3 |
| {fast, shortlength} ⇒ {backfoot} | 9.2 | 91.4 | 1.9 |
| {fulllength, offstump} ⇒ {frontfoot} | 8.5 | 81.9 | 1.6 |
| {fast, offstump} ⇒ {0run} | 22.8 | 82.2 | 1.2 |
| {fast, offstump} ⇒ {squareoff} | 11.2 | 50.4 | 1.4 |
| {legstump, slow} ⇒ {squareleg} | 17.8 | 86.9 | 2.3 |
| {legstump, movein, spin} ⇒ {fineleg} | 0.01 | 100 | 23.7 |

Table 4.7: Identified Strength and Weakness Association Rules for Batsman Steve Smith.

(a) Batting Analysis. (b) Bowling Analysis.

Figure 4.10: Visualization of Player’s Strength and Weakness Rules.