Validation - Mining Strength and Weakness Rules of Cricket Players

Mining Strength and Weakness Rules of Cricket Players

4.5 Validation

Feature Fast Bowlers Spin Bowlers

Response Attacks full-length deliveries Attacks short-length deliveries

Gets beaten on moving-in deliveries Gets beaten on moving-away deliveries Outcome Gets out on moving-in deliveries Gets out on moving-away deliveries Footwork Plays moving-in deliveries on the back

foot

Plays moving-in deliveries on the back foot

Shot-area Plays full-length deliveries to the long- on and long-off areas of the field

Plays full-length deliveries to the long- on and long-off areas of the field

Table 4.4: Rules Obtained from Steve Smith’s Batting Analysis Against Fast Bowling and Spin Bowling.

Figure 4.8: A Section of the Strategy Sheet.

Batsman Commonality Percentage (%)

Angelo Mathews 80.00

Asela Gunaratne 75.00

Dhananjaya De Silva 75.00

Kusal Mendis 75.00

Niroshan Dickwella 66.66

Dilruwan Perera 60.00

Dimuth Karunaratne 50.00

Upul Tharanga 50.00

Table 4.5: Extrinsic Validation Using Strategy Sheet.

We obtain the strength and weakness rules for all the mentioned batsmen using the proposed method. Refer to Table 4.5 which lists the overlap (Commonality Percentage) of the obtained rules with the rules listed in the strategy sheet. We observe that forAngelo Mathews, the overlap is 80%.

For other batsmen, also we see a high degree of overlap.

Expert Analysis

For expert analysis, the rules identified by using the proposed method are verified against expert sources. One video blog from a domain expert in which the expert shares the strength and weakness rules of Steve Smith is identified and its analysis is presented below. Sanjay Manjrekar is a former Indian cricket player with significant domain experience and expertise. He is a regular television commentator for international cricket matches. He has published a video²with ESPNCricInfo titled -“What is Steve Smith’s weakness?" on 1^st June, 2017. In this video, the author has provided one strength rule and one weakness rule. The strength and weakness rules are extracted from the video.

These rules are then compared with the ones obtained from the proposed method. Following is a transcript of selective parts of the video, along with the rules obtained by the proposed method.

1. Weakness Rule Validation:

• Proposed Method: “Steve Smith attacks deliveries that are bowled on themiddle stump".

• Expert Analysis (Video time0.22⁰⁰): “Bowlers tend to attack him on the stump (middle stump line), but then his wonderful angle of the bat carves everything on the leg side.

2https://www.espncricinfo.com/video/what-is-steven-smith-s-weakness-1100538

Batsman ∆²₁₂

2013 ∆²₁₂

2014 ∆²₁₂

2015 ∆²₁₂

2016 ∆²₁₂

2017 ∆²₁₂

2018 ∆²₁₂_avg

Joe Root 0.09 0.18 0.12 0.13 0.17 0.19 0.15

Cheteshwar Pujara 0.15 0.14 0.37 0.10 0.16 0.18 0.18

Steve Smith 0.35 0.26 0.25 0.18 0.21 0.39 0.27

Dimuth Karunaratne 0.68 0.28 0.15 0.25 0.20 0.30 0.31

Dean Elgar 0.76 0.58 0.17 0.19 0.14 0.21 0.34

Virat Kohli 0.18 0.71 0.24 0.28 0.41 0.20 0.34

Kane Williamson 0.28 0.16 0.38 0.60 0.42 0.42 0.38

David Warner 0.65 0.47 0.38 0.43 0.19 0.51 0.44

Table 4.6: Intrinsic Validation (K-fold Cross Validation).

Gets lot of runs on the leg side". In other words, Steve Smith scores lot of runs on the leg side for balls bowled on the middle stump.

2. Strength Rule Validation:

• Proposed Method: “Steve Smith gets beaten on the move-away and move-in (seam) deliveries".

• Expert Analysis (Video time 0.49⁰⁰): “Bowl seamers as much as possible".

Both the strength and weakness rules provided by the expert are validated with the proposed method.

4.5.2 Intrinsic Validation

For intrinsic validation, we use the k-fold cross validation strategy, in which text commentary data is divided into k (number of years) subsets, and the holdout method is repeated k times. Each time, one of the k subsets (one year of text commentary data) is used as a test set with the remaining data as a training set. Finally, the average error across all k trials is computed.

The collected text commentaries span over thirteen years, from May 2006 to April 2019. As the data for the year 2019 is incomplete, we have not considered 2019 for the intrinsic validation.

The top-ranked batsmen in Test cricket are considered for intrinsic validation. Most of the batsmen in this list started playing Test cricket regularly from the year 2013. Hence, for intrinsic validation, six years (2013 to 2018) are considered. Each time, one year of text commentaries are used as a test set and remaining text commentaries as a training set. For example, if text commentaries for 2013 are selected as a test set, then the remaining text commentaries (from 2014 to 2018) are selected as the training set. This method is repeated for all six years.

The proposed rule mining method is applied to the training data, and a training-biplot is obtained. Similarly, the proposed rule mining method is applied to the test data, and a test-biplot is obtained. Procrustes analysis [122] is used to compare these two biplots. The main idea is to minimize the sum-of-squared differences between the two biplots. This test performs the least

square superimposition of one biplot to another reference biplot. The lower the sum of the squared residual, the greater the similarity between the two biplots. The biplots obtained from the training data and test data are compared using the procrustes test. When there is a similarity in the obtained biplots, it is considered that the proposed method is reliable.

Table 4.6 presents the sum of squared residuals for each year (∆²₁₂

year) and average sum of squared residuals (∆²₁₂

avg) for eight batsmen. The lowest average sum of the squared residual is 0.15 for batsman Joe Root, implying his training and test data sets have the highest similarity.

The highest average sum of the squared residual is 0.44 for batsman David Warner, implying his training and test data sets have the highest dissimilarity.

In both extrinsic and intrinsic validation, we obtain high values in terms of the derived rules suggesting the accuracy of the proposed method in mining individual player’s strength and weakness rules. The data and results generated during the validation process can be accessed at https:

//www.dropbox.com/sh/sa4xmz3pqf6y9np/AABf5TzE1cbU_RgrFj7pxG3pa?dl=0.

Dalam dokumen Learning Player-specific Strategies Using Cricket Text Commentary (Halaman 79-82)