• Tidak ada hasil yang ditemukan

Predictive Simulation of Amount of Claims with Zero Truncated Negative Binomial Distribution

Dalam dokumen Advances on (Halaman 79-84)

The First International Conference on Statistical Methods in Engineering, Science, Economy, and Education 1st SESEE-2015Department of Statistics, Yogyakarta September 19-20 2015

Predictive Simulation of Amount of Claims with Zero Truncated

Kurniawan. H., β€œPredictive Simulation of Amount of Claims with Zero Truncated Negative Binomial Distribution”

2. Literature Review

2.1 Modeling of the Claims

To determine the amount of the claim needs to be done modeling that involves elements of the frequency rnumber of visitsutilization of healthservices, and a large element of the cost for each time of the visit. So the modeling of the claimsare multiplying frequency models withcost model (amount of claims model).Modeling development of the claims such as one part model (OPM), two partmodel (TPM), three part model (TRPM), and four part model (FPM) as well as a comparison of these models can be seen in Duan, et.all[2]. In this paper will be focused on Annual Expenditures Model(AEM) proposed by Frees, et.all[3]which is aspecial case of Two partsModel(TPM) whoseimplementation is still to beimproved, especially if the modeling is done not involve frequency data=0.Thus, in this paper described a solution to overcome itis to perform predictive simulation of the claimsusing Zero Truncated Negative Binomial distribution.

2.1.1 Modeling of Frequency

For Frequency Model (model first part) in the AEM used negative binomial regression with the frequency of occurrence 𝑁𝑖 as the dependent variable and x1i as an independent variable vector with the regression coefficients Ξ²i. This modelcan be describedas follows:

Random variables⁑⁑𝑁𝑖 which is the negative binomial distribution with parameters rand p, have the probability density function

π‘ƒπ‘Ÿ(𝑁 = π‘˜|π‘Ÿ, 𝑝) = (π‘˜ + π‘Ÿ βˆ’ 1π‘Ÿ βˆ’ 1 ) π‘π‘Ÿ(1 βˆ’ 𝑝)π‘˜ (1) in Generalized Linear Model (GLM) , negative binomial probability distribution can be expressed within the parameters parameter πœ‡ and πœ…, with πœ‡ =π‘Ÿ(1βˆ’π‘)

𝑝 , and πœ… =1

π‘Ÿ, so that equation (1) can be expressed as follows [1]:

π‘ƒπ‘Ÿ(𝑁 = π‘˜|πœ‡, πœ…) =Ξ“(π‘˜+

1 πœ…) π‘˜!Ξ“(πœ…1)( 1

1+πœ…πœ‡)

1 πœ…( πœ…πœ‡

1+πœ…πœ‡)π‘˜ (2)

From the equation (2) with 𝐸[𝑁] = πœ‡, π‘‰π‘Žπ‘Ÿ(𝑁) = πœ‡(1 + πœ…πœ‡), so that the equation model from negative binomial regression is :

𝑙𝑛(πœ‡π‘–) = 𝑙𝑛(𝑛) + π’™πŸπ’Šβ€²πœ·πŸ (3)

2.1.2 Modeling of Amount of Claims

Models for amount of claims follow OPM, but conditional on the event ri=1, so the response variable being modeled is (π‘Œπ‘–π‘—|𝑁𝑖 β‰₯ 1). Vector of independent variables is x2, the x2 is asubset of x1. The amount of claims models using mixed linear regression model developed byMcCulloch[5], which is expressedby the following equation[3].

ln(π‘Œπ‘–π‘—) = 𝛼𝑖+ π’™πŸπ’Šβ€² 𝜷𝟐+ 𝑁𝑖𝛽𝑁+ πœ€π‘–π‘— (4)

with 𝛼𝑖 = intrafamily correlation atorau Intraclass correlation (ICC); πœ€β‘~𝑁(0, πœŽπœ€2)i.i.d between observations; and 𝛽𝑁⁑~𝑁(0, πœŽπ›½2𝑁)i.i.d between family

2.2 Amount of Claims Prediction with ZTNB

To predict of the health care expenditures or amount of claims, can be derived from equation (4), so that would be obtained:

π‘ŒΜ‚π‘–π‘— = exp⁑(𝛼̂𝑖+ π’™πŸπ’Šβ€² πœ·Μ‚πŸ+ 𝑁𝑖𝛽̂𝑁) (5)

the health care expenditures Predictionfor one year(one period) using AEM is a function of 𝑁𝑖 𝑆̂𝑖(𝑁𝑖) = π‘ŒΜ‚π‘–π‘—π‘π‘– (6)

However, this prediction can not be done at the beginning of the year because Ni is not unknown for one year of observation. To overcome these problemsit must find the expected value of Ni from amount of claims for one year (a period) showed the following results

𝑆̂𝑖 = exp(𝛼̂𝑖+ π’™πŸπ’Šβ€² πœ·Μ‚πŸ) . 𝑀𝑁′𝑖(𝛽̂𝑁) (8)

Kurniawan. H., β€œPredictive Simulation of Amount of Claims with Zero Truncated Negative Binomial Distribution”

ISBN : 978-602-99849-2-7

Copyright Β© 2015by Advances on Science and Technology

𝑀𝑁′𝑖(𝛽̂𝑁)⁑known as the first derivative of the moment generating functionNiat the point 𝛽̂𝑁. As the random variableNi is adiscrete random variablewitha negative binomial distribution which classes (a, b, 1), So thatthe first derivative ofthe moment generating functionaboveis:

𝑀𝑁′𝑖(𝛽̂𝑁) = βˆ‘ π‘π‘˜ 𝑖. π‘’π‘π‘–π›½Μ‚π‘π‘ƒπ‘ŸΜ‚π‘–(𝑁𝑖 = π‘˜), π‘˜ = 1,2, … (9)

π‘ƒπ‘ŸΜ‚π‘–(𝑁𝑖 = π‘˜) is the probability obtained from the zero truncated negative binomial distribution[4]. There are problems in calculating probability with zero truncated negative binomial distribution because observational data that should be complete within one year of observation, is precisely not fulfilled. This happens because the new data collected in the current year, so it is unknown how many times a patient will use health care services for one year or the period of observation. Therefore we need the simulation of frequency during the period of observation for zero truncated negative binomial so the claims can be predicted. The process of generating random numbers distributed ZTNB can be seen in the appendix.

3. Material & Methodology

The data used in this research is data claims paid by the insurance companyXYZ in the period of filing and payment of claims in 2012. Data claims paid by the insurance company XYZ is divided into three groups of data. Due to the first data group has fewer exogenous variables then used the data group II and group III.

Tabel 3.1 Observation Data for Each Group

Group I Group II Group III

Number of Participants 16377 7881 858

Participants filing a claim 1418 865 26

Many claims were filed 1844 1305 53

Then performed the cleaning process the data with the following criteria which the participant group I excluded due to differences in field with group II and III, the participants make a claim one or more but not approvedby the company, the participants wer enot recorded the data of age, participants were not recorded the type of illness. Frequency of Claims has a value of0,1,2,3. the composition of the variable frequency ofclaims(N) ieN=0(89.9%) andNβ‰₯1(10.1%).

Tabel 3.2Frequency of Claims

N 0 1 2 3 4 5 6 7 8 24 25 28 Total

Frequency 7854 642 158 43 22 8 5 2 2 1 1 1 8739

Percentage 89.9 7.3 1.8 0.5 0.3 0.1 0.1 0 0 0 0 0 100

Modeling of the claims is modeling based on the frequency of claims, assuming a negative binomial distribution. So that the data collected should be pass of Goodness of fit test and parameter estimation for negative binomial distribution. By using Matlab for the negative binomial distribution with parameters r and Ο€ obtained estimation results r = Ο€ = 0.148438 and 0.489505. in Generalized Linear Model (GLMs), the parameters of the binomial distribution negative is ΞΌ and ΞΊ, where E (N) = ΞΌ = (r (1-Ο€)) / Ο€ and ΞΊ = 1 / r with Var (N) = ΞΌ ( 1 + ΞΊΞΌ). So it can be calculated value E [N] = 0.154709 and Var [N] = 0.316051. by looking at the value of E [N] <Var [N], then the initial assumption that the data Frequency of Claims was proven as negative binomial distribution. But to test this assumption, then tested the Goodness of Fit (GoF) of the negative binomial distribution. GoF test performed using Pearson chi-square test. From the results of the test by removing extreme data obtained p-value = 0.072 which states that the data frequency of claims following the negative binomial distribution.

In addition tothe frequency of claims, large claims also have to meet the criteria of normality, so can meet the model assumption ofa linear mixed modeling. However, because the distribution of the claims are always skewed to the right, then the transformation process logarithm so data

74

Kurniawan. H., β€œPredictive Simulation of Amount of Claims with Zero Truncated Negative Binomial Distribution”

transformation results meet the criteria of normality(p-value = 0.066usingthe Kolmogorov-Smirnov testtest). After all the data fulfill to the modeling assumptions, modeling both frequency and large claims, the next step isto predict the frequency and the prediction of the claims.

4. Results and Discussion

Predictive simulation begins by generating random numbers for the frequency of claims with Zero truncated negative binomial distribution (ZTNB) withk=1,2,3, .... This is because the claims can only be modeled if the claim frequencyof at least once for a predetermined type of disease is stroke, high blood pressure, diabetes and kidney failure.

Tabel 3.3 The Comparative of Amount of Claims between Simulation Result and Real Data

Obs N

hasil simulasi untuk satu tahun kedepan Diprediksi dari data Real

Si_hat Ln_y Y Si

1 1 6,301,100.38 15.62192 6088566 6,088,566.00

2 1 3,686,969.84 14.8437 2796000 2,796,000.00

3 2 17,782,796.29 15.7994 7271000 14,542,000.00

…. …. …. …. …. ….

22 1 4,162,755.40 15.35453 4660000 4,660,000.00

23 1 7,535,073.98 16.03074 9163500 9,163,500.00

24 1 2,400,855.12 13.44445 690000 690,000.00

25 1 5,317,410.86 15.39922 4873000 4,873,000.00

26 2 18,052,021.48 14.4226 1835080 3,670,160.00

Simulations conducted to determine the prediction of amount of the claims for the next year, if the data is observed not until less than one year, therefore the data observed frequency of claims is random. Here are the results of a simulation conducted on the number of participants of the third group of participants who filed at least claims assumed that the data has not been up to one year of observation. From the table above can be seen the results of simulations carried out by generating 10,000 random numbers distributed Z T N Bandreplication processes performed10 times(r =10). The simulation processwill produce better results if done generating random numbers>10000 and replication more reproduced so that it will get better predictionresults.

5. Conclusion

From the above explanation can be concluded as follows:

a. Annual expenditure Model is used if the data is not completewithin one year of observation, so as to determine the prediction of the claims at the end of years of observation required a simulation predictions.

b.

Predictive simulation conducted on data with a frequency β‰₯ 1 claims so that the necessary modifications to the data frequency of claims negative binomial distribution becomes Zero Truncated NegativeBinomial.

References

[1] De Jong. P., Heller, G. Z. Generalized Linier Models for Insurance Data. Cambridge University Press.

2008.

[2] Duan, N. H. , Manning, W. G. Jr., Morris, C. N., Newhouse, J. P. A Comparison of Alternative Models for Demand for Medical Care. Journal of Business & Economic Statistics 1(2): 115–126 (1983).

[3] Frees, E.W., Jie. G., Rosenberg, M. A. Predicting The Frequency and Amount of Health Care Expenditures. North American Actuarial Journal. Vol 15. No3: 377-392 (2011).

Kurniawan. H., β€œPredictive Simulation of Amount of Claims with Zero Truncated Negative Binomial Distribution”

ISBN : 978-602-99849-2-7

Copyright Β© 2015by Advances on Science and Technology

[4] Klugman, S. A., Panjer, H. H., Wilmot, G. E. Loss Models from Data to Decision. John Willey & Sons.

2004.

[5] McCulloch, Charles. E, Shayle R., Searle. Generalized, Linier, and Mixed Models. Jhon Willey and Sons. 2001.

[6] Tse, YiuKuen. Nonlife Actuarial Models: Theory, Methods and Evaluation. Cambridge University Press. 2009.

Appendix

Generating random numbers distributed ZTNB

1. Determine the distribution parameter base (base distribution) of the distribution that will be raised is the negative binomial distribution with parameters (r, Ξ²).

2. Determine the CDF fromTruncated ZeroNegativeBinomial

3. Generate arandom number from distribution U (0,1) method Mixed-congruential method as the random number that will be raised from the distribution ZeroTruncated NegativeBinomial 4. Determine the area of the definition of a random variable X distributed ZeroTruncated

NegativeBinomial

5. Identify the location ofthe value of the random numberU(0.1) in thearea ofthe definition ofa random variable X. and Therandom variable X with distribution Zero Truncated Negative Binomial success fully obtained

76

The First International Conference on Statistical Methods in Engineering, Science, Economy, and Education 1st SESEE-2015Department of Statistics, Yogyakarta September 19-20 2015

Deterministic and Probabilistic Seismic Hazard Risk Analysis in Bantul

Dalam dokumen Advances on (Halaman 79-84)