September 2016
Rule Generation from NIS in SQL (No.5)
A case study: the Congressional Voting Records data set (with missing values) from the UCI Machine Learning Repository (experiment environment: Windows desktop PC, 3.3 GHz)
Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
(1) The Congressional Voting Records data set consists of the following.
435 objects, 17 attributes:
1. Class Name: 2 (democrat, republican) (Decision attribute)
2. handicapped-infants: 2 (y,n)
3. water-project-cost-sharing: 2 (y,n)
4. adoption-of-the-budget-resolution: 2 (y,n)
5. physician-fee-freeze: 2 (y,n)
6. el-salvador-aid: 2 (y,n)
7. religious-groups-in-schools: 2 (y,n)
8. anti-satellite-test-ban: 2 (y,n)
9. aid-to-nicaraguan-contras: 2 (y,n)
10. mx-missile: 2 (y,n)
11. immigration: 2 (y,n)
12. synfuels-corporation-cutback: 2 (y,n)
13. education-spending: 2 (y,n)
14. superfund-right-to-sue: 2 (y,n)
15. crime: 2 (y,n)
16. duty-free-exports: 2 (y,n)
17. export-administration-act-south-africa: 2 (y,n)
288 missing values
Each condition attribute takes the value y (yes) or n (no), so each missing value has two candidates.
There are 2^288 ≈ 4.97 × 10^86 derived DISs.
Revised count: 392 missing values (the obtained rules are not affected by this revision at all).
There are 2^392 ≈ 1.01 × 10^118 > 10^100 derived DISs.
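The derived-DIS counts above follow from one fact: every missing value independently takes one of its candidate values. The short sketch below reproduces those orders of magnitude using exact integer arithmetic. The helper names `derived_dis_count` and `scientific` are hypothetical and used only for illustration.

```python
# Sketch: count derived DISs when every missing value has the same number of candidates.
def derived_dis_count(num_missing: int, candidates_per_missing: int = 2) -> int:
    """Each missing value independently takes one of its candidates (here y or n)."""
    return candidates_per_missing ** num_missing

def scientific(n: int) -> str:
    """Format a large positive int as 'm.mm x 10^e' using exact integer arithmetic."""
    exponent = len(str(n)) - 1        # number of decimal digits minus one
    mantissa = n / 10 ** exponent     # int / int true division is correctly rounded
    return f"{mantissa:.2f} x 10^{exponent}"

for k in (288, 392):
    print(f"{k} missing values -> 2^{k} = {scientific(derived_dis_count(k))} derived DISs")
```

Python integers are unbounded, so `2 ** 392` is computed exactly before it is formatted.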
(2) CSV format: Table 1 in nrdf(congress_test)
First, we store the data set in the following table. The symbol ‘?’ denotes a missing value.
We then replace each ‘?’ with the set of all possible values, i.e., non-deterministic information. In this data set, each ‘?’ becomes {y, n}.
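The replacement of ‘?’ by a set of candidate values, and the way a single NIS stands for many derived DISs, can be illustrated on a tiny table. This is a minimal sketch that assumes a hypothetical three-row table with attributes named like the ones used later (‘a1’ is the decision). It does not reproduce the author's SQL.

```python
from itertools import product

# Hypothetical toy rows in the CSV layout ('a1' = decision, 'a2'/'a3' = votes).
# '?' marks a missing value, as in the original table.
ROWS = [
    {"a1": "democrat",   "a2": "y", "a3": "?"},
    {"a1": "republican", "a2": "?", "a3": "n"},
    {"a1": "democrat",   "a2": "n", "a3": "y"},
]
VOTE_DOMAIN = {"y", "n"}  # every vote attribute takes y or n

def to_nis(rows, domain=VOTE_DOMAIN):
    """Turn each cell into a set: '?' -> all possible values, known value -> singleton."""
    return [{attr: set(domain) if val == "?" else {val} for attr, val in row.items()}
            for row in rows]

def derived_diss(nis):
    """Yield every derived DIS by picking one value from each set-valued cell."""
    cells = [(i, attr) for i, row in enumerate(nis) for attr in sorted(row)]
    for combo in product(*(sorted(nis[i][attr]) for i, attr in cells)):
        dis = [{} for _ in nis]
        for (i, attr), val in zip(cells, combo):
            dis[i][attr] = val
        yield dis

nis = to_nis(ROWS)
print(sorted(nis[0]["a3"]))                # ['n', 'y']: both candidates for the missing vote
print(sum(1 for _ in derived_diss(nis)))   # 4: two missing values -> 2**2 derived DISs
```

Enumerating derived DISs like this is only feasible for toy data. With 392 missing values it is impossible, which is why rules are generated directly from the NIS.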
(3) NRDF format: nrdf
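The NRDF layout itself is not reproduced here. The sketch below **assumes** that an NRDF-style table stores one (object, attribute, value) row per possible value, so that a missing value expands into several rows. It loads hypothetical toy triples into an in-memory SQLite table named `nrdf`; the column names `obj`, `attrib`, `val` are also hypothetical. The sketch then shows how SQL can separate objects that certainly satisfy a descriptor from objects that only possibly satisfy it.

```python
import sqlite3

# Hypothetical toy rows; 'obj' is an object id and '?' marks a missing value.
ROWS = [
    {"obj": 1, "a1": "democrat",   "a2": "y", "a3": "?"},
    {"obj": 2, "a1": "republican", "a2": "n", "a3": "y"},
    {"obj": 3, "a1": "democrat",   "a2": "?", "a3": "n"},
]
DOMAIN = {"a2": ("n", "y"), "a3": ("n", "y")}

def to_nrdf(rows, domain, id_col="obj"):
    """Flatten rows into (object, attribute, value) triples; '?' yields one triple per candidate."""
    triples = []
    for row in rows:
        for attr, val in row.items():
            if attr == id_col:
                continue
            candidates = domain[attr] if val == "?" else (val,)
            triples.extend((row[id_col], attr, v) for v in candidates)
    return triples

def descriptor_objects(triples, attr, value):
    """Return (certain, possible) object ids for the descriptor [attr, value]."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE nrdf (obj INTEGER, attrib TEXT, val TEXT)")
    con.executemany("INSERT INTO nrdf VALUES (?, ?, ?)", triples)
    # possible: at least one candidate value equals `value`
    possible = [r[0] for r in con.execute(
        "SELECT DISTINCT obj FROM nrdf WHERE attrib = ? AND val = ? ORDER BY obj",
        (attr, value))]
    # certain: every candidate value equals `value`
    certain = [r[0] for r in con.execute(
        "SELECT obj FROM nrdf WHERE attrib = ? GROUP BY obj "
        "HAVING MIN(val) = ? AND MAX(val) = ? ORDER BY obj",
        (attr, value, value))]
    con.close()
    return certain, possible

print(descriptor_objects(to_nrdf(ROWS, DOMAIN), "a3", "y"))  # ([2], [1, 2])
```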
(4) Rule generation by NIS-Apriori in SQL
The procedure NIS-Apriori in SQL generates the following two kinds of rules:
Certain rule: a rule in each of the more than 10^100 derived DISs
Possible rule: a rule in at least one derived DIS
Here, a rule is an implication that satisfies support >= ALPHA and accuracy >= BETA.
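The two definitions can be checked literally on a tiny NIS. The brute-force sketch below uses the usual definitions support = |CON ∧ DEC| / N and accuracy = |CON ∧ DEC| / |CON|, assumed here because the section does not spell them out. It enumerates every derived DIS of a hypothetical four-object NIS. It illustrates the definitions only. NIS-Apriori is designed to avoid exactly this enumeration, and this code is not that algorithm.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy NIS: each cell is the set of possible values; 'a1' is the decision.
NIS = [
    {"a1": {"democrat"},   "a2": {"y"},      "a3": {"y", "n"}},
    {"a1": {"republican"}, "a2": {"n"},      "a3": {"n"}},
    {"a1": {"democrat"},   "a2": {"y", "n"}, "a3": {"y"}},
    {"a1": {"democrat"},   "a2": {"y"},      "a3": {"y"}},
]

def derived_diss(nis):
    """Enumerate every derived DIS (brute force, so only feasible for tiny tables)."""
    cells = [(i, a) for i, row in enumerate(nis) for a in sorted(row)]
    for combo in product(*(sorted(nis[i][a]) for i, a in cells)):
        dis = [{} for _ in nis]
        for (i, a), v in zip(cells, combo):
            dis[i][a] = v
        yield dis

def support_accuracy(dis, condition, decision):
    """support = |CON & DEC| / N, accuracy = |CON & DEC| / |CON| (0 when CON is empty)."""
    con = [r for r in dis if all(r[a] == v for a, v in condition.items())]
    hits = sum(all(r[a] == v for a, v in decision.items()) for r in con)
    return Fraction(hits, len(dis)), (Fraction(hits, len(con)) if con else Fraction(0))

def classify_rule(nis, condition, decision, alpha, beta):
    """'certain' if the rule holds in every derived DIS, 'possible' if in at least one."""
    holds = []
    for dis in derived_diss(nis):
        sup, acc = support_accuracy(dis, condition, decision)
        holds.append(sup >= alpha and acc >= beta)
    if all(holds):
        return "certain"   # a certain rule is also a possible rule
    return "possible" if any(holds) else "none"

ALPHA, BETA = Fraction("0.5"), Fraction("0.6")
print(classify_rule(NIS, {"a2": "y"}, {"a1": "democrat"}, ALPHA, BETA))  # certain
```

Fractions keep the threshold comparisons exact, so a support of exactly 0.5 is not lost to floating-point rounding.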
Step 1: con_{1}=>decision
Step 2: con_{1}&con_{2}=>decision
Step 3: con_{1}&con_{2}&con_{3}=>decision
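Steps 1 to 3 grow the condition part one descriptor at a time, in the Apriori style. The sketch below shows that level-wise search on a single deterministic table (one derived DIS) built from hypothetical toy data. Two policies are **assumptions of this sketch**. First, a condition whose support is below ALPHA is pruned, since adding descriptors cannot raise support. Second, a condition that already yields a rule is not extended. The NIS-specific machinery of NIS-Apriori is not attempted here.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical deterministic toy table (a single DIS); 'a1' is the decision attribute.
TABLE = [
    {"a1": "democrat",   "a2": "y", "a3": "y"},
    {"a1": "democrat",   "a2": "y", "a3": "y"},
    {"a1": "republican", "a2": "y", "a3": "n"},
    {"a1": "republican", "a2": "n", "a3": "y"},
    {"a1": "republican", "a2": "n", "a3": "n"},
]

def apriori_rules(table, decision_attr, alpha, beta, max_steps=3):
    """Step k tests con_1 & ... & con_k => decision, extending only promising conditions."""
    n = len(table)
    attrs = sorted(a for a in table[0] if a != decision_attr)
    descriptors = [(a, v) for a in attrs for v in sorted({r[a] for r in table})]
    rules = []
    for d in sorted({r[decision_attr] for r in table}):
        candidates = [(desc,) for desc in descriptors]  # Step 1
        for _ in range(max_steps):
            seeds = []
            for cond in candidates:
                con_rows = [r for r in table if all(r[a] == v for a, v in cond)]
                hits = sum(r[decision_attr] == d for r in con_rows)
                if not con_rows or Fraction(hits, n) < alpha:
                    continue  # adding conditions can never raise support: prune
                if Fraction(hits, len(con_rows)) >= beta:
                    rules.append((cond, d))  # rule found; not extended further
                else:
                    seeds.append(cond)       # support ok, accuracy too low: extend
            seed_set = set(seeds)
            # Next step: add one descriptor on a later attribute, keeping a candidate
            # only if all of its shorter sub-conditions are seeds (Apriori check).
            candidates = [cond + (desc,) for cond in seeds for desc in descriptors
                          if desc[0] > cond[-1][0]
                          and all(sub in seed_set
                                  for sub in combinations(cond + (desc,), len(cond)))]
    return rules

for cond, d in apriori_rules(TABLE, "a1", Fraction(2, 5), Fraction(4, 5)):
    print(" & ".join(f"[{a},{v}]" for a, v in cond), "=>", f"[a1,{d}]")
```

On this toy table, Step 1 finds [a2,n] => [a1,republican] and [a3,n] => [a1,republican]. Step 2 then finds [a2,y] & [a3,y] => [a1,democrat].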
(A) In the following calls step1(‘a1’,435,0.3,0.6), step2(‘a1’,435,0.3,0.6), and step3(‘a1’,435,0.3,0.6), we specify that the decision attribute is ‘a1’, the number of objects is 435, support >= 0.3, and accuracy >= 0.6.
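With 435 objects, the thresholds can be turned into object counts. Support >= 0.3 means at least ⌈0.3 × 435⌉ = ⌈130.5⌉ = 131 objects must satisfy both the condition and the decision. The sketch below performs this conversion; the helper names are hypothetical and not part of the SQL procedures.

```python
import math
from fractions import Fraction

def min_support_count(num_objects: int, alpha: str) -> int:
    """Fewest objects satisfying both CON and DEC so that support >= alpha."""
    # Fraction parses the decimal string exactly, so 0.3 * 435 is exactly 130.5.
    return math.ceil(Fraction(alpha) * num_objects)

def min_accuracy_hits(con_count: int, beta: str) -> int:
    """Fewest of the con_count matching objects that must carry DEC so accuracy >= beta."""
    return math.ceil(Fraction(beta) * con_count)

# Mirrors the arguments of step1('a1', 435, 0.3, 0.6).
print(min_support_count(435, "0.3"))   # 131
print(min_accuracy_hits(131, "0.6"))   # 0.6 * 131 = 78.6, so 79
```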
(5) Obtained certain rules
(6) Obtained possible rules