Rule Generation from NIS in SQL (No.5) - Kyushu Institute of Technology


September 2016

Rule Generation from NIS in SQL (No.5)

A Case: Congressional Voting Records data set with missing values in UCI (Experiment environment: Windows desktop PC 3.3GHz)

Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

(1) The Congressional Voting Records data set consists of the following.

435 objects, 17 attributes:

1. Class Name: 2 (democrat, republican) (Decision attribute)
2. handicapped-infants: 2 (y,n)
3. water-project-cost-sharing: 2 (y,n)
4. adoption-of-the-budget-resolution: 2 (y,n)
5. physician-fee-freeze: 2 (y,n)
6. el-salvador-aid: 2 (y,n)
7. religious-groups-in-schools: 2 (y,n)
8. anti-satellite-test-ban: 2 (y,n)
9. aid-to-nicaraguan-contras: 2 (y,n)
10. mx-missile: 2 (y,n)
11. immigration: 2 (y,n)
12. synfuels-corporation-cutback: 2 (y,n)
13. education-spending: 2 (y,n)
14. superfund-right-to-sue: 2 (y,n)
15. crime: 2 (y,n)
16. duty-free-exports: 2 (y,n)
17. export-administration-act-south-africa: 2 (y,n)

288 missing values

Each attribute value consists of yes or no.

There are 2^288 ≈ 5 × 10^86 derived DISs (roughly (10^3)^29, using 2^10 ≈ 10^3).

Revised count: 392 missing values. (The obtained rules are not affected by this revision at all.)

There are 2^392 ≈ 10^118 > 10^100 derived DISs.
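The orders of magnitude above can be checked with exact integer arithmetic. The following is a minimal Python sketch, assuming that every missing value has exactly two possible values (y or n) and that the missing values are replaced independently. The function name `derived_dis_count` is hypothetical and is not part of the original material.

```python
import math

def derived_dis_count(missing_values: int, values_per_cell: int = 2) -> int:
    """Number of derived DISs when each missing value is replaced independently."""
    return values_per_cell ** missing_values

for missing in (288, 392):
    count = derived_dis_count(missing)
    # log10 of the exact integer gives the order of magnitude
    print(f"{missing} missing values -> about 10^{math.log10(count):.1f} derived DISs")
```

Python integers have arbitrary precision, so the counts are computed exactly before the logarithm is taken.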


(2) CSV format: Table 1 in nrdf(congress_test)

We first store the table in the following CSV format. The symbol ‘?’ denotes a missing value.

We replace each ‘?’ with a set of all possible values, namely non-deterministic information.
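The replacement of ‘?’ can be sketched in Python as follows. This is only an illustration of the idea, not the NRDF conversion used in the experiment: the two-row CSV excerpt is hypothetical, the helper name `to_nondeterministic` is invented here, and the domain {y, n} is assumed for every condition attribute.

```python
import csv
import io

DOMAIN = frozenset({"y", "n"})  # assumed domain of every condition attribute

def to_nondeterministic(row):
    """Replace each '?' with the set of all possible values; known values become singletons."""
    return [DOMAIN if value == "?" else frozenset({value}) for value in row]

# Hypothetical excerpt: class name followed by three votes (illustration only).
sample = "republican,n,y,?\ndemocrat,?,y,y\n"
rows = [to_nondeterministic(r) for r in csv.reader(io.StringIO(sample))]

for row in rows:
    print([sorted(cell) for cell in row])
```

A known value is kept as a one-element set so that deterministic and non-deterministic cells can be handled uniformly.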

(3) NRDF format: nrdf


(4) Rule generation by NIS-Apriori in SQL

The procedure NIS-Apriori in SQL generates the following two kinds of rules, each defined by support>=ALPHA and accuracy>=BETA:

Certain rule: a rule in each of the more than 10^100 derived DISs

Possible rule: a rule in at least one derived DIS
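The two definitions can be made concrete on a toy table by brute force. The Python sketch below enumerates every derived DIS and tests the support and accuracy conditions in each one. This enumeration is not the NIS-Apriori procedure itself and is infeasible for the real data set with more than 10^100 derived DISs. The toy table, the attribute names, and the function names are all hypothetical. Accuracy is assumed to be the share of objects matching the condition that also match the decision.

```python
from itertools import product

def derived_diss(nis):
    """Yield every derived DIS by choosing one value from each value set."""
    attrs = sorted(nis[0])
    cells = [sorted(obj[a]) for obj in nis for a in attrs]
    for choice in product(*cells):
        it = iter(choice)
        yield [{a: next(it) for a in attrs} for _ in nis]

def holds(dis, cond, dec_attr, dec_val, alpha, beta):
    """True if cond => (dec_attr = dec_val) meets both thresholds in one DIS."""
    matched = [o for o in dis if all(o[a] == v for a, v in cond.items())]
    hits = [o for o in matched if o[dec_attr] == dec_val]
    if not matched:
        return False  # accuracy is undefined when no object matches the condition
    return len(hits) / len(dis) >= alpha and len(hits) / len(matched) >= beta

def classify(nis, cond, dec_attr, dec_val, alpha, beta):
    """Return the strongest label; a certain rule is also a possible rule."""
    results = [holds(d, cond, dec_attr, dec_val, alpha, beta) for d in derived_diss(nis)]
    if all(results):
        return "certain"
    return "possible" if any(results) else "neither"

# Hypothetical toy table with one missing vote (the last object), illustration only.
toy = [
    {"a1": {"republican"}, "a5": {"y"}},
    {"a1": {"republican"}, "a5": {"y"}},
    {"a1": {"democrat"}, "a5": {"n"}},
    {"a1": {"democrat"}, "a5": {"y", "n"}},
]
print(classify(toy, {"a5": "y"}, "a1", "republican", 0.3, 0.6))
print(classify(toy, {"a5": "n"}, "a1", "democrat", 0.3, 0.6))
```

In this toy table the first implication satisfies both thresholds whichever value replaces the missing vote, while the second satisfies them only when the missing vote is n.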

Step 1: con_{1}=>decision

Step 2: con_{1}&con_{2}=>decision

Step 3: con_{1}&con_{2}&con_{3}=>decision
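The shape of the implications examined at each step can be sketched in Python as follows. The sketch only enumerates the unpruned candidate space. It says nothing about how the SQL procedure prunes or orders candidates. The attribute names a2 to a17 for the condition attributes are an assumption (the source names only the decision attribute ‘a1’), and `candidate_rules` is a hypothetical helper.

```python
from itertools import combinations, product

def candidate_rules(cond_attrs, values, decisions, k):
    """Yield every implication with k condition descriptors on distinct attributes."""
    for attrs in combinations(cond_attrs, k):
        for vals in product(values, repeat=k):
            for dec in decisions:
                yield tuple(zip(attrs, vals)), dec

cond_attrs = [f"a{i}" for i in range(2, 18)]  # assumed names a2..a17
decisions = ["democrat", "republican"]

# Upper bound on the number of candidates per step, before any pruning.
counts = {
    k: sum(1 for _ in candidate_rules(cond_attrs, "yn", decisions, k))
    for k in (1, 2, 3)
}
print(counts)
```

The candidate space grows quickly with the number of condition descriptors, which is one reason an Apriori-style procedure works step by step.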

(A) In the following calls step1('a1',435,0.3,0.6), step2('a1',435,0.3,0.6), and step3('a1',435,0.3,0.6), we specify that the decision attribute is 'a1', that there are 435 objects, and that rules must satisfy support>=0.3 and accuracy>=0.6.
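To see what these thresholds mean in terms of object counts, the following Python sketch converts the support threshold into a minimum number of objects. It assumes that support is the fraction of all objects that satisfy both the condition and the decision. The helper name `min_object_count` is hypothetical.

```python
import math
from fractions import Fraction

def min_object_count(num_objects: int, alpha: float) -> int:
    """Smallest number of objects whose share of num_objects is at least alpha."""
    # Fraction(str(alpha)) avoids binary floating-point rounding in the product.
    return math.ceil(Fraction(str(alpha)) * num_objects)

print(min_object_count(435, 0.3))  # 0.3 * 435 = 130.5, so at least 131 objects
```

Under this assumption, a rule with support>=0.3 must be matched by at least 131 of the 435 objects.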


(5) Obtained certain rules


(6) Obtained possible rules