Encoding of Sensing Data for Effective Data Stream Mining

Academic year: 2019

Dong-Hyok Suh¹, Young-Oh Han²

¹Department of Multimedia Communication, Far East University, Eumseong, Chungbuk, 369-700, Korea
²Department of Electronics, Namseoul University, Cheonan, South Korea

hanhwaco@naver.com, youngoh@nsu.ac.kr

Abstract: This study proposes an encoding scheme for sensing data that supports efficient data stream mining in an environment where frequency analysis and association rule mining are applied to the stream. Sensing data are encoded so that they present the features of the events detected by the sensors and capture the changes in the sensing data stream. The results show that data encoded in this way can be used to reason about a situation that is detected and recognized through multiple sensors.

Keywords: Stream data mining, sensor data processing, context inference

1 Introduction

A sensor detects information about the object to which it is attached and about that object's surroundings, and reports it to a host. Using multiple heterogeneous sensors improves the quality of this information. Sensors take measurements at a fixed interval and report the sensed information to a host. Each sensor holds criteria for deciding whether a signal should be reported and filters its data accordingly; this process is called data filtering. At each time point the sensor compares the detected value with the criteria to see whether there is a noticeable increase or decrease and, if so, reports to the host. What a sensor detects and reports in this way is called event information. Sensor data that contain event information are therefore not ordinary readings: they carry a change worth examining.
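The filtering step can be sketched as follows. This is only an illustration: the function name `filter_events`, the single `min_change` criterion, and the choice to compare each reading with the previous one are assumptions, not details given in this paper.

```python
from typing import Iterable, Iterator, Tuple


def filter_events(readings: Iterable[float], min_change: float) -> Iterator[Tuple[int, float]]:
    """Yield (time_index, value) for readings that differ noticeably from the previous one."""
    previous = None
    for t, value in enumerate(readings, start=1):
        # Report only when the increase or decrease reaches the (assumed) criterion.
        if previous is not None and abs(value - previous) >= min_change:
            yield t, value
        previous = value


# Hypothetical temperature readings; only the jump at the fourth reading is reported.
temperatures = [23.5, 23.5, 23.6, 24.5, 24.6]
events = list(filter_events(temperatures, min_change=0.5))
print(events)
```

Comparing with the previous reading is the simplest criterion; a sensor could equally compare with a fixed threshold.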

In data stream mining, frequency analysis and association rule mining are the main techniques. If the data from a sensor are expressed as the absolute values it measured, both techniques require several cumbersome calculation steps. To analyze a data stream, it is therefore necessary to convert the signals or data from a sensor into symbols or numbers.

There are two reasons for encoding the data from a sensor. First, the sensing data transmitted from a sensor are continuous values, and the stream grows constantly in size, so the data must be converted into simpler symbols or numbers before they can be processed.

The second reason is to analyze the data stream efficiently. For association rule mining and semantic analysis, a reasonable conversion of the sensing values is useful.

This study proposes an encoding scheme for sensing data that supports efficient data stream mining in an environment where frequency analysis and association rule mining are applied to the stream.

This paper is organized as follows: Section 2 describes related studies; Section 3 proposes an encoding scheme of sensing data for efficient data stream mining; Section 4 evaluates and verifies the proposed scheme and describes the results; and Section 5 presents the conclusion.

2 Previous Studies

Data stream analysis is an active research topic in various fields. In particular, as the need for big data applications grows, research on sensing data, a kind of big data, has drawn more attention.

The typical research subjects of data stream analysis are frequency analysis and association rule mining. Frequency analysis determines how often events occur in the data. It commonly uses the sliding window technique, in which a window covers a fixed time interval: the frequency of events inside the current window is counted, and the window then moves forward as time passes so that the frequency in the next interval can be counted.
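A minimal sketch of sliding-window frequency counting is given below. The event symbols ("T+", "H-"), the window length, and the function name are hypothetical and serve only to illustrate the technique.

```python
from collections import Counter, deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_window_counts(
    events: Iterable[Tuple[int, str]], window: int
) -> Iterator[Tuple[int, Counter]]:
    """For each (time, symbol) event, yield the symbol counts over the last `window` time units."""
    recent: Deque[Tuple[int, str]] = deque()
    counts: Counter = Counter()
    for time, symbol in events:
        recent.append((time, symbol))
        counts[symbol] += 1
        # Drop events that have slid out of the interval (time - window, time].
        while recent[0][0] <= time - window:
            _, old_symbol = recent.popleft()
            counts[old_symbol] -= 1
            if counts[old_symbol] == 0:
                del counts[old_symbol]
        yield time, Counter(counts)


# Hypothetical encoded events: (time, symbol).
stream = [(1, "T+"), (2, "T+"), (3, "H-"), (5, "T+"), (6, "H-")]
snapshots = dict(sliding_window_counts(stream, window=3))
print(snapshots[3], snapshots[6])
```

Keeping the events in a deque means each event is added and removed once, so the counts are maintained in a single pass over the stream.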

Association rule analysis is frequently used when mining conventional databases, and it has also been applied actively to data streams. It covers two kinds of analysis: finding other events that occur in association with specific events, and finding events that occur in sequence within a fixed interval.
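The first kind of analysis is usually expressed through the support and confidence of a rule. The sketch below uses the standard definitions; the event names and the grouping of events into windows are hypothetical.

```python
from typing import List, Set


def support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]], antecedent: Set[str], consequent: Set[str]) -> float:
    """Confidence of the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, antecedent | consequent) / base


# Hypothetical events observed in four consecutive windows.
windows = [
    {"temp_up", "humid_down"},
    {"temp_up", "humid_down", "light_up"},
    {"temp_up"},
    {"light_up"},
]
rule_confidence = confidence(windows, {"temp_up"}, {"humid_down"})
print(round(rule_confidence, 3))
```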

3 Encoding of Sensing Data for Data Stream Mining

Using raw sensor values directly is inconvenient for data stream analysis. The values obtained by multiple sensors have different physical properties, so to process data from multiple sensors it is necessary to convert them into symbols or numbers in a separate system that reflects the features and properties of the measured values.

The scheme proposed in this paper encodes only the change in the value detected by a sensor, and presents that change together with the type of the sensor. The change can be expressed in two ways: as the variation from the previous sensing value, or as the deviation from a preset criterion.

The total set of the data stream after n unit times is represented as $DS = \{D_1, D_2, D_3, \ldots, D_n\}$. For k sensor types, each $D_n$ is a matrix with k rows and 1 column.

A change is the relationship between consecutive numerical values, in the direction of positive time, for an arbitrary matrix in DS.

When sensor data are converted into numerical values, the basic consideration is how much the measured values change.

The change is presented according to one of two criteria. The first is the relative change, obtained by comparing the current value with the value measured at the previous time point:

$\Delta^{\mathrm{relative}}_n = D_n - D_{n-1}$

The second is the absolute change, obtained by comparing the current value with a criterion (threshold) $\omega$ set for each sensor:

$\Delta^{\mathrm{absolute}}_n = D_n - \omega$

The size of such a change depends on the sensor, because the measured values of different sensors have different physical properties. The changes measured by the sensors therefore need to be normalized.

The normalization is defined as

$i^{(m,1)}_n = \left| \sigma\left( d^{(m,1)}_n \right) \right|$

where $\sigma(x)$ is a sigmoid function whose midpoint corresponds to the threshold $\omega$ and whose value lies in $-1 \le \sigma(x) \le 1$, and m is a positive integer with $1 \le m \le k$.

Here $d^{(m,1)}$ is the element in row m and column 1 of the matrix $\Delta$. The elements of the matrix $I = i^{(k,1)}$ are the normalized changes of each sensor. Because the sensors are heterogeneous and serve different functions, it is very useful for data stream analysis to record the source sensor of each data value. The positional information of the sensors used for measurement is also essential to situational reasoning.

The sensor data are therefore presented so that they include both the change of each sensor and the information about the measurement position of the sensors, as follows:

$I_n = i^{(k,2)}$, a matrix with k rows and 2 columns, in which column 1 holds the normalized changes, column 2 holds the sensor types of the values in column 1, and the subscript n indicates that the report was made at time point n.
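One possible reading of this encoding is sketched below. The paper does not name a specific sigmoid, so `math.tanh` is used here as an assumed example of a sigmoid with range (-1, 1) that is zero when the reading equals the threshold. The per-sensor `scales`, the threshold values, and the function name `encode_report` are likewise hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def encode_report(
    values: Sequence[float],
    thresholds: Sequence[float],
    sensor_types: Sequence[str],
    scales: Sequence[float],
) -> List[Tuple[float, str]]:
    """Return I_n as k rows of (normalized change, sensor type)."""
    rows = []
    for value, omega, kind, scale in zip(values, thresholds, sensor_types, scales):
        delta = value - omega  # absolute change against the threshold omega
        # tanh is an assumed sigmoid; `scale` (hypothetical) adjusts for each sensor's units.
        rows.append((abs(math.tanh(delta / scale)), kind))
    return rows


# Readings at one time point, with illustrative thresholds and scales.
encoded = encode_report(
    values=[24.5, 63.5, 178.0],
    thresholds=[24.0, 65.0, 110.0],
    sensor_types=["temperature", "humidity", "illumination"],
    scales=[1.0, 1.0, 50.0],
)
for change, kind in encoded:
    print(f"{kind}: {change:.3f}")
```

After normalization every change lies between 0 and 1 regardless of the physical unit, so values from heterogeneous sensors can be compared directly.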

To sum up, under the proposed scheme the encoded sensor data include the change measured by each sensor, the type of the sensor, and the measurement position of the sensor.

4 Evaluation

To prove the suitability of the proposed scheme, it was evaluated in the following two aspects:

BPA can be calculated. Based on the BPA (basic probability assignment), it is possible to calculate each focal element of DST (Dempster-Shafer theory), that is, the combinations of situational factors formed from the sensor types, and the reliability of each situational factor. Temperature, humidity and illumination sensors were used, and the data reported by the sensors via a sink node were recorded as shown in Table 1.

Table 1. Time-based measurement values detected and reported by the temperature, humidity and illumination sensors

Time (s)   Temperature (℃)   Humidity (%)   Illumination (lx)
1          23.5              65.4           105
2          23.5              65.4           104
3          23.6              65.3           105
4          23.7              65.3           106
5          23.6              65.4           106
6          23.6              65.4           105
7          23.6              65.3           106
8          23.5              65.5           106
9          23.6              65.3           106
10         23.7              65.3           105
11         24.0              64.8           251
12         24.5              63.5           178
13         24.7              63.2           106
14         25.2              62.1           98
15         26.8              62.1           105

The sensor data recorded in Table 1 were converted into the values in Table 2.

Table 2. Result of converting the measurement values detected and reported by the temperature, humidity and illumination sensors

Time (s)   Temperature (℃)   Humidity (%)   Illumination (lx)
12         [0.5, 0.5]        [1.3, 1.5]     [73.0, 68.0]
13         [0.2, 0.7]        [0.3, 1.8]     [72.0, 4.0]
14         [0.5, 1.2]        [1.1, 2.9]     [8.0, 12.0]
15         [1.6, 2.8]        [0.0, 2.9]     [7.0, 5.0]
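The rows of Table 2 for times 12 to 15 are consistent with each entry being the pair [|relative change|, |absolute change|]. The sketch below recomputes them from Table 1. The thresholds (24.0 ℃, 65.0 %, 110 lx) are not stated in the paper; they are inferred here from the tabulated values and should be treated as an assumption, as should the function name.

```python
from typing import Dict, List, Tuple

# Table 1 rows for times 11 to 15: (temperature, humidity, illumination).
TABLE_1: Dict[int, Tuple[float, float, float]] = {
    11: (24.0, 64.8, 251.0),
    12: (24.5, 63.5, 178.0),
    13: (24.7, 63.2, 106.0),
    14: (25.2, 62.1, 98.0),
    15: (26.8, 62.1, 105.0),
}

# Assumed thresholds (omega) per sensor, inferred from Table 2 rather than given in the paper.
THRESHOLDS: Tuple[float, float, float] = (24.0, 65.0, 110.0)


def change_pairs(time: int) -> List[List[float]]:
    """Return [|D_n - D_(n-1)|, |D_n - omega|] for each sensor at the given time."""
    current, previous = TABLE_1[time], TABLE_1[time - 1]
    return [
        [round(abs(c - p), 1), round(abs(c - w), 1)]
        for c, p, w in zip(current, previous, THRESHOLDS)
    ]


for t in range(12, 16):
    print(t, change_pairs(t))
```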

Based on the converted values, the entire power set was constructed. Each element of the set is called a focal element, as shown in Table 3. The BPA calculated from the changes for each focal element is recorded in Table 4.

Table 3. Focal elements and power set

Focal element   Sensor types
H1              Temperature
H2              Humidity
H3              Illumination
H1∩H2           Temp. + Humd.
H1∩H3           Temp. + Illum.
H2∩H3           Humd. + Illum.
H1∩H2∩H3        Temp. + Humd. + Illum.

Table 4. BPA by focal element

Focal element   H1      H2      H3      H1∩H2   H1∩H3   H2∩H3   H1∩H2∩H3
BPA             0.132   0.223   0.049   0.227   0.161   0.021   0.187

Once the BPA of each focal element has been calculated, DST is applied to calculate the reliability of each focal element. Focal elements represent the situational factors that affect an event or situation that may occur. Using the reliability of each focal element, it is possible to reason about the situation. Table 5 shows the reliability of each focal element.

Table 5. Reliability of each focal element

Focal element   m       bel     cmn     pl
H1∪H2∪H3        0.187   1.000   0.187   1.000
H1∪H2           0.227   0.582   0.414   0.951
H1∪H3           0.161   0.343   0.348   0.777
H2∪H3           0.021   0.293   0.207   0.868
H1              0.132   0.132   0.707   0.707
H2              0.223   0.223   0.657   0.657
H3              0.049   0.049   0.418   0.418
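The bel, cmn and pl columns can be recomputed from the BPA values in Table 4 with the standard Dempster-Shafer definitions of belief, commonality and plausibility. The sketch below assumes that each combined focal element in Table 4 corresponds to the set of hypotheses it names; the paper does not say how the columns were computed, but the results agree with Table 5 to within 0.001.

```python
from typing import Dict, FrozenSet

H1, H2, H3 = "H1", "H2", "H3"

# BPA values from Table 4, keyed by the set of hypotheses in each focal element.
MASS: Dict[FrozenSet[str], float] = {
    frozenset({H1}): 0.132,
    frozenset({H2}): 0.223,
    frozenset({H3}): 0.049,
    frozenset({H1, H2}): 0.227,
    frozenset({H1, H3}): 0.161,
    frozenset({H2, H3}): 0.021,
    frozenset({H1, H2, H3}): 0.187,
}


def belief(a: FrozenSet[str]) -> float:
    """Sum of the masses of all focal elements contained in `a`."""
    return sum(m for b, m in MASS.items() if b <= a)


def plausibility(a: FrozenSet[str]) -> float:
    """Sum of the masses of all focal elements that intersect `a`."""
    return sum(m for b, m in MASS.items() if b & a)


def commonality(a: FrozenSet[str]) -> float:
    """Sum of the masses of all focal elements that contain `a`."""
    return sum(m for b, m in MASS.items() if a <= b)


for focal in sorted(MASS, key=len, reverse=True):
    label = "∪".join(sorted(focal))
    print(f"{label:10s} m={MASS[focal]:.3f} bel={belief(focal):.3f} "
          f"cmn={commonality(focal):.3f} pl={plausibility(focal):.3f}")
```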

highest reliability. This indicates that, among the many situational factors affecting the situation, the fourth factor has the strongest influence on it.

In addition, the reliability of the fourth factor increases as time passes. This means that the situation affected by the factor keeps spreading and that the causes behind it remain active; in other words, the event is getting worse.

5 Conclusion

This study proposed an encoding scheme for the various sensor data that are reported continuously. Encoding the continuously reported and transmitted sensor data makes possible a more accurate analysis than the frequent pattern scheme and the association rule scheme do. A sensor aims to obtain high-quality information about the state and activity of the object it monitors from a distance. Through the encoding scheme, this study suggested that sensor data need to be encoded in a reasonable and valid way in order to perform situational reasoning and extract a variety of high-quality information, even in a data stream environment where massive, continuously transmitted sensor data are analyzed in a single pass. Future work should study more detailed schemes for recognizing situations in the data stream environment.
