Academic year: 2017


AN XML ALGEBRA FOR ONLINE PROCESSING OF XML DOCUMENTS

By: Handoko and Janusz R. Getta

Outline

• Introduction
• Related Work
  – Existing XML Data Models & Algebras
• XML Structures
• XML Algebra
• Conclusion & Future Work

Introduction – Motivation

• Since XML has been accepted as a standard document format in information systems, online processing of semistructured data has become more important.
• Online processing applies online algorithms, which process data piece by piece.
• The performance of semistructured data processing is affected by several aspects:
  – Semistructured data has a more complex structure than the columns and rows of relational tables.
  – Existing data models require the data to be complete before it can be processed, while online processing needs to process data piece by piece.
  – Some existing data models treat semistructured data as a relational-type data model:
    ◦ XML streams are evaluated with a tuple-based approach.
    ◦ Unnest and nest operations have to be employed.
    ◦ This requires more CPU and memory resources.
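To make "piece by piece" concrete, here is a minimal Python sketch built on the standard library's `xml.etree.ElementTree.XMLPullParser`, which accepts a document in arbitrary chunks and reports each element as soon as its end tag has arrived. The chunk boundaries, the `<library>/<books>/<book>/<title>` layout and the function name `stream_titles` are illustrative assumptions, not part of the approach described in these slides.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator


def stream_titles(chunks: Iterable[str]) -> Iterator[str]:
    """Yield each <title> text as soon as its end tag has been fed in."""
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)  # data arrives piece by piece, split anywhere
        for _event, elem in parser.read_events():
            if elem.tag == "title":
                yield elem.text or ""
            elif elem.tag == "book":
                elem.clear()  # free the finished book's children to keep memory small
    parser.close()


chunks = [
    "<library><books><book><title>Ba",
    "sic XML</title></book><book><ti",
    "tle>XML Algebra</title></book></books></library>",
]
print(list(stream_titles(chunks)))  # ['Basic XML', 'XML Algebra']
```

The first title is produced after the second chunk, before the rest of the document has been seen, which is the property online processing relies on.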


An Example of Online Processing – Online Integration

[Figure: applications, data sources, previous result, recent data]

Includes:

1. Fragmentation handling
2. XML algebra
3. Execution plan algorithms
4. Scheduling algorithms


Related Work – XML Data Model & Algebra

• XML Algebra (XAL) [4]
  – Models XML as a rooted, connected, directed graph, which may be cyclic or acyclic.
  – Vertices represent elements; edges represent simple values.
  – Has three groups of operators:
    ◦ Extraction operators: projection, selection, distinct, join, sort, product
    ◦ Meta-operators: map, Kleene star
    ◦ Construction operators: create vertex, create edge, copy


Literature Review – XML Data Model & Algebra

• XAnswer [6]
  – An algebra over a relational-like data structure.
  – Uses a data structure called an envelope, <he|be|re>.
  – The header he is an unordered set of attributes (A), the body be is a set of pairs (A, v) where v is a value, and re represents the result.
  – XAnswer provides unary operators (function execution, selection, projection, sort, index, nest, unnest, and duplicate) and binary operators such as union, cross product, and left outer join.
  – The union operator in XAnswer does not remove duplicates.
  – XAnswer provides left outer join as a dedicated operator instead of expressing it with selection, cross product, and union operators.
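The two XAnswer points above (union keeps duplicates; left outer join is a single operator) can be illustrated with a small Python sketch. It uses plain lists of dictionaries as a relational-like stand-in: the envelope <he|be|re> itself is not modelled, and `bag_union` and `left_outer_join` are hypothetical names, not XAnswer's API.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]


def bag_union(R: List[Row], S: List[Row]) -> List[Row]:
    """Union without duplicate removal: every input row is kept."""
    return R + S


def left_outer_join(R: List[Row], S: List[Row],
                    pred: Callable[[Row, Row], bool],
                    s_attrs: List[str]) -> List[Row]:
    """One operator: each r is joined with every matching s,
    or padded with None for the attributes of S when nothing matches."""
    out: List[Row] = []
    for r in R:
        matches = [s for s in S if pred(r, s)]
        if matches:
            out.extend({**r, **s} for s in matches)
        else:
            out.append({**r, **{a: None for a in s_attrs}})
    return out


books = [{"title": "T1", "aid": "1"}, {"title": "T2", "aid": "9"}]
authors = [{"id": "1", "name": "A"}]
rows = left_outer_join(books, authors, lambda b, a: b["aid"] == a["id"], ["id", "name"])
print(len(bag_union(books, books)))  # 4: duplicates are kept
print(rows[1])  # {'title': 'T2', 'aid': '9', 'id': None, 'name': None}
```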

Literature Review – XML Data Model & Algebra

• TAX (Tree Algebra for XML) [5]
  – Represents the document as an ordered labeled tree. Every XML element is represented as a node that has:
    ◦ a tag attribute: a single-valued attribute that indicates the type of the element;
    ◦ a content attribute: an atomic value, which can be of any atomic type;
    ◦ a pedigree attribute: carries information about the element's predecessor, which is very useful for manipulation and comparison.
  – TAX proposes the idea of a tree pattern (TP), but this represents a concept very different from classical relational algebra.
  – TAX provides selection, projection, product, grouping, aggregation, renaming, reordering, copy and paste, value update, node deletion, and node insertion operations, as well as some set operators.
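A minimal Python sketch of a node carrying the three attributes listed above. The class name `TaxNode`, the integer pedigree and the helpers `copy_subtree` and `same_origin` are hypothetical simplifications, not TAX's actual definitions; the point is that a copied node keeps the pedigree of the element it came from, so results can still be compared against their source.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaxNode:
    tag: str                # single-valued: the element type
    content: Optional[str]  # atomic value, or None for structural nodes
    pedigree: int           # id of the source element (simplified)
    children: List["TaxNode"] = field(default_factory=list)


def copy_subtree(node: TaxNode) -> TaxNode:
    """Copy a subtree into a result; pedigrees still point at the source elements."""
    return TaxNode(node.tag, node.content, node.pedigree,
                   [copy_subtree(c) for c in node.children])


def same_origin(a: TaxNode, b: TaxNode) -> bool:
    """In this sketch, two nodes derive from the same input element iff pedigrees match."""
    return a.pedigree == b.pedigree


book = TaxNode("book", None, 1, [TaxNode("title", "Basic XML", 2)])
copied = copy_subtree(book)
print(copied is book, same_origin(copied, book))  # False True
```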


XML Data Model – Extended Tree Grammar

• The structure of an XML document is a pair <ETG, EIG>, where ETG (Extended Tree Grammar)

Example of ETG and its sentence

XML Data Model – Extended Instance Grammar

Applying EIG to an ETG sentence

• When we reach a terminal symbol library(...), we apply to it the production rule that matches the terminal symbol: library(x) → <library>x</library>
• x in step 1 consists of the nodes books and authors at the same level, so we need to apply the corresponding production rules books(x) → <books>x</books> and authors(x) → <authors>x</authors>.

Our instance XML document will become:

<library>
<books>x</books><authors>x</authors>
</library>


• When we reach a terminal symbol that has no inner structure (a terminal that is not followed by an opening curly bracket), we translate it using the related production rule. For example, title → <title>#PCDATA</title> will be translated into: <title>Basic XML</title>
• A terminal symbol with no inner structure that is followed by a square bracket ([) will be translated using the defined production rules.
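The translation steps for structured and leaf terminals can be sketched in Python as a recursive walk over an ETG sentence. The tuple encoding of the sentence, the `values` dictionary that supplies #PCDATA text, and the name `instantiate` are hypothetical; the square-bracket case is not covered.

```python
from typing import Dict, List, Tuple, Union
from xml.sax.saxutils import escape

# Hypothetical encoding of an ETG sentence: a terminal with inner structure is
# (name, [children]); a terminal without inner structure is just its name.
Sentence = Union[str, Tuple[str, List["Sentence"]]]


def instantiate(sym: Sentence, values: Dict[str, str]) -> str:
    if isinstance(sym, str):
        # Leaf rule t -> <t>#PCDATA</t>; #PCDATA is replaced when a value is known.
        text = escape(values.get(sym, "#PCDATA"))
        return f"<{sym}>{text}</{sym}>"
    name, children = sym
    # Structured rule t(x) -> <t>x</t>, applied recursively to x.
    inner = "".join(instantiate(child, values) for child in children)
    return f"<{name}>{inner}</{name}>"


sentence = ("library", [("books", [("book", ["title"])]), ("authors", ["author"])])
print(instantiate(sentence, {"title": "Basic XML"}))
```

This prints `<library><books><book><title>Basic XML</title></book></books><authors><author>#PCDATA</author></authors></library>`: the library and books/authors rules are applied first, then the title leaf is filled in, mirroring the steps above.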

XML Data Model – Indexed ETG

• Then we extend ETG to accommodate recursive structures as:

XML Algebra

• The XML algebra in this system consists of:
  – Basic operations
    ◦ Restructuring operation (π)
    ◦ Filtering operation (σ)
    ◦ Cross product operation (×)
  – Set operations (∪, ∩, and −)
  – Derived operations (join, semijoin, and antijoin)
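One way to read "derived operations" is that join is filtering applied to a cross product, and semijoin and antijoin keep the documents of R that do or do not have a partner in S. The Python sketch below follows that reading; the general predicate `pred` stands in for a filtering condition and is an assumption (the filtering operator in this algebra is path-based), and all function names are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Callable, List

Pred = Callable[[ET.Element, ET.Element], bool]


def cross_product(R: List[ET.Element], S: List[ET.Element], rho: str) -> List[ET.Element]:
    """All pairs (r, s), each wrapped in a new <rho> root."""
    out = []
    for r in R:
        for s in S:
            root = ET.Element(rho)
            root.append(copy.deepcopy(r))
            root.append(copy.deepcopy(s))
            out.append(root)
    return out


def join(R, S, rho: str, pred: Pred) -> List[ET.Element]:
    """Join as filtering applied to the cross product."""
    return [d for d in cross_product(R, S, rho) if pred(d[0], d[1])]


def semijoin(R, S, pred: Pred) -> List[ET.Element]:
    """Documents of R that have at least one partner in S."""
    return [r for r in R if any(pred(r, s) for s in S)]


def antijoin(R, S, pred: Pred) -> List[ET.Element]:
    """Documents of R that have no partner in S."""
    return [r for r in R if not any(pred(r, s) for s in S)]


def same_author(b: ET.Element, a: ET.Element) -> bool:
    return b.get("aid") == a.get("id")


books = [ET.fromstring('<book aid="1"><title>T1</title></book>'),
         ET.fromstring('<book aid="3"><title>T2</title></book>')]
authors = [ET.fromstring('<author id="1">A</author>'),
           ET.fromstring('<author id="2">B</author>')]

print(len(join(books, authors, "pair", same_author)))                         # 1
print([b.findtext("title") for b in semijoin(books, authors, same_author)])  # ['T1']
print([b.findtext("title") for b in antijoin(books, authors, same_author)])  # ['T2']
```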

XML Algebra – Sub-Grammar

• Let $G_i$ and $G_j$ be ETGs, and let $G_j$ include the following production rules:

$$Y \to t_y\{r_y(X)\},\quad X \to t_x\{r_x(Z)\},\quad Z \to t_z\{r_z\}$$

where $r_y(X)$ is a regular expression that includes the non-terminal symbol $X$, and $r_x(Z)$ is a regular expression that includes the non-terminal symbol $Z$.

• We say that $G_i$ is a sub-grammar of $G_j$ when $G_i$ can be obtained from $G_j$ by applying the following transformation rules:

XML Algebra – Sub-Grammar

• Transformation of document structure:

XML Algebra – Restructuring

• The restructuring operation can be defined as:

Algorithm for Restructuring

XML Algebra – Filtering

• The filtering operation can be defined as:

Definition 8. Filtering is a unary operator denoted as

$$\sigma_{\phi}(D) = \{R_1, R_2, \ldots, R_D : R_i \in X\},$$

where D is a set of documents and ϕ is a triple <P, min, max>: P is a valid path, min is the minimum number of occurrences of P (default −1), and max is the maximum number of occurrences of P (default −1).

• To differentiate between the preserve and remove semantics of the operator, we introduce the symbols $\sigma^{+}$ and $\sigma^{-}$.
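A minimal Python sketch of σ over documents held as `xml.etree.ElementTree` elements. Several details are assumptions, since the definition does not spell them out: P is written as an ElementTree path expression, a bound of −1 is read as "no bound", and σ+ is read as keeping the documents that satisfy ϕ while σ− removes them. `satisfies` and `sigma` are hypothetical names. The usage retrieves books with at most one author.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# phi = (P, min, max); P is written as an ElementTree path (assumption).
Phi = Tuple[str, int, int]


def satisfies(doc: ET.Element, phi: Phi) -> bool:
    path, lo, hi = phi
    n = len(doc.findall(path))  # occurrences of P in this document
    if lo != -1 and n < lo:     # -1 read as "no lower bound"
        return False
    if hi != -1 and n > hi:     # -1 read as "no upper bound"
        return False
    return True


def sigma(docs: List[ET.Element], phi: Phi, preserve: bool = True) -> List[ET.Element]:
    """sigma+ (preserve=True) keeps documents satisfying phi; sigma- removes them."""
    return [d for d in docs if satisfies(d, phi) == preserve]


books = [
    ET.fromstring("<book><year>2001</year><author>A</author></book>"),
    ET.fromstring("<book><year>2002</year><author>B</author><author>C</author></book>"),
]
at_most_one_author = ("author", -1, 1)
print([b.findtext("year") for b in sigma(books, at_most_one_author)])         # ['2001']
print([b.findtext("year") for b in sigma(books, at_most_one_author, False)])  # ['2002']
```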


XML Algebra – Cross Product

• Cross product is a binary operator that creates all possible pairs of documents from two sets.

Definition 9. Cross product is defined as

$$R \times_{\rho} S = \{\rho\{r\ s\} : r \in R,\ s \in S\},$$

where R and S are sets of documents, and ρ is a valid element name that becomes the parent node of every combination of documents from R and S.
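The definition translates almost directly into a nested loop. The Python sketch below, using `xml.etree.ElementTree`, is one possible reading: each pair becomes a new element named ρ (the `rho` parameter) with r and s as its two children, and deep copies are used so that one document can take part in several pairs. The function name `cross_product` is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET
from typing import List


def cross_product(R: List[ET.Element], S: List[ET.Element], rho: str) -> List[ET.Element]:
    """R x_rho S: every pair (r, s) becomes one new document rooted at <rho>."""
    result = []
    for r in R:
        for s in S:
            root = ET.Element(rho)
            root.append(copy.deepcopy(r))  # copy so the inputs stay untouched
            root.append(copy.deepcopy(s))
            result.append(root)
    return result


R = [ET.fromstring("<book><title>Basic XML</title></book>")]
S = [ET.fromstring("<author>A</author>"), ET.fromstring("<author>B</author>")]
pairs = cross_product(R, S, "pair")
print(len(pairs))  # 2
print(ET.tostring(pairs[0], encoding="unicode"))
# <pair><book><title>Basic XML</title></book><author>A</author></pair>
```

The result size is |R| · |S|, as expected of a cross product.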


Algorithm for Cross Product

Tree Pattern Representation

N = {BOOK, YEAR, AUTHOR}

T = {book, year, author}

A = {}

S = {BOOK}

P = {BOOK → book{YEAR AUTHOR+},
YEAR → year,
AUTHOR → author}
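The grammar above accepts a book element with one year followed by one or more authors. A small Python validator makes this concrete. It encodes each production as a terminal tag plus a regular expression over the child non-terminals, and it assumes each terminal tag belongs to exactly one non-terminal, which holds for this grammar but not for regular tree grammars in general. The encoding and the names `P`, `derives` and `valid` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical encoding of P: non-terminal -> (terminal tag, regex over child non-terminals)
P = {
    "BOOK": ("book", r"YEAR( AUTHOR)+"),
    "YEAR": ("year", r""),
    "AUTHOR": ("author", r""),
}
START = {"BOOK"}
TAG_TO_NT = {tag: nt for nt, (tag, _) in P.items()}  # one non-terminal per tag (assumption)


def derives(nt: str, elem: ET.Element) -> bool:
    """True if `elem` can be derived from non-terminal `nt`."""
    tag, content = P[nt]
    if elem.tag != tag:
        return False
    child_nts = []
    for child in elem:
        child_nt = TAG_TO_NT.get(child.tag)
        if child_nt is None or not derives(child_nt, child):
            return False
        child_nts.append(child_nt)
    # The sequence of child non-terminals must match the production's content model.
    return re.fullmatch(content, " ".join(child_nts)) is not None


def valid(elem: ET.Element) -> bool:
    return any(derives(s, elem) for s in START)


print(valid(ET.fromstring("<book><year>2001</year><author>A</author></book>")))  # True
print(valid(ET.fromstring("<book><year>2001</year></book>")))                    # False
```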

Query Addition

• It is rather difficult to express queries such as:
  – retrieve all books that have at most one author

XML Algebra – Online Processing

• The arguments of the XML algebra are sets of documents, and we assume that every increment or decrement of an argument is an XML document.
• For online integration, consider the integration as a UNION operation. We should be able to compute the increment/decrement data ($\delta A_i$) and integrate the result with the previous one:

$$e(A_1,\ldots,A_i \oplus \delta A_i,\ldots,A_D) = e(A_1,\ldots,A_i,\ldots,A_D) \oplus f(A_1,\ldots,\delta A_i,\ldots,A_D)$$

• For example, applying ∪ over ⊕:

$$(R_1 \oplus \delta_1) \cup R_2 = (R_1 \cup R_2) \oplus \delta_1$$

• f is a function that needs to be defined for each operator so that all operators over increment/decrement data follow this form.
• We found that f is a function of either ⊕ or −.
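The incremental form above can be checked on small sets. The Python sketch below reads ⊕ as adding documents (set union), represents documents by their serialized text, and represents a cross product pair as a tuple instead of a new ρ-rooted document; all three are simplifying assumptions. For union, f simply returns δ₁, as in the example; for cross product, f = δ × S is one choice that satisfies the same form. `online_update`, `f_union` and `f_cross` are hypothetical names, and decrements are not covered.

```python
from itertools import product
from typing import Set, Tuple

Doc = str  # documents represented by their serialized text (assumption)


def union(R: Set[Doc], S: Set[Doc]) -> Set[Doc]:
    return R | S


def cross(R: Set[Doc], S: Set[Doc]) -> Set[Tuple[Doc, Doc]]:
    return set(product(R, S))


# f for an increment delta of the FIRST argument, so that
# e(R ⊕ delta, S) = e(R, S) ⊕ f(delta, S), with ⊕ read as set union.
def f_union(delta: Set[Doc], S: Set[Doc]) -> Set[Doc]:
    return delta                  # (R ⊕ δ) ∪ S = (R ∪ S) ⊕ δ


def f_cross(delta: Set[Doc], S: Set[Doc]) -> Set[Tuple[Doc, Doc]]:
    return cross(delta, S)        # (R ⊕ δ) × S = (R × S) ⊕ (δ × S)


def online_update(previous, delta, S, f):
    """Integrate only the increment's contribution into the previous result."""
    return previous | f(delta, S)


R1 = {"<book>a</book>"}
R2 = {"<book>b</book>"}
d1 = {"<book>c</book>"}

u = online_update(union(R1, R2), d1, R2, f_union)
x = online_update(cross(R1, R2), d1, R2, f_cross)
assert u == union(R1 | d1, R2)  # same as recomputing from scratch
assert x == cross(R1 | d1, R2)
```

The previous result is reused and only the increment's share is computed, which is what makes the form useful when data arrives piece by piece.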


Conclusion

• The proposed XML algebra is consistent with relational algebra.
• It meets the needs of online processing:
  – It works on a tree structure, which avoids nest and unnest operations.
  – It is possible to find a function to process increment data.

References

[1] C. Beeri and Y. Tzaban. SAL: An algebra for semistructured data and XML. In Informal Proc. of the Workshop on the Web and Databases, ACM SIGMOD, pages 37–42. ACM Press, 1999.

[2] S. Bose, L. Fegaras, D. Levine, and V. Chaluvadi. A query algebra for fragmented XML stream data. In Proceedings of the 9th International Conference on Data Base Programming Languages (DBPL), pages 275–277, Potsdam, Germany, September 6–8, 2003.

[3] G. Buratti. A Model and an Algebra for Semi-Structured and Full-Text Queries. PhD thesis, Informatica, Università di Bologna, Padova, 2007.

[4] F. Frasincar, G.-J. Houben, and C. Pau. XAL: an algebra for XML query optimization. Aust. Comput. Sci. Commun., 24(2):49–56, January 2002.

[5] H. V. Jagadish, L. V. S. Lakshmanan, D. Srivastava, and K. Thompson. TAX: A tree algebra for XML. In Proc. DBPL Conf., pages 149–164, 2001.

[6] M. Lukichev, B. Novikov, and P. Mehra. An XML-algebra for efficient set-at-a-time execution. ComSIS, 9(1):64–80, January 2012.

[7] M. Murata, D. Lee, M. Mani, and K. Kawaguchi. Taxonomy of XML schema languages using formal language theory. ACM Trans. Internet Technol., 5(4):660–704, Nov. 2005.


Research Progress

• The structure of an XML document can be formally defined by a Regular Tree Grammar.

Research Progress

• We introduce a grammar for creating instance XML documents, called the Instance Grammar (IG). IG is a context-sensitive grammar that transforms regular tree grammar sentences into instances of XML documents.

XML Data Model – Regular Tree Grammar

• The structure of an XML document can be formally defined by an RTG (Regular Tree Grammar).

XML Data Model – Instance Grammar
