(1)

Complex Data Transformations in Digital Libraries with Spatio-Temporal Information

B. Martins, N. Freire, J. Borbinha

Instituto Superior Técnico, Technical University of Lisbon

2008 International Conference on Asia-Pacific Digital Libraries

(2)

Introduction and Motivation

The DIGMAP project addressed the development of a digital library for materials related to old maps

Collecting metadata from different providers (e.g. OAI-PMH servers)

Processing the metadata and enriching it with inferred spatio-temporal information

Challenges in handling heterogeneous metadata

Transforming the original sources into the DIGMAP format (i.e., TEL profile)

Dealing with data inconsistency, non-uniformity, incorrectness and incompleteness

Handling the spatio-temporal information (e.g. dates and geospatial coordinates)

Challenges in DIGMAP service interoperability

Using the results from DIGMAP services to enrich the metadata

DIGMAP required appropriate XML processing technology for dealing with the above challenges

(3)

The Proposed Solution

Use XML processing languages like XSLT and XQuery

Extend the XPath 2.0 function library

Functions for managing geospatial information

Functions for managing temporal information

Functions for text processing

Other miscellaneous functions

All the advantages of declarative languages like XSLT and XQuery, together with powerful methods for handling complex transformations

(4)

Outline

Introduction

Proposed Extensions to the XPath Function Library

Implementation Issues

Test Cases Within the DIGMAP Project

Conclusions and Future Work

(5)

The Proposed Extensions

Extensions for geospatial data handling

Combining spatial elements according to geospatial predicates such as distance or intersection

Input given in GML, KML or textual strings with geospatial coordinates

Extensions for temporal reasoning

Combining temporal information according to the predicates of Allen's Algebra for temporal intervals

Input given in GML or string encodings (e.g. the ISO 8601 formats)

Extensions for text mining

Keyword matching and textual similarity

Standard text mining operations (e.g. language recognition)

Other miscellaneous extensions

Handling JDBC calls and calls to external Web services
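As an illustration of the distance predicate over "textual strings with geospatial coordinates" listed above, here is a minimal Java sketch. It is not the DIGMAP code. It assumes decimal-degree strings written as "lat, lon" (latitude first) and a spherical Earth with a mean radius of 6371 km, which is the haversine formula. The class and method names are hypothetical. GML/KML parsing and coordinate reference systems, which the real extensions handle through libraries, are ignored.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: distance predicate over points written as plain text. */
final class TextPointDistance {
    private static final double EARTH_RADIUS_KM = 6371.0; // mean radius, spherical model (assumption)
    // Two signed decimal numbers separated by a comma and/or whitespace, e.g. "38.72, -9.14".
    private static final Pattern LAT_LON = Pattern.compile(
        "\\s*([+-]?\\d+(?:\\.\\d+)?)\\s*[,\\s]\\s*([+-]?\\d+(?:\\.\\d+)?)\\s*");

    /** Parses "lat, lon" into {lat, lon}; rejects text that is not a valid pair. */
    static double[] parse(String text) {
        Matcher m = LAT_LON.matcher(text);
        if (!m.matches()) throw new IllegalArgumentException("Not a 'lat, lon' pair: " + text);
        double lat = Double.parseDouble(m.group(1));
        double lon = Double.parseDouble(m.group(2));
        if (Math.abs(lat) > 90 || Math.abs(lon) > 180) {
            throw new IllegalArgumentException("Coordinates out of range: " + text);
        }
        return new double[] {lat, lon};
    }

    /** Great-circle (haversine) distance in kilometres between two textual points. */
    static double distanceKm(String a, String b) {
        double[] p = parse(a), q = parse(b);
        double dLat = Math.toRadians(q[0] - p[0]);
        double dLon = Math.toRadians(q[1] - p[1]);
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(p[0])) * Math.cos(Math.toRadians(q[0]))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
    }

    /** Distance predicate: true when the two points are at most maxKm apart. */
    static boolean withinDistance(String a, String b, double maxKm) {
        return distanceKm(a, b) <= maxKm;
    }
}
```

For example, `withinDistance("38.7223, -9.1393", "41.1579, -8.6291", 300.0)` (Lisbon and Porto, roughly 274 km apart on this model) is true.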

(6)

Geospatial Data Handling

Operators for performing geospatial analysis based on the OGC Simple Features and Filter Encoding specifications

Distance, union, intersection or difference between two geometries

Validity of a given spatial filter

Check if two geometries are spatially related (e.g. containment or overlap)

Check if two geometries fall below a given distance threshold

Area, length, buffer, centroid, boundary or envelope of a geometry

Geometric computations (e.g. translation or scaling) over a geometry

Conversion between GML, KML, C-Square, Geohash or WKT encodings

Transformations on the coordinate systems used in geometries
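Two of the operations listed on this slide can be sketched with plain Java: a containment predicate (is a point inside a polygon?) and conversion of a point to a Geohash string. This is a hedged illustration. The slides say that the real extensions wrap GeoTools and JTS, which handle arbitrary geometries. Here, containment uses the even-odd ray-casting rule on a simple polygon, and points exactly on an edge may be classified either way. The Geohash encoder follows the usual scheme: bits alternate between longitude and latitude, starting with longitude, and each group of 5 bits becomes one base-32 character. Class names are hypothetical.

```java
/** Hypothetical helper: point-in-polygon containment using the even-odd (ray casting) rule. */
final class PointInPolygon {
    /** ring holds {x, y} vertices (x = longitude, y = latitude) without repeating the first one. */
    static boolean contains(double[][] ring, double x, double y) {
        if (ring.length < 3) throw new IllegalArgumentException("A polygon needs at least 3 vertices");
        boolean inside = false;
        for (int i = 0, j = ring.length - 1; i < ring.length; j = i++) {
            double xi = ring[i][0], yi = ring[i][1], xj = ring[j][0], yj = ring[j][1];
            // Edge (j -> i) straddles the horizontal line through y: find where it crosses that line.
            if ((yi > y) != (yj > y)) {
                double xCross = xj + (y - yj) * (xi - xj) / (yi - yj);
                if (x < xCross) inside = !inside; // crossing lies to the right of the point
            }
        }
        return inside;
    }
}

/** Hypothetical helper: encodes a latitude/longitude pair as a Geohash string. */
final class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int precision) {
        if (precision < 1) throw new IllegalArgumentException("precision must be >= 1");
        double latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
        StringBuilder hash = new StringBuilder(precision);
        boolean lonBit = true; // even-numbered bits refine longitude, odd-numbered bits latitude
        int bits = 0, value = 0;
        while (hash.length() < precision) {
            if (lonBit) {
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonLo = mid; } else { value <<= 1; lonHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latLo = mid; } else { value <<= 1; latHi = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }
}
```

For instance, `Geohash.encode(42.6, -5.6, 5)` yields `"ezs42"`. Longer precisions give smaller cells, which is one reason Geohash strings are convenient as compact spatial keys in metadata.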

(7)

Temporal Data Handling

Operators for temporal analysis based on Allen's interval algebra

Distance, union, intersection or difference between temporal intervals

Check if two intervals are related (e.g. containment or overlap)

Other operators for temporal data handling

Compute lengths for temporal intervals (e.g. return seconds or years)

Conversion between GML and string encodings
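The interval relations and length computations on this slide can be sketched in a few lines of Java using `java.time`. This sketch is not the DIGMAP implementation. It assumes closed intervals with start strictly before end, compares dates through their epoch-day numbers, and uses the thirteen relations of Allen's algebra. Lengths are counted in whole units of a caller-chosen `ChronoUnit`. Dates are taken at midnight so that time-based units such as seconds also work. `TemporalOps` and its members are hypothetical names.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper: Allen's interval relations and interval lengths. */
final class TemporalOps {
    enum Allen {
        BEFORE, MEETS, OVERLAPS, STARTS, DURING, FINISHES, EQUALS,
        AFTER, MET_BY, OVERLAPPED_BY, STARTED_BY, CONTAINS, FINISHED_BY
    }

    /** How interval X = [xs, xe] relates to Y = [ys, ye]; both need start < end. */
    static Allen relate(long xs, long xe, long ys, long ye) {
        if (xs >= xe || ys >= ye) throw new IllegalArgumentException("intervals need start < end");
        if (xe < ys) return Allen.BEFORE;
        if (ye < xs) return Allen.AFTER;
        if (xe == ys) return Allen.MEETS;
        if (ye == xs) return Allen.MET_BY;
        if (xs == ys && xe == ye) return Allen.EQUALS;
        if (xs == ys) return xe < ye ? Allen.STARTS : Allen.STARTED_BY;
        if (xe == ye) return xs > ys ? Allen.FINISHES : Allen.FINISHED_BY;
        if (xs > ys && xe < ye) return Allen.DURING;
        if (xs < ys && xe > ye) return Allen.CONTAINS;
        // Remaining case: a partial overlap with all four endpoints distinct.
        return xs < ys ? Allen.OVERLAPS : Allen.OVERLAPPED_BY;
    }

    /** Same relation for calendar dates, compared through their epoch-day numbers. */
    static Allen relate(LocalDate xs, LocalDate xe, LocalDate ys, LocalDate ye) {
        return relate(xs.toEpochDay(), xe.toEpochDay(), ys.toEpochDay(), ye.toEpochDay());
    }

    /** Whole units between two dates, e.g. ChronoUnit.YEARS or ChronoUnit.SECONDS. */
    static long length(LocalDate start, LocalDate end, ChronoUnit unit) {
        return unit.between(start.atStartOfDay(), end.atStartOfDay());
    }
}
```

Dates written in ISO 8601 form can be passed in through `LocalDate.parse("1755-11-01")`. Reduced-precision values such as a bare year would need an extra widening step. That step is omitted here.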

(8)

Textual Data Handling

Keyword matching and textual similarity

Tokenization and keyword-based search

Phonetic similarity (Soundex and Double Metaphone)

String similarity (e.g. Edit Distance, Jaro, Jaro-Winkler, Q-grams, …)

Standard text mining operations

Language recognition

Keyword extraction (statistically significant keywords)

Named entity recognition (regexp, dictionaries or machine learning)

Text classification (machine learning)
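Two of the similarity measures named on this slide, edit distance and Soundex, are small enough to show directly. The sketch below is plain Java. It is not SimPack's API, which the implementation slide names as the library actually wrapped. It computes the Levenshtein distance with two rolling rows, and the American Soundex code with the usual H/W rule: letters with the same code separated only by H or W are coded once, while a vowel between them keeps both. Non-ASCII letters are simply skipped, which is a simplifying assumption. `TextSimilarity` is a hypothetical name.

```java
import java.util.Locale;

/** Hypothetical helper: Levenshtein edit distance and American Soundex. */
final class TextSimilarity {
    /** Minimum number of single-character insertions, deletions or substitutions. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1], curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from "" to b[0..j)
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the arrays row by row
        }
        return prev[b.length()];
    }

    // Soundex digit for each letter A..Z; '0' marks vowels and H, W, Y (no code).
    private static final String CODES = "01230120022455012623010202";

    /** Four-character Soundex code, e.g. "Robert" -> "R163"; "" when there are no A-Z letters. */
    static String soundex(String word) {
        StringBuilder out = new StringBuilder(4);
        char last = 0;
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') continue; // ignore digits, punctuation and accented letters
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) { out.append(c); last = code; continue; } // keep first letter
            if (c == 'H' || c == 'W') continue; // H and W do not separate equal codes
            if (code != '0' && code != last) out.append(code);
            last = code; // a vowel resets 'last', so equal codes around it are both kept
            if (out.length() == 4) break;
        }
        if (out.length() == 0) return "";
        while (out.length() < 4) out.append('0');
        return out.toString();
    }
}
```

With this code, "Robert" and "Rupert" share the code R163, so a phonetic match treats them as equal even though their edit distance is 3.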

(9)

Miscellaneous Functions

Calling external Web services (REST and SOAP)

Conversion from XML to JavaScript Object Notation (JSON)

Handling Java DataBase Connectivity (JDBC) calls

Reading malformed HTML

Converting MARC formats into XML (MarcXml or MarcXchange)
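There is no single standard mapping from XML to JSON, so any conversion function has to fix a convention. The following Java sketch assumes one simple convention; the slides do not say which one DIGMAP used. Attributes become `"@name"` keys, and repeated child elements become arrays. A leaf element becomes its trimmed text. Text that sits next to attributes or child elements, which is mixed content, is dropped. A child that happens to occur only once is emitted as a scalar, which is a known ambiguity of this style of mapping. It uses only the JDK's DOM parser, and the names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: XML to JSON under a simple attribute/array convention. */
final class XmlToJson {
    static String convert(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) { // parser configuration, SAX or I/O errors
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
        return "{" + quote(root.getTagName()) + ":" + toJson(root) + "}";
    }

    private static String toJson(Element e) {
        Map<String, List<String>> fields = new LinkedHashMap<>(); // keeps document order
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            fields.computeIfAbsent("@" + a.getNodeName(), k -> new ArrayList<>()).add(quote(a.getNodeValue()));
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                Element child = (Element) c;
                fields.computeIfAbsent(child.getTagName(), k -> new ArrayList<>()).add(toJson(child));
            }
        }
        if (fields.isEmpty()) return quote(e.getTextContent().trim()); // leaf element: its text
        StringJoiner obj = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, List<String>> f : fields.entrySet()) {
            List<String> v = f.getValue();
            String value = v.size() == 1 ? v.get(0) : "[" + String.join(",", v) + "]";
            obj.add(quote(f.getKey()) + ":" + value);
        }
        return obj.toString();
    }

    /** JSON string literal with the mandatory escapes. */
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (ch < 0x20) sb.append(String.format("\\u%04x", (int) ch));
                    else sb.append(ch);
            }
        }
        return sb.append('"').toString();
    }
}
```

For example, `<rec id="1"><title>Lisboa</title><subject>A</subject><subject>B</subject></rec>` becomes `{"rec":{"@id":"1","title":"Lisboa","subject":["A","B"]}}`.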

(10)

Implementation Issues

Proposed extensions implemented on top of SAXON

SAXON is an open source XSLT/XQuery processor

Extension functions coded in Java (static methods)

Extension functions are called by binding the Java class to a specific namespace

SAXON converts the XPath arguments into the Java parameter types of each function

Most extensions are wrappers over existing open-source libraries

GeoTools and Java Topology Suite (JTS) for the geospatial functions

Lucene and Nux for keyword matching

SimPack for textual similarity

NGramJ and LingPipe for text mining

MARC4J for metadata crosswalks (i.e. handling MARC formats)

Apache AXIS for external Web service calls
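The mechanism on this slide is a Java static method made callable from XPath through a namespace. It can be illustrated without SAXON by using the JDK's own XPath 1.0 API (`javax.xml.xpath`). In that API, an `XPathFunctionResolver` maps a namespaced function name to Java code. This is a stand-in for the XPath 2.0 binding the authors used, not their code. In Saxon editions that support reflexive Java calls, a public static method can instead be reached through a `java:` namespace URI. The namespace URI `urn:example:digmap-ext`, the prefix `ext`, and the function `year` (which pulls a four-digit year out of a free-text date such as "ca. 1755") are all hypothetical.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical demo: a static Java method exposed to XPath as ext:year(string). */
final class XPathExtensionDemo {
    static final String EXT_NS = "urn:example:digmap-ext"; // hypothetical namespace URI

    // A standalone four-digit year between 1000 and 2099.
    private static final Pattern YEAR = Pattern.compile("(?<!\\d)(1\\d{3}|20\\d{2})(?!\\d)");

    /** The extension function itself: first plausible year in free text, or "" if none. */
    public static String extractYear(String text) {
        Matcher m = YEAR.matcher(text == null ? "" : text);
        return m.find() ? m.group(1) : "";
    }

    /** Evaluates an XPath 1.0 expression in which the prefix "ext" is bound to EXT_NS. */
    static String evaluate(String expression, String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "ext".equals(prefix) ? EXT_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return EXT_NS.equals(uri) ? "ext" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return EXT_NS.equals(uri) ? Collections.singletonList("ext").iterator()
                                          : Collections.<String>emptyIterator();
            }
        });
        // Bind {EXT_NS}year with one argument to the static method above.
        xpath.setXPathFunctionResolver((name, arity) -> {
            if (EXT_NS.equals(name.getNamespaceURI()) && "year".equals(name.getLocalPart()) && arity == 1) {
                // Arguments arrive already evaluated; a string() argument arrives as a String.
                return args -> extractYear(String.valueOf(args.get(0)));
            }
            return null; // unknown function: the XPath engine reports an error
        });
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }
}
```

A call such as `evaluate("ext:year(string(/record/date))", "<record><date>ca. 1755</date></record>")` returns `"1755"`. Wrapping the path in `string()` keeps the Java side simple, because the function then always receives a string rather than a node set.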

(11)

Test Cases Within DIGMAP

Conversion between different metadata standards

Converting UNIMARC, MARC21 and other formats into the DIGMAP format

Geospatial coordinates were often given originally in general textual fields

DIGMAP currently indexes over 40,000 metadata records from different sources

Wrappers around DIGMAP XML service interfaces

The DIGMAP Gazetteer uses formats like the Alexandria DL Gazetteer Service format, KML, geoRSS, …

The DIGMAP GeoParser uses formats like SpatialML, geoRSS, OGC GeoParser, …

Converting between the different formats and calling the services for processing the metadata records

Internal development of several DIGMAP services

Data integration within the DIGMAP Gazetteer

Converting different input sources into the Alexandria DL Gazetteer Content Standard

Handling duplicates and making small corrections to the data

The proposed approach was found to be expressive, and computational performance was within acceptable bounds
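This slide notes that geospatial coordinates were often given in general textual fields. A regular expression can recover many of them. The Java sketch below assumes hemisphere-prefixed degree/minute/second notation, such as "(W 9°08'--W 9°08'/N 38°43'--N 38°43')". It converts each match to signed decimal degrees, with S and W negative. The accepted notation is an assumption for illustration, not a rule taken from MARC or from the DIGMAP crosswalks. Real records use many more variants. `TextualCoordinates` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts hemisphere-prefixed DMS coordinates from free text. */
final class TextualCoordinates {
    // Hemisphere letter, degrees, degree sign (U+00B0), optional minutes ('), optional seconds (").
    private static final Pattern DMS = Pattern.compile(
        "([NSEW])\\s*(\\d{1,3})\\u00B0\\s*(?:(\\d{1,2}(?:\\.\\d+)?)')?\\s*(?:(\\d{1,2}(?:\\.\\d+)?)\")?");

    /** Every coordinate found in the text, in order, as signed decimal degrees. */
    static List<Double> extract(String text) {
        List<Double> values = new ArrayList<>();
        Matcher m = DMS.matcher(text);
        while (m.find()) {
            double degrees = Double.parseDouble(m.group(2));
            double minutes = m.group(3) == null ? 0 : Double.parseDouble(m.group(3));
            double seconds = m.group(4) == null ? 0 : Double.parseDouble(m.group(4));
            double value = degrees + minutes / 60 + seconds / 3600;
            char hemisphere = m.group(1).charAt(0);
            values.add(hemisphere == 'S' || hemisphere == 'W' ? -value : value); // S and W are negative
        }
        return values;
    }
}
```

Applied to the example string above, this returns four values, approximately -9.1333, -9.1333, 38.7167 and 38.7167 (west, east, north and south bounds). These could then be turned into a point or bounding box in the target format.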

(12)

An Example XQuery

An XQuery for reading gazetteer data from an HTML source and converting the data into the Alexandria DL Gazetteer Content format
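The first step of such a query is reading tabular gazetteer data out of HTML that is often not well-formed. The sketch below illustrates that step in Java with the lenient HTML 3.2 parser bundled with the JDK (`javax.swing.text.html.parser.ParserDelegator`). It does not use SAXON or the HTML-reading extension from the original work. Each table row becomes a list of whitespace-normalised cell texts, and missing `</td>` and `</tr>` tags are tolerated. Mapping the cells onto gazetteer fields, such as a hypothetical name/latitude/longitude column layout, is left to the caller. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical helper: reads the rows of HTML tables, tolerating missing end tags. */
final class HtmlTableReader {
    private static final class RowCollector extends HTMLEditorKit.ParserCallback {
        final List<List<String>> rows = new ArrayList<>();
        private List<String> row;
        private StringBuilder cell;

        @Override
        public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            if (tag == HTML.Tag.TR) {          // a new row also closes any open cell
                closeCell();
                row = new ArrayList<>();
                rows.add(row);
            } else if (tag == HTML.Tag.TD || tag == HTML.Tag.TH) {
                closeCell();
                cell = new StringBuilder();
            }
        }

        @Override
        public void handleEndTag(HTML.Tag tag, int pos) {
            if (tag == HTML.Tag.TD || tag == HTML.Tag.TH || tag == HTML.Tag.TR || tag == HTML.Tag.TABLE) {
                closeCell();
            }
        }

        @Override
        public void handleText(char[] data, int pos) {
            if (cell != null) cell.append(' ').append(data); // text outside cells is ignored
        }

        /** Stores the open cell, if any, in the current row. */
        void closeCell() {
            if (cell != null && row != null) row.add(cell.toString().replaceAll("\\s+", " ").trim());
            cell = null;
        }
    }

    static List<List<String>> readRows(String html) {
        RowCollector collector = new RowCollector();
        try {
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            throw new IllegalStateException("Unexpected I/O error on an in-memory reader", e);
        }
        collector.closeCell(); // in case the input ends inside an unclosed cell
        return collector.rows;
    }
}
```

For example, `readRows("<table><tr><td>Lisboa<td>38.7<td>-9.1<tr><td>Porto<td>41.2<td>-8.6</table>")` should yield two rows, `[Lisboa, 38.7, -9.1]` and `[Porto, 41.2, -8.6]`, despite the missing end tags. These rows could then be serialised into whatever XML the target gazetteer format requires.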

(13)

Conclusions

Data transformations in Digital Libraries can be very complex

Standard XML processing technology is often not enough

But simple extensions can add the required extra functionality

We propose using extension functions to the XPath 2.0 library

Declarative syntax of XSLT and XQuery is not affected

Extension functions add the required extra functionality

Used in DIGMAP collection building and service composition

Converting between different metadata formats

Handling the spatio-temporal information included in the metadata

Calling DIGMAP services to enrich the metadata records

(14)

Currently Ongoing Work

Implementing a visual interface for encoding the metadata transformations

Visual “pipelines” converted into XQuery instructions

Hide the complexity of the XSLT/XQuery languages from non-expert users

(15)

Thanks for your attention.

www.digmap.eu

http://transform.digmap.eu
