From Database to Publication: Tools for Typesetting a Three-Language Dictionary

Dennis Walters

SIL International

2009

SIL Forum for Language Fieldwork 2009-003, December 2009

© Dennis Walters and SIL International


Abstract

This document describes a process for producing a three-language dictionary using a specific set of software tools. Many linguists will be able to carry out the solutions described here themselves, although some steps may require help from a software specialist.

The lexical data included roughly 11,000 entries, with English, International Phonetic Alphabet (IPA), Chinese characters, romanized Chinese characters, Nuosu Yi characters, and romanized Nuosu Yi characters. The data records were prepared using SIL FieldWorks Language Explorer (FLEx). The FLEx data were exported to a custom standard format (SFM) database, then manipulated with a Python script to produce an intermediate SFM database. The intermediate database served as input to “Shlex,” a Perl program that produces a formatted dictionary and reversed finder lists. These documents then became subdocuments in an OpenOffice master document, where they were combined with the title page, cataloging-in-publication data, preface, introduction, character indexes, etc. After the style details for paragraph, character, page, list, and outline were adjusted and the indexes were generated, the master was exported as a .pdf document which could be used to make photo plates for printing.

This document will be of interest to working linguists and project supervisors who want to have hands-on control of their data throughout the process of publication.

Overview

! " # $ ! " #

$ %

& ' ( ' ' %

) ' !' %

* ! ' ' !%

* ' ' ) + , - ( + ' .,+ / % * ,+

' ! ) , ! 0 ( .),0/1 ! '

' ' ! ),0 % * !

' 2) 3 ' ! ' !

% * ) 4' 4 .% / !

' ' '' % * ! ! !

4' 4 ! ! % * ! ' 5 '

' ' %5 ' '

4' 4 ! ' ! ! % !

,+ ! ( 4' 4 6

1 “FieldWorks Language Explorer is the lexical and text tools component of SIL FieldWorks. It is an open source desktop application designed to help field linguists perform many common tasks.” http://www.sil.org/computing/fieldworks/flex/

2 The SFM is a simple scheme for structuring data in a text document. A backslash [\] followed by a letter or series of letters at the


' ' ' 7 ' '

' ! 7 % * ' '

4' 4 ! %' ! ! ( ' ' ' %

Skills and tools

* ' ' ' ' !' !

% * ! ' !

! ' ' !' ' % *

! ' ' ( ! ( %

* ! ' % )

8 ' 7 !! ' % ) !

' . ! / !! !

! ' ' ! ( 0 - %

Preparing the data in FLEx

* ' ' , - ( 9%1 0 1 :% *

) ' ),0 ! 6 !

!! ' ! % - * ;

' ' %<

3 “Toolbox is a data management and analysis tool for field linguists. It is especially useful for maintaining lexical data, and for parsing and interlinearizing text, but it can be used to manage virtually any kind of data.” See http://www.sil.org/computing/catalog/show_software.asp?id=79

4 The WeSay team is developing a software tool, “SOLID,” designed to help a consultant repair inconsistencies in SFM lexical data. See http://projects.mseag.org/solid/wiki/Overview and http://wesay.org/wiki/SOLID.


* ' ' ' ' !

' % ! !

,+ % ' '

' 6 ' ! ! ' 6

% 4 ! (

'' ! % * ( ! !

! ' % ,+

' ' ' ' %

& ' ' ' ' " ! ' !

% ' (

'' %

-' ! , => ' ' ( '

% > !'

! ( ! ' ' ' ' %

, !' ? ! ,+

! '

' ! " % .* ! " ! ! ' !

%/ .@ / ,+

' #) ! ' ! " @ %

\revChn ; ;
\revChP tong3; duo3; chuan4
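The two fields above appear to hold parallel, semicolon-separated lists: Chinese reversal items in `\revChn` and their romanized forms in `\revChP`. The following is a minimal sketch of how such a pair could be split and checked, assuming the items are meant to correspond one-to-one. The function name is hypothetical and is not taken from the author's scripts, and the Chinese characters in the usage note are illustrative placeholders, since the original characters are not legible in this copy.

```python
def pair_reversal_items(chinese_field, pinyin_field):
    """Split two semicolon-separated fields and pair their items in order.

    Assumption: the n-th item of the Chinese field corresponds to the
    n-th item of the romanized field.
    """
    chinese = [item.strip() for item in chinese_field.split(";") if item.strip()]
    pinyin = [item.strip() for item in pinyin_field.split(";") if item.strip()]
    if len(chinese) != len(pinyin):
        # A mismatch means one field is missing data for some item.
        raise ValueError(
            "field lengths differ: %d Chinese vs %d romanized items"
            % (len(chinese), len(pinyin))
        )
    return list(zip(chinese, pinyin))
```

For example, `pair_reversal_items("桶; 朵; 串", "tong3; duo3; chuan4")` yields three pairs, while a field with a missing item raises `ValueError` so the record can be reported rather than silently misaligned.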

4 ! ! ( ' ! % * ' '

' ' ( ' ! ! ! ' % * (

! ( ! % ! %

A . A/

,+ ! ' % , ' ' ' !

% !

! ,+ ! '

' % ' '' ,+ %9#

A ( !' ' ' % * A

' ),0 \eah 0 \eah 1% ! ' '

' %

* ' ! ,+ ! ' export.xml% * !

' ' ! A % *

! ' ? + !

! $ $ $ % * ' ' (

' %B. ! ? '

* %/

* ' ' ! ' !

' 8 ! % * C (

,+ ' ' % * $ C (

5 Filtered export to XHTML format is supported in the current development version of FLEx.

6 Alternatively, the custom export could have allowed missing data in these fields. Then the postprocess script could have flagged


% * 2$-D $ $3 ! * E F! '' ' $ $ % - + ! ) , + ! .$ / * , ' $ $ '' % ! ' ' % ,+ ! ! %:- ' ( ! # $ % * % , ' '

2G $3 2$ G3

ꀊꃚ a fu (var. ꀊꉻ a ho) adj. 粗(指圆柱形;条形)thick; fat; big around (of long, cylindrical things) ꀁꃚ ix fu

ꀊꉻ a ho (var. of ꀊꃚ a fu) adj. 粗 thick; fat; big around (of long, cylindrical things) ꀁꉻ ix ho

ꀊꉻa ho ,+ * ' 2& H 3 ꀊꃚa fu

0 % ,+ ?

% > ! 0

H ( 0 % * ' '

),0

\lx ꀊꃚ
\lxYiP a fu
\va ꀊꉻ
\et Main entry

\lx ꀊꉻ
\lxYiP a ho
\mn ꀊꃚ
\et Dialectal variant
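The two records above show a main entry pointing to its variant with `\va`, and the variant pointing back to the main entry with `\mn`. As a rough illustration of how such SFM records can be read in Python, here is a minimal sketch. It assumes one field per line, with the marker and its value on the same line, and it keeps only the last value when a marker repeats within a record. The function names are hypothetical and are not taken from the author's scripts.

```python
SAMPLE = r"""
\lx ꀊꃚ
\lxYiP a fu
\va ꀊꉻ
\et Main entry

\lx ꀊꉻ
\lxYiP a ho
\mn ꀊꃚ
\et Dialectal variant
"""


def parse_sfm(text, record_marker="lx"):
    """Return a list of dicts, one per record; a record starts at \\lx."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # blank lines and continuation lines are ignored here
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker:
            current = {}
            records.append(current)
        if current is not None:
            current[marker] = value.strip()
    return records


def variants_by_main(records):
    """Map each main-entry headword to the variants that point at it via \\mn."""
    index = {}
    for rec in records:
        if "mn" in rec:
            index.setdefault(rec["mn"], []).append(rec["lx"])
    return index
```

Calling `variants_by_main(parse_sfm(SAMPLE))` groups the variant ꀊꉻ under its main entry ꀊꃚ, which is one way to verify that every `\mn` cross-reference has a matching headword before typesetting.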

) ! # $ ! ' % - '

! ' % ) ! ! ! (

A ,+ % * '

' ! ! ' ' ! ' ! ? % ,

! ' 1 ; ! '

! < 9% * ! ' ?

' ? %I '

homograph.py

? ,+ % 4 ' '

( ' ! ! ' ? % *

! ' ' ' ! % *

( !' ! ' ! ' !%

7 TECkit is software for converting electronic data between different character encodings. See http://www.sil.org/computing/catalog/show_software.asp?id=77. The mapping file is simply a table that provides replacement values for a list of items, such as /yy/ for the Nuosu Yi character ꒉ.
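The idea of a replacement table can be illustrated in a few lines of Python. This is only a stand-in for the concept, not TECkit itself and not TECkit's mapping-file syntax. The table holds the single pair mentioned in this note; a real table would list every Nuosu Yi syllable, and the names used here are hypothetical.

```python
# Replacement table: Nuosu Yi character -> romanized form.
# Only the one pair given in the text is included.
YI_TO_ROMAN = {"ꒉ": "yy"}


def romanize(text, table):
    """Replace every character found in the table; leave the rest unchanged."""
    return "".join(table.get(ch, ch) for ch in text)
```

With this table, `romanize("ꒉ", YI_TO_ROMAN)` gives `"yy"`, and characters that are not in the table pass through untouched.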

8 During the summer of 2007, the FLEx Discussion email list hosted a long and detailed discussion of the available and preferred options for modeling variant lexical forms in FLEx. Depending on how they are analyzed, variants can be recorded as allomorphs, inflectional or derivational variants, dialect variants, spelling variants, etc.

9 This is a violent thing to do to the lexicon and would be difficult to reverse. At this stage of the process, I was using a version of


' ! '

' ' " ! " ! % ! $ !

% ' " !

$ ! % - ,+ ' ' ! ' " !% *

? % ) ! ' ' !

! ) ' " ! '

! % * ' !

. / ,+ ' "

! ! ' ) % * )

config.xml ( ' " ! '

%

Custom export

+ ' ! ,+ ! 0&,

! + ,* % * ! ' ' !

C:\Program Files\SIL\FieldWorks\Language Explorer\Export Templates% ,

! ' ' ),0 data0.db%

* ! ' ! 0 & , ! .0&,/ %

! ),0 ! (

! ' ! @ $ @ $ @ $ %

' 0&, @ * ' 7 @ A H 7 @! 0 7 @' )'

7 @ ! > 7 @ J * ' @ A %

+ ' ! ' !

' ' ! ,+ ! ' %

, ' ! ,+ '

,+ '

% , ! @ 方言变体 @ & H '

' ' ! % , ! '

PostProcess.py '

! '' ' ' %

Postprocessing: PostProcess.py(data0.db)

' ),0 data0.db ! ' ' PostProcess.py% * '

! ! ( ' '' ' ' ' ! !

( ? K $ % )' PostProcess.py

• * ' ! L四字格L L, !L L L

L ! L L L

• + , ! L反义L 2 !3 L L

• ( ' !

10 The MDF is a scheme for constructing a lexical database using a set of predefined fields, and for extracting and typesetting the data in a multilingual dictionary with or without reversal indexes. MDF is described in Coward, David F. and Charles E. Grimes. 1995. Making dictionaries: A guide to lexicography and the Multi-Dictionary Formatter. Waxhaw, North Carolina: Summer Institute of Linguistics.

11 LIFT (Lexicon Interchange FormaT) is an XML format for storing lexicons/dictionaries. See more on the LIFT standard at


• ( !' • ( • @ ? • ( ! ! ( ' • ! • $ ! • ' ! ,+ %

, PostProcess.py >

! ! A > % ! ! !

% , ! ! !

! %

* ' PostProcess.py : ),0 ! ( '% * data1.db%

* ' ' 2 %' M ' % 3%

! PostProcess.py ! ( errorreport.txt %

! ,+ ' % 4 !

' ' % - ,+ ' PostProcess.py

' ! .data1.db/ ' ) %

4 C0 *< !' PostProcess.py ( < %
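The files named in this section suggest the overall shape of the postprocessing step: PostProcess.py takes data0.db as input, produces data1.db, and records problems in errorreport.txt. The sketch below shows one way to organize such a step. It is not the author's script: the function names are hypothetical, and the single check performed (every record must contain `\lx` and `\et`) is an assumption chosen only to illustrate error reporting.

```python
from pathlib import Path


def split_records(text, record_marker="\\lx"):
    """Group SFM lines into records; each record starts at a \\lx line.

    Lines before the first \\lx (for example a file header) are dropped.
    """
    records, current = [], []
    for line in text.splitlines():
        if line.startswith(record_marker + " ") or line.strip() == record_marker:
            if current:
                records.append(current)
            current = [line]
        elif current and line.strip():
            current.append(line)
    if current:
        records.append(current)
    return records


def postprocess(text, required=("\\lx", "\\et")):
    """Return (cleaned SFM text, list of error messages)."""
    output, errors = [], []
    for rec in split_records(text):
        markers = {line.split(" ", 1)[0] for line in rec}
        missing = [m for m in required if m not in markers]
        if missing:
            # Report the problem but keep the record in the output.
            errors.append(f"{rec[0]}: missing {', '.join(missing)}")
        output.append("\n".join(rec))
    return "\n\n".join(output) + "\n", errors


def run(src="data0.db", dst="data1.db", report="errorreport.txt"):
    """Read the exported database, write the intermediate one and a report."""
    text = Path(src).read_text(encoding="utf-8")
    cleaned, errors = postprocess(text)
    Path(dst).write_text(cleaned, encoding="utf-8")
    Path(report).write_text("".join(e + "\n" for e in errors), encoding="utf-8")
```

Keeping the transformation (`postprocess`) separate from file handling (`run`) makes it possible to test the checks on small strings before running them on the full lexicon.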

Typesetting Part I: Shlex(data1.db,config.xml)

* ' PostProcess.py.data1.db/ ' 2) 3

! ' ' ! .% /

* G ! % ) config.xml ' ' ?

' %Config.xml ' ' ! '' ' ( %

, ) $ K K & K $

K $ % * ' ) ' '

data1.db ' config.xml ''

%

* ' ! config.xml ) @

! % * ' '

& % * @ $ @ $ @ $ ' $ $

' % ) ' ''

%Config.xml ) ' ? ( @ $

' '' ? ( % )

! ' ' %

config.xml ' ' ' '

" ! ' ' %

, , + config.xml )


' ! ) @ $ @ $ '

! '' ' % * , + ! ' %

* )

% ! N O N P O N ( O N O

.4 ! !' ) ( 9 ! % %/

Typesetting Part II: Using OpenOffice

) ' 0 & $ $ %

* ' ( 4' 4 ! !

!% ! ! !

' % * ! !

' ' ' ! %

-' ! ! !

! ! 8 % , ' ' ! !

2* C , 3 + ' % A 2*

C , 3 ! ! * ! # > ! 1

' ' ! % ' ' 2* C , 3

'' ! ( ! ' ! ! %

! ! ' ! ! % 4

4' 4 - ! % *

* * $

C $ % C !

! 44 - % * ' ! ! 44 - !

' ! % !' $

' ' ! ! % C ' G0+ !'

' ! ! : % 44

- ! J' %

4 6 ! ! ' ) ! (

! 2 3 %

2 3 ! : !% !

4' 4 ' ! ! % *

' ! ! ) % J

, ! ) ! !

! !! ! %

* ! ! ' ' %

- ( 4' 4 ! ! '

! ! % . ! ! ' '

' ' ! ! %/ * ! ( '' '

! 6 ! ! ! %

* ' ! ! '

! ( ' ? %

* ! ' ) ''

' ! % * ! ! ' 8 !


( ( ' ! % *

' ! '' ! ! %

4 ' ' ( ! ? !'

'' '' ' % * !' !' ( ''

' ! %

* J Q

! ! % 2* 3 ' '

! % ' ' ' ' ! - ' ! ' ' !' % * ! !' ' ' ' . / ' ( ! ! % ' ( !

! % A ! ! (Q ! ( '

% , ' ' ( '

%

* ! !

% ) ' 2* K 3 2* K 3

' ' '' ' %

J ' % *

! ! ' '

%

! % * % * ' 4' 4

-6 !' ! (% *

' ' %' % !

' (! ( %' % *

%' ' (

! ! ! (! ( '' ' %' %

) ' ' ' % ! 6

' 7 % % 0 & ,

0 & % 0 & , %

0 & ' ! % *

' 2词 汇 正 文3 ' !

% * ' 20 & 3 '

! % C '

' %

4 !!

! ' ' Q ( ' ' %

' ! ( !'

' ' (% >

' ' Q % ,

' ꉁ,ꉂ,ꉃ,ꉄ,ꉅ, ꉆ '

% 0 ' %


!! 4' 4 ' '

! !' (% * ( !

' 2 '' %3

Conclusion

' ' ( '

' % A ! '

%

! ! > ! % ' '

% & !

' ! ( ( ! %

' ! % J !

! ' ! " !% 0&, ! ( '

! ( ( > ! % +

$ ! $ ),0

' ! % ) ' ! ! ( ( (% ' ' ' !! % , ! ! ' ' ' % ! ' '' ' ' ' ! % % ) ! ' ! !' ' !! ? ' ! ! ! %

Acknowledgments

* !' ' # $ & '

' 6 ' ) + ) J

# ) .西南民族大学西南民族研究院)% * '

' 1 : # A .民族出版社/ C 6 % *

!' 0 + ) - ! % * , - (

' + ) & '! ' ! ) +

% C 1 < , - ( ! ( ! '

' ! ' ' # $ % E D (

!' # $ % ) 0 H > ! .) +

'/ ' ' ! ' ' % H > ! ' ' !

Homograph.py ' PostProcess.py ' Config.xml ' ) % H

> ! G0+ : R ' !

!' % 0 A ( .) + # > ! ) ' / ) '

Config.xml ' % S S ( ) - E D (

( ' % *

' ' # $ & '

' '' % 4 ! ' ! "


Appendix – Sample dictionary pages


Chinese Table of Contents


Appendix – Scripts and Files Referenced

' ' !

P T % % * ! !' ? !

! & - ' %

% ! J ) % & ' %

% 4 ' ,+ ' ' % ! %

% 4 ' %' ' ) %

' % 4 ' %' ! ! ' %

' % ! ! ' ' ' ! ,+

),0 %

,+ , - ( + ' !' ) +

, - ( %

! ' %' ' ! ' ?

,+ %

%' ' ' ),0 ' ! ,+ %

) ' ' %

References

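Coward, David F. and Charles E. Grimes. 1995. Making dictionaries: A guide to lexicography and the Multi-Dictionary Formatter. Waxhaw, North Carolina: Summer Institute of Linguistics.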

Figure 1. Screenshots of FLEx lexical entry and SFM lexical entry
