
Part 3/3. Trilingual: Arabic/Indonesian/English. Available on Academia.edu, LinkedIn and researchgate.net


Why will computer engineers keep failing to produce accurate lexical resources? And what is to be done?

Unless computer engineers think outside the methodology of the single system of rules, they will go on as they have for 22 years, trying again and again, in vain, to prove how “adequate and effective” their approach and their algorithms are.

In addition to their interest in algorithms, they need to reconsider the traditional model in light of the aims and purposes of computational morphology.

Unless they have the patience to examine closely the details of inflection and its alternations, they are bound to interact with experts in this field.

For 25 years, sponsors have been supporting computer engineers and their immature approaches


Can Computer Scientists Think Differently?

Translation to Indonesian by Prihantoro

http://pri2011.blogspot.fr/

The phrase “rich and complex morphology”, attached to Arabic, is widely used as a key phrase in papers by computer scientists, in about 50% of them. One illustration of how “rich” and “complex” Arabic morphology is: it has four inflectional features (Gender, Number, Definiteness, Case), whereas French has only two.

There are two sides of complexity that are sometimes hard to tell apart. The first is morphophonemic complexity; the second is complexity at the level of application. So far, computer scientists have been trapped by the limits of their own implementation and mindset.

Various models based on traditional morphology have been applied by several research teams. They share two general weaknesses. First, the concepts of traditional morphology were swallowed whole, without an understanding of their meaning and purpose. Second, no accurate lexical resource was produced.

Below are extracts from papers produced by the computer science team at Columbia University. They have an adequate understanding of morphophonemics and adopt the traditional morphological model.

« We also exclude all analyses involving non-triliteral roots and non-templatic word

stems since we do not even attempt to handle them in the current version of our

rules». (Altantawy et al., 2010:856).

« The project has an on-going part for nouns, including BPs (Altantawy et al., 2010). The rules are compatible only with triliteral nouns: we are not evaluating our lexicon coverage (...). Future work will address the crucial issue of creating and evaluating a comprehensive lexicon » (Altantawy et al., 2010:856).

These extracts show that some things were left unexamined, such as non-triliteral consonantal roots (roots that do not have three letters). There are also rules that can be applied only to triliteral nouns. This shows that many gaps remain to be closed, or that completeness may even be out of reach if

In other words, it is quite difficult to formalize the Arabic lexicon with the root/pattern/rules model they use. In this model, root alternations are not formalized separately for each inflectional class; they are governed by one set of rules that applies to every entry. The Columbia University team had to sacrifice lexical coverage in order to follow this model.

Creating a set of rules and updating it is no easy task. In 2009, a set of phonological rules was set up for the Arabic text-to-speech system used at http://translate.google.com. Each change (a new rule, a modification, a change in rule order) was assessed by comparing the new phonological output with the previous one. Inevitably, an improvement on one side sometimes had a negative effect on another. This is because the rules were all placed in one and the same pool, so that improvements could not be properly controlled.

Fifteen years before the paper by Altantawy et al., Beesley had pointed out the potential problem of the consonantal root as underlying representation. According to Beesley, linguists are influenced by this complexity and so sometimes simplify the rules at the level of


Why do computer scientists adopt this traditional model without a second thought? Because their focus is algorithms, not morphology. They are not interested in the scientific side of morphology or in critical views of traditional morphology.

However, such rules do not accommodate lexical inflectional classes: a new rule will affect other entries, because nothing indexes the entries to which the rule applies. Computer scientists sometimes fail to see this drawback. As the number of rules in the same pool grows, their interactions with other rules multiply, and complexity rises until it is hard to control. More important still is the order in which the rules apply. Checking, correcting and updating rules in a single pool is very difficult indeed.

By contrast, from the perspective of inflectional classes, the rules are grouped into separate pools, so that interactions are confined to one particular class. In lexicology, this categorization helps reduce interactions between classes, even if there is sometimes some repetition. This, however, is not too hard to handle.

A linguist of Arabic can give a detailed account of complex morphological phenomena by referring to the different inflectional classes. It is sometimes this very detail that computer scientists do not much like.

Can computer scientists build a comprehensive lexical resource? Not unless they change their way of thinking. The model they have used since 1990 has proved a failure, because it did not manage the transition from traditional morphology to computational morphology. If they do not have the patience to study Arabic morphology, then they must


For 25 years, sponsors have supported the naive view of computer scientists who promise a comprehensive lexical resource without the help of experts in the morphophonemics of the language


Can computer scientists think out of the box?

Arabic computational morphology

Alexis Amid Neme and Eric Laporte

Part 3/3

At the end of the nineties, Arabic gained the reputation of having a “rich and complex morphology”. As many as half of the scientific papers in Arabic NLP start with this key phrase. The richness of Arabic morphology is obvious: for instance, Arabic nouns have four inflectional features (Gender, Number, Definiteness, Case), while French nouns have two.

However, the complexity has two facets: the complexity of the details of morpho-phonological facts, and the complexity due to the implementation of their description. The stereotype of the rich and complex morphology might well hide a confusion between the two facets, in which computer scientists are overwhelmed by the inadequacy of their own approach and implementation.

Various models based on the traditional model were implemented by research teams. These

implementations have two common points: the traditional model was taken for granted totally or

partially without questioning its aims and definitions; and no accurate lexical resource was

produced meanwhile.

Below are two extracts from publications of the Columbia University team, one of the best teams of computer scientists working on Arabic NLP since 2005. This team has a suitable knowledge of


« We also exclude all analyses involving non-triliteral roots and non-templatic word

stems since we do not even attempt to handle them in the current version of our

rules». (Altantawy et al., 2010:856).

« The project has an on-going part for nouns, including BPs (Altantawy et al., 2010).

The rules are compatible only with triliteral nouns: we are not evaluating our lexicon

coverage (...). Future work will address the crucial issue of creating and evaluating a

comprehensive lexicon » (Altantawy et al., 2010:856).

In other words, encoding such rules for the entire Arabic lexicon is a challenge, or is even almost

impossible, in the root/pattern/rules approach. In this approach, root alternations are not

formalized separately for each inflectional class, but controlled by a single pool of rules a priori

applicable to all entries. The Columbia team had to shrink the lexical coverage to be able to

stick to the approach.

Making a set of rules, and even updating it, is an incredibly complex task. In 2009, for an Arabic Text-To-Speech system now used in http://translate.google.com, we set up a pool of Arabic phonological rules. Assessing a change in the pool (inserting a new rule, modifying a rule, changing the order of rules) involved comparing on the fly the new phonological output with a previous one, taken as the best reference so far. Each change required a trial-and-error process, in which each output displayed some improvements and some damage relative to the previous reference. Fine-tuning a pool of rules is inevitably an uncontrolled, empirical process of groping.
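The assessment loop described above can be pictured with a minimal sketch. It is not the Text-To-Speech system itself: the rules are toy string rewrites over made-up strings, not Arabic phonology, and the names `apply_pool`, `assess_change` and `EXPECTED` are hypothetical. It also assumes a small set of hand-checked target outputs, which is a simplification of comparing against the best previous output.

```python
from typing import Callable, Dict, List

Rule = Callable[[str], str]


def apply_pool(pool: List[Rule], form: str) -> str:
    """Run one input form through every rule of the pool, in pool order."""
    for rule in pool:
        form = rule(form)
    return form


def assess_change(old_pool: List[Rule], new_pool: List[Rule],
                  expected: Dict[str, str]) -> Dict[str, List[str]]:
    """Sort test inputs by how a pool change affected their correctness."""
    report: Dict[str, List[str]] = {"improved": [], "damaged": [], "same": []}
    for form, target in expected.items():
        was_ok = apply_pool(old_pool, form) == target
        is_ok = apply_pool(new_pool, form) == target
        if is_ok and not was_ok:
            report["improved"].append(form)
        elif was_ok and not is_ok:
            report["damaged"].append(form)
        else:
            report["same"].append(form)
    return report


# Toy rewrite rules over made-up strings (NOT real Arabic phonology).
def drop_a_before_b(s: str) -> str:
    return s.replace("ab", "b")


def degeminate_b(s: str) -> str:
    return s.replace("bb", "b")


OLD_POOL = [drop_a_before_b]
NEW_POOL = [drop_a_before_b, degeminate_b]  # the change: one inserted rule

# Hypothetical hand-checked targets for three inputs.
EXPECTED = {"abb": "b", "ebb": "ebb", "cab": "cb"}

REPORT = assess_change(OLD_POOL, NEW_POOL, EXPECTED)
print(REPORT)
```

In this toy case the inserted rule repairs one input and breaks another, which is the pattern of simultaneous improvement and damage reported for the real pool.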

Fifteen years before Altantawy et al.'s paper, Beesley had pointed out in his own model a related

issue, the useless complexity induced by the « deep » root, the underlying lexical representation


influences the linguist to simplify the rules by postulating a rather surfacy lexical

level » (Beesley, 1996:91).

Why do computer scientists so readily take for granted the traditional model of morphology and the traditional view of deep roots?

- Their main concern is algorithms, not morphology. They show too little scientific curiosity about morphology and too little critical thinking about traditional views.

But the usual rule approach lacks the notion of lexical inflectional class: a rule affects several

kinds of lexical entries since it is not limited by an index of the entries (or classes) affected by

each rule. Rules are powerful devices, and computer scientists tend to overlook the drawbacks of

such power. If the number of rules in the same pool increases, rule interdependency increases

significantly, fuelling complexity to the point of uncontrollability. Moreover, the order of

application of the rules is significant and must be decided and encoded. Checking, correcting and

updating such a pool of rules are heavy tasks.
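Why rule order matters in a single pool can be shown with a small, hedged illustration. The two rules below are toy string rewrites over made-up strings, not Arabic phonology, and the names `RULES` and `outputs_by_order` are hypothetical. The sketch applies the same rules to the same input in every possible order.

```python
from itertools import permutations
from typing import Callable, Dict, Tuple

Rule = Callable[[str], str]

# Two toy rewrite rules over made-up strings (NOT real Arabic phonology).
RULES: Dict[str, Rule] = {
    "r1": lambda s: s.replace("ai", "e"),
    "r2": lambda s: s.replace("ia", "ya"),
}


def outputs_by_order(rules: Dict[str, Rule],
                     form: str) -> Dict[Tuple[str, ...], str]:
    """Apply the same rules to one form in every possible order."""
    results: Dict[Tuple[str, ...], str] = {}
    for order in permutations(rules):  # iterates over the rule names
        out = form
        for name in order:
            out = rules[name](out)
        results[order] = out
    return results


RESULTS = outputs_by_order(RULES, "aia")
for order, out in RESULTS.items():
    print(" then ".join(order), "->", out)
```

Each rule consumes material the other one needs, so the two orders give different outputs for the same input. With n rules in one pool there are n! possible orders, so the ordering has to be decided and encoded by hand.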

By contrast, in an approach with the notion of inflectional class, the dependencies of a rule are limited to a class, which is a range of lexical entries. In lexicology for language processing, the notion of class allows for devising a common process shared by the entries of a class, reducing the complexity to the number of classes (typically a few hundred) instead of the number of lexical entries (in the tens of thousands). Experience shows that a separate, simpler cluster of rules for each class is more convenient to handle, even at the cost of some redundancy between classes.
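A minimal sketch of the class-based organization follows. It is only an illustration under stated assumptions, not the implementation used in the cited work: the lemmas, the class names `class_A` and `class_B`, and the rules are all placeholders, and the helpers `inflect` and `members` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Rule = Callable[[str], str]


@dataclass(frozen=True)
class Entry:
    lemma: str
    inflection_class: str  # each entry points to exactly one class


# One small cluster of rules per class (placeholder rules and class names).
CLASS_RULES: Dict[str, List[Rule]] = {
    "class_A": [lambda s: s + "-x"],
    "class_B": [lambda s: s.replace("o", "u"), lambda s: s + "-y"],
}

LEXICON: List[Entry] = [
    Entry("foo", "class_B"),
    Entry("bar", "class_A"),
    Entry("baz", "class_A"),
]


def inflect(entry: Entry) -> str:
    """Apply only the rule cluster of the entry's own class."""
    form = entry.lemma
    for rule in CLASS_RULES[entry.inflection_class]:
        form = rule(form)
    return form


def members(class_name: str) -> List[str]:
    """List the entries a change to this class could possibly affect."""
    return [e.lemma for e in LEXICON if e.inflection_class == class_name]


# Changing the rules of class_A cannot touch an entry of class_B.
FOO_BEFORE = inflect(LEXICON[0])
CLASS_RULES["class_A"].append(lambda s: s.upper())
FOO_AFTER = inflect(LEXICON[0])
print(members("class_A"), FOO_BEFORE, FOO_AFTER, inflect(LEXICON[1]))
```

The scope of any rule change is known in advance: it is the member list of one class. The price is that two classes may each carry their own copy of a similar rule.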

An expert in Arabic morphology is able to give precise and detailed descriptions of


implement Arabic morphology in a rather simple way, even in full details; although computer

scientists find the same details tedious and even annoying.

In fact, part of the complexity is intrinsic to the facts of Arabic morphology proper, but we can

dispense with the complexity that originates from the root/pattern/rules approach.

Back to our question in Part 1: why will computer scientists continue failing to produce an accurate Arabic lexical resource? And what is to be done?

- Unless computer scientists think out of the box of the rules approach, they will keep trying in vain, as they have since 1990, to prove the adequacy of this approach.

- Beyond their interest in algorithms, they need to rethink the traditional model according to computational morphology’s goals and purposes.

- Finally, unless they have the patience for meticulous observation in Arabic morphology, they need to interact with those who do.

For 25 years, sponsors have been supporting computer scientists’ naive vision of projects that promise an accurate Arabic lexical resource without involving an expert in morpho-phonology. Till


References

Beesley, Kenneth R. (1996). Arabic finite state morphological analysis and generation. In Proceedings of the International Conference on Computational Linguistics (COLING), Copenhagen, Center for Sprogteknologi, volume 1, pages 89-94.

Buckwalter, Timothy (1990). Lexicographic notation of Arabic noun pattern morphemes and

their inflectional features. In Proceedings of the Second Cambridge Conference on Bilingual

Computing of Arabic and English. 7 pages.

Altantawy, Mohamed; Habash, Nizar; Rambow, Owen; Saleh, Ibrahim (2010). Morphological Analysis and Generation of Arabic Nouns: A Morphemic Functional Approach. In Proceedings of the Language Resource and Evaluation Conference (LREC), Malta, pages 851-858.

Altantawy, Mohamed; Habash, Nizar; Rambow, Owen (2011). Fast Yet Rich Morphological

Analysis. In Proceedings of the 9th International Workshop on Finite State Methods and Natural

Language Processing (FSMNLP), pages 116-124.

Neme, Alexis; Laporte, Éric (2013). Pattern-and-root inflectional morphology: the Arabic broken plural. Language Sciences.

Neme, Alexis (2011). A lexicon of Arabic verbs constructed on the basis of Semitic taxonomy

and using finite-state transducers. In Proceedings of the International Workshop on Lexical

Resources (WoLeR) at ESSLLI.

Neme, Alexis (2014). Why Microsoft Arabic Spell checker is ineffective.
