• Tidak ada hasil yang ditemukan

translation and parallel corpus

N/A
N/A
Protected

Academic year: 2024

Membagikan "translation and parallel corpus"

Copied!
11
0
0

Teks penuh

(1)

English-Hindi Neural machine translation and parallel corpus generation

EKANSH GUPTA ROHIT GUPTA

(2)

Advantages of Neural Machine Translation Models

• Require only a fraction of the memory needed by traditional statistical machine translation (SMT) models

• Deep Neural Nets out-perform previous state of the art methods assuming availability of large parallel corpora

• Can be combined with word-alignment approach to address the rare-word problem

(3)

Performance of NMT Models

Source: T Luong et al, ACL 2015

(4)

Motivation

• Advantages of Neural Machine Translation

• Very large parallel English-Hindi corpora are unavailable

However comparable corpora available

(5)

Encoding Scheme

• Using one-hot encoding for top-N words chosen from large monolingual corpora for each language

• Out Of Vocabulary (OOV) words represented by unk

• Monolingual corpora used: http://corpora.heliohost.org/

(6)

Encoding Scheme: Example (top 10 words)

the 1000000000…...

to 0100000000…...

and 0010000000…...

a 0001000000…...

of 0000100000…...

in 0000010000…...

for 0000001000…...

that 0000000100…...

is 0000000010…...

on 0000000001…...

के 1000000000…...

में 0100000000…...

की 0010000000…...

को 0001000000…...

से 0000100000…...

है 0000010000…...

ने 0000001000…...

का 0000000100…...

और 0000000010…...

कक 0000000001…...

(7)

Recurrent Neural Networks

• A standard RNN maps a sequence of inputs to outputs by iterating the following equations:

𝑡 = 𝜎(𝑊ℎ𝑥𝑥𝑡 +𝑊ℎℎ𝑡−1)

𝑦𝑡 = 𝑊𝑦ℎ𝑡

• LSTM:

𝑝(𝑦1, … , 𝑦𝑇|𝑥1, … , 𝑥𝑡)= 𝑡=1𝑇 𝑝(𝑦𝑡|𝑣, 𝑦1, … , 𝑦𝑡−1)

Distribution is represented with a softmax over all the words in the vocabulary

(8)

Summary of Methodology

• Training weak translator using limited parallel corpus

• Weak translator and aligning heuristic (ex: Hunalign) used to create additional parallel corpus

• Neural translator re-trained on generated bigger parallel corpus

(9)

Pipeline for creating parallel corpora

Source: K Wołk et al, 2014

(10)

Aligning sentences

Original Hindi Translated

English

Original English

Comparison Heuristics Basic Translation Engine

Adapted From: K Wołk et al, 2014

(11)

Questions ?

Referensi

Dokumen terkait

California: SAGE Publication. Tanslation memory systems: Enlightening users' perspective. London: Imperial College. Meaning-based translation: A guide to cross-language

Research data and analysis were taken from the documentation of the translation results of different machine translation, that is Google Translate and IMTranslator.. The

Figure 5: Phrase Table Injection 4.2 Backtranslation Neural Machine Translation has achieved state- of-the-art results for many languages pairs while only using parallel data for

4.1 Model 1 We propose a neural machine translation model using the recurrent highway network architecture given by [4], and we use encoder-decoder system on this architecture with

Computer Systems that Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert

Admittedly, empirical studies, especially in the form of monographs, focused on analysing the practical use of corpora in the preparation of translation students are still lacking,

In this article, deep neural network models are used to perform automatic detection of strabismus and specify the type of strabismus according to different sources of image acquisition

ISSN: 2476-9843,accredited by Kemenristekdikti, Decree No: 200/M/KPT/2020 DOI: 10.30812/matrik.v23i2.3465 r 465 Implementation of Neural Machine Translation in Translating from