
CS671: NLP Neural Network based translation and parallel corpus generation


Academic year: 2024




 

Basis: In the current state of machine translation, Statistical Machine Translation (SMT) is preferred to neural-network-based systems. Moreover, building a system that learns translation requires a one-to-one correspondence between source and target sentences, i.e., having parallel corpora at one's disposal is crucial. However, commonly available parallel corpora contain hardly more than 100,000 words, so the translators trained on them are naturally weak.

 

Project schema: We propose to train a system that learns translation and also generates a parallel corpus from comparable corpora (which are readily available) for any two languages. We achieve this as follows:

 

1. Training a weak translator using the limited parallel corpus available. 

2. Assuming we have a corpus X and its comparable counterpart Y, we use this weak translator to translate X into Y's language, yielding a corpus Z.

3. An aligner such as Hunalign or LF Aligner (itself based on Hunalign) is used to match the concepts within sentences in Z to the concepts within sentences in Y.

4. The above step outputs matching pairs of sentences in Y and Z (both in the same language, of course). For instance, if Y had sentences Y1, Y2, ..., YN while Z had sentences Z1, Z2, ..., ZM, the aligner produces sentence pairs {Y1, Z1}, {Y2, Z2}, ..., {YK, ZK}. Note that the output numbering may not match the input numbering of the sentences.

5. The Z sentences in each pair are mapped back to their counterparts in X, giving pairs {Y1, X1}, {Y2, X2}, ..., {YK, XK}.

6. Note that the parallel corpus generated above is free of any noise introduced by the weak translator, since each final pair consists of an original Y sentence and an original X sentence; the translated corpus Z is used only for alignment.

7. The weak translator is retrained on the generated parallel corpus in the same way as in step 1, and the whole procedure can be iterated.
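The seven steps above can be sketched as a single bootstrapping round. The word-for-word "translator" and word-overlap "aligner" below are toy stand-ins for the neural translator and Hunalign described above, and the English-Hindi word pairs are purely illustrative:

```python
def train_translator(pairs):
    # Steps 1 and 7: "train" a weak word-level translator from
    # (target, source) sentence pairs via positional word alignment.
    lexicon = {}
    for tgt, src in pairs:
        for s, t in zip(src.split(), tgt.split()):
            lexicon.setdefault(s, t)
    return lexicon

def translate(lexicon, sentence):
    # Step 2: translate a source sentence word by word,
    # passing unknown words through unchanged.
    return " ".join(lexicon.get(w, w) for w in sentence.split())

def align(Z, Y, threshold=0.5):
    # Steps 3-4: greedily pair each sentence of Z with the sentence of Y
    # sharing the largest fraction of words (a crude stand-in for Hunalign).
    pairs = []
    for i, z in enumerate(Z):
        zw = set(z.split())
        j, score = max(((j, len(zw & set(y.split())) / max(len(zw), 1))
                        for j, y in enumerate(Y)), key=lambda p: p[1])
        if score >= threshold:
            pairs.append((i, j))
    return pairs

def bootstrap(seed, X, Y):
    lexicon = train_translator(seed)                 # step 1: weak translator
    Z = [translate(lexicon, x) for x in X]           # step 2: X -> Y's language
    pairs = align(Z, Y)                              # steps 3-4: match Z with Y
    parallel = [(Y[j], X[i]) for i, j in pairs]      # step 5: map Z back to X
    return train_translator(seed + parallel)         # step 7: retrain

# Illustrative toy data: a tiny seed corpus plus comparable corpora X and Y.
seed = [("dog runs", "kutta daudta")]
X = ["kutta sota", "billi daudta"]                   # source-language corpus
Y = ["dog sleeps", "cat runs"]                       # comparable target corpus
lexicon = bootstrap(seed, X, Y)
```

After one round the retrained lexicon has learned "billi" and "sota", which the seed corpus never covered; in the real system each round would retrain the neural translator on the grown parallel corpus instead of rebuilding a lexicon.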

 

Preferred languages are English and Hindi.

 

Papers referenced 

1. Kalchbrenner and Blunsom, Recurrent Continuous Translation Models, 2013
2. Sutskever et al., Sequence to Sequence Learning with Neural Networks, 2014
3. Cho et al., On the Properties of Neural Machine Translation: Encoder–Decoder Approaches, 2014
4. Hermann and Blunsom, Multilingual Distributed Representations without Word Alignment, 2014
5. Bahdanau et al., Neural Machine Translation by Jointly Learning to Align and Translate, ICLR 2015
6. Wolk and Marasek, Building subject-aligned comparable corpora and mining it for truly parallel sentence pairs, 2014
