Opinion Analysis Corpora Across Languages
6.4 Conclusion
In this paper, we have discussed the contributions made by our development ofNTCIR MOAT. We created a cross-lingual opinion corpus using the news document genre, following which, several researchers have conducted cross-lingual opinion research using our test collections. Although sentiment classification accuracy is improved by using a cross-lingual corpus, research investigating linguistic opinion properties characterized by languages rooted in different cultures and opinion retrieval strategies preferable for different language characteristics remain to be undertaken.
6https://nlp.stanford.edu/sentiment/.
6 Opinion Analysis Corpora Across Languages 93 In recent research, high-quality contextual representations based on neural archi- tectures such as ELMo (Peters et al.2018a) and BERT (Devlin et al.2019) are proving to be effective in NLP research. In addition, linguistic properties such as morpholog- ical, local-syntax, and longer-range semantics tend to be treated at different layers, such as the word-embedding layer, lower contextual layers, or upper layers in each of these cases (Peters et al.2018b; Jawahar et al.2019). As an extension of bilingual sentiment word-embedding frameworks (Zhou et al.2015), cross-lingual sentiment retrieval research that considers syntax and semantics in different languages will be an interesting direction for future work.
Acknowledgements This work was partially supported by JSPS Grants-in-Aid for Scientific Research (B) (#19H04420).
References
Balahur A, Steinberger R (2009) Rethinking sentiment analysis in the news: from theory to practice and back. In: Proceedings of workshop on opinion mining and sentiment analysis’ (WOMSA), Sevilla, Spain, pp 1–12
Balahur A, Steinberger R, Kabadjov M, Zavarella V, van der Goot E, Halkia M, Pouliquen B, Belyaeva J (2010) Sentiment analysis in the news. In: Proceedings of the 7th international con- ference on language resources and evaluation (LREC 2010), Valletta, Malta, pp 2216–2220 Blitzer J, Dredze M, Pereira F (2007) Biographies, bollywood, boom-boxes and blenders: domain
adaptation for sentiment classification. In: Proceedings of the 45th annual meeting of the associ- ation of computational linguistics (ACL), pp 440–447
Brody S, Diakopoulos N (2011) Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! using word lengthening to detect sentiment in microblogs. In: Proceedings of the 2011 conference on empirical methods in natural language processing (EMNLP 2011), Edinburgh, Scotland, UK, pp 562–570 Choi Y, Cardie C, Riloff E, Patwardhan S (2005) Identifying sources of opinions with conditional
random fields and extraction patterns. In: Proceedings of the 2005 human language technology conference and conference on empirical methods in natural language processing (HLT/EMNLP 2005), Vancouver, B. C
Dang HT (2008) Overview of the TAC 2008 opinion question answering and summarization tasks.
In: Text analysis conference (TAC 2008) workshop notebook papers. Accessed 1 Apr 2010.
Available from:http://www.nist.gov/tac/tracks/2008/index.html
Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional trans- formers for language understanding. In: Proceedings of the 2019 conference of the North Amer- ican chapter of the association for computational linguistics: human language technologies, vol- ume 1 (long and short papers), Minneapolis, Minnesota, pp 4171–4186.https://doi.org/10.18653/
v1/N19-1423
Elming J, Plank B, Hovy D (2014) Robust cross-domain sentiment analysis for low-resource lan- guages. In: Proceedings of the 5th workshop on computational approaches to subjectivity, senti- ment and social media analysis, association for computational linguistics, Baltimore, Maryland, pp 2–7.https://doi.org/10.3115/v1/W14-2602,https://www.aclweb.org/anthology/W14-2602 Felbo B, Mislove A, Søgaard A, Rahwan I, Lehmann S (2017) Using millions of emoji occurrences
to learn any-domain representations for detecting sentiment, emotion and sarcasm. In: Proceed- ings of the 2017 conference on empirical methods in natural language processing, Copenhagen, Denmark, pp 1615–1625.https://doi.org/10.18653/v1/D17-1169
Gao D, Wei F, Li W, Liu X, Zhou M (2015) Cross-lingual sentiment lexicon learning with bilingual word graph label propagation. Comput Linguist 41:21–40
94 Y. Seki Jawahar G, Sagot B, Seddah D (2019) What does BERT learn about the structure of language? In:
Proceedings of the 57th annual meeting of the association for computational linguistics, Florence, Italy, pp 3651–3657.https://doi.org/10.18653/v1/P19-1356
Jiang L, Yu M, Zhou M, Liu X, Zhao T (2011) Target-dependent twitter sentiment classification.
In: Proceedings of the 49th annual meeting of the association for computational linguistics (ACL 2011), Portland, Oregon, pp 151–160
Le TA, Moeljadi D, Miura Y, Ohkuma T (2016) Sentiment analysis for low resource languages: a study on informal Indonesian tweets. In: Proceedings of the 12th workshop on Asian language resources (ALR), The COLING 2016 Organizing Committee, Osaka, Japan, pp 123–131.https://
www.aclweb.org/anthology/W16-5415
Louviere JJ, Flynn TN, Marley AAJ (2015) Best-worst scaling: theory, methods and applications.
Cambridge University Press, Cambridge
Lu B, Tan C, Cardie C, Tsou BK (2011) Joint bilingual sentiment classification with unlabeled parallel corpora. In: Proceedings of the 49th annual meeting of the association for computational linguistics (ACL 2011), Portland, Oregon, pp 320–330
Macdonald C, Ounis I (2006) The TREC Blogs06 collection: creating and analysing a blog test collection. Technical report TR-2006-224, Department of Computing Science, University of Glasgow
Martinez-Camara E, Martin-Valdivia MT, Urena-Lopez LA, Montejo-Raez AR (2013) Sentiment analysis in twitter. Nat Lang Eng 11(2):1–28
Meng X, Wei F, Liu X, Zhou M, Xu G, Wang H (2012) Cross-lingual mixture model for sentiment classification. In: Proceedings of the 50th annual meeting of the association for computational linguistics (ACL 2012), Jeju, Republic of Korea, pp 572–581
Mohammad SM, Bravo-Marquez F, Salameh M, Kiritchenko S (2018) SemEval-2018 task 1: affect in tweets. In: Proceedings of the 12th international workshop on semantic evaluation (SemEval- 2018), New Orleans, Louisiana, pp 1–17.https://doi.org/10.18653/v1/S18-1001
Pang B, Lee L (2005) Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: Proceedings of the 43rd annual meeting of the association for computational linguistics (ACL 2005), Ann Arbor, Michigan, pp 115–124
Pang B, Lee L (2008) Opinion mining and sentiment analysis. Now Publishers Inc, Boston Pang B, Lee L, Vaithyanathan S (2002) Thumbs up? sentiment classification using machine learning
techniques. In: Proceedings of the 2002 conference on empirical methods in natural language processing (EMNLP 2011), pp 79–86
Peters M, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018a) Deep con- textualized word representations. In: Proceedings of the 2018 conference of the North American Chapter of the association for computational linguistics: human language technologies, volume 1 (long papers), New Orleans, Louisiana, pp 2227–2237.https://doi.org/10.18653/v1/N18-1202 Peters M, Neumann M, Zettlemoyer L, tau Yih W (2018b) Dissecting contextual word embeddings:
architecture and representation. In: Proceedings of the 2018 conference on empirical methods in natural language processing (EMNLP 2018), Brussels, Belgium, pp 1499–1509
Purver M, Battersby S (2012) Experimenting with distant supervision for emotion classification. In:
Proceedings of the 13th Eurpoean chapter of the association for computational linguistics (EACL 2012), Avignon, France, pp 482–491
Ruppenhofer J, Somasundaran S, Wiebe J (2008) Finding the sources and targets of subjec- tive expressions. In: Proceedings of the 6th international language resources and evaluation (LREC’08), Marrakech, Morocco
Seki Y, Evans DK, Ku LW, Chen HH, Kando N, Lin CY (2007) Overview of opinion analysis pilot task at NTCIR-6. In: Proceedings of the 6th NTCIR workshop meeting, NII, Japan, pp 265–278 Seki Y, Evans DK, Ku LW, Sun L, Chen HH, Kando N (2008) Overview of multilingual opinion analysis task at NTCIR-7. In: Proceedings of the 7th NTCIR workshop meeting, NII, Japan, pp 185–203
6 Opinion Analysis Corpora Across Languages 95 Seki Y, Ku LW, Sun L, Chen HH, Kando N (2010) Overview of multilingual opinion analysis task at NTCIR-8 - a step toward cross lingual opinion analysis. In: Proceedings of the 8th NTCIR workshop meeting, NII, Japan, pp 209–220
Shanahan JG, Qu Y, Wiebe JM (2006) Computing attitude and affect in text: theory and applications.
Springer, Berlin
Socher R, Perelygin A, Wu J, Manning JCCD, Ng A, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the (2013) conference on empirical methods in natural language processing (EMNLP 2013). Association for Computational Linguistics, Seattle, Washington, USA, pp 1631–1642
Steinberger R, Pouliquen B, van der Goo E, (2009) An introduction to the Europe media monitor. In:
Proceedings of ACM SIGIR 2009 workshop: information access in a multilingual world, Boston, MA, USA
Stoyanov V, Cardie C, Wiebe J (2005) Multi-perspective question answering using the OpQA corpus. In: Proceedings of the 2005 human language technology conference and conference on empirical methods in natural language processing (HLT/EMNLP 2005), Vancouver, B. C Strapparava C, Mihalcea R (2007) SemEval-2007 task 14: affective text. In: Proceedings of the 4th
international workshop on semantic evaluations (SemEval-2007), Prague, pp 70–74
Turney P (2002) Thumbs up or thumbs down? Semantic orientation applied to unsupervised classi- fication of reviews. In: Proceedings of the 40th annual meeting on association for computational linguistics, Philadelphia, USA, pp 417–424
Wan X (2009) Co-training for cross-lingual sentiment classification. In: Proceedings of the 47th annual meeting of the association of computational linguistics (ACL), Suntec, Singapore, pp 235–243
Wiebe J, Breck E, Buckley C, Cardie C, Davis P, Fraser B, Litman D, Pierce D, Riloff E, Wilson T (2002) NRRC summer workshop on multiple-perspective question answering final report Wiebe J, Wilson T, Cardie C (2005) Annotating expressions of opinions and emotions in language.
Lang Resour Eval 39(2–3):165–210
Wiebe JM, Wilson T, Bruce RF, Bell M, Martin M (2004) Learning subjective language. Comput Linguist 30(3):277–308
Wilson T, Wiebe J, Hoffmann P (2005) Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of the 2005 human language technology conference and conference on empirical methods in natural language processing (HLT/EMNLP 2005), Vancouver, B. C Zhou H, Chen L, Shi F, Huang D (2015) Learning bilingual sentiment word embeddings for cross-
language sentiment classification. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language pro- cessing (volume 1: long papers), Beijing, China, pp 430–440.https://doi.org/10.3115/v1/P15- 1042
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.