
Chapter 3. Maritime English Corpus

3.2 Corpus Design

These experts, who gave me much advice, work at several institutions, such as the International Maritime Organization (IMO), World Maritime University (WMU), Korea Maritime and Ocean University (KMOU)16), Mokpo National Maritime University, Korean Register of Shipping (KR), Korea Maritime Institute (KMI), and Korea Institute of Maritime and Fisheries Technology.

I decided to compile a four-million-word corpus, drawing equal numbers of words from four genres: academic writing, news, law, and textbooks. Several practical considerations determined this size. The first was the running time needed to process corpus data on personal computers in a data-driven learning (DDL) environment.

16) A previous study compiled a small corpus of maritime English (Hong and Jhang, 2010).

In DDL-based teaching, corpus size affects both teaching and learning. When students extract keywords, keyword linked lists, and n-grams, the results take a long time to obtain if the corpus contains more than four million words. In addition, a sub-corpus of the MEC can be compared with a sub-corpus of BNC Baby to identify the characteristics of an ESP genre. BNC Baby, a sample representing the full BNC, consists of academic writing, newspaper texts, imaginative writing, and spontaneous conversation, with about one million words for each genre. Moreover, a corpus of around four million words is appropriate for language network analysis because a corpus of this size yields a suitable number of keywords, linked keywords, and collocates for software to visualize and to compute network analysis algorithms. For these reasons, the MEC comprises four million words, and each of the four genres forms a one-million-word sub-corpus.
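The equal-size design can be illustrated with a minimal Python sketch. It assumes whitespace tokenization and simply truncates the last text once the word budget of a genre is reached; the function names and the truncation rule are illustrative assumptions, not the procedure actually used to compile the MEC.

```python
from typing import Dict, Iterable, List

WORDS_PER_GENRE = 1_000_000  # target size of each sub-corpus, as stated in the text


def cap_genre(texts: Iterable[str], budget: int) -> List[str]:
    """Keep texts in order until `budget` whitespace-delimited words are collected."""
    kept: List[str] = []
    remaining = budget
    for text in texts:
        if remaining <= 0:
            break
        words = text.split()
        if not words:
            continue
        # Truncate the final text so the budget is met exactly (an assumption).
        kept.append(" ".join(words[:remaining]))
        remaining -= min(len(words), remaining)
    return kept


def build_balanced_corpus(
    genres: Dict[str, Iterable[str]], budget: int = WORDS_PER_GENRE
) -> Dict[str, List[str]]:
    """Apply the same word budget to every genre."""
    return {name: cap_genre(texts, budget) for name, texts in genres.items()}


def word_count(texts: Iterable[str]) -> int:
    """Count whitespace-delimited words across a list of texts."""
    return sum(len(t.split()) for t in texts)
```

With four genres and the default budget, the result would hold at most one million words per genre, provided enough raw text is available for each. A genre with less raw text than the budget is returned unchanged, so the counts should be checked with `word_count` before any comparison across genres.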

To collect data for the academic genre, I used Springer’s database (http://www.springer.com), which provides numerous journals to scientific and professional communities, and Science Direct (http://www.sciencedirect.com) from Elsevier, one of the largest publishers in the world. I selected the most relevant maritime-related academic journals, such as “Maritime Policy and Management”, “Journal for Maritime Research”, “Maritime Studies”, “Gyroscopy and Navigation”, “Aegean Review of the Law of the Sea and Maritime Law”, and “WMU Journal of Marine Affairs”. All articles in the journals listed in Table 3.1 were saved manually as PDF files.

Table 3.1 List of academic journal sources

Text_ID | Titles | Sources
A01 | Maritime Policy and Management | http://www.tandfonline.com/toc/TMPM20/current#.Vb8g7O2wfIU
A02 | Journal for Maritime Research | http://www.tandfonline.com/toc/rmar20/current#.Vb8hFe2wfIU
A03 | Maritime Studies | http://www.maritimestudiesjournal.com/
A04 | Gyroscopy and Navigation | http://www.springer.com/engineering/mechanical+engineering/journal/13140
A05 | Aegean Review of the Law of the Sea and Maritime Law | http://www.springer.com/law/international/journal/12180
A06 | WMU Journal of Marine Affairs | http://www.wmu.se/publications/wmu-journal

The news genre consists of official institutional texts and commercial news texts. The official institutional sources are “IMO Press Briefings” and “World Maritime University News”. The commercial news comes from specialized maritime news websites that the experts regard as hub sites: “World Maritime News”, “The Maritime Executive”, “Marinelink”, and “Maritime Today News”. Because these websites contain numerous articles, I used a Wget crawler to collect them. After collection, an NLP program written in Python automatically extracted only the sentences from these texts. Table 3.2 lists the news website sources.
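The sentence-extraction step can be sketched as follows. The program actually used is not described here, so this is only a minimal illustration using the Python standard library: it strips markup from a crawled page, treats block-level tags as line breaks, and keeps strings that look like full sentences. The class and function names, the list of block tags, and the `min_words` threshold are all assumptions.

```python
import re
from html.parser import HTMLParser
from typing import List


class VisibleText(HTMLParser):
    """Collect visible text; block-level tags become line breaks."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "li", "br", "tr", "td", "th", "title",
             "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        # Ignore the contents of <script> and <style> elements.
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_sentences(html: str, min_words: int = 4) -> List[str]:
    """Return strings that start with a capital letter, end with . ! or ?,
    and contain at least `min_words` words (a heuristic, not a full tokenizer)."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    sentences: List[str] = []
    for line in "".join(parser.parts).split("\n"):
        line = " ".join(line.split())  # collapse runs of whitespace
        for candidate in re.split(r"(?<=[.!?])\s+", line):
            if (re.fullmatch(r"[A-Z].*[.!?]", candidate)
                    and len(candidate.split()) >= min_words):
                sentences.append(candidate)
    return sentences
```

Menu items, captions, and link labels are dropped because they rarely end in sentence punctuation. The splitting rule is deliberately simple and will mis-split at abbreviations such as “Dr.”, so a dedicated sentence tokenizer would be preferable for a production pipeline.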

Table 3.2 List of news website sources

Text_ID | Websites | Sources
N01 | IMO Press Briefings | http://www.imo.org/MediaCentre/PressBriefings
N02 | World Maritime News | http://worldmaritimenews.com/archives
N03 | Marinelink | http://www.marinelink.com/
N04 | World Maritime University | http://www.wmu.se/news
N05 | Maritime Today News | http://www.maritimetoday.com/
N06 | The Maritime Executive | http://www.maritime-executive.com/offshore-news

The law genre is a collection of regulations and codes recently released by the IMO. To build a database of formal regulations and codes, I reached an agreement with KR that allows me to use the IMO official legal texts for academic purposes, so the IMO data could be included in the law genre with permission. The KR department in charge of the IMO official legal texts provided some of these data on CD in UNIX format. Table 3.3 shows a list of maritime law sources.

Table 3.3 List of maritime law sources

Text_ID | Titles | Sources
L01 | AFS 2001 | http://www.krs.co.kr
L02 | Bunker 2001 | http://www.krs.co.kr
L03 | BWM Convention | http://www.krs.co.kr
L04 | COLREG 2014 Consolidated Edition | http://www.krs.co.kr
L05 | FSS Code 2014 Consolidated Edition | http://www.krs.co.kr
L06 | FTP Code 2014 Consolidated Edition | http://www.krs.co.kr
L07 | IBC 2014 | http://www.krs.co.kr
L08 | IGC 2014 | http://www.krs.co.kr
L09 | III Code | http://www.krs.co.kr
L10 | IMDG Code 2014 | http://www.krs.co.kr
L11 | ISM Code | http://www.krs.co.kr
L12 | IMSBC Code 2014 Consolidated Edition | http://www.krs.co.kr
L13 | ISPS Code | http://www.krs.co.kr
L14 | LSA Code | http://www.krs.co.kr
L15 | MARPOL 2014 Consolidated Edition | http://www.krs.co.kr
L16 | MLC 2014 Consolidated Edition | http://www.krs.co.kr
L17 | Noise Code | http://www.krs.co.kr
L18 | RO Code 2014 Consolidated Edition | http://www.krs.co.kr
L19 | Ship Recycling 2014 Consolidated Edition | http://www.krs.co.kr
L20 | SOLAS 2014 Consolidated Edition | http://www.krs.co.kr
L21 | STCW Convention & Codes | http://www.krs.co.kr
L22 | TONNAGE 1969 | http://www.krs.co.kr

Maritime-related textbooks were selected for the last genre. To cover a range of fields, I chose books on economics, safety, transport, history, policy, and related topics. Thirty textbooks were collected, all in PDF format. These PDF files were later converted into txt files, which were then filtered and extracted by an NLP process. Table 3.4 shows a list of book sources.

Table 3.4 List of textbook sources

Text_ID | Titles | Sources
T01 | A Global Union for Global Workers: Collective Bargaining and Regulatory Politics in Maritime Shipping | Routledge
T02 | Admiral Lord Keith and the Naval War Against Napoleon | University Press of Florida
T03 | Maritime Communities and Vegetation of Open Habitats | Cambridge University Press
T04 | Command of the Sea | Charles Scribner’s Sons
T05 | International Maritime Transport | Routledge
T06 | Island Disputes and Maritime Regime | Springer
T07 | Jurisdiction and Arbitration Clause | Springer Berlin Heidelberg
T08 | Maritime Delimitation | Martinus Nijhoff
T09 | Maritime Economics | Routledge
T10 | Maritime Fiction Sailors and the Sea | Palgrave Macmillan
T11 | Maritime Law and Policy in China | Routledge-Cavendish
T12 | Maritime Safety Law | Springer
T13 | Maritime Security in the South China Sea | Ashgate
T14 | Maritime Security | Routledge
T15 | Maritime Transportation Safety | Routledge
T16 | Maritime Work Law Fundamentals: Responsible Ship owners, Reliable Seafarers | Springer
T17 | Oceans Governance | Allen & Unwin
T18 | Places of Refuge for Ship | Martinus Nijhoff Publishers Boston
T19 | Random Seas and Design of Maritime | World Scientific Publishing Company
T20 | Review of Maritime Transport 2006 | United Nations
T21 | Roots of Strategy Book 4 | Stackpole Books
T22 | Security for Airport and Aerospace, Maritime and Port, and High-Threat Targets in Belgium | ICON Group International
T23 | State Responsibility for Interferences with the Freedom of Navigation in Public International Law | Springer
T24 | Sustainable Maritime transportation and Exploitation of Sea Resources Proceedings of the 14th International Congress | CRC Press
T25 | The Carriage of Dangerous Goods by Sea | Springer Berlin Heidelberg
T26 | The Evolving Maritime Balance of Power in the Asia-Pacific | World Scientific Pub Co. Inc.
T27 | The Maritime Dimension of International Security | RAND Corporation
T28 | The Maritime Engineering Reference book | Butterworth-Heinemann
T29 | The Unforgiving Coast Maritime | Oregon State University Press
T30 | Towards Principled Oceans Governance | Routledge
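The filtering applied to the textbook files after PDF-to-text conversion is not specified in this section. As an illustration only, the sketch below shows the kind of cleanup such converted text typically needs: dropping lines that contain nothing but a page number, rejoining words hyphenated across line breaks, and merging wrapped lines into paragraphs. The function name and every rule in it are assumptions.

```python
import re


def clean_converted_text(raw: str) -> str:
    """Tidy plain text produced by a PDF-to-text converter (heuristic sketch)."""
    lines = [ln.rstrip() for ln in raw.splitlines()]
    # Drop lines that hold only a page number (1 to 4 digits).
    lines = [ln for ln in lines if not re.fullmatch(r"\s*\d{1,4}\s*", ln)]
    text = "\n".join(lines)
    # Rejoin words split by a hyphen at a line break: "trans-\nport" -> "transport".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Merge single line breaks inside a paragraph; blank lines stay as breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces or tabs, and runs of three or more newlines.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

One known limitation of the hyphen rule is that it also fuses genuine compounds that happen to break at a line end (for example, “data-” followed by “driven” becomes “datadriven”), so a dictionary check would be needed where such cases matter.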

In total, the collected data amount to far more than 400 million words. The following chapters describe how the texts were collected and how each sub-corpus was compiled using NLP.