Purpose and Objectives - SUBTITLED SMART GLASSES USING LIP READING AND SPEECH RECOGNITION

CHAPTER I INTRODUCTION

1.3 Purpose and Objectives

The purpose of this paper is to present an idea for a new innovation that uses a multimedia system.

The objectives of this paper are as follows:

1. To explain the proposed innovation

2. To state the pros and cons of the proposed innovation

CHAPTER II DISCUSSION

2.1 Smart Glasses Devices

When smart glasses are mentioned, the product that comes to mind is Google's smart glasses, Google Glass.

Google Glass is a pioneer in wearable computing technology in the form of eyeglasses. It runs the Android operating system and works with a companion smartphone connected over a wireless or Bluetooth connection.[1]

Figure 1 Parts of Google Glass

Google Glass has a range of capabilities, such as taking photos and recording video with its camera and recording sound with the microphone mounted on the glasses, then uploading the images, video, and audio to the internet or sharing them on social media. It can also make video calls, which combine the built-in camera, microphone, and speaker.

2.2 Lip Reading

Detecting human lip movement is a delicate task. It involves not only visually recognizing how the lips change from one shape to another, but also recognizing words in order to predict the word about to be spoken, and recognizing more specific elements in order to predict a sentence. To a camera, lip movement is a sequence of images that changes from one frame to the next.

Lip reading can therefore be described as a method by which a system reads real-time video of lip movement in order to understand what the movement means. Reading lip movement is a very difficult task for visual feature extraction.[2]

Figure 2 Lip reading

Detecting lip movement from images by visual feature extraction requires a method that tracks the speaker's lips across a sequence of images. Detection is affected by variability between speakers, such as skin color, lip color, lip width, and the amount of lip movement during speech, and by variability in the environment, such as lighting conditions. Whatever method is used, the detected lips must correspond from image to image so that tracking is stable and unaffected by the appearance of the teeth and tongue.[2]
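As a rough illustration of treating lip movement as change between successive images, the sketch below compares two small grayscale crops of the lip region. It is not the Bezier-curve method of [2]. The names `normalize` and `lip_motion` and the 3x3 sample frames are hypothetical, and subtracting the mean brightness is only one simple way to reduce sensitivity to a uniform lighting change.

```python
from typing import List

Frame = List[List[float]]  # grayscale lip-region crop, pixel values in 0..255


def normalize(frame: Frame) -> Frame:
    """Subtract the mean brightness so a uniform lighting shift cancels out."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return [[p - mean for p in row] for row in frame]


def lip_motion(prev: Frame, curr: Frame) -> float:
    """Mean absolute difference between two normalized frames of equal size."""
    a, b = normalize(prev), normalize(curr)
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


# Hypothetical sample crops (assumption: lips appear brighter than the skin).
closed = [[50, 50, 50], [200, 200, 200], [50, 50, 50]]
brighter = [[70, 70, 70], [220, 220, 220], [70, 70, 70]]  # same shape, more light
opened = [[50, 200, 50], [200, 50, 200], [50, 200, 50]]   # different lip shape

print(lip_motion(closed, brighter))  # 0.0: only the lighting changed
print(lip_motion(closed, opened))    # clearly larger: the shape changed
```

A real system would also need to locate the lips in each frame and cope with non-uniform lighting, teeth, and tongue, none of which this sketch attempts.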

2.3 Speech Recognition

Speech is a basic, common, and efficient way for people to interact with one another. Today's speech technology is typically available for a limited but interesting range of tasks. It enables machines to respond correctly and reliably to human voices and to provide useful and valuable services. Because communicating with a computer is faster by voice than by keyboard, people are likely to prefer such systems. Communication among humans is dominated by spoken language, so it is natural for people to expect a voice interface to computers.

Figure 3 Speech-to-text

This can be achieved by developing a speech-to-text recognition system, which allows a computer to translate voice requests and dictation into text. Speech-to-text recognition is the process of converting an acoustic signal captured by a microphone into a set of words. The recorded data can be used for document preparation.
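Before an acoustic signal can be turned into words, a recognizer typically has to find where speech occurs in it. The sketch below shows one simple way to do that, by thresholding short-time energy. It is an illustration only and not the system described in the source: the function name `speech_segments`, the frame length, and the threshold are assumptions.

```python
from typing import List, Tuple


def speech_segments(samples: List[float], frame_len: int,
                    threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of runs of high-energy frames."""
    segments = []
    start = None
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len  # mean squared amplitude
        if energy > threshold and start is None:
            start = i                      # a speech run begins
        elif energy <= threshold and start is not None:
            segments.append((start, i))    # the run ended at this frame
            start = None
    if start is not None:                  # signal ended while still in a run
        segments.append((start, len(samples) - len(samples) % frame_len))
    return segments


# Hypothetical signal: silence, a short burst, silence.
signal = [0.0] * 4 + [0.5, -0.5, 0.6, -0.6] + [0.0] * 4
print(speech_segments(signal, frame_len=4, threshold=0.01))  # [(4, 8)]
```

The segments found this way would then be passed to the actual recognition stage, which is outside the scope of this sketch.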

2.4 Comparison of Lip Reading and Speech Recognition

Lip reading and speech recognition compare as follows:

1. The most significant difference between lip reading and speech recognition is accuracy, that is, how correctly the input is captured before it is processed into subtitle output. Speech recognition tends to be more accurate than lip reading because the form of the input differs. A system can capture audio input more accurately and precisely than it can capture lip movement. As explained earlier, lip movement is in effect real-time video, and reading video of lip movement accurately is harder than using speech recognition to read incoming sound. This does not mean lip reading cannot be used to capture input; the issue is that its accuracy is lower than that of speech recognition.

2. The environment determines how effective each method is. In low light, lip reading faces the added difficulty of dark images, which can make its results inaccurate. Speech recognition struggles when the environment is loud or contains a lot of background noise, because the noise distorts the signal the system receives and can likewise make its results inaccurate.

The two methods can also be combined to obtain more accurate results. Speech recognition can compensate for the weaknesses of lip reading and raise the accuracy rate, while lip reading can help speech recognition by handling cases that speech recognition cannot. However, combining both methods to translate the other speaker's words into subtitles requires more energy and time, whereas the key feature of subtitled smart glasses must be real-time speed in translating a person's speech into subtitles displayed on the glasses' user interface.

The effective approach is therefore to choose one of the two methods to perform the translation.
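One way to picture choosing a single method is a small rule that looks at the environment, following the comparison above: noise degrades speech recognition, low light degrades lip reading, and speech recognition is generally the more accurate of the two. The sketch below is only an illustration. The function `choose_method` and both threshold values are hypothetical and would need calibration on a real device.

```python
from enum import Enum


class Method(Enum):
    SPEECH_RECOGNITION = "speech recognition"
    LIP_READING = "lip reading"


# Hypothetical thresholds, not taken from the source.
MAX_NOISE_DB = 70.0   # above this, audio is assumed too noisy
MIN_LIGHT_LUX = 50.0  # below this, the camera image is assumed too dark


def choose_method(noise_db: float, light_lux: float) -> Method:
    """Pick one input method, preferring speech recognition when audio is usable."""
    if noise_db <= MAX_NOISE_DB:
        return Method.SPEECH_RECOGNITION
    if light_lux >= MIN_LIGHT_LUX:
        return Method.LIP_READING
    # Both inputs are degraded: fall back to the generally more accurate method.
    return Method.SPEECH_RECOGNITION


print(choose_method(noise_db=40.0, light_lux=300.0).value)  # speech recognition
print(choose_method(noise_db=85.0, light_lux=300.0).value)  # lip reading
```

Running only one recognizer at a time keeps the processing cost low, which matches the real-time requirement stated above.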

2.5 Pros and Cons of Smart Glasses

The following are the pros, or arguments in favor of building subtitled smart glasses:

1. Besides enabling communication with foreigners, subtitles can also be used to learn the foreign language itself.[4] One of the most effective ways to learn a foreign language is to communicate directly with its speakers, and subtitled smart glasses make that learning process easier.

2. Deaf users can understand anyone they talk to, provided that they are literate, that is, able to read letters, words, and sentences.

3. Google Translate is one of the best translation applications currently available. According to Li[5], its translations are reasonably readable even though they contain some grammatical errors. It is therefore feasible to use Google Translate to turn the input obtained from the other speaker into subtitles that are reasonable and understandable.

The following are the cons, or arguments against building subtitled smart glasses:

1. The camera used to lip-read the other speaker can be regarded as a violation of people's privacy in public. A person wearing smart glasses can take photos and record video without the knowledge of the person being photographed or recorded. For this reason, some places prohibit the use of smart glasses.

2. Smart glasses do not enable full communication; they only let the wearer understand what the other speaker says. The wearer has no way to reply in a form the other speaker can understand, unless the other speaker is also wearing smart glasses with the same function. Smart glasses alone therefore cannot provide two-way communication, only one-way communication from the other speaker to the wearer of the subtitled smart glasses.

CHAPTER III CLOSING

3.1 Conclusion

Subtitled smart glasses can be realized with today's technology. Using lip reading and speech recognition, a reasonably accurate transcript can be obtained of what the person talking to the wearer says.

The text obtained from lip reading and speech recognition can then be translated by a translation application such as Google Translate into the desired language or a language the wearer understands. The translated subtitles then appear on the smart glasses' interface.
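The flow described here (recognize, translate, display) can be sketched as a three-stage pipeline. Everything in the sketch is a stand-in: `recognize` and `translate` are hypothetical stubs, not calls to a real speech recognizer or to Google Translate, and the 24-character display width is an assumption.

```python
import textwrap
from typing import Dict, List

# Hypothetical one-entry phrasebook standing in for a translation service.
PHRASEBOOK: Dict[str, str] = {"selamat pagi": "good morning"}


def recognize(audio: bytes) -> str:
    """Recognizer stub: pretends the audio already is UTF-8 text."""
    return audio.decode("utf-8")


def translate(text: str) -> str:
    """Translator stub: looks the phrase up, else returns it unchanged."""
    return PHRASEBOOK.get(text.strip().lower(), text)


def subtitle_lines(audio: bytes, width: int = 24) -> List[str]:
    """Recognize, translate, and wrap the result to fit a narrow display."""
    return textwrap.wrap(translate(recognize(audio)), width=width)


print(subtitle_lines(b"Selamat pagi"))  # ['good morning']
```

Keeping the stages separate means either stub could later be replaced by a real recognizer or translator without changing the display step.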

Subtitled smart glasses offer many benefits to society, in general to people who do not understand a foreign language and in particular to deaf people. Behind these pros, however, subtitled smart glasses also have cons that will hinder their development and realization. New innovations and ideas are therefore needed so that the glasses do not violate moral norms and existing public regulations.

REFERENCES

[1] Schweizer, H. (2014). Smart Glasses: Technology and Applications. Ubiquitous Computer Seminar FS2014.

[2] Suhendra, A., & Lakuary, R. P. (2017). Aplikasi Deteksi Gerak Bibir Menggunakan Kurva Bezier dengan EMGUCV. Jurusan Sistem Informasi, Fakultas Ilmu Komputer, Universitas Gunadarma.

[3] Prerana, D., et al. (2015). Voice Recognition System: Speech-to-Text. Journal of Applied and Fundamental Sciences, 1(2), 191-195.

[4] Patricia, A. A., & Patricia, D. C. (2014). Foreign Language Acquisition: The Role of Subtitling. Procedia - Social and Behavioral Sciences, 141, 1234-1238.

[5] Li, H., et al. (2014). Comparison of Google Translation with Human Translation. International Florida Artificial Intelligence Research Society Conference, 27, 190-195.

Smart glasses:

The maturing field of wearable computing aims to interweave computing devices into everyday life. This report focuses on smart glasses, one of the categories of wearable computing devices which is very present in the media and expected to be a big market in the next years. It analyses the differences from smart glasses to other smart devices, introduces many possible applications for different target audiences and gives an overview of the different smart glasses which are available now or should be available in the next few years. Interesting technological features of the smart glasses are highlighted and explained.

INTRODUCTION

Smart glasses are computing devices worn in front of the eyes. Evidently their displays move with the user's head, which leads to the user seeing the display independently of his or her position and orientation. Therefore smart glasses or lenses are the only devices which can alter or enhance the wearer's vision no matter where he/she is physically located and where he/she looks. There are three different paradigms of how to alter the visual information a wearer perceives. Those three are introduced here.

• Virtual reality: The goal is to create a fully virtual world for the user to see, interact with and immerse into. The user sees this virtual world only; no other light sources affect the eye. One significant difference to a simple screen is that the actions of the user affect the virtual world. For example, movement affects what virtual content the user sees. A famous fictional example of a device creating a virtual world is the Holodeck from Star Trek.

• Augmented reality: The world is enhanced or augmented by virtual objects as seen in figure 1. The user can see the real world but also perceives virtual content created by a computing device and displayed by an additional light source which doesn't prohibit the perception of the real world. Interaction with those virtual objects is a way of communicating with the computing devices.

• Diminished reality: Objects are subtracted from scenes by filtering the light reflected or emitted by those objects towards the eye. This is most often used in combination with augmented reality to replace the diminished objects by some virtual objects.

Like other smart devices, smart glasses will often also have a camera. Significant differences to other camera devices are that the pictures or videos are taken from the user's point of view, there is no need for the user to hold the device in his hands and the vision of the user is not occluded. This camera can see what the wearer sees at any time. In combination with eye tracking technology the devices can determine exactly what the wearer is looking at. This allows the device to get crucial information about the user's interests, activities, surroundings and occupation.

Figure 1: Reality is augmented with a virtual object [7].

Those fundamental differences to other computing devices are what makes smart glasses unique and interesting. They enable new applications which couldn’t be as easily realized with other devices.

DEVICES

All the applications in the world are useless without the right hardware to run on. That is why an overview of different smart glasses which have been released recently or should be released in the next few years is provided. Those glasses are developed by different companies and often try to achieve different goals and appeal to different consumer markets. Therefore they do not all stand in direct competition and should not be compared as such.

Devices with one display

There are smart glasses with a single display which is placed in the peripheral vision of the user. Those displays can be used to display information to the user. Unfortunately they can not be used to create a diminished or virtual reality because sight on one eye is not affected. They also can not be used to create an interactive augmented reality because virtual objects can only be seen in peripheral vision.

Google Glass

One example of smart glasses with one display is Google Glass which runs the Android operating system. Its specifications are the following:

• Weight: 50g

• Processing: 1.2 GHz Dual-core ARM Cortex-A9 CPU, PowerVR SGX540 GPU, 16GB storage, 682MB RAM. That's roughly equivalent to the hardware of an iPhone 4.

• Camera: 5MP still (2528x1856 pixels) or 720p video. There is no flash.

• Display: It is a color prism projector with a resolution of 640x360 pixels. See figure 3.

• Sensors: microphone, accelerometer, gyroscope and compass.

• Interaction: There is a long and narrow touch pad which supports swipe and tap gestures. The camera can be triggered by a button.

• Audio: There is a bone conduction transducer for audio. Sound reaches the inner ear in form of vibrations on the skull. Note that this technology is audible by the hearing impaired as well as persons with normal hearing.

• Communication: It has no cellular modem which means it can not make phone calls on its own. It does have Bluetooth and WLAN 802.11b/g.

Google Glass is supposed to be used in combination with a smartphone and one of its main uses is to display notifications in a convenient and quick way. It is supposed to be priced similarly to a high end smartphone but there are no official announcements concerning the exact price or release date.

Brückner TRAVIS

It is visible in figure 2 that Google Glass does not have a very sturdy design and that it is made for consumers. It is not made for rough environments such as industrial sites or factories. One example of industrial smart glasses is the Brückner TRAVIS shown in figure 4. This device is a lot heavier than Google Glass because the processing is done in an embedded PC worn in a vest. It is controlled with six hardware buttons and its main applications are streaming video and displaying manuals to employees.

Reckon MOD

There are also many devices designed for use during sports. Similar to Brückner TRAVIS they need to function in a rough environment but also should not be heavy. One example of dedicated sports smart glasses is the Reckon MOD seen in figure 5. The Reckon MOD are snow sports smart glasses. They can operate at temperatures from −20° to 30°, weigh approximately 65g and are water resistant. Interaction is done through a wrist remote. The main use of Reckon MOD is displaying maps and performance statistics.

Devices with two displays

Smart glasses with two displays can affect everything the wearer sees and could display 3 dimensional content. This makes it possible to create a virtual, augmented or diminished reality.

Both systems with two displays presented in this section need to be connected by cable to a PC, which creates the virtual objects. In the future similar devices could be wireless and worn outside. Those devices are interesting because they do not focus on displaying information but rather try to create an exciting visual experience.

Figure 2: Google Glass developer version [8]

Figure 3: Google Glass display: A mini projector projects onto a semi reflective mirror which only affects light stemming from the projector [9].

Figure 4: Brückner TRAVIS [10]

Figure 5: Reckon MOD [11]

Figure 6: Cast AR [12]

Cast AR

An exciting new technology which is used to create an augmented indoor reality is Cast AR. It has a projector above each eye which projects onto a retro reflector at 120 Hz each, creating a 3D image. A retro reflector is a surface that reflects light back to its source with a minimum of scattering. Nevertheless some of the light of each projector will reach the eye it is not destined for. To deal with this, Cast AR has active shutter lenses. The projectors are active in disjoint small time intervals. While the projector above one eye is not active the active shutter lens of that eye will stop any light from reaching that eye. This happens at such a high speed that the human eye can not notice. The result is a stereoscopic 3D image.

Cast AR tracks head movement and orientation using an infrared camera and infrared LEDs inside the retro reflector. The exact position is calculated by triangulation in hardware on the glasses. This makes it possible to adjust the orientation of the virtual objects with only a few milliseconds of delay to head movement. Many people can share one retro reflector, each seeing a different scene or the same scene from different angles.

Another advantage of Cast AR compared to other smart glasses is that the eye focuses on items in a distance rather than a screen in front of the eyes. This makes it possible to use Cast AR for long time periods without eye strain.

One of the disadvantages is that the active shutter glasses filter a lot of light which makes the scenes appear darker. By increasing the brightness of the projectors it's possible to make the virtual objects brighter, but it is not possible to make any real objects in the room brighter without changing the lighting of the room, which might disturb others.

Another disadvantage is the need for a retro reflective surface. Although these are very flexible, lightweight and not expensive, they take up space and you can't see any virtual objects or scenes without one in the background. The price of Cast AR is expected to be around $200.

Oculus Rift

The Oculus Rift is a virtual reality solution which uses two displays placed in front of lenses close to the eyes of the wearer. There is one display in front of each eye; together they have a 1920x1080 pixel resolution on the newer prototypes. For Oculus Rift it is very simple to create 3D scenes because each display is only visible by one eye. Also brightness is not a problem because it only depends on the brightness of the display which may be adjusted. Oculus Rift tracks head movement using infrared LEDs like Cast AR but it also relies on a gyroscope and accelerometer. The advantage of tracking with a gyroscope and accelerometer is a very low latency; the disadvantage compared to the infrared solution is that over time errors accumulate and there might be orientation drift.[6] By combining both methods Oculus Rift implements precise low latency head tracking. As already mentioned Oculus Rift is used to create a virtual reality. No light from the environment reaches the eye. The advantage is

Figure 7: Oculus Rift Crystal Cove prototype [13]
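The idea of combining a fast but drifting gyroscope with a slower drift-free infrared reference can be illustrated with a complementary filter. This is a generic textbook technique shown under simplifying assumptions (yaw angle only, no wrap-around, a constant gyroscope bias) and is not claimed to be the algorithm Oculus Rift actually uses. The function `fuse_yaw` and the weight `alpha` are hypothetical.

```python
from typing import Optional


def fuse_yaw(prev_yaw: float, gyro_rate: float, dt: float,
             ir_yaw: Optional[float] = None, alpha: float = 0.98) -> float:
    """One complementary-filter step for head yaw, in degrees."""
    predicted = prev_yaw + gyro_rate * dt   # low-latency gyroscope integration
    if ir_yaw is None:                      # no infrared fix for this step
        return predicted
    # Blend in a small share of the absolute reference to cancel drift.
    return alpha * predicted + (1.0 - alpha) * ir_yaw


# Simulated stationary head: true yaw is 0, but the gyro reads a 1 deg/s bias.
dt, bias = 0.01, 1.0
gyro_only = fused = 0.0
for _ in range(1000):                       # 10 seconds of samples
    gyro_only = fuse_yaw(gyro_only, bias, dt)
    fused = fuse_yaw(fused, bias, dt, ir_yaw=0.0)

print(round(gyro_only, 2))  # about 10 degrees of accumulated drift
print(round(fused, 2))      # stays bounded well below 1 degree
```

The gyroscope term keeps latency low between infrared fixes, while the small infrared share prevents the integrated error from growing without bound.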
