
Exploring the voice onset time of Spanish learners of Mandarin

In: on monolingual and bilingual speech 2015 (pp. 76-83)

Man-ni Chu, Yu-duo Lin


Graduate Institute of Cross-Cultural Studies, Fu Jen Catholic University

Abstract. Voice onset time (VOT), and especially its acquisition, has been examined across languages in many studies (e.g., Lisker & Abramson, 1964). Several models, the Contrastive Analysis Hypothesis (CAH; Lado, 1957), the Perceptual Assimilation Model (PAM; Best, 1995), and the Speech Learning Model (SLM; Flege, 1995, 1999, 2002), were reviewed to examine whether VOT patterns in L1 influence acquisition in L2. Fifteen native Spanish speakers learning Mandarin in Taiwan were recruited to participate in two perception experiments and one production experiment.

Several univariate analyses (one-way ANOVAs) and ANCOVAs were performed to examine the correct rate (perception) and the mean VOT values (production). The results showed that “Vowel” and “Place of Articulation” play important roles, and most of these Spanish-Mandarin learners made significant progress after several months of immersion in a Mandarin-speaking society. The effect of adjacent high/low vowels on our subjects’ perceptual discrimination can be explained by the vocal-fold tension of the high vowel /i/ (Higgins, Netsell, & Schulte, 1998; Cheng, 2013). The aspirated voiceless stops were perceived and produced significantly differently from the unaspirated ones because they were easy for Spanish-Mandarin learners to learn, supporting CAH and PAM. To conclude, the environment in which L2 learners stay influences their progress in L2. The characteristics of adjacent vowels and the nature of the stops can also be significant in the acquisition of aspirated/unaspirated stops.

Keywords: voice onset time (VOT), perception, production

Introduction

Standard Chinese (SC), spoken as a lingua franca in Mainland China and Taiwan, is described as having five vowels (/a/, /ǝ/, /i/, /u/ and /y/) and nineteen consonants (/p/, /pʰ/, /t/, /tʰ/, /k/, /kʰ/, /ts/, /tsʰ/, /tʂ/, /tʂʰ/, /m/, /n/, (/ŋ/), /f/, /s/, /ʂ/, /ʐ/, /x/ and /l/) (Duanmu, 2007). The syllable structure is (C1)(G)V(C2)¹, where only /ŋ/ is not allowed to occupy C1, and C2 allows only /n/ and /ŋ/. In this study, SC as spoken in Taiwan is referred to as Taiwan Mandarin (TM). On the other hand, Spanish spoken in most South American countries is referred to as Caribbean Spanish (CS). CS has five vowels (/a/, /e/, /i/, /o/ and /u/) and twenty consonants (/p/, /b/, /t/, /d/, /k/, /g/, /x/, /f/, /θ/, /s/, /tʃ/, /ʝ/, /m/, /n/, /ɲ/, /r/, /ɾ/, /l/, /ʎ/ and (/ʃ/)) (Salcedo, 2010). The maximal Spanish syllable structure is (C1)(C2)(S1)V(S2)(C3)(C4), where S stands for semi-vowels. CS and TM differ in their vowels in that the latter has two vowels, /y/ and /ə/, that the former lacks. As for the stops, CS contrasts voiceless and voiced stops, while TM contrasts aspirated and unaspirated voiceless ones. The difference between aspirated and unaspirated stops lies in the duration of the voice onset time (VOT), the short delay measured from the beginning of the release burst (explosion) to the point at which the vocal folds start to vibrate. Table 1 shows the mean VOT values of stops in TM and CS.

Lado (1957) proposed the Contrastive Analysis Hypothesis (CAH), which claims that L2 learners are inclined to transfer meanings or forms from their primary language to the subsequent language. In other words, if the L2 patterns are similar to those of L1, learners may experience positive transfer, which facilitates learning and leads to relatively easy acquisition. Conversely, the more different the patterns of L2 are, the more difficult it is for learners to acquire L2, due to negative transfer. More specifically for phonetic and phonological learning, the Perceptual Assimilation Model (PAM) proposed by Best (1995) claims that similarity between the sounds of L1 and L2 can assist learning. The higher the degree of gestural similarity between native and non-native sounds, the better the mapping between native phoneme categories and non-native phones. On the other hand, the Speech Learning Model (SLM) proposed by Flege (1995, 1999, 2002) claims that L2 learners find it more difficult to acquire L2 sounds or phonological structures the more similar these are to L1. If an L2 sound is similar to a particular L1 sound, learners’ pre-existing L1 phonology will impede their accurate perception and production of that L2 sound. That is, phonological knowledge of L1 hinders the acquisition of similar L2 sounds.

1 However, there are other opinions; e.g., Lin (2007) holds that the syllable structure should be C1GVC2, where the glide (G) occupies a time slot.

Table 1. The Mean VOT Values of Stops in Mandarin (reprinted from Lin & Wang, 2005) and Spanish (reprinted from Lisker & Abramson, 1964)

Stops                     Mandarin VOT (ms)    Spanish VOT (ms)
Unaspirated voiceless
  /p/                     6                    4
  /t/                     9                    9
  /k/                     24                   29
  Mean                    13                   14
Aspirated voiceless
  /pʰ/                    88
  /tʰ/                    86
  /kʰ/                    82
  Mean                    85
Voiced
  /b/                                          -138
  /d/                                          -110
  /g/                                          -108
  Mean                                         -118
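The means in Table 1 can be recomputed from the per-stop values. The following Python sketch does so, assuming that the first value column of the table is Mandarin and the second is Spanish; the variable and function names are illustrative and do not come from the cited sources. Note that the voiced mean computes to about -118.7 ms, which the table gives as -118.

```python
from statistics import mean

# VOT values (ms) copied from Table 1. The names are illustrative, and the
# column-to-language assignment is an assumption based on the caption order.
tm_unaspirated = {"p": 6, "t": 9, "k": 24}
tm_aspirated = {"ph": 88, "th": 86, "kh": 82}
cs_voiceless = {"p": 4, "t": 9, "k": 29}
cs_voiced = {"b": -138, "d": -110, "g": -108}


def mean_vot(values):
    """Return the mean VOT (ms) over a mapping of stop -> VOT."""
    return mean(values.values())


for label, table in [
    ("TM unaspirated", tm_unaspirated),
    ("TM aspirated", tm_aspirated),
    ("CS voiceless", cs_voiceless),
    ("CS voiced", cs_voiced),
]:
    print(f"{label}: {mean_vot(table):.1f} ms")

# Size of the aspiration contrast in TM (aspirated minus unaspirated mean).
tm_contrast = mean_vot(tm_aspirated) - mean_vot(tm_unaspirated)
print(f"TM aspiration contrast: {tm_contrast:.1f} ms")
```

Under these assumptions, the TM aspirated and unaspirated means are separated by roughly 72 ms, whereas the CS voiceless and TM unaspirated means differ by only about 1 ms.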

Taken together, CAH would predict that the absence of aspirated stops from CS-TM learners’ phonological inventory impedes the learning of stops in TM. PAM, by contrast, would focus on the similarity of the (un)aspirated voiceless stops between CS and TM, predicting easier acquisition. SLM would claim that CS unaspirated stops prevent the establishment of a new, distinct category for TM aspirated ones, resulting in interference. This raises a question: what happens if an L2 contrast involves a dimension, for example aspiration, that is not used contrastively in the learners’ L1? Thus, several perception and production experiments were conducted to examine whether CS-TM learners encounter any difficulty in producing aspirated stops in onset position, and whether any improvement can be observed after several months of living in a TM-speaking society. More specifically, several factors (such as context and articulators) are examined to understand what obstacles impede their performance when learning TM.

Method

Participants

Fifteen participants² (6 female, 9 male; aged 18-25, mean age = 24) were recruited to participate in both the perception and production experiments. All are citizens of the Dominican Republic and beginning learners of TM. Most of them are Spanish-English bilinguals, and some of them also speak French, Italian, or Russian. They take Mandarin classes six hours a day, which gives them an opportunity to learn Mandarin step by step. None of them reported any hearing or reading disability.

Perception experiment

Stimuli

A total of 288 Mandarin characters (6 onset consonants [p, pʰ, t, tʰ, k and kʰ] × 12 rhymes [a, i, u, o, in, iŋ, un, uŋ, ən, əŋ, ɑn and ɑŋ] × 4 tones [H (Tone 1), R (Tone 2), L (Tone 3) and F (Tone 4)³]) were read by a native speaker of TM. All items were presented in Hanyu Pinyin, with the tones presented as numbers (1 for H, 2 for R, 3 for L, and 4 for F), as shown in the Appendix.

2 Originally, there were seventeen participants. However, because of the heavy Chinese learning program, only fifteen were available.
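The 6 × 12 × 4 design can be made concrete by enumerating the items. The Python sketch below rebuilds the wordlist from the Pinyin spellings used in the Appendix; the constant and function names are illustrative, and the ordering is only one possible arrangement.

```python
from itertools import product

# Pinyin onsets for /p, pʰ, t, tʰ, k, kʰ/ and the 12 rhymes as spelled in the
# Appendix. The names ONSETS, RHYMES and TONES are illustrative.
ONSETS = ["b", "p", "d", "t", "g", "k"]
RHYMES = ["a", "i", "u", "o", "an", "in", "un", "en",
          "eng", "ang", "ing", "ong"]
TONES = [1, 2, 3, 4]


def build_wordlist():
    """Return every onset + rhyme + tone-number combination."""
    return [f"{onset}{rhyme}{tone}"
            for onset, rhyme, tone in product(ONSETS, RHYMES, TONES)]


wordlist = build_wordlist()
print(len(wordlist))   # 288
print(wordlist[:4])    # ['ba1', 'ba2', 'ba3', 'ba4']
```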

Procedure and analysis

Participants were asked to choose one of three items represented in Hanyu Pinyin, based on the sound file they had just heard. If they were not sure, they were advised to choose [other]. The first identification task took place after the participants had been learning TM for less than one month. The interval between the 1st and 2nd experiments was about 3 months. The correct rate served as the dependent variable in the ANOVA for the 1st experiment and in the ANCOVA for the 2nd experiment. The independent variables, i.e. the factors suspected of affecting learners’ performance, were “Vowel”, “Aspiration”, “Place of Articulation (POA) of stops” and “CVC2”.
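As an illustration of the kind of test described above, the following Python sketch computes the F statistic of a one-way ANOVA over correct rates grouped by one factor. It is a minimal sketch and not the authors' analysis script: the function name is illustrative, the correct rates are invented toy numbers, and the reported analyses (with error degrees of freedom of 191) appear to have involved more observations and factors than this example.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-group sum of squares: group size times squared distance of
    # each group mean from the grand mean.
    ss_between = sum(len(group) * (m - grand_mean) ** 2
                     for group, m in zip(groups, group_means))
    # Within-group sum of squares: squared distance of each observation
    # from its own group mean.
    ss_within = sum((x - m) ** 2
                    for group, m in zip(groups, group_means)
                    for x in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical correct rates (%) for three levels of one factor (toy data).
toy_rates = [[10, 20, 30], [20, 30, 40], [50, 60, 70]]
f_value, df_b, df_w = one_way_anova_f(toy_rates)
print(f"F({df_b}, {df_w}) = {f_value:.3f}")
```

The p value would then be read from the F distribution with the two degrees of freedom; that step is omitted here to keep the sketch free of external libraries.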

Production experiment

Stimuli

The wordlist used in the perception experiment, shown in the Appendix, also served as the stimuli for the production experiment.

Procedure and analysis

Praat (5.3.34), at a sampling frequency of 44100 Hz, and headphones (Logitech) were used to record the CS-TM learners reading the stimuli three times. The best token of each item was chosen and annotated by the second author. The production task took place after the second perception experiment. VOT values served as the dependent variable in the ANOVA. The same factors as in the perception experiment were treated as independent variables.
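Given the definition of VOT as the interval from the release burst to the onset of vocal-fold vibration, the measurement step can be sketched as follows. This assumes that each annotated token yields two time stamps in seconds; the function names and the example time stamps are hypothetical and are not taken from the study's annotations.

```python
def vot_ms(burst_s, voicing_onset_s):
    """Return VOT in milliseconds: voicing onset minus burst release.

    Both arguments are time stamps in seconds. A negative result means
    that voicing starts before the release (voicing lead).
    """
    return (voicing_onset_s - burst_s) * 1000.0


def lag_type(vot):
    """Label a VOT value by its sign only (no duration threshold assumed)."""
    return "voicing lead" if vot < 0 else "voicing lag"


# Hypothetical annotation time stamps (seconds) for two tokens.
tokens = {
    "pa1": (0.100, 0.185),   # voicing begins after the burst
    "ba1": (0.250, 0.132),   # voicing begins before the burst
}

for item, (burst, voicing) in tokens.items():
    value = vot_ms(burst, voicing)
    print(f"{item}: {value:.0f} ms ({lag_type(value)})")
```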

Results

Perception results (1st exp)

An ANOVA showed a significant difference among vowels, F(4, 191) = 6.179, p < .05; post hoc multiple comparisons (Bonferroni) indicated that participants perceived stops adjacent to the vowels /e/ and /i/ better than those adjacent to /u/, and perceived stops adjacent to /i/ better than those adjacent to /o/. There was also a significant effect of “POA”, F(2, 191) = 22.691, p < .05: bilabial and velar stops were perceived better than alveolar ones. Aspirated stops were perceived significantly better than unaspirated ones, F(1, 191) = 34.36, p < .05. Finally, “Tone” was also found to significantly affect the perception of stops in TM, F(3, 191) = 2.845, p < .05.

Perception results (2nd exp)

An ANCOVA showed no significant effect of “Vowel” (F(4, 192) = 0.954, p > .05), “Aspiration” (F(1, 192) = 0.963, p > .05), “Tone” (F(3, 192) = 1.665, p > .05), “POA” (F(2, 192) = 1.033, p > .05) or “CVC2” (F(1, 192) = 1.773, p > .05).

Comparisons between two perception results

A paired-samples t test was performed to compare the results of the 1st and 2nd experiments. The results showed no significant difference between the first (M = -30.2, SD = 67.9) and the second (M = -29.01, SD = 85.1) experiment; t(674) = -0.54, p > .05. This means that our participants’ perception of TM, as measured by the correct rate, did not change significantly.
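For reference, the statistic of a paired-samples t test is the mean of the pairwise differences divided by its standard error. The Python sketch below illustrates this with invented toy scores; it is not a reanalysis of the study's data, and the function name is illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for a paired-samples t test on two equal-length lists."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Standard error of the mean difference (sample SD with n - 1).
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical scores of the same four items in two sessions (toy data).
session_1 = [1, 2, 3, 4]
session_2 = [2, 2, 5, 5]
t_value, df = paired_t(session_1, session_2)
print(f"t({df}) = {t_value:.3f}")
```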

3 To describe the pitch contour: they are high-level (Tone 1), mid-rising (Tone 2), low-dipping (Tone 3) and high-falling (Tone 4) (Duanmu, 2007).

Production results

An ANOVA showed a significant difference among “Vowels”, F(4, 191) = 10.724, p < .05. Post hoc multiple comparisons (Bonferroni) indicated that our participants produced stops adjacent to the vowels /e/ and /i/ with longer VOT than those adjacent to /a/, and produced stops adjacent to /u/ with longer VOT than those adjacent to /a/ and /o/. There was also a significant effect of “POA”, F(2, 191) = 52.034, p < .05. Velar stops were produced with significantly longer VOT than alveolars, which in turn were significantly longer than bilabials. The VOT of aspirated stops was significantly longer than that of unaspirated ones, F(1, 191) = 3302.011, p < .05. In addition, the production of stops was significantly affected by “Tone”, F(3, 191) = 10.215, p < .05. Post hoc multiple comparisons (Bonferroni) indicated that stops with T3 were produced with longer VOT than those with T1 and T4. Finally, there was also a significant difference when the stimuli had a coda, F(1, 191) = 24.027, p < .05: syllables with a coda led participants to produce longer VOT for stops in onset position.

Table 2 summarizes the perception and production results: “Vowel”, “Aspiration”, and “POA of the stop” affect both the perception and the production of initial stops in TM. It is of interest that all these effects disappeared in the second perception experiment.

Table 2. Results on perception and production

                  vowel                        tone           aspiration     POA                 CVC2
perception 1st    */e,i/ > /u/; /i/ > /o/      -              *asp > unasp   *bil, vel > alv     -
perception 2nd    -                            -              -              -                   -
production        */e,i/ > /a/; /u/ > /a,o/    *T3 > T1, T4   *asp > unasp   *vel > alv > bil    *with C2 > without C2

Discussion

Perception

Comparing the two perception results, CS-TM learners had already acquired aspirated stops after learning TM for less than a month, meaning that aspirated stops are easy for them to perceive. This result is in line with PAM: the more similar L2 sounds are to L1 sounds, the more accurately they are perceived. However, this result could also be partially explained in the light of SLM. The longer learners are immersed in L2, the greater the influence they experience. Flege (1987) studied the VOT values of French and American English (AE) bilinguals and found that the VOT values of both groups were affected significantly by the length of stay in the L2 environment. Compared to the findings of Sancier and Fowler (1997), whose participants had stayed only around three months, the VOT shift in Flege’s findings was more observable. Thus, our results are adequately explained by SLM in terms of the learning process. Comparing our perceptual data with those in Flege (1987) and Sancier and Fowler (1997), the ability both to produce and to perceive a native sound is affected by immersion in the L2 environment for just a few months. The non-native sounds, aspirated stops, become more prominent after a short period of learning. At the same time, the ability to perceive unaspirated stops is reduced because of the lack of exposure to the native CS environment. In this sense, our results provide further evidence for the claim in SLM that L2 learning influences the perception of native sounds.

As for which factors affect stop identification, stops adjacent to front vowels are perceived better than those adjacent to back vowels. Morris, McCrea, and Herring (2008) and Higgins et al. (1998) explain that when a speaker produces the vowel [i], the vocal folds tense and delay vibration. This may explain why VOT values are longer and more recognizable for stops adjacent to the vowel [i] (Kondaurova & Francis, 2008). However, the opposite observation was made by Peng (2009) and Rochet & Fei (1991), who found that [p] in TM has longer VOT values before [u] than before [i]. Could this vocal-fold tension of the vowel [i] be generalized to other front vowels, causing longer duration? We raise this possibility only tentatively, as further acoustic analysis is needed.

Alternatively, Wu (2004) states that the vowel /i/ has a lower F1⁴ value than the vowel /u/ (290 Hz vs. 380 Hz). Therefore, onset stops might have more space for movement when adjacent to /i/ than to /u/. In other words, because stops adjacent to /i/ have longer VOT values, participants might perceive them better. Again, could this be generalized to other vowels, such as the mid front vowel /e/? Such explanations need further investigation.

On the other hand, bilabial and velar stops are perceived better than alveolars. This result matches Winters’ (2000) finding that labials and dorsals are more salient than coronal stops; as a result, they are easier to perceive.

Production

Stops adjacent to the vowels /e/ and /i/ are produced with longer VOT than those adjacent to the vowel /a/. Those adjacent to the vowel /u/ are produced with longer VOT than those adjacent to the vowels /a/ and /o/. The distinction between these conditions is tongue height. High vowels, which are associated with longer VOT values (Cho & Ladefoged, 1999), could benefit CS-TM learners’ production. The only similarity between perception and production in our results is that stops adjacent to the vowel /i/ have longer VOT than those adjacent to the vowel /a/ (Higgins et al., 1998).

The CS-TM learners produce velar stops with longer VOT than alveolar ones, which in turn are longer than bilabial ones. The VOT values of glottal and velar sounds are the longest (Cho & Ladefoged, 1999; Kent & Read, 2002; Wu, 2004), similar to native speakers of TM as shown in Table 1. This means that, after several months of TM training, our participants had acquired aspirated stops well enough to distinguish the subtle differences as native speakers do. It is not surprising that the VOT of aspirated stops (long lag) is longer than that of unaspirated onsets (short lag), since participants produced them well. Whether tone or nasal codas play any role in the production of initial stops remains unknown.

Conclusion

Our experiments, involving 15 CS-TM learners, have revealed rapid learning of initial aspirated stops in both perception and production. These results are consistent with PAM’s proposition that the closer an L2 sound is to an L1 sound, the more easily it is acquired. We suspect that the orthography of Hanyu Pinyin, which uses the same letters for the unaspirated-aspirated voiceless contrast in TM as CS uses for its voiced-voiceless contrast, is facilitatory: i.e. /b/ and /p/ (Pinyin) correspond to /p/ and /pʰ/ (TM phonemes). Although several factors, such as the vowel condition and the POA of the stop, are found to affect the production and perception of initial stops, they can be explained in terms of articulation, or they conform to a pattern observed in most languages, velars > alveolars > bilabials. Exactly how tone and coda play a role remains to be investigated.

References

Best, C. T. (1995). A direct-realist view of cross-language speech perception. In W. Strange (ed.), Speech Perception and Linguistic Experience: Issues in Cross-Language Speech Research (pp. 171-206). Timonium, MD: York Press.

Best, C. T., McRoberts, G., & Sithole, N. (1988). Examination of perceptual reorganization for nonnative speech contrasts: Zulu click discrimination by English-speaking adults and infants. Journal of Experimental Psychology, 14, 345-360.

Cheng, M.-C. (2013). Voice onset time of syllable-initial stops in Sixian Hakka: Isolated syllables. Journal of National Taiwan Normal University: Linguistics and Literature, 58(2), 193-227.

4 The F1 is mentioned here because it reflects the formant (position of the tongue) of the sound. The higher the tongue, the lower the F1; the lower the tongue, the higher the F1 will be.


Cho, T., & Ladefoged, P. (1999). Variation and universals in VOT: Evidence from 18 languages. Journal of Phonetics, 27, 207-229.

Duanmu, S. (2007). The phonology of Standard Chinese. Oxford, UK: Oxford University Press.

Flege, J. E. (1987). The production of “new” and “similar” phones in a foreign language: evidence for the effect of equivalence classification. Journal of Phonetics, 15, 47-65.

Flege, J. E. (1991). Age of learning affects the authenticity of voice-onset time (VOT) in stop consonants produced in a second language. The Journal of the Acoustical Society of America, 89, 395-411.

Flege, J. E. (1993). Production and perception of a novel, second-language phonetic contrast. Journal of the Acoustical Society of America, 93, 1589-1608.

Flege, J. E. (1995). Second-language speech learning: Theory, findings and problems. In W. Strange (ed.), Speech Perception and Linguistic Experience: Theoretical and Methodological Issues (pp. 233-272). Timonium, MD: York Press.

Flege, J. E. (1999). The relation between L2 production and perception. In J. Ohala, Y. Hasegawa, M. Granville, & A. Bailey (eds.), Proceedings of the XIVth International Congress of Phonetic Sciences (pp. 1273-1276). Berkeley, United States.

Flege, J. E. (2002). Interactions between the native and second-language phonetic systems. In P. Burmeister, T. Piske, & A. Rohde (eds.), An integrated view of language development: Papers in honor of Henning Wode (pp. 217-244). Trier: Wissenschaftlicher Verlag.

Higgins, M. B., Netsell, R., & Schulte, L. (1998). Vowel-related differences in laryngeal articulatory and phonatory function. Journal of Speech, Language, and Hearing Research, 41, 712-724.

Hu, G. (2012). Chinese and Swedish stops in contrast. In A. Eriksson, Å. Abelin, P. Nordgren, & K. Lundholm Fors (eds.), Proceedings of Fonetik 2012 (pp. 77-80). Gothenburg: Department of Philosophy, Linguistics and Theory of Science, University of Gothenburg.

Kent, R. D., & Read, C. (2002). The Acoustic Analysis of Speech (2nd ed.). CA: Singular.

Klatt, D. (1975). Voice onset time, frication, and aspiration in word-initial consonant clusters. Journal of Speech and Hearing Research, 18, 686-706.

Kondaurova, M. V., & Francis, A. L. (2008). The relationship between native allophonic experience with vowel duration and perception of the English tense/lax vowel contrast by Spanish and Russian listeners. The Journal of the Acoustical Society of America, 124(6), 3959-3971.

Lado, R. (1957). Linguistics Across Cultures. Ann Arbor: The University of Michigan Press.

Lin, Yen-Hwei (2007). The Sounds of Chinese. Cambridge, UK: Cambridge University Press.

Morris, R. J., McCrea, C. R., & Herring, K. D. (2008). Voice onset time differences between adult males and females: Isolated syllables. Journal of Phonetics, 36(2), 308-317.

Peng, J.-F. (2009). Factors of voice onset time: stops in Mandarin and Hakka. Unpublished master’s thesis, National Cheng Kung University.

Rochet, B. L., & Fei, Y. (1991). Effect of consonant and vowel context on Mandarin Chinese VOT: Production and perception. Canadian Acoustics, 19(4), 105-106.

Salcedo, C. (2010). The phonological system of Spanish. Revista de Lingüística y Lenguas Aplicadas, 5, 195-209.

Sancier, M. L., & Fowler, C. A. (1997). Gestural drift in a bilingual speaker of Brazilian Portuguese and English. Journal of Phonetics, 25(4), 421-436.

Winters, S. (2000). Turning phonology inside out: Testing the relative salience of audio and visual cues for place of articulation. In R. Levine, A. Miller-Ockhuizen, & T. Gonsalvez (eds.), Ohio State Working Papers in Linguistics 53, (pp. 168-199). Ohio State University.

Wu, T.-J. (1998). Experimental study on Mandarin unaspirated/aspirated consonants. Chinese Language, 3, 256-283.

Wu, T.-J. (2004). Wu Tsung-Ji Linguistic Papers. Beijing: Commercial Press.

Zheng, J. (2011). Phonetics: Science of speech. Taipei: Psychology Press.


Appendix

The wordlist in Hanyu Pinyin

ba1 ba2 ba3 ba4 da1 da2 da3 da4 ga1 ga2 ga3 ga4

pa1 pa2 pa3 pa4 ta1 ta2 ta3 ta4 ka1 ka2 ka3 ka4

bi1 bi2 bi3 bi4 di1 di2 di3 di4 gi1 gi2 gi3 gi4

pi1 pi2 pi3 pi4 ti1 ti2 ti3 ti4 ki1 ki2 ki3 ki4

bu1 bu2 bu3 bu4 du1 du2 du3 du4 gu1 gu2 gu3 gu4

pu1 pu2 pu3 pu4 tu1 tu2 tu3 tu4 ku1 ku2 ku3 ku4

bo1 bo2 bo3 bo4 do1 do2 do3 do4 go1 go2 go3 go4

po1 po2 po3 po4 to1 to2 to3 to4 ko1 ko2 ko3 ko4

ban1 ban2 ban3 ban4 dan1 dan2 dan3 dan4 gan1 gan2 gan3 gan4

pan1 pan2 pan3 pan4 tan1 tan2 tan3 tan4 kan1 kan2 kan3 kan4

bin1 bin2 bin3 bin4 din1 din2 din3 din4 gin1 gin2 gin3 gin4

pin1 pin2 pin3 pin4 tin1 tin2 tin3 tin4 kin1 kin2 kin3 kin4

bun1 bun2 bun3 bun4 dun1 dun2 dun3 dun4 gun1 gun2 gun3 gun4

pun1 pun2 pun3 pun4 tun1 tun2 tun3 tun4 kun1 kun2 kun3 kun4

ben1 ben2 ben3 ben4 den1 den2 den3 den4 gen1 gen2 gen3 gen4

pen1 pen2 pen3 pen4 ten1 ten2 ten3 ten4 ken1 ken2 ken3 ken4

beng1 beng2 beng3 beng4 deng1 deng2 deng3 deng4 geng1 geng2 geng3 geng4

peng1 peng2 peng3 peng4 teng1 teng2 teng3 teng4 keng1 keng2 keng3 keng4

bang1 bang2 bang3 bang4 dang1 dang2 dang3 dang4 gang1 gang2 gang3 gang4

pang1 pang2 pang3 pang4 tang1 tang2 tang3 tang4 kang1 kang2 kang3 kang4

bing1 bing2 bing3 bing4 ding1 ding2 ding3 ding4 ging1 ging2 ging3 ging4

ping1 ping2 ping3 ping4 ting1 ting2 ting3 ting4 king1 king2 king3 king4

bong1 bong2 bong3 bong4 dong1 dong2 dong3 dong4 gong1 gong2 gong3 gong4

pong1 pong2 pong3 pong4 tong1 tong2 tong3 tong4 kong1 kong2 kong3 kong4
