The Investigation of Code-Switching in a Computerised Corpus of Child Bilingual Language

Lonngren Sampaio, Catherine Anne

Abstract

This dissertation describes the investigation of code-switching in a computerised corpus of child bilingual language, the LOBILL Corpus, which consists of twenty-five hours of recordings of naturalistic interactions between two bilingual Brazilian/English siblings (JAM, 3;6 and MEG, 5;10) and their family members. Collected over three years, the data was transcribed and coded using the CHAT (Codes for the Human Analysis of Transcripts) transcription system developed by MacWhinney and colleagues (MacWhinney, 1991). In addition to standard CHAT coding, language codes were inserted throughout the corpus and a specially developed postcode was added to all bilingual utterances. Addressee information for each utterance was also included. The longitudinal and heterogeneous nature of the corpus and its specific coding allowed for the comprehensive investigation of the children's code-switching practices from both grammatical and pragmatic perspectives. Three levels of analysis were performed using the CLAN (Computerized Language ANalysis) software (ibid.). First, quantitative analyses were carried out using the commands FREQ (which outputs frequency word lists), VOCD (which outputs vocabulary diversity scores) and WDLEN (which outputs mean word and utterance lengths). An analysis of the results pointed to relationships between the various values found and the participatory roles of English and Portuguese in code-switched utterances. The second level of analysis involved the examination and interpretation of word lists and code lists produced with FREQ. When Myers-Scotton's 4-Morpheme Model (4-M Model) (Jake & Myers-Scotton, 2009) was used to interpret the word lists, comparisons of morpheme types revealed an asymmetry in the contributions of the two languages to bilingual utterances. These results were seen to lend support to the Matrix Language/Embedded Language asymmetry proposed in the Matrix Language Frame Model (MLF Model) (ibid.).
The quantitative analysis of four types of codes (used to code instances of retracings and reformulations, errors, tag questions and metalinguistic usage) provided evidence of potential relationships between these features of spoken discourse and code-switching.