Annotating Sequential Event Interactions in Virtual Reality-Mediated Embodiment Corpora

Ephemerality is a natural characteristic of spoken language. Unless it is documented through text, audio, and video, what has been spoken is no longer available for empirical investigation (Thompson 2005). An often-neglected aspect of language documentation is that the setting within which spoken language unfolds, in all its spatiotemporal dimensions, is equally ephemeral. Existing documentary linguistic methodologies, which rely on tools such as fieldnotes and video cameras, cannot capture crucial elements that define naturalistic spoken communication. One such element is the three-dimensional space exactly as experienced by speakers during the collection of the documented language; another is the embodied presence of the speakers within that space.

This paper is based on a methodology which utilizes virtual reality and leverages the technologies behind spatial computing to mitigate the ephemerality of the communicative setting. It enables linguistic researchers to capture and reconstruct these settings for in-depth exploration during the analysis stages that follow fieldwork (Alsayed 2023). In contrast with other language documentation methodologies, this approach enables the comprehensive capture of communicative events, in Himmelmann's (1998) sense: speech, sound, and all other aspects of communication, including the environment and a digital replication of the speakers themselves.
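Alsayed (2023) details the capture methodology itself; purely as an illustration of what a comprehensive capture might contain, the minimal Python sketch below pairs each moment of speech with the loaded virtual scene and the speaker's tracked pose. All field names, file paths, and values here are hypothetical assumptions for illustration, not the format actually used by the methodology.

```python
from dataclasses import dataclass, field

# Hypothetical capture record for a VR-mediated documentation session.
# The structure is an illustrative assumption, not the actual format.

@dataclass
class SpeakerPose:
    head_position: tuple[float, float, float]  # tracked headset position (metres)
    head_rotation: tuple[float, float, float]  # yaw, pitch, roll (degrees)

@dataclass
class CaptureFrame:
    timestamp_ms: int   # time relative to session start
    scene_id: str       # which virtual environment was loaded
    pose: SpeakerPose   # embodied presence of the speaker in that scene
    audio_file: str     # path to the synchronised audio segment

@dataclass
class EmbodiedRecording:
    session_id: str
    frames: list[CaptureFrame] = field(default_factory=list)

# A single frame of a toy recording: speech, environment, and speaker
# embodiment captured together rather than as separate artefacts.
recording = EmbodiedRecording("session-001")
recording.frames.append(
    CaptureFrame(
        timestamp_ms=0,
        scene_id="kitchen-scene-v2",
        pose=SpeakerPose((0.0, 1.6, 0.0), (90.0, 0.0, 0.0)),
        audio_file="session-001/utt-000.wav",
    )
)
print(len(recording.frames), "frame(s) captured")
```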

The paper discusses how an embodied corpus collected through virtual reality-mediated fieldwork can become an integral part of linguistic corpus study. It presents an annotation tagset for sequential event interactions within the virtual environments used as stimuli for participants in linguistic fieldwork. Annotating the documented spoken samples with these tags is shown to facilitate the statistical analysis of context-sensitive structural phenomena in the linguistic output, by treating visual domains as explanatory variables and testing the linguistic response to their changes. The paper concludes with a discussion of the relationship between visual input and linguistic output, which, as Bergen (2015) asserts, are closely interrelated and are both integral to embodied human experience. Visual cues form contextual domains within the common ground (Stalnaker 2002) of mutual information shared between speakers. Identifying the visual parameters of these cues is a key step towards triangulating and studying the constructions triggered by their stimulus.
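The tagset itself is presented in the paper; as a minimal sketch only, the Python fragment below shows one hypothetical way such sequential event annotations could be represented and grouped by visual domain ahead of statistical analysis. The tag labels, field names, and toy utterances are illustrative assumptions, not the proposed tagset.

```python
from dataclasses import dataclass

# Hypothetical annotation record for one sequential event in a virtual
# environment; tag names and fields are illustrative, not the paper's scheme.
@dataclass
class EventAnnotation:
    utterance_id: str   # link to the transcribed spoken sample
    event_index: int    # position in the sequential event chain
    visual_domain: str  # e.g. "OBJECT_APPEAR", "AGENT_MOTION" (hypothetical tags)
    onset_ms: int       # event onset relative to the recording
    text: str           # transcription of the co-occurring utterance

# Toy corpus: two annotated utterances from a VR stimulus session.
corpus = [
    EventAnnotation("utt-001", 1, "OBJECT_APPEAR", 1200, "there's a cup on the table"),
    EventAnnotation("utt-002", 2, "AGENT_MOTION", 3400, "she walks toward the door"),
]

# Group utterances by visual domain so each domain can later serve as a
# categorical explanatory variable in a statistical model.
by_domain: dict[str, list[str]] = {}
for ann in corpus:
    by_domain.setdefault(ann.visual_domain, []).append(ann.text)

for domain, texts in by_domain.items():
    print(domain, "->", texts)
```

Grouped this way, the visual-domain label can be exported as a categorical predictor (for instance, into a mixed-effects regression) to test whether changes in the visual input predict changes in the linguistic output.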


References

Alsayed, A. A. A. (2023). Extended reality language research: Data sources, taxonomy and the documentation of embodied corpora. Modern Languages Open, 1, 46. Liverpool University Press. https://doi.org/10.3828/mlo.v0i0.441

Bergen, B. (2015). Embodiment, simulation and meaning. In N. Riemer (Ed.), The Routledge handbook of semantics (pp. 142–157). Routledge.

Himmelmann, N. P. (1998). Documentary and descriptive linguistics. Linguistics, 36(1), 161–195. De Gruyter. https://doi.org/10.1515/ling.1998.36.1.161

Stalnaker, R. (2002). Common ground. Linguistics and Philosophy, 25(5–6), 701–721.

Thompson, P. A. (2005). Spoken language corpora. In M. Wynne (Ed.), Developing linguistic corpora: A guide to good practice (pp. 59–70). Oxbow Books.