Corpus

We have recorded an audio/visual corpus of conversations in two European languages, Standard Southern British English and Bosnian Serbo-Croatian, with about 4 hours of recordings per language. There were four young adult participants for each language. The participants, who already knew each other, were recorded in a naturalistic setting in their home country. Each speaker was recorded on a separate audio channel, so that each speaker's talk can be analysed acoustically even when it overlaps with another's. For a full description of the corpus, see Kurtic et al. (2012). All the recordings have been transcribed orthographically in ELAN, and each instance of overlap has been annotated on a number of parameters. In due course we will make an annotated database of our audio/visual recordings available to other researchers via the Internet, and provide a facility for researchers to share their own annotations of the corpus. In this way, the corpus will be an evolving resource that continues to benefit the research community beyond the life of the project.

A short extract from the Bosnian Serbo-Croatian recordings is given below, together with a transcript (including English translation).

Here is a second example, with sound files and transcript in a PowerPoint presentation:

A short extract from the English corpus is given below, illustrating the individual and mixed recordings of two talkers in overlap:

Access to the full corpus is available on request; please contact us to arrange this.