WYRED Research Aim 1

Collection of a West Yorkshire Database

There are currently no large, forensically relevant databases that are representative of Northern British English speech varieties. WYRED will collect a database of male speakers from three of the largest urban areas in West Yorkshire (WY): Kirklees, Bradford and Wakefield. The most recent census indicated that approximately 5.3 million people lived in Yorkshire and the Humber (Office for National Statistics, 2011). The proximity of these areas to the University of Huddersfield, the host institution, will facilitate the recruitment of participants.

The database will include spontaneous speech from a large number of speakers from three areas within WY, recorded under both studio and telephone quality conditions. A total of 180 male speakers will be recorded, 60 from each of the following areas: Kirklees, Bradford and Wakefield. Participants will be required to have English as their first and only language, to have gone to school in one of the three areas, and to have grown up in an English-only household. This is a substantially larger number of participants than in previous forensic phonetic studies of English, and the database will be the first to include multiple urban areas within the same county.
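The eligibility criteria and per-area quotas amount to a simple screening rule. The Python sketch below is hypothetical: the `Candidate` fields, the `recruit` helper and the first-come acceptance order are illustrative assumptions, not the project's actual screening procedure.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical screening sketch: field names and recruitment order are assumptions.
AREAS = ("Kirklees", "Bradford", "Wakefield")
QUOTA_PER_AREA = 60  # 3 areas x 60 = 180 speakers in total


@dataclass
class Candidate:
    speaker_id: str
    sex: str
    area: str                           # area in which the candidate went to school
    first_language: str
    other_languages: tuple = ()         # any additional languages spoken
    english_only_household: bool = True


def is_eligible(c: Candidate) -> bool:
    """Male, English as first and only language, schooled in one of the
    three areas, and raised in an English-only household."""
    return (
        c.sex == "male"
        and c.first_language == "English"
        and not c.other_languages
        and c.area in AREAS
        and c.english_only_household
    )


def recruit(candidates):
    """Accept eligible candidates in order until each area reaches its quota."""
    accepted = []
    counts = Counter()
    for c in candidates:
        if is_eligible(c) and counts[c.area] < QUOTA_PER_AREA:
            accepted.append(c)
            counts[c.area] += 1
    return accepted, counts
```

A first-come rule is only one possible design; in practice recruiters might also balance age within each area, which this sketch does not attempt.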

The recordings will closely follow the collection procedures used for Task 1 and Task 2 of the DyViS database (Nolan et al., 2009). Task 3 will consist of a non-crime-related discussion (e.g. sport, food, travel) between the speaker and a friend who meets the same eligibility requirements. This reflects forensic practice: suspects often give ‘no comment’ responses to questions about the crime in question, and in order to obtain a speech sample from the suspect, the interviewer will then attempt to discuss another topic.

The following data will be collected from all subjects:

  • Task 1: a mock police interview (research assistant A; studio quality)
  • Task 2: telephone conversation with ‘accomplice’ (research assistant B) – interview debrief (recorded on two lines – studio and telephone quality)
  • Task 3: spontaneous conversation with a friend (similar age, same gender and region) (studio quality; recorded one week later)
  • Task 4: answerphone message recording (recorded on two lines – studio and telephone quality; one week later)
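To make the recording design concrete, the task list can be encoded as a small data structure and the resulting audio files tallied. This is a hypothetical Python sketch: the class and field names are illustrative, and it assumes one file per task per recording channel for each participant, ignoring any separate channels for the research assistants or the friend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    number: int
    description: str
    channels: tuple  # recording channels captured for this task
    session: int     # 1 = first session, 2 = one week later


# Hypothetical encoding of the four tasks listed above.
TASKS = (
    Task(1, "mock police interview", ("studio",), 1),
    Task(2, "telephone conversation with 'accomplice'", ("studio", "telephone"), 1),
    Task(3, "conversation with a friend", ("studio",), 2),
    Task(4, "answerphone message", ("studio", "telephone"), 2),
)
N_SPEAKERS = 180


def recordings_per_speaker(tasks=TASKS):
    """Assumes one audio file per (task, channel) pair."""
    return sum(len(t.channels) for t in tasks)


def recordings_by_channel(tasks=TASKS):
    """Count per-speaker files on each channel type."""
    counts = {}
    for t in tasks:
        for ch in t.channels:
            counts[ch] = counts.get(ch, 0) + 1
    return counts
```

Under these assumptions each participant yields six files (four studio, two telephone), or 1,080 files across all 180 speakers, split over two sessions one week apart.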

All speakers will record the first two tasks and the final two tasks one week apart, to enable the analysis of non-contemporaneous variation, a significant concern in forensic cases (Enzinger and Morrison, 2012; Morrison et al., 2012). Full orthographic transcripts of all data will be provided.

Data Analysis

All analysed linguistic-phonetic data will be used to address Aims 2 and 3 of the project. The studio quality data will be used in the analysis of hesitation markers and long-term formant distributions, as this is comparable to work already carried out on the DyViS database (see Gold et al., 2013; Wood et al., 2014). Voice quality (VQ) and fundamental frequency (f0) will be analysed using both the studio and telephone quality data, as comparable work has also been carried out on the DyViS database (see Stevens and French, 2012; Hudson et al., 2007; Gold, 2014; telephone quality f0 analysis is currently being carried out by Hudson). These four speech parameters have been selected on the basis of expert opinion regarding highly discriminant parameters (Gold and French, 2011), and have also been shown to be highly discriminant in previous research (Gold et al., 2013; Stevens and French, 2012; Wood et al., 2014; Gold, 2014). The analysis of the four parameters will provide population data for WY, and will also be used to determine the generalisability of regional speech data.

Data will be analysed in Praat using standard acoustic techniques. Statistical analysis will be carried out using a MATLAB implementation (Morrison, 2007) of Aitken and Lucy’s (2004) multivariate kernel-density formula to calculate likelihood ratios. The likelihood ratio (LR) is a gradient measure of the value of evidence (Aitken and Taroni, 2004), also referred to as the strength of evidence (Rose, 2002), within a Bayesian framework. An LR is the probability of obtaining the results of a given forensic examination if the prosecution hypothesis is true, divided by the probability of obtaining those same results if the defence hypothesis is true. Because an LR takes into account both the similarity between samples and their typicality in the population, calculating LRs will allow the typicality of the specific speech parameters in the population to be assessed, and will also allow direct comparison of how suitable different regional population data are as the background population. No LR models currently exist for the analysis of VQ; the methods for analysing its variability and assessing its discriminant ability will therefore follow those of Stevens and French (2012).
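Written compactly, LR = p(E | Hp) / p(E | Hd), where E is the measured evidence and Hp and Hd are the prosecution and defence hypotheses. The Python sketch below illustrates how the kernel-density approach turns this into a number. It is not Morrison’s (2007) MATLAB implementation: it is a univariate reduction of the Aitken and Lucy (2004) formula for a single parameter (e.g. mean f0), assuming normally distributed within-speaker variation, a Gaussian kernel for between-speaker variation, and roughly equal numbers of measurements per background speaker. The function names and any toy data are hypothetical.

```python
import math
from statistics import mean


def normal_pdf(x, mu, var):
    """Density of N(mu, var) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def background_stats(background):
    """Estimate within-speaker variance U, between-speaker variance C and the
    speaker means from background data (a list of per-speaker lists).
    Assumes roughly balanced numbers of measurements per speaker."""
    m = len(background)
    means = [mean(s) for s in background]
    n_total = sum(len(s) for s in background)
    s_w = sum((x - mu) ** 2 for s, mu in zip(background, means) for x in s)
    grand = mean(means)
    s_star = sum((mu - grand) ** 2 for mu in means)
    n_bar = n_total / m
    u = s_w / (n_total - m)
    c = s_star / (m - 1) - s_w / (n_bar * (n_total - m))
    return u, c, means


def mvkd_lr(suspect, offender, background):
    """Univariate kernel-density LR in the style of Aitken and Lucy (2004)."""
    u, c, means = background_stats(background)
    if c <= 0:
        raise ValueError("between-speaker variance estimate is not positive")
    m = len(means)
    h2 = (4 / (3 * m)) ** (2 / 5)  # squared kernel bandwidth for one parameter
    y1, y2 = mean(suspect), mean(offender)
    d1, d2 = u / len(suspect), u / len(offender)
    # Numerator: similarity of the two means, times the typicality of their
    # precision-weighted combination under the between-speaker kernel density.
    d12 = d1 * d2 / (d1 + d2)
    y_star = (d2 * y1 + d1 * y2) / (d1 + d2)
    num = normal_pdf(y1, y2, d1 + d2) * mean(
        normal_pdf(y_star, mu, d12 + h2 * c) for mu in means)
    # Denominator: typicality of each sample mean, evaluated separately.
    den = (mean(normal_pdf(y1, mu, d1 + h2 * c) for mu in means)
           * mean(normal_pdf(y2, mu, d2 + h2 * c) for mu in means))
    return num / den


if __name__ == "__main__":
    # Toy background of five hypothetical speakers (values are invented).
    bg = [[mu - 2, mu, mu + 2] for mu in (100, 120, 140, 160, 180)]
    print(mvkd_lr([129, 131] * 5, [129, 131] * 5, bg))  # matching means
    print(mvkd_lr([129, 131] * 5, [134] * 10, bg))      # differing means
```

With such a background, samples with matching means receive an LR above 1, while means that differ by much more than the within-speaker spread receive an LR far below 1. For real data, computing in log space would help avoid numerical underflow in the densities.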