Voice quality database
Research type
Research Study
Full title
Voice quality database
IRAS ID
264292
Contact name
Chaitanya Gadepalli
Contact email
Sponsor organisation
Salford Royal Foundation Trust
Duration of Study in the UK
0 years, 11 months, 31 days
Research summary
Voice disorders are common and comprise a significant proportion of ear, nose and throat referrals from primary practice. Many voice problems are long-term. Various conditions can cause voice disorders, such as voice strain due to excessive speaking or singing, vocal cord damage, infection, side-effects of inhaled steroids used to treat asthma, or more serious disease including laryngeal cancer and neurological disease. Problems with voice cause significant morbidity in day-to-day life. They adversely affect the primary means of communication of most people, causing even more problems in people who are dependent on their voice for their occupation.
This project concerns the assessment of voice quality. Voice can be expertly assessed by trained Speech & Language Therapists (SLTs). They use perceptual methods which involve listening to patients as they engage in certain prescribed vocal manoeuvres. Any observed loss of voice quality may be quantified according to a well-recognised standard known as the ‘GRBAS’ scale. This requires the trained clinician to assess five characteristics of voice quality: Grade, Roughness, Breathiness, Asthenia and Strain. Each characteristic (referred to as a GRBAS component) is given a numerical score of 0, 1, 2 or 3. A score of 0 indicates that there is no abnormality in that characteristic. A score of 1 indicates mild abnormality, 2 indicates a moderate degree of abnormality and 3 indicates severe abnormality.
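As an illustration only, one GRBAS rating as described above might be held in software as follows. This is a minimal sketch; the class name `GRBASScore`, its field names and the `describe` helper are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass, fields

# Severity wording for each permitted score, following the description above.
SEVERITY = {0: "no abnormality", 1: "mild", 2: "moderate", 3: "severe"}


@dataclass(frozen=True)
class GRBASScore:
    """One perceptual rating (illustrative names, not the project's own)."""
    grade: int
    roughness: int
    breathiness: int
    asthenia: int
    strain: int

    def __post_init__(self):
        # Reject any component outside the four permitted scores.
        for f in fields(self):
            value = getattr(self, f.name)
            if value not in SEVERITY:
                raise ValueError(f"{f.name} must be 0, 1, 2 or 3, got {value!r}")

    def describe(self):
        """Return the severity wording for each GRBAS component."""
        return {f.name: SEVERITY[getattr(self, f.name)] for f in fields(self)}


rating = GRBASScore(grade=2, roughness=1, breathiness=3, asthenia=0, strain=1)
print(rating.describe())
```

Validating at construction means an out-of-range score is caught when it is entered rather than later during analysis.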
Perceptual scoring according to the GRBAS scale has the advantage of being well-defined, standardised, widely understood and recommended by many professional bodies. The need for time-consuming consultations with highly trained clinicians is a limitation. Also, even among groups of well-trained clinicians, there is usually significant inconsistency in the scoring. This has been observed in experiments where clinicians are asked to repeat their assessments of some patients and when different clinicians are engaged to assess the same patients. The two types of inconsistency are referred to as intra-rater and inter-rater error, respectively. It may therefore be argued that there is a need for computerised voice quality measurement which is less time-consuming and more consistent than perceptual scoring by clinicians. Commercial and academic programs already exist for analysing digitally recorded speech samples and measuring any loss of quality. The software can be used across all or most types of voice pathology. However, to date, such software has had a limited clinical role, largely because the measurements produced are mathematically complicated and difficult for clinicians to interpret. This project has been motivated by the belief that computerised voice quality assessment techniques would be better accepted if their results were unambiguously related to the widely known GRBAS standard.
Voice characteristics may be measured scientifically by running computer programs (or ‘applications’) which apply ‘digital signal processing’ (DSP) to recordings of acoustic speech and/or ‘Electroglottograph’ (EGG) waveforms. EGG waveforms are obtained, much as with an ECG (electrocardiograph), by attaching electrodes across the neck of a patient to monitor the movement of the vocal cords.
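The intra-rater and inter-rater inconsistency discussed in this section could be quantified in several ways. The sketch below uses unweighted Cohen's kappa, which is our choice for illustration and is not stated in the study description; the function name `cohen_kappa` and the example scores are invented. A weighted variant may suit an ordinal 0 to 3 scale better.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of 0-3 scores."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of patients given the same score.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's score frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both lists contain one identical score throughout
    return (observed - expected) / (1.0 - expected)


# Invented Roughness scores for eight patients, for illustration only.
clinician_1 = [0, 1, 2, 3, 1, 2, 0, 3]
clinician_2 = [0, 1, 2, 2, 1, 3, 0, 3]
print(cohen_kappa(clinician_1, clinician_2))
```

Passing two clinicians' scores for the same patients gives an inter-rater figure; passing one clinician's first and repeat scores gives an intra-rater figure.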
Measuring involuntary variations in the pitch (or fundamental frequency) of the voice, its amplitude, harmonic-to-noise ratio and other characteristics produces scientific data that are indicative of voice quality. The DSP techniques used to make the measurements employ spectral analysis and the concept of a ‘cepstrum’. The basis of these techniques is the ‘fast Fourier transform’ (FFT). This is a universal algorithm for extracting the frequency components of signals such as speech. Many refinements are needed to use the techniques effectively for dysphonic voices where the fundamental frequency is not well-defined. Whilst the techniques work very well for normal voices and are routinely used every day with mobile telephones, their reliability for quality-impaired voices is much less well understood. Their effective clinical application is still a significant challenge in DSP research. In principle, DSP methods can provide voice quality assessment of voiced speech (vowels) and distinguish vowels from non-periodic speech segments (consonants) such as occur in continuous speech samples (spoken sentences). However, the more severely dysphonic the voice, the more difficult and potentially unreliable the measurements.
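To make the FFT and cepstrum ideas above concrete, the following is a minimal sketch of cepstral pitch estimation on a clean synthetic tone. It is not the project's algorithm. The function names, the 80 to 400 Hz search band, the 8000 Hz sampling rate, the small constant added before the logarithm and the test signal are all assumptions made for illustration, and the frame length must be a power of two. None of the refinements needed for dysphonic voices are included.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def estimate_f0(samples, sample_rate, f_min=80.0, f_max=400.0):
    """Estimate fundamental frequency (Hz) from the real cepstrum peak."""
    n = len(samples)
    # A Hann window reduces spectral leakage at the frame edges.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(samples)]
    # Log magnitude spectrum; the small constant avoids log(0).
    log_mag = [math.log(abs(c) + 1e-6) for c in fft(windowed)]
    # log_mag is real and even, so the inverse transform equals the
    # real part of the forward transform divided by n.
    cepstrum = [c.real / n for c in fft(log_mag)]
    # Search only quefrencies (in samples) inside the plausible pitch band.
    q_lo = int(sample_rate / f_max)
    q_hi = int(sample_rate / f_min)
    peak = max(range(q_lo, q_hi + 1), key=lambda q: cepstrum[q])
    return sample_rate / peak


# Synthetic "voiced" frame: 20 harmonics of 125 Hz, sampled at 8000 Hz.
fs, frame_len = 8000, 1024
frame = [sum(math.sin(2 * math.pi * 125 * h * i / fs) / h for h in range(1, 21))
         for i in range(frame_len)]
print(estimate_f0(frame, fs))
```

A strongly periodic frame gives a clear cepstral peak at the pitch period. For a breathy or rough voice that peak is weaker or absent, which is one reason the measurements become less reliable as dysphonia becomes more severe.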
In recent times, cepstral and spectral-based measures have been studied more widely in dysphonia characterised by breathiness and roughness. These methods are effective in distinguishing strained dysphonia from normal voice quality. The utility of these acoustic measures is supported by their moderate-to-high relationship with perceptually rated strain severity. All the computerised methods produce numerical values which may mean little to a clinician. We wish to critically analyse voice samples from the normal population and from patients with voice pathology, in order to produce a computerised measure that correlates with the standard subjective perceptual tool (“GRBAS”). We will be using conventional DSP analysis, including cepstrum techniques, and our own DSP algorithms. The second aspect of our research is to analyse voice in specific diseases such as mucopolysaccharidosis and subglottic or tracheal stenosis. This will help us understand the effects that various pathologies can have on voice.
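One way to check whether a computerised measure tracks a GRBAS component is a rank correlation. The sketch below uses Spearman's coefficient with average ranks for ties, which is our assumption and is not named in the study description. The function names and the example numbers are invented purely for illustration and are not study data.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented values: an acoustic measure and a clinician's Strain scores.
acoustic_measure = [0.2, 0.5, 0.4, 0.9, 1.3, 1.1]
strain_scores = [0, 1, 1, 2, 3, 3]
print(spearman(acoustic_measure, strain_scores))
```

A rank-based coefficient is used here because GRBAS scores are ordinal and contain many ties, so only the ordering of the scores is treated as meaningful.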
REC name
Social Care REC
REC reference
19/IEC08/0033
Date of REC Opinion
4 Jul 2019
REC opinion
Favourable Opinion