New: the Sounds of World English: make a 'map' of your own accent and contribute to this global project.
New: in a recent article in Nature, we show how soprano
singers tune one of the resonances of the vocal tract to
the note they are singing. New: we show some recent
results and give an explanation of the acoustics of harmonic
singing.
Voice Acoustics: our research
For background information, see our introduction
to the acoustics of the vocal tract. It also has an explanation
and some sound files of helium
speech.
Why study vocal tract acoustics? There are fundamental
scientific questions to answer in the field of acoustical
phonetics, but we are also interested in the applications
in speech training (language teaching), and in speech
pathology. When adults or teenagers learn a foreign language,
they rarely achieve authentic pronunciation and are sometimes
almost unintelligible to speakers of that language. This difficulty
arises because the auditory feedback normally used to learn languages
is imprecise or inadequate: students often cannot hear how far their
imitation of a sound is from the target, and do not know what to do
to improve it. (Technically, the problems are called categorisation
and interference.) The problem is even more severe for hearing-impaired
people, who have little or no auditory feedback and can learn very
little about the interior of the vocal tract by watching the lips.
A feature of the "deaf accent" is inappropriate use of the soft
palate, which is not surprising given how difficult it is
to see or to feel what one's soft palate is doing during speech.
We also investigate the acoustics of the singing voice,
partly for its intrinsic interest, and partly with the aim
of improving pedagogy in that field.
We have developed a device for measuring some important
acoustic properties of the vocal tract non-invasively, in
real time, while the owner of the vocal tract is speaking
or singing. We use it as a research tool, but we have also demonstrated
its use as a speech trainer.
Existing technologies used in speech pathology and
speech trainers to provide visual feedback from the speech
sound are inherently limited in precision and practicality.
Even the most advanced speech recognition systems still mistake
words, which shows how limited their precision is as a measure
of pronunciation. The basic problem is that the speech
signal alone does not have enough information in it to allow
us to work out, quickly and precisely, the configuration of
the vocal tract. This is not a problem for understanding speech,
but it may be a problem in learning precise pronunciation.
Our approach is therefore to introduce a signal with more
information in the frequency domain.
Our technology is called Real-time Acoustic
response by Vocal tract Excitation or RAVE.
In model experiments using the laboratory prototype, we have
shown that one or two hours' training using visual feedback
of some key features of the acoustical response of a subject's
vocal tract improves the accuracy and intelligibility of pronunciation
of foreign phonemes by monolingual adults.
How it works:
We inject into the vocal tract an acoustic current that is
synthesised to give high-resolution frequency information over
the frequency range of interest. Using the response to this
excitation signal, we then measure the impedance of the vocal
tract in parallel with the external field.
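The schematic diagram on this page states that the injected signal is made of components spaced at 5 Hz. A minimal sketch of how such a broadband, periodic excitation could be synthesised is given here, assuming a 44.1 kHz sampling rate, a 200 Hz to 4 kHz band, and equal-amplitude components with Schroeder phases (a standard way of keeping the peak amplitude of a multi-sine low). The function name `synthesise_excitation` and all parameter values apart from the 5 Hz spacing are illustrative assumptions, not the settings of the RAVE hardware.

```python
import numpy as np


def synthesise_excitation(fs=44100, spacing=5.0, f_low=200.0, f_high=4000.0):
    """Return one period of a periodic multi-sine and its component frequencies.

    Assumed (illustrative) parameters: sampling rate fs in Hz, component
    spacing in Hz, and the band [f_low, f_high] in Hz.
    """
    # One period lasts 1/spacing s, so FFT bin k lies exactly at k*spacing Hz.
    n = int(round(fs / spacing))
    k = np.arange(int(np.ceil(f_low / spacing)), int(np.floor(f_high / spacing)) + 1)

    # Schroeder phases keep the crest factor of a flat multi-sine low.
    m = np.arange(1, len(k) + 1)
    phases = -np.pi * m * (m - 1) / len(k)

    spectrum = np.zeros(n // 2 + 1, dtype=complex)
    spectrum[k] = np.exp(1j * phases)  # equal amplitude in every in-band component
    x = np.fft.irfft(spectrum, n)
    return x / np.max(np.abs(x)), k * spacing


signal, freqs = synthesise_excitation()
print(len(signal), freqs[0], freqs[-1], len(freqs))
```

With these assumptions one period is 8820 samples long and contains 761 components from 200 Hz to 4 kHz. In a real measurement the component amplitudes would then be adjusted by calibration, for example against the radiation field outside the mouth as the schematic caption describes; that step is not shown.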
In this figure, I pronounce the vowel in 'heard'.
The sharp vertical peaks are the harmonics of my voice. The
broad signal shows the response of my vocal tract to the acoustic
current signal being injected at the lips.
For this vowel, my vocal tract behaves rather
like a cylinder about 170 mm long, nearly closed at the vocal
folds and open at the mouth. A cylinder of length L, closed
at one end and open at the other, has resonances at f0 = v/(4L),
3f0, 5f0, etc., where v is the speed of sound. (See pipes
and harmonics.) So we see resonances at about 0.5, 1.5,
2.5, 3.5 and 4.5 kHz, which appear as the peaks in the
smooth curve in this figure. When I pronounce the vowel in
"had", I open my mouth wider, so the tract is no longer cylindrical,
but flared at the open end, a bit like the flare and bell
on a brass instrument. One of the effects
of this shape in a brass instrument is to raise the
frequencies of the resonances, especially those of the lower
resonances. (In a related example, conical pipes have resonances
at higher frequencies than do cylindrical ones. See this
link for an explanation.)
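The arithmetic behind the figures of about 0.5, 1.5, 2.5, 3.5 and 4.5 kHz can be checked with a short calculation. This sketch ignores the end correction at the open end and assumes v = 343 m/s (dry air at about 20 °C); the warm, humid air in the vocal tract and the end correction would both shift the values slightly. The function name is hypothetical.

```python
def closed_open_pipe_resonances(length_m, n_modes=5, speed_of_sound=343.0):
    """Resonances of a cylinder closed at one end and open at the other.

    f0 = v / (4L); the higher resonances are the odd multiples 3f0, 5f0, ...
    The end correction at the open end is ignored.
    """
    f0 = speed_of_sound / (4.0 * length_m)
    return [(2 * n - 1) * f0 for n in range(1, n_modes + 1)]


# A 170 mm tract, as for the vowel in 'heard'
print([round(f) for f in closed_open_pipe_resonances(0.170)])
# → [504, 1513, 2522, 3531, 4540]
```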
From this response we can readily determine the resonances
of the vocal tract, independently of the speech signal. The
resonant frequencies are interesting for fundamental acoustical
phonetic research but, if we extract them in real time, they
can be used to drive a cursor for speech training. This is
how we do it in the real-time version.
Schematic diagram. (a) shows the spectrum of the
speech signal alone. This male voice has harmonic partials
spaced at the pitch frequency 126 Hz. (b) The injected
signal has frequencies spaced at 5 Hz, whose amplitudes are
calibrated (in this case) using the radiation field outside
the speaker's mouth. (c) The sum of the speech signal
and the broadband signal (including the effects of the resonances)
goes from the microphone to the ADC. The speech signal is
used to measure pitch and amplitude; then the harmonic components
below 1 kHz are removed. (d) The resonances are detected
from the remaining interpolated signal. Similarly, the broadband
signals may be removed to leave just the speech harmonics.
In the real-time version of the device used for speech training,
the resonance frequencies are used to position the cursor
on the vowel plane (see below). Notice that the signal:noise
ratio in these figures is greater than in the preceding figure.
This is a consequence of making the measurements rapidly.
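Steps (c) and (d) of the schematic can be illustrated with synthetic data. The sketch below assumes the measured response is already available as a magnitude at each 5 Hz component; it replaces points lying within an assumed ±10 Hz of a voice harmonic below 1 kHz by linear interpolation from the remaining points, and then reports local maxima as resonances. The function names, the ±10 Hz tolerance, the simple peak-picking rule and the synthetic resonances at 500 and 1500 Hz are all hypothetical; this is not the algorithm used in the real-time device.

```python
import numpy as np


def remove_voice_harmonics(freqs, response, pitch_hz, f_max=1000.0, half_width=10.0):
    """Replace points within half_width Hz of a voice harmonic below f_max
    by linear interpolation from the remaining points (hypothetical helper)."""
    harmonics = pitch_hz * np.arange(1, int(f_max // pitch_hz) + 1)
    near = np.any(np.abs(freqs[:, None] - harmonics[None, :]) <= half_width, axis=1)
    cleaned = response.copy()
    cleaned[near] = np.interp(freqs[near], freqs[~near], response[~near])
    return cleaned


def find_resonances(freqs, response, n_peaks=2):
    """Return the frequencies of the first n_peaks local maxima (hypothetical helper)."""
    inner = response[1:-1]
    is_peak = (inner > response[:-2]) & (inner >= response[2:])
    return freqs[1:-1][is_peak][:n_peaks]


# Synthetic example: two broad resonances plus sharp harmonics of a 126 Hz voice
freqs = np.arange(200.0, 4005.0, 5.0)  # 5 Hz grid
smooth = sum(1.0 / (1.0 + ((freqs - fr) / 80.0) ** 2) for fr in (500.0, 1500.0))
spikes = sum(5.0 * np.exp(-((freqs - h) / 3.0) ** 2) for h in 126.0 * np.arange(1, 8))
measured = smooth + spikes

print("raw:    ", find_resonances(freqs, measured))
print("cleaned:", find_resonances(freqs, remove_voice_harmonics(freqs, measured, pitch_hz=126.0)))
```

With the harmonics left in, the first "peaks" found are voice harmonics near 250 and 380 Hz. After cleaning, the resonance at 1500 Hz is recovered exactly and the one at 500 Hz to within a couple of 5 Hz steps, because interpolating across the gap next to the 504 Hz harmonic slightly flattens the top of that peak.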
How it looks:
This is a screen dump of the feedback display in the current
speech trainer device, set up with targets from Australian
English. The background ellipses are measurements of the vowels
of 33 Australian men, with mean values for each vowel at the
centre of each ellipse. The semi-axes are the standard deviations
in R1 and R2, the frequencies of the first and second vocal tract
resonances. These or other areas can be used as targets
in speech training. A cursor on the monitor (the cross at
(1190,530)) shows the current configuration of the subject's
own vocal tract. Initially, subjects 'steer' the motion of
the cursor by consciously controlling jaw and tongue position.
Speakers of the language displayed can 'aim' towards one of
the vowels shown. After some practice, however, it becomes
nearly as automatic as using a joystick or a mouse - one
just 'makes it go' where one wants, without thinking of the
muscular details. In other words, a visual feedback loop is
unconsciously used to train articulation.
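The target display described above can be sketched as follows. Each target ellipse is stored as a mean and a standard deviation in R1 and R2 (the first two resonance frequencies), and the cursor's distance from a target is measured in standard deviations, so a value of 1 or less means the cursor lies inside that ellipse. The class and function names, the use of the sample standard deviation and the target values are hypothetical placeholders, not the Australian English measurements or the trainer's actual code.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class VowelTarget:
    """Ellipse on the (R1, R2) plane: centre = mean, semi-axes = standard deviations."""
    label: str
    r1_mean: float  # Hz
    r2_mean: float  # Hz
    r1_sd: float    # Hz
    r2_sd: float    # Hz

    @classmethod
    def from_samples(cls, label, r1_values, r2_values):
        # Sample standard deviation is an assumption; the estimator used is not stated.
        return cls(label,
                   statistics.mean(r1_values), statistics.mean(r2_values),
                   statistics.stdev(r1_values), statistics.stdev(r2_values))

    def distance(self, r1, r2):
        """Distance from the centre in standard-deviation units (1.0 = on the ellipse)."""
        return math.hypot((r1 - self.r1_mean) / self.r1_sd,
                          (r2 - self.r2_mean) / self.r2_sd)

    def contains(self, r1, r2):
        return self.distance(r1, r2) <= 1.0


def nearest_target(targets, r1, r2):
    """Target whose centre is closest to the cursor, in standard-deviation units."""
    return min(targets, key=lambda t: t.distance(r1, r2))


# Hypothetical placeholder targets, NOT the measured Australian English data.
targets = [
    VowelTarget("vowel_A", 700.0, 1300.0, 60.0, 120.0),
    VowelTarget("vowel_B", 350.0, 2200.0, 40.0, 150.0),
]
cursor_r1, cursor_r2 = 690.0, 1350.0
best = nearest_target(targets, cursor_r1, cursor_r2)
print(best.label, best.contains(cursor_r1, cursor_r2))
```

Scaling each axis by its own standard deviation judges the cursor by how typical it is for each vowel, rather than by raw distance in hertz, which would tend to be dominated by the larger spread in R2.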
Does it work?
For a report of a trial experiment using a prototype system
as a language trainer, see our papers:
- Dowd, A., Smith, J.R. and Wolfe, J. (1998) "Learning
to pronounce vowel sounds in a foreign language using acoustic
measurements of the vocal tract as feedback in real time"
Language and Speech, 41, 1-20.
- Epps, J., Smith, J.R. and Wolfe, J. (1997) "A novel instrument
to measure acoustic resonances of the vocal tract during
speech" Measurement Science and Technology 8,
1112-1121.
- Donaldson, T., Wang, D., Smith, J. and Wolfe, J. (2003) "Vocal
tract resonances: a preliminary study of sex differences
for young Australians", Acoustics Australia,
31, 95-98.
- J., Dowd, A., Smith, J.R. and Wolfe, J. (1997) "Real
time measurements of the vocal tract resonances during speech",
Eurospeech'97 (G. Kokkinakis, N. Fakotakis &
E. Dermatas, eds.) Rhodes, 721-724.
- Joliveau, E., Smith, J. and Wolfe, J. (2004) "Tuning
of vocal tract resonances by sopranos", Nature,
427, 116.
- Joliveau, E., Smith, J. and Wolfe, J. (2004) "Vocal
tract resonances in singing: the soprano voice", J.
Acoust. Soc. America, 116, 2434-2439.
Researchers at UNSW: Annette Dowd, Nicola Dwyer,
Julien Epps, John Smith, John Tann and Joe Wolfe