Date of Award:


Document Type:


Degree Name:

Doctor of Philosophy (PhD)


Electrical and Computer Engineering

Committee Chair(s)

Todd K. Moon


Todd K. Moon


Jacob H. Gunther


Scott E. Budge


Stephanie A. Borrie


Kevin R. Moon


In many fields, signals must be aligned or “warped” before their similarity can be measured. When two time signals are compared, or when a pattern is sought in a larger stream of data, it may be necessary to warp one of the signals nonlinearly by compressing or stretching it to fit the other. Simple point-to-point comparison may give inadequate results, because one part of a signal may be compared against a different relative part of the other signal or pattern. Such cases require some form of alignment before the comparison is made. Dynamic Time Warping (DTW) is a powerful and widely used time series analysis technique that performs such nonlinear warping in the temporal domain. The work in this dissertation develops in two directions. The first direction is to extend dynamic time warping to produce a two-level dynamic warping algorithm, with warping in both the temporal and spectral domains. While hundreds of research efforts in the last two decades have applied the one-dimensional warping idea to time series, extending the DTW method to two or more dimensions poses a more involved problem. The two-dimensional dynamic warping algorithm developed here is ideally suited for a variety of speech signal processing tasks.
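As a point of reference for the one-dimensional case only, the following is a minimal sketch of the textbook DTW recurrence. It is not the two-level algorithm developed in the dissertation. The function name `dtw_distance`, the absolute-difference local cost, and the unconstrained step pattern are assumptions chosen for illustration.

```python
def dtw_distance(a, b):
    """Textbook one-dimensional DTW cost between two numeric sequences.

    Illustrative sketch only: the local cost |a[i] - b[j]| and the
    three-way step pattern are assumptions, not the dissertation's method.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells:
            # advance in a only, advance in b only, or advance in both.
            cost[i][j] = local + min(
                cost[i - 1][j],
                cost[i][j - 1],
                cost[i - 1][j - 1],
            )
    return cost[n][m]


# A stretched copy of a signal aligns at zero cost, even though the two
# sequences have different lengths and cannot be compared point to point.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

In this sketch the repeated sample in the second sequence is absorbed by the warping path, which is the behavior that a simple point-to-point comparison lacks.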

The second direction focuses on two speech signal applications. The first application is the evaluation of dysarthric speech. Dysarthria is a neurological motor speech disorder characterized by spectral and temporal degradation in speech production. Dysarthria management has focused primarily on teaching patients to improve their ability to produce speech or to use strategies that compensate for their deficits. However, many individuals with dysarthria are not well suited for traditional speaker-oriented intervention. Recent studies have shown that speech intelligibility can be improved by training the listener to better understand the degraded speech signal. A computer-based training tool was developed using the two-level dynamic warping algorithm, to eventually be incorporated into a program that trains listeners to imitate dysarthric speech by giving them feedback about the accuracy of their imitation attempts during training.

The second application is voice transformation. Voice transformation techniques aim to modify a subject’s voice characteristics to make the subject sound like someone else, for example transforming a male speaker into a female speaker. The approach taken here avoids the need to find acoustic parameters, as many voice transformation methods do, and instead deals directly with spectral information. Based on the two-level DW, it is straightforward to map the source speech to the target speech when both are available. The resulting spectrally warped signal introduces significant processing artifacts. Phase reconstruction was applied to the transformed signal to improve the quality of the final sound. Neural networks are trained to perform the voice transformation.



Additional Files

conv.mp3 (51 kB)

female.mp3 (51 kB)

female_a0002.mp3 (61 kB)

female_a0003.mp3 (59 kB)

female_a0004.mp3 (46 kB)

female_a0005.mp3 (29 kB)

female_a0006.mp3 (55 kB)

female_a0007.mp3 (53 kB)

male.mp3 (48 kB)

male_a0002.mp3 (53 kB)

male_a0003.mp3 (53 kB)

male_a0004.mp3 (41 kB)

male_a0005.mp3 (21 kB)

male_a0006.mp3 (48 kB)

male_a0007.mp3 (48 kB)

phase0no.mp3 (51 kB)

phase0yes.mp3 (51 kB)

phase2.mp3 (51 kB)

phase3.mp3 (51 kB)

phrase1phase1.mp3 (51 kB)

phrase1phase2.mp3 (51 kB)

phrase2.mp3 (52 kB)

phrase3.mp3 (52 kB)

phrase4.mp3 (38 kB)

phrase5.mp3 (22 kB)

phrase6.mp3 (47 kB)

phrase7.mp3 (46 kB)

waled.mp3 (51 kB)