Discriminant analysis

This tutorial will show you how to perform discriminant analysis with Praat.

As an example, we will use the dataset from Pols et al. (1973) with the frequencies and levels of the first three formants from the 12 Dutch monophthongal vowels as spoken in /h_t/ context by 50 male speakers. This data set has been incorporated into Praat and can be called into play with the Create TableOfReal (Pols 1973)... command that can be found under NewTablesData sets from the literature.

In the list of objects a new TableOfReal object will appear with 6 columns and 600 rows (50 speakers × 12 vowels). The first three columns contain the formant frequencies in Hz, the last three columns contain the levels of the first three formants given in decibels below the overall sound pressure level of the measured vowel segment. Each row is labelled with a vowel label.

Pols et al. use logarithms of frequency values, we will too. Because the measurement units in the first three columns are in Hz and in the last three columns in dB, it is probably better to standardize the columns. The following script summarizes our achievements up till now:

table = Create TableOfReal (Pols 1973): "yes"
Formula: ~ if col < 4 then log10 (self) else self fi
Standardize columns
# change the column labels too, for nice plot labels.
Set column label (index): 1, "standardized log (%F__1_)"
Set column label (index): 2, "standardized log (%F__2_)"
Set column label (index): 3, "standardized log (%F__3_)"
Set column label (index): 4, "standardized %L__1_"
Set column label (index): 5, "standardized %L__2_"
Set column label (index): 6, "standardized %L__3_"

To get an indication of what these data look like, we make a scatter plot of the first standardized log-formant-frequency against the second standardized log-formant-frequency. With the next script fragment you can reproduce the following picture.

Select outer viewport: 0, 5, 0, 5
selectObject: table
Draw scatter plot: 1, 2, 0, 0, -2.9, 2.9, -2.9, 2.9, 10, "yes", "+", "yes"

Apart from a difference in scale this plot is the same as fig. 3 in the Pols et al. article.

1. How to perform a discriminant analysis

Select the TableOfReal and choose from the dynamic menu the option To Discriminant. This command is available in the "Multivariate statistics" action button. The resulting Discriminant object will bear the same name as the TableOfReal object. The following script summarizes:

selectObject: table
discriminant = To Discriminant

2. How to project data on the discriminant space

You select a TableOfReal and a Discriminant object together and choose: To Configuration.... One of the options of the newly created Configuration object is to draw it. The following picture shows how the data look in the plane spanned by the first two dimensions of this Configuration. The directions in this configuration are the eigenvectors from the Discriminant.

The following script summarizes:

selectObject: table, discriminant
To Configuration: 2
Select outer viewport: 0, 5, 0, 5
Draw: 1, 2, -2.9, 2.9, -2.9, 2.9, 12, "yes", "+", "yes"

If you are only interested in this projection, there also is a shortcut without an intermediate Discriminant object: select the TableOfReal object and choose To Configuration (lda)....

3. How to draw concentration ellipses

Select the Discriminant object and choose Draw sigma ellipses.... In the form you can fill out the coverage of the ellipse by way of the Number of sigmas parameter. You can also select the projection plane. The next figure shows the 1-σ concentration ellipses in the standardized log F1 vs log F2 plane. When the data are multinormally distributed, a 1-σ ellipse will cover approximately 39.3% of the data. The following code summarizes:

selectObject: discriminant
Draw sigma ellipses: 1.0, "no", 1, 2, -2.9, 2.9, -2.9, 2.9, 12, "yes"

4. How to classify

Select together the Discriminant object (the classifier), and a TableOfReal object (the data to be classified). Next you choose To ClassificationTable. Normally you will enable the option Pool covariance matrices and the pooled covariance matrix will be used for classification.

The ClassificationTable can be converted to a Confusion object and its fraction correct can be queried with: Confusion: Get fraction correct.

In general you would separate your data into two independent sets, TRAIN and TEST. You would use TRAIN to train the discriminant classifier and TEST to test how well it classifies. Several possibilities for splitting a dataset into two sets exist. We mention the jackknife ("leave-one-out") and the bootstrap methods ("resampling").

5.1 Jackknife classification

The following script summarizes jackknife classification of the dataset:

selectObject: table
numberOfRows = Get number of rows
for irow to numberOfRows
selectObject: table
rowi = Extract rows where: ~ row = irow
selectObject: table
rest = Extract rows where: ~ row <> irow
discriminant = To Discriminant
plusObject: rowi
classification = To ClassificationTable: "yes", "yes"
if irow = 1
confusion = To Confusion: "yes"
else
plusObject: confusion
Increase confusion count
endif
removeObject: rowi, rest, discriminant, classification
endfor
selectObject: confusion
fractionCorrect = Get fraction correct
appendInfoLine: fractionCorrect, " (fraction correct, jackknifed ", numberOfRows, " times)."
removeObject: confusion

5.2 Bootstrap classification

The following script summarizes bootstrap classification.

fractionCorrect = 0
for i to numberOfBootstraps
selectObject: table
resampled = To TableOfReal (bootstrap)
discriminant = To Discriminant
plusObject: resampled
classification = To ClassificationTable: "yes", "yes"
confusion = To Confusion: "yes"
fc = Get fraction correct
fractionCorrect += fc
removeObject: resampled, discriminant, classification, confusion
endfor
fractionCorrect /= numberOfBootstraps
appendInfoLine: fractionCorrect, " (fraction correct, bootstrapped ", numberOfBootstraps, " times)."

Links to this page


© djmw 20170829