
Classifier

The classifier is a multilayer perceptron (MLP) [10] with 462 input neurons, 100 neurons in the hidden layer and 2 neurons in the output layer. The 462 input neurons correspond to 11 consecutive input vectors, which lets the network capture longer-term speech events. The 2 output neurons give the local log likelihood scores (LLS) of the target speaker ($LLS_{sp}$) and of the non-target speaker ($LLS_{ns}$, also called world or cohort). These LLS are summed over the N frames of the speech segment to obtain a total log likelihood $TLL_{sp}$ for the target speaker and $TLL_{ns}$ for the non-target speaker.
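
The way 11 consecutive vectors fill the 462 inputs can be sketched as follows. This is only an illustration under stated assumptions: the per-vector size of 42 is inferred from 462 / 11, the function name `stack_frames` is hypothetical, and the window simply slides one frame at a time without padding at the segment edges, which the text does not specify.

```python
CONTEXT = 11                        # consecutive input vectors per MLP input (from the text)
INPUT_SIZE = 462                    # number of MLP input neurons (from the text)
FRAME_DIM = INPUT_SIZE // CONTEXT   # 42 coefficients per vector (inferred, not stated)


def stack_frames(frames, context=CONTEXT):
    """Concatenate each run of `context` consecutive frames into one input.

    Assumption: no padding, so a segment of T frames yields T - context + 1 inputs.
    """
    windows = []
    for start in range(len(frames) - context + 1):
        window = []
        for frame in frames[start:start + context]:
            window.extend(frame)
        windows.append(window)
    return windows


# Dummy data: 15 frames of 42 coefficients, where frame t is filled with the value t.
frames = [[float(t)] * FRAME_DIM for t in range(15)]
inputs = stack_frames(frames)
```

With 15 dummy frames, this sliding scheme yields 5 inputs of 462 values each; a real front end might pad or skip frames differently.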

\[ TLL_{sp} = \sum_{n=1}^{N} LLS_{sp}(n) \]

\[ TLL_{ns} = \sum_{n=1}^{N} LLS_{ns}(n) \]

\[ TLLR = TLL_{sp} - TLL_{ns} \]

The final score used for each speech segment is TLLR, which corresponds to a log likelihood ratio [9].
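
The segment score can be sketched in a few lines, assuming the two MLP outputs have already been collected as one local score per frame. The function name `total_log_likelihood_ratio` and the sample values are hypothetical and serve only to illustrate the sum and the difference described above.

```python
def total_log_likelihood_ratio(lls_sp, lls_ns):
    """Sum the per-frame scores of each output and return TLLsp - TLLns.

    lls_sp: local log likelihood scores of the target speaker, one per frame
    lls_ns: local log likelihood scores of the non-target speaker, one per frame
    """
    if len(lls_sp) != len(lls_ns):
        raise ValueError("both outputs must cover the same N frames")
    tll_sp = sum(lls_sp)   # total log likelihood, target speaker
    tll_ns = sum(lls_ns)   # total log likelihood, non-target speaker
    return tll_sp - tll_ns


# Made-up scores for a 3-frame segment (illustration only).
lls_sp = [-1.0, -0.5, -2.0]   # sums to -3.5
lls_ns = [-2.0, -1.5, -1.0]   # sums to -4.5
tllr = total_log_likelihood_ratio(lls_sp, lls_ns)   # -3.5 - (-4.5) = 1.0
```

A positive TLLR means the summed scores favour the target speaker over the non-target model.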

One MLP is built for each target speaker. The cohort speaker data were drawn from the speech of around 40 male and 40 female speakers in the Switchboard database. For each training condition, the total amount of speech was balanced against the amount of data for each target speaker (i.e. 1 minute).



Dominique Genoud
Mon Aug 18 15:56:59 MET DST 1997