Creating a dataset for neural network training (Speech Recognition)

Hi,
Firstly, I'd like to apologise for asking this question, which I'm aware has been asked plenty of times before. However, despite having gone through the solutions to a number of resolved questions on the subject on this forum, I'm still unclear as to how to proceed with my particular problem, which I will try to explain as well as I can below.
I am training a neural network to perform consonant recognition using MFCCs. Here are a few numbers that might come in handy to get an idea of my problem:
1) I have recorded voice samples from 16 people, and have 227 voice samples per person (so that's 3632 samples in all). The samples are not evenly distributed across the output classes (for example, 13 of these 227 samples correspond to the output class of consonant 'b', 12 samples correspond to consonant 'd' and 5 correspond to consonant 'q').
2) Each voice sample generates an MFCC matrix (feature vector) of dimensions 13x15 which I want to train the network with. This means that every voice sample is divided into 15 frames, each of which has 13 MFCC values.
3) There are 20 output classes (for 20 different basic consonant sounds).
My questions are:
1) How do I format the input and target matrices? I have a feeling the input matrix should be of dimensions 195x3632 (13*15=195), where each column corresponds to one voice sample and contains the 13 MFCC values for each of the 15 frames of that sample, i.e. 195 values in all. The target matrix is probably of dimensions 20x3632 (where every column has 19 zeros and a single one indicating which consonant that particular voice sample corresponds to).
Could you confirm whether this is correct?
2) If this is correct, when testing the network with unknown inputs, would each input have to be arranged as a matrix of dimensions 195x1? (i.e. 13 MFCC values for each of the 15 frames per input voice sample)
3) If my premise is correct, could you tell me how to create a dataset file? Do I even have to create a separate dataset .m file if I save my input and target matrices to the workspace and load them in the NN toolbox GUI?
4) How would I separate the data into training, validation and test sets? Do I simply load the input and target matrices into the GUI, select 'matrix columns' as samples and specify the appropriate percentages?
5) Lastly, would a 195x3632 matrix be too large a training set, so that I would have to trim it down?
I have tried to explain my problem in as much detail as possible. I'd greatly appreciate an answer that is specific to my question rather than a generic answer on how to create input and target matrices as has been presented as solutions to previous questions, as I had trouble understanding them.
Once again, I apologise for asking a question that has been asked before.
Cheers,
Anand

Accepted Answer

Greg Heath on 23 Apr 2014
It looks like you know how to do it. So ... go do it.
Yes, you might have to use input dimensionality reduction. However, try without it first.
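For readers who want to see the layout from the question spelled out, here is a minimal sketch in plain Python (the thread itself contains no code, so this is not the toolbox workflow). It assumes each MFCC matrix is stored as 13 rows (coefficients) by 15 columns (frames) and flattens it frame by frame, which is the same order MATLAB's `mfcc(:)` produces; any ordering works as long as it is used consistently. The helper names, the random placeholder data and the 70/15/15 split are illustrative assumptions only.

```python
import random

N_COEFFS, N_FRAMES, N_CLASSES = 13, 15, 20   # sizes taken from the question


def flatten_mfcc(mfcc):
    """Turn a 13x15 MFCC matrix (rows = coefficients, columns = frames)
    into one 195-element vector, frame by frame (column-major order)."""
    return [mfcc[c][f] for f in range(N_FRAMES) for c in range(N_COEFFS)]


def one_hot(class_index):
    """Return a 20-element target vector with a single 1 at class_index."""
    return [1 if k == class_index else 0 for k in range(N_CLASSES)]


def build_matrices(samples, labels):
    """Stack samples as columns: inputs is 195 x N, targets is 20 x N."""
    cols_x = [flatten_mfcc(m) for m in samples]
    cols_t = [one_hot(y) for y in labels]
    inputs = [list(row) for row in zip(*cols_x)]    # transpose to 195 x N
    targets = [list(row) for row in zip(*cols_t)]   # transpose to 20 x N
    return inputs, targets


def split_columns(n_samples, train=0.70, val=0.15, seed=0):
    """Randomly assign column indices to training, validation and test sets.
    The ratios are an assumption; whatever is left over becomes the test set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = int(train * n_samples)
    n_val = int(val * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# Placeholder data standing in for the real recordings (16 speakers x 227 samples).
rng = random.Random(1)
n_samples = 16 * 227
samples = [[[rng.uniform(-50, 50) for _ in range(N_FRAMES)]
            for _ in range(N_COEFFS)] for _ in range(n_samples)]
labels = [rng.randrange(N_CLASSES) for _ in range(n_samples)]

inputs, targets = build_matrices(samples, labels)
train_idx, val_idx, test_idx = split_columns(n_samples)
print(len(inputs), len(inputs[0]))     # 195 3632
print(len(targets), len(targets[0]))   # 20 3632
```

A single unknown recording would go through the same `flatten_mfcc` call, giving one 195-element column (195x1) in the same order as the training columns.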
  3 Comments
kamalvir on 28 Jan 2015
My MFCC feature values are both positive and negative and vary over a wide range. I have heard that some preprocessing is required before feeding them to a neural network, so that the values range from 0 to 1. How do I do this? Please reply; I need this urgently.
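One common way to do this is min-max scaling, applied to each feature row separately. The sketch below is plain Python and only illustrates the arithmetic; the function names and the toy numbers are made up for the example. It assumes the same layout as in the question (rows are features, columns are samples) and that the minimum and maximum are taken from the training data and then reused for any new input. In MATLAB, the toolbox function `mapminmax` does a comparable row-wise rescaling, with a default target range of [-1, 1].

```python
def fit_min_max(rows):
    """Record the minimum and maximum of every feature row."""
    return [(min(r), max(r)) for r in rows]


def apply_min_max(rows, ranges):
    """Rescale each row to [0, 1] using previously fitted (min, max) pairs.
    A constant row (max == min) is mapped to 0 to avoid division by zero."""
    scaled = []
    for r, (lo, hi) in zip(rows, ranges):
        span = hi - lo
        scaled.append([(v - lo) / span if span else 0.0 for v in r])
    return scaled


# Toy 2 x 4 feature matrix (rows = features, columns = samples); values are made up.
train = [[-12.0, 3.0, 8.0, -2.0],
         [100.0, 250.0, 175.0, 400.0]]
ranges = fit_min_max(train)
train_scaled = apply_min_max(train, ranges)
print(train_scaled[0])   # [0.0, 0.75, 1.0, 0.5]
```

New samples scaled with the stored `ranges` can fall slightly outside [0, 1] if they exceed the training minimum or maximum; that is expected and usually harmless.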


More Answers (1)

Foresight india on 11 Aug 2017
We are starting to develop an automatic voice response (AVR) system based on neural network technology.
The challenge here is to replicate the human voice in order to offer better communication, and for this we have to use a multi-layered neural network. We invite people already working in this domain to review our work or share suggestions.
