How do I pass a new sequence to a trained neural network in order to predict protein structure from the sequence?

Kendall on 10 Mar 2014


https://es.mathworks.com/matlabcentral/answers/120795-how-to-pass-new-sequence-to-trained-neural-network-in-order-predict-protein-structure-from-sequence

Edited: Greg Heath on 20 Mar 2014

Hello. A fellow student wrote a MATLAB function that uses amino acid (AA) sequences to train a neural network that predicts protein secondary structure from the AA sequence.

    function net = ssp_train(aa, ss, halfwindow, net)
    % Train a neural network on the amino acid sequence aa and the secondary
    % structure sequence ss (one character per residue, same length as aa).
    % If halfwindow is not provided, use halfwindow = 1.
    if nargin < 3
        halfwindow = 1;
    end
    % If net is not provided, create a neural network with 5 hidden units.
    % If net is provided as an integer, create a neural network with that many hidden units.
    % All integers are divisible by 1, so mod(net,1)==0 is a good integer test.
    if nargin < 4
        num_hidden_units = 5;
    elseif isnumeric(net) && isscalar(net) && mod(net,1) == 0
        num_hidden_units = abs(net);
    else
        % Alternatively, replace the elseif/else branches with:
        %   else num_hidden_units = abs(round(net));
        error('Please provide the number of hidden units as an integer!');
    end
    % The argument is a row vector of one or more hidden layer sizes.
    net = feedforwardnet(num_hidden_units);
    % Set net.trainParam.showWindow = false to suppress the training GUI.
    net.trainParam.showWindow = false;
    % Change net.divideParam so that 80% of the data is used for training
    % and 20% for validation (0% is used for testing).
    net.divideParam.trainRatio = 0.8;
    net.divideParam.valRatio   = 0.2;
    net.divideParam.testRatio  = 0.0;
    W = 2*halfwindow + 1;  % window length: each residue plus halfwindow neighbours on each side
    % Figure 11.21: A number of nodes are present in the input layer,
    % which can be fired by certain types of residues, e.g., D (Asp).
    % There are often 20 nodes per residue, with just one having the value 1 (i.e., activated).
    % The activated nodes then pass a signal to the hidden layers, where
    % conditional and additive calculations are performed on the information.
    %
    % Code adapted from http://www.mathworks.com/help/nnet/ug/create-neural-network-object.html
    %=== One-hot encoding of the input matrix
    seqInt = double(aa2int(aa));
    seqInt(seqInt>20) = 0;  % codes > 20 (ambiguous/non-standard symbols) become all-zero blocks
    winP = hankel(seqInt(1:W), seqInt(W:end));  % sliding windows: column j = residues j..j+W-1
    P = double(kron(winP,ones(20,1)) == kron(ones(size(winP)),(1:20)'));  % input P: 20*W rows, one column per window
    %=== One-hot encoding of the target matrix (H=1, E=2, T=3, anything else=0)
    ssInt = zeros(size(ss));
    ssInt(ss=='H') = 1;
    ssInt(ss=='E') = 2;
    ssInt(ss=='T') = 3;
    winT = hankel(ssInt(1:W), ssInt(W:end));  % sliding windows of targets
    T = double(kron(winT,ones(3,1)) == kron(ones(size(winT)),(1:3)'));  % target T: 3*W rows, one column per window
    %=== Train the neural network with the input and target matrices
    net = train(net, P, T);
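To see what the trained network expects as input and produces as output, here is a small sketch of the same encoding on a hand-made example. It uses only base MATLAB (`hankel`, `kron`); the integer codes stand in for `aa2int` output and are arbitrary values chosen for illustration, as is the structure string.

```matlab
% Toy illustration of the encoding used in ssp_train (base MATLAB only).
% seqInt stands in for double(aa2int(aa)) after codes > 20 were set to 0;
% the values are arbitrary and only for illustration.
halfwindow = 1;
W = 2*halfwindow + 1;                       % window length = 3
seqInt = [3 1 0 20 1];                      % 5 residues, one unknown (0)

% Inputs: column j holds residues j..j+W-1, each as a block of 20 nodes
winP = hankel(seqInt(1:W), seqInt(W:end));  % W x (N-W+1) = 3 x 3
P = double(kron(winP, ones(20,1)) == kron(ones(size(winP)), (1:20)'));

% Targets: H/E/T -> 1/2/3, anything else -> 0 (an all-zero block of 3 nodes)
ss = 'HHC-E';
ssInt = zeros(size(ss));
ssInt(ss=='H') = 1;
ssInt(ss=='E') = 2;
ssInt(ss=='T') = 3;
winT = hankel(ssInt(1:W), ssInt(W:end));
T = double(kron(winT, ones(3,1)) == kron(ones(size(winT)), (1:3)'));

disp(size(P))     % 60 3: 20*W input rows, one column per window
disp(size(T))     % 9 3: 3*W output rows, one column per window
disp(sum(P, 1))   % 2 2 2: the unknown residue activates no input node
```

The key point for prediction is that any new sequence must be turned into exactly this kind of 20*W-row matrix before it is given to the network, and that the network answers with 3*W rows per window, i.e. a guess for every position in the window, not only the centre one.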

My problem is that, now that the neural network has been created and trained (it is represented by the variable 'net' that the function returns, I think?), it is unclear to me how to input a query AA sequence so that the network analyzes it and outputs the sequence's predicted secondary structure.
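One way to do this, sketched under assumptions: `net` is the network returned by `ssp_train`, and `halfwindow` is the same value used for training. The query is encoded exactly like the training input, the network is called with `net(P)` (equivalent to `sim(net, P)`), and only the three outputs describing the centre residue of each window are kept. The function name `ssp_predict`, the 0.5 threshold, and the 'C' and '-' labels are hypothetical choices, not part of the original code. The `ismember` lookup is meant to reproduce `aa2int` codes 1 to 20 for the standard one-letter codes (with everything else mapped to 0, as `seqInt(seqInt>20)=0` does); if the Bioinformatics Toolbox is available, `double(aa2int(aa))` followed by that same line can be used instead.

```matlab
function pred = ssp_predict(net, aa, halfwindow)
% SSP_PREDICT  Hypothetical helper (sketch): predict secondary structure of a
% query sequence with a network returned by ssp_train.
%   net        - trained network (anything callable as net(P) works)
%   aa         - query amino acid sequence (char vector)
%   halfwindow - must equal the value used in ssp_train (default 1)
% Returns a char row vector, one character per residue:
%   'H', 'E', 'T' - predicted class
%   'C'           - no class output reached the (assumed) 0.5 threshold,
%                   i.e. "none of H/E/T", which training encoded as all zeros
%   '-'           - first/last halfwindow residues, never a window centre
if nargin < 3
    halfwindow = 1;
end
W = 2*halfwindow + 1;
N = numel(aa);
if N < W
    error('Query must be at least %d residues long.', W);
end

% 1) Encode exactly as in ssp_train. The alphabet is in aa2int order (codes
%    1..20); any other symbol gets 0 and therefore an all-zero block.
[~, seqInt] = ismember(upper(aa(:)'), 'ARNDCQEGHILKMFPSTWYV');
winP = hankel(seqInt(1:W), seqInt(W:end));
P = double(kron(winP, ones(20,1)) == kron(ones(size(winP)), (1:20)'));

% 2) Run the trained network: 3*W output rows, one column per window.
Y = net(P);

% 3) Keep the 3 outputs for the centre residue (window position halfwindow+1).
centre = Y(3*halfwindow + (1:3), :);
[score, cls] = max(centre, [], 1);
labels = 'HET';
predCentre = labels(cls);
predCentre(score < 0.5) = 'C';   % assumed threshold for "no class fired"

pred = repmat('-', 1, N);
pred(halfwindow+1 : N-halfwindow) = predCentre;
end
```

Saved as `ssp_predict.m` next to `ssp_train.m`, usage would look like `net = ssp_train(trainAA, trainSS, 1); pred = ssp_predict(net, queryAA, 1);` (variable names are placeholders). The default `feedforwardnet` output layer is linear, so the outputs are scores rather than probabilities, which is why a threshold is needed at all. Since `ssp_train` also trains the non-centre output rows, each residue actually receives up to W predictions from overlapping windows; averaging those is an alternative to reading only the centre slot.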

Any help would be much appreciated. Thanks so much and all the best.