getgenbank

Retrieve sequence information from GenBank database

Syntax

Data = getgenbank(AccessionNumber)
getgenbank(AccessionNumber)

Data = getgenbank(..., 'PartialSeq', PartialSeqValue, ...)
Data = getgenbank(..., 'ToFile', ToFileValue, ...)
Data = getgenbank(..., 'FileFormat', FileFormatValue, ...)
Data = getgenbank(..., 'SequenceOnly', SequenceOnlyValue, ...)

Arguments

AccessionNumber
    String specifying a unique alphanumeric identifier for a sequence record.

PartialSeqValue
    Two-element array of integers, [StartBP, EndBP], specifying the start and end positions of the subsequence to retrieve. StartBP is an integer between 1 and EndBP. EndBP is an integer between StartBP and the length of the sequence.

ToFileValue
    String specifying either a file name or a path and file name for saving the GenBank® data. If you specify only a file name, the file is saved to the MATLAB® Current Folder.

FileFormatValue
    String specifying the format for the sequence information. Choices are:
      • 'GenBank': Default when SequenceOnlyValue is false.
      • 'FASTA': Default when SequenceOnlyValue is true.
    When FileFormatValue is 'FASTA', Data contains only two fields, Header and Sequence.

SequenceOnlyValue
    Controls whether only the sequence is returned, as a character array. Choices are true or false (default).

Description

getgenbank retrieves nucleotide information from the GenBank database. This database is maintained by the National Center for Biotechnology Information (NCBI). For more details about the GenBank database, see

http://www.ncbi.nlm.nih.gov/Genbank/

Data = getgenbank(AccessionNumber) searches for the accession number in the GenBank database and returns Data, a MATLAB structure containing information for the sequence.

    Tip   If an error occurs while retrieving the GenBank-formatted information, try rerunning the query. Errors can occur due to Internet connectivity issues that are unrelated to the GenBank record.
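The retry advice in the tip above can be sketched as a small loop. This is only an illustration: the retry limit, the pause length, and the error identifier 'Example:getgenbankRetry' are assumptions made for this example and are not settings of getgenbank.

```matlab
% Sketch: retry a GenBank query a few times before giving up.
accession    = 'M10051';  % accession number used in the Examples section
maxAttempts  = 3;         % assumed retry limit, not a toolbox setting
pauseSeconds = 2;         % assumed delay between attempts

data = [];
for attempt = 1:maxAttempts
    try
        data = getgenbank(accession);
        break                      % success: stop retrying
    catch err
        fprintf('Attempt %d of %d failed: %s\n', ...
                attempt, maxAttempts, err.message);
        if attempt < maxAttempts
            pause(pauseSeconds);   % wait before the next attempt
        end
    end
end

if isempty(data)
    % Hypothetical error identifier, used only in this sketch
    error('Example:getgenbankRetry', ...
          'Could not retrieve %s after %d attempts.', accession, maxAttempts);
end
```

A short pause between attempts gives a transient connectivity problem time to clear. A record that fails on every attempt is more likely to have a real problem, such as a mistyped accession number.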

getgenbank(AccessionNumber) displays information in the MATLAB Command Window without returning data to a variable. Only hyperlinks to the URLs used to search for and retrieve the data are displayed.

getgenbank(..., 'PropertyName', PropertyValue, ...) calls getgenbank with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:


Data = getgenbank(..., 'PartialSeq', PartialSeqValue, ...) returns the specified subsequence in the Sequence field of the MATLAB structure. PartialSeqValue is a two-element array of integers containing the start and end positions of the subsequence [StartBP, EndBP]. StartBP is an integer between 1 and EndBP. EndBP is an integer between StartBP and the length of the sequence.

Data = getgenbank(..., 'ToFile', ToFileValue, ...) saves the data returned from the GenBank database to a file. ToFileValue is a string specifying either a file name or a path and file name for saving the GenBank data. If you specify only a file name, the file is saved to the MATLAB Current Folder.

    Tip   You can read a GenBank-formatted file back into MATLAB using the genbankread function.

    Tip   To append GenBank data to an existing file, specify that file name, and the data will be added to the end of the file.

    If you are using getgenbank in a script, you can disable the append warning message by entering the following command lines before the getgenbank command:

    warnState = warning; % Save the current warning state
    warning('off','Bioinfo:getncbidata:AppendToFile');

    Then enter the following command line after the getgenbank command:

    warning(warnState) %Reset warning state to previous settings
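Taken together, the tips above suggest a save-and-read-back workflow, sketched here. The output file name is hypothetical, and the try/catch wrapper is an addition for this example so that the warning state is restored even when retrieval fails.

```matlab
% Sketch: save a record with 'ToFile', then read it back with genbankread.
outFile = fullfile(tempdir, 'M10051_example.gbk');   % hypothetical file name

warnState = warning;                                 % save the current warning state
warning('off', 'Bioinfo:getncbidata:AppendToFile');  % silence the append warning
try
    S = getgenbank('M10051', 'ToFile', outFile);
catch err
    warning(warnState);   % restore warnings even if retrieval fails
    rethrow(err);
end
warning(warnState);       % reset warning state to previous settings

% If outFile already existed, the new record was appended, so the file
% can hold several records and genbankread returns one element per record.
fromFile = genbankread(outFile);
fprintf('%s now holds %d record(s).\n', outFile, numel(fromFile));
```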

Data = getgenbank(..., 'FileFormat', FileFormatValue, ...) returns the sequence in the specified format. Choices are 'GenBank' or 'FASTA'. When FileFormatValue is 'FASTA', Data contains only two fields, Header and Sequence. 'GenBank' is the default when SequenceOnlyValue is false. 'FASTA' is the default when SequenceOnlyValue is true.
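As a brief illustration of the 'FileFormat' property, the following sketch requests the example record M10051 in FASTA format and lists the fields of the returned structure. The variable name fa is arbitrary.

```matlab
% Sketch: request FASTA-formatted data and inspect the returned structure.
fa = getgenbank('M10051', 'FileFormat', 'FASTA');

% Per the description above, only Header and Sequence are expected here.
disp(fieldnames(fa));
fprintf('Sequence length: %d\n', numel(fa.Sequence));
```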

Data = getgenbank(..., 'SequenceOnly', SequenceOnlyValue, ...) returns only the sequence in Data, a character array. Choices are true or false (default).

    Note:   If you use the 'SequenceOnly' and 'ToFile' properties together, the output is always a FASTA-formatted file.
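When only the bases are needed, 'SequenceOnly' avoids unpacking a structure. The sketch below retrieves the sequence as a character array and computes its GC content with base MATLAB functions. The GC calculation is an illustrative use of the result and is not part of getgenbank.

```matlab
% Sketch: retrieve only the sequence and compute a simple statistic on it.
seq = getgenbank('M10051', 'SequenceOnly', true);   % character array

% Logical mask marking G and C bases, regardless of letter case
isGC = ismember(upper(seq), 'GC');

fprintf('Length: %d bases, GC content: %.1f%%\n', ...
        numel(seq), 100 * mean(isGC));
```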

Examples

Retrieving an RNA Sequence

To retrieve the sequence from chromosome 19 that codes for the human insulin receptor and store it in a structure, S, type the following in the MATLAB Command Window:

S = getgenbank('M10051')

S = 

                LocusName: 'HUMINSR'
      LocusSequenceLength: '4723'
     LocusNumberofStrands: ''
            LocusTopology: 'linear'
        LocusMoleculeType: 'mRNA'
     LocusGenBankDivision: 'PRI'
    LocusModificationDate: '06-JAN-1995'
               Definition: 'Human insulin receptor mRNA, complete cds.'
                Accession: 'M10051'
                  Version: 'M10051.1'
                       GI: '186439'
                  Project: []
                   DBLink: []
                 Keywords: 'insulin receptor; tyrosine kinase.'
                  Segment: []
                   Source: 'Homo sapiens (human)'
           SourceOrganism: [4x65 char]
                Reference: {[1x1 struct]}
                  Comment: [14x67 char]
                 Features: [51x74 char]
                      CDS: [1x1 struct]
                 Sequence: [1x4723 char]
                SearchURL: [1x67 char]
              RetrieveURL: [1x101 char]                            

Retrieving a Partial RNA Sequence

By looking at the Features field of the structure returned in Retrieving an RNA Sequence, you can determine that the coding sequence spans positions 139 through 4287. To retrieve only the coding sequence from chromosome 19 that codes for the human insulin receptor and store it in a structure, CDS, type the following in the MATLAB Command Window:

CDS = getgenbank('M10051','PARTIALSEQ',[139,4287]);
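A quick sanity check on the partial retrieval is to compare the length of the returned subsequence with the requested range. The sketch below assumes both endpoints are inclusive, which gives 4287 - 139 + 1 = 4149 bases, or 1383 codons. The variable names are illustrative.

```matlab
% Sketch: retrieve the coding region and check its length against the request.
startBP = 139;
endBP   = 4287;
CDS = getgenbank('M10051', 'PartialSeq', [startBP, endBP]);

expectedLength = endBP - startBP + 1;   % 4149, assuming inclusive endpoints
codingSeq = CDS.Sequence;

fprintf('Requested %d bases, retrieved %d.\n', expectedLength, numel(codingSeq));
fprintf('Requested range covers %d codons.\n', expectedLength / 3);
```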
