getCompactAlignment

Class: BioMap

Construct compact alignment represented in BioMap object

Syntax

CompAlignment = getCompactAlignment(BioObj, StartPos, EndPos)
CompAlignment = getCompactAlignment(BioObj, StartPos, EndPos, R)
CompAlignment = getCompactAlignment(..., 'ParameterName', ParameterValue)
[CompAlignment, Indices] = getCompactAlignment(...)
[CompAlignment, Indices, Rows] = getCompactAlignment(...)

Description

CompAlignment = getCompactAlignment(BioObj, StartPos, EndPos) returns CompAlignment, a character array containing the aligned read sequences from BioObj, a BioMap object, in a compact format. The read sequences must align within a specific region of the reference sequence, which is defined by StartPos and EndPos, two positive integers such that StartPos is less than EndPos, and both are smaller than the length of the reference sequence.

CompAlignment = getCompactAlignment(BioObj, StartPos, EndPos, R) selects the reference, R, for which getCompactAlignment reconstructs the alignment.

CompAlignment = getCompactAlignment(..., 'ParameterName', ParameterValue) accepts one or more comma-separated parameter name/value pairs. Specify ParameterName inside single quotes.

[CompAlignment, Indices] = getCompactAlignment(...) returns Indices, a vector of indices specifying the read sequences that align within a specific region of the reference sequence.

[CompAlignment, Indices, Rows] = getCompactAlignment(...) returns Rows, a vector of positive numbers specifying the row in CompAlignment where each read sequence is best displayed.

Input Arguments

BioObj

Object of the BioMap class.

StartPos

Positive integer that defines the start of a region of the reference sequence. StartPos must be less than EndPos, and smaller than the total length of the reference sequence.

EndPos

Positive integer that defines the end of a region of the reference sequence. EndPos must be greater than StartPos, and smaller than the total length of the reference sequence.

R

Positive integer indexing the SequenceDictionary property of BioObj, or a character vector or string specifying the actual name of the reference.
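As a sketch of the two ways to pass R, the following assumes that ex1.sam (the file used in the Examples section) is on the MATLAB path and that SequenceDictionary lists at least one reference name. The variable names refNames, alnByIndex, and alnByName are illustrative only.

```matlab
BMObj1 = BioMap('ex1.sam');
% Reference names stored in the object (cellstr accepts char, cell, or string input)
refNames = cellstr(BMObj1.SequenceDictionary);
% Select the first reference by its index in SequenceDictionary
alnByIndex = getCompactAlignment(BMObj1, 30, 59, 1);
% Select the same reference by its name
alnByName = getCompactAlignment(BMObj1, 30, 59, refNames{1});
% Both calls address the same reference, so the results are expected to match
isequal(alnByIndex, alnByName)
```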

Name-Value Arguments

Full

Specifies whether to include only the read sequences that align fully within the defined region of the reference sequence, that is, reads that are completely contained within the region and do not extend beyond it. Choices are true or false (default).

Default: false

TrimAlignment

Specifies whether to trim empty leading and trailing columns from the alignment. Choices are true or false. The default is false, which does not trim the alignment but keeps any empty leading or trailing columns, so the returned alignment always has a length of EndPos - StartPos + 1.

Default: false
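The following sketch shows how the two name-value arguments might be used. It assumes the ex1.sam file from the Examples section is available, and the output variable names are illustrative. No output is shown because it depends on the reads in the file.

```matlab
BMObj1 = BioMap('ex1.sam');
% Default behavior: reads that overlap the region, alignment not trimmed
alnDefault = getCompactAlignment(BMObj1, 30, 59);
% Keep only reads completely contained in positions 30 through 59
alnFull = getCompactAlignment(BMObj1, 30, 59, 'Full', true);
% Remove empty leading and trailing columns from the alignment
alnTrim = getCompactAlignment(BMObj1, 30, 59, 'TrimAlignment', true);
% The untrimmed width is EndPos - StartPos + 1 = 30 columns;
% the trimmed alignment can be narrower
size(alnDefault, 2)
size(alnTrim, 2)
```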

Output Arguments

CompAlignment

Character array containing the read sequences from BioObj that align within the requested region. The character array represents a compact alignment, that is, each row contains one or more aligned sequences, arranged so that the number of rows is minimized. Each aligned sequence includes only the sequence positions that fall within the requested region, and each aligned sequence can include gaps.

Indices

Vector of indices specifying the read sequences from BioObj that align within the requested region.

Rows

Vector of positive numbers specifying the row in CompAlignment where each read sequence is best displayed.
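To illustrate what a compact layout means, the sketch below packs hypothetical read intervals into as few rows as possible with a greedy first-fit rule. This is only an illustration of the idea, not the algorithm that getCompactAlignment uses. The interval values and the rule that a read can follow another in the same row as soon as it starts after the previous read ends are assumptions.

```matlab
% Hypothetical read intervals on the reference, sorted by start position
starts = [1 1 3 8 10 12];
stops  = [7 9 9 15 16 20];

rowEnd = [];                     % last occupied position in each row
rows = zeros(size(starts));      % row assigned to each read
for k = 1:numel(starts)
    % First existing row whose last read ends before this read starts
    r = find(rowEnd < starts(k), 1);
    if isempty(r)
        r = numel(rowEnd) + 1;   % no row has space, so open a new one
    end
    rowEnd(r) = stops(k);
    rows(k) = r;
end
disp(rows)                       % displays 1 2 3 1 2 3
```

Here six reads fit in three rows, which mirrors how the Rows output can repeat a row number for reads that do not overlap.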

Examples

Construct a BioMap object, and then construct the compact alignment between positions 30 and 59 of the reference sequence:

% Construct a BioMap object from a SAM file 
BMObj1 = BioMap('ex1.sam');
% Construct the compact alignment between positions 30 and 59 of
% the reference sequence, and return the indices of the reads in the
% compact alignment, as well as the row each read is in. 
[CompAlignment, Ind, Row] = getCompactAlignment(BMObj1, 30, 59)
CompAlignment =

TAACTCG      GCCCAGCATTAGGGAGC
TAACTCGT           CATTAGGGAGC
TAACTCGTCC          ATTAGGGAGC
TAACTCTTCTCT         TTAGGGAGC
TAACTCGTCCATGG        TAGGGAGC
TAACTCGTCCCTGGCCCA           C
TAACTCGTCCATGGCCCAG           
TAACTCGTCCATTGCCCAGC          
TAACTCGTCCATGGCCCAGCATT       
TAACTCGTCCATGGCCCAGCATTTGGG   
TAACTCGTCCATGGCCCAGCATTAGGG   
TAACTCGTCCATGGCCCAGCATTAGGGAGC
TAACTCGTCCATGGCCCAGCATTAGGGATC
TAACTCGTCCATGGCCCAGCATTAGGGAGC
 AACTCGTCCATGGCCCAGCATTAGGGAGC
      GTACATGGCCCAGCATTAGGGAGC
       TCCATGGCCCAGCATTAGGGCGC


Ind =

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23


Row =

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
     1
     2
     3
     4
     5
     6
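The Ind and Row outputs can be used to map rows of the compact alignment back to reads in the object. The sketch below assumes the same ex1.sam file. The names hdrs, readStarts, and inFirstRow are illustrative.

```matlab
BMObj1 = BioMap('ex1.sam');
[CompAlignment, Ind, Row] = getCompactAlignment(BMObj1, 30, 59);
% Headers and start positions of the reads that appear in the alignment
hdrs = getHeader(BMObj1, Ind);
readStarts = getStart(BMObj1, Ind);
% Indices of the reads displayed in the first row of CompAlignment
inFirstRow = Ind(Row == 1)
% The largest row assignment should equal the number of rows in CompAlignment
max(Row) == size(CompAlignment, 1)
```

In the output listed above, row 1 is assigned to the 1st and 18th entries of Ind, so two reads share the first row of CompAlignment.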

Algorithms

getCompactAlignment assumes the reference sequence has no gaps. Therefore, positions in reads corresponding to insertions (I) and padding (P) do not appear in the alignment.

Because soft clipped positions (S) are not associated with positions that align to the reference sequence, they do not appear in the alignment.

A skipped position (N) appears as a - (hyphen) in the alignment.

Hard clipped positions (H) do not appear in the sequences or the alignment.
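The rules above can be sketched with a small, self-contained script that expands one read according to its CIGAR string. This is a minimal illustration under stated assumptions, not the toolbox implementation. The read, the CIGAR string, and the variable names are hypothetical, and only the operations discussed above plus match (M) are handled.

```matlab
seq   = 'AACCGGTTAC';       % hypothetical read bases
cigar = '2S3M1I2M3N2M';     % hypothetical CIGAR string

% Split the CIGAR string into (count, operation) pairs
tok = regexp(cigar, '(\d+)([MIPSNH])', 'tokens');
aligned = '';               % read as it appears in the alignment
pos = 1;                    % next unread base in seq
for k = 1:numel(tok)
    n  = str2double(tok{k}{1});
    op = tok{k}{2};
    switch op
        case 'M'            % aligned bases are copied to the alignment
            aligned = [aligned seq(pos:pos+n-1)];
            pos = pos + n;
        case {'I', 'S'}     % insertions and soft clips are skipped in the read
            pos = pos + n;
        case 'N'            % skipped reference positions appear as hyphens
            aligned = [aligned repmat('-', 1, n)];
        case {'P', 'H'}     % padding and hard clips contribute nothing
    end
end
disp(aligned)               % displays CCGTT---AC
```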