Fast/Approximate Pairwise Alignment Parameters

Options Description
K-tuple (word) size
This option specifies the size of the exactly matching fragments (k-tuples) that are used. Increase for speed; decrease for sensitivity. For longer sequences (e.g. >1000 residues), it may need to be increased.
Top diagonals
The number of k-tuple matches on each diagonal (in an imaginary dot-matrix plot) is calculated. Only the best diagonals (those with the most matches) are used in the alignment. This option specifies how many of them are kept. Decrease for speed; increase for sensitivity.
Window size
This option specifies the number of diagonals around each of the 'best' diagonals that will be used. Decrease for speed; increase for sensitivity.
Gap penalty
This option specifies the penalty for each gap in the fast alignments. It has little effect on speed or sensitivity except at extreme values.
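
The interaction of k-tuple size, top diagonals and window size can be illustrated with a minimal Python sketch. It is not the Clustal W implementation: the function names are hypothetical, and it assumes that the window extends the same number of diagonals on each side of a best diagonal.

```python
from collections import Counter, defaultdict


def diagonal_ktuple_counts(seq_a, seq_b, k):
    """Count exact k-tuple matches on each diagonal (i - j) of the dot matrix."""
    positions = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        positions[seq_b[j:j + k]].append(j)
    counts = Counter()
    for i in range(len(seq_a) - k + 1):
        for j in positions.get(seq_a[i:i + k], ()):
            counts[i - j] += 1
    return counts


def select_diagonals(counts, top_diagonals, window):
    """Keep the best diagonals plus `window` neighbours on each side (assumption)."""
    best = [d for d, _ in counts.most_common(top_diagonals)]
    kept = set()
    for d in best:
        kept.update(range(d - window, d + window + 1))
    return best, kept


counts = diagonal_ktuple_counts("ACGTACGTGG", "TTACGTACGT", k=2)
best, kept = select_diagonals(counts, top_diagonals=1, window=1)
print(best, sorted(kept))
```

A larger k yields fewer chance matches and therefore less work, while more top diagonals or a wider window enlarge the region of the dot matrix that is searched.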

Slow/Accurate Pairwise Alignment Parameters

Options Description
Gap open penalty
This option specifies the penalty for opening a gap in the alignment.
Gap extension penalty
This option specifies the penalty for extending a gap by 1 residue.
DNA weight matrix

This option specifies the scores assigned to matches and mismatches.

IUB: This is the default scoring matrix used by BESTFIT for the comparison of nucleic acid sequences. X's and N's are treated as matches to any IUB ambiguity symbol. All matches score 1.9; all mismatches for IUB symbols score 0.

ClustalW(1.6): The previous system used by Clustal W, in which matches score 1.0 and mismatches score 0. All matches for IUB symbols also score 0.

Protein weight matrix
This option specifies the scoring table that describes the similarity of each amino acid to every other amino acid.
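
To see how the gap penalties and the match/mismatch scores combine, here is a minimal Python sketch that scores an already aligned pair of sequences. It is an illustration under stated assumptions, not Clustal W's scoring code: the function name and the default penalty values are hypothetical, a gap of n residues is assumed to cost one gap open penalty plus n gap extension penalties, and the match and mismatch scores follow the ClustalW(1.6) scheme described above (1.0 and 0).

```python
def score_alignment(row_a, row_b, match=1.0, mismatch=0.0,
                    gap_open=10.0, gap_extend=0.5):
    """Score two equal-length gapped rows with an affine gap penalty.

    The default penalty values are illustrative only.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    score = 0.0
    gap_in = None  # row that currently holds an open gap: "a", "b" or None
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            continue  # a column with two gaps carries no information
        if a == "-" or b == "-":
            side = "a" if a == "-" else "b"
            if side != gap_in:
                score -= gap_open  # a new gap starts in this column
                gap_in = side
            score -= gap_extend  # every gapped residue pays the extension
        else:
            score += match if a == b else mismatch
            gap_in = None
    return score


print(score_alignment("ACGT--ACGT", "ACGTTTAC-T"))
```

In this example the seven matching columns are outweighed by two gap openings, which shows why raising the gap open penalty makes gaps rarer.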

Multiple Alignment Parameters

Options Description

Gap open penalty

Gap extension penalty

These two options control the cost of opening up every new gap and the cost of every item in a gap. Increasing the gap opening penalty will make gaps less frequent. Increasing the gap extension penalty will make gaps shorter. Terminal gaps are not penalised.
DNA weight matrix

This option specifies the scores assigned to matches and mismatches.

IUB: This is the default scoring matrix used by BESTFIT for the comparison of nucleic acid sequences. X's and N's are treated as matches to any IUB ambiguity symbol. All matches score 1.9; all mismatches for IUB symbols score 0.

ClustalW(1.6): The previous system used by Clustal W, in which matches score 1.0 and mismatches score 0. All matches for IUB symbols also score 0.

Protein weight matrix

This option specifies the weight matrix series. Note that a series is used: the actual matrix applied depends on how similar the sequences being aligned at a given alignment step are. Different matrices work best at different evolutionary distances.

BLOSUM (Henikoff) series: these matrices appear to be the best available for carrying out database similarity (homology) searches. The matrices used are: Blosum 80, 62, 45 and 30. (BLOSUM was the default in earlier Clustal W versions.)


PAM (Dayhoff) series: these have been extremely widely used since the late '70s. The PAM 20, 60, 120 and 350 matrices are used here.


Gonnet series:
these matrices were derived using almost the same procedure as the Dayhoff one (above) but are much more up to date and are based on a far larger data set. They appear to be more sensitive than the Dayhoff series. The GONNET 80, 120, 160, 250 and 350 matrices are used here. This series is the default for Clustal W version 1.8.


Identity matrix:
this matrix gives a score of 1.0 to two identical amino acids and a score of zero otherwise.
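
The idea of choosing one member of a series according to sequence similarity can be sketched in Python as follows. The series members are the ones listed above; the percent identity cutoffs and the function name are hypothetical and chosen only for illustration, they are not Clustal W's actual thresholds. The Gonnet series is left out because it has five members and would need its own set of cutoffs.

```python
# Series members from the text, ordered from closely related to distant sequences.
SERIES = {
    "BLOSUM": ["BLOSUM80", "BLOSUM62", "BLOSUM45", "BLOSUM30"],
    "PAM": ["PAM20", "PAM60", "PAM120", "PAM350"],
}

# Hypothetical percent identity cutoffs (illustrative, not Clustal W's values).
CUTOFFS = [80.0, 60.0, 40.0]


def pick_matrix(series, percent_identity):
    """Return the series member for the given similarity level."""
    names = SERIES[series]
    for name, cutoff in zip(names, CUTOFFS):
        if percent_identity > cutoff:
            return name
    return names[-1]  # most distant sequences get the last matrix


print(pick_matrix("BLOSUM", 65.0))
print(pick_matrix("PAM", 10.0))
```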

% identical for delay
This option delays the alignment of the most distantly related sequences until after the most closely related sequences have been aligned. The setting is the percent identity level below which the addition of a sequence is delayed; sequences that are less identical than this level to every other sequence will be aligned later.
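
As a minimal sketch of this rule in Python (hypothetical function and input layout, not Clustal W's code), the sequences to delay can be found from a symmetric table of pairwise percent identities. It assumes that a sequence is delayed only when its best identity to any other sequence is below the cutoff.

```python
def delayed_sequences(names, identity, delay_percent):
    """Return the names whose best identity to any other sequence is below the cutoff.

    `identity[i][j]` is the percent identity between sequences i and j.
    """
    delayed = []
    for i, name in enumerate(names):
        best = max(
            (identity[i][j] for j in range(len(names)) if j != i),
            default=100.0,  # a lone sequence is never delayed
        )
        if best < delay_percent:
            delayed.append(name)
    return delayed


names = ["s1", "s2", "s3"]
identity = [
    [100.0, 85.0, 25.0],
    [85.0, 100.0, 30.0],
    [25.0, 30.0, 100.0],
]
print(delayed_sequences(names, identity, delay_percent=40.0))
```
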
Transition weighting
This option gives transitions (A <--> G or C <--> T, i.e. purine-purine or pyrimidine-pyrimidine substitutions) a weight between 0 and 1; a weight of zero means that transitions are scored as mismatches, while a weight of 1 gives transitions the match score. For distantly related DNA sequences, the weight should be near zero; for closely related sequences it can be useful to assign a higher weight.
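
The effect of the weight can be shown with a short Python sketch. The two end points (weight 0 gives the mismatch score, weight 1 gives the match score) come from the description above; the linear interpolation in between and the function names are assumptions made for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(x, y):
    """True for A<->G or C<->T substitutions."""
    return x != y and ({x, y} <= PURINES or {x, y} <= PYRIMIDINES)


def pair_score(x, y, transition_weight, match=1.0, mismatch=0.0):
    """Score a pair of bases, scaling transitions between mismatch and match."""
    if not 0.0 <= transition_weight <= 1.0:
        raise ValueError("transition weight must be between 0 and 1")
    if x == y:
        return match
    if is_transition(x, y):
        # Assumed linear interpolation between the two documented end points.
        return mismatch + transition_weight * (match - mismatch)
    return mismatch  # transversions always score as mismatches


print(pair_score("A", "G", 0.5), pair_score("A", "C", 0.5))
```
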
End gap separation penalty
This option treats end gaps just like internal gaps for the purpose of avoiding gaps that are too close together (the distance is set by the parameter "Gap separation penalty range"). If you select "Yes", end gaps will be ignored for this purpose. This is useful when you wish to align fragments where the end gaps are not biologically meaningful.
Gap separation penalty range
This option tries to decrease the chances of gaps being too close to each other. Gaps that are less than this distance apart are penalised more than other gaps. This does not prevent close gaps; it makes them less frequent, promoting a block-like appearance of the alignment.
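
Which gaps count as "too close" can be sketched in Python as follows. The function name is hypothetical, and measuring the distance between the start columns of neighbouring gaps is an assumption; the sketch only identifies the gap pairs that would attract the extra penalty, it does not compute the penalty itself.

```python
def close_gap_pairs(gap_starts, separation_range):
    """Return neighbouring gap pairs that lie closer than the separation range."""
    ordered = sorted(gap_starts)
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if b - a < separation_range
    ]


print(close_gap_pairs([3, 20, 24, 50], separation_range=8))
```
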
Residue-specific gaps
This option switches on amino-acid-specific gap penalties that reduce or increase the gap opening penalties at each position in the alignment or sequence. As an example, positions that are rich in glycine are more likely to have an adjacent gap than positions that are rich in valine.

Hydrophilic gaps

This option is used to increase the chances of a gap within a run (5 or more residues) of hydrophilic amino acids; these are likely to be loop or random coil regions where gaps are more common.

Hydrophilic residue

The residues that are "considered" to be hydrophilic are entered here.
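
Finding the runs in which gaps become more likely can be sketched in Python as follows. The function name is hypothetical, and the default residue set GPSNDQEKR is an assumption about a typical setting; whatever is entered in the "Hydrophilic residue" field would be used instead.

```python
def hydrophilic_runs(sequence, hydrophilic="GPSNDQEKR", min_run=5):
    """Return (start, end) spans, end exclusive, of hydrophilic runs of at least min_run residues."""
    runs = []
    start = None
    residues = sequence.upper()
    for i, residue in enumerate(residues):
        if residue in hydrophilic:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    # A run may extend to the end of the sequence.
    if start is not None and len(residues) - start >= min_run:
        runs.append((start, len(residues)))
    return runs


print(hydrophilic_runs("MVLGSNDQKAAVGPSL"))
```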