Gap extension penalty
These two options (the gap opening penalty and the gap extension
penalty) control the cost of opening each new gap and the cost of
every item in a gap. Increasing the gap opening penalty will make
gaps less frequent. Increasing the gap extension penalty will make
gaps shorter. Terminal gaps are not penalised.
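As a rough illustration of how the two penalties combine, the sketch below charges one opening penalty per internal gap plus one extension penalty per gap position, and leaves terminal gaps free. The function name, the "-" gap character and the convention that the first gap position pays both penalties are assumptions made for this example; this is not Clustal W's internal code.

```python
import re


def gap_cost(aligned_seq, gap_open, gap_extend):
    """Total gap cost for one aligned sequence, with '-' marking gap positions."""
    # Terminal gaps are not penalised, so strip them before counting.
    core = aligned_seq.strip("-")
    total = 0.0
    for run in re.finditer(r"-+", core):
        length = run.end() - run.start()
        # One opening charge per gap, one extension charge per gap position.
        total += gap_open + gap_extend * length
    return total
```

With this convention, raising gap_open mainly discourages new gaps, while raising gap_extend mainly discourages long ones.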
This option allows users to specify
the scores assigned to matches and mismatches.
This is the default scoring matrix
used by BESTFIT for the comparison of nucleic acid sequences. X's
and N's are treated as matches to any IUB ambiguity symbol. All
matches score 1.9; all mismatches for IUB symbols score 0.
This is the previous system used by Clustal
W, in which matches score 1.0 and mismatches score 0. All matches
for IUB symbols also score 0.
This option allows users to specify
the weight matrices. Note that a series of matrices is used. The
actual matrix used depends on how similar the sequences to be aligned
at this alignment step are. Different matrices work differently
at each evolutionary distance.
BLOSUM (Henikoff) series: these matrices appear to be
the best available for carrying out database similarity (homology)
searches. The matrices used are: Blosum 80, 62, 45 and 30. (BLOSUM
was the default in earlier Clustal W versions.)
PAM (Dayhoff) series:
these have been extremely widely used since the late '70s. The
PAM 20, 60, 120 and 350 matrices are used here.
GONNET series:
these matrices were derived using almost the same procedure as
the Dayhoff one (above) but are much more up to date and are based
on a far larger data set. They appear to be more sensitive than
the Dayhoff series. The GONNET 80, 120, 160, 250 and 350 matrices
are used here. This series is the default for Clustal W version
Identity matrix: this matrix gives a score of 1.0 to two identical amino acids
and a score of zero otherwise.
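The idea of choosing one matrix from a series according to sequence similarity can be sketched as below for the BLOSUM series. The percent identity cutoffs and the function name are hypothetical values chosen only for illustration; the actual cutoffs used by Clustal W are not given in this text.

```python
# Hypothetical cutoffs: (minimum percent identity, matrix name), most similar first.
BLOSUM_SERIES = [
    (80.0, "BLOSUM80"),
    (60.0, "BLOSUM62"),
    (30.0, "BLOSUM45"),
    (0.0, "BLOSUM30"),
]


def pick_matrix(percent_identity, series=BLOSUM_SERIES):
    """Return the first matrix whose cutoff the sequences reach."""
    for cutoff, name in series:
        if percent_identity >= cutoff:
            return name
    # Fall back to the matrix intended for the most divergent sequences.
    return series[-1][1]
```

Closely related sequences get a "hard" matrix such as BLOSUM80, while distant ones get a "soft" matrix such as BLOSUM30.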
identical for delay
This option delays the alignment of the most distantly related sequences until
after the most closely related sequences have been aligned. The
setting shows the percent identity level required to delay the addition
of a sequence; sequences that are less identical than this level
to any other sequences will be aligned later.
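A minimal sketch of this rule, assuming a precomputed square matrix of pairwise percent identities and reading the rule as "delay a sequence whose best identity to every other sequence falls below the setting". The function name and input layout are assumptions for this example.

```python
def sequences_to_delay(identity, threshold):
    """Return indices of sequences whose best match is below the threshold.

    identity: square list of lists of pairwise percent identities.
    threshold: percent identity level required to avoid being delayed.
    """
    delayed = []
    n = len(identity)
    for i in range(n):
        # Best identity of sequence i to any other sequence (skip the diagonal).
        best = max((identity[i][j] for j in range(n) if j != i), default=0.0)
        if best < threshold:
            delayed.append(i)
    return delayed
```

For example, with three sequences where the third is at most 25% identical to the others, a setting of 40 delays only the third.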
This option gives transitions
(A <--> G or C <--> T i.e. purine-purine or pyrimidine-pyrimidine
substitutions) a weight between 0 and 1; a weight of zero means
that the transitions are scored as mismatches, while a weight of
1 gives the transitions the match score. For distantly related DNA
sequences, the weight should be near to zero; for closely related
sequences it can be useful to assign a higher score.
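The effect of the transition weight can be sketched as a score for a single pair of unambiguous bases. The text only fixes the two endpoints (weight 0 gives the mismatch score, weight 1 gives the match score); the linear interpolation in between, the function names and the default match and mismatch values are assumptions for this example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a, b):
    """True for A<->G or C<->T substitutions."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def pair_score(a, b, transition_weight, match=1.9, mismatch=0.0):
    """Score one pair of bases (A, C, G or T only)."""
    a, b = a.upper(), b.upper()
    if a == b:
        return match
    if is_transition(a, b):
        # Assumed: interpolate linearly between mismatch and match scores.
        return mismatch + transition_weight * (match - mismatch)
    return mismatch
```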
gap separation penalty
This option controls whether end gaps are treated
just like internal gaps for the purposes of avoiding gaps that are
too close (set by the parameter "Gap separation penalty range").
If you select "Yes", end gaps will be ignored for this
purpose. This is useful when you wish to align fragments where the
end gaps are not biologically meaningful.
separation penalty range
This option tries to decrease
the chances of gaps being too close to each other. Gaps that are
less than this distance apart are penalised more than other gaps.
This does not prevent close gaps; it makes them less frequent, promoting
a block-like appearance of the alignment.
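To make "too close" concrete, the sketch below lists pairs of neighbouring gaps in one aligned sequence that are separated by fewer residues than the range. The function name and the "-" gap character are assumptions, and the sketch only identifies such pairs; how much extra penalty they receive is not specified here.

```python
import re


def close_gap_pairs(aligned_seq, separation_range):
    """Return neighbouring gap pairs closer together than separation_range.

    Each gap is reported as a (start, end) pair of string indices.
    """
    gaps = [(m.start(), m.end()) for m in re.finditer(r"-+", aligned_seq)]
    close = []
    for (s1, e1), (s2, e2) in zip(gaps, gaps[1:]):
        distance = s2 - e1  # number of residues between the two gaps
        if distance < separation_range:
            close.append(((s1, e1), (s2, e2)))
    return close
```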
This option toggles
amino acid specific gap penalties that reduce or increase the gap
opening penalties at each position in the alignment or sequence.
As an example, positions that are rich in glycine are more likely
to have an adjacent gap than positions that are rich in valine.
This option is used to increase the chances of a gap within a run (5 or
more residues) of hydrophilic amino acids; these are likely to be
loop or random coil regions where gaps are more common.
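Finding such runs can be sketched as a simple scan over one sequence. The default residue set in the sketch is an assumption (the residues treated as hydrophilic are configurable), as are the function and parameter names.

```python
def hydrophilic_runs(seq, hydrophilic="GPSNDQEKR", min_run=5):
    """Return (start, end) index pairs of runs of hydrophilic residues.

    Only runs of at least min_run residues are reported; end is exclusive.
    """
    runs = []
    start = None
    # A trailing sentinel character closes a run that reaches the end.
    for i, res in enumerate(seq.upper() + "#"):
        if res in hydrophilic:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    return runs
```

Positions inside the reported runs are where gap opening would be made cheaper under this option.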
residues that are "considered" to be hydrophilic are entry