Common
options
Options |
Description |
Randomize
input order of species
Random number seed
Number of times to jumble |
In
most of the tree construction methods, the exact details of the
search of different trees depend on the order of input of species.
In these methods this option enables you to tell the program to
use a random number generator to choose the input order of species.
This
option sets the seed of the random number generator. Each different seed leads
to a different sequence of addition of species. By changing
the random number seed and running the analysis again, one can look
for other, possibly better, trees.
This
option sets how many times the process described above is restarted.
If you select 10, the program will try ten different
orders of species in constructing the trees, and the results printed
out will reflect this entire search process (that is, the best trees
found among all 10 runs will be printed out, not the best trees
from each individual run). |
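The interplay of seed and number of jumbles can be sketched in Python. This is only an illustration, not the program's own code: `jumble_search`, `toy_score` and the species names are hypothetical, and `toy_score` stands in for a full tree search that returns a number where lower is better.

```python
import random


def jumble_search(species, score_order, seed, n_jumbles):
    """Try n_jumbles random input orders and keep the best-scoring ones.

    score_order is a hypothetical stand-in for a tree search: it takes an
    input order of species and returns a score where lower is better.
    """
    rng = random.Random(seed)      # the same seed gives the same orders
    best_score, best_orders = None, []
    for _ in range(n_jumbles):
        order = list(species)
        rng.shuffle(order)         # random input order of species
        score = score_order(order)
        if best_score is None or score < best_score:
            best_score, best_orders = score, [order]
        elif score == best_score and order not in best_orders:
            best_orders.append(order)
    # Only the best result over ALL jumbles is reported.
    return best_score, best_orders


def toy_score(order):
    # Placeholder criterion only: position-weighted sum of name lengths.
    return sum(i * len(name) for i, name in enumerate(order))
```

Calling `jumble_search(species, toy_score, seed=4533, n_jumbles=10)` twice gives the same result, while a different seed explores a different set of input orders.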
Outgroup
root |
This
specifies which species is to be used to root the tree by having
it become the outgroup. |
Use
Threshold parsimony? |
This
sets a threshold such that if the number of steps counted in a character
is higher than the threshold, it will be taken to be the threshold
value rather than the actual number of steps. The default is a threshold
so high that it will never be surpassed. The use of thresholds to
obtain methods intermediate between parsimony and compatibility
methods is described in J. Felsenstein's 1981b paper. |
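As a minimal sketch of the thresholding rule described above (the function name and the step counts are made up for illustration), each character's step count is capped at the threshold before the counts are summed:

```python
def thresholded_steps(steps_per_character, threshold=float("inf")):
    """Total number of steps, with each character capped at the threshold.

    The default threshold is infinite, so it is never surpassed and the
    result is the ordinary parsimony step count.
    """
    return sum(min(steps, threshold) for steps in steps_per_character)
```

For step counts of 1, 2, 5 and 9, a threshold of 3 gives 1 + 2 + 3 + 3 = 9 steps, while the default gives the plain total of 17.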
Global
rearrangements |
In
the methods which construct trees (except "Neighbor-joining
and UPGMA method" and "Maximum parsimony (branch and bound
search) method"), after all species have been added to the
tree a rearrangements phase ensues. In most of these methods the
rearrangements are automatically global, which in this case means
that subtrees will be removed from the tree and put back on in all
possible ways so as to have a better chance of finding a better
tree. Since this can be time consuming (it roughly triples the time
taken for a run) it is left as an option in some of the methods,
specifically "Fitch-Margoliash and least squares method"
and "Maximum likelihood method". In these methods the
option "Global rearrangements" toggles between the default
of local rearrangement and global rearrangement. |
Learn
more by reading original document
Options
about bootstrap analysis
Options |
Description |
Odd
random number |
This
is the seed for the random number generator; it should be an odd number. |
Number
of replicates |
This option sets
the number of replicate data sets. The default is 100.
Most statisticians would be happiest with 1000 to 10,000 replicates
in a bootstrap, but 100 gives a good rough picture.
Note: at present
POWER allows at most 1000 replicates.
|
Resampling
methods |
The
bootstrap:
bootstrapping was invented by Bradley Efron in 1979, and its use
in phylogeny estimation was introduced by J. Felsenstein (Felsenstein,
1985b). It involves creating a new data set by sampling N characters
randomly with replacement, so that the resulting data set has
the same size as the original, but some characters have been left
out and others are duplicated. The random variation of the results
from analyzing these bootstrapped data sets can be shown statistically
to be typical of the variation that you would get from collecting
new data sets. The method assumes that the characters evolve independently,
an assumption that may not be realistic for many kinds of data.
Delete-half-jackknifing:
this alternative to the bootstrap
involves sampling a random half of the characters, and including
them in the data but dropping the others. The resulting data sets
are half the size of the original, and no characters are duplicated.
The random variation from doing this should be very similar to
that obtained from the bootstrap. The method is advocated by Wu
(1986).
Permuting
species within characters:
this method of resampling (well, OK, it may not be best to call
it resampling) was introduced by Archie (1989) and Faith (1990;
see also Faith and Cranston, 1991). It involves permuting the
columns of the data matrix separately. This produces data matrices
that have the same number and kinds of characters but no taxonomic
structure. It is used for different purposes than the bootstrap,
as it tests not the variation around an estimated tree but the
hypothesis that there is no taxonomic structure in the data: if
a statistic such as number of steps is significantly smaller in
the actual data than it is in replicates that are permuted, then
we can argue that there is some taxonomic structure in the data
(though perhaps it might be just a pair of sibling species).
|
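The three resampling schemes described above can be sketched in Python as follows. This is an illustration under simple assumptions, not the program's implementation: the data matrix is taken to be a list of rows (one per species) of equal length, and the function names are hypothetical.

```python
import random


def bootstrap(matrix, rng):
    """Sample N characters (columns) with replacement; width is unchanged."""
    n_chars = len(matrix[0])
    cols = [rng.randrange(n_chars) for _ in range(n_chars)]
    return [[row[c] for c in cols] for row in matrix]


def delete_half_jackknife(matrix, rng):
    """Keep a random half of the characters; none is duplicated."""
    n_chars = len(matrix[0])
    cols = sorted(rng.sample(range(n_chars), n_chars // 2))
    return [[row[c] for c in cols] for row in matrix]


def permute_within_characters(matrix, rng):
    """Shuffle each character (column) independently across species."""
    columns = [list(col) for col in zip(*matrix)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]
```

Each function takes a `random.Random` instance, so a fixed seed reproduces the same replicate. The permutation keeps the content of every column but breaks the association between species, which is why it tests for the presence of taxonomic structure rather than the variation around a tree.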
Learn
more by reading original document
Options about various methods for
phylogeny inference
Methods |
Nucleic
acid |
Protein |
Character
state methods |
|
|
Distance
Methods |
|
|
Maximum
likelihood methods |
|
|
Maximum
parsimony (heuristic search)
Learn
more by reading original document
Maximum
parsimony (branch and bound search)
Options |
Description |
At
most how many trees will be examined |
When
the number of trees searched reaches the number set by this option,
the method aborts and reports in the output file that it has not found
all of the most parsimonious trees, but prints out what it has found
so far anyway. These trees need not be any of the most parsimonious
trees: they are simply the most parsimonious ones found so far.
In its default setting the method will abort after looking at 100,000
trees.
Note:
this POWER option combines the options "How many
groups of 100 trees" and "How often to report, in trees"
of the original program. |
Branch
and bound is simple |
The
option alters a step in "Maximum parsimony (branch and bound
search)" which reconsiders the order in which species are added
to the tree. Normally the decision as to what species to add to
the tree next is made as the first tree is being constructed; that
ordering of species is not altered subsequently. This option causes
it to be continually reconsidered. This will probably result in
a substantial increase in run time, but on some data sets of intermediate
messiness it may help. |
Outgroup
root |
This
is described in the common options |
Use Threshold
parsimony? |
This
is described in the common options |
Learn
more by reading original document
Compatibility
method
Learn
more by reading original document
Distance
matrix computation
Options |
Description |
Substitution
model |
Jukes-Cantor:
Jukes and Cantor's
(1969) model assumes that there is independent change at all sites,
with equal probability. Whether a base changes is independent of
its identity, and when it changes there is an equal probability
of ending up with each of the other three bases.
Kimura
2 parameter:
this model is almost
as symmetric as Jukes
and Cantor's (1969) model, but allows for a difference between transition
and transversion rates.
Maximum Likelihood:
This model incorporates
different rates of transition and transversion, and also allows
for different frequencies of the four nucleotides. It is the model
which is used in "Maximum likelihood method". You will
find the model described in the original document for that method.
The transition probabilities for this model are also given by Kishino
and Hasegawa (1989).
Jin
and Nei: The Jin
and Nei (1990) distance uses Kimura's model of base substitution,
but assumes that the rate of substitution varies from site to site
according to a gamma distribution, with a coefficient
of variation
that is specified by the user.
|
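For orientation, the two simplest models above have well-known closed-form distances, sketched here in Python. These are the textbook formulas; whether the program uses these closed forms or estimates the distances differently (for example from a user-supplied transition/transversion ratio) is not stated here, so treat this as an assumption. The function names are hypothetical.

```python
import math


def count_differences(seq1, seq2):
    """Fractions of sites showing a transition and a transversion."""
    purines = {"A", "G"}
    ts = tv = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        if (a in purines) == (b in purines):
            ts += 1            # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1            # purine<->pyrimidine
    n = len(seq1)
    return ts / n, tv / n


def jukes_cantor_distance(p):
    """Jukes-Cantor distance from the fraction p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p_distance(ts, tv):
    """Kimura 2-parameter distance from transition/transversion fractions."""
    return -0.5 * math.log((1.0 - 2.0 * ts - tv) * math.sqrt(1.0 - 2.0 * tv))
```

When one third of the differences are transitions, as the Jukes-Cantor model expects, the two formulas give the same distance. They diverge as transitions become more frequent relative to transversions.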
Transition/transversion
ratio |
This
option sets the expected ratio of transitions to transversions.
Note that this is not the ratio of the first to the second kinds
of events, but the resulting expected ratio of transitions to transversions.
The exact relationship between these two quantities depends on the
frequencies in the base pools. |
Use
empirical base frequencies |
This
option is active when the substitution
model "Maximum Likelihood" is selected. That distance
requires that the program be provided with the equilibrium frequencies
of the four bases A, C, G, and T (or U). The default setting
uses the empirical frequencies of the bases, as observed in the input
sequences, which may save users much time.
These empirical frequencies are not really the maximum likelihood
estimates of the base frequencies, but they will often be close
to those values (they are the maximum likelihood estimates under
a "star" or "explosion" phylogeny). |
Coefficient
of variation of substitution rate among site |
If
the substitution
model "Jin and Nei" is selected, this option is active. The coefficient
of variation (CV) is different
from the parameters used by Nei and Jin but related to them: their
parameter a is related to the coefficient of variation by
$CV = 1/\sqrt{a}$, or equivalently $a = 1/CV^2$
(their
parameter b is absorbed here by the requirement that time is scaled
so that the mean rate of evolution is 1 per unit time, which means
that a = b). As we consider cases in which the rates are less variable
we should set a larger and larger, as CV gets smaller and smaller. |
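A small Python sketch of the conversion, assuming the standard property of a gamma distribution with shape parameter a that its coefficient of variation is 1/sqrt(a). The function names are hypothetical.

```python
def shape_from_cv(cv):
    """Gamma shape parameter a for a given coefficient of variation."""
    return 1.0 / (cv * cv)


def cv_from_shape(a):
    """Coefficient of variation for a given gamma shape parameter a."""
    return a ** -0.5
```

For example, a CV of 1.0 corresponds to a = 1, and halving the CV to 0.5 raises a to 4, matching the remark that a grows as the rates become less variable.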
Learn
more by reading original document
Neighbor-joining
and UPGMA method
Options |
Description |
Tree
constructing method |
Neighbor-joining:
this option constructs
a tree by successive clustering of lineages, setting branch lengths
as the lineages join. The tree is not rearranged thereafter. The
tree does not assume an evolutionary clock, so that it is in effect
an unrooted tree. It should be somewhat similar to the tree obtained
by "Fitch-Margoliash and least squares method". However
the algorithm is far faster than "Fitch-Margoliash and least
squares method" or "Fitch-Margoliash and least squares
method with molecular clock". This will make it particularly
effective in their place for large studies or for bootstrap or jackknife
resampling studies which require runs on multiple data sets.
UPGMA:
this option constructs
a tree by successive (agglomerative) clustering using an average-linkage
method of clustering. It has some relationship to "Fitch-Margoliash
and least squares method with molecular clock", in that when
the tree topology turns out the same, the branch lengths with UPGMA
will turn out to be the same as with the option
"POWER"
= 0 of "Fitch-Margoliash and least squares method with molecular
clock". |
Outgroup
root |
This
is described in the common options |
Randomize
input order of species
Random number seed |
These
are described in the common options
"Neighbor-joining
and UPGMA method" does not allow multiple jumbles (as most
of the other programs that have it do), as there is no objective
way of choosing which of the multiple results is best, there being
no explicit criterion for optimality of the tree. |
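The average-linkage clustering behind UPGMA can be sketched in a few lines of Python. This is a simplified illustration, not the program's code: `upgma` is a hypothetical function that returns the topology as nested tuples plus the height of the root, and ties between equally close pairs are broken arbitrarily.

```python
def upgma(names, dist):
    """Average-linkage clustering of a symmetric distance matrix.

    Returns (tree, root_height), where tree is a nested tuple of names.
    """
    # Cluster id -> (subtree, number of species in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    height = 0.0
    while len(clusters) > 1:
        # Join the two closest clusters.
        (a, b), dab = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance from the new cluster to each remaining cluster is the
        # size-weighted average of the distances from its two parts.
        new_d = {}
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        height = dab / 2.0         # the join sits halfway between the clusters
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, height
```

Because each join is placed at half the distance between the two clusters, the resulting tree is clocklike (ultrametric), in contrast to Neighbor-joining, which does not assume an evolutionary clock.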
Learn
more by reading original document
Fitch-Margoliash
and least squares method
Options |
Description |
POWER |
The methods "Fitch-Margoliash
and least squares method" and "Fitch-Margoliash and
least squares method with molecular clock" carry out the
method of Fitch and Margoliash (1967) for fitting trees to distance
matrices. They also are able to carry out the least squares method
of Cavalli-Sforza and Edwards (1967), plus a variety of other
methods of the same family (see the discussion of the "POWER"
option below).
The objective of these methods is to find that tree which minimizes
$$\mathrm{SSQ} = \sum_i \sum_j \frac{n_{ij}\,(D_{ij} - d_{ij})^2}{D_{ij}^{P}}$$
where $D_{ij}$
is the observed distance between species i and j and $d_{ij}$ is the expected
distance, computed as the sum of the lengths (amounts of evolution)
of the segments of the tree from species i to species j. The quantity
$n_{ij}$ is the number of times each distance has been replicated. In simple
cases this is taken to be one, but the user can, as an option, specify
the degree of replication for each distance (POWER does not provide
this option). The distance is then assumed to be a mean of those
replicates. The power P is what distinguishes the various methods.
For the Fitch-Margoliash method, which is the default method with
this program, P is 2.0. For the Cavalli-Sforza and Edwards least
squares method it should be set to 0 (so that the denominator is
always 1). An intermediate method is also available in which P is
1.0, and any other value of P, such as 4.0 or -2.3, can also be
used. This generates a whole family of methods.
Note:
if the option "POWER" = 2.0, Fitch and Margoliash's "average
percent standard deviation" is also computed and printed
in the output file. This is the sum of squares (SSQ), divided by N-2, and then
square-rooted and then multiplied by 100:
$$\mathrm{APSD} = \left(\frac{\mathrm{SSQ}}{N-2}\right)^{1/2} \times 100$$
where
N is the total number of off-diagonal distance measurements that
are in
the (square) distance matrix. Without replication, N = n(n-1), where n
is the number of species on the tree. |
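The criterion can be made concrete with a short Python sketch. It assumes the weighted least squares form SSQ = sum over i, j of n_ij (D_ij - d_ij)^2 / D_ij^P with every n_ij equal to 1 (POWER offers no replication), summed over all off-diagonal cells of the square matrix, and APSD = sqrt(SSQ / (N - 2)) x 100. The function names are hypothetical.

```python
import math


def sum_of_squares(observed, expected, power=2.0):
    """Weighted sum of squares over all off-diagonal cells (n_ij = 1).

    Observed off-diagonal distances must be positive when power > 0,
    otherwise the weight 1 / D**power is undefined.
    """
    n = len(observed)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                diff = observed[i][j] - expected[i][j]
                total += diff ** 2 / observed[i][j] ** power
    return total


def average_percent_sd(observed, expected):
    """Fitch and Margoliash's average percent standard deviation (P = 2)."""
    n = len(observed)
    n_cells = n * (n - 1)          # off-diagonal cells of the square matrix
    ssq = sum_of_squares(observed, expected, power=2.0)
    return math.sqrt(ssq / (n_cells - 2)) * 100.0
```

With `power=0` every cell has weight 1, which is the unweighted least squares criterion of Cavalli-Sforza and Edwards. With `power=2.0` errors are measured relative to the observed distance, which is the Fitch-Margoliash criterion.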
Negative
branch lengths allowed |
This
option indicates that negative segment lengths are to be allowed
in the tree (default is to require that all branch lengths be nonnegative). |
Outgroup
root |
This
is described in the common options |
Global rearrangements |
This
is described in the common options |
Randomize
input order of species
Random number seed
Number of times to
jumble |
These
are described in the common options |
Learn
more by reading original document
Fitch-Margoliash
and least squares method with molecular clock
Learn
more by reading original document
Maximum
likelihood method
Options |
Description |
Transition/transversion
ratio |
This
is described in the options of distance matrix computation |
Use empirical
base frequencies |
This
is described in the options of distance matrix computation |
One
category of substitution rates |
The
following options allow the user to specify how many categories
of substitution rates there will be, and what the rate and
probability of each category are.
The default
for the method is one category, with rate 1.0 and probability 1.0
(actually the rate does not matter in that case). |
Number of
categories |
Enter
how many categories there will be (for the moment there is an upper
limit of 9, which should not be restrictive). |
Rate
for each category |
Enter
the rates for each category. These rates are only meaningful relative
to each other, so that rates 1.0, 2.0, and 2.4 have the exact same
effect as rates 2.0, 4.0, and 4.8. Note that a category can have
rate of change 0, so that this allows us to take into account that
there may be a category of sites that are invariant. (Note that
the run time of the program will be proportional to the number of
rate categories: twice as many categories means twice as long a
run.) |
Probability
for each category |
Enter
the probabilities of a random site falling into each of these categories.
These probabilities must be nonnegative and sum to 1. |
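One way to see why only the relative rates matter is to rescale them so that the mean rate over sites is 1, as in the Python sketch below. Whether the program rescales internally in exactly this way is an assumption, and `normalize_rates` is a hypothetical helper. The sketch also checks the stated constraints on the probabilities.

```python
def normalize_rates(rates, probs):
    """Rescale category rates so that the probability-weighted mean is 1."""
    if len(rates) != len(probs):
        raise ValueError("one probability is needed per rate category")
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must be nonnegative and sum to 1")
    mean = sum(r * p for r, p in zip(rates, probs))
    if mean == 0:
        raise ValueError("at least one category needs a nonzero rate")
    return [r / mean for r in rates]
```

Rates 1.0, 2.0, 2.4 and rates 2.0, 4.0, 4.8 normalize to the same values for any given set of probabilities. A category with rate 0 stays at 0 and represents invariant sites.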
Global rearrangements |
This
is described in the common options |
Randomize
input order of species
Random number seed
Number of times to
jumble |
These
are described in the common options |
Outgroup
root |
This
is described in the common options |
Learn
more by reading original document
Maximum
likelihood method with molecular clock
Learn
more by reading original document
Maximum
parsimony (heuristic search)
Learn
more by reading original document
Distance
matrix computation
Options |
Description |
Substitution
model |
Dayhoff
PAM matrix: This
uses Dayhoff's PAM 001 matrix from Dayhoff (1979), page 348. The
PAM model is an empirical one that scales probabilities of change
from one amino acid to another in terms of a unit which is an expected
1% change between two amino acid sequences. The PAM 001 matrix is
used to make a transition probability matrix which allows prediction
of the probability of changing from any one amino acid to any other,
and also predicts equilibrium amino acid composition. The program
assumes that these probabilities are correct and bases its computations
of distance on them. The distance that is computed is scaled in
units of expected fraction of amino acids changed.
Kimura
formula: This is
a rough-and-ready distance formula for approximating PAM distance
by simply measuring the fraction of amino acids, p, that differ
between two sequences and computing the distance as (Kimura, 1983)
$$D = -\ln\left(1 - p - 0.2\,p^2\right)$$
This
is very quick to do but has some obvious limitations. It does not
take into account which amino acids differ or to what amino acids
they change, so some information is lost. The units of the distance
measure are fraction of amino acids differing, as also in the case
of the PAM distance. If the fraction of amino acids differing gets
larger than 0.8541 the distance becomes infinite.
Categories
model: This is J.
Felsenstein's own concoction. He imagined a nucleotide sequence
changing according to Kimura's 2-parameter model, with the exception
that some changes of amino acids are less likely than others. The
amino acids are grouped into a series of categories. Any base change
that does not change which category the amino acid is in is allowed,
but if an amino acid changes category this is allowed only a certain
fraction of the time. The fraction is called the "ease"
and there is a parameter for it (specified in the option "Ease
of changing category of amino acid?"), which is 1.0 when all
changes are allowed and near 0.0 when changes between categories
are nearly impossible.
With
this selection, the user can also set the transition/transversion
ratio (specified in the option "Transition/transversion ratio"),
which of several genetic codes to use (specified in the option "Use
which genetic code?"), and which categorization of amino acids
to use (specified in the option "Which categorization of amino
acids?"). |
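The Kimura formula option is simple enough to sketch directly in Python. The sketch assumes the distance takes the form D = -ln(1 - p - 0.2 p^2), which is how Kimura's (1983) approximation is usually written, and the function name is hypothetical.

```python
import math


def kimura_protein_distance(p):
    """Approximate PAM distance from the fraction p of differing amino acids."""
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0.0:
        # Reached once p exceeds about 0.8541, where the logarithm's
        # argument is no longer positive.
        return math.inf
    return -math.log(inner)
```

The limit of 0.8541 is the positive root of 1 - p - 0.2 p^2 = 0. Since the function depends only on p, two pairs of sequences with the same fraction of differences get the same distance regardless of which amino acids differ.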
Use
which genetic code? |
This
option allows the user to select among various nuclear and mitochondrial
genetic codes. |
Which
categorization of amino acids? |
George/Hunt/Barker:
The George-Hunt-Barker (1988) classification of amino acids. The
categorizations of amino acids are (Glu Gln Asp Asn), (Lys Arg His),
(Phe Tyr Trp), (Cys), (Met Val Leu Ileu), (Gly Ala Ser Thr Pro).
Chemical:
A classification found in an old "baby biochemistry" book (Conn
and Stumpf, 1963). The categorizations of amino acids are (Glu Gln
Asp Asn), (Lys Arg His), (Phe Tyr Trp), (Cys Met), (Val Leu Ileu
Gly Ala Ser Thr), (Pro).
Hall:
A classification provided by Ben Hall, a colleague of J. Felsenstein.
The categorizations of amino acids are (Glu Gln Asp Asn), (Lys Arg
His), (Phe Tyr Trp), (Cys), (Met Val Leu Ileu), (Gly Ala Ser Thr),
(Pro). |
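The three categorizations listed above can be written down as plain data, which makes their differences easy to check. The Python sketch below copies the groups as given; the names `CATEGORIZATIONS`, `category_of` and `same_category` are hypothetical.

```python
CATEGORIZATIONS = {
    "George/Hunt/Barker": [
        ("Glu", "Gln", "Asp", "Asn"), ("Lys", "Arg", "His"),
        ("Phe", "Tyr", "Trp"), ("Cys",),
        ("Met", "Val", "Leu", "Ileu"), ("Gly", "Ala", "Ser", "Thr", "Pro"),
    ],
    "Chemical": [
        ("Glu", "Gln", "Asp", "Asn"), ("Lys", "Arg", "His"),
        ("Phe", "Tyr", "Trp"), ("Cys", "Met"),
        ("Val", "Leu", "Ileu", "Gly", "Ala", "Ser", "Thr"), ("Pro",),
    ],
    "Hall": [
        ("Glu", "Gln", "Asp", "Asn"), ("Lys", "Arg", "His"),
        ("Phe", "Tyr", "Trp"), ("Cys",),
        ("Met", "Val", "Leu", "Ileu"), ("Gly", "Ala", "Ser", "Thr"), ("Pro",),
    ],
}


def category_of(amino_acid, scheme):
    """Index of the category that contains the amino acid."""
    for index, group in enumerate(CATEGORIZATIONS[scheme]):
        if amino_acid in group:
            return index
    raise KeyError(amino_acid)


def same_category(a, b, scheme):
    """True if a change between a and b stays within one category."""
    return category_of(a, scheme) == category_of(b, scheme)
```

For example, a change between Gly and Pro stays within one category under George/Hunt/Barker but crosses categories under Hall, so under the Categories model it would be subject to the "ease" parameter only in the latter case.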
Ease
of changing category of amino acid? |
This
is described under the selection "Categories model" of the option
"Substitution model". |
Transition/transversion
ratio |
This
is described in the options of distance matrix computation for nucleic acids |
Equal
base Frequencies |
Use equal base frequencies,
or enter the frequencies of A, C, G, and T/U |
Learn
more by reading original document