Common options

Options Description

Randomize input order of species
In most of the tree construction methods, the exact details of the search of different trees depend on the order of input of species. In these methods this option tells the program to use a random number generator to choose the input order of species.
Random number seed
This sets the seed for the random number generator. Each different seed leads to a different sequence of addition of species. By simply changing the random number seed and performing the analysis again, one can look for other, and possibly better, trees.
Number of times to jumble
This sets how many times the search is restarted with a new random input order of species. If you select 10, the program will try ten different orders of species in constructing the trees, and the results printed out will reflect this entire search process (that is, the best trees found among all 10 runs will be printed out, not the best trees from each individual run).

Outgroup root
This specifies which species is to be used to root the tree by having it become the outgroup.
Use Threshold parsimony?
This sets a threshold such that if the number of steps counted in a character is higher than the threshold, it will be taken to be the threshold value rather than the actual number of steps. The default is a threshold so high that it will never be surpassed. The use of thresholds to obtain methods intermediate between parsimony and compatibility methods is described in J. Felsenstein's 1981b paper.
Global rearrangements
In the methods which construct trees (except "Neighbor-joining and UPGMA method" and "Maximum parsimony (branch and bound search) method"), after all species have been added to the tree a rearrangement phase follows. In most of these methods the rearrangements are automatically global, which in this case means that subtrees will be removed from the tree and put back on in all possible ways so as to have a better chance of finding a better tree. Since this can be time consuming (it roughly triples the time taken for a run), it is left as an option in some of the methods, specifically "Fitch-Margoliash and least squares method" and "Maximum likelihood method". In these methods the option "Global rearrangements" toggles between the default of local rearrangement and global rearrangement.
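The interaction of "Random number seed" and "Number of times to jumble" can be sketched as follows. This is a minimal illustration, not the program's code: `build_tree` is a hypothetical callable standing in for one complete tree search from a given input order, returning a score (lower is better, as with tree length in parsimony) and a hashable tree. The seed fixes the whole sequence of random orders, and the best trees are pooled over all jumbles rather than reported per run.

```python
import random
from typing import Callable, Hashable, List, Sequence, Tuple

# Hypothetical stand-in for one full tree search: takes an input order of
# species and returns (score, tree), where a lower score is better.
BuildTree = Callable[[List[str]], Tuple[float, Hashable]]


def jumbled_search(species: Sequence[str], build_tree: BuildTree,
                   seed: int, jumbles: int) -> Tuple[float, List[Hashable]]:
    """Repeat the search with `jumbles` random input orders and return the
    best score with every distinct tree achieving it, pooled over all runs
    (not the best tree of each individual run)."""
    rng = random.Random(seed)  # the same seed reproduces the same orders
    best_score = float("inf")
    best_trees: List[Hashable] = []
    for _ in range(jumbles):
        order = list(species)
        rng.shuffle(order)  # a new random input order for this jumble
        score, tree = build_tree(order)
        if score < best_score:
            best_score, best_trees = score, [tree]
        elif score == best_score and tree not in best_trees:
            best_trees.append(tree)
    return best_score, best_trees
```

Calling it twice with the same seed returns the same result; changing the seed explores a different sequence of input orders.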

Learn more by reading original document


Options about bootstrap analysis

Options Description
Odd random number
This sets the seed for the random number generator; as the option name indicates, the seed should be an odd number.
Number of replicates

This sets the number of replicate data sets. The default is 100. Most statisticians would be happiest with 1000 to 10,000 replicates in a bootstrap, but 100 gives a good rough picture.

Note: at present POWER allows at most 1000 replicates.

Resampling methods

The bootstrap: bootstrapping was invented by Bradley Efron in 1979, and its use in phylogeny estimation was introduced by J. Felsenstein (Felsenstein, 1985b). It involves creating a new data set by sampling N characters randomly with replacement, where N is the number of characters in the original data set, so that the resulting data set has the same size as the original, but some characters have been left out and others are duplicated. The random variation of the results from analyzing these bootstrapped data sets can be shown statistically to be typical of the variation that you would get from collecting new data sets. The method assumes that the characters evolve independently, an assumption that may not be realistic for many kinds of data.

Delete-half-jackknifing: this alternative to the bootstrap involves sampling a random half of the characters, and including them in the data but dropping the others. The resulting data sets are half the size of the original, and no characters are duplicated. The random variation from doing this should be very similar to that obtained from the bootstrap. The method is advocated by Wu (1986).

Permuting species within characters: this method of resampling (well, OK, it may not be best to call it resampling) was introduced by Archie (1989) and Faith (1990; see also Faith and Cranston, 1991). It involves permuting the states of the species within each column (character) of the data matrix, separately for each column. This produces data matrices that have the same number and kinds of characters but no taxonomic structure. It is used for different purposes than the bootstrap, as it tests not the variation around an estimated tree but the hypothesis that there is no taxonomic structure in the data: if a statistic such as number of steps is significantly smaller in the actual data than it is in replicates that are permuted, then we can argue that there is some taxonomic structure in the data (though perhaps it might be just a pair of sibling species).
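The three resampling schemes differ only in how the columns (characters) of the data matrix are chosen or rearranged. A minimal sketch, assuming the data matrix is a dictionary from species name to an aligned string of character states (an illustrative layout, not the program's input format); for an odd number of characters the jackknife here keeps the smaller half, which is an arbitrary choice.

```python
import random
from typing import Dict, List

# Illustrative layout: species name -> aligned string of character states.
Matrix = Dict[str, str]


def _n_chars(data: Matrix) -> int:
    return len(next(iter(data.values())))


def bootstrap(data: Matrix, rng: random.Random) -> Matrix:
    """Draw N characters with replacement, N = original number of characters."""
    n = _n_chars(data)
    picks = [rng.randrange(n) for _ in range(n)]
    return {sp: "".join(seq[c] for c in picks) for sp, seq in data.items()}


def delete_half_jackknife(data: Matrix, rng: random.Random) -> Matrix:
    """Keep a random half of the characters (no duplicates), in original order."""
    n = _n_chars(data)
    keep = sorted(rng.sample(range(n), n // 2))
    return {sp: "".join(seq[c] for c in keep) for sp, seq in data.items()}


def permute_within_characters(data: Matrix, rng: random.Random) -> Matrix:
    """Shuffle the species' states independently within every character."""
    species = list(data)
    columns: List[List[str]] = []
    for c in range(_n_chars(data)):
        column = [data[sp][c] for sp in species]
        rng.shuffle(column)  # each column gets its own independent shuffle
        columns.append(column)
    return {sp: "".join(col[i] for col in columns)
            for i, sp in enumerate(species)}
```

For a bootstrap analysis one would call `bootstrap` once per replicate with a single seeded `random.Random`, analyze each replicate data set, and then summarize the results over all replicates.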

Learn more by reading original document


Options about various methods for phylogeny inference

Methods Nucleic acid Protein
Character state methods
Distance methods
Maximum likelihood methods


Maximum parsimony (heuristic search)

Options Description

Randomize input order of species

Random number seed

Number of times to jumble

These are described in the common options

Outgroup root
This is described in the common options
Use Threshold parsimony?
This is described in the common options
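To make the threshold option concrete, here is a minimal sketch of how a tree's length could be scored with and without a threshold: each character contributes min(steps, threshold). Steps are counted with Fitch's small-parsimony algorithm on a rooted binary tree written as nested pairs; that algorithm and the tree representation are illustrative choices, not necessarily how the program counts steps internally.

```python
from typing import Dict, Set, Tuple, Union

# Illustrative rooted binary tree: a leaf is a species name,
# an internal node is a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_steps(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of changes of one character on `tree` (Fitch's algorithm)."""
    def visit(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_steps = visit(node[0])
        right_set, right_steps = visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_steps + right_steps
        # No shared state: one extra change is needed at this node.
        return left_set | right_set, left_steps + right_steps + 1
    return visit(tree)[1]


def tree_length(tree: Tree, data: Dict[str, str],
                threshold: float = float("inf")) -> float:
    """Sum over characters of min(steps, threshold). The default threshold
    can never be surpassed, which gives ordinary parsimony."""
    n_chars = len(next(iter(data.values())))
    total = 0.0
    for c in range(n_chars):
        steps = fitch_steps(tree, {sp: seq[c] for sp, seq in data.items()})
        total += min(steps, threshold)  # a character never counts above the threshold
    return total
```

For example, on the tree `(("A", "B"), ("C", "D"))` a character with states A, C, G, T needs 3 steps, but with a threshold of 2 it contributes only 2 to the tree length.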

Learn more by reading original document

Maximum parsimony (branch and bound search)

Options Description
At most how many trees will be examined

When the number of trees examined reaches the number set by this option, the method aborts, reports in the outfile that it has not found all the most parsimonious trees, and prints out the trees it has found so far anyway. These trees need not be any of the most parsimonious trees: they are simply the most parsimonious ones found so far. In its default setting the method will abort after looking at 100,000 trees.

Note: this POWER option combines the options "How many groups of 100 trees" and "How often to report, in trees" of the original program.

Branch and bound is simple
The option alters a step in "Maximum parsimony (branch and bound search)" which reconsiders the order in which species are added to the tree. Normally the decision as to what species to add to the tree next is made as the first tree is being constructed; that ordering of species is not altered subsequently. This option causes it to be continually reconsidered. This will probably result in a substantial increase in run time, but on some data sets of intermediate messiness it may help.
Outgroup root
This is described in the common options
Use Threshold parsimony?
This is described in the common options

Learn more by reading original document

Compatibility method

Options Description

Randomize input order of species

Random number seed

Number of times to jumble

These are described in the common options

Outgroup root
This is described in the common options

Learn more by reading original document


Distance matrix computation

Options Description
Substitution model

Jukes-Cantor: Jukes and Cantor's (1969) model assumes that there is independent change at all sites, with equal probability. Whether a base changes is independent of its identity, and when it changes there is an equal probability of ending up with each of the other three bases.

Kimura 2 parameter: this model is almost as symmetric as Jukes and Cantor's (1969) model, but allows for a difference between transition and transversion rates.


Maximum Likelihood: This is a model incorporating different rates of transition and transversion that also allows for different frequencies of the four nucleotides. It is the model used in "Maximum likelihood method", and it is described in the original document for that method. The transition probabilities for this model are also given by Kishino and Hasegawa (1989).

Jin and Nei: The Jin and Nei (1990) distance uses Kimura's model of base substitution, but assumes that the rate of substitution varies from site to site according to a gamma distribution, with a coefficient of variation that is specified by the user.

Transition/transversion ratio
This sets the expected ratio of transitions to transversions. Note that this is not the ratio of the rates of the model's two kinds of substitution events, but the resulting expected ratio of transitions to transversions. The exact relationship between these two quantities depends on the frequencies in the base pools.
Use empirical base frequencies
This option is active when the substitution model "Maximum Likelihood" is selected; that distance requires the equilibrium frequencies of the four bases A, C, G, and T (or U). The default setting, which may save users much time, uses the empirical frequencies of the bases observed in the input sequences as the base frequencies. These empirical frequencies are not really the maximum likelihood estimates of the base frequencies, but they will often be close to those values (what they are is maximum likelihood estimates under a "star" or "explosion" phylogeny).
Coefficient of variation of substitution rate among sites
If the substitution model "Jin and Nei" is selected, this option is active. This coefficient of variation (CV) is different from the parameters used by Jin and Nei but related to them: their parameter a is related to the coefficient of variation by

$$\mathrm{CV} = \frac{1}{\sqrt{a}}, \qquad \text{i.e.} \qquad a = \frac{1}{\mathrm{CV}^{2}}$$

(their parameter b is absorbed here by the requirement that time is scaled so that the mean rate of evolution is 1 per unit time, which means that a = b). As we consider cases in which the rates are less variable, we should set a larger and larger, as CV gets smaller and smaller.
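The Jukes-Cantor and Kimura 2-parameter distances have well-known closed forms, and the Jin and Nei option corresponds to the gamma version of the Kimura distance. A minimal sketch, assuming aligned, equal-length sequences containing only A, C, G and T (or U, used consistently), with no gaps or ambiguity codes. The Jin and Nei function converts the user's CV into the gamma shape parameter a = 1/CV², which holds for a gamma distribution with mean 1. These are textbook formulas; the program may differ in details such as how it treats saturated sequences.

```python
import math
from typing import Tuple

PURINES = set("AG")
PYRIMIDINES = set("CTU")


def _check(s1: str, s2: str) -> Tuple[str, str]:
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be non-empty and of equal length")
    return s1.upper(), s2.upper()


def jukes_cantor(s1: str, s2: str) -> float:
    """d = -3/4 ln(1 - 4p/3), p = fraction of sites that differ."""
    s1, s2 = _check(s1, s2)
    p = sum(a != b for a, b in zip(s1, s2)) / len(s1)
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        raise ValueError("sequences too divergent for this distance")
    return -0.75 * math.log(arg)


def transition_transversion(s1: str, s2: str) -> Tuple[float, float]:
    """Fractions of sites showing a transition (P) and a transversion (Q)."""
    s1, s2 = _check(s1, s2)
    ts = tv = 0
    for a, b in zip(s1, s2):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # A<->G or C<->T/U
        else:
            tv += 1
    return ts / len(s1), tv / len(s1)


def _kimura_terms(s1: str, s2: str) -> Tuple[float, float]:
    P, Q = transition_transversion(s1, s2)
    x, y = 1.0 - 2.0 * P - Q, 1.0 - 2.0 * Q
    if x <= 0.0 or y <= 0.0:
        raise ValueError("sequences too divergent for this distance")
    return x, y


def kimura_2p(s1: str, s2: str) -> float:
    """d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    x, y = _kimura_terms(s1, s2)
    return -0.5 * math.log(x) - 0.25 * math.log(y)


def jin_nei_gamma(s1: str, s2: str, cv: float) -> float:
    """Gamma version of the Kimura distance, with shape a = 1 / CV^2."""
    a = 1.0 / cv ** 2
    x, y = _kimura_terms(s1, s2)
    return 0.5 * a * (x ** (-1.0 / a) + 0.5 * y ** (-1.0 / a) - 1.5)
```

As CV shrinks toward 0 (a grows), `jin_nei_gamma` approaches `kimura_2p`; for any positive CV it gives a larger distance, since rate variation hides some multiple changes.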

Learn more by reading original document


Neighbor-joining and UPGMA method

Options Description
Tree constructing method

Neighbor-joining: this option constructs a tree by successive clustering of lineages, setting branch lengths as the lineages join. The tree is not rearranged thereafter. The tree does not assume an evolutionary clock, so that it is in effect an unrooted tree. It should be somewhat similar to the tree obtained by "Fitch-Margoliash and least squares method". However, the algorithm is far faster than "Fitch-Margoliash and least squares method" or "Fitch-Margoliash and least squares method with molecular clock". This makes it particularly useful in their place for large studies and for bootstrap or jackknife resampling studies, which require runs on multiple data sets.

UPGMA: this option constructs a tree by successive (agglomerative) clustering using an average-linkage method of clustering. It has some relationship to "Fitch-Margoliash and least squares method with molecular clock", in that when the tree topology turns out the same, the branch lengths with UPGMA will turn out to be the same as with the option "POWER" = 0 of "Fitch-Margoliash and least squares method with molecular clock".

Outgroup root
This is described in the common options

Randomize input order of species

Random number seed

These are described in the common options

"Neighbor-joining and UPGMA method" does not allow multiple jumbles (as most of the other methods that offer a random input order do), because there is no objective way of choosing which of the multiple results is best: there is no explicit criterion for optimality of the tree.

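Both tree-building options can be sketched in a few lines. This is a minimal illustration, assuming a complete symmetric distance matrix stored as a dictionary of dictionaries; it returns the sequence of joins rather than a formatted tree, uses the textbook neighbor-joining criterion, and breaks ties by taking the first pair found, so it is not the program's own implementation.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Complete symmetric matrix: D[i][j] == D[j][i]; diagonal entries are ignored.
DistMatrix = Dict[str, Dict[str, float]]


def _merge(D: DistMatrix, a: str, b: str, row: Dict[str, float]) -> str:
    """Replace nodes a and b by a new node "(a,b)" whose distances are `row`."""
    u = f"({a},{b})"
    del D[a], D[b]
    for k, row_k in D.items():
        row_k.pop(a, None)
        row_k.pop(b, None)
        row_k[u] = row[k]
    D[u] = row
    return u


def neighbor_joining(D: DistMatrix):
    """Return (joins, last_edge); each join is (a, b, length_a, length_b)."""
    D = {i: dict(row) for i, row in D.items()}  # work on a copy
    joins: List[Tuple[str, str, float, float]] = []
    while len(D) > 2:
        n = len(D)
        r = {i: sum(v for k, v in D[i].items() if k != i) for i in D}
        a, b = min(combinations(D, 2),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])
        la = D[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        joins.append((a, b, la, D[a][b] - la))
        row = {k: (D[a][k] + D[b][k] - D[a][b]) / 2 for k in D if k not in (a, b)}
        _merge(D, a, b, row)
    x, y = D
    return joins, (x, y, D[x][y])


def upgma(D: DistMatrix) -> List[Tuple[str, str, float]]:
    """Return merges (a, b, height); average linkage, height = distance / 2."""
    D = {i: dict(row) for i, row in D.items()}
    size = {i: 1 for i in D}
    merges: List[Tuple[str, str, float]] = []
    while len(D) > 1:
        a, b = min(combinations(D, 2), key=lambda p: D[p[0]][p[1]])
        merges.append((a, b, D[a][b] / 2))
        wa, wb = size[a], size[b]
        # Average linkage: weight each cluster's distance by its size.
        row = {k: (wa * D[a][k] + wb * D[b][k]) / (wa + wb)
               for k in D if k not in (a, b)}
        size[_merge(D, a, b, row)] = wa + wb
    return merges
```

Because UPGMA assumes a clock, each merge is placed at half the distance between the merged clusters, giving a rooted tree; neighbor-joining instead gives the two joined nodes separate branch lengths and leaves the tree unrooted.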
Learn more by reading original document


Fitch-Margoliash and least squares method

Options Description
POWER

The methods "Fitch-Margoliash and least squares method" and "Fitch-Margoliash and least squares method with molecular clock" carry out the method of Fitch and Margoliash (1967) for fitting trees to distance matrices. They also are able to carry out the least squares method of Cavalli-Sforza and Edwards (1967), plus a variety of other methods of the same family (see the discussion of the "POWER" option below).


The objective of these methods is to find the tree which minimizes

$$\sum_{i}\sum_{j \ne i} \frac{n_{ij}\left(D_{ij} - d_{ij}\right)^{2}}{D_{ij}^{P}}$$

($D_{ij}$ is the observed distance between species i and j, and $d_{ij}$ is the expected distance, computed as the sum of the lengths (amounts of evolution) of the segments of the tree from species i to species j. The quantity $n_{ij}$ is the number of times each distance has been replicated. In simple cases this is taken to be one, but the user can, as an option, specify the degree of replication for each distance (POWER does not provide this option). The distance is then assumed to be a mean of those replicates. The power P is what distinguishes the various methods. For the Fitch-Margoliash method, which is the default method with this program, P is 2.0. For the Cavalli-Sforza and Edwards least squares method it should be set to 0 (so that the denominator is always 1). An intermediate method is also available in which P is 1.0, and any other value of P, such as 4.0 or -2.3, can also be used. This generates a whole family of methods.)

Note: if the option "POWER" = 2.0, Fitch and Margoliash's "average percent standard deviation" (APSD) is also computed and printed in the outfile. This is the sum of squares, divided by N - 2, then square-rooted and multiplied by 100:

$$\mathrm{APSD} = 100\sqrt{\frac{1}{N-2}\sum_{i}\sum_{j \ne i} \frac{n_{ij}\left(D_{ij} - d_{ij}\right)^{2}}{D_{ij}^{2}}}$$

where N is the total number of off-diagonal distance measurements that are in the (square) distance matrix; for n species without replication, N = n(n - 1).

Negative branch lengths allowed
This option indicates that negative segment lengths are to be allowed in the tree (default is to require that all branch lengths be nonnegative).
Outgroup root
This is described in the common options
Global rearrangements
This is described in the common options

Randomize input order of species

Random number seed

Number of times to jumble

These are described in the common options
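The criterion these methods minimize, the sum over off-diagonal cells of n_ij (D_ij − d_ij)² / D_ij^P, and the average percent standard deviation (100 times the square root of that sum at P = 2 divided by N − 2) can be computed as below. A minimal sketch: in the real method d_ij comes from path lengths on a candidate tree and the program searches over trees, whereas here the expected distances are passed in directly, and the dictionary-of-pairs layout and the helper `symmetric` are illustrative assumptions. Zero observed distances would divide by zero when P > 0.

```python
import math
from typing import Dict, Optional, Tuple

Pair = Tuple[str, str]  # an ordered pair (i, j) with i != j


def symmetric(upper: Dict[Pair, float]) -> Dict[Pair, float]:
    """Fill in both (i, j) and (j, i) so every off-diagonal cell is present."""
    full = dict(upper)
    full.update({(j, i): v for (i, j), v in upper.items()})
    return full


def fm_criterion(D: Dict[Pair, float], d: Dict[Pair, float], P: float = 2.0,
                 n: Optional[Dict[Pair, float]] = None) -> float:
    """Sum over off-diagonal cells of n_ij (D_ij - d_ij)^2 / D_ij^P."""
    total = 0.0
    for pair, observed in D.items():
        reps = 1.0 if n is None else n[pair]  # replication defaults to one
        total += reps * (observed - d[pair]) ** 2 / observed ** P
    return total


def apsd(D: Dict[Pair, float], d: Dict[Pair, float]) -> float:
    """Average percent standard deviation (P = 2, no replication)."""
    N = len(D)  # number of off-diagonal measurements
    return 100.0 * math.sqrt(fm_criterion(D, d, P=2.0) / (N - 2))
```

Setting `P=0.0` gives the unweighted least squares criterion of Cavalli-Sforza and Edwards, since every denominator becomes 1.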

Learn more by reading original document


Fitch-Margoliash and least squares method with molecular clock

Options Description
POWER
This is described in the options of Fitch-Margoliash and least squares method
Negative branch lengths allowed
This is described in the options of Fitch-Margoliash and least squares method

Randomize input order of species

Random number seed

Number of times to jumble

These are described in the common options

Learn more by reading original document


Maximum likelihood method

Options Description
Transition/transversion ratio
This is described in the options of distance matrix computation
Use empirical base frequencies
This is described in the options of distance matrix computation
One category of substitution rates

The following options allow the user to specify how many categories of substitution rates there will be, and what the rates and probabilities for each are.

Default for the method is one category, with rate 1.0 and probability 1.0 (actually the rate does not matter in that case).

Number of categories
Enter how many categories there will be (for the moment there is an upper limit of 9, which should not be restrictive).
Rate for each category
Enter the rates for each category. These rates are only meaningful relative to each other, so that rates 1.0, 2.0, and 2.4 have the exact same effect as rates 2.0, 4.0, and 4.8. Note that a category can have rate of change 0, so that this allows us to take into account that there may be a category of sites that are invariant. (Note that the run time of the program will be proportional to the number of rate categories: twice as many categories means twice as long a run.)
Probability for each category

Enter the probabilities of a random site falling into each of these categories. These probabilities must be nonnegative and sum to 1.

Global rearrangements
This is described in the common options

Randomize input order of species

Random number seed

Number of times to jumble

These are described in the common options
Outgroup root
This is described in the common options
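The rate-category options can be illustrated with the likelihood of just two aligned sequences separated by branch length t. This sketch swaps in the Jukes-Cantor model for brevity (the method itself uses the model with a transition/transversion ratio and base frequencies described under distance matrix computation), and it rescales the rates so that their probability-weighted mean is 1, which is one way (assumed here) of making only the relative rates matter. Each site's likelihood is the probability-weighted average over categories, which is why run time grows in proportion to the number of categories.

```python
import math
from typing import Sequence


def jc_transition(same: bool, rt: float) -> float:
    """Jukes-Cantor probability of a given end base after expected change rt."""
    e = math.exp(-4.0 * rt / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def pair_log_likelihood(s1: str, s2: str, t: float,
                        rates: Sequence[float], probs: Sequence[float]) -> float:
    """Log-likelihood of two aligned sequences at branch length t, with each
    site falling into rate category k with probability probs[k]."""
    if not 1 <= len(rates) == len(probs) <= 9:
        raise ValueError("need 1 to 9 categories, one probability per rate")
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must be nonnegative and sum to 1")
    mean = sum(r * p for r, p in zip(rates, probs))
    if mean <= 0:
        raise ValueError("at least one category needs a positive rate")
    scaled = [r / mean for r in rates]  # assumption: normalize the mean rate to 1
    total = 0.0
    for x, y in zip(s1, s2):
        # 0.25 is the equilibrium frequency of the starting base under Jukes-Cantor;
        # a rate-0 category contributes only to sites where the bases agree.
        site = sum(p * 0.25 * jc_transition(x == y, r * t)
                   for r, p in zip(scaled, probs))
        total += math.log(site)
    return total
```

With this normalization, rates 1.0, 2.0, 2.4 and rates 2.0, 4.0, 4.8 (with the same probabilities) give the same log-likelihood.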

Learn more by reading original document


Maximum likelihood method with molecular clock

Options Description
Transition/transversion ratio
This is described in the options of distance matrix computation
Use empirical base frequencies
This is described in the options of distance matrix computation

One category of substitution rates

Number of categories

Rate for each category

Probability for each category

These are described in the options of maximum likelihood method

Global rearrangements
This is described in the common options

Randomize input order of species

Random number seed

Number of times to jumble

These are described in the common options

Learn more by reading original document


Maximum parsimony (heuristic search)

Options Description

Randomize input order of species

Random number seed

Number of times to jumble

These are described in the common options

Outgroup root
This is described in the common options
Use Threshold parsimony?
This is described in the common options

Learn more by reading original document


Distance matrix computation

Options Description
Substitution model

Dayhoff PAM matrix: This uses Dayhoff's PAM 001 matrix from Dayhoff (1979), page 348. The PAM model is an empirical one that scales probabilities of change from one amino acid to another in terms of a unit which is an expected 1% change between two amino acid sequences. The PAM 001 matrix is used to make a transition probability matrix which allows prediction of the probability of changing from any one amino acid to any other, and also predicts equilibrium amino acid composition. The program assumes that these probabilities are correct and bases its computations of distance on them. The distance that is computed is scaled in units of expected fraction of amino acids changed.

Kimura formula: This is a rough-and-ready distance formula for approximating PAM distance by simply measuring the fraction of amino acids, p, that differ between two sequences and computing the distance as (Kimura, 1983)

$$D = -\ln\left(1 - p - 0.2\,p^{2}\right)$$

This is very quick to do but has some obvious limitations. It does not take into account which amino acids differ or to what amino acids they change, so some information is lost. The units of the distance measure are fraction of amino acids differing, as also in the case of the PAM distance. If the fraction of amino acids differing reaches 0.8541, the distance becomes infinite.

Categories model: This is J. Felsenstein's own concoction. He imagined a nucleotide sequence changing according to Kimura's 2-parameter model, with the exception that some changes of amino acids are less likely than others. The amino acids are grouped into a series of categories. Any base change that does not change which category the amino acid is in is allowed, but if an amino acid changes category this is allowed only a certain fraction of the time. The fraction is called the "ease", and there is a parameter for it (specified in the option "Ease of changing category of amino acid?"), which is 1.0 when all changes are allowed and near 0.0 when changes between categories are nearly impossible.

In this selection, the user can also choose the transition/transversion ratio (specified in the option "Transition/transversion ratio"), which of several genetic codes to use (specified in the option "Use which genetic code?"), and which categorization of amino acids to use (specified in the option "Which categorization of amino acids?").

Use which genetic code?
This option allows the user to select among various nuclear and mitochondrial genetic codes.
Which categorization of amino acids?

George/Hunt/Barker: The George-Hunt-Barker (1988) classification of amino acids. The categorizations of amino acids are (Glu Gln Asp Asn), (Lys Arg His), (Phe Tyr Trp), (Cys), (Met Val Leu Ileu), (Gly Ala Ser Thr Pro).

Chemical: a classification found in an old "baby biochemistry" book (Conn and Stumpf, 1963). The categorizations of amino acids are (Glu Gln Asp Asn), (Lys Arg His), (Phe Tyr Trp), (Cys Met), (Val Leu Ileu Gly Ala Ser Thr), (Pro).

Hall: A classification provided by Ben Hall, a colleague of J. Felsenstein. The categorizations of amino acids are (Glu Gln Asp Asn), (Lys Arg His), (Phe Tyr Trp), (Cys), (Met Val Leu Ileu), (Gly Ala Ser Thr), (Pro).

Ease of changing category of amino acid?
As described under "Categories model" in the option "Substitution model".
Transition/transversion ratio
This is described in the options of distance matrix computation
Equal base frequencies
Use equal base frequencies, or enter the frequencies of A, C, G, and T/U
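The Kimura formula for protein distances is simple enough to sketch directly. This assumes two aligned protein sequences of equal length with no gaps or ambiguity codes (how POWER handles those is not covered here), and returns infinity once the fraction of differing amino acids reaches about 0.8541, where the argument of the logarithm, 1 − p − 0.2p², drops to zero.

```python
import math


def kimura_protein_distance(seq1: str, seq2: str) -> float:
    """Kimura (1983) approximation to PAM distance: D = -ln(1 - p - 0.2 p^2)."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    p = sum(a != b for a, b in zip(seq1.upper(), seq2.upper())) / len(seq1)
    argument = 1.0 - p - 0.2 * p * p
    if argument <= 0.0:
        return math.inf  # p >= 0.8541...: the distance is infinite
    return -math.log(argument)
```

For example, two sequences differing at half their sites give -ln(0.45), about 0.80, noticeably more than the raw fraction 0.5 because the formula corrects for multiple changes.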

Learn more by reading original document
