What is this?
This is a short course/tutorial I developed to help researchers cover some of the families of phylogenetic comparative analyses of trait evolution and correlation, diversification rates, as well as community structure. My objective is to provide a short background in statistical and comparative phylogenetic methods, to further allow researchers to explore their research questions on their own.
A big part of the inspiration for this workshop came from the book Phylogenies in Ecology, by Marc W. Cadotte and Jonathan Davies, as well as from courses and workshops on ecophylogenetics taught by Dr. Will Pearse, Dr. Jonathan Davies and Dr. Steven Kembel.
I am progressively adding more material to this document. You can access all topics addressed here in the outline on the right side of this page.
If you would like to provide feedback on this material, do not hesitate to get in touch!
I will highly appreciate your comments!
Kind regards,
Pedro Henrique P. Braga
pedrohbraga.github.io
ph.pereirabraga [at] gmail [dot] com.
Preparation for this tutorial
I kindly ask you to run the following code to download the datasets, and install and load the libraries we will use through this course:
# Check for needed packages, and install the missing ones
required.libraries <- c("ape", "picante", "pez", "phytools",
"vegan", "adephylo", "phylobase",
"geiger", "mvMORPH", "OUwie",
"hisse", "BAMMtools",
"phylosignal", "Biostrings",
"devtools",
"ggplot2",
"fulltext")
needed.libraries <- required.libraries[!(required.libraries %in% installed.packages()[,"Package"])]
if(length(needed.libraries)) install.packages(needed.libraries)
# Load all required libraries at once
lapply(required.libraries, require, character.only = TRUE)
### Install missing libraries
source("https://bioconductor.org/biocLite.R")
biocLite("ggtree")
###
set.seed(1)
# Download files from Marc Cadotte's and Jonathan Davies' book:
temp <- tempfile()
download.file("http://www.utsc.utoronto.ca/~mcadotte/ewExternalFiles/resources.zip", temp)
dir.create("data/Jasper")
JasperData <- c("jasper_tree.phy", "jasper_data.csv")
lapply(JasperData,
FUN = function(x)
unzip(zipfile = temp,
files = paste("resources/data/Jasper/", x, sep = ""),
exdir = "data/Jasper"))
# Download the tree files from this paper
download.file("https://onlinelibrary.wiley.com/action/
downloadSupplement?doi=10.1111%2Fj.1461-0248.2009.01307.x&attachmentId=179085359",
destfile = "data/ele_1307_sm_sa1.tre")
# If the download of the above file does not work,
# download the first supplementary material from the following paper
# and place it within your 'data' directory:
# https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1461-0248.2009.01307.x
Why are phylogenies important?
Since many decades we recognize that many ecolgical patterns are processes are not independent of the evolution of the lineages involved in generating these patterns.
Here, we are taking it seriously what Theodosius Dobzhansky said in 1973: “nothing in biology makes sense except in the light of evolution”!
A brief introduction to phylogenies
Phylogenetic trees (i.e., evolutionary tree or cladogram) are branching diagrams illustrating the evolutionary relationships among taxa.
Phylogenetic trees can be constructed based on species’ genes and/or on traits. Phylogenetic reconstruction through analyses of molecular sequences usually consists of three distinct procedures: (i) sequence alignment; (ii) character coding; and (iii) tree building. There is a range of available options for each of these steps, with no simple objective method for choosing between them.
In this course, we will not cover the steps necessary to reconstruct phylogenies. We will thus assume that a phylogenetic relationship is already available for the taxa of interest.
ape
and the phylo
object in R
The core of phylogenetics analyses in R is the library ape
- an acronym that stands for Analysis of Phylogenetics and Evolution. You can run the following commands to install, load and call help for ape
:
# install.packages("ape")
library(ape)
# help(package = ape)
From ape
, we depict two important functions to read and load trees into R
: read.nexus
and read.tree
:
read.nexus
reads NEXUS formatted trees, which hasis organized into major units known as blocks and usually have the following basic structure:
#NEXUS
[!This is the data file used for my dissertation]
BEGIN TAXA;
TaxLabels Scarabaeus Drosophila Aranaeus;
END;
BEGIN TREES;
Translate beetle Scarabaeus, fly Drosophila, spider Aranaeus;
Tree tree1 = ((1,2),3);
Tree tree2 = ((beetle,fly),spider);
Tree tree3 = ((Scarabaeus,Drosophila),Aranaeus);
END;
read.tree
reads Newick formatted trees, which commonly have the following format:
(Bovine:0.69395,(Hylobates:0.36079,(Pongo:0.33636,(G._Gorilla:0.17147, (P._paniscus:0.19268,H._sapiens:0.11927):0.08386):0.06124):0.15057):0.54939, Rodent:1.21460)
When phylogenies are created, loaded and handled in ape
, their objects are represented as a list of class phylo
. Let’s try reading the following text string of a tree (is it in Newick or in NEXUS format?) and see what R
tells us when we call it:
tree.primates <- read.tree(text="((((Homo:0.21, Pongo:0.21):0.28, Macaca:0.49):0.13,
Ateles:0.62):0.38, Galago:1.00);")
tree.primates
Phylogenetic tree with 5 tips and 4 internal nodes.
Tip labels:
[1] "Homo" "Pongo" "Macaca" "\nAteles" "Galago"
Rooted; includes branch lengths.
A class phylo
object distributes the information of phylogenetic trees into six main components:
str(tree.primates)
List of 4
$ edge : int [1:8, 1:2] 6 7 8 9 9 8 7 6 7 8 ...
$ edge.length: num [1:8] 0.38 0.13 0.28 0.21 0.21 0.49 0.62 1
$ Nnode : int 4
$ tip.label : chr [1:5] "Homo" "Pongo" "Macaca" "\nAteles" ...
- attr(*, "class")= chr "phylo"
- attr(*, "order")= chr "cladewise"
Since phylo
is a list, we can access its elements by using the $
symbol. We can access the matrix edge
by running phylo_object$edge
, which hasthe beginning and ending node number for all the nodes and tips of a given tree. In summary, this allows us to keep track of the internal and external nodes of the tree:
tree.primates$edge
[,1] [,2]
[1,] 6 7
[2,] 7 8
[3,] 8 9
[4,] 9 1
[5,] 9 2
[6,] 8 3
[7,] 7 4
[8,] 6 5
The vector tip.label
element contains all the labels for all tips in the tree (i.e. the names of your OTUs):
tree.primates$tip.label
[1] "Homo" "Pongo" "Macaca" "\nAteles" "Galago"
There is a little error in the tip labels, right? We can easily rewrite the labels with this:
tree.primates$tip.label <- c("Homo", "Pongo", "Macaca", "Ateles", "Galago")
Another important element is the vector edge.length
, which indicates the length of each edge with relation to the root or base of the tree:
tree.primates$edge.length
[1] 0.38 0.13 0.28 0.21 0.21 0.49 0.62 1.00
Finally, Nnode
contains the number of internal nodes in the tree:
tree.primates$Nnode
[1] 4
One can see all the important information a phylo
object carries from a tree by plotting it with all its elements:
plot(tree.primates,
edge.width = 2,
label.offset = 0.05,
type = "cladogram")
nodelabels()
tiplabels()
add.scale.bar()
There are other classes that store trees in R, such as phylo4
and phylo4d
objects. We will talk about them later in the course.
Sometimes, you may be interested in simulating trees. There are various ways of creating trees through simulations in R
. Let’s take a look at phytools
’s pbtree
:
sampleTree <- pbtree(n = 20, nsim = 1)
plot(sampleTree)
Handling and preparing phylogenies
Now that we know the main characteristics of a phylogenetic tree and of a phylo
object, let’s dive into more details on the structure of phylogenetic trees.
Dicotomous and polytomous trees
While some phylogenetic analyses are able to handle polytomies (i.e. more than two OTUs originating from the same node in a tree; or an unresolved clade), one may be interesting in solving the polytomies to produce a binary tree.
t1 <- read.tree(text = "((A,B,C),D);")
# See how A, B and C are within the same round brackets,
# *i.e.* departing from the same node.
plot(t1,
type = "cladogram")
One may check if a tree is binary by using is.binary.tree
function and, in case it is FALSE
, randomly resolve polytomies using the multi2d
function:
is.binary.tree (t1)
[1] FALSE
t2 <- multi2di(t1)
plot(t2,
type = "cladogram")
Ultrametric, non-ultrametric and uninformative trees
Ultrametric trees (or approximates of them) can be used to infer both braching and temporal patterns of evolutionary history for a given clade. For a tree to be ultrametric, its matrix has to follow the following strong assumptions:
- Each tip has to be uniquely labelled;
- Each internal node of the tree diverges towards at least two “children”, i.e. is dicotomous;
- Along any path from the root to a tip (or leaf), the numbers labelling internal nodes strictly decrease, i.e. time must strictly increase along the path from the root;
- The distance from the root to all tips of the tree is constant for all clades.
In many ultrametric trees, one can assume that all extant species are sampled at the present time, and branch lengths represent time calibrated in millions of years.
This information is obtained when we infer plausible history from data that reflects time since divergence (i.e. reconstruct evolutionary history based on the molecular clock theory).
When a tree violates one of the above assumptions (especially, number 4), a tree is then considered to be non-ultrametric. For instance, a non-ultrametric tree will have branches with different distances towards the tips. While this information reflects different rates of evolution across the phylogeny, many analyses will require an ultrametric tree (i.e. where models assume constant variance and equal means at the tips).
In these cases, one may be interested in “smoothing out” these differences across the evolutionary history to transform a non-ultrametric tree into a ultrametric tree.
One method is to use the mean path length (MPL) that assumes random mutation and a fixed molecular clock to calculate the age of a node as the mean of all the distances from this node to all tips diverging from it. To run this, we use the ape
function chronoMPL
. MPL sometimes returns negative branch lengths, meaning that it should be used with caution.
Another function we can use is chronos
, also from the ape
package. chronos
uses the penalized maximum likelihood method to estimate divergences times.
Let’s try these methods by first creating a random tree, then tansforming its edge lengths using both chronoMPL
and chronos
functions, and plotting them side-by-side:
tree.NonUlt <- rtree(7)
tree.Ult.MPL <- chronoMPL(tree.NonUlt)
tree.Ult.S <- chronos(tree.NonUlt)
Setting initial dates...
Fitting in progress... get a first set of estimates
Penalised log-lik = -9.270834
Optimising rates... dates... -9.270834
Done.
par(mfrow=c(1,3))
plot(tree.NonUlt, edge.width = 2,
cex = 2,
main = "A) Non-ultrametric",
cex.main = 2)
plot(tree.Ult.MPL, edge.width = 2,
cex = 2,
main = "B) Ultrametric chronoMPL",
cex.main = 2)
plot(tree.Ult.S, edge.width = 2,
cex = 2,
main = "C) Ultrametric chronos",
cex.main = 2)
You may verify if any tree is ultrametric using is.ultrametric
:
is.ultrametric(tree.NonUlt)
[1] FALSE
is.ultrametric(tree.Ult.MPL)
[1] TRUE
is.ultrametric(tree.Ult.S)
[1] TRUE
Sometimes, trees may have no branch lengths available, but only the divergence relationships. We often refer to these trees as being non-informative trees (despite the fact that they can still give us some information about branching events):
tree.NonInf <- rtree(7)
tree.NonInf$edge.length <- NULL
plot(tree.NonInf, edge.width = 2,
cex = 2,
main = "D) Uninformative tree",
cex.main = 2)
Edge length and rate transformations
Very oten, we may also be interested in altering the edge lengths of a given tree for a number of reasons. We may be interested, for example, in chaning the edge distance from molecular distances to time. Alternatively, we may want to change edge lengths to meet specific evolutionary models - let’s say, we may seek to assess whether a given ecological pattern might be explained by recent evolutionary changes [i.e. resembling a Pagel’s \(\delta\) time-dependent model, (Pagel 1999)], by branching events (i.e., diversification rates; by changing Pagel’s \(\kappa\)) or by the phylogenetic structure of a tree (by rescaling a tree using Pagel’s \(\lambda\)).
We can use the function rescale
to perform changes in the branches of a phylogenetic tree.
Pagel’s \(\delta\) time-dependent model rescales a tree by raising all node depths to an estimated power greater than 1 (\(\delta\)). A value greater than 1 means that recent evolution has been relatively fast; and if delta is less than 1, recent evolution has been comparatively slow.
We can try the \(\delta\) transformation in a random ultrametric tree:
library(geiger)
tree.Ult1 <- rcoal(25)
tree.Ult1.Delta.01 <- rescale(tree.Ult1,
model = "delta", 0.1) # Setting power of 0.1
tree.Ult1.Delta.1 <- rescale(tree.Ult1,
model = "delta", 1) # with power 1; compare this tree with the original tree
tree.Ult1.Delta.10 <- rescale(tree.Ult1,
model = "delta", 10) # with power 10
# Representing all three trees
par(mfrow = c(1,3), cex.main = 2)
plot(tree.Ult1.Delta.01,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("A) ", delta," = 0.1")))
plot(tree.Ult1.Delta.1,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("B) ", delta," = 1")))
plot(tree.Ult1.Delta.10,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("C) ", delta," = 10")))
Note how \(\delta\) = 1 returns the original tree, \(\delta\) < 1 extends the branch lengths of the edges, and \(\delta\) > 1 shortens the edge lengths of the tree.
Pagel’s \(\kappa\) punctuational (speciational) model of trait evolution focus on the number of speciation events between species to assess whether different rates of evolution are associated with branching events. This transformation raises all branch lengths to an estimated power.
tree.Ult1.kappa.01 <- rescale(tree.Ult1,
model = "kappa", 0.1) # Kappa of 0.1
tree.Ult1.kappa.1 <- rescale(tree.Ult1,
model = "kappa", 1) # Kappa of 1
tree.Ult1.kappa.2 <- rescale(tree.Ult1,
model = "kappa", 2) # Kappa of 2
# Represent these trees
par(mfrow = c(1,3),
cex.main = 2)
plot(tree.Ult1.kappa.01,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("D) ", kappa," = 0.1")))
plot(tree.Ult1.kappa.1,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("E) ", kappa," = 1")))
plot(tree.Ult1.kappa.2,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("F) ", kappa," = 2")))
Finally, by rescaling a tree using Pagel’s \(\lambda\), one may change the structure of a phylogenetic tree to reduce the overall contribution of internal edges relative to the terminal edges of a given tree.
A Pagel’s \(\lambda\) of closer to 0 removes such structure from the tree, making it resemble a “star” phylogeny; while the closest \(\lambda\) is to 1, the more the rescaled tree resembles the original. Let’s try it:
tree.Ult1.lambda.01 <- rescale(tree.Ult1, model = "lambda", 0.1)
tree.Ult1.lambda.05 <- rescale(tree.Ult1, model = "lambda", 0.5)
tree.Ult1.lambda.1 <- rescale(tree.Ult1, model = "lambda", 1)
# Represent the above trees
par(mfrow = c(1,3),
cex.main = 2)
plot(tree.Ult1.lambda.01,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("G) ", lambda," = 0.1")))
plot(tree.Ult1.lambda.05,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("H) ", lambda," = 0.5")))
plot(tree.Ult1.lambda.1,
show.tip.label = FALSE,
edge.width = 2,
cex = 2,
main = expression(paste("I) ", lambda," = 1")))
The above examples are our opening door to start understanding three key features of evolutionary change: the tempo (δ), mode (κ) and the phylogenetic signal (λ) of the evolution (of a trait) (Pagel, 1994, 1999).
You can compare all our transformations with the figure below:
Representing trees and other class objects
There are multiple ways for representing trees in R. You can access the help page of any R function we are using by running the function name preceeded by question mark (?
), e.g. ?plot.phylo
.
treeVertebrates <- read.tree(text = "(((((((Cow, Pig), Whale),
(Bat,(Lemur, Human))),
(Robin, Iguana)), Coelacanth),
Gold fish), Shark);")
par(mfrow = c(2,2))
# Using the plot function, which is based on plot.phylo
plot(treeVertebrates, no.margin = TRUE,
edge.width = 2)
# Plotting with roundPhylogram
roundPhylogram(treeVertebrates)
# Removing the root of the tree
plot(unroot(treeVertebrates),
type="unrooted",
no.margin = TRUE, lab4ut = "axial",
edge.width=2)
# Posting a type fan tree
plotTree(treeVertebrates,
type="fan",
fsize=0.7,
lwd=1,
ftype="i")
One package that allows a fast and easily changeable visualization of phylogenetic trees is the ggtree
package, which is based on the popular ggplot2
package. To better work with ggtree
, you should have a basic familiarity with the Grammar of Graphics and with ggplot2
. I recommend the QCBS R Workshop material on ggplot2
, which helped me acquire a general knowledge on this package.
You can access the documentation for ggtree
here.
We can use ggtree
, for example, to attribute colors to different taxonomic groups (in this case, genera of bats).
library(ggtree)
data(chiroptera, package="ape")
# Separate the genus names from species names
groupInfo <- split(chiroptera$tip.label,
gsub("_\\w+", "", chiroptera$tip.label))
head(groupInfo)
$Acerodon
[1] "Acerodon_celebensis" "Acerodon_humilis" "Acerodon_jubatus"
[4] "Acerodon_leucotis" "Acerodon_mackloti"
$Aethalops
[1] "Aethalops_alecto"
$Alionycteris
[1] "Alionycteris_paucidentata"
$Ametrida
[1] "Ametrida_centurio"
$Amorphochilus
[1] "Amorphochilus_schnablii"
$Anoura
[1] "Anoura_caudifer" "Anoura_cultrata" "Anoura_geoffroyi"
[4] "Anoura_latidens"
# Group species accordingly to their genera
chiroptera <- groupOTU(chiroptera,
groupInfo)
# Plot tree using ggtree
ggtree(chiroptera,
aes(color = group),
layout='circular') +
geom_tiplab(size = 1,
aes(angle = angle))
The picante
package
One of the core packages for hypothesis testing in ecophylogenetics is picante
(Kembel et a. 2010). picante
mainly works with three types of objects: the phylogeny data as a phylo
class object; the community presence-absence (binary) or abundance matrix; and a species-trait matrix.
The pez
package
pez
is a package that contains many novel and wrappers functions for phylogenetic analyses in community data (Pearse et al. 2015). It has useful cuntions that will help us keep the different datasets we will be working on matching.
Other classes of objects
phylo4d
and phylo4
are recent classes developed based on the phylo
class that seeks to standardize and unify the representation of comparative data and phylogenetic trees in R
.
Exercises
We are short in time! I will add a few exercises here later.
Community Phylogenetic Patterns
Ecologists share the goal of understanding and describing the processes that create the uneven distribution patterns of species, which create the variation in abundance, identity and diversity we know today.
Among many factors that determine the presence of taxa in communities are the abiotic and biotic conditions of such communities (e.g., Diamond 1975; Westoby and Wright 2006; but see Hubbell 2001).
Since taxa that recently diverged tend to be ecologically similar (Darwin 1859; Lord et al. 1995; Wiens and Graham 2005), a direct link may exist between the evolutionary relatedness of organisms in a community, the characters they possess, and the ecological processes that determine their distribution and abundance.
In fact, many communities exhibit nonrandom patterns of evolutionary relatedness among co-occurring species (reviewed in Webb et al. 2002): a phenomenon we refer to as phylogenetic community structure.
Preparing data for community phylogenetic analyses
For this analysis, we will use the data on the Jasper Ridge plant communities available in the book “Phylogenies in Ecology”, written by Marc Cadotte and Jonathan Davies (2016).
JasperPlants.tree <- read.tree("data/Jasper/resources/data/Jasper/jasper_tree.phy")
JasperPlants.comm <- read.csv("data/Jasper/resources/data/Jasper/jasper_data.csv",
row.names = 1)
Before proceeding to our analyses, we have to make sure that both the community and phylogenetic data match. For this, the names of the tips should match the species names in the dataset. This also means that species present in one dataset should not be absent in the other.
(You thought you were done preparing your dataset, eh?)
For this, we can use the function drop.tip
from the ape
package:
JasperPlants.cleanTree <- drop.tip(phy = JasperPlants.tree,
tip = setdiff(JasperPlants.tree$tip.label,
colnames(JasperPlants.comm)))
You can also match your datasets using the function match.phylo.comm
from picante
, or the function comparative.comm
from pez
.
# Using picante
JasperPlants.picCleanTree <- match.phylo.comm(phy = JasperPlants.tree,
comm = JasperPlants.comm)$phy
JasperPlants.picCleanComm <- match.phylo.comm(phy = JasperPlants.tree,
comm = JasperPlants.comm)$comm
# Do the same using pez
# JasperPlants.pezCleanComm <- comparative.comm(phy = JasperPlants.tree,
# comm = as.matrix(JasperPlants.comm))
plot(JasperPlants.picCleanTree,
cex = 0.4)
JasperPlants.picCleanComm[1:4, 1:4]
Geranium_dissectum Erodium_botrys Anagallis_arvensis
J11 20 10 5
J110 0 15 0
J12 20 20 0
J13 0 40 0
Epilobium_brachycarpum
J11 0
J110 0
J12 1
J13 0
Phylogenetic \(\alpha\) diversity (PD)
One simple way to measure evolutionary diversity within a community is through Daniel Faith’s PD metric (Faith, 1992; http://goo.gl/wM08Oy).
PD sums the branch lenghts of all co-occurring species in a given site, from the tips to the root of the phylogenetic tree.
Higher PD values are given to communities that has more evolutionary divergent taxa and older history, while lower PD values represent assemblages that have taxa with more recent evolutionar history.
Faith’s PD can be implemented using the pd
function in the picante
package.
In addition to Faith’s PD, the pd
function also returns species richness (SR).
SR is the same as observed richness (\(S_{obs}\)) or \(\alpha\) diversity.
Jasper.PD <- pd(JasperPlants.picCleanComm, JasperPlants.picCleanTree,
include.root = FALSE)
head(Jasper.PD) # See the first six rows of the resulting matrix
PD SR
J11 831.4887 13
J110 501.8703 9
J12 769.9249 14
J13 712.0532 13
J14 640.5896 11
J15 640.2884 13
And how PD estimates compares with SR?
cor.test(Jasper.PD$PD, Jasper.PD$SR)
Pearson's product-moment correlation
data: Jasper.PD$PD and Jasper.PD$SR
t = 9.0965, df = 28, p-value = 7.457e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.7321097 0.9338472
sample estimates:
cor
0.8643903
plot(Jasper.PD$PD,
Jasper.PD$SR,
xlab = "Phylogenetic Diversity", ylab = "Species Richness",
pch = 16)
Randomizations and null models
We are often interested in assessing whether or not observed patterns differ from null expectation.picante
has many functions that allow us to test hypotheses under different types of null models:
Null Model | Description |
---|---|
taxa.labels | Shuffles taxa labels across tips of phylogeny (across all taxa included in phylogeny) |
richness | Randomizes community data matrix abundances within samples (maintains sample species richness) |
frequency | Randomizes community data matrix abundances within species (maintains species occurence frequency) |
sample.pool | Randomizes community data matrix by drawing species from pool of species occurring in at least one community (sample pool) with equal probability |
phylogeny.pool | Randomize community data matrix by drawing species from pool of species occurring in at least one community (sample pool) with equal probability |
independentswap | Randomizes community data matrix with the independent swap algorithm (Gotelli 2000) maintaining species occurrence frequency and sample species richness |
trialswap | Randomizes community data matrix with the trial-swap algorithm (Miklos & Podani 2004) maintaining species occurrence frequency and sample species richness |
We will start using null models to calculate the standardized effect size (ses) of PD for each community. For this, we can use the ses.pd
function in picante
:
# This is a computationally intensive process and take some time to run
Jasper.ses.pd <- ses.pd(samp = JasperPlants.picCleanComm,
tree = JasperPlants.picCleanTree,
null.model = "taxa.labels",
runs = 1000)
head(Jasper.ses.pd)
ntaxa pd.obs pd.rand.mean pd.rand.sd pd.obs.rank pd.obs.z
J11 13 831.4887 793.8102 64.00612 713 0.5886711
J110 9 501.8703 624.4736 58.22656 23 -2.1056247
J12 14 769.9249 832.2727 63.65673 165 -0.9794380
J13 13 712.0532 791.8934 64.23287 114 -1.2429803
J14 11 640.5896 712.3946 60.74966 123 -1.1819819
J15 13 640.2884 793.5430 61.60145 8 -2.4878408
pd.obs.p runs
J11 0.712287712 1000
J110 0.022977023 1000
J12 0.164835165 1000
J13 0.113886114 1000
J14 0.122877123 1000
J15 0.007992008 1000
Phylogenetic community structure
Another way of looking into the phylogenetic structure of communities is to quantify at what degree closely related taxa co-occur.
The basis for patterns of phylogenetic community structure can lie in a simple framework (table 1; Webb et al. 2002): when species traits responsible for their physiological tolerances are conserved, an environmental filtering that limits the range of viable ecological strategies at a given site is expected to select co-occurring species more related than expected by chance, i.e. generate a pattern known as phylogenetic clustering.
On the other hand, competitive exclusion can limit the ecological similarity of co-occurring species, generating a pattern of phylogenetic overdispersion or phylogenetic evenness.
Finally, when traits are diverging faster across the evolutionary time, the effects of habitat filtering should be weaker, producing evenly dispersed patterns of relatedness; while competition or limiting similarity is expected to produce random or clustered patterns.
If communities are assembled independently with respect to traits (e.g., Hubbell 2001), then patterns of relatedness should be resemble random expectation.
Over the recent years, other processes were shown to drive phylogenetic patterns of community structure, such as, isolation, time since isolation, diversification, scale-dependent assembly and local and regional composition and richness.
Assembly.process | Under.niche.conservatism | Under.niche.divergence |
---|---|---|
Limiting similarity | Even dispersion | Random or clustered dispersion |
Habitat filtering | Clustered dispersion | Even dispersion |
Neutral assembly | Random dispersion | Random dispersion |
Phylogenetic resemblance matrix
An important step prior to the estimation of the phylogenetic structure of communities is to create a phylogenetic resemblance matrix, or a phylogenetic variance-covariance matrix. A phylogenetic variance-covariance matrix is nothing more than the calculated distances between taxa in a tree.
In R, we commonly refer to this matrix as a matrix of cophenetic distances among taxa and we can use the function cophenetic.phylo
from picante
to calculate it:
# Create a Phylogenetic Distance Matrix {picante}
JasperPlants.cophenDist <- cophenetic.phylo(JasperPlants.picCleanTree)
Nearest taxon index (NTI)
Another way to assess the phylogenetic structure of communities is through the calculation of the nearest taxon index (NTI). NTI differs from NRI by being based on the mean nearest neighbour phylogenetic distance (MNTD), instead of the overall mean pairwise distances (MPD).
MNTD is obtained by calculating the mean phylogenetic distance of each taxon of a given community to its nearest neighbour in a tree. By doing this, one can observe that NTI is more sensitive to the clustering or overdispersion patterns happening closer to the tips of a tree, while NRI represents an overall pattern of structuring across the whole phylogenetic tree.
As in NRI, higher values of NRI indicate phylogenetic clustering and lower values of NRI indicate phylogenetic overdispersion.
# Estimate Standardized Effect Size of the MNTD on the cophenetic matrix
JasperPlants.ses.mntd <- ses.mntd(JasperPlants.picCleanComm,
JasperPlants.cophenDist,
null.model = "taxa.labels",
abundance.weighted = FALSE,
runs = 100)
head(JasperPlants.ses.mntd)
ntaxa mntd.obs mntd.rand.mean mntd.rand.sd mntd.obs.rank mntd.obs.z
J11 13 76.65059 80.65936 11.55970 41 -0.3467888
J110 9 71.32136 91.26041 16.29823 11 -1.2233870
J12 14 67.04067 77.98757 10.25801 17 -1.0671567
J13 13 65.27537 82.00706 10.58893 9 -1.5801110
J14 11 74.54219 85.16600 14.93720 25 -0.7112315
J15 13 47.69848 80.04591 10.58640 1 -3.0555649
mntd.obs.p runs
J11 0.40594059 100
J110 0.10891089 100
J12 0.16831683 100
J13 0.08910891 100
J14 0.24752475 100
J15 0.00990099 100
# Calculate NTI
JasperPlants.NTI <- as.matrix(-1 *
((JasperPlants.ses.mntd[,2] - JasperPlants.ses.mntd[,3]) / JasperPlants.ses.mntd[,4]))
rownames(JasperPlants.NTI) <- row.names(JasperPlants.ses.mntd)
colnames(JasperPlants.NTI) <- "NTI"
head(JasperPlants.NTI)
NTI
J11 0.3467888
J110 1.2233870
J12 1.0671567
J13 1.5801110
J14 0.7112315
J15 3.0555649
Phylogenetic beta diversity (PBD)
One may not be interested only in the patterns of \(\alpha\) diversity, but in understanding how communities are evolutionary distinct from each other, i.e. knowing the pairwise phylogenetic \(\beta\) diversity.
Many metrics are available to assess the phylogenetic \(\beta\) diversity of communities. Here, we will do this by calculating the PhyloSor index, which is the phylogenetic analog of the Sørensen index (Bryant et al., 2008; Swenson, 2011; Feng et.al., 2012). PhyloSor can be calculated using the function phylosor
from picante
:
Jasper.pbd <- phylosor(samp = JasperPlants.picCleanComm,
tree = JasperPlants.picCleanTree)
as.matrix(Jasper.pbd)[1:7,1:7]
J11 J110 J12 J13 J14 J15 J16
J11 0.0000000 0.6381010 0.7884823 0.8098831 0.8232718 0.7058630 0.8401553
J110 0.6381010 0.0000000 0.7032579 0.7367845 0.7567888 0.7165036 0.6524526
J12 0.7884823 0.7032579 0.0000000 0.8166769 0.8852419 0.8188865 0.8371251
J13 0.8098831 0.7367845 0.8166769 0.0000000 0.8727346 0.7124985 0.7688849
J14 0.8232718 0.7567888 0.8852419 0.8727346 0.0000000 0.7543848 0.8751975
J15 0.7058630 0.7165036 0.8188865 0.7124985 0.7543848 0.0000000 0.6949025
J16 0.8401553 0.6524526 0.8371251 0.7688849 0.8751975 0.6949025 0.0000000
We can go one step further and test whether the observed PBD values differ from null expectation under different null models. First, we will use the function phylosor.rnd
to obtain PhyloSor values of phylogenetic beta-diversity obtained by randomization:
Jasper.pbd.rndTaxa <- phylosor.rnd(samp = JasperPlants.picCleanComm,
tree = JasperPlants.picCleanTree,
cstSor = TRUE,
null.model = "taxa.labels", # Pay attention to the null models!
runs = 100)
Apparently, the a standardized effect size function is not available within picante
. Thus, we will create our own null model to test our hypothesis:
# I first ran ses.pd to build this in a more-or-less similar format
ses.phylosor <- function(obs, rand){
rand <- t(as.data.frame(lapply(rand, as.vector)))
phySor.obs <- as.numeric(obs)
phySor.mean <- apply(rand, MARGIN = 2,
FUN = mean,
na.rm = TRUE)
phySor.sd <- apply(rand,
MARGIN = 2,
FUN = sd,
na.rm = TRUE)
phySor.ses <- (phySor.obs - phySor.mean)/phySor.sd
phySor.obs.rank <- apply(X = rbind(phySor.obs, rand),
MARGIN = 2,
FUN = rank)[1, ]
phySor.obs.rank <- ifelse(is.na(phySor.mean), NA, phySor.obs.rank)
data.frame(phySor.obs,
phySor.mean,
phySor.sd,
phySor.obs.rank,
phySor.ses,
phySor.obs.p = phySor.obs.rank/(dim(rand)[1] + 1))
}
JasperPlants.ses.pbd <- ses.phylosor(Jasper.pbd,
Jasper.pbd.rndTaxa)
# Project the table using knitr::kable
round(JasperPlants.ses.pbd[45:55,], 3) %>%
kable("html") %>%
kable_styling()
phySor.obs | phySor.mean | phySor.sd | phySor.obs.rank | phySor.ses | phySor.obs.p | |
---|---|---|---|---|---|---|
45 | 0.527 | 0.589 | 0.060 | 17 | -1.043 | 0.168 |
46 | 0.647 | 0.601 | 0.055 | 83 | 0.823 | 0.822 |
47 | 0.589 | 0.584 | 0.063 | 49 | 0.083 | 0.485 |
48 | 0.628 | 0.624 | 0.058 | 51 | 0.078 | 0.505 |
49 | 0.652 | 0.736 | 0.040 | 3 | -2.136 | 0.030 |
50 | 0.687 | 0.712 | 0.050 | 31 | -0.500 | 0.307 |
51 | 0.770 | 0.757 | 0.048 | 58 | 0.268 | 0.574 |
52 | 0.671 | 0.724 | 0.044 | 13 | -1.191 | 0.129 |
53 | 0.665 | 0.692 | 0.045 | 31 | -0.600 | 0.307 |
54 | 0.575 | 0.647 | 0.048 | 7 | -1.476 | 0.069 |
55 | 0.649 | 0.671 | 0.051 | 32 | -0.431 | 0.317 |
Decomposing phylogenetic beta diversity
Our questions about phylogenetic can become even more complex!
One may also be interested in decomposing the turnover and nestedness components of beta diversity (Baselga, 2010; Leprieur et al., 2012). Baselga (2010) proposed an additive partitioning framework that consists in decomposing the pair-wise Sørensen’s dissimilarity index into two additive components accounting for (i) ‘true’ turnover of species and (ii) richness differences between nested communities. In 2012, Leprieur et al. (2012) extended Baselga’s (2010) framework to decompose PhyloSor into two components accounting for ‘true’ phylogenetic turnover and phylogenetic diversity gradients, respectively.
Still not covered in this tutorial!
Trait Evolution
One central concept in ecophylogenetics is the non‐independence among species traits because of their phylogenetic relatedness (Felsenstein 1985; Revell, Harmon & Collar 2008).
For this part of the workshop, we will use our newly acquired knowledge in the Edge length and rate transformations section about Pagel’s \(\lambda\), \(\kappa\) and \(\delta\). We will also see about alternative evolutionary models and learn how to test whether phylogenetic signal is present in species traits, as well as, fit traits to different models of evolution.
Dataset preparation
We are now introducing a third-type of data in our phylogenetic analyses: traits. Here we will use a different dataset.
For this part of the course, we will fetch our dataset from papers using a very useful R package: fulltext
. fulltext
is maintained by Scott Chamberlain, Will Pearse and Katrin Leinweber, and is able to get files directly from the supplementary material of published papers, with only the DOI references!
# The idea of using this dataset was borrowed from Will Pearse's minicourse on ecophylogenetics.
# Nevertheless, we will use it for further analyses here!
library('fulltext')
MCDB.comm <- read.csv(ft_get_si("E092-201", "MCDB_communities.csv", "esa_data_archives"))
MCDB.species <- read.csv(ft_get_si("E092-201", "MCDB_species.csv", "esa_data_archives"))
Mammalian.tree <- read.nexus("data/ele_1307_sm_sa1.tre")[[1]] # fulltext is not fetching files from Wiley
PanTHERIA.traits <- read.delim(ft_get_si("E090-184", "PanTHERIA_1-0_WR05_Aug2008.txt", "esa_archives"))
What have we just done?
In MCDB.comm
, we have loaded 7977 records of species composition and abundance of mammalian communities (excluding bats) for 1000 sites throughout the world, which was compiled from published literature.
MCDB.species
contains 700 records that includes the name of species and their taxonomic information.
This data is part of the ecological archive of the paper by Thibault et al. (2011)
(Katherine M. Thibault, Sarah R. Supp, Mikaelle Giffin, Ethan P. White, and S. K. Morgan Ernest. 2011. Species composition and abundance of mammalian communities. Ecology 92:2316.)
Mammalian.tree
loaded the first tree described in the published paper by Fritz et al. 2009
(Fritz, S. A., Bininda-Emonds, O. R. P. and Purvis, A. (2009), Geographical variation in predictors of mammalian extinction risk: big is bad, but only in the tropics. Ecology Letters 2(6):538–549. doi: 10.1111/J.1461-0248.2009.01307)
And, finally, PanTHERIA.traits
contain a species-level data set of life history, ecology, and geography of extant and recently extinct mammals. The metadata for this dataset can be consulted in its website.
Now that we know about our dataset, we can explore each one of the loaded objects:
head(MCDB.comm)
Site_ID Initial_year Species_ID Presence_only Abundance Mass
1 1008 2002 CHPE 0 52 NULL
2 1008 2002 CHSX 0 4 NULL
3 1008 2002 DIDE 0 12 NULL
4 1008 2002 DIME 0 20 NULL
5 1008 2002 PELO 0 5 NULL
6 1008 2002 PEPA 0 9 NULL
head(MCDB.species)
Species_ID Family Genus Species Species_level
1 ABBE Abrocomidae Abrocoma bennettii 1
2 ABLO Cricetidae Abrothrix longipilis 1
3 ABOL Cricetidae Abrothrix olivaceus 1
4 ACSP Muridae Acomys spinosissimus 1
5 ACWI Muridae Acomys wilsoni 1
6 ACPY Acrobatidae Acrobates pygmaeus 1
str(Mammalian.tree)
List of 4
$ edge : int [1:7522, 1:2] 5021 5022 5023 5023 5024 5024 5022 5021 5025 5026 ...
$ edge.length: num [1:7522] 102.6 49.9 13.7 0.2 13.5 ...
$ Nnode : int 2503
$ tip.label : chr [1:5020] "Tachyglossus_aculeatus" "Zaglossus_bruijni" "Zaglossus_bartoni" "Ornithorhynchus_anatinus" ...
- attr(*, "class")= chr "phylo"
- attr(*, "order")= chr "cladewise"
str(PanTHERIA.traits)
'data.frame': 5416 obs. of 55 variables:
$ MSW05_Order : Factor w/ 29 levels "Afrosoricida",..: 2 3 3 3 3 2 2 2 23 23 ...
$ MSW05_Family : Factor w/ 153 levels "Abrocomidae",..: 17 18 18 18 18 12 12 12 117 117 ...
$ MSW05_Genus : Factor w/ 1230 levels "Abditomys","Abeomelomys",..: 151 152 152 152 152 117 117 117 139 139 ...
$ MSW05_Species : Factor w/ 3802 levels "abae","abbreviatus",..: 979 34 290 1804 1946 1191 1362 1662 828 948 ...
$ MSW05_Binomial : Factor w/ 5416 levels "Abditomys latidens",..: 563 564 565 566 567 409 410 411 476 477 ...
$ X1.1_ActivityCycle : num 3 1 2 2 2 2 -999 2 3 -999 ...
$ X5.1_AdultBodyMass_g : num 492714 10392 9659 11989 31757 ...
$ X8.1_AdultForearmLen_mm : num -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 ...
$ X13.1_AdultHeadBodyLen_mm : num -999 745 828 872 1055 ...
$ X2.1_AgeatEyeOpening_d : num -999 -999 7.5 11.9 14 ...
$ X3.1_AgeatFirstBirth_d : num 1652 -999 -999 365 548 ...
$ X18.1_BasalMetRate_mLO2hr : num 40293 -999 -999 3699 11254 ...
$ X5.2_BasalMetRateMass_g : num 407000 -999 -999 10450 33100 ...
$ X6.1_DietBreadth : num 3 6 6 1 1 3 2 2 -999 -999 ...
$ X7.1_DispersalAge_d : num -999 330 -999 255 180 ...
$ X9.1_GestationLen_d : num 386.5 65 61.2 61.7 63.5 ...
$ X12.1_HabitatBreadth : num 1 1 1 1 1 -999 -999 1 1 -999 ...
$ X22.1_HomeRange_km2 : num 196.32 1.01 2.95 18.88 159.86 ...
$ X22.2_HomeRange_Indiv_km2 : num -999 1.01 3.13 19.91 43.13 ...
$ X14.1_InterbirthInterval_d : num 614 -999 365 365 365 ...
$ X15.1_LitterSize : num 0.98 4.5 3.74 5.72 4.98 1.22 1 1.22 1.01 -999 ...
$ X16.1_LittersPerYear : num 1 -999 -999 -999 2 1 1 1 -999 -999 ...
$ X17.1_MaxLongevity_m : num 480 137 192 262 354 ...
$ X5.3_NeonateBodyMass_g : num 36751 -999 212 200 412 ...
$ X13.2_NeonateHeadBodyLen_mm : num -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 ...
$ X21.1_PopulationDensity_n.km2: num 0.98 0.74 0.22 0.25 0.01 0.54 -999 0.75 4.89 -999 ...
$ X10.1_PopulationGrpSize : num 11 -999 -999 -999 -999 -999 -999 21 -999 -999 ...
$ X23.1_SexualMaturityAge_d : num 1948 250 371 373 679 ...
$ X10.2_SocialGrpSize : num 10 -999 -999 -999 -999 40 110 40 2.05 -999 ...
$ X24.1_TeatNumber : int -999 8 8 8 9 -999 -999 -999 -999 -999 ...
$ X12.2_Terrestriality : num 1 1 1 1 1 -999 -999 1 2 -999 ...
$ X6.2_TrophicLevel : int 1 2 2 3 3 1 1 1 -999 -999 ...
$ X25.1_WeaningAge_d : num 389.4 52.9 61.3 43.7 44.8 ...
$ X5.4_WeaningBodyMass_g : num -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 ...
$ X13.3_WeaningHeadBodyLen_mm : num -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 ...
$ References : Factor w/ 2589 levels "-999","1;13;134;484;543;820;883;890;1154;1155;1156;1157;1158;1297;1849;2018;2151;2620;2655;3079",..: 1470 1564 1678 1152 1146 1478 1529 1513 393 1 ...
$ X5.5_AdultBodyMass_g_EXT : num -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 ...
$ X16.2_LittersPerYear_EXT : num -999 -999 1.1 1.1 -999 -999 -999 -999 1.05 -999 ...
$ X5.6_NeonateBodyMass_g_EXT : num -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 ...
$ X5.7_WeaningBodyMass_g_EXT : num -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 ...
$ X26.1_GR_Area_km2 : num -999 10581413 25739527 17099094 50803440 ...
$ X26.2_GR_MaxLat_dd : num -999 16.7 47 71.4 83.3 ...
$ X26.3_GR_MinLat_dd : num -999 -28.73 -4.71 8.02 11.48 ...
$ X26.4_GR_MidRangeLat_dd : num -999 -6 21.1 39.7 47.4 ...
$ X26.5_GR_MaxLong_dd : num -999 43.5 108.5 -67.1 179.7 ...
$ X26.6_GR_MinLong_dd : num -999 -17.5 -17.1 -168.1 -171.8 ...
$ X26.7_GR_MidRangeLong_dd : num -999 13 45.7 -117.6 3.9 ...
$ X27.1_HuPopDen_Min_n.km2 : int -999 0 0 0 0 1 0 1 0 -999 ...
$ X27.2_HuPopDen_Mean_n.km2 : num -999 35.2 79.3 27.3 37.9 ...
$ X27.3_HuPopDen_5p_n.km2 : num -999 1 0 0 0 8 0 4 0 -999 ...
$ X27.4_HuPopDen_Change : num -999 0.14 0.1 0.06 0.04 0.09 0.05 0.11 0.05 -999 ...
$ X28.1_Precip_Mean_mm : num -999 90.8 44.6 53 34.8 ...
$ X28.2_Temp_Mean_01degC : num -999 236.51 217.23 58.18 4.82 ...
$ X30.1_AET_Mean_mm : num -999 923 438 503 313 ...
$ X30.2_PET_Mean_mm : num -999 1534 1359 728 561 ...
Now, let’s treat our dataset:
(Yes, we spend a lot of time making our dataset ready for analysis!)
# Starting with the community file:
# Let's make a species-site matrix
MCDB.comm$Abundance <- as.numeric(as.character(MCDB.comm$Abundance))
head(MCDB.comm$Abundance)
[1] 52 4 12 20 5 9
# Add a new column with real species names, extracted and matched from the MCDB.species dataset
MCDB.comm$Species <- with(MCDB.species,
paste(Genus, Species)[match(MCDB.comm$Species_ID, Species_ID)])
# Have you seen "with" before? Click on the link below
# and see how this could have been done without "with".
# Tell us the difference!
MCDB.comm$Species <- gsub(" ", " ", MCDB.comm$Species)
head(MCDB.comm)
Site_ID Initial_year Species_ID Presence_only Abundance Mass
1 1008 2002 CHPE 0 52 NULL
2 1008 2002 CHSX 0 4 NULL
3 1008 2002 DIDE 0 12 NULL
4 1008 2002 DIME 0 20 NULL
5 1008 2002 PELO 0 5 NULL
6 1008 2002 PEPA 0 9 NULL
Species
1 Chaetodipus penicillatus
2 Chaetodipus spp.
3 Dipodomys deserti
4 Dipodomys merriami
5 Perognathus longimembris
6 Perognathus parvus
Looking good? It still does not look like a species by site matrix, right? Let’s transform the sample dataset into a matrix using picante
’s sample2matrix
function:
MCDB.comm <- with(MCDB.comm,
as.matrix(sample2matrix(
data.frame(Site_ID, # This will be our row.names
as.numeric(as.character(Abundance)),
Species))))
# How is this looking now?
MCDB.comm[1:6, 1:6]
Abrocoma bennettii Abrothrix longipilis Abrothrix olivaceus
1001 0 0 0
1002 0 0 0
1003 0 0 0
1004 0 0 0
1005 0 0 0
1006 0 0 0
Acomys spinosissimus Acomys wilsoni Acrobates pygmaeus
1001 0 0 0
1002 0 0 0
1003 0 0 0
1004 0 0 0
1005 0 0 0
1006 0 0 0
Now, let’s go to the phylogenetic tree:
# We have noticed that the labels of the phylogenetic tree
# has a different separator than the community dataset
# Let's replace the "_" in the phylogenetic tree by " "
Mammalian.tree$tip.label <- gsub("_", " ", Mammalian.tree$tip.label)
# Solve potential polytomies
Mammalian.tree <- multi2di(Mammalian.tree)
Mammalian.tree
Phylogenetic tree with 5020 tips and 5019 internal nodes.
Tip labels:
Tachyglossus aculeatus, Zaglossus bruijni, Zaglossus bartoni, Ornithorhynchus anatinus, Anomalurus beecrofti, Anomalurus derbianus, ...
Rooted; includes branch lengths.
And, now, the trait dataset. Let’s select only a few characters of interest:
PanTHERIA.traits <- with(PanTHERIA.traits,
data.frame(BodyMass = log10(X5.1_AdultBodyMass_g),
HeadLength = (X13.1_AdultHeadBodyLen_mm),
LatitudinalRange = (X26.4_GR_MidRangeLat_dd),
mPrecipRange = (X28.1_Precip_Mean_mm),
mTempRange = (X28.2_Temp_Mean_01degC),
HomeRange = (X22.1_HomeRange_km2),
Terrestriality = (X12.2_Terrestriality),
row.names = MSW05_Binomial))
Warning in data.frame(BodyMass = log10(X5.1_AdultBodyMass_g), HeadLength =
(X13.1_AdultHeadBodyLen_mm), : NaNs produced
str(PanTHERIA.traits)
'data.frame': 5416 obs. of 7 variables:
$ BodyMass : num 5.69 4.02 3.98 4.08 4.5 ...
$ HeadLength : num -999 745 828 872 1055 ...
$ LatitudinalRange: num -999 -6 21.1 39.7 47.4 ...
$ mPrecipRange : num -999 90.8 44.6 53 34.8 ...
$ mTempRange : num -999 236.51 217.23 58.18 4.82 ...
$ HomeRange : num 196.32 1.01 2.95 18.88 159.86 ...
$ Terrestriality : num 1 1 1 1 1 -999 -999 1 2 -999 ...
# Remove "-999"" and "NaN" values
PanTHERIA.traits[PanTHERIA.traits == -999] <- NA
PanTHERIA.traits <- na.omit(PanTHERIA.traits)
str(PanTHERIA.traits)
'data.frame': 433 obs. of 7 variables:
$ BodyMass : num 4.02 3.98 4.08 4.5 2.39 ...
$ HeadLength : num 745 828 872 1055 224 ...
$ LatitudinalRange: num -6 21.14 39.7 47.38 -6.19 ...
$ mPrecipRange : num 90.8 44.6 53 34.8 144 ...
$ mTempRange : num 236.51 217.23 58.18 4.82 239.14 ...
$ HomeRange : num 1.01 2.95 1.89e+01 1.60e+02 9.96e-03 ...
$ Terrestriality : num 1 1 1 1 2 1 1 2 2 2 ...
- attr(*, "na.action")= 'omit' Named int 1 6 7 8 9 10 11 12 13 14 ...
..- attr(*, "names")= chr "Camelus dromedarius" "Bos frontalis" "Bos grunniens" "Bos javanicus" ...
Finally, let’s assure that all datasets are matching in species number and names. This time, instead of using the match.phylo.data
and match.phylo.comm
functions from picante
, we will use the comparative.comm
from pez
:
Mammal.CompData <- comparative.comm(Mammalian.tree,
MCDB.comm,
PanTHERIA.traits)
Warning in comparative.comm(Mammalian.tree, MCDB.comm, PanTHERIA.traits):
Mismatch between phylogeny and other data, dropping 4876 tips
Warning in comparative.comm(Mammalian.tree, MCDB.comm, PanTHERIA.traits):
Mismatch between community matrix and other data, dropping 556 columns
Warning in comparative.comm(Mammalian.tree, MCDB.comm, PanTHERIA.traits):
Mismatch between traits and other data, dropping 289 columns
# Also, for the moment, there is no reason to have sites
# that have no species recorded in them
Mammal.CompData <- Mammal.CompData[rowSums(comm(Mammal.CompData)) > 0, ]
Mammal.CompData <- Mammal.CompData[,colSums(comm(Mammal.CompData)) > 0]
str(Mammal.CompData)
List of 6
$ phy :List of 4
..$ edge : int [1:194, 1:2] 99 100 101 102 103 104 105 106 107 108 ...
..$ edge.length: num [1:194] 48.4 2.5 4.3 2.9 6.4 0.2 4.4 7.7 58.1 12.2 ...
..$ Nnode : int 97
..$ tip.label : chr [1:98] "Napaeozapus insignis" "Zapus hudsonius" "Apodemus sylvaticus" "Otomys irroratus" ...
..- attr(*, "class")= chr "phylo"
..- attr(*, "order")= chr "cladewise"
$ comm : num [1:694, 1:98] 0 0 0 0 0 0 0 0 0 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:694] "1002" "1005" "1008" "1009" ...
.. ..$ : chr [1:98] "Napaeozapus insignis" "Zapus hudsonius" "Apodemus sylvaticus" "Otomys irroratus" ...
$ data :'data.frame': 98 obs. of 7 variables:
..$ BodyMass : num [1:98] 1.35 1.27 1.34 2.06 1.57 ...
..$ HeadLength : num [1:98] 89.7 85.9 87.5 163.5 101.6 ...
..$ LatitudinalRange: num [1:98] 44.4 48.34 47.53 -25.97 1.98 ...
..$ mPrecipRange : num [1:98] 70.3 58.2 57.2 50 151.1 ...
..$ mTempRange : num [1:98] 40.2 25.1 90.8 163.9 250.2 ...
..$ HomeRange : num [1:98] 0.01 0.00185 0.00711 0.0013 0.00163 ...
..$ Terrestriality : num [1:98] 1 1 2 1 1 2 1 2 1 2 ...
$ env : NULL
$ dropped:List of 5
..$ comm.sp.lost : chr(0)
..$ comm.sites.lost: chr(0)
..$ phy.sp.lost : chr(0)
..$ traits.sp.lost : chr(0)
..$ env.sites.lost : chr(0)
$ names :function (x)
- attr(*, "class")= chr [1:2] "comparative.comm" "comparative.data"
Phylogenetic signal
One way to assess the existence and degree of phylogenetic non‐independence in species traits is through the calculation of the phylogenetic signal of a given trait. Phylogenetic signal can be defined as the tendency for related species to resemble each other more than they resemble species drawn at a null or alternative evolutionary model (Blomberg & Garland 2002).
Phylogenetic signal has been commonly used to investigate questions in a wide range of research areas: How strongly are certain traits correlated with each other (Felsenstein 1985)? Which processes drive community assembly (Webb et al. 2002)? Are niches conserved along phylogenies (Losos 2008) and is vulnerability to climate change clustered in the phylogeny (Thuiller et al. 2011)?
Let’s learn about a few ways of measuring phylogenetic signal:
Moran’s I and Moran’s Correlogram
One simple way of measuring phylogenetic signal is through Moran’s I (Gittleman and Kot, 1990; Moran, 1950), which estimates the existing autocorrelation between species and their traits as a function of their phylogenetic relatedness.
Positive Moran’s \(I\) values indicate positive phylogenetic correlation between species; negative values indicate that closely related species are more different than distantly related species; and an \(I\) of \(0\) indicates absence of autocorrelation.
For this, we will use geiger
’s Morain.I
function. See an example below:
# We will first calculate a proximity matrix, where we can
# change the weight of distantly related species by changing
# the parameter alpha
alphaI <- 1
wI <- 1/cophenetic(Mammal.CompData$phy)^alphaI
diag(wI) <- 0
wI[1:5, 1:5]
Napaeozapus insignis Zapus hudsonius
Napaeozapus insignis 0.000000000 0.040983607
Zapus hudsonius 0.040983607 0.000000000
Apodemus sylvaticus 0.007112376 0.007112376
Otomys irroratus 0.007112376 0.007112376
Praomys tullbergi 0.007112376 0.007112376
Apodemus sylvaticus Otomys irroratus
Napaeozapus insignis 0.007112376 0.007112376
Zapus hudsonius 0.007112376 0.007112376
Apodemus sylvaticus 0.000000000 0.017006803
Otomys irroratus 0.017006803 0.000000000
Praomys tullbergi 0.017006803 0.017006803
Praomys tullbergi
Napaeozapus insignis 0.007112376
Zapus hudsonius 0.007112376
Apodemus sylvaticus 0.017006803
Otomys irroratus 0.017006803
Praomys tullbergi 0.000000000
# We can then calculate Moran's I
Mammal.BodyMass.I <- Moran.I(x = Mammal.CompData$data$BodyMass,
weight = wI,
scaled = TRUE)
Mammal.BodyMass.I
$observed
[1] 0.09687207
$expected
[1] -0.01030928
$sd
[1] 0.01367237
$p.value
[1] 4.440892e-15
What is the meaning of this?
We have just calculated the global Moran’s I, which allowed us to assess the existence of overall phylogenetic autocorrelation for a given trait. However, we also know that the degree of autocorrelation can change across different phylogenetic distances.
We can build a Moran’s correlogram to assess how phylogenetic autocorrelation changes across different phylogenetic distance classes.
For this, we will use the functions phyloCorrelogram
and plot.phylocorrelogram
from adephylo
:
# Calculate the correlogram under a phylo4d object
Mammal.BodyMass.MoranCor <- phyloCorrelogram(phylo4d(x = Mammal.CompData$phy,
tip.data = Mammal.CompData$data$BodyMass),
n.points = 10,
ci.bs = 10)
# Represent the correlogram
plot.phylocorrelogram(Mammal.BodyMass.MoranCor)
How do you interprete this correlogram?
Blomberg’s K
A common way to measure phylogenetic signal is through Blomberg’s K statistic (Blomberg et al. 2003), which compares the observed signal in a trait to its expected signal under multiple estimations under a Brownian motion model of trait evolution on a phylogeny.
K works as a mean square ratio (MSE), where the numerator is the error assuming that the trait evolves independently from phylogenetic structure, and the denominator is corrected by the phylogenetic covariances.
K values equal to 1 correspond to a Brownian motion process. K values greater than 1 indicate strong phylogenetic signal and that traits are being conserved through the evolutionary time. Alternatively, K values closer to zero correspond to a random or convergent pattern of evolution.
(Mammal.BodyMass.K <- phylosig(Mammal.CompData$phy,
Mammal.CompData$data$BodyMass,
method = "K", test = TRUE))
[1] "x has no names; assuming x is in the same order as tree$tip.label"
$K
[1] 0.9142054
$P
[1] 0.001
How about other traits?
(Mammal.HeadLength.K <- phylosig(Mammal.CompData$phy,
Mammal.CompData$data$HeadLength,
method = "K", test = TRUE))
[1] "x has no names; assuming x is in the same order as tree$tip.label"
$K
[1] 0.901623
$P
[1] 0.001
(Mammal.mTempRange.K <- phylosig(Mammal.CompData$phy,
Mammal.CompData$data$mTempRange,
method = "K", test = TRUE))
[1] "x has no names; assuming x is in the same order as tree$tip.label"
$K
[1] 0.3389474
$P
[1] 0.001
Look at how can we interprete K in the text above the code, and tell if there is phylogenetic signal for each one of the above traits.
Evolutionary models
Brownian motion model
For continuous traits, we commonly assume a Brownian motion (BM) model of evolutionary change. This model was proposed by Felsenstein (1985) and represents traits divergence independently across the evolutionary time, due to genetic drift (and sometimes other processes, such as natural selection), in resemblance to a random walk (or to a drunkard’s walk).
The BM model assumes that the correlation among trait values is proportional to the extent of shared ancestry for pairs of species.
It is a stochastic model in which changes from one time to the next are random draws from a normal distribution with mean \(0\) and variance \(σ^2 × Δt\). This means that variance under BM motion increases linearly through time with instantaneous rate \(σ^2\).
BM is a model that can be easily simulated:
# From Liam Revell's tutorial
t <- 0:100 # time
sig2 <- 0.01
## first, simulate a set of random deviates
x <- rnorm(n = length(t) - 1,
sd = sqrt(sig2))
## now compute their cumulative sum
x <- c(0, cumsum(x))
plot(t, x,
type = "l",
ylim = c(-2, 2))
Now, with multiple simulations and changing \(\sigma^2\):
nsim <- 100
X <- matrix(rnorm(n = nsim * (length(t) - 1),
sd = sqrt(sig2)),
nsim,
length(t) - 1)
X <- cbind(rep(0, nsim),
t(apply(X, 1, cumsum)))
par(mfrow = c(1,3))
plot(t, X[1, ],
xlab = "time",
ylab = "phenotype",
ylim = c(-2, 2),
type = "l")
apply(X[2:nsim, ], 1,
function(x, t)
lines(t, x), t = t)
NULL
##
X <- matrix(rnorm(n = nsim * (length(t) - 1),
sd = sqrt(sig2/10)),
nsim,
length(t) - 1)
X <- cbind(rep(0, nsim),
t(apply(X, 1, cumsum)))
plot(t, X[1, ],
xlab = "time", ylab = "phenotype",
ylim = c(-2, 2),
type = "l")
apply(X[2:nsim, ], 1,
function(x, t) lines(t, x), t = t)
NULL
##
nsim = 100
X <- matrix(rnorm(mean = 0.02,
n = nsim * (length(t) - 1), sd = sqrt(sig2/4)),
nsim, length(t) - 1)
X <- cbind(rep(0, nsim),
t(apply(X, 1, cumsum)))
plot(t, X[1, ],
xlab = "time", ylab = "phenotype",
ylim = c(-1, 3), type = "l")
apply(X[2:nsim, ], 1,
function(x, t) lines(t, x), t = t)
NULL
If we want to think on the base of a tree:
library(phytools)
t <- 100 # total time
n <- 30 # total taxa
b <- (log(n) - log(2))/t
tree <- pbtree(b = b, n = n, t = t, type = "discrete")
simulating with both taxa-stop (n) and time-stop (t) is
performed via rejection sampling & may be slow
18 trees rejected before finding a tree
## simulate Brownian evolution on a tree with fastBM
x <- fastBM(tree, sig2 = sig2, internal = TRUE)
## visualize Brownian evolution on a tree
phenogram(tree, x, spread.labels = TRUE, spread.cost = c(1, 0))
Ornstein-Uhlenbeck model
The Ornstein-Uhlenbeck (OU) evolutionary model is based on stabilizing selection (Hansen, 1997; but see Butler and King 2004), where a given species trait is determined by its common ancestral state (\(a\)) and the strengh of the selection (\(\alpha\)) that attracts traits towards its optimal value (\(\theta\)).
#Ornstein-Uhlenbeck (OU)
nsim=50
time.steps = 50
ou<-matrix(ncol=nsim,nrow=time.steps)
ou[1,]<-0
alpha=-0.05
for(i in 2:time.steps){
ou[i,] = ou[i-1,] + alpha*(ou[i-1,]) + rnorm(nsim, mean=0, sd=1)
}
plot(1:time.steps, ou[,1],
xlim=c(0,time.steps),
ylim=c(-max(abs(ou)),
max(abs(ou))),
type="n",
xlab="Time", ylab="Trait value",
bty="n")
matlines(1:time.steps,
ou,
lty=1,
col="#FF303080",
lwd=2)
We can again use rescale
on a tree to see how it stretches to an OU model:
tree <- rcoal(10)
par1 <- par(mfrow = c(1, 2))
plot(tree)
tree_scaled <- rescale(tree, model = "OU")
plot(tree_scaled(2))
par(par1)
Early-burst model
The early-burst evolutionary model, also known as ACDC model (accelerating-decelerating; Blomberg et al. 2003) simulates scenarios where evolutionary rates increase and decrease exponentially through time, under the model \(r[t] = r[0] * exp(a * t)\), where \(r[0]\) is the initial rate, \(a\) is the rate change parameter, and \(t\) is time.
Finally, we can rescale a tree to see an EB model of evolution:
We can see all three simulations together with this:
Fitting evolutionary models to trait data
Knowing if there is phylogenetic signal in traits is nice, but it is just knowing, right? Perhaps, a more informative way of assessing the phylogenetic non-independence in species traits is to properly test how traits fit to different models of trait evolution.
Here, we will use the functions fitContinuous
from geiger
to compare the relative fit of our data to different evolutionary models:
EvoModels <- c("BM",
"OU",
"EB")
fitEvoModels <- lapply(EvoModels,
FUN = function(x) fitContinuous(multi2di(Mammal.CompData$phy),
Mammal.CompData$data[, "BodyMass", drop = FALSE],
model = x,
ncores = 2))
Warning in fitContinuous(multi2di(Mammal.CompData$phy), Mammal.CompData$data[, : Parameter estimates appear at bounds:
a
Let’s take sometime to explore the results:
# checking the Brownian motion fit
fitEvoModels[[1]]
GEIGER-fitted comparative model of continuous data
fitted 'BM' model parameters:
sigsq = 0.007886
z0 = 2.479351
model summary:
log-likelihood = -71.785030
AIC = 147.570059
AICc = 147.696375
free parameters = 2
Convergence diagnostics:
optimization iterations = 100
failed iterations = 0
frequency of best fit = 1.00
object summary:
'lik' -- likelihood function
'bnd' -- bounds for likelihood search
'res' -- optimization iteration summary
'opt' -- maximum likelihood parameter estimates
head(fitEvoModels[[1]]$res)
sigsq lnL convergence
Brent 0.007885838 -71.78503 0
Brent 0.007885838 -71.78503 0
Brent 0.007885838 -71.78503 0
Brent 0.007885838 -71.78503 0
Brent 0.007885838 -71.78503 0
Brent 0.007885838 -71.78503 0
# checking the Brownian motion fit
fitEvoModels[[2]]
GEIGER-fitted comparative model of continuous data
fitted 'OU' model parameters:
alpha = 0.004448
sigsq = 0.009201
z0 = 2.422298
model summary:
log-likelihood = -71.112667
AIC = 148.225334
AICc = 148.480653
free parameters = 3
Convergence diagnostics:
optimization iterations = 100
failed iterations = 0
frequency of best fit = 0.51
object summary:
'lik' -- likelihood function
'bnd' -- bounds for likelihood search
'res' -- optimization iteration summary
'opt' -- maximum likelihood parameter estimates
head(fitEvoModels[[2]]$res)
alpha sigsq lnL convergence
subplex 4.448143e-03 9.200411e-03 -7.111267e+01 0
L-BFGS-B 1.561300e+00 1.741961e+00 -1.104545e+02 0
subplex 4.448143e-03 9.200411e-03 -7.111267e+01 0
L-BFGS-B 4.448381e-03 9.200600e-03 -7.111267e+01 0
subplex 7.637649e-217 7.885838e-03 -7.178503e+01 0
L-BFGS-B 1.934199e-17 1.261429e-204 -1.000000e+200 0
# checking the Brownian motion fit
fitEvoModels[[3]]
GEIGER-fitted comparative model of continuous data
fitted 'EB' model parameters:
a = -0.000001
sigsq = 0.007887
z0 = 2.479357
model summary:
log-likelihood = -71.785181
AIC = 149.570362
AICc = 149.825681
free parameters = 3
Convergence diagnostics:
optimization iterations = 100
failed iterations = 0
frequency of best fit = 0.07
object summary:
'lik' -- likelihood function
'bnd' -- bounds for likelihood search
'res' -- optimization iteration summary
'opt' -- maximum likelihood parameter estimates
head(fitEvoModels[[3]]$res)
a sigsq lnL convergence
subplex -1.000087e-06 7.886638e-03 -7.178518e+01 0
subplex -1.001614e-06 7.886636e-03 -7.178518e+01 0
subplex -1.001509e-06 7.887384e-03 -7.178518e+01 0
L-BFGS-B -7.826598e-02 1.409349e+22 -2.284690e+03 0
L-BFGS-B -5.717581e-02 1.815726e-180 -1.000000e+200 0
subplex -1.000000e-06 7.886869e-03 -7.178518e+01 0
AICs <- sapply(fitEvoModels,
FUN = function(x) x$opt$aicc)
names(AICs) <- EvoModels
# Which one is the best model?
(AICtable <- aicw(AICs))
-- Akaike weights --
Rank AIC diff wi AICw
1 1 148 0.000 1.000 0.495
2 2 148 0.784 0.676 0.334
3 3 150 2.129 0.345 0.171