Community Phylogenetics in R

Overview

Community Phylogenetics: linking community ecology and evolutionary biology to your programming skills

par | by Pedro Henrique P. Braga (Concordia University) & Katherine Hébert (Université de Sherbrooke)

How do species assemble in communities? What are the involved mechanisms? These questions have fascinated ecologists for decades. As the community assembly of organisms depends on historical and ecological phenomena, it is necessary to link evolutionary and ecological hypotheses testing to address these questions. By combining phylogenetic information and species geographical co-existence, ‘community phylogenetics’ emerged as field seeking to reveal patterns and processes driving the organization and structure of ecological communities. In this workshop, we will dive into the debates and controversies surrounding the community structure and learn some of the various approaches used by community phylogenetics to detect evolutionary patterns on community assembly.

Comment les espèces s’assemblent-elles dans les communautés ? Quels sont les mécanismes impliqués ? Ces questions ont fasciné des écologistes pendant des décennies! L’assemblage d’organismes dans les communautés écologiques dépendent de phénomènes historiques et écologiques. Il est donc nécessaire de relier et de tester des hypothèses évolutives et écologiques pour répondre à ces questions. En combinant les informations phylogénétiques avec la coexistence géographique des espèces, le champ de « phylogénie des communautés » est apparu afin de révéler les patrons et les processus menant à l’organisation et à la structure des communautés écologiques. Dans cet atelier, nous allons plonger dans les débats et les controverses entourant la structure des communautés et apprendre certaines des méthodes utilisées par la phylogénétique de la communauté pour détecter les patrons évolutives lors de l’assemblage des communautés des espèces.

Preamble

We developed this workshop and self-guided tutorial to help researchers acquire a short background on phylogenetic patterns of community structure.

It was originally instructed during the 3rd QCBS R Symposium, which happened between October 10th and 12th, 2019, at the Gault Nature Reserve (McGill University), in Mont-Saint-Hilaire, QC (Canada).

Part of the material was inspired in the content of the book Phylogenies in Ecology, by Marc W. Cadotte and Jonathan Davies (2016).

If you would like to provide feedback on this material, do not hesitate to get in touch!

We will highly appreciate your comments!

Kind regards,

Pedro Henrique P. Braga and Katherine Hébert
pedrohbraga.github.io | github.com/katherinehebert
ph.pereirabraga [at] gmail [dot] com | katherine.ahebert [at] gmail [dot] com

Learning Outcomes

By the end of this workshop, participants should be able to:

  1. Identify and prepare phylogenetic trees and community data structure to perform comparative analyses;
  2. Understand the most common community phylogenetic structure metrics;
  3. Perform analyses to extract patterns from the phylogenetic structure on ecological communities;
  4. Identify, understand and, when adequate, resolve common issues when performing performing community structure;
  5. Critically think about the assumptions related to linking ecological and evolutionary processes to community structure patterns.

Outline

The main steps, described bellow, are the following:

  1. Preparation for the tutorial
  2. Phylogenetic patterns as proxies of community assembly mechanisms
  3. A brief introduction to phylogenies and community data
    1. Phylogenies: terminology and basic manipulation
    2. Community data manipulation
  4. Within-assemblage phylogenetic structure
  5. Between-assemblage phylogenetic structure
  6. Other applications

Preparation for this tutorial

We kindly ask you to run the following code to download the datasets, and install and load the libraries we will use through this course:

# Check for needed packages, and install the missing ones
required.libraries <- c("ape", "picante", 
                        "pez", "phytools",
                        "vegan", "adephylo", 
                        "phylobase", "geiger", 
                        "mvMORPH", "OUwie", 
                        "hisse", "BAMMtools",
                        "phylosignal", "Biostrings",
                        "devtools","ggplot2", 
                        "kableExtra", "betapart", "gridExtra",
                        "reshape2")

needed.libraries <- required.libraries[!(required.libraries %in% installed.packages()[,"Package"])]
if(length(needed.libraries)) install.packages(needed.libraries)

# Load all required libraries at once
lapply(required.libraries, require, character.only = TRUE)

### Install ggtree from BiocManager

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("ggtree")

set.seed(1)
# Download files from Marc Cadotte's and Jonathan Davies' book 
# "Phylogenies in Ecology: A guide to concepts and methods":

dir.create("data/Jasper")

download.file("https://raw.githubusercontent.com/pedrohbraga/PhyloCompMethods-in-R-workshop/master/data/Jasper/resources/data/Jasper/jasper_data.csv", 
              "data/Jasper/jasper_data.csv")

download.file("https://raw.githubusercontent.com/pedrohbraga/PhyloCompMethods-in-R-workshop/master/data/Jasper/resources/data/Jasper/jasper_tree.phy", 
              "data/Jasper/jasper_tree.phy")

# Download the tree files from this paper

# download.file("https://onlinelibrary.wiley.com/action/
# downloadSupplement?doi=10.1111%2Fj.1461-0248.2009.01307.x&attachmentId=179085359", 
#              destfile = "data/ele_1307_sm_sa1.tre")

# If the download of the above file does not work, 
# download the first supplementary material from the following paper
# and place it within your 'data' directory:
# https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1461-0248.2009.01307.x

Alternatively, you can download the .zip file from the Cadotte Urban Biodiversity & Ecosystem Services (CUBES) Laboratory and extract the files data/Jasper/jasper_tree.phy and data/Jasper/jasper_data.csv to your QCBS R Symposium workshop data directory.

Then, set your working directory to the same directory of the folder CommunityPhylogenetics using the function setwd(choose.dir()) or your prefered method.

A brief introduction to phylogenies

Phylogenetic trees (i.e., evolutionary tree or cladogram) are branching diagrams illustrating the evolutionary relationships among taxa.

Phylogenetic trees can be constructed based on species’ genes and/or on traits. Phylogenetic reconstruction through analyses of molecular sequences usually consists of three distinct procedures: (i) sequence alignment; (ii) character coding; and (iii) tree building. There is a range of available options for each of these steps, with no simple objective method for choosing between them.

The root of a phylogenetic tree indicates that an ancestral lineage gave rise to all organisms on the tree. A branch point indicates where two lineages diverged. A lineage that evolved early and remains unbranched is a basal taxon. When two lineages stem from the same branch point, they are sister taxa. A branch with more than two lineages is a polytomy. Extracted from: https://doi.org/10.1016/j.tree.2012.10.015

In this course, we will not cover the steps necessary to reconstruct phylogenies. We will assume that a phylogenetic relationship is already available for the taxa of interest.

ape and the phylo object in R

The core of phylogenetics analyses in R is the library ape - an acronym that stands for Analysis of Phylogenetics and Evolution. You can run the following commands to install, load and call help for ape:

# install.packages("ape")

library(ape)

# help(package = ape)

From ape, we depict two important functions to read and load trees into R: read.nexus() and read.tree():

read.nexus() reads NEXUS formatted trees, which is organized into major units known as blocks and usually have the following basic structure:

#NEXUS

[!This is the data file used for my dissertation]

BEGIN TAXA;
    TaxLabels Scarabaeus Drosophila Aranaeus;

END;

BEGIN TREES;
    Translate beetle Scarabaeus, fly Drosophila, spider Aranaeus;
    Tree tree1 = ((1,2),3);
    Tree tree2 = ((beetle,fly),spider);
    Tree tree3 = ((Scarabaeus,Drosophila),Aranaeus);
END;

read.tree reads Newick formatted trees, which commonly have the following format:

(Bovine:0.69395,(Hylobates:0.36079,(Pongo:0.33636,(G._Gorilla:0.17147, (P._paniscus:0.19268,H._sapiens:0.11927):0.08386):0.06124):0.15057):0.54939, Rodent:1.21460)

When phylogenies are created, loaded and handled in ape, their objects are represented as a list of class phylo.

Let’s try reading the following text string of a tree (is it in Newick or in NEXUS format?) and see what R tells us when we call it:

tree.primates <- read.tree(text="((((Homo:0.21, Pongo:0.21):0.28, Macaca:0.49):0.13, 
                           Ateles:0.62):0.38, Galago:1.00);")
tree.primates

Phylogenetic tree with 5 tips and 4 internal nodes.

Tip labels:
  Homo, Pongo, Macaca, 
Ateles, Galago

Rooted; includes branch lengths.

A class phylo object distributes the information of phylogenetic trees into six main components:

str(tree.primates)
List of 4
 $ edge       : int [1:8, 1:2] 6 7 8 9 9 8 7 6 7 8 ...
 $ edge.length: num [1:8] 0.38 0.13 0.28 0.21 0.21 0.49 0.62 1
 $ Nnode      : int 4
 $ tip.label  : chr [1:5] "Homo" "Pongo" "Macaca" "\nAteles" ...
 - attr(*, "class")= chr "phylo"
 - attr(*, "order")= chr "cladewise"

Since phylo is a list, we can access its elements by using the $ symbol. We can access the matrix edge by running phylo_object$edge, which has the beginning and ending node number for all the nodes and tips of a given tree. In summary, this allows us to keep track of the internal and external nodes of the tree:

tree.primates$edge
     [,1] [,2]
[1,]    6    7
[2,]    7    8
[3,]    8    9
[4,]    9    1
[5,]    9    2
[6,]    8    3
[7,]    7    4
[8,]    6    5

The vector tip.label element contains all the labels for all tips in the tree (i.e. the names of your OTUs):

tree.primates$tip.label
[1] "Homo"     "Pongo"    "Macaca"   "\nAteles" "Galago"  

There is a little error in the tip labels, right? We can easily rewrite the labels with this:

tree.primates$tip.label <- c("Homo", "Pongo", "Macaca", "Ateles", "Galago")

Another important element is the vector edge.length, which indicates the length of each edge with relation to the root or base of the tree:

tree.primates$edge.length
[1] 0.38 0.13 0.28 0.21 0.21 0.49 0.62 1.00

Finally, Nnode contains the number of internal nodes in the tree:

tree.primates$Nnode
[1] 4

One can see all the important information a phylo object carries from a tree by plotting it with all its elements:

plot(tree.primates, 
     edge.width = 2, 
     label.offset = 0.05, 
     type = "cladogram")

nodelabels()

tiplabels()

add.scale.bar()

Handling and preparing phylogenies

Now that we know the main characteristics of a phylogenetic tree and of a phylo object, let’s dive into more details on the structure of phylogenetic trees.

Dicotomous and polytomous trees

While some phylogenetic analyses are able to handle polytomies (i.e. more than two OTUs originating from the same node in a tree; or an unresolved clade), one may be interesting in solving the polytomies to produce a binary tree.

t1 <- read.tree(text = "((A, B, C), D);") 
# See how A, B and C are within the same round brackets, 
# *i.e.* departing from the same node.
plot(t1, 
     type = "cladogram")

One may check if a tree is binary by using is.binary function and, in case it is FALSE, randomly resolve polytomies using the multi2d function:

is.binary(t1)
[1] FALSE
t2 <- multi2di(t1)
plot(t2, 
     type = "cladogram")

is.binary(t2)
[1] TRUE

Ultrametric, non-ultrametric and uninformative trees

Ultrametric trees (or approximates of them) can be used to infer both braching and temporal patterns of evolutionary history for a given clade. For a tree to be ultrametric, its matrix has to follow the following strong assumptions:

  1. Each tip has to be uniquely labelled;
  2. Each internal node of the tree diverges towards at least two “children”, i.e. is dicotomous;
  3. Along any path from the root to a tip (or leaf), the numbers labelling internal nodes strictly decrease, i.e. time must strictly increase along the path from the root;
  4. The distance from the root to all tips of the tree is constant for all clades.

In many ultrametric trees, one can assume that all extant species are sampled at the present time, and branch lengths represent time calibrated in millions of years.

This information is obtained when we infer plausible history from data that reflects time since divergence (i.e. reconstruct evolutionary history based on the molecular clock theory). When a tree violates one of the above assumptions (especially, number 4), a tree is then considered to be non-ultrametric. For instance, a non-ultrametric tree will have branches with different distances towards the tips. While this information reflects different rates of evolution across the phylogeny, many analyses will require an ultrametric tree (i.e. where models assume constant variance and equal means at the tips).

In these cases, one may be interested in “smoothing out” these differences across the evolutionary history to transform a non-ultrametric tree into a ultrametric tree.

One method is to use the mean path length (MPL) that assumes random mutation and a fixed molecular clock to calculate the age of a node as the mean of all the distances from this node to all tips diverging from it. To run this, we use the ape function chronoMPL. MPL sometimes returns negative branch lengths, meaning that it should be used with caution.

Another function we can use is chronos, also from the ape package. chronos uses the penalized maximum likelihood method to estimate divergences times.

Let’s try these methods by first creating a random tree, then tansforming its edge lengths using both chronoMPL and chronos functions, and plotting them side-by-side:

tree.NonUlt <- rtree(7)
tree.Ult.MPL <- chronoMPL(tree.NonUlt)
tree.Ult.S <- chronos(tree.NonUlt)

Setting initial dates...
Fitting in progress... get a first set of estimates
         (Penalised) log-lik = -7.852074 
Optimising rates... dates... -7.852074 

log-Lik = -7.74304 
PHIIC = 49.53 
par(mfrow=c(1,3))
plot(tree.NonUlt, edge.width = 2,
     cex = 2,
     main = "A) Non-ultrametric",
     cex.main = 2)
plot(tree.Ult.MPL, edge.width = 2,
     cex = 2,
     main = "B) Ultrametric chronoMPL",
     cex.main = 2)
plot(tree.Ult.S, edge.width = 2,
     cex = 2,
     main = "C) Ultrametric chronos",
     cex.main = 2)

You may verify if any tree is ultrametric using is.ultrametric:

is.ultrametric(tree.NonUlt) 
[1] FALSE
is.ultrametric(tree.Ult.MPL) 
[1] TRUE
is.ultrametric(tree.Ult.S)
[1] TRUE

Sometimes, trees may have no branch lengths available, but only the divergence relationships. We often refer to these trees as being non-informative trees (despite the fact that they can still give us some information about branching events):

tree.NonInf <- rtree(7)
tree.NonInf$edge.length <- NULL
plot(tree.NonInf, edge.width = 2,
     cex = 2,
     main = "D) Uninformative tree",
     cex.main = 2)

The picante package

One of the core packages for hypothesis testing in ecophylogenetics is picante (Kembel et al. 2010). picante mainly works with three types of objects: the phylogeny data as a phylo class object; the community presence-absence (binary) or abundance matrix; and a species-trait matrix.

# install.packages("picante")
library(picante)

# help(package = picante)

Preparing data for community phylogenetic analyses

For this analysis, we will use the data on the Jasper Ridge plant communities available in the book “Phylogenies in Ecology”, written by Marc Cadotte and Jonathan Davies (2016).

JasperPlants.tree <- read.tree("data/Jasper/jasper_tree.phy")
JasperPlants.comm <- read.csv("data/Jasper/jasper_data.csv", 
                             row.names = 1)

Before proceeding to our analyses, we have to make sure that both the community and phylogenetic data match. For this, the names of the tips should match the species names in the dataset. This also means that species present in one dataset should not be absent in the other.

(You thought you were done preparing your dataset, eh?)

For this, we can use the function drop.tip from the ape package:

JasperPlants.cleanTree <- drop.tip(phy = JasperPlants.tree, 
                                   tip = setdiff(JasperPlants.tree$tip.label,
                                   colnames(JasperPlants.comm)))

You can also match your datasets using the function match.phylo.comm from picante, or the function comparative.comm from pez.

# Using picante
JasperPlants.picCleanTree <- match.phylo.comm(phy = JasperPlants.tree, 
                                              comm = JasperPlants.comm)$phy
JasperPlants.picCleanComm <- match.phylo.comm(phy = JasperPlants.tree, 
                                              comm = JasperPlants.comm)$comm

# Using pez
plot(JasperPlants.picCleanTree,
     cex = 0.4)

JasperPlants.picCleanComm[1:4, 1:4]
     Geranium_dissectum Erodium_botrys Anagallis_arvensis
J11                  20             10                  5
J110                  0             15                  0
J12                  20             20                  0
J13                   0             40                  0
     Epilobium_brachycarpum
J11                       0
J110                      0
J12                       1
J13                       0

Phylogenetic patterns as proxies of community assembly mechanisms

Ecologists share the goal of understanding and describing the processes that create the uneven distribution patterns of species, which create the variation in abundance, identity and diversity we know today.

Among many factors that determine the presence of taxa in communities are the abiotic and biotic conditions of such communities (e.g., Diamond 1975; Westoby and Wright 2006; but see Hubbell 2001).

When studying biodiversity, we are often interested in two components: alpha diversity (α) that measures the diversity intrinsic to each community or site and beta diversity (β) that reflects the differences among communities or sites.

Taxon diversity (TD) is the most quantified measure of diversity, but it gives an incomplete information because the evolutionary history underlying spatial patterns is ignored. Since many decades we recognize that many ecological patterns are processes are not independent of the evolution of the lineages involved in generating these patterns.

For each tree, the tick marks correspond to loss in PD if each species from area B is lost. The tick marks show how much PD is uniquely represented by that area. Looking at just taxonomic diversity cannot distinguish between the two scenarios because it ignores a critical aspect of the evolutionary context. Extracted from: https://link.springer.com/chapter/10.1007/978-3-319-22461-9_3

Since taxa that recently diverged tend to be ecologically similar (Darwin 1859; Lord et al. 1995; Wiens and Graham 2005), a direct link may exist between the evolutionary relatedness of organisms in a community, the characters they possess, and the ecological processes that determine their distribution and abundance.

In fact, many communities exhibit nonrandom patterns of evolutionary relatedness among co-occurring species (reviewed in Webb et al. 2002): a phenomenon we refer to as phylogenetic community structure.

Within-assemblage phylogenetic structure

Phylogenetic \(\alpha\) diversity (PD)

One simple way to measure evolutionary diversity within a community is through Daniel Faith’s PD metric (Faith, 1992; http://goo.gl/wM08Oy).

Faith’s PD (1992) sums the branch lengths of all co-occurring species in a given site, from the tips to the root of the phylogenetic tree.

Higher PD values are given to communities that has more evolutionary divergent taxa and older history, while lower PD values represent assemblages that have taxa with more recent evolutionary history.

. Extracted from: https://link.springer.com/chapter/10.1007/978-3-319-22461-9_3

Faith’s PD can be implemented using the pd function in the picante package.

In addition to Faith’s PD, the pd function also returns species richness (SR).

SR is the same as observed species richness (\(S_{obs}\)) or taxonomic \(\alpha\) diversity.

Jasper.PD <- pd(samp = JasperPlants.picCleanComm, 
                tree = JasperPlants.picCleanTree,
                include.root = FALSE)

head(Jasper.PD) # See the first six rows of the resulting matrix
           PD SR
J11  831.4887 13
J110 501.8703  9
J12  769.9249 14
J13  712.0532 13
J14  640.5896 11
J15  640.2884 13

And how do PD estimates compare with SR?

cor.test(Jasper.PD$PD, Jasper.PD$SR)

    Pearson's product-moment correlation

data:  Jasper.PD$PD and Jasper.PD$SR
t = 9.0965, df = 28, p-value = 7.457e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.7321097 0.9338472
sample estimates:
      cor 
0.8643903 
plot(Jasper.PD$PD, 
     Jasper.PD$SR, 
     xlab = "Phylogenetic Diversity", ylab = "Species Richness", 
     pch = 16)

With this, we can see that interpreting PD across sites that have different richness can be complicated. To overcome this problem, we can compare PD values to those expected by cahnce. To do this, we can use null models that will randomize some of the aspects of the assemblages or of the evolutionary relationship between species.

Randomizations and null models

We are often interested in assessing whether or not observed patterns differ from null expectation. picante has many functions that allow us to test hypotheses under different types of null models:

Null Model Description
taxa.labels Shuffles taxa labels across tips of phylogeny (across all taxa included in phylogeny)
richness Randomizes community data matrix abundances within samples (maintains sample species richness)
frequency Randomizes community data matrix abundances within species (maintains species occurence frequency)
sample.pool Randomizes community data matrix by drawing species from pool of species occurring in at least one community (sample pool) with equal probability
phylogeny.pool Randomize community data matrix by drawing species from pool of species occurring in at least one community (sample pool) with equal probability
independentswap Randomizes community data matrix with the independent swap algorithm (Gotelli 2000) maintaining species occurrence frequency and sample species richness
trialswap Randomizes community data matrix with the trial-swap algorithm (Miklos & Podani 2004) maintaining species occurrence frequency and sample species richness

The steps are straight-forward and are applicable to multiple frameworks!

We can take a very simple null model that will randomly shuffle the tip labels on the phylogeny (corresponding to the taxa.labels in picante). After shuffling the tips of our phylogeny, we will recalculate our metrics (in this case PD). We will save this information and repeat the previous steps an \(n\) number of times, until we have our distribution of null values.

We can then use these values to obtain a standardized metric by calculating an effect size against the null values using a \(z\)-value. This \(z\) value will be a dimensionless value that will indicate the number of standard deviations an observation is from the mean of the distribution. We can calculate it by:

\(z = \frac{x - \mu}{\sigma}\)

where \(x\) is our observation (for example, the PD for a given community), \(\mu\) is the mean and \(sigma\) is the standard deviation of the null or random distribution.

If the value that we obtain from \(z\) is positive, it means that the number of standard deviations are above the mean; and if it is negative, the number of standard deviations are below the mean.

Extracted from: http://arpha.pensoft.net//showfigure.php?filename=singlefig_91657.jpg

Thus, the standardized value for PD is:

\[z_{PD} = \frac{PD_{obs} - \mu_{PD_{null}}}{\sigma_{PD_{null}}}\]

We can calculate the standardized effect size (ses) of PD for each community using the ses.pd() function in picante:

# This is a computationally intensive process and take some time to run
Jasper.ses.pd <- ses.pd(samp = JasperPlants.picCleanComm, 
                        tree = JasperPlants.picCleanTree, 
                        null.model = "taxa.labels", 
                        runs = 1000)
head(Jasper.ses.pd)
     ntaxa   pd.obs pd.rand.mean pd.rand.sd pd.obs.rank   pd.obs.z   pd.obs.p
J11     13 831.4887     793.6866   62.90851         733  0.6009067 0.73226773
J110     9 501.8703     622.8810   61.81899          28 -1.9575002 0.02797203
J12     14 769.9249     829.1513   62.50560         174 -0.9475372 0.17382617
J13     13 712.0532     791.1854   65.65384         122 -1.2052945 0.12187812
J14     11 640.5896     709.7741   62.17668         142 -1.1127091 0.14185814
J15     13 640.2884     793.9467   64.83166          16 -2.3701116 0.01598402
     runs
J11  1000
J110 1000
J12  1000
J13  1000
J14  1000
J15  1000

Reflection time!

At the community level, phylogenetic clustering which is often regarded as evidence of environmental filtering, while phylogenetic overdispersion is often associated to limiting similarity. Nevertheless, these hypothetical links assume many ecological and evolutionary processes,may be weakened when there is a disparity between observed patterns and the assumed on-going processes. As a consequence, no one should interprete non-random phylogenetic patterns without care and detailed investigation.

Phylogenetic Community Relatedness

Another way of looking into the phylogenetic structure of communities is to quantify to what degree closely related taxa co-occur.

The basis for patterns of phylogenetic community structure is usually linked to a simple framework (table 1; Webb et al. 2002).

When species traits responsible for their physiological tolerances are conserved, an environmental filtering that limits the range of viable ecological strategies at a given site is expected to select co-occurring species more related than expected by chance, i.e. generate a pattern known as phylogenetic clustering.

On the other hand, competitive exclusion can limit the ecological similarity of co-occurring species, generating a pattern of phylogenetic overdispersion or phylogenetic evenness.

Extracted from: https://eliotmiller.weebly.com/research.html

Finally, when traits are diverging faster across the evolutionary time, the effects of habitat filtering should be weaker, producing evenly dispersed patterns of relatedness; while competition or limiting similarity is expected to produce random or clustered patterns.

If communities are assembled independently with respect to traits (e.g., Hubbell 2001), then patterns of relatedness should be resemble random expectation.

Assembly.process Under.niche.conservatism Under.niche.divergence
Limiting similarity Even dispersion Random or clustered dispersion
Habitat filtering Clustered dispersion Even dispersion
Neutral assembly Random dispersion Random dispersion

Over the recent years, other processes have been shown to drive phylogenetic patterns of community structure, such as, isolation, time since isolation, diversification, scale-dependent assembly and local and regional composition and richness.

Phylogenetic resemblance matrix

An important step prior to the estimation of the phylogenetic structure of communities is to create a phylogenetic resemblance matrix, or a phylogenetic variance-covariance matrix. A phylogenetic variance-covariance matrix is nothing more than the calculated distances between taxa in a tree.

Extracted from: https://royalsocietypublishing.org/doi/10.1098/rstb.2012.0341

In R, we commonly refer to this matrix as a matrix of cophenetic distances among taxa and we can use the function cophenetic.phylo from picante to calculate it:

# Create a Phylogenetic Distance Matrix {picante}

JasperPlants.cophenDist <- cophenetic.phylo(JasperPlants.picCleanTree)

Net relatedness index (NRI)

One common metric to evaluate the phylogenetic structure of commmunities is the net relatedness index (NRI), which is based on the mean phylogenetic distances (MPD) calculated from the pairwise cophenetic branch lengths distances of the co-occurring taxa in each community.

NRI is obtained by multiplying the standardized effect size mean phylogenetic (pairwise) distances (calculated via a null model of interest) by \(-1\).

Negative NRI values indicate phylogenetic overdispersion, i.e. co-occurring taxa is less closely related than expected by a given null model; while positive NRI values represent phylogenetic clustering, i.e. co-occurring taxa are more closely related than expected by a null model.

In R, we will use the function ses.mpd from picante to calculate the NRI of our communities:

# Estimate Standardized Effect Size of the MPD on the cophenetic matrix

JasperPlants.ses.mpd <- ses.mpd(JasperPlants.picCleanComm, 
                                JasperPlants.cophenDist, 
                                null.model = "taxa.labels", 
                                abundance.weighted = FALSE, 
                                runs = 100)

head(JasperPlants.ses.mpd)
     ntaxa  mpd.obs mpd.rand.mean mpd.rand.sd mpd.obs.rank  mpd.obs.z
J11     13 199.4659      194.9025    8.669032           68  0.5264091
J110     9 146.6226      194.1784   12.903549            2 -3.6854782
J12     14 195.5006      193.8915    8.638529           51  0.1862731
J13     13 192.1528      194.6418    8.541832           35 -0.2913844
J14     11 185.2070      194.8345   10.128520           13 -0.9505355
J15     13 185.1665      194.2807    7.954847           14 -1.1457422
      mpd.obs.p runs
J11  0.67326733  100
J110 0.01980198  100
J12  0.50495050  100
J13  0.34653465  100
J14  0.12871287  100
J15  0.13861386  100
# Calculate NRI

JasperPlants.NRI <- as.matrix(-1 * 
                                ((JasperPlants.ses.mpd[,2] - JasperPlants.ses.mpd[,3]) /
                                   JasperPlants.ses.mpd[,4]))
rownames(JasperPlants.NRI) <- row.names(JasperPlants.ses.mpd)
colnames(JasperPlants.NRI) <- "NRI"

head(JasperPlants.NRI)
            NRI
J11  -0.5264091
J110  3.6854782
J12  -0.1862731
J13   0.2913844
J14   0.9505355
J15   1.1457422

Nearest taxon index (NTI)

Another way to assess the phylogenetic structure of communities is through the calculation of the nearest taxon index (NTI). NTI differs from NRI by being based on the mean nearest neighbour phylogenetic distance (MNTD), instead of the overall mean pairwise distances (MPD).

MNTD is obtained by calculating the mean phylogenetic distance of each taxon of a given community to its nearest neighbour in a tree. By doing this, one can observe that NTI is more sensitive to the clustering or overdispersion patterns happening closer to the tips of a tree, while NRI represents an overall pattern of structuring across the whole phylogenetic tree.

As in NRI, higher values of NRI indicate phylogenetic clustering and lower values of NRI indicate phylogenetic overdispersion.

# Estimate Standardized Effect Size of the MNTD on the cophenetic matrix

JasperPlants.ses.mntd <- ses.mntd(JasperPlants.picCleanComm, 
                                  JasperPlants.cophenDist, 
                                  null.model = "taxa.labels", 
                                  abundance.weighted = FALSE, 
                                  runs = 100) 

head(JasperPlants.ses.mntd)
     ntaxa mntd.obs mntd.rand.mean mntd.rand.sd mntd.obs.rank mntd.obs.z
J11     13 76.65059       79.27968     12.27747            47 -0.2141392
J110     9 71.32136       88.93312     17.20764            17 -1.0234849
J12     14 67.04067       76.60485     11.94680            24 -0.8005648
J13     13 65.27537       79.56577     12.58896            15 -1.1351535
J14     11 74.54219       83.60746     14.28146            24 -0.6347575
J15     13 47.69848       79.28997     11.10752             1 -2.8441529
     mntd.obs.p runs
J11  0.46534653  100
J110 0.16831683  100
J12  0.23762376  100
J13  0.14851485  100
J14  0.23762376  100
J15  0.00990099  100
# Calculate NTI

JasperPlants.NTI <- as.matrix(-1 * 
                                ((JasperPlants.ses.mntd[,2] - JasperPlants.ses.mntd[,3]) / 
                                   JasperPlants.ses.mntd[,4]))
rownames(JasperPlants.NTI) <- row.names(JasperPlants.ses.mntd)
colnames(JasperPlants.NTI) <- "NTI"

head(JasperPlants.NTI)
           NTI
J11  0.2141392
J110 1.0234849
J12  0.8005648
J13  1.1351535
J14  0.6347575
J15  2.8441529

We can then compare between the wet and dry Jasper Ridge plots!

Let us do it with MPD!

JasperPlants.ses.mpd$habitat <- c(rep("wet", 15), 
                                  rep("dry", 15))

# Test whether differences between wet and dry habitats are significant

t.test(JasperPlants.ses.mpd$mpd.obs.z ~ JasperPlants.ses.mpd$habitat)

    Welch Two Sample t-test

data:  JasperPlants.ses.mpd$mpd.obs.z by JasperPlants.ses.mpd$habitat
t = 1.4657, df = 27.149, p-value = 0.1542
alternative hypothesis: true difference in means between group dry and group wet is not equal to 0
95 percent confidence interval:
 -0.2152359  1.2926470
sample estimates:
mean in group dry mean in group wet 
       -0.1810248        -0.7197303 
# Compare ses.mpd.z between habitats

plot.new()
boxplot(mpd.obs.z ~ habitat, 
     data = JasperPlants.ses.mpd, 
     xlab = "Habitat", 
     ylab = "SES(MPD)")
abline(h = 0, col = "gray")

And now, with MNTD:

JasperPlants.ses.mntd$habitat <- c(rep("wet", 15), 
                                   rep("dry", 15))

# Compare ses.mpd.z between habitats
plot.new()

# Test whether differences between wet and dry habitats are significant
t.test(JasperPlants.ses.mntd$mntd.obs.z ~ JasperPlants.ses.mntd$habitat)

    Welch Two Sample t-test

data:  JasperPlants.ses.mntd$mntd.obs.z by JasperPlants.ses.mntd$habitat
t = -0.35446, df = 20.711, p-value = 0.7266
alternative hypothesis: true difference in means between group dry and group wet is not equal to 0
95 percent confidence interval:
 -0.8973162  0.6361631
sample estimates:
mean in group dry mean in group wet 
       -1.0850761        -0.9544996 
boxplot(mntd.obs.z ~ habitat, 
     data = JasperPlants.ses.mntd, 
     xlab = "Habitat", 
     ylab = "SES(MNTD)")
abline(h = 0, col = "gray")

Between-assemblage phylogenetic structure

Phylogenetic beta diversity (PBD)

One may not be interested only in the patterns of \(\alpha\) diversity, but in understanding how communities are evolutionary distinct from each other, i.e. knowing the pairwise phylogenetic \(\beta\) diversity.

You might already know \(\beta\)-diversity as a measure of the dissimilarity between communities in terms of their composition. As you might guess, phylogenetic beta diversity is a measure of the dissimilarity between communities in terms of the evolutionary lineages they contain.

Evaluating phylogenetic \(\beta\)-diversity can provide insight about where communities differ in terms of their evolutionary histories, and even about which clades are driving these differences.

PhyloSor

Many metrics are available to assess the phylogenetic \(\beta\) diversity of communities. Here, we will start with the PhyloSor index, which is the phylogenetic analog of the Sørensen index (Bryant et al., 2008; Swenson, 2011; Feng et.al., 2012). It is based on the total length of branches shared and unshared between paired communities.

PhyloSor can be calculated using the function phylosor from picante:

Let’s first import the data:

# Import site by species matrix
comm <- read.csv("data/Jasper/jasper_data.csv", row.names = 1)

# Import phylogeny
phy <- read.tree("data/Jasper/jasper_tree.phy")

Then, we can calculate and visualize PhyloSor:

# Calculate observed PhyloSor
jasper.sor <- phylosor(samp = comm, 
                       tree = phy)

# Let's take a look! 
# (note: darker colour = more dissimilar)
heatmap(as.matrix(jasper.sor), 
        Rowv = NA, 
        Colv = NA) # without default clustering 

Standardized effect size (SES_PhyloSor)

We can go one step further and test whether the observed PBD values differ from a null expectation, using null models.

To do this, we will follow this flow chart, which outlines the process of calculating standardized effect sizes of phylogenetic \(\beta\)-diversity measures (in this case, PhyloSor).

Flow chart by Leprieur et al. (2012)

First, we will use the function phylosor.rnd to obtain PhyloSor values of phylogenetic \(\beta\)-diversity obtained by randomizing taxa labels on the phylogeny:

# generate 100 null expectations of Phylosor

jasper.sor.rndTaxa <- phylosor.rnd(samp = comm, 
                                   tree = phy,
                                   cstSor = TRUE,
                                   null.model = "taxa.labels", # Pay attention to the null models!
                                   runs = 100)

We then need to test whether our observed phylogenetic \(\beta\)-diversity differs from this random expectation. There is no standardized effect size function available within picante, so we will create our own null model to test our hypothesis:

# create function to calculate the standardized effect size of a phylogenetic beta-diversity metric. In this case, we are using it for PhyloSor.
# (this is modelled after ses.pd by Pedro!)
ses.PBD <- function(obs, rand){
  
  # first, we make sure our observed PhyloSor values are numeric
  pbd.obs <- as.numeric(obs) 
  
  # then, we take the mean of the 100 null expectations we generated
  rand <- t(as.data.frame(lapply(rand, as.vector)))
  pbd.mean <- apply(rand, MARGIN = 2, 
                    FUN = mean, 
                    na.rm = TRUE)
  # as well as their standard deviation
  pbd.sd <- apply(rand, 
                  MARGIN = 2, 
                  FUN = sd, 
                  na.rm = TRUE)
  
  # now, we can calculate the standardized effect size (SES)!
  pbd.ses <- (pbd.obs - pbd.mean)/pbd.sd
  
  # rank observed PhyloSor (we use this to calculate p-values for SES)
  pbd.obs.rank <- apply(X = rbind(pbd.obs, rand), 
                        MARGIN = 2, 
                        FUN = rank)[1, ]
  pbd.obs.rank <- ifelse(is.na(pbd.mean), NA, pbd.obs.rank)
  
  # return results in a neat dataframe
  data.frame(pbd.obs, 
             pbd.mean, 
             pbd.sd, 
             pbd.obs.rank, 
             pbd.ses, 
             pbd.obs.p = pbd.obs.rank/(dim(rand)[1] + 1))
}

Using this function, we can determine whether communities are:

  • more dissimilar than expected by chance (i.e. obs > mean(null) = positive SES_PhyloSor)
  • more similar than expected by chance (i.e. obs < mean(null) = negative SES_PhyloSor)
  • randomly dissimilar, i.e. as dissimilar you would expect by random chance (obs = mean(null) = 0 PhyloSor).

In other words, this measure can tell you whether community structure is random or not - so, you can find out whether some evolutionary (and/or ecological) processes are shaping the \(\beta\)-diversity you’ve observed.

# calculate the standardized effect size of Phylosor (SES_PhyloSor) using our function  
jasper.ses.sor  <- ses.PBD(obs = jasper.sor,
                           rand = jasper.sor.rndTaxa)
pbd.obs pbd.mean pbd.sd pbd.obs.rank pbd.ses pbd.obs.p
45 0.527 0.573 0.052 22 -0.887 0.218
46 0.647 0.594 0.053 87 1.001 0.861
47 0.589 0.575 0.055 63 0.260 0.624
48 0.628 0.618 0.055 56 0.187 0.554
49 0.652 0.732 0.034 1 -2.369 0.010
50 0.687 0.701 0.050 38 -0.273 0.376
51 0.770 0.756 0.043 65 0.316 0.644
52 0.671 0.716 0.041 13 -1.089 0.129
53 0.665 0.684 0.047 35 -0.406 0.347
54 0.575 0.638 0.052 12 -1.214 0.119
55 0.649 0.664 0.046 34 -0.331 0.337

Let’s also visualize SES_PhyloSor as a heatmap!

# note: converting a vector to a distance matrix can be tricky, so let's
# use our PhyloSor matrix as a template, since it already has row/column
# names in the right places:
jasper.ses.sor.m <- jasper.sor 
# assign SES_PhyloSor value to each cell
for(i in 1:length(jasper.sor)) {jasper.ses.sor.m[i] <- jasper.ses.sor$pbd.ses[i]}

# plot heatmap using ggplot this time!
library(ggplot2)
library(reshape2)
jasper.ses.sor.m %>% as.matrix() %>% melt() %>% # manipulate data into a format that works for ggplot
  ggplot() + 
  geom_tile(aes(x = Var1, y = Var2, fill = value)) + # create the heatmap
  scale_fill_gradient2(low = "blue", mid = "white", high = "red") + # create a diverging color palette
  xlab("sites") + ylab("sites") + labs(fill = "SES_PhyloSor") + # edit axis and legend titles
  theme(axis.text.x = element_text(angle = 90)) # rotates x axis labels

Looking at this heatmap, we can see that some communities are a lot more dissimilar than expected by chance (high positive SES_PhyloSor), such as J25-J32. What processes could explain this higher-than-expected \(\beta\)-diversity?

Some communities are also more similar than expected by chance (very negative SES_PhyloSor), like J26-J27. What processes could explain this lower-than-expected \(\beta\)-diversity?

UniFrac

UniFrac is another presence-absence phylogenetic \(\beta\)-diversity metric, which measures the unique fraction of the phylogeny that is unshared between a pair of communities. Like PhyloSor, this metric is mostly sensitive to dissimilarity at the tips of the phylogeny, and does not easily pick up on basal clades.

To compute the UniFrac metric, we can use the function unifrac in the package picante.

jasper.uni <- unifrac(comm = comm, tree = phy)

Let’s compare UniFrac to PhyloSor!

library(reshape2)
sor.heatmap <- jasper.sor %>% as.matrix %>% melt() %>%
                  ggplot() + geom_tile(aes(x = Var1, y = Var2, fill = value)) +
                  ggtitle("PhyloSor") + xlab("sites") + ylab("sites")
uni.heatmap <- jasper.uni %>% as.matrix %>% melt() %>%
                  ggplot() + geom_tile(aes(x = Var1, y = Var2, fill = value)) +
                  ggtitle("UniFrac") + xlab("sites") + ylab("sites")
gridExtra::grid.arrange(sor.heatmap, uni.heatmap, ncol = 2)

Exercise: SES_UniFrac

There is no function to calculate the standardized effect size of UniFrac in one shot. However, it is possible to write the code for this yourself! For this exercise, we challenge you to calculate SES_UniFrac and plot a heatmap showing the results.

The first step would be to randomize the community matrix (50 randomizations - it’s a bit long to do!). You can do this with the function randomizeMatrix from the picante package. We suggest you choose a null model that randomly shuffles species across sites while maintaining the species richness of each site (hmm.. which one could that be? ?randomize_matrix).

# create a list to store of randomized community matrics 
jasper.rnd <- list()
for(i in 1:50){ # randomize the community 50 times
  jasper.rnd[[i]] <- randomizeMatrix(comm, null.model = "richness")
}

Now that we have our random community matrices, let’s calculate UniFrac for each of these!

jasper.uni.rnd <- lapply(jasper.rnd, unifrac, tree = phy)

We can then use the ses.PBD function we created earlier to calculate a standardized effect size for UniFrac.

jasper.ses.uni <- ses.PBD(obs = jasper.uni, rand = jasper.uni.rnd)
pbd.obs pbd.mean pbd.sd pbd.obs.rank pbd.ses pbd.obs.p
45 0.642 0.567 0.092 37 0.819 0.725
46 0.522 0.561 0.080 16 -0.492 0.314
47 0.583 0.578 0.078 27 0.055 0.529
48 0.542 0.565 0.078 17 -0.296 0.333
49 0.517 0.569 0.086 10 -0.600 0.196
50 0.476 0.573 0.071 5 -1.373 0.098
51 0.375 0.559 0.084 2 -2.188 0.039
52 0.495 0.585 0.064 7 -1.422 0.137
53 0.502 0.564 0.069 10 -0.909 0.196
54 0.596 0.571 0.076 26 0.332 0.510
55 0.520 0.559 0.077 13 -0.506 0.255

Now, let’s visualize SES_UniFrac as a heatmap!

# note: converting a vector to a distance matrix can be tricky, so let's
# use our UniFrac matrix as a template, since it already has row/column
# names in the right places:
jasper.ses.uni.m <- jasper.uni 
# assign UniFrac value to each cell
for(i in 1:length(jasper.uni)) {jasper.ses.uni.m[i] <- jasper.ses.uni$pbd.ses[i]}

# plot heatmap using ggplot
jasper.ses.uni.m %>% as.matrix() %>% melt() %>% # manipulate data into a format that works for ggplot
  ggplot() + 
  geom_tile(aes(x = Var1, y = Var2, fill = value)) + # create the heatmap
  scale_fill_gradient2(low = "blue", mid = "white", high = "red") + # create a diverging color palette
  xlab("sites") + ylab("sites") + labs(fill = "SES_UniFrac") + # edit axis and legend titles
  theme(axis.text.x = element_text(angle = 90)) # rotates x axis labels

There’s a lot of negative SES_UniFrac values here! What does that tell us about the phylogenetic dissimilarity between sites?

Decomposing phylogenetic beta diversity

Turnover & nestedness

The beta diversity we’ve been measuring can actually be broken down into two components (Baselga, 2010; Leprieur et al., 2012):

Nestedness: species loss/gain between sites attributable to differences in richness.

Turnover: the replacement of species between sites.

To illustrate this, let’s imagine three islands with different sets of species on them - like these!

Kai Bae View Point, Ko Chang, Thailand

If the \(\beta\)-diversity between islands is completely due to nestedness, you would find that the communities on species-poor islands are always subsets of the more species-rich islands’ communities (sites A1-A3 below).

If the \(\beta\)-diversity between islands is completely due to turnover, communities on each island would be a completely different set of species (sites B1-B3 below).

If the \(\beta\)-diversity between islands is due to both nestedness and turnover, each island community might contain a subset of species in common with species-rich islands, as well as a set of species that is completely replaced between sites (sites C1-C3 below).

Modified from Fig. 1 in Baselga (2010)


So, what is the point of breaking \(\beta\)-diversity down into these 2 components?

Overall, doing this allows you to better understand how communities differ in their composition, which provides insight about the assembly processes that are shaping them!

Turnover allows us to test hypotheses about why some species are replaced between sites. Is it because they are “filtered” by local abiotic and/or biotic conditions? Is it because they cannot disperse between sites? Are there historical barriers?

Nestedness allows us to test hypotheses about why certain species are “lost” or “gained” between sites. Is it because selective extinctions have occurred in some sites? Is it because a few species are very good dispersers and can therefore be anywhere?

Decomposing PhyloSor

Now that we’ve established what turnover and nestedness are, let’s move on to actually breaking down the \(\beta\)-diversity we’ve observed into these two components. Baselga (2010) proposed an additive partitioning framework that decomposes the pair-wise Sørensen’s dissimilarity index into two additive components representing (i) ‘true’ turnover of species and (ii) richness differences between nested communities. In 2012, Leprieur et al. (2012) extended Baselga’s (2010) framework to decompose PhyloSor into two components accounting for ‘true’ phylogenetic turnover and phylogenetic diversity gradients, respectively.

Baselga (2012) published the package betapart which allows us to do this decomposition in R, with the function phylo.beta.pair(). (Note: the betapart package also works for taxonomic and functional \(\beta\)-diversity!)

First, we need to convert our site x species matrix to presences/absences.

preabs <- comm
preabs[preabs > 0] <- 1 

Now, we can calculate PhyloSor and its decomposition into turnover and nestedness:

library(betapart)
phylosor.betapart <- phylo.beta.pair(x = preabs, 
                                tree = phy, 
                                index.family = "sorensen")

We now have 3 matrices:
-phylo.beta.sim: turnover distance matrix
-phylo.beta.sne: nestedness distance matrix
-phylo.beta.sor: PhyloSor distance matrix

Let’s take a look at heatmaps of the first 2 matrices:

# Turnover matrix:
phylosor.turn <- phylosor.betapart$phylo.beta.sim %>% 
  as.matrix() %>% melt() %>% 
  ggplot() + geom_tile(aes(x = Var1, y = Var2, fill = value)) + # create the heatmap
  xlab("sites") + ylab("sites") + labs(fill = "Turnover") + # edit axis and legend titles
  scale_fill_viridis_c(option = "plasma", limits = range(0,0.5)) +
  theme(axis.text.x = element_text(angle = 90)) # rotates x axis labels

# Nestedness matrix:
phylosor.nest <- phylosor.betapart$phylo.beta.sne %>% 
  as.matrix() %>% melt() %>% 
  ggplot() + geom_tile(aes(x = Var1, y = Var2, fill = value)) + # create the heatmap
  xlab("sites") + ylab("sites") + labs(fill = "Nestedness") + # edit axis and legend titles
    scale_fill_viridis_c(option = "plasma", limits = range(0,0.5)) +
  theme(axis.text.x = element_text(angle = 90)) # rotates x axis labels

# plot both heatmaps next to each other
gridExtra::grid.arrange(phylosor.turn, phylosor.nest, ncol = 2)

We can also use a violin plot to look at the distribution of turnover and nestedness values, as well as their median:

# create dataframe
df <- data.frame(turnover = as.vector(phylosor.betapart$phylo.beta.sim), 
                 nestedness =as.vector(phylosor.betapart$phylo.beta.sne))
# create violin plot
df %>% melt() %>% ggplot() + 
  geom_violin(aes(x = variable, y = value, fill = variable), 
              draw_quantiles = 0.5) + # show median
  theme_classic()

In both plots, we can see that turnover is contributing more to the \(\beta\)-diversity between sites, compared to nestedness. This means species are usually being replaced between sites - and you could test some hypotheses to explain why that is!

Going further: Applications of these methods

This tutorial has mainly covered how to measure phylogenetic \(\alpha\) and \(\beta\) diversity. This is often only the first step of many studies. Here are some examples of things that people have done, and some ideas for things you could do with your own work!

1. Identifying hotspots of phylogenetic diversity for conservation.

Making decisions and laws to govern the conservation of biodiversity is often difficult, because there are so many different aspects of diversity than can be prioritized. Much of the time, the goal is to protect areas containing certain key species, or to protect the most species-rich communities, to ensure efficient use of conservation resources. Community phylogenetics can add further insight into this decision-making if the goal is to account for evolutionary history.

For example, some conservation projects might prioritize communities that harbour a high diversity of evolutionary lineages (i.e. phylogenetic \(\alpha\)-diversity), because these areas represent many long and irreplaceable evolutionary histories. Phylogenetic \(\beta\)-diversity would provide a metric to detect communities with more “unique” lineages - or more “evolutionary rare” communities - within the context of the region under assessment. Both of these would simply require spatial coordinates for the sites in our site x species matrix.

There are already several examples of this - here’s one that uses phylogenetic \(\beta\)-diversity to determine the spatial mismatch between protected areas and phylogenetic “hotspots”:

Devictor et al. (2013), Fig. 3: Spatial distribution of phylogenetic diversity and its relative turnover.

2. Testing theories of how communities are assembled.

For a long time, our understanding of community assembly - or how species come to colonize, persist, and coexist with other species in a community - was mostly limited to how ecological processes (competitive exclusion, environmental filtering, dispersal limitation, etc.) shape species composition. As community phylogenetics have evolved, community ecologists have been able to test hypotheses about how evolutionary processes shape phylogenetic community composition and structure as well.

For example, you can assess the relationship between phylogenetic \(\beta\)-diversity and the environmental distances between sites to determine the importance of environmental filtering over evolutionary time scales (which would be driven by mechanisms like adaptation and speciation). You can also assess this relationship between \(\beta\)-diversity and spatial distances between sites, which would give you insight into how dispersal limitation has shaped communities via its influence on adaptation, speciation, and extinctions too.

3. Delineating ecoregions or biomes.

Community phylogenetics have been increasingly used to reevaluate and delineate biome and ecoregion boundaries, so they can better reflect the evolutionary history that has shaped the distinctiveness of regional biotas. This is called phylogenetic regionalization.

Phylogenetic \(\beta\)-diversity measures like PhyloSor or turnover allow you to assess the dissimilarities between biotas that can determine biome or regional boundaries based on evolutionary distinctiveness.

For example, Holt et al. (2013) used phylogenetic \(\beta\)-diversity to reevaluate Wallace’s zoogeographic regions worldwide for birds, mammals, and amphibians:

Holt et al. 2013, Fig. 1: Map of the terrestrial zoogeographic realms and regions of the world based on analytical clustering of phylogenetic turnover of assemblages of 21,037 species of amphibians, nonpelagic birds, and nonmarine mammals worldwide.



Daru et al. (2016) also used phylogenetic turnover to delineate regions of evolutionary distinctiveness of woody plants in southern Africa:

Evolutionary distinctiveness of woody plant species within the 15 phyloregions, quantified as the mean of pairwise PBSim values between each phyloregion and all other phyloregions.




Here’s another example of phylogenetic regionalization, this time delineating marine plant regions (Daru et al. 2017):

Relationships among marine phyloregions based on phylogenetic turnover (pβsim) of seagrass species worldwide (a) in geographic space, (b) in NMDS ordination space.

Thank you!

Thank you for attending and participating in this workshop about community phylogenetics!

We hope you now have a working grasp of how to prepare and use phylogenetic trees and community data. You now have the conceptual and analytical tools to evaluate metrics of phylogenetic community structure and composition, both at the alpha and beta levels!

Now, go forth and conquer (and happy Thanksgiving)!

References

Ackerly, D. D. and Reich, P. B. (1999), Convergence and correlations among leaf size and function in seed plants: a comparative test using independent contrasts . Am. J. Bot., 86: 1272-1281. doi:10.2307/2656775

Ackerly, D. D., Schwilk, D. W. and Webb, C. O. (2006), NICHE EVOLUTION AND ADAPTIVE RADIATION: TESTING THE ORDER OF TRAIT DIVERGENCE. Ecology, 87: S50-S61.

Cavender‐Bares, J. , Kozak, K. H., Fine, P. V. and Kembel, S. W. (2009), The merging of community ecology and phylogenetic biology. Ecology Letters, 12: 693-715. doi:10.1111/j.1461-0248.2009.01314.x

Baselga, A. (2010). Partitioning the turnover and nestedness components of beta diversity. Global ecology and biogeography, 19(1), 134-143.

Daru, B.H., Bank, M.V., Maurin, O., Yessoufou, K., Schaefer, H., Slingsby, J.A., & Davies, T.J. (2016). A novel phylogenetic regionalization of phytogeographical zones of southern Africa reveals their hidden evolutionary affinities.

Daru, B. H., Holt, B. G., Lessard, J. P., Yessoufou, K., & Davies, T. J. (2017). Phylogenetic regionalization of marine plants reveals close evolutionary affinities among disjunct temperate assemblages. Biological conservation, 213, 351-356.

Devictor, V. , Mouillot, D. , Meynard, C. , Jiguet, F. , Thuiller, W. and Mouquet, N. (2010), Spatial mismatch and congruence between taxonomic, phylogenetic and functional diversity: the need for integrative conservation strategies in a changing world. Ecology Letters, 13: 1030-1040.

Holt, B. G., Lessard, J. P., Borregaard, M. K., Fritz, S. A., Araújo, M. B., Dimitrov, D., … & Nogués-Bravo, D. (2013). An update of Wallace’s zoogeographic regions of the world. Science, 339(6115), 74-78.

Leprieur F, Albouy C, De Bortoli J, Cowman PF, Bellwood DR, Mouillot D (2012) Quantifying Phylogenetic Beta Diversity: Distinguishing between ‘True’ Turnover of Lineages and Phylogenetic Diversity Gradients. PLoS ONE 7(8): e42760.

Webb, C.O., Ackerly, D.D., McPeek, M.A. & Donoghue, M.J. (2002). Phylogenies and community ecology. Annu. Rev. Ecol. Syst., 33, 475–505.