TBRU microbiome data analysis page

Getting started

Microbiome data analysis seems daunting, but it is not. There is a small core set of skills to learn, and just as the 26 letters of the alphabet and a few general rules can produce a masterpiece like James Joyce's Ulysses, a handful of concepts underlies all next-generation sequencing data analysis. The pipelines are straightforward, and once you are past the steep part of the learning curve, they are a lot of fun! The same concepts apply to many kinds of high-throughput data, including 16S amplicon DNA sequencing, metagenomic DNA sequencing, RNAseq, metabolomics, and more!

Qiime

Download Qiime here

RStudio

You can download RStudio here

Uparse pipeline

You can read about the UPARSE pipeline and download the steps for processing 16S amplicon data on the drive5.com webpage. There are other methods for generating OTUs, but this is the one we use.

All about Phyloseq!

Phyloseq is a Bioconductor package that integrates all of the data types needed to describe a microbiome. Specifically, the sequence data, sample metadata, taxonomy of each sequence, and a phylogenetic tree of the sequences are combined into one "phyloseq object". Extracting data from this object in R is simple, which keeps downstream analyses straightforward and reproducible.

Conda and Anaconda

Conda is a package management system that helps you find and install new packages. Read more about it here. Conda is very easy to install.

It is also easy to install Anaconda; see here.

The Miniconda version of Python 2.7 (miniconda2/bin/python) is required for LEfSe and some other microbiome tools. Installing conda and Anaconda will likely modify your $PATH so that this interpreter becomes the default. If it does not, then when running any Python scripts (e.g., run_lefse.py) you will need to point to the directory containing the Miniconda version of Python 2.7.

You can find out the default python on your system by opening a terminal window and typing "which python".
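
The same check can be done from Python itself. The snippet below is a minimal sketch, not part of LEfSe or conda: the helper names default_python and looks_like_miniconda2 are made up for illustration, and the "miniconda2" directory test is only a heuristic that assumes Miniconda was installed under its default folder name.

```python
import shutil


def default_python(name="python"):
    """Return the path of the first `name` executable found on $PATH, or None."""
    return shutil.which(name)


def looks_like_miniconda2(path):
    """Heuristic (assumption): True if a 'miniconda2' directory is in the path."""
    if path is None:
        return False
    # Normalize separators so the check also works on Windows-style paths.
    parts = path.replace("\\", "/").split("/")
    return "miniconda2" in parts


if __name__ == "__main__":
    path = default_python()
    if path is None:
        print("No 'python' executable found on $PATH")
    else:
        print("Default python:", path)
        print("Under miniconda2:", looks_like_miniconda2(path))
```

If the second line reports that the default interpreter is not under miniconda2, call the Miniconda interpreter explicitly by its full path when running scripts such as run_lefse.py.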