Getting your OTU table into R


Once you’ve finished picking your OTUs and you have a lovely BIOM formatted OTU table, you may want to leave QIIME and move to a more flexible environment (like R) for doing your analyses.  Getting that table ready and importing it as a data.frame in R is fairly simple, but not necessarily straightforward, so we’ll walk through the necessary steps and tools.

First thing, your otu_table_mc2_w_tax.biom file ought to have any assigned taxonomy included for each picked OTU.  If this were a standard table, you can think of OTUs as rows, and samples as columns…the 7 levels of taxonomy assigned to each OTU (those that got matches) would be 7 additional columns and are referred to as “observation metadata”.  We want to go ahead and add our “sample metadata” from our mapping file to our otu table while we’re at it.  First, check out the help file and then run the code, and check the summary to make sure everything stuck:

biom add-metadata --help
biom add-metadata -i otu_table_mc2_w_tax.biom -o otu_table_mc2_metadata_w_tax.biom -m ~/FILEPATH/mapping_file.txt --sample-header Location,Elevation
biom summarize-table -i otu_table_mc2_metadata_w_tax.biom

This will add the metadata columns named “Location” and “Elevation” from your tab-separated mapping file (Assuming you have those column names) as supplementary “rows” in your BIOM file.  If it worked, you’ll see your metadata columns in the “Sample Metadata Categories:” row of the summary output.  Now it’s in good shape to move it into R.


BIOM comes in two flavors: JSON, and HDF5.  Older QIIME versions used JSON and the current QIIME version uses HDF5.  There are a few BIOM packages in CRAN (Comprehensive R Archive Network), but getting them to read HDF5 format seems futile.  Fortunately, Paul McMurdie and Joseph Paulson put together a great package that ACTUALLY WORKS!  It’s not currently on CRAN, but we can still easily download and install the latest release from your R (RStudio!) console by typing:

install.packages("devtools") # if not already installed
devtools::install_github("biomformat", "joey711")

If your version of R is current (I believe version 3.2.3+) then you should be all set.  If you are running an older version you may get an error message saying the biomformat package is not available for your R release.  Just give yourself a well-deserved upgrade using the following and try again (Mac users can upgrade using HomeBrew):

sudo apt-get update
sudo apt-get dist-upgrade

Now that you have the right tool, let’s look at how to use it:

In RStudio, run the following to load the biomformat package:


Now, navigate to the “packages” tab and click on “biomformat” to view the help files.  It’s good to read through these a couple of times, but if you are in a hurry to get that OTU table imported, here’s what you can do:

1) Define a file path for your OTU table with all that juicy metadata attached to it:

file_path <- "~/YOUR/FILEPATH/otu_table_mc2_metadata_w_tax.biom"

2)  Read the BIOM file and load it to an object (in this case, named “dat”):

dat <- read_biom(file_path)

3) Coerce dat into a matrix (with a lowercase ‘m’) and then into a data frame:

#You can't go straight from a "sparse Matrix" to a data frame! 
otu_table <-

4) Get taxonomy:

taxonomy <- observation_metadata(dat)

5) Get sample metadata:

metadata <- sample_metadata(dat)

If we take a peek at the first 6 rows from our three data frames they should look something like what you see below (note: I limited the number of columns so it would fit reasonably)…

                            Mon.WS5 PaCo.4 O5.NoLyz 69CV.5 PaTiTi.3 DS.5 PaTiTi.2 PaCo.15 PaTiTi.1 S4
SH206549.07FU_AF145321_refs       1     28      106      7      106   15      141       4       99 17
SH212843.07FU_GU721652_reps       0      0        0      0        0    0        0       0        0  0
SH190986.07FU_KJ130026_reps       7      0        0      0        0    0        0       0       11  0
SH212235.07FU_GU188414_reps       0     10        0      0        0  480        0       0        0  0
SH424854.07FU_U61681_refs         0      0        0      0        0    0        0       0        0  0
SH221495.07FU_JX064536_reps       0      0        0      0        0    0        0       0       33  0


         BarcodeSequence Description         Host_Taxon            Location
Mon.WS5        AACAACAAC     Mon_WS5 Montipora_capitata               K-Bay
PaCo.4         AACAACAAC      PaCo_4 Odontaster_validus          Antarctica
O5.NoLyz       AACAACAAC    O5_NoLyz     Mycale_grandis               K-Bay
69CV.5         AACAACAAC      69CV-5          Mellicope          Culvert_69
PaTiTi.3       AACAACAAC    PaTiTi_3 Odontaster_validus          Antarctica
DS.5           AACAACAAC        DS-5             Myrica      Dan_St-Palikea


                            taxonomy1        taxonomy2          taxonomy3       taxonomy4         taxonomy5        taxonomy6                   taxonomy7
SH206549.07FU_AF145321_refs  k__Fungi p__Basidiomycota c__Tremellomycetes  o__Tremellales f__Incertae sedis  g__Cryptococcus          s__Cryptococcus sp
SH212843.07FU_GU721652_reps  k__Fungi  p__unidentified    c__unidentified o__unidentified   f__unidentified  g__unidentified                 s__Fungi sp
SH190986.07FU_KJ130026_reps  k__Fungi    p__Ascomycota c__Sordariomycetes   o__Xylariales f__Incertae sedis g__Monographella s__Monographella cucumerina
SH212235.07FU_GU188414_reps  k__Fungi    p__Ascomycota c__Lecanoromycetes  o__Lecanorales   f__Cladoniaceae      g__Cladonia s__Cladonia novochlorophaea
SH424854.07FU_U61681_refs    k__Fungi    p__Ascomycota c__Sordariomycetes  o__Hypocreales    f__Nectriaceae      g__Fusarium        s__Fusarium begoniae
SH221495.07FU_JX064536_reps  k__Fungi    p__Ascomycota c__Lecanoromycetes o__Peltigerales    f__Lobariaceae        g__Sticta        s__Sticta fuliginosa

And that’s all there is to getting your big fancy BIOM file into a properly formatted and useful state in R!

Leave a Reply