Page Manager: Webmaster
Last update: 9/11/2012 3:13 PM

Modelling of zero-inflation improves inference of metagenomic gene count data

Journal article
Authors Viktor Jonsson, Tobias Österlund, Olle Nerman, Erik Kristiansson
Published in Statistical Methods in Medical Research
Volume 28
Issue 12
Pages 3712-3728
ISSN 0962-2802
Publication year 2019
Published at Department of Mathematical Sciences
Language en
Links dx.doi.org/10.1177/0962280218811354
Keywords Metagenomics, human microbiome, environmental sequencing, Bayesian modeling, Markov chain Monte Carlo, MCMC, zero-inflation, generalized linear models, differential abundance analysis, Poisson, regression, diversity, Health Care Sciences & Services, Mathematical & Computational Biology, Medical Informatics, Mathematics
Subject categories Mathematics, Medical informatics

Abstract

Metagenomics enables the study of gene abundances in complex mixtures of microorganisms and has become a standard methodology for the analysis of the human microbiome. However, gene abundance data is inherently noisy and contains high levels of biological and technical variability as well as an excess of zeros due to non-detected genes. This makes the statistical analysis challenging. In this study, we present a new hierarchical Bayesian model for inference of metagenomic gene abundance data. The model uses a zero-inflated overdispersed Poisson distribution which is able to simultaneously capture the high gene-specific variability as well as zero observations in the data. By analysis of three comprehensive datasets, we show that zero-inflation is common in metagenomic data from the human gut and, if not correctly modelled, it can lead to substantial reductions in statistical power. We also show, by using resampled metagenomic data, that our model has, compared to other methods, a higher and more stable performance for detecting differentially abundant genes. We conclude that proper modelling of the gene-specific variability, including the excess of zeros, is necessary to accurately describe gene abundances in metagenomic data. The proposed model will thus pave the way for new biological insights into the structure of microbial communities.
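
As a rough illustration of the zero-inflation idea mentioned in the abstract, the following minimal Python sketch implements a plain zero-inflated Poisson distribution. It is not the authors' hierarchical Bayesian model, which additionally accounts for overdispersion; the function names and the parameters `lam` (Poisson rate) and `pi` (probability of a structural zero) are assumptions made for this example only.

```python
import math


def zip_pmf(k, lam, pi):
    """P(K = k) for a zero-inflated Poisson (illustrative, not the paper's model).

    With probability `pi` the count is a structural zero (e.g. a non-detected
    gene); otherwise it is drawn from an ordinary Poisson(lam).
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from two sources: structural zeros and Poisson zeros.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean_var(lam, pi):
    """Mean and variance of the zero-inflated Poisson."""
    mean = (1.0 - pi) * lam
    var = (1.0 - pi) * lam * (1.0 + pi * lam)
    return mean, var


if __name__ == "__main__":
    lam, pi = 3.0, 0.4  # hypothetical values chosen for illustration
    print("P(K=0), plain Poisson:", math.exp(-lam))
    print("P(K=0), zero-inflated:", zip_pmf(0, lam, pi))
    print("mean, variance:", zip_mean_var(lam, pi))
```

For any `pi > 0` the variance exceeds the mean, so a plain Poisson fit to such data underestimates both the number of zeros and the spread of the counts, which is consistent with the abstract's point that ignoring excess zeros can cost statistical power.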
