Page manager: Webbredaktion
Page updated: 2012-09-11 15:12


Bayesian localization of CNV candidates in WGS data within minutes

Article in a scientific journal
Authors: John Wiedenhoeft
A. Cagan
R. Kozhemyakina
R. Gulevich
Alexander Schliep
Published in: Algorithms for Molecular Biology
Volume: 14
Issue: 1
Publication year: 2019
Published at: Department of Computer Science and Engineering, Computing Science (GU)
Language: English
Links: dx.doi.org/10.1186/s13015-019-0154-...
Keywords: HMM, Wavelet, CNV, Bayesian inference, hidden Markov models, analysis toolkit, domestication, adaptation, WebGestalt, evolution, variants, behavior, fox, Biochemistry & Molecular Biology, Biotechnology & Applied Microbiology, Mathematical & Computational Biology
Subject categories: Cell and Molecular Biology

Abstract

Background: Full Bayesian inference for detecting copy number variants (CNV) from whole-genome sequencing (WGS) data is still largely infeasible due to computational demands. A recently introduced approach to perform Forward-Backward Gibbs sampling using dynamic Haar wavelet compression has alleviated issues of convergence and, to some extent, speed. Yet, the problem remains challenging in practice.

Results: In this paper, we propose an improved algorithmic framework for this approach. We provide new space-efficient data structures to query sufficient statistics in logarithmic time, based on a linear-time, in-place transform of the data, which also improves the compression ratio. We also propose a new approach to efficiently store and update marginal state counts obtained from the Gibbs sampler.

Conclusions: Using this approach, we discover several CNV candidates in two rat populations divergently selected for tame and aggressive behavior, consistent with earlier results concerning the domestication syndrome as well as experimental observations. Computationally, we observe a 29.5-fold decrease in memory, an average 5.8-fold speedup, and a 191-fold decrease in minor page faults. We also observe that these metrics varied greatly in the old implementation, but not in the new one; we conjecture that this is due to the better compression scheme. The fully Bayesian segmentation of the entire WGS data set required 3.5 min and 1.24 GB of memory, and can hence be performed on a commodity laptop.
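The "linear-time, in-place transform" mentioned in the Results refers to a Haar wavelet transform of the data. As a generic illustration of that idea (a minimal sketch using the standard lifting scheme; the function name and details are assumptions, not the paper's actual implementation), an in-place Haar transform can be written so that each level halves the number of pairs processed, giving O(n) total work and no auxiliary array:

```python
def haar_lifting_inplace(a):
    """In-place Haar wavelet transform via the lifting scheme (sketch).

    For a list of length n (a power of two), the loop over strides does
    n/2 + n/4 + ... + 1 pair updates, i.e. O(n) total work.  On return,
    a[0] holds the overall average and the remaining slots hold detail
    (difference) coefficients at increasing scales.
    """
    n = len(a)
    assert n > 0 and (n & (n - 1)) == 0, "length must be a power of two"
    s = 1
    while s < n:
        for i in range(0, n, 2 * s):
            d = a[i + s] - a[i]      # detail coefficient for this pair
            a[i] = a[i] + d / 2.0    # replace left slot by the pair average
            a[i + s] = d             # store the detail in place
        s *= 2
    return a

# Example: the average of [1, 3, 5, 7] ends up in slot 0.
haar_lifting_inplace([1.0, 3.0, 5.0, 7.0])  # -> [4.0, 2.0, 4.0, 2.0]
```

Large detail coefficients mark candidate breakpoints in the data, while runs with small coefficients can be compressed into blocks whose sufficient statistics (sums, counts) are then queried per block rather than per observation.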
