- Biomarker discovery with machine learning
- InVi: Integration & Visualization of genomic data
- Ioniser: Assisting glycostructures annotation
- mitoChip-seq: Mitochondria-specific peak detection
- Odyssey 2.1.1: Imputation of genomic data
- P-PSY-Finder: Detection of processed pseudogenes
- REAPER: A light-weight file monitor
- TC-Hunter: Transgenic insertion sites detection
Tool development projects
These projects usually grow out of analyses that we have identified as being of general interest. BCF devotes time to implementing workflows and tools and makes them publicly available through our GitHub page. We are thankful to the collaborators who helped us validate the predictions of the resulting tools.
Tumor radiotherapy and basic radiation research rely on an accurate relationship between the absorbed dose and the therapeutic or biological effect after irradiation. A poorly characterized dose-response relationship can cause severe problems, such as under-treatment of cancers (and thus disease progression) or exposure of healthy tissue that leads to secondary diseases. We are developing a machine learning tool based on omics data for biomarker discovery in radiation research.
Advanced visualization of genomic data is vital to allow researchers to explore and understand the complexities of their experimental data or large-scale datasets. Complex data visualization techniques exist today, but their complexity makes them difficult to use. To facilitate the exploration and creation of advanced genomics visualizations and to support knowledge discovery, we developed the software InVi (Integration and Visualization of Genomic Data) and CiGUI (Circos Graphic User Interface), which rely on Circos for circular displays.
The characterization of glycosylated proteins is a challenging task in the proteomics field, as they commonly occur as multiple glycoforms. The Ioniser assists in identifying potentially novel glycoforms without the need for prior knowledge of the existing glycostructures for a given peptide. It processes and filters large amounts of mass-to-charge (m/z) ratio and abundance data, allowing the user to identify additional glycosylated proteins based on user-specified parameters.
ChIP-Seq is a powerful method for identifying genome-wide DNA binding sites for transcription factors and other proteins. Standard software and pipelines are available for the analysis and interpretation of such data. However, the mitochondrial genome has been neglected, and most of these algorithms are not adequate to correctly process proteins targeting this small circular chromosome. Here we present a simple workflow to automate some basic statistics and visualization aids with a focus on the mitochondrial genome.
Odyssey 2.1.0 is a semi-autonomous workflow designed for the preparation, phasing and imputation of genomic data. Odyssey 2.1.1 is modified to run directly from the data folder on an HPC system or on a designated file system that contains your data of interest, which can be specified in the Setting.conf. Additionally, the option for imputation has been narrowed to Minimac2 due to speed differences compared to Impute4. Other functionalities of Odyssey remain the same and can be reviewed in the modified documentation materials.
Processed pseudogenes (PΨgs) are disabled gene copies that are transcribed and may affect the expression of paralogous genes. Moreover, their insertion in the genome can disrupt the structure or the regulatory region of a gene, affecting its transcription. Such insertions have been identified as mutations that occur during cancer development, so being able to identify processed pseudogenes and their location will improve somatic mutation testing in the clinical setting. PΨFinder is a tool that can automatically predict novel PΨgs from DNA sequencing data and determine their location in the genome with high accuracy. It generates high-quality figures and tables that aid in the interpretation of the results and guide the experimental validation. PΨFinder complements any mutational screening aimed at identifying disease-causing mutations in cancer and other diseases.
Mass spectrometry analyses produce large amounts of data. As the computer that performs the analysis has limited storage, it is of great interest to move the files to another storage location as soon as possible. The Reaper monitors a specific directory where the files are created and updated, and checks for changes in that directory at a user-defined time interval. When a file hasn’t changed in size by the third check, it is copied to the appropriate location.
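The polling logic described above can be sketched as follows. This is a minimal illustration, not the Reaper's actual code: the function names, the parameters (`interval_seconds`, `required_checks`, `max_rounds`) and the reading of "third check" as "the same size seen on three consecutive checks" are all assumptions made for the example.

```python
import shutil
import time
from pathlib import Path


def scan_sizes(watch_dir):
    """Return {file name: size in bytes} for the regular files in watch_dir."""
    return {p.name: p.stat().st_size for p in Path(watch_dir).iterdir() if p.is_file()}


def update_stability(state, sizes, required_checks=3):
    """Update per-file (size, consecutive_same_size_checks) in `state`.

    Returns the names of files whose size has now been identical for
    exactly `required_checks` consecutive checks (assumed interpretation).
    """
    stable = []
    for name, size in sizes.items():
        last_size, count = state.get(name, (None, 0))
        # A changed size restarts the count; an unchanged size extends it.
        count = count + 1 if size == last_size else 1
        state[name] = (size, count)
        if count == required_checks:
            stable.append(name)
    # Forget files that have disappeared from the directory.
    for name in list(state):
        if name not in sizes:
            del state[name]
    return stable


def monitor(watch_dir, dest_dir, interval_seconds=60, max_rounds=None):
    """Poll watch_dir and copy each file to dest_dir once its size is stable."""
    state = {}
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for name in update_stability(state, scan_sizes(watch_dir)):
            shutil.copy2(Path(watch_dir) / name, Path(dest_dir) / name)
        rounds += 1
        time.sleep(interval_seconds)
```

Keeping the stability bookkeeping in `update_stability`, separate from the sleeping loop, makes the rule easy to check without waiting for real time to pass. A file that grows again after being copied restarts its count and is copied once more when it settles.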
Transgenic animal models are crucial for the study of gene function and disease, and are widely used in basic biological research and in the agricultural and pharmaceutical industries. Since the current methods for generating transgenic animals result in random integration of the transgene under study, the phenotype may be compromised by the disruption of known genes or regulatory regions. We implemented TC-hunter (Transgene-Construct hunter), an open tool that identifies transgene insertion sites and provides simple reports and visualization aids. It relies on common tools used in the analysis of high-throughput data and makes use of chimeric reads and discordant read pairs to identify and support the transgenic insertion site.
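To illustrate what chimeric reads and discordant read pairs look like in aligned data, the sketch below classifies single SAM records. It is an assumption-laden example rather than TC-hunter's implementation: the function names and the reference name passed as `construct` are hypothetical, and only the standard SAM columns, FLAG bits (0x1 paired, 0x4 unmapped, 0x8 mate unmapped) and the `SA` tag are used.

```python
def parse_sam_line(line):
    """Parse the fields of one SAM alignment line that this sketch needs."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for tag in fields[11:]:
        key, _type, value = tag.split(":", 2)
        tags[key] = value
    rnext = fields[6]
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),
        # "=" in RNEXT means the mate maps to the same reference as the read.
        "rnext": fields[2] if rnext == "=" else rnext,
        "tags": tags,
    }


def classify_read(rec, construct):
    """Return 'chimeric', 'discordant' or None for one parsed record.

    `construct` is the (hypothetical) name of the transgene sequence in the
    reference used for alignment.
    """
    flag = rec["flag"]
    if flag & 0x4:  # read itself is unmapped
        return None
    # Chimeric read: one part aligns to the construct, another part elsewhere.
    # The SA tag lists the other alignments as "rname,pos,strand,CIGAR,mapQ,NM;".
    sa = rec["tags"].get("SA")
    if sa:
        refs = {part.split(",")[0] for part in sa.split(";") if part}
        refs.add(rec["rname"])
        if construct in refs and len(refs) > 1:
            return "chimeric"
    # Discordant pair: both mates mapped, one to the construct and one elsewhere.
    if flag & 0x1 and not flag & 0x8:
        if rec["rname"] != rec["rnext"] and construct in (rec["rname"], rec["rnext"]):
            return "discordant"
    return None
```

Reads in either class that cluster at the same host position would support a candidate insertion site. Grouping and scoring such clusters is beyond this sketch.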