Empowering precision medicine through high performance computing clusters

medicine. Furthermore, Molecular Dynamics (MD) has gained great importance in supporting genome research. Sequencing studies of cancer have made it possible to detect and characterize the mutated genes that drive tumorigenesis. As a complementary approach, from a biophysical perspective, MD simulations executed on HPC architectures have made it possible to investigate the role played by pathological mutations in the molecular mechanism of activation.


Big Data Next-Generation Sequencing for translational research
associated with disease, response to treatment, or future patient prognosis. Whole Genome Sequencing (WGS) is a genomics technique that allows the detection of all types of genetic variation (single nucleotide powerful feature combined with the maps of genetic variation in populations, thus enabling the integration of diagnosis and genetic counselling into treatment decision-making. In 2015 Taylor et al. extensively applied whole-genome sequencing as a tool for the diagnosis of genetic disorders in routine clinical practice on 500 patients (including 156 independent evidence of pathogenicity in 21% of cases (33/156) using several analysis strategies that improved the accuracy of variant calling and detection rates. More generally, WGS provides a picture of the whole landscape of driver mutations and mutational signatures in diseases. Several HPC bioinformatic pipelines have been developed to characterize and prioritize genetic variants [2][3].
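The prioritization step performed by such pipelines can be illustrated with a minimal Python sketch. It assumes that variants have already been called and annotated; the `Variant` record, its fields, the placeholder gene names and the filtering thresholds are hypothetical choices for illustration and are not taken from the pipelines cited in [2][3]. The consequence labels follow Sequence Ontology terms.

```python
from dataclasses import dataclass


# Hypothetical, simplified variant record; real pipelines read such fields from annotated VCF files.
@dataclass
class Variant:
    gene: str
    qual: float        # variant-calling quality score (Phred-scaled)
    depth: int         # number of reads covering the site
    pop_af: float      # allele frequency in a population reference database
    consequence: str   # predicted functional consequence (Sequence Ontology term)


# Consequences treated as high impact in this sketch (an assumed list, not a clinical standard).
HIGH_IMPACT = {"stop_gained", "frameshift_variant", "splice_donor_variant",
               "splice_acceptor_variant", "missense_variant"}


def prioritize(variants, min_qual=30.0, min_depth=10, max_pop_af=0.01):
    """Keep well-supported, rare, high-impact variants, rarest first.

    The thresholds are illustrative assumptions, not clinical guidelines.
    """
    kept = [v for v in variants
            if v.qual >= min_qual and v.depth >= min_depth
            and v.pop_af <= max_pop_af and v.consequence in HIGH_IMPACT]
    # Rank by population frequency (rarest first), then by calling quality (highest first).
    return sorted(kept, key=lambda v: (v.pop_af, -v.qual))


if __name__ == "__main__":
    calls = [
        Variant("GENE_A", 50.0, 40, 0.0001, "stop_gained"),
        Variant("GENE_B", 60.0, 35, 0.20, "missense_variant"),    # too common
        Variant("GENE_C", 12.0, 8, 0.0, "frameshift_variant"),    # poorly supported
        Variant("GENE_D", 45.0, 30, 0.0, "missense_variant"),
        Variant("GENE_E", 55.0, 50, 0.0, "synonymous_variant"),   # low impact
    ]
    for v in prioritize(calls):
        print(v.gene)
```

Run as a script, this prints `GENE_D` and then `GENE_A`: the common, poorly supported and synonymous calls are discarded before ranking.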
Whole-exome sequencing (WES) is a genomic technique for sequencing all of the protein-coding genes in a genome (also known as the exome) [4][5]. It has been applied to cancer and rare diseases to identify both actionable somatic variants in the coding regions and variants underlying known disease phenotypes [6]. WES has also been applied to the diagnosis of young patients who do not present the full spectrum of symptoms [7] and to prenatal diagnosis [8]. Furthermore, detecting the causative mutation can suggest how to modify the treatment and avoid more invasive trials. Targeted-exome sequencing (TES) is a genomic technique in which a subset of genes or regions of the genome is isolated and sequenced. genomic ranges of interest and enables sequencing at much higher tools to detect mutations in genes or genomic regions that are known or suspected to be associated with the disease of interest; the panel can be sensitive approach for the analysis of the cancer genome. It quickly eliminates much of the background noise generated by WES, since ideal tool for translational medicine and clinical settings.
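Why restricting sequencing to a targeted panel raises depth follows from the standard mean-coverage estimate C = N × L / G, where N reads of length L are spread over a target of G bases. The sketch below compares an exome with a gene panel at the same read budget; the read count, read length and target sizes are assumed round numbers, and real on-target rates are below 100%.

```python
def mean_depth(n_reads: int, read_length: int, target_size_bp: int,
               on_target: float = 1.0) -> float:
    """Expected mean coverage C = N * L / G, scaled by the fraction of reads on target."""
    return n_reads * read_length * on_target / target_size_bp


READS = 20_000_000      # assumed read budget per sample
READ_LEN = 150          # assumed read length (bp)
EXOME_BP = 50_000_000   # assumed exome target size (~50 Mb)
PANEL_BP = 500_000      # assumed gene-panel target size (~0.5 Mb)

if __name__ == "__main__":
    wes = mean_depth(READS, READ_LEN, EXOME_BP)
    tes = mean_depth(READS, READ_LEN, PANEL_BP)
    print(f"WES ~{wes:.0f}x, panel ~{tes:.0f}x")
```

With these assumptions the script prints `WES ~60x, panel ~6000x`: a hundredfold smaller target yields a hundredfold higher mean depth for the same sequencing effort, which is what makes low-frequency mutations easier to detect.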
RNA sequencing (RNA-Seq) is a sequencing technique able to reveal the presence and quantity of RNA in a biological sample at a to analyse the continuously changing cellular transcriptome. It has been extensively applied to patient samples to identify the molecular bases of many biological processes and diseases, including cancer [11][12]. In a better comprehension of the molecular mechanisms underlying prognosis and drug sensitivity. It addresses several aspects of the genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.) [13][14][15][16][17].
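Measuring the quantity of each RNA requires normalizing raw read counts for transcript length and sequencing depth. Transcripts Per Million (TPM) is one widely used normalization; the text does not say which one the cited studies use, so the following is only a minimal sketch with made-up counts and lengths.

```python
def tpm(counts, lengths_bp):
    """Transcripts Per Million from raw read counts and transcript lengths in bp."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Step 1: reads per kilobase (RPK) corrects for transcript length.
    rpk = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    total_rpk = sum(rpk)
    if total_rpk == 0:
        return [0.0] * len(counts)
    # Step 2: rescale so that the values of one sample sum to one million.
    return [r / total_rpk * 1_000_000 for r in rpk]


if __name__ == "__main__":
    # Two hypothetical transcripts with equal counts but different lengths.
    print(tpm([10, 10], [1000, 500]))
```

Because the second transcript is half as long, the same number of reads corresponds to twice the abundance, so its TPM is twice that of the first.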
a big-data secure repository for storing, cataloguing and querying cancer genome 'omics https://tcga-data.nci.nih.gov) cancer genome sequences, alignments, mutation information and molecular changes in cancer genome datasets, such as community. Two other available big-data resources on cancer are the Cancer Cell Line Encyclopedia (CCLE) [19] and the Genomics of Drug Sensitivity in Cancer [20]. With an immediate translational impact on precision medicine, these resources link genomic biomarkers to drug sensitivity in hundreds of cancer cell lines. With particular reference to CCLE, a big-data HPC analysis has been extensively performed on 935 paired-end RNA-seq experiments downloaded from the CCLE repository, aiming at addressing novel putative cell-line fusion detection algorithms have been applied to the CCLE dataset in order to provide in silico a reliable consensus result set of about 1,700 predicted novel fusion gene candidates in all the human malignant cell lines. Such results, queryable on the gene fusion database web portal (Ligea, http://hpc-bioinformatics.cineca.it/fusion), could represent (GTEx), tumor (TCGA) and cancer cell line (CCLE) tissues provide to translate information contained in the big-data bioinformatics of HPC in bioinformatics and computational biology is essential to reach these goals in a reasonable time.
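The consensus idea, keeping only the fusions predicted by several independent detection algorithms, can be sketched as a simple vote. The tool names, placeholder gene pairs and the `min_tools` threshold below are hypothetical; matching is done on gene pairs only, whereas a real consensus would also compare breakpoints and orientation. This is not the procedure used for the CCLE analysis, only an illustration of the principle.

```python
from collections import Counter


def consensus_fusions(calls_by_tool, min_tools=2):
    """Return fusion candidates reported by at least `min_tools` tools.

    calls_by_tool maps a (hypothetical) tool name to an iterable of
    (5' gene, 3' gene) pairs predicted by that tool.
    """
    support = Counter()
    for calls in calls_by_tool.values():
        # A set makes each tool vote at most once per fusion.
        for fusion in set(calls):
            support[fusion] += 1
    return sorted(f for f, n in support.items() if n >= min_tools)


if __name__ == "__main__":
    calls = {
        "tool_a": [("GENE1", "GENE2"), ("GENE3", "GENE4"), ("GENE5", "GENE6")],
        "tool_b": [("GENE1", "GENE2"), ("GENE3", "GENE4")],
        "tool_c": [("GENE1", "GENE2"), ("GENE7", "GENE8")],
    }
    print(consensus_fusions(calls))               # supported by 2 or more tools
    print(consensus_fusions(calls, min_tools=3))  # supported by all three
```

Raising `min_tools` trades sensitivity for reliability: fewer candidates survive, but each is backed by more independent evidence.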
ChIP-seq is a sequencing technique that combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing. It is a powerful method to identify genome-wide DNA binding sites for transcription factors and other proteins; more generally, it can be used to precisely map global binding sites for any protein of interest [22][23][24]. and epigenetic carcinogenesis, or any other disease related to [26], chromatin remodeling and microRNAs that act as regulatory interactions between genomic and environmental conditions [28]; they case of pancreatic ductal adenocarcinoma (PDAC) subtypes the study of epigenomic landscapes integrated with ChIP-seq and RNA-Seq data has made it possible to predict aggressiveness and survival in some PDAC subtypes [29], thus providing potential new markers and therapeutic targets.
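Locating binding sites in ChIP-seq data amounts to finding genomic regions where reads pile up more than expected by chance. The sketch below takes read counts in fixed-size bins and flags bins whose count is improbable under a uniform Poisson background. It is a deliberately simplified illustration: peak callers such as MACS estimate the background locally and compare against a control sample. The bin counts and the `alpha` threshold are assumed values.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam); assumes k > lam > 0 so the terms decrease."""
    # log P(X = k) via lgamma avoids overflowing k! for large counts.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, i = 0.0, k
    # Sum P(X = k) + P(X = k + 1) + ... until the remaining terms are negligible.
    while term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i
    return total


def call_peaks(bin_counts, alpha=1e-5):
    """Indices of bins whose read count is significantly above the genome-wide mean."""
    if not bin_counts:
        return []
    lam = sum(bin_counts) / len(bin_counts)  # background reads expected per bin
    return [i for i, c in enumerate(bin_counts)
            if c > lam and poisson_upper_tail(c, lam) < alpha]


if __name__ == "__main__":
    counts = [2] * 99 + [40]   # hypothetical: one enriched bin among 100
    print(call_peaks(counts))
```

Only bins above the background mean are tested, which keeps the tail sum convergent and skips regions that cannot be enriched.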
Metagenomics is a sequencing technique that allows the study of genetic material recovered directly from environmental samples. It has been extensively applied to characterize virus genome heterogeneity, without in vitro replication biases, in the microbial community present in clinical samples. High-throughput pyrosequencing has been virus directly in nasopharyngeal swabs in the context of the microbial community [30][31][32][33]. metagenomic research area: the study of the human microbiome, a major player in the immune system, since researchers believe that immune reactions are closely linked to the distribution of microbial communities throughout a person's life [34].
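Once metagenomic reads have been assigned to taxa, the distribution of a microbial community can be summarized by relative abundances and a diversity index. The sketch below computes the Shannon index H' = -Σ p_i ln p_i; the taxon names and counts are hypothetical, and the read-classification step that produces them is assumed to have been done beforehand.

```python
import math


def relative_abundance(read_counts):
    """Fraction of classified reads assigned to each taxon."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no classified reads")
    return {taxon: n / total for taxon, n in read_counts.items()}


def shannon_index(read_counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero abundance."""
    return -sum(p * math.log(p)
                for p in relative_abundance(read_counts).values() if p > 0)


if __name__ == "__main__":
    even = {"taxon_a": 10, "taxon_b": 10, "taxon_c": 10, "taxon_d": 10}
    skewed = {"taxon_a": 97, "taxon_b": 1, "taxon_c": 1, "taxon_d": 1}
    print(shannon_index(even), shannon_index(skewed))
```

An evenly distributed community of four taxa reaches the maximum value ln 4, while a community dominated by one taxon scores close to zero, so the index gives a single number for comparing samples over time.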

Structural characterization of pathogenic mutations
Historically, the role of HPC in medicine even predates the NGS revolution, starting in the 1990s with the availability of accurate in silico proteins in an aqueous environment and then nucleic acids and membrane proteins).
HPC, in particular, has been widely applied in cancer research, with Molecular Dynamics simulations characterizing cancer-related proteins [35][36][37][38] and evaluating the impact of somatic mutations or the activity of anticancer drugs [39][40][41][42]. MD has also been applied to the characterization of viral proteins [43][44]. non-synonymous SNPs obtained by NGS and microarray-based platforms, has increased the need for in silico methods capable of providing atomic-level information on the structural and dynamic alterations produced in mutated proteins. MD simulation is routinely complemented by other methods such as homology modelling, molecular docking and drug design. Application of these methods has become a standard tool in human genome research, since they have proved able to rationalize the impact of pathogenic mutations [45][46][47].
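At the core of every MD engine is a numerical integrator of Newton's equations of motion. The minimal sketch below applies velocity Verlet, one member of the Verlet family of integrators, to a single particle in a one-dimensional harmonic potential; the spring constant, mass and time step are arbitrary assumed values. Production MD codes running on HPC clusters add three-dimensional force fields, solvent, periodic boundaries, thermostats and parallel domain decomposition, none of which is modelled here.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D with velocity Verlet; returns [(x, v), ...]."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # forces at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


# Assumed toy system: a harmonic spring in arbitrary units.
K = 1.0   # spring constant
M = 1.0   # particle mass


def harmonic_force(x):
    return -K * x


def energy(x, v):
    """Kinetic plus potential energy."""
    return 0.5 * M * v * v + 0.5 * K * x * x


if __name__ == "__main__":
    traj = velocity_verlet(1.0, 0.0, harmonic_force, M, dt=0.01, n_steps=10_000)
    drift = max(abs(energy(x, v) - 0.5) for x, v in traj)
    print(f"max energy deviation: {drift:.2e}")
```

Over 10,000 steps the total energy stays very close to its initial value of 0.5 instead of drifting, which is the main reason Verlet-type integrators are preferred for the long trajectories that HPC resources make affordable.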
questions about the structural properties and long-range dynamics of proteins and nucleic acids, thus allowing the formulation of rational hypotheses to interpret clinical data [48][49][50][51]. In (Figure 2