High Throughput Reproducible Literate Phylogenetic Analysis

Rohit Goswami*, S. Ruhila

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We present a holistic approach from a literate programming perspective to frame and solve systems biology problems. In particular, given the large data-sets required for answering questions relating to evolutionary histories, we focus on the generalization and workflow required on a typical SLURM or PBS TORQUE queue-driven high-performance computing (HPC) cluster. We demonstrate how to leverage multiple CLI tools compiled for efficient use in a portable manner on heterogeneous computational resources, and further demonstrate the use of R to generate literate data-driven plots and analysis. HPC cluster bottlenecks and installation barriers are also discussed, and mitigation strategies are developed. As a concrete example, we demonstrate the estimation of a phylogenetic tree, used to pose and answer questions on evolutionary lineages. In this manner, a generalized approach which can be used for systems biology is elucidated for manipulating phylogenetic data, including its validation, multiple sequence alignment, tree estimation through different models, and reproduction.

Original language: English
Title of host publication: PDGC 2022 - 2022 7th International Conference on Parallel, Distributed and Grid Computing
Editors: Hari Singh Rawat, Ravindara Bhatt, Pradeep Kumar Gupta, Vivek Kumar Seghal
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 337-340
Number of pages: 4
ISBN (Electronic): 9781665454018
Publication status: Published - 2022
Event: 7th International Conference on Parallel, Distributed and Grid Computing, PDGC 2022 - Solan, India
Duration: 25 Nov 2022 to 27 Nov 2022

Publication series

Name: PDGC 2022 - 2022 7th International Conference on Parallel, Distributed and Grid Computing

Conference

Conference: 7th International Conference on Parallel, Distributed and Grid Computing, PDGC 2022
Country/Territory: India
City: Solan
Period: 25/11/22 to 27/11/22

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

Other keywords

  • high-performance-computing
  • literate-programming
  • phylogenetics
  • r-lang
  • reproducible-research
