Reproducible High Performance Computing without Redundancy with Nix

Rohit Goswami*, S. Ruhila, Amrita Goswami, Sonaly Goswami, Debabrata Goswami

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

High performance computing (HPC) clusters are typically managed in a restrictive manner; the large user base makes cluster administrators unwilling to allow privilege escalation. Here we discuss existing methods of package management, including those developed with scalability in mind, and enumerate the advantages and drawbacks of each. We contrast the paradigms of containerization via Docker, virtualization via KVM, pod infrastructures via Kubernetes, and specialized HPC packaging systems via Spack, and identify key areas of neglect. We demonstrate how functional programming, through its reliance on immutable state, has been leveraged for deterministic package management via nix-language expressions. We show that its associated ecosystem is a prime candidate for HPC package management. We further develop guidelines, identify bottlenecks in the existing structure, and present the methodology by which the nix ecosystem should be developed further as an optimal tool for HPC package management. We assert that the caveats of the nix ecosystem can easily be mitigated by considerations relevant only to HPC systems, without compromising on the functional methodology and features of the nix-language. We show that the benefits of adoption, in terms of generating reproducible derivations in a secure manner, allow workflows to be scaled across heterogeneous clusters. In particular, from the implementation hurdles faced during the compilation and running of the d-SEAMS scientific software engine, distributed as a nix-derivation on an HPC cluster, we identify communication protocols for working with SLURM and TORQUE user resource allocation queues. These protocols are heuristically defined and described in terms of the reference implementation required for queue-efficient nix builds.
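
The link between immutable inputs and deterministic packaging mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the hashing scheme Nix actually uses, and it is not taken from the paper: the helper `store_path`, the package name `example-tool`, and the input names are all hypothetical. The sketch only shows the general idea that an output location computed purely from declared inputs is identical whenever the inputs are identical, and changes whenever any input changes.

```python
import hashlib
import json


def store_path(name, version, build_inputs, store="/nix/store"):
    """Hypothetical helper: compute an output path purely from declared inputs.

    This is an illustration of input-addressed naming, not Nix's real algorithm.
    """
    # Serialize the inputs canonically so ordering cannot affect the result.
    payload = json.dumps(
        {"name": name, "version": version, "inputs": sorted(build_inputs)},
        sort_keys=True,
    ).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()[:32]
    return f"{store}/{digest}-{name}-{version}"


# Illustrative, made-up package and dependency names.
path_a = store_path("example-tool", "1.0", ["compiler-9", "libfoo-1.2"])
path_b = store_path("example-tool", "1.0", ["libfoo-1.2", "compiler-9"])
path_c = store_path("example-tool", "1.0", ["compiler-10", "libfoo-1.2"])

print(path_a == path_b)  # same declared inputs, same path
print(path_a == path_c)  # a changed dependency yields a different path
```

Because the path depends only on the declared inputs, two builds with different dependencies can coexist side by side without overwriting each other, which is the property that makes this style of packaging attractive on shared clusters where users cannot modify system directories.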

Original language: English
Title of host publication: PDGC 2022 - 2022 7th International Conference on Parallel, Distributed and Grid Computing
Editors: Hari Singh Rawat, Ravindara Bhatt, Pradeep Kumar Gupta, Vivek Kumar Seghal
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 238-242
Number of pages: 5
ISBN (Electronic): 9781665454018
Publication status: Published - 2022
Event: 7th International Conference on Parallel, Distributed and Grid Computing, PDGC 2022 - Solan, India
Duration: 25 Nov 2022 to 27 Nov 2022

Publication series

Name: PDGC 2022 - 2022 7th International Conference on Parallel, Distributed and Grid Computing

Conference

Conference: 7th International Conference on Parallel, Distributed and Grid Computing, PDGC 2022
Country/Territory: India
City: Solan
Period: 25/11/22 to 27/11/22

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

Other keywords

  • functional-derivations
  • functional-package-management
  • high-performance-computing
  • nix-lang
  • reproducible-research
