A job dispatcher for large and heterogeneous HPC systems running modern applications

Cristian Galleguillos, Zeynep Kiziltan, Ricardo Soto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

High-performance Computing (HPC) systems have become essential instruments in our modern society. As they get closer to exascale performance, HPC systems become larger in size and more heterogeneous in their computing resources. With recent advances in AI, HPC systems are also increasingly being used for applications that employ many short jobs with strict timing requirements. HPC job dispatchers therefore need to adopt techniques that go beyond the capabilities of those developed for small or homogeneous systems, or for traditional compute-intensive applications. In this paper, we present a job dispatcher suitable for today's large and heterogeneous systems running modern applications. Unlike its predecessors, our dispatcher solves the entire dispatching problem using Constraint Programming (CP) with a model size independent of the system size. Experimental results based on a simulation study show that our approach can bring about significant performance gains over the existing CP-based dispatchers in a large or heterogeneous system.

Original language: English
Title of host publication: 27th International Conference on Principles and Practice of Constraint Programming, CP 2021
Editors: Laurent D. Michel
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
ISBN (Electronic): 9783959772112
State: Published - 1 Oct 2021
Externally published: Yes
Event: 27th International Conference on Principles and Practice of Constraint Programming, CP 2021 - Virtual, Montpellier, France
Duration: 25 Oct 2021 to 29 Oct 2021

Publication series

Name: Leibniz International Proceedings in Informatics, LIPIcs
Volume: 210
ISSN (Print): 1868-8969

Conference

Conference: 27th International Conference on Principles and Practice of Constraint Programming, CP 2021
Country/Territory: France
City: Virtual, Montpellier
Period: 25/10/21 to 29/10/21

Keywords

  • Constraint programming
  • Heterogeneous systems
  • HPC systems
  • Large systems
  • On-line job dispatching
  • Resource allocation
