escience2019 has ended
Thursday, September 26 • 1:00pm - 1:30pm
Evaluation of Pilot Jobs for Apache Spark Applications on HPC Clusters

Valerie Hayot-Sasson (Concordia University) and Tristan Glatard (Concordia University)

Big Data is becoming prominent throughout many scientific fields and, as a result, scientific communities are seeking Big Data frameworks to accelerate the processing of their increasingly data-intensive pipelines. However, while scientific communities typically rely on High-Performance Computing (HPC) clusters to parallelize their pipelines, many popular Big Data frameworks such as Hadoop and Spark were primarily designed to be executed on dedicated commodity infrastructures. As Big Data frameworks cannot leverage HPC schedulers directly, they must be executed on an overlay cluster atop an HPC allocation. This is problematic because the user may not know the application resource requirements that the HPC scheduler needs. Pilot scheduling strategies have been developed to address the limitations of traditional HPC batch job schedulers. Pilot schedulers, such as HTCondor and DIRAC, decouple resource provisioning from task scheduling, thereby enabling efficient resource utilization through dynamic scheduling. This paper evaluates the benefits of pilot-scheduling strategies over traditional batch submission on HPC clusters with overlay Apache Spark clusters. We evaluate the overall speedup brought on by pilot-scheduling strategies across four increasing resource configurations. Overall, we find that there is little benefit to using pilot-scheduling strategies, though they can bring a 2x speedup when system queuing times are very long. However, these occurrences are rare. Generally, pilots have approximately the same makespan as batch submission. Although makespan differences were found to be mostly due to queuing times, pilots did not appear to have any advantage in this regard, potentially due to system scheduling policies. Regardless, pilots may still be useful when application wall times are underestimated. This remains to be investigated.


Valerie Hayot-Sasson

Concordia University

Thursday September 26, 2019 1:00pm - 1:30pm PDT
Kon Tiki Room