escience2019 has ended
Thursday, September 26 • 10:30am - 2:30pm
Pegasus Scientific Workflows with Containers

Workflows are a key technology for enabling complex scientific computations. They capture the interdependencies between processing steps in data analysis and simulation pipelines, as well as the mechanisms to execute those steps reliably and efficiently. Workflows can capture complex processes to promote sharing and reuse, and they provide the provenance information necessary for verifying scientific results and for scientific reproducibility. Application containers such as Docker and Singularity are increasingly a preferred way to bundle user application code with complex dependencies for use during workflow execution.
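
As a rough illustration of what "capturing interdependencies between processing steps" can mean, the sketch below models a three-step pipeline as a dependency graph and derives a valid execution order. It uses only the Python standard library and is not the Pegasus API; the `Job` class, the `execution_order` helper, the job names, and the container image names are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Job:
    """One processing step (hypothetical model, not a Pegasus class)."""
    name: str
    container: str          # hypothetical container image that bundles the step's code
    depends_on: tuple = ()  # names of jobs that must finish before this one


def execution_order(jobs):
    """Return job names in an order that respects every dependency."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {job.name: set(job.depends_on) for job in jobs}
    return list(TopologicalSorter(graph).static_order())


# A linear pipeline: preprocess -> analyze -> plot (illustrative names only).
pipeline = [
    Job("preprocess", "example/preprocess:1.0"),
    Job("analyze", "example/analyze:1.0", ("preprocess",)),
    Job("plot", "example/plot:1.0", ("analyze",)),
]

order = execution_order(pipeline)
print(order)
```

A real workflow system adds much more on top of such a graph (data staging, retries, provenance capture, and running each step inside its container), but the dependency structure is the part a workflow description makes explicit.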

Pegasus is used for production-grade science in a number of scientific domains. In 2016, the LIGO gravitational wave experiment used Pegasus to analyze instrumental data and confirm the first detection of a gravitational wave. The Southern California Earthquake Center (SCEC), based at USC, uses a Pegasus-managed workflow infrastructure called CyberShake to generate hazard maps for the Southern California region. In March 2017, SCEC conducted a CyberShake study on the ORNL Titan and NCSA Blue Waters supercomputers. Overall, the study required 450,000 node-hours of computation across the two systems. Pegasus is also used in astronomy, bioinformatics, civil engineering, climate modeling, earthquake science, molecular dynamics, and other complex analyses.

The goal of the tutorial is to introduce the benefits of modeling pipelines in a portable way using scientific workflows with application containers. We will examine the workflow lifecycle at a high level, along with the issues and challenges associated with its various steps, such as creation, execution, monitoring, and debugging. Through hands-on exercises, we will model an application pipeline, bundle the application codes in containers, and execute the pipeline on distributed computing infrastructures. Attendees will leave the tutorial knowing how to implement their own computations using containers and workflows.


Karan Vahi

University of Southern California

Mats Rynge

USC Information Sciences Institute

Thursday September 26, 2019 10:30am - 2:30pm
Cockatoo Room