Eric Lyons (University of Massachusetts Amherst), George Papadimitriou (University of Southern California), Cong Wang (University of North Carolina at Chapel Hill), Komal Thareja (University of North Carolina at Chapel Hill), Paul Ruth (University of North Carolina at Chapel Hill), J. J. Villalobos (Rutgers Discovery Informatics Institute), Ivan Rodero (Rutgers Discovery Informatics Institute), Ewa Deelman (University of Southern California), Michael Zink (University of Massachusetts Amherst), and Anirban Mandal (University of North Carolina at Chapel Hill)
Computational science today depends on complex, data-intensive applications operating on datasets from a variety of scientific instruments. A major challenge is the integration of data into the scientist's workflow. Recent advances in dynamic, networked cloud resources provide the building blocks to construct reconfigurable, end-to-end infrastructure that can increase scientific productivity. However, applications have not adequately taken advantage of these advanced capabilities. In this work, we have developed a novel network-centric platform that enables high-performance, adaptive data flows and coordinated access to distributed cloud resources and data repositories for atmospheric scientists. We demonstrate the effectiveness of our approach by evaluating time-critical, adaptive weather sensing workflows, which use advanced networked infrastructure to ingest live weather data from radars and compute data products used for timely response to weather events. The workflows are orchestrated by the Pegasus workflow management system and were chosen because of their diverse resource requirements. We show that our approach results in timely processing of Nowcast workflows under different infrastructure configurations and network conditions. We also show how workflow task clustering choices affect the throughput of an ensemble of Nowcast workflows with improved turnaround times. Additionally, we find that using our network-centric platform, powered by advanced Layer 2 networking techniques, results in faster, more reliable data throughput, makes cloud resources easier to provision, and makes the workflows easier to configure for operational use and automation.