Wednesday, September 25 • 1:30pm - 2:00pm
SciInc: A Container Runtime for Incremental Recomputation

Andrew Youngdahl (DePaul University), Dai-Hai Ton-That (DePaul University), and Tanu Malik (DePaul University)

Reviewing a computational experiment by repeating it and verifying its results is a time-consuming task. A proper review often entails iteratively assessing how changed arguments and datasets affect the results of a computation. Altering even a subset of inputs, however, typically repeats every computational step of an experiment, including steps the changed input does not affect. Incremental recomputation, which minimizes redundant computation through partial recomputation and memoization, is a promising way to make review more efficient.

Current container technology, commonly used for sharing and reviewing experiments in new environments, does not support incremental recomputation. In this paper we present SciInc, a container runtime system that, given a computation, efficiently repeats iterative computations by reusing partial results that are identical in both the repeat and the original. The runtime maintains an in-memory versioned provenance trace of the computation and uses the trace to detect and adjust to changes via a memoization-capable change propagation algorithm. Using a novel checkpoint/restore mechanism, we show how incremental recomputation can be achieved within the container runtime without modifying programs or introducing new software stacks. We choose lightweight data structures for storing and implementing the trace in order to maintain the invariant of reproducible computation within the container runtime. To determine the effectiveness of change propagation and memoization, we compare SciInc against popular container technology and incremental recomputation methods using published data analysis experiments.
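The general idea of memoized change propagation over a recorded trace can be illustrated with a small sketch. It is not SciInc's algorithm or data structures, which the abstract does not detail. It assumes a hypothetical trace of steps with declared input and output files, held in an in-memory dict, and a memo table keyed by a step's name plus a SHA-256 digest of its input contents. A memo hit reuses the stored outputs instead of re-executing the step, loosely standing in for restoring saved state. All names (`Step`, `digest`, `replay`, and the example steps) are illustrative.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical model: the container's files are an in-memory dict of path -> contents.
FileSystem = Dict[str, bytes]
# Hypothetical memo table: (step name, digest of input contents) -> outputs produced.
Memo = Dict[Tuple[str, str], Dict[str, bytes]]


@dataclass(frozen=True)
class Step:
    """One recorded step of a provenance trace (hypothetical structure)."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    run: Callable[[FileSystem], Dict[str, bytes]]


def digest(fs: FileSystem, paths: Tuple[str, ...]) -> str:
    """Hash the names and current contents of a step's input files."""
    h = hashlib.sha256()
    for path in paths:
        h.update(path.encode())
        h.update(hashlib.sha256(fs[path]).digest())
    return h.hexdigest()


def replay(trace: List[Step], fs: FileSystem, memo: Memo) -> List[str]:
    """Replay the trace in order, reusing memoized outputs whose inputs are unchanged.

    A changed input changes the digest of every step that reads it directly;
    downstream steps re-execute only if an upstream re-execution produced
    different outputs. Returns the names of the steps actually executed.
    """
    executed = []
    for step in trace:
        key = (step.name, digest(fs, step.inputs))
        if key in memo:
            outputs = memo[key]  # reuse: stands in for restoring a saved result
        else:
            produced = step.run(fs)
            outputs = {path: produced[path] for path in step.outputs}
            memo[key] = outputs
            executed.append(step.name)
        fs.update(outputs)
    return executed


# Hypothetical three-step analysis: clean raw data, count rows, label the result.
def clean(fs: FileSystem) -> Dict[str, bytes]:
    return {"clean.csv": fs["raw.csv"].strip()}


def count(fs: FileSystem) -> Dict[str, bytes]:
    return {"count.txt": str(len(fs["clean.csv"].split(b"\n"))).encode()}


def label(fs: FileSystem) -> Dict[str, bytes]:
    return {"label.txt": fs["params.txt"] + b":" + fs["count.txt"]}


trace = [
    Step("clean", ("raw.csv",), ("clean.csv",), clean),
    Step("count", ("clean.csv",), ("count.txt",), count),
    Step("label", ("params.txt", "count.txt"), ("label.txt",), label),
]
memo: Memo = {}

fs1 = {"raw.csv": b"a\nb\n", "params.txt": b"v1"}
first_run = replay(trace, fs1, memo)
print(first_run)   # ['clean', 'count', 'label']

fs2 = {"raw.csv": b"a\nb\n", "params.txt": b"v2"}  # only the argument changed
second_run = replay(trace, fs2, memo)
print(second_run)  # ['label']
```

Changing only `params.txt` re-runs just the `label` step, while `clean` and `count` are served from the memo table. Keying on input contents rather than timestamps also gives an early cutoff: if a re-executed step reproduces identical outputs, the steps downstream of it still hit the memo.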


Tanu Malik

DePaul University

