StateDB: A Price-Aware Checkpointing Library for Amazon EC2 Spot Instances – Københavns Universitet

Master's Thesis Defense by Patrick-Ranjit Madsen

Abstract:

Amazon EC2 Spot instances are virtual machines that can be dynamically allocated, but whose prices fluctuate with the number of idle Amazon EC2 instances. A Spot instance is allocated when the initial dollar bid is above or equal to the current price, and is unceremoniously terminated the second that is no longer the case. Consequently, data loss becomes a real issue compared to normal EC2 instances, and applications must incorporate fault-tolerance mechanisms that limit the impact of such revocation events. Because the probability of a revocation event is so directly influenced by the fluctuating price relative to the bid, any heuristic that hopes to make informed decisions about when to checkpoint needs near-realtime updates of those price fluctuations.
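To make the allocation rule concrete, the following minimal Go sketch encodes it alongside one possible price-aware heuristic. The function names and the threshold rule are assumptions made purely for illustration; they are not taken from the thesis or from StateDB.

```go
package main

import "fmt"

// revoked reports whether a Spot instance would be terminated at the given
// market price. Per the rule above, the instance survives only while the
// bid is above or equal to the current price.
func revoked(bid, price float64) bool {
	return bid < price
}

// shouldCheckpoint is a hypothetical heuristic (an assumption, not the
// thesis's method): request a checkpoint once the price has reached at
// least the given fraction (0..1) of the bid.
func shouldCheckpoint(bid, price, fraction float64) bool {
	return price >= fraction*bid
}

func main() {
	bid := 0.10 // dollars per hour, made-up value
	for _, price := range []float64{0.05, 0.085, 0.10, 0.11} {
		fmt.Printf("price=%.3f checkpoint=%t revoked=%t\n",
			price, shouldCheckpoint(bid, price, 0.8), revoked(bid, price))
	}
}
```

A fixed fraction is only the simplest choice; a heuristic with access to price history could instead weigh the recent rate of change.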

StateDB is a highly concurrent, price-aware checkpointing library for Go(lang) applications that is meant to both facilitate and simplify the checkpointing of an application. The library enables applications to specify mutable and immutable sections of the application state, and thus to minimise write times by leveraging incremental checkpointing. In addition, the library enables automatic scheduling of checkpoints using a client-customisable heuristic that is provided with realtime price updates, along with read and write statistics from checkpoints. To test the library, a non-persistent version of an animal movement simulation is compared to a persistent version, with varying immutable and mutable state sizes, and the overhead of the checkpointing library is measured in terms of interrupted compute time and responsiveness to price fluctuations.
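The idea of separating immutable from mutable state can be sketched in Go as follows. This is a simplified illustration under stated assumptions, not StateDB's actual API: the types Store and Checkpointer, their methods, and the in-memory "storage" are all hypothetical, and the sketch rewrites the whole mutable section on each checkpoint rather than computing finer-grained deltas.

```go
package main

import "fmt"

// Store is a hypothetical in-memory stand-in for persistent storage.
type Store struct {
	immutable []byte // written once, on the first checkpoint
	mutable   []byte // overwritten on every checkpoint
}

// Checkpointer is an illustrative type, not part of StateDB.
type Checkpointer struct {
	store          *Store
	immutableSaved bool
}

// Checkpoint writes the immutable section only on the first call and the
// mutable section on every call. It returns the number of bytes written.
func (c *Checkpointer) Checkpoint(immutable, mutable []byte) int {
	written := 0
	if !c.immutableSaved {
		c.store.immutable = append([]byte(nil), immutable...)
		c.immutableSaved = true
		written += len(immutable)
	}
	c.store.mutable = append([]byte(nil), mutable...)
	written += len(mutable)
	return written
}

// Restore returns copies of the sections saved by the latest checkpoint.
func (c *Checkpointer) Restore() (immutable, mutable []byte) {
	return append([]byte(nil), c.store.immutable...),
		append([]byte(nil), c.store.mutable...)
}

func main() {
	cp := &Checkpointer{store: &Store{}}
	terrain := []byte("static terrain map") // immutable section (made up)

	fmt.Println("bytes written:", cp.Checkpoint(terrain, []byte("positions@t1")))
	fmt.Println("bytes written:", cp.Checkpoint(terrain, []byte("positions@t2")))

	_, latest := cp.Restore()
	fmt.Println("restored mutable state:", string(latest))
}
```

In this sketch, every checkpoint after the first costs only the size of the mutable section, which is the effect the abstract attributes to incremental checkpointing.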

Results show that the library reacts correctly to price changes and to checkpointing signals from the heuristic, and that these events are successfully translated into checkpoints on persistent storage through incremental checkpointing.

Supervisors: Marcos Vaz Salles (main supervisor) and Michael Kirkedal Thomsen, DIKU

External examiner (censor): Czeslaw Kazimierczak, CSC Danmark A/S