Baltasar Dinis

February 22, 2023 at 11:00 AM on Zoom / Soda Hall

Restart-Rollback: a novel fault model for persistent distributed systems

Abstract: Trusted Execution Environments (TEEs) ensure the confidentiality and integrity of computations in hardware. Subject to the TEE’s threat model, the hardware shields a computation from most externally induced fault behavior except crashes. As a result, a crash-fault tolerant (CFT) replication protocol should be sufficient when replicating trusted code inside TEEs. However, TEEs do not provide efficient and general means of ensuring the freshness of external, persistent state. Therefore, CFT replication is insufficient for TEE computations with external state, as this state could be rolled back to an earlier version when a TEE restarts. At the same time, using Byzantine fault tolerant (BFT) protocols in this setting is too conservative, because these protocols are designed to tolerate arbitrary behavior, not just rollback during a restart. In this talk, I'll present the restart-rollback (RR) fault model for replicating TEEs, which precisely captures the possible fault behaviors of TEEs with external state. I'll then show that existing replication protocols can be adapted to this fault model with few changes while retaining their original performance. We adapted two widely used CFT protocols, the ABD read/write register protocol and the Paxos consensus protocol, to the RR model. We also use these protocols to build a replicated metadata service called TEEMS, and show that it can be used to add TEE-grade confidentiality, integrity, and freshness to untrusted cloud storage services. In the second part of the talk, I'll present ongoing follow-up work in which we apply the RR model outside the TEE use case. By leveraging RR in the context of replicated key-value stores, we can allow certain replicas to synchronize less frequently, which enables batching (and in turn better performance) without sacrificing durability in the event of a full system crash.

Bio: Baltasar is currently a research assistant at INESC-ID in Lisbon. For the past four years, he has been working with professors Rodrigo Rodrigues (also from INESC-ID) and Peter Druschel (from the Max Planck Institute for Software Systems in Saarbruecken, Germany), mainly on the Restart-Rollback fault model and its applications. His research interests include the theory and practice of distributed and operating systems, as well as security.

Security Lab