Ishtiyaque Ahmad
November 12, 2021 at 12:00 PM on Zoom / Soda Hall
Coeus: A System for Oblivious Document Ranking and Retrieval
Abstract: Given a private string q and a remote server that holds a set of public documents D, how can one of the K most relevant documents to q in D be selected and viewed without anyone (not even the server) learning anything about q or the document? This is the oblivious document ranking and retrieval problem. In this talk, I will describe Coeus, a system that solves this problem. At a high level, Coeus composes two cryptographic primitives: secure matrix-vector product for scoring document relevance using the widely used term frequency-inverse document frequency (tf-idf ) method, and private information retrieval (PIR) for obliviously retrieving documents. However, Coeus reduces the time to run these protocols, thereby improving the user-perceived latency, which is a key performance metric. Coeus first reduces the PIR overhead by separating out private metadata retrieval from document retrieval, and it then scales secure matrix-vector product to tf-idf matrices with several hundred billion elements through a series of novel cryptographic refinements. For a corpus of English Wikipedia containing 5 million documents, a keyword dictionary with 64K keywords, and on a cluster of 143 machines on AWS, Coeus enables a user to obliviously rank and retrieve a document in 3.9 seconds — a 24× improvement over a baseline system.
Bio: Ishtiyaque Ahmad is a third year PhD student in the Department of Computer Science at University of California, Santa Barbara. He is co-supervised by Professors Divy Agrawal, Amr El Abbadi, and Trinabh Gupta. His research interest lies in the intersection of distributed systems and privacy. His current work focuses on building systems that are scalable and at the same time provide strong privacy to their users.