Storing data on untrusted servers

We have two research papers and a selection of available code.

Paper: Super-Efficient Aggregating History-Independent Persistent Authenticated Dictionaries

Abstract
Authenticated dictionaries allow users to send lookup requests to an untrusted server and get authenticated answers. Persistent authenticated dictionaries (PADs) add queries against historical versions. We consider a variety of different trust models for PADs and we present several extensions, including support for aggregation and a rich query language, as well as hiding information about the order in which PADs were constructed. We consider variations on tree-like data structures as well as a design that improves efficiency by speculative future predictions. We improve on prior constructions and feature two designs that can authenticate historical queries with constant storage per update and several designs that can return constant-sized authentication results.

A PDF version of the paper is available.

@InProceedings{crosbypads,
  author =       {Scott A. Crosby and Dan S. Wallach},
  title =        {Super-Efficient Aggregating History-Independent Persistent Authenticated Dictionaries},
  booktitle =    {Proceedings of ESORICS 2009},
  year =         2009,
  address =      {Saint Malo, France},
  month =        sep,
  pages = {671--688}
}

Tech Report: Authenticated dictionaries: Real-world costs and tradeoffs

Abstract
Authenticated dictionaries are a widely discussed paradigm to enable verifiable integrity for data storage on untrusted servers, such as today's widely used ``cloud computing'' resources, allowing a server to provide a ``proof,'' typically in the form of a slice through a cryptographic data structure, that the results of any given query are the correct answer, including that the absence of a query result is correct. Persistent authenticated dictionaries (PADs) further allow queries against older versions of the structure. This research presents implementations of a variety of different PAD algorithms, some based on Merkle tree-style data structures and others based on individually signed ``tuple'' statements (with and without RSA accumulators). We present system throughput benchmarks, presenting costs in terms of time, storage, and bandwidth as well as considering how much money would be required given standard cloud computing costs. We conclude that Merkle tree PADs are preferable in cases with frequent updates, while tuple-based PADs are preferable with higher query rates. For Merkle tree PADs, red-black trees outperform treaps and skiplists. Applying Sarnak-Tarjan's versioned node strategy, with a cache of old hashes at every node, to red-black trees yields the fastest Merkle tree PAD implementation, notably using half the memory of the more commonly used applicative path copying strategy. For tuple PADs, although we designed and implemented an algorithm using RSA accumulators that offers constant update size, constant storage per update, constant proof size, and sublinear computation per update, we found that RSA accumulators are so expensive that they are never worthwhile. We find that other optimizations in the literature for tuple PADs are more cost-effective.

A PDF version of the tech report is available.

@techreport{crosbypadbench,
    author = {Scott A. Crosby and Dan S. Wallach},
    title = {Authenticated dictionaries: {R}eal-world costs and tradeoffs},
    institution = {Rice University},
    number = {CS TR10-14},
    address = {Houston, TX},
    month = dec,
    year = 2010
}

Code

We have our original Python, and SWIG'ed C++ implementation of this code available, merged in with our C++ History tree code at http://tamperevident.cs.rice.edu/git/python_cpp_misc.git. This codebase may be of independent interest as it includes implementations of skiplists, treaps, and red-black trees that are implemented in a bottom up or functional fashion. Our implementations access the tree node storage through an opaque interface, making it easy to implement standard mutable trees, as well as path copying trees and sarnak-tarjan trees. We have these tree algorithms implemented in both Python and C++.