Tamper evident logging

Paper: Efficient data structures for tamper-evident logging

Abstract

Many real-world applications wish to collect tamper-evident logs for forensic purposes. This paper considers the case of an untrusted logger, serving a number of clients who wish to store their events in the log, and kept honest by a number of auditors who will challenge the logger to prove its correct behavior. We propose semantics of tamper-evident logs in terms of this auditing process. The logger must be able to prove that individual logged events are still present, and that the log, as seen now, is consistent with how it was seen in the past. To accomplish this efficiently, we describe a tree-based data structure that can generate such proofs with logarithmic size and space, improving over previous linear constructions. Where a classic hash chain might require an 800~MB trace to prove that a randomly chosen event is in a log with 80 million events, our prototype returns a 3~KB proof with the same semantics. We also present a flexible mechanism for the log server to present authenticated and tamper-evident search results for all events matching a predicate. This can allow large-scale log servers to selectively delete old events, in an agreed-upon fashion, while generating efficient proofs that no inappropriate events were deleted.

We describe a prototype implementation and measure its performance on an 80 million event syslog trace at 1,750 events per second using a single CPU core. Performance improves to 10,500 events per second if cryptographic signatures are offloaded, corresponding to 1.1 TB of logging throughput per week.

A PDF version of the paper is available.

@InProceedings{crosbylogging,
  author =       {Scott A. Crosby and Dan S. Wallach},
  title =        {Efficient data structures for tamper-evident logging},
  booktitle =    {Proceedings of the 18th USENIX Security Symposium},
  year =         2009,
  address =      {Montreal, Canada},
  month =        aug,
}

Code

We have two implementations of this codebase. Our reference implementation is the Java implementation. Our git repository is available locally at http://tamperevident.cs.rice.edu/git/fastsig.git or on github at https://github.com/scrosby/fastsig in package edu.rice.historytree . This is our second generation implementation and is simpler, more clear and is more careful with building pruned tree and is slightly more efficient. This implementation supports Merkle aggregation.

We also have our original Python, and SWIG'ed 'C++ implementation as used in our Usenix Security paper. This code is merged in with the PAD code described below and is available at http://tamperevident.cs.rice.edu/git/python_cpp_misc.git. This implementation also supports Merkle aggregation. Persistent authenticated dictionaries