Harvard Computer Science Technical Reports for 2006

TR-01-06
Amal Ahmed. 2006. ``Step-Indexed Syntactic Logical Relations for Recursive and Quantified Types.''

TR-02-06 [tr-02-06.ps.gz (2395 K), tr-02-06.pdf (969 K)]
Benjamin C. Lee. 2006. ``An Architectural Assessment of SPEC CPU Benchmark Relevance.''
SPEC compute-intensive benchmarks are often used to evaluate processors in high-performance systems. However, such evaluations are valid only if these benchmarks are representative of more comprehensive real workloads. I present a comparative architectural analysis of SPEC CPU benchmarks and the more realistic SPEC Java Server benchmark. This analysis indicates that the integer subset of the CPU benchmarks is broadly representative of the branching and cache characteristics of server and business workloads.

I also leverage the collected data to perform a statistical regression analysis, estimating an application's performance as a linear function of its observed architectural characteristics. A linear model derived from simulated SPEC CPU data appears to fit the observed data quite well, with 94 percent of performance variance explained by four predictors that incorporate cache and queue usage.

The linear model predicts server benchmark performance to within 2 percent of the observed value. Despite the regression model's close fit to the observed data, its predictive ability is constrained by the trace collection facility. It is infeasible to use the performance of one trace suite to predict the performance of another trace suite if the underlying trace collection processes differ across suites.
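
To make the flavor of such a model concrete, here is a minimal Python sketch of fitting a linear performance model by ordinary least squares. The predictors and data are synthetic stand-ins, not the report's actual metrics or coefficients.

    # Hedged sketch: fit performance ~ b0 + b1*x1 + ... + b4*x4 by least
    # squares and report variance explained (R^2). The four predictors are
    # hypothetical stand-ins for cache and queue usage characteristics.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 40                                    # number of benchmark traces
    X = rng.uniform(0, 1, size=(n, 4))        # observed characteristics
    perf = X @ np.array([2.0, 5.0, 1.5, 3.0]) + 0.7 + rng.normal(0, 0.1, n)

    A = np.column_stack([np.ones(n), X])      # add the intercept column
    beta, *_ = np.linalg.lstsq(A, perf, rcond=None)

    pred = A @ beta
    r2 = 1 - np.sum((perf - pred) ** 2) / np.sum((perf - perf.mean()) ** 2)
    print("R^2 =", r2)                        # fraction of variance explained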

TR-03-06 [tr-03-06.ps.gz (362 K), tr-03-06.pdf (256 K)]
Kiran-Kumar Muniswamy-Reddy. 2006. ``Deciding How to Store Provenance.''
Provenance of a file is metadata pertaining to the history of the file. Provenance, unlike normal metadata stored in file systems, is retrieved primarily by running queries. This implies that provenance has to be indexed and should have a query interface. We believe that databases are the most appropriate place to store provenance as they provide both indexing and query capabilities.

The goal of this paper is to explore the most appropriate schema and database technology for storing provenance. In the paper we discuss the possible schemas for storing provenance and the tradeoffs of choosing each. We then characterize the behavior of some popular database architectures under provenance recording/querying workloads. The database architectures that we considered are: RDBMS, schemaless embedded databases (Berkeley DB), XML, and LDAP. Finally, we present preliminary performance results for each database architecture on provenance recording and some common provenance queries. Our results indicate that schemaless embedded databases have the best performance under most provenance workloads. The results also indicate that RDBMS has the best space utilization under most provenance workloads.
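
As a rough illustration of two of the layouts under comparison, the following Python sketch stores one provenance record both relationally (SQLite standing in for an RDBMS) and in a schemaless key-value map (a dict standing in for Berkeley DB). The record's fields are hypothetical, not the paper's schema.

    import sqlite3

    record = {"file": "/bin/ls.out", "attr": "input", "value": "/bin/ls"}

    # Relational layout: one row per (file, attribute, value) triple.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE provenance (file TEXT, attr TEXT, value TEXT)")
    db.execute("INSERT INTO provenance VALUES (?, ?, ?)",
               (record["file"], record["attr"], record["value"]))
    rows = db.execute("SELECT value FROM provenance WHERE file=? AND attr=?",
                      (record["file"], "input")).fetchall()

    # Schemaless key-value layout: the key encodes file and attribute, and
    # the value is an uninterpreted list, so no schema migration is needed
    # when new attribute types appear.
    kv = {}
    kv.setdefault((record["file"], record["attr"]), []).append(record["value"])

    print(rows, kv[(record["file"], "input")])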

TR-04-06 [tr-04-06.ps.gz (27742 K), tr-04-06.pdf (1570 K)]
Uri Braun and Avi Shinnar. 2006. ``A Security Model for Provenance.''
Most security models are designed to protect data. Some also deal with traditional metadata. Provenance metadata introduces additional complexity, as do the delicate interactions between provenance metadata and the data it describes.

We designed a security model for provenance metadata. Our requirements were derived from potential users. The security model consists of two non-interfering models. One protects the structure or workflow, namely which ancestors and descendants are accessible to which users. The other specifies which node attributes are accessible to which users. Our evaluation suggests that our security model meets the users' requirements.

TR-05-06 [tr-05-06.ps.gz (329 K), tr-05-06.pdf (654 K)]
Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Margo I. Seltzer. 2006. ``Provenance-Aware Storage Systems.''
A Provenance-Aware Storage System (PASS) is a storage system that automatically collects and maintains provenance or lineage, the complete history or ancestry of an item. We discuss the advantages of treating provenance as meta-data collected and maintained by the storage system, rather than as manual annotations stored in a separately administered database. We present a PASS implementation, discussing the challenges and performance cost, and the new functionality it enables. We show that with reasonable overhead, we can provide useful functionality not available in today's file systems or provenance management systems.

TR-06-06 [tr-06-06.ps.gz (739 K), tr-06-06.pdf (499 K)]
Geoffrey Goodell, Mema Roussopoulos, and Scott Bradner. 2006. ``A Directory Service for Perspective Access Networks.''
Network fragmentation occurs when the accessibility of a network-based resource to an observer is a function of how the observer is connected to the network. In the context of the Internet, network fragmentation is well-known and occurs in many situations, including an increasing preponderance of network address translation, firewalls, and virtual private networks. Recently, however, new threats to Internet consistency have received media attention. Alternative namespaces have emerged as the result of formal objections to the process by which Internet names and addresses are provisioned. In addition, various governments and service providers around the world have deployed network technology that (accidentally or intentionally) restricts access to certain Internet content. Combined with the aforementioned sources of fragmentation, these new concerns provide ample motivation for a network that allows users to specify not only the network location of the Internet resources they want to view but also the perspectives from which they want to view them. Our vision of a Perspective Access Network is a peer-to-peer overlay network that incorporates routing and directory services that allow non-hierarchical organization. In this paper, we present the design, implementation, and evaluation of a directory service for such networks. We demonstrate its feasibility and efficacy with measurements from a test deployment on PlanetLab.

TR-07-06 [tr-07-06.ps.gz (219 K), tr-07-06.pdf (104 K)]
Rani Nelken and Stuart M. Shieber. 2006. ``Computing The Kullback-Leibler Divergence Between Probabilistic Automata Using Rational Kernels.''
Kullback-Leibler divergence is a natural distance measure between two probabilistic finite-state automata. Computing this distance is difficult, since it requires a summation over a countably infinite number of strings. Nederhof and Satta (2004) recently provided a solution in the course of solving the more general problem of finding the cross-entropy between a probabilistic context-free grammar and an unambiguous probabilistic automaton. We propose a novel solution for two unambiguous probabilistic automata, by showing that Kullback-Leibler divergence can be defined as a rational kernel (Cortes et al., 2004) over the expectation semiring (Eisner 2002). Using this definition, the computation is performed using the general algorithm for rational kernels, yielding an elegant and efficient solution.
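
For intuition about the quantity itself (this brute-force enumeration is emphatically not the paper's rational-kernel algorithm, which computes the sum exactly), here is a truncated Python computation of KL divergence between two toy single-state probabilistic automata over the alphabet {a, b}:

    from itertools import product
    from math import log

    def string_prob(s, p_a, p_b, p_stop):
        """Probability the single-state automaton emits s and then stops."""
        prob = p_stop
        for c in s:
            prob *= p_a if c == "a" else p_b
        return prob

    A = (0.5, 0.2, 0.3)        # (P(a), P(b), P(stop)) for automaton A
    B = (0.3, 0.4, 0.3)        # the same probabilities for automaton B

    # KL(A || B) = sum over all strings x of A(x) * log(A(x) / B(x));
    # the summation over countably many strings is truncated at length 11.
    kl = sum(string_prob(s, *A) * log(string_prob(s, *A) / string_prob(s, *B))
             for n in range(12) for s in product("ab", repeat=n))
    print("KL(A || B) ~", kl)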

TR-08-06 [tr-08-06.ps.gz (7076 K), tr-08-06.pdf (4296 K)]
Benjamin C. Lee and David M. Brooks. 2006. ``Regression Modeling Strategies for Microarchitectural Performance and Power Prediction.''
We propose regression modeling as an effective approach for accurately predicting performance and power for various applications executing on any microprocessor configuration in a large microarchitectural design space. This report addresses fundamental challenges in microarchitectural simulation costs via statistical modeling.

Specifically, we derive and validate regression models for performance and power. Such models enable computationally efficient statistical inference, requiring the simulation of only 1 in 5 million points of a joint microarchitecture-application design space while achieving error rates as low as 4.1 percent for performance and 4.3 percent for power. Although both models achieve similar accuracy, the sources of accuracy are strikingly different. We present optimizations for a baseline regression model to obtain (1) per-benchmark, application-specific models designed to maximize accuracy in performance prediction and (2) regional power models leveraging only the most relevant samples from the microarchitectural design space to maximize accuracy in power prediction. Assessing model sensitivity to sample and region sizes, we find that 4,000 samples from a design space of approximately 22 billion points are sufficient for both application-specific and regional modeling and prediction. Collectively, our results suggest significant potential in accurate and efficient statistical inference for microarchitectural design space exploration via regression models.
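
A minimal sketch of the regional idea, under invented data: rather than one global model, fit a regression only on the sampled configurations nearest the query point. The six-dimensional features and responses below are synthetic; the paper's design space and predictors are far richer.

    import numpy as np

    rng = np.random.default_rng(1)
    samples = rng.uniform(0, 1, size=(4000, 6))    # sampled configurations
    power = samples @ rng.uniform(1, 3, 6) + samples[:, 0] ** 2  # synthetic

    def regional_predict(query, k=200):
        """Fit least squares on the k samples closest to the query."""
        dist = np.linalg.norm(samples - query, axis=1)
        idx = np.argsort(dist)[:k]                 # the "region"
        A = np.column_stack([np.ones(k), samples[idx]])
        beta, *_ = np.linalg.lstsq(A, power[idx], rcond=None)
        return np.concatenate(([1.0], query)) @ beta

    print("regional prediction:", regional_predict(rng.uniform(0, 1, 6)))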

TR-09-06
Rachel Greenstadt and Michael D. Smith. 2006. ``Collaborative Scheduling: Threats and Promises.''

TR-10-06 [tr-10-06.ps.gz (834 K), tr-10-06.pdf (452 K)]
Aleksandar Nanevski, Greg Morrisett, and Lars Birkedal. 2006. ``Polymorphism and Separation in Hoare Type Theory.''
In previous work we proposed Dependent Hoare Type Theory (HTT) as a framework for developing and reasoning about higher-order functional programs with effects of state, aliasing, and nontermination. The main feature of HTT is the type of Hoare triples {P}x:A{Q}, specifying computations that return a result of type A, with precondition P and postcondition Q.

Here we extend HTT with predicative type polymorphism. Type quantification is possible in both types and assertions, and we can also quantify over Hoare triples. We show that as a consequence it becomes possible to reason about disjointness of heaps in the assertion logic of HTT. We use this expressiveness to interpret the Hoare triples in the ``small footprint'' manner advocated by Separation Logic, whereby a precondition tightly describes the heap fragment required by the computation. We support stateful commands of allocation, lookup, strong update, deallocation, and pointer arithmetic.
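
As a notational illustration of the small-footprint reading (our example, not one taken from the paper), a triple for updating a pointer y with a value v need only mention the one cell it touches:

    {y |-> -}  x : unit  {y |-> v}

The precondition asserts ownership of just the cell y, with arbitrary contents; any heap fragment disjoint from y is guaranteed to be preserved across the computation.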

TR-11-06 [tr-11-06.ps.gz (3762 K), tr-11-06.pdf (1230 K)]
Hamilton Y. Chong and Steven J. Gortler. 2006. ``Scene Optimized Shadow Mapping.''
Although shadow mapping has found widespread adoption among those who seek realistic and believable lighting, the algorithm possesses an aliasing issue that is difficult to control. Recent insights have spurred a flurry of activity and bred a number of novel heuristic approaches to such control. Unfortunately, many of the proffered heuristics, while improving shadow quality for certain scenes, offer little guarantee outside these specific domains. In contrast, we present a metric formulation of shadow quality that consequently permits us to solve for optimal parameters (under a chosen metric) for mitigating the aliasing issue.
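
For readers unfamiliar with the setting, the following Python sketch shows the basic two-pass shadow-map test whose finite texel resolution gives rise to the aliasing in question; it is background only, not the paper's optimized parameterization, and all values are invented.

    import numpy as np

    shadow_map = np.full((64, 64), np.inf)    # depth buffer from the light

    def render_depth(u, v, depth):
        """Pass 1: keep the nearest depth per shadow-map texel."""
        shadow_map[v, u] = min(shadow_map[v, u], depth)

    def in_shadow(u, v, depth, bias=1e-3):
        """Pass 2: a fragment is shadowed if it lies farther from the
        light than the stored occluder depth at its projected texel."""
        return depth > shadow_map[v, u] + bias

    render_depth(10, 10, 5.0)                 # occluder seen from the light
    print(in_shadow(10, 10, 7.0))             # behind the occluder -> True
    print(in_shadow(10, 10, 4.0))             # in front of it -> False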

TR-12-06 [tr-12-06.ps.gz (9073 K), tr-12-06.pdf (2563 K)]
Geoffrey Lewis Goodell. 2006. ``Perspective Access Networks.''
Perspective Access Networks provide an infrastructure with which users can specify the location from which they wish to view the Internet. The ability to specify location has become necessary as the Internet has become increasingly inconsistent. An increasing preponderance of middleboxes, location-dependent services, and large-scale content filtering has contributed to this situation.

Our work offers the following contributions. First, we propose an infrastructure that routes traffic to a location from which a given resource can be viewed, taking instructions from user-specified attributes describing the desired location. Second, we analyze the tradeoff between the expressivity of user requests and the finite resources available within the network for propagating metadata about available perspectives. Third, we stipulate a set of real scenarios that fall within the limits of what can reasonably be handled by a system appropriately tuned to manage the tradeoff, and we argue that the specific algorithm we propose can handle the scenarios effectively.

TR-13-06
Geoffrey Mainland, Matt Welsh, and Greg Morrisett. 2006. ``Flask: A Language for Data-driven Sensor Network Programs.''

TR-14-06 [tr-14-06.ps.gz (1892 K), tr-14-06.pdf (392 K)]
Aleksandar Nanevski, Amal Ahmed, Greg Morrisett, and Lars Birkedal. 2006. ``Abstract Predicates and Mutable ADTs in Hoare Type Theory.''
Hoare Type Theory (HTT) combines a dependently typed, higher-order language with monadically-encapsulated, stateful computations. The type system incorporates pre- and post-conditions, in a fashion similar to Hoare and Separation Logic, so that programmers can modularly specify the requirements and effects of computations within types.

This paper extends HTT with quantification over abstract predicates (i.e., higher-order logic), thus embedding into HTT the Extended Calculus of Constructions.

When combined with the Hoare-like specifications, abstract predicates provide a powerful way to define and encapsulate the invariants of private state; that is, state which may be shared by several functions, but is not accessible to their clients. We demonstrate this power by sketching a number of abstract data types and functions that demand ownership of mutable memory, including an idealized custom memory manager.
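
A loose analogy in plain Python (not HTT, which enforces this statically in the type system): an ADT whose invariant over private state is hidden from clients, in the spirit of the idealized memory manager.

    class IdealAllocator:
        """Invariant over private state: _free and _used partition the
        cells; clients may call alloc/free but cannot touch either set."""
        def __init__(self, size):
            self._free = list(range(size))
            self._used = set()

        def alloc(self):
            cell = self._free.pop()
            self._used.add(cell)
            return cell

        def free(self, cell):
            assert cell in self._used, "only allocated cells may be freed"
            self._used.remove(cell)
            self._free.append(cell)

    a = IdealAllocator(8)
    p = a.alloc()
    a.free(p)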

TR-15-06 [tr-15-06.ps.gz (594 K), tr-15-06.pdf (422 K)]
David Sarne and Barbara J. Grosz. 2006. ``Sharing Experiences to Learn User Characteristics in Dynamic Environments with Sparse Data.''
This paper investigates the problem of estimating the value of probabilistic parameters needed for decision making in environments in which an agent, operating within a multi-agent system, has no a-priori information about the structure of the distribution of parameter values. The agent must be able to produce estimations even when it may have made only a small number of direct observations, and thus it must be able to operate with sparse data. The paper describes a mechanism that enables the agent to significantly improve its estimation with only a small number of observations by augmenting its direct observations with those obtained by other agents with which it is coordinating. To avoid undesirable bias in relatively heterogeneous environments while effectively using relevant data to improve its estimations, the mechanism weights the contributions of other agents' observations based on a real-time estimate of the similarity between each of these agents and itself. The ``coordination autonomy'' (CA) module of a coordination-manager system provided an empirical setting for evaluation. Simulation-based evaluations demonstrate that the proposed mechanism outperforms estimations based exclusively on an agent's own observations as well as estimations based on an unweighted aggregate of all other agents' observations.
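
A minimal Python sketch of the weighting idea, with invented similarities and observations (the paper estimates similarity online rather than assuming it):

    def weighted_estimate(own_obs, peers):
        """Blend own observations (weight 1) with peers' observations,
        each peer weighted by its estimated similarity in [0, 1]."""
        total = sum(own_obs)
        weight = float(len(own_obs))
        for similarity, obs in peers:
            total += similarity * sum(obs)
            weight += similarity * len(obs)
        return total / weight

    own = [4.1, 3.9]                      # sparse direct observations
    peers = [(0.9, [4.0, 4.2, 3.8]),      # similar agent, high weight
             (0.2, [9.0, 8.5])]           # dissimilar agent, downweighted
    print(weighted_estimate(own, peers))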

TR-16-06 [tr-16-06.ps.gz (534 K), tr-16-06.pdf (347 K)]
David Sarne and Barbara J. Grosz. 2006. ``Estimating Information Value in Collaborative Multi-Agent Planning Systems.''
This paper addresses the problem of identifying the value of information held by a teammate on a distributed, multi-agent team. It focuses on a distributed scheduling task in which computer agents support people who are carrying out complex tasks in a dynamic environment. The paper presents a decision-theoretic algorithm for determining the value of information that is potentially relevant to scheduling revisions, but is directly available only to the person and not the computer agent. The design of a ``coordination autonomy'' (CA) module within a coordination-manager system provided the empirical setting for this work. By design, the CA module depends on an external scheduler module to determine the specific effect of additional information on overall system performance. The paper describes two methods for reducing the number of queries the CA issues to the scheduler, enabling it to satisfy the computational resource constraints placed on it. Experimental results indicate that the algorithm improves system performance and establish the exceptional efficiency, measured in terms of the number of queries required for estimating the value of information, that can be achieved by the query-reducing methods.
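
A toy decision-theoretic computation of information value, with invented payoffs: the value of observing a variable is the expected utility of acting on each revealed outcome minus the utility of the best single action taken blind.

    p = {"on_time": 0.7, "late": 0.3}             # P(state), hypothetical
    utility = {"keep_schedule":   {"on_time": 10, "late": -5},
               "revise_schedule": {"on_time": 4,  "late": 6}}

    def expected(action):
        return sum(p[s] * utility[action][s] for s in p)

    eu_without = max(expected(a) for a in utility)          # act blind
    eu_with = sum(p[s] * max(utility[a][s] for a in utility) for s in p)
    print("value of information =", eu_with - eu_without)   # 3.3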

TR-17-06 [tr-17-06.ps.gz (442 K), tr-17-06.pdf (174 K)]
Alexandra Fedorova, Margo Seltzer, and Michael D. Smith. 2006. ``Cache-Fair Thread Scheduling for Multicore Processors.''
We present a new operating system scheduling algorithm for multicore processors. Our algorithm reduces the effects of unequal CPU cache sharing that occur on these processors and cause unfair CPU sharing, priority inversion, and inadequate CPU accounting. We describe the implementation of our algorithm in the Solaris operating system and demonstrate that it produces fairer schedules, enabling better priority enforcement and improved performance stability for applications. With conventional scheduling algorithms, application performance on multicore processors varies by up to 36% depending on the runtime characteristics of concurrent processes. We reduce this variability by up to a factor of seven.
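
An illustrative computation in the spirit of cache-fair scheduling (hypothetical numbers, not the Solaris implementation): a thread whose co-runners inflate its cycles-per-instruction receives a correspondingly longer timeslice, so its progress matches what fair cache sharing would have allowed.

    def adjusted_timeslice(base_slice_ms, fair_cpi, measured_cpi):
        """Scale the timeslice by the slowdown relative to fair-cache CPI."""
        return base_slice_ms * (measured_cpi / fair_cpi)

    # A thread measured at 1.8 CPI instead of its fair-share 1.5 CPI
    # gets roughly 12 ms instead of 10 ms to compensate:
    print(adjusted_timeslice(10.0, fair_cpi=1.5, measured_cpi=1.8))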

TR-18-06 [tr-18-06.ps.gz (676 K), tr-18-06.pdf (347 K)]
Julius Degesys, Ian Rose, Ankit Patel, and Radhika Nagpal. 2006. ``DESYNC: Self-Organizing Desynchronization and TDMA on Wireless Sensor Networks.''
Desynchronization is a novel primitive for sensor networks: nodes perfectly interleave periodic events to form a round-robin schedule. This primitive can be used to evenly distribute sampling burden in a group of nodes, schedule sleep cycles, or organize a collision-free TDMA schedule for transmitting wireless messages. Here we present DESYNC, a biologically-inspired, self-maintaining algorithm for desynchronization in a single-hop network. We present (1) theoretical results proving convergence and bounding convergence rates, (2) experimental results on TinyOS-based Telos sensor motes, and (3) a DESYNC-based TDMA protocol. DESYNC-TDMA addresses two weaknesses of traditional TDMA: it does not require a global clock, and it automatically adjusts to the number of participating nodes, so that bandwidth is always fully utilized. Experimental results show a reduction in message loss under high contention from approximately 58% to less than 1%, as well as a 25% increase in throughput over the default Telos MAC protocol.
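
The heart of DESYNC is a midpoint rule: a node nudges its phase toward the midpoint of its two phase-neighbors. The round-based Python simulation below (a simplification of the paper's event-driven algorithm; alpha is the jump-size parameter) shows bunched-up phases spreading out to an even round-robin spacing.

    def desync_round(phases, alpha=0.5):
        """One synchronous round of the midpoint rule on the circle [0,1)."""
        order = sorted(range(len(phases)), key=lambda i: phases[i])
        n = len(phases)
        new = list(phases)
        for k, i in enumerate(order):
            prev_p = phases[order[k - 1]] - (1.0 if k == 0 else 0.0)
            next_p = phases[order[(k + 1) % n]] + (1.0 if k == n - 1 else 0.0)
            mid = (prev_p + next_p) / 2.0          # midpoint of neighbors
            new[i] = (phases[i] + alpha * (mid - phases[i])) % 1.0
        return new

    phases = [0.01, 0.02, 0.05, 0.90]              # four bunched-up nodes
    for _ in range(60):
        phases = desync_round(phases)
    print(sorted(phases))                          # gaps approach 0.25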

TR-19-06
Jonathan Ledlie, Peter Pietzuch, Michael Mitzenmacher, and Margo Seltzer. 2006. ``Wired Geometric Routing.''

TR-20-06
Jonathan Ledlie, Paul Gardner, and Margo Seltzer. 2006. ``Network Coordinates in the Wild.''

TR-21-06 [tr-21-06.ps.gz (496 K), tr-21-06.pdf (164 K)]
Zhenming Liu and Michael Mitzenmacher. 2006. ``Codes for Deletion and Insertion Channels with Segmented Errors.''
We consider deletion channels and insertion channels under an additional segmentation assumption: the input consists of disjoint segments of b consecutive bits, with at most one error per segment. Under this assumption, we demonstrate simple and computationally efficient deterministic encoding and decoding schemes that achieve a high provable rate even under worst-case errors. We also consider more complex schemes that experimentally achieve higher rates under random error.
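
A short Python sketch of the channel model itself (the segmentation assumption, not the paper's codes): the input is processed in disjoint b-bit segments, and each segment independently suffers at most one deletion.

    import random

    def segmented_deletion_channel(bits, b, p_del=0.5, seed=0):
        """Delete at most one bit from each disjoint b-bit segment."""
        rng = random.Random(seed)
        out = []
        for i in range(0, len(bits), b):
            segment = list(bits[i:i + b])
            if segment and rng.random() < p_del:
                del segment[rng.randrange(len(segment))]
            out.extend(segment)
        return out

    x = [1, 0, 1, 1, 0, 0, 1, 0]
    print(segmented_deletion_channel(x, b=4))   # each half loses <= 1 bit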

TR-22-06 [tr-22-06.ps.gz (567 K), tr-22-06.pdf (227 K)]
Michael O. Rabin and Christopher Thorpe. 2006. ``Time-Lapse Cryptography.''
The notion of ``sending a secret message to the future'' has been around for over a decade. Despite this, no solution to this problem is in common use, nor has any attained widespread acceptance as a fundamental cryptographic primitive. We name, construct, and specify an implementation for this new cryptographic primitive, ``Time-Lapse Cryptography'', with which a sender can encrypt a message so that it is guaranteed to be revealed at an exact moment in the future, even if this revelation turns out to be undesirable to the sender. Our solution combines new ideas with Pedersen distributed key generation, Feldman verifiable threshold secret sharing, and ElGamal encryption, all of which rest upon the single, broadly accepted Decisional Diffie-Hellman assumption. We develop a Time-Lapse Cryptography Service (``the Service'') based on a network of parties who jointly perform the service. The protocol is practical and secure: at a given time the Service publishes a public key so that anyone can use it, even anonymously. Senders encrypt their messages with this public key, whose private key is not known to anyone -- not even a trusted third party -- until a predefined and specific future time, at which point the private key is constructed and published. At or after that time, anyone can decrypt the ciphertext using this private key. The Service is envisioned as a public utility publishing a continuous stream of encryption keys and subsequent corresponding time-lapse decryption keys. We complement our theoretical foundation with descriptions of specific attacks and defenses, and describe important applications of our service in sealed-bid auctions, insider stock sales, clinical trials, and electronic voting.
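
To make the key lifecycle concrete, here is a toy, dealer-based Python sketch: an ElGamal public key is published while its private key exists only as Shamir secret shares, and at reveal time the shares are combined so anyone can decrypt. The real protocol uses Pedersen distributed key generation precisely so that no dealer ever knows the key; the 23-element group below is for illustration only.

    import random

    p, q, g = 23, 11, 4        # toy group: g generates an order-q subgroup

    def shamir_split(secret, t, n, seed=42):
        """Dealer-based Shamir sharing over GF(q); threshold t of n."""
        rng = random.Random(seed)
        coeffs = [secret] + [rng.randrange(q) for _ in range(t - 1)]
        return [(i, sum(c * pow(i, e, q) for e, c in enumerate(coeffs)) % q)
                for i in range(1, n + 1)]

    def shamir_recover(shares):
        """Lagrange interpolation at 0 to reconstruct the secret."""
        x = 0
        for i, y in shares:
            lam = 1
            for j, _ in shares:
                if j != i:
                    lam = lam * j * pow(j - i, -1, q) % q
            x = (x + y * lam) % q
        return x

    sk = 7                                    # the time-locked private key
    pk = pow(g, sk, p)                        # published immediately
    shares = shamir_split(sk, t=2, n=3)       # held by the Service parties

    # Before reveal time, anyone encrypts to pk (ElGamal):
    k, m = 5, 9
    c1, c2 = pow(g, k, p), m * pow(pk, k, p) % p

    # At reveal time, any t shares reconstruct sk; then anyone decrypts:
    sk_revealed = shamir_recover(shares[:2])
    print(c2 * pow(pow(c1, sk_revealed, p), -1, p) % p)   # prints 9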