Harvard Computer Science Technical Reports for 2008

TR-01-08 [tr-01-08.pdf (162 K)]
Rebecca Nesson, Floris Roelofsen, and Barbara J. Grosz. 2008. ``Rational Coordinated Anaphora Theory.''
Rational Coordinated Anaphora theory is a novel explanatory theory that predicts how speakers generate anaphoric referring expressions in discourse, how hearers interpret them, and how all conversational participants coordinate their strategies to promote clear communication while minimizing effort. Its main premise is that conversational participants are, and expect each other to be, rational. This paper presents Rational Coordinated Anaphora theory in detail and then contrasts it with Centering theory, demonstrating the reason for the robustness of Centering theory's main premise, Rule 1, and explaining some of Centering theory's previously puzzling limitations.
TR-02-08
Kevin Dale, Hanspeter Pfister, and Avrom Pfeffer. 2008. ``.''

TR-03-08 [tr-03-08.pdf (830 K)]
Kristen Lovin, Benjamin Lee, Xiaoyao Liang, David Brooks, and Gu-Yeon Wei. 2008. ``Empirical Performance Models for 3T1D Memories.''
Process variation poses a significant threat to the performance and reliability of the 6T SRAM cell. In response, research has turned to new memory cell designs, such as the 3T1D DRAM cell, as potential replacements. If designers are to consider this new design seriously, performance models are needed to better understand the behavior of the cell. We propose a decoupled approach for collecting Monte Carlo HSPICE data, reducing simulation times by simulating memory array components separately based on their contribution to the worst-case read critical path. We use this Monte Carlo data to train regression models, which accurately predict retention and access times of a 3T1D memory array with a median error of 7.39%.
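As a rough sketch of the regression step only, under assumed inputs (the variation features, coefficients, and data below are hypothetical stand-ins for the HSPICE Monte Carlo outputs, not the paper's actual model):

    import numpy as np

    # Hypothetical Monte Carlo samples: each row is one simulated cell instance.
    # Columns stand in for variation parameters (e.g. threshold-voltage and
    # channel-length deviations) on the worst-case read critical path;
    # y stands in for the simulated access time in picoseconds.
    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 1.0, size=(5000, 4))
    y = 120.0 + X @ np.array([8.0, 5.0, 3.0, 1.5]) + rng.normal(0.0, 2.0, 5000)

    # Ordinary least-squares fit: access_time ~ w0 + w . x
    A = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)

    # Median relative error of the fitted model on the training samples
    pred = A @ w
    print(f"median error: {np.median(np.abs(pred - y) / y) * 100:.2f}%")
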
TR-04-08 [tr-04-08.pdf (246 K)]
Kiran-Kumar Muniswamy-Reddy, Joseph Barillari, Uri Braun, David A. Holland, Diana Maclean, Margo Seltzer, and Stephen D. Holland. 2008. ``Layering in Provenance-Aware Storage Systems.''
Digital provenance describes the ancestry or history of a digital document. Provenance provides answers to questions such as: "How does the ancestry of these objects differ?" "Are there source code files tainted by proprietary software?" "How was this object created?"

Prior systems for collecting and maintaining provenance operate within a single layer of abstraction: at the system-call boundary, in a workflow specification language, or at the level of a domain-specific application. The provenance collected at each of these layers of abstraction is different, and all of it is important at one time or another.

All of these solutions fundamentally fail to account for the different layers of abstraction at which users need to reason about their data and processes. None of these systems support queries across different layers of abstraction to answer a question such as "The calculated values in my spreadsheet have changed. Is this due to a change in the spreadsheet, a difference in the spreadsheet application, the libraries being used, or the operating system being used?"

We present an architecture for provenance collection that facilitates the integration of provenance across multiple layers of abstraction and across network boundaries. We show how the need to support provenance collection at multiple layers drives the architecture. We present provenance-aware use cases from the field of thermography and quantify system overheads, showing that we can provide new functionality with acceptable overhead.
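As a loose illustration of what a cross-layer ancestry query involves (the record structure and names here are hypothetical, not the system's actual interface): each provenance record carries the abstraction layer that produced it, and ancestry traversal is free to cross layer boundaries.

    from dataclasses import dataclass, field

    @dataclass
    class ProvRecord:
        name: str
        layer: str                      # e.g. "application", "workflow", "syscall"
        ancestors: list = field(default_factory=list)

    def ancestry(rec, seen=None):
        """Walk all ancestors of a record, regardless of which layer produced them."""
        seen = seen or set()
        for parent in rec.ancestors:
            if parent.name not in seen:
                seen.add(parent.name)
                yield parent
                yield from ancestry(parent, seen)

    # Toy example: a spreadsheet value depends on the application document,
    # which in turn depends on objects recorded at the system-call layer.
    libc  = ProvRecord("libc.so.6", "syscall")
    app   = ProvRecord("spreadsheet-app", "syscall", [libc])
    sheet = ProvRecord("budget.xls", "application", [app])
    cell  = ProvRecord("cell B2", "application", [sheet])

    for a in ancestry(cell):
        print(a.layer, a.name)
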

TR-05-08 [tr-05-08.pdf (911 K)]
Rebecca Nesson, Giorgio Satta, and Stuart Shieber. 2008. ``Complexity, Parsing, and Factorization of Tree-Local Multi-Component Tree-Adjoining Grammar.''
Tree-Local Multi-Component Tree-Adjoining Grammar (TL-MCTAG) is an appealing formalism for natural language representation because it arguably allows the encapsulation of the appropriate domain of locality within its elementary structures (Kallmeyer and Romero 2007). As a result, it has frequently been put to use in a growing body of research into incorporating semantics into the Tree-Adjoining Grammar (TAG) framework (Kallmeyer and Joshi 2003; Han 2006; Nesson and Shieber 2006, 2007). Although TL-MCTAG was first introduced by Weir (1988) and shown at that time to be equivalent in expressivity to TAG, the complexity of TL-MCTAG is still not well understood. Perhaps because of its equivalence to TAG, questions of processing efficiency have not been adequately addressed. This paper offers a thorough examination of the problem of TL-MCTAG recognition, showing that even highly restricted forms of TL-MCTAG are NP-hard to recognize. However, in spite of the provable difficulty of the recognition problem, we offer several algorithms that can substantially improve processing efficiency. First, we present a parsing algorithm that improves on the standard parsing method and runs in polynomial time when both the fan-out (the maximum number of trees in a tree set) and the rank (the maximum number of children of a node in a valid derivation tree) of the input grammar are bounded. Second, we offer an optimal, efficient algorithm for factorizing a grammar to produce a strongly equivalent TL-MCTAG grammar with the rank of the grammar minimized.
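For concreteness, the two bounded quantities can be read off a toy encoding like the following (the encoding and names are our own illustration, not the paper's formalism):

    # Hypothetical toy encoding: a TL-MCTAG grammar as a list of tree sets
    # (each tree set is a list of elementary-tree names), and a derivation
    # tree as nested (name, children) pairs.
    grammar = [
        ["alpha_loves"],                 # singleton tree set: fan-out 1
        ["beta_every", "beta_every_np"], # two-component tree set: fan-out 2
    ]
    fan_out = max(len(tree_set) for tree_set in grammar)

    derivation = ("alpha_loves", [("beta_every", []), ("beta_someone", [])])

    def rank(node):
        """Maximum number of children of any node in a derivation tree."""
        _, children = node
        return max([len(children)] + [rank(c) for c in children])

    print("fan-out =", fan_out, " rank =", rank(derivation))
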
TR-06-08 [tr-06-08.pdf (521 K)]
Xiaoyao Liang, Benjamin Lee, Gu-Yeon Wei, and David Brooks. 2008. ``Design and Test Strategies for Microarchitectural Post-Fabrication Tuning.''
Process variations are the major hurdle for continued technology scaling. Both systematic and random variations affect the critical delay of fabricated chips, causing wide frequency and power distributions. Tuning techniques can adapt the microarchitecture to mitigate the impact of variations at post-fabrication testing time. Most existing techniques ignore testing cost or simply assume a naive exhaustive testing scheme, but testing has associated costs, which may be prohibitive for a large space of post-fabrication tuning configurations. This paper proposes a new post-fabrication testing framework that accounts for testing costs. The framework uses on-chip canary circuits to capture systematic variation and statistical analysis to estimate random variation, and applies regression models to predict chip performance and power. Together, these techniques form an integrated framework that identifies the most energy-efficient post-fabrication tuning configuration for each chip. The testing cost of the proposed framework is low, usually converging in fewer than two rounds of tests. At this low cost, the framework achieves 93 to 96 percent of the optimal BIPS^3/W tuning results, even under very large variations. Furthermore, the test framework fits into existing test flows without major changes to testing facilities.
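A rough sketch of the final selection step under assumed inputs (the features, weights, and tuning knobs below are hypothetical placeholders, not the paper's trained models): predict performance and power for each candidate configuration and keep the one with the best BIPS^3/W.

    import numpy as np

    # Hypothetical per-chip measurements: canary-circuit delay capturing systematic
    # variation, plus a statistical estimate of random variation.
    chip_features = np.array([0.92, 0.15])

    # Hypothetical pre-trained linear regression weights mapping
    # (chip features, tuning knobs) -> (BIPS, watts).
    def predict(features, config):
        x = np.concatenate([[1.0], features, config])
        w_bips  = np.array([2.0, 1.5, -0.8, 0.6, 0.3])
        w_watts = np.array([5.0, 2.0,  1.0, 1.8, 0.9])
        return x @ w_bips, x @ w_watts

    # Candidate post-fabrication tuning configurations (e.g. frequency step, body bias).
    configs = [np.array([f, b]) for f in (0.0, 0.5, 1.0) for b in (0.0, 1.0)]

    def bips3_per_watt(cfg):
        bips, watts = predict(chip_features, cfg)
        return bips**3 / watts

    best = max(configs, key=bips3_per_watt)
    print("chosen configuration:", best, "metric:", bips3_per_watt(best))
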
TR-07-08 [tr-07-08.pdf (670 K)]
Yuanchen Zhu. 2008. ``A New Algorithm for View-Dependent Optimization of Terrain with Sub-Linear Time CPU Processing.''
This paper presents new schemes for view-dependent continuous level-of-detail (LOD) rendering of terrain that update output meshes with sub-linear CPU processing.

We use a directed acyclic graph (DAG) abstraction for the longest-edge-bisection-based multiresolution model. The other component of our refinement framework is the saturated, monotonic, perspective-division-based error function. We make the critical observation that, for a vertex, the difference between the reciprocals of this error function at two different viewpoints is bounded by the distance between the two viewpoints times a per-vertex constant. We call this the bounded variation property.
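In symbols (our notation, not the paper's): for a vertex v with error e_v(p) at viewpoint p, the property says there is a per-vertex constant c_v such that

    \left| \frac{1}{e_v(p_1)} - \frac{1}{e_v(p_2)} \right| \le c_v \, \lVert p_1 - p_2 \rVert

for any two viewpoints p_1 and p_2, so the viewpoint motion allowed before a vertex's refinement decision can change is easy to bound in advance.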

Utilizing this property, we introduce the distance deferral table, a circular-array-based structure that schedules deferred processing of DAG vertices according to viewpoint motion. We then use the distance deferral table to optimize the traditional threshold-based refinement algorithm and the dual-queue constrained optimization algorithm to allow sub-linear CPU run time.
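A minimal sketch of how a structure like the distance deferral table might be organized (our own illustration under the stated assumptions, not the paper's implementation): vertices go into a circular array of buckets indexed by how far the viewpoint may move before they must be re-examined, so per-frame work tracks viewpoint motion rather than mesh size.

    from collections import deque

    class DistanceDeferralTable:
        """Circular array of buckets; bucket i holds vertices that need no
        re-evaluation until the viewpoint has moved roughly i * step units."""
        def __init__(self, num_buckets=64, step=1.0):
            self.buckets = [deque() for _ in range(num_buckets)]
            self.step = step
            self.head = 0          # bucket whose deferral distance expires next
            self.accumulated = 0.0 # viewpoint motion since last processing

        def defer(self, vertex, safe_distance):
            """Schedule a vertex to be revisited after the viewpoint moves safe_distance."""
            slots = min(len(self.buckets) - 1, int(safe_distance / self.step))
            index = (self.head + slots) % len(self.buckets)
            self.buckets[index].append(vertex)

        def advance(self, viewpoint_motion):
            """Account for viewpoint motion and yield vertices that are now due."""
            self.accumulated += viewpoint_motion
            while self.accumulated >= self.step:
                self.accumulated -= self.step
                due = self.buckets[self.head]
                while due:
                    yield due.popleft()
                self.head = (self.head + 1) % len(self.buckets)

    # Usage: only vertices whose deferral distance has been exceeded are touched.
    table = DistanceDeferralTable()
    table.defer("vertex A", safe_distance=2.5)
    table.defer("vertex B", safe_distance=0.5)
    for v in table.advance(viewpoint_motion=1.2):
        print("re-evaluate", v)
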