Harvard Computer Science Technical Reports for 2001

TR-01-01 [tr-01-01.ps.gz (83 K)]
Russ Cox and William Josephson. 2001. ``Communication Timestamps for Filesystem Synchronization.''
The problem of detecting various kinds of update conflicts in file system synchronization following a network partition is well-known. All systems of which we are aware use the version vectors of Parker et al. These require O(R*F) storage space for F files shared among R replicas. We propose a number of different methods, the most space-efficient of which uses O(R*F) space in the worst case, but O(R+F) in the expected case.
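
For background, a minimal sketch of the per-file version vectors of Parker et al. that the abstract takes as its baseline; the replica names and update interface are illustrative, not taken from the report. One counter per replica per file is what produces the O(R*F) storage cost.

    # Minimal sketch of per-file version vectors (Parker et al.), the O(R*F)
    # baseline the report improves on.  Replica names and API are illustrative.

    def new_vector(replicas):
        return {r: 0 for r in replicas}          # one counter per replica

    def record_update(vv, replica):
        vv[replica] += 1                         # a local modification bumps our own slot

    def compare(a, b):
        """Return 'equal', 'a_newer', 'b_newer', or 'conflict'."""
        a_ge = all(a.get(r, 0) >= c for r, c in b.items())
        b_ge = all(b.get(r, 0) >= c for r, c in a.items())
        if a_ge and b_ge:
            return "equal"
        if a_ge:
            return "a_newer"      # b's update history is a prefix of a's
        if b_ge:
            return "b_newer"
        return "conflict"         # concurrent updates on both sides of the partition

    # Example: two replicas modify the same file during a partition.
    va, vb = new_vector(["A", "B"]), new_vector(["A", "B"])
    record_update(va, "A")
    record_update(vb, "B")
    assert compare(va, vb) == "conflict"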

To gain experience with the various methods, we implemented a file synchronization tool called Tra. Based on this experience, we discuss the advantages and disadvantages of each particular method.

Tra itself turns out to be useful for a variety of tasks, including home directory maintenance, operating system installation, and managing offline work. We discuss some of these uses.

TR-02-01 [tr-02-01.ps.gz (71 K)]
Wheeler Ruml. 2001. ``Incomplete Tree Search using Adaptive Probing.''
When not enough time is available to fully explore a search tree, different algorithms will visit different leaves. Depth-first search and depth-bounded discrepancy search, for example, make opposite assumptions about the distribution of good leaves. Unfortunately, it is rarely clear a priori which algorithm will be most appropriate for a particular problem. Rather than fixing strong assumptions in advance, we propose an approach in which an algorithm attempts to adjust to the distribution of leaf costs in the tree while exploring it. By sacrificing completeness, such flexible algorithms can exploit information gathered during the search using only weak assumptions. As an example, we show how a simple depth-based additive cost model of the tree can be learned on-line. Empirical analysis using a generic tree search problem shows that adaptive probing is competitive with systematic algorithms on a variety of hard trees and outperforms them when the node-ordering heuristic makes many mistakes. Results on boolean satisfiability and two different representations of number partitioning confirm these observations. Adaptive probing combines the flexibility and robustness of local search with the ability to take advantage of constructive heuristics.
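
A toy rendition of the idea on a fixed-depth binary tree, where leaf cost is modelled as a sum of per-depth child costs learned from successive probes; the selection policy and update rule below are illustrative guesses, not necessarily those used in the paper.

    import random

    # Toy adaptive probing on a binary tree of fixed depth: leaf cost is modelled
    # as a sum of per-depth child costs, learned online from probes.  The
    # exploration policy and update rule here are illustrative only.

    DEPTH, PROBES, RATE = 10, 200, 0.2

    def adaptive_probing(leaf_cost, depth=DEPTH, probes=PROBES):
        # cost[d][c]: estimated additive contribution of taking child c at depth d
        cost = [[0.0, 0.0] for _ in range(depth)]
        best = None
        for _ in range(probes):
            # Probe: follow the child the model prefers, with a little exploration
            # so both estimates at each depth keep being updated.
            path = [min((0, 1), key=lambda c: cost[d][c])
                    if random.random() > 0.1 else random.randrange(2)
                    for d in range(depth)]
            observed = leaf_cost(path)
            best = observed if best is None else min(best, observed)
            # Update: move each chosen action's estimate toward its share of the
            # prediction error (a simple additive-model regression step).
            predicted = sum(cost[d][c] for d, c in enumerate(path))
            for d, c in enumerate(path):
                cost[d][c] += RATE * (observed - predicted) / depth
        return best

    # Example leaf-cost function: taking child 1 (a "discrepancy") hurts more near the root.
    print(adaptive_probing(lambda path: sum(c * (DEPTH - d) for d, c in enumerate(path))))
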
TR-03-01 [tr-03-01.ps.gz (31 K)]
AJ Shankar and Wing Yung. 2001. ``gNarLI: A Practical Approach to Natural Language Interfaces to Databases.''
Most attempted natural language interfaces to databases take too general an approach, and so either require large amounts of setup time or do not provide adequate natural language support for a given domain.

Our approach, gNarLI, is a pragmatic one: it focuses on one small, easily and well-defined domain at a time (say, an Oscars movie database). Domain definitions consist of simple rule-based actions. gNarLI provides a flexible pattern-matching preprocessor, an intuitive join processor, and language-independent pronoun support ("Who won best actor in 1957?" ... "Where was he born?"), in addition to considerable freedom in mapping rules to one or more portions of a SQL statement.
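
A hedged illustration of the rule-based flavour: each rule matches a phrase pattern and contributes a fragment to one portion of a SQL statement, and the fragments are assembled into a query. The rule format, tables, and columns are invented for a toy Oscars domain and are not gNarLI's actual domain-definition syntax.

    import re

    # Toy illustration of mapping rule matches to SQL fragments.  The rule format,
    # table names, and columns are invented; they are not gNarLI's actual syntax.

    RULES = [
        (r"\bwho won\b",    ("select", "person.name")),
        (r"\bbest actor\b", ("where",  "award.category = 'Best Actor'")),
        (r"\bin (\d{4})\b", ("where",  "award.year = {0}")),
    ]

    def to_sql(question):
        parts = {"select": [], "where": []}
        for pattern, (slot, fragment) in RULES:
            m = re.search(pattern, question.lower())
            if m:
                parts[slot].append(fragment.format(*m.groups()))
        return ("SELECT " + ", ".join(parts["select"]) +
                " FROM person JOIN award ON award.person_id = person.id" +
                (" WHERE " + " AND ".join(parts["where"]) if parts["where"] else ""))

    print(to_sql("Who won best actor in 1957?"))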

All aspects of the program are fully customizable on a per-domain basis. Appropriately sized domains can be constructed from scratch in less than 15 hours. The ease and speed of domain construction (requiring no programming skills) make gNarLI adaptable to different areas with little difficulty.

For small domains, gNarLI works surprisingly well. As domain size increases, though, its join processor and rule-based approach become less successful.

TR-04-01 [tr-04-01.ps.gz (137 K)]
Wheeler Ruml. 2001. ``Assigning Features using Additive Clustering.''
If the promise of computational modeling is to be fully realized in higher-level cognitive domains such as language processing, principled methods must be developed to construct the semantic representations that serve as these models' input and/or output. In this paper, we propose the use of an established formalism from mathematical psychology, additive clustering, as a means of automatically assigning discrete features to objects using only pairwise similarity data. Similar approaches have not been widely adopted in the past, as existing methods for the unsupervised learning of such models do not scale well to large problems. We propose a new algorithm for additive clustering, based on heuristic combinatorial optimization. Through extensive empirical tests on both human and synthetic data, we find that the new algorithm is more effective than previous methods and that it also scales well to larger problems. By making additive clustering practical, we take a significant step toward scaling connectionist models beyond hand-coded examples.
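
For reference, the additive clustering model being fit predicts the similarity of objects i and j as the summed weight of the discrete features they share plus an additive constant. The sketch below spells out that model and a toy bit-flip local search in the spirit of heuristic combinatorial optimization; it is not the algorithm proposed in the paper, and real additive clustering typically constrains the weights to be nonnegative.

    import numpy as np

    # Additive clustering model: s_hat[i,j] = sum_k w[k]*f[i,k]*f[j,k] + c, where
    # F is a binary object-by-feature matrix.  The bit-flip local search below is
    # only illustrative of the combinatorial-optimization flavour, not the paper's
    # algorithm, and the unconstrained least-squares weight fit is for brevity.

    def predict(F, w, c):
        return F @ np.diag(w) @ F.T + c                      # predicted similarities

    def fit_weights(F, S):
        # For fixed binary features F, the weights and constant follow from least squares
        # over the upper-triangular (distinct-pair) similarities.
        n, k = F.shape
        iu = np.triu_indices(n, 1)
        X = np.column_stack([(F[:, a:a+1] * F[:, a:a+1].T)[iu] for a in range(k)] +
                            [np.ones(len(iu[0]))])
        coef, *_ = np.linalg.lstsq(X, S[iu], rcond=None)
        return coef[:-1], coef[-1]

    def local_search(S, k, iters=500, seed=0):
        rng = np.random.default_rng(seed)
        F = rng.integers(0, 2, size=(S.shape[0], k))
        best = np.inf
        for _ in range(iters):
            i, a = rng.integers(S.shape[0]), rng.integers(k)
            F[i, a] ^= 1                                     # flip one membership bit
            w, c = fit_weights(F, S)
            err = np.sum((predict(F, w, c) - S)[np.triu_indices(S.shape[0], 1)] ** 2)
            if err <= best:
                best = err                                   # keep an improving (or equal) flip
            else:
                F[i, a] ^= 1                                 # undo a worsening flip
        return F, best
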
TR-05-01 [tr-05-01.ps.gz (109 K)]
Norman Ramsey and Elod Csirmaz. 2001. ``An Algebraic Approach to File Synchronization.''
We present a sound and complete proof system for reasoning about operations on filesystems. The proof system enables us to specify a file-synchronization algorithm that can be combined with several different conflict-resolution policies. By contrast, previous work builds the conflict-resolution policy into the specification, or worse, does not specify the behavior formally. We present several alternatives for conflict resolution, and we address the knotty question of timestamps.
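
A toy in the spirit of the algebraic view, with an invented command encoding that is not the report's proof system: updates from two replicas reconcile cleanly when their command sequences commute, and commands touching related paths are conservatively treated as non-commuting, leaving a conflict for the chosen policy to resolve.

    # Illustrative commutation check over filesystem commands.  The command set
    # and the (deliberately crude) commutation rule are placeholders, not the
    # laws of the report's proof system; this toy ignores the operation kind and
    # looks only at the paths involved.

    def related(p, q):
        """True if one path is a prefix of (or equal to) the other."""
        return p == q[:len(p)] or q == p[:len(q)]

    def commute(cmd1, cmd2):
        (_, path1), (_, path2) = cmd1, cmd2
        return not related(path1, path2)     # commands on unrelated paths always commute

    # During reconciliation, updates from two replicas that do not commute are
    # flagged as a conflict for the conflict-resolution policy to handle.
    a = ("edit",   ("home", "doc.txt"))
    b = ("remove", ("home",))
    print(commute(a, b))                     # False: removing /home affects /home/doc.txt
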
TR-06-01 [tr-06-01.ps.gz (64 K)]
Eleni Drinea, Mihaela Enachescu, and Michael Mitzenmacher. 2001. ``Variations on Random Graph Models for the Web.''
In this paper, we introduce variations on random graph models for Web-like graphs. We provide new ways of interpreting previous models, introduce non-linear models that extend previous work, and suggest models based on feedback between users and search engines.
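
For orientation, a baseline linear preferential-attachment generator of the kind such Web-graph models build on; the non-linear and search-engine-feedback variants introduced in the paper are not reproduced here.

    import random

    # Baseline Web-graph generator in the preferential-attachment family: with
    # probability p a new page links to an existing page chosen uniformly,
    # otherwise to a page chosen in proportion to its current in-degree.  The
    # paper's non-linear and feedback variants are not reproduced here.

    def generate(n, p=0.2, seed=0):
        random.seed(seed)
        targets = [0]        # multiset of past link endpoints; multiplicity tracks in-degree
        edges = []
        for new_page in range(1, n):
            if random.random() < p:
                dest = random.randrange(new_page)     # uniform choice among earlier pages
            else:
                dest = random.choice(targets)         # proportional to in-degree
            edges.append((new_page, dest))
            targets.append(dest)
        return edges

    edges = generate(10000)
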
TR-07-01 [tr-07-01.ps.gz (148 K)]
Michael Mitzenmacher. 2001. ``Dynamic Models for File Sizes and Double Pareto Distributions.''
In this paper, we introduce and analyze a new generative user model to explain the behavior of file size distributions. Our Recursive Forest File model combines ideas from recent work by Downey with ideas from recent work on random graph models for the Web. Unlike similar previous work, our Recursive Forest File model allows new files to be created and old files to be deleted over time, and our analysis covers problematic issues such as correlation among file sizes. Moreover, our model allows natural variations where files that are copied or modified are more likely to be copied or modified subsequently.

Previous empirical work suggests that file sizes tend to have a lognormal body but a Pareto tail. The Recursive Forest File model explains this behavior, yielding a double Pareto distribution, which has a Pareto tail but a body close to lognormal. We believe the Recursive Forest File model may be useful for describing other power law phenomena in computer systems as well as in other fields.
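
A toy simulation of the generative dynamics described above, with placeholder probabilities and distributions rather than the paper's parameters: new files either start fresh with a lognormally distributed size or are derived from an existing file by a multiplicative change, and old files are occasionally deleted.

    import math
    import random

    # Toy simulation in the spirit of the model above.  The probabilities and
    # distributions are placeholders, not the paper's parameters.

    def simulate(steps, p_new=0.1, p_delete=0.05, seed=0):
        random.seed(seed)
        sizes = [math.exp(random.gauss(10, 2))]              # one initial file
        for _ in range(steps):
            r = random.random()
            if r < p_delete and len(sizes) > 1:
                sizes.pop(random.randrange(len(sizes)))      # delete an old file
            elif r < p_delete + p_new:
                sizes.append(math.exp(random.gauss(10, 2)))  # brand-new file (new root)
            else:
                parent = random.choice(sizes)                # copy/modify an existing file
                sizes.append(parent * math.exp(random.gauss(0, 1)))
        return sizes

    sizes = simulate(100000)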

TR-08-01 [tr-08-01.ps.gz (114 K)]
Michael Mitzenmacher. 2001. ``A Brief History of Generative Models for Power Law and Lognormal Distributions.''
Power law distributions are an increasingly common model for computer science applications; for example, they have been used to describe file size distributions and in- and out-degree distributions for the Web and Internet graphs. Recently, the similar lognormal distribution has also been suggested as an appropriate alternative model for file size distributions. In this paper, we briefly survey some of the history of these distributions, focusing on work in other fields. We find that several recently proposed models have antecedents in work from decades ago. We also find that lognormal and power law distributions connect quite naturally, and hence it is not surprising that lognormal distributions arise as a possible alternative to power law distributions.
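
For concreteness, the core multiplicative-process argument behind this connection, written in generic notation rather than that of any particular model in the survey:

    % A multiplicative process: each step scales the current value by an
    % independent positive random factor F_t.
    X_t = F_t X_{t-1}
        \quad\Longrightarrow\quad
    \ln X_t = \ln X_0 + \sum_{i=1}^{t} \ln F_i .
    % If the \ln F_i have finite mean and variance, the central limit theorem
    % makes \ln X_t approximately normal for large t, so X_t is approximately
    % lognormal.  Observing the same process after a random, geometrically or
    % exponentially distributed number of steps instead yields a mixture of
    % lognormals with power-law tails, which is one standard route from this
    % generative model to a power law.
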
TR-09-01
Steven Gortler. 2001. ``.''

TR-10-01
Robert Costa. 2001. ``Integrating On-Demand Alias Analysis into Schedulers for Advanced Microprocessors.''
REPORT WITHDRAWN
TR-11-01 [tr-11-01.ps.gz (576 K)]
Zheng Wang. 2001. ``Progressive Profiling: A Methodology Based on Profile Propagation and Selective Profile Collection.''
In recent years, Profile-Based Optimization (PBO) has become a key technique in program optimization. In PBO, the optimizer uses information gathered during previous program executions to guide the optimization process. Even though PBO has been implemented in many research systems and at some software companies, there has been little research on how to make PBO effective in practice.

In today's software industry, one major hurdle in applying PBO is the conflict between the need for high-quality profiles and the lack of time for long profiling runs. For PBO to be effective, the profile needs to be representative of how the users or a particular user runs the program. For many modern applications that are large and interactive, it takes a significant amount of time to collect high-quality profiles. This problem will only become more prominent as application programs grow more complex. A lengthy profiling process is especially impractical in software production environments, where programs are modified and rebuilt almost daily. Without enough time for extensive profiling runs, the benefit from applying PBO is severely limited. This in turn dampens interest in applying PBO and increases the dependence on hand tuning in software development and testing.

In order to obtain high-quality profiles in a software production environment without lengthening the daily build cycle, we seek to change the current practice where a new profile must be generated from scratch for each new program version. Most of today's profiles are generated for a specific program version and become obsolete once the program changes. We propose progressive profiling, a new profiling methodology that propagates a profile across program changes and re-uses it on the new version. We use static analysis to generate a mapping between two versions of a binary program, then use the mapping to convert an existing profile for the old version so that it applies to the new version. When necessary, additional profile information is collected for part of the new version to augment the propagated profile. Since the additional profile collection is selective, we avoid the high expense of re-generating the entire profile. With progressive profiling, we can collect profile information from different generations of a program and build a high-quality profile through accumulation over time, despite frequent revisions in a software production environment.
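
A highly simplified sketch of the propagation step, under the assumption that blocks of the old and new binaries can be matched by some key (a stand-in for the report's binary-matching algorithms): matched blocks inherit the old counts, and unmatched blocks are flagged for selective profiling.

    # Simplified profile propagation across program versions.  The matching key
    # (e.g. "procedure name + block hash") is a placeholder; the report's actual
    # binary-matching algorithms are far more sophisticated.

    def propagate(old_profile, old_blocks, new_blocks):
        """old_profile: {block_id: count}; *_blocks: {block_id: match_key}."""
        key_to_count = {}
        for block_id, key in old_blocks.items():
            key_to_count[key] = old_profile.get(block_id, 0)

        new_profile, needs_profiling = {}, []
        for block_id, key in new_blocks.items():
            if key in key_to_count:
                new_profile[block_id] = key_to_count[key]   # reuse counts from the old version
            else:
                needs_profiling.append(block_id)            # changed or new code: profile selectively
        return new_profile, needs_profiling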

We present two different algorithms for matching binary programs for the purpose of profile propagation, and use common application programs to evaluate their effectiveness. We use a set of quantitative metrics to compare propagated profiles with profiles collected directly on the new versions. Our results show that for program builds that are weeks or even months apart, profile propagation can produce profiles that closely resemble directly collected profiles. To understand the potential for time saving, we implement a prototype system for progressive profiling and investigate a number of different system models. We use a case study to demonstrate that by performing progressive profiling over multiple generations of a program, we can save a significant amount of profiling time while sacrificing little profile quality.

TR-12-01 [tr-12-01.ps.gz (194 K)]
Lee D. Feigenbaum. 2001. ``Automated translation: generating a code generator.''
A key problem in retargeting a compiler is to map the compiler's intermediate representation to the target machine's instruction set.

One method of writing such a mapping is to use grammar-like rules to relate a tree-based intermediate representation to an instruction set; a dynamic-programming algorithm then finds the least costly instructions to cover a given tree. Work in this family includes Burg, BEG, and twig. The other method, used by gcc and VPO, relies on a hand-written ``code expander'' that expands the intermediate representation into naive code. The naive code is improved via machine-independent optimizations while being maintained as a sequence of machine instructions. Because they are inextricably linked to a compiler's intermediate representation, neither kind of mapping can be reused for anything other than retargeting one specific compiler.

Lambda-RTL is a language for specifying the semantics of an instruction set independently of any particular intermediate representation. We analyze the properties of a machine from its Lambda-RTL description, then automatically derive the necessary mapping to a target architecture. By separating such analysis from compilers' intermediate representations, Lambda-RTL in conjunction with our work allows a single machine description to be used to build multiple compilers, along with other tools such as debuggers or emulators.
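
As background on the grammar-rule family named above (Burg, BEG, twig), here is a minimal sketch of the dynamic-programming tree covering they perform; the rule set and costs are invented for illustration and are not a real machine description.

    # Bottom-up tree covering: for every IR subtree and every nonterminal, find
    # the cheapest rule whose pattern matches there.  Rules and costs are a tiny
    # invented example; chain rules and real instruction selection are omitted.

    # rule: (nonterminal, pattern, cost); pattern = (op, child_nonterminals)
    RULES = [
        ("reg", ("CONST", ()),             1),   # load immediate
        ("reg", ("MEM",   ("reg",)),       2),   # load from address in a register
        ("reg", ("ADD",   ("reg", "reg")), 1),   # register add
    ]

    def cover(tree):
        """tree = (op, child_trees); returns {nonterminal: (cost, rule_index)}."""
        op, children = tree
        child_tables = [cover(c) for c in children]
        best = {}
        for idx, (nt, (rop, rkids), rcost) in enumerate(RULES):
            if rop != op or len(rkids) != len(children):
                continue
            try:
                total = rcost + sum(child_tables[i][k][0] for i, k in enumerate(rkids))
            except KeyError:
                continue          # a child cannot be reduced to the required nonterminal
            if nt not in best or total < best[nt][0]:
                best[nt] = (total, idx)
        return best

    # Example IR tree:  ADD(MEM(CONST), CONST)
    tree = ("ADD", (("MEM", (("CONST", ()),)), ("CONST", ())))
    print(cover(tree))            # cheapest cover reducing the tree to each nonterminal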

Our analysis categorizes a machine's storage locations as special registers, general-purpose registers, or memory. We construct a data-movement graph by determining the most efficient way to move arbitrary values between locations. We use this information at compile time to determine which temporary locations to use for intermediate results of large computations.
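
A small sketch of the kind of query such a data-movement graph answers, assuming invented location classes and move costs: a shortest path over locations gives the cheapest instruction sequence for moving a value, and hence a reasonable temporary location for an intermediate result.

    import heapq

    # Data-movement graph sketch: nodes are location classes, edges are moves
    # between them with instruction costs.  Locations and costs are invented.

    MOVES = {   # src -> [(dst, cost)]
        "const": [("gpr", 1)],
        "gpr":   [("mem", 3), ("fpr", 2)],
        "mem":   [("gpr", 3), ("fpr", 4)],
        "fpr":   [("mem", 4)],
    }

    def cheapest_move(src, dst):
        frontier, seen = [(0, src)], {}
        while frontier:
            cost, loc = heapq.heappop(frontier)
            if loc in seen:
                continue
            seen[loc] = cost
            if loc == dst:
                return cost
            for nxt, c in MOVES.get(loc, []):
                if nxt not in seen:
                    heapq.heappush(frontier, (cost + c, nxt))
        return None                       # no sequence of moves reaches dst

    print(cheapest_move("const", "mem"))  # e.g. const -> gpr -> mem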

To derive a mapping from an intermediate representation to a target machine, we first assume a compiler-dependent translation from the intermediate representation to register-transfer lists. We discover at compile-compile time how to translate these register-transfer lists to machine code and also which register-transfer lists we can translate. To do this, we observe that values are either constants, fetched from locations, or the results of applying operators to values. Our data-movement graph covers constants and fetched values, while operators require an appropriate instruction to perform the effect of the operator. We search through an instruction set, discovering instructions that implement operators via algebraic identities, inverses, and rewrite laws, and via the introduction of unwanted side effects.

TR-13-01 [tr-13-01.ps.gz (338 K)]
Christian James Carrillo. 2001. ``Instruction-Stream Compression.''
This thesis presents formal elements of instruction-stream compression.

We introduce notions of instruction representations, compressors, and the general ``patternization'' function from representations to sequences. We further introduce the Lua-ISC language, an implementation of these elements. Instruction-stream compression algorithms are expressed, independently of the target architecture, in Lua-ISC. The language itself handles instruction decoding and encoding, patternization, and compression; programs within it are compact and readable.
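
As a generic illustration only (not the thesis' actual definitions), one way to read patternization is as a function that splits a decoded instruction representation into per-field sequences that an off-the-shelf compressor can then handle independently.

    from collections import defaultdict
    import zlib

    # Generic illustration of turning an instruction representation into
    # sequences for compression.  The field layout and the use of zlib are
    # placeholders, not the thesis' patternization function or compressors.

    def patternize(instructions):
        """instructions: list of dicts like {'op': 'add', 'rd': 1, 'rs': 2}."""
        streams = defaultdict(list)
        for ins in instructions:
            for field, value in ins.items():
                streams[field].append(value)          # one sequence per field
        return dict(streams)

    def compress(streams):
        return {field: zlib.compress(repr(seq).encode()) for field, seq in streams.items()}

    streams = patternize([{"op": "add", "rd": 1, "rs": 2},
                          {"op": "add", "rd": 3, "rs": 4},
                          {"op": "ld",  "rd": 1, "rs": 0}])
    sizes = {f: len(blob) for f, blob in compress(streams).items()}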

We perform experiments in instruction representation using Lua-ISC. Our results indicate that the choice of representation and patternization method affect compressor performance, and suggest that current design methodologies may overlook opportunities in lower-level representations.

Finally, we discuss four instruction-stream compression algorithms and their expressions in Lua-ISC. Two of these algorithms are our own: the first exploits inter-program redundancy due to static compilation; the second allows state-based compression techniques to function in a random-access environment by compressing instructions as sets of blocks.

TR-14-01 [tr-14-01.ps.gz (4644 K)]
Salimah Addetia. 2001. ``CacheDAFS: User Level Client-Side Caching for the Direct Access File System.''
This thesis focuses on the design, implementation, and evaluation of user-level client-side caching for the Direct Access File System (DAFS). DAFS is a high-performance file access protocol designed for local file sharing in high-speed, low-latency networked environments.

DAFS operates over memory-to-memory interconnects such as Virtual Interface (VI). VI provides a standard for efficient network communication by moving software overheads into hardware and eliminating the operating system from common data transfers. While much work has been done on message passing and distributed shared memory in VI-like environments, DAFS is one of the first attempts to extend user-level networking to network file systems. In the environment of high-speed networks with virtual interfaces, software overheads such as data copies and translation, buffer management, and context switches become important bottlenecks. The DAFS protocol departs from traditional network file system practices to enhance performance.

Distributed file systems use client-side caching to improve performance by reducing network traffic, disk traffic, and server load. The DAFS client, however, does no caching of its own. This thesis presents cacheDAFS, a user-space cache for DAFS whose design carefully avoids the main bottlenecks in network file system protocols and user-level networking environments. CacheDAFS maintains perfect consistency among DAFS clients using NFSv4-like open delegations. The changes to the DAFS API needed to add caching are minimal, and results show that DAFS applications can use cacheDAFS to reap all the standard benefits of caching.
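
A rough sketch of how NFSv4-like open delegations can gate a user-level cache, with invented classes and calls that are not the cacheDAFS API: blocks are cached and served only while the client holds a delegation on the file, and a server recall drops the delegation along with every block cached under it.

    # Delegation-gated client-side caching sketch.  The classes and server calls
    # are invented for illustration; they are not the cacheDAFS or DAFS API.

    class DelegationCache:
        def __init__(self, server):
            self.server = server
            self.blocks = {}            # (file_id, block_no) -> bytes
            self.delegations = set()    # file_ids the server has delegated to us

        def read(self, file_id, block_no):
            if file_id in self.delegations and (file_id, block_no) in self.blocks:
                return self.blocks[(file_id, block_no)]          # served from the cache
            data = self.server.read_block(file_id, block_no)     # remote read (hypothetical call)
            if file_id in self.delegations:
                self.blocks[(file_id, block_no)] = data          # safe to cache under delegation
            return data

        def grant_delegation(self, file_id):
            self.delegations.add(file_id)

        def recall_delegation(self, file_id):
            # The server needs the file back (e.g. another client opened it for write):
            # drop the delegation and every block cached under it.
            self.delegations.discard(file_id)
            self.blocks = {k: v for k, v in self.blocks.items() if k[0] != file_id}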