# Harvard Computer Science Technical Reports for 1996

TR-02-96 [tr-02-96.ps.gz (442 K)]
Stanley F. Chen. 1996. ``Building Probabilistic Models for Natural Language.''
Building models of language is a central task in natural language processing. Traditionally, language has been modeled with manually-constructed grammars that describe which strings are grammatical and which are not; however, with the recent availability of massive amounts of on-line text, statistically-trained models are an attractive alternative. These models are generally probabilistic, yielding a score reflecting sentence frequency instead of a binary grammaticality judgement. Probabilistic models of language are a fundamental tool in speech recognition for resolving acoustically ambiguous utterances. For example, we prefer the transcription "forbear" to "four bear" as the former string is far more frequent in English text. Probabilistic models also have application in optical character recognition, handwriting recognition, spelling correction, part-of-speech tagging, and machine translation.

In this thesis, we investigate three problems involving the probabilistic modeling of language: smoothing n-gram models, statistical grammar induction, and bilingual sentence alignment. These three problems employ models at three different levels of language; they involve word-based, constituent-based, and sentence-based models, respectively. We describe techniques for improving the modeling of language at each of these levels, and surpass the performance of existing algorithms for each problem. We approach the three problems using three different frameworks. We relate each of these frameworks to the Bayesian paradigm, and show why each framework used was appropriate for the given problem. Finally, we show how our research addresses two central issues in probabilistic modeling: the sparse data problem and the problem of inducing hidden structure.
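As a minimal illustration of the word-based level (a toy sketch only; the thesis develops considerably more sophisticated smoothing methods, and the corpus below is invented for the example), here is a bigram model with add-one (Laplace) smoothing:

```python
from collections import defaultdict

def train_bigram(corpus):
    """Count unigrams and bigrams over a list of tokenized sentences."""
    unigrams = defaultdict(int)
    bigrams = defaultdict(int)
    vocab = set()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens)
        for w in tokens:
            unigrams[w] += 1
        for a, b in zip(tokens, tokens[1:]):
            bigrams[(a, b)] += 1
    return unigrams, bigrams, len(vocab)

def prob(w, prev, unigrams, bigrams, vocab_size):
    """Add-one (Laplace) smoothed bigram probability P(w | prev)."""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + vocab_size)

corpus = [["for", "bear"], ["forbear"], ["forbear"]]
uni, bi, V = train_bigram(corpus)
# "forbear" starts two of three sentences, so it outscores the unseen
# "four"; smoothing keeps unseen continuations from getting probability 0.
print(prob("forbear", "<s>", uni, bi, V) > prob("four", "<s>", uni, bi, V))
```

Smoothing is what makes such models usable on sparse data: without it, any sentence containing an unseen bigram would receive probability zero.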

TR-03-96 [tr-03-96.ps.gz (407 K)]
Roni Khardon. 1996. ``Learning to be Competent.''
The thesis presents a new approach for the study of competent cognitive behavior. The approach, learning to be competent, suggests that learning phenomena and the competencies attributed to intelligence should be studied together. Instead of requiring omniscience or otherwise optimal performance, we claim that the tasks and success criteria should be defined behaviorally; that is, a system is competent if it functions well in its environment. We further suggest that competent behavior should only be expected in light of a learning experience in the same or similar environment, and that the solutions exhibited should be computationally efficient. These ideas are presented in a formal setting, so that the various tasks and their proposed solutions can be studied and analyzed. Thus, one contribution of this approach is in formalizing the problem in a form that is amenable to analysis, while at the same time being cognitively and computationally plausible.

The learning to reason framework is used to study the problem of logical reasoning in propositional domains. We consider a variety of possible interfaces for learning, and describe learning algorithms that interact with them, thus demonstrating the robustness of this approach. The results show that learning to reason is possible even in cases where the traditionally separate problems, namely concept learning and reasoning by proving assertions, do not have efficient solutions.

In the course of studying reasoning tasks, we develop a model-based representation, the set of characteristic models, which supports efficient solutions for several forms of logical reasoning. This representation is utilized in the learning to reason framework, and helps in deriving the above-mentioned results. The representation is also shown to have other applications, in the theory of relational databases and in computational tasks that arise in the design of such databases.

The task of acting in a dynamic world in order to achieve some goals is studied in the learning to act framework. We present results on supervised learning of action strategies in the form of production rule systems. The framework and the results combine features from the area of symbolic computation and that of reactive agents, which have been previously seen as opposed if not contradictory, and thus advance our understanding of the problems.

TR-04-96 [tr-04-96.ps]
J. Bradley Chen, Michael D. Smith, and Brian N. Bershad. 1996. ``Morph: A Framework for Platform-Specific Optimization.''
The Morph Project is a unique compilation environment that improves the performance and maintainability of large software systems running on heterogeneous hardware platforms. Morph accomplishes these goals through the repartitioning of the compilation process and through the combination of compilation and executable rewriting technology. Morph maximizes the performance of an executable by repartitioning the compilation process so that it better supports the unrestricted use of host-specific compile-time optimizations. Morph maximizes the lifetime of an executable by automatically retargeting an executable so that it evolves to match evolutionary changes or configuration peculiarities in the underlying system hardware. Morph will enable the automatic evolution of executables so that they achieve the best performance possible based on host-specific architectural features and program usage patterns.

TR-05-96 [tr-05-96.ps.gz (95 K)]
Yu Hu and S. Lennart Johnsson. 1996. ``A Data--Parallel Implementation of $O(N)$ Hierarchical $N$--body Methods.''
The $O(N)$ hierarchical $N$--body algorithms and Massively Parallel Processors allow particle systems of 100 million particles or more to be simulated in acceptable time. We present a data--parallel implementation of Anderson's method and demonstrate both efficiency and scalability of the implementation on the Connection Machine CM--5/5E systems. The communication time for large particle systems amounts to about 10--25%, and the overall efficiency is about 35%. The evaluation of the potential field of a system of 100 million particles takes 3 minutes and 15 minutes on a 256-node CM--5E, for an expected four and seven digits of accuracy, respectively. The speed of the code scales linearly with the number of processors and the number of particles.

TR-06-96 [tr-06-96.ps.gz (134 K)]
Y. Charlie Hu and S. Lennart Johnsson. 1996. ``On the Accuracy of Poisson's Formula Based {N}--body Algorithms.''
We study the accuracy--cost tradeoffs of a Poisson's formula based hierarchical $N$--body method. The parameters that control the degree of approximation of the computational elements and the separateness of interacting elements govern both the arithmetic complexity and the accuracy of the method. Empirical models for predicting the execution time and the accuracy of the potential and force evaluations for three-dimensional problems are presented. We demonstrate how these models can be used to minimize the execution time for a prescribed error, and verify the predictions through simulations on particle systems with up to one million particles. An interesting observation is that, for a given error, defining the near--field to consist of only nearest-neighbor elements yields a lower computational complexity than the two--element separation recommended in the literature. We also show that the particle distribution may have a significant impact on the error.

TR-07-96 [tr-07-96.ps.gz (33 K)]
Christopher Small. 1996. ``MiSFIT: A Freely Available Tool for Building Safe Extensible Systems.''
The boundary between application and system is becoming increasingly permeable. Extensible applications, such as web browsers, database systems, and operating systems, demonstrate the value of allowing end-users to extend and modify the behavior of what was formerly considered to be a static, inviolate system. Unfortunately, flexibility often comes with a cost: systems unprotected from misbehaved end-user extensions are fragile and prone to instability.

Object-oriented programming models are a good fit for the development of this kind of system. An extension can be designed as a refinement of an existing class and loaded into a running system. In our model, when code is downloaded into the system, it is used to replace a virtual function on an existing C++ object. Because our tool is source-language neutral, it can be used to build safe extensible systems written in other languages as well.

There are three methods commonly used to make end-user extensions safe: restrict the extension language (e.g., Java), interpret the extension language (e.g., Tcl), or combine run-time checks with a trusted environment. The third technique is the one discussed here; it offers the twin benefits of the flexibility to implement extensions in an unsafe language, such as C++, and the performance of compiled code.

MiSFIT, the Minimal i386 Software Fault Isolation Tool, can be used as the central component of a tool set for building safe extensible systems in C++. MiSFIT transforms C++ code, compiled by g++, into safe binary code. Combined with a runtime support library, MiSFIT's overhead is an order of magnitude lower than that of interpreted Java, and it permits safe extensible systems to be written in C++.
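As a rough sketch of the run-time-check idea (this is the classic software fault isolation address-masking trick, not MiSFIT's actual x86 transformation; the segment constants and memory layout below are invented for illustration):

```python
SEG_BASE = 0x40000   # hypothetical base of the extension's sandbox segment
SEG_MASK = 0x0FFFF   # offsets wrap within a 64 KiB segment

memory = bytearray(0x50000)  # a toy flat address space

def sandboxed_store(addr, value):
    """Force every store into the extension's segment by masking the
    computed address. An out-of-range address is silently redirected
    into the sandbox rather than corrupting host state."""
    safe = SEG_BASE | (addr & SEG_MASK)
    memory[safe] = value

sandboxed_store(0x40010, 7)   # in-segment store lands where asked
sandboxed_store(0x12345, 9)   # wild store is clamped into the segment
print(memory[0x40010], memory[0x42345])
```

Because the mask is applied inline at every unsafe store, the check costs a couple of arithmetic instructions rather than an interpreter dispatch, which is where the order-of-magnitude advantage over interpretation comes from.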

TR-08-96 [tr-08-96.ps.gz (102 K)]
Roni Khardon and Dan Roth. 1996. ``Learning to Reason with a Restricted View.''
The current emphasis of the research in learning theory is on the study of inductive learning (from examples) of concepts (binary classifications of examples). The work in AI identifies other tasks, such as reasoning, as essential for intelligent agents, but those are not supported by the current learning models. The Learning to Reason framework was devised to reconcile inductive learning and efficient reasoning. The framework highlights the fact that new learning questions arise when learning in order to reason. This paper addresses the task of deductive reasoning, and investigates learning to reason problems in which the examples seen are only partially specified.

The paper presents several interpretations for partial information in the interface with the environment, and develops model-based representations and reasoning algorithms that are suitable for partially observable worlds. Then, learning to reason algorithms that cope with partial information are developed. These results exhibit a tradeoff between learnability, the strength of the oracles used in the interface, and the expressiveness of the queries asked.

This work shows that one can learn to reason with respect to expressive worlds that cannot be learned efficiently in the traditional learning framework and that do not support efficient reasoning in the traditional reasoning framework.

TR-09-96 [tr-09-96.ps.gz (78 K)]
Amr Fahmy and Abdelsalam Heddaya. 1996. ``Communicable Memory and Lazy Barriers for Bulk Synchronous Parallelism in BSPk.''
Communication and synchronization stand as the dual bottlenecks in the performance of parallel systems, and especially those that attempt to alleviate the programming burden by incurring overhead in these two domains. We formulate the notions of {\em communicable memory} and {\em lazy barriers} to help achieve efficient communication and synchronization. These concepts are developed in the context of BSPk, a toolkit library for programming networks of workstations---and other distributed memory architectures in general---based on the Bulk Synchronous Parallel (BSP) model. BSPk emphasizes efficiency in communication by minimizing local memory-to-memory copying, and in barrier synchronization by not forcing a process to wait unless it needs remote data. Both the message passing (MP) and distributed shared memory (DSM) programming styles are supported in BSPk. MP helps processes efficiently exchange short-lived unnamed data values, when the identity of either the sender or receiver is known to the other party. By contrast, DSM supports communication between processes that may be mutually anonymous, so long as they can agree on variable names in which to store shared temporary or long-lived data.

TR-10-96 [tr-10-96.ps.gz (81 K)]
David G. Cory, Amr F. Fahmy, and Timothy F. Havel. 1996. ``Ensemble Quantum Computing by Nuclear Magnetic Resonance Spectroscopy.''
A quantum computer is a machine which can operate in parallel on all its possible inputs at once, and can be programmed much like a conventional digital computer. It is based on the fact that the number of degrees of freedom in a collection of coupled two-state quantum systems grows exponentially with the number of systems. A major problem with quantum computers is that the amount of information that can be extracted from a quantum system is extremely limited, because the act of measurement irreversibly alters the state of the system. This phenomenon, known as wave function collapse, severely restricts the range of problems that can be solved on a quantum computer. We present a new computational model, which differs from a quantum computer only in that the result of measuring an observable is the expectation value of the observable, rather than a random eigenvalue thereof. We show that such an expectation value quantum computer can solve NP-complete problems in polynomial time, by showing that it can solve the satisfiability problem in linear time. This abstract computational model can be realized, to a certain extent, by nuclear magnetic resonance spectroscopy on macroscopic ensembles of quantum spins, namely molecules in a test tube. This is made possible by identifying a manifold of statistical spin states, called pseudo-pure states, whose mathematical description is isomorphic to that of an isolated spin system. The result is a novel NMR computer which can be programmed much like a quantum computer, but in other respects more closely resembles a DNA computer. Most notably, when applied to intractable combinatorial problems an NMR computer can use an amount of sample, rather than time, which grows exponentially with the size of the problem. 
Although NMR computers will be limited by current technology to exhaustive searches over only 15 to 20 bits, searches over as much as 50 bits are in principle possible, and more advanced algorithms could greatly extend the range of applicability of such machines.
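To illustrate what a single expectation value reveals (a brute-force classical simulation, not the NMR procedure, and with an invented CNF encoding): the expectation of the satisfiability indicator over a uniform "superposition" of all assignments is nonzero exactly when the formula is satisfiable.

```python
from itertools import product

def satisfies(assign, cnf):
    """assign: tuple of bools; cnf: list of clauses, each a list of
    signed 1-based literals, e.g. [1, -2] means (x1 OR NOT x2)."""
    return all(any(assign[abs(l) - 1] == (l > 0) for l in clause)
               for clause in cnf)

def satisfying_fraction(n, cnf):
    """Expectation of the satisfiability indicator over all 2^n
    assignments. Computed here by exhaustive enumeration; on the
    expectation-value machine it is read out as a single measurement."""
    return sum(satisfies(a, cnf)
               for a in product([False, True], repeat=n)) / 2**n

cnf = [[1, 2], [-1, 2], [-2, 3]]   # (x1 v x2)(~x1 v x2)(~x2 v x3)
print(satisfying_fraction(3, cnf))  # nonzero, so the formula is satisfiable
```

The classical simulation above costs exponential time; the point of the abstract model is that the same number would be available as one measured expectation value.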

TR-11-96
Leslie G. Valiant. 1996. ``A Neuroidal Architecture for Cognitive Computation.'' (Replaced by the revised paper TR-04-98.ps.gz.)

TR-12-96 [tr-12-96.ps.gz (66 K)]
Lillian Lee. 1996. ``Learning of Context-Free Languages: A Survey of the Literature.''
We survey methods for learning context-free languages (CFL's) in the theoretical computer science literature. We first present some important negative results. Then, we consider five types of methods: those that take text as input, those that take structural information as input, those that rely on CFL formalisms that are not based on context-free grammars, those which learn subclasses of CFL's, and stochastic methods. A description of the subclasses of CFL's considered is provided, as is an extensive bibliography.

TR-13-96 [tr-13-96.ps.gz (80 K)]
Yu Charlie Hu, S. Lennart Johnsson, and Shang-Hua Teng. 1996. ``High Performance Fortran for Highly Irregular Problems.''
We present a general data parallel formulation for highly irregular problems in High Performance Fortran (HPF). Our formulation consists of (1) a method for linearizing irregular data structures, (2) a data parallel implementation (in HPF) of graph partitioning algorithms applied to the linearized data structure, and (3) techniques for expressing irregular communication and nonuniform computations associated with the elements of linearized data structures.

We demonstrate and evaluate our formulation on a parallel, hierarchical N-body method for the evaluation of potentials and forces of nonuniform particle distributions. Our experimental results demonstrate that efficient data parallel (HPF) implementations of highly nonuniform problems are feasible with the proper language/compiler/runtime support. Our data parallel N-body code provides a much needed ``benchmark'' code for evaluating and improving HPF compilers.

TR-14-96 [tr-14-96.ps.gz (66 K)]
Yu Charlie Hu, S. Lennart Johnsson, and Shang-Hua Teng. 1996. ``A Data--Parallel Adaptive N--body Method.''
We present a data--parallel formulation of an adaptive version of Anderson's method for $N$--body particle interactions. Our formulation consists of a storage and computationally efficient array--based representation for the nonuniform hierarchy that models arbitrary particle distributions for the computational procedure. We also present data--parallel implementations (in HPF) of several well known partitioning methods. These partitioning methods balance nodal weights (for computation). We present preliminary experimental results for these partitioning schemes and discuss the complete code for adaptive particle simulations.

TR-15-96 [tr-15-96.ps.gz (71 K)]
Yu Charlie Hu, Shang-Hua Teng, and S. Lennart Johnsson. 1996. ``A Data-Parallel Implementation of the Geometric Partitioning Algorithm.''
We present a data-parallel, High Performance Fortran (HPF) implementation of the geometric partitioning algorithm. The geometric partitioning algorithm has provably good partitioning quality. To our knowledge, our implementation is the first data--parallel implementation of the algorithm. Our data--parallel formulation makes extensive use of segmented prefix sums and parallel selections, and provides a data-parallel procedure for geometric sampling. Experiments with partitioning particles for load balance and data interaction, as required in hierarchical N-body algorithms and in iterative finite element methods for the solution of equilibrium equations on unstructured meshes, show that the geometric partitioning algorithm has an efficient data--parallel formulation. Moreover, the quality of the generated partitions is competitive with that offered by the spectral bisection technique and better than the quality offered by other partitioning heuristics.
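As a hedged sketch of geometric bisection (a simple coordinate-median splitter, standing in for the random-circle separators of the actual geometric partitioning algorithm; all names below are invented):

```python
def bisect(points, depth=0):
    """Recursively split 2-D points at the median along alternating
    axes, yielding balanced partitions. This is plain coordinate
    bisection; the real algorithm projects points to a sphere and
    cuts with a random circle for its provable quality guarantees."""
    if len(points) <= 2:
        return [points]
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return bisect(pts[:mid], depth + 1) + bisect(pts[mid:], depth + 1)

pts = [(x, y) for x in range(4) for y in range(4)]   # 16 grid points
parts = bisect(pts)
# 16 points split down to 8 perfectly balanced partitions of 2
print(len(parts), max(len(p) for p in parts))
```

In a data-parallel setting the per-level median selection and the regrouping of points are exactly the steps expressed with parallel selection and segmented prefix sums.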

TR-16-96 [tr-16-96.ps.gz (69 K)]
Yu Charlie Hu and S. Lennart Johnsson. 1996. ``On the Accuracy of Anderson's Fast N-body Algorithm.''
We present an empirical study of the accuracy-cost tradeoffs of Anderson's method. The various parameters that control the degree of approximation of the computational elements and the separateness of interacting computational elements govern both the arithmetic complexity and the accuracy of the method. Our experiments show that, for a given error requirement, a near-field containing only nearest-neighbor boxes, combined with the hierarchy depth that minimizes the arithmetic operation count, yields the lowest total number of arithmetic operations.

TR-17-96 [tr-17-96.ps.gz (93 K)]
Nadia Shalaby. 1996. ``Efficient Parallel FFTs for Different Computational Models.''
We select the Fast Fourier Transform (FFT) to demonstrate a methodology for deriving the optimal parallel algorithm, according to predetermined performance metrics, within a computational model. Following the vector space framework for parallel permutations, we provide a specification language to capture the algorithm, derive the optimal parallel FFT specification, compute the arithmetic, memory, communication, and load-balance complexity metrics, apply the analytical performance evaluation to the PRAM, LPRAM, BSP, and LogP computational models, and compare with actual performance results.
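For reference, the computation being specified is the familiar $O(N \log N)$ FFT; a minimal sequential radix-2 Cooley-Tukey sketch (not the report's specification language or its parallel derivation):

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2.
    Splits into even/odd halves and combines with twiddle factors."""
    n = len(x)
    if n == 1:
        return x[:]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

# The DFT of a unit impulse is flat: every output bin equals 1.
print(fft([1, 0, 0, 0]))
```

The butterfly combine step and the even/odd data permutation are precisely the pieces whose mapping onto PRAM, LPRAM, BSP, and LogP determines the communication and load-balance costs analyzed in the report.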

TR-18-96 [tr-18-96.ps.gz (89 K)]
Nadia Shalaby and S. Lennart Johnsson. 1996. ``Hierarchical Load Balancing for Parallel Fast Legendre Transforms.''
We present a parallel Fast Legendre Transform (FLT) based on the Driscoll-Healy algorithm with computational complexity $O(N \log^2 N)$. The parallel FLT is load-balanced in a hierarchical fashion. We use a load-balanced FFT to deduce a load-balanced parallel fast cosine transform, which in turn serves as a building block for the Legendre transform engine, from which the parallel FLT is constructed. We demonstrate how the arithmetic, memory, and communication complexities of the parallel FLT are hierarchically derived via the complexity of its modular blocks.