Harvard Computer Science Technical Reports for 1989

TR-01-89
Philip Klein and Clifford Stein. 1989. ``A Parallel Algorithm for Eliminating Cycles in Undirected Graphs.''
We give an efficient parallel algorithm for finding a maximal set of edge-disjoint cycles in an undirected graph. The algorithm can be generalized to handle a weighted version of the problem.
TR-02-89
Lisa Hellerstein, Philip Klein, and Robert Wilber. 1989. ``On the time-space complexity of reachability queries for preprocessed graphs.''
How much can preprocessing help in solving graph problems? In this paper, we consider the problem of reachability in a directed bipartite graph, and propose a model for evaluating the usefulness of preprocessing in solving this problem. We give tight bounds for restricted versions of the model that suggest that preprocessing is of limited utility.
TR-03-89
Milena Mihail and Umesh Vazirani. 1989. ``On the Magnification of 0-1 Polytopes.''

TR-04-89
Milena Mihail and Umesh Vazirani. 1989. ``Conductance and Convergence of Markov Chains.''
Let $P$ be an irreducible and strongly aperiodic (i.e., $p_{ii} \geq \frac{1}{2}$ for all $i$) stochastic matrix. We obtain non-asymptotic bounds for the convergence rate of a Markov chain with transition matrix $P$ in terms of the {\it conductance} of $P$. These results have so far been obtained only for time-reversible Markov chains, via partly linear-algebraic arguments. Our proofs eliminate the linear algebra and therefore extend naturally to general Markov chains. The key new idea is to view the action of a strongly aperiodic stochastic matrix as a weighted averaging along the edges of the {\it underlying graph of} $P$. Our results suggest that the conductance (rather than the second largest eigenvalue) best quantifies the rate of convergence of strongly aperiodic Markov chains.
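For reference, the conductance of a chain with stationary distribution $\pi$ is commonly defined as (this is the standard formulation; the report's exact normalization may differ)
$$ \Phi \;=\; \min_{S:\ 0 < \pi(S) \leq \frac{1}{2}} \; \frac{\sum_{i \in S,\ j \notin S} \pi_i\, p_{ij}}{\pi(S)}, $$
and bounds of the kind described typically take the form
$$ \max_i \left| \frac{\Pr[X_t = i] - \pi_i}{\pi_i} \right| \;\leq\; \frac{(1 - \Phi^2/2)^t}{\min_j \pi_j}, $$
so that on the order of $\Phi^{-2} \log\!\big(1/(\epsilon \min_j \pi_j)\big)$ steps suffice to bring the relative error below $\epsilon$; the precise distance measure and constants used in the report may differ.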
TR-05-89
Meichun Hsu and Va-On Tam. 1989. ``Transaction Synchronization in Distributed Shared Virtual Memory Systems.''
Distributed shared virtual memory (DSVM) is an abstraction which integrates the memory spaces of different machines in a local area network environment into a single logical entity. The algorithm responsible for maintaining this {\it virtually} shared image is called the {\it memory coherence algorithm}. In this paper, we study the interplay between memory coherence and {\it process synchronization}. In particular, we devise two-phase locking-based algorithms for a distributed system under two scenarios: {\it with} and {\it without} an underlying memory coherence system. We compare the performance of the two algorithms using simulation, and argue that significant performance gains can result from bypassing memory coherence and supporting process synchronization directly on distributed memory. We also study the role of {\it optimistic} algorithms in the context of DSVM, and show that an optimistic policy appears promising under the scenarios studied.
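As background, the two-phase locking discipline itself can be sketched briefly; the following Python fragment is a generic, single-machine illustration (class and method names are hypothetical), not the distributed algorithms studied in the report.

    # Minimal sketch of (strict) two-phase locking; names are illustrative only.
    import threading

    class LockManager:
        def __init__(self):
            self._locks = {}                      # data item -> lock
            self._guard = threading.Lock()

        def _lock_for(self, item):
            with self._guard:
                return self._locks.setdefault(item, threading.Lock())

        def acquire(self, item):
            self._lock_for(item).acquire()

        def release(self, item):
            self._locks[item].release()

    class Transaction:
        """Growing phase: locks are acquired as items are first touched.
        Shrinking phase: all locks are released only at commit (strict 2PL)."""
        def __init__(self, manager, store):
            self.mgr, self.store = manager, store
            self.held, self.writes = set(), {}

        def _lock(self, item):
            if item not in self.held:
                self.mgr.acquire(item)            # growing phase only
                self.held.add(item)

        def read(self, item):
            self._lock(item)
            return self.writes.get(item, self.store.get(item))

        def write(self, item, value):
            self._lock(item)
            self.writes[item] = value

        def commit(self):
            self.store.update(self.writes)
            for item in self.held:                # shrinking phase: release everything
                self.mgr.release(item)
            self.held.clear()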
TR-06-89
Azer Bestavros. 1989. ``A VLSI Chip for the Real-time Information Dispersal and Retrieval for Security and Fault-Tolerance.''
In this paper, we describe SETH, a hardwired implementation of the recently proposed ``Information Dispersal Algorithm'' (IDA). SETH allows the real-time dispersal of information into different pieces as well as the retrieval of the original information from the available pieces. SETH accepts a stream of data and a set of ``keys'' and produces the required streams of dispersed data to be stored on (or communicated to) the different sinks. The chip can also accept the streams of data from the different sinks, along with the necessary controls and keys, and reconstruct the original information. We begin this paper by introducing the Information Dispersal Algorithm and giving an overview of SETH's operation. The different functions are described and system block diagrams at varying levels of detail are presented. Next, we present an implementation of SETH using Scalable CMOS technology that has been fabricated using the MOSIS 3-micron process. We conclude the paper with potential applications and extensions of SETH. In particular, we emphasize the promise of the Information Dispersal Algorithm in the design of I/O subsystems, Redundant Array of Inexpensive Disks (RAID) systems, and reliable communication and routing in distributed/parallel systems. SETH demonstrates that using IDA in these applications is feasible.
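For readers unfamiliar with IDA, the following Python fragment is a minimal software sketch of the dispersal/reconstruction idea over a small prime field; the field size, block length, encoding vectors, and function names are illustrative assumptions and need not match the keys and arithmetic used by the chip.

    # Minimal sketch of Rabin-style information dispersal over GF(P); illustrative only.
    P = 257  # small prime field, large enough to hold byte values 0..255

    def encode_rows(n, m):
        # Vandermonde rows (1, i, i^2, ..., i^(m-1)) mod P; any m rows are independent.
        return [[pow(i, k, P) for k in range(m)] for i in range(1, n + 1)]

    def disperse(block, n):
        """Split a block of m field elements into n pieces; any m pieces suffice."""
        m = len(block)
        return [(i, sum(a * b for a, b in zip(row, block)) % P)
                for i, row in enumerate(encode_rows(n, m), start=1)]

    def reconstruct(pieces, m):
        """Recover the original m elements from any m (index, value) pieces
        by Gauss-Jordan elimination mod P."""
        rows = [[pow(i, k, P) for k in range(m)] + [v] for i, v in pieces[:m]]
        for col in range(m):
            piv = next(r for r in range(col, m) if rows[r][col])
            rows[col], rows[piv] = rows[piv], rows[col]
            inv = pow(rows[col][col], P - 2, P)
            rows[col] = [x * inv % P for x in rows[col]]
            for r in range(m):
                if r != col and rows[r][col]:
                    f = rows[r][col]
                    rows[r] = [(x - f * y) % P for x, y in zip(rows[r], rows[col])]
        return [rows[k][m] for k in range(m)]

    # Example: disperse 4 values into 6 pieces; any 4 of them reconstruct the block.
    block = [10, 20, 30, 40]
    pieces = disperse(block, 6)
    assert reconstruct(pieces[2:], 4) == block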
TR-07-89
L.G. Valiant. 1989. ``General Purpose Parallel Architectures.''
The possibilities for efficient general purpose parallel computers are examined. First some network models are reviewed. It is then shown how these networks can efficiently implement some basic message routing functions. Finally, various high level models of parallel computation are described and it is shown that the above routing functions are sufficient to implement them efficiently.
TR-08-89
L.G. Valiant. 1989. ``Bulk-Synchronous Parallel Computers.''
We attribute the success of the von Neumann model of sequential computation to the fact that it is an efficient bridge between software and hardware. On the one hand, high-level languages can be efficiently compiled onto this model. On the other, it can be efficiently implemented in hardware in current technology. We argue that an analogous bridge between software and hardware is required if parallel computation is to become as widely used. We introduce the bulk-synchronous parallel (BSP) model as a candidate for this role. We justify this suggestion by giving a number of results that quantify its efficiency both in implementing high-level language features and in being implemented in hardware.
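A BSP computation proceeds in supersteps: each processor computes on local data, exchanges messages, and then all processors synchronize at a barrier, at which point the messages become visible. The toy Python fragment below simulates this structure sequentially; it illustrates only the shape of the model (the processor count and the computation are arbitrary), not any actual BSP runtime.

    # Toy, sequential simulation of BSP supersteps; illustrates only the model's structure.
    P = 4                                      # number of simulated processors
    state = list(range(P))                     # each processor's local datum
    inbox = [[] for _ in range(P)]             # messages delivered at the last barrier

    for superstep in range(3):
        outbox = [[] for _ in range(P)]
        for pid in range(P):
            # 1. local computation, using messages received at the previous barrier
            state[pid] += sum(inbox[pid]) + 1
            # 2. communication: send the new value to the next processor
            outbox[(pid + 1) % P].append(state[pid])
        # 3. barrier synchronization: messages become visible only at the end of the superstep
        inbox = outbox

    print(state)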
TR-09-89
Lilei Chen. 1989. ``Scheduling Initialization Equations for Parallel Execution.''

TR-10-89
Donald Beaver and Joan Feigenbaum. 1989. ``Hiding Information from Several Oracles.''
Abadi, Feigenbaum, and Kilian have considered {\it computations with encrypted data} [AFK]. Let $f$ be a function that is not provably computable in randomized polynomial time; randomized polynomial-time machine A wants to query an oracle B for $f$ to obtain $f(x)$, without telling B exactly what $x$ is. Several well-known random-self-reducible functions, such as discrete logarithm and quadratic residuosity, are {\it encryptable} in this sense; that is, A can query B about an instance while hiding some significant information about the instance. It is shown in [AFK] that, if $f$ is an NP-hard function, A cannot query B while keeping secret all but the size of the instance, assuming that the polynomial hierarchy does not collapse. This negative result holds even if the oracle B has ``infinite'' computational power. Here we show that A {\it can} query $n$ oracles B$_1, \ldots,$ B$_n$, where $n=|x|$, and obtain $f(x)$ while hiding all but $n$ from each B$_i$, {\it for any boolean function} $f$. This answers a question due to Rivest that was left open in [AFK]. Our proof adapts techniques developed by Ben-Or, Goldwasser, and Wigderson and by Chaum, Crepeau, and Damg\aa rd for using Shamir's {\it secret-sharing} scheme to hide information about inputs to distributed computations [BGW], [CCD], [S].
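The secret-sharing primitive underlying these constructions can be illustrated briefly. The Python fragment below sketches Shamir's $(t,n)$ scheme over a prime field (the modulus and the parameter names are illustrative); it shows only the sharing and reconstruction steps, not the multi-oracle protocol of the paper.

    # Minimal sketch of Shamir (t, n) secret sharing over GF(P); illustrative parameters.
    import random

    P = 2**61 - 1                     # a prime modulus (illustrative choice)

    def share(secret, t, n):
        """Split `secret` into n shares so that any t of them reconstruct it."""
        coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
        poly = lambda x: sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
        return [(i, poly(i)) for i in range(1, n + 1)]

    def reconstruct(shares):
        """Lagrange interpolation at 0 from any t shares."""
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = num * (-xj) % P
                    den = den * (xi - xj) % P
            secret = (secret + yi * num * pow(den, P - 2, P)) % P
        return secret

    shares = share(secret=42, t=3, n=5)
    assert reconstruct(shares[:3]) == 42
    assert reconstruct(shares[1:4]) == 42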
TR-11-89
Donald Beaver. 1989. ``Perfect Privacy for Two-Party Protocols.''
We examine the problem of computing functions in a distributed manner based on private inputs, revealing the value of a function without revealing any additional information about the inputs. A function $f(x_1,\ldots,x_n)$ is $t$-private if there is a protocol whereby $n$ parties, each holding an input value, can compute $f$, and no subset of $t$ or fewer parties gains any information other than what is computable from $f(x_1,\ldots,x_n)$. The class of $t$-private {\em boolean} functions for $t \geq \lceil \frac{n}{2} \rceil$ was described in [CK89]. We give a characterization of 1-private functions for two parties, without the restriction to the boolean case. We also examine a restricted form of private computation in the $n$-party case, and show that addition is the only privately computable function in that model.
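The positive side of the $n$-party statement, private addition, is usually realized with additive secret sharing; the Python fragment below is a generic sketch of that idea (the modulus and the names are illustrative), not necessarily the construction used in the report. Each party splits its input into $n$ random summands, so any proper subset of parties sees only uniformly random values, while the announced subtotals sum to the true result.

    # Sketch of private summation via additive secret sharing mod M; illustrative only.
    import random

    M = 2**32                              # working modulus (illustrative)

    def private_sum(inputs):
        n = len(inputs)
        # Each party i splits its input into n random summands that add up to it mod M.
        shares = [[random.randrange(M) for _ in range(n - 1)] for _ in inputs]
        for i, x in enumerate(inputs):
            shares[i].append((x - sum(shares[i])) % M)
        # Party j collects the j-th summand from every party and announces the subtotal.
        subtotals = [sum(shares[i][j] for i in range(n)) % M for j in range(n)]
        # The announced subtotals reveal only the overall sum.
        return sum(subtotals) % M

    inputs = [3, 14, 15, 92]
    assert private_sum(inputs) == sum(inputs) % M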
TR-12-89
Azer A. Bestavros. 1989. ``The Input Output Real-Time Automaton: A model for real-time parallel computation.''
In this paper, we propose a unified framework for embedded real-time systems in which it is possible to see the relation between issues of specification, implementation, correctness, and performance. The framework we suggest is based on the IORTA {\it (Input-Output Real-Time Automata)} model, an extension of the IOA model. An IORTA is an abstraction that encapsulates a system task. An embedded system is viewed as a set of interacting IORTAs. IORTAs communicate with each other and with the external environment using {\it signals}. A signal carries a sequence of {\it events}, where an event represents an instantiation of an {\it action} at a specific point in time. Actions can be generated either by the environment or by the IORTAs. Each IORTA has a {\it state}. The state of an IORTA is observable and can only be changed by local {\it computations}. Computations are triggered by actions and have to be scheduled to meet specific timing constraints. IORTAs can be {\it composed} to form higher-level IORTAs. A specification of an IORTA is a description of its behavior (i.e., how it reacts to stimuli from the environment). An IORTA is said to {\it implement} another IORTA if it is impossible to differentiate between their external behaviors. This is the primary tool used to verify that an implementation meets the required specification.
TR-13-89
Michael J. Kearns. 1989. ``The Computational Complexity of Machine Learning.''
This thesis is a study of the computational complexity of machine learning from examples in the distribution-free model introduced by L.G. Valiant. In the distribution-free model, a learning algorithm receives positive and negative examples of an unknown target set ({\it or concept}) that is chosen from some known class of sets ({\it concept class}). These examples are generated randomly according to a fixed but unknown probability distribution representing Nature, and the goal of the learning algorithm is to infer a hypothesis concept that closely approximates the target concept with respect to the unknown distribution. This thesis is concerned with proving theorems about learning in this formal mathematical model. We are interested in the phenomenon of {\it efficient} learning in the distribution-free model, in the standard polynomial-time sense. Our results include general tools for determining the polynomial-time learnability of a concept class, an extensive study of efficient learning when errors are present in the examples, and lower bounds on the number of examples required for learning in our model. A centerpiece of the thesis is a series of results demonstrating the computational difficulty of learning a number of well-studied concept classes. These results are obtained by reducing some apparently hard number-theoretic problems from cryptography to the learning problems. The hard-to-learn concept classes include the sets represented by Boolean formulae, deterministic finite automata, and a simplified form of neural networks. We also give algorithms for learning powerful concept classes under the uniform distribution, and give equivalences between natural models of efficient learnability. This thesis also includes detailed definitions and motivation for the distribution-free model, a chapter discussing past research in this model and related models, and a short list of important open problems.
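The success criterion in this model can be stated precisely (this is the standard formulation of distribution-free, or PAC, learning, which the thesis develops in detail): given accuracy and confidence parameters $\epsilon$ and $\delta$, the learner must, with probability at least $1-\delta$ over the random examples and its own coin flips, output a hypothesis $h$ whose error with respect to the target concept $c$ under the unknown distribution $D$ satisfies
$$ \Pr_{x \sim D}\big[\, h(x) \neq c(x) \,\big] \;\leq\; \epsilon, $$
and an efficient learner must do so in time polynomial in $1/\epsilon$, $1/\delta$, and the relevant size parameters.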
TR-14-89
Steven Lloyd Salzberg. 1989. ``Learning with Nested Generalized Exemplars.''
This thesis presents a theory of learning called nested generalized exemplar theory (NGE), in which learning is accomplished by storing objects in Euclidean $n$-space, $E^n$, as hyper-rectangles. The hyper-rectangles may be nested inside one another to arbitrary depth. In contrast to most generalization processes, which replace symbolic formulae by more general formulae, the generalization process for NGE learning modifies hyper-rectangles by growing and reshaping them in a well-defined fashion. The axes of these hyper-rectangles are defined by the variables measured for each example. Each variable can have any range on the real line; thus the theory is not restricted to symbolic or binary values. The basis of this theory is a psychological model called exemplar-based learning, in which examples are stored strictly as points in $E^n$. This thesis describes some advantages and disadvantages of NGE theory, positions it as a form of exemplar-based learning, and compares it to other inductive learning theories. An implementation has been tested on several different domains, four of which are presented in this thesis: predicting the recurrence of breast cancer, classifying iris flowers, predicting survival times for heart attack patients, and a discrete event simulation of a prediction task. The results in these domains are at least as good as, and in some cases significantly better than, those of other learning algorithms applied to the same data. Exemplar-based learning is emerging as a new direction for machine learning research. The main contribution of this thesis is to show how an exemplar-based theory, using nested generalizations to deal with exceptions, can be used to create very compact representations with excellent modelling capability.
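Classification in such a scheme typically assigns a new point to the nearest stored hyper-rectangle, where the distance is zero inside a rectangle and otherwise accumulates the per-dimension overshoot. The Python fragment below illustrates only this matching rule (unweighted distance, hypothetical names); the thesis's feature and exemplar weighting, nesting, and generalization procedures are omitted.

    # Sketch of nearest-hyper-rectangle matching; weighting and nesting rules omitted.
    def rect_distance(point, lower, upper):
        """Zero if `point` lies inside the box [lower, upper]; otherwise the
        Euclidean distance to the box, accumulated per dimension."""
        total = 0.0
        for x, lo, hi in zip(point, lower, upper):
            if x < lo:
                total += (lo - x) ** 2
            elif x > hi:
                total += (x - hi) ** 2
        return total ** 0.5

    def classify(point, exemplars):
        """`exemplars` is a list of (lower, upper, label) hyper-rectangles."""
        return min(exemplars, key=lambda e: rect_distance(point, e[0], e[1]))[2]

    exemplars = [([0, 0], [1, 1], "A"), ([2, 2], [3, 3], "B")]
    assert classify([0.5, 0.5], exemplars) == "A"
    assert classify([2.9, 3.5], exemplars) == "B"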
TR-15-89
Harry R. Lewis. 1989. ``Finite-State Analysis of Asynchronous Circuits with Bounded Temporal Uncertainty.''

TR-16-89
Jon Christensen, Joe Marks, Robert Walsh, and Mark Friedell. 1989. ``Flux Tracing: A Flexible Infrastructure for Global Shading.''
Flux tracing is a flexible, efficient, and easily implemented mechanism for determining scene intervisibility. Flux tracing can be combined with a variety of reflection models in several ways, yielding many different global-shading techniques. Several shading techniques based on flux tracing are illustrated. All provide intensity gradients and shadows with penumbras resulting from area light sources at finite distances. Some provide specular reflection and color bleeding, and some are extremely efficient. Flux tracing is both an expedient means of constructing an efficient global shader and a flexible tool for experimental development of global shading algorithms.
TR-17-89
Mark Friedell, Joe Marks, Robert Walsh, and Jon Christensen. 1989. ``Efficient Use of Image and Intervisibility Coherence in Rendering and Radiosity Calculations.''
Rendering algorithms based on image-area sampling were first proposed almost 20 years ago. In attempting to solve the hidden-surface problem for many pixels simultaneously, area-sampling algorithms are the most aggressive attempts to exploit image coherence. Although area-sampling algorithms are intuitively appealing, they usually do not perform well. When adequate image coherence is present, however, they can perform extremely well for some image regions. We present in this paper two new, hybrid rendering algorithms that combine area sampling with point-sampling techniques. In the presence of significant image coherence, these algorithms are considerably faster than any generally applicable alternative algorithms; for no image are they perceptibly slower. We also show how similar hybrid techniques can exploit intervisibility coherence to efficiently determine form factors for radiosity calculations.
TR-18-89
Lisa Rubin Neal. 1989. ``The Role of User Models in System Design.''
The inadequacy of system designers' models of users has led to difficulties in the use of systems and to their rejection. We formulate a multi-dimensional user model, of which two dimensions are related to learning and decision-making styles and additional dimensions are related to the several kinds of expertise involved in the use of a system. The incorporation of a user model characterizing relevant cognitive variables allows a system to be tailored to the needs and abilities of its users. We use a number of computer games as rich tasks which allow us to derive information about users. We present the results of our research examining computer games and discuss the implications of this research for system design.
TR-19-89
Yuh-Dauh Lyuu. 1989. ``Fast Fault-Tolerant Parallel Communication with Low Congestion and On-Line Maintenance Using Information Dispersal.''
The space-efficient Information Dispersal Algorithm (IDA) is applied to communication in parallel computers to achieve fast communication, low congestion, and fault tolerance on various networks. All schemes run {\em within\/} their stated time bounds without long delays. Let $N$ denote the size of the network. In the case of the hypercube, our communication scheme runs in $2 \cdot \log N + 1$ time using constant-size buffers. Its probability of successful routing is at least $1 - N^{-2.419 \cdot \log N + 1.5}$, proving Rabin's conjecture. The same scheme tolerates $N/(12 \cdot e \cdot \log N)$ random link failures with probability at least $1 - 2 \cdot N \cdot (\log N)^{-\log N/12}$ ($e = 2.718\ldots$). It can also tolerate $N/c$ random link failures with probability $1 - O(N^{-1})$ for some constant $c$. For a class of $d$-way shuffle networks, our scheme runs in $\approx 2 \cdot \ln N/\ln\ln N$ time using constant-size buffers. Its probability of successful routing is at least $1 - N^{-\ln N/2}$. The same scheme tolerates $N/12$ random link failures with probability at least $1 - N^{-\ln\ln\ln N/12}$. For a class of $d$-way digit-exchange networks, our scheme runs in $\approx 6 \cdot \ln N/\ln\ln N$ time using constant-size buffers. Its probability of successful routing is at least $1 - o(N^{-7 \cdot \ln N})$. The same scheme tolerates $N/(6 \cdot e)$ random link failures with probability at least $1 - 2 \cdot N^{-\ln\ln\ln N/2}$. Another fault model, in which links fail independently with a constant probability, is also considered. Numerical calculations show that with practical failure probabilities and sizes of the hypercube, our routing scheme for the hypercube performs well with high probability. On-line and efficient wire testing and replacement on the hypercube can be realized if our fault-tolerant routing scheme is used. Let $\alpha$ denote the total number of links. It is shown that $\approx \alpha/352$ wires can be disabled simultaneously without disrupting the ongoing computation or degrading the routing performance much.
TR-20-89
Mihaly Gereb-Graus. 1989. ``Lower Bounds on Parallel, Distributed and Automata Computations.''
In this thesis we present a collection of lower bound results. A hierarchy of complexity classes on tree languages (analogous to the polynomial hierarchy) accepted by alternating finite state machines is introduced. By separating the deterministic and the nondeterministic classes of our hierarchy, we give a negative answer to the folklore question of whether the expressive power of tree automata is the same as that of finite state automata that can walk on the edges of the tree (bug automata). We prove that a three-head one-way DFA cannot perform string matching; that is, no three-head one-way DFA accepts the language $L = \{x \# y \mid x \mbox{ is a substring of } y, \mbox{ where } x, y \in \{0,1\}^*\}$. We prove that in a one-round fair coin flipping (or voting) scheme with $n$ participants, there is at least one participant who has a chance to decide the outcome with probability at least $3/n - o(1/n)$. We prove an optimal lower bound on the average time required by any algorithm that merges two sorted lists on the parallel comparison tree model. We present a proof of a negative answer to a question raised by Skyum and Valiant, namely, whether the class of symmetric boolean functions has a p-complete family. We give a combinatorial characterization of the concept classes learnable from negative (or positive) examples only in the so-called distribution-free learning model.
TR-21-89
Vinod Kathail and Dan C. Stefanescu. 1989. ``A Data Mapping Parallel Language.''

TR-22-89
Athanasios Tsantilas. 1989. ``An Analysis of the Valiant-Brebner Hypercube Routing Algorithm.''
We present the best known analysis of the running time of the Valiant-Brebner algorithm for routing $h$-relations in the $n$-dimensional hypercube. Refining an elegant proof technique due to Ranade, we prove that for $h \sim n/\omega(n)$, where $\omega(n)$ tends to infinity arbitrarily slowly, the running time of this algorithm is $2n + O(n/\ln \omega(n))$ with very high probability. The same analysis holds for the directed $n$-butterfly as well.
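For context, the algorithm being analyzed is Valiant and Brebner's two-phase randomized routing: each packet is first sent to an independently chosen random intermediate node, correcting address bits in a fixed dimension order, and then routed from there to its true destination in the same greedy way. The Python fragment below computes the two-phase path of a single packet; queueing, buffers, and the $h$-relation setting analyzed here are omitted.

    # Sketch of Valiant-Brebner two-phase routing paths on the n-cube; single packet only.
    import random

    def greedy_path(src, dst, n):
        """Correct differing address bits in dimension order 0..n-1."""
        path, cur = [src], src
        for d in range(n):
            if (cur ^ dst) >> d & 1:
                cur ^= 1 << d
                path.append(cur)
        return path

    def valiant_brebner_path(src, dst, n):
        """Phase 1: route to a random intermediate node; Phase 2: route to dst."""
        mid = random.randrange(1 << n)
        return greedy_path(src, mid, n) + greedy_path(mid, dst, n)[1:]

    path = valiant_brebner_path(src=0b0000, dst=0b1011, n=4)
    assert path[0] == 0b0000 and path[-1] == 0b1011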
TR-23-89
C. Kaklamanis, D. Krizanc, and A. Tsantilas. 1989. ``Tight Bounds for Oblivious Routing in the Hypercube.''
We prove that given an $N$-node communication network with maximum indegree $d$, any deterministic oblivious algorithm for routing an arbitrary permutation requires $\Omega(\sqrt{N}/d)$ time. This is an improvement of a result by Borodin and Hopcroft. For the $N$-node hypercube, in particular, we show a matching upper bound by exhibiting a deterministic oblivious algorithm which routes any permutation in $O(\sqrt{N}/\log N)$ time. The best previously known upper bound was $O(\sqrt{N})$.
TR-24-89
Robert J. Walsh. 1989. ``SIMD Algorithms for Image Rendering.''
A parameterized cost model for SIMD machines is developed and used to rank projective image rendering algorithms. The ranking holds over an extremely broad range of remote-communication costs, so it applies in all practical situations. Ranked first is a new projective algorithm presented in the thesis. Empirical results support the ranking developed from the analytic models. The cost model is also applied to ray tracing, so that precomputed approximate sorts of surface primitives can be used rigorously to speed up parallel ray tracing. The general-purpose architecture assumption is justified by showing that special-purpose machines actually render images more slowly. We conjecture that neither SIMD nor MIMD architectures have a fundamental advantage when efficient algorithms are known for both machine types. The algorithms and analysis presented can serve as a model for SIMD computations in other application domains.