**TR-01-93**[tr-01-93.ps.gz (117 K)]-
S. Lennart Johnsson and Kapil K. Mathur. 1993. ``Massively Parallel Computing: Mathematics and Communications Libraries.''
*Massively parallel computing holds the promise of extreme performance. The utility of these systems will depend heavily upon the availability of libraries until compilation and run-time system technology is developed to a level comparable to what is common today on most uniprocessor systems. Critical for performance are the ability to exploit locality of reference and the effective management of the communication resources. We discuss some techniques for preserving locality of reference in distributed memory architectures. In particular, we discuss the benefits of multidimensional address spaces over conventional linearized address spaces, partitioning of irregular grids, and placement of partitions among nodes. Some of these techniques are supported as language directives, others as run-time system functions, and others still are part of the Connection Machine Scientific Software Library, CMSSL. We briefly discuss some of the unique design issues in this library for distributed memory architectures, and some of the novel ideas with respect to managing data allocation and automatic selection of algorithms with respect to performance. The CMSSL also includes a set of communication primitives that we have found very useful on the Connection Machine systems in implementing scientific and engineering applications. We briefly review some of the techniques used to fully utilize the bandwidth of the binary cube network of the CM-2 and CM-200 Connection Machine systems.*

**TR-02-93**[tr-02-93.ps.gz (130 K)]-
Kapil K. Mathur and S. Lennart Johnsson. 1993. ``All-to-All Communication on the Connection Machine CM-200.''
*Detailed algorithms for all-to-all broadcast and reduction are given for arrays mapped by binary or binary-reflected Gray code encoding to the processing nodes of binary cube networks. Algorithms are also given for the local computation of the array indices for the communicated data, thereby reducing the demand for communications bandwidth. For the Connection Machine system CM-200, Hamiltonian cycle based all-to-all communication algorithms yield a performance that is a factor of two to ten higher than the performance offered by algorithms based on trees, butterfly networks, or the Connection Machine router. The peak data rate achieved for all-to-all broadcast on a 2048 node Connection Machine system CM-200 is 5.4 Gbytes/sec when no reordering is required. If the time for data reordering is included, then the effective peak data rate is reduced to 2.5 Gbytes/sec.*

**TR-03-93**-
Alexandros Gerbessiotis. 1993. ``Topics in Parallel and Distributed Computation.''
*With advances in communication technology, the introduction of multiple-instruction multiple-data parallel computers, and the increasing interest in neural networks, the fields of parallel and distributed computation have received increasing attention in recent years. We study in this work the bulk-synchronous parallel model, which attempts to bridge the software and hardware worlds with respect to parallel computing. It offers a high-level abstraction of the hardware with the purpose of allowing parallel programs to run efficiently on diverse hardware platforms. We examine direct algorithms on this model and also give simulations of other models of parallel computation on this one, as well as on models that bypass it. While the term parallel computation refers to the execution of a single task or a set of closely coupled tasks by a set of processors, the term distributed computation refers to more loosely coupled or uncoupled tasks being executed at different locations. In a distributed computing environment it is sometimes necessary for one computer to send various pieces of information to the remaining ones. The term broadcasting describes the dissemination of information from one computer to the others in such an environment. We examine various classes of random graphs with respect to broadcasting and establish results related to the minimum time required to perform broadcasting from any vertex of such graphs. Various models of the human brain, as a collection of distributed elements working in parallel, have been proposed. Such elements are connected together in a network. The network in the human brain is sparse. How such sparse networks of simple elements can perform any useful computation is a topic currently little understood. We examine in this work a graph construction problem as it relates to neuron allocation in a recently proposed model of neural networks. We also examine a certain class of random graphs with respect to this problem and establish various results related to the distribution of the sizes of sets of neurons when various learning tasks are performed on this model. Experimental results are also presented and compared to theoretically derived ones.*

**TR-04-93**-
Barbara Grosz, H.T. Kung, Margo Seltzer, Stuart Shieber, and Michael Smith. 1993. ``Infrastructure for Research towards Ubiquitous Information Systems.''
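An illustrative aside on the broadcasting problem studied in TR-03-93 above: under the simplest synchronous model, in which every informed vertex informs all of its neighbours in each round, the broadcast time from a vertex is exactly its BFS eccentricity. A minimal sketch (Python; the function name and the all-neighbours model are our assumptions, not taken from the thesis):

```python
from collections import deque

def broadcast_time(adj, source):
    """Rounds needed to inform every vertex, assuming each informed
    vertex informs all of its neighbours in one synchronous round,
    so the answer is the BFS eccentricity of the source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) != len(adj):
        return None  # graph disconnected: broadcast impossible
    return max(dist.values())

# A 5-cycle: broadcasting from any vertex takes 2 rounds.
cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(broadcast_time(cycle, 0))  # -> 2
```

The thesis's results concern tighter models (and random graphs), where broadcast time generally exceeds the eccentricity; this sketch only fixes the vocabulary.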
**TR-05-93**-
Dan Stefanescu and Yuli Zhou. 1993. ``An Equational Framework for the Flow Analysis of Higher Order Functional Programs.''
*This paper presents a novel technique for the static analysis of functional programs. The method uses the original Cousots' framework, expanded by a syntax-based abstraction methodology. The main idea is to represent each computational entity in a functional program in relation to its (concrete) call string, i.e., the string of function calls leading to its computation. The abstraction criterion then consists in choosing an equivalence relation over the set of all call strings. Based on this equivalence relation, the method generates a monotonic system of equations whose least solution is the desired result of the analysis. This approach generalizes previous techniques in flow analysis (0CFA, 1CFA, etc.) and allows for program-directed design of frameworks for approximate analysis of programs. The method is proven correct with respect to an operational semantics based on a rewriting system.*

**TR-06-93**[tr-06-93.ps.gz (265 K)]-
Zdenek Johan, Kapil K. Mathur, S. Lennart Johnsson, and Thomas J.R. Hughes. 1993. ``An Efficient Communication Strategy for Finite Element Methods on the Connection Machine CM-5 System.''
*Performance of finite element solvers on parallel computers such as the Connection Machine CM-5 system is directly related to the efficiency of the communication strategy. The objective of this work is two-fold: First, we propose a data-parallel implementation of a partitioning algorithm used to decompose unstructured meshes. The mesh partitions are then mapped to the vector units of the CM-5. Second, we design gather and scatter operations taking advantage of data locality coming from the decomposition to reduce the communication time. This new communication strategy is available in the CMSSL [8]. An example illustrates the performance of the proposed strategy.*

**TR-07-93**[tr-07-93.ps.gz (37 K)]-
Kapil K. Mathur and S. Lennart Johnsson. 1993. ``All-to-All Communication Algorithms for Distributed BLAS.''
*Dense Distributed Basic Linear Algebra Subroutine (DBLAS) algorithms based on all-to-all broadcast and all-to-all reduce are presented. For DBLAS, at each all-to-all step it is necessary to know not only the data values but also their indices. This is in contrast to the more traditional applications of all-to-all broadcast (such as an N-body solver), where the identity of the data values is not of much interest. Detailed schedules for all-to-all broadcast and reduction are given for the data motion of arrays mapped to the processing nodes of binary cube networks using binary encoding and binary-reflected Gray encoding. The algorithms compute the indices for the communicated data locally. No communication bandwidth is consumed for data array indices. For the Connection Machine system CM-200, Hamiltonian cycle based all-to-all communication algorithms improve the performance by a factor of two to ten over a combination of tree, butterfly network, and router based algorithms. The data rate achieved for all-to-all broadcast on a 256 node Connection Machine system CM-200 is 0.3 Gbytes/sec. The data motion rate for all-to-all broadcast, including the time for index computations and local data reordering, is about 2.8 Gbytes/sec for a 2048 node system. Excluding the time for index computation and local memory reordering, the measured data motion rate for all-to-all broadcast is 5.6 Gbytes/s. On a Connection Machine system CM-200 with 2048 processing nodes, the overall performance of the distributed matrix vector multiply (DGEMV) and vector matrix multiply (DGEMV with TRANS) is 10.5 Gflops/s and 13.7 Gflops/s, respectively.*

**TR-08-93**[tr-08-93.ps.gz (516 K)]-
Kapil K. Mathur, Zdenek Johan, S. Lennart Johnsson, and Thomas J.R. Hughes. 1993. ``Massively Parallel Computing: Unstructured Finite Element Simulations.''
*Massively parallel computing holds the promise of extreme performance. Critical for achieving high performance is the ability to exploit locality of reference and effective management of the communication resources. This article describes two communication primitives and associated mapping strategies that have been used for several different unstructured, three-dimensional, finite element applications in computational fluid dynamics and structural mechanics.*

**TR-09-93**[tr-09-93.ps.gz (34 K)]-
S. Lennart Johnsson. 1993. ``The Connection Machine Systems CM-5.''
*The Connection Machine system CM-5 is a parallel computing system scalable to Tflop performance, hundreds of Gbytes of primary storage, Tbytes of secondary storage, and Gbytes/s of I/O bandwidth. The system has been designed to be scalable over a range of up to three orders of magnitude. We will discuss the design goals, innovative software and hardware features of the CM-5 system, and some experience with the system.*

**TR-10-93**[tr-10-93.ps.gz (80 K)]-
Ed Dengler, Mark Friedell, and Joe Marks. 1993. ``Constraint-Driven Diagram Layout.''
*Taking both perceptual organization and aesthetic criteria into account is the key to high-quality diagram layout, but makes for a more difficult problem than pure aesthetic layout. Computing the layout of a network diagram that exhibits a specified perceptual organization can be phrased as a constraint-satisfaction problem. Some constraints are derived from the perceptual-organization specification: the nodes in the diagram must be positioned so that they form specified perceptual gestalts, i.e., certain groups of nodes must form perceptual groupings by proximity, symmetry, shape motif, etc. Additional constraints are derived from aesthetic considerations: the layout should satisfy criteria that concern the number of link crossings, the sum of link lengths, diagram area, etc. Using a generalization of a simple mass-spring layout technique to ``satisfice'' constraints, we show how to produce high-quality layouts with specified perceptual organization for medium-sized diagrams (10--30 nodes) in under 30 seconds on a workstation.*

**TR-11-93**[tr-11-93.ps.gz (1341 K)]-
Zdenek Johan, Kapil K. Mathur, S. Lennart Johnsson, and Thomas J.R. Hughes. 1993. ``An Efficient Communication Strategy for Finite Element Methods on the Connection Machine CM-5 System.''
*The objective of this paper is to propose communication procedures suitable for unstructured finite element solvers implemented on distributed-memory parallel computers such as the Connection Machine CM-5 system. First, a data-parallel implementation of the recursive spectral bisection (RSB) algorithm proposed by Pothen {\em et al.} is presented. The RSB algorithm is associated with a node renumbering scheme which improves data locality of reference. Two-step gather and scatter operations taking advantage of this data locality are then designed. These communication primitives make use of the indirect addressing capability of the CM-5 vector units to achieve high gather and scatter bandwidths. The efficiency of the proposed communication strategy is illustrated on large-scale three-dimensional fluid dynamics problems.*

**TR-12-93**-
Stanley F. Chen. 1993. ``Aligning Sentences in Bilingual Corpora Using Lexical Information.''
*In this paper, we describe a fast algorithm for aligning sentences with their translations in a bilingual corpus. Existing efficient algorithms ignore word identities and consider only sentence length (Brown {\it et al.}, 1991; Gale and Church, 1991). Our algorithm constructs a simple statistical word-to-word translation model on the fly during alignment. We find the alignment that maximizes the probability of generating the corpus with this translation model. We have achieved an error rate of approximately 0.4% on Canadian Hansard data, which is a significant improvement over previous results. The algorithm is language independent.*

**TR-13-93**[tr-13-93.ps.gz (149 K)]-
Zhenyu Li and Victor Milenkovic. 1993. ``Compaction and Separation Algorithms for Non-Convex Polygons and Their Applications.''
*Given a two-dimensional, non-overlapping layout of convex and non-convex polygons, compaction can be thought of as simulating the motion of the polygons as a result of applied ``forces.'' We apply compaction to improve the material utilization of an already tightly packed layout. Compaction can be modeled as a motion of the polygons that reduces the value of some functional on their positions. Optimal compaction, planning a motion that reaches a layout with the global minimum functional value among all reachable layouts, is shown to be NP-complete under certain assumptions. We first present a compaction algorithm based on existing physical simulation approaches. This algorithm uses a new velocity-based optimization model. Our experimental results reveal the limitation of physical simulation: even though our new model improves the running time of our algorithm over previous simulation algorithms, the algorithm still cannot compact typical layouts of one hundred or more polygons in a reasonable amount of time. The essential difficulty of physically based models is that they can only generate velocities for the polygons, and the final positions must be generated by numerical integration. We present a new position-based optimization model that allows us to calculate new polygon positions directly via linear programming, positions that are at a local minimum of the objective. The new model yields a translational compaction algorithm that runs two orders of magnitude faster than physical simulation methods. We also consider the problem of separating overlapping polygons using a minimal amount of motion and show it to be NP-complete. Although this separation problem looks quite different from the compaction problem, our new model also yields an efficient algorithm to solve it. The compaction/separation algorithms have been applied to marker making: the task of packing polygonal pieces on a sheet of cloth of fixed width so that total length is minimized. The compaction algorithm has improved the cloth utilization of human-generated pants markers. The separation algorithm, together with a database of human-generated markers, can be used for automatic generation of markers that approach human performance.*

**TR-14-93**-
Eyal Kushilevitz, Silvio Micali, and Rafail Ostrovsky. 1993. ``Universal Boolean Judges and Their Characterization.''
*We consider the classic problem of $n$ honest (but curious) players with private inputs $x_1,\ldots,x_n$ who wish to compute the value of some pre-determined function $f(x_1,\ldots,x_n)$, so that at the end of the protocol every player knows the value of $f(x_1,\ldots,x_n)$. The players have unbounded computational resources and they wish to compute $f$ in a totally {\em private\/} ($n$-private) way. That is, after the completion of the protocol, which all players honestly follow, no coalition (of arbitrary size) can infer any information about the private inputs of the remaining players beyond what is already revealed by the value of $f(x_1,\ldots,x_n)$. Of course, with the help of a {\em trusted judge for computing $f$}, players can trivially compute $f$ in a totally private manner: every player secretly gives his input to the trusted judge and she announces the result. Previous research was directed towards implementing such a judge ``mentally'' by the players themselves, and this was shown possible under various assumptions. Without assumptions, however, it was shown that most functions {\em cannot\/} be computed in a totally private manner, and thus we must rely on a trusted judge. If we have a trusted judge for $f$ we are done. Can we use a judge for a ``simpler'' function $g$ in order to compute $f$ $n$-privately? In this paper we initiate the study of the {\em complexity\/} of such judges needed to achieve total privacy for arbitrary $f$. We answer the following two questions: {\em How complicated should such a judge be, compared to $f$?} and {\em Does there exist some judge which can be used for all $f$?} We show that there exist {\bf universal boolean} judges (i.e., ones that can be used for any $f$) and give a complete characterization of all the boolean functions which describe universal judges. In fact, we show that a judge computing {\em any\/} boolean function $g$ which itself cannot be computed $n$-privately (i.e., when there is no judge available) is {\em universal\/}. Thus, we show that for all boolean functions, the notions of {\bf universality} and {\bf $n$-privacy} are {\em complementary\/}. On the other hand, for non-boolean functions, we show that these two notions are {\em not\/} complementary. Our result can be viewed as a strong generalization of the two-party case, where Oblivious Transfer protocols were shown to be universal.*

**TR-15-93**-
Thomas Cheatham. 1993. ``An Unbundled Compiler.''
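An aside on the trusted-judge baseline in TR-14-93 above: the trivial totally private protocol is easy to state concretely. A minimal sketch (Python; the function names are ours, purely illustrative):

```python
from functools import reduce
from operator import xor

def trusted_judge(f, private_inputs):
    # Trivial totally private protocol: every player secretly hands its
    # input to a trusted judge, who announces only f(x1, ..., xn).
    # Nothing beyond the announced value is ever made public.
    return f(private_inputs)

# Four players privately computing the XOR of their input bits.
print(trusted_judge(lambda xs: reduce(xor, xs), [1, 0, 1, 1]))  # -> 1
```

The report's question is how complicated a judge must be when she computes some simpler $g$ rather than $f$ itself; this sketch only fixes the baseline being generalized.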
**TR-16-93**-
Cecile Tiberghien Balkanski. 1993. ``Actions, Beliefs and Intentions in Multi-Action Utterances.''
*Multi-action utterances convey critical information about agents' beliefs and intentions with respect to the actions they talk about or perform. Two such utterances may, for example, describe the same actions while the speakers of these utterances hold beliefs about these actions that are diametrically opposed. Hence, for a language interpretation system to understand multi-action utterances, it must be able (1) to determine the actions that are described and the ways in which they are related, and (2) to draw appropriate inferences about the agents' mental states with respect to these actions and action relations. This thesis investigates the semantics of two particular multi-action constructions: utterances with means clauses and utterances with rationale clauses. These classes of utterances are of interest not only as exemplars of multi-action utterances, but also because of the subtle differences in information that can be felicitously inferred from their use. Their meaning is shown to depend on the beliefs and intentions of the speaker and agents whose actions are being described as well as on the actions themselves. Thus, the thesis demonstrates (a) that consideration of mental states cannot be reserved to pragmatics and (b) that other aspects of natural language interpretation besides the interpretation of mental state verbs or plan recognition may provide information about mental states. To account for this aspect of natural language interpretation, this thesis presents a theory of logical form, a theory of action and action relations, an axiomatization of belief and intention, and interpretation rules for means clauses and rationale clauses. Together these different pieces constitute an interpretation model that meets the requirements specified in (1) and (2) above and that predicts the set of beliefs and intentions shown to be characteristic of utterances with means clauses and rationale clauses. 
This model has been implemented in the MAUI system (Multi-Action Utterance Interpreter), which accepts natural language sentences from a user, computes their logical form, and answers questions about the beliefs and intentions of the speaker and actor regarding the actions and action relations described.*

**TR-17-93**[tr-17-93.ps.gz (452 K)]-
Wheeler Ruml. 1993. ``Stochastic Approximation Algorithms for Number Partitioning.''
*This report summarizes research on algorithms for finding particularly good solutions to instances of the NP-complete number-partitioning problem. Our approach is based on stochastic search algorithms, which iteratively improve randomly chosen initial solutions. Instead of searching the space of all $2^{n-1}$ possible partitionings, however, we use these algorithms to manipulate indirect encodings of candidate solutions. An encoded solution is evaluated by a decoder, which interprets the encoding as instructions for constructing a partitioning of a given problem instance. We present several different solution encodings, including bit strings, permutations, and rule sets, and describe decoding algorithms for them. Our empirical results show that many of these encodings restrict and reshape the solution space in ways that allow relatively generic search methods, such as hill climbing, simulated annealing, and the genetic algorithm, to find solutions that are often as good as those produced by the best known constructive heuristic, and in many cases far superior. For the algorithms and representations we consider, the choice of solution representation plays an even greater role in determining performance than the choice of search algorithm.*

**TR-18-93**[tr-18-93.ps.gz (186 K)]-
William George, Ralph G. Brickner, and S. Lennart Johnsson. 1993. ``POLYSHIFT Communications Software for the Connection Machine System CM-200.''
*We describe the use and implementation of a polyshift function {\bf PSHIFT} for circular shifts and end-off shifts. Polyshift is useful in many scientific codes using regular grids, such as finite difference codes in several dimensions, multigrid codes, molecular dynamics computations, and lattice gauge physics computations, such as quantum chromodynamics (QCD) calculations. Our implementation of the {\bf PSHIFT} function on the Connection Machine systems CM-2 and CM-200 offers a speedup of up to a factor of 3--4 compared to {\bf CSHIFT} when the local data motion within a node is small. The {\bf PSHIFT} routine is included in the Connection Machine Scientific Software Library (CMSSL).*

**TR-19-93**[tr-19-93.ps.gz (152 K)]-
S. Lennart Johnsson and Kapil K. Mathur. 1993. ``High Performance, Scalable Scientific Software Libraries.''
*Massively parallel processors introduce new demands on software systems with respect to performance, scalability, robustness, and portability. The increased complexity of the memory systems and the increased range of problem sizes for which a given piece of software is used pose serious challenges to software developers. The Connection Machine Scientific Software Library, CMSSL, uses several novel techniques to meet these challenges. The CMSSL contains routines for managing the data distribution and provides data distribution independent functionality. High performance is achieved through careful scheduling of operations and data motion, and through the automatic selection of algorithms at run-time. We discuss some of the techniques used, and provide evidence that CMSSL has reached the goals of performance and scalability for an important set of applications.*

**TR-20-93**[tr-20-93.ps.gz (128 K)]-
Karen Lochbaum. 1993. ``A Collaborative Planning Approach to Discourse Understanding.''
*Approaches to discourse understanding fall roughly into two categories: those that treat mental states as elemental and thus reason directly about them, and those that do not reason about the beliefs and intentions of agents themselves, but about the propositions and actions that might be considered objects of those beliefs and intentions. The first type of approach is a mental phenomenon approach, the second a data-structure approach (Pollack, 1986b). In this paper, we present a mental phenomenon approach to discourse understanding and demonstrate its advantages over the data-structure approaches used by other researchers. The model we present is based on the collaborative planning framework of SharedPlans (Grosz and Sidner, 1990; Lochbaum, Grosz, and Sidner, 1990; Grosz and Kraus, 1993). SharedPlans are shown to provide a computationally realizable model of the intentional component of Grosz and Sidner's theory of discourse structure (Grosz and Sidner, 1986). Additionally, this model is shown to simplify and extend approaches to discourse understanding that introduce multiple types of plans to model an agent's motivations for producing an utterance (Litman, 1985; Litman and Allen, 1987; Ramshaw, 1991; Lambert and Carberry, 1991).*

**TR-21-93**[tr-21-93.ps.gz (1215 K)]-
Ellie Baker and Margo Seltzer. 1993. ``Evolving Line Drawings.''
*This paper explores the application of interactive genetic algorithms to the creation of line drawings. We have built a system that starts with a collection of drawings that are either randomly generated or input by the user. The user selects one such drawing to mutate or two to mate, and a new generation of drawings is produced by randomly modifying or combining the selected drawing(s). This process of selection and procreation is repeated many times to evolve a drawing. A wide variety of complex sketches with highlighting and shading can be evolved from very simple drawings. This technique has enormous potential for augmenting and enhancing the power of traditional computer-aided drawing tools, and for expanding the repertoire of the computer-assisted artist.*

**TR-22-93**[tr-22-93.ps.gz (72 K)]-
Ralph G. Brickner, William George, S. Lennart Johnsson, and Alan Ruttenberg. 1993. ``A Stencil Compiler for the Connection Machine Models CM-2/200.''
*In this paper we present a stencil compiler for the Connection Machine models CM-2 and CM-200. A {\em stencil} is a weighted sum of circularly shifted CM Fortran arrays. The stencil compiler optimizes the data motion between processing nodes, minimizes the data motion within a node, and minimizes the data motion between registers and local memory in a node. The compiler makes novel use of the communication system and has highly optimized register use. The compiler natively supports two-dimensional stencils, but stencils in three or four dimensions are automatically decomposed. Portions of the system are integrated as part of the CM Fortran programming system, and also as part of the system microcode. The compiler is available as part of the Connection Machine Scientific Software Library (CMSSL) Release 3.1.*

**TR-23-93**[tr-23-93.ps.gz (74 K)]-
S. Lennart Johnsson. 1993. ``CMSSL: A Scalable Scientific Software Library.''
*Massively parallel processors introduce new demands on software systems with respect to performance, scalability, robustness, and portability. The increased complexity of the memory systems and the increased range of problem sizes for which a given piece of software is used pose serious challenges for software developers. The Connection Machine Scientific Software Library, CMSSL, uses several novel techniques to meet these challenges. The CMSSL contains routines for managing the data distribution and provides data distribution independent functionality. High performance is achieved through careful scheduling of operations and data motion, and through the automatic selection of algorithms at run-time. We discuss some of the techniques used, and provide evidence that CMSSL has reached the goals of performance and scalability for an important set of applications.*

**TR-24-93**[tr-24-93.ps.gz (66 K)]-
Kathy Ryall, Joe Marks, Murray Mazer, and Stuart Shieber. 1993. ``Annotating Floor Plans Using Deformable Polygons.''
*The ability to recognize regions in a bitmap image has applications in various areas, from document recognition of scanned building floor plans to processing of scanned forms. We consider the use of deformable polygons for delineating partially or fully bounded regions of a scanned bitmap that depicts a building floor plan. We discuss a semi-automated interactive system, in which a user positions a seed polygon in an area of interest in the image. The computer then expands and deforms the polygon in an attempt to minimize an energy function that is defined so that configurations with minimum energy tend to match the subjective boundaries of regions in the image. When the deformation process is completed, the user may edit the deformed polygon to make it conform more closely to the desired region. In contrast to area-filling techniques for delineating areal regions of images, our approach works robustly for partially bounded regions.*
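A closing illustration of the energy-minimizing deformation idea in TR-24-93: a toy sketch only. The real system's energy terms measure agreement with the subjective region boundaries of the bitmap; here we substitute a simple geometric target (vertices attracted to a circle) to show the greedy "expand and deform to lower the energy" loop. All names are ours (Python, illustrative):

```python
import math

def ring_energy(vertices):
    # Toy stand-in for the image-based energy: low when every vertex
    # lies on a circle of radius 5 centred at the origin.
    return sum((math.hypot(x, y) - 5.0) ** 2 for x, y in vertices)

def deform(vertices, energy, step=1.0, iters=50):
    # Greedy coordinate descent: try moving each vertex one step in the
    # four axis directions and keep the move that lowers the energy most.
    vs = [list(v) for v in vertices]
    for _ in range(iters):
        moved = False
        for i, (x, y) in enumerate(vs):
            best, best_e = (x, y), energy(vs)
            for dx, dy in ((step, 0), (-step, 0), (0, step), (0, -step)):
                vs[i] = [x + dx, y + dy]
                e = energy(vs)
                if e < best_e:
                    best, best_e = (x + dx, y + dy), e
            vs[i] = list(best)
            moved = moved or best != (x, y)
        if not moved:
            break  # local minimum reached: no single-vertex move helps
    return vs

seed = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0), (2.0, -2.0)]
relaxed = deform(seed, ring_energy)
print(ring_energy(relaxed) < 1e-9)  # vertices pulled onto the circle
```

In the actual system the energy also rewards matching partially bounded region edges in the scanned image, which is what lets the approach succeed where area filling fails.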