2004 Corporate Affiliates Meeting
Computer Science
POSTER SESSION
Martell Hall
Monday, October 18
3:15 p.m.
| Presenter | Authors | Title/Abstract/Poster |
|---|---|---|
| Kostas Bekris | Kostas Bekris, Lydia Kavraki | Robotic Sensor Networks and Applications in Autonomous Navigation |
| Michael Calhoun | Michael Calhoun, Scott Rixner | Profiling TCP Segmentation Offload on Linux |
| Brian Chen | Brian Chen, Viacheslav Fofanov, David Kristensen, Marek Kimmel, Olivier Lichtarge, Professor Lydia Kavraki | Algorithms for Structural Comparison and Statistical Analysis of 3D Protein |
| Anwis Das | Anwis Das, Scott Crosby, Dan Wallach | On the Usefulness of the Web of Trust |
| Anshu Dasgupta | Keith Cooper, Anshuman Dasgupta, Jason Eckhardt | Graph-Coloring Register Allocators Revisited |
| Yuri Dotsenko | Cristian Coarfa, Yuri Dotsenko, John Mellor-Crummey | Multi-platform Co-array Fortran Compiler |
| John Garvin | Bradley Broom, John Garvin, John Mellor-Crummey | RCC: A Compiler for the R Language for Statistical Computing |
| Alexander Grosul | Keith Cooper, Alex Grosul, Timothy Harvey, Steven Reeves, Devika Subramanian, Linda Torczon, Todd Waterman | Ordering of Code Transformations in the Optimizing Compiler |
| Ajay Gulati | Ajay Gulati and Peter Varman | Scheduling with QoS in Parallel I/O Systems |
| Andreas Haeberlen | Eliot Flannery, Andreas Haeberlen, Lydia Kavraki, Andrew Ladd, Algis Rudys, Dan Wallach | Practical Robust Localization over Large-Scale 802.11 Wireless Networks |
| Mackale Joyner | Zoran Budimlic, Mackale Joyner, Ken Kennedy, Rui Zhang | Improving Object-Inlining for High-Performance Java Scientific Applications |
| Tao Ju | Tao Ju | Robust Repair of Polygonal Models |
| Hyong-Youb Kim | Hyong-Youb Kim, Scott Rixner | TCP Acceleration: Connection Handoff to the Network Interface |
| Anirban Mandal | Anirban Mandal, Ken Kennedy et al | Scheduling Workflows on the Grid |
| Gabriel Marin | Gabriel Marin, John Mellor-Crummey | Developing Scalable Cross-Architecture Predictions of Memory Latency for Scientific Applications |
| Ben McMahan | Ben McMahan, Moshe Vardi | Projection Pushing Revisited |
| Santashil PalChaudhuri | Shu Du, Santashil PalChaudhuri, Amit Kumar Saha, Khoa To | Physical Implementation and Evaluation of Ad Hoc Network Routing Protocols Using Unmodified Simulation Models |
| Guoqiang Pan | Guoqiang Pan, Moshe Vardi | Symbolic Decision Procedures for QBF |
| Erion Plaku | Lydia Kavraki, Erion Plaku | Distributed Probabilistic Roadmap of Trees for Large-Scale Motion Planning |
| Mathias Ricken | Robert "Corky" Cartwright, Mathias Ricken | Unit Testing for Concurrent Programs |
| David Schwarz | Allison Heath, Lydia Kavraki, Mark Moll, David Schwarz | Receptor Flexibility for Drug Design Using Collective Coordinate Expansive Spaces |
| Amarda Shehu | Lydia Kavraki, Amarda Shehu, Hernan Stamati, Ming Zhang | Solving Molecular Kinematics with Constraints |
| Elspeth Simpson | Robert "Corky" Cartwright, James Hsia, Elspeth Simpson, Dan Smith | Taming Java for the Classroom |
| Deian Tabakov | Deian Tabakov, Moshe Vardi | Who is the Fastest? |
| Jeff Toole | Jeff Toole | Outdoor Navigation: Monte-Carlo Localization |
| Todd Waterman | Keith Cooper, Todd Waterman | Adaptive Inlining |
| Paul Willmann | Hyong-youb Kim, Scott Rixner, Vijay Pai, Paul Willmann | An Efficient Programmable 10 Gigabit Ethernet Network Interface Card |
Robotic Sensor Networks and Applications in Autonomous Navigation
Sensor networks have emerged as a promising research area due to recent advances in embedded systems and wireless networking that have allowed the construction of sensing devices that can be easily embedded in the real world. On the other hand, an increasing amount of work in the robotics literature has moved towards distributed research, where multiple robots are cooperating to achieve a task. The intersection of the two fields, wireless sensor networks and distributed robotics, has given rise to a new research paradigm, the robotic sensor networks. Although this is a recently developed research area, there has already been a significant amount of work that demands for a review and a proper classification. In this poster we provide a taxonomy of related research in robotic sensor networks and cover some of the recent developments in the field. Furthermore, we focus on the problem of navigating a robotic platform with the aid of sensor network and study various versions of this problem, from the case that the robot is aware of the map of the environment to the case that only the received signal strength and the network's connectivity are available. The goal of this research if to show that the synergy of robotics and networking can already produce cheap, yet efficient systems that will be able to autonomously interact with the environment and reduce the burden of human operators in cumbersome tasks. POSTER
Profiling TCP Segmentation Offload on Linux
TCP Segmentation Offload (TSO) is a hardware modification that seeks to reduce the load on the host CPU by offloading the segmentation of TCP/IP packets. Typically, the operating system creates packets by dividing the data to be sent into small segments, appending headers to each of them, and then passing these packets out onto the network. However, TSO allows the operating system, with the help of the driver, to create a larger TSO frame of up to 64KB in size (instead of the usual 1.5K). This larger frame is passed to the card where it is broken into regular sized packets and delivered onto the network. TSO reduces the computational requirements of sending data on the network by both reducing computational tasks (such as segmentation and header creation) and improving the efficiency of the memory management in the operating system (such as memory/buffer allocation/de-allocation and page alignment). This modification improves CPU utilization for bulk TCP sends and in applications with a large computational and networking component (such as MPI, NFS, and web servers). POSTER
Algorithms for Structural Comparison and Statistical Analysis of 3D Protein
The comparison of structural subsites in proteins is increasingly relevant to the prediction of their biological function. To address this problem, we present the Match Augmentation algorithm (MA). Given a structural motif of interest, such as a functional site, MA searches a target protein structure for a match: the set of atoms with the greatest geometric and chemical similarity. MA is extremely efficient because it exploits the fact that the amino acids in a structural motif are not equally important to function. Using motif residues ranked on functional significance via the Evolutionary Trace (ET), MA prioritizes its search by initially forming matches with functionally significant residues, then, guided by ET, it augments this partial match stepwise until the whole motif is found. With this hierarchical strategy, MA runs considerably faster than other methods, and almost always identifies matches in homologs known to have cognate functional sites. Second, in order to interpret matches, we further introduce a statistical method using nonparametric density estimation of the frequency distribution of structural matches. Our results show that the hierarchy of functional importance within structural motifs speeds up the search within targets, and points to a new method to score their statistical significance. POSTER
On the Usefulness of the Web of Trust
The most common way to scale the public key system is to build it without a fixed infrastructure. Such systems are based on entitities verifying and signing the keys of entities that they have previously authenticated. This research aims to solve the problem of whether such systems have the potential to work if deployed. First, we build a framework to model the problem. Then, we apply the framework to real large world data sets and provide analysis on various scenarios. POSTER
Graph-Coloring Register Allocators RevisitedGraph coloring register allocators have been well studied and widely implemented in compiler infrastructures. Traditional graph coloring allocators do not consider program structure while making coloring and spill decisions. In our work, we study the impact of incorporating control flow information into a graph coloring allocation framework. To this end, we evaluate the Callahan-Koblenz algorithm that uses program structure to intelligently place spill code. We compare this strategy to the Chaitin-Briggs allocator -- a relatively well-understood graph coloring algorithm that does not use program structure to guide spill decisions. Our preliminary evaluation indicates that incorporating program structure into the algorithm can significantly enhance the resulting allocations. POSTER
Multi-platform Co-array Fortran CompilerCo-array Fortran (CAF)---a small set of extensions to Fortran 90---is an emerging model for scalable, global address space parallel programming. CAF's global address space programming model simplifies the development of single-program-multiple-data parallel programs by shifting the burden for managing the details of communication from developers to compilers. We present a prototype implementation of an open-source, multiplatform CAF compiler that generates code well-suited for today's commodity clusters as well as hardware shared memory platforms. The CAF compiler translates CAF into Fortran 90 plus calls to one-sided communication primitives. We describe key details of our compiler to generating efficient code for multiple platforms. Experiments compare the performance of CAF and MPI versions of a broad range of codes on an Alpha cluster with a Quadrics interconnect, an Itanium 2 cluster with a Myrinet 2000 interconnect, an Itanium 2 cluster with a Quadrics interconnect and SGI Altix 3000. These experiments show that our CAF compiler transforms CAF programs into code that delivers performance comparable to that of hand-optimized MPI programs for such applications as neutron transport problem (Sweep3D), earthquake simulation (Spark98) and NASA computational fluid dynamics applications (NAS benchmarks). POSTER
RCC: A Compiler for the R Language for Statistical ComputingThe R language is widely used for bioinformatics and biostatistics applications. R enables scientists to get quick results from programs written at a high conceptual level, but the performance of the R interpreter is unacceptable for heavy-duty applications. Translating R programs into low-level C or Fortran by hand is time-intensive and error-prone. RCC (http://hipersoft.cs.rice.edu/rcc/) is a compiler developed at Rice that automatically translates R code into C. RCC uses the interpreter's runtime libraries to achieve a complete translation, including R's intensive features, including introspection and self-modifying code. Translating R into C with a few simple optimizations currently yields performance ranging from equal to interpreted code to a factor of 3 improvement for various test codes. RCC enables further optimization via intensive code analysis on R syntax trees and source-to-source analysis and transformation on C output. Hand optimization experiments suggest that factor improvements of over 100 may eventually be possible.
Ordering of Code Transformations in the Optimizing CompilerThe fixed order of application of code transformations used in most modern optimizing compilers does not produce best results over all input programs. This poster describes how we construct program-specific orderings of transformations that result in improvements of 5%-40% across the range of input programs and compilation objectives. POSTER
Scheduling with QoS in Parallel I/O SystemsParallel I/O architectures are increasingly deployed for high performance computing and in shared data centers. In these environments it is desirable to provide QoS-based allocation of disk bandwidth to different applications sharingthe I/O system. In this poster, we present a model of disk bandwidth allocation, and provide efficient scheduling algorithms to assign the bandwidth among the concurrent applications. POSTER
Practical Robust Localization over Large-Scale 802.11 Wireless NetworksWe demonstrate a system built using probabilistic techniques that allows for remarkably accurate localization across our entire office building using nothing more than the built-in signal intensity meter supplied by standard 802.11 cards. While prior systems have required significant investments of human labor to build a detailed signal map, we can train our system by spending less than one minute per office or region, walking around with a laptop and recording the observed signal intensities of our building's unmodified base stations. We actually collected over two minutes of data per office or region, about 28 man-hours of effort. Using less than half of this data to train the localizer, we can localize a user to the precise, correct location in over 95% of our attempts, across the entire building. Even in the most pathological cases, we almost never localize a user any more distant than to the neighboring office. A user can obtain this level of accuracy with only two or three signal intensity measurements, allowing for a high frame rate of localization results. Furthermore, with a brief calibration period, our system can be adapted to work with previously unknown user hardware. We present results demonstrating the robustness of our system against a variety of untrained time-varying phenomena, including the presence or absence of people in the building across the day. Our system is sufficiently robust to enable a variety of location-aware applications without requiring special-purpose hardware or complicated training and calibration procedures. POSTER
Improving Object-Inlining for High-Performance Java Scientific ApplicationsOur improvements made to the compile-time object-inlining optimization increases the scientific community’s acceptance of the Java programming language in the development of high-performance scientific applications because the performance gap between Java and accepted languages such as C and Fortran has decreased. Java is a popular programming language and many developers are most efficient developing applications in Java. As a result, productivity can increase if developers are able to use Java to develop scientific applications. The object-inlining compile-time optimization used to improve run-time performance of Java applications was analyzed to discover opportunities for further improvement. Using results from the analysis, the optimization was extended to allow more objects to be inlined. As a result, the run-time performance of the applications improved. The techniques used in this extension to object-inlining enable the scientific community to be more accepting of Java in high-performance scientific computing. POSTER
Robust Repair of Polygonal ModelsPolygonal models are widely used for representing 3D objects in computer graphics. However, polygonal models often contain errors, such as holes, gaps, self-intersections, and extraneous polygons. Many applications, such as rapid prototyping and FEM computations, require a closed model with a well-defined inside and outside. A robust method for repairing polygonal models is presented, which guarantees to produce a closed surface given arbitrary models represented as polygon soups (i.e., bag of polygons). Geometric features on the original model, such as sharp edges and corners, are accurately preserved. The method is simple to implement and handles gigantic models (>50M triangles) efficiently on commodity PCs. POSTER
TCP Acceleration: Connection Handoff to the Network InterfaceThis poster presents a mechanism that reduces computation and memory bandwidth requirements of TCP processing for server workloads. The operating system hands off established connections to network interfaces capable of handling TCP. While the interface takes over TCP processing, socket buffers remain in main memory, reducing the amount of memory required on the interface. The socket interface to the application remains unchanged. A prototype web server based on an existing programmable network interface card is used for performance analysis.
Scheduling Workflows on the GridWe describe new strategies for scheduling and executing Workflow Applications on the Grid. Workflow scheduling is based on heuristic scheduling strategies that use sophisticated performance models. The Workflow is executed using a novel strategy to bind and launch the application components onto heterogeneous resources. Our experiments with two workflow applications [EMAN: a Bio-imaging application and Montage: an astronomical application] show that application of these scheduling techniques result in makespan reduction [upto 95% over random] and good load balance. POSTER
Developing Scalable Cross-Architecture Predictions of Memory Latency for Scientific ApplicationsThe gap between processor and memory speeds grows with each new generation of microprocessors, making memory hierarchy response a critical factor limiting performance. For this reason, we have been working to model the impact of memory hierarchy on program performance. This poster presents a method for characterizing the data access pattern of an application in a machine independent fashion. We collect memory reuse distance histograms for each memory reference in a program during repeated executions with small data inputs. Then, we model the structure and scaling of each reference's reuse distance histogram as a function of problem size. This approach enables us to predict the number of cache misses at the instruction level for architectures and problem sizes that we did not measure. To validate our approach, we compare loop level cache miss predictions versus measurements using hardware performance counters for several NAS benchmarks on different platforms.
The join operation, which combines tuples from multiple relations, is the most fundamental and, typically, the most expensive operation in database queries. The standard approach to join-query optimization is cost based, which requires developing a cost model, assigning an estimated cost to each query-processing plan, and searching in the space of all plans for a plan of minimal cost. Two other approaches can be found in the database-theory literature. The first approach, initially proposed by Chandra and Merlin, focused on minimizing the number of joins rather then on selecting an optimal join order. Unfortunately, this approach requires a homomorphism test, which itself is NP-complete, and has not been pursued in practical query processing. The second, more recent, approach focuses on structural properties of the query in order to find a project-join order that will minimize the size of intermediate results during query evaluation. For example, it is known that for Boolean project-join queries a project-join order can be found such that the arity of intermediate results is the treewidth of the join graph plus one.
In our work we pursue the structural-optimization approach, motivated by its success in the context of constraint satisfaction. We chose a setup in which the cost-based approach is rather ineffective; we generate project-join queries with a large number of relations over databases with small relations. We show that a standard SQL planner (we use PostgreSQL) spends an exponential amount of time on generating plans for such queries, with rather dismal results in terms of performance. We show how structural techniques, including projection pushing and join reordering, can yield exponential improvements in query execution time. Finally, we combine early projection and join reordering in an implementation of the bucket-elimination method from constraint satisfaction to obtain another exponential improvement. POSTER
Physical Implementation and Evaluation of Ad Hoc Network Routing Protocols Using Unmodified Simulation ModelsEvaluating ad~hoc network routing protocols is difficult due to the complexity of possible network topology changes and the resulting protocol interactions. The most common method of evaluation, network simulation, allows repeatable experiments but may fail to capture the precise behavior of the real system. On the other hand, testbed protocol implementation allows the real system itself to be measured but is much more time- and equipment-intensive and is generally much more difficult. To address this conflict between simulation and testbed implementation, in this work, we present the design and implementation of a new system that allows existing simulation models of ad~hoc network routing protocols to be used without modification, to create a testbed implementation of the same protocol. We have evaluated the simplicity and portability of our approach across multiple protocols and multiple operating systems through example implementations in our architecture of the DSR and AODV routing protocols in FreeBSD and Linux using the existing, unmodified ns-2 simulation models. We also illustrate the ability of the resulting protocol implementations to handle real, demanding applications by presenting a demonstration of this DSR implementation transmitting real-time video over a multihop mobile ad~hoc network including mobile robots being remotely operated based on the transmitted video stream. POSTER
Symbolic Decision Procedures for QBF
Much recent work has gone into adapting techniques that were originally developed for SAT solving to QBF solving. In particular, QBF solvers are often based on SAT solvers. Most competitive QBF solvers are search-based. In this work we explore an alternative approach to QBF solving, based on symbolic quantifier elimination. We extend some recent symbolic approaches for SAT solving to symbolic QBF solving, using various decision-diagram formalisms such as OBDDs and ZDDs. In both approaches, QBF formulas are solved by eliminating all their quantifiers. We compare our symbolic solver to several competitive search-based solvers and show that it compares favorably with search-based solvers on various benchmarks consisting of non-random formulas. POSTER
Distributed Probabilistic Roadmap of Trees for Large-Scale Motion PlanningHigh-dimensional problems arising from complex robotic systems test the limits of current motion planners and require the development of efficient distributed motion planners that take full advantage of available resources. This work shows how to effectively distribute the computation of the Probabilistic Roadmap of Trees (PRT) algorithm. By increasing the power of the local planner and by using more complex milestones, PRT can distribute its computation almost evenly among processors allowing us to solve very high-dimensional problems that exceed the resources available to single machines and cannot be efficiently addressed with existing planners. Our experiments show nearly linear speedups with eighty processors and indicate that similar speedups can be obtained with several hundred processors. POSTER
Unit Testing for Concurrent ProgramsIn our recent experience developing production programs in Java, unit testing has proven effective in assuring the reliability of code with a single thread of control. Unfortunately, we have found unit testing much less effective at assuring the reliabilty of code with multiple threads of control. Since thread scheduling is non-deterministic, a unit test can succeed on one run and fail on the next. To make test-driven development work for concurrent programs, we will develop an open source framework extending JUnit that supports the schedule-based unit testing of Java code by (i) executing unit tests according to specific schedules, (ii) providing a tool for generating a representative set of schedules, (iii) applying the Eraser Algorithm during unit testing to confirm that shared variables are accessed using a consistent locking protocol, and (iv) providing a tool to ensure that the schedules remain consistent with the program code base as it evolves. POSTER
Receptor Flexibility for Drug Design Using Collective Coordinate Expansive SpacesThe vast majority of modern drug design tools assume that the target, usually a large, complex protein, is a rigid object, and even those that allow flexibility usually do so in a very local and limited way. In fact, proteins are highly flexible and dynamic, and failure to take this flexibility into account while screening for drug candidates could mean that viable compounds go entirely overlooked. We are using a variant of a search technique originally developed for robotic path planning to perform conformational search of proteins, with the ultimate aim of finding stable structures for use as targets for ligand docking and/or de novo drug design. Modern robotic path planning algorithms were designed with complex, high-degree-of-freedom robot systems in mind, and thus are well-suited to the complexity of high-degree-of-freedom macromolecular systems. Our method combines dimensionality reduction using PCA-derived collective coordinates with an energy-guided, coverage-focused search technique to find novel, stable, conformations of the receptor for later use in molecular docking or de novo design.
Solving Molecular Kinematics with ConstraintsAdvanced analysis of receptor-ligand interactions is key to understanding cellular processes such as cell adhesion, migration, and differentiation. It requires the consideration of radically different conformations for the receptor and the ligand. We borrow the term 'kinematics' from robotics to describe the flexibility of molecules. Often, we are interested in finding out whether certain atoms of a molecule can reach predefined positions in space. For this, we need to solve a kinematics problem for molecules under spatial constraints. Our work can be used as a geometric filter for conformational analysis of molecules: instead of performing a wide conformational search and then checking which are the low-energy conformations that satisfy our constraints, we try to get to these conformations directly by developing novel procedures.
We present two procedures: an algebraic procedure that generates all solutions to spatial constraints by subdividing the cartesian parameter space, and a heuristic procedure that steers a given conformation toward a target conformation that satisfies spatial constraints by greedily adjusting torsional parameters. While our algebraic procedure is complete, our heuristic procedure's time complexity is linear in the number of torsional parameters. It can also be configured to respect additional torsion constraints due to the protein secondary structure. Our heuristic procedure can also accommodate large protein systems by reducing the protein model granularity from all-atom to backbone only.
Our geometric filters are independent of any energy models and thus can be readily integrated into various conformational search packages. This work in solving molecular kinematics with constraints can be used to select and refine protein/ligand pairs and thus offer insight, validation, and new experiments when studying biomolecular interactions. POSTER
Java has become the canonical language for introductory programming classes, but its numerous constructs and complex syntax are difficult for beginning programmers to grasp. We have developed a hierarchy of "language levels" to make Java more accessible to beginners. Each level supports progressively richer models of computation, enforced by syntax restrictions and code augmentation. The introduction of programming principles follows Rice's own introductory curriculum. Our language level facility is fully integrated with DrJava, a pedagogic development environment for Java. POSTER
In the field of formal verification, we often use automata to represent programs, and other automata to represent the properties of these programs. In order to verify that a program is consistent with its specification, we need to be able to complement automata efficiently. We investigated automata complementation by using three existing algorithms for minimizing automata. We also investigated the size of the complemented automaton and the probability for having a universal automaton under different conditions. We will present our preliminary results and will discuss work in progress. POSTER
Outdoor Navigation: Monte-Carlo LocalizationVirgil, the Rice Automated Tourguide, has a new localization algorithm. A Monte-Carlo particle filter is used to estimate Virgil's position based on GPS and odometry data.
The demand for high-quality compilation has led to research in adaptive compilation over the last few years. Adaptive compilation is an iterative process that uses runtime feedback to guide different optimization parameters. We examine the application of adaptive techniques to the procedure inlining phase of an optimizing compiler. We have built an inliner that exposes the inlining decisions to adaptive control using an elegant and simple parameterization. Preliminary results using this tool indicate that searching for a good set of program-specific inlining decisions can produce a substantial performance improvement, particularly when compared to a static decision-making algorithm. POSTER
An Efficient Programmable 10 Gigabit Ethernet Network Interface CardThis research examines the hardware and software mechanisms necessary for an efficient programmable 10 Gigabit Ethernet network interface card. Network interface processing requires support for a large volume of frame data, low-latency access to frame metadata, and high computational requirements for frame processing. Our research proposes three mechanisms to improve programmable network interface efficiency. First, a partitioned memory organization enables low-latency access to control data and high-bandwidth access to frame contents from a high-capacity memory. Second, a novel distributed task-queue mechanism enables parallelization of frame processing across many low-frequency cores, while using software to maintain total frame ordering. Finally, the addition of two new atomic read-modify-write instructions reduces frame ordering overheads by 50%. Combining these hardware and software mechanisms enables a network interface card to saturate a full-duplex 10 Gbps Ethernet link by utilizing 6 processor cores and 4 banks of on-chip SRAM operating at 170 MHz, along with external 500 MHz GDDR DRAM. POSTER