<p>UC Berkeley EECS Technical Reports</p>
<p>The UC Berkeley EECS Technical Memorandum Series provides a dated archive of EECS research. It includes Ph.D. theses and master's reports, as well as technical documents that complement traditional publication media such as journals. For example, technical reports may document work in progress, early versions of results that are eventually published in more traditional media, and supplemental information such as long proofs, software documentation, code listings, or elaborated examples.</p>
<p><a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/">http://www.eecs.berkeley.edu/Pubs/TechRpts/</a></p>
<p>Micromechanical Disk Array for Enhanced Frequency Stability Against Bias Voltage Fluctuations</p>
<p>
Lingqi Wu</p>
<p>
EECS Department<br>
University of California, Berkeley<br>
Technical Report No. UCB/EECS-2014-182<br>
November 20, 2014</p>
<p>
<a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-182.pdf">http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-182.pdf</a></p>
<p>High-Q capacitive-gap transduced micromechanical resonators constructed via MEMS technology have recently taken center stage among potential next-generation timing and frequency reference devices that might satisfy present and future applications. Notably, oscillators referenced to very high Q capacitive-gap transduced MEMS resonators have already made inroads into the low-end timing market, and research devices have been reported to satisfy GSM phase noise requirements while consuming less than 80 µW of power. Meanwhile, such devices have also posted some impressively low acceleration sensitivities, with measured sensitivity vectors less than 0.5 ppb/g. Interestingly, theory predicts that the acceleration sensitivity of these devices should be even better than this, if not for frequency instability due to electrical stiffness. Indeed, electrical stiffness is predicted to set lower limits not only on short-term stability, but on long-term stability as well, especially when one considers frequency variations due to charging or temperature-induced geometric shifts. Pursuant to circumventing electrical stiffness-based instability, this work introduces a more circuit design-friendly equivalent circuit model that uses negative capacitance to capture the influence of electrical stiffness on device and circuit behavior. This new circuit model reveals that capacitive-gap transduced micromechanical resonators can offer better stability against electrical-stiffness-based frequency instability when used in large mechanically-coupled arrays. Measurements confirm that a 215-MHz 50-resonator disk array achieves a 3.5× enhancement in frequency stability against dc-bias voltage variation over a stand-alone single disk counterpart. The new equivalent circuit predicts the measurement data and its trends quite well, creating good confidence for using this circuit to guide new oscillator and filter designs that, depending on the application, can enhance or suppress electrical stiffness.</p>
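<p>To make the electrical-stiffness mechanism concrete, the following sketch uses the standard parallel-plate model, in which a dc bias Vp contributes an electrical stiffness ke = Vp²·ε0·A/g³ that subtracts from the mechanical stiffness and pulls the resonance frequency down. All device parameters below (mass, electrode area, gap) are hypothetical round numbers, not values from the report.</p>

```python
import math

# Illustrative parallel-plate model of electrical-stiffness frequency
# pulling in a capacitive-gap resonator. All device parameters (mass,
# electrode area, gap) are hypothetical round numbers, not values from
# the report.

EPS0 = 8.854e-12  # vacuum permittivity, F/m

def resonance_freq(km, m, Vp, A=2e-12, gap=60e-9):
    """Resonance frequency including the electrical stiffness
    ke = Vp^2 * EPS0 * A / gap^3, which subtracts from the mechanical
    stiffness km and pulls the frequency down as the dc bias Vp rises."""
    ke = Vp ** 2 * EPS0 * A / gap ** 3
    return math.sqrt((km - ke) / m) / (2 * math.pi)

m = 1e-12                            # effective mass, kg (hypothetical)
km = m * (2 * math.pi * 215e6) ** 2  # stiffness for a 215-MHz resonance

f_lo = resonance_freq(km, m, Vp=5.0)
f_hi = resonance_freq(km, m, Vp=10.0)
# Doubling the bias quadruples ke, so f_hi sits below f_lo.
```

<p>In this toy model the fractional frequency pull is roughly ke/(2km), so a design that raises the effective mechanical stiffness seen by a fixed electrical stiffness, which is one rough reading of how a mechanically-coupled array helps, reduces sensitivity to bias-voltage fluctuations.</p>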
<p><strong>Advisor:</strong> Clark Nguyen</p>
<p>Synthesis of Layout Engines from Relational Constraints</p>
<p>
Thibaud Hottelier and Ras Bodik</p>
<p>
EECS Department<br>
University of California, Berkeley<br>
Technical Report No. UCB/EECS-2014-181<br>
November 19, 2014</p>
<p>
<a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-181.pdf">http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-181.pdf</a></p>
<p>We present an algorithm for synthesizing efficient document layout engines from relational specifications. These specifications are high-level in that a single specification can produce engines for distinct layout situations. Specifically, our engines are functional attribute grammars, while the specifications are relational attribute grammars. By synthesizing functions from relations (constraints), we obviate the need for constraint solving at runtime, shifting this cost to compilation time. Intuitively, the synthesized functions execute only value propagations and bypass the backtracking search performed by constraint solvers. By working on hierarchical, grammar-induced specifications, we make synthesis applicable to previously intractable relational specifications. We decompose them into smaller subproblems, which are tackled in isolation by off-the-shelf synthesis procedures. The functions thus generated are subsequently composed into an attribute grammar that satisfies the relational specification. Our experiments show that we can generate layout engines for non-trivial data visualizations, and that our synthesized engines are between 39 and 200 times faster than general-purpose constraint solvers.</p>
<p>Long-term Recurrent Convolutional Networks for Visual Recognition and Description</p>
<p>
Jeffrey Donahue, Lisa Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko and Trevor Darrell</p>
<p>
EECS Department<br>
University of California, Berkeley<br>
Technical Report No. UCB/EECS-2014-180<br>
November 17, 2014</p>
<p>
<a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-180.pdf">http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-180.pdf</a></p>
<p>Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are "doubly deep" in that they can be compositional in spatial and temporal "layers". Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they can directly map variable-length inputs (e.g., video frames) to variable-length outputs (e.g., natural language text) and can model complex temporal dynamics, yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.</p>
<p>A Framework for Composing High-Performance OpenCL from Python Descriptions</p>
<p>
Michael Anderson</p>
<p>
EECS Department<br>
University of California, Berkeley<br>
Technical Report No. UCB/EECS-2014-177<br>
November 14, 2014</p>
<p>
<a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-177.pdf">http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-177.pdf</a></p>
<p>Parallel processors have become ubiquitous; most programmers today have access to parallel hardware such as multi-core processors and graphics processors. This has created an implementation gap, where efficiency programmers with knowledge of hardware details can attain high performance by exploiting parallel hardware, while productivity programmers with application-level knowledge may not understand low-level performance trade-offs. Ideally, we would like to be able to write programs in productivity languages such as Python or MATLAB, and achieve performance comparable to the best hand-tuned code.
<p>One approach toward achieving this ideal is to write libraries that get high efficiency on certain operations, and call these libraries from the productivity environment. We propose a framework that addresses two problems with this approach: that it fails to fuse operations for efficiency, and that it may not consider runtime information such as shapes and sizes of data structures. With our framework, efficiency programmers write and/or generate customized OpenCL snippets at runtime and the framework automatically fuses, compiles, and executes these operations based on a Python description. </p>
<p>We evaluate the framework with case studies of two very different applications: space-time adaptive radar processing and optical flow. For the radar application, our framework's implementation is competitive with a hand-coded implementation that uses a vendor-optimized library. For optical flow, a computer vision application, the framework achieves frame rates between 0.5× and 0.97× of hand-coded OpenCL performance.</p></p>
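<p>The fusion idea can be illustrated with a deliberately simplified Python sketch (not the framework's actual API): elementwise operations are composed into a single function applied in one pass over the data, so no intermediate arrays are materialized between operations.</p>

```python
# Hedged sketch of the fusion idea only (not the framework's actual API):
# a chain of elementwise operations is composed into one function applied
# per element, so the chain makes a single pass over the data and never
# materializes an intermediate array.

def fuse(*ops):
    """Compose elementwise operations into a single per-element function."""
    def fused(x):
        for op in ops:
            x = op(x)
        return x
    return fused

scale = lambda v: 2.0 * v        # hypothetical elementwise kernels
shift = lambda v: v + 1.0
clamp = lambda v: min(v, 10.0)

kernel = fuse(scale, shift, clamp)
data = [0.5, 3.0, 7.0]
out = [kernel(v) for v in data]  # one traversal; out == [2.0, 7.0, 10.0]
```

<p>A real implementation would emit one OpenCL kernel for the fused chain at runtime; the single-pass structure is the point of the sketch.</p>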
<p><strong>Advisor:</strong> Kurt Keutzer</p>
<p>Large-Scale, Low-Latency State Estimation Of Cyber-physical Systems With An Application To Traffic Estimation</p>
<p>
Timothy Hunter</p>
<p>
EECS Department<br>
University of California, Berkeley<br>
Technical Report No. UCB/EECS-2014-176<br>
November 9, 2014</p>
<p>
<a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-176.pdf">http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-176.pdf</a></p>
<p>Large physical systems are increasingly prevalent, and designing estimation strategies for them has become both a practical necessity and a complicated problem. Their sensing infrastructure is usually ad-hoc, and the estimate of interest is often a complex function of the data. At the same time, computing power is rapidly becoming a commodity. Through the study of two estimation tasks in urban transportation, we show how proper algorithm design can lead to significant gains in scalability over existing solutions.
<p>A common problem in trip planning is meeting a given deadline, such as arriving at the airport within an hour. Existing routing services optimize for the expected time of arrival, but do not provide the most reliable route, which accounts for the variability in travel times. Providing statistical information is even harder for trips in cities, where travel times are highly variable. This thesis aims at building scalable algorithms for inferring statistical distributions of travel time over very large road networks, using GPS points from vehicles in real time. We consider two complementary algorithms that differ in the characteristics of the GPS data input and in the complexity of the model: a simpler streaming Expectation-Maximization algorithm that leverages very large volumes of extremely noisy data, and a novel Markov Model-Gaussian Markov Random Field that extracts global statistical correlations from high-frequency, privacy-preserving trajectories. </p>
<p>These two algorithms have been implemented and deployed in a pipeline that takes streams of GPS data as input, and produces distributions of travel times accessible as output. This pipeline is shown to scale on a large cluster of machines and can process tens of millions of GPS observations from an area that comprises hundreds of thousands of road segments. This is to our knowledge the first research framework that considers in an integrated fashion the problem of statistical estimation of traffic at a very large scale from streams of GPS data.</p></p>
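<p>As a minimal illustration of the streaming flavor of such estimators (a hedged sketch only; the thesis's EM and GMRF algorithms are far richer), one can maintain per-segment travel-time statistics online with Welford's update, so each GPS-derived observation is processed once and then discarded. The segment identifier below is made up.</p>

```python
from collections import defaultdict

# Minimal streaming estimator of per-segment travel-time statistics using
# Welford's online mean/variance update: each GPS-derived observation is
# processed once and discarded. A hedged sketch of the streaming idea only;
# the thesis's EM and GMRF algorithms are far richer. Segment ids are made up.

class SegmentStats:
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):  # unbiased sample variance
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = defaultdict(SegmentStats)
observations = [("I-80E:12", 42.0), ("I-80E:12", 48.0), ("I-80E:12", 45.0)]
for segment, travel_time in observations:
    stats[segment].update(travel_time)

s = stats["I-80E:12"]   # s.mean == 45.0, s.variance == 9.0
```

<p>Because the state per segment is three numbers, this kind of update distributes trivially across a cluster, which is the property the pipeline above relies on.</p>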
<p><strong>Advisor:</strong> Alexandre Bayen and Pieter Abbeel</p>
<p>Bounds on the Energy Consumption of Computational Kernels</p>
<p>
Andrew Gearhart</p>
<p>
EECS Department<br>
University of California, Berkeley<br>
Technical Report No. UCB/EECS-2014-175<br>
October 23, 2014</p>
<p>
<a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-175.pdf">http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-175.pdf</a></p>
<p>As computing devices evolve with successive technology generations, many machines target either the mobile or the high-performance computing/datacenter environment. In both of these form factors, energy consumption often represents the limiting factor on hardware and software efficiency. On mobile devices, limitations in battery technology may reduce possible hardware capability due to a tight energy budget. On the other hand, large machines such as datacenters and supercomputers have budgets directly related to energy consumption, and small improvements in energy efficiency can significantly reduce operating costs. Such challenges have spurred research on how applications, operating systems, and runtime systems affect energy consumption. Until recently, however, little consideration was given to the potential energy efficiency of algorithms themselves.
<p>A dominant idea within the high-performance computing (HPC) community is that applications can be decomposed into a set of key computational problems, called kernels. Via automatic performance tuning and new algorithms for many kernels, researchers have successfully demonstrated performance improvements on a wide variety of machines. Motivated by the large and growing cost (in time and energy) of moving data, algorithmic improvements have been attained by proving lower bounds on the data movement required to solve a computational problem, and then developing communication-optimal algorithms that attain these bounds. </p>
<p>This thesis extends previous research on communication bounds and computational kernels by presenting bounds on the energy consumption of a large class of algorithms. These bounds apply to sequential, distributed parallel and heterogeneous machine models and we detail methods to further extend these models to larger classes of machines. We argue that the energy consumption of computational kernels is usually predictable and can be modeled via linear models with a handful of terms. Thus, these energy models (and the accompanying bounds) may apply to many HPC applications when used in composition. </p>
<p>Given energy bounds, we analyze the implications of such results under additional constraints, such as an upper bound on runtime, and also suggest directions for future research that may aid future development of a hardware/software co-tuning process. Further, we present a new model of energy efficiency, Cityscape, that allows hardware designers to quickly target areas for improvement in hardware attributes. We believe that combining our bounds with other models of energy consumption may provide a useful method for such co-tuning; i.e. to enable algorithm and hardware architects to develop provably energy-optimal algorithms on customized hardware platforms.</p></p>
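<p>The linear energy models referred to above can be sketched as follows; the per-flop, per-word, and per-message energy constants here are hypothetical placeholders, not calibrated values from the thesis.</p>

```python
# Hedged sketch of the thesis's modeling premise: kernel energy expressed as
# a linear model over a handful of counted terms. The per-operation energy
# constants below are hypothetical placeholders, not measured values.

def kernel_energy(flops, words_moved, messages, seconds,
                  eps_flop=1e-10,   # joules per floating-point operation
                  eps_word=1e-9,    # joules per word moved
                  eps_msg=1e-6,     # joules per message
                  p_static=10.0):   # static (leakage) power in watts
    return (eps_flop * flops + eps_word * words_moved
            + eps_msg * messages + p_static * seconds)

# Same flop count, different data movement: under these constants the
# memory-heavy kernel costs far more energy.
e_compute_bound = kernel_energy(1e9, 1e6, 1e3, 0.01)
e_memory_bound = kernel_energy(1e9, 1e9, 1e3, 0.01)
```

<p>Lower bounds on words_moved and messages then translate directly into lower bounds on energy, which is the link between communication bounds and energy bounds the abstract describes.</p>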
<p><strong>Advisor:</strong> James Demmel and Tarek I. Zohdi</p>
<p>Fast 4D Sheared Filtering for Interactive Rendering of Distribution Effects</p>
<p>
Ling-Qi Yan, Soham Uday Mehta, Ravi Ramamoorthi and Fredo Durand</p>
<p>
EECS Department<br>
University of California, Berkeley<br>
Technical Report No. UCB/EECS-2014-174<br>
October 23, 2014</p>
<p>
<a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-174.pdf">http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-174.pdf</a></p>
<p>Soft shadows, depth of field, and diffuse global illumination are common distribution effects, usually rendered by Monte Carlo ray tracing. Physically correct, noise-free images can require hundreds or thousands of ray samples per pixel, and take a long time to compute. Recent approaches have exploited sparse sampling and filtering; the filtering is either fast (axis-aligned) but requires more input samples, or needs fewer input samples but is very slow (sheared). We present a new approach for fast sheared filtering on the GPU. Our algorithm factors the 4D sheared filter into four 1D filters. We derive complexity bounds for our method, showing that the per-pixel complexity is reduced from O(n^2 l^2) to O(nl), where n is the linear filter width (filter size is O(n^2)) and l is the (usually very small) number of samples for each dimension of the light or lens per pixel (spp is l^2). We thus reduce sheared filtering overhead dramatically. We demonstrate rendering of depth of field, soft shadows, and diffuse global illumination at interactive speeds. We reduce the number of samples needed by 5-8× compared to axis-aligned filtering, and frame rates are 4× faster for equal quality.</p>
<p>MultiSE: Multi-Path Symbolic Execution using Value Summaries</p>
<p>
Koushik Sen, George Necula, Liang Gong and Philip Wontae Choi</p>
<p>
EECS Department<br>
University of California, Berkeley<br>
Technical Report No. UCB/EECS-2014-173<br>
October 17, 2014</p>
<p>
<a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-173.pdf">http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-173.pdf</a></p>
<p>A Longitudinal and Cross-Dataset Study of Internet Latency and Path Stability</p>
<p>
Mosharaf Chowdhury, Rachit Agarwal, Vyas Sekar and Ion Stoica</p>
<p>
EECS Department<br>
University of California, Berkeley<br>
Technical Report No. UCB/EECS-2014-172<br>
October 11, 2014</p>
<p>
<a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-172.pdf">http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-172.pdf</a></p>
<p>We present a retrospective and longitudinal study of Internet latency and path stability using three large-scale traceroute datasets collected over several years: Ark and iPlane from 2008 to 2013, and a proprietary CDN’s traceroute dataset spanning 2012 and 2013. Using these different “lenses”, we revisit classical properties of Internet paths such as end-to-end latency, stability, and routing graph structure. Iterative data analysis at this scale is challenging given the idiosyncrasies of different collection tools, measurement noise, and the diverse analyses we desire. To this end, we leverage recent big-data techniques to develop a scalable data analysis toolkit, Hummus, that enables rapid and iterative analysis of large traceroute measurement datasets. Our key findings are: (1) overall latency seems to be decreasing; (2) some geographical regions still have poor latency; (3) route stability (prevalence and persistence) is increasing; and (4) we observe a mixture of effects in the routing graph structure, with high-degree ASes rapidly increasing in degree and lower-degree ASes forming denser “communities”.</p>
<p>TypeDevil: Dynamic Type Inconsistency Analysis for JavaScript</p>
<p>
Michael Pradel, Parker Schuh and Koushik Sen</p>
<p>
EECS Department<br>
University of California, Berkeley<br>
Technical Report No. UCB/EECS-2014-171<br>
October 7, 2014</p>
<p>
<a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-171.pdf">http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-171.pdf</a></p>
<p>Dynamic languages, such as JavaScript, give programmers the freedom to ignore types, and enable them to write concise code quickly. Despite this freedom, many programs follow implicit type rules, for example, that a function has a particular signature or that a property has a particular type. Violations of such implicit type rules often correlate with problems in the program. This paper presents TypeDevil, a mostly dynamic analysis that warns developers about inconsistent types. The key idea is to assign a set of observed types to each variable, property, and function, to merge types based on their structure, and to warn developers about variables, properties, and functions that have inconsistent types. To deal with the pervasiveness of polymorphic behavior in real-world JavaScript programs, we present a set of techniques to remove spurious warnings and to merge related warnings. Applying TypeDevil to widely used benchmark suites and real-world web applications reveals 15 problematic type inconsistencies, including correctness problems, performance problems, and dangerous coding practices.</p>
<p>Provably Efficient Algorithms for Numerical Tensor Algebra</p>
<p>
Edgar Solomonik</p>
<p>
EECS Department<br>
University of California, Berkeley<br>
Technical Report No. UCB/EECS-2014-170<br>
September 30, 2014</p>
<p>
<a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-170.pdf">http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-170.pdf</a></p>
<p>This thesis targets the design of parallelizable algorithms and communication-efficient parallel schedules for numerical linear algebra as well as computations with higher-order tensors. Communication is a growing bottleneck in the execution of most algorithms on parallel computers; it manifests itself both as data movement, through the network connecting different processors and through the memory hierarchy of each processor, and as synchronization between processors. We provide a rigorous theoretical model of communication and derive lower bounds as well as algorithms in this model. Our analysis covers two broad areas: linear algebra and tensor contractions. We demonstrate the practical quality of the new theoretically-improved algorithms by presenting results which show that our implementations outperform standard libraries and traditional algorithms.
<p>We model the costs associated with local computation, communication, and synchronization. We introduce a new technique for deriving lower bounds on tradeoffs between these costs and apply them to algorithms in both dense and sparse linear algebra as well as graph algorithms. These lower bounds are attained by what we refer to as 2.5D algorithms, which we give for matrix multiplication, Gaussian elimination, QR factorization, the symmetric eigenvalue problem, and the Floyd-Warshall all-pairs shortest-paths algorithm. 2.5D algorithms achieve lower interprocessor bandwidth cost by exploiting auxiliary memory. Algorithms employing this technique are well known for matrix multiplication, and have been derived in the BSP model for LU and QR factorization, as well as the Floyd-Warshall algorithm. We introduce alternate versions of LU and QR algorithms which have measurable performance improvements over their BSP counterparts, and we give the first evaluations of their performance. For the symmetric eigenvalue problem, we give the first 2.5D algorithms, additionally solving challenges with memory-bandwidth efficiency that arise for this problem. We also give a new memory-bandwidth efficient algorithm for Krylov subspace methods (repeated multiplication of a vector by a sparse-matrix). </p>
<p>The latter half of the thesis contains algorithms for higher-order tensors, in particular tensor contractions. We introduce Cyclops Tensor Framework, which provides an automated mechanism for network-topology-aware decomposition and redistribution of tensor data. It leverages 2.5D matrix multiplication to perform tensor contractions communication-efficiently. The framework is capable of exploiting symmetry and antisymmetry in tensors and utilizes a distributed packed-symmetric storage format. Finally, we consider a theoretically novel technique for exploiting tensor symmetry to lower the number of multiplications necessary to perform a contraction via computing some redundant terms that allow preservation of symmetry and then cancelling them out with low-order cost. We analyze the numerical stability and communication efficiency of this technique and give adaptations to antisymmetric and Hermitian matrices. This technique has promising potential for accelerating coupled-cluster (electronic structure) methods both in terms of computation and communication cost.</p></p>
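<p>The bandwidth saving of the 2.5D algorithms above can be sketched asymptotically: with c replicas of the matrices spread over P processors, the per-processor communication volume for n×n matrix multiplication scales as n²/√(cP). Constants are omitted, and the c ≤ P^(1/3) limit is the standard replication bound for 2.5D matrix multiplication.</p>

```python
import math

# Asymptotic sketch (constants omitted) of the 2.5D communication saving:
# with c replicas of the matrices spread over P processors, the words moved
# per processor for n x n matrix multiplication scale as n^2 / sqrt(c * P);
# c = 1 recovers the usual 2D cost n^2 / sqrt(P).

def words_per_proc(n, P, c=1):
    assert 1 <= c <= P ** (1 / 3.0) + 1e-9, "2.5D requires c <= P^(1/3)"
    return n ** 2 / math.sqrt(c * P)

n, P = 4096, 64
w_2d = words_per_proc(n, P, c=1)   # no replication
w_25d = words_per_proc(n, P, c=4)  # 4x memory buys 2x less communication
```

<p>This is the auxiliary-memory tradeoff the thesis exploits: extra replicas of the inputs directly lower interprocessor bandwidth cost.</p>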
<p><strong>Advisor:</strong> James Demmel</p>
<p>High Performance Machine Learning through Codesign and Rooflining</p>
<p>
Huasha Zhao and John F. Canny</p>
<p>
EECS Department<br>
University of California, Berkeley<br>
Technical Report No. UCB/EECS-2014-169<br>
September 27, 2014</p>
<p>
<a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-169.pdf">http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-169.pdf</a></p>
<p>Machine learning (ML) is a cornerstone of the new data revolution. Most attempts to scale machine learning to massive datasets focus on parallelization on computer clusters. The BIDMach project instead explores the untapped potential (especially from GPU and SIMD hardware) inside individual machines. Through careful codesign of algorithms and "rooflining", we have demonstrated speedups of multiple orders of magnitude over other systems. In fact, BIDMach running on a single machine exceeds the performance of cluster systems on most common ML tasks, and has run compute-intensive tasks on 10-terabyte datasets. We can further show that BIDMach runs close to the theoretical limits imposed by CPU/GPU, memory, or network bandwidth. BIDMach includes several innovations to make the data modeling process more agile and effective: likelihood "mixins" and interactive modeling using Gibbs sampling.
<p>These results are very encouraging, but the greatest potential for future hardware-leveraged machine learning appears to lie in MCMC algorithms: we can bring the performance of sample-based Bayesian inference close to that of symbolic methods. This opens the possibility of a general-purpose "engine" for machine learning whose performance matches specialized methods. We demonstrate this approach on a specific problem (Latent Dirichlet Allocation), and discuss the general case. </p>
<p>Finally, we explore scaling ML to clusters. In order to benefit from parallelization, rooflined nodes require very high network bandwidth. We show that the aggregators (reducers) on other systems do not scale and are not adequate for this task. We describe two new approaches, butterfly mixing and "Kylix", which cover the requirements of machine learning and graph algorithms respectively. We give roofline bounds for both approaches.</p></p>
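<p>The "roofline" bounds invoked above reduce to a one-line model: attainable throughput is the lesser of the machine's peak compute rate and its memory bandwidth times the kernel's arithmetic intensity. The hardware numbers in this sketch are hypothetical.</p>

```python
# Minimal roofline model: attainable throughput is the lesser of the
# machine's peak compute rate and its memory bandwidth times the kernel's
# arithmetic intensity. The hardware numbers here are hypothetical.

def roofline(peak_gflops, bw_gbs, intensity_flops_per_byte):
    return min(peak_gflops, bw_gbs * intensity_flops_per_byte)

PEAK = 1000.0  # GFLOP/s, hypothetical
BW = 200.0     # GB/s, hypothetical

low = roofline(PEAK, BW, 0.5)   # bandwidth-bound: 200 * 0.5 = 100 GFLOP/s
high = roofline(PEAK, BW, 8.0)  # compute-bound: capped at 1000 GFLOP/s
```

<p>Comparing a kernel's measured throughput against this bound is what tells the codesign process whether further tuning should target arithmetic or data movement.</p>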
<p><strong>Advisor:</strong> John F. Canny</p>
<p>A Hybrid Dynamical Systems Theory for Legged Locomotion</p>
<p>
Samuel Burden</p>
<p>
EECS Department<br>
University of California, Berkeley<br>
Technical Report No. UCB/EECS-2014-167<br>
September 25, 2014</p>
<p>
<a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-167.pdf">http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-167.pdf</a></p>
<p>Legged locomotion arises from intermittent contact between limbs and terrain. Since locomotion emerges from a closed-loop interaction, reductionist studies of body mechanics and terrestrial dynamics in isolation have failed to yield comprehensive strategies for forward- or reverse-engineering locomotion. Progress in locomotion science stands to benefit a diverse array of engineers, scientists, and clinicians working in robotics, neuromechanics, and rehabilitation. Eschewing reductionism in favor of a holistic study, we seek a systems-level theory tailored to the dynamics of legged locomotion.
<p>Parsimonious mathematical models for legged locomotion are hybrid, as the system state undergoes continuous flow through limb stance and swing phases punctuated by instantaneous reset at discrete touchdown and liftoff events. In their full generality, hybrid systems can exhibit properties such as nondeterminism and orbital instability that are inconsistent with observations of organismal biomechanics. By specializing to a class of intrinsically self-consistent dynamical models, we exclude such pathologies while retaining emergent phenomena that arise in closed-loop studies of locomotion. </p>
<p>Beginning with a general class of hybrid control systems, we construct an intrinsic state-space metric and derive a provably-convergent numerical simulation algorithm. This resolves two longstanding problems in hybrid systems theory: non-trivial comparison of states from distinct discrete modes, and accurate simulation up to and including Zeno events. Focusing on models for periodic gaits, we prove that isolated discrete transitions generically lead the hybrid dynamical system to reduce to an equivalent classical (smooth) dynamical system. This novel route to reduction in models of rhythmic phenomena demonstrates that the closed-loop interaction between limbs and terrain is generally simpler than either taken in isolation. Finally, we show that the non-smooth flow resulting from arbitrary footfall timing possesses a non-classical (Bouligand) derivative. This provides a foundation for design and control of multi-legged maneuvers. Taken together, these contributions yield a unified analytical and computational framework -- a hybrid dynamical systems theory -- applicable to legged locomotion.</p></p>
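<p>The flow/reset structure and the Zeno phenomenon discussed above can be illustrated with the textbook bouncing-ball hybrid system (a standard example, not the thesis's model): ballistic flow between impacts, and an instantaneous velocity-reversal reset with a restitution coefficient at each touchdown, whose impact times accumulate toward a finite Zeno time.</p>

```python
import math

# Textbook bouncing-ball hybrid system: continuous ballistic flow between
# impacts, with an instantaneous reset v -> -E*v at each touchdown. A
# standard illustration of flow/reset structure and Zeno accumulation; it
# is not the thesis's model.

G = 9.81  # gravity, m/s^2
E = 0.5   # coefficient of restitution (hypothetical)

def impact_times(h0, n_impacts):
    """Impact times for a ball dropped from height h0: each reset scales the
    apex height by E^2, so bounce durations shrink geometrically and the
    impact times accumulate toward a finite Zeno time."""
    t = math.sqrt(2 * h0 / G)  # first fall
    times, h = [t], h0
    for _ in range(n_impacts - 1):
        h *= E ** 2                    # apex height after the reset
        t += 2 * math.sqrt(2 * h / G)  # time up and back down
        times.append(t)
    return times

times = impact_times(1.0, 20)
# The times increase yet stay below the finite Zeno time 3*sqrt(2/G)
# for these parameters, even as the number of bounces grows without bound.
```

<p>Simulating accurately up to and through such an accumulation point is exactly the difficulty the provably-convergent algorithm above addresses.</p>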
<p><strong>Advisor:</strong> S. Shankar Sastry</p>
<p>A Learning Based Approach to Control Synthesis of Markov Decision Processes for Linear Temporal Logic Specifications</p>
<p>
Dorsa Sadigh, Eric Kim, Samuel Coogan, S. Shankar Sastry and Sanjit A. Seshia</p>
<p>
EECS Department<br>
University of California, Berkeley<br>
Technical Report No. UCB/EECS-2014-166<br>
September 20, 2014</p>
<p>
<a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-166.pdf">http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-166.pdf</a></p>
<p>We propose to synthesize a control policy for a Markov decision process (MDP) such that the resulting traces of the MDP satisfy a linear temporal logic (LTL) property. We construct a product MDP that incorporates a deterministic Rabin automaton generated from the desired LTL property. The reward function of the product MDP is defined from the acceptance condition of the Rabin automaton. This construction allows us to apply techniques from learning theory to the problem of synthesis for LTL specifications even when the transition probabilities are not known a priori. We prove that our method is guaranteed to find a controller that satisfies the LTL property with probability one if such a policy exists, and we show empirically, through a case study in traffic control, that our method produces reasonable control strategies even when the LTL property cannot be satisfied with probability one.</p>
<p>Accuracy of the s-step Lanczos method for the symmetric eigenproblem</p>
<p>
Erin Carson and James Demmel</p>
<p>
EECS Department<br>
University of California, Berkeley<br>
Technical Report No. UCB/EECS-2014-165<br>
September 17, 2014</p>
<p>
<a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-165.pdf">http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-165.pdf</a></p>
<p>The $s$-step Lanczos method is an attractive alternative to the classical Lanczos method as it enables an $O(s)$ reduction in data movement over a fixed number of iterations. This can significantly improve performance on modern computers. In order for $s$-step methods to be widely adopted, it is important to better understand their error properties. Although the $s$-step Lanczos method is equivalent to the classical Lanczos method in exact arithmetic, empirical observations demonstrate that it can behave quite differently in finite precision.
<p>In this paper, we demonstrate that bounds on accuracy for the finite precision Lanczos method given by Paige [Lin. Alg. Appl., 34:235-258, 1980] can be extended to the $s$-step Lanczos case, assuming a bound on the condition numbers of the computed $s$-step bases. Our results confirm theoretically what is well known empirically: the conditioning of the Krylov bases plays a large role in determining finite precision behavior. In particular, if one can guarantee that the basis condition number is not too large throughout the iterations, the accuracy and convergence of eigenvalues in the $s$-step Lanczos method should be similar to those of classical Lanczos. This indicates that, under certain restrictions, the $s$-step Lanczos method can be made suitable for use in many practical cases.</p></p>
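<p>For readers unfamiliar with the baseline, the classical Lanczos recurrence (to which the $s$-step variant is equivalent in exact arithmetic) is sketched below on a small symmetric matrix; after n breakdown-free steps the computed tridiagonal shares the eigenvalues of A. This is the textbook iteration, not the $s$-step formulation analyzed in the paper.</p>

```python
import math

# Classical Lanczos tridiagonalization of a small symmetric matrix A.
# The s-step variant is equivalent to this in exact arithmetic.

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def lanczos(A, steps):
    n = len(A)
    q = [0.0] * n
    q[0] = 1.0                       # unit-norm starting vector e1
    q_prev, beta = [0.0] * n, 0.0
    alphas, betas = [], []
    for _ in range(steps):
        w = matvec(A, q)
        alpha = sum(wi * qi for wi, qi in zip(w, q))
        w = [wi - alpha * qi - beta * pi for wi, qi, pi in zip(w, q, q_prev)]
        beta = math.sqrt(sum(wi * wi for wi in w))
        alphas.append(alpha)
        betas.append(beta)
        if beta == 0.0:              # invariant subspace found (breakdown)
            break
        q_prev, q = q, [wi / beta for wi in w]
    return alphas, betas

A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
alphas, betas = lanczos(A, 3)
# The tridiagonal built from (alphas, betas) is similar to A, so after a
# full run sum(alphas) equals the trace of A.
```

<p>The $s$-step reformulation computes s such steps at once from a block Krylov basis, trading extra local work for an O(s) reduction in data movement; the paper's point is that the conditioning of that basis governs how faithfully the recurrence above is reproduced in finite precision.</p>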