Friday, March 28, 2008

Disk-Based Parallel Computation, Rubik's Cube, and Checkpointing

Mar 2008


(1 hr 14 min)

Gene Cooperman's talk takes us on a journey through three varied but interconnected topics.

First, our research lab has engaged in a series of disk-based computations extending over five years. Disks have traditionally been used for filesystems, for virtual memory, and for databases. Disk-based computation opens up an important fourth use: an abstraction for multiple disks that allows parallel programs to treat them in a manner similar to RAM. The key observation is that 50 disks have approximately the same parallel bandwidth as a _single_ RAM subsystem. This leaves latency as the primary concern.
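As a rough sanity check on that bandwidth claim, here is a back-of-the-envelope calculation; the per-disk streaming rate and RAM bandwidth below are assumed 2008-era ballpark figures, not numbers from the talk.

```python
# Assumed ballpark figures (circa 2008), not measurements from the talk.
num_disks = 50
disk_streaming_mb_per_s = 100          # sequential bandwidth of one disk
ram_bandwidth_gb_per_s = 5.0           # bandwidth of a single RAM subsystem

aggregate_disk_gb_per_s = num_disks * disk_streaming_mb_per_s / 1000
print(f"{num_disks} disks: ~{aggregate_disk_gb_per_s:.1f} GB/s aggregate "
      f"vs ~{ram_bandwidth_gb_per_s:.1f} GB/s RAM")
```

Under these assumptions the aggregate streaming bandwidth of 50 disks is on the same order as one RAM subsystem, which is why latency, not bandwidth, becomes the bottleneck to manage.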


A second key is the use of techniques like delayed duplicate detection to avoid latency. For example, hash accesses can be saved (even saved on disk) until there are sufficiently many pending accesses to use standard streaming techniques. We have designed a library for search problems that exploits the high parallel bandwidth while hiding the latency. We build abstractions for search that employ parallel disk-based hash arrays with the same speed as a single hash array in a single RAM subsystem. In the case of Rubik's Cube, we exploited this mechanism by using seven terabytes of distributed disk in a search problem that showed that 26 moves suffice to solve Rubik's Cube. Our initial efforts emphasize idempotent operations, so that we can easily recover from hardware or software faults.
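To make the batching idea concrete, here is a minimal sketch of delayed duplicate detection for one level of a breadth-first search. States are plain integers, Python lists stand in for sorted disk files, and the successor rule is a made-up placeholder rather than real Rubik's Cube moves; the point is the batch-then-merge pattern that replaces random hash probes with streaming passes.

```python
# Minimal sketch of delayed duplicate detection for one BFS level.
# The successor rule is a hypothetical placeholder, not Rubik's Cube moves.

def successors(state):
    return [(3 * state + 1) % 1_000_003, (7 * state + 5) % 1_000_003]

def expand_level(frontier, visited_sorted):
    """Expand one BFS level, delaying all duplicate checks.

    Instead of probing a hash table once per generated state (a random,
    latency-bound access), buffer every candidate, sort the batch, and
    remove duplicates in a single streaming merge against the sorted
    visited set -- turning many random accesses into sequential I/O."""
    candidates = []
    for s in frontier:                       # phase 1: generate, do not check yet
        candidates.extend(successors(s))
    candidates = sorted(set(candidates))     # phase 2: batch sort (external sort on disk)

    new_states, i = [], 0                    # phase 3: one streaming merge pass
    for c in candidates:
        while i < len(visited_sorted) and visited_sorted[i] < c:
            i += 1
        if i == len(visited_sorted) or visited_sorted[i] != c:
            new_states.append(c)
    return new_states

# Usage: start from state 1 with nothing else visited, expand one level.
frontier = [1]
visited = sorted(frontier)
frontier = expand_level(frontier, visited)
```

On disk, the sort in phase 2 becomes an external sort and the merge in phase 3 a sequential pass over the visited file, which is exactly the kind of streaming access pattern that hides disk latency.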


We next intend to apply a more general solution for fault recovery: checkpointing. This separate effort in our lab has now produced a mature, robust, user-level checkpointing package. The package works successfully in tests on OpenMPI, MPICH-2, OpenMP, and parallel iPython (used with SciPy and NumPy). Our DMTCP package transparently checkpoints parallel, multi-threaded processes, with no modification either to the operating system or to the application binaries. Extrapolating from current experiments, we estimate that we can checkpoint a 1,000-node parallel computation in a matter of minutes. We are currently searching for a testbed on which to demonstrate this scalability.

Thursday, March 27, 2008

Optimization for Machine Learning

Mar 2008


(56 min)

S.V.N. Vishwanathan, Research Scientist, presents:

Regularized risk minimization is at the heart of many machine learning algorithms. The underlying objective function to be minimized is convex, and often non-smooth. Classical optimization algorithms cannot handle this efficiently. In this talk we present two algorithms for dealing with convex non-smooth objective functions.
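For reference, the regularized risk objective typically takes the form below; the quadratic regularizer and the hinge loss are shown only as common examples of this convex, non-smooth structure, and the particular losses treated in the talk may differ.

```latex
J(w) \;=\; \lambda\,\Omega(w) \;+\; \frac{1}{m}\sum_{i=1}^{m} \ell(x_i, y_i, w),
\qquad \text{e.g.}\quad \Omega(w) = \tfrac{1}{2}\lVert w \rVert^{2},
\quad \ell(x_i, y_i, w) = \max\bigl(0,\; 1 - y_i \langle w, x_i \rangle\bigr)
```

The hinge loss is non-differentiable wherever the margin equals one, which is exactly what prevents classical smooth methods such as plain BFGS from applying directly.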


  1. First, we extend the well-known BFGS quasi-Newton algorithm to handle non-smooth functions;
  2. Second, we show how bundle methods can be applied in a machine learning context (see the sketch below). We present both theoretical and experimental justification of our algorithms.
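Below is a minimal, hedged sketch of a bundle-method iteration for regularized risk minimization, in the spirit of the second item. The hinge loss, the SLSQP-based inner solver, and all function names are illustrative choices made for this sketch, not the algorithm presented in the talk.

```python
import numpy as np
from scipy.optimize import minimize

def hinge_risk_and_subgrad(w, X, y):
    """Empirical hinge-loss risk and one subgradient at w."""
    margins = 1.0 - y * (X @ w)
    active = margins > 0
    risk = np.mean(np.maximum(margins, 0.0))
    subgrad = -(X[active].T @ y[active]) / len(y)
    return risk, subgrad

def bundle_method(X, y, lam=0.1, iters=30):
    """Minimize  lam/2 ||w||^2 + R_emp(w)  with a cutting-plane (bundle) model."""
    w = np.zeros(X.shape[1])
    A, b = [], []                      # cutting planes: R_emp(w) >= a_i . w + b_i
    for _ in range(iters):
        risk, a = hinge_risk_and_subgrad(w, X, y)
        A.append(a)
        b.append(risk - a @ w)
        A_mat, b_vec = np.array(A), np.array(b)

        # Inner problem, solved in its dual over the probability simplex:
        #   max_alpha  -1/(2 lam) ||A^T alpha||^2 + b . alpha,
        #   with the primal recovered as  w = -(1/lam) A^T alpha.
        def neg_dual(alpha):
            v = A_mat.T @ alpha
            return v @ v / (2.0 * lam) - b_vec @ alpha

        n = len(b)
        res = minimize(neg_dual, np.ones(n) / n, method="SLSQP",
                       bounds=[(0.0, 1.0)] * n,
                       constraints=[{"type": "eq",
                                     "fun": lambda alpha: alpha.sum() - 1.0}])
        w = -(A_mat.T @ res.x) / lam
    return w

# Tiny synthetic usage example (assumed data, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=200))
w_hat = bundle_method(X, y)
```

The inner step is the characteristic piece of a bundle method: the accumulated subgradients define a piecewise-linear lower bound on the empirical risk, and minimizing that model plus the quadratic regularizer reduces to a small quadratic program over the simplex.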

Sunday, March 23, 2008

Authors@Google: Dr. Robert Zubrin

Mar 2008


(1 hr 32 min)

Dr. Robert Zubrin presented:

In his book "Energy Victory: Winning the War on Terror by Breaking Free of Oil," world-renowned engineer and best-selling author Robert Zubrin lays out a bold plan for breaking the economic stranglehold that the OPEC oil cartel has on our country and the world. Zubrin presents persuasive evidence that our decades-long relationship with OPEC has resulted in the looting of our economy, the corruption of our political system, and now the funding and protection of terrorist regimes and movements that are committed to our destruction.


More information at: http://energyvictory.net/