DistArray - Distributed array computing for Python
Kurt Smith, Robert Grant
- Audience level:
- Intermediate
- Category:
- Python Libraries
Description
DistArray is an up-and-coming Python package providing distributed NumPy-like multidimensional arrays, ufuncs, and IO to bring the strengths of NumPy to data-parallel high-performance computing (HPC). We build on widely-used Python HPC libraries and have introduced the Distributed Array Protocol to exchange arrays without copying with external distributed libraries like Trilinos.
Abstract
DistArray provides general multidimensional NumPy-like distributed arrays to Python. It intends to bring the strengths of NumPy to data-parallel high-performance computing. DistArray has a similar API to NumPy.
DistArray is for users who
- know and love Python and NumPy,
- want to scale NumPy to larger distributed datasets,
- want to interactively play with distributed data but also
- want to run batch-oriented distributed programs,
- want an easier way to drive and coordinate existing MPI-based codes,
- have a lot of data that may already be distributed,
- want a global view ("think globally") with local control ("act locally"),
- need to tap into existing parallel libraries like Trilinos, PETSc, or Elemental, and
- want the interactivity of IPython and the performance of MPI.
DistArray is designed to work with other packages that implement the [Distributed Array Protocol]( Distributed Array Protocol).
DistArray was started by Brian Granger in 2008 and is currently being developed at Enthought by a team led by Kurt Smith, in partnership with Bill Spotz from Sandia's (Py)Trilinos project.