DistArray 0.3: development release
Documentation: http://distarray.readthedocs.org
License: Three-clause BSD
Python versions: 2.7 and 3.3
OS support: *nix and Mac OS X
What is DistArray?
DistArray aims to bring the strengths of NumPy to data-parallel high-performance computing. It provides distributed multi-dimensional NumPy-like arrays and distributed ufuncs, distributed IO capabilities, and can integrate with external distributed libraries, like Trilinos. DistArray works with NumPy and builds on top of it in a flexible and natural way.
0.3 Release
This is the second development release.
Noteworthy improvements in 0.3 include:
- support for distributions over a subset of processes;
- distributed reductions with a simple NumPy-like API: da.sum(axis=3) ;
- an apply() function for easier computation with process-local data;
- performance improvements and reduced communication overhead;
- cleanup, renamings, and refactorings;
- test suite improvements for parallel testing; and
- start of a more frequent release schedule.
DistArray is not ready for real-world use. We want to get input from the larger scientific-Python community to help drive its development. The API is changing rapidly and we are adding many new features on a fast timescale. DistArray is currently implemented in pure Python for maximal flexibility. Performance improvements are ongoing.
Existing features
Distarray:
- has a client-engine (or master-worker) process design -- data resides on the worker processes, commands are initiated from master;
- allows full control over what is executed on the worker processes and integrates transparently with the master process;
- allows direct communication between workers bypassing the master process for scalability;
- integrates with IPython.parallel for interactive creation and exploration of distributed data;
- supports distributed ufuncs (currently without broadcasting);
- builds on and leverages MPI via MPI4Py in a transparent and user-friendly way;
- supports NumPy-like structured multidimensional arrays;
- has basic support for unstructured arrays;
- supports user-controllable array distributions across workers (block, cyclic, block-cyclic, and unstructured) on a per-axis basis;
- has a straightforward API to control how an array is distributed;
- has basic plotting support for visualization of array distributions;
- separates the array’s distribution from the array’s data -- useful for slicing, reductions, redistribution, broadcasting, and other operations;
- implements distributed random arrays;
- supports .npy-like flat-file IO and hdf5 parallel IO (via h5py); leverages MPI-based IO parallelism in an easy-to-use and transparent way; and
- supports the distributed array protocol [protocol], which allows independently developed parallel libraries to share distributed arrays without copying, analogous to the PEP-3118 new buffer protocol.
Planned features and roadmap
- Distributed slicing
- Re-distribution methods
- Integration with Trilinos [Trilinos] and other packages [petsc] that subscribe to the distributed array protocol [protocol]
- Distributed broadcasting
- Distributed fancy indexing
- MPI-only communication for non-interactive deployment on clusters and supercomputers
- Lazy evaluation and deferred computation for latency hiding
- Out-of-core computations
- Extensive examples, tutorials, documentation
- Support for distributed sorting and other non-trivial distributed algorithms
- End-user control over communication and temporary array creation, and other performance aspects of distributed computations
History
Brian Granger started DistArray as a NASA-funded SBIR project in 2008. Enthought picked it up as part of a DOE Phase II SBIR [SBIR] to provide a generally useful distributed array package. It builds on IPython, IPython.parallel, NumPy, MPI, and interfaces with the Trilinos suite of distributed HPC solvers (via PyTrilinos [Trilinos]).
[protocol] | (1, 2) http://distributed-array-protocol.readthedocs.org/en/rel-0.10.0/ |
[Trilinos] | (1, 2) http://trilinos.org/ |
[petsc] | http://www.mcs.anl.gov/petsc/ |
[SBIR] | http://www.sbir.gov/sbirsearch/detail/410257 |