Python gets a big data boost from DARPA

Continuum Analytics will extend the widely used NumPy library for distributed systems

DARPA encouraged the funding recipients to build products based on their work and to release their code as open source, so the innovations can be widely used and supported outside of the military. The Defense Department is trying to avoid commissioning software used only by the military, which can then become prohibitively time-consuming and expensive to update.

"With big data systems, you find new things you want to look at every week. You can't wait for that process any more," Wang said.

Headquartered in Austin, Texas, Continuum Analytics offers add-on products and services that help organizations use Python for data analysis. The company will use the DARPA money to continue development of a number of add-on technologies it has been working on, including Blaze, Numba and Bokeh, all of which provide advanced features not offered in Python itself.

At the PyData 2012 conference in New York last November, Continuum engineer Stephen Diehl discussed how Blaze would operate, describing the library as a potential successor to NumPy.

NumPy has limitations that Blaze seeks to correct, Diehl said. Most notably, a NumPy array stores its values as a single contiguous block of memory. "It is a single buffer, a continuous block of memory. That may be OK for some uses, but the real world is more heterogeneous," he said in a presentation.
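To make the "single buffer" point concrete, here is a minimal sketch using standard NumPy calls (the variable names are illustrative only). It shows that a NumPy array is one contiguous block with a fixed element type, and that even a structured, mixed-field record type is packed into fixed-size records inside that same kind of buffer.

```python
import numpy as np

# A 3x4 array of 64-bit floats lives in one contiguous block of memory.
a = np.arange(12, dtype=np.float64).reshape(3, 4)
print(a.flags["C_CONTIGUOUS"])  # True
print(a.nbytes)                 # 96: 12 elements * 8 bytes each
print(a.strides)                # (32, 8): bytes to step per row, per column

# Every element shares one dtype; mixing ints and floats forces an upcast.
b = np.array([1, 2.5, 3])
print(b.dtype)                  # float64

# "Heterogeneous" records are still fixed-size slots in a single buffer.
records = np.zeros(3, dtype=[("id", "i4"), ("score", "f8")])
print(records.dtype.itemsize)   # 12: 4-byte int + 8-byte float, unpadded
```

This fixed, uniform layout is what makes NumPy fast, and it is also the rigidity Diehl described: data that does not fit a single typed block has to be reshaped to fit it.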

Blaze can "endow [data] with structure," Diehl said. It will also allow programmers to create multidimensional arrays and store them in a distributed architecture, across multiple machines.
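The idea of one logical array spread across machines can be illustrated with a toy, single-process simulation. This is a hypothetical sketch, not Blaze's API: the helper names `partition_rows` and `distributed_sum` are invented for illustration, and the "workers" are just list entries. It splits an array into row blocks and combines per-block results.

```python
import numpy as np

def partition_rows(array, n_workers):
    # Hypothetical helper: split one logical array into row blocks,
    # one block per simulated "machine" (not a Blaze function).
    return np.array_split(array, n_workers, axis=0)

def distributed_sum(chunks):
    # Each simulated worker reduces its own block locally;
    # only the small partial results are combined at the end.
    partials = [chunk.sum() for chunk in chunks]
    return sum(partials)

data = np.arange(20).reshape(10, 2)
chunks = partition_rows(data, 3)
print([c.shape for c in chunks])          # [(4, 2), (3, 2), (3, 2)]
print(distributed_sum(chunks) == data.sum())  # True
```

A real distributed array system also has to track where each block lives and move data over the network; this sketch only shows the partition-then-combine pattern.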

Bokeh is a Python library that can visually render large data sets using the HTML5 Canvas element, while Numba is a just-in-time compiler for Python that understands NumPy arrays and functions. Numba is included in Continuum's flagship product, Anaconda, a Python distribution with a number of premium data analysis features.

