47. Datablox

Audience level:
Big Data
March 11th 8:40 a.m. – 8:45 a.m.


Datablox is an open source, Python-based framework for building Big Data applications. It is designed around the idea of combining small, reusable components to build systems. We present the design of the framework and, through a few use cases, show how application developers can use it.


Datablox is an open source framework for building Big Data applications. The philosophy behind the framework is to build small, reusable "blocks" and "wire" them together into systems. Individual blocks are written in Python, and the framework provides a DSL for connecting them. Programmers can thus write a toolbox of general-purpose blocks, and users (including non-programmers) can wire them together to build systems. In addition to reuse, this approach allows the Datablox runtime to distribute applications across nodes, ensure fault tolerance, and scale automatically with load. A block includes not only code but also external software dependencies such as databases and libraries. Datablox tracks these dependencies and uses the Engage framework to deploy the required software stack automatically. Datablox is designed to work at large scale, on either private clusters or the cloud (e.g. Amazon Web Services and Rackspace), and to handle both online and batch data.
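
The block-and-wiring idea can be illustrated with a minimal sketch in plain Python. This is not the Datablox API or its DSL: the names Block, Tokenize, Lowercase, Collect, wire and run are hypothetical and exist only for this illustration, and the sketch runs in a single process with no distribution, fault tolerance, or dependency deployment.

```python
from collections import deque


class Block:
    """Hypothetical base class (not the Datablox API): a reusable processing unit."""

    def __init__(self, name):
        self.name = name
        self.downstream = []  # blocks wired to this block's output

    def process(self, item):
        """Return an iterable of output items for one input item."""
        raise NotImplementedError


class Tokenize(Block):
    def process(self, item):
        # Split a line of text into words.
        return item.split()


class Lowercase(Block):
    def process(self, item):
        # Emit exactly one lowercased item per input.
        return [item.lower()]


class Collect(Block):
    """Sink block: stores everything it receives and emits nothing."""

    def __init__(self, name):
        super().__init__(name)
        self.items = []

    def process(self, item):
        self.items.append(item)
        return []


def wire(*blocks):
    """Connect blocks in a linear chain and return the first block."""
    for src, dst in zip(blocks, blocks[1:]):
        src.downstream.append(dst)
    return blocks[0]


def run(source, inputs):
    """Push each input through the wired graph using a FIFO work queue."""
    queue = deque((source, item) for item in inputs)
    while queue:
        block, item = queue.popleft()
        for out in block.process(item):
            for nxt in block.downstream:
                queue.append((nxt, out))


sink = Collect("sink")
pipeline = wire(Tokenize("tokenize"), Lowercase("lowercase"), sink)
run(pipeline, ["Hello Big Data", "Small Reusable Blocks"])
print(sink.items)
```

Because each block only sees items and a list of downstream blocks, the same Tokenize or Lowercase block could be reused in a different wiring without modification, which is the reuse property the description emphasizes.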

Datablox is similar to analytics frameworks such as Twitter Storm and Apache Flume, but allows more general application architectures.

We will present the design of the Datablox framework, describe the kinds of applications that can be written with it, explain how programmers can use it, and give a few use cases.