Rationale
=========

Why Arbalest?
-------------

* A lightweight library, rather than a heavyweight framework, that can be composed with existing data tools
* Python is a de facto lingua franca for data science
* Configuration as code
* Batteries included: for example, strategies for ingesting time series or sparse data (`arbalest.pipeline`), or integration with an existing pipeline topology (`arbalest.contrib`)

Use Cases
---------

Arbalest is not a MapReduce framework. Rather, it is designed to make Amazon Redshift (and all its strengths) easy to use with typical data workflows and tools. Here are a few examples:

* You are already using a MapReduce framework to process data in S3. Arbalest could make the results of an Elastic MapReduce job queryable with SQL in Redshift. You can then hand off to Arbalest to define additional ETL in plain old SQL.
* You treat S3 as a catch-all data sink, perhaps persisting JSON messages or events from a message system like Kafka or RabbitMQ. Arbalest can expose some or all of this data in a data warehouse using Redshift. The SQL ecosystem then becomes available for dashboards, reports, and ad hoc analysis.
* You have complex pipelines that could benefit from a fast, SQL-queryable data sink. Arbalest has out-of-the-box support (`arbalest.contrib`) for integrating with tools like Luigi, so it can be part of a multi-dependency, multi-step pipeline topology.
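
The use case of exposing JSON events from S3 in Redshift ultimately rests on a Redshift ``COPY`` from an S3 prefix. As a rough sketch of that underlying step only, and not of Arbalest's own API, the following plain Python builds such a statement. The function name ``build_copy_statement`` and the table, bucket, and IAM role values are hypothetical placeholders chosen for illustration.

```python
def build_copy_statement(table, s3_prefix, iam_role, jsonpaths="auto"):
    """Return a Redshift COPY statement that loads JSON objects from S3.

    Hypothetical helper for illustration; it is not part of Arbalest.
    `jsonpaths` is either 'auto' or the S3 path of a JSONPaths file.
    """
    # Reject quotes and semicolons so the values cannot break out of
    # the SQL string literals they are placed in.
    for value in (table, s3_prefix, iam_role, jsonpaths):
        if "'" in value or ";" in value:
            raise ValueError("unsafe character in {!r}".format(value))

    return (
        "COPY {table} FROM '{source}' "
        "IAM_ROLE '{role}' "
        "FORMAT AS JSON '{jsonpaths}';"
    ).format(table=table, source=s3_prefix, role=iam_role, jsonpaths=jsonpaths)


if __name__ == "__main__":
    # All identifiers below are made-up example values.
    print(build_copy_statement(
        "events",
        "s3://example-bucket/events/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ))
```

Once the data is loaded this way, any further transformation can be written as ordinary SQL against the target table, which is the hand-off the use cases above describe.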