Arbalest is a Python data pipeline orchestration library for Amazon S3 and Amazon Redshift. It takes care of the heavy lifting of making data queryable at scale in AWS.

It takes care of:

  • Ingesting data into Amazon Redshift
  • Schema creation and validation
  • Creating highly available and scalable data import strategies
  • Generating and uploading prerequisite artifacts for import
  • Running data import jobs
  • Orchestrating idempotent and fault tolerant multi-step ETL pipelines with SQL

Getting started is easy with pip:

pip install arbalest

Arbalest is licensed under the MIT License.


Indices and tables