A modern data workflow platform
A list of useful resources to learn Data Engineering from scratch
Quilt is a versioned data portal for S3
Pandas on AWS
Source code accompanying the book Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
Clean APIs for data cleaning. A Python implementation of the R package Janitor
Example project implementing best practices for PySpark ETL jobs and applications.
Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing workflows on various cluster computing platforms. Please see https://github.com/cwensel/cascading for access to all WIP branches.
A few projects related to data engineering, including data modeling, infrastructure setup on cloud, data warehousing, and data lake development.
Open Metadata and Governance
Dataform is a framework for managing SQL-based data operations in BigQuery, Snowflake, and Redshift
A daily digest of articles and videos that I've found interesting and want to share with you.
An Awesome List of Open-Source Data Engineering Projects
A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
An automatic ML model optimization tool.
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Study materials for the Google Cloud Professional Data Engineering Exam
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Projects done in the Data Engineering Nanodegree by Udacity.com
Interactive computing for complex data processing, modeling and analysis in Python 3
Tool to build production-ready pipelines for experimentation with Kedro and MLflow
Ansible playbook to deploy distributed technologies
The DataHelix generator lets you quickly create data for testing and validation, based on a JSON profile that defines fields and the relationships between them
Documentation for data enthusiasts