datalake

A free-to-use dbt package for creating and loading Data Vault 2.0-compliant data warehouses (powered by dbt, an open-source data engineering tool and registered trademark of Fishtown Analytics)

etl snowflake datawarehousing dbt elt datawarehouse datalake dataengineering datavault datavault20

Updated Aug 11, 2020
TSQL

PaloAltoNetworks / pan-cortex-data-lake-python

Python idiomatic SDK for Cortex™ Data Lake.

Updated Jul 6, 2020
Python

LearningJournal / SparkProgrammingInScala

Apache Spark Course Material

scala big-data spark apache-spark bigdata data-lake datalake spark-sql spark-scala

Updated Jul 26, 2020
Scala

ExpediaGroup / apiary

Apiary provides modules which can be combined to create a federated cloud data lake

aws hive datalake hive-metastore

Updated Feb 19, 2020

AbsaOSS / enceladus

Dynamic Conformance Engine

scala spark spring mongodb hadoop bigdata datalake

Updated Aug 10, 2020
Scala

abdullahkhawer / aws-auto-terminate-idle-emr

AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and AWS Lambda, with a Python script built on Boto3, to terminate AWS EMR clusters that have been idle for a specified period of time.

Updated Aug 1, 2019
Python

ExpediaGroup / apiary-data-lake

Terraform scripts for deploying Apiary Data Lake

apiary datalake

Updated Aug 11, 2020
HCL

leesf / hudi-resources

A collection of Apache Hudi resources

bigdata apache stream-processing data-integration datalake hudi apachehudi incremental-processing hudi-resources

Updated Aug 10, 2020

elastacloud / parquet-usql

A custom extractor designed to read parquet for Azure Data Lake Analytics

azure extractor parquet datalake adla custom-extractor custom-outputter

Updated Feb 13, 2018
C#

brfulu / us-accidents-data-engineering

Udacity Data Engineer Nanodegree - Capstone project

aws airflow spark athena s3 datalake

Updated Dec 19, 2019
Python

aravinthsci / Spark_Delta_Lake

Delta Lake Examples

spark datalake apachespark delta-lake deltalake

Updated Apr 24, 2020
Jupyter Notebook

caiomsouza / microsoft-big-data-scientist-and-ai

Microsoft Big Data, Data Scientist, and AI

Updated Jul 31, 2020

vim89 / datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Provides a complete ETL pipeline for a data lake, including SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

python big-data spark apache-spark hadoop etl xml python3 xml-parsing pyspark json-parser datalake hadoop-mapreduce spark-sql etl-framework hadoop-hdfs etl-pipeline etl-components

Updated Aug 1, 2020
Python

lynnlangit / learning-nosql

Companion repository to the LinkedIn Learning course 'Cloud NoSQL for SQL Pros'

data nosql dynamodb aws-dynamodb datalake gcp-bigtable

Updated Apr 28, 2020

nnthanh101 / social-listening

The AI-Driven Social Media Dashboard solution provides customers with an easy-to-deploy CloudFormation template that uses Amazon Translate, Amazon Comprehend, Amazon Kinesis, Amazon Athena, and Amazon QuickSight to build a natural-language-processing (NLP)-powered social media dashboard for tweets.

social-media cloudformation aws-lambda data-visualization data-analysis aws-kinesis aws-ecs datalake aws-athena aws-comprehend aws-translate aws-quicksight

Updated Jul 20, 2020
JavaScript

CalvinHartwell / getting-started-with-kylo

An introduction to using Kylo, an open source data lake builder from Teradata

spark hadoop gitbook teradata nifi apache-nifi hdp datalake kylo thinkbig thinkbiganalytics

Updated Jun 17, 2017

mfilipelino / kafka2hdfs

PySpark streaming from Kafka (0.8.2) to HDFS

kafka spark spark-streaming hdfs datalake

Updated Dec 13, 2018
Python

LearningJournal / Spark-Streaming-In-Scala

Apache Spark 3 - Structured Streaming Course Material

scala big-data spark apache-spark bigdata spark-streaming datalake spark-sql

Updated Aug 5, 2020
Scala

luanoliveira1992 / bigdatamusic

bigdata python3 datalake pipenv

Updated Mar 31, 2020

hau-mal / articles

BigData Blog articles

azure bigdata nifi datalake hdinsight

Updated Dec 8, 2019

pactera-ai / data2lake

A tool to form a lake on AWS from your data

aws data automation datalake

Updated Aug 11, 2020
Python

gfelot / DEND-DateLake-Spark

Uses Spark to read data from S3, wrangle it, and write it back to S3 with a better schema

aws spark python3 udacity-nanodegree datalake udacity-data-engineer-nanodegree

Updated Mar 31, 2020
Python

MehdiTAZI / BigData-Platform

End-to-end big data project that aims to show how to implement the different big data layers, from the infrastructure layer to the end-user layer. [HADOOP][Spark][Kafka][Cassandra][Ansible][Jupyter][Docker]