DataSphereStudio is a one-stop portal for data application development and management, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Updated Jul 28, 2020
Java

blockchain-etl / ethereum-etl

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ

export bigquery aws csv sql etl ethereum transaction gcp google-cloud erc20 erc20-tokens blockchain-analytics erc721

Updated Jul 22, 2020
Python

panther-labs / panther

Detect threats with log data and improve cloud security posture

react python go graphql aws security typescript serverless etl bigdata compliance security-automation auto-remediation

Updated Jul 29, 2020
Go

react-csv / react-csv

React components to build CSV files on the fly based on an array or object literal of data

etl reactjs excel reporting csv-document

Updated Jul 10, 2020
JavaScript

rwynn / monstache

A Go daemon that syncs MongoDB to Elasticsearch in real time

go golang elasticsearch sync synchronization mongodb connector etl daemon realtime oplog tail river change-streams

Updated Jul 28, 2020
Go

singer-io / getting-started

This repository is a getting started guide to Singer.

python etl data-analysis singer etl-framework

Updated Jan 16, 2020
Makefile

PhantomInsights / baby-names-analysis

Data ETL & Analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.

etl numpy eda pandas python3 seaborn matplotlib python-requests

Updated Jan 22, 2020
Python

ananas-analytics / ananas-desktop

A hackable data integration & analysis tool that enables non-technical users to edit data processing jobs and visualise data on demand.

visualization etl analytics business-intelligence data-modeling hackable-data

Updated Jul 16, 2020
Java

koopjs / koop

🔮 Transform, query, and download geospatial data on the web.

nodejs api server etl arcgis geojson gis spatial data-management arcgishub

Updated Jul 20, 2020
Shell

AlexIoannides / pyspark-example-project

Example project implementing best practices for PySpark ETL jobs and applications.

python data-science spark etl pyspark data-engineering etl-pipeline etl-job

Updated Jul 9, 2020
Python

dotnetcore / SmartCode

SmartCode = IDataSource -> IBuildTask -> IOutput => Build Everything!!!

code-generator etl dotnet dotnetcore dotnet-core smartcode

Updated Jun 10, 2020
C#

seanharr11 / etlalchemy

Extract, Transform, Load: Any SQL Database in 4 lines of Code.

python sqlalchemy database etl migrations etl-framework

Updated May 23, 2019
Python

grailbio / bigslice

A serverless cluster computing system for the Go programming language

go golang etl cluster bigdata machinelearning mapreduce computing

Updated Jul 27, 2020
Go

2ndQuadrant / pglogical

Logical Replication extension for PostgreSQL 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.

subscription replication etl zero-downtime postgresql data-transformation publish-subscribe cdc logical-decoding data-transport database-replication

Updated Jun 1, 2020
C

reugn / go-streams

Go stream processing library

redis kafka pipeline etl data-stream pipelines streams stream-processing pulsar aerospike kafka-streams data-pipeline streaming-data stream-processor apache-pulsar

Updated Jul 9, 2020
Go

datacleaner / DataCleaner

The premier open source Data Quality solution

data-science data database etl desktop data-analysis mdm profiling datacleaner dataquality

Updated May 27, 2020
Java

appbaseio / abc

Power of appbase.io via CLI, with nifty imports from your favorite data sources

cli elasticsearch etl appbase

Updated Jul 29, 2020
Go

Cascading / cascading

Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing workflows on various cluster computing platforms. Please see https://github.com/cwensel/cascading for access to all WIP branches.

machine-learning hadoop etl cascading data-engineering scalding flink tez

Updated Nov 29, 2018
Java

Cinchoo / ChoETL

ETL framework for .NET / C# (parser / writer for CSV, flat, XML, JSON, key-value, Parquet, and YAML formatted files)

yaml parser json csv csharp etl dotnet xml writer flat reader parquet keyvalue parquet-files etl-framework cinchoo-etl

Updated Jul 19, 2020
C#

WeBankFinTech / WeDataSphere

WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!

bi kafka spark hive hadoop etl scheduler ide hbase portal mask sqoop data-quality data-map

Updated Jul 2, 2020

deeplearning4j / DataVec

ETL Library for Machine Learning - data pipelines, data munging and wrangling

machine-learning formatter schema spark pipeline etl transformations datapipeline data-munging svmlight hadoop-ecosystem writables

Updated Jun 1, 2020
Java

YotpoLtd / metorikku

A simplified, lightweight ETL Framework based on Apache Spark

scala sql big-data spark etl distributed-computing etl-framework etl-pipeline

Updated Jul 29, 2020
Scala

etl

Here are 1,095 public repositories matching this topic...
