data-engineering
Here are 899 public repositories matching this topic...
Updated Jul 31, 2021
The Data Engineering Cookbook
Updated Apr 2, 2021
Roadmap to becoming a data engineer in 2021
Updated May 28, 2021
Opened from the Prefect Public Slack Community
elliot: Hey folks is anyone else getting a deprecation warning on marshmallow? Something like:
...
/home/rof/.pyenv/versions/3.9.1/lib/python3.9/site-packages/marshmallow/fields.py:198
/home/rof/.pyenv/versions/3.9.1/lib/python3.9/site-packages/marshmallow/fields.py:198: RemovedInMarshmallow4Warning: Passing fi
Describe the bug
data docs columns shrink to 1 character width with long query
To Reproduce
Steps to reproduce the behavior:
- make a batch from a long query string
- run validation
- render result to data docs
- See screenshot
[Screenshot: Data documentation compiled by Great Expectations]
Declarative stream processing for mundane tasks and data engineering
Updated Jul 31, 2021 - Go
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Updated Jul 30, 2021 - Python
A list of useful resources to learn Data Engineering from scratch
Updated Jun 9, 2021
Updated Jul 26, 2021 - JavaScript
Currently, lakeFS registers OpenAPI handlers and handles all specific routes.
For a call to /api/v1/test, an unknown path under the API prefix, the mux serves the request through the UI handler and returns a valid HTML (UI) page.
The expected behaviour is to return a non-2xx status code with a JSON error, preferably in the internal error format, so the developer will handle an error and not fai
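The expected behaviour described above can be sketched in a few lines. This is only an illustration in Python, not lakeFS code (lakeFS itself is written in Go): the `route` function, the `KNOWN_API_ROUTES` set, and the shape of the JSON error body are all assumptions made for the example.

```python
import json

API_PREFIX = "/api/v1/"  # API prefix as it appears in the issue text
KNOWN_API_ROUTES = {"/api/v1/repositories"}  # hypothetical registered route


def route(path):
    """Return (status, content_type, body) for a request path."""
    if path.startswith(API_PREFIX):
        if path in KNOWN_API_ROUTES:
            # A registered API route: answer with JSON as usual.
            return 200, "application/json", json.dumps({"results": []})
        # Unknown path under the API prefix: JSON error, never the UI page.
        error = {"message": "unknown API path: " + path}  # assumed error shape
        return 404, "application/json", json.dumps(error)
    # Everything outside the API prefix falls through to the UI handler.
    return 200, "text/html", "<html>UI</html>"
```

With this ordering, `route("/api/v1/test")` yields a 404 with a JSON body, while paths outside the prefix still receive the HTML page.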
Quilt is a self-organizing data hub for S3
Updated Jul 31, 2021 - Jupyter Notebook
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Updated May 10, 2021 - Jupyter Notebook
A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.
Updated Jul 3, 2021
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Updated Mar 9, 2020 - Python
If they are not class methods, the method is invoked for every test and a session is created for each of those tests.
`class PySparkTest(unittest.TestCase):
    @classmethod
    def suppress_py4j_logging(cls):
        logger = logging.getLogger('py4j')
        logger.setLevel(logging.WARN)

    @classmethod
    def create_testing_pyspark_session(cls):
        return Sp
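The point about class methods can be checked without Spark. The sketch below is an assumption-laden stand-in: `FakeSession` is a hypothetical placeholder for a real session object, and `SharedSessionTest` and `run_and_count` are names invented for the example. It uses the standard `unittest` hook `setUpClass`, which runs once per test class rather than once per test.

```python
import unittest


class FakeSession:
    """Hypothetical stand-in for an expensive session; counts constructions."""
    created = 0

    def __init__(self):
        FakeSession.created += 1


class SharedSessionTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once for the whole class, so all tests share one session.
        cls.session = FakeSession()

    def test_one(self):
        self.assertIsNotNone(self.session)

    def test_two(self):
        self.assertIsNotNone(self.session)


def run_and_count():
    """Run the test class and return (tests run, sessions created)."""
    FakeSession.created = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SharedSessionTest)
    result = unittest.TestResult()
    suite.run(result)
    return result.testsRun, FakeSession.created
```

Moving the construction into an instance-level `setUp` would instead create one session per test method.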
Hi,
I am using some basic functions from pyjanitor, such as clean_names() and collapse_levels(), in code that I want to productionise.
There are limits on the size of the production code base.
Currently, the requirements.txt for pyjanitor alone is huge.
I don't think my code needs all of those dependencies.
How can I remove the unnecessary ones?
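One option, when only a couple of helpers are needed, is to reimplement them locally with the standard library. The sketch below is a simplified approximation and is not pyjanitor's implementation: it works on a plain list of column names, and the exact normalisation rules (lowercasing, collapsing runs of non-alphanumeric characters into a single underscore) are assumptions that may differ from what clean_names() does.

```python
import re


def clean_names(columns):
    """Return lowercase, underscore-separated versions of column names."""
    cleaned = []
    for name in columns:
        name = str(name).strip().lower()
        # Collapse every run of non-alphanumeric characters into one underscore.
        name = re.sub(r"[^0-9a-z]+", "_", name)
        # Drop underscores left over at either end.
        cleaned.append(name.strip("_"))
    return cleaned
```

With a pandas DataFrame `df`, the result could be applied as `df.columns = clean_names(df.columns)`.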
Accumulated knowledge and experience in the field of Data Engineering
Updated Jun 2, 2021
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Updated Mar 5, 2020 - Python
A Data Engineering & Machine Learning Knowledge Hub
Updated Jul 30, 2021
Data validation and organization of metadata for data frames and database tables
Updated Jul 31, 2021 - R
Polyglot workflows without leaving the comfort of your technology stack.
Updated Jun 7, 2021 - Ruby
An Awesome List of Open-Source Data Engineering Projects
Updated May 22, 2021
Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Updated Jul 30, 2021 - TypeScript
In the repository handler:
- removeEntity tries to delete and, if delete is not supported, issues a purge; the purge method writes an audit log
- There are 2 callers of purgeRelationship, only one of which writes an audit log
This is inconsistent.
I suggest we move the relationship audit log into the purge method, so that both callers write an audit log.
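The suggested change can be illustrated with a small Python sketch. The class and method names here (`RepositoryHandler`, `remove_relationship`, `purge_relationship_request`) are hypothetical and only mirror the shape described in the issue; the real project's code is not reproduced.

```python
class RepositoryHandler:
    """Hypothetical handler showing the audit log placed inside purge."""

    def __init__(self):
        self.audit_log = []

    def purge_relationship(self, guid):
        # Audit here, so every caller is covered without repeating the call.
        self.audit_log.append(("purgeRelationship", guid))
        # The actual purge against the repository would happen here.

    def remove_relationship(self, guid, delete_supported):
        # First caller: try delete, fall back to purge when unsupported.
        if delete_supported:
            return "deleted"
        self.purge_relationship(guid)
        return "purged"

    def purge_relationship_request(self, guid):
        # Second caller: a direct purge request.
        self.purge_relationship(guid)
        return "purged"
```

Because the log entry is written in one place, neither caller can forget it.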
Data profiling, testing, and monitoring for SQL accessible data.
Updated Jul 30, 2021 - Python
Updated Feb 7, 2021 - CSS
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing workflows on various cluster computing platforms. Please see https://github.com/cwensel/cascading for access to all WIP branches.
Updated Nov 29, 2018 - Java
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Updated Jul 31, 2021


Screenshot
Description
chart 3 dot menu is behind the chart title panel in chart maximize mode