Create HTML profiling reports from pandas DataFrame objects
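The kind of per-column summary such a profiling report is built from can be sketched with plain pandas (a minimal illustration, not the library's actual implementation; the `profile` helper and sample data are hypothetical):

```python
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize each column: dtype, missing-value count, distinct count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "distinct": df.nunique(),  # nunique() excludes NaN by default
    })

df = pd.DataFrame({"city": ["Oslo", "Oslo", None], "temp": [3.1, 4.2, 2.8]})
report = profile(df)
```

The resulting summary frame could then be rendered with `report.to_html(...)`; the real profiling libraries add distributions, correlations, and warnings on top of these basics.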
Updated Nov 8, 2020 - Jupyter Notebook
Data validation and organization of metadata for data frames and database tables
WeDataSphere is a finance-grade, one-stop open-source big data platform suite. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere: Big Data Made Easy!
Qualitis is a one-stop data quality management platform that supports quality verification, notification, and management for various data sources. It is used to solve data quality problems caused by data processing. https://github.com/WeBankFinTech/Qualitis
Profile and monitor your ML data pipeline end-to-end
An RDF Unit Testing Suite
NBi is a testing framework (an add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to write C# or Java code to specify your tests, nor do you need Visual Studio or Eclipse to compile your test suite. Just create an XML file and let the framework interpret it and run your tests. The framework is designed as an add-on to NUnit, but it can easily be ported to other testing frameworks.
Jumbune, an open-source Big Data APM & Data Quality Management Platform for data clouds. An enterprise feature offering is available at http://jumbune.com. More details of the open-source offering are at:
Library for evaluating data quality and for interacting with the datos.gov.co portal
Lightweight library to write, orchestrate, and test your SQL ETL, with data integrity in mind.
Data governance and data quality inspection/monitoring platform (Django + jQuery + MySQL)
DTCleaner: data cleaning using multi-target decision trees.
A tool to help improve data quality standards in observational data science.
A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.
Automated data quality suggestions and analysis with Amazon Deequ on AWS Glue
Migrated to: https://gitlab.com/Oslandia/osm-data-classification
DataOps for Government
hive_compared_bq compares/validates two (SQL-like) tables and graphically shows the rows/columns that differ.
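The core idea behind such a table comparison can be sketched with a pandas join (a hypothetical small-scale illustration; hive_compared_bq itself works against Hive/BigQuery at engine scale, and the column names here are made up):

```python
import pandas as pd

# Two extracts of the "same" table that should agree on every row.
left = pd.DataFrame({"id": [1, 2, 3], "fare": [10.0, 20.0, 30.0]})
right = pd.DataFrame({"id": [1, 2, 3], "fare": [10.0, 25.0, 30.0]})

# Join on the key, then keep only rows whose values disagree.
diff = left.merge(right, on="id", suffixes=("_left", "_right"))
mismatch = diff[diff["fare_left"] != diff["fare_right"]]
```

In practice a row-count check and a comparison of per-partition checksums usually come first, so the expensive row-level diff only runs on partitions that already look suspicious.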
The PEDSnet Data Quality Assessment Toolkit (OMOP CDM)
e.g. sqlalchemy.exc.IdentifierError: Identifier 'quality_check_airport_report_next_interval_SNN_daily_unique_quality_check' exceeds maximum length of 63 characters
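A common workaround for identifier-length limits like this (a sketch, not SQLAlchemy's own truncation mechanism; `shorten_identifier` is a hypothetical helper) is to truncate the name and append a short hash so shortened names stay distinct:

```python
import hashlib

MAX_IDENT = 63  # PostgreSQL's default identifier limit (NAMEDATALEN - 1)

def shorten_identifier(name: str, max_len: int = MAX_IDENT) -> str:
    """Truncate an over-long identifier, appending an 8-char hash for uniqueness."""
    if len(name) <= max_len:
        return name
    digest = hashlib.sha1(name.encode()).hexdigest()[:8]
    return name[: max_len - 9] + "_" + digest

long_name = (
    "quality_check_airport_report_next_interval_SNN_daily_unique_quality_check"
)
short = shorten_identifier(long_name)
```

The hash suffix matters: naive truncation can collapse two different generated names (e.g. two checks that differ only in their tail) into the same identifier.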
A Node.js tool to examine the correctness of Open Data Metadata and build custom dataset profiles
Data quality control tool built on spark and deequ
This is a document concerning Data Readiness in the context of machine learning and Natural Language Processing.
Data validation library for PySpark 3.0.0
We are trying to use GE with GCP Dataproc clusters. During cluster creation we install great-expectations==0.12.4, which pulls in ruamel.yaml==0.15.35 as a dependency. After cluster creation, attempting to import great_expectations fails with:
Traceback (most recent call last):
File "", line 1, in
File "/opt/conda/default/lib/python3.6/site-packages/great_expectations/_