Create HTML profiling reports from pandas DataFrame objects
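At its core, a profiling report like this is per-column summary statistics rendered to HTML. A minimal stdlib-only sketch of the per-column statistics step (the function name and the exact set of statistics are illustrative, not the library's API):

```python
from collections import Counter

def profile_column(values):
    """Compute simple profile statistics for one column (hypothetical helper)."""
    non_null = [v for v in values if v is not None]
    stats = {
        "count": len(values),
        "missing": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        # numeric column: range and mean
        stats["min"] = min(non_null)
        stats["max"] = max(non_null)
        stats["mean"] = sum(non_null) / len(non_null)
    elif non_null:
        # categorical column: most frequent value
        stats["top"] = Counter(non_null).most_common(1)[0][0]
    return stats

profile = profile_column([1, 2, 2, None, 4])
# → {"count": 5, "missing": 1, "distinct": 3, "min": 1, "max": 4, "mean": 2.25}
```

The real library computes many more statistics (correlations, histograms, duplicate detection) and templates them into an interactive HTML page.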
There are a couple of cases where we pass logging fields via a "Context" value called "log_fields".
Because these keys are passed as plain strings, it would be better to define a constant for each one and document it, to prevent misuse.
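The constants-plus-documentation idea could look like the following sketch; the class name, field names, and validation helper are illustrative assumptions, not taken from the actual codebase:

```python
class LogFields:
    """Keys allowed inside the "log_fields" Context value.

    Documenting each key here keeps callers from passing
    arbitrary or misspelled strings.
    """
    REQUEST_ID = "request_id"    # unique id of the incoming request
    USER_ID = "user_id"          # id of the authenticated user
    DURATION_MS = "duration_ms"  # handler duration in milliseconds

def build_log_fields(**fields):
    """Reject any key that is not a documented LogFields constant."""
    allowed = {
        v for k, v in vars(LogFields).items()
        if not k.startswith("_") and isinstance(v, str)
    }
    unknown = set(fields) - allowed
    if unknown:
        raise ValueError(f"unknown log fields: {sorted(unknown)}")
    return fields

ctx_fields = build_log_fields(request_id="abc-123")
```

A misspelled key such as `build_log_fields(reqest_id=...)` then fails loudly at the call site instead of silently producing an unqueryable log field.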
We're using marshmallow to parse the whylogs config from YAML.
However, Pydantic is much more powerful: it allows users to set configuration via various mechanisms, from YAML and JSON to environment settings.
We should consider moving to Pydantic.
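The layering Pydantic's settings classes automate (environment variables overriding values from a file) can be illustrated with a small stdlib-only sketch; the `Config` fields and the `WHYLOGS_` prefix are assumptions for illustration:

```python
import json
import os
from dataclasses import dataclass

@dataclass
class Config:
    project: str = "default"
    verbose: bool = False

def load_config(json_text="{}", env=None, prefix="WHYLOGS_"):
    """Load config from JSON, then let environment variables override it."""
    env = os.environ if env is None else env
    data = json.loads(json_text)
    for field in ("project", "verbose"):
        key = prefix + field.upper()
        if key in env:
            raw = env[key]
            # crude bool coercion; Pydantic does this per-field from type hints
            data[field] = raw if field != "verbose" else raw.lower() in ("1", "true", "yes")
    return Config(**data)

cfg = load_config('{"project": "demo"}', env={"WHYLOGS_VERBOSE": "true"})
# → Config(project="demo", verbose=True)
```

With Pydantic this entire function disappears: the type coercion, env-var prefix handling, and validation errors come from the model definition itself, which is the main argument for the migration.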
Data validation and organization of metadata for data frames and database tables
re_data - a data quality framework. Built on top of dbt, re_data helps you find, debug, and resolve problems in your data.
Data profiling, testing, and monitoring for SQL accessible data.
WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere: Big Data Made Easy!
Qualitis is a one-stop data quality management platform that supports quality verification, notification, and management for various data sources. It is used to solve data quality problems caused by data processing. https://github.com/WeBankFinTech/Qualitis
Profile and monitor your ML data pipeline end-to-end
Implementation of Estimating Training Data Influence by Tracing Gradient Descent (NeurIPS 2020)
An RDF Unit Testing Suite
Data governance and data quality inspection/monitoring platform (Django + jQuery + MySQL)
NBi is a testing framework (an add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to write C# or Java code to specify your tests, nor do you need Visual Studio or Eclipse to compile your test suite. Just create an XML file and let the framework interpret it and run your tests. The framework is designed as an add-on to NUnit, but it can easily be ported to other testing frameworks.
Jumbune, an open-source Big Data APM & Data Quality Management platform for data clouds. An enterprise feature offering is available at http://jumbune.com. More details of the open-source offering are at,
Since odd-platform already supports Redshift and other sources, it would be awesome to also support a BigQuery integration.
Great Expectations Airflow operator
Library for assessing data quality and interacting with the datos.gov.co portal.
A lightweight library to write, orchestrate, and test your SQL ETL, with data integrity in mind.
A tool to help improve data quality standards in observational data science.
Automated data quality suggestions and analysis with Deequ on AWS Glue
A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.
DTCleaner: data cleaning using multi-target decision trees.
DataOps for Government
hive_compared_bq compares/validates two SQL-like tables and graphically shows the rows/columns that differ.
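The core of such a comparison can be expressed with set operations in SQL. A toy, stdlib-only sketch using SQLite (the real tool works against Hive/BigQuery at scale with sampling and hashing; the function and result keys here are illustrative):

```python
import sqlite3

def diff_tables(conn, left, right, key):
    """Compare two tables: keys present on only one side, and rows
    (full tuples) that appear in `left` but not in `right`."""
    cur = conn.cursor()
    only_left = cur.execute(
        f"SELECT {key} FROM {left} EXCEPT SELECT {key} FROM {right}").fetchall()
    only_right = cur.execute(
        f"SELECT {key} FROM {right} EXCEPT SELECT {key} FROM {left}").fetchall()
    mismatched = cur.execute(
        f"SELECT * FROM {left} EXCEPT SELECT * FROM {right}").fetchall()
    return {"only_left": only_left, "only_right": only_right,
            "mismatched": mismatched}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a(id INTEGER, val TEXT);
    CREATE TABLE b(id INTEGER, val TEXT);
    INSERT INTO a VALUES (1, 'x'), (2, 'y');
    INSERT INTO b VALUES (1, 'x'), (2, 'z');
""")
result = diff_tables(conn, "a", "b", "id")
# → {"only_left": [], "only_right": [], "mismatched": [(2, "y")]}
```

Note that `mismatched` also includes rows whose key exists only on the left; a production tool separates those cases and renders the diff graphically per column.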
Migrated to: https://gitlab.com/Oslandia/osm-data-classification
R package based on "Column Names as Contracts" blog post (https://emilyriederer.netlify.app/post/column-name-contracts/)
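The "column names as contracts" idea is that a controlled vocabulary of name prefixes encodes each column's type and guarantees, which makes the contract mechanically checkable. A minimal Python sketch of such a check (the prefixes and their meanings here are illustrative, not the package's vocabulary):

```python
# Controlled vocabulary: prefix -> contract it promises (illustrative).
CONTRACT_PREFIXES = {
    "ID": "identifier, never null",
    "DT": "date, ISO-8601",
    "N": "count, non-negative integer",
    "IND": "binary indicator, 0/1",
}

def check_column_names(columns):
    """Return the columns whose prefix is not in the controlled vocabulary."""
    return [c for c in columns
            if c.split("_", 1)[0] not in CONTRACT_PREFIXES]

bad = check_column_names(["ID_user", "DT_signup", "revenue"])
# → ["revenue"]
```

Run as part of CI, a check like this turns naming conventions from documentation into an enforced data contract.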
The PEDSnet Data Quality Assessment Toolkit (OMOP CDM)
Describe the bug
Data Docs columns shrink to a one-character width when the query text is long.
To Reproduce
Steps to reproduce the behavior:
[Screenshot: data documentation compiled by Great Expectations]