data-engineering
Here are 1,060 public repositories matching this topic...
The Data Engineering Cookbook
Updated Jan 2, 2022
Roadmap to becoming a data engineer in 2021
Updated May 28, 2021
Current behavior
You get an error if you try to upload a file with a name that already exists:
```
azure.core.exceptions.ResourceExistsError: The specified blob already exists.
RequestId:5bef0cf1-b01e-002e-6
```
Proposed behavior
The task should take in an overwrite argument and pass it to [this line](https://github.com/PrefectHQ/prefect/blob/6cd24b023411980842fa77e6c0ca2ced47eeb83e/src/prefect/
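A minimal sketch of what this could look like, assuming the task ultimately calls the Azure SDK's `upload_blob`; the function and parameter names here are illustrative, not Prefect's actual API:
```python
from azure.storage.blob import BlobClient

def upload_to_blob(data: bytes, conn_str: str, container: str,
                   blob_name: str, overwrite: bool = False) -> None:
    # Hypothetical task body: forwarding `overwrite` to upload_blob
    # avoids ResourceExistsError when the blob already exists.
    blob = BlobClient.from_connection_string(
        conn_str, container_name=container, blob_name=blob_name
    )
    blob.upload_blob(data, overwrite=overwrite)
```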
Describe the bug
Data docs columns shrink to 1-character width when the batch is built from a long query.
To Reproduce
Steps to reproduce the behavior:
- Make a batch from a long query string (a repro sketch follows the screenshot below)
- Run validation
- Render the result to data docs
- See the screenshot
[Screenshot: Data documentation compiled by Great Expectations]
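A rough repro sketch, assuming the pre-0.13 batch-kwargs API; the datasource and suite names are placeholders:
```python
import great_expectations as ge

context = ge.data_context.DataContext()
long_query = "SELECT " + ", ".join(f"col_{i}" for i in range(50)) + " FROM some_table"

# Build a batch from the long query string, validate it, then render data docs
batch = context.get_batch(
    {"datasource": "my_sql_datasource", "query": long_query},
    "my_expectation_suite",
)
context.run_validation_operator("action_list_operator", assets_to_validate=[batch])
context.build_data_docs()  # the rendered columns shrink to 1-character width
```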
Under the hood, the Benthos csv input uses the standard encoding/csv package's csv.Reader struct.
The current implementation of csv input doesn't allow setting the LazyQuotes field.
We have a use case where we need to set the LazyQuotes field in order to make things work correctly.
Is your feature request related to a problem? Please describe.
I have a framework that handles the offline store. It creates the tables and indexes, reads data from different data sources, does some transformations, and then inserts into the offline store. As part of this, I can construct the entities, feature views, feature services, etc., as an instance of the ParsedRepo class for Feast. What I n
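For context, constructing these objects in code might look roughly like this (the names are placeholders, and Feast's exact API varies by version):
```python
from datetime import timedelta
from feast import Entity, Feature, FeatureView, FileSource, ValueType

# Hypothetical definitions; in the framework these would be built dynamically
driver = Entity(name="driver_id", value_type=ValueType.INT64)
source = FileSource(
    path="data/driver_stats.parquet",
    event_timestamp_column="event_timestamp",
)
driver_stats = FeatureView(
    name="driver_stats",
    entities=["driver_id"],
    ttl=timedelta(days=1),
    features=[Feature(name="avg_trips", dtype=ValueType.FLOAT)],
    batch_source=source,
)
```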
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Updated Jan 7, 2022 - Python
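Typical usage is a thin wrapper over pandas; for example (the bucket, table, and database names are placeholders):
```python
import awswrangler as wr
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})

# Write a Parquet dataset to S3
wr.s3.to_parquet(df, path="s3://my-bucket/dataset/", dataset=True)

# Read query results from Athena straight into a DataFrame
df2 = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_db")
```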
In the architecture page, under the Overview section, the overview image shows two LBs pointing to a lakeFS environment. This is no longer true and should show a single LB pointing to that environment.
A list of useful resources to learn Data Engineering from scratch
Updated Oct 29, 2021
Quilt is a self-organizing data hub for S3
Updated Jan 7, 2022 - Jupyter Notebook
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Updated Jan 7, 2022 - Jupyter Notebook
A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.
Updated Dec 31, 2021
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Updated Mar 9, 2020 - Python
click has a CliRunner to test CLI applications; however, it's limiting (e.g., monkeypatch doesn't work well). So we started to modify the test_cli.py tests to call the functions directly (e.g., `install.main(use_lock=True)`). But given this change, we are no longer testing that CLI args actually become the right function arguments (e.g., that passing --use-lock implies `install.main(use_lock=True)`).
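A minimal, self-contained sketch of the trade-off; the `install` command here is a hypothetical stand-in for the real one:
```python
import click
from click.testing import CliRunner

# Hypothetical stand-in for the real `install` command
@click.command()
@click.option("--use-lock", is_flag=True)
def install(use_lock):
    click.echo(f"use_lock={use_lock}")

def test_flag_reaches_function():
    # Invoking through CliRunner exercises argument parsing itself,
    # so --use-lock is verified end to end rather than assumed.
    result = CliRunner().invoke(install, ["--use-lock"])
    assert result.exit_code == 0
    assert "use_lock=True" in result.output
```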
If they were not class methods, the method would be invoked for every test, and a session would be created for each of those tests.
```python
import logging
import unittest

from pyspark.sql import SparkSession

class PySparkTest(unittest.TestCase):
    @classmethod
    def suppress_py4j_logging(cls):
        logger = logging.getLogger('py4j')
        logger.setLevel(logging.WARN)

    @classmethod
    def create_testing_pyspark_session(cls):
        # completing the truncated original: one local session shared by the class
        return SparkSession.builder.master('local[2]').getOrCreate()
```
Background
This thread is borne out of the discussion in #968, in an effort to make the documentation more beginner-friendly and more understandable.
One of the subtasks mentioned in that thread was to go through the function docstrings and include a minimal working example for each of the public functions in pyjanitor (an illustrative example follows below).
Criteria reiterated here for the benefit of discussion:
It sh
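As an illustration, a minimal working docstring example for pyjanitor's `clean_names` might look like this:
```python
import pandas as pd
import janitor  # noqa: F401 -- registers pyjanitor methods on DataFrame

df = pd.DataFrame({"First Name": ["anna", "bo"], "Total Score": [9.5, 8.0]})
df.clean_names()
#   first_name  total_score
# 0       anna          9.5
# 1         bo          8.0
```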
A Data Engineering & Machine Learning Knowledge Hub
Updated Jan 7, 2022
Accumulated knowledge and experience in the field of Data Engineering
Updated Jun 2, 2021
Data profiling, testing, and monitoring for SQL accessible data.
Updated Jan 6, 2022 - Python
A few projects related to Data Engineering, including Data Modeling, infrastructure setup on the cloud, Data Warehousing, and Data Lake development.
Updated Mar 5, 2020 - Python
In a lot of classes we use LoggerFactory to initialize the logger:
```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DefaultAuthorizer implements Authorizer {
    private static final Logger LOG = LoggerFactory.getLogger(DefaultAuthorizer.class);
}
```
This could be simplified to the following, with no need to initialize the logger through LoggerFactory:
```java
import lombok.extern.slf4j.Slf4j;

@Slf4j
public class DefaultAuthorizer implements Authorizer {
    // the `log` field is generated by Lombok at compile time
}
```
An Awesome List of Open-Source Data Engineering Projects
Updated Oct 25, 2021
Machine Learning automation and tracking
Updated Jan 6, 2022 - Python
Polyglot workflows without leaving the comfort of your technology stack.
Updated Nov 6, 2021 - Ruby
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
A large amount of output goes to the log; this should not happen by default.
Expected Behavior
Much less content in the output of the FVT and the build by default.
Switching on debug in the logging configuration should then show all the output.
Steps To Reproduce
Run the build.
Environment
Dataform is a framework for managing SQL-based data operations in BigQuery, Snowflake, and Redshift
Updated Dec 9, 2021 - TypeScript
Screenshot
I've added a red vertical ruler so that you can see the issue.
Description
As already explained in numerous issues, the use of the 'Inter' font is problematic: it does not allow dates to be aligned, for instance, and does not play nicely with numbers either.
In my supe