parquet
Here are 185 public repositories matching this topic...
Hello everyone,
Recently I tried to set up petastorm on my company's Hadoop cluster. However, as the cluster uses Kerberos for authentication, connecting with petastorm failed. I figured out that petastorm relies on pyarrow, which does support Kerberos authentication. As a workaround, I patched petastorm/petastorm/hdfs/namenode.py at line 250 and replaced it with:
driver = 'libhdfs'
return pyarrow.hdfs.connect(...)

Quilt is a self-organizing data hub for S3
Updated Feb 8, 2021 · Jupyter Notebook
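The petastorm workaround described above boils down to forcing pyarrow's JNI-based `libhdfs` driver, which picks up the Kerberos ticket obtained with `kinit`. A minimal sketch of what the patched connection code might look like (the host name, port, and wrapper function are hypothetical illustrations, not petastorm's actual API; it assumes a valid Kerberos ticket already exists):

```python
def connect_hdfs(host="default", port=0, user=None):
    """Connect to a Kerberized HDFS cluster.

    Forces the JNI-based 'libhdfs' driver, which honours the Kerberos
    ticket cache created by `kinit` (the alternative driver is what
    failed in the issue above).
    """
    # Imported lazily so this module loads even where pyarrow is absent.
    # Note: the pyarrow.hdfs module is deprecated in recent pyarrow releases.
    import pyarrow

    driver = "libhdfs"  # the one-line change made in namenode.py
    return pyarrow.hdfs.connect(host, port, user=user, driver=driver)

# Usage (needs a reachable cluster and a valid Kerberos ticket):
# fs = connect_hdfs("namenode.example.com", 8020)
# print(fs.ls("/"))
```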
Currently, there isn't a way to get the table properties in the SparkOrcWriter via the WriterFactory.
High performance distributed data processing engine
Updated Jul 29, 2020 · JavaScript
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Updated Feb 8, 2021 · Python
A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch
Updated Dec 19, 2020 · Python
Over time, some things have leaked into the diff methods that make BigDiffy more cumbersome to use from code than via the CLI.
For example diffAvro here https://github.com/spotify/ratatool/blob/master/ratatool-diffy/src/main/scala/com/spotify/ratatool/diffy/BigDiffy.scala#L284
The user has to manually pass in a schema; otherwise they receive an uninformative error about a null schema.
Updated Jan 10, 2021 · C#
Manipulate arrays of complex data structures as easily as with NumPy.
Updated Feb 8, 2021 · Python
A fully asynchronous, pure JavaScript implementation of the Parquet file format
Updated Dec 29, 2020 · JavaScript
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Updated Feb 1, 2019 · TypeScript
A SQLite vtable extension to read Parquet files
Updated Nov 11, 2020 · C++
Problem description
Our CI takes some time to run, and a significant chunk of that time is spent creating the required Python environment.
We can speed this up by caching each newly created environment, so that we don't need to create the same environment more than once. (We may have to evict old environments eventually.)
We can use https://github.com/actions/cache for this
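For illustration, a cache step along these lines could be added to the workflow before the environment is created. The cached path and the key's dependency files are assumptions; they would need to match the project's actual setup:

```yaml
# Hypothetical sketch: cache pip downloads keyed on the requirements files,
# so an unchanged environment is restored instead of rebuilt.
- name: Cache Python environment
  uses: actions/cache@v2
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements*.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-
```

The `restore-keys` prefix lets a run fall back to the most recent partial match when the exact key misses, which still avoids most of the environment-creation cost.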
Simple Windows desktop application for viewing & querying Apache Parquet files
Updated Feb 8, 2021 · C#
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Updated Nov 29, 2020 · Scala
Go package to read and write Parquet files. Parquet is a file format for storing nested data structures in a flat columnar layout. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Updated Feb 4, 2021 · Go
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Updated Feb 8, 2021 · Python
Schema registry for CSV, TSV, JSON, Avro, and Parquet schemas. Supports schema inference and a GraphQL API.
Updated Mar 5, 2020 · Scala
GCS support for avro-tools, parquet-tools and protobuf
Updated Nov 5, 2020 · Java
Append class to all HashCodeBuilders in Gaffer for the below issue to minimise hash collisions.