Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Updated Mar 21, 2022 - Go
List of libraries, tools and APIs for web scraping and data processing.
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Concurrent and multi-stage data ingestion and data processing with Elixir
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
Large-scale pretraining for dialogue
Extract Transform Load for Python 3.5+
Describe the bug
pa.errors.SchemaErrors.failure_cases only returns the first 10 failure_cases
Note: Please read [this guide](https://matthewrocklin.c
(1) Add docstrings to methods
(2) Convert .format() calls to f-strings for readability
(3) Make sure we are using Python 3.8 throughout
(4) zip extract_all() in ingest_flights.py can be simplified with a Path parameter
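Items (2) and (4) might look roughly like the following. This is a minimal sketch under stated assumptions: `describe_flight`, `extract_archive`, and the message text are hypothetical names made up for illustration and are not taken from ingest_flights.py; only the standard-library `zipfile` and `pathlib` calls are real.

```python
import zipfile
from pathlib import Path
from typing import List  # typing.List keeps the annotations valid on Python 3.8


def describe_flight(carrier: str, delay_minutes: int) -> str:
    """Hypothetical helper showing the .format() to f-string conversion."""
    # Before: "{} was delayed {} minutes".format(carrier, delay_minutes)
    return f"{carrier} was delayed {delay_minutes} minutes"


def extract_archive(zip_path: Path, dest_dir: Path) -> List[str]:
    """Extract every member of zip_path into dest_dir; return the member names."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # extractall() accepts a path-like object, so no str() conversion
        # or manual os.path.join() is needed.
        archive.extractall(path=dest_dir)
        return archive.namelist()
```

Passing `Path` objects end to end keeps path handling in one style and avoids mixing string concatenation with `os.path` helpers.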
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Setting pretrained_model_name not only defines the model architecture but also loads the pre-trained checkpoint. We should have another hparam to control whether the pre-trained checkpoint is loaded.
A list about Apache Kafka
Hello Benito,
For a specific task I need a "bitwise exclusive or"-function, but I realized xidel doesn't have one. So I created a function for that.
I was wondering if, in addition to the EXPath File Module, you'd be interested in integrating the EXPath Binary Module as well. Then I can use bin:xor() instead (although for
Engine for ML/Data tracking, visualization, dashboards, and model UI for Polyaxon.
Harmonious distributed data analysis in Rust.
Write unit test coverage for SafeDataset and SafeDataLoader, along with the functions in utils.py.
Advanced and Fast Data Transformation in R
Super fast list of dicts to pre-formatted tables conversion library for Python 2/3
Machine Learning notebooks for refreshing concepts.
Elastic data processing with Apache Pulsar and Apache Flink
The exception in subject is thrown by the following code:
from datetime import date
from pysparkling.sql.session import SparkSession
from pysparkling.sql.functions import collect_set
spark = SparkSession.Builder().getOrCreate()
dataset_usage = [
('steven', 'UUID1', date(2019, 7, 22)),
]
dataset_usage_schema = 'id: string, datauid: string, access_date: date'
df = spa
Manipulating VASP files with Python.
An open source framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud.
Python Adaptive Signal Processing
convtools is a Python library to declaratively define conversions for processing collections, doing complex aggregations and joins.
Is your feature request related to a problem?
Currently, if a user tries to access an index that is larger than the dataset length or tensor length, an internal error is thrown that is hard to understand.
Description of the possible solution
We can catch the error and throw a more descriptive e