List of libraries, tools and APIs for web scraping and data processing.
Updated Dec 3, 2020 - Makefile
A library containing both highly optimized building blocks and an execution engine for data pre-processing in deep learning applications
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Extract Transform Load for Python 3.5+
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
Concurrent and multi-stage data ingestion and data processing with Elixir
Large-scale pretraining for dialogue
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
A list about Apache Kafka
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
Write unit test coverage for SafeDataset and SafeDataLoader, along with the functions in utils.py.
Super fast list of dicts to pre-formatted tables conversion library for Python 2/3
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Harmonious distributed data analysis in Rust.
Machine Learning notebooks for refreshing concepts.
Manipulating VASP files with Python.
Python Adaptive Signal Processing
Advanced and Fast Data Transformation in R
Partition created by <> operations, consume the smaller partitions in parallel.
Collection of Data Processing Agreement (DPA) and GDPR compliance resources
Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation
Elastic data processing with Apache Pulsar and Apache Flink
Predict the Power Production of a solar panel farm from Weather Measurements using Machine Learning
Is your feature request related to a problem? Please describe.
Currently, the MultiIndex schema component str representation is the same as the DataFrameSchema representation.
Describe the solution you'd like
The MultiIndex schema component should implement its own str representation so that it doesn't render columns and instead shows the indexes.
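The requested behavior can be sketched roughly as follows. This is a minimal illustration only: the `Index`, `DataFrameSchema`, and `MultiIndex` classes below are simplified, hypothetical stand-ins written for this example, not the library's actual classes, and the output format is an assumption. The idea is that the subclass overrides `__str__` so it lists its indexes instead of inheriting the parent's column-oriented rendering.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Index:
    """Hypothetical stand-in for a single index component."""
    dtype: str
    name: str

    def __str__(self) -> str:
        return f"<Index(name={self.name!r}, dtype={self.dtype!r})>"


@dataclass
class DataFrameSchema:
    """Hypothetical stand-in for a schema whose str form renders columns."""
    columns: Dict[str, str] = field(default_factory=dict)

    def __str__(self) -> str:
        return f"<DataFrameSchema(columns={self.columns})>"


@dataclass
class MultiIndex(DataFrameSchema):
    """Hypothetical stand-in: inherits from the schema, overrides __str__."""
    indexes: List[Index] = field(default_factory=list)

    def __str__(self) -> str:
        # Render the index components only; no mention of columns.
        rendered = ", ".join(str(ix) for ix in self.indexes)
        return f"<MultiIndex(indexes=[{rendered}])>"


multi_index = MultiIndex(indexes=[Index("int", "year"), Index("str", "region")])
print(multi_index)
```

Without the override, `str(multi_index)` would fall back to the parent's `__str__` and print a `columns` mapping, which is the problem the request describes.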