data-wrangling
Here are 550 public repositories matching this topic...
A Python toolbox for gaining geometric insights into high-dimensional data
Updated Jul 19, 2021 - Python
Carefully curated resource links for data science in one place
Updated May 22, 2021
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
Updated Aug 15, 2021 - Go
Updated Aug 19, 2021 - Python
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Updated Aug 10, 2021 - TypeScript
A Collection of Cheatsheets, Books, Questions, and Portfolio For DS/ML Interview Prep
Updated Jan 5, 2021 - Jupyter Notebook
Statistical Inference via Data Science: A ModernDive into R and the Tidyverse
Updated Aug 9, 2021 - HTML
Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Microsoft Program Synthesis using Example SDK.
Updated Aug 18, 2021 - C#
Like Awk but with SQL and table joins
Updated May 27, 2021 - Tcl
Data Cleaning Libraries with Python
Updated Mar 20, 2019 - Jupyter Notebook
Tools for test driven data-wrangling and data validation.
Updated Apr 26, 2021 - Python
These materials are really excellent, and I have a small suggestion that you should feel free to take or leave as you see fit. In the section "Knowing your way around RStudio" it might be beneficial to highlight the different options that users can change about the RStudio appearance in Tools > Global Options.
Some of these options available in "Global Options" are really helpful and if users k
Package that processes and organizes data from the Cadastro Nacional da Pessoa Jurídica (CNPJ), Brazil's national registry of legal entities
Updated Feb 9, 2021 - R
Web scraping and related analytics using Python tools
Updated Jun 7, 2020 - Jupyter Notebook
Materials for following along with Hands-On Data Analysis with Pandas.
Updated Jul 14, 2021 - Jupyter Notebook
Data transformation and utility functions for R
Updated May 12, 2021 - R
JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Updated May 11, 2018 - JavaScript
The links on the Setup page could be improved to meet web accessibility standards. People using screen readers may navigate a page by links, so the link descriptions should be meaningful, and "here" should be avoided. (See WebAIM: Links and Hypertext.)
For example, the line "
Dear Community,
There is a typo in the section titled "The StringsAsFactors argument" after the second block of code that demonstrates the use of the str() function. Right after the code boxes is written "We can see that the $Color and $State columns are factors and $Speed is a numeric column", but the box shows that the $Color column is a vector of strings.
Regards,
Rodolfo
Teaching feedback
- I felt like nunique was arbitrarily (re)introduced when it was necessary. It wouldn't be top-of-mind for students solving problems.
- The lesson answers need to be adjacent to the exercises.
- I like the pre-introduction of masks and then circling back around to explain them.
- I feel like Part 4 needs to be broken up and integrated across other lessons: it felt thin on its own.
- Horizo
Currently the episode 15 reflections text assumes it comes after we go over functions, but due to lesson episode reordering this is no longer the case. We either need to come up with other reflections or move the break to after functions (which might be too long).
https://swcarpentry.github.io/python-novice-gapminder/15-coffee/index.html
Exploratory data analysis
Updated Jan 2, 2019 - Jupyter Notebook
12-hour intro to data science in R, no prior knowledge assumed
Updated Aug 18, 2021 - R
In episode _episodes_rmd/12-time-series-raster.Rmd
There is a big chunk of code that can probably be made to look nicer via dplyr:
# Plot RGB data for Julian day 133
RGB_133 <- stack("data/NEON-DS-Landsat-NDVI/HARV/2011/RGB/133_HARV_landRGB.tif")
RGB_133_df <- raster::as.data.frame(RGB_133, xy = TRUE)
quantiles = c(0.02, 0.98)
r <- quantile(RGB_133_df$X133_HARV_landRGB.1, q
The discussion of data types and data structures in "Vectors and data types" could be clarified. Perhaps even defining these terms before using them would help. Also note that the first sentence of the section reads "A vector is the most common and basic data type in R, and is pretty much the workhorse of R." Perhaps "basic data type" should be changed to "basic data structure".
Main repository for R programming courses @ University of Cincinnati, courses and tutorials that focus on data wrangling, exploration, visualization, and analysis with R.
Updated Mar 4, 2020 - HTML
Automatic transformation of untidy spreadsheet-like data into tidy form
Updated Jun 2, 2020 - R
Materials for following along with Hands-On Data Analysis with Pandas – Second Edition
Updated May 31, 2021 - Jupyter Notebook
A 3-hour introductory workshop on pandas with notebooks and exercises for following along.
Updated Jul 27, 2021 - HTML


I would like to import a .zst or .zstd file but currently that file type is not recognized to be imported by OpenRefine.
Proposed solution
Perhaps using Apache Commons Compress (we already have usage in ImportingUtilities.java): https://commons.apache.org/proper/commons-compress/examples.html#Zstandard
Allow importing the .zst or .zstd file from my local computer, as well as fr
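
A first step for a request like this is recognizing that a file is Zstandard-compressed at all. The following is a minimal Python sketch of that detection step only, not OpenRefine's Java code and not a decompressor. The function name looks_like_zstd is hypothetical; the two extensions come from the request above, and the four magic bytes are the little-endian form of 0xFD2FB528, which starts a standard Zstandard frame.

```python
# Hypothetical helper: recognize Zstandard files by name or by leading bytes.
# Magic number at the start of a standard Zstandard frame (0xFD2FB528, little-endian).
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"
# Extensions named in the request above.
ZSTD_EXTENSIONS = (".zst", ".zstd")


def looks_like_zstd(path):
    """Return True if the file name or its first four bytes suggest Zstandard."""
    # A matching extension is accepted without reading the file.
    if path.lower().endswith(ZSTD_EXTENSIONS):
        return True
    # Otherwise fall back to sniffing the first four bytes.
    with open(path, "rb") as handle:
        return handle.read(4) == ZSTD_MAGIC
```

Checking the leading bytes as well as the extension means a compressed file with an unexpected name can still be routed to a decompression step, which would then be handled by a library such as the Apache Commons Compress one suggested above.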