apache-spark
Here are 933 public repositories matching this topic...
酷玩 Spark (Cool Play Spark): Spark source code analysis, Spark libraries, and more
Updated May 26, 2019 - Scala
Interactive and Reactive Data Science using Scala and Spark.
Updated Jun 2, 2020 - JavaScript
Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
Updated Sep 30, 2020 - Jupyter Notebook
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
Updated Sep 7, 2020 - Java
The current azure-pipelines.yaml is highly duplicated, especially the Test stages (E2E Tests, E2E Backward Compatibility Tests, and E2E Forward Compatibility Tests).
This should be refactored to remove the duplication and make the file easier to maintain (e.g., when adding a new Spark version to test against).
Apache Spark docker image
Updated Sep 25, 2020 - Dockerfile
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Updated Oct 1, 2020 - Go
PySpark + Scikit-learn = Sparkit-learn
Updated Oct 24, 2017 - Python
(Deprecated) Scikit-learn integration package for Apache Spark
Updated Dec 3, 2019 - Python
A curated list of awesome Apache Spark packages and resources.
Updated Oct 2, 2020
Updated Oct 2, 2020 - Jupyter Notebook
C# and F# language binding and extensions to Apache Spark
Updated Nov 1, 2019 - C#
R interface for Apache Spark
Updated Oct 2, 2020 - R
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Updated Jan 24, 2017 - Scala
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Updated Mar 9, 2020 - Python
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Updated Jul 25, 2018 - Python
Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
Updated Jan 8, 2020 - Scala
A command-line tool for launching Apache Spark clusters.
Updated Sep 17, 2020 - Python
REST web service for the true real-time scoring (<1 ms) of R, Scikit-Learn and Apache Spark models
Updated Aug 5, 2020 - Java
A reading list of papers related to streaming systems
Updated Mar 31, 2018
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Updated Jul 29, 2020 - Jupyter Notebook
A list about Apache Kafka
Updated Sep 16, 2020
The Internals of Spark Structured Streaming
Updated Sep 26, 2020 - Dockerfile
Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Updated Sep 14, 2015 - Shell
This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Updated Sep 3, 2020 - Scala
A boilerplate for writing PySpark Jobs
Updated Jul 1, 2020 - Python


MLflow seems to have a length limit of 5000 when setting tags (see below).