apache / spark
Apache Spark - A unified analytics engine for large-scale data processing
An open-source storage framework that enables building a Lakehouse architecture with compute engines (including Spark, PrestoDB, Flink, Trino, and Hive) and APIs
Source code for Twitter's Recommendation Algorithm
TheHive: a Scalable, Open Source and Free Security Incident Response Platform
Removes large or troublesome blobs like git-filter-branch does, but faster, and is written in Scala
Rocket Chip Generator
State-of-the-art natural language processing
An open protocol for secure data sharing
The Scala 3 compiler, also known as Dotty.
Spark: The Definitive Guide's Code Repository
CMAK is a tool for managing Apache Kafka clusters
A Spark plugin for reading and writing Excel files
A STAC/OGC API Features Web Service
Educational microarchitectures for the RISC-V ISA
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Workbench identity and access management
Bloop is a build server and CLI tool to compile, test and run Scala fast from any editor or build tool.
Functional GraphQL library for Scala
Open-source high-performance RISC-V processor
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.