Apache Spark
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
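A minimal sketch of Spark's dataflow model, expressed in plain Python so it runs without a cluster; the commented lines show the equivalent PySpark word-count pipeline (assuming a running SparkContext `sc`):

```python
from collections import Counter

# Equivalent PySpark pipeline (assumes a SparkContext `sc`):
#   sc.textFile("data.txt").flatMap(str.split) \
#     .map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)

lines = ["to be or not", "to be"]
words = [w for line in lines for w in line.split()]  # flatMap
counts = Counter(words)                              # map + reduceByKey
print(counts["to"])  # prints 2
```

The same transformations run in parallel across partitions on a real cluster; the local version only illustrates the shape of the computation.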
Here are 6,778 public repositories matching this topic...
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Updated Apr 3, 2022 - Python
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Updated Jul 7, 2022 - Python
Learn and understand Docker&Container technologies, with real DevOps practice!
Updated Jul 1, 2022 - Go
Programming e-books: C, C#, Docker, Elasticsearch, Git, Hadoop, Head First, Java, JavaScript, JVM, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCP/IP, Tomcat, Zookeeper, plus artificial intelligence, big data, concurrent programming, databases, data mining, interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and more categories.
Updated May 18, 2022
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Updated Jul 7, 2022 - Python
At the moment the relu_layer op doesn't allow threshold configuration, while the legacy RELU op does. We should add a threshold configuration option to relu_layer.
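A plain-Python sketch of the requested behavior (the parameter name `threshold` is an assumption; the actual attribute name would be decided in the op definition):

```python
def relu_with_threshold(x, threshold=0.0):
    # Pass values strictly above the threshold, zero out the rest --
    # what a configurable relu_layer could compute elementwise.
    return [v if v > threshold else 0.0 for v in x]

print(relu_with_threshold([-1.0, 0.5, 2.0], threshold=1.0))  # [0.0, 0.0, 2.0]
```

With the default `threshold=0.0` this reduces to the standard ReLU.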
Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink fundamentals, concepts, internals, hands-on practice, performance tuning, and source-code analysis. Includes learning examples for Flink Connectors, Metrics, Libraries, the DataStream API, and the Table API & SQL, plus large real-world Flink project case studies (PV/UV, log storage, real-time deduplication at the scale of tens of billions of records, monitoring and alerting). Support for my column "Big Data Real-Time Computing Engine Flink: Practice and Performance Optimization" is welcome.
Updated Jun 21, 2022 - Java
List of Data Science Cheatsheets to rule the world
Updated Jun 9, 2022
A Flexible and Powerful Parameter Server for large-scale machine learning
Updated Jun 17, 2022 - Java
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Updated Jul 7, 2022 - Jupyter Notebook
macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.
Updated Feb 8, 2022 - Python
Alluxio, data orchestration for analytics and machine learning in the cloud
Updated Jul 7, 2022 - Java
Feature request
Overview
Currently, the DELETE operation returns an empty result. It would be more useful if it returned the number of deleted rows.
Motivation
The number of deleted rows is an obvious metric that users would want from a delete operation.
Further details
Currently, DeleteCommand.scala is explicitly returning an empty DataFrame [here](https://g
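A hypothetical illustration of the requested behavior in plain Python (the names `delete_where` and `predicate` are made up for this sketch and are not Delta Lake API):

```python
def delete_where(rows, predicate):
    # Keep rows that do NOT match, and report how many were removed --
    # the count the feature request wants DELETE to return instead of
    # an empty result.
    kept = [r for r in rows if not predicate(r)]
    return kept, len(rows) - len(kept)

rows = [{"id": 1}, {"id": 2}, {"id": 3}]
kept, deleted = delete_where(rows, lambda r: r["id"] < 3)
print(deleted)  # prints 2
```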
PipelineAI Kubeflow Distribution
Updated Apr 24, 2020 - Jsonnet
Problem:
The current log outputs something like val_function_0 when it should output val_mean_squared_error_0.
Solution:
The expression "val/{}_{}".format(type(metric).__name__, i) uses the name of metric's type. Because what gets logged here is a function rather than the torchmetrics.metric.Metric instance itself, the type name is function, which is why the output looks like val_function_0. It should use the name of the metric instead.
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Updated Apr 21, 2022 - Python
I have a simple regression task (using a LightGBMRegressor) where I want to penalize negative predictions more than positive ones. Is there a way to achieve this with the default regression LightGBM objectives (see https://lightgbm.readthedocs.io/en/latest/Parameters.html)? If not, is it somehow possible to define (many example for default LightGBM model) and pass a custom regression objective?
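One route is a custom objective; here is a plain-Python sketch of an asymmetric squared error (the weight `alpha` and the exact callable signature `(y_true, y_pred) -> (grad, hess)` for the sklearn-style API are assumptions to verify against the LightGBM docs):

```python
def asymmetric_l2(y_true, y_pred, alpha=2.0):
    # Squared error whose gradient and hessian are scaled by `alpha`
    # whenever the prediction is negative, penalizing those harder.
    grad, hess = [], []
    for t, p in zip(y_true, y_pred):
        w = alpha if p < 0 else 1.0
        grad.append(2.0 * w * (p - t))
        hess.append(2.0 * w)
    return grad, hess

# A negative prediction gets a steeper gradient than a positive one
# at the same absolute error:
print(asymmetric_l2([1.0, 1.0], [-1.0, 3.0]))  # ([-8.0, 4.0], [4.0, 2.0])
```

Such a callable could then be passed as the model's objective in place of the built-in `regression` objective.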
Coolplay Spark: Spark source-code walkthroughs, Spark libraries, and more.
Updated May 18, 2022 - Scala
The Hunting ELK
Updated May 12, 2021 - Jupyter Notebook
Interactive and Reactive Data Science using Scala and Spark.
Updated Oct 19, 2021 - JavaScript
State of the Art Natural Language Processing
Updated Jul 7, 2022 - Scala
Used Spark version
Spark Version: 2.4.4
Used Spark Job Server version
SJS version: v0.11.1
Deployed mode
client on Spark Standalone
Actual (wrong) behavior
I can't get the config when posting a job with 'sync=true'. I get:
http://localhost:8090/jobs/ff99479b-e59c-4215-b17d-4058f8d97d25/config
{"status":"ERROR","result":"No such job ID ff99479b-e59c-4215-b17d-4058f8d97d25"
Created by Matei Zaharia
Released May 26, 2014
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia


Describe the bug
Using a time dimension on a runningTotal measure on Snowflake mixes quoted and unquoted columns in the generated query. The query fails because Snowflake has specific resolution rules for quoted columns: "date_from" <> date_from.
To Reproduce
Steps to reproduce