big-data
Here are 2,258 public repositories matching this topic...
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
Updated Nov 24, 2020
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Updated Oct 1, 2020 - Python
Updated Nov 27, 2020 - Python
PredictionIO, a machine learning server for developers and ML engineers.
Updated Oct 14, 2020 - Scala
An open source cybersecurity protocol for syncing decentralized graph data.
Updated Nov 23, 2020 - JavaScript
CMAK is a tool for managing Apache Kafka clusters.
Updated Nov 17, 2020 - Scala
The latest copy of the CPython grammar tests in test_grammar.py has several @skips and FIXMEs. Some of them seem easy to fix, e.g. some parser bugs or missing warnings that would be helpful; others are entire features. We should fix the easy ones and make sure there are tickets for the rest.
Problem:
catboost version: 0.23.2
Operating System: all
Tutorial: https://github.com/catboost/tutorials/blob/master/custom_loss/custom_metric_tutorial.md
It is impossible to use a custom metric (C++).
Code example
from catboost import CatBoost
train_data = [[1, 4, 5, 6],
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Updated Nov 27, 2020 - Jupyter Notebook
Reproducible Data Science at Scale!
Updated Nov 26, 2020 - Go
Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability.
Updated Nov 24, 2020 - Erlang
Would you like to add more error handling for return values from functions like the following?
- malloc ⇒ moloch_trie_init
- [MOLOCH_LOCK_INIT](https://github.com/aol/moloch/blob/664ffe25810380f12823941c210
Stream Framework is a Python library that allows you to build news feeds, activity streams, and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:
Updated Sep 1, 2020 - Python
Targeting 4.1
Proposal
Take the existing:
hazelcast/hazelcast/src/main/java/com/hazelcast/map/impl/proxy/MapProxyImpl.java
Line 916 in 4fed159
public <R> Iterator<R> iterator(int fetchSize, int partitionId, Projection<Map.Entry<K, V>, R> projection,
And create a public API in IMap which differs from the above in that it:
- does not require partitionId but work
BigDL: Distributed Deep Learning Framework for Apache Spark
Updated Nov 23, 2020 - Scala
Apache Ignite
Updated Nov 27, 2020 - Java
PrestoDB (https://prestodb.io) is widely used as a SQL frontend for many different data sources, including Elasticsearch and even files in S3. It would be very nice if a connector were available for Vespa.
Hi, if my Spark app uses two storage types, both S3 and Azure Data Lake Storage Gen2, can I set spark.delta.logStore.class=org.apache.spark.sql.delta.storage.AzureLogStore, org.apache.spark.sql.delta.storage.S3SingleDriverLogStore?
Thanks in advance.
An easy-to-use, self-service open BI reporting and BI dashboard platform.
Updated Jun 16, 2020 - TSQL
Bare-bones examples of machine learning in TensorFlow.
Updated Mar 14, 2017 - Python



Currently, insert and query share the same resource (controlled by Max Process Count). When queries run at high TPS, inserts fail with an error ("error: too many process"). I think separating the resources for insert and query would make sense, to ensure inserts always have enough resources. It would work like YARN, where insert and query use different resource quotas.
Or, the simpler way: can we set a ratio for Insert and
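The ratio idea can be illustrated with a minimal sketch. This is a hypothetical Python example, not the project's implementation: `SplitProcessPool`, `max_processes`, `insert_ratio`, and the non-blocking slot accounting are all assumptions. One total process budget is split into two independent semaphores, so a burst of queries can exhaust only the query share while inserts keep their reserved slots.

```python
import threading


class SplitProcessPool:
    """Hypothetical sketch: one process budget split into separate insert and query quotas."""

    def __init__(self, max_processes: int, insert_ratio: float) -> None:
        if max_processes < 2:
            raise ValueError("need at least 2 processes to give each kind one slot")
        if not 0.0 < insert_ratio < 1.0:
            raise ValueError("insert_ratio must be strictly between 0 and 1")
        # Reserve a share for inserts (at least 1, at most max - 1); queries get the rest.
        insert_slots = max(1, int(max_processes * insert_ratio))
        insert_slots = min(insert_slots, max_processes - 1)
        self.limits = {"insert": insert_slots, "query": max_processes - insert_slots}
        self._slots = {
            kind: threading.BoundedSemaphore(limit) for kind, limit in self.limits.items()
        }

    def try_acquire(self, kind: str) -> bool:
        # Non-blocking: returns False only when this kind's own quota is used up,
        # so saturated queries no longer cause "too many process" for inserts.
        return self._slots[kind].acquire(blocking=False)

    def release(self, kind: str) -> None:
        # Return a slot to the quota it was taken from.
        self._slots[kind].release()


if __name__ == "__main__":
    pool = SplitProcessPool(max_processes=8, insert_ratio=0.25)
    print(pool.limits)  # {'insert': 2, 'query': 6}
    # Saturate the query share: six acquisitions succeed, the seventh is refused.
    print([pool.try_acquire("query") for _ in range(7)])
    # Inserts still succeed because their slots are reserved separately.
    print(pool.try_acquire("insert"))  # True
```

This sketch uses hard partitioning: unused insert slots are not lent to queries, which trades some utilization for a guaranteed insert share.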