The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
Updated Apr 1, 2022
ClickHouse® is a free analytics DBMS for big data
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
An open source cybersecurity protocol for syncing decentralized graph data.
PredictionIO, a machine learning server for developers and ML engineers.
CMAK is a tool for managing Apache Kafka clusters
The Data Engineering Cookbook
Problem:
_catboost.pyx in _catboost._set_features_order_data_pd_data_frame()
_catboost.pyx in _catboost.get_cat_factor_bytes_representation()
CatBoostError: Invalid type for cat_feature[non-default value idx=1,feature_idx=336]=2.0 : cat_features must be integer or string, real number values and NaN values should be converted to string.
Could you also print a feature name, not o
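The error message states the rule: categorical values must be int or str, and real-number or NaN values have to be converted to strings first. A minimal pure-Python sketch of that preprocessing, plus a check that reports the offending feature by name rather than by index, might look like this. All function names, the "NaN" label, and the row-list data layout are illustrative assumptions, not CatBoost API.

```python
import math
from typing import Any, List, Optional, Sequence, Tuple


def to_cat_string(value: Any) -> str:
    """Convert one categorical value to a string, as the CatBoost error asks."""
    # Missing values (None or float NaN) get an explicit label. "NaN" is an
    # arbitrary choice; any string works as long as it is used consistently.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "NaN"
    # Integral floats such as 2.0 often come from an integer column that was
    # upcast to float because of missing values; map 2.0 back to "2".
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


def first_bad_cat_value(
    rows: Sequence[Sequence[Any]],
    cat_indices: Sequence[int],
    feature_names: Optional[Sequence[str]] = None,
) -> Optional[Tuple[int, str, Any]]:
    """Return (row index, feature name or index, value) for the first categorical
    value that is neither int nor str, or None if every value is acceptable."""
    for r, row in enumerate(rows):
        for c in cat_indices:
            value = row[c]
            if not isinstance(value, (int, str)):
                label = feature_names[c] if feature_names is not None else str(c)
                return r, label, value
    return None


def prepare_cat_features(
    rows: Sequence[Sequence[Any]], cat_indices: Sequence[int]
) -> List[List[Any]]:
    """Copy rows, converting only the categorical columns to strings."""
    cat = set(cat_indices)
    return [
        [to_cat_string(v) if i in cat else v for i, v in enumerate(row)]
        for row in rows
    ]
```

If the data lives in a pandas DataFrame, the analogous step is typically `df[col].astype(str)`; note that this renders 2.0 as "2.0" and NaN as "nan", so whichever conversion is chosen must be applied identically to training and prediction data.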
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
The Complete MLOps Stack
Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
The Metadata Platform for the Modern Data Stack
Need to migrate trino-maven-plugin to use GitHub Actions instead of Travis CI. See trinodb/trino#6859 for some guidance.
Arkime (formerly Moloch) is an open source, large scale, full packet capturing, indexing, and database system.
Add --add-exports jdk.management/com.ibm.lang.management.internal only when OpenJ9 is detected.
Otherwise we get "WARNING: package com.ibm.lang.management.internal not in jdk.management" in the logs.
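The conditional flag described above can be sketched as a small helper. This is a hypothetical Python illustration, not the project's actual launch script: it assumes OpenJ9 can be recognised by the string "OpenJ9" in the `java -version` banner, and it takes the OpenJ9-only options as a parameter because the full export specification (including its target module) is not shown in the issue.

```python
import subprocess
from typing import List, Sequence


def java_version_output(java: str = "java") -> str:
    """Return the banner printed by `java -version` (the JVM writes it to stderr)."""
    result = subprocess.run(
        [java, "-version"], capture_output=True, text=True, check=False
    )
    return result.stderr + result.stdout


def is_openj9(version_output: str) -> bool:
    """Heuristic (assumption): OpenJ9 builds mention "OpenJ9" in their banner."""
    return "openj9" in version_output.lower()


def jvm_options(
    base: Sequence[str], version_output: str, openj9_only: Sequence[str]
) -> List[str]:
    """Return the base options, plus the OpenJ9-only ones when running on OpenJ9.

    On other JVMs (e.g. HotSpot) the extra options are left out, which avoids
    the "package ... not in jdk.management" warning.
    """
    opts = list(base)
    if is_openj9(version_output):
        opts.extend(openj9_only)
    return opts
```

A launcher would call something like `jvm_options(base, java_version_output(), ["--add-exports", spec])`, where `spec` is the complete export specification the project already uses.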
Stream Framework is a Python library that allows you to build news feeds, activity streams, and notification systems using Cassandra and/or Redis. The authors of Stream Framework also provide a cloud service for feed technology:
A common error I have seen when developing Delta Lake in IntelliJ is the following. These errors prevent IntelliJ from running unit tests (though running unit tests via build/sbt core/test still works fine).
Error:(46, 28) object DeltaSqlBaseParser is not a member of package io.delta.sql.parser
import io.delta.sql.parser.DeltaSqlBaseParser._
Error:(143, 34) not
Apache Ignite
... to make it easier to read Vespa documentation on an e-reader / offline
Vespa documentation is generated with Jekyll from .md and .html files. Look into options for generating the artifact as part of site generation (there might be plugins we can use here).
Use case:
Concatenate values of type TEXT (or of types castable to TEXT) using a given separator, for example concatenating address parts with commas.
SELECT concat_ws(', ', '535 Mission St.', '14th floor', 'San Francisco', 'CA', '94105') AS address;
--> 535 Mission St., 14th floor, San Francisco, CA, 94105
Feature description:
Add support for the `concat_ws ( sep text, val1 "any" [
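The requested behaviour can be modelled in a few lines. The sketch below is a hypothetical Python reference, not the database's implementation; it assumes the function follows PostgreSQL's concat_ws rules (NULL arguments are skipped, and a NULL separator yields NULL) and uses str() as a stand-in for the cast to TEXT.

```python
from typing import Any, Optional


def concat_ws(sep: Optional[str], *values: Any) -> Optional[str]:
    """Model of PostgreSQL-style concat_ws: join non-NULL values with `sep`.

    None stands in for SQL NULL. NULL arguments are skipped, and a NULL
    separator makes the whole result NULL.
    """
    if sep is None:
        return None
    # Cast each non-NULL value to text; str() approximates SQL's cast to TEXT
    # (it differs for some types, e.g. booleans render as "True"/"False").
    return sep.join(str(v) for v in values if v is not None)
```

Skipping NULLs rather than propagating them is what distinguishes concat_ws from plain `||` concatenation, and it is what makes it convenient for optional address parts such as a missing floor.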
Is your feature request related to a problem? Please describe.
Many static type checkers have issues finding Cython's stubs.
Here is the output from running mypy on my current project:
The same issue can be seen when using `import Cython as cython`:
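Type checkers such as mypy find types for installed packages according to PEP 561, which expects a `py.typed` marker file inside the package (or a separate stub-only `<name>-stubs` package). Whether a missing marker is the cause here is an assumption, not something the report confirms; the hypothetical helper below only inspects what an installed package ships at its top level.

```python
import importlib.util
from pathlib import Path
from typing import Dict, List, Union


def pep561_status(package: str) -> Dict[str, Union[bool, List[str]]]:
    """Report whether an installed package ships a PEP 561 marker and .pyi stubs.

    Only the package's top-level directory is inspected; stubs in
    subpackages are not counted.
    """
    spec = importlib.util.find_spec(package)
    # Plain modules (and built-ins like `math`) have no search locations.
    if spec is None or not spec.submodule_search_locations:
        raise ValueError(f"{package!r} is not an importable package")
    # For a regular package this holds a single entry: its directory.
    pkg_dir = Path(next(iter(spec.submodule_search_locations)))
    return {
        "py_typed": (pkg_dir / "py.typed").is_file(),
        "stubs": sorted(p.name for p in pkg_dir.glob("*.pyi")),
    }
```

With Cython installed, `pep561_status("Cython")` shows whether the installed package contains a `py.typed` marker and which top-level `.pyi` files it ships.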