a-mma / AquilaDB

NOTICE

Hey there 👋,

Here's the roadmap for AquilaDB refactoring:

AquilaDB v1.0
Technical Specifications - finalization after review
Technical Specifications - public draft for review
[Update - Jul 31 2020] White paper published after review
[Update - Jul 25 2020] White paper public draft is now available for review.

The AquilaDB core team has temporarily stopped development for the upcoming ~~4 months~~ 2 months. We've decided to take a step back and face the whiteboard again. This is required to keep AquilaDB sustainable as an Open Source project and to reduce the stress that our limited resources put on the development process. We will see you soon for sure.

We know that some of you have reached here as part of your time-critical projects. We're sorry for the inconvenience. Don't worry, we can direct you to some alternatives that we know:

The examples available in our documentation will work on all these platforms with small API changes.

If you're learning Machine Learning techniques and interested in Similarity Search, play around and bear with us.

If you want to lend a hand to help the community, please check the issues section. We're happy to merge your pull requests. Any new crazy addition is also encouraged. Please also extend your help to our Discord community support.

And finally, Stay Home 🏠, Stay Healthy 🧘

regards, a_mma team

Do you like this project? We love getting a star ⭐ and a shout-out 🗣️ from you in return! 🤗

Community support: discord chatroom for discussions

Documentation

AquilaDB

AquilaDB is a Decentralized Vector Database that stores Feature Vectors along with JSON Document Metadata. Do k-NN retrieval from anywhere, even from the darkest rifts of Aquila (in progress). It is dead simple to set up, language agnostic, and a drop-in addition for your Machine Learning applications. With its current features, AquilaDB is a ready solution for Machine Learning engineers and Data Scientists to build Neural Information Retrieval applications out of the box with minimal dependencies (visit the wiki page for use case examples).
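To make the idea of k-NN retrieval over feature vectors with attached metadata concrete, here is a minimal sketch in plain Python. It is not AquilaDB's implementation or API: the record layout (`vector`, `metadata`), the function names, and the sample data are all assumptions made for illustration, and the search is a simple exhaustive scan.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(records, query, k):
    """Return the k records whose vectors are closest to the query.

    Exhaustive search: every stored vector is compared with the query.
    """
    return heapq.nsmallest(k, records, key=lambda r: euclidean(r["vector"], query))


# Hypothetical records: a feature vector plus arbitrary JSON-style metadata.
records = [
    {"vector": [0.0, 0.0], "metadata": {"name": "origin"}},
    {"vector": [1.0, 1.0], "metadata": {"name": "diagonal"}},
    {"vector": [5.0, 5.0], "metadata": {"name": "far"}},
]

nearest = knn(records, query=[0.9, 1.2], k=2)
print([r["metadata"]["name"] for r in nearest])  # ['diagonal', 'origin']
```

An exhaustive scan like this compares the query against every stored vector, so its cost grows linearly with the number of records. Similarity search libraries exist to avoid paying that cost on large collections.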

The AquilaDB 1.0 release is a distant goal. Visit the Contribute section below to see the detailed development plan and milestones. We make sure that each release and the AquilaDB master branch are stable, with all planned features up to date. All new pull requests are made to the develop branch, so develop is the default, bleeding-edge branch with all the latest updates.

Github, Docker Hub, Documentation (dedicated Wiki page)

Who is this for

If you are working on a data science project and need to store a hell of a lot of data and retrieve similar data based on some feature vector, this will be a useful tool for you, with the extra benefits a real-world web application needs.
Are you dealing with a lot of images and related metadata? Want to find the similar ones? You are at the right place.
If you are looking for a document database, this is not the right place for you.

Technology

AquilaDB is not built from scratch. Thanks to the OSS community, it is based on a couple of cool open source projects out there. We took a couch and added some wheels and jetpacks to make it a super cool butt rest for Data Scientists and ML Engineers. While CouchDB provides us network and scalability benefits, FAISS and Annoy provide superfast similarity search. Along with our peer management service, AquilaDB provides a unique solution.

Prerequisites

You need Docker installed.

Usage

AquilaDB is quick to set up and run as a Docker container. All you need to do is either build it from source or pull it from Docker Hub.

Option 1: build from source

clone this repository
build image: docker build -f <Dockerfile name> -t ammaorg/aquiladb:latest .

Option 2: pull from dockerhub

pull image: docker pull ammaorg/aquiladb:latest

Finally, deploy

deploy: docker run -d -i -p 50051:50051 -v "<local data persist directory>:/data" -t ammaorg/aquiladb:latest
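If you script the deployment, it can help to assemble the same `docker run` command programmatically. The sketch below only builds and prints the argument list from the deploy step above, with a comment on each flag. The function name, the `host_port` parameter, and the `/tmp/aquiladb-data` path are assumptions for illustration, not part of AquilaDB.

```python
import shlex


def build_run_command(data_dir, host_port=50051, image="ammaorg/aquiladb:latest"):
    """Assemble the `docker run` argument list used in the deploy step."""
    return [
        "docker", "run",
        "-d",                        # run the container in the background
        "-i",                        # keep STDIN open
        "-p", f"{host_port}:50051",  # map a host port to container port 50051
        "-v", f"{data_dir}:/data",   # persist the container's /data in a local directory
        "-t",                        # allocate a pseudo-TTY
        image,
    ]


command = build_run_command("/tmp/aquiladb-data")  # hypothetical local path
print(shlex.join(command))
# docker run -d -i -p 50051:50051 -v /tmp/aquiladb-data:/data -t ammaorg/aquiladb:latest
```

To actually start the container from Python, the list can be passed to `subprocess.run(command, check=True)`.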

AquilaDB as a kubernetes service

Run the following kubectl command to expose AquilaDB as a service in a k8s cluster:

deploy: kubectl apply -f https://raw.githubusercontent.com/a-mma/AquilaDB/<Github branch>/kubernetes/aquiladb.yml

Client SDKs

We currently have multiple client libraries in progress to abstract the communication between deployed AquilaDB and your applications.

Python

Node JS

AquilaDB exposes gRPC APIs for clients, which means you can communicate directly with AquilaDB from your favourite language (API reference). The clients above use those APIs to hide the communication details from the end user. If you are familiar with gRPC and would like to contribute a new client library in any other language, please let us know. Protocol buffers API reference. Example usage of APIs in node js.
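Before wiring up a gRPC client, it can be useful to confirm that the deployed container accepts connections at all. The sketch below uses only the Python standard library and checks plain TCP reachability. It does not speak gRPC or call any AquilaDB API. The function name is hypothetical, and the default host and port assume a local deployment with the `50051:50051` port mapping from the deploy step.

```python
import socket


def endpoint_reachable(host="localhost", port=50051, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("AquilaDB port reachable:", endpoint_reachable())
```

A `True` result only means something is listening on that port. Whether it is a healthy AquilaDB instance still has to be verified through a real client call.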

Benchmark

For benchmark results, visit https://aquiladb.xyz/docs/adb-benchmarks

Progress

This project is still under active development (pre-release). It can be used as a standalone database now. The peer manager is a work in progress, so networking capabilities are not available yet. With release v1.0 we will release a pre-optimized version of AquilaDB.

Contribute

We have prepared a document to get anyone interested in contributing started with AquilaDB immediately.

Here is our high level release roadmap.

Learn

We have started meeting developers and giving short talks on AquilaDB. Here are the slides that we use on those occasions: http://bit.ly/AquilaDB-slides

Video:

As of the current AquilaDB release, you can build Neural Information Retrieval applications out of the box without any external dependencies. Here are some useful links to learn more about it and start building:

These use case examples will give you an understanding of what is possible and what not: https://github.com/a-mma/AquilaDB/wiki
Microsoft published a paper and youtube video on this to onboard anyone interested:
- paper: https://www.microsoft.com/en-us/research/uploads/prod/2017/06/INR-061-Mitra-neuralir-intro.pdf
- video: https://www.youtube.com/watch?v=g1Pgo5yTIKg
Embeddings for Everything: Search in the Neural Network Era: https://www.youtube.com/watch?v=JGHVJXP9NHw
Autoencoders are one class of deep learning algorithms that help you build semantic vectors, the foundation for Neural Information Retrieval. Here are some links to Autoencoder-based IR:
Note that the idea of information retrieval applies not only to text but to any data. All you need to do is encode the source datatype to a dense vector with deep neural networks.

Our Sponsors

LOVE

To sponsor this project: contact@aquiladb.xyz

Citing AquilaDB

If you use AquilaDB in an academic paper, we would 😍 to be cited. Here are the two ways of citing AquilaDB:

\footnote{https://github.com/a-mma/AquilaDB}

@misc{a_മ്മ2019AquilaDB,
  title={AquilaDB: Neural Information Retrieval Solution},
  author={Jubin Jose},
  howpublished={\url{https://github.com/a-mma/AquilaDB}},
  year={2019}
}

License

Apache License 2.0 license file

created with ❤️ a-mma.indic (a_മ്മ)

