DBHub

Boilerplate for async ingestion and querying of DBs

This repo aims to provide working code and reproducible setups for bulk data ingestion and querying from numerous databases via their Python clients. Wherever possible, async database client APIs are utilized for data ingestion. The query interface to the data is exposed via async FastAPI endpoints. To enable reproducibility across environments, Dockerfiles are provided as well.
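The async ingestion pattern described above can be sketched as follows. This is a minimal illustration, not the repo's actual code: `FakeAsyncClient`, `bulk_insert`, `chunked` and `ingest` are hypothetical names, and the fake client only stands in for whichever async database client is in use. The idea is to split the records into batches and send them concurrently, with a semaphore capping the number of in-flight requests.

```python
import asyncio
from typing import Any, Iterable, Iterator


class FakeAsyncClient:
    """Hypothetical stand-in for an async database client (not a real library)."""

    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []

    async def bulk_insert(self, batch: list[dict[str, Any]]) -> int:
        await asyncio.sleep(0)  # simulate yielding control during a network call
        self.rows.extend(batch)
        return len(batch)


def chunked(records: Iterable[dict[str, Any]], size: int) -> Iterator[list[dict[str, Any]]]:
    """Yield successive batches of at most `size` records."""
    batch: list[dict[str, Any]] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


async def ingest(
    client: FakeAsyncClient,
    records: Iterable[dict[str, Any]],
    batch_size: int = 100,
    max_concurrency: int = 4,
) -> int:
    """Insert all records in batches, with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def send(batch: list[dict[str, Any]]) -> int:
        async with semaphore:
            return await client.bulk_insert(batch)

    counts = await asyncio.gather(*(send(b) for b in chunked(records, batch_size)))
    return sum(counts)


def run_demo() -> int:
    """Ingest 250 made-up records and return how many were inserted."""
    client = FakeAsyncClient()
    records = [{"id": i, "title": f"item {i}"} for i in range(250)]
    return asyncio.run(ingest(client, records, batch_size=100))
```

Calling `run_demo()` ingests the sample records in three batches. With a real client, the batch size and concurrency limit would need tuning against what the database server can absorb.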

The docker-compose.yml does the following:

Set up a local DB server in a container
Set up local volume mounts to persist the data
Set up a FastAPI server in another container
Set up a network bridge such that the DB server can be accessed from the FastAPI server
Tear down all the containers once development and testing are complete
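On a Compose bridge network, containers can reach each other by service name, so the FastAPI container can address the DB server by its service name rather than by `localhost`. The sketch below shows one way the API code might pick up that address. The environment variable names `DB_SERVICE` and `DB_PORT`, the default service name `db`, and the default port are assumptions for illustration, not values taken from this repo.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DBSettings:
    """Connection details for the DB container (hypothetical helper)."""

    host: str
    port: int

    @property
    def url(self) -> str:
        # Inside the Compose network, the host is the DB service name
        return f"http://{self.host}:{self.port}"


def load_settings(env: Optional[Mapping[str, str]] = None) -> DBSettings:
    """Read the DB address from environment variables, with example defaults."""
    env = os.environ if env is None else env
    return DBSettings(
        host=env.get("DB_SERVICE", "db"),       # assumed variable name and default
        port=int(env.get("DB_PORT", "9200")),   # example port; adjust per database
    )
```

For example, `load_settings({"DB_SERVICE": "elasticsearch", "DB_PORT": "9200"}).url` gives `http://elasticsearch:9200`. Reading the address from the environment lets the same code run inside a container or against a server on the host.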

Currently implemented

Neo4j
Elasticsearch
Meilisearch
Qdrant
Weaviate
LanceDB

Goals

The main goals of this repo are as follows.

Ease of setup: There are tons of databases and client APIs out there, so it's useful to have a clean, efficient and reproducible workflow for experimenting with a range of datasets and databases for the problem at hand.
Ease of distribution: We may want to expose (potentially sensitive) data to downstream client applications, so building an API on top of the database is a useful way to share the data in a controlled manner.
Ease of testing advanced use cases: Search databases (either full-text keyword search or vector DBs) can be important "sources of truth" for contextual querying via LLMs like ChatGPT, allowing us to ground our model's results with factual data.

Pre-requisites

Python 3.10+
Docker
A passion to learn more about and experiment with databases!
