Introduction to Qdrant
Getting familiar with Qdrant and its functionality.
Yash Worlikar, Sat Dec 09 2023
What is Qdrant?
Qdrant is an open-source vector database optimized for storing, searching, and managing vectors. It can be run locally, on-premises, or on Qdrant Cloud (Qdrant's SaaS solution). Data is stored as vectors, and each vector can optionally carry some metadata (known as a payload).
Exploring the Qdrant Ecosystem
Qdrant is distributed under the Apache-2.0 license, so it's available for both personal and commercial use. It comes with a user-friendly REST API and a gRPC interface that make it simple to interact with.
Because it is open source, we also have the option to modify the source code to suit our business needs. It's written in Rust with reliability in mind and is designed for applications that depend on vector search, such as image search, similarity search, and anomaly detection.
Qdrant provides official client libraries for multiple languages, including:
- Python
- .NET (only supports gRPC currently)
- Rust
- TypeScript
- Go
It's also possible to generate your own client libraries from the available OpenAPI definitions (for the REST API) or the protobuf definitions (for the gRPC API).
Qdrant has native integrations with existing frameworks like LangChain, LlamaIndex, and Microsoft Semantic Kernel, which makes it easy to adopt within your existing stack.
What makes Qdrant unique?
There are multiple reasons that make Qdrant stand out:
- State-of-the-art search optimization
- Cloud-native support for distributed deployment and replication
- Developed in Rust with speed and efficiency in mind
- A flexible recommendation API that supports multiple strategies
- FastEmbed: an embedding inference library for faster embedding generation
- Simple and easy-to-use API making it accessible to developers of all skill levels
- Open-source allowing community contributions and transparency
Getting Started with Qdrant
We can start a Qdrant instance in multiple ways. For example, you can start a Qdrant server with Docker using the following commands:
NOTE: Make sure that Docker is running before using these commands.
docker pull qdrant/qdrant
docker run -p 6333:6333 -p 6334:6334 \
-v $(pwd)/qdrant_storage:/qdrant/storage \
qdrant/qdrant
This command starts the Qdrant server with the REST API on port 6333 and the gRPC API on port 6334. It also mounts a "qdrant_storage" directory from your working directory as persistent storage, so the data survives container restarts.
Additionally, the Qdrant server offers a user interface (UI) frontend accessible at localhost:6333/dashboard. This UI allows data visualization and experimentation with the provided REST APIs.
Alternatively, we can use the Python client to run a local Qdrant instance inside our own process, with either in-memory or persistent storage, without starting the Qdrant server.
pip install qdrant-client
from qdrant_client import QdrantClient
client = QdrantClient(":memory:") # Create an in-memory Qdrant instance, for testing and CI/CD
# OR
client = QdrantClient(path="path/to/db") # Persist changes to disk, for fast prototyping
- We can communicate with the server either through the provided client libraries inside a host application or by calling the REST and gRPC APIs directly.
- Once the server receives a request, it processes it internally and sends back a response.
- All Qdrant APIs are idempotent, meaning that sending the same request multiple times has the same effect as sending it once.
- However, when a request reuses an existing point ID with different data, the new data overwrites the existing point.
Qdrant Database Structure Overview
A single data item in the storage is called a point. A point consists of a unique ID, the vector data, and any additional information, which is stored as a payload.
Points cannot be stored directly in the database; they must be stored in a collection. A collection is a data container that holds a group of points and has a predefined vector size (dimensionality) and a distance metric used for search.
Every query must name a collection, because points exist only inside a collection and are isolated from the points in other collections. Qdrant then searches that collection for the points whose vectors are most similar to the query vector.
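To make the role of the collection name concrete, here is a small sketch that builds a REST search request using only the Python standard library. It assumes the server from the Docker command is listening on localhost:6333; the collection name "demo_collection", the query vector, and the helper name build_search_request are hypothetical, and the exact search endpoint may differ between server versions, so check the API reference for yours.

```python
import json
import urllib.request

# Assumption: a Qdrant server is reachable at this address.
QDRANT_URL = "http://localhost:6333"


def build_search_request(collection_name, query_vector, limit=3):
    """Build (but do not send) a search request scoped to one collection."""
    body = {"vector": query_vector, "limit": limit, "with_payload": True}
    return urllib.request.Request(
        # The collection name is part of the URL path.
        url=f"{QDRANT_URL}/collections/{collection_name}/points/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("demo_collection", [0.2, 0.1, 0.9, 0.7])
print(request.get_method(), request.full_url)

# Sending it requires a running server and an existing collection:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["result"])
```

Because the collection name is part of the path, a request can never accidentally match points from another collection.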
- Collection: A named group of points in the database.
- Point: A record containing the vector with an optional payload.
- Payload: Any information or metadata associated with a single vector that can be stored in JSON format.
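The relationship between these three terms can be sketched as a toy model in plain Python. This is only a conceptual illustration, not how Qdrant is implemented: the class names, the brute-force cosine search, and the sample data are all assumptions chosen to keep the example short.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Point:
    id: int
    vector: list          # the embedding itself
    payload: dict = field(default_factory=dict)  # optional JSON-like metadata


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Collection:
    name: str
    size: int                                   # fixed vector dimensionality
    points: dict = field(default_factory=dict)  # point ID -> Point

    def upsert(self, point):
        if len(point.vector) != self.size:
            raise ValueError(f"expected {self.size} dimensions, got {len(point.vector)}")
        self.points[point.id] = point  # an existing ID is overwritten

    def search(self, query, limit=3):
        # Brute-force scan: score every point, highest similarity first.
        scored = [(cosine(query, p.vector), p) for p in self.points.values()]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:limit]


cities = Collection(name="cities", size=2)
cities.upsert(Point(1, [1.0, 0.0], {"city": "Berlin"}))
cities.upsert(Point(2, [0.0, 1.0], {"city": "London"}))
cities.upsert(Point(3, [0.9, 0.1], {"city": "Moscow"}))

top_ids = [point.id for _, point in cities.search([1.0, 0.0], limit=2)]
print(top_ids)
```

A real vector database avoids scanning every point by using an index, but the data model stays the same: a named collection with a fixed vector size, points keyed by ID, and an optional payload per point.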
Conclusion
Qdrant is a powerful and versatile vector database that is well-suited for modern applications. Its high performance and ease of use make it stand out from the competition. With its open-source availability, active community, and extensive documentation, Qdrant is a compelling choice for anyone looking for a robust solution on which to build the next generation of AI-powered applications.