Description
Title: qdrant-client: set_model attempts network connection despite HF_HUB_OFFLINE=1 and local cache
Current Behavior
When the HF_HUB_OFFLINE=1 environment variable is set, QdrantClient.set_model() still attempts to download the embedding model from Hugging Face. This fails in an offline environment, even when the model is already present in the local cache, preventing the client from initializing.
The logs paradoxically show fastembed reporting "offline mode is enabled" as the reason for a network connection failure, indicating that while the flag is recognized, the connection attempt is not being properly suppressed.
Steps to Reproduce
- Set up an environment with no internet access (e.g., a firewalled server or a Docker container).
- Set the environment variable: export HF_HUB_OFFLINE=1
- Pre-download the embedding model into the specified cache directory (/app/.cache/fastembed).
- Confirm the model files are present in the cache. The directory structure and size should be verified:
    $ du -h -d 3 /app/.cache/fastembed/models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q/
    241M    /app/.cache/fastembed/models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q/blobs
    4.0K    /app/.cache/fastembed/models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q/refs
    20K     /app/.cache/fastembed/models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q/snapshots/faf4aa4225822f3bc6376869cb1164e8e3feedd0
    20K     /app/.cache/fastembed/models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q/snapshots
    241M    /app/.cache/fastembed/models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q/
- Run the following code:
    import os
    from qdrant_client import QdrantClient

    QDRANT_HOST = "localhost"
    QDRANT_PORT = 6333
    EMBEDDING_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
    CACHE_DIR = "/app/.cache/fastembed"  # Example cache path

    os.environ['FASTEMBED_CACHE_PATH'] = CACHE_DIR

    # This step fails due to network connection attempts despite the model being cached
    client = QdrantClient(host=QDRANT_HOST, port=QDRANT_PORT)
    client.set_model(EMBEDDING_MODEL, cache_dir=CACHE_DIR)

    print("Client initialized successfully.")  # This line is never reached
- Observe the error logs showing repeated attempts to connect to huggingface.co.
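The cache check in the steps above can also be scripted, which is handy inside a container build. The following is a minimal sketch using only the standard library; the helper name `list_snapshots` is hypothetical, and it assumes the `models--<org>--<name>/snapshots/<revision>` layout visible in the `du` output.

```python
from pathlib import Path


def list_snapshots(cache_dir, repo_dir_name):
    """Return snapshot directories under the cache that contain at least one readable file.

    Path.is_file() follows symlinks, so a snapshot whose entries point at
    missing blobs is not counted.
    """
    snapshots_root = Path(cache_dir) / repo_dir_name / "snapshots"
    if not snapshots_root.is_dir():
        return []
    return sorted(
        snap
        for snap in snapshots_root.iterdir()
        if snap.is_dir() and any(p.is_file() for p in snap.rglob("*"))
    )


if __name__ == "__main__":
    # Paths taken from the report; adjust for your own image.
    repo_dir = "models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q"
    for snapshot in list_snapshots("/app/.cache/fastembed", repo_dir):
        print(snapshot)
```

An empty result means the cache is missing or incomplete, which would be a different problem from the one reported here.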
Relevant Log Output:
2025-10-28 13:31:18.048 | ERROR | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: Cannot reach https://huggingface.co/...: offline mode is enabled. To disable it, please unset the `HF_HUB_OFFLINE` environment variable. Falling back to other sources.
2025-10-28 13:31:18.048 | ERROR | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 3.0 seconds, 2 retries left.
Expected Behavior
When HF_HUB_OFFLINE=1 is set, qdrant-client should first check the specified cache_dir for the model. If the model files exist locally—as confirmed above—it should load them directly without initiating any network connections. The initialization should succeed seamlessly in an air-gapped environment.
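One way to check the "no network connections" expectation in a test is to make outbound connections fail loudly while the model loads. The sketch below is an assumption-laden illustration, not part of qdrant-client or fastembed: the names `forbid_network` and `NetworkAccessAttempted` are hypothetical, and it only intercepts `socket.socket.connect` (DNS lookups and `connect_ex` are not covered).

```python
import contextlib
import socket

# Loopback targets stay reachable so a local Qdrant instance can still be used.
LOOPBACK_HOSTS = frozenset({"localhost", "127.0.0.1", "::1"})


class NetworkAccessAttempted(RuntimeError):
    """Raised when code tries to connect to a non-allowed host."""


@contextlib.contextmanager
def forbid_network(allowed_hosts=LOOPBACK_HOSTS):
    original_connect = socket.socket.connect

    def guarded_connect(self, address):
        # AF_INET/AF_INET6 addresses are tuples; other families (e.g. Unix
        # sockets) are treated as local and passed through.
        host = address[0] if isinstance(address, tuple) else None
        if host is not None and host not in allowed_hosts:
            raise NetworkAccessAttempted(f"blocked connection to {address!r}")
        return original_connect(self, address)

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        # Always restore the real connect, even if the body raised.
        socket.socket.connect = original_connect
```

Wrapping the `client.set_model(...)` call from the reproduction in `with forbid_network():` would then turn any stray connection attempt into an immediate, attributable exception instead of a retry loop.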
Possible Solution
The issue appears to originate in the fastembed dependency. The model management logic must be updated to prioritize checking for a local model in the cache before attempting any download logic. When HF_HUB_OFFLINE=1 is set, the network download path should be completely bypassed, and the client should rely solely on the cached files.
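The cache-first ordering proposed above might look roughly like the following. This is a standalone sketch of the idea, not fastembed's actual code: `resolve_model_dir` and the `download` callback are hypothetical names, the snapshot layout is assumed from the `du` output in this report, and treating `1`, `ON`, `YES`, and `TRUE` as truthy values for `HF_HUB_OFFLINE` is an assumption of this sketch.

```python
import os
from pathlib import Path

TRUTHY = {"1", "ON", "YES", "TRUE"}  # assumed truthy spellings for the flag


def offline_mode_enabled(environ=os.environ):
    return environ.get("HF_HUB_OFFLINE", "").strip().upper() in TRUTHY


def resolve_model_dir(cache_dir, repo_dir_name, download, environ=os.environ):
    """Return a local model directory, touching the network only as a last resort.

    `download` is a zero-argument callable (hypothetical) that fetches the
    model and returns the directory it was written to.
    """
    # 1. Look for a non-empty snapshot in the local cache first.
    snapshots_root = Path(cache_dir) / repo_dir_name / "snapshots"
    if snapshots_root.is_dir():
        for snap in sorted(snapshots_root.iterdir()):
            if snap.is_dir() and any(snap.iterdir()):
                return snap

    # 2. Offline mode: fail fast with a clear error, no retries, no network.
    if offline_mode_enabled(environ):
        raise FileNotFoundError(
            f"{repo_dir_name} not found in {cache_dir} and HF_HUB_OFFLINE is set"
        )

    # 3. Only now fall back to downloading.
    return Path(download())
```

With this ordering, a populated cache never reaches the download branch, and an empty cache in offline mode produces a single clear error rather than the retry messages shown in the log output.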
Context (Environment)
We are deploying an application using qdrant-client in a secured, air-gapped production environment. All dependencies and models are pre-packaged into a container image. This bug is a blocker for our deployment, as the application fails to start due to its inability to operate in a true offline mode.
- Python Version: 3.12.12
- Operating System: Linux (Docker)
- Key Environment Variables:
  HF_HUB_OFFLINE=1
  FASTEMBED_CACHE_PATH=/app/.cache/fastembed