
qdrant-client: set_model attempts network connection despite HF_HUB_OFFLINE=1 and local cache #565

@koolay



Current Behavior

When the HF_HUB_OFFLINE=1 environment variable is set, QdrantClient.set_model() still attempts to download the embedding model from Hugging Face. This fails in an offline environment, even when the model is already present in the local cache, preventing the client from initializing.

The logs show fastembed citing "offline mode is enabled" as the reason the download failed. The flag is therefore recognized, but fastembed still enters the download path and retries instead of loading the cached files.

Steps to Reproduce

  1. Set up an environment with no internet access (e.g., a firewalled server or a Docker container).

  2. Set the environment variable: export HF_HUB_OFFLINE=1.

  3. Pre-download the embedding model into the specified cache directory (/app/.cache/fastembed).

  4. Confirm the model files are present in the cache by checking the directory structure and size:

    $ du -h -d 3 /app/.cache/fastembed/models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q/
    
    241M    /app/.cache/fastembed/models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q/blobs
    4.0K    /app/.cache/fastembed/models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q/refs
    20K     /app/.cache/fastembed/models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q/snapshots/faf4aa4225822f3bc6376869cb1164e8e3feedd0
    20K     /app/.cache/fastembed/models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q/snapshots
    241M    /app/.cache/fastembed/models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q/

  5. Run the following code:

    import os
    from qdrant_client import QdrantClient
    
    QDRANT_HOST = "localhost"
    QDRANT_PORT = 6333
    EMBEDDING_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
    CACHE_DIR = "/app/.cache/fastembed" # Example cache path
    
    # Point fastembed at the pre-populated cache (HF_HUB_OFFLINE=1 is already exported in step 2)
    os.environ['FASTEMBED_CACHE_PATH'] = CACHE_DIR
    
    client = QdrantClient(host=QDRANT_HOST, port=QDRANT_PORT)
    # This call fails: it goes through the download path even though the model is cached
    client.set_model(EMBEDDING_MODEL, cache_dir=CACHE_DIR)
    
    print("Client initialized successfully.") # This line is never reached

  6. Observe the error logs showing repeated attempts to connect to huggingface.co.
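
For anyone reproducing this, the cache check in step 4 can also be scripted. The following is a minimal sketch using only the standard library. It assumes the cache follows the usual Hugging Face hub layout (models--<org>--<name>/refs/main holding a revision hash, and snapshots/<hash>/ holding the files). The helper name describe_cached_snapshot is hypothetical and not part of fastembed or qdrant-client.

```python
from pathlib import Path


def describe_cached_snapshot(cache_dir: str, repo_id: str) -> dict:
    """Summarise what a Hugging Face style cache holds for repo_id.

    Hypothetical helper for illustration; it only inspects the filesystem.
    """
    repo_dir = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))

    # refs/main normally contains the revision hash the cache points to.
    ref_file = repo_dir / "refs" / "main"
    revision = ref_file.read_text().strip() if ref_file.is_file() else None

    snapshots_dir = repo_dir / "snapshots"
    snapshots = (
        sorted(p.name for p in snapshots_dir.iterdir())
        if snapshots_dir.is_dir()
        else []
    )

    # List files in the snapshot that refs/main points to. Snapshot entries
    # are usually symlinks into blobs/; is_file() is False for dangling links,
    # so a missing blob shows up as a missing file here.
    files = []
    if revision in snapshots:
        snap = snapshots_dir / revision
        files = sorted(
            str(p.relative_to(snap)) for p in snap.rglob("*") if p.is_file()
        )

    return {
        "repo_dir": str(repo_dir),
        "revision": revision,
        "snapshots": snapshots,
        "files": files,
    }


if __name__ == "__main__":
    info = describe_cached_snapshot(
        "/app/.cache/fastembed",
        "qdrant/paraphrase-multilingual-MiniLM-L12-v2-onnx-Q",
    )
    for key, value in info.items():
        print(f"{key}: {value}")
```

An empty files list, or a revision that does not appear under snapshots, would suggest the cache itself is incomplete rather than that the offline handling is at fault.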

Relevant Log Output:

2025-10-28 13:31:18.048 | ERROR    | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: Cannot reach https://huggingface.co/...: offline mode is enabled. To disable it, please unset the `HF_HUB_OFFLINE` environment variable. Falling back to other sources.
2025-10-28 13:31:18.048 | ERROR    | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 3.0 seconds, 2 retries left.

Expected Behavior

When HF_HUB_OFFLINE=1 is set, qdrant-client should first check the specified cache_dir for the model. If the model files exist locally, as confirmed above, it should load them directly without initiating any network connections. Initialization should then succeed in an air-gapped environment.

Possible Solution

The issue appears to originate in the fastembed dependency. The model management logic must be updated to prioritize checking for a local model in the cache before attempting any download logic. When HF_HUB_OFFLINE=1 is set, the network download path should be completely bypassed, and the client should rely solely on the cached files.
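
To make the proposal concrete, here is a rough sketch of the cache-first ordering described above. It is not fastembed's actual code: the function names (offline_mode_enabled, find_cached_snapshot, resolve_model_dir) and the download callback are hypothetical, and the set of values treated as "true" for HF_HUB_OFFLINE is an assumption about how the flag is usually parsed.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional

# Assumption: values commonly treated as "true" for HF_HUB_OFFLINE.
_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


def offline_mode_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when HF_HUB_OFFLINE is set to a truthy value."""
    return env.get("HF_HUB_OFFLINE", "").upper() in _TRUE_VALUES


def find_cached_snapshot(cache_dir: str, repo_id: str) -> Optional[Path]:
    """Return the first non-empty snapshot directory for repo_id, if any."""
    snapshots = (
        Path(cache_dir) / ("models--" + repo_id.replace("/", "--")) / "snapshots"
    )
    if not snapshots.is_dir():
        return None
    for snap in sorted(snapshots.iterdir()):
        if snap.is_dir() and any(p.is_file() for p in snap.rglob("*")):
            return snap
    return None


def resolve_model_dir(
    cache_dir: str,
    repo_id: str,
    download: Callable[[str, str], Path],
    offline: Optional[bool] = None,
) -> Path:
    """Resolve a model directory, checking the local cache before any download.

    `download` is a hypothetical callback standing in for the network path.
    """
    if offline is None:
        offline = offline_mode_enabled()

    # 1. Cache first: a usable local snapshot wins, online or offline.
    cached = find_cached_snapshot(cache_dir, repo_id)
    if cached is not None:
        return cached

    # 2. Offline and nothing cached: fail fast, with no retries or sleeps.
    if offline:
        raise FileNotFoundError(
            f"{repo_id} not found in {cache_dir} and HF_HUB_OFFLINE is set"
        )

    # 3. Only now fall through to the network.
    return download(repo_id, cache_dir)
```

With this ordering, the retry loop seen in the logs would never be entered when the model is cached, and an offline miss would surface as a single clear error instead of repeated download attempts.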

Context (Environment)

We are deploying an application using qdrant-client in a secured, air-gapped production environment. All dependencies and models are pre-packaged into a container image. This bug is a blocker for our deployment, as the application fails to start due to its inability to operate in a true offline mode.

  • Python Version: 3.12.12
  • Operating System: Linux (Docker)
  • Key Environment Variables:
    • HF_HUB_OFFLINE=1
    • FASTEMBED_CACHE_PATH=/app/.cache/fastembed

Metadata

  • Labels: bug (Something isn't working)