Switch VLMs Download to Streaming #1752

Matvezy · 2025-11-25T01:01:58Z

Description

Currently all roboflow_api downloads save the whole object into RAM. For large VLMs, the weight files size exceed the RAM capacity in many cases, making VLMs unusable. This PR fixes the issue by adding chunked download and switching transformers models to use it.

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

Below are testing screenshots. Prior to a fix RAM usage kept increasing alongside file download progress, eventually killing the whole process (1st screenshot).
After the fix RAM usage stays fixed at chunk size ~1MB , in this case the inference works (2nd screenshot).
Tested it by running Qwen-2.5-VL inference on L4 machine with 16GB RAM. Prior to the fix the process was getting killed and Qwen-2.5-VL, after the fixed inference successfully ran on my machine.

Any specific deployment considerations

No

Docs

Docs updated? What were the changes: No

inference/core/roboflow_api.py

+        with open(temp_file_path, 'wb') as f:
+            for chunk in response.iter_content(chunk_size=chunk_size):
+                if chunk:
+                    f.write(chunk)


To fix this issue, we need to ensure that responses resulting from API calls where secrets (API keys) are sent via query parameters are not stored as cleartext files on disk, especially if these responses could include confidential information (such as the API key itself, credentials, or sensitive configuration). There are two options:

Option 1: Avoid sending API keys as query parameters. Instead, use HTTP headers for authentication. This will remove the risk of API keys being echoed in the response and stop them from appearing in URLs written to logs or disk.

Option 2: If the URLs must include API keys (for backwards compatibility or external requirements), then prior to writing the response content to disk, we must either:

Parse and redact any API keys or secret information from the response content before writing,

Or, if the content itself is non-sensitive binary (e.g., model weights), ensure by contract that no secrets are returned. At minimum, do not log the URL with queries, and do not persist such responses if they could contain secrets.

Recommended fix: Refactor all relevant HTTP request code (specifically _get_from_url() and _stream_url_to_cache()) so that API keys are never sent via query parameters, but rather through HTTP headers such as Authorization: Bearer <token> or custom headers. Update construction of API URLs in get_roboflow_model_data() so "api_key" is removed from query parameters, and instead set proper headers for requests. This way, no secret will ever be present in the request URL, nor in the persisted content.

Code changes:

Update all locations where the "api_key" is appended to params in API URL construction to remove it.

Update build_roboflow_api_headers() (presumed utility) to include the API key as a header for authentication if not already.

Ensure that _get_from_url(), and thus _stream_url_to_cache(), are calling requests with the API key in headers only.

Ensure that logging of URLs in these flows does not print secrets.

All changes are to be made in blocks shown for inference/core/roboflow_api.py; adjust only the code you have been shown related to query parameter construction and network request authorization. No additional dependencies are needed.

inference/core/roboflow_api.py

+                if chunk:
+                    f.write(chunk)
+                    if computed_md5 is not None:
+                        computed_md5.update(chunk)


inference/core/roboflow_api.py

+                                f"Downloading {filename}: {downloaded_mb:.1f}MB"
+                            )
+
+                        logger.info(progress_msg)


To fix this vulnerability, the code must ensure that sensitive information (e.g., API keys, secrets, tokens) is never logged. Specifically, the filename variable—used in the progress messages—must be sanitized before logging/printing. The sanitization should guarantee that sensitive query parameters like api_key, token, or any similar fields are removed from the string prior to logging.
In practical terms, this could be accomplished by stripping any query string from filenames extracted from URLs or by redacting any sensitive fields before logging. The safest approach is to avoid logging filenames that are derived directly from URLs containing sensitive query parameters. If filenames must be logged (for traceability or progress reporting), they should be passed through a sanitizer that removes query parameters or replaces known-sensitive values with [REDACTED].

Implementation steps:

Before using filename in any log or print statement, ensure any sensitive query parameters (such as api_key, token, signature, etc.) are removed or replaced with a placeholder (e.g., [REDACTED]).

Implement a helper function, e.g., sanitize_filename_for_logging, that parses filename as a URL (if applicable), strips query parameters, or replaces the value of sensitive fields with [REDACTED].

Use this sanitized version of filename in all logging and printing within the affected progress reporting code in inference/core/roboflow_api.py in the _stream_url_to_cache function.

inference/core/roboflow_api.py

+                            )
+
+                        logger.info(progress_msg)
+                        print(f"\r{progress_msg}", end="", flush=True)


The best way to fix this is to prevent sensitive information such as API keys from ever being printed to stdout or logs. Specifically, any variable derived from URLs or filenames that may contain sensitive query parameters must be sanitized before logging or printing. In this context, it is necessary to ensure that the API key (or any other sensitive data) is not included in progress_msg on line 1016. The sanitization should be applied to the filename variable, removing any query strings or sensitive components, so that the printed output is guaranteed to be devoid of sensitive data.
Steps to fix:

Sanitize the filename so that it contains only the base filename (no query params, no API key).

Replace logging/printing code (line 1015 and 1016) to use the sanitized version.

Make sure sanitization is enforced even if future changes modify how the filename is derived.
No additional imports are needed; use Python’s standard string manipulation methods.

Edit only the affected file and region:

In inference/core/roboflow_api.py, sanitize the filename before constructing progress_msg and use the sanitized version for logging/printing in the _stream_url_to_cache function.

Matvezy added 2 commits November 24, 2025 23:48

change download from full to streaming for transformers

6728b6e

logs added

6aa2fcb

Matvezy requested review from PawelPeczek-Roboflow, grzegorz-roboflow, hansent, probicheaux and yeldarby as code owners November 25, 2025 01:01

github-advanced-security bot found potential problems Nov 25, 2025

View reviewed changes

style

5d9fab4

github-advanced-security bot found potential problems Nov 25, 2025

View reviewed changes

@@ -365,8 +365,8 @@
                         ("device", device_id),
                         ("dynamic", "true"),
                     ]
-                    if api_key is not None:
-                        params.append(("api_key", api_key))
+                    # Remove API key from query params – authentication will use headers instead.
+                    # params.append(("api_key", api_key))
                     if (
                         INTERNAL_WEIGHTS_URL_SUFFIX == "serverless"
@@ -882,7 +882,7 @@
                 try:
                     response = requests.get(
                         wrap_url(url),
-                        headers=build_roboflow_api_headers(),
+                        headers=build_roboflow_api_headers(api_key=api_key if 'api_key' in locals() else None),
                         timeout=ROBOFLOW_API_REQUEST_TIMEOUT,
                         verify=ROBOFLOW_API_VERIFY_SSL,
                         stream=stream,

@@ -962,6 +962,21 @@
                 )
                 from inference.core.utils.file_system import dump_bytes_atomic, dump_bytes
+                def sanitize_filename_for_logging(fname: str) -> str:
+                    try:
+                        # Only keep the base path, drop the query string if present
+                        # If this is not a URL, just return as is
+                        if "?" in fname:
+                            fname = fname.split("?")[0]
+                        # Redact any common sensitive keywords that may appear
+                        # as a belt-and-suspenders catch. This is basic but helpful.
+                        sensitive_keys = ["api_key", "token", "signature", "key"]
+                        for keyword in sensitive_keys:
+                            fname = fname.replace(keyword, "[REDACTED]")
+                        return fname
+                    except Exception:
+                        return "[UNSAFE_FILENAME]"
                 initialise_cache(model_id=model_id)
                 cache_file_path = get_cache_file_path(file=filename, model_id=model_id)
@@ -987,6 +1002,8 @@
                     last_logged_mb = 0
                     log_interval_mb = 10
+                    safe_filename = sanitize_filename_for_logging(filename)
                     with open(temp_file_path, "wb") as f:
                         for chunk in response.iter_content(chunk_size=chunk_size):
                             if chunk:
@@ -1004,12 +1021,12 @@
                                         total_mb = total_size_int / (1024 * 1024)
                                         percent = (downloaded_bytes / total_size_int) * 100
                                         progress_msg = (
-                                            f"Downloading {filename}: {downloaded_mb:.1f}MB / "
+                                            f"Downloading {safe_filename}: {downloaded_mb:.1f}MB / "
                                             f"{total_mb:.1f}MB ({percent:.1f}%)"
                                         )
                                     else:
                                         progress_msg = (
-                                            f"Downloading {filename}: {downloaded_mb:.1f}MB"
+                                            f"Downloading {safe_filename}: {downloaded_mb:.1f}MB"
                                         )
                                     logger.info(progress_msg)

@@ -987,6 +987,8 @@
                     last_logged_mb = 0
                     log_interval_mb = 10
+                    # Sanitize filename to avoid leaking sensitive data
+                    sanitized_filename = filename.split("?")[0].split("/")[-1] if filename else ""
                     with open(temp_file_path, "wb") as f:
                         for chunk in response.iter_content(chunk_size=chunk_size):
                             if chunk:
@@ -1004,12 +1006,12 @@
                                         total_mb = total_size_int / (1024 * 1024)
                                         percent = (downloaded_bytes / total_size_int) * 100
                                         progress_msg = (
-                                            f"Downloading {filename}: {downloaded_mb:.1f}MB / "
+                                            f"Downloading {sanitized_filename}: {downloaded_mb:.1f}MB / "
                                             f"{total_mb:.1f}MB ({percent:.1f}%)"
                                         )
                                     else:
                                         progress_msg = (
-                                            f"Downloading {filename}: {downloaded_mb:.1f}MB"
+                                            f"Downloading {sanitized_filename}: {downloaded_mb:.1f}MB"
                                         )
                                     logger.info(progress_msg)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Switch VLMs Download to Streaming #1752

Switch VLMs Download to Streaming #1752

Matvezy commented Nov 25, 2025 •

edited

Loading

Uh oh!

Check failure

Copilot Autofix

Check failure

Copilot Autofix

Uh oh!

Uh oh!

Check failure

Copilot Autofix

Check failure

Copilot Autofix

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Switch VLMs Download to Streaming #1752

Are you sure you want to change the base?

Switch VLMs Download to Streaming #1752

Conversation

Matvezy commented Nov 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

How has this change been tested, please provide a testcase or example of how you tested the change?

Any specific deployment considerations

Docs

Uh oh!

Check failure

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot Autofix

Check failure

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot Autofix

Uh oh!

Uh oh!

Check failure

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Matvezy commented Nov 25, 2025 •

edited

Loading