Skip to content

Conversation

@Matvezy
Copy link
Contributor

@Matvezy Matvezy commented Nov 25, 2025

Description

Currently all roboflow_api downloads save the whole object into RAM. For large VLMs, the weight files size exceed the RAM capacity in many cases, making VLMs unusable. This PR fixes the issue by adding chunked download and switching transformers models to use it.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

Below are testing screenshots. Prior to a fix RAM usage kept increasing alongside file download progress, eventually killing the whole process (1st screenshot).
After the fix RAM usage stays fixed at chunk size ~1MB , in this case the inference works (2nd screenshot).
Tested it by running Qwen-2.5-VL inference on L4 machine with 16GB RAM. Prior to the fix the process was getting killed and Qwen-2.5-VL, after the fixed inference successfully ran on my machine.

image (5) image (6)

Any specific deployment considerations

No

Docs

  • Docs updated? What were the changes: No

with open(temp_file_path, 'wb') as f:
for chunk in response.iter_content(chunk_size=chunk_size):
if chunk:
f.write(chunk)

Check failure

Code scanning / CodeQL

Clear-text storage of sensitive information High

This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (password)
as clear text.

Copilot Autofix

AI 1 day ago

To fix this issue, we need to ensure that responses resulting from API calls where secrets (API keys) are sent via query parameters are not stored as cleartext files on disk, especially if these responses could include confidential information (such as the API key itself, credentials, or sensitive configuration). There are two options:

  • Option 1: Avoid sending API keys as query parameters. Instead, use HTTP headers for authentication. This will remove the risk of API keys being echoed in the response and stop them from appearing in URLs written to logs or disk.
  • Option 2: If the URLs must include API keys (for backwards compatibility or external requirements), then prior to writing the response content to disk, we must either:
    • Parse and redact any API keys or secret information from the response content before writing,
    • Or, if the content itself is non-sensitive binary (e.g., model weights), ensure by contract that no secrets are returned. At minimum, do not log the URL with queries, and do not persist such responses if they could contain secrets.

Recommended fix: Refactor all relevant HTTP request code (specifically _get_from_url() and _stream_url_to_cache()) so that API keys are never sent via query parameters, but rather through HTTP headers such as Authorization: Bearer <token> or custom headers. Update construction of API URLs in get_roboflow_model_data() so "api_key" is removed from query parameters, and instead set proper headers for requests. This way, no secret will ever be present in the request URL, nor in the persisted content.

Code changes:

  • Update all locations where the "api_key" is appended to params in API URL construction to remove it.
  • Update build_roboflow_api_headers() (presumed utility) to include the API key as a header for authentication if not already.
  • Ensure that _get_from_url(), and thus _stream_url_to_cache(), are calling requests with the API key in headers only.
  • Ensure that logging of URLs in these flows does not print secrets.

All changes are to be made in blocks shown for inference/core/roboflow_api.py; adjust only the code you have been shown related to query parameter construction and network request authorization. No additional dependencies are needed.


Suggested changeset 1
inference/core/roboflow_api.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/inference/core/roboflow_api.py b/inference/core/roboflow_api.py
--- a/inference/core/roboflow_api.py
+++ b/inference/core/roboflow_api.py
@@ -365,8 +365,8 @@
             ("device", device_id),
             ("dynamic", "true"),
         ]
-        if api_key is not None:
-            params.append(("api_key", api_key))
+        # Remove API key from query params – authentication will use headers instead.
+        # params.append(("api_key", api_key))
 
         if (
             INTERNAL_WEIGHTS_URL_SUFFIX == "serverless"
@@ -882,7 +882,7 @@
     try:
         response = requests.get(
             wrap_url(url),
-            headers=build_roboflow_api_headers(),
+            headers=build_roboflow_api_headers(api_key=api_key if 'api_key' in locals() else None),
             timeout=ROBOFLOW_API_REQUEST_TIMEOUT,
             verify=ROBOFLOW_API_VERIFY_SSL,
             stream=stream,
EOF
@@ -365,8 +365,8 @@
("device", device_id),
("dynamic", "true"),
]
if api_key is not None:
params.append(("api_key", api_key))
# Remove API key from query params – authentication will use headers instead.
# params.append(("api_key", api_key))

if (
INTERNAL_WEIGHTS_URL_SUFFIX == "serverless"
@@ -882,7 +882,7 @@
try:
response = requests.get(
wrap_url(url),
headers=build_roboflow_api_headers(),
headers=build_roboflow_api_headers(api_key=api_key if 'api_key' in locals() else None),
timeout=ROBOFLOW_API_REQUEST_TIMEOUT,
verify=ROBOFLOW_API_VERIFY_SSL,
stream=stream,
Copilot is powered by AI and may make mistakes. Always verify output.
if chunk:
f.write(chunk)
if computed_md5 is not None:
computed_md5.update(chunk)

Check failure

Code scanning / CodeQL

Use of a broken or weak cryptographic hashing algorithm on sensitive data High

Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.

Copilot Autofix

AI 1 day ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.

f"Downloading {filename}: {downloaded_mb:.1f}MB"
)

logger.info(progress_msg)

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.

Copilot Autofix

AI 1 day ago

To fix this vulnerability, the code must ensure that sensitive information (e.g., API keys, secrets, tokens) is never logged. Specifically, the filename variable—used in the progress messages—must be sanitized before logging/printing. The sanitization should guarantee that sensitive query parameters like api_key, token, or any similar fields are removed from the string prior to logging.
In practical terms, this could be accomplished by stripping any query string from filenames extracted from URLs or by redacting any sensitive fields before logging. The safest approach is to avoid logging filenames that are derived directly from URLs containing sensitive query parameters. If filenames must be logged (for traceability or progress reporting), they should be passed through a sanitizer that removes query parameters or replaces known-sensitive values with [REDACTED].

Implementation steps:

  • Before using filename in any log or print statement, ensure any sensitive query parameters (such as api_key, token, signature, etc.) are removed or replaced with a placeholder (e.g., [REDACTED]).
  • Implement a helper function, e.g., sanitize_filename_for_logging, that parses filename as a URL (if applicable), strips query parameters, or replaces the value of sensitive fields with [REDACTED].
  • Use this sanitized version of filename in all logging and printing within the affected progress reporting code in inference/core/roboflow_api.py in the _stream_url_to_cache function.

Suggested changeset 1
inference/core/roboflow_api.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/inference/core/roboflow_api.py b/inference/core/roboflow_api.py
--- a/inference/core/roboflow_api.py
+++ b/inference/core/roboflow_api.py
@@ -962,6 +962,21 @@
     )
     from inference.core.utils.file_system import dump_bytes_atomic, dump_bytes
 
+    def sanitize_filename_for_logging(fname: str) -> str:
+        try:
+            # Only keep the base path, drop the query string if present
+            # If this is not a URL, just return as is
+            if "?" in fname:
+                fname = fname.split("?")[0]
+            # Redact any common sensitive keywords that may appear
+            # as a belt-and-suspenders catch. This is basic but helpful.
+            sensitive_keys = ["api_key", "token", "signature", "key"]
+            for keyword in sensitive_keys:
+                fname = fname.replace(keyword, "[REDACTED]")
+            return fname
+        except Exception:
+            return "[UNSAFE_FILENAME]"
+
     initialise_cache(model_id=model_id)
     cache_file_path = get_cache_file_path(file=filename, model_id=model_id)
 
@@ -987,6 +1002,8 @@
         last_logged_mb = 0
         log_interval_mb = 10
 
+        safe_filename = sanitize_filename_for_logging(filename)
+
         with open(temp_file_path, "wb") as f:
             for chunk in response.iter_content(chunk_size=chunk_size):
                 if chunk:
@@ -1004,12 +1021,12 @@
                             total_mb = total_size_int / (1024 * 1024)
                             percent = (downloaded_bytes / total_size_int) * 100
                             progress_msg = (
-                                f"Downloading {filename}: {downloaded_mb:.1f}MB / "
+                                f"Downloading {safe_filename}: {downloaded_mb:.1f}MB / "
                                 f"{total_mb:.1f}MB ({percent:.1f}%)"
                             )
                         else:
                             progress_msg = (
-                                f"Downloading {filename}: {downloaded_mb:.1f}MB"
+                                f"Downloading {safe_filename}: {downloaded_mb:.1f}MB"
                             )
 
                         logger.info(progress_msg)
EOF
@@ -962,6 +962,21 @@
)
from inference.core.utils.file_system import dump_bytes_atomic, dump_bytes

def sanitize_filename_for_logging(fname: str) -> str:
try:
# Only keep the base path, drop the query string if present
# If this is not a URL, just return as is
if "?" in fname:
fname = fname.split("?")[0]
# Redact any common sensitive keywords that may appear
# as a belt-and-suspenders catch. This is basic but helpful.
sensitive_keys = ["api_key", "token", "signature", "key"]
for keyword in sensitive_keys:
fname = fname.replace(keyword, "[REDACTED]")
return fname
except Exception:
return "[UNSAFE_FILENAME]"

initialise_cache(model_id=model_id)
cache_file_path = get_cache_file_path(file=filename, model_id=model_id)

@@ -987,6 +1002,8 @@
last_logged_mb = 0
log_interval_mb = 10

safe_filename = sanitize_filename_for_logging(filename)

with open(temp_file_path, "wb") as f:
for chunk in response.iter_content(chunk_size=chunk_size):
if chunk:
@@ -1004,12 +1021,12 @@
total_mb = total_size_int / (1024 * 1024)
percent = (downloaded_bytes / total_size_int) * 100
progress_msg = (
f"Downloading {filename}: {downloaded_mb:.1f}MB / "
f"Downloading {safe_filename}: {downloaded_mb:.1f}MB / "
f"{total_mb:.1f}MB ({percent:.1f}%)"
)
else:
progress_msg = (
f"Downloading {filename}: {downloaded_mb:.1f}MB"
f"Downloading {safe_filename}: {downloaded_mb:.1f}MB"
)

logger.info(progress_msg)
Copilot is powered by AI and may make mistakes. Always verify output.
)

logger.info(progress_msg)
print(f"\r{progress_msg}", end="", flush=True)

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.

Copilot Autofix

AI 1 day ago

The best way to fix this is to prevent sensitive information such as API keys from ever being printed to stdout or logs. Specifically, any variable derived from URLs or filenames that may contain sensitive query parameters must be sanitized before logging or printing. In this context, it is necessary to ensure that the API key (or any other sensitive data) is not included in progress_msg on line 1016. The sanitization should be applied to the filename variable, removing any query strings or sensitive components, so that the printed output is guaranteed to be devoid of sensitive data.
Steps to fix:

  1. Sanitize the filename so that it contains only the base filename (no query params, no API key).
  2. Replace logging/printing code (line 1015 and 1016) to use the sanitized version.
  3. Make sure sanitization is enforced even if future changes modify how the filename is derived.
    No additional imports are needed; use Python’s standard string manipulation methods.

Edit only the affected file and region:

  • In inference/core/roboflow_api.py, sanitize the filename before constructing progress_msg and use the sanitized version for logging/printing in the _stream_url_to_cache function.

Suggested changeset 1
inference/core/roboflow_api.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/inference/core/roboflow_api.py b/inference/core/roboflow_api.py
--- a/inference/core/roboflow_api.py
+++ b/inference/core/roboflow_api.py
@@ -987,6 +987,8 @@
         last_logged_mb = 0
         log_interval_mb = 10
 
+        # Sanitize filename to avoid leaking sensitive data
+        sanitized_filename = filename.split("?")[0].split("/")[-1] if filename else ""
         with open(temp_file_path, "wb") as f:
             for chunk in response.iter_content(chunk_size=chunk_size):
                 if chunk:
@@ -1004,12 +1006,12 @@
                             total_mb = total_size_int / (1024 * 1024)
                             percent = (downloaded_bytes / total_size_int) * 100
                             progress_msg = (
-                                f"Downloading {filename}: {downloaded_mb:.1f}MB / "
+                                f"Downloading {sanitized_filename}: {downloaded_mb:.1f}MB / "
                                 f"{total_mb:.1f}MB ({percent:.1f}%)"
                             )
                         else:
                             progress_msg = (
-                                f"Downloading {filename}: {downloaded_mb:.1f}MB"
+                                f"Downloading {sanitized_filename}: {downloaded_mb:.1f}MB"
                             )
 
                         logger.info(progress_msg)
EOF
@@ -987,6 +987,8 @@
last_logged_mb = 0
log_interval_mb = 10

# Sanitize filename to avoid leaking sensitive data
sanitized_filename = filename.split("?")[0].split("/")[-1] if filename else ""
with open(temp_file_path, "wb") as f:
for chunk in response.iter_content(chunk_size=chunk_size):
if chunk:
@@ -1004,12 +1006,12 @@
total_mb = total_size_int / (1024 * 1024)
percent = (downloaded_bytes / total_size_int) * 100
progress_msg = (
f"Downloading {filename}: {downloaded_mb:.1f}MB / "
f"Downloading {sanitized_filename}: {downloaded_mb:.1f}MB / "
f"{total_mb:.1f}MB ({percent:.1f}%)"
)
else:
progress_msg = (
f"Downloading {filename}: {downloaded_mb:.1f}MB"
f"Downloading {sanitized_filename}: {downloaded_mb:.1f}MB"
)

logger.info(progress_msg)
Copilot is powered by AI and may make mistakes. Always verify output.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants