Potential Command Injection Vulnerability: Unvalidated Commands Generated by LLM #693

@glmgbj233

Description:

In the file , within the Runner class—specifically in the run_code method (approximately lines 119–182)—we identified a potential vulnerability where commands generated by a Large Language Model (LLM) are executed without sufficient validation or sanitization.

Details:

  1. The response from self.llm.inference(prompt, project_name) is superficially validated via the method (defined at line 63). However, this method only checks for the existence of "action" and "response" fields in the response, and does not inspect or sanitize the contents of the command field.

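  2. When the action is "command", the command string extracted from valid_response["command"] is split using command.split(" ") and passed directly to subprocess.run(). As a result, any program and arguments the LLM generates, malicious or otherwise, are executed by the system; a destructive command such as rm -rf / does not even need to be injected. Because the string is passed as an argument list, a separator such as ; in command1; command2 is not interpreted on its own, but the LLM can invoke a shell explicitly (for example sh -c command1;command2), so chaining arbitrary commands remains possible.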
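The two points above can be illustrated with a short Python sketch. It does not reproduce the project's code: existence_only_check and naive_argv are hypothetical stand-ins for the unnamed validation method and for the command.split(" ") step, and the sample LLM response is made up. Nothing is executed; the sketch only shows which argument vector would reach subprocess.run().

```python
import json


def existence_only_check(raw: str):
    """Hypothetical stand-in for the validation method in item 1:
    it only checks that the "action" and "response" keys exist."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or "action" not in data or "response" not in data:
        return None
    return data  # the "command" value is never inspected


def naive_argv(command: str) -> list[str]:
    """Hypothetical stand-in for the splitting step in item 2."""
    return command.split(" ")


# Made-up LLM response: the check above accepts it unchanged.
llm_output = json.dumps({
    "action": "command",
    "response": "Checking the environment.",
    "command": "sh -c id;uname",
})

valid_response = existence_only_check(llm_output)
assert valid_response is not None

# argv[0] is a shell, so "id;uname" would be run as a two-command script.
argv = naive_argv(valid_response["command"])
print(argv)  # ['sh', '-c', 'id;uname']

# Without an explicit shell, ";" is just part of an ordinary argument.
print(naive_argv("echo command1; command2"))  # ['echo', 'command1;', 'command2']
```

In the first case the LLM chose sh as the program, so both id and uname would run; in the second case echo would merely print the semicolon.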

Impact:

  • Unauthorized code execution.
  • Data leakage or corruption.
  • Compromise of system integrity.

Suggested Mitigation:

To prevent exploitation, strict validation and sanitization must be applied before passing any LLM-generated command to subprocess.run(). Recommended actions include:

  • Whitelist Validation: Only allow execution of commands explicitly defined as safe.
  • Argument Sanitization: Properly sanitize and escape command arguments to prevent injection of additional commands or malicious flags.
  • Sandbox Execution: Consider executing LLM-generated commands in a restricted sandboxed environment to contain potential damage.
  • Principle of Least Privilege: Ensure that the process executing commands runs with the minimum privileges necessary.
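A minimal Python sketch of the whitelist and argument-sanitization points above, assuming POSIX-style commands and a runner whose command.split(" ") step can be replaced by a validation function. ALLOWED_COMMANDS, CommandRejected, validate_command and run_validated are hypothetical names, and the allowlist contents, the path rule and the timeout are illustrative choices, not a vetted policy. The command is tokenized with shlex.split, only allowlisted programs and flags are accepted, path arguments may not be absolute or contain .., and the process runs without a shell, with a timeout and a minimal environment. A timeout and a trimmed environment are only partial containment; sandboxing and least privilege still have to come from the operating system (for example a container or a dedicated unprivileged user).

```python
import shlex
import subprocess
from pathlib import PurePosixPath

# Hypothetical allowlist: program name -> flags it may be called with.
# Never add shells or interpreters (sh, bash, python, node, ...) here.
ALLOWED_COMMANDS: dict[str, set[str]] = {
    "ls": {"-l", "-a", "-la"},
    "cat": set(),
    "wc": {"-l"},
}

# Defense in depth: these have no special meaning without a shell,
# but their presence in an LLM-generated command is a red flag.
SHELL_METACHARACTERS = set(";&|`$<>\n")


class CommandRejected(ValueError):
    """Raised when an LLM-generated command fails validation."""


def validate_command(command: object) -> list[str]:
    """Return an argv list for an allowlisted command, or raise CommandRejected."""
    if not isinstance(command, str) or not command.strip():
        raise CommandRejected("command must be a non-empty string")
    if any(ch in SHELL_METACHARACTERS for ch in command):
        raise CommandRejected("shell metacharacters are not allowed")
    try:
        argv = shlex.split(command)  # respects quoting, unlike split(" ")
    except ValueError as exc:  # e.g. "No closing quotation"
        raise CommandRejected(f"cannot parse command: {exc}") from exc
    if not argv:
        raise CommandRejected("empty command")

    program, *args = argv
    if program not in ALLOWED_COMMANDS:
        raise CommandRejected(f"program not allowed: {program!r}")
    for arg in args:
        if arg.startswith("-"):
            if arg not in ALLOWED_COMMANDS[program]:
                raise CommandRejected(f"flag not allowed for {program}: {arg!r}")
        else:
            # Lexical check only: symlinks inside cwd are not resolved here.
            path = PurePosixPath(arg)
            if path.is_absolute() or ".." in path.parts:
                raise CommandRejected(f"path escapes the working directory: {arg!r}")
    return argv


def run_validated(command: object, cwd: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Validate first, then run without a shell, with a timeout and a minimal environment."""
    argv = validate_command(command)
    return subprocess.run(
        argv,
        cwd=cwd,
        shell=False,  # the argv list is never handed to a shell
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired when exceeded
        env={"PATH": "/usr/bin:/bin"},  # minimal environment, no inherited secrets
    )


for candidate in ["ls -la", "rm -rf /", "sh -c id;uname", "cat ../../etc/passwd"]:
    try:
        print(candidate, "->", validate_command(candidate))
    except CommandRejected as exc:
        print(candidate, "-> rejected:", exc)
```

The allowlist is deliberately narrow: putting any interpreter on it would turn a single allowed argument back into arbitrary code execution.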
