Remote Code Execution by Pickle Deserialization via LogRecordStreamHandler in Auto-Sklearn #1766

@Chenpinji

Description

In the core logging functionality of the Auto-Sklearn framework (autosklearn/util/logging_.py), the LogRecordStreamHandler class directly uses pickle.loads() to deserialize messages received from TCP socket connections without any sanitization, which results in a remote code execution vulnerability.

The vulnerable code is located in the unPickle() method of the LogRecordStreamHandler class (lines 274-275). This method is called in the handle() method (line 270) when processing incoming log records from TCP connections. The server accepts pickled objects over the network and deserializes them without any validation or restriction on what classes can be instantiated.
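For readers unfamiliar with the pattern, here is a minimal sketch of a length-prefixed pickle receiver, modeled on the socket logging example in the Python logging cookbook rather than copied from Auto-Sklearn. The names `frame` and `read_record` are hypothetical, and the real `handle()` and `unPickle()` methods may differ in detail. The point is only that whatever bytes arrive after the 4-byte length header go straight into `pickle.loads()`.

```python
import io
import pickle
import struct


def frame(obj) -> bytes:
    """Serialize obj the way a socket log client typically does:
    a 4-byte big-endian length followed by the pickled payload."""
    body = pickle.dumps(obj)
    return struct.pack(">L", len(body)) + body


def read_record(stream):
    """Read one framed record from a file-like stream (hypothetical helper)."""
    header = stream.read(4)
    (length,) = struct.unpack(">L", header)
    body = stream.read(length)
    # Unsafe on untrusted input: loads() will invoke whatever callable
    # the payload names, with no restriction on module or class.
    return pickle.loads(body)


# A benign record, standing in for the attribute dict of a log record.
wire = io.BytesIO(frame({"msg": "hello", "levelno": 20}))
record = read_record(wire)
```

Nothing in this flow checks who sent the bytes or what they contain before deserializing them.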

Proof of Concept

Step 1: The victim user starts an Auto-Sklearn log server that binds to the network interface:

python log_server.py --host 0.0.0.0 --port 9020

Step 2: The attacker can then send malicious pickle-serialized data to the TCP socket. The provided log_client.py demonstrates how an attacker can execute arbitrary commands like ls -l:

class RCE:
    def __reduce__(self):
        import os
        return (os.system, ("ls -l",))

record = RCE()
send_log_record(args.host, args.port, record)

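The RCE class implements the __reduce__ method, which pickle calls when the object is serialized on the attacker's side. The (callable, arguments) pair it returns is stored in the payload, and the server invokes that callable when it deserializes the data. Here the server executes os.system("ls -l"), resulting in remote command execution.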
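The mechanism can be observed safely with a harmless callable. The sketch below is illustrative only: the `Demo` class is hypothetical and uses the builtin `sum` in place of `os.system`, so nothing is executed beyond an addition.

```python
import pickle


class Demo:
    def __reduce__(self):
        # Called while pickling. The returned (callable, args) pair is
        # written into the payload and invoked later by pickle.loads().
        return (sum, ([1, 2, 3],))


payload = pickle.dumps(Demo())

# The receiver never needs the Demo class: loads() simply calls
# sum([1, 2, 3]) and returns its result in place of the object.
result = pickle.loads(payload)
```

The value that comes back is whatever the callable returned, not a `Demo` instance, which is why swapping `sum` for `os.system` turns deserialization into command execution.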

To reproduce this issue:

  1. Start the log server:
     python log_server.py --host 0.0.0.0 --port 9020
  2. Run the malicious client:
     python log_client.py --host <target_host> --port 9020

We also attach a demo video, along with log_server.py and log_client.py.

Impact

Remote code execution on the victim's machine over the network. Once the victim starts the Auto-Sklearn log server with network binding, an attacker on the network can gain arbitrary code execution by:

  1. Scanning and finding the victim's Auto-Sklearn logging service
  2. Sending malicious pickle payloads to the logging port
  3. Executing arbitrary system commands with the privileges of the Auto-Sklearn process

This vulnerability is particularly severe because:

  • The pickle deserialization is completely unrestricted, allowing execution of any Python code
  • No authentication or authorization is required to send log records
  • The vulnerability can be exploited by anyone who can reach the server's network interface

We understand that this log server was not originally intended to be exposed as a public service: our review of the code shows that it is started as a secondary process alongside the main process and is, by default, meant for internal network use. However, if users deploy Auto-Sklearn with weak network protections (for example, the interface is exposed to the public internet or the firewall is misconfigured), the consequence is remote code execution (RCE). Since the security of the software should not rely solely on network isolation, we are reporting this vulnerability in the hope that it can be fixed.

Mitigation

1. Sanitize data before deserialization: Rewrite the unPickle() method to use a custom Unpickler with a restricted find_class() method that only allows safe classes, or replace pickle with more secure serialization methods such as JSON, msgpack, or protocol buffers.

2. Network access control: If the logging service is intended for internal use only, accept connections solely from the internal network (localhost or specific trusted IPs) and reject other connections.
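Both mitigations can be sketched with the standard library alone. This is a rough illustration, not a patch for Auto-Sklearn: `DenyGlobalsUnpickler`, `restricted_loads`, `TRUSTED_NETWORKS`, `is_trusted`, and `LocalOnlyTCPServer` are hypothetical names, and the deny-all policy assumes log records arrive as a dict of builtin types (strings, numbers, None), which would not hold if callers attach custom objects to records.

```python
import io
import ipaddress
import pickle
import socketserver


class DenyGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global lookup (hypothetical helper)."""

    def find_class(self, module, name):
        # Plain dicts, strings, and numbers never reach this method;
        # only payloads that reference a class or function do.
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data: bytes):
    return DenyGlobalsUnpickler(io.BytesIO(data)).load()


# Assumed allow-list: loopback only. Extend with trusted subnets as needed.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("::1/128"),
]


def is_trusted(peer_ip: str) -> bool:
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in TRUSTED_NETWORKS)


class LocalOnlyTCPServer(socketserver.ThreadingTCPServer):
    def verify_request(self, request, client_address):
        # socketserver drops the connection when this returns False.
        return is_trusted(client_address[0])


class Malicious:
    def __reduce__(self):
        return (sum, ([1, 2, 3],))  # harmless stand-in for os.system


record = {"name": "autosklearn", "levelno": 20, "msg": "fit started"}
roundtrip = restricted_loads(pickle.dumps(record))

try:
    restricted_loads(pickle.dumps(Malicious()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Denying all globals is the strictest policy; an allow-list inside `find_class()` would be the alternative if records legitimately need specific classes. The peer check is defense in depth and does not replace restricting deserialization, since a trusted host can still be compromised.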

log_client.py
log_server.py
