Commit f5f9f70

Add memory usage benchmarks
1 parent 552f622 commit f5f9f70

5 files changed: +982 -1 lines changed
README.md

Lines changed: 1 addition & 0 deletions
@@ -251,6 +251,7 @@ This structure ensures that:
 - [x] Context tracking for RPC calls
 - [x] Async/await support
 - [x] Performance benchmarking suite
+- [x] Memory usage tracking and benchmarking

 ### 🚧 In Progress
 - [ ] Documentation site

benchmarks/README.md

Lines changed: 81 additions & 1 deletion
@@ -249,12 +249,92 @@ The benchmark suite includes robust error handling:
 - `benchmarks/simple_benchmark.py`: Quick benchmarks for rapid testing
 - `tests/test_benchmarks.py`: Benchmark runner class and test utilities

+## Memory Benchmarking
+
+### Overview
+
+The memory benchmarking suite (`benchmarks/memory_benchmark.py`) measures RAM and VRAM usage across host and child processes with varying numbers of extensions and different tensor sharing configurations.
+
+### Running Memory Benchmarks
+
+```bash
+# Run full memory benchmark suite
+python benchmarks/memory_benchmark.py
+
+# Test with custom extension counts
+python benchmarks/memory_benchmark.py --counts 1,5,10,20,50
+
+# Test up to 100 extensions
+python benchmarks/memory_benchmark.py --max-extensions 100
+
+# Only test large tensor sharing
+python benchmarks/memory_benchmark.py --large-only
+
+# Only test small tensor scaling
+python benchmarks/memory_benchmark.py --small-only
+```
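The flags shown above could be parsed along the following lines. This is a minimal sketch, not the parser in `benchmarks/memory_benchmark.py`: the flag names come from the commands above, while `build_parser`, `parse_counts`, the defaults, and the mutually exclusive handling of `--large-only` and `--small-only` are assumptions made for illustration.

```python
import argparse


def parse_counts(text: str) -> list[int]:
    """Turn a comma-separated string such as "1,5,10" into a list of ints."""
    return [int(part) for part in text.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flag names are taken from the commands above,
    # defaults and help strings are assumptions.
    parser = argparse.ArgumentParser(description="Memory benchmark (sketch)")
    parser.add_argument("--counts", type=parse_counts, default=None,
                        help="comma-separated extension counts, e.g. 1,5,10")
    parser.add_argument("--max-extensions", type=int, default=None,
                        help="upper bound on the number of extensions")
    # Assumption: the two "only" modes exclude each other.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--large-only", action="store_true")
    mode.add_argument("--small-only", action="store_true")
    return parser


args = build_parser().parse_args(["--counts", "1,5,10,20,50", "--large-only"])
print(args.counts, args.large_only, args.small_only)
```

Parsing `--counts` through a `type=` callable keeps validation in one place: a non-numeric entry makes `argparse` exit with a usage error instead of failing later inside the benchmark.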
+
+### Memory Benchmark Features
+
+1. **Process Memory Tracking**: Uses `psutil` to track RAM usage across process trees
+2. **GPU Memory Tracking**: Uses `nvidia-ml-py3` to track VRAM usage per process
+3. **Extension Scaling**: Tests memory usage with 1-100 extensions
+4. **Tensor Sharing Analysis**: Compares memory usage with and without `share_torch`
+5. **Large Tensor Tests**: Tests with 2GB tensors to verify memory sharing efficiency
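Process memory tracking (feature 1) can be pictured as summing resident memory over a process and its descendants. The sketch below is an assumption about the approach, not code from the benchmark. `tree_rss_mb` and `current_tree_rss_mb` are hypothetical helpers, and `tree_rss_mb` accepts any object exposing `memory_info()` and `children(recursive=True)`, which is the interface `psutil.Process` provides.

```python
def tree_rss_mb(root) -> float:
    """Sum resident set size (RSS) over `root` and all descendants, in MB.

    `root` is expected to behave like `psutil.Process`: `memory_info()`
    returns an object with an `rss` field in bytes, and
    `children(recursive=True)` returns the descendant processes.
    """
    total_bytes = root.memory_info().rss
    for child in root.children(recursive=True):
        total_bytes += child.memory_info().rss
    return total_bytes / (1024 * 1024)


def current_tree_rss_mb() -> float:
    """Measure the calling process and its children (requires psutil)."""
    # Imported lazily so tree_rss_mb stays usable without psutil installed.
    import psutil

    return tree_rss_mb(psutil.Process())
```

RSS counts shared pages once per process, so a plain sum can overstate the total when memory is shared between host and extensions. A child may also exit mid-scan, which `psutil` reports as `psutil.NoSuchProcess`. A production version would need to handle both cases.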
+
+### Memory Benchmark Output
+
+The memory benchmark provides detailed tables showing:
+- RAM usage per extension
+- Memory overhead for tensor transfers
+- VRAM usage for GPU tensors
+- Memory savings from `share_torch` optimization
+
+Example output:
+```
+MEMORY BENCHMARK SUMMARY
+================================================================================
+
+Baseline Memory Usage:
+RAM: 150.3 MB
+VRAM: 0.0 MB
+
+CPU NO SHARE Results:
++-------------+----------------+-------------------+-------------+---------+
+| Extensions | RAM/Ext (MB) | Tensor RAM (MB) | VRAM (MB) | Shared |
++=============+================+===================+=============+=========+
+| 1 | 45.2 | 1.1 | 0.0 | No |
++-------------+----------------+-------------------+-------------+---------+
+| 5 | 44.8 | 5.3 | 0.0 | No |
++-------------+----------------+-------------------+-------------+---------+
+
+2GB TENSOR SHARING TEST:
++--------------------+--------------------+--------------------------+------------------------+
+| Config | Tensor Size (MB) | Distribution RAM (MB) | RAM/Extension (MB) |
++====================+====================+==========================+========================+
+| share_torch=False | 2048.0 | 10240.0 | 2048.0 |
++--------------------+--------------------+--------------------------+------------------------+
+| share_torch=True | 2048.0 | 512.0 | 102.4 |
++--------------------+--------------------+--------------------------+------------------------+
+
+Memory Sharing Analysis:
+Memory saved with share_torch: 9728.0 MB (95.0%)
+```
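The figures in the 2GB table are mutually consistent if five extensions were used. That count is an inference from 10240.0 / 2048.0, not something the output states. A short check of the arithmetic, with `sharing_summary` as a hypothetical helper name:

```python
def sharing_summary(no_share_mb: float, share_mb: float, extensions: int) -> dict:
    """Derive per-extension RAM and savings from two distribution totals."""
    saved_mb = no_share_mb - share_mb
    return {
        "ram_per_ext_no_share": no_share_mb / extensions,
        "ram_per_ext_share": share_mb / extensions,
        "saved_mb": saved_mb,
        "saved_pct": 100.0 * saved_mb / no_share_mb,
    }


# Totals from the example table; the extension count of 5 is an assumption.
summary = sharing_summary(no_share_mb=10240.0, share_mb=512.0, extensions=5)
print(summary)
```

With these inputs the helper reproduces the table's 2048.0 and 102.4 MB per extension and the reported saving of 9728.0 MB (95.0%).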
+
+### Key Metrics
+
+- **RAM/Extension**: Average memory overhead per extension process
+- **Tensor RAM**: Additional RAM used for tensor distribution
+- **VRAM**: GPU memory usage (if CUDA available)
+- **Memory Sharing**: Whether tensors are shared (same memory address) or copied
+
 ## Contributing

 When adding new benchmarks:
-1. Follow the existing pattern in `benchmarks/benchmark.py`
+1. Follow the existing pattern in `benchmarks/benchmark.py` or `benchmarks/memory_benchmark.py`
 2. Include error handling for potential failures
 3. Add appropriate test data sizes
 4. Document what the benchmark measures
 5. Update this README with new benchmark descriptions
 6. Test with various `--torch-mode` options to ensure compatibility
+7. For memory benchmarks, ensure proper cleanup to avoid memory leaks
