@@ -249,12 +249,92 @@ The benchmark suite includes robust error handling:
- `benchmarks/simple_benchmark.py`: Quick benchmarks for rapid testing
- `tests/test_benchmarks.py`: Benchmark runner class and test utilities

## Memory Benchmarking

### Overview

The memory benchmarking suite (`benchmarks/memory_benchmark.py`) measures RAM and VRAM usage of the host process and its child processes as the number of extensions and the tensor sharing configuration vary.

### Running Memory Benchmarks

```bash
# Run the full memory benchmark suite
python benchmarks/memory_benchmark.py

# Test with custom extension counts
python benchmarks/memory_benchmark.py --counts 1,5,10,20,50

# Test up to 100 extensions
python benchmarks/memory_benchmark.py --max-extensions 100

# Only test large tensor sharing
python benchmarks/memory_benchmark.py --large-only

# Only test small tensor scaling
python benchmarks/memory_benchmark.py --small-only
```

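For orientation, flags like these could be parsed roughly as follows. This is only a sketch: the flag names come from the commands in this section, but `parse_counts`, `build_parser`, the defaults, and the validation rules are assumptions made for illustration, not the script's actual implementation.

```python
import argparse
from typing import List


def parse_counts(text: str) -> List[int]:
    """Turn a comma-separated string such as "1,5,10" into [1, 5, 10]."""
    counts = [int(part) for part in text.split(",") if part.strip()]
    if not counts or any(count < 1 for count in counts):
        raise argparse.ArgumentTypeError("counts must be positive integers")
    return counts


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the flag names are taken from the README.
    parser = argparse.ArgumentParser(description="Memory benchmark (sketch)")
    parser.add_argument("--counts", type=parse_counts, default=None)
    parser.add_argument("--max-extensions", type=int, default=None)
    parser.add_argument("--large-only", action="store_true")
    parser.add_argument("--small-only", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv for the demonstration.
args = build_parser().parse_args(["--counts", "1,5,10,20,50"])
print(args.counts)  # [1, 5, 10, 20, 50]
```

Using a `type=` callable keeps validation inside `argparse`, so a malformed value such as `--counts 1,x` is reported as a normal usage error.
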
### Memory Benchmark Features

1. **Process Memory Tracking**: Uses `psutil` to track RAM usage across process trees
2. **GPU Memory Tracking**: Uses `nvidia-ml-py3` to track VRAM usage per process
3. **Extension Scaling**: Tests memory usage with 1 to 100 extensions
4. **Tensor Sharing Analysis**: Compares memory usage with and without `share_torch`
5. **Large Tensor Tests**: Uses 2 GB tensors to verify memory sharing efficiency

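Process-tree RAM tracking (item 1) can be sketched with `psutil` along these lines. The function name `tree_rss_mb` and the choice of resident set size (RSS) as the metric are assumptions; the benchmark itself may sample memory differently.

```python
def tree_rss_mb(pid=None):
    """Return the summed RSS, in MB, of a process and all of its descendants.

    With pid=None, psutil.Process() refers to the current process.
    """
    import psutil  # third-party dependency, imported lazily

    root = psutil.Process(pid)
    total_bytes = root.memory_info().rss
    for child in root.children(recursive=True):
        try:
            total_bytes += child.memory_info().rss
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            # A child may exit, or be inaccessible, between listing and sampling.
            continue
    return total_bytes / (1024 * 1024)


# Example call (requires psutil to be installed):
#   print(f"Host process tree RSS: {tree_rss_mb():.1f} MB")
```

RSS counts pages shared between processes once per process, so summing it can overstate usage when tensors are shared. On platforms that support it, `memory_full_info().uss` reports memory unique to each process and may be a better fit for that comparison.
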
### Memory Benchmark Output

The memory benchmark provides detailed tables showing:
- RAM usage per extension
- Memory overhead for tensor transfers
- VRAM usage for GPU tensors
- Memory savings from the `share_torch` optimization

Example output:
```
MEMORY BENCHMARK SUMMARY
================================================================================

Baseline Memory Usage:
RAM: 150.3 MB
VRAM: 0.0 MB

CPU NO SHARE Results:
+-------------+----------------+-------------------+-------------+---------+
|  Extensions |   RAM/Ext (MB) |   Tensor RAM (MB) |   VRAM (MB) | Shared  |
+=============+================+===================+=============+=========+
|           1 |           45.2 |               1.1 |         0.0 | No      |
+-------------+----------------+-------------------+-------------+---------+
|           5 |           44.8 |               5.3 |         0.0 | No      |
+-------------+----------------+-------------------+-------------+---------+

2GB TENSOR SHARING TEST:
+--------------------+--------------------+--------------------------+------------------------+
| Config             |   Tensor Size (MB) |    Distribution RAM (MB) |     RAM/Extension (MB) |
+====================+====================+==========================+========================+
| share_torch=False  |             2048.0 |                  10240.0 |                 2048.0 |
+--------------------+--------------------+--------------------------+------------------------+
| share_torch=True   |             2048.0 |                    512.0 |                  102.4 |
+--------------------+--------------------+--------------------------+------------------------+

Memory Sharing Analysis:
Memory saved with share_torch: 9728.0 MB (95.0%)
```

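The figures in the sharing analysis follow directly from the 2 GB table. As a worked check (the helper name `sharing_savings` is illustrative, and the extension count of 5 is inferred from 10240.0 / 2048.0 rather than stated in the output):

```python
def sharing_savings(ram_no_share_mb, ram_share_mb):
    """Return (MB saved, percent saved) when sharing replaces copying."""
    saved_mb = ram_no_share_mb - ram_share_mb
    percent = 100.0 * saved_mb / ram_no_share_mb
    return saved_mb, percent


# Distribution RAM values taken from the example output.
saved_mb, percent = sharing_savings(10240.0, 512.0)
print(f"Memory saved with share_torch: {saved_mb:.1f} MB ({percent:.1f}%)")
# Memory saved with share_torch: 9728.0 MB (95.0%)

# The RAM/Extension column is consistent with distribution RAM divided by
# the number of extensions (assumed here to be 5).
num_extensions = 5
per_ext_no_share = 10240.0 / num_extensions  # 2048.0
per_ext_share = 512.0 / num_extensions       # 102.4
print(per_ext_no_share, per_ext_share)
```
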
### Key Metrics

- **RAM/Extension**: Average memory overhead per extension process
- **Tensor RAM**: Additional RAM used for tensor distribution
- **VRAM**: GPU memory usage (if CUDA is available)
- **Memory Sharing**: Whether tensors are shared (same memory address) or copied

## Contributing

When adding new benchmarks:
1. Follow the existing pattern in `benchmarks/benchmark.py` or `benchmarks/memory_benchmark.py`
2. Include error handling for potential failures
3. Add appropriate test data sizes
4. Document what the benchmark measures
5. Update this README with new benchmark descriptions
6. Test with various `--torch-mode` options to ensure compatibility
7. For memory benchmarks, ensure proper cleanup to avoid memory leaks
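
For item 7, one way to guarantee cleanup is to wrap child processes in a context manager so they are terminated even if a measurement raises. This is a minimal sketch using only the standard library; `managed_children` and the sleeping placeholder command are hypothetical stand-ins for however a benchmark actually launches its extension processes.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def managed_children(count):
    """Start `count` placeholder child processes and always clean them up."""
    procs = []
    try:
        for _ in range(count):
            # Placeholder workload: a child that simply sleeps.
            procs.append(
                subprocess.Popen(
                    [sys.executable, "-c", "import time; time.sleep(60)"]
                )
            )
        yield procs
    finally:
        # Ask every still-running child to stop, then wait for each one.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()  # escalate if the child ignores terminate()
                proc.wait()


with managed_children(2) as children:
    pass  # take memory measurements here

# Every child has exited by this point.
all_exited = all(child.returncode is not None for child in children)
print(all_exited)  # True
```

Putting the teardown in `finally` means leftover processes cannot inflate the baseline of the next benchmark run, which matters when results are compared across extension counts.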