Optimized Large-Scale IPC in Python with Shared Memory Queues


Install using:

pip install py-sharedmemory

On Windows, Python's standard multiprocessing.Queue relies on _winapi.CreateFile for inter-process communication (IPC): every item is pickled and the entire payload is pushed through a pipe, introducing significant I/O overhead. This can become a performance bottleneck in demanding applications such as Reinforcement Learning, scientific computing, or distributed systems that move large amounts of data (e.g. NumPy arrays or tensors) between processes (actors, replay buffers, trainers, etc.).

py-sharedmemory provides an alternative that stores the actual payload in multiprocessing.shared_memory (backed by _winapi.CreateFileMapping on Windows) and sends only lightweight metadata through the queue. This eliminates most inter-process I/O, reducing system load and latency. If you're hitting performance limits with standard queues, py-sharedmemory may help.
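To make the mechanism concrete, here is a minimal standard-library sketch of the same idea (an illustration only, not py-sharedmemory's actual implementation): the payload lives in a multiprocessing.shared_memory block, and only its name and size travel through the queue. The manual acknowledgement and cleanup below are exactly the lifetime management the library handles for you.

import multiprocessing as mp
from multiprocessing import shared_memory

def producer(q: mp.Queue, ack: mp.Queue):
    payload = b"x" * 10_000_000                  # ~10 MB payload
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload             # write the payload into shared memory
    q.put((shm.name, len(payload)))              # only lightweight metadata crosses the queue
    ack.get()                                    # keep the block alive until the consumer is done
    shm.close()

def consumer(q: mp.Queue, ack: mp.Queue):
    name, size = q.get()
    shm = shared_memory.SharedMemory(name=name)  # attach by name; no payload I/O through the pipe
    data = bytes(shm.buf[:size])                 # copy the payload out of shared memory
    shm.close()
    shm.unlink()                                 # release the block
    ack.put(True)
    print(f"received {len(data):,} bytes")

if __name__ == "__main__":
    q, ack = mp.Queue(), mp.Queue()
    p = mp.Process(target=producer, args=(q, ack))
    c = mp.Process(target=consumer, args=(q, ack))
    p.start(); c.start()
    p.join(); c.join()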


Usage example

import multiprocessing as mp
from memory import create_shared_memory_pair, SharedMemorySender, SharedMemoryReceiver

def producer_sm(sender: SharedMemorySender):
    your_data = "your data"

    sender.put(your_data) # blocks until space is available
    sender.put(your_data, timeout=3) # raises queue.Full exception after 3s
    sender.put(your_data, block=False) # raises queue.Full exception if no space available
    sender.put_nowait(your_data) # ^ equivalent to above
    # ...

    # wait for all data to be received before closing the sender
    # to properly close all shared memory objects
    sender.wait_for_all_ack()

def consumer_sm(receiver: SharedMemoryReceiver):
    data = receiver.get() # blocks
    data = receiver.get(timeout=3) # raises queue.Empty exception after 3s
    data = receiver.get(block=False) # raises queue.Empty exception if no data available
    data = receiver.get_nowait() # ^ equivalent to above
    # ...

if __name__ == '__main__':
    sender, receiver = create_shared_memory_pair(capacity=5)
    mp.Process(target=producer_sm, args=(sender,)).start()
    mp.Process(target=consumer_sm, args=(receiver,)).start()
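Since the motivating workloads move NumPy arrays, here is a hedged sketch of the same pattern with an array payload (assuming, as the description above suggests, that put()/get() accept arbitrary picklable objects such as arrays):

import numpy as np

def producer_array(sender: SharedMemorySender):
    batch = np.random.rand(1024, 1024)   # ~8 MB of float64 data
    sender.put(batch)                    # the array payload travels via shared memory
    sender.wait_for_all_ack()

def consumer_array(receiver: SharedMemoryReceiver):
    batch = receiver.get()
    print(batch.shape, batch.dtype)      # (1024, 1024) float64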

Considerations

There is a certain overhead to allocating shared memory, which is especially noticeable for smaller objects. Use the following heuristic depending on the size of the data you are handling (a code sketch follows the table):

Data size       Recommended
10B – 100KB     mp.Queue()
1MB – 10GB      py-sharedmemory
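If your payload sizes vary, the rule of thumb can be written down directly. The helper below is a hypothetical sketch (choose_transport and the ~1MB threshold are illustrative, derived from the benchmark below, not part of the library):

import pickle

def choose_transport(obj, threshold: int = 1_000_000) -> str:
    # Rough guide: below ~1 MB the standard queue wins; at or above, shared memory.
    size = len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
    return "py-sharedmemory" if size >= threshold else "mp.Queue"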

Performance Testing

I benchmarked data transfer performance using both the standard multiprocessing.Queue and my py-sharedmemory implementation:

(Figure: performance comparison, multiprocessing.Queue vs. py-sharedmemory across message sizes)

Starting around 1MB per message, py-sharedmemory matches or slightly trails the standard queue in speed. However, the key advantage is that it avoids generating I/O, which becomes critical at larger data sizes. Notably, the standard implementation fails with 10GB messages, while py-sharedmemory handles them reliably.
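To reproduce a comparison like this on your own machine, a simple round-trip harness is enough. The sketch below times the standard multiprocessing.Queue; a py-sharedmemory variant would swap in create_shared_memory_pair with the same put/get calls (this is an illustrative harness, not the exact benchmark used above):

import multiprocessing as mp
import time

def _echo(q_in: mp.Queue, q_out: mp.Queue, n: int):
    for _ in range(n):
        q_out.put(q_in.get())   # bounce each message straight back

def bench_mp_queue(size: int, n: int = 20) -> float:
    payload = b"x" * size
    q_in, q_out = mp.Queue(), mp.Queue()
    p = mp.Process(target=_echo, args=(q_in, q_out, n))
    p.start()
    start = time.perf_counter()
    for _ in range(n):
        q_in.put(payload)
        q_out.get()
    elapsed = time.perf_counter() - start
    p.join()
    return elapsed / n          # mean round-trip time per message

if __name__ == "__main__":
    for size in (1_000, 1_000_000, 10_000_000):
        print(f"{size:>12,} B: {bench_mp_queue(size):.4f} s/round-trip")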

Here’s the I/O load on my Windows system using the standard queue:

(Figure: I/O load with the standard multiprocessing.Queue)

And here’s the I/O load using py-sharedmemory:

(Figure: I/O load with py-sharedmemory)

In practice, py-sharedmemory delivers smoother and more stable performance, with consistent put/get times and no slowdowns, especially under high data throughput.
