Python multiprocessing - multiple producers, single consumer

I'm working on some code that I originally wrote to store its results in a MySQL database. I chose MySQL because it supports concurrent writes, and this program processes data in parallel, with each process writing its results to the database when a given task is complete. This was a lazy way of getting the data stored relatively quickly and easily (let's ignore the overhead of any given process for a moment). That said, having MySQL as a dependency of your code is a bummer, particularly for folks with one-off tasks who don't have the time or patience to install and configure mysqld. Additionally, although moving MySQL data between machines is relatively easy using dump files, it is less convenient than options where the whole database is a single file you can simply copy.

Because I'm working in Python, I can also use the excellent sqlite3 module. It offers most of the database features I need, it's generally available on any platform, it is open source, and databases are portable between machines.

However, SQLite allows only one writer at a time, so it does not support concurrent writes. That means I need a general way to process my data in parallel while writing the results to the database from a single process. In other words, I need a multiprocessing model with multiple producers and a single consumer, something with few examples available on the interwebs.
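In practice, SQLite lets many connections read at once but only one write at a time; a second writer either waits (up to the `timeout` passed to `sqlite3.connect`, 5 seconds by default) or fails with `sqlite3.OperationalError: database is locked`. A small single-process sketch of that limitation, with two connections standing in for two worker processes; the function name `second_writer_is_blocked` and the table `t` are made up for illustration:

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked(db_path):
    """Return True if a second connection cannot write while the first holds the write lock."""
    # isolation_level=None: transactions are controlled by explicit BEGIN statements.
    # timeout=0: raise immediately instead of waiting for the lock to be released.
    first = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    first.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
    first.execute("BEGIN IMMEDIATE")  # take the database's single write lock now
    first.execute("INSERT INTO t (x) VALUES (1)")
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        second.execute("INSERT INTO t (x) VALUES (2)")  # a concurrent second writer
        return False
    except sqlite3.OperationalError as exc:
        return "locked" in str(exc)  # SQLite reports "database is locked"
    finally:
        second.close()
        first.close()  # closing without COMMIT discards the first insert


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "lock_demo.db")
    print(second_writer_is_blocked(path))
```

Raising `timeout` only makes the second writer wait its turn; it does not let the two writes proceed in parallel, which is why funneling every write through one process is the simpler fit here.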

So, after a bit of playing around, here's some test code that has multiple producer processes feeding a single database writer:
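A minimal sketch of that model using only the standard library, offered as one possible wiring rather than a definitive implementation. It assumes a single `multiprocessing.Queue` shared by every producer, one consumer process that is the only holder of a `sqlite3` connection, and `None` as a per-producer "done" sentinel. The `work()` function, the `results` table, and the round-robin task split are hypothetical placeholders for the real computation and schema.

```python
import multiprocessing as mp
import os
import sqlite3
import tempfile


def work(task):
    """Hypothetical stand-in for the real per-task computation."""
    return task, task * task


def producer(tasks, queue):
    # Each producer handles its own slice of tasks and never touches the database.
    for task in tasks:
        queue.put(work(task))
    queue.put(None)  # sentinel: this producer has no more results


def consumer(db_path, queue, n_producers):
    # The single writer: the only process that opens a sqlite3 connection.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS results (task INTEGER, value INTEGER)")
    finished = 0
    while finished < n_producers:  # stop once every producer has sent its sentinel
        item = queue.get()
        if item is None:
            finished += 1
        else:
            conn.execute("INSERT INTO results (task, value) VALUES (?, ?)", item)
    conn.commit()  # one commit at the end keeps the sketch short
    conn.close()


def run(db_path, tasks, n_producers=4):
    queue = mp.Queue()
    # Start the writer first so the queue is drained while the producers run.
    writer = mp.Process(target=consumer, args=(db_path, queue, n_producers))
    writer.start()
    # Round-robin split: producer i gets tasks i, i + n, i + 2n, ...
    producers = [
        mp.Process(target=producer, args=(tasks[i::n_producers], queue))
        for i in range(n_producers)
    ]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    writer.join()


if __name__ == "__main__":
    # The __main__ guard is required where processes are spawned (e.g. Windows, macOS).
    path = os.path.join(tempfile.mkdtemp(), "results.db")
    run(path, list(range(10)))
    conn = sqlite3.connect(path)
    print(conn.execute("SELECT COUNT(*), SUM(value) FROM results").fetchone())  # (10, 285)
    conn.close()
```

Starting the writer before the producers matters: a process that has put items on a `multiprocessing.Queue` does not exit until its buffered items have been flushed to the underlying pipe, so joining producers while nothing is reading the queue can deadlock. Committing once at the end keeps the example short; for long runs, committing every few thousand rows would limit how much work is lost if the writer dies.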

