Lego my regex

NOTE: This is a repost of an entry that I wrote for the molecularecologist.com. Regular expressions are something that pretty much everyone working with more than a handful of data should take the time to learn (a handful being around 500 lines). They can easily improve your life, particularly…

Python multiprocessing - multiple producers, single consumer

I'm working on some code that I initially wrote to write results to a MySQL database. I chose MySQL because it supports concurrent writes, and this program processes data in parallel, with each process writing its results to the database when a given task is complete. This was a lazy…

(Relatively) Easily get coverage for velvet assemblies

You can get kmer coverage from contigs assembled by velvet by parsing the kmer value from the output fasta header, but sometimes I want "actual" coverage for contigs or coverage across a specific subset of contigs. Here is a way to do this relatively painlessly (requires that you first download…

Beast in the cloud

We've been running beast and mrbayes on several data sets lately, generally using ec2 to help us run multiple analyses simultaneously. Along those lines, I was interested in getting beast (using the beagle-lib) running on ec2, to take advantage of their GPU HPC options (what a load of acronyms!). Anyway…

Get your protein

NOTE: This is a repost of an entry that I wrote for the molecularecologist.com. This weekend, I was doing a little work on one of our projects where we are using various cpDNA genes. I really needed to get a number of protein sequences from Genbank for the products…

454 assembly on ec2

I recently came across a situation in which I needed to assemble some reads from a critter that we gathered from a couple of 454 sequencing runs. Unfortunately, the assembly of these runs using gsAssembler 2.5.3 required a bit more RAM than was available on our local workstation running…