454 assembly on EC2

I recently came across a situation in which I needed to assemble reads from a critter, gathered over a couple of 454 sequencing runs. Unfortunately, assembling these runs with gsAssembler 2.5.3 required a bit more RAM than was available on our local workstation running…

Getting taxonomy information from NCBI

Sometimes you need to get taxonomy information from NCBI for a species whose name you already know. If you are only working with one species, this is not very hard. When you are working with many species, however, attempting the task through the web front end would be painful…
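One way to avoid the web front end is to script NCBI's E-utilities. The sketch below is an illustration, not necessarily the approach this post takes. It uses only the Python standard library, looks each name up with `esearch` against the `taxonomy` database, then fetches all matching records in a single `efetch` call and pulls out the `TaxId`, `ScientificName`, and `Lineage` fields. The helper names (`taxonomy_for`, `parse_efetch`, and so on) are hypothetical, and the 0.4 s pause assumes you have no API key (NCBI asks for at most three requests per second in that case).

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(species):
    """URL that searches the NCBI taxonomy database for one species name."""
    return EUTILS + "esearch.fcgi?" + urllib.parse.urlencode(
        {"db": "taxonomy", "term": species})


def efetch_url(taxids):
    """URL that fetches full XML taxonomy records for a list of TaxIDs."""
    return EUTILS + "efetch.fcgi?" + urllib.parse.urlencode(
        {"db": "taxonomy", "id": ",".join(taxids), "retmode": "xml"})


def parse_esearch(xml_text):
    """Return the list of TaxIDs found in an esearch XML response."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("./IdList/Id")]


def parse_efetch(xml_text):
    """Map ScientificName -> {taxid, lineage} from an efetch XML response."""
    root = ET.fromstring(xml_text)
    records = {}
    # Only direct <Taxon> children of <TaxaSet>; the nested <Taxon> elements
    # inside <LineageEx> describe ancestors and are skipped here.
    for taxon in root.findall("Taxon"):
        lineage = taxon.findtext("Lineage", default="")
        records[taxon.findtext("ScientificName")] = {
            "taxid": taxon.findtext("TaxId"),
            "lineage": [rank.strip() for rank in lineage.split(";") if rank.strip()],
        }
    return records


def fetch(url):
    """Download a URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def taxonomy_for(species_names, pause=0.4):
    """Look up several species names: one esearch each, then one efetch."""
    taxids = []
    for name in species_names:
        taxids.extend(parse_esearch(fetch(esearch_url(name))))
        time.sleep(pause)  # stay under ~3 requests/second without an API key
    if not taxids:
        return {}
    return parse_efetch(fetch(efetch_url(taxids)))
```

Calling `taxonomy_for(["Homo sapiens", "Pan troglodytes"])` would return one dictionary entry per matched species. Keeping the parsing functions separate from `fetch` makes it easy to test them on saved XML without hitting NCBI.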

Sphinx + GitHub with no submodules

Previously, I detailed how I was using git submodules and gh-pages on GitHub to host the HTML versions of my Sphinx-generated documentation. The problem is that using git submodules for this is a real pain (for me, at least). You've always got to remember to sync up…
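For comparison, here is one submodule-free arrangement, sketched under assumptions; it is not necessarily what this post goes on to describe. It checks the `gh-pages` branch out into a separate directory with `git worktree`, builds the HTML straight into it, and commits and pushes from there. It assumes the `gh-pages` branch already exists (locally or on `origin`), that your Sphinx sources live in a directory such as `docs`, and that `git` (2.17 or later, for `worktree remove`) and `sphinx-build` are on your PATH. The function names are hypothetical. By default it only prints the commands; pass `dry_run=False` to run them.

```python
import pathlib
import subprocess


def publish_commands(source_dir, site_dir, doctree_dir="build/doctrees",
                     branch="gh-pages", message="Update documentation"):
    """Return the command lines that rebuild and push the HTML docs."""
    site = str(site_dir)
    return [
        # Check the gh-pages branch out into a separate directory (no submodule).
        ["git", "worktree", "add", site, branch],
        # Build HTML straight into that checkout; keep doctree pickles elsewhere.
        ["sphinx-build", "-b", "html", "-d", str(doctree_dir), str(source_dir), site],
        ["git", "-C", site, "add", "-A"],
        ["git", "-C", site, "commit", "-m", message],
        ["git", "-C", site, "push", "origin", branch],
        ["git", "worktree", "remove", site],
    ]


def publish(source_dir, site_dir, dry_run=True, **kwargs):
    """Print (and, unless dry_run, execute) the publishing commands."""
    commands = publish_commands(source_dir, site_dir, **kwargs)
    for cmd in commands:
        print(" ".join(cmd))
        if dry_run:
            continue
        subprocess.run(cmd, check=True)
        if cmd[0] == "sphinx-build":
            # Without this file, GitHub Pages' Jekyll step skips _static/ etc.
            (pathlib.Path(site_dir) / ".nojekyll").touch()
    return commands
```

The empty `.nojekyll` file matters because GitHub Pages runs Jekyll by default, which ignores directories whose names start with an underscore, such as Sphinx's `_static`. Note that `git commit` exits with an error when nothing has changed, so with `check=True` an unchanged build stops the script before the worktree is removed.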

Get you some homologs*

NOTE: This is a repost of an entry that I wrote for molecularecologist.com. Finding homologous genetic regions (let's ignore the homolog/ortholog/paralog distinction) across "genome-enabled" organisms is a handy thing to know how to do. Yet this task sometimes appears harder than it really is, particularly given…
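The excerpt does not say which tool the full post relies on, so as an illustration only: a common first pass is to BLAST your sequences against another organism's genome or proteome with tabular output (for example `blastn -query genes.fasta -db other_genome -outfmt 6 -out hits.tsv`) and keep the best-scoring hit for each query. The sketch below assumes the default twelve `-outfmt 6` columns; the e-value cutoff of `1e-10` and the function name `best_hits` are arbitrary, hypothetical choices.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-10):
    """Return {query: (subject, bitscore, evalue)} keeping the top hit per query."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        bits = float(hit["bitscore"])
        if evalue > max_evalue:
            continue  # too weak to treat as a candidate homolog
        query = hit["qseqid"]
        if query not in best or bits > best[query][1]:
            best[query] = (hit["sseqid"], bits, evalue)
    return best


if __name__ == "__main__":
    example = io.StringIO(
        "geneA\tchr1_1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t190\n"
        "geneA\tchr2_9\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-12\t80\n")
    print(best_hits(example))
```

Ranking by bit score rather than e-value avoids ties when many strong hits all report an e-value of `0.0`. A best hit is only a candidate homolog; separating orthologs from paralogs (for example with reciprocal best hits) is a further step.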

Casting a numpy array of strings to int

Sometimes you need to create an array from a string and then cast that array (which has a string dtype) to something more useful, like int; for example, when reading PHRED quality scores from a file. You can do this in several ways, often using a list…
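A minimal sketch of the two cases that tend to come up. Whitespace-separated scores can be split into a string array and cast in one step with `ndarray.astype(int)`. FASTQ quality strings are instead ASCII-encoded, so you can view the bytes as `uint8` and subtract the offset. The offset of 33 assumes Phred+33 (Sanger / Illumina 1.8+) encoding, and `phred_from_ascii` is a hypothetical helper name.

```python
import numpy as np

# Space-separated quality scores, e.g. one line of a .qual file.
line = "40 40 38 35 12 2"
scores = np.array(line.split())   # string dtype ('<U2')
as_int = scores.astype(int)       # cast the whole array at once


def phred_from_ascii(quality, offset=33):
    """Decode a FASTQ quality string, assuming Phred+33 encoding by default."""
    raw = np.frombuffer(quality.encode("ascii"), dtype=np.uint8)
    # Widen before subtracting so values below the offset cannot wrap around.
    return raw.astype(int) - offset


if __name__ == "__main__":
    print(as_int)                      # [40 40 38 35 12  2]
    print(phred_from_ascii("IIIH+#"))  # [40 40 40 39 10  2]
```

Casting the whole array avoids a Python-level loop or list comprehension, which matters when you are decoding millions of reads.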

Chunking a fasta file, part 2

Well, it took me longer than I had planned to get around to wrapping this up, but it is what it is. I have completed some code that uses either a single process or multiple processes to split a FASTA or FASTQ file into a requested number of chunks. I have yet…
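A minimal single-process sketch of the splitting step, which is an assumption about the approach rather than the code this post describes. It streams a FASTA file and writes records round-robin to N output files, so the chunks end up with nearly equal record counts (not equal numbers of bases). It handles FASTA only; FASTQ and the multiprocess version are left out. The names `read_fasta`, `split_fasta`, and `split_fasta_file` and the `prefix.N.fasta` naming scheme are hypothetical.

```python
import contextlib
import io


def read_fasta(handle):
    """Yield (header, sequence) tuples from a FASTA handle."""
    header, seq = None, []
    for line in handle:
        line = line.rstrip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(seq)


def split_fasta(handle, outputs):
    """Write records round-robin to the writable handles; return per-chunk counts."""
    counts = [0] * len(outputs)
    for i, (header, seq) in enumerate(read_fasta(handle)):
        j = i % len(outputs)
        outputs[j].write(f">{header}\n{seq}\n")
        counts[j] += 1
    return counts


def split_fasta_file(path, n_chunks, prefix):
    """Split the FASTA file at path into prefix.0.fasta, prefix.1.fasta, ..."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    with open(path) as handle, contextlib.ExitStack() as stack:
        outputs = [stack.enter_context(open(f"{prefix}.{i}.fasta", "w"))
                   for i in range(n_chunks)]
        return split_fasta(handle, outputs)


if __name__ == "__main__":
    outs = [io.StringIO(), io.StringIO()]
    print(split_fasta(io.StringIO(">a\nACGT\n>b\nGG\n>c\nTT\n"), outs))
```

For example, `split_fasta_file("reads.fasta", 4, "reads")` would write `reads.0.fasta` through `reads.3.fasta`. Because records are streamed rather than loaded all at once, memory use stays small even for large files; the trade-off is that a single reader cannot be parallelized without first indexing the file.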