Getting taxonomy information from NCBI

Sometimes you need to retrieve taxonomy information from NCBI for a species whose name you already know. If you are only working with one species, this is not very hard. When you are working with many species, however, doing the same task through the web front end would be painful…

Get you some homologs*

NOTE: This is a repost of an entry that I wrote for molecularecologist.com. Finding homologous genetic regions (let's ignore the homolog/ortholog/paralog distinction) across "genome-enabled" organisms is a handy skill. Yet this task sometimes appears harder than it really is, particularly given…

Casting a numpy array of strings to int

Sometimes you need to create an array from a string and then cast that array (which has a string dtype) into something more useful, like int; for example, when reading PHRED quality scores from a file. You can do this several ways, often using a list…
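A minimal sketch of two common cases, assuming the scores arrive either as a whitespace-separated line of integers (as in a `.qual` file) or as an ASCII-encoded FASTQ quality string using the Phred+33 offset. The variable names and sample values are made up for illustration.

```python
import numpy as np

# Case 1: whitespace-separated integer scores (hypothetical sample line).
qual_line = "40 40 38 12 2"

# split() yields a list of strings, so np.array builds a string-dtype array.
as_strings = np.array(qual_line.split())

# astype(int) parses every string element into an integer in one call.
as_ints = as_strings.astype(int)

# Case 2: FASTQ-style qualities, one ASCII character per base.
# Assumes the Phred+33 encoding (score = ord(char) - 33).
fastq_qual = "IIH5#"

# frombuffer views the raw bytes as uint8 character codes; astype(int)
# copies to a signed type before subtracting the offset.
phred33 = np.frombuffer(fastq_qual.encode("ascii"), dtype=np.uint8).astype(int) - 33
```

Casting with `astype` keeps the parsing inside NumPy, so you do not have to write the conversion loop yourself; if your file uses a different quality offset (for example, Phred+64), change the subtracted constant.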

Chunking a fasta file, part 2

Well, it took me longer than I had planned to get around to wrapping this up... but it is what it is. I have completed some code that uses single or multiple processes to split a fasta or fastq file into a requested number of subunits. I have yet…
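The core idea of splitting a fasta file into a requested number of pieces can be sketched in a few lines. This is a single-process, fasta-only sketch and not the code from the post (which also handles fastq and multiple processes); `read_fasta`, `chunk_fasta`, and the `chunkN.fasta` output names are hypothetical.

```python
import os
import tempfile


def read_fasta(path):
    """Yield (header, sequence) tuples, joining multi-line sequences."""
    header, seq = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line[1:], []
            else:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def chunk_fasta(path, n_chunks, out_dir):
    """Deal records from `path` round-robin into `n_chunks` files.

    Returns the list of output file paths.
    """
    out_paths = [os.path.join(out_dir, f"chunk{i}.fasta") for i in range(n_chunks)]
    handles = [open(p, "w") for p in out_paths]
    try:
        for i, (header, seq) in enumerate(read_fasta(path)):
            handles[i % n_chunks].write(f">{header}\n{seq}\n")
    finally:
        for handle in handles:
            handle.close()
    return out_paths


if __name__ == "__main__":
    # Tiny demo on a throwaway file with five two-line records.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.fasta")
        with open(src, "w") as fh:
            for i in range(5):
                fh.write(f">seq{i}\nACGT\nAC\n")
        for p in chunk_fasta(src, 2, tmp):
            print(os.path.basename(p), sum(1 for _ in read_fasta(p)))
```

Dealing records round-robin balances the record count per chunk without first counting the records, at the cost of not keeping each chunk as a contiguous run of the input; balancing by total sequence length instead would need a different assignment rule.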

A plethora of sequence tags

UPDATE: You can find these tags, and a description of the program used to generate them, in Faircloth and Glenn (2012). Sequence tags can be attached to DNA reads of interest so that you can track different pools of reads following a second-generation sequencing run. The best way to generate…
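The excerpt stops before describing how the tags are generated, so the following is not the program from the paper. It is a minimal sketch that assumes tags are screened by edit (Levenshtein) distance, so that a read's tag can still be matched to the right pool after a few substitutions, insertions, or deletions. `levenshtein`, `min_pairwise_distance`, and `greedy_tag_set` are hypothetical helpers.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance (substitutions, insertions, deletions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def min_pairwise_distance(tags):
    """Smallest edit distance between any two tags in the set."""
    return min(levenshtein(a, b) for a, b in combinations(tags, 2))


def greedy_tag_set(candidates, min_dist):
    """Keep each candidate that is >= min_dist edits from every tag kept so far."""
    kept = []
    for tag in candidates:
        if all(levenshtein(tag, k) >= min_dist for k in kept):
            kept.append(tag)
    return kept
```

Because edit distance is a metric, a set whose minimum pairwise distance is d lets you unambiguously assign a read carrying up to (d - 1) // 2 tag errors to its nearest tag. The greedy filter depends on candidate order and does not guarantee the largest possible set.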

Extending the nextera indexing set

UPDATE: The tags originally described in this post are available in Faircloth and Glenn (2012) and are also described in this post. Because the code originally detailed in this post has been deprecated in favor of the code described in the manuscript and available in the edittag package, I have removed…