Disclaimer: Jason S. Iacovoni is no longer a student/researcher at TSRI. This site is for archive purposes only and is not actively maintained.

GeneHuggers - tools for biological discovery in sequenced genomes


GeneHuggers is a collection of UNIX applications that enable high-throughput mining of genome sequence databases.


     The GeneHuggers toolkit contains over 70 individual programs which can be used in the construction of customized bioinformatics applications. In order to accommodate a wide variety of scientific interests, each basic function of data mining and sequence analysis has been modularized into a separate executable. Every GeneHuggers program uses a common data type, so the output of one module can be used as input to another. Complex queries are composed by chaining this finite set of tools into serial assemblies, of which there is an effectively unlimited number.
     This data type, hereafter referred to as GeneHuggers hit I/O (ghio), is an ASCII text representation of numerical data compatible with all standard UNIX tools such as grep, sort, sed, awk, vi, more, etc. The databases that can be mined with GeneHuggers include UniGene, LocusLink, and any GenBank-formatted sequence flatfile, including the RefSeq genome contigs. In addition, GeneHuggers supports FASTA/FASTN file formats so that either non-public data or modified subsets of the public data can also be used with the toolkit. The ability to generate and use FASTA/N-formatted sequence files also enables the use of BLAST and other sequence analysis programs in the construction of GeneHuggers applications. Programs commonly used in GeneHuggers applications currently include BLAST, HMMER, MEME, MAST, and CLUSTALW. GeneHuggers has tools which create input files for these programs, as well as parsers which transform their output reports back into ghio.
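     As a rough illustration of why a plain-text hit format is convenient, the sketch below filters and sorts line-oriented records using standard UNIX tools only. The three-column layout (identifier, start position, score) and the sample records are invented for this example and are not the actual ghio format; no GeneHuggers program is invoked.

```shell
# Hypothetical whitespace-delimited records: id, start, score.
# This layout is an assumption for illustration, NOT the real ghio format.
hits='contigA 120 35
contigB 480 12
contigA 910 57
contigC 15 41'

# Keep records whose score (column 3) is at least 30,
# then sort numerically by score, highest first.
top=$(printf '%s\n' "$hits" | awk '$3 >= 30' | sort -k3,3nr)
printf '%s\n' "$top"
```

     Because each stage reads and writes plain text, any stage could in principle be swapped for another filter without changing the rest of the pipeline.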
     There are many aspects of GeneHuggers that are new for most biologists, and their presentation is an organisational nightmare for me, the sole person responsible for creating, developing, testing, and documenting this project. If you have specific issues that are not addressed by this web site, please check the ftp site for the manual. If you have successfully installed GeneHuggers at your own site and are happy/unhappy with it, or have additional questions not covered in the documentation, please email me, Jason S. Iacovoni, at jiaco@scripps.edu.
     GeneHuggers is available for FREE, under the terms and conditions of the GNU General Public License. It is my intention that programmers interested in extending the functionality of GeneHuggers to include other databases and other applications do so in a manner consistent with the GNU General Public License.

GeneHuggers web documentation has been divided into the following sections
Toolkit: Documentation for each specific GeneHuggers tool
Installation: Guide to installing GeneHuggers on a new system
Indexes: Instructions for generating commonly used indexes
Tutorials: Examples of applications written with GeneHuggers
Developer: Details about the source code and how to proceed to write additional tools

Links associated with the GeneHuggers project
http: GeneHuggers web documentation (archive only)
ftp: GeneHuggers ftp site (web archive copy)
email: jiaco@scripps.edu (inactive)
License: GNU General Public License

TSRI researchers interested in using GeneHuggers for their work may do so using the SGI Origin cluster. For access, email Bill Young at Research Computing.
GeneHuggers classes and workshops are routinely posted to the sequence mailing list. Please subscribe to keep informed of classes, database updates, and bug fixes.
Be sure to define the environment variables GH_HOME and GH_DATA and add $GH_HOME/bin to your $path.
setenv GH_HOME /usr/people/applications/genehuggers
setenv GH_DATA /scratch/genbank
set path = ( $path $GH_HOME/bin )
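
The settings above use csh/tcsh conventions. For users of a Bourne-style shell such as sh or bash, an equivalent might look like the following sketch. The paths are the TSRI defaults listed above and would need to be changed for any other installation.

```shell
# Bourne-shell (sh/bash) equivalent of the csh settings above.
# Paths are the TSRI defaults quoted on this page; adjust for other sites.
GH_HOME=/usr/people/applications/genehuggers
GH_DATA=/scratch/genbank
export GH_HOME GH_DATA

# Append the GeneHuggers executables to the command search path.
PATH="$PATH:$GH_HOME/bin"
export PATH
```

Placing these lines in a shell startup file such as ~/.profile would make them apply to every login session.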