GenPerl

GenPerl is a software development toolkit designed to facilitate the development of Perl scripts and applications in the area of genetic analysis. It includes a detailed genetic analysis object model, implemented as a collection of OO Perl classes, and a series of modules that implement an API for interacting with objects of these classes. The API includes functionality for managing the persistence of objects in a relational database.

GenPerl does not include ready-to-use programs with friendly graphical user interfaces (GUIs). Instead, GenPerl provides reusable Perl modules that facilitate the development of applications for the storage, manipulation and analysis of genetic data. GenPerl enables the development of applications that can process and analyze large quantities of data in customized ways that are typically difficult or impossible with large GUI-based systems.

A CGI client application has been built using GenPerl, which demonstrates some of the functionality in GenPerl. Examples of code using GenPerl can be found in the GenPerl Tutorial document.

GenPerl should be considered alpha software.

  Classes/Objects   API   Database
The GenPerl object model is implemented as a series of Perl classes. GenPerl objects are instances of these classes.

Genetics::Object is the superclass for all GenPerl objects.

The objects with which one interacts directly are:

  • Cluster representing groupings of objects, by reference.
  • DNASample representing laboratory samples of DNA.
  • FrequencySource representing observed allele and/or haplotype frequencies.
  • Genotype representing the observed allele(s) an individual has at a polymorphic locus.
  • Haplotype representing a specific combination of alleles known or predicted to be segregating together.
  • HtMarkerCollection representing a set of Markers/SNPs from which Haplotypes can be constructed.
  • Kindred representing groups of Subjects related genetically or by marriage.
  • Map representing an ordered set of chromosomal loci.
  • Marker representing polymorphic genetic loci.
  • Phenotype representing the observable properties - genetic and/or environmental - of an individual.
  • SNP representing single nucleotide polymorphisms.
  • StudyVariable representing definitions of physical traits, affection status, environmental exposure, drug treatments, etc.
  • Subject representing the individuals being studied.
  • TissueSample representing laboratory tissue samples.
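
The superclass/subclass arrangement described above can be sketched in plain OO Perl. This is only an illustration of the pattern, not GenPerl's actual code: the Sketch::Object and Sketch::Subject packages, their fields (name, comment, gender) and their methods are all hypothetical stand-ins.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a common superclass (NOT the real Genetics::Object).
package Sketch::Object;

sub new {
    my ($class, %args) = @_;
    # Fields shared by every object type; the field names are assumptions.
    my $self = { name => $args{name}, comment => $args{comment} };
    return bless $self, $class;
}

sub name { my $self = shift; return $self->{name} }

# Report the concrete type, e.g. "Subject" for a Sketch::Subject instance.
sub type {
    my $self = shift;
    my ($type) = ref($self) =~ /::(\w+)$/;
    return $type;
}

# Hypothetical subclass modelling an individual being studied.
package Sketch::Subject;
our @ISA = ('Sketch::Object');

sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);   # reuse the shared constructor
    $self->{gender} = $args{gender};        # subclass-specific field (assumed)
    return $self;
}

sub gender { my $self = shift; return $self->{gender} }

package main;

my $subject = Sketch::Subject->new(name => 'Subj-1', gender => 'Female');
printf "%s %s (%s)\n", $subject->type, $subject->name, $subject->gender;
```

Methods defined once in the superclass (here name and type) are inherited by every object type, which is the usual reason for giving an object model a single root class.
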
The GenPerl API functionality is separated into the following packages.

The Genetics::API package contains general API methods that don't fit anywhere else right now.

The Genetics::API::DB packages contain methods for managing the persistence of GenPerl objects in a relational database. These packages essentially constitute the middleware layer between client scripts and data in a GenPerl schema.

  • Insert: This package contains the methods for saving new objects (i.e. objects that have not been previously saved to the database).
  • Read: This package contains the methods for reading objects. This involves extracting the necessary data from the database and instantiating GenPerl objects.
  • Update: This package contains the methods for updating objects that have previously been saved in the MySQL database.
  • Delete: This package contains the methods for deleting objects.
  • Query: This package contains methods for more sophisticated querying.
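
The division of labour between inserting, reading, updating and deleting can be illustrated with a toy example. The sketch below uses an in-memory hash in place of a real database so that it runs anywhere; the Sketch::Store package, its method names and the sample data are hypothetical and are not part of the GenPerl API.

```perl
use strict;
use warnings;

# Toy in-memory store standing in for the database; all names are hypothetical.
package Sketch::Store;

sub new {
    my ($class) = @_;
    my $self = { rows => {}, next_id => 1 };
    return bless $self, $class;
}

# Insert: only for objects that have never been saved (no id assigned yet).
sub insert_object {
    my ($self, $obj) = @_;
    die "object already saved; use update_object\n" if defined $obj->{id};
    $obj->{id} = $self->{next_id}++;
    $self->{rows}{ $obj->{id} } = { %$obj };    # store a copy of the data
    return $obj->{id};
}

# Read: extract the stored data and build a fresh object from it.
sub read_object {
    my ($self, $id) = @_;
    my $row = $self->{rows}{$id};
    return defined $row ? { %$row } : undef;
}

# Update: only for objects that were saved earlier.
sub update_object {
    my ($self, $obj) = @_;
    die "object not saved yet; use insert_object\n"
        unless defined $obj->{id} && exists $self->{rows}{ $obj->{id} };
    $self->{rows}{ $obj->{id} } = { %$obj };
    return 1;
}

# Delete: remove the stored data; true if something was removed.
sub delete_object {
    my ($self, $id) = @_;
    return defined delete $self->{rows}{$id} ? 1 : 0;
}

package main;

my $store  = Sketch::Store->new;
my $marker = { name => 'marker-A', comment => 'new' };   # made-up sample data

my $id = $store->insert_object($marker);
$marker->{comment} = 'revised';
$store->update_object($marker);

my $copy = $store->read_object($id);
print "read back: $copy->{name} ($copy->{comment})\n";

$store->delete_object($id);
my $gone = $store->read_object($id);
print "after delete: ", (defined $gone ? 'found' : 'gone'), "\n";
```

Keeping insert and update separate, as the package list above does, lets each operation reject objects in the wrong state instead of silently overwriting or duplicating data.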

The Genetics::API::Analysis package contains methods related to the analysis of genetic data. This includes statistical tests, formatting of analysis files, etc.

  • The Genetics::API::Analysis::Linkage package contains methods relating to genetic linkage analyses. Generally speaking, this means reading and writing linkage format pedigree and locus files, and/or running programs such as genehunter.

In order to manage the persistence of GenPerl objects, the GenPerl API requires a relational database instance in which to store the data. Right now, the only supported database is MySQL. I chose MySQL mainly because it's free, fast, and relatively simple to administer. All database interaction in the API is implemented using DBI, and thus the API could, in theory, easily support the use of any RDBMS for which a DBD module exists.
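
To illustrate why building on DBI helps with portability, here is a minimal sketch in which the choice of back end is confined to the data source name (DSN) passed to DBI->connect. The dsn_for helper, the database name genperl_demo and the environment variable names are assumptions made for this example; they are not part of GenPerl. In practice, differences in SQL dialect may also need attention when switching databases.

```perl
use strict;
use warnings;

# Hypothetical helper: build a DBI data source name for a given DBD driver.
sub dsn_for {
    my ($driver, $dbname, $host) = @_;
    $host = 'localhost' unless defined $host;
    my %dsn = (
        mysql => "DBI:mysql:database=$dbname;host=$host",   # DBD::mysql
        Pg    => "DBI:Pg:dbname=$dbname;host=$host",        # DBD::Pg
    );
    die "no DSN template for driver '$driver'\n" unless exists $dsn{$driver};
    return $dsn{$driver};
}

my $dsn = dsn_for('mysql', 'genperl_demo');
print "$dsn\n";

# Connect only when explicitly requested, so the sketch runs without a
# database server. The DBI calls below are the same for every driver.
if ($ENV{SKETCH_DB_CONNECT}) {
    require DBI;
    my $dbh = DBI->connect($dsn, $ENV{SKETCH_DB_USER}, $ENV{SKETCH_DB_PASS},
                           { RaiseError => 1, AutoCommit => 1 });
    my ($one) = $dbh->selectrow_array('SELECT 1');
    print "connected, test query returned $one\n";
    $dbh->disconnect;
}
```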

The DDL in this script can be used to create an appropriate schema in MySQL.


Steve Mathias <mathias@genomica.com>
Last modified: Tue Nov 27 17:11:43 2001