View of /FigKernelPackages/AliasAnalysis.pm

Revision 1.3
Thu Dec 6 13:32:01 2007 UTC (12 years, 2 months ago) by parrello
Branch: MAIN
CVS Tags: rast_rel_2008_06_18, rast_rel_2008_06_16, rast_rel_2008_12_18, rast_rel_2008_07_21, rast_2008_0924, rast_rel_2008_04_23, rast_rel_2008_09_30, mgrast_rel_2008_0924, mgrast_rel_2008_1110_v2, rast_rel_2009_02_05, mgrast_rel_2008_0625, rast_rel_2008_10_09, rast_release_2008_09_29, mgrast_rel_2008_0806, mgrast_rel_2008_0923, mgrast_rel_2008_0919, mgrast_rel_2008_1110, rast_rel_2008_09_29, mgrast_rel_2008_0917, rast_rel_2008_10_29, rast_rel_2008_11_24, rast_rel_2008_08_07
Changes since 1.2: +5 -5 lines
Changed POD format for better compatibility with Wiki.

#!/usr/bin/perl -w

package AliasAnalysis;

    use strict;
    use Tracer;
    use FIG;

=head1 Alias Analysis Module

=head2 Introduction

This module encapsulates data about aliases. For each alias, it tells us how to generate
the appropriate link, what the type is for the alias, its export format, and its display
format. To add new alias types, we simply update this package.

An alias has three forms. The I<internal> form is how the alias is stored in the database.
The I<export> form is the form into which it should be translated when being exported to
BRC databases. The I<natural> form is the form it takes in its own environment. For
example, C<gi|15675083> is the internal form of a GenBank ID. Its export form is
C<NCBI_gi:15675083>, and its natural form is simply C<15675083>.

=head2 The Alias Table

The alias table is a hash of hashes. Each sub-hash relates to a specific type of alias, and
the key names the alias type (e.g. C<uniprot>, C<KEGG>). The sub-hashes have three fields.

=over 4

=item pattern

This is a regular expression that will match aliases of the specified type in their internal
forms.

=item convert

This field is a hash of conversions. The key for each is the conversion type and the
data is a replacement expression. These replacement expressions rely on the pattern match
having just taken place and use the C<$1>, C<$2>, ... variables to get text from the
alias's internal form. An alias's natural form, export form, and URL are all implemented as
different types of conversions. New conversion types can be created at will by updating
the table, without having to change any code. Note that for
the URL conversion, a value of C<undef> means no URL is available.

=item normalize

This is a prefix that can be used to convert an alias from its natural form to its
internal form.

=back
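
As a rough illustration of how the three fields fit together, the following standalone
sketch applies the C<convert> expressions and the C<normalize> prefix of the KEGG entry.
The helper C<apply_conversion>, the C<%entry> variable, and the sample alias
C<kegg|spy:SPy_0001> are assumptions made for this example and are not part of the module.

```perl

    use strict;
    use warnings;

    # Stand-in copy of the KEGG entry from the alias table (URL omitted).
    my %entry = (
        pattern   => 'kegg\|(([a-z]{2,4}):([a-zA-Z_0-9]+))',
        convert   => { natural => '$1', export => 'KEGG:$2+$3' },
        normalize => 'kegg|',
    );

    # Hypothetical helper: apply one named conversion to an internal-form alias.
    # Returns undef if the alias does not match or the conversion is unavailable.
    sub apply_conversion {
        my ($entry, $kind, $alias) = @_;
        my $expression = $entry->{convert}{$kind};
        return undef unless defined $expression;
        my $pattern = $entry->{pattern};
        return undef unless $alias =~ /$pattern/;
        # Evaluate the expression as a double-quoted string so that $1, $2, ...
        # from the match above are interpolated into it.
        return eval "\"$expression\"";
    }

    my $internal = 'kegg|spy:SPy_0001';
    my $natural  = apply_conversion(\%entry, natural => $internal);
    my $export   = apply_conversion(\%entry, export  => $internal);
    # The normalize prefix turns the natural form back into the internal form.
    my $rebuilt  = $entry{normalize} . $natural;

    print "$natural\n$export\n$rebuilt\n";

```

Evaluating the expression inside double quotes is what lets an entry such as
C<KEGG:$2+$3> mix literal text with captured groups. The expressions are trusted table
data; this approach would not be safe for expressions taken from user input.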

At some point the Alias Table may be converted from an inline hash to an external XML file.

=cut

my %AliasTable = (
        RefSeq => {
            pattern     =>  '([NXYZA]P_[0-9\.]+)',
            convert     =>  { natural   => '$1',
                              export    => 'RefSeq_Prot:$1',
                              url       => 'http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=protein;cmd=search;term=$1',
                            },
            normalize   =>  '',
            },
        GenBank => {
            pattern     =>  'gi\|(\d+)',
            convert     =>  { natural    => '$1',
                              export     => 'NCBI_gi:$1',
                              url        => 'http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve;db=Protein&list_uids=$1;dopt=GenPept',
                           },
            normalize   => 'gi|',
            },
        SwissProt => {
            pattern     =>  'sp\|([A-Z0-9]{6})',
            convert     =>  { natural   => '$1',
                              export    => 'Swiss-Prot:$1',
                              url       => 'http://us.expasy.org/cgi-bin/get-sprot-entry?$1',
                            },
            normalize   => 'sp|',
            },
        UniProt => {
            pattern     =>  'uni\|([A-Z0-9]{6})',
            convert     =>  { natural   => '$1',
                              export    => 'UniProtKB:$1',
                              url       => 'http://www.ebi.uniprot.org/uniprot-srv/uniProtView.do?proteinAc=$1',
                            },
            normalize   =>  'uni|',
            },
        KEGG => {
            pattern     =>  'kegg\|(([a-z]{2,4}):([a-zA-Z_0-9]+))',
            convert     =>  { natural   => '$1',
                              export    => 'KEGG:$2+$3',
                              url       => 'http://www.genome.ad.jp/dbget-bin/www_bget?$2+$3',
                            },
            normalize   =>  'kegg|',
            },
        LocusTag => {
            pattern     =>  'LocusTag:([A-Za-z0-9_]+)',
            convert     =>  { natural   => '$1',
                              export    => 'Locus_Tag:$1',
                              url       => undef,
                            },
            normalize   =>  'LocusTag:',
            },
        GeneID => {
            pattern     =>  'GeneID:(\d+)',
            convert     =>  { natural   => '$1',
                              export    => 'GeneID:$1',
                              url       => undef,
                            },
            normalize   =>  'GeneID:',
            },
        Trembl => {
            pattern     =>  'tr\|([a-zA-Z0-9]+)',
            convert     =>  { natural   => '$1',
                              export    => 'TrEMBL:$1',
                              url       => 'http://ca.expasy.org/uniprot/$1',
                            },
            normalize   =>  'tr|',
            },
    );

=head2 Public Methods

=head3 AliasTypes

    my @aliasTypes = AliasAnalysis::AliasTypes();

Return a list of the alias types. The list can be used to create a menu or dropdown
for selecting a preferred alias.

=cut

sub AliasTypes {
    return sort keys %AliasTable;
}

=head3 Find

    my $aliasFound = AliasAnalysis::Find($type, \@aliases);

Find the first alias of the specified type in the list.

=over 4

=item type

Type of alias desired. This must be one of the keys in C<%AliasTable>.

=item aliases

Reference to a list containing alias names. The first alias name that matches
the structure of the specified alias type will be returned. The incoming
aliases are presumed to be in internal form.

=item RETURN

Returns the natural form of the desired alias, or C<undef> if no alias of
the specified type could be found.

=back
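
A minimal standalone sketch of the first-match search described above, assuming the
SwissProt pattern from the alias table. The sub C<find_first> and the sample aliases are
hypothetical and serve only as an illustration; the real method looks the pattern up by
type and traces its progress.

```perl

    use strict;
    use warnings;

    # Hypothetical stand-in for Find: return the natural form of the first
    # alias matching the pattern, or undef if none matches.
    sub find_first {
        my ($pattern, $aliases) = @_;
        for my $alias (@$aliases) {
            # Capture $1 immediately, while it still refers to this match.
            return $1 if $alias =~ /$pattern/;
        }
        return undef;
    }

    my $swissProt = 'sp\|([A-Z0-9]{6})';    # SwissProt pattern from the table
    my @aliases   = ('gi|15675083', 'sp|P12345', 'sp|Q67890');

    my $found   = find_first($swissProt, \@aliases);
    my $missing = find_first('GeneID:(\d+)', \@aliases);

    print defined $found ? "$found\n" : "none\n";

```

Returning from inside the loop stops the search at the first match, which plays the same
role as the C<$found> flag in the method itself.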

=cut

sub Find {
    # Get the parameters.
    my ($type, $aliases) = @_;
    # Declare the return variable.
    my $retVal;
    # Ensure we have a valid alias type.
    if (! exists $AliasTable{$type}) {
        Confess("Invalid alias type \"$type\" specified.");
    } else {
        # Get the pattern for the specified alias type.
        my $pattern = $AliasTable{$type}->{pattern};
        Trace("Alias pattern is /$pattern/.") if T(3);
        # Search for matching aliases. We can't use GREP here because we want
        # to stop as soon as we find a match. That way, the $1,$2.. variables
        # will be set properly.
        my $found;
        for my $alias (@$aliases) {
            last if $found;
            Trace("Matching against \"$alias\".") if T(4);
            if ($alias =~ /$pattern/) {
                Trace("Match found.") if T(4);
                # Here we have a match. Return the matching alias's natural form.
                $retVal = eval($AliasTable{$type}->{convert}->{natural});
                $found = 1;
            }
        }
    }
    # Return the value found.
    return $retVal;
}

=head3 Type

    my $naturalName = AliasAnalysis::Type($type => $name);

Return the natural name of an alias if it is of the specified type, and C<undef> otherwise.
Note that the result of this method will be TRUE if the alias is an internal form of the named
type and FALSE otherwise.

=over 4

=item type

Relevant alias type.

=item name

Internal-form alias to be matched to the specified type.

=item RETURN

Returns the natural form of the alias if it is of the specified type, and C<undef> otherwise.

=back
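
Because a non-matching alias yields C<undef>, the result can drive a filter. The sketch
below is a standalone approximation: C<natural_if_type>, C<%patterns>, and the sample
aliases are assumptions for illustration, with the two patterns copied from the alias
table.

```perl

    use strict;
    use warnings;

    # Two patterns copied from the alias table, keyed by alias type.
    my %patterns = (
        GenBank => 'gi\|(\d+)',
        GeneID  => 'GeneID:(\d+)',
    );

    # Hypothetical stand-in for Type: natural form on a match, undef otherwise.
    sub natural_if_type {
        my ($type, $name) = @_;
        return undef unless exists $patterns{$type};
        my $pattern = $patterns{$type};
        return $name =~ /$pattern/ ? $1 : undef;
    }

    my @aliases = ('gi|15675083', 'GeneID:901234', 'GeneID:0');

    # Test with defined() rather than plain truth, so that a natural form
    # of "0" still counts as a match.
    my @geneIds = grep { defined(natural_if_type(GeneID => $_)) } @aliases;
    my $natural = natural_if_type(GenBank => 'gi|15675083');

    print "@geneIds\n";

```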

=cut

sub Type {
    # Get the parameters.
    my ($type, $name) = @_;
    # Declare the return variable. If there is no match, it will stay undefined.
    my $retVal;
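    # Check the alias type. An unknown type is treated as a non-match; testing
    # with "exists" first keeps the lookup from autovivifying a bogus table entry.
    my $pattern = (exists $AliasTable{$type} ? $AliasTable{$type}->{pattern} : undef);
    if (defined $pattern && $name =~ /$pattern/) {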
        # We have a match, so we return the natural form of the alias.
        $retVal = eval($AliasTable{$type}->{convert}->{natural});
    }
    # Return the result.
    return $retVal;
}

=head3 FormatHtml

    my $htmlText = AliasAnalysis::FormatHtml(@aliases);

Create an HTML string that contains the specified aliases in a comma-separated list
with hyperlinks where available. The aliases are expected to be in internal form and
will stay that way.

=over 4

=item aliases

A list of aliases in internal form that are to be formatted into HTML.

=item RETURN

Returns a string containing the aliases in a comma-separated list, with hyperlinks
present on those for which hyperlinks are available.

=back
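
The following standalone sketch shows the kind of output to expect. It is an
approximation under stated assumptions: C<format_html> and C<%table> are stand-ins, the
C<example.org> URL is a placeholder rather than a real service, and only two alias types
are included.

```perl

    use strict;
    use warnings;

    # Stand-in table: one type with a (placeholder) URL, one without.
    my %table = (
        GenBank  => { pattern => 'gi\|(\d+)',
                      url     => 'http://example.org/protein/$1' },
        LocusTag => { pattern => 'LocusTag:([A-Za-z0-9_]+)',
                      url     => undef },
    );

    # Hypothetical stand-in for FormatHtml.
    sub format_html {
        my (@aliases) = @_;
        my @formatted;
        for my $alias (@aliases) {
            my $text = $alias;
            for my $type (sort keys %table) {
                my $pattern = $table{$type}{pattern};
                next unless $alias =~ /$pattern/;
                # Matched: hyperlink the alias only if this type has a URL.
                my $urlExpression = $table{$type}{url};
                if (defined $urlExpression) {
                    my $url = eval "\"$urlExpression\"";
                    $text = "<a href=\"$url\">$alias</a>";
                }
                last;
            }
            push @formatted, $text;
        }
        return join(', ', @formatted);
    }

    my $html = format_html('gi|15675083', 'LocusTag:SPy_0002', 'unknown');
    print "$html\n";

```

Unrecognized aliases and types without a URL pass through as plain text.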

=cut

sub FormatHtml {
    # Get the parameters.
    my (@aliases) = @_;
    # Set up the output list. The hyperlinked aliases will be put in here, and then
    # strung together before returning to the caller.
    my @retVal = ();
    # Loop through the incoming aliases.
    for my $alias (@aliases) {
        # We'll compute the alias's URL in here.
        my $url;
        # Check this alias against all the known types.
        for my $type (keys %AliasTable) {
            last if defined $url;
            # Get the URL conversion expression for this alias type.
            my $urlExpression = $AliasTable{$type}->{convert}->{url};
            # Check to see if we found the right type.
            my $pattern = $AliasTable{$type}->{pattern};
            Trace("Matching \"$alias\" to /$pattern/.") if T(4);
            if ($alias =~ /$pattern/) {
                # Here we did. Set the URL variable if there's a url expression and
                # null it out otherwise.
                if ($urlExpression) {
                    Trace("Evaluating $urlExpression.") if T(4);
                    $url = eval("\"$urlExpression\"");
                } else {
                    # This will stop the loop, but will evaluate as false when
                    # we decide whether or not to hyperlink the alias.
                    $url = "";
                }
            }
        }
        # Check to see if we found a URL.
        if ($url) {
            $alias = "<a href=\"$url\">$alias</a>";
        }
        # Push this alias into the return list.
        push @retVal, $alias;
    }
    # Convert the aliases into a comma-separated string.
    return join(", ", @retVal);
}

1;
