[Bio] / Sprout / CustomAttributes.pm Repository:
ViewVC logotype

View of /Sprout/CustomAttributes.pm

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1.6 - (download) (as text) (annotate)
Fri Nov 10 21:59:34 2006 UTC (13 years, 1 month ago) by parrello
Branch: MAIN
Changes since 1.5: +37 -2 lines
*** empty log message ***

#!/usr/bin/perl -w

package CustomAttributes;

    require Exporter;
    use ERDB;
    @ISA = qw(ERDB);
    use strict;
    use Tracer;
    use ERDBLoad;

=head1 Custom SEED Attribute Manager

=head2 Introduction

The Custom SEED Attributes Manager allows the user to upload and retrieve
custom data for SEED objects. It uses the B<ERDB> database system to
store the attributes, which are implemented as multi-valued fields
of ERDB entities.

The full suite of ERDB retrieval capabilities is provided. In addition,
custom methods are provided specific to this application. To get all
the values of the attribute C<essential> in a specified B<Feature>, you
would code

    my @values = $attrDB->GetAttributes([Feature => $fid], 'essential');

where I<$fid> contains the ID of the desired feature. Each attribute has
an alternate index to allow searching for attributes by value.

New attributes are introduced by updating the database definition at
run-time. Attribute values are stored by uploading data from files.
A web interface is provided for both these activities.

=head2 FIG_Config Parameters

The following configuration parameters are used to manage custom attributes.

=over 4

=item attrDbms

Type of database manager used: C<mysql> for MySQL or C<pg> for PostGres.

=item attrDbName

Name of the attribute database.

=item attrHost

Name of the host server for the database. If omitted, the current host
is used.

=item attrUser

User name for logging in to the database.

=item attrPass

Password for logging in to the database.

=item attrPort

TCP/IP port for accessing the database.

=item attrSock

Socket name used to access the database. If omitted, the default socket
will be used.

=item attrDBD

Fully-qualified file name for the database definition XML file. This file
functions as data to the attribute management process, so if the data is
moved, this file must go with it.

=back

The DBD file is critical, and must have reasonable contents before we can
begin using the system. In the old system, attributes were only provided
for Genomes and Features, so the initial XML file was the following.

    <Database>
      <Title>SEED Custom Attribute Database</Title>
      <Entities>
        <Entity name="Feature" keyType="id-string">
          <Notes>A [i]feature[/i] is a part of the genome
          that is of special interest. Features may be spread
          across multiple contigs of a genome, but never across
          more than one genome. Features can be assigned to roles
          via spreadsheet cells, and are the targets of
          annotation.</Notes>
        </Entity>
        <Entity name="Genome" keyType="name-string">
          <Notes>A [i]genome[/i] describes a particular individual
          organism's DNA.</Notes>
        </Entity>
      </Entities>
    </Database>

It is not necessary to put any tables into the database; however, you should
run

    AttrDBRefresh

periodically to insure it has the correct Genomes and Features in it. When
converting from the old system, use

    AttrDBRefresh -migrate

to initialize the database and migrate the legacy data. You should only need
to do that once.

=head2 Implementation Note

The L</Refresh> method reloads the entities in the database. If new
entity types are added, that method will need to be adjusted accordingly.

=head2 Public Methods

=head3 new

C<< my $attrDB = CustomAttributes->new($splitter); >>

Construct a new CustomAttributes object. This object cannot be used to add or
delete keys because that requires modifying the database design. To do that,
you need to use the static L</StoreAttributeKey> or L</DeleteAttributeKey>
methods.

=over 4

=item splitter

Value to be used to split attribute values into sections in the
L</Fig Replacement Methods>. The default is a double colon C<::>.
If you do not use the replacement methods, you do not need to
worry about this parameter.

=back

=cut

sub new {
    # Get the parameters.
    my ($class, $splitter) = @_;
    # Connect to the database.
    my $dbh = DBKernel->new($FIG_Config::attrDbms, $FIG_Config::attrDbName,
                            $FIG_Config::attrUser, $FIG_Config::attrPass,
                            $FIG_Config::attrPort, $FIG_Config::attrHost,
                            $FIG_Config::attrSock);
    # Create the ERDB object.
    my $xmlFileName = $FIG_Config::attrDBD;
    my $retVal = ERDB::new($class, $dbh, $xmlFileName);
    # Store the splitter value.
    $retVal->{splitter} = (defined($splitter) ? $splitter : '::');
    # Return the result.
    return $retVal;
}

=head3 StoreAttributeKey

C<< my $attrDB = CustomAttributes::StoreAttributeKey($entityName, $attributeName, $type, $notes); >>

Create or update an attribute for the database. This method will update the database definition
XML, but it will not create the table. It will connect to the database so that the caller
can upload the attribute values.

=over 4

=item entityName

Name of the entity containing the attribute. The entity must exist.

=item attributeName

Name of the attribute. It must be a valid ERDB field name, consisting entirely of
letters, digits, and hyphens, with a letter at the beginning. If it does not
exist already, it will be created.

=item type

Data type of the attribute. This must be a valid ERDB data type name.

=item notes

Descriptive notes about the attribute. It is presumed to be raw text, not HTML.

=item RETURN

Returns a Custom Attribute Database object if successful. If unsuccessful, an
error will be thrown.

=back

=cut

sub StoreAttributeKey {
    # Get the parameters.
    my ($entityName, $attributeName, $type, $notes) = @_;
    # Get the data type hash.
    my %types = ERDB::GetDataTypes();
    # Validate the initial input values.
    if (! ERDB::ValidateFieldName($attributeName)) {
        Confess("Invalid attribute name \"$attributeName\" specified.");
    } elsif (! $notes || length($notes) < 25) {
        Confess("Missing or incomplete description for $attributeName.");
    } elsif (! exists $types{$type}) {
        Confess("Invalid data type \"$type\" for $attributeName.");
    }
    # Our next step is to read in the XML for the database defintion. We
    # need to verify that the named entity exists.
    my $metadata = ERDB::ReadMetaXML($FIG_Config::attrDBD);
    my $entityHash = $metadata->{Entities};
    if (! exists $entityHash->{$entityName}) {
        Confess("Entity $entityName not found.");
    } else {
        # Okay, we're ready to begin. Get the entity hash and the field hash.
        my $entityData = $entityHash->{$entityName};
        my $fieldHash = ERDB::GetEntityFieldHash($metadata, $entityName);
        # Compute the attribute's relation name.
        my $relName = join("", $entityName, map { ucfirst $_ } split(/-|_/, $attributeName));
        # Store the attribute's field data. Note the use of the "content" hash for
        # the notes. This is how the XML writer knows Notes is a text tag instead of
        # an attribute.
        $fieldHash->{$attributeName} = { type => $type, relation => $relName,
                                         Notes => { content => $notes } };
        # Insure we have an index for this attribute.
        my $index = ERDB::FindIndexForEntity($metadata, $entityName, $attributeName);
        if (! defined($index)) {
            push @{$entityData->{Indexes}}, { IndexFields => [ { name => $attributeName, order => 'ascending' } ],
                                              Notes       => "Alternate index provided for access by $attributeName." };
        }
        # Write the XML back out.
        ERDB::WriteMetaXML($metadata, $FIG_Config::attrDBD);
    }
    # Open a database with the new XML.
    my $retVal = CustomAttributes->new();
    return $retVal;
}

=head3 Refresh

C<< $attrDB->Refresh($fig); >>

Refresh the primary entity tables from the FIG data store. This method basically
drops and reloads the main tables of the custom attributes database.

=over 4

=item fig

FIG-like object that can be used to find genomes and features.

=back

=cut

sub Refresh {
    # Get the parameters.
    my ($self, $fig) = @_;
    # Create load objects for the genomes and the features.
    my $loadGenome = ERDBLoad->new($self, 'Genome', $FIG_Config::temp);
    my $loadFeature = ERDBLoad->new($self, 'Feature', $FIG_Config::temp);
    # Get the genome list.
    my @genomes = $fig->genomes();
    # Loop through the genomes.
    for my $genomeID (@genomes) {
        # Put this genome in the genome table.
        $loadGenome->Put($genomeID);
        Trace("Processing Genome $genomeID") if T(3);
        # Put its features into the feature table. Note we have to use a hash to
        # remove duplicates.
        my %featureList = map { $_ => 1 } $fig->all_features($genomeID);
        for my $fid (keys %featureList) {
            $loadFeature->Put($fid);
        }
    }
    # Get a variable for holding statistics objects.
    my $stats;
    # Finish the genome load.
    Trace("Loading Genome relation.") if T(2);
    $stats = $loadGenome->FinishAndLoad();
    Trace("Genome table load statistics:\n" . $stats->Show()) if T(3);
    # Finish the feature load.
    Trace("Loading Feature relation.") if T(2);
    $stats = $loadFeature->FinishAndLoad();
    Trace("Feature table load statistics:\n" . $stats->Show()) if T(3);
}

=head3 LoadAttributeKey

C<< my $stats = $attrDB->LoadAttributeKey($entityName, $fieldName, $fh, $keyCol, $dataCol); >>

Load the specified attribute from the specified file. The file should be a
tab-delimited file with internal tab and new-line characters escaped. This is
the typical TBL-style file used by most FIG applications. One of the columns
in the input file must contain the appropriate key value and the other the
corresponding attribute value.

=over 4

=item entityName

Name of the entity containing the attribute.

=item fieldName

Name of the actual attribute.

=item fh

Open file handle for the input file.

=item keyCol

Index (0-based) of the column containing the key field. The key field should
contain the ID of an instance of the named entity.

=item dataCol

Index (0-based) of the column containing the data value field.

=item RETURN

Returns a statistics object for the load process.

=back

=cut

sub LoadAttributeKey {
    # Get the parameters.
    my ($self, $entityName, $fieldName, $fh, $keyCol, $dataCol) = @_;
    # Create the return variable.
    my $retVal;
    # Insure the entity exists.
    my $found = grep { $_ eq $entityName } $self->GetEntityTypes();
    if (! $found) {
        Confess("Entity \"$entityName\" not found in database.");
    } else {
        # Get the field structure for the named entity.
        my $fieldHash = $self->GetFieldTable($entityName);
        # Verify that the attribute exists.
        if (! exists $fieldHash->{$fieldName}) {
            Confess("Attribute key \"$fieldName\" does not exist in entity $entityName.");
        } else {
            # Create a loader for the specified attribute. We need the
            # relation name first.
            my $relName = $fieldHash->{$fieldName}->{relation};
            my $loadAttribute = ERDBLoad->new($self, $relName, $FIG_Config::temp);
            # Loop through the input file.
            while (! eof $fh) {
                # Get the next line of the file.
                my @fields = Tracer::GetLine($fh);
                $loadAttribute->Add("lineIn");
                # Now we need to validate the line.
                if ($#fields < $dataCol) {
                    $loadAttribute->Add("shortLine");
                } elsif (! $self->Exists($entityName, $fields[$keyCol])) {
                    $loadAttribute->Add("badKey");
                } else {
                    # It's valid,so send it to the loader.
                    $loadAttribute->Put($fields[$keyCol], $fields[$dataCol]);
                    $loadAttribute->Add("lineUsed");
                }
            }
            # Finish the load.
            $retVal = $loadAttribute->FinishAndLoad();
        }
    }
    # Return the statistics.
    return $retVal;
}


=head3 DeleteAttributeKey

C<< CustomAttributes::DeleteAttributeKey($entityName, $attributeName); >>

Delete an attribute from the custom attributes database.

=over 4

=item entityName

Name of the entity possessing the attribute.

=item attributeName

Name of the attribute to delete.

=back

=cut

sub DeleteAttributeKey {
    # Get the parameters.
    my ($entityName, $attributeName) = @_;
    # Read in the XML for the database defintion. We need to verify that
    # the named entity exists and it has the named attribute.
    my $metadata = ERDB::ReadMetaXML($FIG_Config::attrDBD);
    my $entityHash = $metadata->{Entities};
    if (! exists $entityHash->{$entityName}) {
        Confess("Entity \"$entityName\" not found.");
    } else {
        # Get the field hash.
        my $fieldHash = ERDB::GetEntityFieldHash($metadata, $entityName);
        if (! exists $fieldHash->{$attributeName}) {
            Confess("Attribute key \"$attributeName\" not found in entity $entityName.");
        } else {
            # Get the attribute's relation name.
            my $relName = $fieldHash->{$attributeName}->{relation};
            # Check for an index.
            my $indexIdx = ERDB::FindIndexForEntity($metadata, $entityName, $attributeName);
            if (defined($indexIdx)) {
                Trace("Index for $attributeName found at position $indexIdx for $entityName.") if T(3);
                delete $entityHash->{$entityName}->{Indexes}->[$indexIdx];
            }
            # Delete the attribute from the field hash.
            Trace("Deleting attribute $attributeName from $entityName.") if T(3);
            delete $fieldHash->{$attributeName};
            # Write the XML back out.
            ERDB::WriteMetaXML($metadata, $FIG_Config::attrDBD);
            # Insure the relation does not exist in the database. This requires connecting
            # since we may have to do a table drop.
            my $attrDB = CustomAttributes->new();
            Trace("Dropping table $relName.") if T(3);
            $attrDB->DropRelation($relName);
        }
    }
}

=head3 ControlForm

C<< my $formHtml = $attrDB->ControlForm($cgi, $name); >>

Return a form that can be used to control the creation and modification of
attributes.

=over 4

=item cgi

CGI query object used to create HTML.

=item name

Name to give to the form. This should be unique for the web page.

=item RETURN

Returns the HTML for a form that submits instructions to the C<Attributes.cgi> script
for loading, creating, or deleting an attribute.

=back

=cut

sub ControlForm {
    # Get the parameters.
    my ($self, $cgi, $name) = @_;
    # Declare the return list.
    my @retVal = ();
    # Start the form. We use multipart to support the upload control.
    push @retVal, $cgi->start_multipart_form(-name => $name);
    # We'll put the controls in a table. Nothing else ever seems to look nice.
    push @retVal, $cgi->start_table({ border => 2, cellpadding => 2 });
    # The first row is for selecting the field name.
    push @retVal, $cgi->Tr($cgi->th("Select a Field"),
                           $cgi->td($self->FieldMenu($cgi, 10, 'fieldName', 1,
                                                     "document.$name.notes.value",
                                                     "document.$name.dataType.value")));
    # Now we set up a dropdown for the data types. The values will be the
    # data type names, and the labels will be the descriptions.
    my %types = ERDB::GetDataTypes();
    my %labelMap = map { $_ => $types{$_}->{notes} } keys %types;
    my $typeMenu = $cgi->popup_menu(-name   => 'dataType',
                                    -values => [sort keys %types],
                                    -labels => \%labelMap);
    push @retVal, $cgi->Tr($cgi->th("Data type"),
                           $cgi->td($typeMenu));
    # The next row is for the notes.
    push @retVal, $cgi->Tr($cgi->th("Description"),
                           $cgi->td($cgi->textarea(-name => 'notes',
                                                   -rows => 6,
                                                   -columns => 80))
                          );
    # Allow the user to specify a new field name. This is required if the
    # user has selected one of the "(new)" markers.
    push @retVal, $cgi->Tr($cgi->th("New Field Name"),
                           $cgi->td($cgi->textfield(-name => 'newName',
                                                    -size => 30)),
                                    );
    # If the user wants to upload new values for the field, then we have
    # an upload file name and column indicators.
    push @retVal, $cgi->Tr($cgi->th("Upload Values"),
                           $cgi->td($cgi->filefield(-name => 'newValueFile',
                                                    -size => 20) .
                                    " Key&nbsp;" .
                                    $cgi->textfield(-name => 'keyCol',
                                                    -size => 3,
                                                    -default => 0) .
                                    " Value&nbsp;" .
                                    $cgi->textfield(-name => 'valueCol',
                                                    -size => 3,
                                                    -default => 1)
                                   ),
                          );
    # Now the two buttons: UPDATE and DELETE.
    push @retVal, $cgi->Tr($cgi->th("&nbsp;"),
                           $cgi->td({align => 'center'},
                                    $cgi->submit(-name => 'Delete', -value => 'DELETE') . " " .
                                    $cgi->submit(-name => 'Store',  -value => 'STORE')
                                   )
                          );
    # Close the table and the form.
    push @retVal, $cgi->end_table();
    push @retVal, $cgi->end_form();
    # Return the assembled HTML.
    return join("\n", @retVal, "");
}

=head3 FieldMenu

C<< my $menuHtml = $attrDB->FieldMenu($cgi, $height, $name, $newFlag, $noteControl, $typeControl); >>

Return the HTML for a menu to select an attribute field. The menu will
be a standard SELECT/OPTION thing which is called "popup menu" in the
CGI package, but actually looks like a list. The list will contain
one selectable row per field, grouped by entity.

=over 4

=item cgi

CGI query object used to generate HTML.

=item height

Number of lines to display in the list.

=item name

Name to give to the menu. This is the name under which the value will
appear when the form is submitted.

=item newFlag (optional)

If TRUE, then extra rows will be provided to allow the user to select
a new attribute. In other words, the user can select an existing
attribute, or can choose a C<(new)> marker to indicate a field to
be created in the parent entity.

=item noteControl (optional)

If specified, the name of a variable for displaying the notes attached
to the field. This must be in Javascript form ready for assignment.
So, for example, if you have a variable called C<notes> that
represents a paragraph element, you should code C<notes.innerHTML>.
If it actually represents a form field you should code C<notes.value>.
If an C<innerHTML> coding is used, the text will be HTML-escaped before
it is copied in. Specifying this parameter generates Javascript for
displaying the field description when a field is selected.

=item typeControl (optional)

If specified, the name of a variable for displaying the field's
data type. Data types are a much more controlled vocabulary than
notes, so there is no worry about HTML translation. Instead, the
raw value is put into the specified variable. Otherwise, the same
rules apply to this value that apply to I<$noteControl>.

=item RETURN

Returns the HTML to create a form field that can be used to select an
attribute from the custom attributes system.

=back

=cut

sub FieldMenu {
    # Get the parameters.
    my ($self, $cgi, $height, $name, $newFlag, $noteControl, $typeControl) = @_;
    # These next two hashes make everything happen. "entities"
    # maps each entity name to the list of values to be put into its
    # option group. "labels" maps each entity name to a map from values
    # to labels.
    my @entityNames = sort ($self->GetEntityTypes());
    my %entities = map { $_ => [] } @entityNames;
    my %labels = map { $_ => { }} @entityNames;
    # Loop through the entities, adding the existing attributes.
    for my $entity (@entityNames) {
        # Get this entity's field table.
        my $fieldHash = $self->GetFieldTable($entity);
        # Get its field list in our local hashes.
        my $fieldList = $entities{$entity};
        my $labelList = $labels{$entity};
        # Add the NEW fields if we want them.
        if ($newFlag) {
            push @{$fieldList}, $entity;
            $labelList->{$entity} = "(new)";
        }
        # Loop through the fields in the hash. We only keep the ones with a
        # secondary relation name. (In other words, the name of the relation
        # in which the field appears cannot be the same as the entity name.)
        for my $fieldName (sort keys %{$fieldHash}) {
            if ($fieldHash->{$fieldName}->{relation} ne $entity) {
                my $value = "$entity/$fieldName";
                push @{$fieldList}, $value;
                $labelList->{$value} = $fieldName;
            }
        }
    }
    # Now we have a hash and a list for each entity, and they correspond
    # exactly to what the $cgi->optgroup function expects.
    # The last step is to create the name for the onChange function. This function
    # may not do anything, but we need to know the name to generate the HTML
    # for the menu.
    my $changeName = "${name}_setNotes";
    my $retVal = $cgi->popup_menu({name => $name,
                                   size => $height,
                                   onChange => "$changeName(this.value)",
                                   values => [map { $cgi->optgroup(-name => $_,
                                                                   -values => $entities{$_},
                                                                   -labels => $labels{$_})
                                                  } @entityNames]}
                                 );
    # Create the change function.
    $retVal .= "\n<script language=\"javascript\">\n";
    $retVal .= "    function $changeName(fieldValue) {\n";
    # The function only has a body if we have a notes control to store the description.
    if ($noteControl || $typeControl) {
        # Check to see if we're storing HTML or text into the note control.
        my $htmlMode = ($noteControl && $noteControl =~ /innerHTML$/);
        # We use a CASE statement based on the newly-selected field value. The
        # field description will be stored in the JavaScript variable "myText"
        # and the data type in "myType". Note the default data type is a normal
        # string, but the default notes is an empty string.
        $retVal .= "        var myText = \"\";\n";
        $retVal .= "        var myType = \"string\";\n";
        $retVal .= "        switch (fieldValue) {\n";
        # Loop through the entities.
        for my $entity (@entityNames) {
            # Get the entity's field hash. This has the notes in it.
            my $fieldHash = $self->GetFieldTable($entity);
            # Loop through the values we might see for this entity's fields.
            my $fields = $entities{$entity};
            for my $value (@{$fields}) {
                # Only proceed if we have an existing field.
                if ($value =~ m!/(.+)$!) {
                    # Get the field's hash element.
                    my $element = $fieldHash->{$1};
                    # Generate this case.
                    $retVal .= "        case \"$value\" :\n";
                    # Here we either want to update the note display, the
                    # type display, or both.
                    if ($noteControl) {
                        # Here we want the notes updated.
                        my $notes = $element->{Notes}->{content};
                        # Insure it's in the proper form.
                        if ($htmlMode) {
                            $notes = ERDB::HTMLNote($notes);
                        }
                        # Escape it for use as a string literal.
                        $notes =~ s/\n/\\n/g;
                        $notes =~ s/"/\\"/g;
                        $retVal .= "           myText = \"$notes\";\n";
                    }
                    if ($typeControl) {
                        # Here we want the type updated.
                        my $type = $element->{type};
                        $retVal .= "           myType = \"$type\";\n";
                    }
                    # Close this case.
                    $retVal .= "           break;\n";
                }
            }
        }
        # Close the CASE statement and make the appropriate assignments.
        $retVal .= "        }\n";
        if ($noteControl) {
            $retVal .= "        $noteControl = myText;\n";
        }
        if ($typeControl) {
            $retVal .= "        $typeControl = myType;\n";
        }
    }
    # Terminate the change function.
    $retVal .= "    }\n";
    $retVal .= "</script>\n";
    # Return the result.
    return $retVal;
}

=head3 MatchSqlPattern

C<< my $matched = CustomAttributes::MatchSqlPattern($value, $pattern); >>

Determine whether or not a specified value matches an SQL pattern. An SQL
pattern has two wild card characters: C<%> that matches multiple characters,
and C<_> that matches a single character. These can be escaped using a
backslash (C<\>). We pull this off by converting the SQL pattern to a
PERL regular expression. As per SQL rules, the match is case-insensitive.

=over 4

=item value

Value to be matched against the pattern. Note that an undefined or empty
value will not match anything.

=item pattern

SQL pattern against which to match the value. An undefined or empty pattern will
match everything.

=item RETURN

Returns TRUE if the value and pattern match, else FALSE.

=back

=cut

sub MatchSqlPattern {
    # Get the parameters.
    my ($value, $pattern) = @_;
    # Declare the return variable.
    my $retVal;
    # Insure we have a pattern.
    if (! defined($pattern) || $pattern eq "") {
        $retVal = 1;
    } else {
        # Break the pattern into pieces around the wildcard characters. Because we
        # use parentheses in the split function's delimiter expression, we'll get
        # list elements for the delimiters as well as the rest of the string.
        my @pieces = split /([_%]|\\[_%])/, $pattern;
        # Check some fast special cases.
        if ($pattern eq '%') {
            # A null pattern matches everything.
            $retVal = 1;
        } elsif (@pieces == 1) {
            # No wildcards, so we have a literal comparison. Note we're case-insensitive.
            $retVal = (lc($value) eq lc($pattern));
        } elsif (@pieces == 2 && $pieces[1] eq '%') {
            # A wildcard at the end, so we have a substring match. This is also case-insensitive.
            $retVal = (lc(substr($value, 0, length($pieces[0]))) eq lc($pieces[0]));
        } else {
            # Okay, we have to do it the hard way. Convert each piece to a PERL pattern.
            my $realPattern = "";
            for my $piece (@pieces) {
                # Determine the type of piece.
                if ($piece eq "") {
                    # Empty pieces are ignored.
                } elsif ($piece eq "%") {
                    # Here we have a multi-character wildcard. Note that it can match
                    # zero or more characters.
                    $realPattern .= ".*"
                } elsif ($piece eq "_") {
                    # Here we have a single-character wildcard.
                    $realPattern .= ".";
                } elsif ($piece eq "\\%" || $piece eq "\\_") {
                    # This is an escape sequence (which is a rare thing, actually).
                    $realPattern .= substr($piece, 1, 1);
                } else {
                    # Here we have raw text.
                    $realPattern .= quotemeta($piece);
                }
            }
            # Do the match.
            $retVal = ($value =~ /^$realPattern$/i ? 1 : 0);
        }
    }
    # Return the result.
    return $retVal;
}

=head3 MigrateAttributes

C<< CustomAttributes::MigrateAttributes($fig); >>

Migrate all the attributes data from the specified FIG instance. This is a long, slow
method used to convert the old attribute data to the new system. Only attribute
keys that are not already in the database will be loaded, and only for entity instances
current in the database. To get an accurate capture of the attributes in the given
instance, you may want to clear the database and the DBD before starting and
run L</Refresh> to populate the entities.

=over 4

=item fig

A FIG object that can be used to retrieve attributes for migration purposes.

=back

=cut

sub MigrateAttributes {
    # Get the parameters.
    my ($fig) = @_;
    # Get a list of the objects to migrate. This requires connecting. Note we
    # will map each entity type to a file name. The file will contain a list
    # of the object's IDs so we can get to them when we're not connected to
    # the database.
    my $ca = CustomAttributes->new();
    my %objects = map { $_ => "$FIG_Config::temp/$_.keys.tbl" } $ca->GetEntityTypes();
    # Set up hash of the existing attribute keys for each entity type.
    my %oldKeys = ();
    # Finally, we have a hash that counts the IDs for each entity type.
    my %idCounts = map { $_ => 0 } keys %objects;
    # Loop through the list, creating key files to read back in.
    for my $entityType (keys %objects) {
        Trace("Retrieving keys for $entityType.") if T(2);
        # Create the key file.
        my $idFile = Open(undef, ">$objects{$entityType}");
        # Loop through the keys.
        my @ids = $ca->GetFlat([$entityType], "", [], "$entityType(id)");
        for my $id (@ids) {
            print $idFile "$id\n";
        }
        close $idFile;
        # In addition to the key file, we must get a list of attributes already
        # in the database. This avoids a circularity problem that might occur if the $fig
        # object is retrieving from the custom attributes database already.
        my %fields = $ca->GetSecondaryFields($entityType);
        $oldKeys{$entityType} = \%fields;
        # Finally, we have the ID count.
        $idCounts{$entityType} = scalar @ids;
    }
    # Release the custom attributes database so we can add attributes.
    undef $ca;
    # Loop through the objects.
    for my $entityType (keys %objects) {
        # Get a hash of all the attributes already in this database. These are
        # left untouched.
        my $myOldKeys = $oldKeys{$entityType};
        # Create a hash to control the load file names for each attribute key we find.
        my %keyHash = ();
        # Set up some counters so we can trace our progress.
        my ($totalIDs, $processedIDs, $keyCount, $valueCount) = ($idCounts{$entityType}, 0, 0, 0);
        # Open this object's ID file.
        Trace("Migrating data for $entityType. $totalIDs found.") if T(3);
        my $keysIn = Open(undef, "<$objects{$entityType}");
        while (my $id = <$keysIn>) {
            # Remove the EOL characters.
            chomp $id;
            # Get this object's attributes.
            my @allData = $fig->get_attributes($id);
            Trace(scalar(@allData) . " attribute values found for $entityType($id).") if T(4);
            # Loop through the attribute values one at a time.
            for my $dataTuple (@allData) {
                # Get the key, value, and URL. We ignore the first element because that's the
                # object ID, and we already know the object ID.
                my (undef, $key, $value, $url) = @{$dataTuple};
                # Only proceed if this is not an old key.
                if (! $myOldKeys->{$key}) {
                    # See if we've run into this key before.
                    if (! exists $keyHash{$key}) {
                        # Here we need to create the attribute key in the database.
                        StoreAttributeKey($entityType, $key, 'text',
                                          "Key migrated automatically from the FIG system. " .
                                          "Please replace these notes as soon as possible " .
                                          "with useful text."
                                         );
                        # Compute the attribute's load file name and open it for output.
                        my $fileName = "$FIG_Config::temp/$entityType.$key.load.tbl";
                        my $fh = Open(undef, ">$fileName");
                        # Store the file name and handle.
                        $keyHash{$key} = {h => $fh, name => $fileName};
                        # Count this key.
                        $keyCount++;
                    }
                    # Smash the value and the URL together.
                    if (defined($url) && length($url) > 0) {
                        $value .= "::$url";
                    }
                    # Write the attribute value to the load file.
                    Tracer::PutLine($keyHash{$key}->{h}, [$id, $value]);
                    $valueCount++;
                }
            }
            # Now we've finished all the attributes for this object. Count and trace it.
            $processedIDs++;
            if ($processedIDs % 500 == 0) {
                Trace("$processedIDs of $totalIDs ${entityType}s processed.") if T(3);
                Trace("$entityType has $keyCount keys and $valueCount values so far.") if T(3);
            }
        }
        # Now we've finished all the attributes for all objects of this type.
        Trace("$processedIDs ${entityType}s processed, with $keyCount keys and $valueCount values.") if T(2);
        # Loop through the files, loading the keys into the database.
        Trace("Connecting to database.") if T(2);
        my $objectCA = CustomAttributes->new();
        Trace("Loading key files.") if T(2);
        for my $key (sort keys %keyHash) {
            # Close the key's load file.
            close $keyHash{$key}->{h};
            # Reopen it for input.
            my $fileName = $keyHash{$key}->{name};
            my $fh = Open(undef, "<$fileName");
            Trace("Loading $key from $fileName.") if T(3);
            my $stats = $objectCA->LoadAttributeKey($entityType, $key, $fh, 0, 1);
            Trace("Statistics for $key of $entityType:\n" . $stats->Show()) if T(3);
        }
        # All the keys for this entity type are now loaded.
        Trace("Key files loaded for $entityType.") if T(2);
    }
    # All keys for all entity types are now loaded.
    Trace("Migration complete.") if T(2);
}

=head3 ComputeObjectTypeFromID

C<< my ($entityName, $id) = CustomAttributes::ComputeObjectTypeFromID($objectID); >>

This method will compute the entity type corresponding to a specified object ID.
If the object ID begins with C<fig|>, it is presumed to be a feature ID. If it
is all digits with a single period, it is presumed to by a genome ID. Otherwise,
it must be a list reference. In this last case the first list element will be
taken as the entity type and the second will be taken as the actual ID.

=over 4

=item objectID

Object ID to examine.

=item RETURN

Returns a 2-element list consisting of the entity type followed by the specified ID.

=back

=cut

sub ComputeObjectTypeFromID {
    # Get the parameters.
    my ($objectID) = @_;
    # Declare the return variables.
    my ($entityName, $id);
    # Only proceed if the object ID is defined. If it's not, we'll be returning a
    # pair of undefs.
    if ($objectID) {
        if (ref $objectID eq 'ARRAY') {
            # Here we have the new-style list reference. Pull out its pieces.
            ($entityName, $id) = @{$objectID};
        } else {
            # Here the ID is the outgoing ID, and we need to look at its structure
            # to determine the entity type.
            $id = $objectID;
            if ($objectID =~ /^\d+\.\d+/) {
                # Digits with a single period is a genome.
                $entityName = 'Genome';
            } elsif ($objectID =~ /^fig\|/) {
                # The "fig|" prefix indicates a feature.
                $entityName = 'Feature';
            } else {
                # Anything else is illegal!
                Confess("Invalid attribute ID specification \"$objectID\".");
            }
        }
    }
    # Return the result.
    return ($entityName, $id);
}

=head2 FIG Method Replacements

The following methods are used by B<FIG.pm> to replace the previous attribute functionality.
Some of the old functionality is no longer present. Controlled vocabulary is no longer
supported and there is no longer any searching by URL. Fortunately, neither of these
capabilities were used in the old system.

The methods here are the only ones supported by the B<RemoteCustomAttributes> object.
The idea is that these methods represent attribute manipulation allowed by all users, while
the others are only for privileged users with access to the attribute server.

In the previous implementation, an attribute had a value and a URL. In the new implementation,
there is only a value. In this implementation, each attribute has only a value. These
methods will treat the value as a list with the individual elements separated by the
value of the splitter parameter on the constructor (L</new>). The default is double
colons C<::>.

So, for example, an old-style keyword with a /value of C<essential> and a URL of
C<http://www.sciencemag.org/cgi/content/abstract/293/5538/2266> using the default
splitter value would be stored as

    essential::http://www.sciencemag.org/cgi/content/abstract/293/5538/2266

The best performance is achieved by searching for a particular key for a specified
feature or genome.

=head3 GetAttributes

C<< my @attributeList = $attrDB->GetAttributes($objectID, $key, @valuePatterns); >>

In the database, attribute values are sectioned into pieces using a splitter
value specified in the constructor (L</new>). This is not a requirement of
the attribute system as a whole, merely a convenience for the purpose of
these methods. If you are using the static method calls instead of the
object-based calls, the splitter will always be the default value of
double colons (C<::>). If a value has multiple sections, each section
is matched against the correspond criterion in the I<@valuePatterns> list.

This method returns a series of tuples that match the specified criteria. Each tuple
will contain an object ID, a key, and one or more values. The parameters to this
method therefore correspond structurally to the values expected in each tuple.

    my @attributeList = GetAttributes('fig|100226.1.peg.1004', 'structure%', 1, 2);

would return something like

    ['fig}100226.1.peg.1004', 'structure', 1, 2]
    ['fig}100226.1.peg.1004', 'structure1', 1, 2]
    ['fig}100226.1.peg.1004', 'structure2', 1, 2]
    ['fig}100226.1.peg.1004', 'structureA', 1, 2]

Use of C<undef> in any position acts as a wild card (all values). In addition,
the I<$key> and I<@valuePatterns> parameters can contain SQL pattern characters: C<%>, which
matches any sequence of characters, and C<_>, which matches any single character.
(You can use an escape sequence C<\%> or C<\_> to match an actual percent sign or
underscore.)

In addition to values in multiple sections, a single attribute key can have multiple
values, so even

    my @attributeList = GetAttributes($peg, 'virulent');

which has no wildcard in the key or the object ID, may return multiple tuples.

For reasons of backward compatability, we examine the structure of the object ID to
determine the entity type. In that case the only two types allowed are C<Genome> and
C<Feature>. An alternative method is to use a list reference, with the list consisting
of an entity type name and the actual ID. Thus, the above example could equivalently
be written as

    my @attributeList = GetAttributes([Feature => $peg], 'virulent');

The list-reference approach allows us to add attributes to other entity types in
the future. Doing so, however, will require modifying the L</Refresh> method and
updated the database design XML.

The list-reference approach also allows for a more fault-tolerant approach to
getting all objects with a particular attribute.

    my @attributeList = GetAttributes([Feature => undef], 'virulent');

will only return feature attributes, while

    my @attributeList = GetAttributes(undef, 'virulent');

could at some point in the future get you attributes for genomes or even subsystems
as well as features.

=over 4

=item objectID

ID of the genome or feature whose attributes are desired. In general, an ID that
starts with C<fig|> is treated as a feature ID, and an ID that is all digits with a
single period is treated as a genome ID. For other entity types, use a list reference; in
this case the first list element is the entity type and the second is the ID. A value of
C<undef> or an empty string here will match all objects.

=item key

Attribute key name. Since attributes are stored as fields in the database with a
field name equal to the key name, it is very fast to find a list of all the
matching keys. Each key's values require a separate query, however, which may
be a performance problem if the pattern matches a lot of keys. Wild cards are
acceptable here, and a value of C<undef> or an empty string will match all
attribute keys.

=item valuePatterns

List of the desired attribute values, section by section. If C<undef>
or an empty string is specified, all values in that section will match.

=item RETURN

Returns a list of tuples. The first element in the tuple is an object ID, the
second is an attribute key, and the remaining elements are the sections of
the attribute value. All of the tuples will match the criteria set forth in
the parameter list.

=back

=cut

sub GetAttributes {
    # Get the parameters.
    my ($self, $objectID, $key, @valuePatterns) = @_;
    # Declare the return variable.
    my @retVal = ();
    # Determine the entity types for our search.
    my @objects = ();
    my ($actualObjectID, $computedType);
    if (! $objectID) {
        push @objects, $self->GetEntityTypes();
    } else {
        ($computedType, $actualObjectID) = ComputeObjectTypeFromID($objectID);
        push @objects, $computedType;
    }
    # Loop through the entity types.
    for my $entityType (@objects) {
        # Now we need to find all the matching keys. The keys are actually stored in
        # our database object, so this process is fast. Note that our
        # MatchSqlPattern method
        my %secondaries = $self->GetSecondaryFields($entityType);
        my @fieldList = grep { MatchSqlPattern($_, $key) } keys %secondaries;
        # Now we figure out whether or not we need to filter by object.
        my $filter = "";
        my @params = ();
        if (defined($actualObjectID)) {
            # Here the caller wants to filter on object ID.
            $filter = "$entityType(id) = ?";
            push @params, $actualObjectID;
        }
        # It's time to begin making queries. We process one attribute key at a time, because
        # each attribute is actually a different field in the database. We know here that
        # all the keys we've collected are for the correct entity because we got them from
        # the DBD. That's a good thing, because an invalid key name will cause an SQL error.
        for my $key (@fieldList) {
            # Get all of the attribute values for this key.
            my @dataRows = $self->GetAll([$entityType], $filter, \@params,
                                         ["$entityType(id)", "$entityType($key)"]);
            # Process each value separately. We need to verify the values and reformat the
            # tuples. Note that GetAll will give us one row per matching object ID,
            # with the ID first followed by a list of the data values. This is very
            # different from the structure we'll be returning, which has one row
            # per value.
            for my $dataRow (@dataRows) {
                # Get the object ID and the list of values.
                my ($rowObjectID, @dataValues) = @{$dataRow};
                # Loop through the values. There will be one result row per attribute value.
                for my $dataValue (@dataValues) {
                    # Separate this value into sections.
                    my @sections = split("::", $dataValue);
                    # Loop through the value patterns, looking for a mismatch. Note that
                    # since we're working through parallel arrays, we are using an index
                    # loop. As soon as a match fails we stop checking. This means that
                    # if the value pattern list is longer than the number of sections,
                    # we will fail as soon as we run out of sections.
                    my $match = 1;
                    for (my $i = 0; $i <= $#valuePatterns && $match; $i++) {
                        $match = MatchSqlPattern($sections[$i], $valuePatterns[$i]);
                    }
                    # If we match, we save this value in the output list.
                    if ($match) {
                        push @retVal, [$rowObjectID, $key, @sections];
                    }
                }
                # Here we've processed all the attribute values for the current object ID.
            }
            # Here we've processed all the rows returned by GetAll. In general, there will
            # be one row per object ID.
        }
        # Here we've processed all the matching attribute keys.
    }
    # Here we've processed all the entity types. That means @retVal has all the matching
    # results.
    return @retVal;
}

=head3 AddAttribute

C<< $attrDB->AddAttribute($objectID, $key, @values); >>

Add an attribute key/value pair to an object. This method cannot add a new key, merely
add a value to an existing key. Use L</StoreAttributeKey> to create a new key.

=over 4

=item objectID

ID of the genome or feature to which the attribute is to be added. In general, an ID that
starts with C<fig|> is treated as a feature ID, and an ID that is all digits and periods
is treated as a genome ID. For IDs of other types, this parameter should be a reference
to a 2-tuple consisting of the entity type name followed by the object ID.

=item key

Attribute key name. This corresponds to the name of a field in the database.

=item values

One or more values to be associated with the key. The values are joined together with
the splitter value before being stored as field values. This enables L</GetAttributes>
to split them apart during retrieval. The splitter value defaults to double colons C<::>.

=back

=cut

sub AddAttribute {
    # Get the parameters.
    my ($self, $objectID, $key, @values) = @_;
    # Don't allow undefs.
    if (! defined($objectID)) {
        Confess("No object ID specified for AddAttribute call.");
    } elsif (! defined($key)) {
        Confess("No attribute key specified for AddAttribute call.");
    } elsif (! @values) {
        Confess("No values specified in AddAttribute call for key $key.");
    } else {
        # Okay, now we have some reason to believe we can do this. Start by
        # computing the object type and ID.
        my ($entityName, $id) = ComputeObjectTypeFromID($objectID);
        # Form the values into a scalar.
        my $valueString = join($self->{splitter}, @values);
        # Insert the value.
        $self->InsertValue($id, "$entityName($key)", $valueString);
    }
    # Return a one. We do this for backward compatability.
    return 1;
}

=head3 DeleteAttribute

C<< $attrDB->DeleteAttribute($objectID, $key, @values); >>

Delete the specified attribute key/value combination from the database.

The first form will connect to the database and release it. The second form
uses the database connection contained in the object.

=over 4

=item objectID

ID of the genome or feature to which the attribute is to be added. In general, an ID that
starts with C<fig|> is treated as a feature ID, and an ID that is all digits and periods
is treated as a genome ID. For IDs of other types, this parameter should be a reference
to a 2-tuple consisting of the entity type name followed by the object ID.

=item key

Attribute key name. This corresponds to the name of a field in the database.

=item values

One or more values to be associated with the key.

=back

=cut

sub DeleteAttribute {
    # Get the parameters.
    my ($self, $objectID, $key, @values) = @_;
    # Don't allow undefs.
    if (! defined($objectID)) {
        Confess("No object ID specified for DeleteAttribute call.");
    } elsif (! defined($key)) {
        Confess("No attribute key specified for DeleteAttribute call.");
    } elsif (! @values) {
        Confess("No values specified in DeleteAttribute call for key $key.");
    } else {
        # Now compute the object type and ID.
        my ($entityName, $id) = ComputeObjectTypeFromID($objectID);
        # Form the values into a scalar.
        my $valueString = join($self->{splitter}, @values);
        # Delete the value.
        $self->DeleteValue($entityName, $id, $key, $valueString);
    }
    # Return a one. This is for backward compatability.
    return 1;
}

=head3 ChangeAttribute

C<< $attrDB->ChangeAttribute($objectID, $key, \@oldValues, \@newValues); >>

Change the value of an attribute key/value pair for an object.

=over 4

=item objectID

ID of the genome or feature to which the attribute is to be changed. In general, an ID that
starts with C<fig|> is treated as a feature ID, and an ID that is all digits and periods
is treated as a genome ID. For IDs of other types, this parameter should be a reference
to a 2-tuple consisting of the entity type name followed by the object ID.

=item key

Attribute key name. This corresponds to the name of a field in the database.

=item oldValues

One or more values identifying the key/value pair to change.

=item newValues

One or more values to be put in place of the old values.

=back

=cut

sub ChangeAttribute {
    # Get the parameters.
    my ($self, $objectID, $key, $oldValues, $newValues) = @_;
    # Don't allow undefs.
    if (! defined($objectID)) {
        Confess("No object ID specified for ChangeAttribute call.");
    } elsif (! defined($key)) {
        Confess("No attribute key specified for ChangeAttribute call.");
    } elsif (! defined($oldValues) || ref $oldValues ne 'ARRAY') {
        Confess("No old values specified in ChangeAttribute call for key $key.");
    } elsif (! defined($newValues) || ref $newValues ne 'ARRAY') {
        Confess("No new values specified in ChangeAttribute call for key $key.");
    } else {
        # Okay, now we do the change as a delete/add.
        $self->DeleteAttribute($objectID, $key, @{$oldValues});
        $self->AddAttribute($objectID, $key, @{$newValues});
    }
    # Return a one. We do this for backward compatability.
    return 1;
}

1;

MCS Webmaster
ViewVC Help
Powered by ViewVC 1.0.3