[Bio] / Sprout / CustomAttributes.pm Repository:
ViewVC logotype

Diff of /Sprout/CustomAttributes.pm

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

revision 1.9, Thu Nov 16 22:09:33 2006 UTC revision 1.41, Fri Jan 23 13:44:51 2009 UTC
# Line 2  Line 2 
2    
3  package CustomAttributes;  package CustomAttributes;
4    
     require Exporter;  
     use ERDB;  
     @ISA = qw(ERDB);  
5      use strict;      use strict;
6      use Tracer;      use Tracer;
7      use ERDBLoad;      use Stats;
8        use Time::HiRes qw(time);
9        use FIGRules;
10        use base qw(ERDB);
11    
12  =head1 Custom SEED Attribute Manager  =head1 Custom SEED Attribute Manager
13    
# Line 15  Line 15 
15    
16  The Custom SEED Attributes Manager allows the user to upload and retrieve  The Custom SEED Attributes Manager allows the user to upload and retrieve
17  custom data for SEED objects. It uses the B<ERDB> database system to  custom data for SEED objects. It uses the B<ERDB> database system to
18  store the attributes, which are implemented as multi-valued fields  store the attributes.
19  of ERDB entities.  
20    Attributes are organized by I<attribute key>. Attribute values are
21    assigned to I<objects>. In the real world, objects have types and IDs;
22    however, to the attribute database only the ID matters. This will create
23    a problem if we have a single ID that applies to two objects of different
24    types, but it is more consistent with the original attribute implementation
25    in the SEED (which this implementation replaces).
26    
27    The actual attribute values are stored as a relationship between the attribute
28    keys and the objects. There can be multiple values for a single key/object pair.
29    
30    =head3 Object IDs
31    
32    The object ID is normally represented as
33    
34        I<type>:I<id>
35    
36    where I<type> is the object type (C<Role>, C<Coupling>, etc.) and I<id> is
37    the actual object ID. Note that the object type must consist of only upper- and
38    lower-case letters! Thus, C<GenomeGroup> is a valid object type, but
39    C<genome_group> is not. Given that restriction, the object ID
40    
41        Family:aclame|cluster10
42    
43    would represent the FIG family C<aclame|cluster10>. For historical reasons,
44    there are three exceptions: subsystems, genomes, and features do not need
45    a type. So, for PEG 3361 of Streptomyces coelicolor A3(2), you simply code
46    
47        fig|100226.1.peg.3361
48    
49    The methods L</ParseID> and L</FormID> can be used to make this all seem
50    more consistent. Given any object ID string, L</ParseID> will convert it to an
51    object type and ID, and given any object type and ID, L</FormID> will
52    convert it to an object ID string. The attribute database is pretty
53    freewheeling about what it will allow for an ID; however, for best
54    results, the type should match an entity type from a Sprout genetics
55    database. If this rule is followed, then the database object
56    corresponding to an ID in the attribute database could be retrieved using
57    L</GetTargetObject> method.
58    
59        my $object = CustomAttributes::GetTargetObject($sprout, $idValue);
60    
61    =head3 Retrieval and Logging
62    
63  The full suite of ERDB retrieval capabilities is provided. In addition,  The full suite of ERDB retrieval capabilities is provided. In addition,
64  custom methods are provided specific to this application. To get all  custom methods are provided specific to this application. To get all
65  the values of the attribute C<essential> in a specified B<Feature>, you  the values of the attribute C<essential> in a specified B<Feature>, you
66  would code  would code
67    
68      my @values = $attrDB->GetAttributes([Feature => $fid], 'essential');      my @values = $attrDB->GetAttributes($fid, 'essential');
69    
70  where I<$fid> contains the ID of the desired feature. Each attribute has  where I<$fid> contains the ID of the desired feature.
 an alternate index to allow searching for attributes by value.  
71    
72  New attributes are introduced by updating the database definition at  Keys can be split into two pieces using the splitter value defined in the
73  run-time. Attribute values are stored by uploading data from files.  constructor (the default is C<::>). The first piece of the key is called
74  A web interface is provided for both these activities.  the I<real key>. This portion of the key must be defined using the
75    web interface (C<Attributes.cgi>). The second portion of the key is called
76    the I<sub key>, and can take any value.
77    
78    Major attribute activity is recorded in a log (C<attributes.log>) in the
79    C<$FIG_Config::var> directory. The log reports the user name, time, and
80    the details of the operation. The user name will almost always be unknown,
81    the exception being when it is specified in this object's constructor
82    (see L</new>).
83    
84  =head2 FIG_Config Parameters  =head2 FIG_Config Parameters
85    
# Line 74  Line 123 
123  functions as data to the attribute management process, so if the data is  functions as data to the attribute management process, so if the data is
124  moved, this file must go with it.  moved, this file must go with it.
125    
126  =back  =item attr_default_table
   
 The DBD file is critical, and must have reasonable contents before we can  
 begin using the system. In the old system, attributes were only provided  
 for Genomes and Features, so the initial XML file was the following.  
   
     <Database>  
       <Title>SEED Custom Attribute Database</Title>  
       <Entities>  
         <Entity name="Feature" keyType="id-string">  
           <Notes>A [i]feature[/i] is a part of the genome  
           that is of special interest. Features may be spread  
           across multiple contigs of a genome, but never across  
           more than one genome. Features can be assigned to roles  
           via spreadsheet cells, and are the targets of  
           annotation.</Notes>  
         </Entity>  
         <Entity name="Genome" keyType="name-string">  
           <Notes>A [i]genome[/i] describes a particular individual  
           organism's DNA.</Notes>  
         </Entity>  
       </Entities>  
     </Database>  
   
 It is not necessary to put any tables into the database; however, you should  
 run  
   
     AttrDBRefresh  
   
 periodically to insure it has the correct Genomes and Features in it. When  
 converting from the old system, use  
   
     AttrDBRefresh -migrate  
127    
128  to initialize the database and migrate the legacy data. You should only need  Name of the default relationship for attribute values. If not present,
129  to do that once.  C<HasValueFor> is used.
130    
131  =head2 Implementation Note  =back
   
 The L</Refresh> method reloads the entities in the database. If new  
 entity types are added, that method will need to be adjusted accordingly.  
132    
133  =head2 Public Methods  =head2 Public Methods
134    
135  =head3 new  =head3 new
136    
137  C<< my $attrDB = CustomAttributes->new($splitter); >>      my $attrDB = CustomAttributes->new(%options);
138    
139  Construct a new CustomAttributes object. This object cannot be used to add or  Construct a new CustomAttributes object. The following options are
140  delete keys because that requires modifying the database design. To do that,  supported.
 you need to use the static L</StoreAttributeKey> or L</DeleteAttributeKey>  
 methods.  
141    
142  =over 4  =over 4
143    
144  =item splitter  =item splitter
145    
146  Value to be used to split attribute values into sections in the  Value to be used to split attribute values into sections in the
147  L</Fig Replacement Methods>. The default is a double colon C<::>.  L</Fig Replacement Methods>. The default is a double colon C<::>,
148  If you do not use the replacement methods, you do not need to  and should only be overridden in extreme circumstances.
149  worry about this parameter.  
150    =item user
151    
152    Name of the current user. This will appear in the attribute log.
153    
154    =item dbd
155    
156    Filename for the DBD. If unspecified, the default DBD is used.
157    
158  =back  =back
159    
# Line 142  Line 161 
161    
162  sub new {  sub new {
163      # Get the parameters.      # Get the parameters.
164      my ($class, $splitter) = @_;      my ($class, %options) = @_;
165        # Get the name ofthe default table.
166      # Connect to the database.      # Connect to the database.
167      my $dbh = DBKernel->new($FIG_Config::attrDbms, $FIG_Config::attrDbName,      my $dbh = DBKernel->new($FIG_Config::attrDbms, $FIG_Config::attrDbName,
168                              $FIG_Config::attrUser, $FIG_Config::attrPass,                              $FIG_Config::attrUser, $FIG_Config::attrPass,
169                              $FIG_Config::attrPort, $FIG_Config::attrHost,                              $FIG_Config::attrPort, $FIG_Config::attrHost,
170                              $FIG_Config::attrSock);                              $FIG_Config::attrSock);
171      # Create the ERDB object.      # Create the ERDB object.
172      my $xmlFileName = $FIG_Config::attrDBD;      my $xmlFileName = ($options{dbd} ? $options{dbd} : $FIG_Config::attrDBD);
173      my $retVal = ERDB::new($class, $dbh, $xmlFileName);      my $retVal = ERDB::new($class, $dbh, $xmlFileName, %options);
174      # Store the splitter value.      # Store the splitter value.
175      $retVal->{splitter} = (defined($splitter) ? $splitter : '::');      $retVal->{splitter} = $options{splitter} || '::';
176        # Store the user name.
177        $retVal->{user} = $options{user} || '<unknown>';
178        Trace("User $retVal->{user} selected for attribute object.") if T(3);
179        # Compute the default value table name. If it's not overridden, the
180        # default is HasValueFor.
181        $retVal->{defaultRel} = $FIG_Config::attr_default_table || 'HasValueFor';
182      # Return the result.      # Return the result.
183      return $retVal;      return $retVal;
184  }  }
185    
186  =head3 StoreAttributeKey  =head3 StoreAttributeKey
187    
188  C<< my $attrDB = CustomAttributes::StoreAttributeKey($entityName, $attributeName, $type, $notes); >>      $attrDB->StoreAttributeKey($attributeName, $notes, \@groups, $table);
189    
190  Create or update an attribute for the database. This method will update the database definition  Create or update an attribute for the database.
 XML, but it will not create the table. It will connect to the database so that the caller  
 can upload the attribute values.  
191    
192  =over 4  =over 4
193    
 =item entityName  
   
 Name of the entity containing the attribute. The entity must exist.  
   
194  =item attributeName  =item attributeName
195    
196  Name of the attribute. It must be a valid ERDB field name, consisting entirely of  Name of the attribute (the real key). If it does not exist already, it will be created.
 letters, digits, and hyphens, with a letter at the beginning. If it does not  
 exist already, it will be created.  
   
 =item type  
   
 Data type of the attribute. This must be a valid ERDB data type name.  
197    
198  =item notes  =item notes
199    
200  Descriptive notes about the attribute. It is presumed to be raw text, not HTML.  Descriptive notes about the attribute. It is presumed to be raw text, not HTML.
201    
202  =item RETURN  =item groups
203    
204    Reference to a list of the groups to which the attribute should be associated.
205    This will replace any groups to which the attribute is currently attached.
206    
207    =item table
208    
209  Returns a Custom Attribute Database object if successful. If unsuccessful, an  The name of the relationship in which the attribute's values are to be stored.
210  error will be thrown.  If empty or undefined, the default relationship (usually C<HasValueFor>) will be
211    assumed.
212    
213  =back  =back
214    
# Line 196  Line 216 
216    
217  sub StoreAttributeKey {  sub StoreAttributeKey {
218      # Get the parameters.      # Get the parameters.
219      my ($entityName, $attributeName, $type, $notes) = @_;      my ($self, $attributeName, $notes, $groups, $table) = @_;
220      # Declare the return variable.      # Declare the return variable.
221      my $retVal;      my $retVal;
222      # Get the data type hash.      # Default the table name.
223      my %types = ERDB::GetDataTypes();      if (! $table) {
224            $table = $self->{defaultRel};
225        }
226      # Validate the initial input values.      # Validate the initial input values.
227      if (! ERDB::ValidateFieldName($attributeName)) {      if ($attributeName =~ /$self->{splitter}/) {
228          Confess("Invalid attribute name \"$attributeName\" specified.");          Confess("Invalid attribute name \"$attributeName\" specified.");
229      } elsif (! $notes || length($notes) < 25) {      } elsif (! $notes) {
230          Confess("Missing or incomplete description for $attributeName.");          Confess("Missing description for $attributeName.");
231      } elsif (! exists $types{$type}) {      } elsif (! grep { $_ eq $table } $self->GetConnectingRelationships('AttributeKey')) {
232          Confess("Invalid data type \"$type\" for $attributeName.");          Confess("Invalid relationship name \"$table\" specified as a custom attribute table.");
233      }      } else {
234      # Our next step is to read in the XML for the database defintion. We          # Create a variable to hold the action to be displayed for the log (Add or Update).
235      # need to verify that the named entity exists.          my $action;
236      my $metadata = ERDB::ReadMetaXML($FIG_Config::attrDBD);          # Okay, we're ready to begin. See if this key exists.
237      my $entityHash = $metadata->{Entities};          my $attribute = $self->GetEntity('AttributeKey', $attributeName);
238      if (! exists $entityHash->{$entityName}) {          if (defined($attribute)) {
239          Confess("Entity $entityName not found.");              # It does, so we do an update.
240      } else {              $action = "Update Key";
241          # Okay, we're ready to begin. Get the entity hash and the field hash.              $self->UpdateEntity('AttributeKey', $attributeName,
242          my $entityData = $entityHash->{$entityName};                                  { description => $notes,
243          my $fieldHash = ERDB::GetEntityFieldHash($metadata, $entityName);                                    'relationship-name' => $table});
244          # Compare the old attribute data to the new data.              # Detach the key from its current groups.
245          my $bigChange = 1;              $self->Disconnect('IsInGroup', 'AttributeKey', $attributeName);
246          if (exists $fieldHash->{$attributeName} && $fieldHash->{$attributeName}->{type} eq $type) {          } else {
247              $bigChange = 0;              # It doesn't, so we do an insert.
248          }              $action = "Insert Key";
249          # Compute the attribute's relation name.              $self->InsertObject('AttributeKey', { id => $attributeName,
250          my $relName = join("", $entityName, map { ucfirst $_ } split(/-|_/, $attributeName));                                  description => $notes,
251          # Store the attribute's field data. Note the use of the "content" hash for                                  'relationship-name' => $table});
252          # the notes. This is how the XML writer knows Notes is a text tag instead of          }
253          # an attribute.          # Attach the key to the specified groups. (We presume the groups already
254          $fieldHash->{$attributeName} = { type => $type, relation => $relName,          # exist.)
255                                           Notes => { content => $notes } };          for my $group (@{$groups}) {
256          # Insure we have an index for this attribute.              $self->InsertObject('IsInGroup', { 'from-link' => $attributeName,
257          my $index = ERDB::FindIndexForEntity($metadata, $entityName, $attributeName);                                                 'to-link'   => $group });
         if (! defined($index)) {  
             push @{$entityData->{Indexes}}, { IndexFields => [ { name => $attributeName, order => 'ascending' } ],  
                                               Notes       => "Alternate index provided for access by $attributeName." };  
         }  
         # Write the XML back out.  
         ERDB::WriteMetaXML($metadata, $FIG_Config::attrDBD);  
         # Open a database with the new XML.  
         $retVal = CustomAttributes->new();  
         # Create the table if there has been a significant change.  
         if ($bigChange) {  
             $retVal->CreateTable($relName);  
258          }          }
259            # Log the operation.
260            $self->LogOperation($action, $attributeName, "Group list is " . join(" ", @{$groups}));
261      }      }
     return $retVal;  
262  }  }
263    
 =head3 Refresh  
264    
265  C<< $attrDB->Refresh($fig); >>  =head3 DeleteAttributeKey
266    
267        my $stats = $attrDB->DeleteAttributeKey($attributeName);
268    
269  Refresh the primary entity tables from the FIG data store. This method basically  Delete an attribute from the custom attributes database.
 drops and reloads the main tables of the custom attributes database.  
270    
271  =over 4  =over 4
272    
273  =item fig  =item attributeName
274    
275    Name of the attribute to delete.
276    
277    =item RETURN
278    
279  FIG-like object that can be used to find genomes and features.  Returns a statistics object describing the effects of the deletion.
280    
281  =back  =back
282    
283  =cut  =cut
284    
285  sub Refresh {  sub DeleteAttributeKey {
286      # Get the parameters.      # Get the parameters.
287      my ($self, $fig) = @_;      my ($self, $attributeName) = @_;
288      # Create load objects for the genomes and the features.      # Delete the attribute key.
289      my $loadGenome = ERDBLoad->new($self, 'Genome', $FIG_Config::temp);      my $retVal = $self->Delete('AttributeKey', $attributeName);
290      my $loadFeature = ERDBLoad->new($self, 'Feature', $FIG_Config::temp);      # Log this operation.
291      # Get the genome list.      $self->LogOperation("Delete Key", $attributeName, "Key will no longer be available for use by anyone.");
292      my @genomes = $fig->genomes();      # Return the result.
293      # Loop through the genomes.      return $retVal;
     for my $genomeID (@genomes) {  
         # Put this genome in the genome table.  
         $loadGenome->Put($genomeID);  
         Trace("Processing Genome $genomeID") if T(3);  
         # Put its features into the feature table. Note we have to use a hash to  
         # remove duplicates.  
         my %featureList = map { $_ => 1 } $fig->all_features($genomeID);  
         for my $fid (keys %featureList) {  
             $loadFeature->Put($fid);  
         }  
     }  
     # Get a variable for holding statistics objects.  
     my $stats;  
     # Finish the genome load.  
     Trace("Loading Genome relation.") if T(2);  
     $stats = $loadGenome->FinishAndLoad();  
     Trace("Genome table load statistics:\n" . $stats->Show()) if T(3);  
     # Finish the feature load.  
     Trace("Loading Feature relation.") if T(2);  
     $stats = $loadFeature->FinishAndLoad();  
     Trace("Feature table load statistics:\n" . $stats->Show()) if T(3);  
 }  
294    
295  =head3 LoadAttributeKey  }
296    
297  C<< my $stats = $attrDB->LoadAttributeKey($entityName, $fieldName, $fh, $keyCol, $dataCol); >>  =head3 NewName
298    
299  Load the specified attribute from the specified file. The file should be a      my $text = CustomAttributes::NewName();
 tab-delimited file with internal tab and new-line characters escaped. This is  
 the typical TBL-style file used by most FIG applications. One of the columns  
 in the input file must contain the appropriate key value and the other the  
 corresponding attribute value.  
300    
301  =over 4  Return the string used to indicate the user wants to add a new attribute.
302    
303  =item entityName  =cut
304    
305  Name of the entity containing the attribute.  sub NewName {
306        return "(new)";
307    }
308    
309  =item fieldName  =head3 LoadAttributesFrom
310    
311  Name of the actual attribute.  C<< my $stats = $attrDB->LoadAttributesFrom($fileName, %options); >>
312    
313  =item fh  Load attributes from the specified tab-delimited file. Each line of the file must
314    contain an object ID in the first column, an attribute key name in the second
315    column, and attribute values in the remaining columns. The attribute values must
316    be assembled into a single value using the splitter code. In addition, the key names may
317    contain a splitter. If this is the case, the portion of the key after the splitter is
318    treated as a subkey.
319    
320  Open file handle for the input file.  =over 4
321    
322  =item keyCol  =item fileName
323    
324  Index (0-based) of the column containing the key field. The key field should  Name of the file from which to load the attributes, or an open handle for the file.
325  contain the ID of an instance of the named entity.  (This last enables the method to be used in conjunction with the CGI form upload
326    control.)
327    
328  =item dataCol  =item options
329    
330  Index (0-based) of the column containing the data value field.  Hash of options for modifying the load process.
331    
332  =item RETURN  =item RETURN
333    
334  Returns a statistics object for the load process.  Returns a statistics object describing the load.
335    
336  =back  =back
337    
338  =cut  Permissible option values are as follows.
339    
340  sub LoadAttributeKey {  =over 4
     # Get the parameters.  
     my ($self, $entityName, $fieldName, $fh, $keyCol, $dataCol) = @_;  
     # Create the return variable.  
     my $retVal;  
     # Insure the entity exists.  
     my $found = grep { $_ eq $entityName } $self->GetEntityTypes();  
     if (! $found) {  
         Confess("Entity \"$entityName\" not found in database.");  
     } else {  
         # Get the field structure for the named entity.  
         my $fieldHash = $self->GetFieldTable($entityName);  
         # Verify that the attribute exists.  
         if (! exists $fieldHash->{$fieldName}) {  
             Confess("Attribute key \"$fieldName\" does not exist in entity $entityName.");  
         } else {  
             # Create a loader for the specified attribute. We need the  
             # relation name first.  
             my $relName = $fieldHash->{$fieldName}->{relation};  
             my $loadAttribute = ERDBLoad->new($self, $relName, $FIG_Config::temp);  
             # Loop through the input file.  
             while (! eof $fh) {  
                 # Get the next line of the file.  
                 my @fields = Tracer::GetLine($fh);  
                 $loadAttribute->Add("lineIn");  
                 # Now we need to validate the line.  
                 if ($#fields < $dataCol) {  
                     $loadAttribute->Add("shortLine");  
                 } elsif (! $self->Exists($entityName, $fields[$keyCol])) {  
                     $loadAttribute->Add("badKey");  
                 } else {  
                     # It's valid,so send it to the loader.  
                     $loadAttribute->Put($fields[$keyCol], $fields[$dataCol]);  
                     $loadAttribute->Add("lineUsed");  
                 }  
             }  
             # Finish the load.  
             $retVal = $loadAttribute->FinishAndLoad();  
         }  
     }  
     # Return the statistics.  
     return $retVal;  
 }  
341    
342    =item mode
343    
344  =head3 DeleteAttributeKey  Loading mode. Legal values are C<low_priority> (which reduces the task priority
345    of the load) and C<concurrent> (which reduces the locking cost of the load). The
346    default is a normal load.
347    
348  C<< CustomAttributes::DeleteAttributeKey($entityName, $attributeName); >>  =item append
349    
350  Delete an attribute from the custom attributes database.  If TRUE, then the attributes will be appended to existing data; otherwise, the
351    first time a key name is encountered, it will be erased.
352    
353  =over 4  =item archive
354    
355  =item entityName  If specified, the name of a file into which the incoming data should be saved.
356    If I<resume> is also specified, only the lines actually loaded will be put
357    into this file.
358    
359  Name of the entity possessing the attribute.  =item objectType
360    
361  =item attributeName  If specified, the specified object type will be prefixed to each object ID.
362    
363  Name of the attribute to delete.  =item resume
364    
365    If specified, key-value pairs already in the database will not be reinserted.
366    Specify a number to start checking after the specified number of lines and
367    then admit everything after the first line not yet loaded. Specify C<careful>
368    to check every single line. Specify C<none> to ignore this option. The default
369    is C<none>. So, if you believe that a previous load failed somewhere after 50000
370    lines, a resume value of C<50000> would skip 50000 lines in the file, then
371    check each line after that until it finds one not already in the database. The
372    first such line found and all lines after that will be loaded. On the other
373    hand, if you have a file of 100000 records, and some have been loaded and some
374    not, you would use the word C<careful>, so that every line would be checked before
375    it is inserted. A resume of C<0> will start checking the first line of the
376    input file and then begin loading once it finds a line not in the database.
377    
378    =item chunkSize
379    
380    Number of lines to load in each burst. The default is 10,000.
381    
382  =back  =back
383    
384  =cut  =cut
385    
386  sub DeleteAttributeKey {  sub LoadAttributesFrom {
387      # Get the parameters.      # Get the parameters.
388      my ($entityName, $attributeName) = @_;      my ($self, $fileName, %options) = @_;
389      # Read in the XML for the database defintion. We need to verify that      # Declare the return variable.
390      # the named entity exists and it has the named attribute.      my $retVal = Stats->new('keys', 'values', 'linesOut');
391      my $metadata = ERDB::ReadMetaXML($FIG_Config::attrDBD);      # Initialize the timers.
392      my $entityHash = $metadata->{Entities};      my ($eraseTime, $archiveTime, $checkTime) = (0, 0, 0);
393      if (! exists $entityHash->{$entityName}) {      # Check for append mode.
394          Confess("Entity \"$entityName\" not found.");      my $append = ($options{append} ? 1 : 0);
395      } else {      # Check for resume mode.
396          # Get the field hash.      my $resume = (defined($options{resume}) ? $options{resume} : 'none');
397          my $fieldHash = ERDB::GetEntityFieldHash($metadata, $entityName);      # Create a hash of key names found.
398          if (! exists $fieldHash->{$attributeName}) {      my %keyHash = ();
399              Confess("Attribute key \"$attributeName\" not found in entity $entityName.");      # Create a hash of table names to files. Most attributes go into the HasValueFor
400          } else {      # table, but some are put into other tables. Each table name will be mapped
401              # Get the attribute's relation name.      # to a sub-hash with keys "fileName" (output file for the table) and "count"
402              my $relName = $fieldHash->{$attributeName}->{relation};      # (number of lines in the file).
403              # Check for an index.      my %tableHash = ();
404              my $indexIdx = ERDB::FindIndexForEntity($metadata, $entityName, $attributeName);      # Compute the chunk size.
405              if (defined($indexIdx)) {      my $chunkSize = ($options{chunkSize} ? $options{chunkSize} : 10000);
406                  Trace("Index for $attributeName found at position $indexIdx for $entityName.") if T(3);      # Open the file for input. Note we must anticipate the possibility of an
407                  delete $entityHash->{$entityName}->{Indexes}->[$indexIdx];      # open filehandle being passed in. This occurs when the user is submitting
408              }      # the load file over the web.
409              # Delete the attribute from the field hash.      my $fh;
410              Trace("Deleting attribute $attributeName from $entityName.") if T(3);      if (ref $fileName) {
411              delete $fieldHash->{$attributeName};          Trace("Using file opened by caller.") if T(3);
412              # Write the XML back out.          $fh = $fileName;
413              ERDB::WriteMetaXML($metadata, $FIG_Config::attrDBD);      } else {
414              # Insure the relation does not exist in the database. This requires connecting          Trace("Attributes will be loaded from $fileName.") if T(3);
415              # since we may have to do a table drop.          $fh = Open(undef, "<$fileName");
416              my $attrDB = CustomAttributes->new();      }
417              Trace("Dropping table $relName.") if T(3);      # Trace the mode.
418              $attrDB->DropRelation($relName);      if (T(3)) {
419            if ($options{mode}) {
420                Trace("Mode is $options{mode}.")
421            } else {
422                Trace("No mode specified.")
423            }
424        }
425        # Now check to see if we need to archive.
426        my $ah;
427        if (exists $options{archive}) {
428            my $ah = Open(undef, ">$options{archive}");
429            Trace("Load file will be archived to $options{archive}.") if T(3);
430        }
431        # Insure we recover from errors.
432        eval {
433            # If we have a resume number, process it here.
434            if ($resume =~ /\d+/) {
435                Trace("Skipping $resume lines.") if T(2);
436                my $startTime = time();
437                # Skip the specified number of lines.
438                for (my $skipped = 0; ! eof($fh) && $skipped < $resume; $skipped++) {
439                    my $line = <$fh>;
440                    $retVal->Add(skipped => 1);
441          }          }
442                $checkTime += time() - $startTime;
443      }      }
444            # Loop through the file.
445            Trace("Starting load.") if T(2);
446            while (! eof $fh) {
447                # Read the current line.
448                my ($id, $key, @values) = Tracer::GetLine($fh);
449                $retVal->Add(linesIn => 1);
450                # Do some validation.
451                if (! $id) {
452                    # We ignore blank lines.
453                    $retVal->Add(blankLines => 1);
454                } elsif (substr($id, 0, 1) eq '#') {
455                    # A line beginning with a pound sign is a comment.
456                    $retVal->Add(comments => 1);
457                } elsif (! defined($key)) {
458                    # An ID without a key is a serious error.
459                    my $lines = $retVal->Ask('linesIn');
460                    Confess("Line $lines in $fileName has no attribute key.");
461                } elsif (! @values) {
462                    # A line with no values is not allowed.
463                    my $lines = $retVal->Ask('linesIn');
464                    Trace("Line $lines for key $key has no attribute values.") if T(1);
465                    $retVal->Add(skipped => 1);
466                } else {
467                    # Check to see if we need to fix up the object ID.
468                    if ($options{objectType}) {
469                        $id = "$options{objectType}:$id";
470                    }
471                    # The key contains a real part and an optional sub-part. We need the real part.
472                    my ($realKey, $subKey) = $self->SplitKey($key);
473                    # Now we need to check for a new key.
474                    if (! exists $keyHash{$realKey}) {
475                        my $keyObject = $self->GetEntity(AttributeKey => $realKey);
476                        if (! defined($keyObject)) {
477                            # Here the specified key does not exist, which is an error.
478                            my $line = $retVal->Ask('linesIn');
479                            Confess("Attribute \"$realKey\" on line $line of $fileName not found in database.");
480                        } else {
481                            # Make sure we know this is no longer a new key. We do this by putting
482                            # its table name in the key hash.
483                            $keyHash{$realKey} = $keyObject->PrimaryValue('AttributeKey(relationship-name)');
484                            $retVal->Add(keys => 1);
485                            # If this is NOT append mode, erase the key. This does not delete the key
486                            # itself; it just clears out all the values.
487                            if (! $append) {
488                                my $startTime = time();
489                                $self->EraseAttribute($realKey);
490                                $eraseTime += time() - $startTime;
491                                Trace("Attribute $realKey erased.") if T(3);
492                            }
493                        }
494                        Trace("Key $realKey found.") if T(3);
495                    }
496                    # If we're in resume mode, check to see if this insert is redundant.
497                    my $ok = 1;
498                    if ($resume ne 'none') {
499                        my $startTime = time();
500                        my $count = $self->GetAttributes($id, $key, @values);
501                        if ($count) {
502                            # Here the record is found, so we skip it.
503                            $ok = 0;
504                            $retVal->Add(skipped => 1);
505                        } else {
506                            # Here the record is not found. If we're in non-careful mode, we
507                            # stop resume checking at this point.
508                            if ($resume ne 'careful') {
509                                $resume = 'none';
510                            }
511                        }
512                        $checkTime += time() - $startTime;
513                    }
514                    if ($ok) {
515                        # We're in business. First, archive this row.
516                        if (defined $ah) {
517                            my $startTime = time();
518                            Tracer::PutLine($ah, [$id, $key, @values]);
519                            $archiveTime += time() - $startTime;
520                        }
521                        # We need to format the attribute data so it will work
522                        # as if it were a load file. This means we join the
523                        # values.
524                        my $valueString = join('::', @values);
525                        # Now we need to get access to the key's load file. Check for it in the
526                        # table hash.
527                        my $keyTable = $keyHash{$realKey};
528                        if (! exists $tableHash{$keyTable}) {
529                            # This is a new table, so we need to set it up. First, we get
530                            # a temporary file for it.
531                            my $tempFileName = FIGRules::GetTempFileName(sessionID => $$ . $keyTable,
532                                                                         extension => 'dtx');
533                            my $oh = Open(undef, ">$tempFileName");
534                            # Now we create its descriptor in the table hash.
535                            $tableHash{$keyTable} = {fileName => $tempFileName, handle => $oh, count => 0};
536                        }
537                        # Everything is all set up, so we put the value in the temporary file and
538                        # count it.
539                        my $tableData = $tableHash{$keyTable};
540                        my $startTime = time();
541                        Tracer::PutLine($tableData->{handle}, [$realKey, $id, $subKey, $valueString]);
542                        $archiveTime += time() - $startTime;
543                        $retVal->Add(linesOut => 1);
544                        $tableData->{count}++;
545                        # See if it's time to load a chunk.
546                        if ($tableData->{count} >= $chunkSize) {
547                            # We've filled a chunk, so it's time.
548                            close $tableData->{handle};
549                            $self->_LoadAttributeTable($keyTable, $tableData->{fileName}, $retVal);
550                            # Reset for the next chunk.
551                            $tableData->{count} = 0;
552                            $tableData->{handle} = Open(undef, ">$tableData->{fileName}");
553                        }
554                    } else {
555                        # Here we skipped because of resume mode.
556                        $retVal->Add(resumeSkip => 1);
557                    }
558                    Trace($retVal->Ask('values') . " values processed.") if $retVal->Check(values => 1000) && T(3);
559                }
560            }
561            # Now we close the archive file. Note we undefine the handle so the error methods know
562            # not to worry.
563            if (defined $ah) {
564                close $ah;
565                undef $ah;
566            }
567            # Now we load the residual from the temporary files (if any). This time we'll do an
568            # analyze as well.
569            for my $tableName (keys %tableHash) {
570                # Get the data for this table.
571                my $tableData = $tableHash{$tableName};
572                # Close the handle. ERDB will re-open it for input later.
573                close $tableData->{handle};
574                # Check to see if there's anything left to load.
575                if ($tableData->{count} > 0) {
576                    # Yes, load the data.
577                    $self->_LoadAttributeTable($tableName, $tableData->{fileName}, $retVal);
578                }
579                # Regardless of whether additional loading was required, we need to
580                # analyze the table for performance.
581                my $startTime = time();
582                $self->Analyze($tableName);
583                $retVal->Add(analyzeTime => time() - $startTime);
584            }
585            Trace("Attribute load successful.") if T(2);
586        };
587        # Check for an error.
588        if ($@) {
589            # Here we have an error. Display the error message.
590            my $message = $@;
591            Trace("Error during attribute load: $message") if T(0);
592            $retVal->AddMessage($message);
593            # Close the archive file if it's open. The archive file can sometimes provide
594            # clues as to what happened.
595            if (defined $ah) {
596                close $ah;
597            }
598        }
599        # Store the timers.
600        $retVal->Add(eraseTime   => $eraseTime);
601        $retVal->Add(archiveTime => $archiveTime);
602        $retVal->Add(checkTime   => $checkTime);
603        # Return the result.
604        return $retVal;
605  }  }
606    
607  =head3 ControlForm  =head3 BackupKeys
608    
609  C<< my $formHtml = $attrDB->ControlForm($cgi, $name); >>      my $stats = $attrDB->BackupKeys($fileName, %options);
610    
611  Return a form that can be used to control the creation and modification of  Backup the attribute key information from the attribute database.
 attributes.  
612    
613  =over 4  =over 4
614    
615  =item cgi  =item fileName
616    
617  CGI query object used to create HTML.  Name of the output file.
618    
619  =item name  =item options
620    
621  Name to give to the form. This should be unique for the web page.  Options for modifying the backup process.
622    
623  =item RETURN  =item RETURN
624    
625  Returns the HTML for a form that submits instructions to the C<Attributes.cgi> script  Returns a statistics object for the backup.
 for loading, creating, or deleting an attribute.  
626    
627  =back  =back
628    
629    Currently there are no options. The backup is straight to a text file in
630    tab-delimited format. Each key is backup up to two lines. The first line
631    is all of the data from the B<AttributeKey> table. The second is a
632    tab-delimited list of all the groups.
633    
634  =cut  =cut
635    
636  sub ControlForm {  sub BackupKeys {
637      # Get the parameters.      # Get the parameters.
638      my ($self, $cgi, $name) = @_;      my ($self, $fileName, %options) = @_;
639      # Declare the return list.      # Declare the return variable.
640      my @retVal = ();      my $retVal = Stats->new();
641      # Start the form. We use multipart to support the upload control.      # Open the output file.
642      push @retVal, $cgi->start_multipart_form(-name => $name);      my $fh = Open(undef, ">$fileName");
643      # We'll put the controls in a table. Nothing else ever seems to look nice.      # Set up to read the keys.
644      push @retVal, $cgi->start_table({ border => 2, cellpadding => 2 });      my $keyQuery = $self->Get(['AttributeKey'], "", []);
645      # The first row is for selecting the field name.      # Loop through the keys.
646      push @retVal, $cgi->Tr($cgi->th("Select a Field"),      while (my $keyData = $keyQuery->Fetch()) {
647                             $cgi->td($self->FieldMenu($cgi, 10, 'fieldName', 1,          $retVal->Add(key => 1);
648                                                       "document.$name.notes.value",          # Get the fields.
649                                                       "document.$name.dataType.value")));          my ($id, $type, $tableName, $description) =
650      # Now we set up a dropdown for the data types. The values will be the              $keyData->Values(['AttributeKey(id)', 'AttributeKey(relationship-name)',
651      # data type names, and the labels will be the descriptions.                                'AttributeKey(description)']);
652      my %types = ERDB::GetDataTypes();          # Escape any tabs or new-lines in the description.
653      my %labelMap = map { $_ => $types{$_}->{notes} } keys %types;          my $escapedDescription = Tracer::Escape($description);
654      my $typeMenu = $cgi->popup_menu(-name   => 'dataType',          # Write the key data to the output.
655                                      -values => [sort keys %types],          Tracer::PutLine($fh, [$id, $type, $tableName, $escapedDescription]);
656                                      -labels => \%labelMap);          # Get the key's groups.
657      push @retVal, $cgi->Tr($cgi->th("Data type"),          my @groups = $self->GetFlat(['IsInGroup'], "IsInGroup(from-link) = ?", [$id],
658                             $cgi->td($typeMenu));                                      'IsInGroup(to-link)');
659      # The next row is for the notes.          $retVal->Add(memberships => scalar(@groups));
660      push @retVal, $cgi->Tr($cgi->th("Description"),          # Write them to the output. Note we put a marker at the beginning to insure the line
661                             $cgi->td($cgi->textarea(-name => 'notes',          # is nonempty.
662                                                     -rows => 6,          Tracer::PutLine($fh, ['#GROUPS', @groups]);
663                                                     -columns => 80))      }
664                            );      # Log the operation.
665      # Allow the user to specify a new field name. This is required if the      $self->LogOperation("Backup Keys", $fileName, $retVal->Display());
666      # user has selected one of the "(new)" markers.      # Return the result.
667      push @retVal, $cgi->Tr($cgi->th("New Field Name"),      return $retVal;
668                             $cgi->td($cgi->textfield(-name => 'newName',  }
                                                     -size => 30)),  
                                     );  
     # If the user wants to upload new values for the field, then we have  
     # an upload file name and column indicators.  
     push @retVal, $cgi->Tr($cgi->th("Upload Values"),  
                            $cgi->td($cgi->filefield(-name => 'newValueFile',  
                                                     -size => 20) .  
                                     " Key&nbsp;" .  
                                     $cgi->textfield(-name => 'keyCol',  
                                                     -size => 3,  
                                                     -default => 0) .  
                                     " Value&nbsp;" .  
                                     $cgi->textfield(-name => 'valueCol',  
                                                     -size => 3,  
                                                     -default => 1)  
                                    ),  
                           );  
     # Now the three buttons: UPDATE, SHOW, and DELETE.  
     push @retVal, $cgi->Tr($cgi->th("&nbsp;"),  
                            $cgi->td({align => 'center'},  
                                     $cgi->submit(-name => 'Delete', -value => 'DELETE') . " " .  
                                     $cgi->submit(-name => 'Store',  -value => 'STORE') . " " .  
                                     $cgi->submit(-name => 'Show',   -value => 'SHOW')  
                                    )  
                           );  
     # Close the table and the form.  
     push @retVal, $cgi->end_table();  
     push @retVal, $cgi->end_form();  
     # Return the assembled HTML.  
     return join("\n", @retVal, "");  
 }  
   
 =head3 FieldMenu  
   
 C<< my $menuHtml = $attrDB->FieldMenu($cgi, $height, $name, $newFlag, $noteControl, $typeControl); >>  
   
 Return the HTML for a menu to select an attribute field. The menu will  
 be a standard SELECT/OPTION thing which is called "popup menu" in the  
 CGI package, but actually looks like a list. The list will contain  
 one selectable row per field, grouped by entity.  
   
 =over 4  
   
 =item cgi  
   
 CGI query object used to generate HTML.  
   
 =item height  
669    
670  Number of lines to display in the list.  =head3 RestoreKeys
671    
672  =item name      my $stats = $attrDB->RestoreKeys($fileName, %options);
673    
674  Name to give to the menu. This is the name under which the value will  Restore the attribute keys and groups from a backup file.
 appear when the form is submitted.  
675    
676  =item newFlag (optional)  =over 4
677    
678  If TRUE, then extra rows will be provided to allow the user to select  =item fileName
 a new attribute. In other words, the user can select an existing  
 attribute, or can choose a C<(new)> marker to indicate a field to  
 be created in the parent entity.  
679    
680  =item noteControl (optional)  Name of the file containing the backed-up keys. Each key has a pair of lines,
681    one containing the key data and one listing its groups.
682    
683  If specified, the name of a variable for displaying the notes attached  =back
 to the field. This must be in Javascript form ready for assignment.  
 So, for example, if you have a variable called C<notes> that  
 represents a paragraph element, you should code C<notes.innerHTML>.  
 If it actually represents a form field you should code C<notes.value>.  
 If an C<innerHTML> coding is used, the text will be HTML-escaped before  
 it is copied in. Specifying this parameter generates Javascript for  
 displaying the field description when a field is selected.  
684    
685  =item typeControl (optional)  =cut
686    
687  If specified, the name of a variable for displaying the field's  sub RestoreKeys {
688  data type. Data types are a much more controlled vocabulary than      # Get the parameters.
689  notes, so there is no worry about HTML translation. Instead, the      my ($self, $fileName, %options) = @_;
690  raw value is put into the specified variable. Otherwise, the same      # Declare the return variable.
691  rules apply to this value that apply to I<$noteControl>.      my $retVal = Stats->new();
692        # Set up a hash to hold the group IDs.
693        my %groups = ();
694        # Open the file.
695        my $fh = Open(undef, "<$fileName");
696        # Loop until we're done.
697        while (! eof $fh) {
698            # Get a key record.
699            my ($id, $tableName, $description) = Tracer::GetLine($fh);
700            if ($id eq '#GROUPS') {
701                Confess("Group record found when key record expected.");
702            } elsif (! defined($description)) {
703                Confess("Invalid format found for key record.");
704            } else {
705                $retVal->Add("keyIn" => 1);
706                # Add this key to the database.
707                $self->InsertObject('AttributeKey', { id => $id,
708                                                      description => Tracer::UnEscape($description),
709                                                      'relationship-name' => $tableName});
710                Trace("Attribute $id stored.") if T(3);
711                # Get the group line.
712                my ($marker, @groups) = Tracer::GetLine($fh);
713                if (! defined($marker)) {
714                    Confess("End of file found where group record expected.");
715                } elsif ($marker ne '#GROUPS') {
716                    Confess("Group record not found after key record.");
717                } else {
718                    $retVal->Add(memberships => scalar(@groups));
719                    # Connect the groups.
720                    for my $group (@groups) {
721                        # Find out if this is a new group.
722                        if (! $groups{$group}) {
723                            $retVal->Add(newGroup => 1);
724                            # Add the group.
725                            $self->InsertObject('AttributeGroup', { id => $group });
726                            Trace("Group $group created.") if T(3);
727                            # Make sure we know it's not new.
728                            $groups{$group} = 1;
729                        }
730                        # Connect the group to our key.
731                        $self->InsertObject('IsInGroup', { 'from-link' => $id, 'to-link' => $group });
732                    }
733                    Trace("$id added to " . scalar(@groups) . " groups.") if T(3);
734                }
735            }
736        }
737        # Log the operation.
738        $self->LogOperation("Backup Keys", $fileName, $retVal->Display());
739        # Return the result.
740        return $retVal;
741    }
742    
743  =item RETURN  =head3 ArchiveFileName
744    
745  Returns the HTML to create a form field that can be used to select an      my $fileName = $ca->ArchiveFileName();
 attribute from the custom attributes system.  
746    
747  =back  Compute a file name for archiving attribute input data. The file will be in the attribute log directory
748    
749  =cut  =cut
750    
751  sub FieldMenu {  sub ArchiveFileName {
752      # Get the parameters.      # Get the parameters.
753      my ($self, $cgi, $height, $name, $newFlag, $noteControl, $typeControl) = @_;      my ($self) = @_;
754      # These next two hashes make everything happen. "entities"      # Declare the return variable.
755      # maps each entity name to the list of values to be put into its      my $retVal;
756      # option group. "labels" maps each entity name to a map from values      # We start by turning the timestamp into something usable as a file name.
757      # to labels.      my $now = Tracer::Now();
758      my @entityNames = sort ($self->GetEntityTypes());      $now =~ tr/ :\//___/;
759      my %entities = map { $_ => [] } @entityNames;      # Next we get the directory name.
760      my %labels = map { $_ => { }} @entityNames;      my $dir = "$FIG_Config::var/attributes";
761      # Loop through the entities, adding the existing attributes.      if (! -e $dir) {
762      for my $entity (@entityNames) {          Trace("Creating attribute file directory $dir.") if T(1);
763          # Get this entity's field table.          mkdir $dir;
764          my $fieldHash = $self->GetFieldTable($entity);      }
765          # Get its field list in our local hashes.      # Put it together with the field name and the time stamp.
766          my $fieldList = $entities{$entity};      $retVal = "$dir/upload.$now";
767          my $labelList = $labels{$entity};      # Modify the file name to insure it's unique.
768          # Add the NEW fields if we want them.      my $seq = 0;
769          if ($newFlag) {      while (-e "$retVal.$seq.tbl") { $seq++ }
770              push @{$fieldList}, $entity;      # Use the computed sequence number to get the correct file name.
771              $labelList->{$entity} = "(new)";      $retVal .= ".$seq.tbl";
         }  
         # Loop through the fields in the hash. We only keep the ones with a  
         # secondary relation name. (In other words, the name of the relation  
         # in which the field appears cannot be the same as the entity name.)  
         for my $fieldName (sort keys %{$fieldHash}) {  
             if ($fieldHash->{$fieldName}->{relation} ne $entity) {  
                 my $value = "$entity/$fieldName";  
                 push @{$fieldList}, $value;  
                 $labelList->{$value} = $fieldName;  
             }  
         }  
     }  
     # Now we have a hash and a list for each entity, and they correspond  
     # exactly to what the $cgi->optgroup function expects.  
     # The last step is to create the name for the onChange function. This function  
     # may not do anything, but we need to know the name to generate the HTML  
     # for the menu.  
     my $changeName = "${name}_setNotes";  
     my $retVal = $cgi->popup_menu({name => $name,  
                                    size => $height,  
                                    onChange => "$changeName(this.value)",  
                                    values => [map { $cgi->optgroup(-name => $_,  
                                                                    -values => $entities{$_},  
                                                                    -labels => $labels{$_})  
                                                   } @entityNames]}  
                                  );  
     # Create the change function.  
     $retVal .= "\n<script language=\"javascript\">\n";  
     $retVal .= "    function $changeName(fieldValue) {\n";  
     # The function only has a body if we have a notes control to store the description.  
     if ($noteControl || $typeControl) {  
         # Check to see if we're storing HTML or text into the note control.  
         my $htmlMode = ($noteControl && $noteControl =~ /innerHTML$/);  
         # We use a CASE statement based on the newly-selected field value. The  
         # field description will be stored in the JavaScript variable "myText"  
         # and the data type in "myType". Note the default data type is a normal  
         # string, but the default notes is an empty string.  
         $retVal .= "        var myText = \"\";\n";  
         $retVal .= "        var myType = \"string\";\n";  
         $retVal .= "        switch (fieldValue) {\n";  
         # Loop through the entities.  
         for my $entity (@entityNames) {  
             # Get the entity's field hash. This has the notes in it.  
             my $fieldHash = $self->GetFieldTable($entity);  
             # Loop through the values we might see for this entity's fields.  
             my $fields = $entities{$entity};  
             for my $value (@{$fields}) {  
                 # Only proceed if we have an existing field.  
                 if ($value =~ m!/(.+)$!) {  
                     # Get the field's hash element.  
                     my $element = $fieldHash->{$1};  
                     # Generate this case.  
                     $retVal .= "        case \"$value\" :\n";  
                     # Here we either want to update the note display, the  
                     # type display, or both.  
                     if ($noteControl) {  
                         # Here we want the notes updated.  
                         my $notes = $element->{Notes}->{content};  
                         # Insure it's in the proper form.  
                         if ($htmlMode) {  
                             $notes = ERDB::HTMLNote($notes);  
                         }  
                         # Escape it for use as a string literal.  
                         $notes =~ s/\n/\\n/g;  
                         $notes =~ s/"/\\"/g;  
                         $retVal .= "           myText = \"$notes\";\n";  
                     }  
                     if ($typeControl) {  
                         # Here we want the type updated.  
                         my $type = $element->{type};  
                         $retVal .= "           myType = \"$type\";\n";  
                     }  
                     # Close this case.  
                     $retVal .= "           break;\n";  
                 }  
             }  
         }  
         # Close the CASE statement and make the appropriate assignments.  
         $retVal .= "        }\n";  
         if ($noteControl) {  
             $retVal .= "        $noteControl = myText;\n";  
         }  
         if ($typeControl) {  
             $retVal .= "        $typeControl = myType;\n";  
         }  
     }  
     # Terminate the change function.  
     $retVal .= "    }\n";  
     $retVal .= "</script>\n";  
772      # Return the result.      # Return the result.
773      return $retVal;      return $retVal;
774  }  }
775    
776  =head3 MatchSqlPattern  =head3 BackupAllAttributes
777    
778  C<< my $matched = CustomAttributes::MatchSqlPattern($value, $pattern); >>      my $stats = $attrDB->BackupAllAttributes($fileName, %options);
779    
780  Determine whether or not a specified value matches an SQL pattern. An SQL  Backup all of the attributes to a file. The attributes will be stored in a
781  pattern has two wild card characters: C<%> that matches multiple characters,  tab-delimited file suitable for reloading via L</LoadAttributesFrom>.
 and C<_> that matches a single character. These can be escaped using a  
 backslash (C<\>). We pull this off by converting the SQL pattern to a  
 PERL regular expression. As per SQL rules, the match is case-insensitive.  
782    
783  =over 4  =over 4
784    
785  =item value  =item fileName
786    
787  Value to be matched against the pattern. Note that an undefined or empty  Name of the file to which the attribute data should be backed up.
 value will not match anything.  
788    
789  =item pattern  =item options
790    
791  SQL pattern against which to match the value. An undefined or empty pattern will  Hash of options for the backup.
 match everything.  
792    
793  =item RETURN  =item RETURN
794    
795  Returns TRUE if the value and pattern match, else FALSE.  Returns a statistics object describing the backup.
796    
797  =back  =back
798    
799    Currently there are no options defined.
800    
801  =cut  =cut
802    
803  sub MatchSqlPattern {  sub BackupAllAttributes {
804      # Get the parameters.      # Get the parameters.
805      my ($value, $pattern) = @_;      my ($self, $fileName, %options) = @_;
806      # Declare the return variable.      # Declare the return variable.
807      my $retVal;      my $retVal = Stats->new();
808      # Insure we have a pattern.      # Get a list of the keys.
809      if (! defined($pattern) || $pattern eq "") {      my %keys = map { $_->[0] => $_->[1] } $self->GetAll(['AttributeKey'],
810          $retVal = 1;                                                          "", [], ['AttributeKey(id)',
811      } else {                                                                    'AttributeKey(relationship-name)']);
812          # Break the pattern into pieces around the wildcard characters. Because we      Trace(scalar(keys %keys) . " keys found during backup.") if T(2);
813          # use parentheses in the split function's delimiter expression, we'll get      # Open the file for output.
814          # list elements for the delimiters as well as the rest of the string.      my $fh = Open(undef, ">$fileName");
815          my @pieces = split /([_%]|\\[_%])/, $pattern;      # Loop through the keys.
816          # Check some fast special cases.      for my $key (sort keys %keys) {
817          if ($pattern eq '%') {          Trace("Backing up attribute $key.") if T(3);
818              # A null pattern matches everything.          $retVal->Add(keys => 1);
819              $retVal = 1;          # Get the key's relevant relationship name.
820          } elsif (@pieces == 1) {          my $relName = $keys{$key};
821              # No wildcards, so we have a literal comparison. Note we're case-insensitive.          # Loop through this key's values.
822              $retVal = (lc($value) eq lc($pattern));          my $query = $self->Get([$relName], "$relName(from-link) = ?", [$key]);
823          } elsif (@pieces == 2 && $pieces[1] eq '%') {          my $valuesFound = 0;
824              # A wildcard at the end, so we have a substring match. This is also case-insensitive.          while (my $line = $query->Fetch()) {
825              $retVal = (lc(substr($value, 0, length($pieces[0]))) eq lc($pieces[0]));              $valuesFound++;
826          } else {              # Get this row's data.
827              # Okay, we have to do it the hard way. Convert each piece to a PERL pattern.              my ($id, $key, $subKey, $value) = $line->Values(["$relName(to-link)",
828              my $realPattern = "";                                                               "$relName(from-link)",
829              for my $piece (@pieces) {                                                               "$relName(subkey)",
830                  # Determine the type of piece.                                                               "$relName(value)"]);
831                  if ($piece eq "") {              # Check for a subkey.
832                      # Empty pieces are ignored.              if ($subKey ne '') {
833                  } elsif ($piece eq "%") {                  $key = "$key$self->{splitter}$subKey";
                     # Here we have a multi-character wildcard. Note that it can match  
                     # zero or more characters.  
                     $realPattern .= ".*"  
                 } elsif ($piece eq "_") {  
                     # Here we have a single-character wildcard.  
                     $realPattern .= ".";  
                 } elsif ($piece eq "\\%" || $piece eq "\\_") {  
                     # This is an escape sequence (which is a rare thing, actually).  
                     $realPattern .= substr($piece, 1, 1);  
                 } else {  
                     # Here we have raw text.  
                     $realPattern .= quotemeta($piece);  
                 }  
834              }              }
835              # Do the match.              # Write it to the file.
836              $retVal = ($value =~ /^$realPattern$/i ? 1 : 0);              Tracer::PutLine($fh, [$id, $key, Escape($value)]);
837          }          }
838            Trace("$valuesFound values backed up for key $key.") if T(3);
839            $retVal->Add(values => $valuesFound);
840      }      }
841        # Log the operation.
842        $self->LogOperation("Backup Data", $fileName, $retVal->Display());
843      # Return the result.      # Return the result.
844      return $retVal;      return $retVal;
845  }  }
846    
 =head3 MigrateAttributes  
847    
848  C<< CustomAttributes::MigrateAttributes($fig); >>  =head3 GetGroups
849    
850        my @groups = $attrDB->GetGroups();
851    
852    Return a list of the available groups.
853    
854    =cut
855    
856    sub GetGroups {
857        # Get the parameters.
858        my ($self) = @_;
859        # Get the groups.
860        my @retVal = $self->GetFlat(['AttributeGroup'], "", [], 'AttributeGroup(id)');
861        # Return them.
862        return @retVal;
863    }
864    
865    =head3 GetAttributeData
866    
867        my %keys = $attrDB->GetAttributeData($type, @list);
868    
869  Migrate all the attributes data from the specified FIG instance. This is a long, slow  Return attribute data for the selected attributes. The attribute
870  method used to convert the old attribute data to the new system. Only attribute  data is a hash mapping each attribute key name to a n-tuple containing the
871  keys that are not already in the database will be loaded, and only for entity instances  data type, the description, the table name, and the groups.
 current in the database. To get an accurate capture of the attributes in the given  
 instance, you may want to clear the database and the DBD before starting and  
 run L</Refresh> to populate the entities.  
872    
873  =over 4  =over 4
874    
875  =item fig  =item type
876    
877    Type of attribute criterion: C<name> for attributes whose names begin with the
878    specified string, or C<group> for attributes in the specified group.
879    
880    =item list
881    
882    List containing the names of the groups or keys for the desired attributes.
883    
884    =item RETURN
885    
886  A FIG object that can be used to retrieve attributes for migration purposes.  Returns a hash mapping each attribute key name to its description,
887    table name, and parent groups.
888    
889  =back  =back
890    
891  =cut  =cut
892    
893  sub MigrateAttributes {  sub GetAttributeData {
894      # Get the parameters.      # Get the parameters.
895      my ($fig) = @_;      my ($self, $type, @list) = @_;
896      # Get a list of the objects to migrate. This requires connecting. Note we      # Set up a hash to store the attribute data.
897      # will map each entity type to a file name. The file will contain a list      my %retVal = ();
898      # of the object's IDs so we can get to them when we're not connected to      # Loop through the list items.
899      # the database.      for my $item (@list) {
900      my $ca = CustomAttributes->new();          # Set up a query for the desired attributes.
901      my %objects = map { $_ => "$FIG_Config::temp/$_.keys.tbl" } $ca->GetEntityTypes();          my $query;
902      # Set up hash of the existing attribute keys for each entity type.          if ($type eq 'name') {
903      my %oldKeys = ();              # Here we're doing a generic name search. We need to escape it and then tack
904      # Finally, we have a hash that counts the IDs for each entity type.              # on a %.
905      my %idCounts = map { $_ => 0 } keys %objects;              my $parm = $item;
906      # Loop through the list, creating key files to read back in.              $parm =~ s/_/\\_/g;
907      for my $entityType (keys %objects) {              $parm =~ s/%/\\%/g;
908          Trace("Retrieving keys for $entityType.") if T(2);              $parm .= "%";
909          # Create the key file.              # Ask for matching attributes. (Note that if the user passed in a null string
910          my $idFile = Open(undef, ">$objects{$entityType}");              # he'll get everything.)
911          # Loop through the keys.              $query = $self->Get(['AttributeKey'], "AttributeKey(id) LIKE ?", [$parm]);
912          my @ids = $ca->GetFlat([$entityType], "", [], "$entityType(id)");          } elsif ($type eq 'group') {
913          for my $id (@ids) {              $query = $self->Get(['IsInGroup', 'AttributeKey'], "IsInGroup(to-link) = ?", [$item]);
914              print $idFile "$id\n";          } else {
915          }              Confess("Unknown attribute query type \"$type\".");
916          close $idFile;          }
917          # In addition to the key file, we must get a list of attributes already          while (my $row = $query->Fetch()) {
918          # in the database. This avoids a circularity problem that might occur if the $fig              # Get this attribute's data.
919          # object is retrieving from the custom attributes database already.              my ($key, $relName, $notes) = $row->Values(['AttributeKey(id)',
920          my %fields = $ca->GetSecondaryFields($entityType);                                                       'AttributeKey(relationship-name)',
921          $oldKeys{$entityType} = \%fields;                                                       'AttributeKey(description)']);
922          # Finally, we have the ID count.              # If it's new, get its groups and add it to the return hash.
923          $idCounts{$entityType} = scalar @ids;              if (! exists $retVal{$key}) {
924      }                  my @groups = $self->GetFlat(['IsInGroup'], "IsInGroup(from-link) = ?",
925      # Release the custom attributes database so we can add attributes.                                              [$key], 'IsInGroup(to-link)');
926      undef $ca;                  $retVal{$key} = [$relName, $notes, @groups];
927      # Loop through the objects.              }
     for my $entityType (keys %objects) {  
         # Get a hash of all the attributes already in this database. These are  
         # left untouched.  
         my $myOldKeys = $oldKeys{$entityType};  
         # Create a hash to control the load file names for each attribute key we find.  
         my %keyHash = ();  
         # Set up some counters so we can trace our progress.  
         my ($totalIDs, $processedIDs, $keyCount, $valueCount) = ($idCounts{$entityType}, 0, 0, 0);  
         # Open this object's ID file.  
         Trace("Migrating data for $entityType. $totalIDs found.") if T(3);  
         my $keysIn = Open(undef, "<$objects{$entityType}");  
         while (my $id = <$keysIn>) {  
             # Remove the EOL characters.  
             chomp $id;  
             # Get this object's attributes.  
             my @allData = $fig->get_attributes($id);  
             Trace(scalar(@allData) . " attribute values found for $entityType($id).") if T(4);  
             # Loop through the attribute values one at a time.  
             for my $dataTuple (@allData) {  
                 # Get the key, value, and URL. We ignore the first element because that's the  
                 # object ID, and we already know the object ID.  
                 my (undef, $key, $value, $url) = @{$dataTuple};  
                 # Remove the buggy "1" for $url.  
                 if ($url eq "1") {  
                     $url = undef;  
                 }  
                 # Only proceed if this is not an old key.  
                 if (! $myOldKeys->{$key}) {  
                     # See if we've run into this key before.  
                     if (! exists $keyHash{$key}) {  
                         # Here we need to create the attribute key in the database.  
                         StoreAttributeKey($entityType, $key, 'text',  
                                           "Key migrated automatically from the FIG system. " .  
                                           "Please replace these notes as soon as possible " .  
                                           "with useful text."  
                                          );  
                         # Compute the attribute's load file name and open it for output.  
                         my $fileName = "$FIG_Config::temp/$entityType.$key.load.tbl";  
                         my $fh = Open(undef, ">$fileName");  
                         # Store the file name and handle.  
                         $keyHash{$key} = {h => $fh, name => $fileName};  
                         # Count this key.  
                         $keyCount++;  
                     }  
                     # Smash the value and the URL together.  
                     if (defined($url) && length($url) > 0) {  
                         $value .= "::$url";  
                     }  
                     # Write the attribute value to the load file.  
                     Tracer::PutLine($keyHash{$key}->{h}, [$id, $value]);  
                     $valueCount++;  
                 }  
             }  
             # Now we've finished all the attributes for this object. Count and trace it.  
             $processedIDs++;  
             if ($processedIDs % 500 == 0) {  
                 Trace("$processedIDs of $totalIDs ${entityType}s processed.") if T(3);  
                 Trace("$entityType has $keyCount keys and $valueCount values so far.") if T(3);  
             }  
         }  
         # Now we've finished all the attributes for all objects of this type.  
         Trace("$processedIDs ${entityType}s processed, with $keyCount keys and $valueCount values.") if T(2);  
         # Loop through the files, loading the keys into the database.  
         Trace("Connecting to database.") if T(2);  
         my $objectCA = CustomAttributes->new();  
         Trace("Loading key files.") if T(2);  
         for my $key (sort keys %keyHash) {  
             # Close the key's load file.  
             close $keyHash{$key}->{h};  
             # Reopen it for input.  
             my $fileName = $keyHash{$key}->{name};  
             my $fh = Open(undef, "<$fileName");  
             Trace("Loading $key from $fileName.") if T(3);  
             my $stats = $objectCA->LoadAttributeKey($entityType, $key, $fh, 0, 1);  
             Trace("Statistics for $key of $entityType:\n" . $stats->Show()) if T(3);  
928          }          }
         # All the keys for this entity type are now loaded.  
         Trace("Key files loaded for $entityType.") if T(2);  
929      }      }
930      # All keys for all entity types are now loaded.      # Return the result.
931      Trace("Migration complete.") if T(2);      return %retVal;
932  }  }
933    
934  =head3 ComputeObjectTypeFromID  =head3 LogOperation
935    
936  C<< my ($entityName, $id) = CustomAttributes::ComputeObjectTypeFromID($objectID); >>      $ca->LogOperation($action, $target, $description);
937    
938  This method will compute the entity type corresponding to a specified object ID.  Write an operation description to the attribute activity log (C<$FIG_Config::var/attributes.log>).
 If the object ID begins with C<fig|>, it is presumed to be a feature ID. If it  
 is all digits with a single period, it is presumed to by a genome ID. Otherwise,  
 it must be a list reference. In this last case the first list element will be  
 taken as the entity type and the second will be taken as the actual ID.  
939    
940  =over 4  =over 4
941    
942  =item objectID  =item action
943    
944  Object ID to examine.  Action being logged (e.g. C<Delete Group> or C<Load Key>).
945    
946  =item RETURN  =item target
947    
948    ID of the key or group affected.
949    
950  Returns a 2-element list consisting of the entity type followed by the specified ID.  =item description
951    
952    Short description of the action.
953    
954  =back  =back
955    
956  =cut  =cut
957    
958  sub ComputeObjectTypeFromID {  sub LogOperation {
959      # Get the parameters.      # Get the parameters.
960      my ($objectID) = @_;      my ($self, $action, $target, $description) = @_;
961      # Declare the return variables.      # Get the user ID.
962      my ($entityName, $id);      my $user = $self->{user};
963      # Only proceed if the object ID is defined. If it's not, we'll be returning a      # Get a timestamp.
964      # pair of undefs.      my $timeString = Tracer::Now();
965      if ($objectID) {      # Open the log file for appending.
966          if (ref $objectID eq 'ARRAY') {      my $oh = Open(undef, ">>$FIG_Config::var/attributes.log");
967              # Here we have the new-style list reference. Pull out its pieces.      # Write the data to it.
968              ($entityName, $id) = @{$objectID};      Tracer::PutLine($oh, [$timeString, $user, $action, $target, $description]);
969          } else {      # Close the log file.
970              # Here the ID is the outgoing ID, and we need to look at its structure      close $oh;
             # to determine the entity type.  
             $id = $objectID;  
             if ($objectID =~ /^\d+\.\d+/) {  
                 # Digits with a single period is a genome.  
                 $entityName = 'Genome';  
             } elsif ($objectID =~ /^fig\|/) {  
                 # The "fig|" prefix indicates a feature.  
                 $entityName = 'Feature';  
             } else {  
                 # Anything else is illegal!  
                 Confess("Invalid attribute ID specification \"$objectID\".");  
             }  
         }  
     }  
     # Return the result.  
     return ($entityName, $id);  
971  }  }
972    
973  =head2 FIG Method Replacements  =head2 FIG Method Replacements
974    
975  The following methods are used by B<FIG.pm> to replace the previous attribute functionality.  The following methods are used by B<FIG.pm> to replace the previous attribute functionality.
976  Some of the old functionality is no longer present. Controlled vocabulary is no longer  Some of the old functionality is no longer present: controlled vocabulary is no longer
977  supported and there is no longer any searching by URL. Fortunately, neither of these  supported and there is no longer any searching by URL. Fortunately, neither of these
978  capabilities were used in the old system.  capabilities were used in the old system.
979    
# Line 993  Line 981 
981  The idea is that these methods represent attribute manipulation allowed by all users, while  The idea is that these methods represent attribute manipulation allowed by all users, while
982  the others are only for privileged users with access to the attribute server.  the others are only for privileged users with access to the attribute server.
983    
984  In the previous implementation, an attribute had a value and a URL. In the new implementation,  In the previous implementation, an attribute had a value and a URL. In this implementation,
985  there is only a value. In this implementation, each attribute has only a value. These  each attribute has only a value. These methods will treat the value as a list with the individual
986  methods will treat the value as a list with the individual elements separated by the  elements separated by the value of the splitter parameter on the constructor (L</new>). The default
987  value of the splitter parameter on the constructor (L</new>). The default is double  is double colons C<::>.
 colons C<::>.  
988    
989  So, for example, an old-style keyword with a /value of C<essential> and a URL of  So, for example, an old-style keyword with a value of C<essential> and a URL of
990  C<http://www.sciencemag.org/cgi/content/abstract/293/5538/2266> using the default  C<http://www.sciencemag.org/cgi/content/abstract/293/5538/2266> using the default
991  splitter value would be stored as  splitter value would be stored as
992    
# Line 1010  Line 997 
997    
998  =head3 GetAttributes  =head3 GetAttributes
999    
1000  C<< my @attributeList = $attrDB->GetAttributes($objectID, $key, @valuePatterns); >>      my @attributeList = $attrDB->GetAttributes($objectID, $key, @values);
1001    
1002  In the database, attribute values are sectioned into pieces using a splitter  In the database, attribute values are sectioned into pieces using a splitter
1003  value specified in the constructor (L</new>). This is not a requirement of  value specified in the constructor (L</new>). This is not a requirement of
1004  the attribute system as a whole, merely a convenience for the purpose of  the attribute system as a whole, merely a convenience for the purpose of
1005  these methods. If you are using the static method calls instead of the  these methods. If a value has multiple sections, each section
1006  object-based calls, the splitter will always be the default value of  is matched against the corresponding criterion in the I<@valuePatterns> list.
 double colons (C<::>). If a value has multiple sections, each section  
 is matched against the correspond criterion in the I<@valuePatterns> list.  
1007    
1008  This method returns a series of tuples that match the specified criteria. Each tuple  This method returns a series of tuples that match the specified criteria. Each tuple
1009  will contain an object ID, a key, and one or more values. The parameters to this  will contain an object ID, a key, and one or more values. The parameters to this
1010  method therefore correspond structurally to the values expected in each tuple.  method therefore correspond structurally to the values expected in each tuple. In
1011    addition, you can ask for a generic search by suffixing a percent sign (C<%>) to any
1012    of the parameters. So, for example,
1013    
1014      my @attributeList = GetAttributes('fig|100226.1.peg.1004', 'structure%', 1, 2);      my @attributeList = $attrDB->GetAttributes('fig|100226.1.peg.1004', 'structure%', 1, 2);
1015    
1016  would return something like  would return something like
1017    
# Line 1033  Line 1020 
1020      ['fig}100226.1.peg.1004', 'structure2', 1, 2]      ['fig}100226.1.peg.1004', 'structure2', 1, 2]
1021      ['fig}100226.1.peg.1004', 'structureA', 1, 2]      ['fig}100226.1.peg.1004', 'structureA', 1, 2]
1022    
1023  Use of C<undef> in any position acts as a wild card (all values). In addition,  Use of C<undef> in any position acts as a wild card (all values). You can also specify
1024  the I<$key> and I<@valuePatterns> parameters can contain SQL pattern characters: C<%>, which  a list reference in the ID column. Thus,
1025  matches any sequence of characters, and C<_>, which matches any single character.  
1026  (You can use an escape sequence C<\%> or C<\_> to match an actual percent sign or      my @attributeList = $attrDB->GetAttributes(['100226.1', 'fig|100226.1.%'], 'PUBMED');
1027  underscore.)  
1028    would get the PUBMED attribute data for Streptomyces coelicolor A3(2) and all its
1029    features.
1030    
1031  In addition to values in multiple sections, a single attribute key can have multiple  In addition to values in multiple sections, a single attribute key can have multiple
1032  values, so even  values, so even
1033    
1034      my @attributeList = GetAttributes($peg, 'virulent');      my @attributeList = $attrDB->GetAttributes($peg, 'virulent');
1035    
1036  which has no wildcard in the key or the object ID, may return multiple tuples.  which has no wildcard in the key or the object ID, may return multiple tuples.
1037    
1038  For reasons of backward compatability, we examine the structure of the object ID to  Value matching in this system works very poorly, because of the way multiple values are
1039  determine the entity type. In that case the only two types allowed are C<Genome> and  stored. For the object ID, key name, and first value, we create queries that filter for the
1040  C<Feature>. An alternative method is to use a list reference, with the list consisting  desired results. On any filtering by value, we must do a comparison after the attributes are
1041  of an entity type name and the actual ID. Thus, the above example could equivalently  retrieved from the database, since the database has no notion of the multiple values, which
1042  be written as  are stored in a single string. As a result, queries in which filter only on value end up
1043    reading a lot more than they need to.
     my @attributeList = GetAttributes([Feature => $peg], 'virulent');  
   
 The list-reference approach allows us to add attributes to other entity types in  
 the future. Doing so, however, will require modifying the L</Refresh> method and  
 updated the database design XML.  
   
 The list-reference approach also allows for a more fault-tolerant approach to  
 getting all objects with a particular attribute.  
   
     my @attributeList = GetAttributes([Feature => undef], 'virulent');  
   
 will only return feature attributes, while  
   
     my @attributeList = GetAttributes(undef, 'virulent');  
   
 could at some point in the future get you attributes for genomes or even subsystems  
 as well as features.  
1044    
1045  =over 4  =over 4
1046    
1047  =item objectID  =item objectID
1048    
1049  ID of the genome or feature whose attributes are desired. In general, an ID that  ID of object whose attributes are desired. If the attributes are desired for multiple
1050  starts with C<fig|> is treated as a feature ID, and an ID that is all digits with a  objects, this parameter can be specified as a list reference. If the attributes are
1051  single period is treated as a genome ID. For other entity types, use a list reference; in  desired for all objects, specify C<undef> or an empty string. Finally, you can specify
1052  this case the first list element is the entity type and the second is the ID. A value of  attributes for a range of object IDs by putting a percent sign (C<%>) at the end.
 C<undef> or an empty string here will match all objects.  
1053    
1054  =item key  =item key
1055    
1056  Attribute key name. Since attributes are stored as fields in the database with a  Attribute key name. A value of C<undef> or an empty string will match all
1057  field name equal to the key name, it is very fast to find a list of all the  attribute keys. If the values are desired for multiple keys, this parameter can be
1058  matching keys. Each key's values require a separate query, however, which may  specified as a list reference. Finally, you can specify attributes for a range of
1059  be a performance problem if the pattern matches a lot of keys. Wild cards are  keys by putting a percent sign (C<%>) at the end.
 acceptable here, and a value of C<undef> or an empty string will match all  
 attribute keys.  
1060    
1061  =item valuePatterns  =item values
1062    
1063  List of the desired attribute values, section by section. If C<undef>  List of the desired attribute values, section by section. If C<undef>
1064  or an empty string is specified, all values in that section will match.  or an empty string is specified, all values in that section will match. A
1065    generic match can be requested by placing a percent sign (C<%>) at the end.
1066    In that case, all values that match up to and not including the percent sign
1067    will match. You may also specify a regular expression enclosed
1068    in slashes. All values that match the regular expression will be returned. For
1069    performance reasons, only values have this extra capability.
1070    
1071  =item RETURN  =item RETURN
1072    
# Line 1107  Line 1081 
1081    
1082  sub GetAttributes {  sub GetAttributes {
1083      # Get the parameters.      # Get the parameters.
1084      my ($self, $objectID, $key, @valuePatterns) = @_;      my ($self, $objectID, $key, @values) = @_;
1085      # Declare the return variable.      # Declare the return variable.
1086      my @retVal = ();      my @retVal = ();
1087      # Determine the entity types for our search.      # Insure we have at least some sort of filtering going on.
1088      my @objects = ();      if (! grep { defined $_ } $objectID, $key, @values) {
1089      my ($actualObjectID, $computedType);          Confess("No filters specified in GetAttributes call.");
1090      if (! $objectID) {      } else {
1091          push @objects, $self->GetEntityTypes();          # This hash will map value-table fields to patterns. We use it to build the
1092      } else {          # SQL statement.
1093          ($computedType, $actualObjectID) = ComputeObjectTypeFromID($objectID);          my %data;
1094          push @objects, $computedType;          # Add the object ID to the key information.
1095      }          $data{'to-link'} = $objectID;
1096      # Loop through the entity types.          # The first value represents a problem, because we can search it using SQL, but not
1097      for my $entityType (@objects) {          # in the normal way. If the user specifies a generic search or exact match for
1098          # Now we need to find all the matching keys. The keys are actually stored in          # every alternative value (remember, the values may be specified as a list),
1099          # our database object, so this process is fast. Note that our          # then we can create SQL filtering for it. If any of the values are specified
1100          # MatchSqlPattern method          # as a regular expression, however, that's more complicated, because
1101          my %secondaries = $self->GetSecondaryFields($entityType);          # we need to read every value to verify a match.
1102          my @fieldList = grep { MatchSqlPattern($_, $key) } keys %secondaries;          if (@values > 0 && defined $values[0]) {
1103          # Now we figure out whether or not we need to filter by object. We will always              # Get the first value and put its alternatives in an array.
1104          # filter by key to a limited extent, so if we're filtering by object we need an              my $valueParm = $values[0];
1105          # AND to join the object ID filter with the key filter.              my @valueList;
1106          my $filter = "";              if (ref $valueParm eq 'ARRAY') {
1107          my @params = ();                  @valueList = @{$valueParm};
1108          if (defined($actualObjectID)) {              } else {
1109              # Here the caller wants to filter on object ID. Check for a pattern.                  @valueList = ($valueParm);
1110              my $comparator = ($actualObjectID =~ /%/ ? "LIKE" : "=");              }
1111              # Update the filter and the parameter list.              # Okay, now we have all the possible criteria for the first value in the list
1112              $filter = "$entityType(id) $comparator ? AND ";              # @valueList. We'll copy the values to a new array in which they have been
1113              push @params, $actualObjectID;              # converted to generic requests. If we find a regular-expression match
1114          }              # anywhere in the list, we toss the whole thing.
1115          # It's time to begin making queries. We process one attribute key at a time, because              my @valuePatterns = ();
1116          # each attribute is actually a different field in the database. We know here that              my $okValues = 1;
1117          # all the keys we've collected are for the correct entity because we got them from              for my $valuePattern (@valueList) {
1118          # the DBD. That's a good thing, because an invalid key name will cause an SQL error.                  # Check the pattern type.
1119          for my $key (@fieldList) {                  if (substr($valuePattern, 0, 1) eq '/') {
1120              # Get all of the attribute values for this key.                      # Regular expressions invalidate the entire process.
1121              my @dataRows = $self->GetAll([$entityType], "$filter$entityType($key) IS NOT NULL",                      $okValues = 0;
1122                                           \@params, ["$entityType(id)", "$entityType($key)"]);                  } elsif (substr($valuePattern, -1, 1) eq '%') {
1123              # Process each value separately. We need to verify the values and reformat the                      # A Generic pattern is passed in unmodified.
1124              # tuples. Note that GetAll will give us one row per matching object ID,                      push @valuePatterns, $valuePattern;
1125              # with the ID first followed by a list of the data values. This is very                  } else {
1126              # different from the structure we'll be returning, which has one row                      # An exact match is converted to generic.
1127              # per value.                      push @valuePatterns, "$valuePattern%";
1128              for my $dataRow (@dataRows) {                  }
1129                  # Get the object ID and the list of values.              }
1130                  my ($rowObjectID, @dataValues) = @{$dataRow};              # If everything works, add the value data to the filtering hash.
1131                  # Loop through the values. There will be one result row per attribute value.              if ($okValues) {
1132                  for my $dataValue (@dataValues) {                  $data{value} = \@valuePatterns;
1133                      # Separate this value into sections.              }
1134                      my @sections = split("::", $dataValue);          }
1135                      # Loop through the value patterns, looking for a mismatch. Note that          # Now comes the really tricky part, which is key handling. The key is
1136                      # since we're working through parallel arrays, we are using an index          # actually split in two parts: the real key and a sub-key. The real key
1137                      # loop. As soon as a match fails we stop checking. This means that          # determines which value table contains the relevant values. The information
1138                      # if the value pattern list is longer than the number of sections,          # we need is kept in here.
1139                      # we will fail as soon as we run out of sections.          my %tables = map { $_ => [] } $self->_GetAllTables();
1140                      my $match = 1;          # See if we have any key filtering to worry about.
1141                      for (my $i = 0; $i <= $#valuePatterns && $match; $i++) {          if ($key) {
1142                          $match = MatchSqlPattern($sections[$i], $valuePatterns[$i]);              # Here we have either a single key or a list. We convert both cases to a list.
1143                      }              my $keyList = (ref $key ne 'ARRAY' ? [$key] : $key);
1144                      # If we match, we save this value in the output list.              Trace("Reading key table.") if T(3);
1145                      if ($match) {              # Get easy access to the key/table hash.
1146                          push @retVal, [$rowObjectID, $key, @sections];              my $keyTableHash = $self->_KeyTable();
1147                      }              # Loop through the keys, discovering tables.
1148                  }              for my $keyChoice (@$keyList) {
1149                  # Here we've processed all the attribute values for the current object ID.                  # Now we have to start thinking about the real key and the subkeys.
1150                    my ($realKey, $subKey) = $self->_SplitKeyPattern($keyChoice);
1151                    Trace("Checking $realKey against key table.") if T(3);
1152                    # Find the matches for the real key in the key hash. For each of
1153                    # these, we memorize the table name in the hash below.
1154                    my %tableNames = ();
1155                    for my $keyInTable (keys %{$keyTableHash}) {
1156                        if (_CheckSQLPattern($realKey, $keyInTable)) {
1157                            $tableNames{$keyTableHash->{$key}} = 1;
1158                        }
1159                    }
1160                    # If the key is generic, or didn't match anything, add
1161                    # the default table to the mix.
1162                    if (keys %tableNames == 0 || $keyChoice =~ /%/) {
1163                        $tableNames{$self->{defaultRel}} = 1;
1164                    }
1165                    # Now we add this key combination to the key list for each relevant table.
1166                    for my $tableName (keys %tableNames) {
1167                        push @{$tables{$tableName}}, [$realKey, $subKey];
1168                    }
1169                }
1170            }
1171            # Now we loop through the tables of interest, performing queries.
1172            # Loop through the tables.
1173            for my $table (keys %tables) {
1174                # Get the key pairs for this table.
1175                my $pairs = $tables{$table};
1176                # Does this table have data? It does if there is no key specified or
1177                # it has at least one key pair.
1178                my $pairCount = scalar @{$pairs};
1179                Trace("Pair count for table $table is $pairCount.") if T(3);
1180                if ($pairCount || ! $key) {
1181                    # Create some lists to contain the filter fragments and parameter values.
1182                    my @filter = ();
1183                    my @parms = ();
1184                    # This next loop goes through the different fields that can be specified in the
1185                    # parameter list and generates filters for each. The %data hash that we built above
1186                    # contains most of the necessary information to do this. When we're done, we'll
1187                    # paste on stuff for the key pairs.
1188                    for my $field (keys %data) {
1189                        # Accumulate filter information for this field. We will OR together all the
1190                        # elements accumulated to create the final result.
1191                        my @fieldFilter = ();
1192                        # Get the specified filter for this field.
1193                        my $fieldPattern = $data{$field};
1194                        # Only proceed if the pattern is one that won't match everything.
1195                        if (defined($fieldPattern) && $fieldPattern ne "" && $fieldPattern ne "%") {
1196                            # Convert the pattern to an array.
1197                            my @patterns = ();
1198                            if (ref $fieldPattern eq 'ARRAY') {
1199                                push @patterns, @{$fieldPattern};
1200                            } else {
1201                                push @patterns, $fieldPattern;
1202                            }
1203                            # Only proceed if the array is nonempty. The loop will work fine if the
1204                            # array is empty, but when we build the filter string at the end we'll
1205                            # get "()" in the filter list, which will result in an SQL syntax error.
1206                            if (@patterns) {
1207                                # Loop through the individual patterns.
1208                                for my $pattern (@patterns) {
1209                                    my ($clause, $value) = _WherePart($table, $field, $pattern);
1210                                    push @fieldFilter, $clause;
1211                                    push @parms, $value;
1212                                }
1213                                # Form the filter for this field.
1214                                my $fieldFilterString = join(" OR ", @fieldFilter);
1215                                push @filter, "($fieldFilterString)";
1216                            }
1217                        }
1218              }              }
1219              # Here we've processed all the rows returned by GetAll. In general, there will                  # The final filter is for the key pairs. Only proceed if we have some.
1220              # be one row per object ID.                  if ($pairCount) {
1221                        # We'll accumulate pair filter clauses in here.
1222                        my @pairFilters = ();
1223                        # Loop through the key pairs.
1224                        for my $pair (@$pairs) {
1225                            my ($realKey, $subKey) = @{$pair};
1226                            my ($realClause, $realValue) = _WherePart($table, 'from-link', $realKey);
1227                            if (! $subKey) {
1228                                # Here the subkey is wild, so only the real key matters.
1229                                push @pairFilters, $realClause;
1230                                push @parms, $realValue;
1231                            } else {
1232                                # Here we have to select on both keys.
1233                                my ($subClause, $subValue) = _WherePart($table, 'subkey', $subKey);
1234                                push @pairFilters, "($realClause AND $subClause)";
1235                                push @parms, $realValue, $subValue;
1236                            }
1237                        }
1238                        # Join the pair filters together to make a giant key filter.
1239                        my $pairFilter = "(" . join(" OR ", @pairFilters) . ")";
1240                        push @filter, $pairFilter;
1241                    }
1242                    # At this point, @filter contains one or more filter strings and @parms
1243                    # contains the parameter values to bind to them.
1244                    my $actualFilter = join(" AND ", @filter);
1245                    # Now we're ready to make our query.
1246                    my $query = $self->Get([$table], $actualFilter, \@parms);
1247                    # Format the results.
1248                    push @retVal, $self->_QueryResults($query, $table, @values);
1249          }          }
         # Here we've processed all the matching attribute keys.  
1250      }      }
1251      # Here we've processed all the entity types. That means @retVal has all the matching      }
1252      # results.      # The above loop ran the query for each necessary value table and merged the
1253        # results into @retVal. Now we return the rows found.
1254      return @retVal;      return @retVal;
1255  }  }
1256    
1257  =head3 AddAttribute  =head3 AddAttribute
1258    
1259  C<< $attrDB->AddAttribute($objectID, $key, @values); >>      $attrDB->AddAttribute($objectID, $key, @values);
1260    
1261  Add an attribute key/value pair to an object. This method cannot add a new key, merely  Add an attribute key/value pair to an object. This method cannot add a new key, merely
1262  add a value to an existing key. Use L</StoreAttributeKey> to create a new key.  add a value to an existing key. Use L</StoreAttributeKey> to create a new key.
# Line 1195  Line 1265 
1265    
1266  =item objectID  =item objectID
1267    
1268  ID of the genome or feature to which the attribute is to be added. In general, an ID that  ID of the object to which the attribute is to be added.
 starts with C<fig|> is treated as a feature ID, and an ID that is all digits and periods  
 is treated as a genome ID. For IDs of other types, this parameter should be a reference  
 to a 2-tuple consisting of the entity type name followed by the object ID.  
1269    
1270  =item key  =item key
1271    
1272  Attribute key name. This corresponds to the name of a field in the database.  Attribute key name.
1273    
1274  =item values  =item values
1275    
# Line 1225  Line 1292 
1292      } elsif (! @values) {      } elsif (! @values) {
1293          Confess("No values specified in AddAttribute call for key $key.");          Confess("No values specified in AddAttribute call for key $key.");
1294      } else {      } else {
1295          # Okay, now we have some reason to believe we can do this. Start by          # Okay, now we have some reason to believe we can do this. Form the values
1296          # computing the object type and ID.          # into a scalar.
         my ($entityName, $id) = ComputeObjectTypeFromID($objectID);  
         # Form the values into a scalar.  
1297          my $valueString = join($self->{splitter}, @values);          my $valueString = join($self->{splitter}, @values);
1298          # Insert the value.          # Split up the key.
1299          $self->InsertValue($id, "$entityName($key)", $valueString);          my ($realKey, $subKey) = $self->SplitKey($key);
1300            # Find the table containing the key.
1301            my $table = $self->_KeyTable($realKey);
1302            # Connect the object to the key.
1303            $self->InsertObject($table, { 'from-link' => $realKey,
1304                                                 'to-link'   => $objectID,
1305                                                 'subkey'    => $subKey,
1306                                                 'value'     => $valueString,
1307                                           });
1308      }      }
1309      # Return a one. We do this for backward compatability.      # Return a one, indicating success. We do this for backward compatability.
1310      return 1;      return 1;
1311  }  }
1312    
1313  =head3 DeleteAttribute  =head3 DeleteAttribute
1314    
1315  C<< $attrDB->DeleteAttribute($objectID, $key, @values); >>      $attrDB->DeleteAttribute($objectID, $key, @values);
1316    
1317  Delete the specified attribute key/value combination from the database.  Delete the specified attribute key/value combination from the database.
1318    
 The first form will connect to the database and release it. The second form  
 uses the database connection contained in the object.  
   
1319  =over 4  =over 4
1320    
1321  =item objectID  =item objectID
1322    
1323  ID of the genome or feature to which the attribute is to be added. In general, an ID that  ID of the object whose attribute is to be deleted.
 starts with C<fig|> is treated as a feature ID, and an ID that is all digits and periods  
 is treated as a genome ID. For IDs of other types, this parameter should be a reference  
 to a 2-tuple consisting of the entity type name followed by the object ID.  
1324    
1325  =item key  =item key
1326    
1327  Attribute key name. This corresponds to the name of a field in the database.  Attribute key name.
1328    
1329  =item values  =item values
1330    
1331  One or more values to be associated with the key.  One or more values associated with the key. If no values are specified, then all values
1332    will be deleted. Otherwise, only a matching value will be deleted.
1333    
1334  =back  =back
1335    
# Line 1275  Line 1343 
1343          Confess("No object ID specified for DeleteAttribute call.");          Confess("No object ID specified for DeleteAttribute call.");
1344      } elsif (! defined($key)) {      } elsif (! defined($key)) {
1345          Confess("No attribute key specified for DeleteAttribute call.");          Confess("No attribute key specified for DeleteAttribute call.");
     } elsif (! @values) {  
         Confess("No values specified in DeleteAttribute call for key $key.");  
1346      } else {      } else {
1347          # Now compute the object type and ID.          # Split the key into the real key and the subkey.
1348          my ($entityName, $id) = ComputeObjectTypeFromID($objectID);          my ($realKey, $subKey) = $self->SplitKey($key);
1349          # Form the values into a scalar.          # Find the table containing the key's values.
1350            my $table = $self->_KeyTable($realKey);
1351            if ($subKey eq '' && scalar(@values) == 0) {
1352                # Here we erase the entire key for this object.
1353                $self->DeleteRow('HasValueFor', $key, $objectID);
1354            } else {
1355                # Here we erase the matching values.
1356          my $valueString = join($self->{splitter}, @values);          my $valueString = join($self->{splitter}, @values);
1357          # Delete the value.              $self->DeleteRow('HasValueFor', $realKey, $objectID,
1358          $self->DeleteValue($entityName, $id, $key, $valueString);                               { subkey => $subKey, value => $valueString });
1359            }
1360      }      }
1361      # Return a one. This is for backward compatability.      # Return a one. This is for backward compatability.
1362      return 1;      return 1;
1363  }  }
1364    
1365    =head3 DeleteMatchingAttributes
1366    
1367        my @deleted = $attrDB->DeleteMatchingAttributes($objectID, $key, @values);
1368    
1369    Delete all attributes that match the specified criteria. This is equivalent to
1370    calling L</GetAttributes> and then invoking L</DeleteAttribute> for each
1371    row found.
1372    
1373    =over 4
1374    
1375    =item objectID
1376    
1377    ID of object whose attributes are to be deleted. If the attributes for multiple
1378    objects are to be deleted, this parameter can be specified as a list reference. If
1379    attributes are to be deleted for all objects, specify C<undef> or an empty string.
1380    Finally, you can delete attributes for a range of object IDs by putting a percent
1381    sign (C<%>) at the end.
1382    
1383    =item key
1384    
1385    Attribute key name. A value of C<undef> or an empty string will match all
1386    attribute keys. If the values are to be deletedfor multiple keys, this parameter can be
1387    specified as a list reference. Finally, you can delete attributes for a range of
1388    keys by putting a percent sign (C<%>) at the end.
1389    
1390    =item values
1391    
1392    List of the desired attribute values, section by section. If C<undef>
1393    or an empty string is specified, all values in that section will match. A
1394    generic match can be requested by placing a percent sign (C<%>) at the end.
1395    In that case, all values that match up to and not including the percent sign
1396    will match. You may also specify a regular expression enclosed
1397    in slashes. All values that match the regular expression will be deleted. For
1398    performance reasons, only values have this extra capability.
1399    
1400    =item RETURN
1401    
1402    Returns a list of tuples for the attributes that were deleted, in the
1403    same form as L</GetAttributes>.
1404    
1405    =back
1406    
1407    =cut
1408    
1409    sub DeleteMatchingAttributes {
1410        # Get the parameters.
1411        my ($self, $objectID, $key, @values) = @_;
1412        # Get the matching attributes.
1413        my @retVal = $self->GetAttributes($objectID, $key, @values);
1414        # Loop through the attributes, deleting them.
1415        for my $tuple (@retVal) {
1416            $self->DeleteAttribute(@{$tuple});
1417        }
1418        # Log this operation.
1419        my $count = @retVal;
1420        $self->LogOperation("Mass Delete", $key, "$count matching attributes deleted.");
1421        # Return the deleted attributes.
1422        return @retVal;
1423    }
1424    
1425  =head3 ChangeAttribute  =head3 ChangeAttribute
1426    
1427  C<< $attrDB->ChangeAttribute($objectID, $key, \@oldValues, \@newValues); >>      $attrDB->ChangeAttribute($objectID, $key, \@oldValues, \@newValues);
1428    
1429  Change the value of an attribute key/value pair for an object.  Change the value of an attribute key/value pair for an object.
1430    
# Line 1333  Line 1466 
1466      } elsif (! defined($newValues) || ref $newValues ne 'ARRAY') {      } elsif (! defined($newValues) || ref $newValues ne 'ARRAY') {
1467          Confess("No new values specified in ChangeAttribute call for key $key.");          Confess("No new values specified in ChangeAttribute call for key $key.");
1468      } else {      } else {
1469          # Okay, now we do the change as a delete/add.          # We do the change as a delete/add.
1470          $self->DeleteAttribute($objectID, $key, @{$oldValues});          $self->DeleteAttribute($objectID, $key, @{$oldValues});
1471          $self->AddAttribute($objectID, $key, @{$newValues});          $self->AddAttribute($objectID, $key, @{$newValues});
1472      }      }
# Line 1343  Line 1476 
1476    
1477  =head3 EraseAttribute  =head3 EraseAttribute
1478    
1479  C<< $attrDB->EraseAttribute($entityName, $key); >>      $attrDB->EraseAttribute($key);
1480    
1481  Erase all values for the specified attribute key. This does not remove the  Erase all values for the specified attribute key. This does not remove the
1482  key from the database; it merely removes all the values.  key from the database; it merely removes all the values.
1483    
1484  =over 4  =over 4
1485    
 =item entityName  
   
 Name of the entity to which the key belongs. If undefined, all entities will be  
 examined for the desired key.  
   
1486  =item key  =item key
1487    
1488  Key to erase.  Key to erase. This must be a real key; that is, it cannot have a subkey
1489    component.
1490    
1491  =back  =back
1492    
# Line 1365  Line 1494 
1494    
1495  sub EraseAttribute {  sub EraseAttribute {
1496      # Get the parameters.      # Get the parameters.
1497      my ($self, $entityName, $key) = @_;      my ($self, $key) = @_;
1498      # Determine the relevant entity types.      # Find the table containing the key.
1499      my @objects = ();      my $table = $self->_KeyTable($key);
1500      if (! $entityName) {      # Is it the default table?
1501          push @objects, $self->GetEntityTypes();      if ($table eq $self->{defaultRel}) {
1502      } else {          # Yes, so the key is mixed in with other keys.
1503          push @objects, $entityName;          # Delete everything connected to it.
1504      }          $self->Disconnect('HasValueFor', 'AttributeKey', $key);
1505      # Loop through the entity types.      } else {
1506      for my $entityType (@objects) {          # No. Drop and re-create the table.
1507          # Now check for this key in this entity.          $self->TruncateTable($table);
         my %secondaries = $self->GetSecondaryFields($entityType);  
         if (exists $secondaries{$key}) {  
             # We found it, so delete all the values of the key.  
             $self->DeleteValue($entityType, undef, $key);  
         }  
1508      }      }
1509        # Log the operation.
1510        $self->LogOperation("Erase Data", $key);
1511      # Return a 1, for backward compatability.      # Return a 1, for backward compatability.
1512      return 1;      return 1;
1513  }  }
1514    
1515  =head3 GetAttributeKeys  =head3 GetAttributeKeys
1516    
1517  C<< my @keyList = $attrDB->GetAttributeKeys($entityName); >>      my @keyList = $attrDB->GetAttributeKeys($groupName);
1518    
1519  Return a list of the attribute keys for a particular entity type.  Return a list of the attribute keys for a particular group.
1520    
1521  =over 4  =over 4
1522    
1523  =item entityName  =item groupName
1524    
1525  Name of the entity whose keys are desired.  Name of the group whose keys are desired.
1526    
1527  =item RETURN  =item RETURN
1528    
1529  Returns a list of the attribute keys for the specified entity.  Returns a list of the attribute keys for the specified group.
1530    
1531  =back  =back
1532    
# Line 1408  Line 1534 
1534    
1535  sub GetAttributeKeys {  sub GetAttributeKeys {
1536      # Get the parameters.      # Get the parameters.
1537      my ($self, $entityName) = @_;      my ($self, $groupName) = @_;
1538      # Get the entity's secondary fields.      # Get the attributes for the specified group.
1539      my %keyList = $self->GetSecondaryFields($entityName);      my @groups = $self->GetFlat(['IsInGroup'], "IsInGroup(to-link) = ?", [$groupName],
1540                                    'IsInGroup(from-link)');
1541      # Return the keys.      # Return the keys.
1542      return sort keys %keyList;      return sort @groups;
1543    }
1544    
1545    =head3 QueryAttributes
1546    
1547        my @attributeData = $ca->QueryAttributes($filter, $filterParms);
1548    
1549    Return the attribute data based on an SQL filter clause. In the filter clause,
1550    the name C<$object> should be used for the object ID, C<$key> should be used for
1551    the key name, C<$subkey> for the subkey value, and C<$value> for the value field.
1552    
1553    =over 4
1554    
1555    =item filter
1556    
1557    Filter clause in the standard ERDB format, except that the field names are C<$object> for
1558    the object ID field, C<$key> for the key name field, C<$subkey> for the subkey field,
1559    and C<$value> for the value field. This abstraction enables us to hide the details of
1560    the database construction from the user.
1561    
1562    =item filterParms
1563    
1564    Parameters for the filter clause.
1565    
1566    =item RETURN
1567    
1568    Returns a list of tuples. Each tuple consists of an object ID, a key (with optional subkey), and
1569    one or more attribute values.
1570    
1571    =back
1572    
1573    =cut
1574    
1575    # This hash is used to drive the substitution process.
1576    my %AttributeParms = (object => 'to-link',
1577                          key    => 'from-link',
1578                          subkey => 'subkey',
1579                          value  => 'value');
1580    
1581    sub QueryAttributes {
1582        # Get the parameters.
1583        my ($self, $filter, $filterParms) = @_;
1584        # Declare the return variable.
1585        my @retVal = ();
1586        # Make sue we have filter parameters.
1587        my $realParms = (defined($filterParms) ? $filterParms : []);
1588        # Loop through all the value tables.
1589        for my $table ($self->_GetAllTables()) {
1590            # Create the query for this table by converting the filter.
1591            my $realFilter = $filter;
1592            for my $name (keys %AttributeParms) {
1593                $realFilter =~ s/\$$name/$table($AttributeParms{$name})/g;
1594            }
1595            my $query = $self->Get([$table], $realFilter, $realParms);
1596            # Loop through the results, forming the output attribute tuples.
1597            while (my $result = $query->Fetch()) {
1598                # Get the four values from this query result row.
1599                my ($objectID, $key, $subkey, $value) = $result->Values(["$table($AttributeParms{object})",
1600                                                                        "$table($AttributeParms{key})",
1601                                                                        "$table($AttributeParms{subkey})",
1602                                                                        "$table($AttributeParms{value})"]);
1603                # Combine the key and the subkey.
1604                my $realKey = ($subkey ? $key . $self->{splitter} . $subkey : $key);
1605                # Split the value.
1606                my @values = split $self->{splitter}, $value;
1607                # Output the result.
1608                push @retVal, [$objectID, $realKey, @values];
1609            }
1610        }
1611        # Return the result.
1612        return @retVal;
1613    }
1614    
1615    =head2 Key and ID Manipulation Methods
1616    
1617    =head3 ParseID
1618    
1619        my ($type, $id) = CustomAttributes::ParseID($idValue);
1620    
1621    Determine the type and object ID corresponding to an ID value from the attribute database.
1622    Most ID values consist of a type name and an ID, separated by a colon (e.g. C<Family:aclame|cluster10>);
1623    however, Genomes, Features, and Subsystems are not stored with a type name, so we need to
1624    deduce the type from the ID value structure.
1625    
1626    The theory here is that you can plug the ID and type directly into a Sprout database method, as
1627    follows
1628    
1629        my ($type, $id) = CustomAttributes::ParseID($attrList[$num]->[0]);
1630        my $target = $sprout->GetEntity($type, $id);
1631    
1632    =over 4
1633    
1634    =item idValue
1635    
1636    ID value taken from the attribute database.
1637    
1638    =item RETURN
1639    
1640    Returns a two-element list. The first element is the type of object indicated by the ID value,
1641    and the second element is the actual object ID.
1642    
1643    =back
1644    
1645    =cut
1646    
1647    sub ParseID {
1648        # Get the parameters.
1649        my ($idValue) = @_;
1650        # Declare the return variables.
1651        my ($type, $id);
1652        # Parse the incoming ID. We first check for the presence of an entity name. Entity names
1653        # can only contain letters, which helps to insure typed object IDs don't collide with
1654        # subsystem names (which are untyped).
1655        if ($idValue =~ /^([A-Za-z]+):(.+)/) {
1656            # Here we have a typed ID.
1657            ($type, $id) = ($1, $2);
1658            # Fix the case sensitivity on PDB IDs.
1659            if ($type eq 'PDB') { $id = lc $id; }
1660        } elsif ($idValue =~ /fig\|/) {
1661            # Here we have a feature ID.
1662            ($type, $id) = (Feature => $idValue);
1663        } elsif ($idValue =~ /\d+\.\d+/) {
1664            # Here we have a genome ID.
1665            ($type, $id) = (Genome => $idValue);
1666        } else {
1667            # The default is a subsystem ID.
1668            ($type, $id) = (Subsystem => $idValue);
1669        }
1670        # Return the results.
1671        return ($type, $id);
1672    }
1673    
1674    =head3 FormID
1675    
1676        my $idValue = CustomAttributes::FormID($type, $id);
1677    
1678    Convert an object type and ID pair into an object ID string for the attribute system. Subsystems,
1679    genomes, and features are stored in the database without type information, but all other object IDs
1680    must be prefixed with the object type.
1681    
1682    =over 4
1683    
1684    =item type
1685    
1686    Relevant object type.
1687    
1688    =item id
1689    
1690    ID of the object in question.
1691    
1692    =item RETURN
1693    
1694    Returns a string that will be recognized as an object ID in the attribute database.
1695    
1696    =back
1697    
1698    =cut
1699    
1700    sub FormID {
1701        # Get the parameters.
1702        my ($type, $id) = @_;
1703        # Declare the return variable.
1704        my $retVal;
1705        # Compute the ID string from the type.
1706        if (grep { $type eq $_ } qw(Feature Genome Subsystem)) {
1707            $retVal = $id;
1708        } else {
1709            $retVal = "$type:$id";
1710        }
1711        # Return the result.
1712        return $retVal;
1713    }
1714    
1715    =head3 GetTargetObject
1716    
1717        my $object = CustomAttributes::GetTargetObject($erdb, $idValue);
1718    
1719    Return the database object corresponding to the specified attribute object ID. The
1720    object type associated with the ID value must correspond to an entity name in the
1721    specified database.
1722    
1723    =over 4
1724    
1725    =item erdb
1726    
1727    B<ERDB> object for accessing the target database.
1728    
1729    =item idValue
1730    
1731    ID value retrieved from the attribute database.
1732    
1733    =item RETURN
1734    
1735    Returns a B<ERDBObject> for the attribute value's target object.
1736    
1737    =back
1738    
1739    =cut
1740    
1741    sub GetTargetObject {
1742        # Get the parameters.
1743        my ($erdb, $idValue) = @_;
1744        # Declare the return variable.
1745        my $retVal;
1746        # Get the type and ID for the target object.
1747        my ($type, $id) = ParseID($idValue);
1748        # Plug them into the GetEntity method.
1749        $retVal = $erdb->GetEntity($type, $id);
1750        # Return the resulting object.
1751        return $retVal;
1752    }
1753    
1754    =head3 SplitKey
1755    
1756        my ($realKey, $subKey) = $ca->SplitKey($key);
1757    
1758    Split an external key (that is, one passed in by a caller) into the real key and the sub key.
1759    The real and sub keys are separated by a splitter value (usually C<::>). If there is no splitter,
1760    then the sub key is presumed to be an empty string.
1761    
1762    =over 4
1763    
1764    =item key
1765    
1766    Incoming key to be split.
1767    
1768    =item RETURN
1769    
1770    Returns a two-element list, the first element of which is the real key and the second element of
1771    which is the sub key.
1772    
1773    =back
1774    
1775    =cut
1776    
1777    sub SplitKey {
1778        # Get the parameters.
1779        my ($self, $key) = @_;
1780        # Do the split.
1781        my ($realKey, $subKey) = split($self->{splitter}, $key, 2);
1782        # Insure the subkey has a value.
1783        if (! defined $subKey) {
1784            $subKey = '';
1785        }
1786        # Return the results.
1787        return ($realKey, $subKey);
1788    }
1789    
1790    
1791    =head3 JoinKey
1792    
1793        my $key = $ca->JoinKey($realKey, $subKey);
1794    
1795    Join a real key and a subkey together to make an external key. The external key is the attribute key
1796    used by the caller. The real key and the subkey are how the keys are represented in the database. The
1797    real key is the key to the B<AttributeKey> entity. The subkey is an attribute of the B<HasValueFor>
1798    relationship.
1799    
1800    =over 4
1801    
1802    =item realKey
1803    
1804    The real attribute key.
1805    
1806    =item subKey
1807    
1808    The subordinate portion of the attribute key.
1809    
1810    =item RETURN
1811    
1812    Returns a single string representing both keys.
1813    
1814    =back
1815    
1816    =cut
1817    
1818    sub JoinKey {
1819        # Get the parameters.
1820        my ($self, $realKey, $subKey) = @_;
1821        # Declare the return variable.
1822        my $retVal;
1823        # Check for a subkey.
1824        if ($subKey eq '') {
1825            # No subkey, so the real key is the key.
1826            $retVal = $realKey;
1827        } else {
1828            # Subkey found, so the two pieces must be joined by a splitter.
1829            $retVal = "$realKey$self->{splitter}$subKey";
1830        }
1831        # Return the result.
1832        return $retVal;
1833    }
1834    
1835    
1836    =head3 AttributeTable
1837    
1838        my $tableHtml = CustomAttributes::AttributeTable($cgi, @attrList);
1839    
1840    Format the attribute data into an HTML table.
1841    
1842    =over 4
1843    
1844    =item cgi
1845    
1846    CGI query object used to generate the HTML
1847    
1848    =item attrList
1849    
1850    List of attribute results, in the format returned by the L</GetAttributes> or
1851    L</QueryAttributes> methods.
1852    
1853    =item RETURN
1854    
1855    Returns an HTML table displaying the attribute keys and values.
1856    
1857    =back
1858    
1859    =cut
1860    
1861    sub AttributeTable {
1862        # Get the parameters.
1863        my ($cgi, @attrList) = @_;
1864        # Accumulate the table rows.
1865        my @html = ();
1866        for my $attrData (@attrList) {
1867            # Format the object ID and key.
1868            my @columns = map { CGI::escapeHTML($_) } @{$attrData}[0,1];
1869            # Now we format the values. These remain unchanged unless one of them is a URL.
1870            my $lastValue = scalar(@{$attrData}) - 1;
1871            push @columns, map { $_ =~ /^http:/ ? CGI::a({ href => $_ }, $_) : $_ } @{$attrData}[2 .. $lastValue];
1872            # Assemble the values into a table row.
1873            push @html, CGI::Tr(CGI::td(\@columns));
1874        }
1875        # Format the table in the return variable.
1876        my $retVal = CGI::table({ border => 2 }, CGI::Tr(CGI::th(['Object', 'Key', 'Values'])), @html);
1877        # Return it.
1878        return $retVal;
1879    }
1880    
1881    
1882    =head2 Internal Utility Methods
1883    
1884    =head3 _KeyTable
1885    
1886        my $tableName = $ca->_KeyTable($keyName);
1887    
1888    Return the name of the table that contains the attribute values for the
1889    specified key.
1890    
1891    Most attribute values are stored in the default table (usually C<HasValueFor>).
1892    Some, however, are placed in private tables by themselves for performance reasons.
1893    
1894    =over 4
1895    
1896    =item keyName (optional)
1897    
1898    Name of the attribute key whose table name is desired. If not specified, the
1899    entire key/table hash is returned.
1900    
1901    =item RETURN
1902    
1903    Returns the name of the table containing the specified attribute key's values,
1904    or a reference to a hash that maps key names to table names.
1905    
1906    =back
1907    
1908    =cut
1909    
1910    sub _KeyTable {
1911        # Get the parameters.
1912        my ($self, $keyName) = @_;
1913        # Declare the return variable.
1914        my $retVal;
1915        # Insure the key table hash is present.
1916        if (! exists $self->{keyTables}) {
1917            Trace("Creating key table.") if T(3);
1918            $self->{keyTables} = { map { $_->[0] => $_->[1] } $self->GetAll(['AttributeKey'],
1919                                                    "AttributeKey(relationship-name) <> ?",
1920                                                    [$self->{defaultRel}],
1921                                                    ['AttributeKey(id)', 'AttributeKey(relationship-name)']) };
1922        }
1923        # Get the key hash.
1924        my $keyHash = $self->{keyTables};
1925        # Does the user want a specific table or the whole thing?
1926        if ($keyName) {
1927            # Here we want a specific table. Is this key in the hash?
1928            if (exists $keyHash->{$keyName}) {
1929                # It's there, so return the specified table.
1930                $retVal = $keyHash->{$keyName};
1931            } else {
1932                # No, return the default table name.
1933                $retVal = $self->{defaultRel};
1934            }
1935        } else {
1936            # Here we want the whole hash.
1937            $retVal = $keyHash;
1938        }
1939        # Return the result.
1940        return $retVal;
1941    }
1942    
1943    
1944    =head3 _QueryResults
1945    
1946        my @attributeList = $attrDB->_QueryResults($query, $table, @values);
1947    
1948    Match the results of a query against value criteria and return
1949    the results. This is an internal method that splits the values coming back
1950    and matches the sections against the specified section patterns. It serves
1951    as the back end to L</GetAttributes> and L</FindAttributes>.
1952    
1953    =over 4
1954    
1955    =item query
1956    
1957    A query object that will return the desired records.
1958    
1959    =item table
1960    
1961    Name of the value table for the query.
1962    
1963    =item values
1964    
1965    List of the desired attribute values, section by section. If C<undef>
1966    or an empty string is specified, all values in that section will match. A
1967    generic match can be requested by placing a percent sign (C<%>) at the end.
1968    In that case, all values that match up to and not including the percent sign
1969    will match. You may also specify a regular expression enclosed
1970    in slashes. All values that match the regular expression will be returned. For
1971    performance reasons, only values have this extra capability.
1972    
1973    =item RETURN
1974    
1975    Returns a list of tuples. The first element in the tuple is an object ID, the
1976    second is an attribute key, and the remaining elements are the sections of
1977    the attribute value. All of the tuples will match the criteria set forth in
1978    the parameter list.
1979    
1980    =back
1981    
1982    =cut
1983    
1984    sub _QueryResults {
1985        # Get the parameters.
1986        my ($self, $query, $table, @values) = @_;
1987        # Declare the return value.
1988        my @retVal = ();
1989        # We use this hash to check for duplicates.
1990        my %dupHash = ();
1991        # Get the number of value sections we have to match.
1992        my $sectionCount = scalar(@values);
1993        # Loop through the assignments found.
1994        while (my $row = $query->Fetch()) {
1995            # Get the current row's data.
1996            my ($id, $realKey, $subKey, $valueString) = $row->Values(["$table(to-link)",
1997                                                                      "$table(from-link)",
1998                                                                      "$table(subkey)",
1999                                                                      "$table(value)"
2000                                                                    ]);
2001            # Form the key from the real key and the sub key.
2002            my $key = $self->JoinKey($realKey, $subKey);
2003            # Break the value into sections.
2004            my @sections = split($self->{splitter}, $valueString);
2005            # Match each section against the incoming values. We'll assume we're
2006            # okay unless we learn otherwise.
2007            my $matching = 1;
2008            for (my $i = 0; $i < $sectionCount && $matching; $i++) {
2009                # We need to check to see if this section is generic.
2010                my $value = $values[$i];
2011                Trace("Current value pattern is \"$value\".") if T(4);
2012                if ($value =~ m#^/(.+)/[a-z]*$#) {
2013                    Trace("Regular expression detected.") if T(4);
2014                    # Here we have a regular expression match.
2015                    my $section = $sections[$i];
2016                    $matching = eval("\$section =~ $value");
2017                } elsif (! defined $value) {
2018                    # Wild card. Skip it.
2019                } else {
2020                    # Here we have a normal match.
2021                    Trace("SQL match used.") if T(4);
2022                    $matching = _CheckSQLPattern($values[$i], $sections[$i]);
2023                }
2024            }
2025            # If we match, consider writing this row to the return list.
2026            if ($matching) {
2027                # Check for a duplicate.
2028                my $wholeThing = join($self->{splitter}, $id, $key, $valueString);
2029                if (! $dupHash{$wholeThing}) {
2030                    # It's okay, we're not a duplicate. Insure we don't duplicate this result.
2031                    $dupHash{$wholeThing} = 1;
2032                    push @retVal, [$id, $key, @sections];
2033                }
2034            }
2035        }
2036        # Return the rows found.
2037        return @retVal;
2038    }
2039    
2040    
2041    =head3 _LoadAttributeTable
2042    
2043        $attr->_LoadAttributeTable($tableName, $fileName, $stats, $mode);
2044    
2045    Load a file's data into an attribute table. This is an internal method
2046    provided for the convenience of L</LoadAttributesFrom>. It loads the
2047    specified file into the specified table and updates the statistics
2048    object.
2049    
2050    =over 4
2051    
2052    =item tableName
2053    
2054    Name of the table being loaded. This is usually C<HasValueFor>, but may
2055    be a different table for some specific attribute keys.
2056    
2057    =item fileName
2058    
2059    Name of the file containing a chunk of attribute data to load.
2060    
2061    =item stats
2062    
2063    Statistics object into which counts and times should be placed.
2064    
2065    =item mode
2066    
2067    Load mode for the file, usually C<low_priority>, C<concurrent>, or
2068    an empty string. The mode is used by some applications to control access
2069    to the table while it's being loaded. The default (empty string) is to lock the
2070    table until all the data's in place.
2071    
2072    =back
2073    
2074    =cut
2075    
2076    sub _LoadAttributeTable {
2077        # Get the parameters.
2078        my ($self, $tableName, $fileName, $stats, $mode) = @_;
2079        # Load the table from the file. Note that we don't do an analyze.
2080        # The analyze is done only after everything is complete.
2081        my $startTime = time();
2082        Trace("Loading attributes from $fileName: " . (-s $fileName) .
2083              " characters.") if T(3);
2084        my $loadStats = $self->LoadTable($fileName, $tableName,
2085                                         mode => $mode, partial => 1);
2086        # Record the load time.
2087        $stats->Add(insertTime => time() - $startTime);
2088        # Roll up the other statistics.
2089        $stats->Accumulate($loadStats);
2090    }
2091    
2092    
2093    =head3 _GetAllTables
2094    
2095        my @tables = $ca->_GetAllTables();
2096    
2097    Return a list of the names of all the tables used to store attribute
2098    values.
2099    
2100    =cut
2101    
2102    sub _GetAllTables {
2103        # Get the parameters.
2104        my ($self) = @_;
2105        # Start with the default table.
2106        my @retVal = $self->{defaultRel};
2107        # Add the tables named in the key hash. These tables are automatically
2108        # NOT the default, and each can only occur once, because alternate tables
2109        # are allocated on a per-key basis.
2110        my $keyHash = $self->_KeyTable();
2111        push @retVal, values %$keyHash;
2112        # Return the result.
2113        return @retVal;
2114    }
2115    
2116    
2117    =head3 _SplitKeyPattern
2118    
2119        my ($realKey, $subKey) = $ca->_SplitKeyPattern($keyChoice);
2120    
2121    Split a key pattern into the main part (the I<real key>) and a sub-part
2122    (the I<sub key>). This method differs from L</SplitKey> in that it treats
2123    the key as an SQL pattern instead of a raw string. Also, if there is no
2124    incoming sub-part, the sub-key will be undefined instead of an empty
2125    string.
2126    
2127    =over 4
2128    
2129    =item keyChoice
2130    
2131    SQL key pattern to be examined. This can either be a literal, an SQL pattern,
2132    a literal with an internal splitter code (usually C<::>) or an SQL pattern with
2133    an internal splitter. Note that the only SQL pattern we support is a percent
2134    sign (C<%>) at the end. This is the way we've declared things in the documentation,
2135    so users who try anything else will have problems.
2136    
2137    =item RETURN
2138    
2139    Returns a two-element list. The first element is the SQL pattern for the
2140    real key and the second is the SQL pattern for the sub-key. If the value
2141    for either one does not matter (e.g., the user wants a real key value of
2142    C<iedb> and doesn't care about the sub-key value), it will be undefined.
2143    
2144    =back
2145    
2146    =cut
2147    
2148    sub _SplitKeyPattern {
2149        # Get the parameters.
2150        my ($self, $keyChoice) = @_;
2151        # Declare the return variables.
2152        my ($realKey, $subKey);
2153        # Look for a splitter in the input.
2154        if ($keyChoice =~ /^(.*?)$self->{splitter}(.*)/) {
2155            # We found one. This means we can treat both sides of the
2156            # splitter as known patterns.
2157            ($realKey, $subKey) = ($1, $2);
2158        } elsif ($keyChoice =~ /%$/) {
2159            # Here we have a generic pattern for the whole key. The pattern
2160            # is treated as the correct pattern for the real key, but the
2161            # sub-key is considered to be wild.
2162            $realKey = $keyChoice;
2163        } else {
2164            # Here we have a literal pattern for the whole key. The pattern
2165            # is treated as the correct pattern for the real key, and the
2166            # sub-key is required to be blank.
2167            $realKey = $keyChoice;
2168            $subKey = '';
2169        }
2170        # Return the results.
2171        return ($realKey, $subKey);
2172    }
2173    
2174    
2175    =head3 _WherePart
2176    
2177        my ($sqlClause, $escapedValue) = _WherePart($tableName, $fieldName, $sqlPattern);
2178    
2179    Return the SQL clause and value for checking a field against the
2180    specified SQL pattern value. If the pattern is generic (ends in a C<%>),
2181    then a C<LIKE> expression is returned. Otherwise, an equality expression
2182    is returned. We take in information describing the field being checked,
2183    and the pattern we're checking against it. The output is a WHERE clause
2184    fragment for the comparison and a value to be used as a bound parameter
2185    value for the clause.
2186    
2187    =over 4
2188    
2189    =item tableName
2190    
2191    Name of the table containing the field we want checked by the clause.
2192    
2193    =item fieldName
2194    
2195    Name of the field to check in that table.
2196    
2197    =item sqlPattern
2198    
2199    Pattern to be compared against the field. If the last character is a percent sign
2200    (C<%>), it will be treated as a generic SQL pattern; otherwise, it will be treated
2201    as a literal.
2202    
2203    =item RETURN
2204    
2205    Returns a two-element list. The first element will be an SQL comparison expression
2206    and the second will be the value to be used as a bound parameter for the expression
2207    in order to
2208    
2209    =back
2210    
2211    =cut
2212    
2213    sub _WherePart {
2214        # Get the parameters.
2215        my ($tableName, $fieldName, $sqlPattern) = @_;
2216        # Declare the return variables.
2217        my ($sqlClause, $escapedValue);
2218        # Copy the pattern into the return area.
2219        $escapedValue = $sqlPattern;
2220        # Check the pattern. Is it generic or exact?
2221        if ($sqlPattern =~ /(.+)%$/) {
2222            # Yes, it is. We need a LIKE clause and we must escape the underscores
2223            # and percents in the pattern (except for the last one, of course).
2224            $escapedValue = $1;
2225            $escapedValue =~ s/(%|_)/\\$1/g;
2226            $escapedValue .= "%";
2227            $sqlClause = "$tableName($fieldName) LIKE ?";
2228        } else {
2229            # No, it isn't. We use an equality clause.
2230            $sqlClause = "$tableName($fieldName) = ?";
2231        }
2232        # Return the results.
2233        return ($sqlClause, $escapedValue);
2234    }
2235    
2236    
2237    =head3 _CheckSQLPattern
2238    
2239        my $flag = _CheckSQLPattern($pattern, $value);
2240    
2241    Return TRUE if the specified SQL pattern matches the specified value,
2242    else FALSE. The pattern is not a true full-blown SQL LIKE pattern: the
2243    only wild-carding allowed is a percent sign (C<%>) at the end.
2244    
2245    =over 4
2246    
2247    =item pattern
2248    
2249    SQL pattern to match against a value.
2250    
2251    =item value
2252    
2253    Value to match against an SQL pattern.
2254    
2255    =item RETURN
2256    
2257    Returns TRUE if the pattern matches the value, else FALSE.
2258    
2259    =back
2260    
2261    =cut
2262    
2263    sub _CheckSQLPattern {
2264        # Get the parameters.
2265        my ($pattern, $value) = @_;
2266        # Declare the return variable.
2267        my $retVal;
2268        # Check for a generic pattern.
2269        if ($pattern =~ /(.*)%$/) {
2270            # Here we have one. Do a substring match.
2271            $retVal = (substr($value, 0, length $1) eq $1);
2272        } else {
2273            # Here it's an exact match.
2274            $retVal = ($pattern eq $value);
2275        }
2276        Trace("SQL pattern check: \"$value\" vs \"$pattern\" = $retVal.") if T(3);
2277        # Return the result.
2278        return $retVal;
2279  }  }
2280    
2281  1;  1;

Legend:
Removed from v.1.9  
changed lines
  Added in v.1.41

MCS Webmaster
ViewVC Help
Powered by ViewVC 1.0.3