Diff of /Sprout/CustomAttributes.pm

revision 1.9, Thu Nov 16 22:09:33 2006 UTC revision 1.38, Sat Oct 18 09:52:21 2008 UTC
# Line 8  Line 8 
8      use strict;      use strict;
9      use Tracer;      use Tracer;
10      use ERDBLoad;      use ERDBLoad;
11        use Stats;
12        use Time::HiRes qw(time);
13        use FIGRules;
14    
15  =head1 Custom SEED Attribute Manager  =head1 Custom SEED Attribute Manager
16    
# Line 15  Line 18 
18    
19  The Custom SEED Attributes Manager allows the user to upload and retrieve  The Custom SEED Attributes Manager allows the user to upload and retrieve
20  custom data for SEED objects. It uses the B<ERDB> database system to  custom data for SEED objects. It uses the B<ERDB> database system to
21  store the attributes, which are implemented as multi-valued fields  store the attributes.
22  of ERDB entities.  
23    Attributes are organized by I<attribute key>. Attribute values are
24    assigned to I<objects>. In the real world, objects have types and IDs;
25    however, to the attribute database only the ID matters. This will create
26    a problem if we have a single ID that applies to two objects of different
27    types, but it is more consistent with the original attribute implementation
28    in the SEED (which this implementation replaces).
29    
30    The actual attribute values are stored as a relationship between the attribute
31    keys and the objects. There can be multiple values for a single key/object pair.
32    
33    =head3 Object IDs
34    
35    The object ID is normally represented as
36    
37        I<type>:I<id>
38    
39    where I<type> is the object type (C<Role>, C<Coupling>, etc.) and I<id> is
40    the actual object ID. Note that the object type must consist of only upper- and
41    lower-case letters! Thus, C<GenomeGroup> is a valid object type, but
42    C<genome_group> is not. Given that restriction, the object ID
43    
44        Family:aclame|cluster10
45    
46    would represent the FIG family C<aclame|cluster10>. For historical reasons,
47    there are three exceptions: subsystems, genomes, and features do not need
48    a type. So, for PEG 3361 of Streptomyces coelicolor A3(2), you simply code
49    
50        fig|100226.1.peg.3361
51    
52    The methods L</ParseID> and L</FormID> can be used to make this all seem
53    more consistent. Given any object ID string, L</ParseID> will convert it to an
54    object type and ID, and given any object type and ID, L</FormID> will
55    convert it to an object ID string. The attribute database is pretty
56    freewheeling about what it will allow for an ID; however, for best
57    results, the type should match an entity type from a Sprout genetics
58    database. If this rule is followed, then the database object
59    corresponding to an ID in the attribute database could be retrieved using
60    the L</GetTargetObject> method.
61    
62        my $object = CustomAttributes::GetTargetObject($sprout, $idValue);
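
The ID convention described above can be sketched as a pair of standalone helpers. This is only an illustration under stated assumptions: C<parse_object_id> and C<form_object_id> are hypothetical names, not the module's L</ParseID> and L</FormID>, and the patterns used to recognize untyped feature and genome IDs (a C<fig|> prefix, and two dot-separated numbers) are assumptions drawn from the examples in the text.

```perl
use strict;
use warnings;

# Hypothetical helpers illustrating the type:id convention.
# They are NOT the module's ParseID/FormID implementations.

# Types that, per the documentation, are written without a prefix.
my %UNTYPED = map { $_ => 1 } qw(Feature Genome Subsystem);

sub parse_object_id {
    my ($idValue) = @_;
    if ($idValue =~ /^fig\|/) {
        # Assumption: feature IDs always start with "fig|".
        return ('Feature', $idValue);
    } elsif ($idValue =~ /^\d+\.\d+$/) {
        # Assumption: genome IDs look like "100226.1".
        return ('Genome', $idValue);
    } elsif ($idValue =~ /^([A-Za-z]+):(.+)$/) {
        # Typed ID: the type may contain letters only.
        return ($1, $2);
    } else {
        # Anything else is treated as a subsystem name.
        return ('Subsystem', $idValue);
    }
}

sub form_object_id {
    my ($type, $id) = @_;
    die "Invalid object type \"$type\"\n" unless $type =~ /^[A-Za-z]+$/;
    return $UNTYPED{$type} ? $id : "$type:$id";
}

my ($type, $id) = parse_object_id('Family:aclame|cluster10');
print "$type / $id\n";
print form_object_id('Feature', 'fig|100226.1.peg.3361'), "\n";
```

Checking the C<fig|> prefix before the typed pattern keeps the three historical exceptions from being mistaken for a typed ID.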
63    
64    =head3 Retrieval and Logging
65    
66  The full suite of ERDB retrieval capabilities is provided. In addition,  The full suite of ERDB retrieval capabilities is provided. In addition,
67  custom methods are provided specific to this application. To get all  custom methods are provided specific to this application. To get all
68  the values of the attribute C<essential> in a specified B<Feature>, you  the values of the attribute C<essential> in a specified B<Feature>, you
69  would code  would code
70    
71      my @values = $attrDB->GetAttributes([Feature => $fid], 'essential');      my @values = $attrDB->GetAttributes($fid, 'essential');
72    
73  where I<$fid> contains the ID of the desired feature. Each attribute has  where I<$fid> contains the ID of the desired feature.
 an alternate index to allow searching for attributes by value.  
74    
75  New attributes are introduced by updating the database definition at  Keys can be split into two pieces using the splitter value defined in the
76  run-time. Attribute values are stored by uploading data from files.  constructor (the default is C<::>). The first piece of the key is called
77  A web interface is provided for both these activities.  the I<real key>. This portion of the key must be defined using the
78    web interface (C<Attributes.cgi>). The second portion of the key is called
79    the I<sub key>, and can take any value.
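
A minimal sketch of the key split described above, assuming the key is divided at the first occurrence of the splitter and that a missing sub key is treated as an empty string. The helper name C<split_key> and the sample key names are hypothetical; the module's own C<SplitKey> method may behave differently.

```perl
use strict;
use warnings;

# Hypothetical helper: split an attribute key into (real key, sub key).
sub split_key {
    my ($key, $splitter) = @_;
    # Default to the double colon named in the documentation.
    $splitter = '::' unless defined $splitter;
    # \Q...\E quotes the splitter so it is matched literally; the limit
    # of 2 keeps any later splitters inside the sub key.
    my ($realKey, $subKey) = split /\Q$splitter\E/, $key, 2;
    # Assumption: a key with no splitter has an empty sub key.
    $subKey = '' unless defined $subKey;
    return ($realKey, $subKey);
}

my ($realKey, $subKey) = split_key('essential::confidence');
print "real key = $realKey, sub key = $subKey\n";
```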
80    
81    Major attribute activity is recorded in a log (C<attributes.log>) in the
82    C<$FIG_Config::var> directory. The log reports the user name, time, and
83    the details of the operation. The user name will almost always be unknown,
84    the exception being when it is specified in this object's constructor
85    (see L</new>).
86    
87  =head2 FIG_Config Parameters  =head2 FIG_Config Parameters
88    
# Line 74  Line 126 
126  functions as data to the attribute management process, so if the data is  functions as data to the attribute management process, so if the data is
127  moved, this file must go with it.  moved, this file must go with it.
128    
129  =back  =item attr_default_table
   
 The DBD file is critical, and must have reasonable contents before we can  
 begin using the system. In the old system, attributes were only provided  
 for Genomes and Features, so the initial XML file was the following.  
   
     <Database>  
       <Title>SEED Custom Attribute Database</Title>  
       <Entities>  
         <Entity name="Feature" keyType="id-string">  
           <Notes>A [i]feature[/i] is a part of the genome  
           that is of special interest. Features may be spread  
           across multiple contigs of a genome, but never across  
           more than one genome. Features can be assigned to roles  
           via spreadsheet cells, and are the targets of  
           annotation.</Notes>  
         </Entity>  
         <Entity name="Genome" keyType="name-string">  
           <Notes>A [i]genome[/i] describes a particular individual  
           organism's DNA.</Notes>  
         </Entity>  
       </Entities>  
     </Database>  
   
 It is not necessary to put any tables into the database; however, you should  
 run  
   
     AttrDBRefresh  
   
 periodically to insure it has the correct Genomes and Features in it. When  
 converting from the old system, use  
   
     AttrDBRefresh -migrate  
130    
131  to initialize the database and migrate the legacy data. You should only need  Name of the default relationship for attribute values. If not present,
132  to do that once.  C<HasValueFor> is used.
133    
134  =head2 Implementation Note  =back
   
 The L</Refresh> method reloads the entities in the database. If new  
 entity types are added, that method will need to be adjusted accordingly.  
135    
136  =head2 Public Methods  =head2 Public Methods
137    
138  =head3 new  =head3 new
139    
140  C<< my $attrDB = CustomAttributes->new($splitter); >>      my $attrDB = CustomAttributes->new(%options);
141    
142  Construct a new CustomAttributes object. This object cannot be used to add or  Construct a new CustomAttributes object. The following options are
143  delete keys because that requires modifying the database design. To do that,  supported.
 you need to use the static L</StoreAttributeKey> or L</DeleteAttributeKey>  
 methods.  
144    
145  =over 4  =over 4
146    
147  =item splitter  =item splitter
148    
149  Value to be used to split attribute values into sections in the  Value to be used to split attribute values into sections in the
150  L</Fig Replacement Methods>. The default is a double colon C<::>.  L</Fig Replacement Methods>. The default is a double colon C<::>,
151  If you do not use the replacement methods, you do not need to  and should only be overridden in extreme circumstances.
152  worry about this parameter.  
153    =item user
154    
155    Name of the current user. This will appear in the attribute log.
156    
157  =back  =back
158    
# Line 142  Line 160 
160    
161  sub new {  sub new {
162      # Get the parameters.      # Get the parameters.
163      my ($class, $splitter) = @_;      my ($class, %options) = @_;
164        # Get the name of the default table.
165      # Connect to the database.      # Connect to the database.
166      my $dbh = DBKernel->new($FIG_Config::attrDbms, $FIG_Config::attrDbName,      my $dbh = DBKernel->new($FIG_Config::attrDbms, $FIG_Config::attrDbName,
167                              $FIG_Config::attrUser, $FIG_Config::attrPass,                              $FIG_Config::attrUser, $FIG_Config::attrPass,
# Line 152  Line 171 
171      my $xmlFileName = $FIG_Config::attrDBD;      my $xmlFileName = $FIG_Config::attrDBD;
172      my $retVal = ERDB::new($class, $dbh, $xmlFileName);      my $retVal = ERDB::new($class, $dbh, $xmlFileName);
173      # Store the splitter value.      # Store the splitter value.
174      $retVal->{splitter} = (defined($splitter) ? $splitter : '::');      $retVal->{splitter} = $options{splitter} || '::';
175        # Store the user name.
176        $retVal->{user} = $options{user} || '<unknown>';
177        Trace("User $retVal->{user} selected for attribute object.") if T(3);
178        # Compute the default value table name. If it's not overridden, the
179        # default is HasValueFor.
180        $retVal->{defaultRel} = $FIG_Config::attr_default_table || 'HasValueFor';
181      # Return the result.      # Return the result.
182      return $retVal;      return $retVal;
183  }  }
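
The option defaulting in the revised constructor can be tried in isolation. The sketch below is a hypothetical stand-in (C<resolve_options> is not part of the module, no database is opened, and the user name C<alice> is made up); it only shows how the C<||> defaults behave.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the option handling in new().
sub resolve_options {
    my (%options) = @_;
    return {
        splitter => $options{splitter} || '::',
        user     => $options{user}     || '<unknown>',
    };
}

my $defaults = resolve_options();
my $custom   = resolve_options(splitter => '/', user => 'alice');
print "$defaults->{splitter} $defaults->{user}\n";
print "$custom->{splitter} $custom->{user}\n";
```

Because C<||> tests for truth rather than definedness, an empty string or C<0> passed as the splitter falls back to C<::>, whereas the C<defined()> test in revision 1.9 would have kept it.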
184    
185  =head3 StoreAttributeKey  =head3 StoreAttributeKey
186    
187  C<< my $attrDB = CustomAttributes::StoreAttributeKey($entityName, $attributeName, $type, $notes); >>      $attrDB->StoreAttributeKey($attributeName, $notes, \@groups, $table);
188    
189  Create or update an attribute for the database. This method will update the database definition  Create or update an attribute for the database.
 XML, but it will not create the table. It will connect to the database so that the caller  
 can upload the attribute values.  
190    
191  =over 4  =over 4
192    
 =item entityName  
   
 Name of the entity containing the attribute. The entity must exist.  
   
193  =item attributeName  =item attributeName
194    
195  Name of the attribute. It must be a valid ERDB field name, consisting entirely of  Name of the attribute (the real key). If it does not exist already, it will be created.
 letters, digits, and hyphens, with a letter at the beginning. If it does not  
 exist already, it will be created.  
   
 =item type  
   
 Data type of the attribute. This must be a valid ERDB data type name.  
196    
197  =item notes  =item notes
198    
199  Descriptive notes about the attribute. It is presumed to be raw text, not HTML.  Descriptive notes about the attribute. It is presumed to be raw text, not HTML.
200    
201  =item RETURN  =item groups
202    
203  Returns a Custom Attribute Database object if successful. If unsuccessful, an  Reference to a list of the groups to which the attribute should be associated.
204  error will be thrown.  This will replace any groups to which the attribute is currently attached.
205    
206    =item table
207    
208    The name of the relationship in which the attribute's values are to be stored.
209    If empty or undefined, the default relationship (usually C<HasValueFor>) will be
210    assumed.
211    
212  =back  =back
213    
# Line 196  Line 215 
215    
216  sub StoreAttributeKey {  sub StoreAttributeKey {
217      # Get the parameters.      # Get the parameters.
218      my ($entityName, $attributeName, $type, $notes) = @_;      my ($self, $attributeName, $notes, $groups, $table) = @_;
219      # Declare the return variable.      # Declare the return variable.
220      my $retVal;      my $retVal;
221        # Default the table name.
222        if (! $table) {
223            $table = $self->{defaultRel};
224        }
225      # Get the data type hash.      # Get the data type hash.
226      my %types = ERDB::GetDataTypes();      my %types = ERDB::GetDataTypes();
227      # Validate the initial input values.      # Validate the initial input values.
228      if (! ERDB::ValidateFieldName($attributeName)) {      if ($attributeName =~ /$self->{splitter}/) {
229          Confess("Invalid attribute name \"$attributeName\" specified.");          Confess("Invalid attribute name \"$attributeName\" specified.");
230      } elsif (! $notes || length($notes) < 25) {      } elsif (! $notes) {
231          Confess("Missing or incomplete description for $attributeName.");          Confess("Missing description for $attributeName.");
232      } elsif (! exists $types{$type}) {      } elsif (! grep { $_ eq $table } $self->GetConnectingRelationships('AttributeKey')) {
233          Confess("Invalid data type \"$type\" for $attributeName.");          Confess("Invalid relationship name \"$table\" specified as a custom attribute table.");
234      }      } else {
235      # Our next step is to read in the XML for the database definition. We          # Create a variable to hold the action to be displayed for the log (Add or Update).
236      # need to verify that the named entity exists.          my $action;
237      my $metadata = ERDB::ReadMetaXML($FIG_Config::attrDBD);          # Okay, we're ready to begin. See if this key exists.
238      my $entityHash = $metadata->{Entities};          my $attribute = $self->GetEntity('AttributeKey', $attributeName);
239      if (! exists $entityHash->{$entityName}) {          if (defined($attribute)) {
240          Confess("Entity $entityName not found.");              # It does, so we do an update.
241      } else {              $action = "Update Key";
242          # Okay, we're ready to begin. Get the entity hash and the field hash.              $self->UpdateEntity('AttributeKey', $attributeName,
243          my $entityData = $entityHash->{$entityName};                                  { description => $notes,
244          my $fieldHash = ERDB::GetEntityFieldHash($metadata, $entityName);                                    'relationship-name' => $table});
245          # Compare the old attribute data to the new data.              # Detach the key from its current groups.
246          my $bigChange = 1;              $self->Disconnect('IsInGroup', 'AttributeKey', $attributeName);
247          if (exists $fieldHash->{$attributeName} && $fieldHash->{$attributeName}->{type} eq $type) {          } else {
248              $bigChange = 0;              # It doesn't, so we do an insert.
249          }              $action = "Insert Key";
250          # Compute the attribute's relation name.              $self->InsertObject('AttributeKey', { id => $attributeName,
251          my $relName = join("", $entityName, map { ucfirst $_ } split(/-|_/, $attributeName));                                  description => $notes,
252          # Store the attribute's field data. Note the use of the "content" hash for                                  'relationship-name' => $table});
253          # the notes. This is how the XML writer knows Notes is a text tag instead of          }
254          # an attribute.          # Attach the key to the specified groups. (We presume the groups already
255          $fieldHash->{$attributeName} = { type => $type, relation => $relName,          # exist.)
256                                           Notes => { content => $notes } };          for my $group (@{$groups}) {
257          # Insure we have an index for this attribute.              $self->InsertObject('IsInGroup', { 'from-link' => $attributeName,
258          my $index = ERDB::FindIndexForEntity($metadata, $entityName, $attributeName);                                                 'to-link'   => $group });
         if (! defined($index)) {  
             push @{$entityData->{Indexes}}, { IndexFields => [ { name => $attributeName, order => 'ascending' } ],  
                                               Notes       => "Alternate index provided for access by $attributeName." };  
         }  
         # Write the XML back out.  
         ERDB::WriteMetaXML($metadata, $FIG_Config::attrDBD);  
         # Open a database with the new XML.  
         $retVal = CustomAttributes->new();  
         # Create the table if there has been a significant change.  
         if ($bigChange) {  
             $retVal->CreateTable($relName);  
259          }          }
260            # Log the operation.
261            $self->LogOperation($action, $attributeName, "Group list is " . join(" ", @{$groups}));
262      }      }
     return $retVal;  
263  }  }
264    
 =head3 Refresh  
265    
266  C<< $attrDB->Refresh($fig); >>  =head3 DeleteAttributeKey
267    
268        my $stats = $attrDB->DeleteAttributeKey($attributeName);
269    
270  Refresh the primary entity tables from the FIG data store. This method basically  Delete an attribute from the custom attributes database.
 drops and reloads the main tables of the custom attributes database.  
271    
272  =over 4  =over 4
273    
274  =item fig  =item attributeName
275    
276    Name of the attribute to delete.
277    
278    =item RETURN
279    
280  FIG-like object that can be used to find genomes and features.  Returns a statistics object describing the effects of the deletion.
281    
282  =back  =back
283    
284  =cut  =cut
285    
286  sub Refresh {  sub DeleteAttributeKey {
287      # Get the parameters.      # Get the parameters.
288      my ($self, $fig) = @_;      my ($self, $attributeName) = @_;
289      # Create load objects for the genomes and the features.      # Delete the attribute key.
290      my $loadGenome = ERDBLoad->new($self, 'Genome', $FIG_Config::temp);      my $retVal = $self->Delete('AttributeKey', $attributeName);
291      my $loadFeature = ERDBLoad->new($self, 'Feature', $FIG_Config::temp);      # Log this operation.
292      # Get the genome list.      $self->LogOperation("Delete Key", $attributeName, "Key will no longer be available for use by anyone.");
293      my @genomes = $fig->genomes();      # Return the result.
294      # Loop through the genomes.      return $retVal;
     for my $genomeID (@genomes) {  
         # Put this genome in the genome table.  
         $loadGenome->Put($genomeID);  
         Trace("Processing Genome $genomeID") if T(3);  
         # Put its features into the feature table. Note we have to use a hash to  
         # remove duplicates.  
         my %featureList = map { $_ => 1 } $fig->all_features($genomeID);  
         for my $fid (keys %featureList) {  
             $loadFeature->Put($fid);  
         }  
     }  
     # Get a variable for holding statistics objects.  
     my $stats;  
     # Finish the genome load.  
     Trace("Loading Genome relation.") if T(2);  
     $stats = $loadGenome->FinishAndLoad();  
     Trace("Genome table load statistics:\n" . $stats->Show()) if T(3);  
     # Finish the feature load.  
     Trace("Loading Feature relation.") if T(2);  
     $stats = $loadFeature->FinishAndLoad();  
     Trace("Feature table load statistics:\n" . $stats->Show()) if T(3);  
 }  
295    
296  =head3 LoadAttributeKey  }
297    
298  C<< my $stats = $attrDB->LoadAttributeKey($entityName, $fieldName, $fh, $keyCol, $dataCol); >>  =head3 NewName
299    
300  Load the specified attribute from the specified file. The file should be a      my $text = CustomAttributes::NewName();
 tab-delimited file with internal tab and new-line characters escaped. This is  
 the typical TBL-style file used by most FIG applications. One of the columns  
 in the input file must contain the appropriate key value and the other the  
 corresponding attribute value.  
301    
302  =over 4  Return the string used to indicate the user wants to add a new attribute.
303    
304  =item entityName  =cut
305    
306  Name of the entity containing the attribute.  sub NewName {
307        return "(new)";
308    }
309    
310  =item fieldName  =head3 LoadAttributesFrom
311    
312  Name of the actual attribute.  C<< my $stats = $attrDB->LoadAttributesFrom($fileName, %options); >>
313    
314  =item fh  Load attributes from the specified tab-delimited file. Each line of the file must
315    contain an object ID in the first column, an attribute key name in the second
316    column, and attribute values in the remaining columns. The attribute values must
317    be assembled into a single value using the splitter code. In addition, the key names may
318    contain a splitter. If this is the case, the portion of the key after the splitter is
319    treated as a subkey.
320    
321  Open file handle for the input file.  =over 4
322    
323  =item keyCol  =item fileName
324    
325  Index (0-based) of the column containing the key field. The key field should  Name of the file from which to load the attributes, or an open handle for the file.
326  contain the ID of an instance of the named entity.  (This last enables the method to be used in conjunction with the CGI form upload
327    control.)
328    
329  =item dataCol  =item options
330    
331  Index (0-based) of the column containing the data value field.  Hash of options for modifying the load process.
332    
333  =item RETURN  =item RETURN
334    
335  Returns a statistics object for the load process.  Returns a statistics object describing the load.
336    
337  =back  =back
338    
339  =cut  Permissible option values are as follows.
340    
341  sub LoadAttributeKey {  =over 4
     # Get the parameters.  
     my ($self, $entityName, $fieldName, $fh, $keyCol, $dataCol) = @_;  
     # Create the return variable.  
     my $retVal;  
     # Insure the entity exists.  
     my $found = grep { $_ eq $entityName } $self->GetEntityTypes();  
     if (! $found) {  
         Confess("Entity \"$entityName\" not found in database.");  
     } else {  
         # Get the field structure for the named entity.  
         my $fieldHash = $self->GetFieldTable($entityName);  
         # Verify that the attribute exists.  
         if (! exists $fieldHash->{$fieldName}) {  
             Confess("Attribute key \"$fieldName\" does not exist in entity $entityName.");  
         } else {  
             # Create a loader for the specified attribute. We need the  
             # relation name first.  
             my $relName = $fieldHash->{$fieldName}->{relation};  
             my $loadAttribute = ERDBLoad->new($self, $relName, $FIG_Config::temp);  
             # Loop through the input file.  
             while (! eof $fh) {  
                 # Get the next line of the file.  
                 my @fields = Tracer::GetLine($fh);  
                 $loadAttribute->Add("lineIn");  
                 # Now we need to validate the line.  
                 if ($#fields < $dataCol) {  
                     $loadAttribute->Add("shortLine");  
                 } elsif (! $self->Exists($entityName, $fields[$keyCol])) {  
                     $loadAttribute->Add("badKey");  
                 } else {  
                      # It's valid, so send it to the loader.  
                     $loadAttribute->Put($fields[$keyCol], $fields[$dataCol]);  
                     $loadAttribute->Add("lineUsed");  
                 }  
             }  
             # Finish the load.  
             $retVal = $loadAttribute->FinishAndLoad();  
         }  
     }  
     # Return the statistics.  
     return $retVal;  
 }  
342    
343    =item mode
344    
345  =head3 DeleteAttributeKey  Loading mode. Legal values are C<low_priority> (which reduces the task priority
346    of the load) and C<concurrent> (which reduces the locking cost of the load). The
347    default is a normal load.
348    
349  C<< CustomAttributes::DeleteAttributeKey($entityName, $attributeName); >>  =item append
350    
351  Delete an attribute from the custom attributes database.  If TRUE, then the attributes will be appended to existing data; otherwise, the
352    first time a key name is encountered, it will be erased.
353    
354  =over 4  =item archive
355    
356  =item entityName  If specified, the name of a file into which the incoming data should be saved.
357    If I<resume> is also specified, only the lines actually loaded will be put
358    into this file.
359    
360  Name of the entity possessing the attribute.  =item objectType
361    
362  =item attributeName  If specified, the specified object type will be prefixed to each object ID.
363    
364  Name of the attribute to delete.  =item resume
365    
366    If specified, key-value pairs already in the database will not be reinserted.
367    Specify a number to start checking after the specified number of lines and
368    then admit everything after the first line not yet loaded. Specify C<careful>
369    to check every single line. Specify C<none> to ignore this option. The default
370    is C<none>. So, if you believe that a previous load failed somewhere after 50000
371    lines, a resume value of C<50000> would skip 50000 lines in the file, then
372    check each line after that until it finds one not already in the database. The
373    first such line found and all lines after that will be loaded. On the other
374    hand, if you have a file of 100000 records, and some have been loaded and some
375    not, you would use the word C<careful>, so that every line would be checked before
376    it is inserted. A resume of C<0> will start checking the first line of the
377    input file and then begin loading once it finds a line not in the database.
378    
379    =item chunkSize
380    
381    Number of lines to load in each burst. The default is 10,000.
382    
383  =back  =back
384    
385  =cut  =cut
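
The three I<resume> behaviors documented above can be modeled on an in-memory list. This is a sketch of the documented semantics only, not of the method body: C<lines_to_load> is a hypothetical helper, and the C<$isLoaded> callback stands in for the database lookup.

```perl
use strict;
use warnings;

# Hypothetical model of the "resume" option.
#   $lines    - reference to a list of input lines
#   $resume   - a line count, 'careful', or 'none'
#   $isLoaded - code reference returning TRUE if a line is already stored
sub lines_to_load {
    my ($lines, $resume, $isLoaded) = @_;
    my @retVal;
    # A numeric resume value skips that many lines before checking starts.
    my $start = ($resume =~ /^\d+$/ ? $resume : 0);
    # With 'none', nothing is checked and every line is loaded.
    my $checking = ($resume ne 'none');
    for my $i ($start .. $#{$lines}) {
        my $line = $lines->[$i];
        next if $checking && $isLoaded->($line);
        push @retVal, $line;
        # In numeric mode, stop checking after the first line not yet
        # loaded; in careful mode, keep checking every line.
        $checking = 0 if $checking && $resume ne 'careful';
    }
    return @retVal;
}

my %stored = map { $_ => 1 } qw(a b c e);
my @lines = qw(a b c d e f);
for my $mode (2, 'careful', 'none') {
    my @load = lines_to_load(\@lines, $mode, sub { $stored{$_[0]} });
    print "$mode: @load\n";
}
```

Note that in numeric mode a line that is already stored but follows the first unloaded line would be loaded again, which is why the documentation recommends C<careful> for partially loaded files.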
386    
387  sub DeleteAttributeKey {  sub LoadAttributesFrom {
388      # Get the parameters.      # Get the parameters.
389      my ($entityName, $attributeName) = @_;      my ($self, $fileName, %options) = @_;
390      # Read in the XML for the database definition. We need to verify that      # Declare the return variable.
391      # the named entity exists and it has the named attribute.      my $retVal = Stats->new('keys', 'values', 'linesOut');
392      my $metadata = ERDB::ReadMetaXML($FIG_Config::attrDBD);      # Initialize the timers.
393      my $entityHash = $metadata->{Entities};      my ($eraseTime, $archiveTime, $checkTime) = (0, 0, 0);
394      if (! exists $entityHash->{$entityName}) {      # Check for append mode.
395          Confess("Entity \"$entityName\" not found.");      my $append = ($options{append} ? 1 : 0);
396      } else {      # Check for resume mode.
397          # Get the field hash.      my $resume = (defined($options{resume}) ? $options{resume} : 'none');
398          my $fieldHash = ERDB::GetEntityFieldHash($metadata, $entityName);      # Create a hash of key names found.
399          if (! exists $fieldHash->{$attributeName}) {      my %keyHash = ();
400              Confess("Attribute key \"$attributeName\" not found in entity $entityName.");      # Create a hash of table names to files. Most attributes go into the HasValueFor
401          } else {      # table, but some are put into other tables. Each table name will be mapped
402              # Get the attribute's relation name.      # to a sub-hash with keys "fileName" (output file for the table) and "count"
403              my $relName = $fieldHash->{$attributeName}->{relation};      # (number of lines in the file).
404              # Check for an index.      my %tableHash = ();
405              my $indexIdx = ERDB::FindIndexForEntity($metadata, $entityName, $attributeName);      # Compute the chunk size.
406              if (defined($indexIdx)) {      my $chunkSize = ($options{chunkSize} ? $options{chunkSize} : 10000);
407                  Trace("Index for $attributeName found at position $indexIdx for $entityName.") if T(3);      # Open the file for input. Note we must anticipate the possibility of an
408                  delete $entityHash->{$entityName}->{Indexes}->[$indexIdx];      # open filehandle being passed in. This occurs when the user is submitting
409              }      # the load file over the web.
410              # Delete the attribute from the field hash.      my $fh;
411              Trace("Deleting attribute $attributeName from $entityName.") if T(3);      if (ref $fileName) {
412              delete $fieldHash->{$attributeName};          Trace("Using file opened by caller.") if T(3);
413              # Write the XML back out.          $fh = $fileName;
414              ERDB::WriteMetaXML($metadata, $FIG_Config::attrDBD);      } else {
415              # Insure the relation does not exist in the database. This requires connecting          Trace("Attributes will be loaded from $fileName.") if T(3);
416              # since we may have to do a table drop.          $fh = Open(undef, "<$fileName");
417              my $attrDB = CustomAttributes->new();      }
418              Trace("Dropping table $relName.") if T(3);      # Trace the mode.
419              $attrDB->DropRelation($relName);      if (T(3)) {
420            if ($options{mode}) {
421                Trace("Mode is $options{mode}.")
422            } else {
423                Trace("No mode specified.")
424            }
425        }
426        # Now check to see if we need to archive.
427        my $ah;
428        if (exists $options{archive}) {
429            $ah = Open(undef, ">$options{archive}");
430            Trace("Load file will be archived to $options{archive}.") if T(3);
431        }
432        # Insure we recover from errors.
433        eval {
434            # If we have a resume number, process it here.
435            if ($resume =~ /\d+/) {
436                Trace("Skipping $resume lines.") if T(2);
437                my $startTime = time();
438                # Skip the specified number of lines.
439                for (my $skipped = 0; ! eof($fh) && $skipped < $resume; $skipped++) {
440                    my $line = <$fh>;
441                    $retVal->Add(skipped => 1);
442                }
443                $checkTime += time() - $startTime;
444            }
445            # Loop through the file.
446            Trace("Starting load.") if T(2);
447            while (! eof $fh) {
448                # Read the current line.
449                my ($id, $key, @values) = Tracer::GetLine($fh);
450                $retVal->Add(linesIn => 1);
451                # Do some validation.
452                if (! $id) {
453                    # We ignore blank lines.
454                    $retVal->Add(blankLines => 1);
455                } elsif (substr($id, 0, 1) eq '#') {
456                    # A line beginning with a pound sign is a comment.
457                    $retVal->Add(comments => 1);
458                } elsif (! defined($key)) {
459                    # An ID without a key is a serious error.
460                    my $lines = $retVal->Ask('linesIn');
461                    Confess("Line $lines in $fileName has no attribute key.");
462                } elsif (! @values) {
463                    # A line with no values is not allowed.
464                    my $lines = $retVal->Ask('linesIn');
465                    Trace("Line $lines for key $key has no attribute values.") if T(1);
466                    $retVal->Add(skipped => 1);
467                } else {
468                    # Check to see if we need to fix up the object ID.
469                    if ($options{objectType}) {
470                        $id = "$options{objectType}:$id";
471                    }
472                    # The key contains a real part and an optional sub-part. We need the real part.
473                    my ($realKey, $subKey) = $self->SplitKey($key);
474                    # Now we need to check for a new key.
475                    if (! exists $keyHash{$realKey}) {
476                        my $keyObject = $self->GetEntity(AttributeKey => $realKey);
477                        if (! defined($keyObject)) {
478                            # Here the specified key does not exist, which is an error.
479                            my $line = $retVal->Ask('linesIn');
480                            Confess("Attribute \"$realKey\" on line $line of $fileName not found in database.");
481                        } else {
482                            # Make sure we know this is no longer a new key. We do this by putting
483                            # its table name in the key hash.
484                            $keyHash{$realKey} = $keyObject->PrimaryValue('AttributeKey(relationship-name)');
485                            $retVal->Add(keys => 1);
486                            # If this is NOT append mode, erase the key. This does not delete the key
487                            # itself; it just clears out all the values.
488                            if (! $append) {
489                                my $startTime = time();
490                                $self->EraseAttribute($realKey);
491                                $eraseTime += time() - $startTime;
492                                Trace("Attribute $realKey erased.") if T(3);
493                            }
494                        }
495                        Trace("Key $realKey found.") if T(3);
496                    }
497                    # If we're in resume mode, check to see if this insert is redundant.
498                    my $ok = 1;
499                    if ($resume ne 'none') {
500                        my $startTime = time();
501                        my $count = $self->GetAttributes($id, $key, @values);
502                        if ($count) {
503                            # Here the record is found, so we skip it.
504                            $ok = 0;
505                            $retVal->Add(skipped => 1);
506                        } else {
507                            # Here the record is not found. If we're in non-careful mode, we
508                            # stop resume checking at this point.
509                            if ($resume ne 'careful') {
510                                $resume = 'none';
511                            }
512                        }
513                        $checkTime += time() - $startTime;
514                    }
515                    if ($ok) {
516                        # We're in business. First, archive this row.
517                        if (defined $ah) {
518                            my $startTime = time();
519                            Tracer::PutLine($ah, [$id, $key, @values]);
520                            $archiveTime += time() - $startTime;
521                        }
522                        # We need to format the attribute data so it will work
523                        # as if it were a load file. This means we join the
524                        # values.
525                        my $valueString = join('::', @values);
526                        # Now we need to get access to the key's load file. Check for it in the
527                        # table hash.
528                        my $keyTable = $keyHash{$realKey};
529                        if (! exists $tableHash{$keyTable}) {
530                            # This is a new table, so we need to set it up. First, we get
531                            # a temporary file for it.
532                            my $tempFileName = FIGRules::GetTempFileName(sessionID => $$ . $keyTable,
533                                                                         extension => 'dtx');
534                            my $oh = Open(undef, ">$tempFileName");
535                            # Now we create its descriptor in the table hash.
536                            $tableHash{$keyTable} = {fileName => $tempFileName, handle => $oh, count => 0};
537                        }
538                        # Everything is all set up, so we put the value in the temporary file and
539                        # count it.
540                        my $tableData = $tableHash{$keyTable};
541                        my $startTime = time();
542                        Tracer::PutLine($tableData->{handle}, [$realKey, $id, $subKey, $valueString]);
543                        $archiveTime += time() - $startTime;
544                        $retVal->Add(linesOut => 1);
545                        $tableData->{count}++;
546                        # See if it's time to load a chunk.
547                        if ($tableData->{count} >= $chunkSize) {
548                            # We've filled a chunk, so it's time.
549                            close $tableData->{handle};
550                            $self->_LoadAttributeTable($keyTable, $tableData->{fileName}, $retVal);
551                            # Reset for the next chunk.
552                            $tableData->{count} = 0;
553                            $tableData->{handle} = Open(undef, ">$tableData->{fileName}");
554                        }
555                    } else {
556                        # Here we skipped because of resume mode.
557                        $retVal->Add(resumeSkip => 1);
558          }          }
559                    Trace($retVal->Ask('values') . " values processed.") if $retVal->Check(values => 1000) && T(3);
560      }      }
561  }  }
562            # Now we close the archive file. Note we undefine the handle so the error methods know
563            # not to worry.
564            if (defined $ah) {
565                close $ah;
566                undef $ah;
567            }
568            # Now we load the residual from the temporary files (if any). This time we'll do an
569            # analyze as well.
570            for my $tableName (keys %tableHash) {
571                # Get the data for this table.
572                my $tableData = $tableHash{$tableName};
573                # Close the handle. ERDB will re-open it for input later.
574                close $tableData->{handle};
575                # Check to see if there's anything left to load.
576                if ($tableData->{count} > 0) {
577                    # Yes, load the data.
578                    $self->_LoadAttributeTable($tableName, $tableData->{fileName}, $retVal);
579                }
580                # Regardless of whether additional loading was required, we need to
581                # analyze the table for performance.
582                my $startTime = time();
583                $self->Analyze($tableName);
584                $retVal->Add(analyzeTime => time() - $startTime);
585            }
586            Trace("Attribute load successful.") if T(2);
587        };
588        # Check for an error.
589        if ($@) {
590            # Here we have an error. Display the error message.
591            my $message = $@;
592            Trace("Error during attribute load: $message") if T(0);
593            $retVal->AddMessage($message);
594            # Close the archive file if it's open. The archive file can sometimes provide
595            # clues as to what happened.
596            if (defined $ah) {
597                close $ah;
598            }
599        }
600        # Store the timers.
601        $retVal->Add(eraseTime   => $eraseTime);
602        $retVal->Add(archiveTime => $archiveTime);
603        $retVal->Add(checkTime   => $checkTime);
604        # Return the result.
605        return $retVal;
606    }
607    
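The resume check in the load loop above can be isolated as a small filter. The following is a minimal sketch, not the module's code: `filter_for_resume` and the mode name `fast` are hypothetical, and the hash of already-loaded rows stands in for the `GetAttributes` database lookup. The loop itself only distinguishes `none`, `careful`, and everything else.

```perl
use strict;
use warnings;

# Decide which incoming rows should be inserted under a given resume mode.
#   none    - never check; insert every row.
#   careful - check every row; skip each one that is already loaded.
#   other   - (here called 'fast', a hypothetical name) check only until the
#             first row that is NOT already loaded, then stop checking.
sub filter_for_resume {
    my ($resume, $rows, $alreadyLoaded) = @_;
    my @toInsert;
    my $skipped = 0;
    for my $row (@$rows) {
        my $ok = 1;
        if ($resume ne 'none') {
            if ($alreadyLoaded->{$row}) {
                # The row is already present, so we skip it.
                $ok = 0;
                $skipped++;
            } elsif ($resume ne 'careful') {
                # First missing row in non-careful mode: stop resume checking.
                $resume = 'none';
            }
        }
        push @toInsert, $row if $ok;
    }
    return (\@toInsert, $skipped);
}
```

Under these assumptions, non-careful mode is cheaper but will re-insert a row that was loaded out of order after the first gap, while careful mode pays for a lookup on every line.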
608  =head3 ControlForm  =head3 BackupKeys
609    
610  C<< my $formHtml = $attrDB->ControlForm($cgi, $name); >>      my $stats = $attrDB->BackupKeys($fileName, %options);
611    
612  Return a form that can be used to control the creation and modification of  Backup the attribute key information from the attribute database.
 attributes.  
613    
614  =over 4  =over 4
615    
616  =item cgi  =item fileName
617    
618  CGI query object used to create HTML.  Name of the output file.
619    
620  =item name  =item options
621    
622  Name to give to the form. This should be unique for the web page.  Options for modifying the backup process.
623    
624  =item RETURN  =item RETURN
625    
626  Returns the HTML for a form that submits instructions to the C<Attributes.cgi> script  Returns a statistics object for the backup.
 for loading, creating, or deleting an attribute.  
627    
628  =back  =back
629    
630    Currently there are no options. The backup is straight to a text file in
631    tab-delimited format. Each key is backed up to two lines. The first line
632    is all of the data from the B<AttributeKey> table. The second is a
633    tab-delimited list of all the groups.
634    
635  =cut  =cut
636    
637  sub ControlForm {  sub BackupKeys {
638      # Get the parameters.      # Get the parameters.
639      my ($self, $cgi, $name) = @_;      my ($self, $fileName, %options) = @_;
640      # Declare the return list.      # Declare the return variable.
641      my @retVal = ();      my $retVal = Stats->new();
642      # Start the form. We use multipart to support the upload control.      # Open the output file.
643      push @retVal, $cgi->start_multipart_form(-name => $name);      my $fh = Open(undef, ">$fileName");
644      # We'll put the controls in a table. Nothing else ever seems to look nice.      # Set up to read the keys.
645      push @retVal, $cgi->start_table({ border => 2, cellpadding => 2 });      my $keyQuery = $self->Get(['AttributeKey'], "", []);
646      # The first row is for selecting the field name.      # Loop through the keys.
647      push @retVal, $cgi->Tr($cgi->th("Select a Field"),      while (my $keyData = $keyQuery->Fetch()) {
648                             $cgi->td($self->FieldMenu($cgi, 10, 'fieldName', 1,          $retVal->Add(key => 1);
649                                                       "document.$name.notes.value",          # Get the fields.
650                                                       "document.$name.dataType.value")));          my ($id, $tableName, $description) =
651      # Now we set up a dropdown for the data types. The values will be the              $keyData->Values(['AttributeKey(id)', 'AttributeKey(relationship-name)',
652      # data type names, and the labels will be the descriptions.                                'AttributeKey(description)']);
653      my %types = ERDB::GetDataTypes();          # Escape any tabs or new-lines in the description.
654      my %labelMap = map { $_ => $types{$_}->{notes} } keys %types;          my $escapedDescription = Tracer::Escape($description);
655      my $typeMenu = $cgi->popup_menu(-name   => 'dataType',          # Write the key data to the output.
656                                      -values => [sort keys %types],          Tracer::PutLine($fh, [$id, $tableName, $escapedDescription]);
657                                      -labels => \%labelMap);          # Get the key's groups.
658      push @retVal, $cgi->Tr($cgi->th("Data type"),          my @groups = $self->GetFlat(['IsInGroup'], "IsInGroup(from-link) = ?", [$id],
659                             $cgi->td($typeMenu));                                      'IsInGroup(to-link)');
660      # The next row is for the notes.          $retVal->Add(memberships => scalar(@groups));
661      push @retVal, $cgi->Tr($cgi->th("Description"),          # Write them to the output. Note we put a marker at the beginning to ensure the line
662                             $cgi->td($cgi->textarea(-name => 'notes',          # is nonempty.
663                                                     -rows => 6,          Tracer::PutLine($fh, ['#GROUPS', @groups]);
664                                                     -columns => 80))      }
665                            );      # Log the operation.
666      # Allow the user to specify a new field name. This is required if the      $self->LogOperation("Backup Keys", $fileName, $retVal->Display());
667      # user has selected one of the "(new)" markers.      # Return the result.
668      push @retVal, $cgi->Tr($cgi->th("New Field Name"),      return $retVal;
669                             $cgi->td($cgi->textfield(-name => 'newName',  }
                                                     -size => 30)),  
                                     );  
     # If the user wants to upload new values for the field, then we have  
     # an upload file name and column indicators.  
     push @retVal, $cgi->Tr($cgi->th("Upload Values"),  
                            $cgi->td($cgi->filefield(-name => 'newValueFile',  
                                                     -size => 20) .  
                                     " Key&nbsp;" .  
                                     $cgi->textfield(-name => 'keyCol',  
                                                     -size => 3,  
                                                     -default => 0) .  
                                     " Value&nbsp;" .  
                                     $cgi->textfield(-name => 'valueCol',  
                                                     -size => 3,  
                                                     -default => 1)  
                                    ),  
                           );  
     # Now the three buttons: UPDATE, SHOW, and DELETE.  
     push @retVal, $cgi->Tr($cgi->th("&nbsp;"),  
                            $cgi->td({align => 'center'},  
                                     $cgi->submit(-name => 'Delete', -value => 'DELETE') . " " .  
                                     $cgi->submit(-name => 'Store',  -value => 'STORE') . " " .  
                                     $cgi->submit(-name => 'Show',   -value => 'SHOW')  
                                    )  
                           );  
     # Close the table and the form.  
     push @retVal, $cgi->end_table();  
     push @retVal, $cgi->end_form();  
     # Return the assembled HTML.  
     return join("\n", @retVal, "");  
 }  
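The two-lines-per-key backup layout used by BackupKeys and RestoreKeys can be sketched without a database. This is an illustration under stated assumptions: `escape_field` and `unescape_field` are simplified stand-ins for `Tracer::Escape` and `Tracer::UnEscape` (whose exact rules are not shown here), and `write_key_records` / `read_key_records` are hypothetical helpers working on an in-memory list of lines rather than a file handle.

```perl
use strict;
use warnings;

# Simplified stand-ins for the Tracer escape routines (assumption).
sub escape_field {
    my ($text) = @_;
    $text =~ s/\\/\\\\/g;
    $text =~ s/\t/\\t/g;
    $text =~ s/\n/\\n/g;
    return $text;
}

sub unescape_field {
    my ($text) = @_;
    $text =~ s/\\(.)/$1 eq 't' ? "\t" : $1 eq 'n' ? "\n" : $1/ge;
    return $text;
}

# Produce two lines per key: the key data, then a "#GROUPS" line. The marker
# guarantees the group line is non-empty even when a key has no groups.
sub write_key_records {
    my ($keys) = @_;
    my @lines;
    for my $id (sort keys %$keys) {
        my $key = $keys->{$id};
        push @lines, join("\t", $id, $key->{table}, escape_field($key->{description}));
        push @lines, join("\t", '#GROUPS', @{$key->{groups}});
    }
    return @lines;
}

# Read the lines back, enforcing the same pairing that RestoreKeys checks.
sub read_key_records {
    my (@lines) = @_;
    my %keys;
    while (@lines) {
        my ($id, $table, $description) = split /\t/, shift(@lines), -1;
        die "Group record found when key record expected.\n" if $id eq '#GROUPS';
        die "Invalid format found for key record.\n" if ! defined $description;
        die "End of file found where group record expected.\n" if ! @lines;
        my ($marker, @groups) = split /\t/, shift(@lines), -1;
        die "Group record not found after key record.\n" if $marker ne '#GROUPS';
        $keys{$id} = { table => $table, groups => \@groups,
                       description => unescape_field($description) };
    }
    return %keys;
}
```

Escaping the description before writing matters because an embedded tab or new-line would otherwise shift the columns or break the strict key-line/group-line pairing.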
   
 =head3 FieldMenu  
   
 C<< my $menuHtml = $attrDB->FieldMenu($cgi, $height, $name, $newFlag, $noteControl, $typeControl); >>  
   
 Return the HTML for a menu to select an attribute field. The menu will  
 be a standard SELECT/OPTION thing which is called "popup menu" in the  
 CGI package, but actually looks like a list. The list will contain  
 one selectable row per field, grouped by entity.  
   
 =over 4  
   
 =item cgi  
   
 CGI query object used to generate HTML.  
   
 =item height  
670    
671  Number of lines to display in the list.  =head3 RestoreKeys
672    
673  =item name      my $stats = $attrDB->RestoreKeys($fileName, %options);
674    
675  Name to give to the menu. This is the name under which the value will  Restore the attribute keys and groups from a backup file.
 appear when the form is submitted.  
676    
677  =item newFlag (optional)  =over 4
678    
679  If TRUE, then extra rows will be provided to allow the user to select  =item fileName
 a new attribute. In other words, the user can select an existing  
 attribute, or can choose a C<(new)> marker to indicate a field to  
 be created in the parent entity.  
680    
681  =item noteControl (optional)  Name of the file containing the backed-up keys. Each key has a pair of lines,
682    one containing the key data and one listing its groups.
683    
684  If specified, the name of a variable for displaying the notes attached  =back
 to the field. This must be in Javascript form ready for assignment.  
 So, for example, if you have a variable called C<notes> that  
 represents a paragraph element, you should code C<notes.innerHTML>.  
 If it actually represents a form field you should code C<notes.value>.  
 If an C<innerHTML> coding is used, the text will be HTML-escaped before  
 it is copied in. Specifying this parameter generates Javascript for  
 displaying the field description when a field is selected.  
685    
686  =item typeControl (optional)  =cut
687    
688  If specified, the name of a variable for displaying the field's  sub RestoreKeys {
689  data type. Data types are a much more controlled vocabulary than      # Get the parameters.
690  notes, so there is no worry about HTML translation. Instead, the      my ($self, $fileName, %options) = @_;
691  raw value is put into the specified variable. Otherwise, the same      # Declare the return variable.
692  rules apply to this value that apply to I<$noteControl>.      my $retVal = Stats->new();
693        # Set up a hash to hold the group IDs.
694        my %groups = ();
695        # Open the file.
696        my $fh = Open(undef, "<$fileName");
697        # Loop until we're done.
698        while (! eof $fh) {
699            # Get a key record.
700            my ($id, $tableName, $description) = Tracer::GetLine($fh);
701            if ($id eq '#GROUPS') {
702                Confess("Group record found when key record expected.");
703            } elsif (! defined($description)) {
704                Confess("Invalid format found for key record.");
705            } else {
706                $retVal->Add("keyIn" => 1);
707                # Add this key to the database.
708                $self->InsertObject('AttributeKey', { id => $id,
709                                                      description => Tracer::UnEscape($description),
710                                                      'relationship-name' => $tableName});
711                Trace("Attribute $id stored.") if T(3);
712                # Get the group line.
713                my ($marker, @groups) = Tracer::GetLine($fh);
714                if (! defined($marker)) {
715                    Confess("End of file found where group record expected.");
716                } elsif ($marker ne '#GROUPS') {
717                    Confess("Group record not found after key record.");
718                } else {
719                    $retVal->Add(memberships => scalar(@groups));
720                    # Connect the groups.
721                    for my $group (@groups) {
722                        # Find out if this is a new group.
723                        if (! $groups{$group}) {
724                            $retVal->Add(newGroup => 1);
725                            # Add the group.
726                            $self->InsertObject('AttributeGroup', { id => $group });
727                            Trace("Group $group created.") if T(3);
728                            # Make sure we know it's not new.
729                            $groups{$group} = 1;
730                        }
731                        # Connect the group to our key.
732                        $self->InsertObject('IsInGroup', { 'from-link' => $id, 'to-link' => $group });
733                    }
734                    Trace("$id added to " . scalar(@groups) . " groups.") if T(3);
735                }
736            }
737        }
738        # Log the operation.
739        $self->LogOperation("Restore Keys", $fileName, $retVal->Display());
740        # Return the result.
741        return $retVal;
742    }
743    
744  =item RETURN  =head3 ArchiveFileName
745    
746  Returns the HTML to create a form field that can be used to select an      my $fileName = $ca->ArchiveFileName();
 attribute from the custom attributes system.  
747    
748  =back  Compute a file name for archiving attribute input data. The file will be in the attribute log directory.
749    
750  =cut  =cut
751    
752  sub FieldMenu {  sub ArchiveFileName {
753      # Get the parameters.      # Get the parameters.
754      my ($self, $cgi, $height, $name, $newFlag, $noteControl, $typeControl) = @_;      my ($self) = @_;
755      # These next two hashes make everything happen. "entities"      # Declare the return variable.
756      # maps each entity name to the list of values to be put into its      my $retVal;
757      # option group. "labels" maps each entity name to a map from values      # We start by turning the timestamp into something usable as a file name.
758      # to labels.      my $now = Tracer::Now();
759      my @entityNames = sort ($self->GetEntityTypes());      $now =~ tr/ :\//___/;
760      my %entities = map { $_ => [] } @entityNames;      # Next we get the directory name.
761      my %labels = map { $_ => { }} @entityNames;      my $dir = "$FIG_Config::var/attributes";
762      # Loop through the entities, adding the existing attributes.      if (! -e $dir) {
763      for my $entity (@entityNames) {          Trace("Creating attribute file directory $dir.") if T(1);
764          # Get this entity's field table.          mkdir $dir;
765          my $fieldHash = $self->GetFieldTable($entity);      }
766          # Get its field list in our local hashes.      # Put it together with the time stamp.
767          my $fieldList = $entities{$entity};      $retVal = "$dir/upload.$now";
768          my $labelList = $labels{$entity};      # Modify the file name to ensure it's unique.
769          # Add the NEW fields if we want them.      my $seq = 0;
770          if ($newFlag) {      while (-e "$retVal.$seq.tbl") { $seq++ }
771              push @{$fieldList}, $entity;      # Use the computed sequence number to get the correct file name.
772              $labelList->{$entity} = "(new)";      $retVal .= ".$seq.tbl";
         }  
         # Loop through the fields in the hash. We only keep the ones with a  
         # secondary relation name. (In other words, the name of the relation  
         # in which the field appears cannot be the same as the entity name.)  
         for my $fieldName (sort keys %{$fieldHash}) {  
             if ($fieldHash->{$fieldName}->{relation} ne $entity) {  
                 my $value = "$entity/$fieldName";  
                 push @{$fieldList}, $value;  
                 $labelList->{$value} = $fieldName;  
             }  
         }  
     }  
     # Now we have a hash and a list for each entity, and they correspond  
     # exactly to what the $cgi->optgroup function expects.  
     # The last step is to create the name for the onChange function. This function  
     # may not do anything, but we need to know the name to generate the HTML  
     # for the menu.  
     my $changeName = "${name}_setNotes";  
     my $retVal = $cgi->popup_menu({name => $name,  
                                    size => $height,  
                                    onChange => "$changeName(this.value)",  
                                    values => [map { $cgi->optgroup(-name => $_,  
                                                                    -values => $entities{$_},  
                                                                    -labels => $labels{$_})  
                                                   } @entityNames]}  
                                  );  
     # Create the change function.  
     $retVal .= "\n<script language=\"javascript\">\n";  
     $retVal .= "    function $changeName(fieldValue) {\n";  
     # The function only has a body if we have a notes control to store the description.  
     if ($noteControl || $typeControl) {  
         # Check to see if we're storing HTML or text into the note control.  
         my $htmlMode = ($noteControl && $noteControl =~ /innerHTML$/);  
         # We use a CASE statement based on the newly-selected field value. The  
         # field description will be stored in the JavaScript variable "myText"  
         # and the data type in "myType". Note the default data type is a normal  
         # string, but the default notes is an empty string.  
         $retVal .= "        var myText = \"\";\n";  
         $retVal .= "        var myType = \"string\";\n";  
         $retVal .= "        switch (fieldValue) {\n";  
         # Loop through the entities.  
         for my $entity (@entityNames) {  
             # Get the entity's field hash. This has the notes in it.  
             my $fieldHash = $self->GetFieldTable($entity);  
             # Loop through the values we might see for this entity's fields.  
             my $fields = $entities{$entity};  
             for my $value (@{$fields}) {  
                 # Only proceed if we have an existing field.  
                 if ($value =~ m!/(.+)$!) {  
                     # Get the field's hash element.  
                     my $element = $fieldHash->{$1};  
                     # Generate this case.  
                     $retVal .= "        case \"$value\" :\n";  
                     # Here we either want to update the note display, the  
                     # type display, or both.  
                     if ($noteControl) {  
                         # Here we want the notes updated.  
                         my $notes = $element->{Notes}->{content};  
                         # Insure it's in the proper form.  
                         if ($htmlMode) {  
                             $notes = ERDB::HTMLNote($notes);  
                         }  
                         # Escape it for use as a string literal.  
                         $notes =~ s/\n/\\n/g;  
                         $notes =~ s/"/\\"/g;  
                         $retVal .= "           myText = \"$notes\";\n";  
                     }  
                     if ($typeControl) {  
                         # Here we want the type updated.  
                         my $type = $element->{type};  
                         $retVal .= "           myType = \"$type\";\n";  
                     }  
                     # Close this case.  
                     $retVal .= "           break;\n";  
                 }  
             }  
         }  
         # Close the CASE statement and make the appropriate assignments.  
         $retVal .= "        }\n";  
         if ($noteControl) {  
             $retVal .= "        $noteControl = myText;\n";  
         }  
         if ($typeControl) {  
             $retVal .= "        $typeControl = myType;\n";  
         }  
     }  
     # Terminate the change function.  
     $retVal .= "    }\n";  
     $retVal .= "</script>\n";  
773      # Return the result.      # Return the result.
774      return $retVal;      return $retVal;
775  }  }
776    
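The naming scheme in ArchiveFileName (sanitize the timestamp, then probe sequence numbers until an unused name is found) can be tried in isolation. This is a minimal sketch: `unique_archive_name` is a hypothetical helper, and the timestamp is passed in as a string because the exact format returned by `Tracer::Now` is not shown here.

```perl
use strict;
use warnings;

# Build "<dir>/upload.<stamp>.<seq>.tbl", choosing the lowest sequence number
# for which no file exists yet.
sub unique_archive_name {
    my ($dir, $stamp) = @_;
    # Replace characters that are awkward in file names with underscores.
    (my $safeStamp = $stamp) =~ tr{ :/}{___};
    my $base = "$dir/upload.$safeStamp";
    my $seq = 0;
    $seq++ while -e "$base.$seq.tbl";
    return "$base.$seq.tbl";
}
```

For a stamp such as `10/18/2008 09:52:21`, and assuming no matching file exists yet, the helper returns a name ending in `upload.10_18_2008_09_52_21.0.tbl`. The existence test and the later file creation are separate steps, so two processes starting in the same second could in principle pick the same name.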
777  =head3 MatchSqlPattern  =head3 BackupAllAttributes
778    
779  C<< my $matched = CustomAttributes::MatchSqlPattern($value, $pattern); >>      my $stats = $attrDB->BackupAllAttributes($fileName, %options);
780    
781  Determine whether or not a specified value matches an SQL pattern. An SQL  Backup all of the attributes to a file. The attributes will be stored in a
782  pattern has two wild card characters: C<%> that matches multiple characters,  tab-delimited file suitable for reloading via L</LoadAttributesFrom>.
 and C<_> that matches a single character. These can be escaped using a  
 backslash (C<\>). We pull this off by converting the SQL pattern to a  
 PERL regular expression. As per SQL rules, the match is case-insensitive.  
783    
784  =over 4  =over 4
785    
786  =item value  =item fileName
787    
788  Value to be matched against the pattern. Note that an undefined or empty  Name of the file to which the attribute data should be backed up.
 value will not match anything.  
789    
790  =item pattern  =item options
791    
792  SQL pattern against which to match the value. An undefined or empty pattern will  Hash of options for the backup.
 match everything.  
793    
794  =item RETURN  =item RETURN
795    
796  Returns TRUE if the value and pattern match, else FALSE.  Returns a statistics object describing the backup.
797    
798  =back  =back
799    
800    Currently there are no options defined.
801    
802  =cut  =cut
803    
804  sub MatchSqlPattern {  sub BackupAllAttributes {
805      # Get the parameters.      # Get the parameters.
806      my ($value, $pattern) = @_;      my ($self, $fileName, %options) = @_;
807      # Declare the return variable.      # Declare the return variable.
808      my $retVal;      my $retVal = Stats->new();
809      # Insure we have a pattern.      # Get a list of the keys.
810      if (! defined($pattern) || $pattern eq "") {      my %keys = map { $_->[0] => $_->[1] } $self->GetAll(['AttributeKey'],
811          $retVal = 1;                                                          "", [], ['AttributeKey(id)',
812      } else {                                                                    'AttributeKey(relationship-name)']);
813          # Break the pattern into pieces around the wildcard characters. Because we      Trace(scalar(keys %keys) . " keys found during backup.") if T(2);
814          # use parentheses in the split function's delimiter expression, we'll get      # Open the file for output.
815          # list elements for the delimiters as well as the rest of the string.      my $fh = Open(undef, ">$fileName");
816          my @pieces = split /([_%]|\\[_%])/, $pattern;      # Loop through the keys.
817          # Check some fast special cases.      for my $key (sort keys %keys) {
818          if ($pattern eq '%') {          Trace("Backing up attribute $key.") if T(3);
819              # A null pattern matches everything.          $retVal->Add(keys => 1);
820              $retVal = 1;          # Get the key's relevant relationship name.
821          } elsif (@pieces == 1) {          my $relName = $keys{$key};
822              # No wildcards, so we have a literal comparison. Note we're case-insensitive.          # Loop through this key's values.
823              $retVal = (lc($value) eq lc($pattern));          my $query = $self->Get([$relName], "$relName(from-link) = ?", [$key]);
824          } elsif (@pieces == 2 && $pieces[1] eq '%') {          my $valuesFound = 0;
825              # A wildcard at the end, so we have a substring match. This is also case-insensitive.          while (my $line = $query->Fetch()) {
826              $retVal = (lc(substr($value, 0, length($pieces[0]))) eq lc($pieces[0]));              $valuesFound++;
827          } else {              # Get this row's data.
828              # Okay, we have to do it the hard way. Convert each piece to a PERL pattern.              my ($id, $key, $subKey, $value) = $line->Values(["$relName(to-link)",
829              my $realPattern = "";                                                               "$relName(from-link)",
830              for my $piece (@pieces) {                                                               "$relName(subkey)",
831                  # Determine the type of piece.                                                               "$relName(value)"]);
832                  if ($piece eq "") {              # Check for a subkey.
833                      # Empty pieces are ignored.              if ($subKey ne '') {
834                  } elsif ($piece eq "%") {                  $key = "$key$self->{splitter}$subKey";
                     # Here we have a multi-character wildcard. Note that it can match  
                     # zero or more characters.  
                     $realPattern .= ".*"  
                 } elsif ($piece eq "_") {  
                     # Here we have a single-character wildcard.  
                     $realPattern .= ".";  
                 } elsif ($piece eq "\\%" || $piece eq "\\_") {  
                     # This is an escape sequence (which is a rare thing, actually).  
                     $realPattern .= substr($piece, 1, 1);  
                 } else {  
                     # Here we have raw text.  
                     $realPattern .= quotemeta($piece);  
                 }  
835              }              }
836              # Do the match.              # Write it to the file.
837              $retVal = ($value =~ /^$realPattern$/i ? 1 : 0);              Tracer::PutLine($fh, [$id, $key, Escape($value)]);
838          }          }
839            Trace("$valuesFound values backed up for key $key.") if T(3);
840            $retVal->Add(values => $valuesFound);
841      }      }
842        # Log the operation.
843        $self->LogOperation("Backup Data", $fileName, $retVal->Display());
844      # Return the result.      # Return the result.
845      return $retVal;      return $retVal;
846  }  }
847    
 =head3 MigrateAttributes  
848    
849  C<< CustomAttributes::MigrateAttributes($fig); >>  =head3 GetGroups
850    
851        my @groups = $attrDB->GetGroups();
852    
853    Return a list of the available groups.
854    
855    =cut
856    
857    sub GetGroups {
858        # Get the parameters.
859        my ($self) = @_;
860        # Get the groups.
861        my @retVal = $self->GetFlat(['AttributeGroup'], "", [], 'AttributeGroup(id)');
862        # Return them.
863        return @retVal;
864    }
865    
866    =head3 GetAttributeData
867    
868        my %keys = $attrDB->GetAttributeData($type, @list);
869    
870  Migrate all the attributes data from the specified FIG instance. This is a long, slow  Return attribute data for the selected attributes. The attribute
871  method used to convert the old attribute data to the new system. Only attribute  data is a hash mapping each attribute key name to an n-tuple containing the
872  keys that are not already in the database will be loaded, and only for entity instances  data type, the description, the table name, and the groups.
 current in the database. To get an accurate capture of the attributes in the given  
 instance, you may want to clear the database and the DBD before starting and  
 run L</Refresh> to populate the entities.  
873    
874  =over 4  =over 4
875    
876  =item fig  =item type
877    
878    Type of attribute criterion: C<name> for attributes whose names begin with the
879    specified string, or C<group> for attributes in the specified group.
880    
881    =item list
882    
883    List containing the names of the groups or keys for the desired attributes.
884    
885    =item RETURN
886    
887  A FIG object that can be used to retrieve attributes for migration purposes.  Returns a hash mapping each attribute key name to its description,
888    table name, and parent groups.
889    
890  =back  =back
891    
892  =cut  =cut
893    
894  sub MigrateAttributes {  sub GetAttributeData {
895      # Get the parameters.      # Get the parameters.
896      my ($fig) = @_;      my ($self, $type, @list) = @_;
897      # Get a list of the objects to migrate. This requires connecting. Note we      # Set up a hash to store the attribute data.
898      # will map each entity type to a file name. The file will contain a list      my %retVal = ();
899      # of the object's IDs so we can get to them when we're not connected to      # Loop through the list items.
900      # the database.      for my $item (@list) {
901      my $ca = CustomAttributes->new();          # Set up a query for the desired attributes.
902      my %objects = map { $_ => "$FIG_Config::temp/$_.keys.tbl" } $ca->GetEntityTypes();          my $query;
903      # Set up hash of the existing attribute keys for each entity type.          if ($type eq 'name') {
904      my %oldKeys = ();              # Here we're doing a generic name search. We need to escape it and then tack
905      # Finally, we have a hash that counts the IDs for each entity type.              # on a %.
906      my %idCounts = map { $_ => 0 } keys %objects;              my $parm = $item;
907      # Loop through the list, creating key files to read back in.              $parm =~ s/_/\\_/g;
908      for my $entityType (keys %objects) {              $parm =~ s/%/\\%/g;
909          Trace("Retrieving keys for $entityType.") if T(2);              $parm .= "%";
910          # Create the key file.              # Ask for matching attributes. (Note that if the user passed in a null string
911          my $idFile = Open(undef, ">$objects{$entityType}");              # he'll get everything.)
912          # Loop through the keys.              $query = $self->Get(['AttributeKey'], "AttributeKey(id) LIKE ?", [$parm]);
913          my @ids = $ca->GetFlat([$entityType], "", [], "$entityType(id)");          } elsif ($type eq 'group') {
914          for my $id (@ids) {              $query = $self->Get(['IsInGroup', 'AttributeKey'], "IsInGroup(to-link) = ?", [$item]);
915              print $idFile "$id\n";          } else {
916          }              Confess("Unknown attribute query type \"$type\".");
917          close $idFile;          }
918          # In addition to the key file, we must get a list of attributes already          while (my $row = $query->Fetch()) {
919          # in the database. This avoids a circularity problem that might occur if the $fig              # Get this attribute's data.
920          # object is retrieving from the custom attributes database already.              my ($key, $relName, $notes) = $row->Values(['AttributeKey(id)',
921          my %fields = $ca->GetSecondaryFields($entityType);                                                       'AttributeKey(relationship-name)',
922          $oldKeys{$entityType} = \%fields;                                                       'AttributeKey(description)']);
923          # Finally, we have the ID count.              # If it's new, get its groups and add it to the return hash.
924          $idCounts{$entityType} = scalar @ids;              if (! exists $retVal{$key}) {
925      }                  my @groups = $self->GetFlat(['IsInGroup'], "IsInGroup(from-link) = ?",
926      # Release the custom attributes database so we can add attributes.                                              [$key], 'IsInGroup(to-link)');
927      undef $ca;                  $retVal{$key} = [$relName, $notes, @groups];
     # Loop through the objects.  
     for my $entityType (keys %objects) {  
         # Get a hash of all the attributes already in this database. These are  
         # left untouched.  
         my $myOldKeys = $oldKeys{$entityType};  
         # Create a hash to control the load file names for each attribute key we find.  
         my %keyHash = ();  
         # Set up some counters so we can trace our progress.  
         my ($totalIDs, $processedIDs, $keyCount, $valueCount) = ($idCounts{$entityType}, 0, 0, 0);  
         # Open this object's ID file.  
         Trace("Migrating data for $entityType. $totalIDs found.") if T(3);  
         my $keysIn = Open(undef, "<$objects{$entityType}");  
         while (my $id = <$keysIn>) {  
             # Remove the EOL characters.  
             chomp $id;  
             # Get this object's attributes.  
             my @allData = $fig->get_attributes($id);  
             Trace(scalar(@allData) . " attribute values found for $entityType($id).") if T(4);  
             # Loop through the attribute values one at a time.  
             for my $dataTuple (@allData) {  
                 # Get the key, value, and URL. We ignore the first element because that's the  
                 # object ID, and we already know the object ID.  
                 my (undef, $key, $value, $url) = @{$dataTuple};  
                 # Remove the buggy "1" for $url.  
                 if ($url eq "1") {  
                     $url = undef;  
                 }  
                 # Only proceed if this is not an old key.  
                 if (! $myOldKeys->{$key}) {  
                     # See if we've run into this key before.  
                     if (! exists $keyHash{$key}) {  
                         # Here we need to create the attribute key in the database.  
                         StoreAttributeKey($entityType, $key, 'text',  
                                           "Key migrated automatically from the FIG system. " .  
                                           "Please replace these notes as soon as possible " .  
                                           "with useful text."  
                                          );  
                         # Compute the attribute's load file name and open it for output.  
                         my $fileName = "$FIG_Config::temp/$entityType.$key.load.tbl";  
                         my $fh = Open(undef, ">$fileName");  
                         # Store the file name and handle.  
                         $keyHash{$key} = {h => $fh, name => $fileName};  
                         # Count this key.  
                         $keyCount++;  
                     }  
                     # Smash the value and the URL together.  
                     if (defined($url) && length($url) > 0) {  
                         $value .= "::$url";  
                     }  
                     # Write the attribute value to the load file.  
                     Tracer::PutLine($keyHash{$key}->{h}, [$id, $value]);  
                     $valueCount++;  
                 }  
             }  
             # Now we've finished all the attributes for this object. Count and trace it.  
             $processedIDs++;  
             if ($processedIDs % 500 == 0) {  
                 Trace("$processedIDs of $totalIDs ${entityType}s processed.") if T(3);  
                 Trace("$entityType has $keyCount keys and $valueCount values so far.") if T(3);  
             }  
         }  
         # Now we've finished all the attributes for all objects of this type.  
         Trace("$processedIDs ${entityType}s processed, with $keyCount keys and $valueCount values.") if T(2);  
         # Loop through the files, loading the keys into the database.  
         Trace("Connecting to database.") if T(2);  
         my $objectCA = CustomAttributes->new();  
         Trace("Loading key files.") if T(2);  
         for my $key (sort keys %keyHash) {  
             # Close the key's load file.  
             close $keyHash{$key}->{h};  
             # Reopen it for input.  
             my $fileName = $keyHash{$key}->{name};  
             my $fh = Open(undef, "<$fileName");  
             Trace("Loading $key from $fileName.") if T(3);  
             my $stats = $objectCA->LoadAttributeKey($entityType, $key, $fh, 0, 1);  
             Trace("Statistics for $key of $entityType:\n" . $stats->Show()) if T(3);  
928          }          }
         # All the keys for this entity type are now loaded.  
         Trace("Key files loaded for $entityType.") if T(2);  
929      }      }
930      # All keys for all entity types are now loaded.      }
931      Trace("Migration complete.") if T(2);      # Return the result.
932        return %retVal;
933  }  }
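The name search in C<GetAttributeData> escapes the SQL C<LIKE> wildcards in the caller's string and then appends C<%>, so the query matches every key that begins with that string. A minimal standalone sketch of that step follows. The helper name C<PrefixToLikePattern> is hypothetical and is not part of this module. Like the code above, it does not escape backslashes already present in the input.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CustomAttributes.pm): convert a literal
# prefix into an SQL LIKE pattern matching anything that starts with it.
sub PrefixToLikePattern {
    my ($prefix) = @_;
    my $retVal = $prefix;
    # Escape the LIKE wildcards so they are compared literally.
    $retVal =~ s/_/\\_/g;
    $retVal =~ s/%/\\%/g;
    # Tack on the real wildcard. An empty prefix becomes "%", which
    # matches every key.
    return $retVal . "%";
}

print PrefixToLikePattern("evidence_code"), "\n";
```

The resulting string is meant to be passed as a bound parameter to a C<LIKE ?> filter, as the C<Get> call above does, rather than pasted into the SQL text.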
934    
935  =head3 ComputeObjectTypeFromID  =head3 LogOperation
936    
937  C<< my ($entityName, $id) = CustomAttributes::ComputeObjectTypeFromID($objectID); >>      $ca->LogOperation($action, $target, $description);
938    
939  This method will compute the entity type corresponding to a specified object ID.  Write an operation description to the attribute activity log (C<$FIG_Config::var/attributes.log>).
 If the object ID begins with C<fig|>, it is presumed to be a feature ID. If it  
 is all digits with a single period, it is presumed to by a genome ID. Otherwise,  
 it must be a list reference. In this last case the first list element will be  
 taken as the entity type and the second will be taken as the actual ID.  
940    
941  =over 4  =over 4
942    
943  =item objectID  =item action
944    
945  Object ID to examine.  Action being logged (e.g. C<Delete Group> or C<Load Key>).
946    
947  =item RETURN  =item target
948    
949    ID of the key or group affected.
950    
951    =item description
952    
953  Returns a 2-element list consisting of the entity type followed by the specified ID.  Short description of the action.
954    
955  =back  =back
956    
957  =cut  =cut
958    
959  sub ComputeObjectTypeFromID {  sub LogOperation {
960      # Get the parameters.      # Get the parameters.
961      my ($objectID) = @_;      my ($self, $action, $target, $description) = @_;
962      # Declare the return variables.      # Get the user ID.
963      my ($entityName, $id);      my $user = $self->{user};
964      # Only proceed if the object ID is defined. If it's not, we'll be returning a      # Get a timestamp.
965      # pair of undefs.      my $timeString = Tracer::Now();
966      if ($objectID) {      # Open the log file for appending.
967          if (ref $objectID eq 'ARRAY') {      my $oh = Open(undef, ">>$FIG_Config::var/attributes.log");
968              # Here we have the new-style list reference. Pull out its pieces.      # Write the data to it.
969              ($entityName, $id) = @{$objectID};      Tracer::PutLine($oh, [$timeString, $user, $action, $target, $description]);
970          } else {      # Close the log file.
971              # Here the ID is the outgoing ID, and we need to look at its structure      close $oh;
             # to determine the entity type.  
             $id = $objectID;  
             if ($objectID =~ /^\d+\.\d+/) {  
                 # Digits with a single period is a genome.  
                 $entityName = 'Genome';  
             } elsif ($objectID =~ /^fig\|/) {  
                 # The "fig|" prefix indicates a feature.  
                 $entityName = 'Feature';  
             } else {  
                 # Anything else is illegal!  
                 Confess("Invalid attribute ID specification \"$objectID\".");  
             }  
         }  
     }  
     # Return the result.  
     return ($entityName, $id);  
972  }  }
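As a rough illustration of what C<LogOperation> does, the sketch below appends one record to a log file using only core Perl. It assumes that C<Tracer::PutLine> writes its fields tab-delimited and that a plain C<localtime> string is an acceptable stand-in for C<Tracer::Now>. The name C<AppendLogRecord> and the explicit file-name parameter are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for LogOperation: append one tab-delimited record
# (timestamp, user, action, target, description) to the named file.
sub AppendLogRecord {
    my ($fileName, $user, $action, $target, $description) = @_;
    # Scalar-context localtime gives a human-readable timestamp.
    my $timeString = localtime();
    # Open for appending so earlier records are preserved.
    open(my $oh, '>>', $fileName) or die "Cannot open $fileName: $!";
    print $oh join("\t", $timeString, $user, $action, $target, $description), "\n";
    close $oh or die "Cannot close $fileName: $!";
    return;
}
```

Opening and closing the file on every call, as the method above does, keeps each record flushed to disk at the cost of one extra open per logged operation.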
973    
974  =head2 FIG Method Replacements  =head2 FIG Method Replacements
975    
976  The following methods are used by B<FIG.pm> to replace the previous attribute functionality.  The following methods are used by B<FIG.pm> to replace the previous attribute functionality.
977  Some of the old functionality is no longer present. Controlled vocabulary is no longer  Some of the old functionality is no longer present: controlled vocabulary is no longer
978  supported and there is no longer any searching by URL. Fortunately, neither of these  supported and there is no longer any searching by URL. Fortunately, neither of these
979  capabilities were used in the old system.  capabilities were used in the old system.
980    
# Line 993  Line 982 
982  The idea is that these methods represent attribute manipulation allowed by all users, while  The idea is that these methods represent attribute manipulation allowed by all users, while
983  the others are only for privileged users with access to the attribute server.  the others are only for privileged users with access to the attribute server.
984    
985  In the previous implementation, an attribute had a value and a URL. In the new implementation,  In the previous implementation, an attribute had a value and a URL. In this implementation,
986  there is only a value. In this implementation, each attribute has only a value. These  each attribute has only a value. These methods will treat the value as a list with the individual
987  methods will treat the value as a list with the individual elements separated by the  elements separated by the value of the splitter parameter on the constructor (L</new>). The default
988  value of the splitter parameter on the constructor (L</new>). The default is double  is double colons C<::>.
 colons C<::>.  
989    
990  So, for example, an old-style keyword with a /value of C<essential> and a URL of  So, for example, an old-style keyword with a value of C<essential> and a URL of
991  C<http://www.sciencemag.org/cgi/content/abstract/293/5538/2266> using the default  C<http://www.sciencemag.org/cgi/content/abstract/293/5538/2266> using the default
992  splitter value would be stored as  splitter value would be stored as
993    
# Line 1010  Line 998 
998    
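The splitter convention described above can be sketched in a few lines of standalone Perl. The helper names C<JoinSections> and C<SplitSections> and the example URL are hypothetical. The module itself takes the splitter from the constructor rather than from a parameter.

```perl
use strict;
use warnings;

# Hypothetical helpers; these names are not from CustomAttributes.pm.

# Combine the sections of a value into the single stored string.
sub JoinSections {
    my ($splitter, @sections) = @_;
    return join($splitter, @sections);
}

# Break a stored string back into its sections.
sub SplitSections {
    my ($splitter, $value) = @_;
    # \Q...\E quotes the splitter so a character such as "|" is not
    # interpreted as regular-expression syntax.
    return split /\Q$splitter\E/, $value;
}

my $stored = JoinSections('::', 'essential', 'http://example.org/paper');
my @sections = SplitSections('::', $stored);
print scalar(@sections), " sections: @sections\n";
```

One caveat with this sketch: Perl's C<split> drops trailing empty fields, so a stored value that ends with the splitter comes back with fewer sections than were joined.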
999  =head3 GetAttributes  =head3 GetAttributes
1000    
1001  C<< my @attributeList = $attrDB->GetAttributes($objectID, $key, @valuePatterns); >>      my @attributeList = $attrDB->GetAttributes($objectID, $key, @values);
1002    
1003  In the database, attribute values are sectioned into pieces using a splitter  In the database, attribute values are sectioned into pieces using a splitter
1004  value specified in the constructor (L</new>). This is not a requirement of  value specified in the constructor (L</new>). This is not a requirement of
1005  the attribute system as a whole, merely a convenience for the purpose of  the attribute system as a whole, merely a convenience for the purpose of
1006  these methods. If you are using the static method calls instead of the  these methods. If a value has multiple sections, each section
1007  object-based calls, the splitter will always be the default value of  is matched against the corresponding criterion in the I<@valuePatterns> list.
 double colons (C<::>). If a value has multiple sections, each section  
 is matched against the correspond criterion in the I<@valuePatterns> list.  
1008    
1009  This method returns a series of tuples that match the specified criteria. Each tuple  This method returns a series of tuples that match the specified criteria. Each tuple
1010  will contain an object ID, a key, and one or more values. The parameters to this  will contain an object ID, a key, and one or more values. The parameters to this
1011  method therefore correspond structurally to the values expected in each tuple.  method therefore correspond structurally to the values expected in each tuple. In
1012    addition, you can ask for a generic search by suffixing a percent sign (C<%>) to any
1013    of the parameters. So, for example,
1014    
1015      my @attributeList = GetAttributes('fig|100226.1.peg.1004', 'structure%', 1, 2);      my @attributeList = $attrDB->GetAttributes('fig|100226.1.peg.1004', 'structure%', 1, 2);
1016    
1017  would return something like  would return something like
1018    
# Line 1033  Line 1021 
1021      ['fig}100226.1.peg.1004', 'structure2', 1, 2]      ['fig}100226.1.peg.1004', 'structure2', 1, 2]
1022      ['fig}100226.1.peg.1004', 'structureA', 1, 2]      ['fig}100226.1.peg.1004', 'structureA', 1, 2]
1023    
1024  Use of C<undef> in any position acts as a wild card (all values). In addition,  Use of C<undef> in any position acts as a wild card (all values). You can also specify
1025  the I<$key> and I<@valuePatterns> parameters can contain SQL pattern characters: C<%>, which  a list reference in the ID column. Thus,
1026  matches any sequence of characters, and C<_>, which matches any single character.  
1027  (You can use an escape sequence C<\%> or C<\_> to match an actual percent sign or      my @attributeList = $attrDB->GetAttributes(['100226.1', 'fig|100226.1.%'], 'PUBMED');
1028  underscore.)  
1029    would get the PUBMED attribute data for Streptomyces coelicolor A3(2) and all its
1030    features.
1031    
1032  In addition to values in multiple sections, a single attribute key can have multiple  In addition to values in multiple sections, a single attribute key can have multiple
1033  values, so even  values, so even
1034    
1035      my @attributeList = GetAttributes($peg, 'virulent');      my @attributeList = $attrDB->GetAttributes($peg, 'virulent');
1036    
1037  which has no wildcard in the key or the object ID, may return multiple tuples.  which has no wildcard in the key or the object ID, may return multiple tuples.
1038    
1039  For reasons of backward compatability, we examine the structure of the object ID to  Value matching in this system works very poorly, because of the way multiple values are
1040  determine the entity type. In that case the only two types allowed are C<Genome> and  stored. For the object ID, key name, and first value, we create queries that filter for the
1041  C<Feature>. An alternative method is to use a list reference, with the list consisting  desired results. On any filtering by value, we must do a comparison after the attributes are
1042  of an entity type name and the actual ID. Thus, the above example could equivalently  retrieved from the database, since the database has no notion of the multiple values, which
1043  be written as  are stored in a single string. As a result, queries that filter only on value end up
1044    reading a lot more than they need to.
     my @attributeList = GetAttributes([Feature => $peg], 'virulent');  
   
 The list-reference approach allows us to add attributes to other entity types in  
 the future. Doing so, however, will require modifying the L</Refresh> method and  
 updated the database design XML.  
   
 The list-reference approach also allows for a more fault-tolerant approach to  
 getting all objects with a particular attribute.  
   
     my @attributeList = GetAttributes([Feature => undef], 'virulent');  
   
 will only return feature attributes, while  
   
     my @attributeList = GetAttributes(undef, 'virulent');  
   
 could at some point in the future get you attributes for genomes or even subsystems  
 as well as features.  
1045    
1046  =over 4  =over 4
1047    
1048  =item objectID  =item objectID
1049    
1050  ID of the genome or feature whose attributes are desired. In general, an ID that  ID of object whose attributes are desired. If the attributes are desired for multiple
1051  starts with C<fig|> is treated as a feature ID, and an ID that is all digits with a  objects, this parameter can be specified as a list reference. If the attributes are
1052  single period is treated as a genome ID. For other entity types, use a list reference; in  desired for all objects, specify C<undef> or an empty string. Finally, you can specify
1053  this case the first list element is the entity type and the second is the ID. A value of  attributes for a range of object IDs by putting a percent sign (C<%>) at the end.
 C<undef> or an empty string here will match all objects.  
1054    
1055  =item key  =item key
1056    
1057  Attribute key name. Since attributes are stored as fields in the database with a  Attribute key name. A value of C<undef> or an empty string will match all
1058  field name equal to the key name, it is very fast to find a list of all the  attribute keys. If the values are desired for multiple keys, this parameter can be
1059  matching keys. Each key's values require a separate query, however, which may  specified as a list reference. Finally, you can specify attributes for a range of
1060  be a performance problem if the pattern matches a lot of keys. Wild cards are  keys by putting a percent sign (C<%>) at the end.
 acceptable here, and a value of C<undef> or an empty string will match all  
 attribute keys.  
1061    
1062  =item valuePatterns  =item values
1063    
1064  List of the desired attribute values, section by section. If C<undef>  List of the desired attribute values, section by section. If C<undef>
1065  or an empty string is specified, all values in that section will match.  or an empty string is specified, all values in that section will match. A
1066    generic match can be requested by placing a percent sign (C<%>) at the end.
1067    In that case, all values that match up to and not including the percent sign
1068    will match. You may also specify a regular expression enclosed
1069    in slashes. All values that match the regular expression will be returned. For
1070    performance reasons, only values have this extra capability.
1071    
1072  =item RETURN  =item RETURN
1073    
# Line 1107  Line 1082 
1082    
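The section-by-section value matching described in the documentation above can be illustrated with a standalone sketch. It assumes case-sensitive comparison, an exact match for plain criteria, and a prefix match for criteria that end in C<%>. The names C<SectionMatches> and C<ValueMatches> are hypothetical, and the real method may differ in these details.

```perl
use strict;
use warnings;

# Hypothetical helper: test one value section against one criterion.
sub SectionMatches {
    my ($section, $criterion) = @_;
    # undef or an empty string matches every value in that section.
    return 1 if ! defined $criterion || $criterion eq '';
    # A value with too few sections cannot satisfy a real criterion.
    return 0 if ! defined $section;
    if ($criterion =~ m{^/(.*)/$}s) {
        # Regular expression enclosed in slashes.
        my $regex = $1;
        return ($section =~ /$regex/ ? 1 : 0);
    } elsif ($criterion =~ /^(.*)%$/s) {
        # Generic match: the text before the percent sign must be a prefix.
        my $prefix = $1;
        return (index($section, $prefix) == 0 ? 1 : 0);
    } else {
        # Plain criterion: exact comparison (an assumption of this sketch).
        return ($section eq $criterion ? 1 : 0);
    }
}

# Hypothetical helper: split a stored value and check each section in turn.
sub ValueMatches {
    my ($value, $splitter, @criteria) = @_;
    my @sections = split /\Q$splitter\E/, $value;
    for my $i (0 .. $#criteria) {
        return 0 if ! SectionMatches($sections[$i], $criteria[$i]);
    }
    return 1;
}
```

A check like this has to run in Perl after the rows are fetched, which is why the documentation warns that filtering only on later value sections reads more data than it returns.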
1083  sub GetAttributes {  sub GetAttributes {
1084      # Get the parameters.      # Get the parameters.
1085      my ($self, $objectID, $key, @valuePatterns) = @_;      my ($self, $objectID, $key, @values) = @_;
1086      # Declare the return variable.      # Declare the return variable.
1087      my @retVal = ();      my @retVal = ();
1088      # Determine the entity types for our search.      # Ensure we have at least some sort of filtering going on.
1089      my @objects = ();      if (! grep { defined $_ } $objectID, $key, @values) {
1090      my ($actualObjectID, $computedType);          Confess("No filters specified in GetAttributes call.");
1091      if (! $objectID) {      } else {
1092          push @objects, $self->GetEntityTypes();          # This hash will map value-table fields to patterns. We use it to build the
1093      } else {          # SQL statement.
1094          ($computedType, $actualObjectID) = ComputeObjectTypeFromID($objectID);          my %data;
1095          push @objects, $computedType;          # Add the object ID to the key information.
1096      }          $data{'to-link'} = $objectID;
1097      # Loop through the entity types.          # The first value represents a problem, because we can search it using SQL, but not
1098      for my $entityType (@objects) {          # in the normal way. If the user specifies a generic search or exact match for
1099          # Now we need to find all the matching keys. The keys are actually stored in          # every alternative value (remember, the values may be specified as a list),
1100          # our database object, so this process is fast. Note that our          # then we can create SQL filtering for it. If any of the values are specified
1101          # MatchSqlPattern method          # as a regular expression, however, that's more complicated, because
1102          my %secondaries = $self->GetSecondaryFields($entityType);          # we need to read every value to verify a match.
1103          my @fieldList = grep { MatchSqlPattern($_, $key) } keys %secondaries;          if (@values > 0) {
1104          # Now we figure out whether or not we need to filter by object. We will always              # Get the first value and put its alternatives in an array.
1105          # filter by key to a limited extent, so if we're filtering by object we need an              my $valueParm = $values[0];
1106          # AND to join the object ID filter with the key filter.              my @valueList;
1107          my $filter = "";              if (ref $valueParm eq 'ARRAY') {
1108          my @params = ();                  @valueList = @{$valueParm};
1109          if (defined($actualObjectID)) {              } else {
1110              # Here the caller wants to filter on object ID. Check for a pattern.                  @valueList = ($valueParm);
1111              my $comparator = ($actualObjectID =~ /%/ ? "LIKE" : "=");              }
1112              # Update the filter and the parameter list.              # Okay, now we have all the possible criteria for the first value in the list
1113              $filter = "$entityType(id) $comparator ? AND ";              # @valueList. We'll copy the values to a new array in which they have been
1114              push @params, $actualObjectID;              # converted to generic requests. If we find a regular-expression match
1115          }              # anywhere in the list, we toss the whole thing.
1116          # It's time to begin making queries. We process one attribute key at a time, because              my @valuePatterns = ();
1117          # each attribute is actually a different field in the database. We know here that              my $okValues = 1;
1118          # all the keys we've collected are for the correct entity because we got them from              for my $valuePattern (@valueList) {
1119          # the DBD. That's a good thing, because an invalid key name will cause an SQL error.                  # Check the pattern type.
1120          for my $key (@fieldList) {                  if (substr($valuePattern, 0, 1) eq '/') {
1121              # Get all of the attribute values for this key.                      # Regular expressions invalidate the entire process.
1122              my @dataRows = $self->GetAll([$entityType], "$filter$entityType($key) IS NOT NULL",                      $okValues = 0;
1123                                           \@params, ["$entityType(id)", "$entityType($key)"]);                  } elsif (substr($valuePattern, -1, 1) eq '%') {
1124              # Process each value separately. We need to verify the values and reformat the                      # A Generic pattern is passed in unmodified.
1125              # tuples. Note that GetAll will give us one row per matching object ID,                      push @valuePatterns, $valuePattern;
1126              # with the ID first followed by a list of the data values. This is very                  } else {
1127              # different from the structure we'll be returning, which has one row                      # An exact match is converted to generic.
1128              # per value.                      push @valuePatterns, "$valuePattern%";
1129              for my $dataRow (@dataRows) {                  }
1130                  # Get the object ID and the list of values.              }
1131                  my ($rowObjectID, @dataValues) = @{$dataRow};              # If everything works, add the value data to the filtering hash.
1132                  # Loop through the values. There will be one result row per attribute value.              if ($okValues) {
1133                  for my $dataValue (@dataValues) {                  $data{value} = \@valuePatterns;
1134                      # Separate this value into sections.              }
1135                      my @sections = split("::", $dataValue);          }
1136                      # Loop through the value patterns, looking for a mismatch. Note that          # Now comes the really tricky part, which is key handling. The key is
1137                      # since we're working through parallel arrays, we are using an index          # actually split in two parts: the real key and a sub-key. The real key
1138                      # loop. As soon as a match fails we stop checking. This means that          # determines which value table contains the relevant values. The information
1139                      # if the value pattern list is longer than the number of sections,          # we need is kept in here.
1140                      # we will fail as soon as we run out of sections.          my %tables = map { $_ => [] } $self->_GetAllTables();
1141                      my $match = 1;          # See if we have any key filtering to worry about.
1142                      for (my $i = 0; $i <= $#valuePatterns && $match; $i++) {          if ($key) {
1143                          $match = MatchSqlPattern($sections[$i], $valuePatterns[$i]);              # Here we have either a single key or a list. We convert both cases to a list.
1144                      }              my $keyList = (ref $key ne 'ARRAY' ? [$key] : $key);
1145                      # If we match, we save this value in the output list.              # Get easy access to the key/table hash.
1146                      if ($match) {              my $keyTableHash = $self->_KeyTable();
1147                          push @retVal, [$rowObjectID, $key, @sections];              # Loop through the keys, discovering tables.
1148                      }              for my $keyChoice (@$keyList) {
1149                  }                  # Now we have to start thinking about the real key and the subkeys.
1150                  # Here we've processed all the attribute values for the current object ID.                  my ($realKey, $subKey) = $self->_SplitKeyPattern($keyChoice);
1151                    # Find the matches for the real key in the key hash. For each of
1152                    # these, we memorize the table name in the hash below.
1153                    my %tableNames = ();
1154                    for my $keyInTable (keys %{$keyTableHash}) {
1155                        if ($self->_CheckSQLPattern($realKey, $keyInTable)) {
1156                          # Index by the key that actually matched, not the caller's pattern.
1156                          $tableNames{$keyTableHash->{$keyInTable}} = 1;
1157                        }
1158                    }
1159                    # If the key is generic, or didn't match anything, add
1160                    # the default table to the mix.
1161                    if (keys %tableNames == 0 || $keyChoice =~ /%/) {
1162                        $tableNames{$self->{defaultRel}} = 1;
1163                    }
1164                    # Now we add this key combination to the key list for each relevant table.
1165                    for my $tableName (keys %tableNames) {
1166                        push @{$tables{$tableName}}, [$realKey, $subKey];
1167                    }
1168                }
1169            }
1170            # Now we loop through the tables of interest, performing queries.
1171            # Loop through the tables.
1172            for my $table (keys %tables) {
1173                # Get the key pairs for this table.
1174                my $pairs = $tables{$table};
1175                # Does this table have data? It does if there is no key specified or
1176                # it has at least one key pair.
1177                my $pairCount = scalar @{$pairs};
1178                Trace("Pair count for table $table is $pairCount.") if T(3);
1179                if ($pairCount || ! $key) {
1180                    # Create some lists to contain the filter fragments and parameter values.
1181                    my @filter = ();
1182                    my @parms = ();
1183                    # This next loop goes through the different fields that can be specified in the
1184                    # parameter list and generates filters for each. The %data hash that we built above
1185                    # contains most of the necessary information to do this. When we're done, we'll
1186                    # paste on stuff for the key pairs.
1187                    for my $field (keys %data) {
1188                        # Accumulate filter information for this field. We will OR together all the
1189                        # elements accumulated to create the final result.
1190                        my @fieldFilter = ();
1191                        # Get the specified filter for this field.
1192                        my $fieldPattern = $data{$field};
1193                        # Only proceed if the pattern is one that won't match everything.
1194                        if (defined($fieldPattern) && $fieldPattern ne "" && $fieldPattern ne "%") {
1195                            # Convert the pattern to an array.
1196                            my @patterns = ();
1197                            if (ref $fieldPattern eq 'ARRAY') {
1198                                push @patterns, @{$fieldPattern};
1199                            } else {
1200                                push @patterns, $fieldPattern;
1201                            }
1202                            # Only proceed if the array is nonempty. The loop will work fine if the
1203                            # array is empty, but when we build the filter string at the end we'll
1204                            # get "()" in the filter list, which will result in an SQL syntax error.
1205                            if (@patterns) {
1206                                # Loop through the individual patterns.
1207                                for my $pattern (@patterns) {
1208                                    my ($clause, $value) = _WherePart($table, $field, $pattern);
1209                                    push @fieldFilter, $clause;
1210                                    push @parms, $value;
1211                                }
1212                                # Form the filter for this field.
1213                                my $fieldFilterString = join(" OR ", @fieldFilter);
1214                                push @filter, "($fieldFilterString)";
1215                            }
1216                        }
1217                    }
1218                    # The final filter is for the key pairs. Only proceed if we have some.
1219                    if ($pairCount) {
1220                        # We'll accumulate pair filter clauses in here.
1221                        my @pairFilters = ();
1222                        # Loop through the key pairs.
1223                        for my $pair (@$pairs) {
1224                            my ($realKey, $subKey) = @{$pair};
1225                            my ($realClause, $realValue) = _WherePart($table, 'from-link', $realKey);
1226                            if (! $subKey) {
1227                                # Here the subkey is wild, so only the real key matters.
1228                                push @pairFilters, $realClause;
1229                                push @parms, $realValue;
1230                            } else {
1231                                # Here we have to select on both keys.
1232                                my ($subClause, $subValue) = _WherePart($table, 'subkey', $subKey);
1233                                push @pairFilters, "($realClause AND $subClause)";
1234                                push @parms, $realValue, $subValue;
1235                            }
1236                        }
1237                        # Join the pair filters together to make a giant key filter.
1238                        my $pairFilter = "(" . join(" OR ", @pairFilters) . ")";
1239                        push @filter, $pairFilter;
1240                    }
1241                    # At this point, @filter contains one or more filter strings and @parms
1242                    # contains the parameter values to bind to them.
1243                    my $actualFilter = join(" AND ", @filter);
1244                    # Now we're ready to make our query.
1245                    my $query = $self->Get([$table], $actualFilter, \@parms);
1246                    # Format the results.
1247                    push @retVal, $self->_QueryResults($query, $table, @values);
1248              }              }
             # Here we've processed all the rows returned by GetAll. In general, there will  
             # be one row per object ID.  
1249          }          }
         # Here we've processed all the matching attribute keys.  
1250      }      }
1251      # Here we've processed all the entity types. That means @retVal has all the matching      # The above loop ran the query for each necessary value table and merged the
1252      # results.      # results into @retVal. Now we return the rows found.
1253      return @retVal;      return @retVal;
1254  }  }
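
The filter assembly in the loop above can be tried in isolation. The following is a minimal sketch, not the module's implementation: `where_part` is a hypothetical stand-in for `_WherePart` (whose body is not shown here), assumed to emit `LIKE` for patterns containing `%` and `=` otherwise, and `build_filter` is an illustrative name. The `%data` hash is assumed to map field names to a pattern or a list of patterns.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _WherePart: LIKE for generic patterns, = otherwise.
sub where_part {
    my ($table, $field, $pattern) = @_;
    my $op = ($pattern =~ /%/ ? 'LIKE' : '=');
    return ("$table($field) $op ?", $pattern);
}

# Build the filter string and bind values for one table.
sub build_filter {
    my ($table, $data, $pairs) = @_;
    my (@filter, @parms);
    for my $field (sort keys %$data) {
        my $fieldPattern = $data->{$field};
        # Skip patterns that would match everything.
        next if ! defined $fieldPattern || $fieldPattern eq '' || $fieldPattern eq '%';
        my @patterns = (ref $fieldPattern eq 'ARRAY' ? @$fieldPattern : ($fieldPattern));
        # An empty list would produce "()", which is an SQL syntax error.
        next if ! @patterns;
        my @fieldFilter;
        for my $pattern (@patterns) {
            my ($clause, $value) = where_part($table, $field, $pattern);
            push @fieldFilter, $clause;
            push @parms, $value;
        }
        # Alternatives for one field are ORed together.
        push @filter, '(' . join(' OR ', @fieldFilter) . ')';
    }
    my @pairFilters;
    for my $pair (@$pairs) {
        my ($realKey, $subKey) = @$pair;
        my ($realClause, $realValue) = where_part($table, 'from-link', $realKey);
        if (! $subKey) {
            # Wild subkey: only the real key matters.
            push @pairFilters, $realClause;
            push @parms, $realValue;
        } else {
            my ($subClause, $subValue) = where_part($table, 'subkey', $subKey);
            push @pairFilters, "($realClause AND $subClause)";
            push @parms, $realValue, $subValue;
        }
    }
    push @filter, '(' . join(' OR ', @pairFilters) . ')' if @pairFilters;
    # The per-field groups and the key-pair group are ANDed together.
    return (join(' AND ', @filter), \@parms);
}

my ($filter, $parms) = build_filter('HasValueFor',
    { 'to-link' => ['fig|83333.1.peg.4', 'fig|83333.1.%'] },
    [ ['structure', ''], ['evidence', 'code'] ]);
print "$filter\n";
print join(', ', @$parms), "\n";
```

The bind values are pushed in the same order as the clauses that consume them, which is what keeps the `?` markers aligned with `@parms`.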
1255    
1256  =head3 AddAttribute  =head3 AddAttribute
1257    
1258  C<< $attrDB->AddAttribute($objectID, $key, @values); >>      $attrDB->AddAttribute($objectID, $key, @values);
1259    
1260  Add an attribute key/value pair to an object. This method cannot add a new key, merely  Add an attribute key/value pair to an object. This method cannot add a new key, merely
1261  add a value to an existing key. Use L</StoreAttributeKey> to create a new key.  add a value to an existing key. Use L</StoreAttributeKey> to create a new key.
# Line 1195  Line 1264 
1264    
1265  =item objectID  =item objectID
1266    
1267  ID of the genome or feature to which the attribute is to be added. In general, an ID that  ID of the object to which the attribute is to be added.
 starts with C<fig|> is treated as a feature ID, and an ID that is all digits and periods  
 is treated as a genome ID. For IDs of other types, this parameter should be a reference  
 to a 2-tuple consisting of the entity type name followed by the object ID.  
1268    
1269  =item key  =item key
1270    
1271  Attribute key name. This corresponds to the name of a field in the database.  Attribute key name.
1272    
1273  =item values  =item values
1274    
# Line 1225  Line 1291 
1291      } elsif (! @values) {      } elsif (! @values) {
1292          Confess("No values specified in AddAttribute call for key $key.");          Confess("No values specified in AddAttribute call for key $key.");
1293      } else {      } else {
1294          # Okay, now we have some reason to believe we can do this. Start by          # Okay, now we have some reason to believe we can do this. Form the values
1295          # computing the object type and ID.          # into a scalar.
         my ($entityName, $id) = ComputeObjectTypeFromID($objectID);  
         # Form the values into a scalar.  
1296          my $valueString = join($self->{splitter}, @values);          my $valueString = join($self->{splitter}, @values);
1297          # Insert the value.          # Split up the key.
1298          $self->InsertValue($id, "$entityName($key)", $valueString);          my ($realKey, $subKey) = $self->SplitKey($key);
1299            # Find the table containing the key.
1300            my $table = $self->_KeyTable($realKey);
1301            # Connect the object to the key.
1302            $self->InsertObject($table, { 'from-link' => $realKey,
1303                                                 'to-link'   => $objectID,
1304                                                 'subkey'    => $subKey,
1305                                                 'value'     => $valueString,
1306                                           });
1307      }      }
1308      # Return a one. We do this for backward compatibility.      # Return a one, indicating success. We do this for backward compatibility.
1309      return 1;      return 1;
1310  }  }
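
To make the new storage layout concrete, here is a small sketch of the row that the revised AddAttribute hands to the database. It assumes the splitter is `::` (described elsewhere in this module as the usual value); `make_row` is a hypothetical helper that only builds the hash and does not touch a database.

```perl
use strict;
use warnings;

my $splitter = '::';    # assumed value of $self->{splitter}

# Build the field hash for one attribute row (illustrative only).
sub make_row {
    my ($objectID, $key, @values) = @_;
    # Split the external key into real key and subkey.
    my ($realKey, $subKey) = split /\Q$splitter\E/, $key, 2;
    $subKey = '' if ! defined $subKey;
    return { 'from-link' => $realKey,
             'to-link'   => $objectID,
             'subkey'    => $subKey,
             'value'     => join($splitter, @values),
           };
}

my $row = make_row('fig|83333.1.peg.4', 'evidence::code', 'ISU', 'curated');
print "$_ => $row->{$_}\n" for sort keys %$row;
```

All values for one call are packed into a single `value` string, so a value that itself contains the splitter will be split into extra sections when the row is read back.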
1311    
1312  =head3 DeleteAttribute  =head3 DeleteAttribute
1313    
1314  C<< $attrDB->DeleteAttribute($objectID, $key, @values); >>      $attrDB->DeleteAttribute($objectID, $key, @values);
1315    
1316  Delete the specified attribute key/value combination from the database.  Delete the specified attribute key/value combination from the database.
1317    
 The first form will connect to the database and release it. The second form  
 uses the database connection contained in the object.  
   
1318  =over 4  =over 4
1319    
1320  =item objectID  =item objectID
1321    
1322  ID of the genome or feature to which the attribute is to be added. In general, an ID that  ID of the object whose attribute is to be deleted.
 starts with C<fig|> is treated as a feature ID, and an ID that is all digits and periods  
 is treated as a genome ID. For IDs of other types, this parameter should be a reference  
 to a 2-tuple consisting of the entity type name followed by the object ID.  
1323    
1324  =item key  =item key
1325    
1326  Attribute key name. This corresponds to the name of a field in the database.  Attribute key name.
1327    
1328  =item values  =item values
1329    
1330  One or more values to be associated with the key.  One or more values associated with the key. If no values are specified, then all values
1331    will be deleted. Otherwise, only a matching value will be deleted.
1332    
1333  =back  =back
1334    
# Line 1275  Line 1342 
1342          Confess("No object ID specified for DeleteAttribute call.");          Confess("No object ID specified for DeleteAttribute call.");
1343      } elsif (! defined($key)) {      } elsif (! defined($key)) {
1344          Confess("No attribute key specified for DeleteAttribute call.");          Confess("No attribute key specified for DeleteAttribute call.");
     } elsif (! @values) {  
         Confess("No values specified in DeleteAttribute call for key $key.");  
1345      } else {      } else {
1346          # Now compute the object type and ID.          # Split the key into the real key and the subkey.
1347          my ($entityName, $id) = ComputeObjectTypeFromID($objectID);          my ($realKey, $subKey) = $self->SplitKey($key);
1348          # Form the values into a scalar.          # Find the table containing the key's values.
1349            my $table = $self->_KeyTable($realKey);
1350            if ($subKey eq '' && scalar(@values) == 0) {
1351                # Here we erase the entire key for this object.
1352                $self->DeleteRow('HasValueFor', $key, $objectID);
1353            } else {
1354                # Here we erase the matching values.
1355          my $valueString = join($self->{splitter}, @values);          my $valueString = join($self->{splitter}, @values);
1356          # Delete the value.              $self->DeleteRow('HasValueFor', $realKey, $objectID,
1357          $self->DeleteValue($entityName, $id, $key, $valueString);                               { subkey => $subKey, value => $valueString });
1358            }
1359      }      }
1360      # Return a one. This is for backward compatibility.      # Return a one. This is for backward compatibility.
1361      return 1;      return 1;
1362  }  }
1363    
1364    =head3 DeleteMatchingAttributes
1365    
1366        my @deleted = $attrDB->DeleteMatchingAttributes($objectID, $key, @values);
1367    
1368    Delete all attributes that match the specified criteria. This is equivalent to
1369    calling L</GetAttributes> and then invoking L</DeleteAttribute> for each
1370    row found.
1371    
1372    =over 4
1373    
1374    =item objectID
1375    
1376    ID of object whose attributes are to be deleted. If the attributes for multiple
1377    objects are to be deleted, this parameter can be specified as a list reference. If
1378    attributes are to be deleted for all objects, specify C<undef> or an empty string.
1379    Finally, you can delete attributes for a range of object IDs by putting a percent
1380    sign (C<%>) at the end.
1381    
1382    =item key
1383    
1384    Attribute key name. A value of C<undef> or an empty string will match all
1385    attribute keys. If the values are to be deleted for multiple keys, this parameter can be
1386    specified as a list reference. Finally, you can delete attributes for a range of
1387    keys by putting a percent sign (C<%>) at the end.
1388    
1389    =item values
1390    
1391    List of the desired attribute values, section by section. If C<undef>
1392    or an empty string is specified, all values in that section will match. A
1393    generic match can be requested by placing a percent sign (C<%>) at the end.
1394    In that case, all values that match up to and not including the percent sign
1395    will match. You may also specify a regular expression enclosed
1396    in slashes. All values that match the regular expression will be deleted. For
1397    performance reasons, only values have this extra capability.
1398    
1399    =item RETURN
1400    
1401    Returns a list of tuples for the attributes that were deleted, in the
1402    same form as L</GetAttributes>.
1403    
1404    =back
1405    
1406    =cut
1407    
1408    sub DeleteMatchingAttributes {
1409        # Get the parameters.
1410        my ($self, $objectID, $key, @values) = @_;
1411        # Get the matching attributes.
1412        my @retVal = $self->GetAttributes($objectID, $key, @values);
1413        # Loop through the attributes, deleting them.
1414        for my $tuple (@retVal) {
1415            $self->DeleteAttribute(@{$tuple});
1416        }
1417        # Log this operation.
1418        my $count = @retVal;
1419        $self->LogOperation("Mass Delete", $key, "$count matching attributes deleted.");
1420        # Return the deleted attributes.
1421        return @retVal;
1422    }
1423    
1424  =head3 ChangeAttribute  =head3 ChangeAttribute
1425    
1426  C<< $attrDB->ChangeAttribute($objectID, $key, \@oldValues, \@newValues); >>      $attrDB->ChangeAttribute($objectID, $key, \@oldValues, \@newValues);
1427    
1428  Change the value of an attribute key/value pair for an object.  Change the value of an attribute key/value pair for an object.
1429    
# Line 1333  Line 1465 
1465      } elsif (! defined($newValues) || ref $newValues ne 'ARRAY') {      } elsif (! defined($newValues) || ref $newValues ne 'ARRAY') {
1466          Confess("No new values specified in ChangeAttribute call for key $key.");          Confess("No new values specified in ChangeAttribute call for key $key.");
1467      } else {      } else {
1468          # Okay, now we do the change as a delete/add.          # We do the change as a delete/add.
1469          $self->DeleteAttribute($objectID, $key, @{$oldValues});          $self->DeleteAttribute($objectID, $key, @{$oldValues});
1470          $self->AddAttribute($objectID, $key, @{$newValues});          $self->AddAttribute($objectID, $key, @{$newValues});
1471      }      }
# Line 1343  Line 1475 
1475    
1476  =head3 EraseAttribute  =head3 EraseAttribute
1477    
1478  C<< $attrDB->EraseAttribute($entityName, $key); >>      $attrDB->EraseAttribute($key);
1479    
1480  Erase all values for the specified attribute key. This does not remove the  Erase all values for the specified attribute key. This does not remove the
1481  key from the database; it merely removes all the values.  key from the database; it merely removes all the values.
1482    
1483  =over 4  =over 4
1484    
 =item entityName  
   
 Name of the entity to which the key belongs. If undefined, all entities will be  
 examined for the desired key.  
   
1485  =item key  =item key
1486    
1487  Key to erase.  Key to erase. This must be a real key; that is, it cannot have a subkey
1488    component.
1489    
1490  =back  =back
1491    
# Line 1365  Line 1493 
1493    
1494  sub EraseAttribute {  sub EraseAttribute {
1495      # Get the parameters.      # Get the parameters.
1496      my ($self, $entityName, $key) = @_;      my ($self, $key) = @_;
1497      # Determine the relevant entity types.      # Find the table containing the key.
1498      my @objects = ();      my $table = $self->_KeyTable($key);
1499      if (! $entityName) {      # Is it the default table?
1500          push @objects, $self->GetEntityTypes();      if ($table eq $self->{defaultRel}) {
1501      } else {          # Yes, so the key is mixed in with other keys.
1502          push @objects, $entityName;          # Delete everything connected to it.
1503      }          $self->Disconnect('HasValueFor', 'AttributeKey', $key);
1504      # Loop through the entity types.      } else {
1505      for my $entityType (@objects) {          # No. Drop and re-create the table.
1506          # Now check for this key in this entity.          $self->TruncateTable($table);
         my %secondaries = $self->GetSecondaryFields($entityType);  
         if (exists $secondaries{$key}) {  
             # We found it, so delete all the values of the key.  
             $self->DeleteValue($entityType, undef, $key);  
         }  
1507      }      }
1508        # Log the operation.
1509        $self->LogOperation("Erase Data", $key);
1510      # Return a 1, for backward compatibility.      # Return a 1, for backward compatibility.
1511      return 1;      return 1;
1512  }  }
1513    
1514  =head3 GetAttributeKeys  =head3 GetAttributeKeys
1515    
1516  C<< my @keyList = $attrDB->GetAttributeKeys($entityName); >>      my @keyList = $attrDB->GetAttributeKeys($groupName);
1517    
1518  Return a list of the attribute keys for a particular entity type.  Return a list of the attribute keys for a particular group.
1519    
1520  =over 4  =over 4
1521    
1522  =item entityName  =item groupName
1523    
1524  Name of the entity whose keys are desired.  Name of the group whose keys are desired.
1525    
1526  =item RETURN  =item RETURN
1527    
1528  Returns a list of the attribute keys for the specified entity.  Returns a list of the attribute keys for the specified group.
1529    
1530  =back  =back
1531    
# Line 1408  Line 1533 
1533    
1534  sub GetAttributeKeys {  sub GetAttributeKeys {
1535      # Get the parameters.      # Get the parameters.
1536      my ($self, $entityName) = @_;      my ($self, $groupName) = @_;
1537      # Get the entity's secondary fields.      # Get the attribute keys for the specified group.
1538      my %keyList = $self->GetSecondaryFields($entityName);      my @groups = $self->GetFlat(['IsInGroup'], "IsInGroup(to-link) = ?", [$groupName],
1539                                    'IsInGroup(from-link)');
1540      # Return the keys.      # Return the keys.
1541      return sort keys %keyList;      return sort @groups;
1542    }
1543    
1544    =head3 QueryAttributes
1545    
1546        my @attributeData = $ca->QueryAttributes($filter, $filterParms);
1547    
1548    Return the attribute data based on an SQL filter clause. In the filter clause,
1549    the name C<$object> should be used for the object ID, C<$key> should be used for
1550    the key name, C<$subkey> for the subkey value, and C<$value> for the value field.
1551    
1552    =over 4
1553    
1554    =item filter
1555    
1556    Filter clause in the standard ERDB format, except that the field names are C<$object> for
1557    the object ID field, C<$key> for the key name field, C<$subkey> for the subkey field,
1558    and C<$value> for the value field. This abstraction enables us to hide the details of
1559    the database construction from the user.
1560    
1561    =item filterParms
1562    
1563    Parameters for the filter clause.
1564    
1565    =item RETURN
1566    
1567    Returns a list of tuples. Each tuple consists of an object ID, a key (with optional subkey), and
1568    one or more attribute values.
1569    
1570    =back
1571    
1572    =cut
1573    
1574    # This hash is used to drive the substitution process.
1575    my %AttributeParms = (object => 'to-link',
1576                          key    => 'from-link',
1577                          subkey => 'subkey',
1578                          value  => 'value');
1579    
1580    sub QueryAttributes {
1581        # Get the parameters.
1582        my ($self, $filter, $filterParms) = @_;
1583        # Declare the return variable.
1584        my @retVal = ();
1585        # Make sure we have filter parameters.
1586        my $realParms = (defined($filterParms) ? $filterParms : []);
1587        # Loop through all the value tables.
1588        for my $table ($self->_GetAllTables()) {
1589            # Create the query for this table by converting the filter.
1590            my $realFilter = $filter;
1591            for my $name (keys %AttributeParms) {
1592                $realFilter =~ s/\$$name/$table($AttributeParms{$name})/g;
1593            }
1594            my $query = $self->Get([$table], $realFilter, $realParms);
1595            # Loop through the results, forming the output attribute tuples.
1596            while (my $result = $query->Fetch()) {
1597                # Get the four values from this query result row.
1598                my ($objectID, $key, $subkey, $value) = $result->Values(["$table($AttributeParms{object})",
1599                                                                        "$table($AttributeParms{key})",
1600                                                                        "$table($AttributeParms{subkey})",
1601                                                                        "$table($AttributeParms{value})"]);
1602                # Combine the key and the subkey.
1603                my $realKey = ($subkey ? $key . $self->{splitter} . $subkey : $key);
1604                # Split the value.
1605                my @values = split $self->{splitter}, $value;
1606                # Output the result.
1607                push @retVal, [$objectID, $realKey, @values];
1608            }
1609        }
1610        # Return the result.
1611        return @retVal;
1612    }
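
The placeholder substitution in QueryAttributes can be exercised without a database. This sketch reuses the `%AttributeParms` mapping shown above; `expand_filter` is a hypothetical helper name, and `HasValueFor` is used only as an example table name.

```perl
use strict;
use warnings;

# Same mapping as in the module: placeholder name => relationship field.
my %AttributeParms = (object => 'to-link',
                      key    => 'from-link',
                      subkey => 'subkey',
                      value  => 'value');

# Rewrite $object, $key, $subkey and $value into table-qualified field names.
sub expand_filter {
    my ($table, $filter) = @_;
    for my $name (keys %AttributeParms) {
        $filter =~ s/\$$name/$table($AttributeParms{$name})/g;
    }
    return $filter;
}

# Single quotes keep Perl from interpolating the placeholders.
print expand_filter('HasValueFor', '$object = ? AND $key = ? AND $value LIKE ?'), "\n";
# prints: HasValueFor(to-link) = ? AND HasValueFor(from-link) = ? AND HasValueFor(value) LIKE ?
```

Callers need to pass the filter in single quotes (or escape the dollar signs); otherwise Perl tries to interpolate `$object` and friends before the method ever sees them.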
1613    
1614    =head2 Key and ID Manipulation Methods
1615    
1616    =head3 ParseID
1617    
1618        my ($type, $id) = CustomAttributes::ParseID($idValue);
1619    
1620    Determine the type and object ID corresponding to an ID value from the attribute database.
1621    Most ID values consist of a type name and an ID, separated by a colon (e.g. C<Family:aclame|cluster10>);
1622    however, Genomes, Features, and Subsystems are not stored with a type name, so we need to
1623    deduce the type from the ID value structure.
1624    
1625    The theory here is that you can plug the ID and type directly into a Sprout database method, as
1626    follows:
1627    
1628        my ($type, $id) = CustomAttributes::ParseID($attrList[$num]->[0]);
1629        my $target = $sprout->GetEntity($type, $id);
1630    
1631    =over 4
1632    
1633    =item idValue
1634    
1635    ID value taken from the attribute database.
1636    
1637    =item RETURN
1638    
1639    Returns a two-element list. The first element is the type of object indicated by the ID value,
1640    and the second element is the actual object ID.
1641    
1642    =back
1643    
1644    =cut
1645    
1646    sub ParseID {
1647        # Get the parameters.
1648        my ($idValue) = @_;
1649        # Declare the return variables.
1650        my ($type, $id);
1651        # Parse the incoming ID. We first check for the presence of an entity name. Entity names
1652        # can only contain letters, which helps to ensure typed object IDs don't collide with
1653        # subsystem names (which are untyped).
1654        if ($idValue =~ /^([A-Za-z]+):(.+)/) {
1655            # Here we have a typed ID.
1656            ($type, $id) = ($1, $2);
1657            # Fix the case sensitivity on PDB IDs.
1658            if ($type eq 'PDB') { $id = lc $id; }
1659        } elsif ($idValue =~ /fig\|/) {
1660            # Here we have a feature ID.
1661            ($type, $id) = (Feature => $idValue);
1662        } elsif ($idValue =~ /\d+\.\d+/) {
1663            # Here we have a genome ID.
1664            ($type, $id) = (Genome => $idValue);
1665        } else {
1666            # The default is a subsystem ID.
1667            ($type, $id) = (Subsystem => $idValue);
1668        }
1669        # Return the results.
1670        return ($type, $id);
1671    }
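
The classification rules in ParseID are easy to check with a standalone copy. The sketch below mirrors the logic above under the hypothetical name `parse_id`; the sample IDs other than `Family:aclame|cluster10` are made up for illustration.

```perl
use strict;
use warnings;

# Standalone copy of the ParseID rules, for experimentation only.
sub parse_id {
    my ($idValue) = @_;
    my ($type, $id);
    if ($idValue =~ /^([A-Za-z]+):(.+)/) {
        # Typed ID: letters, a colon, then the real ID.
        ($type, $id) = ($1, $2);
        $id = lc $id if $type eq 'PDB';
    } elsif ($idValue =~ /fig\|/) {
        ($type, $id) = (Feature => $idValue);
    } elsif ($idValue =~ /\d+\.\d+/) {
        ($type, $id) = (Genome => $idValue);
    } else {
        ($type, $id) = (Subsystem => $idValue);
    }
    return ($type, $id);
}

for my $idValue ('Family:aclame|cluster10', 'PDB:1ABC', 'fig|83333.1.peg.4',
                 '83333.1', 'Histidine Degradation') {
    my ($type, $id) = parse_id($idValue);
    print "$idValue => $type / $id\n";
}
```

Because the feature and genome patterns are not anchored, an untyped subsystem name that happens to contain `fig|` or a number such as `1.2` would be classified as a feature or genome; the tests are applied in order, so the typed-ID check always wins.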
1672    
1673    =head3 FormID
1674    
1675        my $idValue = CustomAttributes::FormID($type, $id);
1676    
1677    Convert an object type and ID pair into an object ID string for the attribute system. Subsystems,
1678    genomes, and features are stored in the database without type information, but all other object IDs
1679    must be prefixed with the object type.
1680    
1681    =over 4
1682    
1683    =item type
1684    
1685    Relevant object type.
1686    
1687    =item id
1688    
1689    ID of the object in question.
1690    
1691    =item RETURN
1692    
1693    Returns a string that will be recognized as an object ID in the attribute database.
1694    
1695    =back
1696    
1697    =cut
1698    
1699    sub FormID {
1700        # Get the parameters.
1701        my ($type, $id) = @_;
1702        # Declare the return variable.
1703        my $retVal;
1704        # Compute the ID string from the type.
1705        if (grep { $type eq $_ } qw(Feature Genome Subsystem)) {
1706            $retVal = $id;
1707        } else {
1708            $retVal = "$type:$id";
1709        }
1710        # Return the result.
1711        return $retVal;
1712    }
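
As a quick illustration of the rule above, this sketch copies the FormID logic into a standalone function (hypothetically named `form_id`) and prints the stored form for a few sample type/ID pairs. The sample IDs are illustrative.

```perl
use strict;
use warnings;

# Standalone copy of the FormID rule: three types are stored without a prefix.
sub form_id {
    my ($type, $id) = @_;
    if (grep { $type eq $_ } qw(Feature Genome Subsystem)) {
        return $id;
    }
    return "$type:$id";
}

print form_id(Feature => 'fig|83333.1.peg.4'), "\n";    # prints: fig|83333.1.peg.4
print form_id(Genome  => '83333.1'), "\n";              # prints: 83333.1
print form_id(Family  => 'aclame|cluster10'), "\n";     # prints: Family:aclame|cluster10
```

The untyped forms rely on ParseID being able to recover the type from the shape of the ID, which is why only these three types are exempt from the prefix.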
1713    
1714    =head3 GetTargetObject
1715    
1716        my $object = CustomAttributes::GetTargetObject($erdb, $idValue);
1717    
1718    Return the database object corresponding to the specified attribute object ID. The
1719    object type associated with the ID value must correspond to an entity name in the
1720    specified database.
1721    
1722    =over 4
1723    
1724    =item erdb
1725    
1726    B<ERDB> object for accessing the target database.
1727    
1728    =item idValue
1729    
1730    ID value retrieved from the attribute database.
1731    
1732    =item RETURN
1733    
1734    Returns an B<ERDBObject> for the attribute value's target object.
1735    
1736    =back
1737    
1738    =cut
1739    
1740    sub GetTargetObject {
1741        # Get the parameters.
1742        my ($erdb, $idValue) = @_;
1743        # Declare the return variable.
1744        my $retVal;
1745        # Get the type and ID for the target object.
1746        my ($type, $id) = ParseID($idValue);
1747        # Plug them into the GetEntity method.
1748        $retVal = $erdb->GetEntity($type, $id);
1749        # Return the resulting object.
1750        return $retVal;
1751    }
1752    
1753    =head3 SplitKey
1754    
1755        my ($realKey, $subKey) = $ca->SplitKey($key);
1756    
1757    Split an external key (that is, one passed in by a caller) into the real key and the sub key.
1758    The real and sub keys are separated by a splitter value (usually C<::>). If there is no splitter,
1759    then the sub key is presumed to be an empty string.
1760    
1761    =over 4
1762    
1763    =item key
1764    
1765    Incoming key to be split.
1766    
1767    =item RETURN
1768    
1769    Returns a two-element list, the first element of which is the real key and the second element of
1770    which is the sub key.
1771    
1772    =back
1773    
1774    =cut
1775    
1776    sub SplitKey {
1777        # Get the parameters.
1778        my ($self, $key) = @_;
1779        # Do the split.
1780        my ($realKey, $subKey) = split($self->{splitter}, $key, 2);
1781        # Ensure the subkey has a value.
1782        if (! defined $subKey) {
1783            $subKey = '';
1784        }
1785        # Return the results.
1786        return ($realKey, $subKey);
1787    }
1788    
1789    
1790    =head3 JoinKey
1791    
1792        my $key = $ca->JoinKey($realKey, $subKey);
1793    
1794    Join a real key and a subkey together to make an external key. The external key is the attribute key
1795    used by the caller. The real key and the subkey are how the keys are represented in the database. The
1796    real key is the key to the B<AttributeKey> entity. The subkey is an attribute of the B<HasValueFor>
1797    relationship.
1798    
1799    =over 4
1800    
1801    =item realKey
1802    
1803    The real attribute key.
1804    
1805    =item subKey
1806    
1807    The subordinate portion of the attribute key.
1808    
1809    =item RETURN
1810    
1811    Returns a single string representing both keys.
1812    
1813    =back
1814    
1815    =cut
1816    
1817    sub JoinKey {
1818        # Get the parameters.
1819        my ($self, $realKey, $subKey) = @_;
1820        # Declare the return variable.
1821        my $retVal;
1822        # Check for a subkey.
1823        if ($subKey eq '') {
1824            # No subkey, so the real key is the key.
1825            $retVal = $realKey;
1826        } else {
1827            # Subkey found, so the two pieces must be joined by a splitter.
1828            $retVal = "$realKey$self->{splitter}$subKey";
1829        }
1830        # Return the result.
1831        return $retVal;
1832    }
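
SplitKey and JoinKey are intended to be inverses. The following sketch checks that with plain functions instead of methods, assuming the splitter is `::`; `split_key` and `join_key` are hypothetical names, and the splitter is quoted with `\Q...\E` here so that it is matched literally.

```perl
use strict;
use warnings;

my $splitter = '::';    # assumed value of $self->{splitter}

# External key -> (real key, subkey); the subkey defaults to ''.
sub split_key {
    my ($key) = @_;
    my ($realKey, $subKey) = split /\Q$splitter\E/, $key, 2;
    $subKey = '' if ! defined $subKey;
    return ($realKey, $subKey);
}

# (real key, subkey) -> external key.
sub join_key {
    my ($realKey, $subKey) = @_;
    return ($subKey eq '' ? $realKey : "$realKey$splitter$subKey");
}

for my $key ('structure', 'evidence::code', 'a::b::c') {
    my ($realKey, $subKey) = split_key($key);
    print "$key => [$realKey] [$subKey] => ", join_key($realKey, $subKey), "\n";
}
```

The limit of 2 passed to `split` means only the first splitter separates the real key from the subkey, so a subkey may itself contain the splitter (`a::b::c` yields `a` and `b::c`).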
1833    
1834    
1835    =head3 AttributeTable
1836    
1837        my $tableHtml = CustomAttributes::AttributeTable($cgi, @attrList);
1838    
1839    Format the attribute data into an HTML table.
1840    
1841    =over 4
1842    
1843    =item cgi
1844    
1845    CGI query object used to generate the HTML.
1846    
1847    =item attrList
1848    
1849    List of attribute results, in the format returned by the L</GetAttributes> or
1850    L</QueryAttributes> methods.
1851    
1852    =item RETURN
1853    
1854    Returns an HTML table displaying the attribute keys and values.
1855    
1856    =back
1857    
1858    =cut
1859    
1860    sub AttributeTable {
1861        # Get the parameters.
1862        my ($cgi, @attrList) = @_;
1863        # Accumulate the table rows.
1864        my @html = ();
1865        for my $attrData (@attrList) {
1866            # Format the object ID and key.
1867            my @columns = map { CGI::escapeHTML($_) } @{$attrData}[0,1];
1868            # Now we format the values. These remain unchanged unless one of them is a URL.
1869            my $lastValue = scalar(@{$attrData}) - 1;
1870            push @columns, map { $_ =~ /^http:/ ? $cgi->a({ href => $_ }, $_) : $_ } @{$attrData}[2 .. $lastValue];
1871            # Assemble the values into a table row.
1872            push @html, $cgi->Tr($cgi->td(\@columns));
1873        }
1874        # Format the table in the return variable.
1875        my $retVal = $cgi->table({ border => 2 }, $cgi->Tr($cgi->th(['Object', 'Key', 'Values'])), @html);
1876        # Return it.
1877        return $retVal;
1878    }
1879    
1880    
1881    =head2 Internal Utility Methods
1882    
1883    =head3 _KeyTable
1884    
1885        my $tableName = $ca->_KeyTable($keyName);
1886    
1887    Return the name of the table that contains the attribute values for the
1888    specified key.
1889    
1890    Most attribute values are stored in the default table (usually C<HasValueFor>).
1891    Some, however, are placed in private tables by themselves for performance reasons.
1892    
1893    =over 4
1894    
1895    =item keyName (optional)
1896    
1897    Name of the attribute key whose table name is desired. If not specified, the
1898    entire key/table hash is returned.
1899    
1900    =item RETURN
1901    
1902    Returns the name of the table containing the specified attribute key's values,
1903    or a reference to a hash that maps key names to table names.
1904    
1905    =back
1906    
1907    =cut
1908    
1909    sub _KeyTable {
1910        # Get the parameters.
1911        my ($self, $keyName) = @_;
1912        # Declare the return variable.
1913        my $retVal;
1914        # Ensure the key table hash is present.
1915        if (! exists $self->{keyTables}) {
1916            $self->{keyTables} = { map { $_->[0] => $_->[1] } $self->GetAll(['AttributeKey'],
1917                                                    "AttributeKey(relationship-name) <> ?",
1918                                                    [$self->{defaultRel}],
1919                                                    ['AttributeKey(id)', 'AttributeKey(relationship-name)']) };
1920        }
1921        # Get the key hash.
1922        my $keyHash = $self->{keyTables};
1923        # Does the user want a specific table or the whole thing?
1924        if ($keyName) {
1925            # Here we want a specific table. Is this key in the hash?
1926            if (exists $keyHash->{$keyName}) {
1927                # It's there, so return the specified table.
1928                $retVal = $keyHash->{$keyName};
1929            } else {
1930                # No, return the default table name.
1931                $retVal = $self->{defaultRel};
1932            }
1933        } else {
1934            # Here we want the whole hash.
1935            $retVal = $keyHash;
1936        }
1937        # Return the result.
1938        return $retVal;
1939    }
1940    
1941    
1942    =head3 _QueryResults
1943    
1944        my @attributeList = $attrDB->_QueryResults($query, $table, @values);
1945    
1946    Match the results of a query against value criteria and return
1947    the results. This is an internal method that splits the values coming back
1948    and matches the sections against the specified section patterns. It serves
1949    as the back end to L</GetAttributes> and L</FindAttributes>.
1950    
1951    =over 4
1952    
1953    =item query
1954    
1955    A query object that will return the desired records.
1956    
1957    =item table
1958    
1959    Name of the value table for the query.
1960    
1961    =item values
1962    
1963    List of the desired attribute values, section by section. If C<undef>
1964    or an empty string is specified, all values in that section will match. A
1965    generic match can be requested by placing a percent sign (C<%>) at the end.
1966    In that case, every value that begins with the text preceding the percent
1967    sign will match. You may also specify a regular expression enclosed
1968    in slashes. All values that match the regular expression will be returned. For
1969    performance reasons, only values have this extra capability.
1970    
1971    =item RETURN
1972    
1973    Returns a list of tuples. The first element in the tuple is an object ID, the
1974    second is an attribute key, and the remaining elements are the sections of
1975    the attribute value. All of the tuples will match the criteria set forth in
1976    the parameter list.
1977    
1978    =back
1979    
1980    =cut
1981    
1982    sub _QueryResults {
1983        # Get the parameters.
1984        my ($self, $query, $table, @values) = @_;
1985        # Declare the return value.
1986        my @retVal = ();
1987        # We use this hash to check for duplicates.
1988        my %dupHash = ();
1989        # Get the number of value sections we have to match.
1990        my $sectionCount = scalar(@values);
1991        # Loop through the assignments found.
1992        while (my $row = $query->Fetch()) {
1993            # Get the current row's data.
1994            my ($id, $realKey, $subKey, $valueString) = $row->Values(["$table(to-link)",
1995                                                                      "$table(from-link)",
1996                                                                      "$table(subkey)",
1997                                                                      "$table(value)"
1998                                                                    ]);
1999            # Form the key from the real key and the sub key.
2000            my $key = $self->JoinKey($realKey, $subKey);
2001            # Break the value into sections.
2002            my @sections = split($self->{splitter}, $valueString);
2003            # Match each section against the incoming values. We'll assume we're
2004            # okay unless we learn otherwise.
2005            my $matching = 1;
2006            for (my $i = 0; $i < $sectionCount && $matching; $i++) {
2007                # We need to check to see if this section is generic.
2008                my $value = $values[$i];
2009                Trace("Current value pattern is \"$value\".") if T(4);
2010                if ($value =~ m#^/(.+)/[a-z]*$#) {
2011                    Trace("Regular expression detected.") if T(4);
2012                    # Here we have a regular expression match.
2013                    my $section = $sections[$i];
2014                    $matching = eval("\$section =~ $value");
2015                } else {
2016                    # Here we have a normal match.
2017                    Trace("SQL match used.") if T(4);
2018                    $matching = _CheckSQLPattern($values[$i], $sections[$i]);
2019                }
2020            }
2021            # If we match, consider writing this row to the return list.
2022            if ($matching) {
2023                # Check for a duplicate.
2024                my $wholeThing = join($self->{splitter}, $id, $key, $valueString);
2025                if (! $dupHash{$wholeThing}) {
2026                    # It's okay, we're not a duplicate. Ensure we don't duplicate this result.
2027                    $dupHash{$wholeThing} = 1;
2028                    push @retVal, [$id, $key, @sections];
2029                }
2030            }
2031        }
2032        # Return the rows found.
2033        return @retVal;
2034    }
2035    
2036    
2037    =head3 _LoadAttributeTable
2038    
2039        $attr->_LoadAttributeTable($tableName, $fileName, $stats, $mode);
2040    
2041    Load a file's data into an attribute table. This is an internal method
2042    provided for the convenience of L</LoadAttributesFrom>. It loads the
2043    specified file into the specified table and updates the statistics
2044    object.
2045    
2046    =over 4
2047    
2048    =item tableName
2049    
2050    Name of the table being loaded. This is usually C<HasValueFor>, but may
2051    be a different table for some specific attribute keys.
2052    
2053    =item fileName
2054    
2055    Name of the file containing a chunk of attribute data to load.
2056    
2057    =item stats
2058    
2059    Statistics object into which counts and times should be placed.
2060    
2061    =item mode
2062    
2063    Load mode for the file, usually C<low_priority>, C<concurrent>, or
2064    an empty string. The mode is used by some applications to control access
2065    to the table while it's being loaded. The default (empty string) is to lock the
2066    table until all the data's in place.
2067    
2068    =back
2069    
2070    =cut
2071    
2072    sub _LoadAttributeTable {
2073        # Get the parameters.
2074        my ($self, $tableName, $fileName, $stats, $mode) = @_;
2075        # Load the table from the file. Note that we don't do an analyze.
2076        # The analyze is done only after everything is complete.
2077        my $startTime = time();
2078        Trace("Loading attributes from $fileName: " . (-s $fileName) .
2079              " characters.") if T(3);
2080        my $loadStats = $self->LoadTable($fileName, $tableName,
2081                                         mode => $mode, partial => 1);
2082        # Record the load time.
2083        $stats->Add(insertTime => time() - $startTime);
2084        # Roll up the other statistics.
2085        $stats->Accumulate($loadStats);
2086    }
2087    
2088    
2089    =head3 _GetAllTables
2090    
2091        my @tables = $ca->_GetAllTables();
2092    
2093    Return a list of the names of all the tables used to store attribute
2094    values.
2095    
2096    =cut
2097    
2098    sub _GetAllTables {
2099        # Get the parameters.
2100        my ($self) = @_;
2101        # Start with the default table.
2102        my @retVal = $self->{defaultRel};
2103        # Add the tables named in the key hash. These tables are automatically
2104        # NOT the default, and each can only occur once, because alternate tables
2105        # are allocated on a per-key basis.
2106        my $keyHash = $self->_KeyTable();
2107        push @retVal, values %$keyHash;
2108        # Return the result.
2109        return @retVal;
2110    }
2111    
2112    
2113    =head3 _SplitKeyPattern
2114    
2115        my ($realKey, $subKey) = $ca->_SplitKeyPattern($keyChoice);
2116    
2117    Split a key pattern into the main part (the I<real key>) and a sub-part
2118    (the I<sub key>). This method differs from L</SplitKey> in that it treats
2119    the key as an SQL pattern instead of a raw string. Also, if there is no
2120    incoming sub-part, the sub-key will be undefined instead of an empty
2121    string.
2122    
2123    =over 4
2124    
2125    =item keyChoice
2126    
2127    SQL key pattern to be examined. This can either be a literal, an SQL pattern,
2128    a literal with an internal splitter code (usually C<::>) or an SQL pattern with
2129    an internal splitter. Note that the only SQL pattern we support is a percent
2130    sign (C<%>) at the end. This is the way we've declared things in the documentation,
2131    so users who try anything else will have problems.
2132    
2133    =item RETURN
2134    
2135    Returns a two-element list. The first element is the SQL pattern for the
2136    real key and the second is the SQL pattern for the sub-key. If the value
2137    for either one does not matter (e.g., the user wants a real key value of
2138    C<iedb> and doesn't care about the sub-key value), it will be undefined.
2139    
2140    =back
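The three cases described above can be sketched in a small standalone script. This is only an illustration: the function name C<split_key_pattern> is hypothetical, the splitter is assumed to be C<::>, the sub-key C<epitope> is a made-up sample, and the sketch quotes the splitter with C<\Q...\E>, which the method itself does not do.

```perl
use strict;
use warnings;

# Hypothetical standalone mirror of the splitting rules described above.
# The splitter is passed in explicitly; "::" is assumed in the demo below.
sub split_key_pattern {
    my ($keyChoice, $splitter) = @_;
    my ($realKey, $subKey);
    if ($keyChoice =~ /^(.*?)\Q$splitter\E(.*)/s) {
        # Splitter found: both halves are treated as known patterns.
        ($realKey, $subKey) = ($1, $2);
    } elsif ($keyChoice =~ /%$/) {
        # Generic pattern for the whole key: the sub-key stays undefined (wild).
        $realKey = $keyChoice;
    } else {
        # Literal key: the sub-key is required to be blank.
        ($realKey, $subKey) = ($keyChoice, '');
    }
    return ($realKey, $subKey);
}

# Show how each kind of pattern is split.
for my $pattern ('iedb::epitope%', 'iedb%', 'iedb') {
    my ($realKey, $subKey) = split_key_pattern($pattern, '::');
    printf "%-16s real=%s sub=%s\n", $pattern, $realKey,
        defined $subKey ? "'$subKey'" : 'undef';
}
```

Only a key that ends in a percent sign and contains no splitter leaves the sub-key undefined; a plain literal key yields an empty sub-key instead.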
2141    
2142    =cut
2143    
2144    sub _SplitKeyPattern {
2145        # Get the parameters.
2146        my ($self, $keyChoice) = @_;
2147        # Declare the return variables.
2148        my ($realKey, $subKey);
2149        # Look for a splitter in the input.
2150        if ($keyChoice =~ /^(.*?)$self->{splitter}(.*)/) {
2151            # We found one. This means we can treat both sides of the
2152            # splitter as known patterns.
2153            ($realKey, $subKey) = ($1, $2);
2154        } elsif ($keyChoice =~ /%$/) {
2155            # Here we have a generic pattern for the whole key. The pattern
2156            # is treated as the correct pattern for the real key, but the
2157            # sub-key is considered to be wild.
2158            $realKey = $keyChoice;
2159        } else {
2160            # Here we have a literal pattern for the whole key. The pattern
2161            # is treated as the correct pattern for the real key, and the
2162            # sub-key is required to be blank.
2163            $realKey = $keyChoice;
2164            $subKey = '';
2165        }
2166        # Return the results.
2167        return ($realKey, $subKey);
2168    }
2169    
2170    
2171    =head3 _WherePart
2172    
2173        my ($sqlClause, $escapedValue) = _WherePart($tableName, $fieldName, $sqlPattern);
2174    
2175    Return the SQL clause and value for checking a field against the
2176    specified SQL pattern value. If the pattern is generic (ends in a C<%>),
2177    then a C<LIKE> expression is returned. Otherwise, an equality expression
2178    is returned. We take in information describing the field being checked,
2179    and the pattern we're checking against it. The output is a WHERE clause
2180    fragment for the comparison and a value to be used as a bound parameter
2181    value for the clause.
2182    
2183    =over 4
2184    
2185    =item tableName
2186    
2187    Name of the table containing the field we want checked by the clause.
2188    
2189    =item fieldName
2190    
2191    Name of the field to check in that table.
2192    
2193    =item sqlPattern
2194    
2195    Pattern to be compared against the field. If the last character is a percent sign
2196    (C<%>), it will be treated as a generic SQL pattern; otherwise, it will be treated
2197    as a literal.
2198    
2199    =item RETURN
2200    
2201    Returns a two-element list. The first element will be an SQL comparison expression
2202    and the second will be the value to be used as a bound parameter for the expression.
2204    
2205    =back
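As a rough illustration of the behavior described above, the following standalone sketch builds the clause and bound value for a generic pattern. The function name C<where_part> is hypothetical and the table and field names are sample inputs; it is a sketch of the documented logic, not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical standalone mirror of the clause-building logic described above.
sub where_part {
    my ($tableName, $fieldName, $sqlPattern) = @_;
    # Default to an equality test with the pattern passed through unchanged.
    my ($clause, $value) = ("$tableName($fieldName) = ?", $sqlPattern);
    if ($sqlPattern =~ /^(.+)%$/s) {
        # Generic pattern: escape literal % and _ in the prefix, then
        # put the trailing wildcard back.
        ($value = $1) =~ s/([%_])/\\$1/g;
        $value .= '%';
        $clause = "$tableName($fieldName) LIKE ?";
    }
    return ($clause, $value);
}

# The underscore in the prefix is escaped so LIKE treats it literally.
my ($clause, $value) = where_part('HasValueFor', 'from-link', 'my_key%');
print "$clause [$value]\n";
```

Escaping the prefix matters because an unescaped underscore in a C<LIKE> pattern matches any single character, which would widen the match beyond what the caller asked for.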
2206    
2207    =cut
2208    
2209    sub _WherePart {
2210        # Get the parameters.
2211        my ($tableName, $fieldName, $sqlPattern) = @_;
2212        # Declare the return variables.
2213        my ($sqlClause, $escapedValue);
2214        # Copy the pattern into the return area.
2215        $escapedValue = $sqlPattern;
2216        # Check the pattern. Is it generic (ending in a percent sign)?
2217        if ($sqlPattern =~ /(.+)%$/) {
2218            # Yes, it is. We need a LIKE clause and we must escape the underscores
2219            # and percents in the pattern (except for the last one, of course).
2220            $escapedValue = $1;
2221            $escapedValue =~ s/(%|_)/\\$1/g;
2222            $escapedValue .= "%";
2223            $sqlClause = "$tableName($fieldName) LIKE ?";
2224        } else {
2225            # No, it isn't. We use an equality clause.
2226            $sqlClause = "$tableName($fieldName) = ?";
2227        }
2228        # Return the results.
2229        return ($sqlClause, $escapedValue);
2230    }
2231    
2232    
2233    =head3 _CheckSQLPattern
2234    
2235        my $flag = _CheckSQLPattern($pattern, $value);
2236    
2237    Return TRUE if the specified SQL pattern matches the specified value,
2238    else FALSE. The pattern is not a true full-blown SQL LIKE pattern: the
2239    only wild-carding allowed is a percent sign (C<%>) at the end.
2240    
2241    =over 4
2242    
2243    =item pattern
2244    
2245    SQL pattern to match against a value.
2246    
2247    =item value
2248    
2249    Value to match against an SQL pattern.
2250    
2251    =item RETURN
2252    
2253    Returns TRUE if the pattern matches the value, else FALSE.
2254    
2255    =back
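The matching rule can be illustrated with a short standalone sketch. The function name C<check_sql_pattern> and the sample values are hypothetical; the code simply mirrors the rule described above.

```perl
use strict;
use warnings;

# Hypothetical standalone mirror of the rule described above: a trailing
# percent sign means "prefix match"; anything else must match exactly.
sub check_sql_pattern {
    my ($pattern, $value) = @_;
    if ($pattern =~ /^(.*)%$/s) {
        # Compare the text before the percent sign with the start of the value.
        my $prefix = $1;
        return substr($value, 0, length $prefix) eq $prefix ? 1 : 0;
    }
    return $pattern eq $value ? 1 : 0;
}

print check_sql_pattern('PMID%', 'PMID:12345'), "\n";   # prefix comparison
print check_sql_pattern('PMID',  'PMID:12345'), "\n";   # exact comparison
print check_sql_pattern('a_b%',  'axb'),        "\n";   # underscore is literal here
```

Unlike a true SQL C<LIKE>, an underscore in the pattern is compared literally, so the in-memory check and the SQL clause agree only when the SQL side escapes its wildcards.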
2256    
2257    =cut
2258    
2259    sub _CheckSQLPattern {
2260        # Get the parameters.
2261        my ($pattern, $value) = @_;
2262        # Declare the return variable.
2263        my $retVal;
2264        # Check for a generic pattern.
2265        if ($pattern =~ /(.*)%$/) {
2266            # Here we have one. Do a substring match.
2267            $retVal = (substr($value, 0, length $1) eq $1);
2268        } else {
2269            # Here it's an exact match.
2270            $retVal = ($pattern eq $value);
2271        }
2272        # Return the result.
2273        return $retVal;
2274    }
2275    
2276    1;
