Parent Directory
|
Revision Log
Revision 1.84 - (view) (download) (as text)
1 : | parrello | 1.1 | package ERDB; |
2 : | |||
3 : | parrello | 1.10 | use strict; |
4 : | use Tracer; | ||
5 : | olson | 1.14 | use DBrtns; |
6 : | parrello | 1.10 | use Data::Dumper; |
7 : | use XML::Simple; | ||
8 : | use DBQuery; | ||
9 : | use DBObject; | ||
10 : | use Stats; | ||
11 : | use Time::HiRes qw(gettimeofday); | ||
12 : | parrello | 1.42 | use Digest::MD5 qw(md5_base64); |
13 : | parrello | 1.19 | use FIG; |
14 : | parrello | 1.73 | use CGI; |
15 : | parrello | 1.1 | |
16 : | =head1 Entity-Relationship Database Package | ||
17 : | |||
18 : | =head2 Introduction | ||
19 : | |||
20 : | The Entity-Relationship Database Package allows the client to create an easily-configurable | ||
21 : | database of Entities connected by Relationships. Each entity is represented by one or more | ||
22 : | relations in an underlying SQL database. Each relationship is represented by a single | ||
23 : | relation that connects two entities. | ||
24 : | |||
25 : | Although this package is designed for general use, all examples are derived from the | ||
26 : | Sprout database, which is the first database implemented using this package. | ||
27 : | |||
28 : | Each entity has at least one relation, the I<primary relation>, that has the same name as | ||
29 : | the entity. The primary relation contains a field named C<id> that contains the unique | ||
30 : | identifier of each entity instance. An entity may have additional relations that contain | ||
31 : | fields which are optional or can occur more than once. For example, the B<FEATURE> entity | ||
32 : | has a B<feature-type> attribute that occurs exactly once for each feature. This attribute | ||
33 : | is implemented by a C<feature_type> column in the primary relation C<Feature>. In addition, | ||
34 : | however, a feature may have zero or more aliases. These are implemented using a C<FeatureAlias> | ||
35 : | relation that contains two fields-- the feature ID (C<id>) and the alias name (C<alias>). | ||
36 : | The B<FEATURE> entity also contains an optional virulence number. This is implemented | ||
37 : | as a separate relation C<FeatureVirulence> which contains an ID (C<id>) and a virulence number | ||
38 : | parrello | 1.8 | (C<virulence>). If the virulence of a feature I<ABC> is known to be 6, there will be one row in |
39 : | the C<FeatureVirulence> relation possessing the value I<ABC> as its ID and 6 as its virulence | ||
40 : | number. If the virulence of I<ABC> is not known, there will not be any rows for it in | ||
41 : | C<FeatureVirulence>. | ||
42 : | parrello | 1.1 | |
43 : | Entities are connected by binary relationships implemented using single relations possessing the | ||
44 : | same name as the relationship itself and that has an I<arity> of 1-to-1 (C<11>), 1-to-many (C<1M>), | ||
45 : | or many-to-many (C<MM>). Each relationship's relation contains a C<from-link> field that contains the | ||
46 : | ID of the source entity and a C<to-link> field that contains the ID of the target entity. The name | ||
47 : | of the relationship is generally a verb phrase with the source entity as the subject and the | ||
48 : | target entity as the object. So, for example, the B<ComesFrom> relationship connects the B<GENOME> | ||
49 : | and B<SOURCE> entities, and indicates that a particular source organization participated in the | ||
50 : | mapping of the genome. A source organization frequently participates in the mapping | ||
51 : | of many genomes, and many source organizations can cooperate in the mapping of a single genome, so | ||
52 : | this relationship has an arity of many-to-many (C<MM>). The relation that implements the B<ComesFrom> | ||
53 : | relationship is called C<ComesFrom> and contains two fields-- C<from-link>, which contains a genome ID, | ||
54 : | and C<to-link>, which contains a source ID. | ||
55 : | |||
56 : | A relationship may itself have attributes. These attributes, known as I<intersection data attributes>, | ||
57 : | are implemented as additional fields in the relationship's relation. So, for example, the | ||
58 : | B<IsMadeUpOf> relationship connects the B<Contig> entity to the B<Sequence> entity, and is used | ||
59 : | to determine which sequences make up a contig. The relationship has as an attribute the | ||
60 : | B<start-position>, which indicates where in the contig that the sequence begins. This attribute | ||
61 : | is implemented as the C<start_position> field in the C<IsMadeUpOf> relation. | ||
62 : | |||
63 : | parrello | 1.73 | The database itself is described by an XML file. In addition to all the data required to define |
64 : | the entities, relationships, and attributes, the schema provides space for notes describing | ||
65 : | the data and what it means. These notes are used by L</ShowMetaData> to generate documentation | ||
66 : | for the database. | ||
67 : | |||
68 : | Special support is provided for text searching. An entity field can be marked as <em>searchable</em>, | ||
69 : | in which case it will be used to generate a text search index in which the user searches for words | ||
70 : | in the field instead of a particular field value. | ||
71 : | parrello | 1.1 | |
72 : | Finally, every entity and relationship object has a flag indicating if it is new or old. The object | ||
73 : | is considered I<old> if it was loaded by the L</LoadTables> method. It is considered I<new> if it | ||
74 : | was inserted by the L</InsertObject> method. | ||
75 : | |||
76 : | parrello | 1.8 | =head2 XML Database Description |
77 : | |||
78 : | =head3 Data Types | ||
79 : | |||
80 : | The ERDB system supports the following data types. Note that there are numerous string | ||
81 : | types depending on the maximum length. Some database packages limit the total number of | ||
82 : | characters you have in an index key; to insure the database works in all environments, | ||
83 : | the type of string should be the shortest one possible that supports all the known values. | ||
84 : | |||
85 : | =over 4 | ||
86 : | |||
87 : | =item char | ||
88 : | |||
89 : | single ASCII character | ||
90 : | |||
91 : | =item int | ||
92 : | |||
93 : | 32-bit signed integer | ||
94 : | |||
95 : | parrello | 1.60 | =item counter |
96 : | |||
97 : | 32-bit unsigned integer | ||
98 : | |||
99 : | parrello | 1.8 | =item date |
100 : | |||
101 : | 64-bit unsigned integer, representing a PERL date/time value | ||
102 : | |||
103 : | =item text | ||
104 : | |||
105 : | long string; Text fields cannot be used in indexes or sorting and do not support the | ||
106 : | normal syntax of filter clauses, but can be up to a billion character in length | ||
107 : | |||
108 : | =item float | ||
109 : | |||
110 : | double-precision floating-point number | ||
111 : | |||
112 : | =item boolean | ||
113 : | |||
114 : | single-bit numeric value; The value is stored as a 16-bit signed integer (for | ||
115 : | compatability with certain database packages), but the only values supported are | ||
116 : | 0 and 1. | ||
117 : | |||
118 : | parrello | 1.44 | =item id-string |
119 : | |||
120 : | variable-length string, maximum 25 characters | ||
121 : | |||
122 : | parrello | 1.8 | =item key-string |
123 : | |||
124 : | variable-length string, maximum 40 characters | ||
125 : | |||
126 : | =item name-string | ||
127 : | |||
128 : | variable-length string, maximum 80 characters | ||
129 : | |||
130 : | =item medium-string | ||
131 : | |||
132 : | variable-length string, maximum 160 characters | ||
133 : | |||
134 : | =item string | ||
135 : | |||
136 : | variable-length string, maximum 255 characters | ||
137 : | |||
138 : | parrello | 1.42 | =item hash-string |
139 : | |||
140 : | variable-length string, maximum 22 characters | ||
141 : | |||
142 : | parrello | 1.8 | =back |
143 : | |||
144 : | parrello | 1.42 | The hash-string data type has a special meaning. The actual key passed into the loader will |
145 : | be a string, but it will be digested into a 22-character MD5 code to save space. Although the | ||
146 : | MD5 algorithm is not perfect, it is extremely unlikely two strings will have the same | ||
147 : | digest. Therefore, it is presumed the keys will be unique. When the database is actually | ||
148 : | in use, the hashed keys will be presented rather than the original values. For this reason, | ||
149 : | they should not be used for entities where the key is meaningful. | ||
150 : | |||
151 : | parrello | 1.8 | =head3 Global Tags |
152 : | |||
153 : | The entire database definition must be inside a B<Database> tag. The display name of | ||
154 : | the database is given by the text associated with the B<Title> tag. The display name | ||
155 : | is only used in the automated documentation. It has no other effect. The entities and | ||
156 : | relationships are listed inside the B<Entities> and B<Relationships> tags, | ||
157 : | respectively. None of these tags have attributes. | ||
158 : | |||
159 : | parrello | 1.10 | <Database> |
160 : | <Title>... display title here...</Title> | ||
161 : | <Entities> | ||
162 : | ... entity definitions here ... | ||
163 : | </Entities> | ||
164 : | <Relationships> | ||
165 : | ... relationship definitions here... | ||
166 : | </Relationships> | ||
167 : | </Database> | ||
168 : | parrello | 1.8 | |
169 : | Entities, relationships, indexes, and fields all allow a text tag called B<Notes>. | ||
170 : | The text inside the B<Notes> tag contains comments that will appear when the database | ||
171 : | documentation is generated. Within a B<Notes> tag, you may use C<[i]> and C<[/i]> for | ||
172 : | italics, C<[b]> and C<[/b]> for bold, and C<[p]> for a new paragraph. | ||
173 : | |||
174 : | =head3 Fields | ||
175 : | |||
176 : | Both entities and relationships have fields described by B<Field> tags. A B<Field> | ||
177 : | tag can have B<Notes> associated with it. The complete set of B<Field> tags for an | ||
178 : | object mus be inside B<Fields> tags. | ||
179 : | |||
180 : | parrello | 1.10 | <Entity ... > |
181 : | <Fields> | ||
182 : | ... Field tags ... | ||
183 : | </Fields> | ||
184 : | </Entity> | ||
185 : | parrello | 1.8 | |
186 : | The attributes for the B<Field> tag are as follows. | ||
187 : | |||
188 : | =over 4 | ||
189 : | |||
190 : | =item name | ||
191 : | |||
192 : | Name of the field. The field name should contain only letters, digits, and hyphens (C<->), | ||
193 : | and the first character should be a letter. Most underlying databases are case-insensitive | ||
194 : | parrello | 1.70 | with the respect to field names, so a best practice is to use lower-case letters only. Finally, |
195 : | the name C<search-relevance> has special meaning for full-text searches and should not be | ||
196 : | used as a field name. | ||
197 : | parrello | 1.8 | |
198 : | =item type | ||
199 : | |||
200 : | Data type of the field. The legal data types are given above. | ||
201 : | |||
202 : | =item relation | ||
203 : | |||
204 : | Name of the relation containing the field. This should only be specified for entity | ||
205 : | fields. The ERDB system does not support optional fields or multi-occurring fields | ||
206 : | in the primary relation of an entity. Instead, they are put into secondary relations. | ||
207 : | So, for example, in the C<Genome> entity, the C<group-name> field indicates a special | ||
208 : | grouping used to select a subset of the genomes. A given genome may not be in any | ||
209 : | groups or may be in multiple groups. Therefore, C<group-name> specifies a relation | ||
210 : | value. The relation name specified must be a valid table name. By convention, it is | ||
211 : | usually the entity name followed by a qualifying word (e.g. C<GenomeGroup>). In an | ||
212 : | entity, the fields without a relation attribute are said to belong to the | ||
213 : | I<primary relation>. This relation has the same name as the entity itself. | ||
214 : | |||
215 : | parrello | 1.70 | =item searchable |
216 : | |||
217 : | If specified, then the field is a candidate for full-text searching. A single full-text | ||
218 : | index will be created for each relation with at least one searchable field in it. | ||
219 : | For best results, this option should only be used for string or text fields. | ||
220 : | |||
221 : | parrello | 1.73 | =item special |
222 : | |||
223 : | This attribute allows the subclass to assign special meaning for certain fields. | ||
224 : | The interpretation is up to the subclass itself. Currently, only entity fields | ||
225 : | can have this attribute. | ||
226 : | |||
227 : | parrello | 1.8 | =back |
228 : | |||
229 : | =head3 Indexes | ||
230 : | |||
231 : | An entity can have multiple alternate indexes associated with it. The fields must | ||
232 : | parrello | 1.74 | all be from the same relation. The alternate indexes assist in ordering results |
233 : | parrello | 1.8 | from a query. A relationship can have up to two indexes-- a I<to-index> and a |
234 : | I<from-index>. These order the results when crossing the relationship. For | ||
235 : | example, in the relationship C<HasContig> from C<Genome> to C<Contig>, the | ||
236 : | from-index would order the contigs of a ganome, and the to-index would order | ||
237 : | the genomes of a contig. A relationship's index must specify only fields in | ||
238 : | the relationship. | ||
239 : | |||
240 : | The indexes for an entity must be listed inside the B<Indexes> tag. The from-index | ||
241 : | of a relationship is specified using the B<FromIndex> tag; the to-index is specified | ||
242 : | using the B<ToIndex> tag. | ||
243 : | |||
244 : | Each index can contain a B<Notes> tag. In addition, it will have an B<IndexFields> | ||
245 : | tag containing the B<IndexField> tags. These specify, in order, the fields used in | ||
246 : | the index. The attributes of an B<IndexField> tag are as follows. | ||
247 : | |||
248 : | =over 4 | ||
249 : | |||
250 : | =item name | ||
251 : | |||
252 : | Name of the field. | ||
253 : | |||
254 : | =item order | ||
255 : | |||
256 : | Sort order of the field-- C<ascending> or C<descending>. | ||
257 : | |||
258 : | =back | ||
259 : | |||
260 : | parrello | 1.78 | The B<FromIndex>, and B<ToIndex> tags have no attributes. The B<Index> tag can |
261 : | have a B<Unique> attribute. If specified, the index will be generated as a unique | ||
262 : | index. | ||
263 : | parrello | 1.8 | |
264 : | =head3 Object and Field Names | ||
265 : | |||
266 : | By convention entity and relationship names use capital casing (e.g. C<Genome> or | ||
267 : | C<HasRegionsIn>. Most underlying databases, however, are aggressively case-insensitive | ||
268 : | with respect to relation names, converting them internally to all-upper case or | ||
269 : | all-lower case. | ||
270 : | |||
271 : | If syntax or parsing errors occur when you try to load or use an ERDB database, the | ||
272 : | most likely reason is that one of your objects has an SQL reserved word as its name. | ||
273 : | The list of SQL reserved words keeps increasing; however, most are unlikely to show | ||
274 : | up as a noun or declarative verb phrase. The exceptions are C<Group>, C<User>, | ||
275 : | C<Table>, C<Index>, C<Object>, C<Date>, C<Number>, C<Update>, C<Time>, C<Percent>, | ||
276 : | C<Memo>, C<Order>, and C<Sum>. This problem can crop up in field names as well. | ||
277 : | |||
278 : | Every entity has a field called C<id> that acts as its primary key. Every relationship | ||
279 : | has fields called C<from-link> and C<to-link> that contain copies of the relevant | ||
280 : | entity IDs. These are essentially ERDB's reserved words, and should not be used | ||
281 : | for user-defined field names. | ||
282 : | |||
283 : | =head3 Entities | ||
284 : | |||
285 : | An entity is described by the B<Entity> tag. The entity can contain B<Notes>, an | ||
286 : | B<Indexes> tag containing one or more secondary indexes, and a B<Fields> tag | ||
287 : | containing one or more fields. The attributes of the B<Entity> tag are as follows. | ||
288 : | |||
289 : | =over 4 | ||
290 : | |||
291 : | =item name | ||
292 : | |||
293 : | Name of the entity. The entity name, by convention, uses capital casing (e.g. C<Genome> | ||
294 : | or C<GroupBlock>) and should be a noun or noun phrase. | ||
295 : | |||
296 : | =item keyType | ||
297 : | |||
298 : | Data type of the primary key. The primary key is always named C<id>. | ||
299 : | |||
300 : | =back | ||
301 : | |||
302 : | =head3 Relationships | ||
303 : | |||
304 : | A relationship is described by the C<Relationship> tag. Within a relationship, | ||
305 : | there can be a C<Notes> tag, a C<Fields> tag containing the intersection data | ||
306 : | fields, a C<FromIndex> tag containing the from-index, and a C<ToIndex> tag containing | ||
307 : | the to-index. | ||
308 : | |||
309 : | The C<Relationship> tag has the following attributes. | ||
310 : | |||
311 : | =over 4 | ||
312 : | |||
313 : | =item name | ||
314 : | |||
315 : | Name of the relationship. The relationship name, by convention, uses capital casing | ||
316 : | (e.g. C<ContainsRegionIn> or C<HasContig>), and should be a declarative verb | ||
317 : | phrase, designed to fit between the from-entity and the to-entity (e.g. | ||
318 : | Block C<ContainsRegionIn> Genome). | ||
319 : | |||
320 : | =item from | ||
321 : | |||
322 : | Name of the entity from which the relationship starts. | ||
323 : | |||
324 : | =item to | ||
325 : | |||
326 : | Name of the entity to which the relationship proceeds. | ||
327 : | |||
328 : | =item arity | ||
329 : | |||
330 : | Relationship type: C<1M> for one-to-many and C<MM> for many-to-many. | ||
331 : | |||
332 : | =back | ||
333 : | |||
334 : | parrello | 1.1 | =cut |
335 : | |||
336 : | # GLOBALS | ||
337 : | |||
338 : | # Table of information about our datatypes. "sqlType" is the corresponding SQL datatype string. | ||
339 : | # "maxLen" is the maximum permissible length of the incoming string data used to populate a field | ||
340 : | parrello | 1.73 | # of the specified type. "avgLen" is the average byte length for estimating |
341 : | parrello | 1.75 | # record sizes. "sort" is the key modifier for the sort command, "notes" is a type description, |
342 : | # and "indexMod", if non-zero, is the number of characters to use when the field is specified in an | ||
343 : | # index | ||
344 : | parrello | 1.73 | my %TypeTable = ( char => { sqlType => 'CHAR(1)', maxLen => 1, avgLen => 1, sort => "", |
345 : | parrello | 1.75 | indexMod => 0, notes => "single ASCII character"}, |
346 : | parrello | 1.73 | int => { sqlType => 'INTEGER', maxLen => 20, avgLen => 4, sort => "n", |
347 : | parrello | 1.75 | indexMod => 0, notes => "signed 32-bit integer"}, |
348 : | parrello | 1.73 | counter => { sqlType => 'INTEGER UNSIGNED', maxLen => 20, avgLen => 4, sort => "n", |
349 : | parrello | 1.75 | indexMod => 0, notes => "unsigned 32-bit integer"}, |
350 : | parrello | 1.73 | string => { sqlType => 'VARCHAR(255)', maxLen => 255, avgLen => 100, sort => "", |
351 : | parrello | 1.75 | indexMod => 0, notes => "character string, 0 to 255 characters"}, |
352 : | parrello | 1.73 | text => { sqlType => 'TEXT', maxLen => 1000000000, avgLen => 500, sort => "", |
353 : | parrello | 1.75 | indexMod => 255, notes => "character string, nearly unlimited length, only first 255 characters are indexed"}, |
354 : | parrello | 1.73 | date => { sqlType => 'BIGINT', maxLen => 80, avgLen => 8, sort => "n", |
355 : | parrello | 1.75 | indexMod => 0, notes => "signed, 64-bit integer"}, |
356 : | parrello | 1.73 | float => { sqlType => 'DOUBLE PRECISION', maxLen => 40, avgLen => 8, sort => "g", |
357 : | parrello | 1.75 | indexMod => 0, notes => "64-bit double precision floating-point number"}, |
358 : | parrello | 1.73 | boolean => { sqlType => 'SMALLINT', maxLen => 1, avgLen => 1, sort => "n", |
359 : | parrello | 1.75 | indexMod => 0, notes => "boolean value: 0 if false, 1 if true"}, |
360 : | parrello | 1.42 | 'hash-string' => |
361 : | parrello | 1.73 | { sqlType => 'VARCHAR(22)', maxLen => 22, avgLen => 22, sort => "", |
362 : | parrello | 1.75 | indexMod => 0, notes => "string stored in digested form, used for certain types of key fields"}, |
363 : | parrello | 1.44 | 'id-string' => |
364 : | parrello | 1.73 | { sqlType => 'VARCHAR(25)', maxLen => 25, avgLen => 25, sort => "", |
365 : | parrello | 1.75 | indexMod => 0, notes => "character string, 0 to 25 characters"}, |
366 : | parrello | 1.10 | 'key-string' => |
367 : | parrello | 1.73 | { sqlType => 'VARCHAR(40)', maxLen => 40, avgLen => 10, sort => "", |
368 : | parrello | 1.75 | indexMod => 0, notes => "character string, 0 to 40 characters"}, |
369 : | parrello | 1.10 | 'name-string' => |
370 : | parrello | 1.73 | { sqlType => 'VARCHAR(80)', maxLen => 80, avgLen => 40, sort => "", |
371 : | parrello | 1.75 | indexMod => 0, notes => "character string, 0 to 80 characters"}, |
372 : | parrello | 1.10 | 'medium-string' => |
373 : | parrello | 1.73 | { sqlType => 'VARCHAR(160)', maxLen => 160, avgLen => 40, sort => "", |
374 : | parrello | 1.75 | indexMod => 0, notes => "character string, 0 to 160 characters"}, |
375 : | parrello | 1.10 | ); |
376 : | parrello | 1.1 | |
377 : | # Table translating arities into natural language. | ||
378 : | my %ArityTable = ( '11' => 'one-to-one', | ||
379 : | parrello | 1.10 | '1M' => 'one-to-many', |
380 : | 'MM' => 'many-to-many' | ||
381 : | ); | ||
382 : | parrello | 1.1 | |
383 : | parrello | 1.73 | # Options for XML input and output. |
384 : | |||
385 : | my %XmlOptions = (GroupTags => { Relationships => 'Relationship', | ||
386 : | Entities => 'Entity', | ||
387 : | Fields => 'Field', | ||
388 : | Indexes => 'Index', | ||
389 : | IndexFields => 'IndexField' | ||
390 : | }, | ||
391 : | KeyAttr => { Relationship => 'name', | ||
392 : | Entity => 'name', | ||
393 : | Field => 'name' | ||
394 : | }, | ||
395 : | SuppressEmpty => 1, | ||
396 : | ); | ||
397 : | |||
398 : | my %XmlInOpts = ( | ||
399 : | parrello | 1.78 | ForceArray => ['Field', 'Index', 'IndexField', 'Relationship', 'Entity'], |
400 : | parrello | 1.73 | ForceContent => 1, |
401 : | NormalizeSpace => 2, | ||
402 : | ); | ||
403 : | my %XmlOutOpts = ( | ||
404 : | RootName => 'Database', | ||
405 : | XMLDecl => 1, | ||
406 : | ); | ||
407 : | parrello | 1.1 | |
408 : | |||
409 : | =head2 Public Methods | ||
410 : | |||
411 : | =head3 new | ||
412 : | |||
413 : | parrello | 1.5 | C<< my $database = ERDB->new($dbh, $metaFileName); >> |
414 : | parrello | 1.1 | |
415 : | Create a new ERDB object. | ||
416 : | |||
417 : | =over 4 | ||
418 : | |||
419 : | =item dbh | ||
420 : | |||
421 : | DBKernel database object for the target database. | ||
422 : | |||
423 : | =item metaFileName | ||
424 : | |||
425 : | Name of the XML file containing the metadata. | ||
426 : | |||
427 : | =back | ||
428 : | |||
429 : | =cut | ||
430 : | |||
431 : | sub new { | ||
432 : | parrello | 1.10 | # Get the parameters. |
433 : | my ($class, $dbh, $metaFileName, $options) = @_; | ||
434 : | # Load the meta-data. | ||
435 : | my $metaData = _LoadMetaData($metaFileName); | ||
436 : | # Create the object. | ||
437 : | my $self = { _dbh => $dbh, | ||
438 : | _metaData => $metaData | ||
439 : | }; | ||
440 : | # Bless and return it. | ||
441 : | bless $self, $class; | ||
442 : | return $self; | ||
443 : | parrello | 1.1 | } |
444 : | |||
445 : | =head3 ShowMetaData | ||
446 : | |||
447 : | parrello | 1.18 | C<< $erdb->ShowMetaData($fileName); >> |
448 : | parrello | 1.1 | |
449 : | This method outputs a description of the database. This description can be used to help users create | ||
450 : | the data to be loaded into the relations. | ||
451 : | |||
452 : | =over 4 | ||
453 : | |||
454 : | =item filename | ||
455 : | |||
456 : | The name of the output file. | ||
457 : | |||
458 : | =back | ||
459 : | |||
460 : | =cut | ||
461 : | |||
462 : | sub ShowMetaData { | ||
463 : | parrello | 1.10 | # Get the parameters. |
464 : | my ($self, $filename) = @_; | ||
465 : | # Get the metadata and the title string. | ||
466 : | my $metadata = $self->{_metaData}; | ||
467 : | # Get the title string. | ||
468 : | my $title = $metadata->{Title}; | ||
469 : | # Get the entity and relationship lists. | ||
470 : | my $entityList = $metadata->{Entities}; | ||
471 : | my $relationshipList = $metadata->{Relationships}; | ||
472 : | # Open the output file. | ||
473 : | open(HTMLOUT, ">$filename") || Confess("Could not open MetaData display file $filename: $!"); | ||
474 : | Trace("Building MetaData table of contents.") if T(4); | ||
475 : | # Write the HTML heading stuff. | ||
476 : | print HTMLOUT "<html>\n<head>\n<title>$title</title>\n"; | ||
477 : | print HTMLOUT "</head>\n<body>\n"; | ||
478 : | parrello | 1.45 | # Write the documentation. |
479 : | print HTMLOUT $self->DisplayMetaData(); | ||
480 : | # Close the document. | ||
481 : | print HTMLOUT "</body>\n</html>\n"; | ||
482 : | # Close the file. | ||
483 : | close HTMLOUT; | ||
484 : | } | ||
485 : | |||
486 : | =head3 DisplayMetaData | ||
487 : | |||
488 : | C<< my $html = $erdb->DisplayMetaData(); >> | ||
489 : | |||
490 : | Return an HTML description of the database. This description can be used to help users create | ||
491 : | the data to be loaded into the relations and form queries. The output is raw includable HTML | ||
492 : | without any HEAD or BODY tags. | ||
493 : | |||
494 : | =over 4 | ||
495 : | |||
496 : | =item filename | ||
497 : | |||
498 : | The name of the output file. | ||
499 : | |||
500 : | =back | ||
501 : | |||
502 : | =cut | ||
503 : | |||
504 : | sub DisplayMetaData { | ||
505 : | # Get the parameters. | ||
506 : | my ($self) = @_; | ||
507 : | # Get the metadata and the title string. | ||
508 : | my $metadata = $self->{_metaData}; | ||
509 : | # Get the title string. | ||
510 : | my $title = $metadata->{Title}; | ||
511 : | # Get the entity and relationship lists. | ||
512 : | my $entityList = $metadata->{Entities}; | ||
513 : | my $relationshipList = $metadata->{Relationships}; | ||
514 : | # Declare the return variable. | ||
515 : | my $retVal = ""; | ||
516 : | # Open the output file. | ||
517 : | Trace("Building MetaData table of contents.") if T(4); | ||
518 : | parrello | 1.10 | # Here we do the table of contents. It starts as an unordered list of section names. Each |
519 : | # section contains an ordered list of entity or relationship subsections. | ||
520 : | parrello | 1.45 | $retVal .= "<ul>\n<li><a href=\"#EntitiesSection\">Entities</a>\n<ol>\n"; |
521 : | parrello | 1.10 | # Loop through the Entities, displaying a list item for each. |
522 : | foreach my $key (sort keys %{$entityList}) { | ||
523 : | # Display this item. | ||
524 : | parrello | 1.45 | $retVal .= "<li><a href=\"#$key\">$key</a></li>\n"; |
525 : | parrello | 1.10 | } |
526 : | # Close off the entity section and start the relationship section. | ||
527 : | parrello | 1.45 | $retVal .= "</ol></li>\n<li><a href=\"#RelationshipsSection\">Relationships</a>\n<ol>\n"; |
528 : | parrello | 1.10 | # Loop through the Relationships. |
529 : | foreach my $key (sort keys %{$relationshipList}) { | ||
530 : | # Display this item. | ||
531 : | my $relationshipTitle = _ComputeRelationshipSentence($key, $relationshipList->{$key}); | ||
532 : | parrello | 1.45 | $retVal .= "<li><a href=\"#$key\">$relationshipTitle</a></li>\n"; |
533 : | parrello | 1.10 | } |
534 : | # Close off the relationship section and list the join table section. | ||
535 : | parrello | 1.45 | $retVal .= "</ol></li>\n<li><a href=\"#JoinTable\">Join Table</a></li>\n"; |
536 : | parrello | 1.10 | # Close off the table of contents itself. |
537 : | parrello | 1.45 | $retVal .= "</ul>\n"; |
538 : | parrello | 1.10 | # Now we start with the actual data. Denote we're starting the entity section. |
539 : | parrello | 1.45 | $retVal .= "<a name=\"EntitiesSection\"></a><h2>Entities</h2>\n"; |
540 : | parrello | 1.10 | # Loop through the entities. |
541 : | for my $key (sort keys %{$entityList}) { | ||
542 : | Trace("Building MetaData entry for $key entity.") if T(4); | ||
543 : | # Create the entity header. It contains a bookmark and the entity name. | ||
544 : | parrello | 1.45 | $retVal .= "<a name=\"$key\"></a><h3>$key</h3>\n"; |
545 : | parrello | 1.10 | # Get the entity data. |
546 : | my $entityData = $entityList->{$key}; | ||
547 : | # If there's descriptive text, display it. | ||
548 : | if (my $notes = $entityData->{Notes}) { | ||
549 : | parrello | 1.73 | $retVal .= "<p>" . HTMLNote($notes->{content}) . "</p>\n"; |
550 : | parrello | 1.10 | } |
551 : | parrello | 1.83 | # See if we need a list of the entity's relationships. |
552 : | my $relCount = keys %{$relationshipList}; | ||
553 : | if ($relCount > 0) { | ||
554 : | # First, we set up the relationship subsection. | ||
555 : | $retVal .= "<h4>Relationships for <b>$key</b></h4>\n<ul>\n"; | ||
556 : | # Loop through the relationships. | ||
557 : | for my $relationship (sort keys %{$relationshipList}) { | ||
558 : | # Get the relationship data. | ||
559 : | my $relationshipStructure = $relationshipList->{$relationship}; | ||
560 : | # Only use the relationship if if has this entity in its FROM or TO fields. | ||
561 : | if ($relationshipStructure->{from} eq $key || $relationshipStructure->{to} eq $key) { | ||
562 : | # Get the relationship sentence and append the arity. | ||
563 : | my $relationshipDescription = _ComputeRelationshipSentence($relationship, $relationshipStructure); | ||
564 : | # Display the relationship data. | ||
565 : | $retVal .= "<li><a href=\"#$relationship\">$relationshipDescription</a></li>\n"; | ||
566 : | } | ||
567 : | parrello | 1.10 | } |
568 : | parrello | 1.83 | # Close off the relationship list. |
569 : | $retVal .= "</ul>\n"; | ||
570 : | parrello | 1.10 | } |
571 : | # Get the entity's relations. | ||
572 : | my $relationList = $entityData->{Relations}; | ||
573 : | # Create a header for the relation subsection. | ||
574 : | parrello | 1.45 | $retVal .= "<h4>Relations for <b>$key</b></h4>\n"; |
575 : | parrello | 1.10 | # Loop through the relations, displaying them. |
576 : | for my $relation (sort keys %{$relationList}) { | ||
577 : | my $htmlString = _ShowRelationTable($relation, $relationList->{$relation}); | ||
578 : | parrello | 1.45 | $retVal .= $htmlString; |
579 : | parrello | 1.10 | } |
580 : | } | ||
581 : | # Denote we're starting the relationship section. | ||
582 : | parrello | 1.45 | $retVal .= "<a name=\"RelationshipsSection\"></a><h2>Relationships</h2>\n"; |
583 : | parrello | 1.10 | # Loop through the relationships. |
584 : | for my $key (sort keys %{$relationshipList}) { | ||
585 : | Trace("Building MetaData entry for $key relationship.") if T(4); | ||
586 : | # Get the relationship's structure. | ||
587 : | my $relationshipStructure = $relationshipList->{$key}; | ||
588 : | # Create the relationship header. | ||
589 : | my $headerText = _ComputeRelationshipHeading($key, $relationshipStructure); | ||
590 : | parrello | 1.45 | $retVal .= "<h3><a name=\"$key\"></a>$headerText</h3>\n"; |
591 : | parrello | 1.10 | # Get the entity names. |
592 : | my $fromEntity = $relationshipStructure->{from}; | ||
593 : | my $toEntity = $relationshipStructure->{to}; | ||
594 : | # Describe the relationship arity. Note there's a bit of trickiness involving recursive | ||
595 : | # many-to-many relationships. In a normal many-to-many we use two sentences to describe | ||
596 : | # the arity (one for each direction). This is a bad idea for a recursive relationship, | ||
597 : | # since both sentences will say the same thing. | ||
598 : | my $arity = $relationshipStructure->{arity}; | ||
599 : | if ($arity eq "11") { | ||
600 : | parrello | 1.45 | $retVal .= "<p>Each <b>$fromEntity</b> relates to at most one <b>$toEntity</b>.\n"; |
601 : | parrello | 1.10 | } else { |
602 : | parrello | 1.45 | $retVal .= "<p>Each <b>$fromEntity</b> relates to multiple <b>$toEntity</b>s.\n"; |
603 : | parrello | 1.10 | if ($arity eq "MM" && $fromEntity ne $toEntity) { |
604 : | parrello | 1.45 | $retVal .= "Each <b>$toEntity</b> relates to multiple <b>$fromEntity</b>s.\n"; |
605 : | parrello | 1.10 | } |
606 : | } | ||
607 : | parrello | 1.45 | $retVal .= "</p>\n"; |
608 : | parrello | 1.10 | # If there are notes on this relationship, display them. |
609 : | if (my $notes = $relationshipStructure->{Notes}) { | ||
610 : | parrello | 1.73 | $retVal .= "<p>" . HTMLNote($notes->{content}) . "</p>\n"; |
611 : | parrello | 1.10 | } |
612 : | # Generate the relationship's relation table. | ||
613 : | my $htmlString = _ShowRelationTable($key, $relationshipStructure->{Relations}->{$key}); | ||
614 : | parrello | 1.45 | $retVal .= $htmlString; |
615 : | parrello | 1.10 | } |
616 : | Trace("Building MetaData join table.") if T(4); | ||
617 : | # Denote we're starting the join table. | ||
618 : | parrello | 1.45 | $retVal .= "<a name=\"JoinTable\"></a><h3>Join Table</h3>\n"; |
619 : | parrello | 1.10 | # Create a table header. |
620 : | parrello | 1.45 | $retVal .= _OpenTable("Join Table", "Source", "Target", "Join Condition"); |
621 : | parrello | 1.10 | # Loop through the joins. |
622 : | my $joinTable = $metadata->{Joins}; | ||
623 : | my @joinKeys = keys %{$joinTable}; | ||
624 : | for my $joinKey (sort @joinKeys) { | ||
625 : | # Separate out the source, the target, and the join clause. | ||
626 : | $joinKey =~ m!^([^/]+)/(.+)$!; | ||
627 : | my ($sourceRelation, $targetRelation) = ($1, $2); | ||
628 : | parrello | 1.30 | Trace("Join with key $joinKey is from $sourceRelation to $targetRelation.") if T(Joins => 4); |
629 : | parrello | 1.10 | my $source = $self->ComputeObjectSentence($sourceRelation); |
630 : | my $target = $self->ComputeObjectSentence($targetRelation); | ||
631 : | my $clause = $joinTable->{$joinKey}; | ||
632 : | # Display them in a table row. | ||
633 : | parrello | 1.45 | $retVal .= "<tr><td>$source</td><td>$target</td><td>$clause</td></tr>\n"; |
634 : | parrello | 1.10 | } |
635 : | # Close the table. | ||
636 : | parrello | 1.45 | $retVal .= _CloseTable(); |
637 : | Trace("Built MetaData HTML.") if T(3); | ||
638 : | # Return the HTML. | ||
639 : | return $retVal; | ||
640 : | parrello | 1.1 | } |
641 : | |||
642 : | =head3 DumpMetaData | ||
643 : | |||
644 : | parrello | 1.18 | C<< $erdb->DumpMetaData(); >> |
645 : | parrello | 1.1 | |
646 : | Return a dump of the metadata structure. | ||
647 : | |||
648 : | =cut | ||
649 : | |||
650 : | sub DumpMetaData { | ||
651 : | parrello | 1.10 | # Get the parameters. |
652 : | my ($self) = @_; | ||
653 : | # Dump the meta-data. | ||
654 : | return Data::Dumper::Dumper($self->{_metaData}); | ||
655 : | parrello | 1.1 | } |
656 : | |||
657 : | parrello | 1.74 | =head3 FindIndexForEntity |
658 : | |||
659 : | C<< my $indexFound = ERDB::FindIndexForEntity($xml, $entityName, $attributeName); >> | ||
660 : | |||
661 : | This method locates the entry in an entity's index list that begins with the | ||
662 : | specified attribute name. If the entity has no index list, one will be | ||
663 : | created. This method works on raw XML, not a live ERDB object. | ||
664 : | |||
665 : | =over 4 | ||
666 : | |||
667 : | =item xml | ||
668 : | |||
669 : | The raw XML structure defining the database. | ||
670 : | |||
671 : | =item entityName | ||
672 : | |||
673 : | The name of the relevant entity. | ||
674 : | |||
675 : | =item attributeName | ||
676 : | |||
677 : | The name of the attribute relevant to the search. | ||
678 : | |||
679 : | =item RETURN | ||
680 : | |||
681 : | The numerical index in the index list of the index entry for the specified entity and | ||
682 : | attribute, or C<undef> if no such index exists. | ||
683 : | |||
684 : | =back | ||
685 : | |||
686 : | =cut | ||
687 : | |||
688 : | sub FindIndexForEntity { | ||
689 : | # Get the parameters. | ||
690 : | my ($xml, $entityName, $attributeName) = @_; | ||
691 : | # Declare the return variable. | ||
692 : | my $retVal; | ||
693 : | # Get the named entity. | ||
694 : | my $entityData = $xml->{Entities}->{$entityName}; | ||
695 : | if (! $entityData) { | ||
696 : | Confess("Entity $entityName not found in DBD structure."); | ||
697 : | } else { | ||
698 : | # Insure it has an index list. | ||
699 : | if (! exists $entityData->{Indexes}) { | ||
700 : | $entityData->{Indexes} = []; | ||
701 : | } else { | ||
702 : | # Search for the desired index. | ||
703 : | my $indexList = $entityData->{Indexes}; | ||
704 : | my $n = scalar @{$indexList}; | ||
705 : | Trace("Searching $n indexes in index list for $entityName.") if T(2); | ||
706 : | # We use an indexed FOR here because we're returning an | ||
707 : | # index number instead of an object. We do THAT so we can | ||
708 : | # delete the index from the list if needed. | ||
709 : | for (my $i = 0; $i < $n && !defined($retVal); $i++) { | ||
710 : | my $index = $indexList->[$i]; | ||
711 : | my $fields = $index->{IndexFields}; | ||
712 : | # Technically this IF should be safe (that is, we are guaranteed | ||
713 : | # the existence of a "$fields->[0]"), because when we load the XML | ||
714 : | # we have SuppressEmpty specified. | ||
715 : | if ($fields->[0]->{name} eq $attributeName) { | ||
716 : | $retVal = $i; | ||
717 : | } | ||
718 : | } | ||
719 : | } | ||
720 : | } | ||
721 : | Trace("Index for $attributeName of $entityName found at position $retVal.") if defined($retVal) && T(3); | ||
722 : | Trace("Index for $attributeName not found in $entityName.") if !defined($retVal) && T(3); | ||
723 : | # Return the result. | ||
724 : | return $retVal; | ||
725 : | } | ||
726 : | |||
727 : | parrello | 1.1 | =head3 CreateTables |
728 : | |||
729 : | parrello | 1.18 | C<< $erdb->CreateTables(); >> |
730 : | parrello | 1.1 | |
731 : | This method creates the tables for the database from the metadata structure loaded by the | ||
732 : | constructor. It is expected this function will only be used on rare occasions, when the | ||
733 : | parrello | 1.2 | user needs to start with an empty database. Otherwise, the L</LoadTables> method can be |
734 : | parrello | 1.1 | used by itself with the truncate flag turned on. |
735 : | |||
736 : | =cut | ||
737 : | |||
738 : | sub CreateTables { | ||
739 : | parrello | 1.10 | # Get the parameters. |
740 : | my ($self) = @_; | ||
741 : | parrello | 1.23 | # Get the relation names. |
742 : | my @relNames = $self->GetTableNames(); | ||
743 : | # Loop through the relations. | ||
744 : | for my $relationName (@relNames) { | ||
745 : | # Create a table for this relation. | ||
746 : | $self->CreateTable($relationName); | ||
747 : | Trace("Relation $relationName created.") if T(2); | ||
748 : | parrello | 1.10 | } |
749 : | parrello | 1.1 | } |
750 : | |||
751 : | =head3 CreateTable | ||
752 : | |||
753 : | parrello | 1.18 | C<< $erdb->CreateTable($tableName, $indexFlag, $estimatedRows); >> |
754 : | parrello | 1.1 | |
755 : | Create the table for a relation and optionally create its indexes. | ||
756 : | |||
757 : | =over 4 | ||
758 : | |||
759 : | =item relationName | ||
760 : | |||
761 : | Name of the relation (which will also be the table name). | ||
762 : | |||
763 : | parrello | 1.18 | =item indexFlag |
764 : | parrello | 1.1 | |
765 : | TRUE if the indexes for the relation should be created, else FALSE. If FALSE, | ||
766 : | L</CreateIndexes> must be called later to bring the indexes into existence. | ||
767 : | |||
768 : | parrello | 1.18 | =item estimatedRows (optional) |
769 : | |||
770 : | If specified, the estimated maximum number of rows for the relation. This | ||
771 : | information allows the creation of tables using storage engines that are | ||
772 : | faster but require size estimates, such as MyISAM. | ||
773 : | |||
774 : | parrello | 1.1 | =back |
775 : | |||
776 : | =cut | ||
777 : | |||
778 : | sub CreateTable { | ||
779 : | parrello | 1.10 | # Get the parameters. |
780 : | parrello | 1.18 | my ($self, $relationName, $indexFlag, $estimatedRows) = @_; |
781 : | parrello | 1.10 | # Get the database handle. |
782 : | my $dbh = $self->{_dbh}; | ||
783 : | # Get the relation data and determine whether or not the relation is primary. | ||
784 : | my $relationData = $self->_FindRelation($relationName); | ||
785 : | my $rootFlag = $self->_IsPrimary($relationName); | ||
786 : | # Create a list of the field data. | ||
787 : | my @fieldList; | ||
788 : | for my $fieldData (@{$relationData->{Fields}}) { | ||
789 : | # Assemble the field name and type. | ||
790 : | my $fieldName = _FixName($fieldData->{name}); | ||
791 : | my $fieldString = "$fieldName $TypeTable{$fieldData->{type}}->{sqlType} NOT NULL "; | ||
792 : | # Push the result into the field list. | ||
793 : | push @fieldList, $fieldString; | ||
794 : | } | ||
795 : | # If this is a root table, add the "new_record" flag. It defaults to 0, so | ||
796 : | if ($rootFlag) { | ||
797 : | push @fieldList, "new_record $TypeTable{boolean}->{sqlType} NOT NULL DEFAULT 0"; | ||
798 : | } | ||
799 : | # Convert the field list into a comma-delimited string. | ||
800 : | my $fieldThing = join(', ', @fieldList); | ||
801 : | # Insure the table is not already there. | ||
802 : | $dbh->drop_table(tbl => $relationName); | ||
803 : | Trace("Table $relationName dropped.") if T(2); | ||
804 : | parrello | 1.18 | # If there are estimated rows, create an estimate so we can take advantage of |
805 : | # faster DB technologies. | ||
806 : | my $estimation = undef; | ||
807 : | if ($estimatedRows) { | ||
808 : | $estimation = [$self->EstimateRowSize($relationName), $estimatedRows]; | ||
809 : | } | ||
810 : | parrello | 1.10 | # Create the table. |
811 : | Trace("Creating table $relationName: $fieldThing") if T(2); | ||
812 : | parrello | 1.18 | $dbh->create_table(tbl => $relationName, flds => $fieldThing, estimates => $estimation); |
813 : | parrello | 1.10 | Trace("Relation $relationName created in database.") if T(2); |
814 : | parrello | 1.70 | # If we want to build the indexes, we do it here. Note that the full-text search |
815 : | # index will not be built until the table has been loaded. | ||
816 : | parrello | 1.10 | if ($indexFlag) { |
817 : | $self->CreateIndex($relationName); | ||
818 : | } | ||
819 : | parrello | 1.1 | } |
820 : | |||
821 : | parrello | 1.31 | =head3 VerifyFields |
822 : | |||
823 : | C<< my $count = $erdb->VerifyFields($relName, \@fieldList); >> | ||
824 : | |||
825 : | Run through the list of proposed field values, insuring that all the character fields are | ||
826 : | below the maximum length. If any fields are too long, they will be truncated in place. | ||
827 : | |||
828 : | =over 4 | ||
829 : | |||
830 : | =item relName | ||
831 : | |||
832 : | Name of the relation for which the specified fields are destined. | ||
833 : | |||
834 : | =item fieldList | ||
835 : | |||
836 : | Reference to a list, in order, of the fields to be put into the relation. | ||
837 : | |||
838 : | =item RETURN | ||
839 : | |||
840 : | Returns the number of fields truncated. | ||
841 : | |||
842 : | =back | ||
843 : | |||
844 : | =cut | ||
845 : | |||
846 : | sub VerifyFields { | ||
847 : | # Get the parameters. | ||
848 : | my ($self, $relName, $fieldList) = @_; | ||
849 : | # Initialize the return value. | ||
850 : | my $retVal = 0; | ||
851 : | # Get the relation definition. | ||
852 : | my $relData = $self->_FindRelation($relName); | ||
853 : | # Get the list of field descriptors. | ||
854 : | my $fieldTypes = $relData->{Fields}; | ||
855 : | my $fieldCount = scalar @{$fieldTypes}; | ||
856 : | # Loop through the two lists. | ||
857 : | for (my $i = 0; $i < $fieldCount; $i++) { | ||
858 : | # Get the type of the current field. | ||
859 : | my $fieldType = $fieldTypes->[$i]->{type}; | ||
860 : | # If it's a character field, verify the length. | ||
861 : | if ($fieldType =~ /string/) { | ||
862 : | my $maxLen = $TypeTable{$fieldType}->{maxLen}; | ||
863 : | my $oldString = $fieldList->[$i]; | ||
864 : | if (length($oldString) > $maxLen) { | ||
865 : | # Here it's too big, so we truncate it. | ||
866 : | Trace("Truncating field $i in relation $relName to $maxLen characters from \"$oldString\".") if T(1); | ||
867 : | $fieldList->[$i] = substr $oldString, 0, $maxLen; | ||
868 : | $retVal++; | ||
869 : | } | ||
870 : | } | ||
871 : | } | ||
872 : | # Return the truncation count. | ||
873 : | return $retVal; | ||
874 : | } | ||
875 : | |||
876 : | parrello | 1.42 | =head3 DigestFields |
877 : | |||
878 : | C<< $erdb->DigestFields($relName, $fieldList); >> | ||
879 : | |||
880 : | Digest the strings in the field list that correspond to data type C<hash-string> in the | ||
881 : | specified relation. | ||
882 : | |||
883 : | =over 4 | ||
884 : | |||
885 : | =item relName | ||
886 : | |||
887 : | Name of the relation to which the fields belong. | ||
888 : | |||
889 : | =item fieldList | ||
890 : | |||
891 : | List of field contents to be loaded into the relation. | ||
892 : | |||
893 : | =back | ||
894 : | |||
895 : | =cut | ||
896 : | #: Return Type ; | ||
897 : | sub DigestFields { | ||
898 : | # Get the parameters. | ||
899 : | my ($self, $relName, $fieldList) = @_; | ||
900 : | # Get the relation definition. | ||
901 : | my $relData = $self->_FindRelation($relName); | ||
902 : | # Get the list of field descriptors. | ||
903 : | my $fieldTypes = $relData->{Fields}; | ||
904 : | my $fieldCount = scalar @{$fieldTypes}; | ||
905 : | # Loop through the two lists. | ||
906 : | for (my $i = 0; $i < $fieldCount; $i++) { | ||
907 : | # Get the type of the current field. | ||
908 : | my $fieldType = $fieldTypes->[$i]->{type}; | ||
909 : | # If it's a hash string, digest it in place. | ||
910 : | if ($fieldType eq 'hash-string') { | ||
911 : | parrello | 1.46 | $fieldList->[$i] = $self->DigestKey($fieldList->[$i]); |
912 : | parrello | 1.42 | } |
913 : | } | ||
914 : | } | ||
915 : | |||
916 : | parrello | 1.46 | =head3 DigestKey |
917 : | |||
918 : | C<< my $digested = $erdb->DigestKey($keyValue); >> | ||
919 : | |||
920 : | Return the digested value of a symbolic key. The digested value can then be plugged into a | ||
921 : | key-based search into a table with key-type hash-string. | ||
922 : | |||
923 : | Currently the digesting process is independent of the database structure, but that may not | ||
924 : | always be the case, so this is an instance method instead of a static method. | ||
925 : | |||
926 : | =over 4 | ||
927 : | |||
928 : | =item keyValue | ||
929 : | |||
930 : | Key value to digest. | ||
931 : | |||
932 : | =item RETURN | ||
933 : | |||
934 : | parrello | 1.56 | Digested value of the key. |
935 : | parrello | 1.46 | |
936 : | =back | ||
937 : | |||
938 : | =cut | ||
939 : | |||
940 : | sub DigestKey { | ||
941 : | # Get the parameters. | ||
942 : | my ($self, $keyValue) = @_; | ||
943 : | # Compute the digest. | ||
944 : | my $retVal = md5_base64($keyValue); | ||
945 : | # Return the result. | ||
946 : | return $retVal; | ||
947 : | } | ||
948 : | |||
949 : | parrello | 1.1 | =head3 CreateIndex |
950 : | |||
951 : | parrello | 1.18 | C<< $erdb->CreateIndex($relationName); >> |
952 : | parrello | 1.1 | |
953 : | Create the indexes for a relation. If a table is being loaded from a large source file (as | ||
954 : | parrello | 1.12 | is the case in L</LoadTable>), it is sometimes best to create the indexes after the load. |
955 : | If that is the case, then L</CreateTable> should be called with the index flag set to | ||
956 : | FALSE, and this method used after the load to create the indexes for the table. | ||
957 : | parrello | 1.1 | |
958 : | =cut | ||
959 : | |||
960 : | sub CreateIndex { | ||
961 : | parrello | 1.10 | # Get the parameters. |
962 : | my ($self, $relationName) = @_; | ||
963 : | # Get the relation's descriptor. | ||
964 : | my $relationData = $self->_FindRelation($relationName); | ||
965 : | # Get the database handle. | ||
966 : | my $dbh = $self->{_dbh}; | ||
967 : | # Now we need to create this relation's indexes. We do this by looping through its index table. | ||
968 : | my $indexHash = $relationData->{Indexes}; | ||
969 : | for my $indexName (keys %{$indexHash}) { | ||
970 : | my $indexData = $indexHash->{$indexName}; | ||
971 : | # Get the index's field list. | ||
972 : | parrello | 1.75 | my @rawFields = @{$indexData->{IndexFields}}; |
973 : | # Get a hash of the relation's field types. | ||
974 : | my %types = map { $_->{name} => $_->{type} } @{$relationData->{Fields}}; | ||
975 : | parrello | 1.78 | # We need to check for text fields so we can append a length limitation for them. To do |
976 : | parrello | 1.75 | # that, we need the relation's field list. |
977 : | my $relFields = $relationData->{Fields}; | ||
978 : | for (my $i = 0; $i <= $#rawFields; $i++) { | ||
979 : | # Get the field type. | ||
980 : | my $field = $rawFields[$i]; | ||
981 : | my $type = $types{$field}; | ||
982 : | # Ask if it requires using prefix notation for the index. | ||
983 : | my $mod = $TypeTable{$type}->{indexMod}; | ||
984 : | Trace("Field $field ($i) in $relationName has type $type and indexMod $mod.") if T(3); | ||
985 : | if ($mod) { | ||
986 : | # Append the prefix length to the field name, | ||
987 : | $rawFields[$i] .= "($mod)"; | ||
988 : | } | ||
989 : | } | ||
990 : | my @fieldList = _FixNames(@rawFields); | ||
991 : | parrello | 1.10 | my $flds = join(', ', @fieldList); |
992 : | # Get the index's uniqueness flag. | ||
993 : | parrello | 1.70 | my $unique = (exists $indexData->{Unique} ? 'unique' : undef); |
994 : | parrello | 1.10 | # Create the index. |
995 : | parrello | 1.24 | my $rv = $dbh->create_index(idx => $indexName, tbl => $relationName, |
996 : | parrello | 1.70 | flds => $flds, kind => $unique); |
997 : | parrello | 1.24 | if ($rv) { |
998 : | Trace("Index created: $indexName for $relationName ($flds)") if T(1); | ||
999 : | } else { | ||
1000 : | Confess("Error creating index $indexName for $relationName using ($flds): " . $dbh->error_message()); | ||
1001 : | } | ||
1002 : | parrello | 1.10 | } |
1003 : | parrello | 1.1 | } |
1004 : | |||
1005 : | parrello | 1.75 | =head3 GetSecondaryFields |
1006 : | |||
1007 : | C<< my %fieldTuples = $erdb->GetSecondaryFields($entityName); >> | ||
1008 : | |||
1009 : | This method will return a list of the name and type of each of the secondary | ||
1010 : | fields for a specified entity. Secondary fields are stored in two-column tables | ||
1011 : | in addition to the primary entity table. This enables the field to have no value | ||
1012 : | or to have multiple values. | ||
1013 : | |||
1014 : | =over 4 | ||
1015 : | |||
1016 : | =item entityName | ||
1017 : | |||
1018 : | Name of the entity whose secondary fields are desired. | ||
1019 : | |||
1020 : | =item RETURN | ||
1021 : | |||
1022 : | Returns a hash mapping the field names to their field types. | ||
1023 : | |||
1024 : | =back | ||
1025 : | |||
1026 : | =cut | ||
1027 : | |||
1028 : | sub GetSecondaryFields { | ||
1029 : | # Get the parameters. | ||
1030 : | my ($self, $entityName) = @_; | ||
1031 : | # Declare the return variable. | ||
1032 : | my %retVal = (); | ||
1033 : | # Look for the entity. | ||
1034 : | my $table = $self->GetFieldTable($entityName); | ||
1035 : | # Loop through the fields, pulling out the secondaries. | ||
1036 : | for my $field (sort keys %{$table}) { | ||
1037 : | if ($table->{$field}->{relation} ne $entityName) { | ||
1038 : | # Here we have a secondary field. | ||
1039 : | $retVal{$field} = $table->{$field}->{type}; | ||
1040 : | } | ||
1041 : | } | ||
1042 : | # Return the result. | ||
1043 : | return %retVal; | ||
1044 : | } | ||
1045 : | |||
1046 : | =head3 GetFieldRelationName | ||
1047 : | |||
1048 : | C<< my $name = $erdb->GetFieldRelationName($objectName, $fieldName); >> | ||
1049 : | |||
1050 : | Return the name of the relation containing a specified field. | ||
1051 : | |||
1052 : | =over 4 | ||
1053 : | |||
1054 : | =item objectName | ||
1055 : | |||
1056 : | Name of the entity or relationship containing the field. | ||
1057 : | |||
1058 : | =item fieldName | ||
1059 : | |||
1060 : | Name of the relevant field in that entity or relationship. | ||
1061 : | |||
1062 : | =item RETURN | ||
1063 : | |||
1064 : | Returns the name of the database relation containing the field, or C<undef> if | ||
1065 : | the field does not exist. | ||
1066 : | |||
1067 : | =back | ||
1068 : | |||
1069 : | =cut | ||
1070 : | |||
1071 : | sub GetFieldRelationName { | ||
1072 : | # Get the parameters. | ||
1073 : | my ($self, $objectName, $fieldName) = @_; | ||
1074 : | # Declare the return variable. | ||
1075 : | my $retVal; | ||
1076 : | # Get the object field table. | ||
1077 : | my $table = $self->GetFieldTable($objectName); | ||
1078 : | # Only proceed if the field exists. | ||
1079 : | if (exists $table->{$fieldName}) { | ||
1080 : | # Determine the name of the relation that contains this field. | ||
1081 : | $retVal = $table->{$fieldName}->{relation}; | ||
1082 : | } | ||
1083 : | # Return the result. | ||
1084 : | return $retVal; | ||
1085 : | } | ||
1086 : | |||
1087 : | =head3 DeleteValue | ||
1088 : | |||
1089 : | C<< my $numDeleted = $erdb->DeleteValue($entityName, $id, $fieldName, $fieldValue); >> | ||
1090 : | |||
1091 : | Delete secondary field values from the database. This method can be used to delete all | ||
1092 : | values of a specified field for a particular entity instance, or only a single value. | ||
1093 : | |||
1094 : | Secondary fields are stored in two-column relations separate from an entity's primary | ||
1095 : | table, and as a result a secondary field can legitimately have no value or multiple | ||
1096 : | values. Therefore, it makes sense to talk about deleting secondary fields where it | ||
1097 : | would not make sense for primary fields. | ||
1098 : | |||
1099 : | =over 4 | ||
1100 : | |||
1101 : | =item entityName | ||
1102 : | |||
1103 : | Name of the entity from which the fields are to be deleted. | ||
1104 : | |||
1105 : | =item id | ||
1106 : | |||
1107 : | ID of the entity instance to be processed. If the instance is not found, this | ||
1108 : | parrello | 1.76 | method will have no effect. If C<undef> is specified, all values for all of |
1109 : | the entity instances will be deleted. | ||
1110 : | parrello | 1.75 | |
1111 : | =item fieldName | ||
1112 : | |||
1113 : | Name of the field whose values are to be deleted. | ||
1114 : | |||
1115 : | =item fieldValue (optional) | ||
1116 : | |||
1117 : | Value to be deleted. If not specified, then all values of the specified field | ||
1118 : | will be deleted for the entity instance. If specified, then only the values which | ||
1119 : | match this parameter will be deleted. | ||
1120 : | |||
1121 : | =item RETURN | ||
1122 : | |||
1123 : | Returns the number of rows deleted. | ||
1124 : | |||
1125 : | =back | ||
1126 : | |||
1127 : | =cut | ||
1128 : | |||
1129 : | sub DeleteValue { | ||
1130 : | # Get the parameters. | ||
1131 : | my ($self, $entityName, $id, $fieldName, $fieldValue) = @_; | ||
1132 : | # Declare the return value. | ||
1133 : | my $retVal = 0; | ||
1134 : | # We need to set up an SQL command to do the deletion. First, we | ||
1135 : | # find the name of the field's relation. | ||
1136 : | my $table = $self->GetFieldTable($entityName); | ||
1137 : | my $field = $table->{$fieldName}; | ||
1138 : | my $relation = $field->{relation}; | ||
1139 : | # Make sure this is a secondary field. | ||
1140 : | if ($relation eq $entityName) { | ||
1141 : | Confess("Cannot delete values of $fieldName for $entityName."); | ||
1142 : | } else { | ||
1143 : | # Set up the SQL command to delete all values. | ||
1144 : | parrello | 1.76 | my $sql = "DELETE FROM $relation"; |
1145 : | # Build the filter. | ||
1146 : | my @filters = (); | ||
1147 : | my @parms = (); | ||
1148 : | # Check for a filter by ID. | ||
1149 : | if (defined $id) { | ||
1150 : | push @filters, "id = ?"; | ||
1151 : | push @parms, $id; | ||
1152 : | } | ||
1153 : | # Check for a filter by value. | ||
1154 : | parrello | 1.75 | if (defined $fieldValue) { |
1155 : | parrello | 1.76 | push @filters, "$fieldName = ?"; |
1156 : | parrello | 1.75 | push @parms, $fieldValue; |
1157 : | } | ||
1158 : | parrello | 1.76 | # Append the filters to the command. |
1159 : | if (@filters) { | ||
1160 : | $sql .= " WHERE " . join(" AND ", @filters); | ||
1161 : | } | ||
1162 : | parrello | 1.75 | # Execute the command. |
1163 : | my $dbh = $self->{_dbh}; | ||
1164 : | $retVal = $dbh->SQL($sql, 0, @parms); | ||
1165 : | } | ||
1166 : | # Return the result. | ||
1167 : | return $retVal; | ||
1168 : | } | ||
1169 : | |||
1170 : | parrello | 1.1 | =head3 LoadTables |
1171 : | |||
1172 : | parrello | 1.18 | C<< my $stats = $erdb->LoadTables($directoryName, $rebuild); >> |
1173 : | parrello | 1.1 | |
1174 : | This method will load the database tables from a directory. The tables must already have been created | ||
1175 : | in the database. (This can be done by calling L</CreateTables>.) The caller passes in a directory name; | ||
1176 : | all of the relations to be loaded must have a file in the directory with the same name as the relation | ||
1177 : | (optionally with a suffix of C<.dtx>). Each file must be a tab-delimited table of field values. Each | ||
1178 : | line of the file will be loaded as a row of the target relation table. The field values should be in | ||
1179 : | the same order as the fields in the relation tables generated by L</ShowMetaData>. The old data is | ||
1180 : | erased before the new data is loaded in. | ||
1181 : | |||
1182 : | A certain amount of translation automatically takes place. Ctrl-M characters are deleted, and | ||
1183 : | tab and new-line characters inside a field are escaped as C<\t> and C<\n>, respectively. Dates must | ||
1184 : | be entered as a Unix timestamp, that is, as an integer number of seconds since the base epoch. | ||
1185 : | |||
1186 : | =over 4 | ||
1187 : | |||
1188 : | =item directoryName | ||
1189 : | |||
1190 : | Name of the directory containing the relation files to be loaded. | ||
1191 : | |||
1192 : | =item rebuild | ||
1193 : | |||
1194 : | TRUE if the tables should be dropped and rebuilt, else FALSE. This is, unfortunately, the | ||
1195 : | only way to erase existing data in the tables, since the TRUNCATE command is not supported | ||
1196 : | by all of the DB engines we use. | ||
1197 : | |||
1198 : | =item RETURN | ||
1199 : | |||
1200 : | Returns a statistical object describing the number of records read and a list of the error messages. | ||
1201 : | |||
1202 : | =back | ||
1203 : | |||
1204 : | =cut | ||
1205 : | |||
1206 : | sub LoadTables { | ||
1207 : | parrello | 1.10 | # Get the parameters. |
1208 : | my ($self, $directoryName, $rebuild) = @_; | ||
1209 : | # Start the timer. | ||
1210 : | my $startTime = gettimeofday; | ||
1211 : | # Clean any trailing slash from the directory name. | ||
1212 : | $directoryName =~ s!/\\$!!; | ||
1213 : | # Declare the return variable. | ||
1214 : | my $retVal = Stats->new(); | ||
1215 : | parrello | 1.23 | # Get the relation names. |
1216 : | my @relNames = $self->GetTableNames(); | ||
1217 : | for my $relationName (@relNames) { | ||
1218 : | # Try to load this relation. | ||
1219 : | my $result = $self->_LoadRelation($directoryName, $relationName, $rebuild); | ||
1220 : | parrello | 1.10 | # Accumulate the statistics. |
1221 : | $retVal->Accumulate($result); | ||
1222 : | } | ||
1223 : | # Add the duration of the load to the statistical object. | ||
1224 : | $retVal->Add('duration', gettimeofday - $startTime); | ||
1225 : | # Return the accumulated statistics. | ||
1226 : | return $retVal; | ||
1227 : | parrello | 1.1 | } |
1228 : | |||
1229 : | parrello | 1.23 | |
1230 : | parrello | 1.1 | =head3 GetTableNames |
1231 : | |||
1232 : | parrello | 1.18 | C<< my @names = $erdb->GetTableNames; >> |
1233 : | parrello | 1.1 | |
1234 : | Return a list of the relations required to implement this database. | ||
1235 : | |||
1236 : | =cut | ||
1237 : | |||
1238 : | sub GetTableNames { | ||
1239 : | parrello | 1.10 | # Get the parameters. |
1240 : | my ($self) = @_; | ||
1241 : | # Get the relation list from the metadata. | ||
1242 : | my $relationTable = $self->{_metaData}->{RelationTable}; | ||
1243 : | # Return the relation names. | ||
1244 : | return keys %{$relationTable}; | ||
1245 : | parrello | 1.1 | } |
1246 : | |||
1247 : | =head3 GetEntityTypes | ||
1248 : | |||
1249 : | parrello | 1.18 | C<< my @names = $erdb->GetEntityTypes; >> |
1250 : | parrello | 1.1 | |
1251 : | Return a list of the entity type names. | ||
1252 : | |||
1253 : | =cut | ||
1254 : | |||
1255 : | sub GetEntityTypes { | ||
1256 : | parrello | 1.10 | # Get the database object. |
1257 : | my ($self) = @_; | ||
1258 : | # Get the entity list from the metadata object. | ||
1259 : | my $entityList = $self->{_metaData}->{Entities}; | ||
1260 : | # Return the list of entity names in alphabetical order. | ||
1261 : | return sort keys %{$entityList}; | ||
1262 : | parrello | 1.1 | } |
1263 : | |||
1264 : | parrello | 1.73 | =head3 GetDataTypes |
1265 : | |||
1266 : | C<< my %types = ERDB::GetDataTypes(); >> | ||
1267 : | |||
1268 : | Return a table of ERDB data types. The table returned is a hash of hashes. | ||
1269 : | The keys of the big hash are the datatypes. Each smaller hash has several | ||
1270 : | values used to manage the data. The most interesting is the SQL type (key | ||
1271 : | C<sqlType>) and the descriptive node (key C<notes>). | ||
1272 : | |||
1273 : | Note that changing the values in the smaller hashes will seriously break | ||
1274 : | things, so this data should be treated as read-only. | ||
1275 : | |||
1276 : | =cut | ||
1277 : | |||
1278 : | sub GetDataTypes { | ||
1279 : | return %TypeTable; | ||
1280 : | } | ||
1281 : | |||
1282 : | |||
1283 : | parrello | 1.20 | =head3 IsEntity |
1284 : | |||
1285 : | C<< my $flag = $erdb->IsEntity($entityName); >> | ||
1286 : | |||
1287 : | Return TRUE if the parameter is an entity name, else FALSE. | ||
1288 : | |||
1289 : | =over 4 | ||
1290 : | |||
1291 : | =item entityName | ||
1292 : | |||
1293 : | Object name to be tested. | ||
1294 : | |||
1295 : | =item RETURN | ||
1296 : | |||
1297 : | Returns TRUE if the specified string is an entity name, else FALSE. | ||
1298 : | |||
1299 : | =back | ||
1300 : | |||
1301 : | =cut | ||
1302 : | |||
1303 : | sub IsEntity { | ||
1304 : | # Get the parameters. | ||
1305 : | my ($self, $entityName) = @_; | ||
1306 : | # Test to see if it's an entity. | ||
1307 : | return exists $self->{_metaData}->{Entities}->{$entityName}; | ||
1308 : | } | ||
1309 : | |||
1310 : | parrello | 1.1 | =head3 Get |
1311 : | |||
1312 : | parrello | 1.45 | C<< my $query = $erdb->Get(\@objectNames, $filterClause, \@params); >> |
1313 : | parrello | 1.1 | |
1314 : | This method returns a query object for entities of a specified type using a specified filter. | ||
1315 : | The filter is a standard WHERE/ORDER BY clause with question marks as parameter markers and each | ||
1316 : | field name represented in the form B<I<objectName>(I<fieldName>)>. For example, the | ||
1317 : | following call requests all B<Genome> objects for the genus specified in the variable | ||
1318 : | $genus. | ||
1319 : | |||
1320 : | parrello | 1.45 | C<< $query = $erdb->Get(['Genome'], "Genome(genus) = ?", [$genus]); >> |
1321 : | parrello | 1.1 | |
1322 : | The WHERE clause contains a single question mark, so there is a single additional | ||
1323 : | parameter representing the parameter value. It would also be possible to code | ||
1324 : | |||
1325 : | parrello | 1.18 | C<< $query = $erdb->Get(['Genome'], "Genome(genus) = \'$genus\'"); >> |
1326 : | parrello | 1.1 | |
1327 : | however, this version of the call would generate a syntax error if there were any quote | ||
1328 : | characters inside the variable C<$genus>. | ||
1329 : | |||
1330 : | The use of the strange parenthesized notation for field names enables us to distinguish | ||
1331 : | hyphens contained within field names from minus signs that participate in the computation | ||
1332 : | of the WHERE clause. All of the methods that manipulate fields will use this same notation. | ||
1333 : | |||
1334 : | It is possible to specify multiple entity and relationship names in order to retrieve more than | ||
1335 : | one object's data at the same time, which allows highly complex joined queries. For example, | ||
1336 : | |||
1337 : | parrello | 1.45 | C<< $query = $erdb->Get(['Genome', 'ComesFrom', 'Source'], "Genome(genus) = ?", [$genus]); >> |
1338 : | parrello | 1.1 | |
1339 : | If multiple names are specified, then the query processor will automatically determine a | ||
1340 : | join path between the entities and relationships. The algorithm used is very simplistic. | ||
1341 : | parrello | 1.39 | In particular, if a relationship is recursive, the path is determined by the order in which |
1342 : | the entity and the relationship appear. For example, consider a recursive relationship | ||
1343 : | B<IsParentOf> which relates B<People> objects to other B<People> objects. If the join path is | ||
1344 : | parrello | 1.1 | coded as C<['People', 'IsParentOf']>, then the people returned will be parents. If, however, |
1345 : | the join path is C<['IsParentOf', 'People']>, then the people returned will be children. | ||
1346 : | |||
1347 : | parrello | 1.39 | If an entity or relationship is mentioned twice, the name for the second occurrence will |
1348 : | be suffixed with C<2>, the third occurrence will be suffixed with C<3>, and so forth. So, | ||
1349 : | for example, if we have C<['Feature', 'HasContig', 'Contig', 'HasContig']>, then the | ||
1350 : | B<to-link> field of the first B<HasContig> is specified as C<HasContig(to-link)>, while | ||
1351 : | the B<to-link> field of the second B<HasContig> is specified as C<HasContig2(to-link)>. | ||
1352 : | |||
1353 : | parrello | 1.1 | =over 4 |
1354 : | |||
1355 : | =item objectNames | ||
1356 : | |||
1357 : | List containing the names of the entity and relationship objects to be retrieved. | ||
1358 : | |||
1359 : | =item filterClause | ||
1360 : | |||
1361 : | WHERE clause (without the WHERE) to be used to filter and sort the query. The WHERE clause can | ||
1362 : | be parameterized with parameter markers (C<?>). Each field used in the WHERE clause must be | ||
1363 : | specified in the standard form B<I<objectName>(I<fieldName>)>. Any parameters specified | ||
1364 : | in the filter clause should be added to the parameter list as additional parameters. The | ||
1365 : | fields in a filter clause can come from primary entity relations, relationship relations, | ||
1366 : | or secondary entity relations; however, all of the entities and relationships involved must | ||
1367 : | be included in the list of object names. | ||
1368 : | |||
1369 : | The filter clause can also specify a sort order. To do this, simply follow the filter string | ||
1370 : | with an ORDER BY clause. For example, the following filter string gets all genomes for a | ||
1371 : | particular genus and sorts them by species name. | ||
1372 : | |||
1373 : | C<< "Genome(genus) = ? ORDER BY Genome(species)" >> | ||
1374 : | |||
1375 : | parrello | 1.30 | Note that the case is important. Only an uppercase "ORDER BY" with a single space will |
1376 : | be processed. The idea is to make it less likely to find the verb by accident. | ||
1377 : | |||
1378 : | parrello | 1.1 | The rules for field references in a sort order are the same as those for field references in the |
1379 : | filter clause in general; however, odd things may happen if a sort field is from a secondary | ||
1380 : | relation. | ||
1381 : | |||
1382 : | parrello | 1.39 | Finally, you can limit the number of rows returned by adding a LIMIT clause. The LIMIT must |
1383 : | be the last thing in the filter clause, and it contains only the word "LIMIT" followed by | ||
1384 : | a positive number. So, for example | ||
1385 : | |||
1386 : | C<< "Genome(genus) = ? ORDER BY Genome(species) LIMIT 10" >> | ||
1387 : | |||
1388 : | will only return the first ten genomes for the specified genus. The ORDER BY clause is not | ||
1389 : | required. For example, to just get the first 10 genomes in the B<Genome> table, you could | ||
1390 : | use | ||
1391 : | |||
1392 : | C<< "LIMIT 10" >> | ||
1393 : | |||
1394 : | parrello | 1.45 | =item params |
1395 : | parrello | 1.1 | |
1396 : | parrello | 1.45 | Reference to a list of parameter values to be substituted into the filter clause. |
1397 : | parrello | 1.1 | |
1398 : | =item RETURN | ||
1399 : | |||
1400 : | Returns a B<DBQuery> that can be used to iterate through all of the results. | ||
1401 : | |||
1402 : | =back | ||
1403 : | |||
1404 : | =cut | ||
1405 : | |||
1406 : | sub Get { | ||
1407 : | parrello | 1.10 | # Get the parameters. |
1408 : | parrello | 1.45 | my ($self, $objectNames, $filterClause, $params) = @_; |
1409 : | # Process the SQL stuff. | ||
1410 : | my ($suffix, $mappedNameListRef, $mappedNameHashRef) = | ||
1411 : | $self->_SetupSQL($objectNames, $filterClause); | ||
1412 : | # Create the query. | ||
1413 : | my $command = "SELECT DISTINCT " . join(".*, ", @{$mappedNameListRef}) . | ||
1414 : | ".* $suffix"; | ||
1415 : | my $sth = $self->_GetStatementHandle($command, $params); | ||
1416 : | parrello | 1.39 | # Now we create the relation map, which enables DBQuery to determine the order, name |
1417 : | # and mapped name for each object in the query. | ||
1418 : | my @relationMap = (); | ||
1419 : | parrello | 1.45 | for my $mappedName (@{$mappedNameListRef}) { |
1420 : | push @relationMap, [$mappedName, $mappedNameHashRef->{$mappedName}]; | ||
1421 : | parrello | 1.39 | } |
1422 : | parrello | 1.10 | # Return the statement object. |
1423 : | parrello | 1.39 | my $retVal = DBQuery::_new($self, $sth, \@relationMap); |
1424 : | parrello | 1.10 | return $retVal; |
1425 : | parrello | 1.1 | } |
1426 : | |||
1427 : | parrello | 1.78 | |
1428 : | |||
1429 : | parrello | 1.70 | =head3 Search |
1430 : | |||
1431 : | C<< my $query = $erdb->Search($searchExpression, $idx, \@objectNames, $filterClause, \@params); >> | ||
1432 : | |||
1433 : | Perform a full text search with filtering. The search will be against a specified object | ||
1434 : | in the object name list. That object will get an extra field containing the search | ||
1435 : | relevance. Note that except for the search expression, the parameters of this method are | ||
1436 : | the same as those for L</Get> and follow the same rules. | ||
1437 : | |||
1438 : | =over 4 | ||
1439 : | |||
1440 : | =item searchExpression | ||
1441 : | |||
1442 : | parrello | 1.75 | Boolean search expression for the text fields of the target object. The default mode for |
1443 : | a Boolean search expression is OR, but we want the default to be AND, so we will | ||
1444 : | add a C<+> operator to each word with no other operator before it. | ||
1445 : | parrello | 1.70 | |
1446 : | =item idx | ||
1447 : | |||
1448 : | Index in the I<$objectNames> list of the table to be searched in full-text mode. | ||
1449 : | |||
1450 : | =item objectNames | ||
1451 : | |||
1452 : | List containing the names of the entity and relationship objects to be retrieved. | ||
1453 : | |||
1454 : | =item filterClause | ||
1455 : | |||
1456 : | WHERE clause (without the WHERE) to be used to filter and sort the query. The WHERE clause can | ||
1457 : | be parameterized with parameter markers (C<?>). Each field used in the WHERE clause must be | ||
1458 : | specified in the standard form B<I<objectName>(I<fieldName>)>. Any parameters specified | ||
1459 : | in the filter clause should be added to the parameter list as additional parameters. The | ||
1460 : | fields in a filter clause can come from primary entity relations, relationship relations, | ||
1461 : | or secondary entity relations; however, all of the entities and relationships involved must | ||
1462 : | be included in the list of object names. | ||
1463 : | |||
1464 : | =item params | ||
1465 : | |||
1466 : | Reference to a list of parameter values to be substituted into the filter clause. | ||
1467 : | |||
1468 : | =item RETURN | ||
1469 : | |||
1470 : | Returns a query object for the specified search. | ||
1471 : | |||
1472 : | =back | ||
1473 : | |||
1474 : | =cut | ||
1475 : | |||
1476 : | sub Search { | ||
1477 : | # Get the parameters. | ||
1478 : | my ($self, $searchExpression, $idx, $objectNames, $filterClause, $params) = @_; | ||
1479 : | # Declare the return variable. | ||
1480 : | my $retVal; | ||
1481 : | parrello | 1.72 | # Create a safety copy of the parameter list. Note we have to be careful to insure |
1482 : | # a parameter list exists before we copy it. | ||
1483 : | my @myParams = (); | ||
1484 : | if (defined $params) { | ||
1485 : | @myParams = @{$params}; | ||
1486 : | } | ||
1487 : | parrello | 1.70 | # Get the first object's structure so we have access to the searchable fields. |
1488 : | my $object1Name = $objectNames->[$idx]; | ||
1489 : | my $object1Structure = $self->_GetStructure($object1Name); | ||
1490 : | # Get the field list. | ||
1491 : | if (! exists $object1Structure->{searchFields}) { | ||
1492 : | Confess("No searchable index for $object1Name."); | ||
1493 : | } else { | ||
1494 : | # Get the field list. | ||
1495 : | my @fields = @{$object1Structure->{searchFields}}; | ||
1496 : | parrello | 1.71 | # Clean the search expression. |
1497 : | my $actualKeywords = $self->CleanKeywords($searchExpression); | ||
1498 : | parrello | 1.75 | # Prefix a "+" to each uncontrolled word. This converts the default |
1499 : | # search mode from OR to AND. | ||
1500 : | parrello | 1.80 | $actualKeywords =~ s/(^|\s)(\w|")/$1\+$2/g; |
1501 : | parrello | 1.72 | Trace("Actual keywords for search are\n$actualKeywords") if T(3); |
1502 : | parrello | 1.70 | # We need two match expressions, one for the filter clause and one in the |
1503 : | # query itself. Both will use a parameter mark, so we need to push the | ||
1504 : | # search expression onto the front of the parameter list twice. | ||
1505 : | parrello | 1.71 | unshift @myParams, $actualKeywords, $actualKeywords; |
1506 : | parrello | 1.70 | # Build the match expression. |
1507 : | my @matchFilterFields = map { "$object1Name." . _FixName($_) } @fields; | ||
1508 : | my $matchClause = "MATCH (" . join(", ", @matchFilterFields) . ") AGAINST (? IN BOOLEAN MODE)"; | ||
1509 : | # Process the SQL stuff. | ||
1510 : | my ($suffix, $mappedNameListRef, $mappedNameHashRef) = | ||
1511 : | $self->_SetupSQL($objectNames, $filterClause, $matchClause); | ||
1512 : | # Create the query. Note that the match clause is inserted at the front of | ||
1513 : | # the select fields. | ||
1514 : | my $command = "SELECT DISTINCT $matchClause, " . join(".*, ", @{$mappedNameListRef}) . | ||
1515 : | ".* $suffix"; | ||
1516 : | my $sth = $self->_GetStatementHandle($command, \@myParams); | ||
1517 : | # Now we create the relation map, which enables DBQuery to determine the order, name | ||
1518 : | # and mapped name for each object in the query. | ||
1519 : | my @relationMap = _RelationMap($mappedNameHashRef, $mappedNameListRef); | ||
1520 : | # Return the statement object. | ||
1521 : | $retVal = DBQuery::_new($self, $sth, \@relationMap, $object1Name); | ||
1522 : | } | ||
1523 : | return $retVal; | ||
1524 : | } | ||
1525 : | |||
1526 : | parrello | 1.45 | =head3 GetFlat |
1527 : | |||
1528 : | C<< my @list = $erdb->GetFlat(\@objectNames, $filterClause, \@parameterList, $field); >> | ||
1529 : | |||
1530 : | This is a variation of L</GetAll> that asks for only a single field per record and | ||
1531 : | returns a single flattened list. | ||
1532 : | |||
1533 : | =over 4 | ||
1534 : | |||
1535 : | =item objectNames | ||
1536 : | |||
1537 : | List containing the names of the entity and relationship objects to be retrieved. | ||
1538 : | |||
1539 : | =item filterClause | ||
1540 : | |||
1541 : | WHERE/ORDER BY clause (without the WHERE) to be used to filter and sort the query. The WHERE clause can | ||
1542 : | be parameterized with parameter markers (C<?>). Each field used must be specified in the standard form | ||
1543 : | B<I<objectName>(I<fieldName>)>. Any parameters specified in the filter clause should be added to the | ||
1544 : | parameter list as additional parameters. The fields in a filter clause can come from primary | ||
1545 : | entity relations, relationship relations, or secondary entity relations; however, all of the | ||
1546 : | entities and relationships involved must be included in the list of object names. | ||
1547 : | |||
1548 : | =item parameterList | ||
1549 : | |||
1550 : | List of the parameters to be substituted in for the parameters marks in the filter clause. | ||
1551 : | |||
1552 : | =item field | ||
1553 : | |||
1554 : | Name of the field to be used to get the elements of the list returned. | ||
1555 : | |||
1556 : | =item RETURN | ||
1557 : | |||
1558 : | Returns a list of values. | ||
1559 : | |||
1560 : | =back | ||
1561 : | |||
1562 : | =cut | ||
1563 : | #: Return Type @; | ||
1564 : | sub GetFlat { | ||
1565 : | # Get the parameters. | ||
1566 : | my ($self, $objectNames, $filterClause, $parameterList, $field) = @_; | ||
1567 : | # Construct the query. | ||
1568 : | my $query = $self->Get($objectNames, $filterClause, $parameterList); | ||
1569 : | # Create the result list. | ||
1570 : | my @retVal = (); | ||
1571 : | # Loop through the records, adding the field values found to the result list. | ||
1572 : | while (my $row = $query->Fetch()) { | ||
1573 : | push @retVal, $row->Value($field); | ||
1574 : | } | ||
1575 : | # Return the list created. | ||
1576 : | return @retVal; | ||
1577 : | } | ||
1578 : | |||
1579 : | parrello | 1.73 | =head3 SpecialFields |
1580 : | |||
1581 : | C<< my %specials = $erdb->SpecialFields($entityName); >> | ||
1582 : | |||
1583 : | Return a hash mapping special fields in the specified entity to the value of their | ||
1584 : | C<special> attribute. This enables the subclass to get access to the special field | ||
1585 : | attributes without needed to plumb the internal ERDB data structures. | ||
1586 : | |||
1587 : | =over 4 | ||
1588 : | |||
1589 : | =item entityName | ||
1590 : | |||
1591 : | Name of the entity whose special fields are desired. | ||
1592 : | |||
1593 : | =item RETURN | ||
1594 : | |||
1595 : | Returns a hash. The keys of the hash are the special field names, and the values | ||
1596 : | are the values from each special field's C<special> attribute. | ||
1597 : | |||
1598 : | =back | ||
1599 : | |||
1600 : | =cut | ||
1601 : | |||
1602 : | sub SpecialFields { | ||
1603 : | # Get the parameters. | ||
1604 : | my ($self, $entityName) = @_; | ||
1605 : | # Declare the return variable. | ||
1606 : | my %retVal = (); | ||
1607 : | # Find the entity's data structure. | ||
1608 : | parrello | 1.75 | my $entityData = $self->{_metaData}->{Entities}->{$entityName}; |
1609 : | parrello | 1.73 | # Loop through its fields, adding each special field to the return hash. |
1610 : | my $fieldHash = $entityData->{Fields}; | ||
1611 : | for my $fieldName (keys %{$fieldHash}) { | ||
1612 : | my $fieldData = $fieldHash->{$fieldName}; | ||
1613 : | if (exists $fieldData->{special}) { | ||
1614 : | $retVal{$fieldName} = $fieldData->{special}; | ||
1615 : | } | ||
1616 : | } | ||
1617 : | # Return the result. | ||
1618 : | return %retVal; | ||
1619 : | } | ||
1620 : | |||
1621 : | parrello | 1.32 | =head3 Delete |
1622 : | |||
1623 : | parrello | 1.78 | C<< my $stats = $erdb->Delete($entityName, $objectID, %options); >> |
1624 : | parrello | 1.32 | |
1625 : | Delete an entity instance from the database. The instance is deleted along with all entity and | ||
1626 : | parrello | 1.77 | relationship instances dependent on it. The definition of I<dependence> is recursive. |
1627 : | |||
1628 : | An object is always dependent on itself. An object is dependent if it is a 1-to-many or many-to-many | ||
1629 : | relationship connected to a dependent entity or if it is the "to" entity connected to a 1-to-many | ||
1630 : | parrello | 1.32 | dependent relationship. |
1631 : | |||
1632 : | =over 4 | ||
1633 : | |||
1634 : | =item entityName | ||
1635 : | |||
1636 : | Name of the entity type for the instance being deleted. | ||
1637 : | |||
1638 : | =item objectID | ||
1639 : | |||
1640 : | ID of the entity instance to be deleted. If the ID contains a wild card character (C<%>), | ||
1641 : | then it is presumed to by a LIKE pattern. | ||
1642 : | |||
1643 : | parrello | 1.78 | =item options |
1644 : | parrello | 1.32 | |
1645 : | parrello | 1.78 | A hash detailing the options for this delete operation. |
1646 : | parrello | 1.32 | |
1647 : | =item RETURN | ||
1648 : | |||
1649 : | Returns a statistics object indicating how many records of each particular table were | ||
1650 : | deleted. | ||
1651 : | |||
1652 : | =back | ||
1653 : | |||
1654 : | parrello | 1.78 | The permissible options for this method are as follows. |
1655 : | |||
1656 : | =over 4 | ||
1657 : | |||
1658 : | =item testMode | ||
1659 : | |||
1660 : | If TRUE, then the delete statements will be traced, but no changes will be made to the database. | ||
1661 : | |||
1662 : | =item keepRoot | ||
1663 : | |||
1664 : | If TRUE, then the entity instances will not be deleted, only the dependent records. | ||
1665 : | |||
1666 : | =back | ||
1667 : | |||
1668 : | parrello | 1.32 | =cut |
1669 : | #: Return Type $%; | ||
1670 : | sub Delete { | ||
1671 : | # Get the parameters. | ||
1672 : | parrello | 1.78 | my ($self, $entityName, $objectID, %options) = @_; |
1673 : | parrello | 1.32 | # Declare the return variable. |
1674 : | my $retVal = Stats->new(); | ||
1675 : | # Get the DBKernel object. | ||
1676 : | my $db = $self->{_dbh}; | ||
1677 : | # We're going to generate all the paths branching out from the starting entity. One of | ||
1678 : | # the things we have to be careful about is preventing loops. We'll use a hash to | ||
1679 : | # determine if we've hit a loop. | ||
1680 : | my %alreadyFound = (); | ||
1681 : | parrello | 1.33 | # These next lists will serve as our result stack. We start by pushing object lists onto |
1682 : | parrello | 1.32 | # the stack, and then popping them off to do the deletes. This means the deletes will |
1683 : | # start with the longer paths before getting to the shorter ones. That, in turn, makes | ||
1684 : | # sure we don't delete records that might be needed to forge relationships back to the | ||
1685 : | parrello | 1.33 | # original item. We have two lists-- one for TO-relationships, and one for |
1686 : | # FROM-relationships and entities. | ||
1687 : | my @fromPathList = (); | ||
1688 : | my @toPathList = (); | ||
1689 : | parrello | 1.78 | # This final list is used to remember what work still needs to be done. We push paths |
1690 : | parrello | 1.32 | # onto the list, then pop them off to extend the paths. We prime it with the starting |
1691 : | # point. Note that we will work hard to insure that the last item on a path in the | ||
1692 : | parrello | 1.75 | # to-do list is always an entity. |
1693 : | parrello | 1.32 | my @todoList = ([$entityName]); |
1694 : | while (@todoList) { | ||
1695 : | # Get the current path. | ||
1696 : | my $current = pop @todoList; | ||
1697 : | # Copy it into a list. | ||
1698 : | my @stackedPath = @{$current}; | ||
1699 : | # Pull off the last item on the path. It will always be an entity. | ||
1700 : | parrello | 1.78 | my $myEntityName = pop @stackedPath; |
1701 : | parrello | 1.32 | # Add it to the alreadyFound list. |
1702 : | parrello | 1.78 | $alreadyFound{$myEntityName} = 1; |
1703 : | # Figure out if we need to delete this entity. | ||
1704 : | if ($myEntityName ne $entityName || ! $options{keepRoot}) { | ||
1705 : | # Get the entity data. | ||
1706 : | my $entityData = $self->_GetStructure($myEntityName); | ||
1707 : | # Loop through the entity's relations. A DELETE command will be needed for each of them. | ||
1708 : | my $relations = $entityData->{Relations}; | ||
1709 : | for my $relation (keys %{$relations}) { | ||
1710 : | my @augmentedList = (@stackedPath, $relation); | ||
1711 : | push @fromPathList, \@augmentedList; | ||
1712 : | } | ||
1713 : | parrello | 1.32 | } |
1714 : | # Now we need to look for relationships connected to this entity. | ||
1715 : | my $relationshipList = $self->{_metaData}->{Relationships}; | ||
1716 : | for my $relationshipName (keys %{$relationshipList}) { | ||
1717 : | my $relationship = $relationshipList->{$relationshipName}; | ||
1718 : | # Check the FROM field. We're only interested if it's us. | ||
1719 : | parrello | 1.78 | if ($relationship->{from} eq $myEntityName) { |
1720 : | parrello | 1.32 | # Add the path to this relationship. |
1721 : | parrello | 1.78 | my @augmentedList = (@stackedPath, $myEntityName, $relationshipName); |
1722 : | parrello | 1.33 | push @fromPathList, \@augmentedList; |
1723 : | parrello | 1.32 | # Check the arity. If it's MM we're done. If it's 1M |
1724 : | # and the target hasn't been seen yet, we want to | ||
1725 : | # stack the entity for future processing. | ||
1726 : | if ($relationship->{arity} eq '1M') { | ||
1727 : | my $toEntity = $relationship->{to}; | ||
1728 : | if (! exists $alreadyFound{$toEntity}) { | ||
1729 : | # Here we have a new entity that's dependent on | ||
1730 : | # the current entity, so we need to stack it. | ||
1731 : | my @stackList = (@augmentedList, $toEntity); | ||
1732 : | parrello | 1.33 | push @fromPathList, \@stackList; |
1733 : | parrello | 1.34 | } else { |
1734 : | Trace("$toEntity ignored because it occurred previously.") if T(4); | ||
1735 : | parrello | 1.32 | } |
1736 : | } | ||
1737 : | } | ||
1738 : | # Now check the TO field. In this case only the relationship needs | ||
1739 : | parrello | 1.33 | # deletion. |
1740 : | parrello | 1.78 | if ($relationship->{to} eq $myEntityName) { |
1741 : | my @augmentedList = (@stackedPath, $myEntityName, $relationshipName); | ||
1742 : | parrello | 1.33 | push @toPathList, \@augmentedList; |
1743 : | parrello | 1.32 | } |
1744 : | } | ||
1745 : | } | ||
1746 : | # Create the first qualifier for the WHERE clause. This selects the | ||
1747 : | # keys of the primary entity records to be deleted. When we're deleting | ||
1748 : | parrello | 1.78 | # from a dependent table, we construct a join path from the first qualifier |
1749 : | parrello | 1.32 | # to the table containing the dependent records to delete. |
1750 : | my $qualifier = ($objectID =~ /%/ ? "LIKE ?" : "= ?"); | ||
1751 : | parrello | 1.33 | # We need to make two passes. The first is through the to-list, and |
1752 : | # the second through the from-list. The from-list is second because | ||
1753 : | # the to-list may need to pass through some of the entities the | ||
1754 : | # from-list would delete. | ||
1755 : | my %stackList = ( from_link => \@fromPathList, to_link => \@toPathList ); | ||
1756 : | # Now it's time to do the deletes. We do it in two passes. | ||
1757 : | for my $keyName ('to_link', 'from_link') { | ||
1758 : | # Get the list for this key. | ||
1759 : | my @pathList = @{$stackList{$keyName}}; | ||
1760 : | parrello | 1.34 | Trace(scalar(@pathList) . " entries in path list for $keyName.") if T(3); |
1761 : | parrello | 1.33 | # Loop through this list. |
1762 : | while (my $path = pop @pathList) { | ||
1763 : | # Get the table whose rows are to be deleted. | ||
1764 : | my @pathTables = @{$path}; | ||
1765 : | parrello | 1.37 | # Start the DELETE statement. We need to call DBKernel because the |
1766 : | # syntax of a DELETE-USING varies among DBMSs. | ||
1767 : | parrello | 1.33 | my $target = $pathTables[$#pathTables]; |
1768 : | parrello | 1.37 | my $stmt = $db->SetUsing(@pathTables); |
1769 : | parrello | 1.33 | # Now start the WHERE. The first thing is the ID field from the starting table. That |
1770 : | # starting table will either be the entity relation or one of the entity's | ||
1771 : | # sub-relations. | ||
1772 : | $stmt .= " WHERE $pathTables[0].id $qualifier"; | ||
1773 : | # Now we run through the remaining entities in the path, connecting them up. | ||
1774 : | for (my $i = 1; $i <= $#pathTables; $i += 2) { | ||
1775 : | # Connect the current relationship to the preceding entity. | ||
1776 : | my ($entity, $rel) = @pathTables[$i-1,$i]; | ||
1777 : | # The style of connection depends on the direction of the relationship. | ||
1778 : | parrello | 1.35 | $stmt .= " AND $entity.id = $rel.$keyName"; |
1779 : | parrello | 1.32 | if ($i + 1 <= $#pathTables) { |
1780 : | # Here there's a next entity, so connect that to the relationship's | ||
1781 : | # to-link. | ||
1782 : | my $entity2 = $pathTables[$i+1]; | ||
1783 : | parrello | 1.35 | $stmt .= " AND $rel.to_link = $entity2.id"; |
1784 : | parrello | 1.32 | } |
1785 : | } | ||
1786 : | parrello | 1.33 | # Now we have our desired DELETE statement. |
1787 : | parrello | 1.78 | if ($options{testMode}) { |
1788 : | parrello | 1.33 | # Here the user wants to trace without executing. |
1789 : | Trace($stmt) if T(0); | ||
1790 : | } else { | ||
1791 : | parrello | 1.77 | # Here we can delete. Note that the SQL method dies with a confession |
1792 : | parrello | 1.33 | # if an error occurs, so we just go ahead and do it. |
1793 : | parrello | 1.36 | Trace("Executing delete from $target using '$objectID'.") if T(3); |
1794 : | my $rv = $db->SQL($stmt, 0, $objectID); | ||
1795 : | parrello | 1.33 | # Accumulate the statistics for this delete. The only rows deleted |
1796 : | # are from the target table, so we use its name to record the | ||
1797 : | # statistic. | ||
1798 : | $retVal->Add($target, $rv); | ||
1799 : | } | ||
1800 : | parrello | 1.32 | } |
1801 : | } | ||
1802 : | # Return the result. | ||
1803 : | return $retVal; | ||
1804 : | } | ||
1805 : | |||
1806 : | parrello | 1.78 | =head3 Disconnect |
1807 : | |||
1808 : | C<< $erdb->Disconnect($relationshipName, $originEntityName, $originEntityID); >> | ||
1809 : | |||
1810 : | Disconnect an entity instance from all the objects to which it is related. This | ||
1811 : | will delete each relationship instance that connects to the specified entity. | ||
1812 : | |||
1813 : | =over 4 | ||
1814 : | |||
1815 : | =item relationshipName | ||
1816 : | |||
1817 : | Name of the relationship whose instances are to be deleted. | ||
1818 : | |||
1819 : | =item originEntityName | ||
1820 : | |||
1821 : | Name of the entity that is to be disconnected. | ||
1822 : | |||
1823 : | =item originEntityID | ||
1824 : | |||
1825 : | ID of the entity that is to be disconnected. | ||
1826 : | |||
1827 : | =back | ||
1828 : | |||
1829 : | =cut | ||
1830 : | |||
1831 : | sub Disconnect { | ||
1832 : | # Get the parameters. | ||
1833 : | my ($self, $relationshipName, $originEntityName, $originEntityID) = @_; | ||
1834 : | # Get the relationship descriptor. | ||
1835 : | my $structure = $self->_GetStructure($relationshipName); | ||
1836 : | # Insure we have a relationship. | ||
1837 : | if (! exists $structure->{from}) { | ||
1838 : | Confess("$relationshipName is not a relationship in the database."); | ||
1839 : | } else { | ||
1840 : | # Get the database handle. | ||
1841 : | my $dbh = $self->{_dbh}; | ||
1842 : | # We'll set this value to 1 if we find our entity. | ||
1843 : | my $found = 0; | ||
1844 : | # Loop through the ends of the relationship. | ||
1845 : | for my $dir ('from', 'to') { | ||
1846 : | if ($structure->{$dir} eq $originEntityName) { | ||
1847 : | # Delete all relationship instances on this side of the entity instance. | ||
1848 : | parrello | 1.84 | Trace("Disconnecting in $dir direction with ID \"$originEntityID\"."); |
1849 : | parrello | 1.78 | $dbh->SQL("DELETE FROM $relationshipName WHERE ${dir}_link = ?", 0, $originEntityID); |
1850 : | $found = 1; | ||
1851 : | } | ||
1852 : | } | ||
1853 : | # Insure we found the entity on at least one end. | ||
1854 : | if (! $found) { | ||
1855 : | Confess("Entity \"$originEntityName\" does not use $relationshipName."); | ||
1856 : | } | ||
1857 : | } | ||
1858 : | } | ||
1859 : | |||
1860 : | parrello | 1.79 | =head3 DeleteRow |
1861 : | |||
1862 : | C<< $erdb->DeleteRow($relationshipName, $fromLink, $toLink, \%values); >> | ||
1863 : | |||
1864 : | Delete a row from a relationship. In most cases, only the from-link and to-link are | ||
1865 : | needed; however, for relationships with intersection data values can be specified | ||
1866 : | for the other fields using a hash. | ||
1867 : | |||
1868 : | =over 4 | ||
1869 : | |||
1870 : | =item relationshipName | ||
1871 : | |||
1872 : | Name of the relationship from which the row is to be deleted. | ||
1873 : | |||
1874 : | =item fromLink | ||
1875 : | |||
1876 : | ID of the entity instance in the From direction. | ||
1877 : | |||
1878 : | =item toLink | ||
1879 : | |||
1880 : | ID of the entity instance in the To direction. | ||
1881 : | |||
1882 : | =item values | ||
1883 : | |||
1884 : | Reference to a hash of other values to be used for filtering the delete. | ||
1885 : | |||
1886 : | =back | ||
1887 : | |||
1888 : | =cut | ||
1889 : | |||
1890 : | sub DeleteRow { | ||
1891 : | # Get the parameters. | ||
1892 : | my ($self, $relationshipName, $fromLink, $toLink, $values) = @_; | ||
1893 : | # Create a hash of all the filter information. | ||
1894 : | my %filter = ('from-link' => $fromLink, 'to-link' => $toLink); | ||
1895 : | if (defined $values) { | ||
1896 : | for my $key (keys %{$values}) { | ||
1897 : | $filter{$key} = $values->{$key}; | ||
1898 : | } | ||
1899 : | } | ||
1900 : | # Build an SQL statement out of the hash. | ||
1901 : | my @filters = (); | ||
1902 : | my @parms = (); | ||
1903 : | for my $key (keys %filter) { | ||
1904 : | push @filters, _FixName($key) . " = ?"; | ||
1905 : | push @parms, $filter{$key}; | ||
1906 : | } | ||
1907 : | Trace("Parms for delete row are " . join(", ", map { "\"$_\"" } @parms) . ".") if T(SQL => 4); | ||
1908 : | my $command = "DELETE FROM $relationshipName WHERE " . | ||
1909 : | join(" AND ", @filters); | ||
1910 : | # Execute it. | ||
1911 : | my $dbh = $self->{_dbh}; | ||
1912 : | $dbh->SQL($command, undef, @parms); | ||
1913 : | } | ||
1914 : | |||
1915 : | parrello | 1.70 | =head3 SortNeeded |
1916 : | |||
1917 : | C<< my $parms = $erdb->SortNeeded($relationName); >> | ||
1918 : | |||
1919 : | Return the pipe command for the sort that should be applied to the specified | ||
1920 : | relation when creating the load file. | ||
1921 : | |||
1922 : | For example, if the load file should be sorted ascending by the first | ||
1923 : | field, this method would return | ||
1924 : | |||
1925 : | sort -k1 -t"\t" | ||
1926 : | |||
1927 : | If the first field is numeric, the method would return | ||
1928 : | |||
1929 : | sort -k1n -t"\t" | ||
1930 : | |||
1931 : | Unfortunately, due to a bug in the C<sort> command, we cannot eliminate duplicate | ||
1932 : | keys using a sort. | ||
1933 : | |||
1934 : | =over 4 | ||
1935 : | |||
1936 : | =item relationName | ||
1937 : | |||
1938 : | Name of the relation to be examined. | ||
1939 : | |||
1940 : | =item | ||
1941 : | |||
1942 : | Returns the sort command to use for sorting the relation, suitable for piping. | ||
1943 : | |||
1944 : | =back | ||
1945 : | |||
1946 : | =cut | ||
1947 : | #: Return Type $; | ||
1948 : | sub SortNeeded { | ||
1949 : | # Get the parameters. | ||
1950 : | my ($self, $relationName) = @_; | ||
1951 : | # Declare a descriptor to hold the names of the key fields. | ||
1952 : | my @keyNames = (); | ||
1953 : | # Get the relation structure. | ||
1954 : | my $relationData = $self->_FindRelation($relationName); | ||
1955 : | # Find out if the relation is a primary entity relation, | ||
1956 : | # a relationship relation, or a secondary entity relation. | ||
1957 : | my $entityTable = $self->{_metaData}->{Entities}; | ||
1958 : | my $relationshipTable = $self->{_metaData}->{Relationships}; | ||
1959 : | if (exists $entityTable->{$relationName}) { | ||
1960 : | # Here we have a primary entity relation. | ||
1961 : | push @keyNames, "id"; | ||
1962 : | } elsif (exists $relationshipTable->{$relationName}) { | ||
1963 : | # Here we have a relationship. We sort using the FROM index. | ||
1964 : | my $relationshipData = $relationshipTable->{$relationName}; | ||
1965 : | parrello | 1.75 | my $index = $relationData->{Indexes}->{idxFrom}; |
1966 : | parrello | 1.70 | push @keyNames, @{$index->{IndexFields}}; |
1967 : | } else { | ||
1968 : | # Here we have a secondary entity relation, so we have a sort on the ID field. | ||
1969 : | push @keyNames, "id"; | ||
1970 : | } | ||
1971 : | # Now we parse the key names into sort parameters. First, we prime the return | ||
1972 : | # string. | ||
1973 : | my $retVal = "sort -t\"\t\" "; | ||
1974 : | # Get the relation's field list. | ||
1975 : | my @fields = @{$relationData->{Fields}}; | ||
1976 : | # Loop through the keys. | ||
1977 : | for my $keyData (@keyNames) { | ||
1978 : | # Get the key and the ordering. | ||
1979 : | my ($keyName, $ordering); | ||
1980 : | if ($keyData =~ /^([^ ]+) DESC/) { | ||
1981 : | ($keyName, $ordering) = ($1, "descending"); | ||
1982 : | } else { | ||
1983 : | ($keyName, $ordering) = ($keyData, "ascending"); | ||
1984 : | } | ||
1985 : | # Find the key's position and type. | ||
1986 : | my $fieldSpec; | ||
1987 : | for (my $i = 0; $i <= $#fields && ! $fieldSpec; $i++) { | ||
1988 : | my $thisField = $fields[$i]; | ||
1989 : | if ($thisField->{name} eq $keyName) { | ||
1990 : | # Get the sort modifier for this field type. The modifier | ||
1991 : | # decides whether we're using a character, numeric, or | ||
1992 : | # floating-point sort. | ||
1993 : | my $modifier = $TypeTable{$thisField->{type}}->{sort}; | ||
1994 : | # If the index is descending for this field, denote we want | ||
1995 : | # to reverse the sort order on this field. | ||
1996 : | if ($ordering eq 'descending') { | ||
1997 : | $modifier .= "r"; | ||
1998 : | } | ||
1999 : | # Store the position and modifier into the field spec, which | ||
2000 : | # will stop the inner loop. Note that the field number is | ||
2001 : | # 1-based in the sort command, so we have to increment the | ||
2002 : | # index. | ||
2003 : | $fieldSpec = ($i + 1) . $modifier; | ||
2004 : | } | ||
2005 : | } | ||
2006 : | # Add this field to the sort command. | ||
2007 : | $retVal .= " -k$fieldSpec"; | ||
2008 : | } | ||
2009 : | # Return the result. | ||
2010 : | return $retVal; | ||
2011 : | } | ||
2012 : | |||
2013 : | parrello | 1.6 | =head3 GetList |
2014 : | |||
2015 : | parrello | 1.45 | C<< my @dbObjects = $erdb->GetList(\@objectNames, $filterClause, \@params); >> |
2016 : | parrello | 1.6 | |
2017 : | Return a list of object descriptors for the specified objects as determined by the | ||
2018 : | specified filter clause. | ||
2019 : | |||
2020 : | This method is essentially the same as L</Get> except it returns a list of objects rather | ||
2021 : | parrello | 1.7 | than a query object that can be used to get the results one record at a time. |
2022 : | parrello | 1.6 | |
2023 : | =over 4 | ||
2024 : | |||
2025 : | =item objectNames | ||
2026 : | |||
2027 : | List containing the names of the entity and relationship objects to be retrieved. | ||
2028 : | |||
2029 : | =item filterClause | ||
2030 : | |||
2031 : | WHERE clause (without the WHERE) to be used to filter and sort the query. The WHERE clause can | ||
2032 : | be parameterized with parameter markers (C<?>). Each field used in the WHERE clause must be | ||
2033 : | specified in the standard form B<I<objectName>(I<fieldName>)>. Any parameters specified | ||
2034 : | in the filter clause should be added to the parameter list as additional parameters. The | ||
2035 : | fields in a filter clause can come from primary entity relations, relationship relations, | ||
2036 : | or secondary entity relations; however, all of the entities and relationships involved must | ||
2037 : | be included in the list of object names. | ||
2038 : | |||
2039 : | The filter clause can also specify a sort order. To do this, simply follow the filter string | ||
2040 : | with an ORDER BY clause. For example, the following filter string gets all genomes for a | ||
2041 : | particular genus and sorts them by species name. | ||
2042 : | |||
2043 : | C<< "Genome(genus) = ? ORDER BY Genome(species)" >> | ||
2044 : | |||
2045 : | The rules for field references in a sort order are the same as those for field references in the | ||
2046 : | filter clause in general; however, odd things may happen if a sort field is from a secondary | ||
2047 : | relation. | ||
2048 : | |||
2049 : | parrello | 1.45 | =item params |
2050 : | parrello | 1.6 | |
2051 : | parrello | 1.45 | Reference to a list of parameter values to be substituted into the filter clause. |
2052 : | parrello | 1.6 | |
2053 : | =item RETURN | ||
2054 : | |||
2055 : | Returns a list of B<DBObject>s that satisfy the query conditions. | ||
2056 : | |||
2057 : | =back | ||
2058 : | |||
2059 : | =cut | ||
2060 : | #: Return Type @% | ||
2061 : | sub GetList { | ||
2062 : | # Get the parameters. | ||
2063 : | parrello | 1.45 | my ($self, $objectNames, $filterClause, $params) = @_; |
2064 : | parrello | 1.10 | # Declare the return variable. |
2065 : | my @retVal = (); | ||
2066 : | # Perform the query. | ||
2067 : | parrello | 1.45 | my $query = $self->Get($objectNames, $filterClause, $params); |
2068 : | parrello | 1.10 | # Loop through the results. |
2069 : | while (my $object = $query->Fetch) { | ||
2070 : | push @retVal, $object; | ||
2071 : | } | ||
2072 : | parrello | 1.6 | # Return the result. |
2073 : | return @retVal; | ||
2074 : | } | ||
2075 : | |||
2076 : | parrello | 1.45 | =head3 GetCount |
2077 : | |||
2078 : | C<< my $count = $erdb->GetCount(\@objectNames, $filter, \@params); >> | ||
2079 : | |||
2080 : | Return the number of rows found by a specified query. This method would | ||
2081 : | normally be used to count the records in a single table. For example, in a | ||
2082 : | genetics database | ||
2083 : | |||
2084 : | my $count = $erdb->GetCount(['Genome'], 'Genome(genus-species) LIKE ?', ['homo %']); | ||
2085 : | |||
2086 : | would return the number of genomes for the genus I<homo>. It is conceivable, however, | ||
2087 : | to use it to return records based on a join. For example, | ||
2088 : | |||
2089 : | parrello | 1.47 | my $count = $erdb->GetCount(['HasFeature', 'Genome'], 'Genome(genus-species) LIKE ?', |
2090 : | parrello | 1.45 | ['homo %']); |
2091 : | |||
2092 : | would return the number of features for genomes in the genus I<homo>. Note that | ||
2093 : | only the rows from the first table are counted. If the above command were | ||
2094 : | |||
2095 : | my $count = $erdb->GetCount(['Genome', 'Feature'], 'Genome(genus-species) LIKE ?', | ||
2096 : | ['homo %']); | ||
2097 : | |||
2098 : | it would return the number of genomes, not the number of genome/feature pairs. | ||
2099 : | |||
2100 : | =over 4 | ||
2101 : | |||
2102 : | =item objectNames | ||
2103 : | |||
2104 : | Reference to a list of the objects (entities and relationships) included in the | ||
2105 : | query. | ||
2106 : | |||
2107 : | =item filter | ||
2108 : | |||
2109 : | A filter clause for restricting the query. The rules are the same as for the L</Get> | ||
2110 : | method. | ||
2111 : | |||
2112 : | =item params | ||
2113 : | |||
2114 : | Reference to a list of the parameter values to be substituted for the parameter marks | ||
2115 : | in the filter. | ||
2116 : | |||
2117 : | =item RETURN | ||
2118 : | |||
2119 : | Returns a count of the number of records in the first table that would satisfy | ||
2120 : | the query. | ||
2121 : | |||
2122 : | =back | ||
2123 : | |||
2124 : | =cut | ||
2125 : | |||
2126 : | sub GetCount { | ||
2127 : | # Get the parameters. | ||
2128 : | my ($self, $objectNames, $filter, $params) = @_; | ||
2129 : | parrello | 1.66 | # Insure the params argument is an array reference if the caller left it off. |
2130 : | if (! defined($params)) { | ||
2131 : | $params = []; | ||
2132 : | } | ||
2133 : | parrello | 1.45 | # Declare the return variable. |
2134 : | my $retVal; | ||
2135 : | parrello | 1.47 | # Find out if we're counting an entity or a relationship. |
2136 : | my $countedField; | ||
2137 : | if ($self->IsEntity($objectNames->[0])) { | ||
2138 : | $countedField = "id"; | ||
2139 : | } else { | ||
2140 : | # For a relationship we count the to-link because it's usually more | ||
2141 : | # numerous. Note we're automatically converting to the SQL form | ||
2142 : | # of the field name (to_link vs. to-link). | ||
2143 : | $countedField = "to_link"; | ||
2144 : | } | ||
2145 : | parrello | 1.45 | # Create the SQL command suffix to get the desired records. |
2146 : | my ($suffix, $mappedNameListRef, $mappedNameHashRef) = $self->_SetupSQL($objectNames, | ||
2147 : | $filter); | ||
2148 : | # Prefix it with text telling it we want a record count. | ||
2149 : | my $firstObject = $mappedNameListRef->[0]; | ||
2150 : | parrello | 1.47 | my $command = "SELECT COUNT($firstObject.$countedField) $suffix"; |
2151 : | parrello | 1.45 | # Prepare and execute the command. |
2152 : | my $sth = $self->_GetStatementHandle($command, $params); | ||
2153 : | # Get the count value. | ||
2154 : | ($retVal) = $sth->fetchrow_array(); | ||
2155 : | # Check for a problem. | ||
2156 : | if (! defined($retVal)) { | ||
2157 : | if ($sth->err) { | ||
2158 : | # Here we had an SQL error. | ||
2159 : | Confess("Error retrieving row count: " . $sth->errstr()); | ||
2160 : | } else { | ||
2161 : | # Here we have no result. | ||
2162 : | Confess("No result attempting to retrieve row count."); | ||
2163 : | } | ||
2164 : | } | ||
2165 : | # Return the result. | ||
2166 : | return $retVal; | ||
2167 : | } | ||
2168 : | |||
2169 : | parrello | 1.1 | =head3 ComputeObjectSentence |
2170 : | |||
2171 : | parrello | 1.18 | C<< my $sentence = $erdb->ComputeObjectSentence($objectName); >> |
2172 : | parrello | 1.1 | |
2173 : | Check an object name, and if it is a relationship convert it to a relationship sentence. | ||
2174 : | |||
2175 : | =over 4 | ||
2176 : | |||
2177 : | =item objectName | ||
2178 : | |||
2179 : | Name of the entity or relationship. | ||
2180 : | |||
2181 : | =item RETURN | ||
2182 : | |||
2183 : | Returns a string containing the entity name or a relationship sentence. | ||
2184 : | |||
2185 : | =back | ||
2186 : | |||
2187 : | =cut | ||
2188 : | |||
2189 : | sub ComputeObjectSentence { | ||
2190 : | parrello | 1.10 | # Get the parameters. |
2191 : | my ($self, $objectName) = @_; | ||
2192 : | # Set the default return value. | ||
2193 : | my $retVal = $objectName; | ||
2194 : | # Look for the object as a relationship. | ||
2195 : | my $relTable = $self->{_metaData}->{Relationships}; | ||
2196 : | if (exists $relTable->{$objectName}) { | ||
2197 : | # Get the relationship sentence. | ||
2198 : | $retVal = _ComputeRelationshipSentence($objectName, $relTable->{$objectName}); | ||
2199 : | } | ||
2200 : | # Return the result. | ||
2201 : | return $retVal; | ||
2202 : | parrello | 1.1 | } |
2203 : | |||
2204 : | =head3 DumpRelations | ||
2205 : | |||
2206 : | parrello | 1.18 | C<< $erdb->DumpRelations($outputDirectory); >> |
2207 : | parrello | 1.1 | |
2208 : | Write the contents of all the relations to tab-delimited files in the specified directory. | ||
2209 : | Each file will have the same name as the relation dumped, with an extension of DTX. | ||
2210 : | |||
2211 : | =over 4 | ||
2212 : | |||
2213 : | =item outputDirectory | ||
2214 : | |||
2215 : | Name of the directory into which the relation files should be dumped. | ||
2216 : | |||
2217 : | =back | ||
2218 : | |||
2219 : | =cut | ||
2220 : | |||
2221 : | sub DumpRelations { | ||
2222 : | parrello | 1.10 | # Get the parameters. |
2223 : | my ($self, $outputDirectory) = @_; | ||
2224 : | # Now we need to run through all the relations. First, we loop through the entities. | ||
2225 : | my $metaData = $self->{_metaData}; | ||
2226 : | my $entities = $metaData->{Entities}; | ||
2227 : | for my $entityName (keys %{$entities}) { | ||
2228 : | my $entityStructure = $entities->{$entityName}; | ||
2229 : | # Get the entity's relations. | ||
2230 : | my $relationList = $entityStructure->{Relations}; | ||
2231 : | # Loop through the relations, dumping them. | ||
2232 : | for my $relationName (keys %{$relationList}) { | ||
2233 : | my $relation = $relationList->{$relationName}; | ||
2234 : | $self->_DumpRelation($outputDirectory, $relationName, $relation); | ||
2235 : | } | ||
2236 : | } | ||
2237 : | # Next, we loop through the relationships. | ||
2238 : | my $relationships = $metaData->{Relationships}; | ||
2239 : | for my $relationshipName (keys %{$relationships}) { | ||
2240 : | my $relationshipStructure = $relationships->{$relationshipName}; | ||
2241 : | # Dump this relationship's relation. | ||
2242 : | $self->_DumpRelation($outputDirectory, $relationshipName, $relationshipStructure->{Relations}->{$relationshipName}); | ||
2243 : | } | ||
2244 : | parrello | 1.1 | } |
2245 : | |||
2246 : | parrello | 1.51 | =head3 InsertValue |
2247 : | |||
2248 : | C<< $erdb->InsertValue($entityID, $fieldName, $value); >> | ||
2249 : | |||
2250 : | This method will insert a new value into the database. The value must be one | ||
2251 : | associated with a secondary relation, since primary values cannot be inserted: | ||
2252 : | they occur exactly once. Secondary values, on the other hand, can be missing | ||
2253 : | or multiply-occurring. | ||
2254 : | |||
2255 : | =over 4 | ||
2256 : | |||
2257 : | =item entityID | ||
2258 : | |||
2259 : | ID of the object that is to receive the new value. | ||
2260 : | |||
2261 : | =item fieldName | ||
2262 : | |||
2263 : | Field name for the new value-- this includes the entity name, since | ||
2264 : | field names are of the format I<objectName>C<(>I<fieldName>C<)>. | ||
2265 : | |||
2266 : | =item value | ||
2267 : | |||
2268 : | New value to be put in the field. | ||
2269 : | |||
2270 : | =back | ||
2271 : | |||
2272 : | =cut | ||
2273 : | |||
2274 : | sub InsertValue { | ||
2275 : | # Get the parameters. | ||
2276 : | my ($self, $entityID, $fieldName, $value) = @_; | ||
2277 : | # Parse the entity name and the real field name. | ||
2278 : | if ($fieldName =~ /^([^(]+)\(([^)]+)\)/) { | ||
2279 : | my $entityName = $1; | ||
2280 : | my $fieldTitle = $2; | ||
2281 : | # Get its descriptor. | ||
2282 : | if (!$self->IsEntity($entityName)) { | ||
2283 : | Confess("$entityName is not a valid entity."); | ||
2284 : | } else { | ||
2285 : | my $entityData = $self->{_metaData}->{Entities}->{$entityName}; | ||
2286 : | # Find the relation containing this field. | ||
2287 : | my $fieldHash = $entityData->{Fields}; | ||
2288 : | parrello | 1.52 | if (! exists $fieldHash->{$fieldTitle}) { |
2289 : | parrello | 1.51 | Confess("$fieldTitle not found in $entityName."); |
2290 : | } else { | ||
2291 : | my $relation = $fieldHash->{$fieldTitle}->{relation}; | ||
2292 : | if ($relation eq $entityName) { | ||
2293 : | Confess("Cannot do InsertValue on primary field $fieldTitle of $entityName."); | ||
2294 : | } else { | ||
2295 : | # Now we can create an INSERT statement. | ||
2296 : | my $dbh = $self->{_dbh}; | ||
2297 : | my $fixedName = _FixName($fieldTitle); | ||
2298 : | parrello | 1.53 | my $statement = "INSERT INTO $relation (id, $fixedName) VALUES(?, ?)"; |
2299 : | parrello | 1.51 | # Execute the command. |
2300 : | $dbh->SQL($statement, 0, $entityID, $value); | ||
2301 : | } | ||
2302 : | } | ||
2303 : | } | ||
2304 : | } else { | ||
2305 : | Confess("$fieldName is not a valid field name."); | ||
2306 : | } | ||
2307 : | } | ||
2308 : | |||
2309 : | parrello | 1.1 | =head3 InsertObject |
2310 : | |||
2311 : | parrello | 1.78 | C<< $erdb->InsertObject($objectType, \%fieldHash); >> |
2312 : | parrello | 1.1 | |
2313 : | Insert an object into the database. The object is defined by a type name and then a hash | ||
2314 : | of field names to values. Field values in the primary relation are represented by scalars. | ||
2315 : | (Note that for relationships, the primary relation is the B<only> relation.) | ||
2316 : | Field values for the other relations comprising the entity are always list references. For | ||
2317 : | example, the following line inserts an inactive PEG feature named C<fig|188.1.peg.1> with aliases | ||
2318 : | C<ZP_00210270.1> and C<gi|46206278>. | ||
2319 : | |||
2320 : | parrello | 1.18 | C<< $erdb->InsertObject('Feature', { id => 'fig|188.1.peg.1', active => 0, feature-type => 'peg', alias => ['ZP_00210270.1', 'gi|46206278']}); >> |
2321 : | parrello | 1.1 | |
2322 : | The next statement inserts a C<HasProperty> relationship between feature C<fig|158879.1.peg.1> and | ||
2323 : | property C<4> with an evidence URL of C<http://seedu.uchicago.edu/query.cgi?article_id=142>. | ||
2324 : | |||
2325 : | parrello | 1.57 | C<< $erdb->InsertObject('HasProperty', { 'from-link' => 'fig|158879.1.peg.1', 'to-link' => 4, evidence => 'http://seedu.uchicago.edu/query.cgi?article_id=142'}); >> |
2326 : | parrello | 1.1 | |
2327 : | =over 4 | ||
2328 : | |||
2329 : | =item newObjectType | ||
2330 : | |||
2331 : | Type name of the object to insert. | ||
2332 : | |||
2333 : | =item fieldHash | ||
2334 : | |||
2335 : | Hash of field names to values. | ||
2336 : | |||
2337 : | =back | ||
2338 : | |||
2339 : | =cut | ||
2340 : | |||
2341 : | sub InsertObject { | ||
2342 : | parrello | 1.10 | # Get the parameters. |
2343 : | my ($self, $newObjectType, $fieldHash) = @_; | ||
2344 : | # Denote that so far we appear successful. | ||
2345 : | my $retVal = 1; | ||
2346 : | # Get the database handle. | ||
2347 : | my $dbh = $self->{_dbh}; | ||
2348 : | # Get the relation list. | ||
2349 : | my $relationTable = $self->_GetRelationTable($newObjectType); | ||
2350 : | # Loop through the relations. We'll build insert statements for each one. If a relation is | ||
2351 : | # secondary, we may end up generating multiple insert statements. If an error occurs, we | ||
2352 : | # stop the loop. | ||
2353 : | my @relationList = keys %{$relationTable}; | ||
2354 : | for (my $i = 0; $retVal && $i <= $#relationList; $i++) { | ||
2355 : | my $relationName = $relationList[$i]; | ||
2356 : | my $relationDefinition = $relationTable->{$relationName}; | ||
2357 : | # Get the relation's fields. For each field we will collect a value in the corresponding | ||
2358 : | # position of the @valueList array. If one of the fields is missing, we will add it to the | ||
2359 : | # @missing list. | ||
2360 : | my @fieldList = @{$relationDefinition->{Fields}}; | ||
2361 : | my @fieldNameList = (); | ||
2362 : | my @valueList = (); | ||
2363 : | my @missing = (); | ||
2364 : | my $recordCount = 1; | ||
2365 : | for my $fieldDescriptor (@fieldList) { | ||
2366 : | # Get the field name and save it. Note we need to fix it up so the hyphens | ||
2367 : | # are converted to underscores. | ||
2368 : | my $fieldName = $fieldDescriptor->{name}; | ||
2369 : | push @fieldNameList, _FixName($fieldName); | ||
2370 : | # Look for the named field in the incoming structure. Note that we are looking | ||
2371 : | # for the real field name, not the fixed-up one! | ||
2372 : | if (exists $fieldHash->{$fieldName}) { | ||
2373 : | # Here we found the field. Stash it in the value list. | ||
2374 : | my $value = $fieldHash->{$fieldName}; | ||
2375 : | push @valueList, $value; | ||
2376 : | # If the value is a list, we may need to increment the record count. | ||
2377 : | if (ref $value eq "ARRAY") { | ||
2378 : | my $thisCount = @{$value}; | ||
2379 : | if ($recordCount == 1) { | ||
2380 : | # Here we have our first list, so we save its count. | ||
2381 : | $recordCount = $thisCount; | ||
2382 : | } elsif ($recordCount != $thisCount) { | ||
2383 : | # Here we have a second list, so its length has to match the | ||
2384 : | # previous lists. | ||
2385 : | Trace("Field $value in new $newObjectType object has an invalid list length $thisCount. Expected $recordCount.") if T(0); | ||
2386 : | $retVal = 0; | ||
2387 : | } | ||
2388 : | } | ||
2389 : | } else { | ||
2390 : | # Here the field is not present. Flag it as missing. | ||
2391 : | push @missing, $fieldName; | ||
2392 : | } | ||
2393 : | } | ||
2394 : | # If we are the primary relation, add the new-record flag. | ||
2395 : | if ($relationName eq $newObjectType) { | ||
2396 : | push @valueList, 1; | ||
2397 : | push @fieldNameList, "new_record"; | ||
2398 : | } | ||
2399 : | # Only proceed if there are no missing fields. | ||
2400 : | if (@missing > 0) { | ||
2401 : | Trace("Relation $relationName for $newObjectType skipped due to missing fields: " . | ||
2402 : | join(' ', @missing)) if T(1); | ||
2403 : | } else { | ||
2404 : | # Build the INSERT statement. | ||
2405 : | my $statement = "INSERT INTO $relationName (" . join (', ', @fieldNameList) . | ||
2406 : | ") VALUES ("; | ||
2407 : | # Create a marker list of the proper size and put it in the statement. | ||
2408 : | my @markers = (); | ||
2409 : | while (@markers < @fieldNameList) { push @markers, '?'; } | ||
2410 : | $statement .= join(', ', @markers) . ")"; | ||
2411 : | # We have the insert statement, so prepare it. | ||
2412 : | my $sth = $dbh->prepare_command($statement); | ||
2413 : | Trace("Insert statement prepared: $statement") if T(3); | ||
2414 : | # Now we loop through the values. If a value is scalar, we use it unmodified. If it's | ||
2415 : | # a list, we use the current element. The values are stored in the @parameterList array. | ||
2416 : | my $done = 0; | ||
2417 : | for (my $i = 0; $i < $recordCount; $i++) { | ||
2418 : | # Clear the parameter list array. | ||
2419 : | my @parameterList = (); | ||
2420 : | # Loop through the values. | ||
2421 : | for my $value (@valueList) { | ||
2422 : | # Check to see if this is a scalar value. | ||
2423 : | if (ref $value eq "ARRAY") { | ||
2424 : | # Here we have a list value. Pull the current entry. | ||
2425 : | push @parameterList, $value->[$i]; | ||
2426 : | } else { | ||
2427 : | # Here we have a scalar value. Use it unmodified. | ||
2428 : | push @parameterList, $value; | ||
2429 : | } | ||
2430 : | } | ||
2431 : | # Execute the INSERT statement with the specified parameter list. | ||
2432 : | $retVal = $sth->execute(@parameterList); | ||
2433 : | if (!$retVal) { | ||
2434 : | my $errorString = $sth->errstr(); | ||
2435 : | parrello | 1.78 | Confess("Error inserting into $relationName: $errorString"); |
2436 : | parrello | 1.10 | } |
2437 : | } | ||
2438 : | } | ||
2439 : | } | ||
2440 : | parrello | 1.78 | # Return a 1 for backward compatability. |
2441 : | return 1; | ||
2442 : | } | ||
2443 : | |||
2444 : | =head3 UpdateEntity | ||
2445 : | |||
2446 : | C<< $erdb->UpdateEntity($entityName, $id, \%fields); >> | ||
2447 : | |||
2448 : | Update the values of an entity. This is an unprotected update, so it should only be | ||
2449 : | done if the database resides on a database server. | ||
2450 : | |||
2451 : | =over 4 | ||
2452 : | |||
2453 : | =item entityName | ||
2454 : | |||
2455 : | Name of the entity to update. (This is the entity type.) | ||
2456 : | |||
2457 : | =item id | ||
2458 : | |||
2459 : | ID of the entity to update. If no entity exists with this ID, an error will be thrown. | ||
2460 : | |||
2461 : | =item fields | ||
2462 : | |||
2463 : | Reference to a hash mapping field names to their new values. All of the fields named | ||
2464 : | must be in the entity's primary relation, and they cannot any of them be the ID field. | ||
2465 : | |||
2466 : | =back | ||
2467 : | |||
2468 : | =cut | ||
2469 : | |||
2470 : | sub UpdateEntity { | ||
2471 : | # Get the parameters. | ||
2472 : | my ($self, $entityName, $id, $fields) = @_; | ||
2473 : | # Get a list of the field names being updated. | ||
2474 : | my @fieldList = keys %{$fields}; | ||
2475 : | # Verify that the fields exist. | ||
2476 : | my $checker = $self->GetFieldTable($entityName); | ||
2477 : | for my $field (@fieldList) { | ||
2478 : | if ($field eq 'id') { | ||
2479 : | Confess("Cannot update the ID field for entity $entityName."); | ||
2480 : | } elsif ($checker->{$field}->{relation} ne $entityName) { | ||
2481 : | Confess("Cannot find $field in primary relation of $entityName."); | ||
2482 : | } | ||
2483 : | } | ||
2484 : | # Build the SQL statement. | ||
2485 : | my @sets = (); | ||
2486 : | my @valueList = (); | ||
2487 : | for my $field (@fieldList) { | ||
2488 : | push @sets, _FixName($field) . " = ?"; | ||
2489 : | push @valueList, $fields->{$field}; | ||
2490 : | } | ||
2491 : | my $command = "UPDATE $entityName SET " . join(", ", @sets) . " WHERE id = ?"; | ||
2492 : | # Add the ID to the list of binding values. | ||
2493 : | push @valueList, $id; | ||
2494 : | # Call SQL to do the work. | ||
2495 : | my $rows = $self->{_dbh}->SQL($command, 0, @valueList); | ||
2496 : | # Check for errors. | ||
2497 : | if ($rows == 0) { | ||
2498 : | Confess("Entity $id of type $entityName not found."); | ||
2499 : | } | ||
2500 : | parrello | 1.1 | } |
2501 : | |||
2502 : | =head3 LoadTable | ||
2503 : | |||
2504 : | parrello | 1.78 | C<< my $results = $erdb->LoadTable($fileName, $relationName, $truncateFlag); >> |
2505 : | parrello | 1.1 | |
2506 : | parrello | 1.9 | Load data from a tab-delimited file into a specified table, optionally re-creating the table |
2507 : | first. | ||
2508 : | parrello | 1.1 | |
2509 : | =over 4 | ||
2510 : | |||
2511 : | =item fileName | ||
2512 : | |||
2513 : | Name of the file from which the table data should be loaded. | ||
2514 : | |||
2515 : | =item relationName | ||
2516 : | |||
2517 : | Name of the relation to be loaded. This is the same as the table name. | ||
2518 : | |||
2519 : | =item truncateFlag | ||
2520 : | |||
2521 : | TRUE if the table should be dropped and re-created, else FALSE | ||
2522 : | |||
2523 : | =item RETURN | ||
2524 : | |||
2525 : | parrello | 1.28 | Returns a statistical object containing a list of the error messages. |
2526 : | parrello | 1.1 | |
2527 : | =back | ||
2528 : | |||
2529 : | =cut | ||
2530 : | sub LoadTable { | ||
2531 : | parrello | 1.10 | # Get the parameters. |
2532 : | my ($self, $fileName, $relationName, $truncateFlag) = @_; | ||
2533 : | # Create the statistical return object. | ||
2534 : | my $retVal = _GetLoadStats(); | ||
2535 : | # Trace the fact of the load. | ||
2536 : | Trace("Loading table $relationName from $fileName") if T(2); | ||
2537 : | # Get the database handle. | ||
2538 : | my $dbh = $self->{_dbh}; | ||
2539 : | parrello | 1.22 | # Get the input file size. |
2540 : | my $fileSize = -s $fileName; | ||
2541 : | parrello | 1.10 | # Get the relation data. |
2542 : | my $relation = $self->_FindRelation($relationName); | ||
2543 : | # Check the truncation flag. | ||
2544 : | if ($truncateFlag) { | ||
2545 : | Trace("Creating table $relationName") if T(2); | ||
2546 : | parrello | 1.19 | # Compute the row count estimate. We take the size of the load file, |
2547 : | # divide it by the estimated row size, and then multiply by 1.5 to | ||
2548 : | # leave extra room. We postulate a minimum row count of 1000 to | ||
2549 : | # prevent problems with incoming empty load files. | ||
2550 : | my $rowSize = $self->EstimateRowSize($relationName); | ||
2551 : | my $estimate = FIG::max($fileSize * 1.5 / $rowSize, 1000); | ||
2552 : | parrello | 1.10 | # Re-create the table without its index. |
2553 : | parrello | 1.19 | $self->CreateTable($relationName, 0, $estimate); |
2554 : | parrello | 1.10 | # If this is a pre-index DBMS, create the index here. |
2555 : | if ($dbh->{_preIndex}) { | ||
2556 : | eval { | ||
2557 : | $self->CreateIndex($relationName); | ||
2558 : | }; | ||
2559 : | if ($@) { | ||
2560 : | $retVal->AddMessage($@); | ||
2561 : | } | ||
2562 : | } | ||
2563 : | } | ||
2564 : | parrello | 1.3 | # Load the table. |
2565 : | parrello | 1.10 | my $rv; |
2566 : | eval { | ||
2567 : | parrello | 1.20 | $rv = $dbh->load_table(file => $fileName, tbl => $relationName); |
2568 : | parrello | 1.10 | }; |
2569 : | if (!defined $rv) { | ||
2570 : | parrello | 1.3 | $retVal->AddMessage($@) if ($@); |
2571 : | parrello | 1.73 | $retVal->AddMessage("Table load failed for $relationName using $fileName: " . $dbh->error_message); |
2572 : | parrello | 1.10 | Trace("Table load failed for $relationName.") if T(1); |
2573 : | } else { | ||
2574 : | parrello | 1.22 | # Here we successfully loaded the table. |
2575 : | $retVal->Add("tables"); | ||
2576 : | my $size = -s $fileName; | ||
2577 : | Trace("$size bytes loaded into $relationName.") if T(2); | ||
2578 : | parrello | 1.10 | # If we're rebuilding, we need to create the table indexes. |
2579 : | parrello | 1.70 | if ($truncateFlag) { |
2580 : | # Indexes are created here for PostGres. For PostGres, indexes are | ||
2581 : | # best built at the end. For MySQL, the reverse is true. | ||
2582 : | if (! $dbh->{_preIndex}) { | ||
2583 : | eval { | ||
2584 : | $self->CreateIndex($relationName); | ||
2585 : | }; | ||
2586 : | if ($@) { | ||
2587 : | $retVal->AddMessage($@); | ||
2588 : | } | ||
2589 : | } | ||
2590 : | # The full-text index (if any) is always built last, even for MySQL. | ||
2591 : | # First we need to see if this table has a full-text index. Only | ||
2592 : | # primary relations are allowed that privilege. | ||
2593 : | parrello | 1.81 | Trace("Checking for full-text index on $relationName.") if T(2); |
2594 : | parrello | 1.70 | if ($self->_IsPrimary($relationName)) { |
2595 : | parrello | 1.82 | $self->CreateSearchIndex($relationName); |
2596 : | parrello | 1.10 | } |
2597 : | } | ||
2598 : | } | ||
2599 : | parrello | 1.20 | # Analyze the table to improve performance. |
2600 : | parrello | 1.61 | Trace("Analyzing and compacting $relationName.") if T(3); |
2601 : | olson | 1.16 | $dbh->vacuum_it($relationName); |
2602 : | parrello | 1.61 | Trace("$relationName load completed.") if T(3); |
2603 : | parrello | 1.10 | # Return the statistics. |
2604 : | return $retVal; | ||
2605 : | parrello | 1.1 | } |
2606 : | |||
2607 : | parrello | 1.82 | =head3 CreateSearchIndex |
2608 : | |||
2609 : | C<< $erdb->CreateSearchIndex($objectName); >> | ||
2610 : | |||
2611 : | Check for a full-text search index on the specified entity or relationship object, and | ||
2612 : | if one is required, rebuild it. | ||
2613 : | |||
2614 : | =over 4 | ||
2615 : | |||
2616 : | =item objectName | ||
2617 : | |||
2618 : | Name of the entity or relationship to be indexed. | ||
2619 : | |||
2620 : | =back | ||
2621 : | |||
2622 : | =cut | ||
2623 : | |||
2624 : | sub CreateSearchIndex { | ||
2625 : | # Get the parameters. | ||
2626 : | my ($self, $objectName) = @_; | ||
2627 : | # Get the relation's entity/relationship structure. | ||
2628 : | my $structure = $self->_GetStructure($objectName); | ||
2629 : | # Get the database handle. | ||
2630 : | my $dbh = $self->{_dbh}; | ||
2631 : | Trace("Checking for search fields in $objectName.") if T(3); | ||
2632 : | # Check for a searchable fields list. | ||
2633 : | if (exists $structure->{searchFields}) { | ||
2634 : | # Here we know that we need to create a full-text search index. | ||
2635 : | # Get an SQL-formatted field name list. | ||
2636 : | my $fields = join(", ", _FixNames(@{$structure->{searchFields}})); | ||
2637 : | # Create the index. If it already exists, it will be dropped. | ||
2638 : | $dbh->create_index(tbl => $objectName, idx => "search_idx", | ||
2639 : | flds => $fields, kind => 'fulltext'); | ||
2640 : | Trace("Index created for $fields in $objectName.") if T(2); | ||
2641 : | } | ||
2642 : | } | ||
2643 : | |||
2644 : | parrello | 1.73 | =head3 DropRelation |
2645 : | parrello | 1.1 | |
2646 : | parrello | 1.73 | C<< $erdb->DropRelation($relationName); >> |
2647 : | parrello | 1.1 | |
2648 : | parrello | 1.73 | Physically drop a relation from the database. |
2649 : | parrello | 1.1 | |
2650 : | =over 4 | ||
2651 : | |||
2652 : | parrello | 1.73 | =item relationName |
2653 : | parrello | 1.1 | |
2654 : | parrello | 1.73 | Name of the relation to drop. If it does not exist, this method will have |
2655 : | no effect. | ||
2656 : | parrello | 1.1 | |
2657 : | =back | ||
2658 : | |||
2659 : | =cut | ||
2660 : | |||
2661 : | parrello | 1.73 | sub DropRelation { |
2662 : | parrello | 1.10 | # Get the parameters. |
2663 : | parrello | 1.73 | my ($self, $relationName) = @_; |
2664 : | # Get the database handle. | ||
2665 : | my $dbh = $self->{_dbh}; | ||
2666 : | # Drop the relation. The method used here has no effect if the relation | ||
2667 : | # does not exist. | ||
2668 : | parrello | 1.74 | Trace("Invoking DB Kernel to drop $relationName.") if T(3); |
2669 : | $dbh->drop_table(tbl => $relationName); | ||
2670 : | parrello | 1.1 | } |
2671 : | |||
2672 : | parrello | 1.78 | =head3 MatchSqlPattern |
2673 : | |||
2674 : | C<< my $matched = ERDB::MatchSqlPattern($value, $pattern); >> | ||
2675 : | |||
2676 : | Determine whether or not a specified value matches an SQL pattern. An SQL | ||
2677 : | pattern has two wild card characters: C<%> that matches multiple characters, | ||
2678 : | and C<_> that matches a single character. These can be escaped using a | ||
2679 : | backslash (C<\>). We pull this off by converting the SQL pattern to a | ||
2680 : | PERL regular expression. As per SQL rules, the match is case-insensitive. | ||
2681 : | |||
2682 : | =over 4 | ||
2683 : | |||
2684 : | =item value | ||
2685 : | |||
2686 : | Value to be matched against the pattern. Note that an undefined or empty | ||
2687 : | value will not match anything. | ||
2688 : | |||
2689 : | =item pattern | ||
2690 : | |||
2691 : | SQL pattern against which to match the value. An undefined or empty pattern will | ||
2692 : | match everything. | ||
2693 : | |||
2694 : | =item RETURN | ||
2695 : | |||
2696 : | Returns TRUE if the value and pattern match, else FALSE. | ||
2697 : | |||
2698 : | =back | ||
2699 : | |||
2700 : | =cut | ||
2701 : | |||
2702 : | sub MatchSqlPattern { | ||
2703 : | # Get the parameters. | ||
2704 : | my ($value, $pattern) = @_; | ||
2705 : | # Declare the return variable. | ||
2706 : | my $retVal; | ||
2707 : | # Insure we have a pattern. | ||
2708 : | if (! defined($pattern) || $pattern eq "") { | ||
2709 : | $retVal = 1; | ||
2710 : | } else { | ||
2711 : | # Break the pattern into pieces around the wildcard characters. Because we | ||
2712 : | # use parentheses in the split function's delimiter expression, we'll get | ||
2713 : | # list elements for the delimiters as well as the rest of the string. | ||
2714 : | my @pieces = split /([_%]|\\[_%])/, $pattern; | ||
2715 : | # Check some fast special cases. | ||
2716 : | if ($pattern eq '%') { | ||
2717 : | # A null pattern matches everything. | ||
2718 : | $retVal = 1; | ||
2719 : | } elsif (@pieces == 1) { | ||
2720 : | # No wildcards, so we have a literal comparison. Note we're case-insensitive. | ||
2721 : | $retVal = (lc($value) eq lc($pattern)); | ||
2722 : | } elsif (@pieces == 2 && $pieces[1] eq '%') { | ||
2723 : | # A wildcard at the end, so we have a substring match. This is also case-insensitive. | ||
2724 : | $retVal = (lc(substr($value, 0, length($pieces[0]))) eq lc($pieces[0])); | ||
2725 : | } else { | ||
2726 : | # Okay, we have to do it the hard way. Convert each piece to a PERL pattern. | ||
2727 : | my $realPattern = ""; | ||
2728 : | for my $piece (@pieces) { | ||
2729 : | # Determine the type of piece. | ||
2730 : | if ($piece eq "") { | ||
2731 : | # Empty pieces are ignored. | ||
2732 : | } elsif ($piece eq "%") { | ||
2733 : | # Here we have a multi-character wildcard. Note that it can match | ||
2734 : | # zero or more characters. | ||
2735 : | $realPattern .= ".*" | ||
2736 : | } elsif ($piece eq "_") { | ||
2737 : | # Here we have a single-character wildcard. | ||
2738 : | $realPattern .= "."; | ||
2739 : | } elsif ($piece eq "\\%" || $piece eq "\\_") { | ||
2740 : | # This is an escape sequence (which is a rare thing, actually). | ||
2741 : | $realPattern .= substr($piece, 1, 1); | ||
2742 : | } else { | ||
2743 : | # Here we have raw text. | ||
2744 : | $realPattern .= quotemeta($piece); | ||
2745 : | } | ||
2746 : | } | ||
2747 : | # Do the match. | ||
2748 : | $retVal = ($value =~ /^$realPattern$/i ? 1 : 0); | ||
2749 : | } | ||
2750 : | } | ||
2751 : | # Return the result. | ||
2752 : | return $retVal; | ||
2753 : | } | ||
2754 : | |||
2755 : | parrello | 1.6 | =head3 GetEntity |
2756 : | |||
2757 : | parrello | 1.18 | C<< my $entityObject = $erdb->GetEntity($entityType, $ID); >> |
2758 : | parrello | 1.6 | |
2759 : | Return an object describing the entity instance with a specified ID. | ||
2760 : | |||
2761 : | =over 4 | ||
2762 : | |||
2763 : | =item entityType | ||
2764 : | |||
2765 : | Entity type name. | ||
2766 : | |||
2767 : | =item ID | ||
2768 : | |||
2769 : | ID of the desired entity. | ||
2770 : | |||
2771 : | =item RETURN | ||
2772 : | |||
2773 : | Returns a B<DBObject> representing the desired entity instance, or an undefined value if no | ||
2774 : | instance is found with the specified key. | ||
2775 : | |||
2776 : | =back | ||
2777 : | |||
2778 : | =cut | ||
2779 : | |||
2780 : | sub GetEntity { | ||
2781 : | parrello | 1.10 | # Get the parameters. |
2782 : | my ($self, $entityType, $ID) = @_; | ||
2783 : | # Create a query. | ||
2784 : | parrello | 1.45 | my $query = $self->Get([$entityType], "$entityType(id) = ?", [$ID]); |
2785 : | parrello | 1.10 | # Get the first (and only) object. |
2786 : | my $retVal = $query->Fetch(); | ||
2787 : | # Return the result. | ||
2788 : | return $retVal; | ||
2789 : | parrello | 1.6 | } |
2790 : | |||
2791 : | parrello | 1.69 | =head3 GetChoices |
2792 : | |||
2793 : | C<< my @values = $erdb->GetChoices($entityName, $fieldName); >> | ||
2794 : | |||
2795 : | Return a list of all the values for the specified field that are represented in the | ||
2796 : | specified entity. | ||
2797 : | |||
2798 : | Note that if the field is not indexed, then this will be a very slow operation. | ||
2799 : | |||
2800 : | =over 4 | ||
2801 : | |||
2802 : | =item entityName | ||
2803 : | |||
2804 : | Name of an entity in the database. | ||
2805 : | |||
2806 : | =item fieldName | ||
2807 : | |||
2808 : | Name of a field belonging to the entity. This is a raw field name without | ||
2809 : | the standard parenthesized notation used in most calls. | ||
2810 : | |||
2811 : | =item RETURN | ||
2812 : | |||
2813 : | Returns a list of the distinct values for the specified field in the database. | ||
2814 : | |||
2815 : | =back | ||
2816 : | |||
2817 : | =cut | ||
2818 : | |||
2819 : | sub GetChoices { | ||
2820 : | # Get the parameters. | ||
2821 : | my ($self, $entityName, $fieldName) = @_; | ||
2822 : | # Declare the return variable. | ||
2823 : | my @retVal; | ||
2824 : | # Get the entity data structure. | ||
2825 : | my $entityData = $self->_GetStructure($entityName); | ||
2826 : | # Get the field. | ||
2827 : | my $fieldHash = $entityData->{Fields}; | ||
2828 : | if (! exists $fieldHash->{$fieldName}) { | ||
2829 : | Confess("$fieldName not found in $entityName."); | ||
2830 : | } else { | ||
2831 : | # Get the name of the relation containing the field. | ||
2832 : | my $relation = $fieldHash->{$fieldName}->{relation}; | ||
2833 : | # Fix up the field name. | ||
2834 : | my $realName = _FixName($fieldName); | ||
2835 : | # Get the database handle. | ||
2836 : | my $dbh = $self->{_dbh}; | ||
2837 : | # Query the database. | ||
2838 : | my $results = $dbh->SQL("SELECT DISTINCT $realName FROM $relation"); | ||
2839 : | # Clean the results. They are stored as a list of lists, and we just want the one list. | ||
2840 : | @retVal = sort map { $_->[0] } @{$results}; | ||
2841 : | } | ||
2842 : | # Return the result. | ||
2843 : | return @retVal; | ||
2844 : | } | ||
2845 : | |||
2846 : | parrello | 1.6 | =head3 GetEntityValues |
2847 : | |||
2848 : | parrello | 1.18 | C<< my @values = $erdb->GetEntityValues($entityType, $ID, \@fields); >> |
2849 : | parrello | 1.6 | |
2850 : | parrello | 1.68 | Return a list of values from a specified entity instance. If the entity instance |
2851 : | does not exist, an empty list is returned. | ||
2852 : | parrello | 1.6 | |
2853 : | =over 4 | ||
2854 : | |||
2855 : | =item entityType | ||
2856 : | |||
2857 : | Entity type name. | ||
2858 : | |||
2859 : | =item ID | ||
2860 : | |||
2861 : | ID of the desired entity. | ||
2862 : | |||
2863 : | =item fields | ||
2864 : | |||
2865 : | List of field names, each of the form I<objectName>C<(>I<fieldName>C<)>. | ||
2866 : | |||
2867 : | =item RETURN | ||
2868 : | |||
2869 : | Returns a flattened list of the values of the specified fields for the specified entity. | ||
2870 : | |||
2871 : | =back | ||
2872 : | |||
2873 : | =cut | ||
2874 : | |||
2875 : | sub GetEntityValues { | ||
2876 : | parrello | 1.10 | # Get the parameters. |
2877 : | my ($self, $entityType, $ID, $fields) = @_; | ||
2878 : | # Get the specified entity. | ||
2879 : | my $entity = $self->GetEntity($entityType, $ID); | ||
2880 : | # Declare the return list. | ||
2881 : | my @retVal = (); | ||
2882 : | # If we found the entity, push the values into the return list. | ||
2883 : | if ($entity) { | ||
2884 : | push @retVal, $entity->Values($fields); | ||
2885 : | } | ||
2886 : | # Return the result. | ||
2887 : | return @retVal; | ||
2888 : | parrello | 1.6 | } |
2889 : | parrello | 1.1 | |
2890 : | parrello | 1.7 | =head3 GetAll |
2891 : | |||
2892 : | parrello | 1.18 | C<< my @list = $erdb->GetAll(\@objectNames, $filterClause, \@parameters, \@fields, $count); >> |
2893 : | parrello | 1.7 | |
2894 : | Return a list of values taken from the objects returned by a query. The first three | ||
2895 : | parameters correspond to the parameters of the L</Get> method. The final parameter is | ||
2896 : | a list of the fields desired from each record found by the query. The field name | ||
2897 : | syntax is the standard syntax used for fields in the B<ERDB> system-- | ||
2898 : | B<I<objectName>(I<fieldName>)>-- where I<objectName> is the name of the relevant entity | ||
2899 : | or relationship and I<fieldName> is the name of the field. | ||
2900 : | |||
2901 : | The list returned will be a list of lists. Each element of the list will contain | ||
2902 : | the values returned for the fields specified in the fourth parameter. If one of the | ||
2903 : | fields specified returns multiple values, they are flattened in with the rest. For | ||
2904 : | example, the following call will return a list of the features in a particular | ||
2905 : | spreadsheet cell, and each feature will be represented by a list containing the | ||
2906 : | feature ID followed by all of its aliases. | ||
2907 : | |||
2908 : | parrello | 1.78 | C<< @query = $erdb->Get(['ContainsFeature', 'Feature'], "ContainsFeature(from-link) = ?", [$ssCellID], ['Feature(id)', 'Feature(alias)']); >> |
2909 : | parrello | 1.7 | |
2910 : | =over 4 | ||
2911 : | |||
2912 : | =item objectNames | ||
2913 : | |||
2914 : | List containing the names of the entity and relationship objects to be retrieved. | ||
2915 : | |||
2916 : | =item filterClause | ||
2917 : | |||
2918 : | WHERE/ORDER BY clause (without the WHERE) to be used to filter and sort the query. The WHERE clause can | ||
2919 : | be parameterized with parameter markers (C<?>). Each field used must be specified in the standard form | ||
2920 : | B<I<objectName>(I<fieldName>)>. Any parameters specified in the filter clause should be added to the | ||
2921 : | parameter list as additional parameters. The fields in a filter clause can come from primary | ||
2922 : | entity relations, relationship relations, or secondary entity relations; however, all of the | ||
2923 : | entities and relationships involved must be included in the list of object names. | ||
2924 : | |||
2925 : | =item parameterList | ||
2926 : | |||
2927 : | List of the parameters to be substituted in for the parameters marks in the filter clause. | ||
2928 : | |||
2929 : | =item fields | ||
2930 : | |||
2931 : | List of the fields to be returned in each element of the list returned. | ||
2932 : | |||
2933 : | =item count | ||
2934 : | |||
2935 : | Maximum number of records to return. If omitted or 0, all available records will be returned. | ||
2936 : | |||
2937 : | =item RETURN | ||
2938 : | |||
2939 : | Returns a list of list references. Each element of the return list contains the values for the | ||
2940 : | fields specified in the B<fields> parameter. | ||
2941 : | |||
2942 : | =back | ||
2943 : | |||
2944 : | =cut | ||
2945 : | #: Return Type @@; | ||
2946 : | sub GetAll { | ||
2947 : | parrello | 1.10 | # Get the parameters. |
2948 : | my ($self, $objectNames, $filterClause, $parameterList, $fields, $count) = @_; | ||
2949 : | # Translate the parameters from a list reference to a list. If the parameter | ||
2950 : | # list is a scalar we convert it into a singleton list. | ||
2951 : | my @parmList = (); | ||
2952 : | if (ref $parameterList eq "ARRAY") { | ||
2953 : | parrello | 1.45 | Trace("GetAll parm list is an array.") if T(4); |
2954 : | parrello | 1.10 | @parmList = @{$parameterList}; |
2955 : | } else { | ||
2956 : | parrello | 1.45 | Trace("GetAll parm list is a scalar: $parameterList.") if T(4); |
2957 : | parrello | 1.10 | push @parmList, $parameterList; |
2958 : | } | ||
2959 : | parrello | 1.30 | # Insure the counter has a value. |
2960 : | if (!defined $count) { | ||
2961 : | $count = 0; | ||
2962 : | } | ||
2963 : | # Add the row limit to the filter clause. | ||
2964 : | if ($count > 0) { | ||
2965 : | $filterClause .= " LIMIT $count"; | ||
2966 : | } | ||
2967 : | parrello | 1.10 | # Create the query. |
2968 : | parrello | 1.45 | my $query = $self->Get($objectNames, $filterClause, \@parmList); |
2969 : | parrello | 1.10 | # Set up a counter of the number of records read. |
2970 : | my $fetched = 0; | ||
2971 : | # Loop through the records returned, extracting the fields. Note that if the | ||
2972 : | # counter is non-zero, we stop when the number of records read hits the count. | ||
2973 : | my @retVal = (); | ||
2974 : | while (($count == 0 || $fetched < $count) && (my $row = $query->Fetch())) { | ||
2975 : | my @rowData = $row->Values($fields); | ||
2976 : | push @retVal, \@rowData; | ||
2977 : | $fetched++; | ||
2978 : | } | ||
2979 : | parrello | 1.54 | Trace("$fetched rows returned in GetAll.") if T(SQL => 4); |
2980 : | parrello | 1.10 | # Return the resulting list. |
2981 : | return @retVal; | ||
2982 : | parrello | 1.7 | } |
2983 : | |||
2984 : | parrello | 1.55 | =head3 Exists |
2985 : | |||
2986 : | C<< my $found = $sprout->Exists($entityName, $entityID); >> | ||
2987 : | |||
2988 : | Return TRUE if an entity exists, else FALSE. | ||
2989 : | |||
2990 : | =over 4 | ||
2991 : | |||
2992 : | =item entityName | ||
2993 : | |||
2994 : | Name of the entity type (e.g. C<Feature>) relevant to the existence check. | ||
2995 : | |||
2996 : | =item entityID | ||
2997 : | |||
2998 : | ID of the entity instance whose existence is to be checked. | ||
2999 : | |||
3000 : | =item RETURN | ||
3001 : | |||
3002 : | Returns TRUE if the entity instance exists, else FALSE. | ||
3003 : | |||
3004 : | =back | ||
3005 : | |||
3006 : | =cut | ||
3007 : | #: Return Type $; | ||
3008 : | sub Exists { | ||
3009 : | # Get the parameters. | ||
3010 : | my ($self, $entityName, $entityID) = @_; | ||
3011 : | # Check for the entity instance. | ||
3012 : | Trace("Checking existence of $entityName with ID=$entityID.") if T(4); | ||
3013 : | my $testInstance = $self->GetEntity($entityName, $entityID); | ||
3014 : | # Return an existence indicator. | ||
3015 : | my $retVal = ($testInstance ? 1 : 0); | ||
3016 : | return $retVal; | ||
3017 : | } | ||
3018 : | |||
3019 : | parrello | 1.18 | =head3 EstimateRowSize |
3020 : | |||
3021 : | C<< my $rowSize = $erdb->EstimateRowSize($relName); >> | ||
3022 : | |||
3023 : | Estimate the row size of the specified relation. The estimated row size is computed by adding | ||
3024 : | up the average length for each data type. | ||
3025 : | |||
3026 : | =over 4 | ||
3027 : | |||
3028 : | =item relName | ||
3029 : | |||
3030 : | Name of the relation whose estimated row size is desired. | ||
3031 : | |||
3032 : | =item RETURN | ||
3033 : | |||
3034 : | Returns an estimate of the row size for the specified relation. | ||
3035 : | |||
3036 : | =back | ||
3037 : | |||
3038 : | =cut | ||
3039 : | #: Return Type $; | ||
3040 : | sub EstimateRowSize { | ||
3041 : | # Get the parameters. | ||
3042 : | my ($self, $relName) = @_; | ||
3043 : | # Declare the return variable. | ||
3044 : | my $retVal = 0; | ||
3045 : | # Find the relation descriptor. | ||
3046 : | my $relation = $self->_FindRelation($relName); | ||
3047 : | # Get the list of fields. | ||
3048 : | for my $fieldData (@{$relation->{Fields}}) { | ||
3049 : | # Get the field type and add its length. | ||
3050 : | my $fieldLen = $TypeTable{$fieldData->{type}}->{avgLen}; | ||
3051 : | $retVal += $fieldLen; | ||
3052 : | } | ||
3053 : | # Return the result. | ||
3054 : | return $retVal; | ||
3055 : | } | ||
3056 : | |||
3057 : | parrello | 1.38 | =head3 GetFieldTable |
3058 : | |||
3059 : | C<< my $fieldHash = $self->GetFieldTable($objectnName); >> | ||
3060 : | |||
3061 : | Get the field structure for a specified entity or relationship. | ||
3062 : | |||
3063 : | =over 4 | ||
3064 : | |||
3065 : | =item objectName | ||
3066 : | |||
3067 : | Name of the desired entity or relationship. | ||
3068 : | |||
3069 : | =item RETURN | ||
3070 : | |||
3071 : | The table containing the field descriptors for the specified object. | ||
3072 : | |||
3073 : | =back | ||
3074 : | |||
3075 : | =cut | ||
3076 : | |||
3077 : | sub GetFieldTable { | ||
3078 : | # Get the parameters. | ||
3079 : | my ($self, $objectName) = @_; | ||
3080 : | # Get the descriptor from the metadata. | ||
3081 : | my $objectData = $self->_GetStructure($objectName); | ||
3082 : | # Return the object's field table. | ||
3083 : | return $objectData->{Fields}; | ||
3084 : | } | ||
3085 : | |||
3086 : | parrello | 1.73 | =head3 SplitKeywords |
3087 : | |||
3088 : | C<< my @keywords = ERDB::SplitKeywords($keywordString); >> | ||
3089 : | |||
3090 : | This method returns a list of the positive keywords in the specified | ||
3091 : | keyword string. All of the operators will have been stripped off, | ||
3092 : | and if the keyword is preceded by a minus operator (C<->), it will | ||
3093 : | not be in the list returned. The idea here is to get a list of the | ||
3094 : | keywords the user wants to see. The list will be processed to remove | ||
3095 : | duplicates. | ||
3096 : | |||
3097 : | It is possible to create a string that confuses this method. For example | ||
3098 : | |||
3099 : | frog toad -frog | ||
3100 : | |||
3101 : | would return both C<frog> and C<toad>. If this is a problem we can deal | ||
3102 : | with it later. | ||
3103 : | |||
3104 : | =over 4 | ||
3105 : | |||
3106 : | =item keywordString | ||
3107 : | |||
3108 : | The keyword string to be parsed. | ||
3109 : | |||
3110 : | =item RETURN | ||
3111 : | |||
3112 : | Returns a list of the words in the keyword string the user wants to | ||
3113 : | see. | ||
3114 : | |||
3115 : | =back | ||
3116 : | |||
3117 : | =cut | ||
3118 : | |||
3119 : | sub SplitKeywords { | ||
3120 : | # Get the parameters. | ||
3121 : | my ($keywordString) = @_; | ||
3122 : | # Make a safety copy of the string. (This helps during debugging.) | ||
3123 : | my $workString = $keywordString; | ||
3124 : | # Convert operators we don't care about to spaces. | ||
3125 : | $workString =~ tr/+"()<>/ /; | ||
3126 : | # Split the rest of the string along space boundaries. Note that we | ||
3127 : | # eliminate any words that are zero length or begin with a minus sign. | ||
3128 : | my @wordList = grep { $_ && substr($_, 0, 1) ne "-" } split /\s+/, $workString; | ||
3129 : | # Use a hash to remove duplicates. | ||
3130 : | my %words = map { $_ => 1 } @wordList; | ||
3131 : | # Return the result. | ||
3132 : | return sort keys %words; | ||
3133 : | } | ||
3134 : | |||
3135 : | =head3 ValidateFieldName | ||
3136 : | |||
3137 : | C<< my $okFlag = ERDB::ValidateFieldName($fieldName); >> | ||
3138 : | |||
3139 : | Return TRUE if the specified field name is valid, else FALSE. Valid field names must | ||
3140 : | be hyphenated words subject to certain restrictions. | ||
3141 : | |||
3142 : | =over 4 | ||
3143 : | |||
3144 : | =item fieldName | ||
3145 : | |||
3146 : | Field name to be validated. | ||
3147 : | |||
3148 : | =item RETURN | ||
3149 : | |||
3150 : | Returns TRUE if the field name is valid, else FALSE. | ||
3151 : | |||
3152 : | =back | ||
3153 : | |||
3154 : | =cut | ||
3155 : | |||
3156 : | sub ValidateFieldName { | ||
3157 : | # Get the parameters. | ||
3158 : | my ($fieldName) = @_; | ||
3159 : | # Declare the return variable. The field name is valid until we hear | ||
3160 : | # differently. | ||
3161 : | my $retVal = 1; | ||
3162 : | parrello | 1.78 | # Compute the maximum name length. |
3163 : | my $maxLen = $TypeTable{'name-string'}->{maxLen}; | ||
3164 : | parrello | 1.73 | # Look for bad stuff in the name. |
3165 : | if ($fieldName =~ /--/) { | ||
3166 : | # Here we have a doubled minus sign. | ||
3167 : | Trace("Field name $fieldName has a doubled hyphen.") if T(1); | ||
3168 : | $retVal = 0; | ||
3169 : | } elsif ($fieldName !~ /^[A-Za-z]/) { | ||
3170 : | # Here the field name is missing the initial letter. | ||
3171 : | Trace("Field name $fieldName does not begin with a letter.") if T(1); | ||
3172 : | $retVal = 0; | ||
3173 : | parrello | 1.78 | } elsif (length($fieldName) > $maxLen) { |
3174 : | # Here the field name is too long. | ||
3175 : | Trace("Maximum field name length is $maxLen. Field name must be truncated to " . substr($fieldName,0, $maxLen) . "."); | ||
3176 : | parrello | 1.73 | } else { |
3177 : | parrello | 1.75 | # Strip out the minus signs. Everything remaining must be a letter, |
3178 : | # underscore, or digit. | ||
3179 : | parrello | 1.73 | my $strippedName = $fieldName; |
3180 : | $strippedName =~ s/-//g; | ||
3181 : | parrello | 1.75 | if ($strippedName !~ /^(\w|\d)+$/) { |
3182 : | parrello | 1.73 | Trace("Field name $fieldName contains illegal characters.") if T(1); |
3183 : | $retVal = 0; | ||
3184 : | } | ||
3185 : | } | ||
3186 : | # Return the result. | ||
3187 : | return $retVal; | ||
3188 : | } | ||
3189 : | |||
3190 : | =head3 ReadMetaXML | ||
3191 : | |||
3192 : | C<< my $rawMetaData = ERDB::ReadDBD($fileName); >> | ||
3193 : | |||
3194 : | This method reads a raw database definition XML file and returns it. | ||
3195 : | Normally, the metadata used by the ERDB system has been processed and | ||
3196 : | modified to make it easier to load and retrieve the data; however, | ||
3197 : | this method can be used to get the data in its raw form. | ||
3198 : | |||
3199 : | =over 4 | ||
3200 : | |||
3201 : | =item fileName | ||
3202 : | |||
3203 : | Name of the XML file to read. | ||
3204 : | |||
3205 : | =item RETURN | ||
3206 : | |||
3207 : | Returns a hash reference containing the raw XML data from the specified file. | ||
3208 : | |||
3209 : | =back | ||
3210 : | |||
3211 : | =cut | ||
3212 : | |||
3213 : | sub ReadMetaXML { | ||
3214 : | # Get the parameters. | ||
3215 : | my ($fileName) = @_; | ||
3216 : | # Read the XML. | ||
3217 : | my $retVal = XML::Simple::XMLin($fileName, %XmlOptions, %XmlInOpts); | ||
3218 : | Trace("XML metadata loaded from file $fileName.") if T(1); | ||
3219 : | # Return the result. | ||
3220 : | return $retVal; | ||
3221 : | } | ||
3222 : | |||
3223 : | =head3 GetEntityFieldHash | ||
3224 : | |||
3225 : | C<< my $fieldHashRef = ERDB::GetEntityFieldHash($structure, $entityName); >> | ||
3226 : | |||
3227 : | Get the field hash of the named entity in the specified raw XML structure. | ||
3228 : | The field hash may not exist, in which case we need to create it. | ||
3229 : | |||
3230 : | =over 4 | ||
3231 : | |||
3232 : | =item structure | ||
3233 : | |||
3234 : | Raw XML structure defininng the database. This is not the run-time XML used by | ||
3235 : | an ERDB object, since that has all sorts of optimizations built-in. | ||
3236 : | |||
3237 : | =item entityName | ||
3238 : | |||
3239 : | Name of the entity whose field structure is desired. | ||
3240 : | |||
3241 : | =item RETURN | ||
3242 : | |||
3243 : | Returns the field hash used to define the entity's fields. | ||
3244 : | |||
3245 : | =back | ||
3246 : | |||
3247 : | =cut | ||
3248 : | |||
3249 : | sub GetEntityFieldHash { | ||
3250 : | # Get the parameters. | ||
3251 : | my ($structure, $entityName) = @_; | ||
3252 : | # Get the entity structure. | ||
3253 : | my $entityData = $structure->{Entities}->{$entityName}; | ||
3254 : | # Look for a field structure. | ||
3255 : | my $retVal = $entityData->{Fields}; | ||
3256 : | # If it doesn't exist, create it. | ||
3257 : | if (! defined($retVal)) { | ||
3258 : | $entityData->{Fields} = {}; | ||
3259 : | $retVal = $entityData->{Fields}; | ||
3260 : | } | ||
3261 : | # Return the result. | ||
3262 : | return $retVal; | ||
3263 : | } | ||
3264 : | |||
3265 : | =head3 WriteMetaXML | ||
3266 : | |||
3267 : | C<< ERDB::WriteMetaXML($structure, $fileName); >> | ||
3268 : | |||
3269 : | Write the metadata XML to a file. This method is the reverse of L</ReadMetaXML>, and is | ||
3270 : | used to update the database definition. It must be used with care, however, since it | ||
3271 : | will only work on a raw structure, not on the processed structure created by an ERDB | ||
3272 : | constructor. | ||
3273 : | |||
3274 : | =over 4 | ||
3275 : | |||
3276 : | =item structure | ||
3277 : | |||
3278 : | XML structure to be written to the file. | ||
3279 : | |||
3280 : | =item fileName | ||
3281 : | |||
3282 : | Name of the output file to which the updated XML should be stored. | ||
3283 : | |||
3284 : | =back | ||
3285 : | |||
3286 : | =cut | ||
3287 : | |||
3288 : | sub WriteMetaXML { | ||
3289 : | # Get the parameters. | ||
3290 : | my ($structure, $fileName) = @_; | ||
3291 : | # Compute the output. | ||
3292 : | my $fileString = XML::Simple::XMLout($structure, %XmlOptions, %XmlOutOpts); | ||
3293 : | # Write it to the file. | ||
3294 : | my $xmlOut = Open(undef, ">$fileName"); | ||
3295 : | print $xmlOut $fileString; | ||
3296 : | } | ||
3297 : | |||
3298 : | |||
3299 : | =head3 HTMLNote | ||
3300 : | |||
3301 : | Convert a note or comment to HTML by replacing some bulletin-board codes with HTML. The codes | ||
3302 : | supported are C<[b]> for B<bold>, C<[i]> for I<italics>, and C<[p]> for a new paragraph. | ||
3303 : | Except for C<[p]>, all the codes are closed by slash-codes. So, for | ||
3304 : | example, C<[b]Feature[/b]> displays the string C<Feature> in boldface. | ||
3305 : | |||
3306 : | C<< my $realHtml = ERDB::HTMLNote($dataString); >> | ||
3307 : | |||
3308 : | =over 4 | ||
3309 : | |||
3310 : | =item dataString | ||
3311 : | |||
3312 : | String to convert to HTML. | ||
3313 : | |||
3314 : | =item RETURN | ||
3315 : | |||
3316 : | An HTML string derived from the input string. | ||
3317 : | |||
3318 : | =back | ||
3319 : | |||
3320 : | =cut | ||
3321 : | |||
3322 : | sub HTMLNote { | ||
3323 : | # Get the parameter. | ||
3324 : | my ($dataString) = @_; | ||
3325 : | # HTML-escape the text. | ||
3326 : | my $retVal = CGI::escapeHTML($dataString); | ||
3327 : | # Substitute the bulletin board codes. | ||
3328 : | $retVal =~ s!\[(/?[bi])\]!<$1>!g; | ||
3329 : | $retVal =~ s!\[p\]!</p><p>!g; | ||
3330 : | parrello | 1.83 | $retVal =~ s!\[link\s+([^\]]+)\]!<a href="$1">!g; |
3331 : | $retVal =~ s!\[/link\]!</a>!g; | ||
3332 : | parrello | 1.73 | # Return the result. |
3333 : | return $retVal; | ||
3334 : | } | ||
3335 : | |||
3336 : | |||
3337 : | parrello | 1.48 | =head2 Data Mining Methods |
3338 : | |||
3339 : | parrello | 1.38 | =head3 GetUsefulCrossValues |
3340 : | |||
3341 : | C<< my @attrNames = $sprout->GetUsefulCrossValues($sourceEntity, $relationship); >> | ||
3342 : | |||
3343 : | Return a list of the useful attributes that would be returned by a B<Cross> call | ||
3344 : | from an entity of the source entity type through the specified relationship. This | ||
3345 : | means it will return the fields of the target entity type and the intersection data | ||
3346 : | fields in the relationship. Only primary table fields are returned. In other words, | ||
3347 : | the field names returned will be for fields where there is always one and only one | ||
3348 : | value. | ||
3349 : | |||
3350 : | =over 4 | ||
3351 : | |||
3352 : | =item sourceEntity | ||
3353 : | |||
3354 : | Name of the entity from which the relationship crossing will start. | ||
3355 : | |||
3356 : | =item relationship | ||
3357 : | |||
3358 : | Name of the relationship being crossed. | ||
3359 : | |||
3360 : | =item RETURN | ||
3361 : | |||
3362 : | Returns a list of field names in Sprout field format (I<objectName>C<(>I<fieldName>C<)>. | ||
3363 : | |||
3364 : | =back | ||
3365 : | |||
3366 : | =cut | ||
3367 : | #: Return Type @; | ||
3368 : | sub GetUsefulCrossValues { | ||
3369 : | # Get the parameters. | ||
3370 : | my ($self, $sourceEntity, $relationship) = @_; | ||
3371 : | # Declare the return variable. | ||
3372 : | my @retVal = (); | ||
3373 : | # Determine the target entity for the relationship. This is whichever entity is not | ||
3374 : | # the source entity. So, if the source entity is the FROM, we'll get the name of | ||
3375 : | # the TO, and vice versa. | ||
3376 : | my $relStructure = $self->_GetStructure($relationship); | ||
3377 : | my $targetEntityType = ($relStructure->{from} eq $sourceEntity ? "to" : "from"); | ||
3378 : | my $targetEntity = $relStructure->{$targetEntityType}; | ||
3379 : | # Get the field table for the entity. | ||
3380 : | my $entityFields = $self->GetFieldTable($targetEntity); | ||
3381 : | # The field table is a hash. The hash key is the field name. The hash value is a structure. | ||
3382 : | # For the entity fields, the key aspect of the target structure is that the {relation} value | ||
3383 : | # must match the entity name. | ||
3384 : | my @fieldList = map { "$targetEntity($_)" } grep { $entityFields->{$_}->{relation} eq $targetEntity } | ||
3385 : | keys %{$entityFields}; | ||
3386 : | # Push the fields found onto the return variable. | ||
3387 : | push @retVal, sort @fieldList; | ||
3388 : | # Get the field table for the relationship. | ||
3389 : | my $relationshipFields = $self->GetFieldTable($relationship); | ||
3390 : | # Here we have a different rule. We want all the fields other than "from-link" and "to-link". | ||
3391 : | # This may end up being an empty set. | ||
3392 : | my @fieldList2 = map { "$relationship($_)" } grep { $_ ne "from-link" && $_ ne "to-link" } | ||
3393 : | keys %{$relationshipFields}; | ||
3394 : | # Push these onto the return list. | ||
3395 : | push @retVal, sort @fieldList2; | ||
3396 : | # Return the result. | ||
3397 : | return @retVal; | ||
3398 : | } | ||
3399 : | |||
3400 : | parrello | 1.48 | =head3 FindColumn |
3401 : | |||
3402 : | C<< my $colIndex = ERDB::FindColumn($headerLine, $columnIdentifier); >> | ||
3403 : | |||
3404 : | Return the location a desired column in a data mining header line. The data | ||
3405 : | mining header line is a tab-separated list of column names. The column | ||
3406 : | identifier is either the numerical index of a column or the actual column | ||
3407 : | name. | ||
3408 : | |||
3409 : | =over 4 | ||
3410 : | |||
3411 : | =item headerLine | ||
3412 : | |||
3413 : | The header line from a data mining command, which consists of a tab-separated | ||
3414 : | list of column names. | ||
3415 : | |||
3416 : | =item columnIdentifier | ||
3417 : | |||
3418 : | Either the ordinal number of the desired column (1-based), or the name of the | ||
3419 : | desired column. | ||
3420 : | |||
3421 : | =item RETURN | ||
3422 : | |||
3423 : | Returns the array index (0-based) of the desired column. | ||
3424 : | |||
3425 : | =back | ||
3426 : | |||
3427 : | =cut | ||
3428 : | |||
3429 : | sub FindColumn { | ||
3430 : | # Get the parameters. | ||
3431 : | my ($headerLine, $columnIdentifier) = @_; | ||
3432 : | # Declare the return variable. | ||
3433 : | my $retVal; | ||
3434 : | # Split the header line into column names. | ||
3435 : | my @headers = ParseColumns($headerLine); | ||
3436 : | # Determine whether we have a number or a name. | ||
3437 : | if ($columnIdentifier =~ /^\d+$/) { | ||
3438 : | # Here we have a number. Subtract 1 and validate the result. | ||
3439 : | $retVal = $columnIdentifier - 1; | ||
3440 : | if ($retVal < 0 || $retVal > $#headers) { | ||
3441 : | Confess("Invalid column identifer \"$columnIdentifier\": value out of range."); | ||
3442 : | } | ||
3443 : | } else { | ||
3444 : | # Here we have a name. We need to find it in the list. | ||
3445 : | for (my $i = 0; $i <= $#headers && ! defined($retVal); $i++) { | ||
3446 : | if ($headers[$i] eq $columnIdentifier) { | ||
3447 : | $retVal = $i; | ||
3448 : | } | ||
3449 : | } | ||
3450 : | if (! defined($retVal)) { | ||
3451 : | Confess("Invalid column identifier \"$columnIdentifier\": value not found."); | ||
3452 : | } | ||
3453 : | } | ||
3454 : | # Return the result. | ||
3455 : | return $retVal; | ||
3456 : | } | ||
3457 : | |||
3458 : | =head3 ParseColumns | ||
3459 : | |||
3460 : | parrello | 1.50 | C<< my @columns = ERDB::ParseColumns($line); >> |
3461 : | parrello | 1.48 | |
3462 : | Convert the specified data line to a list of columns. | ||
3463 : | |||
3464 : | =over 4 | ||
3465 : | |||
3466 : | =item line | ||
3467 : | |||
3468 : | A data mining input, consisting of a tab-separated list of columns terminated by a | ||
3469 : | new-line. | ||
3470 : | |||
3471 : | =item RETURN | ||
3472 : | |||
3473 : | Returns a list consisting of the column values. | ||
3474 : | |||
3475 : | =back | ||
3476 : | |||
3477 : | =cut | ||
3478 : | |||
3479 : | sub ParseColumns { | ||
3480 : | # Get the parameters. | ||
3481 : | parrello | 1.50 | my ($line) = @_; |
3482 : | parrello | 1.48 | # Chop off the line-end. |
3483 : | chomp $line; | ||
3484 : | # Split it into a list. | ||
3485 : | my @retVal = split(/\t/, $line); | ||
3486 : | # Return the result. | ||
3487 : | return @retVal; | ||
3488 : | } | ||
3489 : | |||
3490 : | parrello | 1.71 | =head2 Virtual Methods |
3491 : | |||
3492 : | =head3 CleanKeywords | ||
3493 : | |||
3494 : | C<< my $cleanedString = $erdb->CleanKeywords($searchExpression); >> | ||
3495 : | |||
3496 : | Clean up a search expression or keyword list. This is a virtual method that may | ||
3497 : | be overridden by the subclass. The base-class method removes extra spaces | ||
3498 : | and converts everything to lower case. | ||
3499 : | |||
3500 : | =over 4 | ||
3501 : | |||
3502 : | =item searchExpression | ||
3503 : | |||
3504 : | Search expression or keyword list to clean. Note that a search expression may | ||
3505 : | contain boolean operators which need to be preserved. This includes leading | ||
3506 : | minus signs. | ||
3507 : | |||
3508 : | =item RETURN | ||
3509 : | |||
3510 : | Cleaned expression or keyword list. | ||
3511 : | |||
3512 : | =back | ||
3513 : | |||
3514 : | =cut | ||
3515 : | |||
3516 : | sub CleanKeywords { | ||
3517 : | # Get the parameters. | ||
3518 : | my ($self, $searchExpression) = @_; | ||
3519 : | # Lower-case the expression and copy it into the return variable. Note that we insure we | ||
3520 : | # don't accidentally end up with an undefined value. | ||
3521 : | my $retVal = lc($searchExpression || ""); | ||
3522 : | # Remove extra spaces. | ||
3523 : | $retVal =~ s/\s+/ /g; | ||
3524 : | $retVal =~ s/(^\s+)|(\s+$)//g; | ||
3525 : | # Return the result. | ||
3526 : | return $retVal; | ||
3527 : | } | ||
3528 : | |||
3529 : | parrello | 1.73 | =head3 GetSourceObject |
3530 : | |||
3531 : | C<< my $source = $erdb->GetSourceObject($entityName); >> | ||
3532 : | |||
3533 : | Return the object to be used in loading special attributes of the specified entity. The | ||
3534 : | algorithm for loading special attributes is stored in the C<DataGen> elements of the | ||
3535 : | XML | ||
3536 : | |||
3537 : | parrello | 1.1 | =head2 Internal Utility Methods |
3538 : | |||
3539 : | parrello | 1.70 | =head3 _RelationMap |
3540 : | |||
3541 : | C<< my @relationMap = _RelationMap($mappedNameHashRef, $mappedNameListRef); >> | ||
3542 : | |||
3543 : | Create the relation map for an SQL query. The relation map is used by B<DBObject> | ||
3544 : | to determine how to interpret the results of the query. | ||
3545 : | |||
3546 : | =over 4 | ||
3547 : | |||
3548 : | =item mappedNameHashRef | ||
3549 : | |||
3550 : | Reference to a hash that maps modified object names to real object names. | ||
3551 : | |||
3552 : | =item mappedNameListRef | ||
3553 : | |||
3554 : | Reference to a list of modified object names in the order they appear in the | ||
3555 : | SELECT list. | ||
3556 : | |||
3557 : | =item RETURN | ||
3558 : | |||
3559 : | Returns a list of 2-tuples. Each tuple consists of an object name as used in the | ||
3560 : | query followed by the actual name of that object. This enables the B<DBObject> to | ||
3561 : | determine the order of the tables in the query and which object name belongs to each | ||
3562 : | mapped object name. Most of the time these two values are the same; however, if a | ||
3563 : | relation occurs twice in the query, the relation name in the field list and WHERE | ||
3564 : | clause will use a mapped name (generally the actual relation name with a numeric | ||
3565 : | suffix) that does not match the actual relation name. | ||
3566 : | |||
3567 : | =back | ||
3568 : | |||
3569 : | =cut | ||
3570 : | |||
3571 : | sub _RelationMap { | ||
3572 : | # Get the parameters. | ||
3573 : | my ($mappedNameHashRef, $mappedNameListRef) = @_; | ||
3574 : | # Declare the return variable. | ||
3575 : | my @retVal = (); | ||
3576 : | # Build the map. | ||
3577 : | for my $mappedName (@{$mappedNameListRef}) { | ||
3578 : | push @retVal, [$mappedName, $mappedNameHashRef->{$mappedName}]; | ||
3579 : | } | ||
3580 : | # Return it. | ||
3581 : | return @retVal; | ||
3582 : | } | ||
3583 : | |||
3584 : | |||
3585 : | =head3 _SetupSQL | ||
3586 : | parrello | 1.45 | |
3587 : | Process a list of object names and a filter clause so that they can be used to | ||
3588 : | build an SQL statement. This method takes in a reference to a list of object names | ||
3589 : | and a filter clause. It will return a corrected filter clause, a list of mapped | ||
3590 : | names and the mapped name hash. | ||
3591 : | |||
3592 : | This is an instance method. | ||
3593 : | |||
3594 : | =over 4 | ||
3595 : | |||
3596 : | =item objectNames | ||
3597 : | |||
3598 : | Reference to a list of the object names to be included in the query. | ||
3599 : | |||
3600 : | =item filterClause | ||
3601 : | |||
3602 : | A string containing the WHERE clause for the query (without the C<WHERE>) and also | ||
3603 : | optionally the C<ORDER BY> and C<LIMIT> clauses. | ||
3604 : | |||
3605 : | parrello | 1.70 | =item matchClause |
3606 : | |||
3607 : | An optional full-text search clause. If specified, it will be inserted at the | ||
3608 : | front of the WHERE clause. It should already be SQL-formatted; that is, the | ||
3609 : | field names should be in the form I<table>C<.>I<fieldName>. | ||
3610 : | |||
3611 : | parrello | 1.45 | =item RETURN |
3612 : | |||
3613 : | Returns a three-element list. The first element is the SQL statement suffix, beginning | ||
3614 : | with the FROM clause. The second element is a reference to a list of the names to be | ||
3615 : | used in retrieving the fields. The third element is a hash mapping the names to the | ||
3616 : | objects they represent. | ||
3617 : | |||
3618 : | =back | ||
3619 : | |||
3620 : | =cut | ||
3621 : | |||
3622 : | sub _SetupSQL { | ||
3623 : | parrello | 1.70 | my ($self, $objectNames, $filterClause, $matchClause) = @_; |
3624 : | parrello | 1.45 | # Adjust the list of object names to account for multiple occurrences of the |
3625 : | # same object. We start with a hash table keyed on object name that will | ||
3626 : | # return the object suffix. The first time an object is encountered it will | ||
3627 : | # not be found in the hash. The next time the hash will map the object name | ||
3628 : | # to 2, then 3, and so forth. | ||
3629 : | my %objectHash = (); | ||
3630 : | # This list will contain the object names as they are to appear in the | ||
3631 : | # FROM list. | ||
3632 : | my @fromList = (); | ||
3633 : | # This list contains the suffixed object name for each object. It is exactly | ||
3634 : | # parallel to the list in the $objectNames parameter. | ||
3635 : | my @mappedNameList = (); | ||
3636 : | # Finally, this hash translates from a mapped name to its original object name. | ||
3637 : | my %mappedNameHash = (); | ||
3638 : | # Now we create the lists. Note that for every single name we push something into | ||
3639 : | # @fromList and @mappedNameList. This insures that those two arrays are exactly | ||
3640 : | # parallel to $objectNames. | ||
3641 : | for my $objectName (@{$objectNames}) { | ||
3642 : | # Get the next suffix for this object. | ||
3643 : | my $suffix = $objectHash{$objectName}; | ||
3644 : | if (! $suffix) { | ||
3645 : | # Here we are seeing the object for the first time. The object name | ||
3646 : | # is used as is. | ||
3647 : | push @mappedNameList, $objectName; | ||
3648 : | push @fromList, $objectName; | ||
3649 : | $mappedNameHash{$objectName} = $objectName; | ||
3650 : | # Denote the next suffix will be 2. | ||
3651 : | $objectHash{$objectName} = 2; | ||
3652 : | } else { | ||
3653 : | # Here we've seen the object before. We construct a new name using | ||
3654 : | # the suffix from the hash and update the hash. | ||
3655 : | my $mappedName = "$objectName$suffix"; | ||
3656 : | $objectHash{$objectName} = $suffix + 1; | ||
3657 : | # The FROM list has the object name followed by the mapped name. This | ||
3658 : | # tells SQL it's still the same table, but we're using a different name | ||
3659 : | # for it to avoid confusion. | ||
3660 : | push @fromList, "$objectName $mappedName"; | ||
3661 : | # The mapped-name list contains the real mapped name. | ||
3662 : | push @mappedNameList, $mappedName; | ||
3663 : | # Finally, enable us to get back from the mapped name to the object name. | ||
3664 : | $mappedNameHash{$mappedName} = $objectName; | ||
3665 : | } | ||
3666 : | } | ||
3667 : | # Begin the SELECT suffix. It starts with | ||
3668 : | # | ||
3669 : | # FROM name1, name2, ... nameN | ||
3670 : | # | ||
3671 : | my $suffix = "FROM " . join(', ', @fromList); | ||
3672 : | parrello | 1.70 | # Now for the WHERE. First, we need a place for the filter string. |
3673 : | my $filterString = ""; | ||
3674 : | # We will also keep a list of conditions to add to the WHERE clause in order to link | ||
3675 : | # entities and relationships as well as primary relations to secondary ones. | ||
3676 : | my @joinWhere = (); | ||
3677 : | parrello | 1.45 | # Check for a filter clause. |
3678 : | if ($filterClause) { | ||
3679 : | # Here we have one, so we convert its field names and add it to the query. First, | ||
3680 : | # We create a copy of the filter string we can work with. | ||
3681 : | parrello | 1.70 | $filterString = $filterClause; |
3682 : | parrello | 1.45 | # Next, we sort the object names by length. This helps protect us from finding |
3683 : | # object names inside other object names when we're doing our search and replace. | ||
3684 : | my @sortedNames = sort { length($b) - length($a) } @mappedNameList; | ||
3685 : | # The final preparatory step is to create a hash table of relation names. The | ||
3686 : | # table begins with the relation names already in the SELECT command. We may | ||
3687 : | # need to add relations later if there is filtering on a field in a secondary | ||
3688 : | # relation. The secondary relations are the ones that contain multiply- | ||
3689 : | # occurring or optional fields. | ||
3690 : | my %fromNames = map { $_ => 1 } @sortedNames; | ||
3691 : | # We are ready to begin. We loop through the object names, replacing each | ||
3692 : | # object name's field references by the corresponding SQL field reference. | ||
3693 : | # Along the way, if we find a secondary relation, we will need to add it | ||
3694 : | # to the FROM clause. | ||
3695 : | for my $mappedName (@sortedNames) { | ||
3696 : | # Get the length of the object name plus 2. This is the value we add to the | ||
3697 : | # size of the field name to determine the size of the field reference as a | ||
3698 : | # whole. | ||
3699 : | my $nameLength = 2 + length $mappedName; | ||
3700 : | # Get the real object name for this mapped name. | ||
3701 : | my $objectName = $mappedNameHash{$mappedName}; | ||
3702 : | Trace("Processing $mappedName for object $objectName.") if T(4); | ||
3703 : | # Get the object's field list. | ||
3704 : | my $fieldList = $self->GetFieldTable($objectName); | ||
3705 : | # Find the field references for this object. | ||
3706 : | while ($filterString =~ m/$mappedName\(([^)]*)\)/g) { | ||
3707 : | # At this point, $1 contains the field name, and the current position | ||
3708 : | # is set immediately after the final parenthesis. We pull out the name of | ||
3709 : | # the field and the position and length of the field reference as a whole. | ||
3710 : | my $fieldName = $1; | ||
3711 : | my $len = $nameLength + length $fieldName; | ||
3712 : | my $pos = pos($filterString) - $len; | ||
3713 : | # Insure the field exists. | ||
3714 : | if (!exists $fieldList->{$fieldName}) { | ||
3715 : | Confess("Field $fieldName not found for object $objectName."); | ||
3716 : | } else { | ||
3717 : | Trace("Processing $fieldName at position $pos.") if T(4); | ||
3718 : | # Get the field's relation. | ||
3719 : | my $relationName = $fieldList->{$fieldName}->{relation}; | ||
3720 : | # Now we have a secondary relation. We need to insure it matches the | ||
3721 : | # mapped name of the primary relation. First we peel off the suffix | ||
3722 : | # from the mapped name. | ||
3723 : | my $mappingSuffix = substr $mappedName, length($objectName); | ||
3724 : | # Put the mapping suffix onto the relation name to get the | ||
3725 : | # mapped relation name. | ||
3726 : | my $mappedRelationName = "$relationName$mappingSuffix"; | ||
3727 : | # Insure the relation is in the FROM clause. | ||
3728 : | if (!exists $fromNames{$mappedRelationName}) { | ||
3729 : | # Add the relation to the FROM clause. | ||
3730 : | if ($mappedRelationName eq $relationName) { | ||
3731 : | # The name is un-mapped, so we add it without | ||
3732 : | # any frills. | ||
3733 : | $suffix .= ", $relationName"; | ||
3734 : | push @joinWhere, "$objectName.id = $relationName.id"; | ||
3735 : | } else { | ||
3736 : | # Here we have a mapping situation. | ||
3737 : | $suffix .= ", $relationName $mappedRelationName"; | ||
3738 : | push @joinWhere, "$mappedRelationName.id = $mappedName.id"; | ||
3739 : | } | ||
3740 : | # Denote we have this relation available for future fields. | ||
3741 : | $fromNames{$mappedRelationName} = 1; | ||
3742 : | } | ||
3743 : | # Form an SQL field reference from the relation name and the field name. | ||
3744 : | my $sqlReference = "$mappedRelationName." . _FixName($fieldName); | ||
3745 : | # Put it into the filter string in place of the old value. | ||
3746 : | substr($filterString, $pos, $len) = $sqlReference; | ||
3747 : | # Reposition the search. | ||
3748 : | pos $filterString = $pos + length $sqlReference; | ||
3749 : | } | ||
3750 : | } | ||
3751 : | } | ||
3752 : | parrello | 1.70 | } |
3753 : | # The next step is to join the objects together. We only need to do this if there | ||
3754 : | # is more than one object in the object list. We start with the first object and | ||
3755 : | # run through the objects after it. Note also that we make a safety copy of the | ||
3756 : | # list before running through it, because we shift off the first object before | ||
3757 : | # processing the rest. | ||
3758 : | my @mappedObjectList = @mappedNameList; | ||
3759 : | my $lastMappedObject = shift @mappedObjectList; | ||
3760 : | # Get the join table. | ||
3761 : | my $joinTable = $self->{_metaData}->{Joins}; | ||
3762 : | # Loop through the object list. | ||
3763 : | for my $thisMappedObject (@mappedObjectList) { | ||
3764 : | # Look for a join using the real object names. | ||
3765 : | my $lastObject = $mappedNameHash{$lastMappedObject}; | ||
3766 : | my $thisObject = $mappedNameHash{$thisMappedObject}; | ||
3767 : | my $joinKey = "$lastObject/$thisObject"; | ||
3768 : | if (!exists $joinTable->{$joinKey}) { | ||
3769 : | # Here there's no join, so we throw an error. | ||
3770 : | Confess("No join exists to connect from $lastMappedObject to $thisMappedObject."); | ||
3771 : | } else { | ||
3772 : | # Get the join clause. | ||
3773 : | my $unMappedJoin = $joinTable->{$joinKey}; | ||
3774 : | # Fix the names. | ||
3775 : | $unMappedJoin =~ s/$lastObject/$lastMappedObject/; | ||
3776 : | $unMappedJoin =~ s/$thisObject/$thisMappedObject/; | ||
3777 : | push @joinWhere, $unMappedJoin; | ||
3778 : | # Save this object as the last object for the next iteration. | ||
3779 : | $lastMappedObject = $thisMappedObject; | ||
3780 : | } | ||
3781 : | } | ||
3782 : | # Now we need to handle the whole ORDER BY / LIMIT thing. The important part | ||
3783 : | # here is we want the filter clause to be empty if there's no WHERE filter. | ||
3784 : | # We'll put the ORDER BY / LIMIT clauses in the following variable. | ||
3785 : | my $orderClause = ""; | ||
3786 : | # This is only necessary if we have a filter string in which the ORDER BY | ||
3787 : | # and LIMIT clauses can live. | ||
3788 : | if ($filterString) { | ||
3789 : | parrello | 1.45 | # Locate the ORDER BY or LIMIT verbs (if any). We use a non-greedy |
3790 : | # operator so that we find the first occurrence of either verb. | ||
3791 : | if ($filterString =~ m/^(.*?)\s*(ORDER BY|LIMIT)/g) { | ||
3792 : | # Here we have an ORDER BY or LIMIT verb. Split it off of the filter string. | ||
3793 : | my $pos = pos $filterString; | ||
3794 : | $orderClause = $2 . substr($filterString, $pos); | ||
3795 : | $filterString = $1; | ||
3796 : | } | ||
3797 : | parrello | 1.70 | } |
3798 : | # All the things that are supposed to be in the WHERE clause of the | ||
3799 : | # SELECT command need to be put into @joinWhere so we can string them | ||
3800 : | # together. We begin with the match clause. This is important, | ||
3801 : | # because the match clause's parameter mark must precede any parameter | ||
3802 : | # marks in the filter string. | ||
3803 : | if ($matchClause) { | ||
3804 : | push @joinWhere, $matchClause; | ||
3805 : | } | ||
3806 : | # Add the filter string. We put it in parentheses to avoid operator | ||
3807 : | # precedence problems with the match clause or the joins. | ||
3808 : | if ($filterString) { | ||
3809 : | Trace("Filter string is \"$filterString\".") if T(4); | ||
3810 : | push @joinWhere, "($filterString)"; | ||
3811 : | } | ||
3812 : | # String it all together into a big filter clause. | ||
3813 : | if (@joinWhere) { | ||
3814 : | $suffix .= " WHERE " . join(' AND ', @joinWhere); | ||
3815 : | } | ||
3816 : | # Add the sort or limit clause (if any). | ||
3817 : | if ($orderClause) { | ||
3818 : | $suffix .= " $orderClause"; | ||
3819 : | parrello | 1.45 | } |
3820 : | # Return the suffix, the mapped name list, and the mapped name hash. | ||
3821 : | return ($suffix, \@mappedNameList, \%mappedNameHash); | ||
3822 : | } | ||
3823 : | |||
3824 : | parrello | 1.70 | =head3 _GetStatementHandle |
3825 : | parrello | 1.45 | |
3826 : | This method will prepare and execute an SQL query, returning the statement handle. | ||
3827 : | The main reason for doing this here is so that everybody who does SQL queries gets | ||
3828 : | the benefit of tracing. | ||
3829 : | |||
3830 : | This is an instance method. | ||
3831 : | |||
3832 : | =over 4 | ||
3833 : | |||
3834 : | =item command | ||
3835 : | |||
3836 : | Command to prepare and execute. | ||
3837 : | |||
3838 : | =item params | ||
3839 : | |||
3840 : | Reference to a list of the values to be substituted in for the parameter marks. | ||
3841 : | |||
3842 : | =item RETURN | ||
3843 : | |||
3844 : | Returns a prepared and executed statement handle from which the caller can extract | ||
3845 : | results. | ||
3846 : | |||
3847 : | =back | ||
3848 : | |||
3849 : | =cut | ||
3850 : | |||
3851 : | sub _GetStatementHandle { | ||
3852 : | # Get the parameters. | ||
3853 : | my ($self, $command, $params) = @_; | ||
3854 : | # Trace the query. | ||
3855 : | Trace("SQL query: $command") if T(SQL => 3); | ||
3856 : | Trace("PARMS: '" . (join "', '", @{$params}) . "'") if (T(SQL => 4) && (@{$params} > 0)); | ||
3857 : | # Get the database handle. | ||
3858 : | my $dbh = $self->{_dbh}; | ||
3859 : | # Prepare the command. | ||
3860 : | my $sth = $dbh->prepare_command($command); | ||
3861 : | # Execute it with the parameters bound in. | ||
3862 : | parrello | 1.75 | $sth->execute(@{$params}) || Confess("SELECT error: " . $sth->errstr()); |
3863 : | parrello | 1.45 | # Return the statement handle. |
3864 : | return $sth; | ||
3865 : | } | ||
3866 : | |||
3867 : | parrello | 1.70 | =head3 _GetLoadStats |
3868 : | parrello | 1.1 | |
3869 : | Return a blank statistics object for use by the load methods. | ||
3870 : | |||
3871 : | This is a static method. | ||
3872 : | |||
3873 : | =cut | ||
3874 : | |||
3875 : | parrello | 1.29 | sub _GetLoadStats{ |
3876 : | parrello | 1.28 | return Stats->new(); |
3877 : | parrello | 1.1 | } |
3878 : | |||
3879 : | parrello | 1.70 | =head3 _DumpRelation |
3880 : | parrello | 1.1 | |
3881 : | parrello | 1.73 | Dump the specified relation to the specified output file in tab-delimited format. |
3882 : | parrello | 1.1 | |
3883 : | This is an instance method. | ||
3884 : | |||
3885 : | =over 4 | ||
3886 : | |||
3887 : | =item outputDirectory | ||
3888 : | |||
3889 : | Directory to contain the output file. | ||
3890 : | |||
3891 : | =item relationName | ||
3892 : | |||
3893 : | Name of the relation to dump. | ||
3894 : | |||
3895 : | =item relation | ||
3896 : | |||
3897 : | Structure describing the relation to be dumped. | ||
3898 : | |||
3899 : | =back | ||
3900 : | |||
3901 : | =cut | ||
3902 : | |||
3903 : | sub _DumpRelation { | ||
3904 : | parrello | 1.10 | # Get the parameters. |
3905 : | my ($self, $outputDirectory, $relationName, $relation) = @_; | ||
3906 : | # Open the output file. | ||
3907 : | my $fileName = "$outputDirectory/$relationName.dtx"; | ||
3908 : | open(DTXOUT, ">$fileName") || Confess("Could not open dump file $fileName: $!"); | ||
3909 : | # Create a query for the specified relation. | ||
3910 : | my $dbh = $self->{_dbh}; | ||
3911 : | my $query = $dbh->prepare_command("SELECT * FROM $relationName"); | ||
3912 : | # Execute the query. | ||
3913 : | $query->execute() || Confess("SELECT error dumping $relationName."); | ||
3914 : | # Loop through the results. | ||
3915 : | while (my @row = $query->fetchrow) { | ||
3916 : | # Escape any tabs or new-lines in the row text. | ||
3917 : | for my $field (@row) { | ||
3918 : | $field =~ s/\n/\\n/g; | ||
3919 : | $field =~ s/\t/\\t/g; | ||
3920 : | } | ||
3921 : | # Tab-join the row and write it to the output file. | ||
3922 : | my $rowText = join("\t", @row); | ||
3923 : | print DTXOUT "$rowText\n"; | ||
3924 : | } | ||
3925 : | # Close the output file. | ||
3926 : | close DTXOUT; | ||
3927 : | parrello | 1.1 | } |
3928 : | |||
3929 : | parrello | 1.70 | =head3 _GetStructure |
3930 : | parrello | 1.1 | |
3931 : | Get the data structure for a specified entity or relationship. | ||
3932 : | |||
3933 : | This is an instance method. | ||
3934 : | |||
3935 : | =over 4 | ||
3936 : | |||
3937 : | =item objectName | ||
3938 : | |||
3939 : | Name of the desired entity or relationship. | ||
3940 : | |||
3941 : | =item RETURN | ||
3942 : | |||
3943 : | The descriptor for the specified object. | ||
3944 : | |||
3945 : | =back | ||
3946 : | |||
3947 : | =cut | ||
3948 : | |||
3949 : | sub _GetStructure { | ||
3950 : | parrello | 1.10 | # Get the parameters. |
3951 : | my ($self, $objectName) = @_; | ||
3952 : | # Get the metadata structure. | ||
3953 : | my $metadata = $self->{_metaData}; | ||
3954 : | # Declare the variable to receive the descriptor. | ||
3955 : | my $retVal; | ||
3956 : | # Get the descriptor from the metadata. | ||
3957 : | if (exists $metadata->{Entities}->{$objectName}) { | ||
3958 : | $retVal = $metadata->{Entities}->{$objectName}; | ||
3959 : | } elsif (exists $metadata->{Relationships}->{$objectName}) { | ||
3960 : | $retVal = $metadata->{Relationships}->{$objectName}; | ||
3961 : | } else { | ||
3962 : | Confess("Object $objectName not found in database."); | ||
3963 : | } | ||
3964 : | # Return the descriptor. | ||
3965 : | return $retVal; | ||
3966 : | parrello | 1.1 | } |
3967 : | |||
3968 : | parrello | 1.70 | |
3969 : | |||
3970 : | =head3 _GetRelationTable | ||
3971 : | parrello | 1.1 | |
3972 : | Get the list of relations for a specified entity or relationship. | ||
3973 : | |||
3974 : | This is an instance method. | ||
3975 : | |||
3976 : | =over 4 | ||
3977 : | |||
3978 : | =item objectName | ||
3979 : | |||
3980 : | Name of the desired entity or relationship. | ||
3981 : | |||
3982 : | =item RETURN | ||
3983 : | |||
3984 : | A table containing the relations for the specified object. | ||
3985 : | |||
3986 : | =back | ||
3987 : | |||
3988 : | =cut | ||
3989 : | |||
3990 : | sub _GetRelationTable { | ||
3991 : | parrello | 1.10 | # Get the parameters. |
3992 : | my ($self, $objectName) = @_; | ||
3993 : | # Get the descriptor from the metadata. | ||
3994 : | my $objectData = $self->_GetStructure($objectName); | ||
3995 : | # Return the object's relation list. | ||
3996 : | return $objectData->{Relations}; | ||
3997 : | parrello | 1.1 | } |
3998 : | |||
3999 : | parrello | 1.70 | =head3 _ValidateFieldNames |
4000 : | parrello | 1.1 | |
4001 : | Determine whether or not the field names are valid. A description of the problems with the names | ||
4002 : | will be written to the standard error output. If there is an error, this method will abort. This is | ||
4003 : | a static method. | ||
4004 : | |||
4005 : | =over 4 | ||
4006 : | |||
4007 : | =item metadata | ||
4008 : | |||
4009 : | Metadata structure loaded from the XML data definition. | ||
4010 : | |||
4011 : | =back | ||
4012 : | |||
4013 : | =cut | ||
4014 : | |||
4015 : | sub _ValidateFieldNames { | ||
4016 : | parrello | 1.10 | # Get the object. |
4017 : | my ($metadata) = @_; | ||
4018 : | # Declare the return value. We assume success. | ||
4019 : | my $retVal = 1; | ||
4020 : | # Loop through the sections of the database definition. | ||
4021 : | for my $section ('Entities', 'Relationships') { | ||
4022 : | # Loop through the objects in this section. | ||
4023 : | for my $object (values %{$metadata->{$section}}) { | ||
4024 : | # Loop through the object's fields. | ||
4025 : | for my $fieldName (keys %{$object->{Fields}}) { | ||
4026 : | parrello | 1.73 | # If this field name is invalid, set the return value to zero |
4027 : | # so we know we encountered an error. | ||
4028 : | if (! ValidateFieldName($fieldName)) { | ||
4029 : | parrello | 1.10 | $retVal = 0; |
4030 : | } | ||
4031 : | } | ||
4032 : | } | ||
4033 : | } | ||
4034 : | # If an error was found, fail. | ||
4035 : | if ($retVal == 0) { | ||
4036 : | Confess("Errors found in field names."); | ||
4037 : | } | ||
4038 : | parrello | 1.1 | } |
4039 : | |||
4040 : | parrello | 1.70 | =head3 _LoadRelation |
4041 : | parrello | 1.1 | |
4042 : | Load a relation from the data in a tab-delimited disk file. The load will only take place if a disk | ||
4043 : | file with the same name as the relation exists in the specified directory. | ||
4044 : | |||
4045 : | This is an instance method. | ||
4046 : | |||
4047 : | =over 4 | ||
4048 : | |||
4049 : | =item dbh | ||
4050 : | |||
4051 : | DBKernel object for accessing the database. | ||
4052 : | |||
4053 : | =item directoryName | ||
4054 : | |||
4055 : | Name of the directory containing the tab-delimited data files. | ||
4056 : | |||
4057 : | =item relationName | ||
4058 : | |||
4059 : | Name of the relation to load. | ||
4060 : | |||
4061 : | =item rebuild | ||
4062 : | |||
4063 : | TRUE if the table should be dropped and re-created before loading. | ||
4064 : | |||
4065 : | =item RETURN | ||
4066 : | |||
4067 : | Returns a statistical object describing the number of records read and a list of error messages. | ||
4068 : | |||
4069 : | =back | ||
4070 : | |||
4071 : | =cut | ||
4072 : | |||
4073 : | sub _LoadRelation { | ||
4074 : | parrello | 1.10 | # Get the parameters. |
4075 : | my ($self, $directoryName, $relationName, $rebuild) = @_; | ||
4076 : | # Create the file name. | ||
4077 : | my $fileName = "$directoryName/$relationName"; | ||
4078 : | # If the file doesn't exist, try adding the .dtx suffix. | ||
4079 : | if (! -e $fileName) { | ||
4080 : | $fileName .= ".dtx"; | ||
4081 : | if (! -e $fileName) { | ||
4082 : | $fileName = ""; | ||
4083 : | } | ||
4084 : | } | ||
4085 : | # Create the return object. | ||
4086 : | my $retVal = _GetLoadStats(); | ||
4087 : | # If a file exists to load the table, its name will be in $fileName. Otherwise, $fileName will | ||
4088 : | # be a null string. | ||
4089 : | if ($fileName ne "") { | ||
4090 : | # Load the relation from the file. | ||
4091 : | $retVal = $self->LoadTable($fileName, $relationName, $rebuild); | ||
4092 : | } elsif ($rebuild) { | ||
4093 : | # Here we are rebuilding, but no file exists, so we just re-create the table. | ||
4094 : | $self->CreateTable($relationName, 1); | ||
4095 : | } | ||
4096 : | # Return the statistics from the load. | ||
4097 : | return $retVal; | ||
4098 : | parrello | 1.1 | } |
4099 : | |||
4100 : | parrello | 1.73 | |
4101 : | parrello | 1.70 | =head3 _LoadMetaData |
4102 : | parrello | 1.1 | |
4103 : | This method loads the data describing this database from an XML file into a metadata structure. | ||
4104 : | The resulting structure is a set of nested hash tables containing all the information needed to | ||
4105 : | load or use the database. The schema for the XML file is F<ERDatabase.xml>. | ||
4106 : | |||
4107 : | This is a static method. | ||
4108 : | |||
4109 : | =over 4 | ||
4110 : | |||
4111 : | =item filename | ||
4112 : | |||
4113 : | Name of the file containing the database definition. | ||
4114 : | |||
4115 : | =item RETURN | ||
4116 : | |||
4117 : | Returns a structure describing the database. | ||
4118 : | |||
4119 : | =back | ||
4120 : | |||
4121 : | =cut | ||
4122 : | |||
4123 : | sub _LoadMetaData { | ||
4124 : | parrello | 1.10 | # Get the parameters. |
4125 : | my ($filename) = @_; | ||
4126 : | parrello | 1.75 | Trace("Reading DBD from $filename.") if T(2); |
4127 : | parrello | 1.10 | # Slurp the XML file into a variable. Extensive use of options is used to insure we |
4128 : | # get the exact structure we want. | ||
4129 : | parrello | 1.73 | my $metadata = ReadMetaXML($filename); |
4130 : | parrello | 1.10 | # Before we go any farther, we need to validate the field and object names. If an error is found, |
4131 : | # the method below will fail. | ||
4132 : | _ValidateFieldNames($metadata); | ||
4133 : | # Next we need to create a hash table for finding relations. The entities and relationships are | ||
4134 : | # implemented as one or more database relations. | ||
4135 : | my %masterRelationTable = (); | ||
4136 : | # Loop through the entities. | ||
4137 : | my $entityList = $metadata->{Entities}; | ||
4138 : | for my $entityName (keys %{$entityList}) { | ||
4139 : | my $entityStructure = $entityList->{$entityName}; | ||
4140 : | # | ||
4141 : | parrello | 1.12 | # The first step is to create all the entity's default values. For C<Field> elements, |
4142 : | parrello | 1.10 | # the relation name must be added where it is not specified. For relationships, |
4143 : | # the B<from-link> and B<to-link> fields must be inserted, and for entities an B<id> | ||
4144 : | # field must be added to each relation. Finally, each field will have a C<PrettySort> attribute | ||
4145 : | # added that can be used to pull the implicit fields to the top when displaying the field | ||
4146 : | # documentation. The PrettySort values are 1-based and indicate in which pass through a | ||
4147 : | # relation's data the field should be displayed-- 1 for the first pass, 2 for the second, | ||
4148 : | # and so on. | ||
4149 : | # | ||
4150 : | # Fix up this entity. | ||
4151 : | _FixupFields($entityStructure, $entityName, 2, 3); | ||
4152 : | # Add the ID field. | ||
4153 : | _AddField($entityStructure, 'id', { type => $entityStructure->{keyType}, | ||
4154 : | relation => $entityName, | ||
4155 : | Notes => { content => "Unique identifier for this \[b\]$entityName\[/b\]." }, | ||
4156 : | PrettySort => 1}); | ||
4157 : | # | ||
4158 : | # The current field list enables us to quickly find the relation containing a particular field. | ||
4159 : | # We also need a list that tells us the fields in each relation. We do this by creating a | ||
4160 : | # Relations structure in the entity structure and collating the fields into it based on their | ||
4161 : | # C<relation> property. There is one tricky bit, which is that every relation has to have the | ||
4162 : | # C<id> field in it. Note also that the field list is put into a C<Fields> member of the | ||
4163 : | # relation's structure so that it looks more like the entity and relationship structures. | ||
4164 : | # | ||
4165 : | # First we need to create the relations list. | ||
4166 : | my $relationTable = { }; | ||
4167 : | # Loop through the fields. We use a list of field names to prevent a problem with | ||
4168 : | # the hash table cursor losing its place during the loop. | ||
4169 : | my $fieldList = $entityStructure->{Fields}; | ||
4170 : | my @fieldNames = keys %{$fieldList}; | ||
4171 : | for my $fieldName (@fieldNames) { | ||
4172 : | my $fieldData = $fieldList->{$fieldName}; | ||
4173 : | # Get the current field's relation name. | ||
4174 : | my $relationName = $fieldData->{relation}; | ||
4175 : | # Insure the relation exists. | ||
4176 : | if (!exists $relationTable->{$relationName}) { | ||
4177 : | $relationTable->{$relationName} = { Fields => { } }; | ||
4178 : | } | ||
4179 : | # Add the field to the relation's field structure. | ||
4180 : | $relationTable->{$relationName}->{Fields}->{$fieldName} = $fieldData; | ||
4181 : | } | ||
4182 : | # Now that we've organized all our fields by relation name we need to do some serious | ||
4183 : | # housekeeping. We must add the C<id> field to every relation and convert each relation | ||
4184 : | # to a list of fields. First, we need the ID field itself. | ||
4185 : | my $idField = $fieldList->{id}; | ||
4186 : | # Loop through the relations. | ||
4187 : | for my $relationName (keys %{$relationTable}) { | ||
4188 : | my $relation = $relationTable->{$relationName}; | ||
4189 : | # Get the relation's field list. | ||
4190 : | my $relationFieldList = $relation->{Fields}; | ||
4191 : | # Add the ID field to it. If the field's already there, it will not make any | ||
4192 : | # difference. | ||
4193 : | $relationFieldList->{id} = $idField; | ||
4194 : | # Convert the field set from a hash into a list using the pretty-sort number. | ||
4195 : | $relation->{Fields} = _ReOrderRelationTable($relationFieldList); | ||
4196 : | # Add the relation to the master table. | ||
4197 : | $masterRelationTable{$relationName} = $relation; | ||
4198 : | } | ||
4199 : | # The indexes come next. The primary relation will have a unique-keyed index based on the ID field. | ||
4200 : | # The other relations must have at least one index that begins with the ID field. In addition, the | ||
4201 : | # metadata may require alternate indexes. We do those alternate indexes first. To begin, we need to | ||
4202 : | # get the entity's field list and index list. | ||
4203 : | my $indexList = $entityStructure->{Indexes}; | ||
4204 : | # Loop through the indexes. | ||
4205 : | for my $indexData (@{$indexList}) { | ||
4206 : | # We need to find this index's fields. All of them should belong to the same relation. | ||
4207 : | # The ID field is an exception, since it's in all relations. | ||
4208 : | my $relationName = '0'; | ||
4209 : | for my $fieldDescriptor (@{$indexData->{IndexFields}}) { | ||
4210 : | # Get this field's name. | ||
4211 : | my $fieldName = $fieldDescriptor->{name}; | ||
4212 : | # Only proceed if it is NOT the ID field. | ||
4213 : | if ($fieldName ne 'id') { | ||
4214 : | # Find the relation containing the current index field. | ||
4215 : | my $thisName = $fieldList->{$fieldName}->{relation}; | ||
4216 : | if ($relationName eq '0') { | ||
4217 : | # Here we're looking at the first field, so we save its relation name. | ||
4218 : | $relationName = $thisName; | ||
4219 : | } elsif ($relationName ne $thisName) { | ||
4220 : | # Here we have a field mismatch. | ||
4221 : | Confess("Mixed index: field $fieldName does not belong to relation $relationName."); | ||
4222 : | } | ||
4223 : | } | ||
4224 : | } | ||
4225 : | # Now $relationName is the name of the relation that contains this index. Add the index structure | ||
4226 : | # to the relation. | ||
4227 : | push @{$relationTable->{$relationName}->{Indexes}}, $indexData; | ||
4228 : | } | ||
4229 : | # Now each index has been put in a relation. We need to add the primary index for the primary | ||
4230 : | # relation. | ||
4231 : | push @{$relationTable->{$entityName}->{Indexes}}, | ||
4232 : | { IndexFields => [ {name => 'id', order => 'ascending'} ], Unique => 'true', | ||
4233 : | Notes => { content => "Primary index for $entityName." } | ||
4234 : | }; | ||
4235 : | # The next step is to insure that each relation has at least one index that begins with the ID field. | ||
4236 : | # After that, we convert each relation's index list to an index table. We first need to loop through | ||
4237 : | # the relations. | ||
4238 : | for my $relationName (keys %{$relationTable}) { | ||
4239 : | my $relation = $relationTable->{$relationName}; | ||
4240 : | # Get the relation's index list. | ||
4241 : | my $indexList = $relation->{Indexes}; | ||
4242 : | # Insure this relation has an ID index. | ||
4243 : | my $found = 0; | ||
4244 : | for my $index (@{$indexList}) { | ||
4245 : | if ($index->{IndexFields}->[0]->{name} eq "id") { | ||
4246 : | $found = 1; | ||
4247 : | } | ||
4248 : | } | ||
4249 : | if ($found == 0) { | ||
4250 : | push @{$indexList}, { IndexFields => [ {name => 'id', order => 'ascending'} ] }; | ||
4251 : | } | ||
4252 : | # Now we need to convert the relation's index list to an index table. We begin by creating | ||
4253 : | # an empty table in the relation structure. | ||
4254 : | $relation->{Indexes} = { }; | ||
4255 : | # Loop through the indexes. | ||
4256 : | my $count = 0; | ||
4257 : | for my $index (@{$indexList}) { | ||
4258 : | # Add this index to the index table. | ||
4259 : | parrello | 1.75 | _AddIndex("idx$count", $relation, $index); |
4260 : | parrello | 1.10 | # Increment the counter so that the next index has a different name. |
4261 : | $count++; | ||
4262 : | } | ||
4263 : | } | ||
4264 : | # Finally, we add the relation structure to the entity. | ||
4265 : | $entityStructure->{Relations} = $relationTable; | ||
4266 : | } | ||
4267 : | # Loop through the relationships. Relationships actually turn out to be much simpler than entities. | ||
4268 : | # For one thing, there is only a single constituent relation. | ||
4269 : | my $relationshipList = $metadata->{Relationships}; | ||
4270 : | for my $relationshipName (keys %{$relationshipList}) { | ||
4271 : | my $relationshipStructure = $relationshipList->{$relationshipName}; | ||
4272 : | # Fix up this relationship. | ||
4273 : | _FixupFields($relationshipStructure, $relationshipName, 2, 3); | ||
4274 : | # Format a description for the FROM field. | ||
4275 : | my $fromEntity = $relationshipStructure->{from}; | ||
4276 : | parrello | 1.83 | my $fromComment = "[b]id[/b] of the source [b][link #$fromEntity]$fromEntity\[/link][/b]."; |
4277 : | parrello | 1.10 | # Get the FROM entity's key type. |
4278 : | my $fromType = $entityList->{$fromEntity}->{keyType}; | ||
4279 : | # Add the FROM field. | ||
4280 : | _AddField($relationshipStructure, 'from-link', { type => $fromType, | ||
4281 : | relation => $relationshipName, | ||
4282 : | Notes => { content => $fromComment }, | ||
4283 : | PrettySort => 1}); | ||
4284 : | # Format a description for the TO field. | ||
4285 : | my $toEntity = $relationshipStructure->{to}; | ||
4286 : | parrello | 1.83 | my $toComment = "[b]id[/b] of the target [b][link #$toEntity]$toEntity\[/link][/b]."; |
4287 : | parrello | 1.10 | # Get the TO entity's key type. |
4288 : | my $toType = $entityList->{$toEntity}->{keyType}; | ||
4289 : | # Add the TO field. | ||
4290 : | _AddField($relationshipStructure, 'to-link', { type=> $toType, | ||
4291 : | relation => $relationshipName, | ||
4292 : | Notes => { content => $toComment }, | ||
4293 : | PrettySort => 1}); | ||
4294 : | # Create an index-free relation from the fields. | ||
4295 : | my $thisRelation = { Fields => _ReOrderRelationTable($relationshipStructure->{Fields}), | ||
4296 : | Indexes => { } }; | ||
4297 : | $relationshipStructure->{Relations} = { $relationshipName => $thisRelation }; | ||
4298 : | # Create the FROM and TO indexes. | ||
4299 : | _CreateRelationshipIndex("From", $relationshipName, $relationshipStructure); | ||
4300 : | _CreateRelationshipIndex("To", $relationshipName, $relationshipStructure); | ||
4301 : | # Add the relation to the master table. | ||
4302 : | $masterRelationTable{$relationshipName} = $thisRelation; | ||
4303 : | } | ||
4304 : | # Now store the master relation table in the metadata structure. | ||
4305 : | $metadata->{RelationTable} = \%masterRelationTable; | ||
4306 : | # Our final task is to create the join table. The join table is a hash that describes all | ||
4307 : | # the join clauses for traveling through the relationships. The join clause is an equality | ||
4308 : | # condition that can be put into a WHERE clause in order to join two objects. Two relationships | ||
4309 : | # can be joined if they share an entity in common; and an entity can be joined to a relationship | ||
4310 : | # if the entity is at either end of the relationship. | ||
4311 : | my %joinTable = (); | ||
4312 : | # Loop through the entities. | ||
4313 : | for my $entityName (keys %{$entityList}) { | ||
4314 : | # Build three lists of the relationships connected to this entity. One will be | ||
4315 : | # for relationships from the entity, one for relationships to the entity, and | ||
4316 : | # one for recursive relationships. | ||
4317 : | my @fromList = (); | ||
4318 : | my @toList = (); | ||
4319 : | my @bothList = (); | ||
4320 : | parrello | 1.21 | Trace("Join table build for $entityName.") if T(metadata => 4); |
4321 : | parrello |