Parent Directory
|
Revision Log
Revision 1.76 - (view) (download) (as text)
1 : | parrello | 1.1 | package ERDB; |
2 : | |||
3 : | parrello | 1.10 | use strict; |
4 : | use Tracer; | ||
5 : | olson | 1.14 | use DBrtns; |
6 : | parrello | 1.10 | use Data::Dumper; |
7 : | use XML::Simple; | ||
8 : | use DBQuery; | ||
9 : | use DBObject; | ||
10 : | use Stats; | ||
11 : | use Time::HiRes qw(gettimeofday); | ||
12 : | parrello | 1.42 | use Digest::MD5 qw(md5_base64); |
13 : | parrello | 1.19 | use FIG; |
14 : | parrello | 1.73 | use CGI; |
15 : | parrello | 1.1 | |
16 : | =head1 Entity-Relationship Database Package | ||
17 : | |||
18 : | =head2 Introduction | ||
19 : | |||
20 : | The Entity-Relationship Database Package allows the client to create an easily-configurable | ||
21 : | database of Entities connected by Relationships. Each entity is represented by one or more | ||
22 : | relations in an underlying SQL database. Each relationship is represented by a single | ||
23 : | relation that connects two entities. | ||
24 : | |||
25 : | Although this package is designed for general use, all examples are derived from the | ||
26 : | Sprout database, which is the first database implemented using this package. | ||
27 : | |||
28 : | Each entity has at least one relation, the I<primary relation>, that has the same name as | ||
29 : | the entity. The primary relation contains a field named C<id> that contains the unique | ||
30 : | identifier of each entity instance. An entity may have additional relations that contain | ||
31 : | fields which are optional or can occur more than once. For example, the B<FEATURE> entity | ||
32 : | has a B<feature-type> attribute that occurs exactly once for each feature. This attribute | ||
33 : | is implemented by a C<feature_type> column in the primary relation C<Feature>. In addition, | ||
34 : | however, a feature may have zero or more aliases. These are implemented using a C<FeatureAlias> | ||
35 : | relation that contains two fields-- the feature ID (C<id>) and the alias name (C<alias>). | ||
36 : | The B<FEATURE> entity also contains an optional virulence number. This is implemented | ||
37 : | as a separate relation C<FeatureVirulence> which contains an ID (C<id>) and a virulence number | ||
38 : | parrello | 1.8 | (C<virulence>). If the virulence of a feature I<ABC> is known to be 6, there will be one row in |
39 : | the C<FeatureVirulence> relation possessing the value I<ABC> as its ID and 6 as its virulence | ||
40 : | number. If the virulence of I<ABC> is not known, there will not be any rows for it in | ||
41 : | C<FeatureVirulence>. | ||
42 : | parrello | 1.1 | |
43 : | Entities are connected by binary relationships implemented using single relations possessing the | ||
44 : | same name as the relationship itself and that has an I<arity> of 1-to-1 (C<11>), 1-to-many (C<1M>), | ||
45 : | or many-to-many (C<MM>). Each relationship's relation contains a C<from-link> field that contains the | ||
46 : | ID of the source entity and a C<to-link> field that contains the ID of the target entity. The name | ||
47 : | of the relationship is generally a verb phrase with the source entity as the subject and the | ||
48 : | target entity as the object. So, for example, the B<ComesFrom> relationship connects the B<GENOME> | ||
49 : | and B<SOURCE> entities, and indicates that a particular source organization participated in the | ||
50 : | mapping of the genome. A source organization frequently participates in the mapping | ||
51 : | of many genomes, and many source organizations can cooperate in the mapping of a single genome, so | ||
52 : | this relationship has an arity of many-to-many (C<MM>). The relation that implements the B<ComesFrom> | ||
53 : | relationship is called C<ComesFrom> and contains two fields-- C<from-link>, which contains a genome ID, | ||
54 : | and C<to-link>, which contains a source ID. | ||
55 : | |||
56 : | A relationship may itself have attributes. These attributes, known as I<intersection data attributes>, | ||
57 : | are implemented as additional fields in the relationship's relation. So, for example, the | ||
58 : | B<IsMadeUpOf> relationship connects the B<Contig> entity to the B<Sequence> entity, and is used | ||
59 : | to determine which sequences make up a contig. The relationship has as an attribute the | ||
60 : | B<start-position>, which indicates where in the contig that the sequence begins. This attribute | ||
61 : | is implemented as the C<start_position> field in the C<IsMadeUpOf> relation. | ||
62 : | |||
63 : | parrello | 1.73 | The database itself is described by an XML file. In addition to all the data required to define |
64 : | the entities, relationships, and attributes, the schema provides space for notes describing | ||
65 : | the data and what it means. These notes are used by L</ShowMetaData> to generate documentation | ||
66 : | for the database. | ||
67 : | |||
68 : | Special support is provided for text searching. An entity field can be marked as <em>searchable</em>, | ||
69 : | in which case it will be used to generate a text search index in which the user searches for words | ||
70 : | in the field instead of a particular field value. | ||
71 : | parrello | 1.1 | |
72 : | Finally, every entity and relationship object has a flag indicating if it is new or old. The object | ||
73 : | is considered I<old> if it was loaded by the L</LoadTables> method. It is considered I<new> if it | ||
74 : | was inserted by the L</InsertObject> method. | ||
75 : | |||
76 : | parrello | 1.8 | =head2 XML Database Description |
77 : | |||
78 : | =head3 Data Types | ||
79 : | |||
80 : | The ERDB system supports the following data types. Note that there are numerous string | ||
81 : | types depending on the maximum length. Some database packages limit the total number of | ||
82 : | characters you have in an index key; to insure the database works in all environments, | ||
83 : | the type of string should be the shortest one possible that supports all the known values. | ||
84 : | |||
85 : | =over 4 | ||
86 : | |||
87 : | =item char | ||
88 : | |||
89 : | single ASCII character | ||
90 : | |||
91 : | =item int | ||
92 : | |||
93 : | 32-bit signed integer | ||
94 : | |||
95 : | parrello | 1.60 | =item counter |
96 : | |||
97 : | 32-bit unsigned integer | ||
98 : | |||
99 : | parrello | 1.8 | =item date |
100 : | |||
101 : | 64-bit unsigned integer, representing a PERL date/time value | ||
102 : | |||
103 : | =item text | ||
104 : | |||
105 : | long string; Text fields cannot be used in indexes or sorting and do not support the | ||
106 : | normal syntax of filter clauses, but can be up to a billion character in length | ||
107 : | |||
108 : | =item float | ||
109 : | |||
110 : | double-precision floating-point number | ||
111 : | |||
112 : | =item boolean | ||
113 : | |||
114 : | single-bit numeric value; The value is stored as a 16-bit signed integer (for | ||
115 : | compatability with certain database packages), but the only values supported are | ||
116 : | 0 and 1. | ||
117 : | |||
118 : | parrello | 1.44 | =item id-string |
119 : | |||
120 : | variable-length string, maximum 25 characters | ||
121 : | |||
122 : | parrello | 1.8 | =item key-string |
123 : | |||
124 : | variable-length string, maximum 40 characters | ||
125 : | |||
126 : | =item name-string | ||
127 : | |||
128 : | variable-length string, maximum 80 characters | ||
129 : | |||
130 : | =item medium-string | ||
131 : | |||
132 : | variable-length string, maximum 160 characters | ||
133 : | |||
134 : | =item string | ||
135 : | |||
136 : | variable-length string, maximum 255 characters | ||
137 : | |||
138 : | parrello | 1.42 | =item hash-string |
139 : | |||
140 : | variable-length string, maximum 22 characters | ||
141 : | |||
142 : | parrello | 1.8 | =back |
143 : | |||
144 : | parrello | 1.42 | The hash-string data type has a special meaning. The actual key passed into the loader will |
145 : | be a string, but it will be digested into a 22-character MD5 code to save space. Although the | ||
146 : | MD5 algorithm is not perfect, it is extremely unlikely two strings will have the same | ||
147 : | digest. Therefore, it is presumed the keys will be unique. When the database is actually | ||
148 : | in use, the hashed keys will be presented rather than the original values. For this reason, | ||
149 : | they should not be used for entities where the key is meaningful. | ||
150 : | |||
151 : | parrello | 1.8 | =head3 Global Tags |
152 : | |||
153 : | The entire database definition must be inside a B<Database> tag. The display name of | ||
154 : | the database is given by the text associated with the B<Title> tag. The display name | ||
155 : | is only used in the automated documentation. It has no other effect. The entities and | ||
156 : | relationships are listed inside the B<Entities> and B<Relationships> tags, | ||
157 : | respectively. None of these tags have attributes. | ||
158 : | |||
159 : | parrello | 1.10 | <Database> |
160 : | <Title>... display title here...</Title> | ||
161 : | <Entities> | ||
162 : | ... entity definitions here ... | ||
163 : | </Entities> | ||
164 : | <Relationships> | ||
165 : | ... relationship definitions here... | ||
166 : | </Relationships> | ||
167 : | </Database> | ||
168 : | parrello | 1.8 | |
169 : | Entities, relationships, indexes, and fields all allow a text tag called B<Notes>. | ||
170 : | The text inside the B<Notes> tag contains comments that will appear when the database | ||
171 : | documentation is generated. Within a B<Notes> tag, you may use C<[i]> and C<[/i]> for | ||
172 : | italics, C<[b]> and C<[/b]> for bold, and C<[p]> for a new paragraph. | ||
173 : | |||
174 : | =head3 Fields | ||
175 : | |||
176 : | Both entities and relationships have fields described by B<Field> tags. A B<Field> | ||
177 : | tag can have B<Notes> associated with it. The complete set of B<Field> tags for an | ||
178 : | object mus be inside B<Fields> tags. | ||
179 : | |||
180 : | parrello | 1.10 | <Entity ... > |
181 : | <Fields> | ||
182 : | ... Field tags ... | ||
183 : | </Fields> | ||
184 : | </Entity> | ||
185 : | parrello | 1.8 | |
186 : | The attributes for the B<Field> tag are as follows. | ||
187 : | |||
188 : | =over 4 | ||
189 : | |||
190 : | =item name | ||
191 : | |||
192 : | Name of the field. The field name should contain only letters, digits, and hyphens (C<->), | ||
193 : | and the first character should be a letter. Most underlying databases are case-insensitive | ||
194 : | parrello | 1.70 | with the respect to field names, so a best practice is to use lower-case letters only. Finally, |
195 : | the name C<search-relevance> has special meaning for full-text searches and should not be | ||
196 : | used as a field name. | ||
197 : | parrello | 1.8 | |
198 : | =item type | ||
199 : | |||
200 : | Data type of the field. The legal data types are given above. | ||
201 : | |||
202 : | =item relation | ||
203 : | |||
204 : | Name of the relation containing the field. This should only be specified for entity | ||
205 : | fields. The ERDB system does not support optional fields or multi-occurring fields | ||
206 : | in the primary relation of an entity. Instead, they are put into secondary relations. | ||
207 : | So, for example, in the C<Genome> entity, the C<group-name> field indicates a special | ||
208 : | grouping used to select a subset of the genomes. A given genome may not be in any | ||
209 : | groups or may be in multiple groups. Therefore, C<group-name> specifies a relation | ||
210 : | value. The relation name specified must be a valid table name. By convention, it is | ||
211 : | usually the entity name followed by a qualifying word (e.g. C<GenomeGroup>). In an | ||
212 : | entity, the fields without a relation attribute are said to belong to the | ||
213 : | I<primary relation>. This relation has the same name as the entity itself. | ||
214 : | |||
215 : | parrello | 1.70 | =item searchable |
216 : | |||
217 : | If specified, then the field is a candidate for full-text searching. A single full-text | ||
218 : | index will be created for each relation with at least one searchable field in it. | ||
219 : | For best results, this option should only be used for string or text fields. | ||
220 : | |||
221 : | parrello | 1.73 | =item special |
222 : | |||
223 : | This attribute allows the subclass to assign special meaning for certain fields. | ||
224 : | The interpretation is up to the subclass itself. Currently, only entity fields | ||
225 : | can have this attribute. | ||
226 : | |||
227 : | parrello | 1.8 | =back |
228 : | |||
229 : | =head3 Indexes | ||
230 : | |||
231 : | An entity can have multiple alternate indexes associated with it. The fields must | ||
232 : | parrello | 1.74 | all be from the same relation. The alternate indexes assist in ordering results |
233 : | parrello | 1.8 | from a query. A relationship can have up to two indexes-- a I<to-index> and a |
234 : | I<from-index>. These order the results when crossing the relationship. For | ||
235 : | example, in the relationship C<HasContig> from C<Genome> to C<Contig>, the | ||
236 : | from-index would order the contigs of a ganome, and the to-index would order | ||
237 : | the genomes of a contig. A relationship's index must specify only fields in | ||
238 : | the relationship. | ||
239 : | |||
240 : | The indexes for an entity must be listed inside the B<Indexes> tag. The from-index | ||
241 : | of a relationship is specified using the B<FromIndex> tag; the to-index is specified | ||
242 : | using the B<ToIndex> tag. | ||
243 : | |||
244 : | Each index can contain a B<Notes> tag. In addition, it will have an B<IndexFields> | ||
245 : | tag containing the B<IndexField> tags. These specify, in order, the fields used in | ||
246 : | the index. The attributes of an B<IndexField> tag are as follows. | ||
247 : | |||
248 : | =over 4 | ||
249 : | |||
250 : | =item name | ||
251 : | |||
252 : | Name of the field. | ||
253 : | |||
254 : | =item order | ||
255 : | |||
256 : | Sort order of the field-- C<ascending> or C<descending>. | ||
257 : | |||
258 : | =back | ||
259 : | |||
260 : | The B<Index>, B<FromIndex>, and B<ToIndex> tags themselves have no attributes. | ||
261 : | |||
262 : | =head3 Object and Field Names | ||
263 : | |||
264 : | By convention entity and relationship names use capital casing (e.g. C<Genome> or | ||
265 : | C<HasRegionsIn>. Most underlying databases, however, are aggressively case-insensitive | ||
266 : | with respect to relation names, converting them internally to all-upper case or | ||
267 : | all-lower case. | ||
268 : | |||
269 : | If syntax or parsing errors occur when you try to load or use an ERDB database, the | ||
270 : | most likely reason is that one of your objects has an SQL reserved word as its name. | ||
271 : | The list of SQL reserved words keeps increasing; however, most are unlikely to show | ||
272 : | up as a noun or declarative verb phrase. The exceptions are C<Group>, C<User>, | ||
273 : | C<Table>, C<Index>, C<Object>, C<Date>, C<Number>, C<Update>, C<Time>, C<Percent>, | ||
274 : | C<Memo>, C<Order>, and C<Sum>. This problem can crop up in field names as well. | ||
275 : | |||
276 : | Every entity has a field called C<id> that acts as its primary key. Every relationship | ||
277 : | has fields called C<from-link> and C<to-link> that contain copies of the relevant | ||
278 : | entity IDs. These are essentially ERDB's reserved words, and should not be used | ||
279 : | for user-defined field names. | ||
280 : | |||
281 : | =head3 Entities | ||
282 : | |||
283 : | An entity is described by the B<Entity> tag. The entity can contain B<Notes>, an | ||
284 : | B<Indexes> tag containing one or more secondary indexes, and a B<Fields> tag | ||
285 : | containing one or more fields. The attributes of the B<Entity> tag are as follows. | ||
286 : | |||
287 : | =over 4 | ||
288 : | |||
289 : | =item name | ||
290 : | |||
291 : | Name of the entity. The entity name, by convention, uses capital casing (e.g. C<Genome> | ||
292 : | or C<GroupBlock>) and should be a noun or noun phrase. | ||
293 : | |||
294 : | =item keyType | ||
295 : | |||
296 : | Data type of the primary key. The primary key is always named C<id>. | ||
297 : | |||
298 : | =back | ||
299 : | |||
300 : | =head3 Relationships | ||
301 : | |||
302 : | A relationship is described by the C<Relationship> tag. Within a relationship, | ||
303 : | there can be a C<Notes> tag, a C<Fields> tag containing the intersection data | ||
304 : | fields, a C<FromIndex> tag containing the from-index, and a C<ToIndex> tag containing | ||
305 : | the to-index. | ||
306 : | |||
307 : | The C<Relationship> tag has the following attributes. | ||
308 : | |||
309 : | =over 4 | ||
310 : | |||
311 : | =item name | ||
312 : | |||
313 : | Name of the relationship. The relationship name, by convention, uses capital casing | ||
314 : | (e.g. C<ContainsRegionIn> or C<HasContig>), and should be a declarative verb | ||
315 : | phrase, designed to fit between the from-entity and the to-entity (e.g. | ||
316 : | Block C<ContainsRegionIn> Genome). | ||
317 : | |||
318 : | =item from | ||
319 : | |||
320 : | Name of the entity from which the relationship starts. | ||
321 : | |||
322 : | =item to | ||
323 : | |||
324 : | Name of the entity to which the relationship proceeds. | ||
325 : | |||
326 : | =item arity | ||
327 : | |||
328 : | Relationship type: C<1M> for one-to-many and C<MM> for many-to-many. | ||
329 : | |||
330 : | =back | ||
331 : | |||
332 : | parrello | 1.1 | =cut |
333 : | |||
334 : | # GLOBALS | ||
335 : | |||
336 : | # Table of information about our datatypes. "sqlType" is the corresponding SQL datatype string. | ||
337 : | # "maxLen" is the maximum permissible length of the incoming string data used to populate a field | ||
338 : | parrello | 1.73 | # of the specified type. "avgLen" is the average byte length for estimating |
339 : | parrello | 1.75 | # record sizes. "sort" is the key modifier for the sort command, "notes" is a type description, |
340 : | # and "indexMod", if non-zero, is the number of characters to use when the field is specified in an | ||
341 : | # index | ||
342 : | parrello | 1.73 | my %TypeTable = ( char => { sqlType => 'CHAR(1)', maxLen => 1, avgLen => 1, sort => "", |
343 : | parrello | 1.75 | indexMod => 0, notes => "single ASCII character"}, |
344 : | parrello | 1.73 | int => { sqlType => 'INTEGER', maxLen => 20, avgLen => 4, sort => "n", |
345 : | parrello | 1.75 | indexMod => 0, notes => "signed 32-bit integer"}, |
346 : | parrello | 1.73 | counter => { sqlType => 'INTEGER UNSIGNED', maxLen => 20, avgLen => 4, sort => "n", |
347 : | parrello | 1.75 | indexMod => 0, notes => "unsigned 32-bit integer"}, |
348 : | parrello | 1.73 | string => { sqlType => 'VARCHAR(255)', maxLen => 255, avgLen => 100, sort => "", |
349 : | parrello | 1.75 | indexMod => 0, notes => "character string, 0 to 255 characters"}, |
350 : | parrello | 1.73 | text => { sqlType => 'TEXT', maxLen => 1000000000, avgLen => 500, sort => "", |
351 : | parrello | 1.75 | indexMod => 255, notes => "character string, nearly unlimited length, only first 255 characters are indexed"}, |
352 : | parrello | 1.73 | date => { sqlType => 'BIGINT', maxLen => 80, avgLen => 8, sort => "n", |
353 : | parrello | 1.75 | indexMod => 0, notes => "signed, 64-bit integer"}, |
354 : | parrello | 1.73 | float => { sqlType => 'DOUBLE PRECISION', maxLen => 40, avgLen => 8, sort => "g", |
355 : | parrello | 1.75 | indexMod => 0, notes => "64-bit double precision floating-point number"}, |
356 : | parrello | 1.73 | boolean => { sqlType => 'SMALLINT', maxLen => 1, avgLen => 1, sort => "n", |
357 : | parrello | 1.75 | indexMod => 0, notes => "boolean value: 0 if false, 1 if true"}, |
358 : | parrello | 1.42 | 'hash-string' => |
359 : | parrello | 1.73 | { sqlType => 'VARCHAR(22)', maxLen => 22, avgLen => 22, sort => "", |
360 : | parrello | 1.75 | indexMod => 0, notes => "string stored in digested form, used for certain types of key fields"}, |
361 : | parrello | 1.44 | 'id-string' => |
362 : | parrello | 1.73 | { sqlType => 'VARCHAR(25)', maxLen => 25, avgLen => 25, sort => "", |
363 : | parrello | 1.75 | indexMod => 0, notes => "character string, 0 to 25 characters"}, |
364 : | parrello | 1.10 | 'key-string' => |
365 : | parrello | 1.73 | { sqlType => 'VARCHAR(40)', maxLen => 40, avgLen => 10, sort => "", |
366 : | parrello | 1.75 | indexMod => 0, notes => "character string, 0 to 40 characters"}, |
367 : | parrello | 1.10 | 'name-string' => |
368 : | parrello | 1.73 | { sqlType => 'VARCHAR(80)', maxLen => 80, avgLen => 40, sort => "", |
369 : | parrello | 1.75 | indexMod => 0, notes => "character string, 0 to 80 characters"}, |
370 : | parrello | 1.10 | 'medium-string' => |
371 : | parrello | 1.73 | { sqlType => 'VARCHAR(160)', maxLen => 160, avgLen => 40, sort => "", |
372 : | parrello | 1.75 | indexMod => 0, notes => "character string, 0 to 160 characters"}, |
373 : | parrello | 1.10 | ); |
374 : | parrello | 1.1 | |
375 : | # Table translating arities into natural language. | ||
376 : | my %ArityTable = ( '11' => 'one-to-one', | ||
377 : | parrello | 1.10 | '1M' => 'one-to-many', |
378 : | 'MM' => 'many-to-many' | ||
379 : | ); | ||
380 : | parrello | 1.1 | |
381 : | parrello | 1.73 | # Options for XML input and output. |
382 : | |||
383 : | my %XmlOptions = (GroupTags => { Relationships => 'Relationship', | ||
384 : | Entities => 'Entity', | ||
385 : | Fields => 'Field', | ||
386 : | Indexes => 'Index', | ||
387 : | IndexFields => 'IndexField' | ||
388 : | }, | ||
389 : | KeyAttr => { Relationship => 'name', | ||
390 : | Entity => 'name', | ||
391 : | Field => 'name' | ||
392 : | }, | ||
393 : | SuppressEmpty => 1, | ||
394 : | ); | ||
395 : | |||
396 : | my %XmlInOpts = ( | ||
397 : | ForceArray => ['Field', 'Index', 'IndexField'], | ||
398 : | ForceContent => 1, | ||
399 : | NormalizeSpace => 2, | ||
400 : | ); | ||
401 : | my %XmlOutOpts = ( | ||
402 : | RootName => 'Database', | ||
403 : | XMLDecl => 1, | ||
404 : | ); | ||
405 : | parrello | 1.1 | |
406 : | |||
407 : | =head2 Public Methods | ||
408 : | |||
409 : | =head3 new | ||
410 : | |||
411 : | parrello | 1.5 | C<< my $database = ERDB->new($dbh, $metaFileName); >> |
412 : | parrello | 1.1 | |
413 : | Create a new ERDB object. | ||
414 : | |||
415 : | =over 4 | ||
416 : | |||
417 : | =item dbh | ||
418 : | |||
419 : | DBKernel database object for the target database. | ||
420 : | |||
421 : | =item metaFileName | ||
422 : | |||
423 : | Name of the XML file containing the metadata. | ||
424 : | |||
425 : | =back | ||
426 : | |||
427 : | =cut | ||
428 : | |||
429 : | sub new { | ||
430 : | parrello | 1.10 | # Get the parameters. |
431 : | my ($class, $dbh, $metaFileName, $options) = @_; | ||
432 : | # Load the meta-data. | ||
433 : | my $metaData = _LoadMetaData($metaFileName); | ||
434 : | # Create the object. | ||
435 : | my $self = { _dbh => $dbh, | ||
436 : | _metaData => $metaData | ||
437 : | }; | ||
438 : | # Bless and return it. | ||
439 : | bless $self, $class; | ||
440 : | return $self; | ||
441 : | parrello | 1.1 | } |
442 : | |||
443 : | =head3 ShowMetaData | ||
444 : | |||
445 : | parrello | 1.18 | C<< $erdb->ShowMetaData($fileName); >> |
446 : | parrello | 1.1 | |
447 : | This method outputs a description of the database. This description can be used to help users create | ||
448 : | the data to be loaded into the relations. | ||
449 : | |||
450 : | =over 4 | ||
451 : | |||
452 : | =item filename | ||
453 : | |||
454 : | The name of the output file. | ||
455 : | |||
456 : | =back | ||
457 : | |||
458 : | =cut | ||
459 : | |||
460 : | sub ShowMetaData { | ||
461 : | parrello | 1.10 | # Get the parameters. |
462 : | my ($self, $filename) = @_; | ||
463 : | # Get the metadata and the title string. | ||
464 : | my $metadata = $self->{_metaData}; | ||
465 : | # Get the title string. | ||
466 : | my $title = $metadata->{Title}; | ||
467 : | # Get the entity and relationship lists. | ||
468 : | my $entityList = $metadata->{Entities}; | ||
469 : | my $relationshipList = $metadata->{Relationships}; | ||
470 : | # Open the output file. | ||
471 : | open(HTMLOUT, ">$filename") || Confess("Could not open MetaData display file $filename: $!"); | ||
472 : | Trace("Building MetaData table of contents.") if T(4); | ||
473 : | # Write the HTML heading stuff. | ||
474 : | print HTMLOUT "<html>\n<head>\n<title>$title</title>\n"; | ||
475 : | print HTMLOUT "</head>\n<body>\n"; | ||
476 : | parrello | 1.45 | # Write the documentation. |
477 : | print HTMLOUT $self->DisplayMetaData(); | ||
478 : | # Close the document. | ||
479 : | print HTMLOUT "</body>\n</html>\n"; | ||
480 : | # Close the file. | ||
481 : | close HTMLOUT; | ||
482 : | } | ||
483 : | |||
484 : | =head3 DisplayMetaData | ||
485 : | |||
486 : | C<< my $html = $erdb->DisplayMetaData(); >> | ||
487 : | |||
488 : | Return an HTML description of the database. This description can be used to help users create | ||
489 : | the data to be loaded into the relations and form queries. The output is raw includable HTML | ||
490 : | without any HEAD or BODY tags. | ||
491 : | |||
492 : | =over 4 | ||
493 : | |||
494 : | =item filename | ||
495 : | |||
496 : | The name of the output file. | ||
497 : | |||
498 : | =back | ||
499 : | |||
500 : | =cut | ||
501 : | |||
502 : | sub DisplayMetaData { | ||
503 : | # Get the parameters. | ||
504 : | my ($self) = @_; | ||
505 : | # Get the metadata and the title string. | ||
506 : | my $metadata = $self->{_metaData}; | ||
507 : | # Get the title string. | ||
508 : | my $title = $metadata->{Title}; | ||
509 : | # Get the entity and relationship lists. | ||
510 : | my $entityList = $metadata->{Entities}; | ||
511 : | my $relationshipList = $metadata->{Relationships}; | ||
512 : | # Declare the return variable. | ||
513 : | my $retVal = ""; | ||
514 : | # Open the output file. | ||
515 : | Trace("Building MetaData table of contents.") if T(4); | ||
516 : | parrello | 1.10 | # Here we do the table of contents. It starts as an unordered list of section names. Each |
517 : | # section contains an ordered list of entity or relationship subsections. | ||
518 : | parrello | 1.45 | $retVal .= "<ul>\n<li><a href=\"#EntitiesSection\">Entities</a>\n<ol>\n"; |
519 : | parrello | 1.10 | # Loop through the Entities, displaying a list item for each. |
520 : | foreach my $key (sort keys %{$entityList}) { | ||
521 : | # Display this item. | ||
522 : | parrello | 1.45 | $retVal .= "<li><a href=\"#$key\">$key</a></li>\n"; |
523 : | parrello | 1.10 | } |
524 : | # Close off the entity section and start the relationship section. | ||
525 : | parrello | 1.45 | $retVal .= "</ol></li>\n<li><a href=\"#RelationshipsSection\">Relationships</a>\n<ol>\n"; |
526 : | parrello | 1.10 | # Loop through the Relationships. |
527 : | foreach my $key (sort keys %{$relationshipList}) { | ||
528 : | # Display this item. | ||
529 : | my $relationshipTitle = _ComputeRelationshipSentence($key, $relationshipList->{$key}); | ||
530 : | parrello | 1.45 | $retVal .= "<li><a href=\"#$key\">$relationshipTitle</a></li>\n"; |
531 : | parrello | 1.10 | } |
532 : | # Close off the relationship section and list the join table section. | ||
533 : | parrello | 1.45 | $retVal .= "</ol></li>\n<li><a href=\"#JoinTable\">Join Table</a></li>\n"; |
534 : | parrello | 1.10 | # Close off the table of contents itself. |
535 : | parrello | 1.45 | $retVal .= "</ul>\n"; |
536 : | parrello | 1.10 | # Now we start with the actual data. Denote we're starting the entity section. |
537 : | parrello | 1.45 | $retVal .= "<a name=\"EntitiesSection\"></a><h2>Entities</h2>\n"; |
538 : | parrello | 1.10 | # Loop through the entities. |
539 : | for my $key (sort keys %{$entityList}) { | ||
540 : | Trace("Building MetaData entry for $key entity.") if T(4); | ||
541 : | # Create the entity header. It contains a bookmark and the entity name. | ||
542 : | parrello | 1.45 | $retVal .= "<a name=\"$key\"></a><h3>$key</h3>\n"; |
543 : | parrello | 1.10 | # Get the entity data. |
544 : | my $entityData = $entityList->{$key}; | ||
545 : | # If there's descriptive text, display it. | ||
546 : | if (my $notes = $entityData->{Notes}) { | ||
547 : | parrello | 1.73 | $retVal .= "<p>" . HTMLNote($notes->{content}) . "</p>\n"; |
548 : | parrello | 1.10 | } |
549 : | # Now we want a list of the entity's relationships. First, we set up the relationship subsection. | ||
550 : | parrello | 1.45 | $retVal .= "<h4>Relationships for <b>$key</b></h4>\n<ul>\n"; |
551 : | parrello | 1.10 | # Loop through the relationships. |
552 : | for my $relationship (sort keys %{$relationshipList}) { | ||
553 : | # Get the relationship data. | ||
554 : | my $relationshipStructure = $relationshipList->{$relationship}; | ||
555 : | # Only use the relationship if if has this entity in its FROM or TO fields. | ||
556 : | if ($relationshipStructure->{from} eq $key || $relationshipStructure->{to} eq $key) { | ||
557 : | # Get the relationship sentence and append the arity. | ||
558 : | my $relationshipDescription = _ComputeRelationshipSentence($relationship, $relationshipStructure); | ||
559 : | # Display the relationship data. | ||
560 : | parrello | 1.45 | $retVal .= "<li><a href=\"#$relationship\">$relationshipDescription</a></li>\n"; |
561 : | parrello | 1.10 | } |
562 : | } | ||
563 : | # Close off the relationship list. | ||
564 : | parrello | 1.45 | $retVal .= "</ul>\n"; |
565 : | parrello | 1.10 | # Get the entity's relations. |
566 : | my $relationList = $entityData->{Relations}; | ||
567 : | # Create a header for the relation subsection. | ||
568 : | parrello | 1.45 | $retVal .= "<h4>Relations for <b>$key</b></h4>\n"; |
569 : | parrello | 1.10 | # Loop through the relations, displaying them. |
570 : | for my $relation (sort keys %{$relationList}) { | ||
571 : | my $htmlString = _ShowRelationTable($relation, $relationList->{$relation}); | ||
572 : | parrello | 1.45 | $retVal .= $htmlString; |
573 : | parrello | 1.10 | } |
574 : | } | ||
575 : | # Denote we're starting the relationship section. | ||
576 : | parrello | 1.45 | $retVal .= "<a name=\"RelationshipsSection\"></a><h2>Relationships</h2>\n"; |
577 : | parrello | 1.10 | # Loop through the relationships. |
578 : | for my $key (sort keys %{$relationshipList}) { | ||
579 : | Trace("Building MetaData entry for $key relationship.") if T(4); | ||
580 : | # Get the relationship's structure. | ||
581 : | my $relationshipStructure = $relationshipList->{$key}; | ||
582 : | # Create the relationship header. | ||
583 : | my $headerText = _ComputeRelationshipHeading($key, $relationshipStructure); | ||
584 : | parrello | 1.45 | $retVal .= "<h3><a name=\"$key\"></a>$headerText</h3>\n"; |
585 : | parrello | 1.10 | # Get the entity names. |
586 : | my $fromEntity = $relationshipStructure->{from}; | ||
587 : | my $toEntity = $relationshipStructure->{to}; | ||
588 : | # Describe the relationship arity. Note there's a bit of trickiness involving recursive | ||
589 : | # many-to-many relationships. In a normal many-to-many we use two sentences to describe | ||
590 : | # the arity (one for each direction). This is a bad idea for a recursive relationship, | ||
591 : | # since both sentences will say the same thing. | ||
592 : | my $arity = $relationshipStructure->{arity}; | ||
593 : | if ($arity eq "11") { | ||
594 : | parrello | 1.45 | $retVal .= "<p>Each <b>$fromEntity</b> relates to at most one <b>$toEntity</b>.\n"; |
595 : | parrello | 1.10 | } else { |
596 : | parrello | 1.45 | $retVal .= "<p>Each <b>$fromEntity</b> relates to multiple <b>$toEntity</b>s.\n"; |
597 : | parrello | 1.10 | if ($arity eq "MM" && $fromEntity ne $toEntity) { |
598 : | parrello | 1.45 | $retVal .= "Each <b>$toEntity</b> relates to multiple <b>$fromEntity</b>s.\n"; |
599 : | parrello | 1.10 | } |
600 : | } | ||
601 : | parrello | 1.45 | $retVal .= "</p>\n"; |
602 : | parrello | 1.10 | # If there are notes on this relationship, display them. |
603 : | if (my $notes = $relationshipStructure->{Notes}) { | ||
604 : | parrello | 1.73 | $retVal .= "<p>" . HTMLNote($notes->{content}) . "</p>\n"; |
605 : | parrello | 1.10 | } |
606 : | # Generate the relationship's relation table. | ||
607 : | my $htmlString = _ShowRelationTable($key, $relationshipStructure->{Relations}->{$key}); | ||
608 : | parrello | 1.45 | $retVal .= $htmlString; |
609 : | parrello | 1.10 | } |
610 : | Trace("Building MetaData join table.") if T(4); | ||
611 : | # Denote we're starting the join table. | ||
612 : | parrello | 1.45 | $retVal .= "<a name=\"JoinTable\"></a><h3>Join Table</h3>\n"; |
613 : | parrello | 1.10 | # Create a table header. |
614 : | parrello | 1.45 | $retVal .= _OpenTable("Join Table", "Source", "Target", "Join Condition"); |
615 : | parrello | 1.10 | # Loop through the joins. |
616 : | my $joinTable = $metadata->{Joins}; | ||
617 : | my @joinKeys = keys %{$joinTable}; | ||
618 : | for my $joinKey (sort @joinKeys) { | ||
619 : | # Separate out the source, the target, and the join clause. | ||
620 : | $joinKey =~ m!^([^/]+)/(.+)$!; | ||
621 : | my ($sourceRelation, $targetRelation) = ($1, $2); | ||
622 : | parrello | 1.30 | Trace("Join with key $joinKey is from $sourceRelation to $targetRelation.") if T(Joins => 4); |
623 : | parrello | 1.10 | my $source = $self->ComputeObjectSentence($sourceRelation); |
624 : | my $target = $self->ComputeObjectSentence($targetRelation); | ||
625 : | my $clause = $joinTable->{$joinKey}; | ||
626 : | # Display them in a table row. | ||
627 : | parrello | 1.45 | $retVal .= "<tr><td>$source</td><td>$target</td><td>$clause</td></tr>\n"; |
628 : | parrello | 1.10 | } |
629 : | # Close the table. | ||
630 : | parrello | 1.45 | $retVal .= _CloseTable(); |
631 : | Trace("Built MetaData HTML.") if T(3); | ||
632 : | # Return the HTML. | ||
633 : | return $retVal; | ||
634 : | parrello | 1.1 | } |
635 : | |||
636 : | =head3 DumpMetaData | ||
637 : | |||
638 : | parrello | 1.18 | C<< $erdb->DumpMetaData(); >> |
639 : | parrello | 1.1 | |
640 : | Return a dump of the metadata structure. | ||
641 : | |||
642 : | =cut | ||
643 : | |||
644 : | sub DumpMetaData { | ||
645 : | parrello | 1.10 | # Get the parameters. |
646 : | my ($self) = @_; | ||
647 : | # Dump the meta-data. | ||
648 : | return Data::Dumper::Dumper($self->{_metaData}); | ||
649 : | parrello | 1.1 | } |
650 : | |||
651 : | parrello | 1.74 | =head3 FindIndexForEntity |
652 : | |||
653 : | C<< my $indexFound = ERDB::FindIndexForEntity($xml, $entityName, $attributeName); >> | ||
654 : | |||
655 : | This method locates the entry in an entity's index list that begins with the | ||
656 : | specified attribute name. If the entity has no index list, one will be | ||
657 : | created. This method works on raw XML, not a live ERDB object. | ||
658 : | |||
659 : | =over 4 | ||
660 : | |||
661 : | =item xml | ||
662 : | |||
663 : | The raw XML structure defining the database. | ||
664 : | |||
665 : | =item entityName | ||
666 : | |||
667 : | The name of the relevant entity. | ||
668 : | |||
669 : | =item attributeName | ||
670 : | |||
671 : | The name of the attribute relevant to the search. | ||
672 : | |||
673 : | =item RETURN | ||
674 : | |||
675 : | The numerical index in the index list of the index entry for the specified entity and | ||
676 : | attribute, or C<undef> if no such index exists. | ||
677 : | |||
678 : | =back | ||
679 : | |||
680 : | =cut | ||
681 : | |||
682 : | sub FindIndexForEntity { | ||
683 : | # Get the parameters. | ||
684 : | my ($xml, $entityName, $attributeName) = @_; | ||
685 : | # Declare the return variable. | ||
686 : | my $retVal; | ||
687 : | # Get the named entity. | ||
688 : | my $entityData = $xml->{Entities}->{$entityName}; | ||
689 : | if (! $entityData) { | ||
690 : | Confess("Entity $entityName not found in DBD structure."); | ||
691 : | } else { | ||
692 : | # Insure it has an index list. | ||
693 : | if (! exists $entityData->{Indexes}) { | ||
694 : | $entityData->{Indexes} = []; | ||
695 : | } else { | ||
696 : | # Search for the desired index. | ||
697 : | my $indexList = $entityData->{Indexes}; | ||
698 : | my $n = scalar @{$indexList}; | ||
699 : | Trace("Searching $n indexes in index list for $entityName.") if T(2); | ||
700 : | # We use an indexed FOR here because we're returning an | ||
701 : | # index number instead of an object. We do THAT so we can | ||
702 : | # delete the index from the list if needed. | ||
703 : | for (my $i = 0; $i < $n && !defined($retVal); $i++) { | ||
704 : | my $index = $indexList->[$i]; | ||
705 : | my $fields = $index->{IndexFields}; | ||
706 : | # Technically this IF should be safe (that is, we are guaranteed | ||
707 : | # the existence of a "$fields->[0]"), because when we load the XML | ||
708 : | # we have SuppressEmpty specified. | ||
709 : | if ($fields->[0]->{name} eq $attributeName) { | ||
710 : | $retVal = $i; | ||
711 : | } | ||
712 : | } | ||
713 : | } | ||
714 : | } | ||
715 : | Trace("Index for $attributeName of $entityName found at position $retVal.") if defined($retVal) && T(3); | ||
716 : | Trace("Index for $attributeName not found in $entityName.") if !defined($retVal) && T(3); | ||
717 : | # Return the result. | ||
718 : | return $retVal; | ||
719 : | } | ||
720 : | |||
721 : | parrello | 1.1 | =head3 CreateTables |
722 : | |||
723 : | parrello | 1.18 | C<< $erdb->CreateTables(); >> |
724 : | parrello | 1.1 | |
725 : | This method creates the tables for the database from the metadata structure loaded by the | ||
726 : | constructor. It is expected this function will only be used on rare occasions, when the | ||
727 : | parrello | 1.2 | user needs to start with an empty database. Otherwise, the L</LoadTables> method can be |
728 : | parrello | 1.1 | used by itself with the truncate flag turned on. |
729 : | |||
730 : | =cut | ||
731 : | |||
732 : | sub CreateTables { | ||
733 : | parrello | 1.10 | # Get the parameters. |
734 : | my ($self) = @_; | ||
735 : | parrello | 1.23 | # Get the relation names. |
736 : | my @relNames = $self->GetTableNames(); | ||
737 : | # Loop through the relations. | ||
738 : | for my $relationName (@relNames) { | ||
739 : | # Create a table for this relation. | ||
740 : | $self->CreateTable($relationName); | ||
741 : | Trace("Relation $relationName created.") if T(2); | ||
742 : | parrello | 1.10 | } |
743 : | parrello | 1.1 | } |
744 : | |||
745 : | =head3 CreateTable | ||
746 : | |||
747 : | parrello | 1.18 | C<< $erdb->CreateTable($tableName, $indexFlag, $estimatedRows); >> |
748 : | parrello | 1.1 | |
749 : | Create the table for a relation and optionally create its indexes. | ||
750 : | |||
751 : | =over 4 | ||
752 : | |||
753 : | =item relationName | ||
754 : | |||
755 : | Name of the relation (which will also be the table name). | ||
756 : | |||
757 : | parrello | 1.18 | =item indexFlag |
758 : | parrello | 1.1 | |
759 : | TRUE if the indexes for the relation should be created, else FALSE. If FALSE, | ||
760 : | L</CreateIndexes> must be called later to bring the indexes into existence. | ||
761 : | |||
762 : | parrello | 1.18 | =item estimatedRows (optional) |
763 : | |||
764 : | If specified, the estimated maximum number of rows for the relation. This | ||
765 : | information allows the creation of tables using storage engines that are | ||
766 : | faster but require size estimates, such as MyISAM. | ||
767 : | |||
768 : | parrello | 1.1 | =back |
769 : | |||
770 : | =cut | ||
771 : | |||
772 : | sub CreateTable { | ||
773 : | parrello | 1.10 | # Get the parameters. |
774 : | parrello | 1.18 | my ($self, $relationName, $indexFlag, $estimatedRows) = @_; |
775 : | parrello | 1.10 | # Get the database handle. |
776 : | my $dbh = $self->{_dbh}; | ||
777 : | # Get the relation data and determine whether or not the relation is primary. | ||
778 : | my $relationData = $self->_FindRelation($relationName); | ||
779 : | my $rootFlag = $self->_IsPrimary($relationName); | ||
780 : | # Create a list of the field data. | ||
781 : | my @fieldList; | ||
782 : | for my $fieldData (@{$relationData->{Fields}}) { | ||
783 : | # Assemble the field name and type. | ||
784 : | my $fieldName = _FixName($fieldData->{name}); | ||
785 : | my $fieldString = "$fieldName $TypeTable{$fieldData->{type}}->{sqlType} NOT NULL "; | ||
786 : | # Push the result into the field list. | ||
787 : | push @fieldList, $fieldString; | ||
788 : | } | ||
789 : | # If this is a root table, add the "new_record" flag. It defaults to 0, so | ||
790 : | if ($rootFlag) { | ||
791 : | push @fieldList, "new_record $TypeTable{boolean}->{sqlType} NOT NULL DEFAULT 0"; | ||
792 : | } | ||
793 : | # Convert the field list into a comma-delimited string. | ||
794 : | my $fieldThing = join(', ', @fieldList); | ||
795 : | # Insure the table is not already there. | ||
796 : | $dbh->drop_table(tbl => $relationName); | ||
797 : | Trace("Table $relationName dropped.") if T(2); | ||
798 : | parrello | 1.18 | # If there are estimated rows, create an estimate so we can take advantage of |
799 : | # faster DB technologies. | ||
800 : | my $estimation = undef; | ||
801 : | if ($estimatedRows) { | ||
802 : | $estimation = [$self->EstimateRowSize($relationName), $estimatedRows]; | ||
803 : | } | ||
804 : | parrello | 1.10 | # Create the table. |
805 : | Trace("Creating table $relationName: $fieldThing") if T(2); | ||
806 : | parrello | 1.18 | $dbh->create_table(tbl => $relationName, flds => $fieldThing, estimates => $estimation); |
807 : | parrello | 1.10 | Trace("Relation $relationName created in database.") if T(2); |
808 : | parrello | 1.70 | # If we want to build the indexes, we do it here. Note that the full-text search |
809 : | # index will not be built until the table has been loaded. | ||
810 : | parrello | 1.10 | if ($indexFlag) { |
811 : | $self->CreateIndex($relationName); | ||
812 : | } | ||
813 : | parrello | 1.1 | } |
814 : | |||
815 : | parrello | 1.31 | =head3 VerifyFields |
816 : | |||
817 : | C<< my $count = $erdb->VerifyFields($relName, \@fieldList); >> | ||
818 : | |||
819 : | Run through the list of proposed field values, insuring that all the character fields are | ||
820 : | below the maximum length. If any fields are too long, they will be truncated in place. | ||
821 : | |||
822 : | =over 4 | ||
823 : | |||
824 : | =item relName | ||
825 : | |||
826 : | Name of the relation for which the specified fields are destined. | ||
827 : | |||
828 : | =item fieldList | ||
829 : | |||
830 : | Reference to a list, in order, of the fields to be put into the relation. | ||
831 : | |||
832 : | =item RETURN | ||
833 : | |||
834 : | Returns the number of fields truncated. | ||
835 : | |||
836 : | =back | ||
837 : | |||
838 : | =cut | ||
839 : | |||
840 : | sub VerifyFields { | ||
841 : | # Get the parameters. | ||
842 : | my ($self, $relName, $fieldList) = @_; | ||
843 : | # Initialize the return value. | ||
844 : | my $retVal = 0; | ||
845 : | # Get the relation definition. | ||
846 : | my $relData = $self->_FindRelation($relName); | ||
847 : | # Get the list of field descriptors. | ||
848 : | my $fieldTypes = $relData->{Fields}; | ||
849 : | my $fieldCount = scalar @{$fieldTypes}; | ||
850 : | # Loop through the two lists. | ||
851 : | for (my $i = 0; $i < $fieldCount; $i++) { | ||
852 : | # Get the type of the current field. | ||
853 : | my $fieldType = $fieldTypes->[$i]->{type}; | ||
854 : | # If it's a character field, verify the length. | ||
855 : | if ($fieldType =~ /string/) { | ||
856 : | my $maxLen = $TypeTable{$fieldType}->{maxLen}; | ||
857 : | my $oldString = $fieldList->[$i]; | ||
858 : | if (length($oldString) > $maxLen) { | ||
859 : | # Here it's too big, so we truncate it. | ||
860 : | Trace("Truncating field $i in relation $relName to $maxLen characters from \"$oldString\".") if T(1); | ||
861 : | $fieldList->[$i] = substr $oldString, 0, $maxLen; | ||
862 : | $retVal++; | ||
863 : | } | ||
864 : | } | ||
865 : | } | ||
866 : | # Return the truncation count. | ||
867 : | return $retVal; | ||
868 : | } | ||
869 : | |||
870 : | parrello | 1.42 | =head3 DigestFields |
871 : | |||
872 : | C<< $erdb->DigestFields($relName, $fieldList); >> | ||
873 : | |||
874 : | Digest the strings in the field list that correspond to data type C<hash-string> in the | ||
875 : | specified relation. | ||
876 : | |||
877 : | =over 4 | ||
878 : | |||
879 : | =item relName | ||
880 : | |||
881 : | Name of the relation to which the fields belong. | ||
882 : | |||
883 : | =item fieldList | ||
884 : | |||
885 : | List of field contents to be loaded into the relation. | ||
886 : | |||
887 : | =back | ||
888 : | |||
889 : | =cut | ||
890 : | #: Return Type ; | ||
891 : | sub DigestFields { | ||
892 : | # Get the parameters. | ||
893 : | my ($self, $relName, $fieldList) = @_; | ||
894 : | # Get the relation definition. | ||
895 : | my $relData = $self->_FindRelation($relName); | ||
896 : | # Get the list of field descriptors. | ||
897 : | my $fieldTypes = $relData->{Fields}; | ||
898 : | my $fieldCount = scalar @{$fieldTypes}; | ||
899 : | # Loop through the two lists. | ||
900 : | for (my $i = 0; $i < $fieldCount; $i++) { | ||
901 : | # Get the type of the current field. | ||
902 : | my $fieldType = $fieldTypes->[$i]->{type}; | ||
903 : | # If it's a hash string, digest it in place. | ||
904 : | if ($fieldType eq 'hash-string') { | ||
905 : | parrello | 1.46 | $fieldList->[$i] = $self->DigestKey($fieldList->[$i]); |
906 : | parrello | 1.42 | } |
907 : | } | ||
908 : | } | ||
909 : | |||
910 : | parrello | 1.46 | =head3 DigestKey |
911 : | |||
912 : | C<< my $digested = $erdb->DigestKey($keyValue); >> | ||
913 : | |||
914 : | Return the digested value of a symbolic key. The digested value can then be plugged into a | ||
915 : | key-based search into a table with key-type hash-string. | ||
916 : | |||
917 : | Currently the digesting process is independent of the database structure, but that may not | ||
918 : | always be the case, so this is an instance method instead of a static method. | ||
919 : | |||
920 : | =over 4 | ||
921 : | |||
922 : | =item keyValue | ||
923 : | |||
924 : | Key value to digest. | ||
925 : | |||
926 : | =item RETURN | ||
927 : | |||
928 : | parrello | 1.56 | Digested value of the key. |
929 : | parrello | 1.46 | |
930 : | =back | ||
931 : | |||
932 : | =cut | ||
933 : | |||
934 : | sub DigestKey { | ||
935 : | # Get the parameters. | ||
936 : | my ($self, $keyValue) = @_; | ||
937 : | # Compute the digest. | ||
938 : | my $retVal = md5_base64($keyValue); | ||
939 : | # Return the result. | ||
940 : | return $retVal; | ||
941 : | } | ||
942 : | |||
943 : | parrello | 1.1 | =head3 CreateIndex |
944 : | |||
945 : | parrello | 1.18 | C<< $erdb->CreateIndex($relationName); >> |
946 : | parrello | 1.1 | |
947 : | Create the indexes for a relation. If a table is being loaded from a large source file (as | ||
948 : | parrello | 1.12 | is the case in L</LoadTable>), it is sometimes best to create the indexes after the load. |
949 : | If that is the case, then L</CreateTable> should be called with the index flag set to | ||
950 : | FALSE, and this method used after the load to create the indexes for the table. | ||
951 : | parrello | 1.1 | |
952 : | =cut | ||
953 : | |||
954 : | sub CreateIndex { | ||
955 : | parrello | 1.10 | # Get the parameters. |
956 : | my ($self, $relationName) = @_; | ||
957 : | # Get the relation's descriptor. | ||
958 : | my $relationData = $self->_FindRelation($relationName); | ||
959 : | # Get the database handle. | ||
960 : | my $dbh = $self->{_dbh}; | ||
961 : | # Now we need to create this relation's indexes. We do this by looping through its index table. | ||
962 : | my $indexHash = $relationData->{Indexes}; | ||
963 : | for my $indexName (keys %{$indexHash}) { | ||
964 : | my $indexData = $indexHash->{$indexName}; | ||
965 : | # Get the index's field list. | ||
966 : | parrello | 1.75 | my @rawFields = @{$indexData->{IndexFields}}; |
967 : | # Get a hash of the relation's field types. | ||
968 : | my %types = map { $_->{name} => $_->{type} } @{$relationData->{Fields}}; | ||
969 : | # We need to check for text fields. We need a append a length limitation for them. To do | ||
970 : | # that, we need the relation's field list. | ||
971 : | my $relFields = $relationData->{Fields}; | ||
972 : | for (my $i = 0; $i <= $#rawFields; $i++) { | ||
973 : | # Get the field type. | ||
974 : | my $field = $rawFields[$i]; | ||
975 : | my $type = $types{$field}; | ||
976 : | # Ask if it requires using prefix notation for the index. | ||
977 : | my $mod = $TypeTable{$type}->{indexMod}; | ||
978 : | Trace("Field $field ($i) in $relationName has type $type and indexMod $mod.") if T(3); | ||
979 : | if ($mod) { | ||
980 : | # Append the prefix length to the field name, | ||
981 : | $rawFields[$i] .= "($mod)"; | ||
982 : | } | ||
983 : | } | ||
984 : | my @fieldList = _FixNames(@rawFields); | ||
985 : | parrello | 1.10 | my $flds = join(', ', @fieldList); |
986 : | # Get the index's uniqueness flag. | ||
987 : | parrello | 1.70 | my $unique = (exists $indexData->{Unique} ? 'unique' : undef); |
988 : | parrello | 1.10 | # Create the index. |
989 : | parrello | 1.24 | my $rv = $dbh->create_index(idx => $indexName, tbl => $relationName, |
990 : | parrello | 1.70 | flds => $flds, kind => $unique); |
991 : | parrello | 1.24 | if ($rv) { |
992 : | Trace("Index created: $indexName for $relationName ($flds)") if T(1); | ||
993 : | } else { | ||
994 : | Confess("Error creating index $indexName for $relationName using ($flds): " . $dbh->error_message()); | ||
995 : | } | ||
996 : | parrello | 1.10 | } |
997 : | parrello | 1.1 | } |
998 : | |||
999 : | parrello | 1.75 | =head3 GetSecondaryFields |
1000 : | |||
1001 : | C<< my %fieldTuples = $erdb->GetSecondaryFields($entityName); >> | ||
1002 : | |||
1003 : | This method will return a list of the name and type of each of the secondary | ||
1004 : | fields for a specified entity. Secondary fields are stored in two-column tables | ||
1005 : | in addition to the primary entity table. This enables the field to have no value | ||
1006 : | or to have multiple values. | ||
1007 : | |||
1008 : | =over 4 | ||
1009 : | |||
1010 : | =item entityName | ||
1011 : | |||
1012 : | Name of the entity whose secondary fields are desired. | ||
1013 : | |||
1014 : | =item RETURN | ||
1015 : | |||
1016 : | Returns a hash mapping the field names to their field types. | ||
1017 : | |||
1018 : | =back | ||
1019 : | |||
1020 : | =cut | ||
1021 : | |||
1022 : | sub GetSecondaryFields { | ||
1023 : | # Get the parameters. | ||
1024 : | my ($self, $entityName) = @_; | ||
1025 : | # Declare the return variable. | ||
1026 : | my %retVal = (); | ||
1027 : | # Look for the entity. | ||
1028 : | my $table = $self->GetFieldTable($entityName); | ||
1029 : | # Loop through the fields, pulling out the secondaries. | ||
1030 : | for my $field (sort keys %{$table}) { | ||
1031 : | if ($table->{$field}->{relation} ne $entityName) { | ||
1032 : | # Here we have a secondary field. | ||
1033 : | $retVal{$field} = $table->{$field}->{type}; | ||
1034 : | } | ||
1035 : | } | ||
1036 : | # Return the result. | ||
1037 : | return %retVal; | ||
1038 : | } | ||
1039 : | |||
1040 : | =head3 GetFieldRelationName | ||
1041 : | |||
1042 : | C<< my $name = $erdb->GetFieldRelationName($objectName, $fieldName); >> | ||
1043 : | |||
1044 : | Return the name of the relation containing a specified field. | ||
1045 : | |||
1046 : | =over 4 | ||
1047 : | |||
1048 : | =item objectName | ||
1049 : | |||
1050 : | Name of the entity or relationship containing the field. | ||
1051 : | |||
1052 : | =item fieldName | ||
1053 : | |||
1054 : | Name of the relevant field in that entity or relationship. | ||
1055 : | |||
1056 : | =item RETURN | ||
1057 : | |||
1058 : | Returns the name of the database relation containing the field, or C<undef> if | ||
1059 : | the field does not exist. | ||
1060 : | |||
1061 : | =back | ||
1062 : | |||
1063 : | =cut | ||
1064 : | |||
1065 : | sub GetFieldRelationName { | ||
1066 : | # Get the parameters. | ||
1067 : | my ($self, $objectName, $fieldName) = @_; | ||
1068 : | # Declare the return variable. | ||
1069 : | my $retVal; | ||
1070 : | # Get the object field table. | ||
1071 : | my $table = $self->GetFieldTable($objectName); | ||
1072 : | # Only proceed if the field exists. | ||
1073 : | if (exists $table->{$fieldName}) { | ||
1074 : | # Determine the name of the relation that contains this field. | ||
1075 : | $retVal = $table->{$fieldName}->{relation}; | ||
1076 : | } | ||
1077 : | # Return the result. | ||
1078 : | return $retVal; | ||
1079 : | } | ||
1080 : | |||
1081 : | =head3 DeleteValue | ||
1082 : | |||
1083 : | C<< my $numDeleted = $erdb->DeleteValue($entityName, $id, $fieldName, $fieldValue); >> | ||
1084 : | |||
1085 : | Delete secondary field values from the database. This method can be used to delete all | ||
1086 : | values of a specified field for a particular entity instance, or only a single value. | ||
1087 : | |||
1088 : | Secondary fields are stored in two-column relations separate from an entity's primary | ||
1089 : | table, and as a result a secondary field can legitimately have no value or multiple | ||
1090 : | values. Therefore, it makes sense to talk about deleting secondary fields where it | ||
1091 : | would not make sense for primary fields. | ||
1092 : | |||
1093 : | =over 4 | ||
1094 : | |||
1095 : | =item entityName | ||
1096 : | |||
1097 : | Name of the entity from which the fields are to be deleted. | ||
1098 : | |||
1099 : | =item id | ||
1100 : | |||
1101 : | ID of the entity instance to be processed. If the instance is not found, this | ||
1102 : | parrello | 1.76 | method will have no effect. If C<undef> is specified, all values for all of |
1103 : | the entity instances will be deleted. | ||
1104 : | parrello | 1.75 | |
1105 : | =item fieldName | ||
1106 : | |||
1107 : | Name of the field whose values are to be deleted. | ||
1108 : | |||
1109 : | =item fieldValue (optional) | ||
1110 : | |||
1111 : | Value to be deleted. If not specified, then all values of the specified field | ||
1112 : | will be deleted for the entity instance. If specified, then only the values which | ||
1113 : | match this parameter will be deleted. | ||
1114 : | |||
1115 : | =item RETURN | ||
1116 : | |||
1117 : | Returns the number of rows deleted. | ||
1118 : | |||
1119 : | =back | ||
1120 : | |||
1121 : | =cut | ||
1122 : | |||
1123 : | sub DeleteValue { | ||
1124 : | # Get the parameters. | ||
1125 : | my ($self, $entityName, $id, $fieldName, $fieldValue) = @_; | ||
1126 : | # Declare the return value. | ||
1127 : | my $retVal = 0; | ||
1128 : | # We need to set up an SQL command to do the deletion. First, we | ||
1129 : | # find the name of the field's relation. | ||
1130 : | my $table = $self->GetFieldTable($entityName); | ||
1131 : | my $field = $table->{$fieldName}; | ||
1132 : | my $relation = $field->{relation}; | ||
1133 : | # Make sure this is a secondary field. | ||
1134 : | if ($relation eq $entityName) { | ||
1135 : | Confess("Cannot delete values of $fieldName for $entityName."); | ||
1136 : | } else { | ||
1137 : | # Set up the SQL command to delete all values. | ||
1138 : | parrello | 1.76 | my $sql = "DELETE FROM $relation"; |
1139 : | # Build the filter. | ||
1140 : | my @filters = (); | ||
1141 : | my @parms = (); | ||
1142 : | # Check for a filter by ID. | ||
1143 : | if (defined $id) { | ||
1144 : | push @filters, "id = ?"; | ||
1145 : | push @parms, $id; | ||
1146 : | } | ||
1147 : | # Check for a filter by value. | ||
1148 : | parrello | 1.75 | if (defined $fieldValue) { |
1149 : | parrello | 1.76 | push @filters, "$fieldName = ?"; |
1150 : | parrello | 1.75 | push @parms, $fieldValue; |
1151 : | } | ||
1152 : | parrello | 1.76 | # Append the filters to the command. |
1153 : | if (@filters) { | ||
1154 : | $sql .= " WHERE " . join(" AND ", @filters); | ||
1155 : | } | ||
1156 : | parrello | 1.75 | # Execute the command. |
1157 : | my $dbh = $self->{_dbh}; | ||
1158 : | $retVal = $dbh->SQL($sql, 0, @parms); | ||
1159 : | } | ||
1160 : | # Return the result. | ||
1161 : | return $retVal; | ||
1162 : | } | ||
1163 : | |||
1164 : | parrello | 1.1 | =head3 LoadTables |
1165 : | |||
1166 : | parrello | 1.18 | C<< my $stats = $erdb->LoadTables($directoryName, $rebuild); >> |
1167 : | parrello | 1.1 | |
1168 : | This method will load the database tables from a directory. The tables must already have been created | ||
1169 : | in the database. (This can be done by calling L</CreateTables>.) The caller passes in a directory name; | ||
1170 : | all of the relations to be loaded must have a file in the directory with the same name as the relation | ||
1171 : | (optionally with a suffix of C<.dtx>). Each file must be a tab-delimited table of field values. Each | ||
1172 : | line of the file will be loaded as a row of the target relation table. The field values should be in | ||
1173 : | the same order as the fields in the relation tables generated by L</ShowMetaData>. The old data is | ||
1174 : | erased before the new data is loaded in. | ||
1175 : | |||
1176 : | A certain amount of translation automatically takes place. Ctrl-M characters are deleted, and | ||
1177 : | tab and new-line characters inside a field are escaped as C<\t> and C<\n>, respectively. Dates must | ||
1178 : | be entered as a Unix timestamp, that is, as an integer number of seconds since the base epoch. | ||
1179 : | |||
1180 : | =over 4 | ||
1181 : | |||
1182 : | =item directoryName | ||
1183 : | |||
1184 : | Name of the directory containing the relation files to be loaded. | ||
1185 : | |||
1186 : | =item rebuild | ||
1187 : | |||
1188 : | TRUE if the tables should be dropped and rebuilt, else FALSE. This is, unfortunately, the | ||
1189 : | only way to erase existing data in the tables, since the TRUNCATE command is not supported | ||
1190 : | by all of the DB engines we use. | ||
1191 : | |||
1192 : | =item RETURN | ||
1193 : | |||
1194 : | Returns a statistical object describing the number of records read and a list of the error messages. | ||
1195 : | |||
1196 : | =back | ||
1197 : | |||
1198 : | =cut | ||
1199 : | |||
1200 : | sub LoadTables { | ||
1201 : | parrello | 1.10 | # Get the parameters. |
1202 : | my ($self, $directoryName, $rebuild) = @_; | ||
1203 : | # Start the timer. | ||
1204 : | my $startTime = gettimeofday; | ||
1205 : | # Clean any trailing slash from the directory name. | ||
1206 : | $directoryName =~ s!/\\$!!; | ||
1207 : | # Declare the return variable. | ||
1208 : | my $retVal = Stats->new(); | ||
1209 : | parrello | 1.23 | # Get the relation names. |
1210 : | my @relNames = $self->GetTableNames(); | ||
1211 : | for my $relationName (@relNames) { | ||
1212 : | # Try to load this relation. | ||
1213 : | my $result = $self->_LoadRelation($directoryName, $relationName, $rebuild); | ||
1214 : | parrello | 1.10 | # Accumulate the statistics. |
1215 : | $retVal->Accumulate($result); | ||
1216 : | } | ||
1217 : | # Add the duration of the load to the statistical object. | ||
1218 : | $retVal->Add('duration', gettimeofday - $startTime); | ||
1219 : | # Return the accumulated statistics. | ||
1220 : | return $retVal; | ||
1221 : | parrello | 1.1 | } |
1222 : | |||
1223 : | parrello | 1.23 | |
1224 : | parrello | 1.1 | =head3 GetTableNames |
1225 : | |||
1226 : | parrello | 1.18 | C<< my @names = $erdb->GetTableNames; >> |
1227 : | parrello | 1.1 | |
1228 : | Return a list of the relations required to implement this database. | ||
1229 : | |||
1230 : | =cut | ||
1231 : | |||
1232 : | sub GetTableNames { | ||
1233 : | parrello | 1.10 | # Get the parameters. |
1234 : | my ($self) = @_; | ||
1235 : | # Get the relation list from the metadata. | ||
1236 : | my $relationTable = $self->{_metaData}->{RelationTable}; | ||
1237 : | # Return the relation names. | ||
1238 : | return keys %{$relationTable}; | ||
1239 : | parrello | 1.1 | } |
1240 : | |||
1241 : | =head3 GetEntityTypes | ||
1242 : | |||
1243 : | parrello | 1.18 | C<< my @names = $erdb->GetEntityTypes; >> |
1244 : | parrello | 1.1 | |
1245 : | Return a list of the entity type names. | ||
1246 : | |||
1247 : | =cut | ||
1248 : | |||
1249 : | sub GetEntityTypes { | ||
1250 : | parrello | 1.10 | # Get the database object. |
1251 : | my ($self) = @_; | ||
1252 : | # Get the entity list from the metadata object. | ||
1253 : | my $entityList = $self->{_metaData}->{Entities}; | ||
1254 : | # Return the list of entity names in alphabetical order. | ||
1255 : | return sort keys %{$entityList}; | ||
1256 : | parrello | 1.1 | } |
1257 : | |||
1258 : | parrello | 1.73 | =head3 GetDataTypes |
1259 : | |||
1260 : | C<< my %types = ERDB::GetDataTypes(); >> | ||
1261 : | |||
1262 : | Return a table of ERDB data types. The table returned is a hash of hashes. | ||
1263 : | The keys of the big hash are the datatypes. Each smaller hash has several | ||
1264 : | values used to manage the data. The most interesting is the SQL type (key | ||
1265 : | C<sqlType>) and the descriptive node (key C<notes>). | ||
1266 : | |||
1267 : | Note that changing the values in the smaller hashes will seriously break | ||
1268 : | things, so this data should be treated as read-only. | ||
1269 : | |||
1270 : | =cut | ||
1271 : | |||
1272 : | sub GetDataTypes { | ||
1273 : | return %TypeTable; | ||
1274 : | } | ||
1275 : | |||
1276 : | |||
1277 : | parrello | 1.20 | =head3 IsEntity |
1278 : | |||
1279 : | C<< my $flag = $erdb->IsEntity($entityName); >> | ||
1280 : | |||
1281 : | Return TRUE if the parameter is an entity name, else FALSE. | ||
1282 : | |||
1283 : | =over 4 | ||
1284 : | |||
1285 : | =item entityName | ||
1286 : | |||
1287 : | Object name to be tested. | ||
1288 : | |||
1289 : | =item RETURN | ||
1290 : | |||
1291 : | Returns TRUE if the specified string is an entity name, else FALSE. | ||
1292 : | |||
1293 : | =back | ||
1294 : | |||
1295 : | =cut | ||
1296 : | |||
1297 : | sub IsEntity { | ||
1298 : | # Get the parameters. | ||
1299 : | my ($self, $entityName) = @_; | ||
1300 : | # Test to see if it's an entity. | ||
1301 : | return exists $self->{_metaData}->{Entities}->{$entityName}; | ||
1302 : | } | ||
1303 : | |||
1304 : | parrello | 1.1 | =head3 Get |
1305 : | |||
1306 : | parrello | 1.45 | C<< my $query = $erdb->Get(\@objectNames, $filterClause, \@params); >> |
1307 : | parrello | 1.1 | |
1308 : | This method returns a query object for entities of a specified type using a specified filter. | ||
1309 : | The filter is a standard WHERE/ORDER BY clause with question marks as parameter markers and each | ||
1310 : | field name represented in the form B<I<objectName>(I<fieldName>)>. For example, the | ||
1311 : | following call requests all B<Genome> objects for the genus specified in the variable | ||
1312 : | $genus. | ||
1313 : | |||
1314 : | parrello | 1.45 | C<< $query = $erdb->Get(['Genome'], "Genome(genus) = ?", [$genus]); >> |
1315 : | parrello | 1.1 | |
1316 : | The WHERE clause contains a single question mark, so there is a single additional | ||
1317 : | parameter representing the parameter value. It would also be possible to code | ||
1318 : | |||
1319 : | parrello | 1.18 | C<< $query = $erdb->Get(['Genome'], "Genome(genus) = \'$genus\'"); >> |
1320 : | parrello | 1.1 | |
1321 : | however, this version of the call would generate a syntax error if there were any quote | ||
1322 : | characters inside the variable C<$genus>. | ||
1323 : | |||
1324 : | The use of the strange parenthesized notation for field names enables us to distinguish | ||
1325 : | hyphens contained within field names from minus signs that participate in the computation | ||
1326 : | of the WHERE clause. All of the methods that manipulate fields will use this same notation. | ||
1327 : | |||
1328 : | It is possible to specify multiple entity and relationship names in order to retrieve more than | ||
1329 : | one object's data at the same time, which allows highly complex joined queries. For example, | ||
1330 : | |||
1331 : | parrello | 1.45 | C<< $query = $erdb->Get(['Genome', 'ComesFrom', 'Source'], "Genome(genus) = ?", [$genus]); >> |
1332 : | parrello | 1.1 | |
1333 : | If multiple names are specified, then the query processor will automatically determine a | ||
1334 : | join path between the entities and relationships. The algorithm used is very simplistic. | ||
1335 : | parrello | 1.39 | In particular, if a relationship is recursive, the path is determined by the order in which |
1336 : | the entity and the relationship appear. For example, consider a recursive relationship | ||
1337 : | B<IsParentOf> which relates B<People> objects to other B<People> objects. If the join path is | ||
1338 : | parrello | 1.1 | coded as C<['People', 'IsParentOf']>, then the people returned will be parents. If, however, |
1339 : | the join path is C<['IsParentOf', 'People']>, then the people returned will be children. | ||
1340 : | |||
1341 : | parrello | 1.39 | If an entity or relationship is mentioned twice, the name for the second occurrence will |
1342 : | be suffixed with C<2>, the third occurrence will be suffixed with C<3>, and so forth. So, | ||
1343 : | for example, if we have C<['Feature', 'HasContig', 'Contig', 'HasContig']>, then the | ||
1344 : | B<to-link> field of the first B<HasContig> is specified as C<HasContig(to-link)>, while | ||
1345 : | the B<to-link> field of the second B<HasContig> is specified as C<HasContig2(to-link)>. | ||
1346 : | |||
1347 : | parrello | 1.1 | =over 4 |
1348 : | |||
1349 : | =item objectNames | ||
1350 : | |||
1351 : | List containing the names of the entity and relationship objects to be retrieved. | ||
1352 : | |||
1353 : | =item filterClause | ||
1354 : | |||
1355 : | WHERE clause (without the WHERE) to be used to filter and sort the query. The WHERE clause can | ||
1356 : | be parameterized with parameter markers (C<?>). Each field used in the WHERE clause must be | ||
1357 : | specified in the standard form B<I<objectName>(I<fieldName>)>. Any parameters specified | ||
1358 : | in the filter clause should be added to the parameter list as additional parameters. The | ||
1359 : | fields in a filter clause can come from primary entity relations, relationship relations, | ||
1360 : | or secondary entity relations; however, all of the entities and relationships involved must | ||
1361 : | be included in the list of object names. | ||
1362 : | |||
1363 : | The filter clause can also specify a sort order. To do this, simply follow the filter string | ||
1364 : | with an ORDER BY clause. For example, the following filter string gets all genomes for a | ||
1365 : | particular genus and sorts them by species name. | ||
1366 : | |||
1367 : | C<< "Genome(genus) = ? ORDER BY Genome(species)" >> | ||
1368 : | |||
1369 : | parrello | 1.30 | Note that the case is important. Only an uppercase "ORDER BY" with a single space will |
1370 : | be processed. The idea is to make it less likely to find the verb by accident. | ||
1371 : | |||
1372 : | parrello | 1.1 | The rules for field references in a sort order are the same as those for field references in the |
1373 : | filter clause in general; however, odd things may happen if a sort field is from a secondary | ||
1374 : | relation. | ||
1375 : | |||
1376 : | parrello | 1.39 | Finally, you can limit the number of rows returned by adding a LIMIT clause. The LIMIT must |
1377 : | be the last thing in the filter clause, and it contains only the word "LIMIT" followed by | ||
1378 : | a positive number. So, for example | ||
1379 : | |||
1380 : | C<< "Genome(genus) = ? ORDER BY Genome(species) LIMIT 10" >> | ||
1381 : | |||
1382 : | will only return the first ten genomes for the specified genus. The ORDER BY clause is not | ||
1383 : | required. For example, to just get the first 10 genomes in the B<Genome> table, you could | ||
1384 : | use | ||
1385 : | |||
1386 : | C<< "LIMIT 10" >> | ||
1387 : | |||
1388 : | parrello | 1.45 | =item params |
1389 : | parrello | 1.1 | |
1390 : | parrello | 1.45 | Reference to a list of parameter values to be substituted into the filter clause. |
1391 : | parrello | 1.1 | |
1392 : | =item RETURN | ||
1393 : | |||
1394 : | Returns a B<DBQuery> that can be used to iterate through all of the results. | ||
1395 : | |||
1396 : | =back | ||
1397 : | |||
1398 : | =cut | ||
1399 : | |||
1400 : | sub Get { | ||
1401 : | parrello | 1.10 | # Get the parameters. |
1402 : | parrello | 1.45 | my ($self, $objectNames, $filterClause, $params) = @_; |
1403 : | # Process the SQL stuff. | ||
1404 : | my ($suffix, $mappedNameListRef, $mappedNameHashRef) = | ||
1405 : | $self->_SetupSQL($objectNames, $filterClause); | ||
1406 : | # Create the query. | ||
1407 : | my $command = "SELECT DISTINCT " . join(".*, ", @{$mappedNameListRef}) . | ||
1408 : | ".* $suffix"; | ||
1409 : | my $sth = $self->_GetStatementHandle($command, $params); | ||
1410 : | parrello | 1.39 | # Now we create the relation map, which enables DBQuery to determine the order, name |
1411 : | # and mapped name for each object in the query. | ||
1412 : | my @relationMap = (); | ||
1413 : | parrello | 1.45 | for my $mappedName (@{$mappedNameListRef}) { |
1414 : | push @relationMap, [$mappedName, $mappedNameHashRef->{$mappedName}]; | ||
1415 : | parrello | 1.39 | } |
1416 : | parrello | 1.10 | # Return the statement object. |
1417 : | parrello | 1.39 | my $retVal = DBQuery::_new($self, $sth, \@relationMap); |
1418 : | parrello | 1.10 | return $retVal; |
1419 : | parrello | 1.1 | } |
1420 : | |||
1421 : | parrello | 1.70 | =head3 Search |
1422 : | |||
1423 : | C<< my $query = $erdb->Search($searchExpression, $idx, \@objectNames, $filterClause, \@params); >> | ||
1424 : | |||
1425 : | Perform a full text search with filtering. The search will be against a specified object | ||
1426 : | in the object name list. That object will get an extra field containing the search | ||
1427 : | relevance. Note that except for the search expression, the parameters of this method are | ||
1428 : | the same as those for L</Get> and follow the same rules. | ||
1429 : | |||
1430 : | =over 4 | ||
1431 : | |||
1432 : | =item searchExpression | ||
1433 : | |||
1434 : | parrello | 1.75 | Boolean search expression for the text fields of the target object. The default mode for |
1435 : | a Boolean search expression is OR, but we want the default to be AND, so we will | ||
1436 : | add a C<+> operator to each word with no other operator before it. | ||
1437 : | parrello | 1.70 | |
1438 : | =item idx | ||
1439 : | |||
1440 : | Index in the I<$objectNames> list of the table to be searched in full-text mode. | ||
1441 : | |||
1442 : | =item objectNames | ||
1443 : | |||
1444 : | List containing the names of the entity and relationship objects to be retrieved. | ||
1445 : | |||
1446 : | =item filterClause | ||
1447 : | |||
1448 : | WHERE clause (without the WHERE) to be used to filter and sort the query. The WHERE clause can | ||
1449 : | be parameterized with parameter markers (C<?>). Each field used in the WHERE clause must be | ||
1450 : | specified in the standard form B<I<objectName>(I<fieldName>)>. Any parameters specified | ||
1451 : | in the filter clause should be added to the parameter list as additional parameters. The | ||
1452 : | fields in a filter clause can come from primary entity relations, relationship relations, | ||
1453 : | or secondary entity relations; however, all of the entities and relationships involved must | ||
1454 : | be included in the list of object names. | ||
1455 : | |||
1456 : | =item params | ||
1457 : | |||
1458 : | Reference to a list of parameter values to be substituted into the filter clause. | ||
1459 : | |||
1460 : | =item RETURN | ||
1461 : | |||
1462 : | Returns a query object for the specified search. | ||
1463 : | |||
1464 : | =back | ||
1465 : | |||
1466 : | =cut | ||
1467 : | |||
1468 : | sub Search { | ||
1469 : | # Get the parameters. | ||
1470 : | my ($self, $searchExpression, $idx, $objectNames, $filterClause, $params) = @_; | ||
1471 : | # Declare the return variable. | ||
1472 : | my $retVal; | ||
1473 : | parrello | 1.72 | # Create a safety copy of the parameter list. Note we have to be careful to insure |
1474 : | # a parameter list exists before we copy it. | ||
1475 : | my @myParams = (); | ||
1476 : | if (defined $params) { | ||
1477 : | @myParams = @{$params}; | ||
1478 : | } | ||
1479 : | parrello | 1.70 | # Get the first object's structure so we have access to the searchable fields. |
1480 : | my $object1Name = $objectNames->[$idx]; | ||
1481 : | my $object1Structure = $self->_GetStructure($object1Name); | ||
1482 : | # Get the field list. | ||
1483 : | if (! exists $object1Structure->{searchFields}) { | ||
1484 : | Confess("No searchable index for $object1Name."); | ||
1485 : | } else { | ||
1486 : | # Get the field list. | ||
1487 : | my @fields = @{$object1Structure->{searchFields}}; | ||
1488 : | parrello | 1.71 | # Clean the search expression. |
1489 : | my $actualKeywords = $self->CleanKeywords($searchExpression); | ||
1490 : | parrello | 1.75 | # Prefix a "+" to each uncontrolled word. This converts the default |
1491 : | # search mode from OR to AND. | ||
1492 : | $actualKeywords =~ s/(^|\s)(\w)/$1\+$2/g; | ||
1493 : | parrello | 1.72 | Trace("Actual keywords for search are\n$actualKeywords") if T(3); |
1494 : | parrello | 1.70 | # We need two match expressions, one for the filter clause and one in the |
1495 : | # query itself. Both will use a parameter mark, so we need to push the | ||
1496 : | # search expression onto the front of the parameter list twice. | ||
1497 : | parrello | 1.71 | unshift @myParams, $actualKeywords, $actualKeywords; |
1498 : | parrello | 1.70 | # Build the match expression. |
1499 : | my @matchFilterFields = map { "$object1Name." . _FixName($_) } @fields; | ||
1500 : | my $matchClause = "MATCH (" . join(", ", @matchFilterFields) . ") AGAINST (? IN BOOLEAN MODE)"; | ||
1501 : | # Process the SQL stuff. | ||
1502 : | my ($suffix, $mappedNameListRef, $mappedNameHashRef) = | ||
1503 : | $self->_SetupSQL($objectNames, $filterClause, $matchClause); | ||
1504 : | # Create the query. Note that the match clause is inserted at the front of | ||
1505 : | # the select fields. | ||
1506 : | my $command = "SELECT DISTINCT $matchClause, " . join(".*, ", @{$mappedNameListRef}) . | ||
1507 : | ".* $suffix"; | ||
1508 : | my $sth = $self->_GetStatementHandle($command, \@myParams); | ||
1509 : | # Now we create the relation map, which enables DBQuery to determine the order, name | ||
1510 : | # and mapped name for each object in the query. | ||
1511 : | my @relationMap = _RelationMap($mappedNameHashRef, $mappedNameListRef); | ||
1512 : | # Return the statement object. | ||
1513 : | $retVal = DBQuery::_new($self, $sth, \@relationMap, $object1Name); | ||
1514 : | } | ||
1515 : | return $retVal; | ||
1516 : | } | ||
1517 : | |||
1518 : | parrello | 1.45 | =head3 GetFlat |
1519 : | |||
1520 : | C<< my @list = $erdb->GetFlat(\@objectNames, $filterClause, \@parameterList, $field); >> | ||
1521 : | |||
1522 : | This is a variation of L</GetAll> that asks for only a single field per record and | ||
1523 : | returns a single flattened list. | ||
1524 : | |||
1525 : | =over 4 | ||
1526 : | |||
1527 : | =item objectNames | ||
1528 : | |||
1529 : | List containing the names of the entity and relationship objects to be retrieved. | ||
1530 : | |||
1531 : | =item filterClause | ||
1532 : | |||
1533 : | WHERE/ORDER BY clause (without the WHERE) to be used to filter and sort the query. The WHERE clause can | ||
1534 : | be parameterized with parameter markers (C<?>). Each field used must be specified in the standard form | ||
1535 : | B<I<objectName>(I<fieldName>)>. Any parameters specified in the filter clause should be added to the | ||
1536 : | parameter list as additional parameters. The fields in a filter clause can come from primary | ||
1537 : | entity relations, relationship relations, or secondary entity relations; however, all of the | ||
1538 : | entities and relationships involved must be included in the list of object names. | ||
1539 : | |||
1540 : | =item parameterList | ||
1541 : | |||
1542 : | List of the parameters to be substituted in for the parameters marks in the filter clause. | ||
1543 : | |||
1544 : | =item field | ||
1545 : | |||
1546 : | Name of the field to be used to get the elements of the list returned. | ||
1547 : | |||
1548 : | =item RETURN | ||
1549 : | |||
1550 : | Returns a list of values. | ||
1551 : | |||
1552 : | =back | ||
1553 : | |||
1554 : | =cut | ||
1555 : | #: Return Type @; | ||
1556 : | sub GetFlat { | ||
1557 : | # Get the parameters. | ||
1558 : | my ($self, $objectNames, $filterClause, $parameterList, $field) = @_; | ||
1559 : | # Construct the query. | ||
1560 : | my $query = $self->Get($objectNames, $filterClause, $parameterList); | ||
1561 : | # Create the result list. | ||
1562 : | my @retVal = (); | ||
1563 : | # Loop through the records, adding the field values found to the result list. | ||
1564 : | while (my $row = $query->Fetch()) { | ||
1565 : | push @retVal, $row->Value($field); | ||
1566 : | } | ||
1567 : | # Return the list created. | ||
1568 : | return @retVal; | ||
1569 : | } | ||
1570 : | |||
1571 : | parrello | 1.73 | =head3 SpecialFields |
1572 : | |||
1573 : | C<< my %specials = $erdb->SpecialFields($entityName); >> | ||
1574 : | |||
1575 : | Return a hash mapping special fields in the specified entity to the value of their | ||
1576 : | C<special> attribute. This enables the subclass to get access to the special field | ||
1577 : | attributes without needed to plumb the internal ERDB data structures. | ||
1578 : | |||
1579 : | =over 4 | ||
1580 : | |||
1581 : | =item entityName | ||
1582 : | |||
1583 : | Name of the entity whose special fields are desired. | ||
1584 : | |||
1585 : | =item RETURN | ||
1586 : | |||
1587 : | Returns a hash. The keys of the hash are the special field names, and the values | ||
1588 : | are the values from each special field's C<special> attribute. | ||
1589 : | |||
1590 : | =back | ||
1591 : | |||
1592 : | =cut | ||
1593 : | |||
1594 : | sub SpecialFields { | ||
1595 : | # Get the parameters. | ||
1596 : | my ($self, $entityName) = @_; | ||
1597 : | # Declare the return variable. | ||
1598 : | my %retVal = (); | ||
1599 : | # Find the entity's data structure. | ||
1600 : | parrello | 1.75 | my $entityData = $self->{_metaData}->{Entities}->{$entityName}; |
1601 : | parrello | 1.73 | # Loop through its fields, adding each special field to the return hash. |
1602 : | my $fieldHash = $entityData->{Fields}; | ||
1603 : | for my $fieldName (keys %{$fieldHash}) { | ||
1604 : | my $fieldData = $fieldHash->{$fieldName}; | ||
1605 : | if (exists $fieldData->{special}) { | ||
1606 : | $retVal{$fieldName} = $fieldData->{special}; | ||
1607 : | } | ||
1608 : | } | ||
1609 : | # Return the result. | ||
1610 : | return %retVal; | ||
1611 : | } | ||
1612 : | |||
1613 : | parrello | 1.32 | =head3 Delete |
1614 : | |||
1615 : | C<< my $stats = $erdb->Delete($entityName, $objectID); >> | ||
1616 : | |||
1617 : | Delete an entity instance from the database. The instance is deleted along with all entity and | ||
1618 : | relationship instances dependent on it. The idea of dependence here is recursive. An object is | ||
1619 : | always dependent on itself. An object is dependent if it is a 1-to-many or many-to-many | ||
1620 : | relationship connected to a dependent entity or the "to" entity connected to a 1-to-many | ||
1621 : | dependent relationship. | ||
1622 : | |||
1623 : | =over 4 | ||
1624 : | |||
1625 : | =item entityName | ||
1626 : | |||
1627 : | Name of the entity type for the instance being deleted. | ||
1628 : | |||
1629 : | =item objectID | ||
1630 : | |||
1631 : | ID of the entity instance to be deleted. If the ID contains a wild card character (C<%>), | ||
1632 : | then it is presumed to by a LIKE pattern. | ||
1633 : | |||
1634 : | =item testFlag | ||
1635 : | |||
1636 : | If TRUE, the delete statements will be traced without being executed. | ||
1637 : | |||
1638 : | =item RETURN | ||
1639 : | |||
1640 : | Returns a statistics object indicating how many records of each particular table were | ||
1641 : | deleted. | ||
1642 : | |||
1643 : | =back | ||
1644 : | |||
1645 : | =cut | ||
1646 : | #: Return Type $%; | ||
1647 : | sub Delete { | ||
1648 : | # Get the parameters. | ||
1649 : | my ($self, $entityName, $objectID, $testFlag) = @_; | ||
1650 : | # Declare the return variable. | ||
1651 : | my $retVal = Stats->new(); | ||
1652 : | # Get the DBKernel object. | ||
1653 : | my $db = $self->{_dbh}; | ||
1654 : | # We're going to generate all the paths branching out from the starting entity. One of | ||
1655 : | # the things we have to be careful about is preventing loops. We'll use a hash to | ||
1656 : | # determine if we've hit a loop. | ||
1657 : | my %alreadyFound = (); | ||
1658 : | parrello | 1.33 | # These next lists will serve as our result stack. We start by pushing object lists onto |
1659 : | parrello | 1.32 | # the stack, and then popping them off to do the deletes. This means the deletes will |
1660 : | # start with the longer paths before getting to the shorter ones. That, in turn, makes | ||
1661 : | # sure we don't delete records that might be needed to forge relationships back to the | ||
1662 : | parrello | 1.33 | # original item. We have two lists-- one for TO-relationships, and one for |
1663 : | # FROM-relationships and entities. | ||
1664 : | my @fromPathList = (); | ||
1665 : | my @toPathList = (); | ||
1666 : | parrello | 1.32 | # This final hash is used to remember what work still needs to be done. We push paths |
1667 : | # onto the list, then pop them off to extend the paths. We prime it with the starting | ||
1668 : | # point. Note that we will work hard to insure that the last item on a path in the | ||
1669 : | parrello | 1.75 | # to-do list is always an entity. |
1670 : | parrello | 1.32 | my @todoList = ([$entityName]); |
1671 : | while (@todoList) { | ||
1672 : | # Get the current path. | ||
1673 : | my $current = pop @todoList; | ||
1674 : | # Copy it into a list. | ||
1675 : | my @stackedPath = @{$current}; | ||
1676 : | # Pull off the last item on the path. It will always be an entity. | ||
1677 : | my $entityName = pop @stackedPath; | ||
1678 : | # Add it to the alreadyFound list. | ||
1679 : | $alreadyFound{$entityName} = 1; | ||
1680 : | # Get the entity data. | ||
1681 : | my $entityData = $self->_GetStructure($entityName); | ||
1682 : | # The first task is to loop through the entity's relation. A DELETE command will | ||
1683 : | # be needed for each of them. | ||
1684 : | my $relations = $entityData->{Relations}; | ||
1685 : | for my $relation (keys %{$relations}) { | ||
1686 : | my @augmentedList = (@stackedPath, $relation); | ||
1687 : | parrello | 1.33 | push @fromPathList, \@augmentedList; |
1688 : | parrello | 1.32 | } |
1689 : | # Now we need to look for relationships connected to this entity. | ||
1690 : | my $relationshipList = $self->{_metaData}->{Relationships}; | ||
1691 : | for my $relationshipName (keys %{$relationshipList}) { | ||
1692 : | my $relationship = $relationshipList->{$relationshipName}; | ||
1693 : | # Check the FROM field. We're only interested if it's us. | ||
1694 : | if ($relationship->{from} eq $entityName) { | ||
1695 : | # Add the path to this relationship. | ||
1696 : | my @augmentedList = (@stackedPath, $entityName, $relationshipName); | ||
1697 : | parrello | 1.33 | push @fromPathList, \@augmentedList; |
1698 : | parrello | 1.32 | # Check the arity. If it's MM we're done. If it's 1M |
1699 : | # and the target hasn't been seen yet, we want to | ||
1700 : | # stack the entity for future processing. | ||
1701 : | if ($relationship->{arity} eq '1M') { | ||
1702 : | my $toEntity = $relationship->{to}; | ||
1703 : | if (! exists $alreadyFound{$toEntity}) { | ||
1704 : | # Here we have a new entity that's dependent on | ||
1705 : | # the current entity, so we need to stack it. | ||
1706 : | my @stackList = (@augmentedList, $toEntity); | ||
1707 : | parrello | 1.33 | push @fromPathList, \@stackList; |
1708 : | parrello | 1.34 | } else { |
1709 : | Trace("$toEntity ignored because it occurred previously.") if T(4); | ||
1710 : | parrello | 1.32 | } |
1711 : | } | ||
1712 : | } | ||
1713 : | # Now check the TO field. In this case only the relationship needs | ||
1714 : | parrello | 1.33 | # deletion. |
1715 : | parrello | 1.32 | if ($relationship->{to} eq $entityName) { |
1716 : | my @augmentedList = (@stackedPath, $entityName, $relationshipName); | ||
1717 : | parrello | 1.33 | push @toPathList, \@augmentedList; |
1718 : | parrello | 1.32 | } |
1719 : | } | ||
1720 : | } | ||
1721 : | # Create the first qualifier for the WHERE clause. This selects the | ||
1722 : | # keys of the primary entity records to be deleted. When we're deleting | ||
1723 : | # from a dependent table, we construct a join page from the first qualifier | ||
1724 : | # to the table containing the dependent records to delete. | ||
1725 : | my $qualifier = ($objectID =~ /%/ ? "LIKE ?" : "= ?"); | ||
1726 : | parrello | 1.33 | # We need to make two passes. The first is through the to-list, and |
1727 : | # the second through the from-list. The from-list is second because | ||
1728 : | # the to-list may need to pass through some of the entities the | ||
1729 : | # from-list would delete. | ||
1730 : | my %stackList = ( from_link => \@fromPathList, to_link => \@toPathList ); | ||
1731 : | # Now it's time to do the deletes. We do it in two passes. | ||
1732 : | for my $keyName ('to_link', 'from_link') { | ||
1733 : | # Get the list for this key. | ||
1734 : | my @pathList = @{$stackList{$keyName}}; | ||
1735 : | parrello | 1.34 | Trace(scalar(@pathList) . " entries in path list for $keyName.") if T(3); |
1736 : | parrello | 1.33 | # Loop through this list. |
1737 : | while (my $path = pop @pathList) { | ||
1738 : | # Get the table whose rows are to be deleted. | ||
1739 : | my @pathTables = @{$path}; | ||
1740 : | parrello | 1.37 | # Start the DELETE statement. We need to call DBKernel because the |
1741 : | # syntax of a DELETE-USING varies among DBMSs. | ||
1742 : | parrello | 1.33 | my $target = $pathTables[$#pathTables]; |
1743 : | parrello | 1.37 | my $stmt = $db->SetUsing(@pathTables); |
1744 : | parrello | 1.33 | # Now start the WHERE. The first thing is the ID field from the starting table. That |
1745 : | # starting table will either be the entity relation or one of the entity's | ||
1746 : | # sub-relations. | ||
1747 : | $stmt .= " WHERE $pathTables[0].id $qualifier"; | ||
1748 : | # Now we run through the remaining entities in the path, connecting them up. | ||
1749 : | for (my $i = 1; $i <= $#pathTables; $i += 2) { | ||
1750 : | # Connect the current relationship to the preceding entity. | ||
1751 : | my ($entity, $rel) = @pathTables[$i-1,$i]; | ||
1752 : | # The style of connection depends on the direction of the relationship. | ||
1753 : | parrello | 1.35 | $stmt .= " AND $entity.id = $rel.$keyName"; |
1754 : | parrello | 1.32 | if ($i + 1 <= $#pathTables) { |
1755 : | # Here there's a next entity, so connect that to the relationship's | ||
1756 : | # to-link. | ||
1757 : | my $entity2 = $pathTables[$i+1]; | ||
1758 : | parrello | 1.35 | $stmt .= " AND $rel.to_link = $entity2.id"; |
1759 : | parrello | 1.32 | } |
1760 : | } | ||
1761 : | parrello | 1.33 | # Now we have our desired DELETE statement. |
1762 : | if ($testFlag) { | ||
1763 : | # Here the user wants to trace without executing. | ||
1764 : | Trace($stmt) if T(0); | ||
1765 : | } else { | ||
1766 : | # Here we can delete. Note that the SQL method dies with a confessing | ||
1767 : | # if an error occurs, so we just go ahead and do it. | ||
1768 : | parrello | 1.36 | Trace("Executing delete from $target using '$objectID'.") if T(3); |
1769 : | my $rv = $db->SQL($stmt, 0, $objectID); | ||
1770 : | parrello | 1.33 | # Accumulate the statistics for this delete. The only rows deleted |
1771 : | # are from the target table, so we use its name to record the | ||
1772 : | # statistic. | ||
1773 : | $retVal->Add($target, $rv); | ||
1774 : | } | ||
1775 : | parrello | 1.32 | } |
1776 : | } | ||
1777 : | # Return the result. | ||
1778 : | return $retVal; | ||
1779 : | } | ||
1780 : | |||
1781 : | parrello | 1.70 | =head3 SortNeeded |
1782 : | |||
1783 : | C<< my $parms = $erdb->SortNeeded($relationName); >> | ||
1784 : | |||
1785 : | Return the pipe command for the sort that should be applied to the specified | ||
1786 : | relation when creating the load file. | ||
1787 : | |||
1788 : | For example, if the load file should be sorted ascending by the first | ||
1789 : | field, this method would return | ||
1790 : | |||
1791 : | sort -k1 -t"\t" | ||
1792 : | |||
1793 : | If the first field is numeric, the method would return | ||
1794 : | |||
1795 : | sort -k1n -t"\t" | ||
1796 : | |||
1797 : | Unfortunately, due to a bug in the C<sort> command, we cannot eliminate duplicate | ||
1798 : | keys using a sort. | ||
1799 : | |||
1800 : | =over 4 | ||
1801 : | |||
1802 : | =item relationName | ||
1803 : | |||
1804 : | Name of the relation to be examined. | ||
1805 : | |||
1806 : | =item | ||
1807 : | |||
1808 : | Returns the sort command to use for sorting the relation, suitable for piping. | ||
1809 : | |||
1810 : | =back | ||
1811 : | |||
1812 : | =cut | ||
1813 : | #: Return Type $; | ||
1814 : | sub SortNeeded { | ||
1815 : | # Get the parameters. | ||
1816 : | my ($self, $relationName) = @_; | ||
1817 : | # Declare a descriptor to hold the names of the key fields. | ||
1818 : | my @keyNames = (); | ||
1819 : | # Get the relation structure. | ||
1820 : | my $relationData = $self->_FindRelation($relationName); | ||
1821 : | # Find out if the relation is a primary entity relation, | ||
1822 : | # a relationship relation, or a secondary entity relation. | ||
1823 : | my $entityTable = $self->{_metaData}->{Entities}; | ||
1824 : | my $relationshipTable = $self->{_metaData}->{Relationships}; | ||
1825 : | if (exists $entityTable->{$relationName}) { | ||
1826 : | # Here we have a primary entity relation. | ||
1827 : | push @keyNames, "id"; | ||
1828 : | } elsif (exists $relationshipTable->{$relationName}) { | ||
1829 : | # Here we have a relationship. We sort using the FROM index. | ||
1830 : | my $relationshipData = $relationshipTable->{$relationName}; | ||
1831 : | parrello | 1.75 | my $index = $relationData->{Indexes}->{idxFrom}; |
1832 : | parrello | 1.70 | push @keyNames, @{$index->{IndexFields}}; |
1833 : | } else { | ||
1834 : | # Here we have a secondary entity relation, so we have a sort on the ID field. | ||
1835 : | push @keyNames, "id"; | ||
1836 : | } | ||
1837 : | # Now we parse the key names into sort parameters. First, we prime the return | ||
1838 : | # string. | ||
1839 : | my $retVal = "sort -t\"\t\" "; | ||
1840 : | # Get the relation's field list. | ||
1841 : | my @fields = @{$relationData->{Fields}}; | ||
1842 : | # Loop through the keys. | ||
1843 : | for my $keyData (@keyNames) { | ||
1844 : | # Get the key and the ordering. | ||
1845 : | my ($keyName, $ordering); | ||
1846 : | if ($keyData =~ /^([^ ]+) DESC/) { | ||
1847 : | ($keyName, $ordering) = ($1, "descending"); | ||
1848 : | } else { | ||
1849 : | ($keyName, $ordering) = ($keyData, "ascending"); | ||
1850 : | } | ||
1851 : | # Find the key's position and type. | ||
1852 : | my $fieldSpec; | ||
1853 : | for (my $i = 0; $i <= $#fields && ! $fieldSpec; $i++) { | ||
1854 : | my $thisField = $fields[$i]; | ||
1855 : | if ($thisField->{name} eq $keyName) { | ||
1856 : | # Get the sort modifier for this field type. The modifier | ||
1857 : | # decides whether we're using a character, numeric, or | ||
1858 : | # floating-point sort. | ||
1859 : | my $modifier = $TypeTable{$thisField->{type}}->{sort}; | ||
1860 : | # If the index is descending for this field, denote we want | ||
1861 : | # to reverse the sort order on this field. | ||
1862 : | if ($ordering eq 'descending') { | ||
1863 : | $modifier .= "r"; | ||
1864 : | } | ||
1865 : | # Store the position and modifier into the field spec, which | ||
1866 : | # will stop the inner loop. Note that the field number is | ||
1867 : | # 1-based in the sort command, so we have to increment the | ||
1868 : | # index. | ||
1869 : | $fieldSpec = ($i + 1) . $modifier; | ||
1870 : | } | ||
1871 : | } | ||
1872 : | # Add this field to the sort command. | ||
1873 : | $retVal .= " -k$fieldSpec"; | ||
1874 : | } | ||
1875 : | # Return the result. | ||
1876 : | return $retVal; | ||
1877 : | } | ||
1878 : | |||
1879 : | parrello | 1.6 | =head3 GetList |
1880 : | |||
1881 : | parrello | 1.45 | C<< my @dbObjects = $erdb->GetList(\@objectNames, $filterClause, \@params); >> |
1882 : | parrello | 1.6 | |
1883 : | Return a list of object descriptors for the specified objects as determined by the | ||
1884 : | specified filter clause. | ||
1885 : | |||
1886 : | This method is essentially the same as L</Get> except it returns a list of objects rather | ||
1887 : | parrello | 1.7 | than a query object that can be used to get the results one record at a time. |
1888 : | parrello | 1.6 | |
1889 : | =over 4 | ||
1890 : | |||
1891 : | =item objectNames | ||
1892 : | |||
1893 : | List containing the names of the entity and relationship objects to be retrieved. | ||
1894 : | |||
1895 : | =item filterClause | ||
1896 : | |||
1897 : | WHERE clause (without the WHERE) to be used to filter and sort the query. The WHERE clause can | ||
1898 : | be parameterized with parameter markers (C<?>). Each field used in the WHERE clause must be | ||
1899 : | specified in the standard form B<I<objectName>(I<fieldName>)>. Any parameters specified | ||
1900 : | in the filter clause should be added to the parameter list as additional parameters. The | ||
1901 : | fields in a filter clause can come from primary entity relations, relationship relations, | ||
1902 : | or secondary entity relations; however, all of the entities and relationships involved must | ||
1903 : | be included in the list of object names. | ||
1904 : | |||
1905 : | The filter clause can also specify a sort order. To do this, simply follow the filter string | ||
1906 : | with an ORDER BY clause. For example, the following filter string gets all genomes for a | ||
1907 : | particular genus and sorts them by species name. | ||
1908 : | |||
1909 : | C<< "Genome(genus) = ? ORDER BY Genome(species)" >> | ||
1910 : | |||
1911 : | The rules for field references in a sort order are the same as those for field references in the | ||
1912 : | filter clause in general; however, odd things may happen if a sort field is from a secondary | ||
1913 : | relation. | ||
1914 : | |||
1915 : | parrello | 1.45 | =item params |
1916 : | parrello | 1.6 | |
1917 : | parrello | 1.45 | Reference to a list of parameter values to be substituted into the filter clause. |
1918 : | parrello | 1.6 | |
1919 : | =item RETURN | ||
1920 : | |||
1921 : | Returns a list of B<DBObject>s that satisfy the query conditions. | ||
1922 : | |||
1923 : | =back | ||
1924 : | |||
1925 : | =cut | ||
1926 : | #: Return Type @% | ||
1927 : | sub GetList { | ||
1928 : | # Get the parameters. | ||
1929 : | parrello | 1.45 | my ($self, $objectNames, $filterClause, $params) = @_; |
1930 : | parrello | 1.10 | # Declare the return variable. |
1931 : | my @retVal = (); | ||
1932 : | # Perform the query. | ||
1933 : | parrello | 1.45 | my $query = $self->Get($objectNames, $filterClause, $params); |
1934 : | parrello | 1.10 | # Loop through the results. |
1935 : | while (my $object = $query->Fetch) { | ||
1936 : | push @retVal, $object; | ||
1937 : | } | ||
1938 : | parrello | 1.6 | # Return the result. |
1939 : | return @retVal; | ||
1940 : | } | ||
1941 : | |||
1942 : | parrello | 1.45 | =head3 GetCount |
1943 : | |||
1944 : | C<< my $count = $erdb->GetCount(\@objectNames, $filter, \@params); >> | ||
1945 : | |||
1946 : | Return the number of rows found by a specified query. This method would | ||
1947 : | normally be used to count the records in a single table. For example, in a | ||
1948 : | genetics database | ||
1949 : | |||
1950 : | my $count = $erdb->GetCount(['Genome'], 'Genome(genus-species) LIKE ?', ['homo %']); | ||
1951 : | |||
1952 : | would return the number of genomes for the genus I<homo>. It is conceivable, however, | ||
1953 : | to use it to return records based on a join. For example, | ||
1954 : | |||
1955 : | parrello | 1.47 | my $count = $erdb->GetCount(['HasFeature', 'Genome'], 'Genome(genus-species) LIKE ?', |
1956 : | parrello | 1.45 | ['homo %']); |
1957 : | |||
1958 : | would return the number of features for genomes in the genus I<homo>. Note that | ||
1959 : | only the rows from the first table are counted. If the above command were | ||
1960 : | |||
1961 : | my $count = $erdb->GetCount(['Genome', 'Feature'], 'Genome(genus-species) LIKE ?', | ||
1962 : | ['homo %']); | ||
1963 : | |||
1964 : | it would return the number of genomes, not the number of genome/feature pairs. | ||
1965 : | |||
1966 : | =over 4 | ||
1967 : | |||
1968 : | =item objectNames | ||
1969 : | |||
1970 : | Reference to a list of the objects (entities and relationships) included in the | ||
1971 : | query. | ||
1972 : | |||
1973 : | =item filter | ||
1974 : | |||
1975 : | A filter clause for restricting the query. The rules are the same as for the L</Get> | ||
1976 : | method. | ||
1977 : | |||
1978 : | =item params | ||
1979 : | |||
1980 : | Reference to a list of the parameter values to be substituted for the parameter marks | ||
1981 : | in the filter. | ||
1982 : | |||
1983 : | =item RETURN | ||
1984 : | |||
1985 : | Returns a count of the number of records in the first table that would satisfy | ||
1986 : | the query. | ||
1987 : | |||
1988 : | =back | ||
1989 : | |||
1990 : | =cut | ||
1991 : | |||
1992 : | sub GetCount { | ||
1993 : | # Get the parameters. | ||
1994 : | my ($self, $objectNames, $filter, $params) = @_; | ||
1995 : | parrello | 1.66 | # Insure the params argument is an array reference if the caller left it off. |
1996 : | if (! defined($params)) { | ||
1997 : | $params = []; | ||
1998 : | } | ||
1999 : | parrello | 1.45 | # Declare the return variable. |
2000 : | my $retVal; | ||
2001 : | parrello | 1.47 | # Find out if we're counting an entity or a relationship. |
2002 : | my $countedField; | ||
2003 : | if ($self->IsEntity($objectNames->[0])) { | ||
2004 : | $countedField = "id"; | ||
2005 : | } else { | ||
2006 : | # For a relationship we count the to-link because it's usually more | ||
2007 : | # numerous. Note we're automatically converting to the SQL form | ||
2008 : | # of the field name (to_link vs. to-link). | ||
2009 : | $countedField = "to_link"; | ||
2010 : | } | ||
2011 : | parrello | 1.45 | # Create the SQL command suffix to get the desired records. |
2012 : | my ($suffix, $mappedNameListRef, $mappedNameHashRef) = $self->_SetupSQL($objectNames, | ||
2013 : | $filter); | ||
2014 : | # Prefix it with text telling it we want a record count. | ||
2015 : | my $firstObject = $mappedNameListRef->[0]; | ||
2016 : | parrello | 1.47 | my $command = "SELECT COUNT($firstObject.$countedField) $suffix"; |
2017 : | parrello | 1.45 | # Prepare and execute the command. |
2018 : | my $sth = $self->_GetStatementHandle($command, $params); | ||
2019 : | # Get the count value. | ||
2020 : | ($retVal) = $sth->fetchrow_array(); | ||
2021 : | # Check for a problem. | ||
2022 : | if (! defined($retVal)) { | ||
2023 : | if ($sth->err) { | ||
2024 : | # Here we had an SQL error. | ||
2025 : | Confess("Error retrieving row count: " . $sth->errstr()); | ||
2026 : | } else { | ||
2027 : | # Here we have no result. | ||
2028 : | Confess("No result attempting to retrieve row count."); | ||
2029 : | } | ||
2030 : | } | ||
2031 : | # Return the result. | ||
2032 : | return $retVal; | ||
2033 : | } | ||
2034 : | |||
2035 : | parrello | 1.1 | =head3 ComputeObjectSentence |
2036 : | |||
2037 : | parrello | 1.18 | C<< my $sentence = $erdb->ComputeObjectSentence($objectName); >> |
2038 : | parrello | 1.1 | |
2039 : | Check an object name, and if it is a relationship convert it to a relationship sentence. | ||
2040 : | |||
2041 : | =over 4 | ||
2042 : | |||
2043 : | =item objectName | ||
2044 : | |||
2045 : | Name of the entity or relationship. | ||
2046 : | |||
2047 : | =item RETURN | ||
2048 : | |||
2049 : | Returns a string containing the entity name or a relationship sentence. | ||
2050 : | |||
2051 : | =back | ||
2052 : | |||
2053 : | =cut | ||
2054 : | |||
2055 : | sub ComputeObjectSentence { | ||
2056 : | parrello | 1.10 | # Get the parameters. |
2057 : | my ($self, $objectName) = @_; | ||
2058 : | # Set the default return value. | ||
2059 : | my $retVal = $objectName; | ||
2060 : | # Look for the object as a relationship. | ||
2061 : | my $relTable = $self->{_metaData}->{Relationships}; | ||
2062 : | if (exists $relTable->{$objectName}) { | ||
2063 : | # Get the relationship sentence. | ||
2064 : | $retVal = _ComputeRelationshipSentence($objectName, $relTable->{$objectName}); | ||
2065 : | } | ||
2066 : | # Return the result. | ||
2067 : | return $retVal; | ||
2068 : | parrello | 1.1 | } |
2069 : | |||
2070 : | =head3 DumpRelations | ||
2071 : | |||
2072 : | parrello | 1.18 | C<< $erdb->DumpRelations($outputDirectory); >> |
2073 : | parrello | 1.1 | |
2074 : | Write the contents of all the relations to tab-delimited files in the specified directory. | ||
2075 : | Each file will have the same name as the relation dumped, with an extension of DTX. | ||
2076 : | |||
2077 : | =over 4 | ||
2078 : | |||
2079 : | =item outputDirectory | ||
2080 : | |||
2081 : | Name of the directory into which the relation files should be dumped. | ||
2082 : | |||
2083 : | =back | ||
2084 : | |||
2085 : | =cut | ||
2086 : | |||
2087 : | sub DumpRelations { | ||
2088 : | parrello | 1.10 | # Get the parameters. |
2089 : | my ($self, $outputDirectory) = @_; | ||
2090 : | # Now we need to run through all the relations. First, we loop through the entities. | ||
2091 : | my $metaData = $self->{_metaData}; | ||
2092 : | my $entities = $metaData->{Entities}; | ||
2093 : | for my $entityName (keys %{$entities}) { | ||
2094 : | my $entityStructure = $entities->{$entityName}; | ||
2095 : | # Get the entity's relations. | ||
2096 : | my $relationList = $entityStructure->{Relations}; | ||
2097 : | # Loop through the relations, dumping them. | ||
2098 : | for my $relationName (keys %{$relationList}) { | ||
2099 : | my $relation = $relationList->{$relationName}; | ||
2100 : | $self->_DumpRelation($outputDirectory, $relationName, $relation); | ||
2101 : | } | ||
2102 : | } | ||
2103 : | # Next, we loop through the relationships. | ||
2104 : | my $relationships = $metaData->{Relationships}; | ||
2105 : | for my $relationshipName (keys %{$relationships}) { | ||
2106 : | my $relationshipStructure = $relationships->{$relationshipName}; | ||
2107 : | # Dump this relationship's relation. | ||
2108 : | $self->_DumpRelation($outputDirectory, $relationshipName, $relationshipStructure->{Relations}->{$relationshipName}); | ||
2109 : | } | ||
2110 : | parrello | 1.1 | } |
2111 : | |||
2112 : | parrello | 1.51 | =head3 InsertValue |
2113 : | |||
2114 : | C<< $erdb->InsertValue($entityID, $fieldName, $value); >> | ||
2115 : | |||
2116 : | This method will insert a new value into the database. The value must be one | ||
2117 : | associated with a secondary relation, since primary values cannot be inserted: | ||
2118 : | they occur exactly once. Secondary values, on the other hand, can be missing | ||
2119 : | or multiply-occurring. | ||
2120 : | |||
2121 : | =over 4 | ||
2122 : | |||
2123 : | =item entityID | ||
2124 : | |||
2125 : | ID of the object that is to receive the new value. | ||
2126 : | |||
2127 : | =item fieldName | ||
2128 : | |||
2129 : | Field name for the new value-- this includes the entity name, since | ||
2130 : | field names are of the format I<objectName>C<(>I<fieldName>C<)>. | ||
2131 : | |||
2132 : | =item value | ||
2133 : | |||
2134 : | New value to be put in the field. | ||
2135 : | |||
2136 : | =back | ||
2137 : | |||
2138 : | =cut | ||
2139 : | |||
2140 : | sub InsertValue { | ||
2141 : | # Get the parameters. | ||
2142 : | my ($self, $entityID, $fieldName, $value) = @_; | ||
2143 : | # Parse the entity name and the real field name. | ||
2144 : | if ($fieldName =~ /^([^(]+)\(([^)]+)\)/) { | ||
2145 : | my $entityName = $1; | ||
2146 : | my $fieldTitle = $2; | ||
2147 : | # Get its descriptor. | ||
2148 : | if (!$self->IsEntity($entityName)) { | ||
2149 : | Confess("$entityName is not a valid entity."); | ||
2150 : | } else { | ||
2151 : | my $entityData = $self->{_metaData}->{Entities}->{$entityName}; | ||
2152 : | # Find the relation containing this field. | ||
2153 : | my $fieldHash = $entityData->{Fields}; | ||
2154 : | parrello | 1.52 | if (! exists $fieldHash->{$fieldTitle}) { |
2155 : | parrello | 1.51 | Confess("$fieldTitle not found in $entityName."); |
2156 : | } else { | ||
2157 : | my $relation = $fieldHash->{$fieldTitle}->{relation}; | ||
2158 : | if ($relation eq $entityName) { | ||
2159 : | Confess("Cannot do InsertValue on primary field $fieldTitle of $entityName."); | ||
2160 : | } else { | ||
2161 : | # Now we can create an INSERT statement. | ||
2162 : | my $dbh = $self->{_dbh}; | ||
2163 : | my $fixedName = _FixName($fieldTitle); | ||
2164 : | parrello | 1.53 | my $statement = "INSERT INTO $relation (id, $fixedName) VALUES(?, ?)"; |
2165 : | parrello | 1.51 | # Execute the command. |
2166 : | $dbh->SQL($statement, 0, $entityID, $value); | ||
2167 : | } | ||
2168 : | } | ||
2169 : | } | ||
2170 : | } else { | ||
2171 : | Confess("$fieldName is not a valid field name."); | ||
2172 : | } | ||
2173 : | } | ||
2174 : | |||
2175 : | parrello | 1.1 | =head3 InsertObject |
2176 : | |||
2177 : | parrello | 1.18 | C<< my $ok = $erdb->InsertObject($objectType, \%fieldHash); >> |
2178 : | parrello | 1.1 | |
2179 : | Insert an object into the database. The object is defined by a type name and then a hash | ||
2180 : | of field names to values. Field values in the primary relation are represented by scalars. | ||
2181 : | (Note that for relationships, the primary relation is the B<only> relation.) | ||
2182 : | Field values for the other relations comprising the entity are always list references. For | ||
2183 : | example, the following line inserts an inactive PEG feature named C<fig|188.1.peg.1> with aliases | ||
2184 : | C<ZP_00210270.1> and C<gi|46206278>. | ||
2185 : | |||
2186 : | parrello | 1.18 | C<< $erdb->InsertObject('Feature', { id => 'fig|188.1.peg.1', active => 0, feature-type => 'peg', alias => ['ZP_00210270.1', 'gi|46206278']}); >> |
2187 : | parrello | 1.1 | |
2188 : | The next statement inserts a C<HasProperty> relationship between feature C<fig|158879.1.peg.1> and | ||
2189 : | property C<4> with an evidence URL of C<http://seedu.uchicago.edu/query.cgi?article_id=142>. | ||
2190 : | |||
2191 : | parrello | 1.57 | C<< $erdb->InsertObject('HasProperty', { 'from-link' => 'fig|158879.1.peg.1', 'to-link' => 4, evidence => 'http://seedu.uchicago.edu/query.cgi?article_id=142'}); >> |
2192 : | parrello | 1.1 | |
2193 : | =over 4 | ||
2194 : | |||
2195 : | =item newObjectType | ||
2196 : | |||
2197 : | Type name of the object to insert. | ||
2198 : | |||
2199 : | =item fieldHash | ||
2200 : | |||
2201 : | Hash of field names to values. | ||
2202 : | |||
2203 : | =item RETURN | ||
2204 : | |||
2205 : | Returns 1 if successful, 0 if an error occurred. | ||
2206 : | |||
2207 : | =back | ||
2208 : | |||
2209 : | =cut | ||
2210 : | |||
2211 : | sub InsertObject { | ||
2212 : | parrello | 1.10 | # Get the parameters. |
2213 : | my ($self, $newObjectType, $fieldHash) = @_; | ||
2214 : | # Denote that so far we appear successful. | ||
2215 : | my $retVal = 1; | ||
2216 : | # Get the database handle. | ||
2217 : | my $dbh = $self->{_dbh}; | ||
2218 : | # Get the relation list. | ||
2219 : | my $relationTable = $self->_GetRelationTable($newObjectType); | ||
2220 : | # Loop through the relations. We'll build insert statements for each one. If a relation is | ||
2221 : | # secondary, we may end up generating multiple insert statements. If an error occurs, we | ||
2222 : | # stop the loop. | ||
2223 : | my @relationList = keys %{$relationTable}; | ||
2224 : | for (my $i = 0; $retVal && $i <= $#relationList; $i++) { | ||
2225 : | my $relationName = $relationList[$i]; | ||
2226 : | my $relationDefinition = $relationTable->{$relationName}; | ||
2227 : | # Get the relation's fields. For each field we will collect a value in the corresponding | ||
2228 : | # position of the @valueList array. If one of the fields is missing, we will add it to the | ||
2229 : | # @missing list. | ||
2230 : | my @fieldList = @{$relationDefinition->{Fields}}; | ||
2231 : | my @fieldNameList = (); | ||
2232 : | my @valueList = (); | ||
2233 : | my @missing = (); | ||
2234 : | my $recordCount = 1; | ||
2235 : | for my $fieldDescriptor (@fieldList) { | ||
2236 : | # Get the field name and save it. Note we need to fix it up so the hyphens | ||
2237 : | # are converted to underscores. | ||
2238 : | my $fieldName = $fieldDescriptor->{name}; | ||
2239 : | push @fieldNameList, _FixName($fieldName); | ||
2240 : | # Look for the named field in the incoming structure. Note that we are looking | ||
2241 : | # for the real field name, not the fixed-up one! | ||
2242 : | if (exists $fieldHash->{$fieldName}) { | ||
2243 : | # Here we found the field. Stash it in the value list. | ||
2244 : | my $value = $fieldHash->{$fieldName}; | ||
2245 : | push @valueList, $value; | ||
2246 : | # If the value is a list, we may need to increment the record count. | ||
2247 : | if (ref $value eq "ARRAY") { | ||
2248 : | my $thisCount = @{$value}; | ||
2249 : | if ($recordCount == 1) { | ||
2250 : | # Here we have our first list, so we save its count. | ||
2251 : | $recordCount = $thisCount; | ||
2252 : | } elsif ($recordCount != $thisCount) { | ||
2253 : | # Here we have a second list, so its length has to match the | ||
2254 : | # previous lists. | ||
2255 : | Trace("Field $value in new $newObjectType object has an invalid list length $thisCount. Expected $recordCount.") if T(0); | ||
2256 : | $retVal = 0; | ||
2257 : | } | ||
2258 : | } | ||
2259 : | } else { | ||
2260 : | # Here the field is not present. Flag it as missing. | ||
2261 : | push @missing, $fieldName; | ||
2262 : | } | ||
2263 : | } | ||
2264 : | # If we are the primary relation, add the new-record flag. | ||
2265 : | if ($relationName eq $newObjectType) { | ||
2266 : | push @valueList, 1; | ||
2267 : | push @fieldNameList, "new_record"; | ||
2268 : | } | ||
2269 : | # Only proceed if there are no missing fields. | ||
2270 : | if (@missing > 0) { | ||
2271 : | Trace("Relation $relationName for $newObjectType skipped due to missing fields: " . | ||
2272 : | join(' ', @missing)) if T(1); | ||
2273 : | } else { | ||
2274 : | # Build the INSERT statement. | ||
2275 : | my $statement = "INSERT INTO $relationName (" . join (', ', @fieldNameList) . | ||
2276 : | ") VALUES ("; | ||
2277 : | # Create a marker list of the proper size and put it in the statement. | ||
2278 : | my @markers = (); | ||
2279 : | while (@markers < @fieldNameList) { push @markers, '?'; } | ||
2280 : | $statement .= join(', ', @markers) . ")"; | ||
2281 : | # We have the insert statement, so prepare it. | ||
2282 : | my $sth = $dbh->prepare_command($statement); | ||
2283 : | Trace("Insert statement prepared: $statement") if T(3); | ||
2284 : | # Now we loop through the values. If a value is scalar, we use it unmodified. If it's | ||
2285 : | # a list, we use the current element. The values are stored in the @parameterList array. | ||
2286 : | my $done = 0; | ||
2287 : | for (my $i = 0; $i < $recordCount; $i++) { | ||
2288 : | # Clear the parameter list array. | ||
2289 : | my @parameterList = (); | ||
2290 : | # Loop through the values. | ||
2291 : | for my $value (@valueList) { | ||
2292 : | # Check to see if this is a scalar value. | ||
2293 : | if (ref $value eq "ARRAY") { | ||
2294 : | # Here we have a list value. Pull the current entry. | ||
2295 : | push @parameterList, $value->[$i]; | ||
2296 : | } else { | ||
2297 : | # Here we have a scalar value. Use it unmodified. | ||
2298 : | push @parameterList, $value; | ||
2299 : | } | ||
2300 : | } | ||
2301 : | # Execute the INSERT statement with the specified parameter list. | ||
2302 : | $retVal = $sth->execute(@parameterList); | ||
2303 : | if (!$retVal) { | ||
2304 : | my $errorString = $sth->errstr(); | ||
2305 : | Trace("Insert error: $errorString.") if T(0); | ||
2306 : | } | ||
2307 : | } | ||
2308 : | } | ||
2309 : | } | ||
2310 : | # Return the success indicator. | ||
2311 : | return $retVal; | ||
2312 : | parrello | 1.1 | } |
2313 : | |||
2314 : | =head3 LoadTable | ||
2315 : | |||
2316 : | parrello | 1.18 | C<< my %results = $erdb->LoadTable($fileName, $relationName, $truncateFlag); >> |
2317 : | parrello | 1.1 | |
2318 : | parrello | 1.9 | Load data from a tab-delimited file into a specified table, optionally re-creating the table |
2319 : | first. | ||
2320 : | parrello | 1.1 | |
2321 : | =over 4 | ||
2322 : | |||
2323 : | =item fileName | ||
2324 : | |||
2325 : | Name of the file from which the table data should be loaded. | ||
2326 : | |||
2327 : | =item relationName | ||
2328 : | |||
2329 : | Name of the relation to be loaded. This is the same as the table name. | ||
2330 : | |||
2331 : | =item truncateFlag | ||
2332 : | |||
2333 : | TRUE if the table should be dropped and re-created, else FALSE | ||
2334 : | |||
2335 : | =item RETURN | ||
2336 : | |||
2337 : | parrello | 1.28 | Returns a statistical object containing a list of the error messages. |
2338 : | parrello | 1.1 | |
2339 : | =back | ||
2340 : | |||
2341 : | =cut | ||
2342 : | sub LoadTable { | ||
2343 : | parrello | 1.10 | # Get the parameters. |
2344 : | my ($self, $fileName, $relationName, $truncateFlag) = @_; | ||
2345 : | # Create the statistical return object. | ||
2346 : | my $retVal = _GetLoadStats(); | ||
2347 : | # Trace the fact of the load. | ||
2348 : | Trace("Loading table $relationName from $fileName") if T(2); | ||
2349 : | # Get the database handle. | ||
2350 : | my $dbh = $self->{_dbh}; | ||
2351 : | parrello | 1.22 | # Get the input file size. |
2352 : | my $fileSize = -s $fileName; | ||
2353 : | parrello | 1.10 | # Get the relation data. |
2354 : | my $relation = $self->_FindRelation($relationName); | ||
2355 : | # Check the truncation flag. | ||
2356 : | if ($truncateFlag) { | ||
2357 : | Trace("Creating table $relationName") if T(2); | ||
2358 : | parrello | 1.19 | # Compute the row count estimate. We take the size of the load file, |
2359 : | # divide it by the estimated row size, and then multiply by 1.5 to | ||
2360 : | # leave extra room. We postulate a minimum row count of 1000 to | ||
2361 : | # prevent problems with incoming empty load files. | ||
2362 : | my $rowSize = $self->EstimateRowSize($relationName); | ||
2363 : | my $estimate = FIG::max($fileSize * 1.5 / $rowSize, 1000); | ||
2364 : | parrello | 1.10 | # Re-create the table without its index. |
2365 : | parrello | 1.19 | $self->CreateTable($relationName, 0, $estimate); |
2366 : | parrello | 1.10 | # If this is a pre-index DBMS, create the index here. |
2367 : | if ($dbh->{_preIndex}) { | ||
2368 : | eval { | ||
2369 : | $self->CreateIndex($relationName); | ||
2370 : | }; | ||
2371 : | if ($@) { | ||
2372 : | $retVal->AddMessage($@); | ||
2373 : | } | ||
2374 : | } | ||
2375 : | } | ||
2376 : | parrello | 1.3 | # Load the table. |
2377 : | parrello | 1.10 | my $rv; |
2378 : | eval { | ||
2379 : | parrello | 1.20 | $rv = $dbh->load_table(file => $fileName, tbl => $relationName); |
2380 : | parrello | 1.10 | }; |
2381 : | if (!defined $rv) { | ||
2382 : | parrello | 1.3 | $retVal->AddMessage($@) if ($@); |
2383 : | parrello | 1.73 | $retVal->AddMessage("Table load failed for $relationName using $fileName: " . $dbh->error_message); |
2384 : | parrello | 1.10 | Trace("Table load failed for $relationName.") if T(1); |
2385 : | } else { | ||
2386 : | parrello | 1.22 | # Here we successfully loaded the table. |
2387 : | $retVal->Add("tables"); | ||
2388 : | my $size = -s $fileName; | ||
2389 : | Trace("$size bytes loaded into $relationName.") if T(2); | ||
2390 : | parrello | 1.10 | # If we're rebuilding, we need to create the table indexes. |
2391 : | parrello | 1.70 | if ($truncateFlag) { |
2392 : | # Indexes are created here for PostGres. For PostGres, indexes are | ||
2393 : | # best built at the end. For MySQL, the reverse is true. | ||
2394 : | if (! $dbh->{_preIndex}) { | ||
2395 : | eval { | ||
2396 : | $self->CreateIndex($relationName); | ||
2397 : | }; | ||
2398 : | if ($@) { | ||
2399 : | $retVal->AddMessage($@); | ||
2400 : | } | ||
2401 : | } | ||
2402 : | # The full-text index (if any) is always built last, even for MySQL. | ||
2403 : | # First we need to see if this table has a full-text index. Only | ||
2404 : | # primary relations are allowed that privilege. | ||
2405 : | if ($self->_IsPrimary($relationName)) { | ||
2406 : | # Get the relation's entity/relationship structure. | ||
2407 : | my $structure = $self->_GetStructure($relationName); | ||
2408 : | # Check for a searchable fields list. | ||
2409 : | if (exists $structure->{searchFields}) { | ||
2410 : | # Here we know that we need to create a full-text search index. | ||
2411 : | # Get an SQL-formatted field name list. | ||
2412 : | my $fields = join(", ", $self->_FixNames(@{$structure->{searchFields}})); | ||
2413 : | # Create the index. | ||
2414 : | parrello | 1.75 | $dbh->create_index(tbl => $relationName, idx => "search_idx", |
2415 : | parrello | 1.70 | flds => $fields, kind => 'fulltext'); |
2416 : | } | ||
2417 : | parrello | 1.10 | } |
2418 : | } | ||
2419 : | } | ||
2420 : | parrello | 1.20 | # Analyze the table to improve performance. |
2421 : | parrello | 1.61 | Trace("Analyzing and compacting $relationName.") if T(3); |
2422 : | olson | 1.16 | $dbh->vacuum_it($relationName); |
2423 : | parrello | 1.61 | Trace("$relationName load completed.") if T(3); |
2424 : | parrello | 1.10 | # Return the statistics. |
2425 : | return $retVal; | ||
2426 : | parrello | 1.1 | } |
2427 : | |||
2428 : | parrello | 1.73 | =head3 DropRelation |
2429 : | parrello | 1.1 | |
2430 : | parrello | 1.73 | C<< $erdb->DropRelation($relationName); >> |
2431 : | parrello | 1.1 | |
2432 : | parrello | 1.73 | Physically drop a relation from the database. |
2433 : | parrello | 1.1 | |
2434 : | =over 4 | ||
2435 : | |||
2436 : | parrello | 1.73 | =item relationName |
2437 : | parrello | 1.1 | |
2438 : | parrello | 1.73 | Name of the relation to drop. If it does not exist, this method will have |
2439 : | no effect. | ||
2440 : | parrello | 1.1 | |
2441 : | =back | ||
2442 : | |||
2443 : | =cut | ||
2444 : | |||
2445 : | parrello | 1.73 | sub DropRelation { |
2446 : | parrello | 1.10 | # Get the parameters. |
2447 : | parrello | 1.73 | my ($self, $relationName) = @_; |
2448 : | # Get the database handle. | ||
2449 : | my $dbh = $self->{_dbh}; | ||
2450 : | # Drop the relation. The method used here has no effect if the relation | ||
2451 : | # does not exist. | ||
2452 : | parrello | 1.74 | Trace("Invoking DB Kernel to drop $relationName.") if T(3); |
2453 : | $dbh->drop_table(tbl => $relationName); | ||
2454 : | parrello | 1.1 | } |
2455 : | |||
2456 : | parrello | 1.6 | =head3 GetEntity |
2457 : | |||
2458 : | parrello | 1.18 | C<< my $entityObject = $erdb->GetEntity($entityType, $ID); >> |
2459 : | parrello | 1.6 | |
2460 : | Return an object describing the entity instance with a specified ID. | ||
2461 : | |||
2462 : | =over 4 | ||
2463 : | |||
2464 : | =item entityType | ||
2465 : | |||
2466 : | Entity type name. | ||
2467 : | |||
2468 : | =item ID | ||
2469 : | |||
2470 : | ID of the desired entity. | ||
2471 : | |||
2472 : | =item RETURN | ||
2473 : | |||
2474 : | Returns a B<DBObject> representing the desired entity instance, or an undefined value if no | ||
2475 : | instance is found with the specified key. | ||
2476 : | |||
2477 : | =back | ||
2478 : | |||
2479 : | =cut | ||
2480 : | |||
2481 : | sub GetEntity { | ||
2482 : | parrello | 1.10 | # Get the parameters. |
2483 : | my ($self, $entityType, $ID) = @_; | ||
2484 : | # Create a query. | ||
2485 : | parrello | 1.45 | my $query = $self->Get([$entityType], "$entityType(id) = ?", [$ID]); |
2486 : | parrello | 1.10 | # Get the first (and only) object. |
2487 : | my $retVal = $query->Fetch(); | ||
2488 : | # Return the result. | ||
2489 : | return $retVal; | ||
2490 : | parrello | 1.6 | } |
2491 : | |||
2492 : | parrello | 1.69 | =head3 GetChoices |
2493 : | |||
2494 : | C<< my @values = $erdb->GetChoices($entityName, $fieldName); >> | ||
2495 : | |||
2496 : | Return a list of all the values for the specified field that are represented in the | ||
2497 : | specified entity. | ||
2498 : | |||
2499 : | Note that if the field is not indexed, then this will be a very slow operation. | ||
2500 : | |||
2501 : | =over 4 | ||
2502 : | |||
2503 : | =item entityName | ||
2504 : | |||
2505 : | Name of an entity in the database. | ||
2506 : | |||
2507 : | =item fieldName | ||
2508 : | |||
2509 : | Name of a field belonging to the entity. This is a raw field name without | ||
2510 : | the standard parenthesized notation used in most calls. | ||
2511 : | |||
2512 : | =item RETURN | ||
2513 : | |||
2514 : | Returns a list of the distinct values for the specified field in the database. | ||
2515 : | |||
2516 : | =back | ||
2517 : | |||
2518 : | =cut | ||
2519 : | |||
2520 : | sub GetChoices { | ||
2521 : | # Get the parameters. | ||
2522 : | my ($self, $entityName, $fieldName) = @_; | ||
2523 : | # Declare the return variable. | ||
2524 : | my @retVal; | ||
2525 : | # Get the entity data structure. | ||
2526 : | my $entityData = $self->_GetStructure($entityName); | ||
2527 : | # Get the field. | ||
2528 : | my $fieldHash = $entityData->{Fields}; | ||
2529 : | if (! exists $fieldHash->{$fieldName}) { | ||
2530 : | Confess("$fieldName not found in $entityName."); | ||
2531 : | } else { | ||
2532 : | # Get the name of the relation containing the field. | ||
2533 : | my $relation = $fieldHash->{$fieldName}->{relation}; | ||
2534 : | # Fix up the field name. | ||
2535 : | my $realName = _FixName($fieldName); | ||
2536 : | # Get the database handle. | ||
2537 : | my $dbh = $self->{_dbh}; | ||
2538 : | # Query the database. | ||
2539 : | my $results = $dbh->SQL("SELECT DISTINCT $realName FROM $relation"); | ||
2540 : | # Clean the results. They are stored as a list of lists, and we just want the one list. | ||
2541 : | @retVal = sort map { $_->[0] } @{$results}; | ||
2542 : | } | ||
2543 : | # Return the result. | ||
2544 : | return @retVal; | ||
2545 : | } | ||
2546 : | |||
2547 : | parrello | 1.6 | =head3 GetEntityValues |
2548 : | |||
2549 : | parrello | 1.18 | C<< my @values = $erdb->GetEntityValues($entityType, $ID, \@fields); >> |
2550 : | parrello | 1.6 | |
2551 : | parrello | 1.68 | Return a list of values from a specified entity instance. If the entity instance |
2552 : | does not exist, an empty list is returned. | ||
2553 : | parrello | 1.6 | |
2554 : | =over 4 | ||
2555 : | |||
2556 : | =item entityType | ||
2557 : | |||
2558 : | Entity type name. | ||
2559 : | |||
2560 : | =item ID | ||
2561 : | |||
2562 : | ID of the desired entity. | ||
2563 : | |||
2564 : | =item fields | ||
2565 : | |||
2566 : | List of field names, each of the form I<objectName>C<(>I<fieldName>C<)>. | ||
2567 : | |||
2568 : | =item RETURN | ||
2569 : | |||
2570 : | Returns a flattened list of the values of the specified fields for the specified entity. | ||
2571 : | |||
2572 : | =back | ||
2573 : | |||
2574 : | =cut | ||
2575 : | |||
2576 : | sub GetEntityValues { | ||
2577 : | parrello | 1.10 | # Get the parameters. |
2578 : | my ($self, $entityType, $ID, $fields) = @_; | ||
2579 : | # Get the specified entity. | ||
2580 : | my $entity = $self->GetEntity($entityType, $ID); | ||
2581 : | # Declare the return list. | ||
2582 : | my @retVal = (); | ||
2583 : | # If we found the entity, push the values into the return list. | ||
2584 : | if ($entity) { | ||
2585 : | push @retVal, $entity->Values($fields); | ||
2586 : | } | ||
2587 : | # Return the result. | ||
2588 : | return @retVal; | ||
2589 : | parrello | 1.6 | } |
2590 : | parrello | 1.1 | |
2591 : | parrello | 1.7 | =head3 GetAll |
2592 : | |||
2593 : | parrello | 1.18 | C<< my @list = $erdb->GetAll(\@objectNames, $filterClause, \@parameters, \@fields, $count); >> |
2594 : | parrello | 1.7 | |
2595 : | Return a list of values taken from the objects returned by a query. The first three | ||
2596 : | parameters correspond to the parameters of the L</Get> method. The final parameter is | ||
2597 : | a list of the fields desired from each record found by the query. The field name | ||
2598 : | syntax is the standard syntax used for fields in the B<ERDB> system-- | ||
2599 : | B<I<objectName>(I<fieldName>)>-- where I<objectName> is the name of the relevant entity | ||
2600 : | or relationship and I<fieldName> is the name of the field. | ||
2601 : | |||
2602 : | The list returned will be a list of lists. Each element of the list will contain | ||
2603 : | the values returned for the fields specified in the fourth parameter. If one of the | ||
2604 : | fields specified returns multiple values, they are flattened in with the rest. For | ||
2605 : | example, the following call will return a list of the features in a particular | ||
2606 : | spreadsheet cell, and each feature will be represented by a list containing the | ||
2607 : | feature ID followed by all of its aliases. | ||
2608 : | |||
2609 : | parrello | 1.18 | C<< $query = $erdb->Get(['ContainsFeature', 'Feature'], "ContainsFeature(from-link) = ?", [$ssCellID], ['Feature(id)', 'Feature(alias)']); >> |
2610 : | parrello | 1.7 | |
2611 : | =over 4 | ||
2612 : | |||
2613 : | =item objectNames | ||
2614 : | |||
2615 : | List containing the names of the entity and relationship objects to be retrieved. | ||
2616 : | |||
2617 : | =item filterClause | ||
2618 : | |||
2619 : | WHERE/ORDER BY clause (without the WHERE) to be used to filter and sort the query. The WHERE clause can | ||
2620 : | be parameterized with parameter markers (C<?>). Each field used must be specified in the standard form | ||
2621 : | B<I<objectName>(I<fieldName>)>. Any parameters specified in the filter clause should be added to the | ||
2622 : | parameter list as additional parameters. The fields in a filter clause can come from primary | ||
2623 : | entity relations, relationship relations, or secondary entity relations; however, all of the | ||
2624 : | entities and relationships involved must be included in the list of object names. | ||
2625 : | |||
2626 : | =item parameterList | ||
2627 : | |||
2628 : | List of the parameters to be substituted in for the parameters marks in the filter clause. | ||
2629 : | |||
2630 : | =item fields | ||
2631 : | |||
2632 : | List of the fields to be returned in each element of the list returned. | ||
2633 : | |||
2634 : | =item count | ||
2635 : | |||
2636 : | Maximum number of records to return. If omitted or 0, all available records will be returned. | ||
2637 : | |||
2638 : | =item RETURN | ||
2639 : | |||
2640 : | Returns a list of list references. Each element of the return list contains the values for the | ||
2641 : | fields specified in the B<fields> parameter. | ||
2642 : | |||
2643 : | =back | ||
2644 : | |||
2645 : | =cut | ||
2646 : | #: Return Type @@; | ||
2647 : | sub GetAll { | ||
2648 : | parrello | 1.10 | # Get the parameters. |
2649 : | my ($self, $objectNames, $filterClause, $parameterList, $fields, $count) = @_; | ||
2650 : | # Translate the parameters from a list reference to a list. If the parameter | ||
2651 : | # list is a scalar we convert it into a singleton list. | ||
2652 : | my @parmList = (); | ||
2653 : | if (ref $parameterList eq "ARRAY") { | ||
2654 : | parrello | 1.45 | Trace("GetAll parm list is an array.") if T(4); |
2655 : | parrello | 1.10 | @parmList = @{$parameterList}; |
2656 : | } else { | ||
2657 : | parrello | 1.45 | Trace("GetAll parm list is a scalar: $parameterList.") if T(4); |
2658 : | parrello | 1.10 | push @parmList, $parameterList; |
2659 : | } | ||
2660 : | parrello | 1.30 | # Insure the counter has a value. |
2661 : | if (!defined $count) { | ||
2662 : | $count = 0; | ||
2663 : | } | ||
2664 : | # Add the row limit to the filter clause. | ||
2665 : | if ($count > 0) { | ||
2666 : | $filterClause .= " LIMIT $count"; | ||
2667 : | } | ||
2668 : | parrello | 1.10 | # Create the query. |
2669 : | parrello | 1.45 | my $query = $self->Get($objectNames, $filterClause, \@parmList); |
2670 : | parrello | 1.10 | # Set up a counter of the number of records read. |
2671 : | my $fetched = 0; | ||
2672 : | # Loop through the records returned, extracting the fields. Note that if the | ||
2673 : | # counter is non-zero, we stop when the number of records read hits the count. | ||
2674 : | my @retVal = (); | ||
2675 : | while (($count == 0 || $fetched < $count) && (my $row = $query->Fetch())) { | ||
2676 : | my @rowData = $row->Values($fields); | ||
2677 : | push @retVal, \@rowData; | ||
2678 : | $fetched++; | ||
2679 : | } | ||
2680 : | parrello | 1.54 | Trace("$fetched rows returned in GetAll.") if T(SQL => 4); |
2681 : | parrello | 1.10 | # Return the resulting list. |
2682 : | return @retVal; | ||
2683 : | parrello | 1.7 | } |
2684 : | |||
2685 : | parrello | 1.55 | =head3 Exists |
2686 : | |||
2687 : | C<< my $found = $sprout->Exists($entityName, $entityID); >> | ||
2688 : | |||
2689 : | Return TRUE if an entity exists, else FALSE. | ||
2690 : | |||
2691 : | =over 4 | ||
2692 : | |||
2693 : | =item entityName | ||
2694 : | |||
2695 : | Name of the entity type (e.g. C<Feature>) relevant to the existence check. | ||
2696 : | |||
2697 : | =item entityID | ||
2698 : | |||
2699 : | ID of the entity instance whose existence is to be checked. | ||
2700 : | |||
2701 : | =item RETURN | ||
2702 : | |||
2703 : | Returns TRUE if the entity instance exists, else FALSE. | ||
2704 : | |||
2705 : | =back | ||
2706 : | |||
2707 : | =cut | ||
2708 : | #: Return Type $; | ||
2709 : | sub Exists { | ||
2710 : | # Get the parameters. | ||
2711 : | my ($self, $entityName, $entityID) = @_; | ||
2712 : | # Check for the entity instance. | ||
2713 : | Trace("Checking existence of $entityName with ID=$entityID.") if T(4); | ||
2714 : | my $testInstance = $self->GetEntity($entityName, $entityID); | ||
2715 : | # Return an existence indicator. | ||
2716 : | my $retVal = ($testInstance ? 1 : 0); | ||
2717 : | return $retVal; | ||
2718 : | } | ||
2719 : | |||
2720 : | parrello | 1.18 | =head3 EstimateRowSize |
2721 : | |||
2722 : | C<< my $rowSize = $erdb->EstimateRowSize($relName); >> | ||
2723 : | |||
2724 : | Estimate the row size of the specified relation. The estimated row size is computed by adding | ||
2725 : | up the average length for each data type. | ||
2726 : | |||
2727 : | =over 4 | ||
2728 : | |||
2729 : | =item relName | ||
2730 : | |||
2731 : | Name of the relation whose estimated row size is desired. | ||
2732 : | |||
2733 : | =item RETURN | ||
2734 : | |||
2735 : | Returns an estimate of the row size for the specified relation. | ||
2736 : | |||
2737 : | =back | ||
2738 : | |||
2739 : | =cut | ||
2740 : | #: Return Type $; | ||
2741 : | sub EstimateRowSize { | ||
2742 : | # Get the parameters. | ||
2743 : | my ($self, $relName) = @_; | ||
2744 : | # Declare the return variable. | ||
2745 : | my $retVal = 0; | ||
2746 : | # Find the relation descriptor. | ||
2747 : | my $relation = $self->_FindRelation($relName); | ||
2748 : | # Get the list of fields. | ||
2749 : | for my $fieldData (@{$relation->{Fields}}) { | ||
2750 : | # Get the field type and add its length. | ||
2751 : | my $fieldLen = $TypeTable{$fieldData->{type}}->{avgLen}; | ||
2752 : | $retVal += $fieldLen; | ||
2753 : | } | ||
2754 : | # Return the result. | ||
2755 : | return $retVal; | ||
2756 : | } | ||
2757 : | |||
2758 : | parrello | 1.38 | =head3 GetFieldTable |
2759 : | |||
2760 : | C<< my $fieldHash = $self->GetFieldTable($objectnName); >> | ||
2761 : | |||
2762 : | Get the field structure for a specified entity or relationship. | ||
2763 : | |||
2764 : | =over 4 | ||
2765 : | |||
2766 : | =item objectName | ||
2767 : | |||
2768 : | Name of the desired entity or relationship. | ||
2769 : | |||
2770 : | =item RETURN | ||
2771 : | |||
2772 : | The table containing the field descriptors for the specified object. | ||
2773 : | |||
2774 : | =back | ||
2775 : | |||
2776 : | =cut | ||
2777 : | |||
2778 : | sub GetFieldTable { | ||
2779 : | # Get the parameters. | ||
2780 : | my ($self, $objectName) = @_; | ||
2781 : | # Get the descriptor from the metadata. | ||
2782 : | my $objectData = $self->_GetStructure($objectName); | ||
2783 : | # Return the object's field table. | ||
2784 : | return $objectData->{Fields}; | ||
2785 : | } | ||
2786 : | |||
2787 : | parrello | 1.73 | =head3 SplitKeywords |
2788 : | |||
2789 : | C<< my @keywords = ERDB::SplitKeywords($keywordString); >> | ||
2790 : | |||
2791 : | This method returns a list of the positive keywords in the specified | ||
2792 : | keyword string. All of the operators will have been stripped off, | ||
2793 : | and if the keyword is preceded by a minus operator (C<->), it will | ||
2794 : | not be in the list returned. The idea here is to get a list of the | ||
2795 : | keywords the user wants to see. The list will be processed to remove | ||
2796 : | duplicates. | ||
2797 : | |||
2798 : | It is possible to create a string that confuses this method. For example | ||
2799 : | |||
2800 : | frog toad -frog | ||
2801 : | |||
2802 : | would return both C<frog> and C<toad>. If this is a problem we can deal | ||
2803 : | with it later. | ||
2804 : | |||
2805 : | =over 4 | ||
2806 : | |||
2807 : | =item keywordString | ||
2808 : | |||
2809 : | The keyword string to be parsed. | ||
2810 : | |||
2811 : | =item RETURN | ||
2812 : | |||
2813 : | Returns a list of the words in the keyword string the user wants to | ||
2814 : | see. | ||
2815 : | |||
2816 : | =back | ||
2817 : | |||
2818 : | =cut | ||
2819 : | |||
2820 : | sub SplitKeywords { | ||
2821 : | # Get the parameters. | ||
2822 : | my ($keywordString) = @_; | ||
2823 : | # Make a safety copy of the string. (This helps during debugging.) | ||
2824 : | my $workString = $keywordString; | ||
2825 : | # Convert operators we don't care about to spaces. | ||
2826 : | $workString =~ tr/+"()<>/ /; | ||
2827 : | # Split the rest of the string along space boundaries. Note that we | ||
2828 : | # eliminate any words that are zero length or begin with a minus sign. | ||
2829 : | my @wordList = grep { $_ && substr($_, 0, 1) ne "-" } split /\s+/, $workString; | ||
2830 : | # Use a hash to remove duplicates. | ||
2831 : | my %words = map { $_ => 1 } @wordList; | ||
2832 : | # Return the result. | ||
2833 : | return sort keys %words; | ||
2834 : | } | ||
2835 : | |||
2836 : | =head3 ValidateFieldName | ||
2837 : | |||
2838 : | C<< my $okFlag = ERDB::ValidateFieldName($fieldName); >> | ||
2839 : | |||
2840 : | Return TRUE if the specified field name is valid, else FALSE. Valid field names must | ||
2841 : | be hyphenated words subject to certain restrictions. | ||
2842 : | |||
2843 : | =over 4 | ||
2844 : | |||
2845 : | =item fieldName | ||
2846 : | |||
2847 : | Field name to be validated. | ||
2848 : | |||
2849 : | =item RETURN | ||
2850 : | |||
2851 : | Returns TRUE if the field name is valid, else FALSE. | ||
2852 : | |||
2853 : | =back | ||
2854 : | |||
2855 : | =cut | ||
2856 : | |||
2857 : | sub ValidateFieldName { | ||
2858 : | # Get the parameters. | ||
2859 : | my ($fieldName) = @_; | ||
2860 : | # Declare the return variable. The field name is valid until we hear | ||
2861 : | # differently. | ||
2862 : | my $retVal = 1; | ||
2863 : | # Look for bad stuff in the name. | ||
2864 : | if ($fieldName =~ /--/) { | ||
2865 : | # Here we have a doubled minus sign. | ||
2866 : | Trace("Field name $fieldName has a doubled hyphen.") if T(1); | ||
2867 : | $retVal = 0; | ||
2868 : | } elsif ($fieldName !~ /^[A-Za-z]/) { | ||
2869 : | # Here the field name is missing the initial letter. | ||
2870 : | Trace("Field name $fieldName does not begin with a letter.") if T(1); | ||
2871 : | $retVal = 0; | ||
2872 : | } else { | ||
2873 : | parrello | 1.75 | # Strip out the minus signs. Everything remaining must be a letter, |
2874 : | # underscore, or digit. | ||
2875 : | parrello | 1.73 | my $strippedName = $fieldName; |
2876 : | $strippedName =~ s/-//g; | ||
2877 : | parrello | 1.75 | if ($strippedName !~ /^(\w|\d)+$/) { |
2878 : | parrello | 1.73 | Trace("Field name $fieldName contains illegal characters.") if T(1); |
2879 : | $retVal = 0; | ||
2880 : | } | ||
2881 : | } | ||
2882 : | # Return the result. | ||
2883 : | return $retVal; | ||
2884 : | } | ||
2885 : | |||
2886 : | =head3 ReadMetaXML | ||
2887 : | |||
2888 : | C<< my $rawMetaData = ERDB::ReadDBD($fileName); >> | ||
2889 : | |||
2890 : | This method reads a raw database definition XML file and returns it. | ||
2891 : | Normally, the metadata used by the ERDB system has been processed and | ||
2892 : | modified to make it easier to load and retrieve the data; however, | ||
2893 : | this method can be used to get the data in its raw form. | ||
2894 : | |||
2895 : | =over 4 | ||
2896 : | |||
2897 : | =item fileName | ||
2898 : | |||
2899 : | Name of the XML file to read. | ||
2900 : | |||
2901 : | =item RETURN | ||
2902 : | |||
2903 : | Returns a hash reference containing the raw XML data from the specified file. | ||
2904 : | |||
2905 : | =back | ||
2906 : | |||
2907 : | =cut | ||
2908 : | |||
2909 : | sub ReadMetaXML { | ||
2910 : | # Get the parameters. | ||
2911 : | my ($fileName) = @_; | ||
2912 : | # Read the XML. | ||
2913 : | my $retVal = XML::Simple::XMLin($fileName, %XmlOptions, %XmlInOpts); | ||
2914 : | Trace("XML metadata loaded from file $fileName.") if T(1); | ||
2915 : | # Return the result. | ||
2916 : | return $retVal; | ||
2917 : | } | ||
2918 : | |||
2919 : | =head3 GetEntityFieldHash | ||
2920 : | |||
2921 : | C<< my $fieldHashRef = ERDB::GetEntityFieldHash($structure, $entityName); >> | ||
2922 : | |||
2923 : | Get the field hash of the named entity in the specified raw XML structure. | ||
2924 : | The field hash may not exist, in which case we need to create it. | ||
2925 : | |||
2926 : | =over 4 | ||
2927 : | |||
2928 : | =item structure | ||
2929 : | |||
2930 : | Raw XML structure defininng the database. This is not the run-time XML used by | ||
2931 : | an ERDB object, since that has all sorts of optimizations built-in. | ||
2932 : | |||
2933 : | =item entityName | ||
2934 : | |||
2935 : | Name of the entity whose field structure is desired. | ||
2936 : | |||
2937 : | =item RETURN | ||
2938 : | |||
2939 : | Returns the field hash used to define the entity's fields. | ||
2940 : | |||
2941 : | =back | ||
2942 : | |||
2943 : | =cut | ||
2944 : | |||
2945 : | sub GetEntityFieldHash { | ||
2946 : | # Get the parameters. | ||
2947 : | my ($structure, $entityName) = @_; | ||
2948 : | # Get the entity structure. | ||
2949 : | my $entityData = $structure->{Entities}->{$entityName}; | ||
2950 : | # Look for a field structure. | ||
2951 : | my $retVal = $entityData->{Fields}; | ||
2952 : | # If it doesn't exist, create it. | ||
2953 : | if (! defined($retVal)) { | ||
2954 : | $entityData->{Fields} = {}; | ||
2955 : | $retVal = $entityData->{Fields}; | ||
2956 : | } | ||
2957 : | # Return the result. | ||
2958 : | return $retVal; | ||
2959 : | } | ||
2960 : | |||
2961 : | =head3 WriteMetaXML | ||
2962 : | |||
2963 : | C<< ERDB::WriteMetaXML($structure, $fileName); >> | ||
2964 : | |||
2965 : | Write the metadata XML to a file. This method is the reverse of L</ReadMetaXML>, and is | ||
2966 : | used to update the database definition. It must be used with care, however, since it | ||
2967 : | will only work on a raw structure, not on the processed structure created by an ERDB | ||
2968 : | constructor. | ||
2969 : | |||
2970 : | =over 4 | ||
2971 : | |||
2972 : | =item structure | ||
2973 : | |||
2974 : | XML structure to be written to the file. | ||
2975 : | |||
2976 : | =item fileName | ||
2977 : | |||
2978 : | Name of the output file to which the updated XML should be stored. | ||
2979 : | |||
2980 : | =back | ||
2981 : | |||
2982 : | =cut | ||
2983 : | |||
2984 : | sub WriteMetaXML { | ||
2985 : | # Get the parameters. | ||
2986 : | my ($structure, $fileName) = @_; | ||
2987 : | # Compute the output. | ||
2988 : | my $fileString = XML::Simple::XMLout($structure, %XmlOptions, %XmlOutOpts); | ||
2989 : | # Write it to the file. | ||
2990 : | my $xmlOut = Open(undef, ">$fileName"); | ||
2991 : | print $xmlOut $fileString; | ||
2992 : | } | ||
2993 : | |||
2994 : | |||
2995 : | =head3 HTMLNote | ||
2996 : | |||
2997 : | Convert a note or comment to HTML by replacing some bulletin-board codes with HTML. The codes | ||
2998 : | supported are C<[b]> for B<bold>, C<[i]> for I<italics>, and C<[p]> for a new paragraph. | ||
2999 : | Except for C<[p]>, all the codes are closed by slash-codes. So, for | ||
3000 : | example, C<[b]Feature[/b]> displays the string C<Feature> in boldface. | ||
3001 : | |||
3002 : | C<< my $realHtml = ERDB::HTMLNote($dataString); >> | ||
3003 : | |||
3004 : | =over 4 | ||
3005 : | |||
3006 : | =item dataString | ||
3007 : | |||
3008 : | String to convert to HTML. | ||
3009 : | |||
3010 : | =item RETURN | ||
3011 : | |||
3012 : | An HTML string derived from the input string. | ||
3013 : | |||
3014 : | =back | ||
3015 : | |||
3016 : | =cut | ||
3017 : | |||
3018 : | sub HTMLNote { | ||
3019 : | # Get the parameter. | ||
3020 : | my ($dataString) = @_; | ||
3021 : | # HTML-escape the text. | ||
3022 : | my $retVal = CGI::escapeHTML($dataString); | ||
3023 : | # Substitute the bulletin board codes. | ||
3024 : | $retVal =~ s!\[(/?[bi])\]!<$1>!g; | ||
3025 : | $retVal =~ s!\[p\]!</p><p>!g; | ||
3026 : | # Return the result. | ||
3027 : | return $retVal; | ||
3028 : | } | ||
3029 : | |||
3030 : | |||
3031 : | parrello | 1.48 | =head2 Data Mining Methods |
3032 : | |||
3033 : | parrello | 1.38 | =head3 GetUsefulCrossValues |
3034 : | |||
3035 : | C<< my @attrNames = $sprout->GetUsefulCrossValues($sourceEntity, $relationship); >> | ||
3036 : | |||
3037 : | Return a list of the useful attributes that would be returned by a B<Cross> call | ||
3038 : | from an entity of the source entity type through the specified relationship. This | ||
3039 : | means it will return the fields of the target entity type and the intersection data | ||
3040 : | fields in the relationship. Only primary table fields are returned. In other words, | ||
3041 : | the field names returned will be for fields where there is always one and only one | ||
3042 : | value. | ||
3043 : | |||
3044 : | =over 4 | ||
3045 : | |||
3046 : | =item sourceEntity | ||
3047 : | |||
3048 : | Name of the entity from which the relationship crossing will start. | ||
3049 : | |||
3050 : | =item relationship | ||
3051 : | |||
3052 : | Name of the relationship being crossed. | ||
3053 : | |||
3054 : | =item RETURN | ||
3055 : | |||
3056 : | Returns a list of field names in Sprout field format (I<objectName>C<(>I<fieldName>C<)>. | ||
3057 : | |||
3058 : | =back | ||
3059 : | |||
3060 : | =cut | ||
3061 : | #: Return Type @; | ||
3062 : | sub GetUsefulCrossValues { | ||
3063 : | # Get the parameters. | ||
3064 : | my ($self, $sourceEntity, $relationship) = @_; | ||
3065 : | # Declare the return variable. | ||
3066 : | my @retVal = (); | ||
3067 : | # Determine the target entity for the relationship. This is whichever entity is not | ||
3068 : | # the source entity. So, if the source entity is the FROM, we'll get the name of | ||
3069 : | # the TO, and vice versa. | ||
3070 : | my $relStructure = $self->_GetStructure($relationship); | ||
3071 : | my $targetEntityType = ($relStructure->{from} eq $sourceEntity ? "to" : "from"); | ||
3072 : | my $targetEntity = $relStructure->{$targetEntityType}; | ||
3073 : | # Get the field table for the entity. | ||
3074 : | my $entityFields = $self->GetFieldTable($targetEntity); | ||
3075 : | # The field table is a hash. The hash key is the field name. The hash value is a structure. | ||
3076 : | # For the entity fields, the key aspect of the target structure is that the {relation} value | ||
3077 : | # must match the entity name. | ||
3078 : | my @fieldList = map { "$targetEntity($_)" } grep { $entityFields->{$_}->{relation} eq $targetEntity } | ||
3079 : | keys %{$entityFields}; | ||
3080 : | # Push the fields found onto the return variable. | ||
3081 : | push @retVal, sort @fieldList; | ||
3082 : | # Get the field table for the relationship. | ||
3083 : | my $relationshipFields = $self->GetFieldTable($relationship); | ||
3084 : | # Here we have a different rule. We want all the fields other than "from-link" and "to-link". | ||
3085 : | # This may end up being an empty set. | ||
3086 : | my @fieldList2 = map { "$relationship($_)" } grep { $_ ne "from-link" && $_ ne "to-link" } | ||
3087 : | keys %{$relationshipFields}; | ||
3088 : | # Push these onto the return list. | ||
3089 : | push @retVal, sort @fieldList2; | ||
3090 : | # Return the result. | ||
3091 : | return @retVal; | ||
3092 : | } | ||
3093 : | |||
3094 : | parrello | 1.48 | =head3 FindColumn |
3095 : | |||
3096 : | C<< my $colIndex = ERDB::FindColumn($headerLine, $columnIdentifier); >> | ||
3097 : | |||
3098 : | Return the location a desired column in a data mining header line. The data | ||
3099 : | mining header line is a tab-separated list of column names. The column | ||
3100 : | identifier is either the numerical index of a column or the actual column | ||
3101 : | name. | ||
3102 : | |||
3103 : | =over 4 | ||
3104 : | |||
3105 : | =item headerLine | ||
3106 : | |||
3107 : | The header line from a data mining command, which consists of a tab-separated | ||
3108 : | list of column names. | ||
3109 : | |||
3110 : | =item columnIdentifier | ||
3111 : | |||
3112 : | Either the ordinal number of the desired column (1-based), or the name of the | ||
3113 : | desired column. | ||
3114 : | |||
3115 : | =item RETURN | ||
3116 : | |||
3117 : | Returns the array index (0-based) of the desired column. | ||
3118 : | |||
3119 : | =back | ||
3120 : | |||
3121 : | =cut | ||
3122 : | |||
3123 : | sub FindColumn { | ||
3124 : | # Get the parameters. | ||
3125 : | my ($headerLine, $columnIdentifier) = @_; | ||
3126 : | # Declare the return variable. | ||
3127 : | my $retVal; | ||
3128 : | # Split the header line into column names. | ||
3129 : | my @headers = ParseColumns($headerLine); | ||
3130 : | # Determine whether we have a number or a name. | ||
3131 : | if ($columnIdentifier =~ /^\d+$/) { | ||
3132 : | # Here we have a number. Subtract 1 and validate the result. | ||
3133 : | $retVal = $columnIdentifier - 1; | ||
3134 : | if ($retVal < 0 || $retVal > $#headers) { | ||
3135 : | Confess("Invalid column identifer \"$columnIdentifier\": value out of range."); | ||
3136 : | } | ||
3137 : | } else { | ||
3138 : | # Here we have a name. We need to find it in the list. | ||
3139 : | for (my $i = 0; $i <= $#headers && ! defined($retVal); $i++) { | ||
3140 : | if ($headers[$i] eq $columnIdentifier) { | ||
3141 : | $retVal = $i; | ||
3142 : | } | ||
3143 : | } | ||
3144 : | if (! defined($retVal)) { | ||
3145 : | Confess("Invalid column identifier \"$columnIdentifier\": value not found."); | ||
3146 : | } | ||
3147 : | } | ||
3148 : | # Return the result. | ||
3149 : | return $retVal; | ||
3150 : | } | ||
3151 : | |||
3152 : | =head3 ParseColumns | ||
3153 : | |||
3154 : | parrello | 1.50 | C<< my @columns = ERDB::ParseColumns($line); >> |
3155 : | parrello | 1.48 | |
3156 : | Convert the specified data line to a list of columns. | ||
3157 : | |||
3158 : | =over 4 | ||
3159 : | |||
3160 : | =item line | ||
3161 : | |||
3162 : | A data mining input, consisting of a tab-separated list of columns terminated by a | ||
3163 : | new-line. | ||
3164 : | |||
3165 : | =item RETURN | ||
3166 : | |||
3167 : | Returns a list consisting of the column values. | ||
3168 : | |||
3169 : | =back | ||
3170 : | |||
3171 : | =cut | ||
3172 : | |||
3173 : | sub ParseColumns { | ||
3174 : | # Get the parameters. | ||
3175 : | parrello | 1.50 | my ($line) = @_; |
3176 : | parrello | 1.48 | # Chop off the line-end. |
3177 : | chomp $line; | ||
3178 : | # Split it into a list. | ||
3179 : | my @retVal = split(/\t/, $line); | ||
3180 : | # Return the result. | ||
3181 : | return @retVal; | ||
3182 : | } | ||
3183 : | |||
3184 : | parrello | 1.71 | =head2 Virtual Methods |
3185 : | |||
3186 : | =head3 CleanKeywords | ||
3187 : | |||
3188 : | C<< my $cleanedString = $erdb->CleanKeywords($searchExpression); >> | ||
3189 : | |||
3190 : | Clean up a search expression or keyword list. This is a virtual method that may | ||
3191 : | be overridden by the subclass. The base-class method removes extra spaces | ||
3192 : | and converts everything to lower case. | ||
3193 : | |||
3194 : | =over 4 | ||
3195 : | |||
3196 : | =item searchExpression | ||
3197 : | |||
3198 : | Search expression or keyword list to clean. Note that a search expression may | ||
3199 : | contain boolean operators which need to be preserved. This includes leading | ||
3200 : | minus signs. | ||
3201 : | |||
3202 : | =item RETURN | ||
3203 : | |||
3204 : | Cleaned expression or keyword list. | ||
3205 : | |||
3206 : | =back | ||
3207 : | |||
3208 : | =cut | ||
3209 : | |||
3210 : | sub CleanKeywords { | ||
3211 : | # Get the parameters. | ||
3212 : | my ($self, $searchExpression) = @_; | ||
3213 : | # Lower-case the expression and copy it into the return variable. Note that we insure we | ||
3214 : | # don't accidentally end up with an undefined value. | ||
3215 : | my $retVal = lc($searchExpression || ""); | ||
3216 : | # Remove extra spaces. | ||
3217 : | $retVal =~ s/\s+/ /g; | ||
3218 : | $retVal =~ s/(^\s+)|(\s+$)//g; | ||
3219 : | # Return the result. | ||
3220 : | return $retVal; | ||
3221 : | } | ||
3222 : | |||
3223 : | parrello | 1.73 | =head3 GetSourceObject |
3224 : | |||
3225 : | C<< my $source = $erdb->GetSourceObject($entityName); >> | ||
3226 : | |||
3227 : | Return the object to be used in loading special attributes of the specified entity. The | ||
3228 : | algorithm for loading special attributes is stored in the C<DataGen> elements of the | ||
3229 : | XML | ||
3230 : | |||
3231 : | parrello | 1.1 | =head2 Internal Utility Methods |
3232 : | |||
3233 : | parrello | 1.70 | =head3 _RelationMap |
3234 : | |||
3235 : | C<< my @relationMap = _RelationMap($mappedNameHashRef, $mappedNameListRef); >> | ||
3236 : | |||
3237 : | Create the relation map for an SQL query. The relation map is used by B<DBObject> | ||
3238 : | to determine how to interpret the results of the query. | ||
3239 : | |||
3240 : | =over 4 | ||
3241 : | |||
3242 : | =item mappedNameHashRef | ||
3243 : | |||
3244 : | Reference to a hash that maps modified object names to real object names. | ||
3245 : | |||
3246 : | =item mappedNameListRef | ||
3247 : | |||
3248 : | Reference to a list of modified object names in the order they appear in the | ||
3249 : | SELECT list. | ||
3250 : | |||
3251 : | =item RETURN | ||
3252 : | |||
3253 : | Returns a list of 2-tuples. Each tuple consists of an object name as used in the | ||
3254 : | query followed by the actual name of that object. This enables the B<DBObject> to | ||
3255 : | determine the order of the tables in the query and which object name belongs to each | ||
3256 : | mapped object name. Most of the time these two values are the same; however, if a | ||
3257 : | relation occurs twice in the query, the relation name in the field list and WHERE | ||
3258 : | clause will use a mapped name (generally the actual relation name with a numeric | ||
3259 : | suffix) that does not match the actual relation name. | ||
3260 : | |||
3261 : | =back | ||
3262 : | |||
3263 : | =cut | ||
3264 : | |||
3265 : | sub _RelationMap { | ||
3266 : | # Get the parameters. | ||
3267 : | my ($mappedNameHashRef, $mappedNameListRef) = @_; | ||
3268 : | # Declare the return variable. | ||
3269 : | my @retVal = (); | ||
3270 : | # Build the map. | ||
3271 : | for my $mappedName (@{$mappedNameListRef}) { | ||
3272 : | push @retVal, [$mappedName, $mappedNameHashRef->{$mappedName}]; | ||
3273 : | } | ||
3274 : | # Return it. | ||
3275 : | return @retVal; | ||
3276 : | } | ||
3277 : | |||
3278 : | |||
3279 : | =head3 _SetupSQL | ||
3280 : | parrello | 1.45 | |
3281 : | Process a list of object names and a filter clause so that they can be used to | ||
3282 : | build an SQL statement. This method takes in a reference to a list of object names | ||
3283 : | and a filter clause. It will return a corrected filter clause, a list of mapped | ||
3284 : | names and the mapped name hash. | ||
3285 : | |||
3286 : | This is an instance method. | ||
3287 : | |||
3288 : | =over 4 | ||
3289 : | |||
3290 : | =item objectNames | ||
3291 : | |||
3292 : | Reference to a list of the object names to be included in the query. | ||
3293 : | |||
3294 : | =item filterClause | ||
3295 : | |||
3296 : | A string containing the WHERE clause for the query (without the C<WHERE>) and also | ||
3297 : | optionally the C<ORDER BY> and C<LIMIT> clauses. | ||
3298 : | |||
3299 : | parrello | 1.70 | =item matchClause |
3300 : | |||
3301 : | An optional full-text search clause. If specified, it will be inserted at the | ||
3302 : | front of the WHERE clause. It should already be SQL-formatted; that is, the | ||
3303 : | field names should be in the form I<table>C<.>I<fieldName>. | ||
3304 : | |||
3305 : | parrello | 1.45 | =item RETURN |
3306 : | |||
3307 : | Returns a three-element list. The first element is the SQL statement suffix, beginning | ||
3308 : | with the FROM clause. The second element is a reference to a list of the names to be | ||
3309 : | used in retrieving the fields. The third element is a hash mapping the names to the | ||
3310 : | objects they represent. | ||
3311 : | |||
3312 : | =back | ||
3313 : | |||
3314 : | =cut | ||
3315 : | |||
3316 : | sub _SetupSQL { | ||
3317 : | parrello | 1.70 | my ($self, $objectNames, $filterClause, $matchClause) = @_; |
3318 : | parrello | 1.45 | # Adjust the list of object names to account for multiple occurrences of the |
3319 : | # same object. We start with a hash table keyed on object name that will | ||
3320 : | # return the object suffix. The first time an object is encountered it will | ||
3321 : | # not be found in the hash. The next time the hash will map the object name | ||
3322 : | # to 2, then 3, and so forth. | ||
3323 : | my %objectHash = (); | ||
3324 : | # This list will contain the object names as they are to appear in the | ||
3325 : | # FROM list. | ||
3326 : | my @fromList = (); | ||
3327 : | # This list contains the suffixed object name for each object. It is exactly | ||
3328 : | # parallel to the list in the $objectNames parameter. | ||
3329 : | my @mappedNameList = (); | ||
3330 : | # Finally, this hash translates from a mapped name to its original object name. | ||
3331 : | my %mappedNameHash = (); | ||
3332 : | # Now we create the lists. Note that for every single name we push something into | ||
3333 : | # @fromList and @mappedNameList. This insures that those two arrays are exactly | ||
3334 : | # parallel to $objectNames. | ||
3335 : | for my $objectName (@{$objectNames}) { | ||
3336 : | # Get the next suffix for this object. | ||
3337 : | my $suffix = $objectHash{$objectName}; | ||
3338 : | if (! $suffix) { | ||
3339 : | # Here we are seeing the object for the first time. The object name | ||
3340 : | # is used as is. | ||
3341 : | push @mappedNameList, $objectName; | ||
3342 : | push @fromList, $objectName; | ||
3343 : | $mappedNameHash{$objectName} = $objectName; | ||
3344 : | # Denote the next suffix will be 2. | ||
3345 : | $objectHash{$objectName} = 2; | ||
3346 : | } else { | ||
3347 : | # Here we've seen the object before. We construct a new name using | ||
3348 : | # the suffix from the hash and update the hash. | ||
3349 : | my $mappedName = "$objectName$suffix"; | ||
3350 : | $objectHash{$objectName} = $suffix + 1; | ||
3351 : | # The FROM list has the object name followed by the mapped name. This | ||
3352 : | # tells SQL it's still the same table, but we're using a different name | ||
3353 : | # for it to avoid confusion. | ||
3354 : | push @fromList, "$objectName $mappedName"; | ||
3355 : | # The mapped-name list contains the real mapped name. | ||
3356 : | push @mappedNameList, $mappedName; | ||
3357 : | # Finally, enable us to get back from the mapped name to the object name. | ||
3358 : | $mappedNameHash{$mappedName} = $objectName; | ||
3359 : | } | ||
3360 : | } | ||
3361 : | # Begin the SELECT suffix. It starts with | ||
3362 : | # | ||
3363 : | # FROM name1, name2, ... nameN | ||
3364 : | # | ||
3365 : | my $suffix = "FROM " . join(', ', @fromList); | ||
3366 : | parrello | 1.70 | # Now for the WHERE. First, we need a place for the filter string. |
3367 : | my $filterString = ""; | ||
3368 : | # We will also keep a list of conditions to add to the WHERE clause in order to link | ||
3369 : | # entities and relationships as well as primary relations to secondary ones. | ||
3370 : | my @joinWhere = (); | ||
3371 : | parrello | 1.45 | # Check for a filter clause. |
3372 : | if ($filterClause) { | ||
3373 : | # Here we have one, so we convert its field names and add it to the query. First, | ||
3374 : | # We create a copy of the filter string we can work with. | ||
3375 : | parrello | 1.70 | $filterString = $filterClause; |
3376 : | parrello | 1.45 | # Next, we sort the object names by length. This helps protect us from finding |
3377 : | # object names inside other object names when we're doing our search and replace. | ||
3378 : | my @sortedNames = sort { length($b) - length($a) } @mappedNameList; | ||
3379 : | # The final preparatory step is to create a hash table of relation names. The | ||
3380 : | # table begins with the relation names already in the SELECT command. We may | ||
3381 : | # need to add relations later if there is filtering on a field in a secondary | ||
3382 : | # relation. The secondary relations are the ones that contain multiply- | ||
3383 : | # occurring or optional fields. | ||
3384 : | my %fromNames = map { $_ => 1 } @sortedNames; | ||
3385 : | # We are ready to begin. We loop through the object names, replacing each | ||
3386 : | # object name's field references by the corresponding SQL field reference. | ||
3387 : | # Along the way, if we find a secondary relation, we will need to add it | ||
3388 : | # to the FROM clause. | ||
3389 : | for my $mappedName (@sortedNames) { | ||
3390 : | # Get the length of the object name plus 2. This is the value we add to the | ||
3391 : | # size of the field name to determine the size of the field reference as a | ||
3392 : | # whole. | ||
3393 : | my $nameLength = 2 + length $mappedName; | ||
3394 : | # Get the real object name for this mapped name. | ||
3395 : | my $objectName = $mappedNameHash{$mappedName}; | ||
3396 : | Trace("Processing $mappedName for object $objectName.") if T(4); | ||
3397 : | # Get the object's field list. | ||
3398 : | my $fieldList = $self->GetFieldTable($objectName); | ||
3399 : | # Find the field references for this object. | ||
3400 : | while ($filterString =~ m/$mappedName\(([^)]*)\)/g) { | ||
3401 : | # At this point, $1 contains the field name, and the current position | ||
3402 : | # is set immediately after the final parenthesis. We pull out the name of | ||
3403 : | # the field and the position and length of the field reference as a whole. | ||
3404 : | my $fieldName = $1; | ||
3405 : | my $len = $nameLength + length $fieldName; | ||
3406 : | my $pos = pos($filterString) - $len; | ||
3407 : | # Insure the field exists. | ||
3408 : | if (!exists $fieldList->{$fieldName}) { | ||
3409 : | Confess("Field $fieldName not found for object $objectName."); | ||
3410 : | } else { | ||
3411 : | Trace("Processing $fieldName at position $pos.") if T(4); | ||
3412 : | # Get the field's relation. | ||
3413 : | my $relationName = $fieldList->{$fieldName}->{relation}; | ||
3414 : | # Now we have a secondary relation. We need to insure it matches the | ||
3415 : | # mapped name of the primary relation. First we peel off the suffix | ||
3416 : | # from the mapped name. | ||
3417 : | my $mappingSuffix = substr $mappedName, length($objectName); | ||
3418 : | # Put the mapping suffix onto the relation name to get the | ||
3419 : | # mapped relation name. | ||
3420 : | my $mappedRelationName = "$relationName$mappingSuffix"; | ||
3421 : | # Insure the relation is in the FROM clause. | ||
3422 : | if (!exists $fromNames{$mappedRelationName}) { | ||
3423 : | # Add the relation to the FROM clause. | ||
3424 : | if ($mappedRelationName eq $relationName) { | ||
3425 : | # The name is un-mapped, so we add it without | ||
3426 : | # any frills. | ||
3427 : | $suffix .= ", $relationName"; | ||
3428 : | push @joinWhere, "$objectName.id = $relationName.id"; | ||
3429 : | } else { | ||
3430 : | # Here we have a mapping situation. | ||
3431 : | $suffix .= ", $relationName $mappedRelationName"; | ||
3432 : | push @joinWhere, "$mappedRelationName.id = $mappedName.id"; | ||
3433 : | } | ||
3434 : | # Denote we have this relation available for future fields. | ||
3435 : | $fromNames{$mappedRelationName} = 1; | ||
3436 : | } | ||
3437 : | # Form an SQL field reference from the relation name and the field name. | ||
3438 : | my $sqlReference = "$mappedRelationName." . _FixName($fieldName); | ||
3439 : | # Put it into the filter string in place of the old value. | ||
3440 : | substr($filterString, $pos, $len) = $sqlReference; | ||
3441 : | # Reposition the search. | ||
3442 : | pos $filterString = $pos + length $sqlReference; | ||
3443 : | } | ||
3444 : | } | ||
3445 : | } | ||
3446 : | parrello | 1.70 | } |
3447 : | # The next step is to join the objects together. We only need to do this if there | ||
3448 : | # is more than one object in the object list. We start with the first object and | ||
3449 : | # run through the objects after it. Note also that we make a safety copy of the | ||
3450 : | # list before running through it, because we shift off the first object before | ||
3451 : | # processing the rest. | ||
3452 : | my @mappedObjectList = @mappedNameList; | ||
3453 : | my $lastMappedObject = shift @mappedObjectList; | ||
3454 : | # Get the join table. | ||
3455 : | my $joinTable = $self->{_metaData}->{Joins}; | ||
3456 : | # Loop through the object list. | ||
3457 : | for my $thisMappedObject (@mappedObjectList) { | ||
3458 : | # Look for a join using the real object names. | ||
3459 : | my $lastObject = $mappedNameHash{$lastMappedObject}; | ||
3460 : | my $thisObject = $mappedNameHash{$thisMappedObject}; | ||
3461 : | my $joinKey = "$lastObject/$thisObject"; | ||
3462 : | if (!exists $joinTable->{$joinKey}) { | ||
3463 : | # Here there's no join, so we throw an error. | ||
3464 : | Confess("No join exists to connect from $lastMappedObject to $thisMappedObject."); | ||
3465 : | } else { | ||
3466 : | # Get the join clause. | ||
3467 : | my $unMappedJoin = $joinTable->{$joinKey}; | ||
3468 : | # Fix the names. | ||
3469 : | $unMappedJoin =~ s/$lastObject/$lastMappedObject/; | ||
3470 : | $unMappedJoin =~ s/$thisObject/$thisMappedObject/; | ||
3471 : | push @joinWhere, $unMappedJoin; | ||
3472 : | # Save this object as the last object for the next iteration. | ||
3473 : | $lastMappedObject = $thisMappedObject; | ||
3474 : | } | ||
3475 : | } | ||
3476 : | # Now we need to handle the whole ORDER BY / LIMIT thing. The important part | ||
3477 : | # here is we want the filter clause to be empty if there's no WHERE filter. | ||
3478 : | # We'll put the ORDER BY / LIMIT clauses in the following variable. | ||
3479 : | my $orderClause = ""; | ||
3480 : | # This is only necessary if we have a filter string in which the ORDER BY | ||
3481 : | # and LIMIT clauses can live. | ||
3482 : | if ($filterString) { | ||
3483 : | parrello | 1.45 | # Locate the ORDER BY or LIMIT verbs (if any). We use a non-greedy |
3484 : | # operator so that we find the first occurrence of either verb. | ||
3485 : | if ($filterString =~ m/^(.*?)\s*(ORDER BY|LIMIT)/g) { | ||
3486 : | # Here we have an ORDER BY or LIMIT verb. Split it off of the filter string. | ||
3487 : | my $pos = pos $filterString; | ||
3488 : | $orderClause = $2 . substr($filterString, $pos); | ||
3489 : | $filterString = $1; | ||
3490 : | } | ||
3491 : | parrello | 1.70 | } |
3492 : | # All the things that are supposed to be in the WHERE clause of the | ||
3493 : | # SELECT command need to be put into @joinWhere so we can string them | ||
3494 : | # together. We begin with the match clause. This is important, | ||
3495 : | # because the match clause's parameter mark must precede any parameter | ||
3496 : | # marks in the filter string. | ||
3497 : | if ($matchClause) { | ||
3498 : | push @joinWhere, $matchClause; | ||
3499 : | } | ||
3500 : | # Add the filter string. We put it in parentheses to avoid operator | ||
3501 : | # precedence problems with the match clause or the joins. | ||
3502 : | if ($filterString) { | ||
3503 : | Trace("Filter string is \"$filterString\".") if T(4); | ||
3504 : | push @joinWhere, "($filterString)"; | ||
3505 : | } | ||
3506 : | # String it all together into a big filter clause. | ||
3507 : | if (@joinWhere) { | ||
3508 : | $suffix .= " WHERE " . join(' AND ', @joinWhere); | ||
3509 : | } | ||
3510 : | # Add the sort or limit clause (if any). | ||
3511 : | if ($orderClause) { | ||
3512 : | $suffix .= " $orderClause"; | ||
3513 : | parrello | 1.45 | } |
3514 : | # Return the suffix, the mapped name list, and the mapped name hash. | ||
3515 : | return ($suffix, \@mappedNameList, \%mappedNameHash); | ||
3516 : | } | ||
3517 : | |||
3518 : | parrello | 1.70 | =head3 _GetStatementHandle |
3519 : | parrello | 1.45 | |
3520 : | This method will prepare and execute an SQL query, returning the statement handle. | ||
3521 : | The main reason for doing this here is so that everybody who does SQL queries gets | ||
3522 : | the benefit of tracing. | ||
3523 : | |||
3524 : | This is an instance method. | ||
3525 : | |||
3526 : | =over 4 | ||
3527 : | |||
3528 : | =item command | ||
3529 : | |||
3530 : | Command to prepare and execute. | ||
3531 : | |||
3532 : | =item params | ||
3533 : | |||
3534 : | Reference to a list of the values to be substituted in for the parameter marks. | ||
3535 : | |||
3536 : | =item RETURN | ||
3537 : | |||
3538 : | Returns a prepared and executed statement handle from which the caller can extract | ||
3539 : | results. | ||
3540 : | |||
3541 : | =back | ||
3542 : | |||
3543 : | =cut | ||
3544 : | |||
3545 : | sub _GetStatementHandle { | ||
3546 : | # Get the parameters. | ||
3547 : | my ($self, $command, $params) = @_; | ||
3548 : | # Trace the query. | ||
3549 : | Trace("SQL query: $command") if T(SQL => 3); | ||
3550 : | Trace("PARMS: '" . (join "', '", @{$params}) . "'") if (T(SQL => 4) && (@{$params} > 0)); | ||
3551 : | # Get the database handle. | ||
3552 : | my $dbh = $self->{_dbh}; | ||
3553 : | # Prepare the command. | ||
3554 : | my $sth = $dbh->prepare_command($command); | ||
3555 : | # Execute it with the parameters bound in. | ||
3556 : | parrello | 1.75 | $sth->execute(@{$params}) || Confess("SELECT error: " . $sth->errstr()); |
3557 : | parrello | 1.45 | # Return the statement handle. |
3558 : | return $sth; | ||
3559 : | } | ||
3560 : | |||
3561 : | parrello | 1.70 | =head3 _GetLoadStats |
3562 : | parrello | 1.1 | |
3563 : | Return a blank statistics object for use by the load methods. | ||
3564 : | |||
3565 : | This is a static method. | ||
3566 : | |||
3567 : | =cut | ||
3568 : | |||
3569 : | parrello | 1.29 | sub _GetLoadStats{ |
3570 : | parrello | 1.28 | return Stats->new(); |
3571 : | parrello | 1.1 | } |
3572 : | |||
3573 : | parrello | 1.70 | =head3 _DumpRelation |
3574 : | parrello | 1.1 | |
3575 : | parrello | 1.73 | Dump the specified relation to the specified output file in tab-delimited format. |
3576 : | parrello | 1.1 | |
3577 : | This is an instance method. | ||
3578 : | |||
3579 : | =over 4 | ||
3580 : | |||
3581 : | =item outputDirectory | ||
3582 : | |||
3583 : | Directory to contain the output file. | ||
3584 : | |||
3585 : | =item relationName | ||
3586 : | |||
3587 : | Name of the relation to dump. | ||
3588 : | |||
3589 : | =item relation | ||
3590 : | |||
3591 : | Structure describing the relation to be dumped. | ||
3592 : | |||
3593 : | =back | ||
3594 : | |||
3595 : | =cut | ||
3596 : | |||
3597 : | sub _DumpRelation { | ||
3598 : | parrello | 1.10 | # Get the parameters. |
3599 : | my ($self, $outputDirectory, $relationName, $relation) = @_; | ||
3600 : | # Open the output file. | ||
3601 : | my $fileName = "$outputDirectory/$relationName.dtx"; | ||
3602 : | open(DTXOUT, ">$fileName") || Confess("Could not open dump file $fileName: $!"); | ||
3603 : | # Create a query for the specified relation. | ||
3604 : | my $dbh = $self->{_dbh}; | ||
3605 : | my $query = $dbh->prepare_command("SELECT * FROM $relationName"); | ||
3606 : | # Execute the query. | ||
3607 : | $query->execute() || Confess("SELECT error dumping $relationName."); | ||
3608 : | # Loop through the results. | ||
3609 : | while (my @row = $query->fetchrow) { | ||
3610 : | # Escape any tabs or new-lines in the row text. | ||
3611 : | for my $field (@row) { | ||
3612 : | $field =~ s/\n/\\n/g; | ||
3613 : | $field =~ s/\t/\\t/g; | ||
3614 : | } | ||
3615 : | # Tab-join the row and write it to the output file. | ||
3616 : | my $rowText = join("\t", @row); | ||
3617 : | print DTXOUT "$rowText\n"; | ||
3618 : | } | ||
3619 : | # Close the output file. | ||
3620 : | close DTXOUT; | ||
3621 : | parrello | 1.1 | } |
3622 : | |||
3623 : | parrello | 1.70 | =head3 _GetStructure |
3624 : | parrello | 1.1 | |
3625 : | Get the data structure for a specified entity or relationship. | ||
3626 : | |||
3627 : | This is an instance method. | ||
3628 : | |||
3629 : | =over 4 | ||
3630 : | |||
3631 : | =item objectName | ||
3632 : | |||
3633 : | Name of the desired entity or relationship. | ||
3634 : | |||
3635 : | =item RETURN | ||
3636 : | |||
3637 : | The descriptor for the specified object. | ||
3638 : | |||
3639 : | =back | ||
3640 : | |||
3641 : | =cut | ||
3642 : | |||
3643 : | sub _GetStructure { | ||
3644 : | parrello | 1.10 | # Get the parameters. |
3645 : | my ($self, $objectName) = @_; | ||
3646 : | # Get the metadata structure. | ||
3647 : | my $metadata = $self->{_metaData}; | ||
3648 : | # Declare the variable to receive the descriptor. | ||
3649 : | my $retVal; | ||
3650 : | # Get the descriptor from the metadata. | ||
3651 : | if (exists $metadata->{Entities}->{$objectName}) { | ||
3652 : | $retVal = $metadata->{Entities}->{$objectName}; | ||
3653 : | } elsif (exists $metadata->{Relationships}->{$objectName}) { | ||
3654 : | $retVal = $metadata->{Relationships}->{$objectName}; | ||
3655 : | } else { | ||
3656 : | Confess("Object $objectName not found in database."); | ||
3657 : | } | ||
3658 : | # Return the descriptor. | ||
3659 : | return $retVal; | ||
3660 : | parrello | 1.1 | } |
3661 : | |||
3662 : | parrello | 1.70 | |
3663 : | |||
3664 : | =head3 _GetRelationTable | ||
3665 : | parrello | 1.1 | |
3666 : | Get the list of relations for a specified entity or relationship. | ||
3667 : | |||
3668 : | This is an instance method. | ||
3669 : | |||
3670 : | =over 4 | ||
3671 : | |||
3672 : | =item objectName | ||
3673 : | |||
3674 : | Name of the desired entity or relationship. | ||
3675 : | |||
3676 : | =item RETURN | ||
3677 : | |||
3678 : | A table containing the relations for the specified object. | ||
3679 : | |||
3680 : | =back | ||
3681 : | |||
3682 : | =cut | ||
3683 : | |||
3684 : | sub _GetRelationTable { | ||
3685 : | parrello | 1.10 | # Get the parameters. |
3686 : | my ($self, $objectName) = @_; | ||
3687 : | # Get the descriptor from the metadata. | ||
3688 : | my $objectData = $self->_GetStructure($objectName); | ||
3689 : | # Return the object's relation list. | ||
3690 : | return $objectData->{Relations}; | ||
3691 : | parrello | 1.1 | } |
3692 : | |||
3693 : | parrello | 1.70 | =head3 _ValidateFieldNames |
3694 : | parrello | 1.1 | |
3695 : | Determine whether or not the field names are valid. A description of the problems with the names | ||
3696 : | will be written to the standard error output. If there is an error, this method will abort. This is | ||
3697 : | a static method. | ||
3698 : | |||
3699 : | =over 4 | ||
3700 : | |||
3701 : | =item metadata | ||
3702 : | |||
3703 : | Metadata structure loaded from the XML data definition. | ||
3704 : | |||
3705 : | =back | ||
3706 : | |||
3707 : | =cut | ||
3708 : | |||
3709 : | sub _ValidateFieldNames { | ||
3710 : | parrello | 1.10 | # Get the object. |
3711 : | my ($metadata) = @_; | ||
3712 : | # Declare the return value. We assume success. | ||
3713 : | my $retVal = 1; | ||
3714 : | # Loop through the sections of the database definition. | ||
3715 : | for my $section ('Entities', 'Relationships') { | ||
3716 : | # Loop through the objects in this section. | ||
3717 : | for my $object (values %{$metadata->{$section}}) { | ||
3718 : | # Loop through the object's fields. | ||
3719 : | for my $fieldName (keys %{$object->{Fields}}) { | ||
3720 : | parrello | 1.73 | # If this field name is invalid, set the return value to zero |
3721 : | # so we know we encountered an error. | ||
3722 : | if (! ValidateFieldName($fieldName)) { | ||
3723 : | parrello | 1.10 | $retVal = 0; |
3724 : | } | ||
3725 : | } | ||
3726 : | } | ||
3727 : | } | ||
3728 : | # If an error was found, fail. | ||
3729 : | if ($retVal == 0) { | ||
3730 : | Confess("Errors found in field names."); | ||
3731 : | } | ||
3732 : | parrello | 1.1 | } |
3733 : | |||
3734 : | parrello | 1.70 | =head3 _LoadRelation |
3735 : | parrello | 1.1 | |
3736 : | Load a relation from the data in a tab-delimited disk file. The load will only take place if a disk | ||
3737 : | file with the same name as the relation exists in the specified directory. | ||
3738 : | |||
3739 : | This is an instance method. | ||
3740 : | |||
3741 : | =over 4 | ||
3742 : | |||
3743 : | =item dbh | ||
3744 : | |||
3745 : | DBKernel object for accessing the database. | ||
3746 : | |||
3747 : | =item directoryName | ||
3748 : | |||
3749 : | Name of the directory containing the tab-delimited data files. | ||
3750 : | |||
3751 : | =item relationName | ||
3752 : | |||
3753 : | Name of the relation to load. | ||
3754 : | |||
3755 : | =item rebuild | ||
3756 : | |||
3757 : | TRUE if the table should be dropped and re-created before loading. | ||
3758 : | |||
3759 : | =item RETURN | ||
3760 : | |||
3761 : | Returns a statistical object describing the number of records read and a list of error messages. | ||
3762 : | |||
3763 : | =back | ||
3764 : | |||
3765 : | =cut | ||
3766 : | |||
3767 : | sub _LoadRelation { | ||
3768 : | parrello | 1.10 | # Get the parameters. |
3769 : | my ($self, $directoryName, $relationName, $rebuild) = @_; | ||
3770 : | # Create the file name. | ||
3771 : | my $fileName = "$directoryName/$relationName"; | ||
3772 : | # If the file doesn't exist, try adding the .dtx suffix. | ||
3773 : | if (! -e $fileName) { | ||
3774 : | $fileName .= ".dtx"; | ||
3775 : | if (! -e $fileName) { | ||
3776 : | $fileName = ""; | ||
3777 : | } | ||
3778 : | } | ||
3779 : | # Create the return object. | ||
3780 : | my $retVal = _GetLoadStats(); | ||
3781 : | # If a file exists to load the table, its name will be in $fileName. Otherwise, $fileName will | ||
3782 : | # be a null string. | ||
3783 : | if ($fileName ne "") { | ||
3784 : | # Load the relation from the file. | ||
3785 : | $retVal = $self->LoadTable($fileName, $relationName, $rebuild); | ||
3786 : | } elsif ($rebuild) { | ||
3787 : | # Here we are rebuilding, but no file exists, so we just re-create the table. | ||
3788 : | $self->CreateTable($relationName, 1); | ||
3789 : | } | ||
3790 : | # Return the statistics from the load. | ||
3791 : | return $retVal; | ||
3792 : | parrello | 1.1 | } |
3793 : | |||
3794 : | parrello | 1.73 | |
3795 : | parrello | 1.70 | =head3 _LoadMetaData |
3796 : | parrello | 1.1 | |
3797 : | This method loads the data describing this database from an XML file into a metadata structure. | ||
3798 : | The resulting structure is a set of nested hash tables containing all the information needed to | ||
3799 : | load or use the database. The schema for the XML file is F<ERDatabase.xml>. | ||
3800 : | |||
3801 : | This is a static method. | ||
3802 : | |||
3803 : | =over 4 | ||
3804 : | |||
3805 : | =item filename | ||
3806 : | |||
3807 : | Name of the file containing the database definition. | ||
3808 : | |||
3809 : | =item RETURN | ||
3810 : | |||
3811 : | Returns a structure describing the database. | ||
3812 : | |||
3813 : | =back | ||
3814 : | |||
3815 : | =cut | ||
3816 : | |||
3817 : | sub _LoadMetaData { | ||
3818 : | parrello | 1.10 | # Get the parameters. |
3819 : | my ($filename) = @_; | ||
3820 : | parrello | 1.75 | Trace("Reading DBD from $filename.") if T(2); |
3821 : | parrello | 1.10 | # Slurp the XML file into a variable. Extensive use of options is used to insure we |
3822 : | # get the exact structure we want. | ||
3823 : | parrello | 1.73 | my $metadata = ReadMetaXML($filename); |
3824 : | parrello | 1.10 | # Before we go any farther, we need to validate the field and object names. If an error is found, |
3825 : | # the method below will fail. | ||
3826 : | _ValidateFieldNames($metadata); | ||
3827 : | # Next we need to create a hash table for finding relations. The entities and relationships are | ||
3828 : | # implemented as one or more database relations. | ||
3829 : | my %masterRelationTable = (); | ||
3830 : | # Loop through the entities. | ||
3831 : | my $entityList = $metadata->{Entities}; | ||
3832 : | for my $entityName (keys %{$entityList}) { | ||
3833 : | my $entityStructure = $entityList->{$entityName}; | ||
3834 : | # | ||
3835 : | parrello | 1.12 | # The first step is to create all the entity's default values. For C<Field> elements, |
3836 : | parrello | 1.10 | # the relation name must be added where it is not specified. For relationships, |
3837 : | # the B<from-link> and B<to-link> fields must be inserted, and for entities an B<id> | ||
3838 : | # field must be added to each relation. Finally, each field will have a C<PrettySort> attribute | ||
3839 : | # added that can be used to pull the implicit fields to the top when displaying the field | ||
3840 : | # documentation. The PrettySort values are 1-based and indicate in which pass through a | ||
3841 : | # relation's data the field should be displayed-- 1 for the first pass, 2 for the second, | ||
3842 : | # and so on. | ||
3843 : | # | ||
3844 : | # Fix up this entity. | ||
3845 : | _FixupFields($entityStructure, $entityName, 2, 3); | ||
3846 : | # Add the ID field. | ||
3847 : | _AddField($entityStructure, 'id', { type => $entityStructure->{keyType}, | ||
3848 : | relation => $entityName, | ||
3849 : | Notes => { content => "Unique identifier for this \[b\]$entityName\[/b\]." }, | ||
3850 : | PrettySort => 1}); | ||
3851 : | # | ||
3852 : | # The current field list enables us to quickly find the relation containing a particular field. | ||
3853 : | # We also need a list that tells us the fields in each relation. We do this by creating a | ||
3854 : | # Relations structure in the entity structure and collating the fields into it based on their | ||
3855 : | # C<relation> property. There is one tricky bit, which is that every relation has to have the | ||
3856 : | # C<id> field in it. Note also that the field list is put into a C<Fields> member of the | ||
3857 : | # relation's structure so that it looks more like the entity and relationship structures. | ||
3858 : | # | ||
3859 : | # First we need to create the relations list. | ||
3860 : | my $relationTable = { }; | ||
3861 : | # Loop through the fields. We use a list of field names to prevent a problem with | ||
3862 : | # the hash table cursor losing its place during the loop. | ||
3863 : | my $fieldList = $entityStructure->{Fields}; | ||
3864 : | my @fieldNames = keys %{$fieldList}; | ||
3865 : | for my $fieldName (@fieldNames) { | ||
3866 : | my $fieldData = $fieldList->{$fieldName}; | ||
3867 : | # Get the current field's relation name. | ||
3868 : | my $relationName = $fieldData->{relation}; | ||
3869 : | # Insure the relation exists. | ||
3870 : | if (!exists $relationTable->{$relationName}) { | ||
3871 : | $relationTable->{$relationName} = { Fields => { } }; | ||
3872 : | } | ||
3873 : | # Add the field to the relation's field structure. | ||
3874 : | $relationTable->{$relationName}->{Fields}->{$fieldName} = $fieldData; | ||
3875 : | } | ||
3876 : | # Now that we've organized all our fields by relation name we need to do some serious | ||
3877 : | # housekeeping. We must add the C<id> field to every relation and convert each relation | ||
3878 : | # to a list of fields. First, we need the ID field itself. | ||
3879 : | my $idField = $fieldList->{id}; | ||
3880 : | # Loop through the relations. | ||
3881 : | for my $relationName (keys %{$relationTable}) { | ||
3882 : | my $relation = $relationTable->{$relationName}; | ||
3883 : | # Get the relation's field list. | ||
3884 : | my $relationFieldList = $relation->{Fields}; | ||
3885 : | # Add the ID field to it. If the field's already there, it will not make any | ||
3886 : | # difference. | ||
3887 : | $relationFieldList->{id} = $idField; | ||
3888 : | # Convert the field set from a hash into a list using the pretty-sort number. | ||
3889 : | $relation->{Fields} = _ReOrderRelationTable($relationFieldList); | ||
3890 : | # Add the relation to the master table. | ||
3891 : | $masterRelationTable{$relationName} = $relation; | ||
3892 : | } | ||
3893 : | # The indexes come next. The primary relation will have a unique-keyed index based on the ID field. | ||
3894 : | # The other relations must have at least one index that begins with the ID field. In addition, the | ||
3895 : | # metadata may require alternate indexes. We do those alternate indexes first. To begin, we need to | ||
3896 : | # get the entity's field list and index list. | ||
3897 : | my $indexList = $entityStructure->{Indexes}; | ||
3898 : | # Loop through the indexes. | ||
3899 : | for my $indexData (@{$indexList}) { | ||
3900 : | # We need to find this index's fields. All of them should belong to the same relation. | ||
3901 : | # The ID field is an exception, since it's in all relations. | ||
3902 : | my $relationName = '0'; | ||
3903 : | for my $fieldDescriptor (@{$indexData->{IndexFields}}) { | ||
3904 : | # Get this field's name. | ||
3905 : | my $fieldName = $fieldDescriptor->{name}; | ||
3906 : | # Only proceed if it is NOT the ID field. | ||
3907 : | if ($fieldName ne 'id') { | ||
3908 : | # Find the relation containing the current index field. | ||
3909 : | my $thisName = $fieldList->{$fieldName}->{relation}; | ||
3910 : | if ($relationName eq '0') { | ||
3911 : | # Here we're looking at the first field, so we save its relation name. | ||
3912 : | $relationName = $thisName; | ||
3913 : | } elsif ($relationName ne $thisName) { | ||
3914 : | # Here we have a field mismatch. | ||
3915 : | Confess("Mixed index: field $fieldName does not belong to relation $relationName."); | ||
3916 : | } | ||
3917 : | } | ||
3918 : | } | ||
3919 : | # Now $relationName is the name of the relation that contains this index. Add the index structure | ||
3920 : | # to the relation. | ||
3921 : | push @{$relationTable->{$relationName}->{Indexes}}, $indexData; | ||
3922 : | } | ||
3923 : | # Now each index has been put in a relation. We need to add the primary index for the primary | ||
3924 : | # relation. | ||
3925 : | push @{$relationTable->{$entityName}->{Indexes}}, | ||
3926 : | { IndexFields => [ {name => 'id', order => 'ascending'} ], Unique => 'true', | ||
3927 : | Notes => { content => "Primary index for $entityName." } | ||
3928 : | }; | ||
3929 : | # The next step is to insure that each relation has at least one index that begins with the ID field. | ||
3930 : | # After that, we convert each relation's index list to an index table. We first need to loop through | ||
3931 : | # the relations. | ||
3932 : | for my $relationName (keys %{$relationTable}) { | ||
3933 : | my $relation = $relationTable->{$relationName}; | ||
3934 : | # Get the relation's index list. | ||
3935 : | my $indexList = $relation->{Indexes}; | ||
3936 : | # Insure this relation has an ID index. | ||
3937 : | my $found = 0; | ||
3938 : | for my $index (@{$indexList}) { | ||
3939 : | if ($index->{IndexFields}->[0]->{name} eq "id") { | ||
3940 : | $found = 1; | ||
3941 : | } | ||
3942 : | } | ||
3943 : | if ($found == 0) { | ||
3944 : | push @{$indexList}, { IndexFields => [ {name => 'id', order => 'ascending'} ] }; | ||
3945 : | } | ||
3946 : | # Now we need to convert the relation's index list to an index table. We begin by creating | ||
3947 : | # an empty table in the relation structure. | ||
3948 : | $relation->{Indexes} = { }; | ||
3949 : | # Loop through the indexes. | ||
3950 : | my $count = 0; | ||
3951 : | for my $index (@{$indexList}) { | ||
3952 : | # Add this index to the index table. | ||
3953 : | parrello | 1.75 | _AddIndex("idx$count", $relation, $index); |
3954 : | parrello | 1.10 | # Increment the counter so that the next index has a different name. |
3955 : | $count++; | ||
3956 : | } | ||
3957 : | } | ||
3958 : | # Finally, we add the relation structure to the entity. | ||
3959 : | $entityStructure->{Relations} = $relationTable; | ||
3960 : | } | ||
3961 : | # Loop through the relationships. Relationships actually turn out to be much simpler than entities. | ||
3962 : | # For one thing, there is only a single constituent relation. | ||
3963 : | my $relationshipList = $metadata->{Relationships}; | ||
3964 : | for my $relationshipName (keys %{$relationshipList}) { | ||
3965 : | my $relationshipStructure = $relationshipList->{$relationshipName}; | ||
3966 : | # Fix up this relationship. | ||
3967 : | _FixupFields($relationshipStructure, $relationshipName, 2, 3); | ||
3968 : | # Format a description for the FROM field. | ||
3969 : | my $fromEntity = $relationshipStructure->{from}; | ||
3970 : | my $fromComment = "<b>id</b> of the source <b><a href=\"#$fromEntity\">$fromEntity</a></b>."; | ||
3971 : | # Get the FROM entity's key type. | ||
3972 : | my $fromType = $entityList->{$fromEntity}->{keyType}; | ||
3973 : | # Add the FROM field. | ||
3974 : | _AddField($relationshipStructure, 'from-link', { type => $fromType, | ||
3975 : | relation => $relationshipName, | ||
3976 : | Notes => { content => $fromComment }, | ||
3977 : | PrettySort => 1}); | ||
3978 : | # Format a description for the TO field. | ||
3979 : | my $toEntity = $relationshipStructure->{to}; | ||
3980 : | my $toComment = "<b>id</b> of the target <b><a href=\"#$toEntity\">$toEntity</a></b>."; | ||
3981 : | # Get the TO entity's key type. | ||
3982 : | my $toType = $entityList->{$toEntity}->{keyType}; | ||
3983 : | # Add the TO field. | ||
3984 : | _AddField($relationshipStructure, 'to-link', { type=> $toType, | ||
3985 : | relation => $relationshipName, | ||
3986 : | Notes => { content => $toComment }, | ||
3987 : | PrettySort => 1}); | ||
3988 : | # Create an index-free relation from the fields. | ||
3989 : | my $thisRelation = { Fields => _ReOrderRelationTable($relationshipStructure->{Fields}), | ||
3990 : | Indexes => { } }; | ||
3991 : | $relationshipStructure->{Relations} = { $relationshipName => $thisRelation }; | ||
3992 : | # Create the FROM and TO indexes. | ||
3993 : | _CreateRelationshipIndex("From", $relationshipName, $relationshipStructure); | ||
3994 : | _CreateRelationshipIndex("To", $relationshipName, $relationshipStructure); | ||
3995 : | # Add the relation to the master table. | ||
3996 : | $masterRelationTable{$relationshipName} = $thisRelation; | ||
3997 : | } | ||
3998 : | # Now store the master relation table in the metadata structure. | ||
3999 : | $metadata->{RelationTable} = \%masterRelationTable; | ||
4000 : | # Our final task is to create the join table. The join table is a hash that describes all | ||
4001 : | # the join clauses for traveling through the relationships. The join clause is an equality | ||
4002 : | # condition that can be put into a WHERE clause in order to join two objects. Two relationships | ||
4003 : | # can be joined if they share an entity in common; and an entity can be joined to a relationship | ||
4004 : | # if the entity is at either end of the relationship. | ||
4005 : | my %joinTable = (); | ||
4006 : | # Loop through the entities. | ||
4007 : | for my $entityName (keys %{$entityList}) { | ||
4008 : | # Build three lists of the relationships connected to this entity. One will be | ||
4009 : | # for relationships from the entity, one for relationships to the entity, and | ||
4010 : | # one for recursive relationships. | ||
4011 : | my @fromList = (); | ||
4012 : | my @toList = (); | ||
4013 : | my @bothList = (); | ||
4014 : | parrello | 1.21 | Trace("Join table build for $entityName.") if T(metadata => 4); |
4015 : | parrello | 1.10 | for my $relationshipName (keys %{$relationshipList}) { |
4016 : | my $relationship = $relationshipList->{$relationshipName}; | ||
4017 : | # Determine if this relationship has our entity in one of its link fields. | ||
4018 : | my $fromEntity = $relationship->{from}; | ||
4019 : | my $toEntity = $relationship->{to}; | ||
4020 : | parrello | 1.41 | Trace("Join check for relationship $relationshipName from $fromEntity to $toEntity.") if T(Joins => 4); |
4021 : | parrello | 1.10 | if ($fromEntity eq $entityName) { |
4022 : | if ($toEntity eq $entityName) { | ||
4023 : | # Here the relationship is recursive. | ||
4024 : | push @bothList, $relationshipName; | ||
4025 : | parrello | 1.21 | Trace("Relationship $relationshipName put in both-list.") if T(metadata => 4); |
4026 : | parrello | 1.10 | } else { |
4027 : | # Here the relationship comes from the entity. | ||
4028 : | push @fromList, $relationshipName; | ||
4029 : | parrello | 1.21 | Trace("Relationship $relationshipName put in from-list.") if T(metadata => 4); |
4030 : | parrello | 1.10 | } |
4031 : | } elsif ($toEntity eq $entityName) { | ||
4032 : | # Here the relationship goes to the entity. | ||
4033 : | push @toList, $relationshipName; | ||
4034 : | parrello | 1.21 | Trace("Relationship $relationshipName put in to-list.") if T(metadata => 4); |
4035 : | parrello | 1.10 | } |
4036 : | } | ||
4037 : | # Create the nonrecursive joins. Note that we build two hashes for running | ||
4038 : | # through the nonrecursive relationships since we'll have an outer loop | ||
4039 : | # and an inner loop, and we can't do two "each" iterations on the same | ||
4040 : | # hash table at the same time. | ||
4041 : | my %directRelationships = ( from => \@fromList, to => \@toList ); | ||
4042 : | my %otherRelationships = ( from => \@fromList, to => \@toList ); | ||
4043 : | for my $linkType (keys %directRelationships) { | ||
4044 : | my $relationships = $directRelationships{$linkType}; | ||
4045 : | # Loop through all the relationships. | ||
4046 : | for my $relationshipName (@{$relationships}) { | ||
4047 : | # Create joins between the entity and this relationship. | ||
4048 : | my $linkField = "$relationshipName.${linkType}_link"; | ||
4049 : | my $joinClause = "$entityName.id = $linkField"; | ||
4050 : | parrello | 1.21 | Trace("Entity join clause is $joinClause for $entityName and $relationshipName.") if T(metadata => 4); |
4051 : | parrello | 1.10 | $joinTable{"$entityName/$relationshipName"} = $joinClause; |
4052 : | $joinTable{"$relationshipName/$entityName"} = $joinClause; | ||
4053 : | # Create joins between this relationship and the other relationships. | ||
4054 : | for my $otherType (keys %otherRelationships) { | ||
4055 : | my $otherships = $otherRelationships{$otherType}; | ||
4056 : | for my $otherName (@{$otherships}) { | ||
4057 : | # Get the key for this join. | ||
4058 : | my $joinKey = "$otherName/$relationshipName"; | ||
4059 : | # Check for a duplicate or a self-join. | ||
4060 : | if (exists $joinTable{$joinKey}) { | ||
4061 : | # Here we have a duplicate, which means that the join | ||
4062 : | # path is ambiguous. We delete the join from the join | ||
4063 : | # table to prevent it from being used. | ||
4064 : | delete $joinTable{$joinKey}; | ||
4065 : | Trace("Deleting ambiguous join $joinKey.") if T(4); | ||
4066 : | } elsif ($otherName ne $relationshipName) { | ||
4067 : | # Here we have a valid join. Note that joins between a | ||
4068 : | # relationship and itself are prohibited. | ||
4069 : | my $relJoinClause = "$otherName.${otherType}_link = $linkField"; | ||
4070 : | $joinTable{$joinKey} = $relJoinClause; | ||
4071 : | parrello | 1.21 | Trace("Relationship join clause is $relJoinClause for $joinKey.") if T(metadata => 4); |
4072 : | parrello | 1.10 | } |
4073 : | } | ||
4074 : | } | ||
4075 : | # Create joins between this relationship and the recursive relationships. | ||
4076 : | # We don't need to check for ambiguous joins here, because a recursive | ||
4077 : | # relationship can only be ambiguous with another recursive relationship, | ||
4078 : | # and the incoming relationship from the outer loop is never recursive. | ||
4079 : | for my $otherName (@bothList) { | ||
4080 : | parrello | 1.21 | Trace("Setting up relationship joins to recursive relationship $otherName with $relationshipName.") if T(metadata => 4); |
4081 : | parrello | 1.10 | # Join from the left. |
4082 : | $joinTable{"$relationshipName/$otherName"} = | ||
4083 : | "$linkField = $otherName.from_link"; | ||
4084 : | # Join from the right. | ||
4085 : | $joinTable{"$otherName/$relationshipName"} = | ||
4086 : | "$otherName.to_link = $linkField"; | ||
4087 : | } | ||
4088 : | } | ||
4089 : | } | ||
4090 : | # Create entity joins for the recursive relationships. Unlike the non-recursive | ||
4091 : | # joins, the direction makes a difference with the recursive joins. This can give | ||
4092 : | # rise to situations where we can't create the path we want; however, it is always | ||
4093 : | # possible to get the same effect using multiple queries. | ||
4094 : | for my $relationshipName (@bothList) { | ||
4095 : | parrello | 1.21 | Trace("Setting up entity joins to recursive relationship $relationshipName with $entityName.") if T(metadata => 4); |
4096 : | parrello | 1.10 | # Join to the entity from each direction. |
4097 : | $joinTable{"$entityName/$relationshipName"} = | ||
4098 : | "$entityName.id = $relationshipName.from_link"; | ||
4099 : | $joinTable{"$relationshipName/$entityName"} = | ||
4100 : | "$relationshipName.to_link = $entityName.id"; | ||
4101 : | } | ||
4102 : | } | ||
4103 : | # Add the join table to the structure. | ||
4104 : | $metadata->{Joins} = \%joinTable; | ||
4105 : | # Return the slurped and fixed-up structure. | ||
4106 : | return $metadata; | ||
4107 : | parrello | 1.1 | } |
4108 : | |||
4109 : | parrello | 1.70 | =head3 _CreateRelationshipIndex |
4110 : | parrello | 1.1 | |
4111 : | Create an index for a relationship's relation. | ||
4112 : | |||
4113 : | This is a static method. | ||
4114 : | |||
4115 : | =over 4 | ||
4116 : | |||
4117 : | =item indexKey | ||
4118 : | |||
4119 : | Type of index: either C<"From"> or C<"To">. | ||
4120 : | |||
4121 : | =item relationshipName | ||
4122 : | |||
4123 : | Name of the relationship. | ||
4124 : | |||
4125 : | =item relationshipStructure | ||
4126 : | |||
4127 : | Structure describing the relationship that the index will sort. | ||
4128 : | |||
4129 : | =back | ||
4130 : | |||
4131 : | =cut | ||
4132 : | |||
4133 : | sub _CreateRelationshipIndex { | ||
4134 : | parrello | 1.10 | # Get the parameters. |
4135 : | my ($indexKey, $relationshipName, $relationshipStructure) = @_; | ||
4136 : | # Get the target relation. | ||
4137 : | my $relationStructure = $relationshipStructure->{Relations}->{$relationshipName}; | ||
4138 : | # Create a descriptor for the link field that goes at the beginning of this index. | ||
4139 : | my $firstField = { name => lcfirst $indexKey . '-link', order => 'ascending' }; | ||
4140 : | # Get the target index descriptor. | ||
4141 : | my $newIndex = $relationshipStructure->{$indexKey . "Index"}; | ||
4142 : | # Add the first field to the index's field list. Due to the craziness of PERL, if the | ||
4143 : | # index descriptor does not exist, it will be created automatically so we can add | ||
4144 : | # the field to it. | ||
4145 : | unshift @{$newIndex->{IndexFields}}, $firstField; | ||
4146 : | parrello | 1.12 | # If this is a one-to-many relationship, the "To" index is unique. |
4147 : | if ($relationshipStructure->{arity} eq "1M" && $indexKey eq "To") { | ||
4148 : | $newIndex->{Unique} = 'true'; | ||
4149 : | } | ||
4150 : | parrello | 1.10 | # Add the index to the relation. |
4151 : | parrello | 1.75 | _AddIndex("idx$indexKey", $relationStructure, $newIndex); |
4152 : | parrello | 1.1 | } |
4153 : | |||
4154 : | parrello | 1.70 | =head3 _AddIndex |
4155 : | parrello | 1.1 | |
4156 : | Add an index to a relation structure. | ||
4157 : | |||
4158 : | This is a static method. | ||
4159 : | |||
4160 : | =over 4 | ||
4161 : | |||
4162 : | =item indexName | ||
4163 : | |||
4164 : | Name to give to the new index. | ||
4165 : | |||
4166 : | =item relationStructure | ||
4167 : | |||
4168 : | Relation structure to which the new index should be added. | ||
4169 : | |||
4170 : | =item newIndex | ||
4171 : | |||
4172 : | New index to add. | ||
4173 : | |||
4174 : | =back | ||
4175 : | |||
4176 : | =cut | ||
4177 : | |||
4178 : | sub _AddIndex { | ||
4179 : | parrello | 1.10 | # Get the parameters. |
4180 : | my ($indexName, $relationStructure, $newIndex) = @_; | ||
4181 : | # We want to re-do the index's field list. Instead of an object for each field, | ||
4182 : | # we want a string consisting of the field name optionally followed by the token DESC. | ||
4183 : | my @fieldList = ( ); | ||
4184 : | for my $field (@{$newIndex->{IndexFields}}) { | ||
4185 : | # Create a string containing the field name. | ||
4186 : | my $fieldString = $field->{name}; | ||
4187 : | # Add the ordering token if needed. | ||
4188 : | if ($field->{order} eq "descending") { | ||
4189 : | $fieldString .= " DESC"; | ||
4190 : | } | ||
4191 : | # Push the result onto the field list. | ||
4192 : | push @fieldList, $fieldString; | ||
4193 : | } | ||
4194 : | # Store the field list just created as the new index field list. | ||
4195 : | $newIndex->{IndexFields} = \@fieldList; | ||
4196 : | # Add the index to the relation's index list. | ||
4197 : | $relationStructure->{Indexes}->{$indexName} = $newIndex; | ||
4198 : | parrello | 1.1 | } |
4199 : | |||
4200 : | parrello | 1.70 | =head3 _FixupFields |
4201 : | parrello | 1.1 | |
4202 : | This method fixes the field list for an entity or relationship. It will add the caller-specified | ||
4203 : | relation name to fields that do not have a name and set the C<PrettySort> value as specified. | ||
4204 : | |||
4205 : | This is a static method. | ||
4206 : | |||
4207 : | =over 4 | ||
4208 : | |||
4209 : | =item structure | ||
4210 : | |||
4211 : | Entity or relationship structure to be fixed up. | ||
4212 : | |||
4213 : | =item defaultRelationName | ||
4214 : | |||
4215 : | Default relation name to be added to the fields. | ||
4216 : | |||
4217 : | =item prettySortValue | ||
4218 : | |||
4219 : | C<PrettySort> value for the relation's normal fields. | ||
4220 : | |||
4221 : | =item textPrettySortValue | ||
4222 : | |||
4223 : | C<PrettySort> value for the relation's text fields. This value can be set to one greater than the | ||
4224 : | normal pretty sort value so that text fields go at the end of each relation. | ||
4225 : | |||
4226 : | =back | ||
4227 : | |||
4228 : | =cut | ||
4229 : | |||
4230 : | sub _FixupFields { | ||
4231 : | parrello | 1.10 | # Get the parameters. |
4232 : | my ($structure, $defaultRelationName, $prettySortValue, $textPrettySortValue) = @_; | ||
4233 : | # Insure the structure has a field list. | ||
4234 : | if (!exists $structure->{Fields}) { | ||
4235 : | # Here it doesn't, so we create a new one. | ||
4236 : | $structure->{Fields} = { }; | ||
4237 : | } else { | ||
4238 : | parrello | 1.70 | # Here we have a field list. We need to track the searchable fields, so we |
4239 : | # create a list for stashing them. | ||
4240 : | my @textFields = (); | ||
4241 : | # Loop through the fields. | ||
4242 : | parrello | 1.10 | my $fieldStructures = $structure->{Fields}; |
4243 : | for my $fieldName (keys %{$fieldStructures}) { | ||
4244 : | parrello | 1.8 | Trace("Processing field $fieldName of $defaultRelationName.") if T(4); |
4245 : | parrello | 1.10 | my $fieldData = $fieldStructures->{$fieldName}; |
4246 : | # Get the field type. | ||
4247 : | my $type = $fieldData->{type}; | ||
4248 : | # Plug in a relation name if it is needed. | ||
4249 : | Tracer::MergeOptions($fieldData, { relation => $defaultRelationName }); | ||
4250 : | parrello | 1.70 | # Check for searchability. |
4251 : | if ($fieldData->{searchable}) { | ||
4252 : | # Only allow this for a primary relation. | ||
4253 : | if ($fieldData->{relation} ne $defaultRelationName) { | ||
4254 : | Confess("Field $fieldName of $defaultRelationName is in secondary relations and cannot be searchable."); | ||
4255 : | } else { | ||
4256 : | push @textFields, $fieldName; | ||
4257 : | } | ||
4258 : | } | ||
4259 : | parrello | 1.10 | # Add the PrettySortValue. |
4260 : | $fieldData->{PrettySort} = (($type eq "text") ? $textPrettySortValue : $prettySortValue); | ||
4261 : | } | ||
4262 : | parrello | 1.70 | # If there are searchable fields, remember the fact. |
4263 : | if (@textFields) { | ||
4264 : | $structure->{searchFields} = \@textFields; | ||
4265 : | } | ||
4266 : | parrello | 1.10 | } |
4267 : | parrello | 1.1 | } |
4268 : | |||
4269 : | parrello | 1.70 | =head3 _FixName |
4270 : | parrello | 1.1 | |
4271 : | Fix the incoming field name so that it is a legal SQL column name. | ||
4272 : | |||
4273 : | This is a static method. | ||
4274 : | |||
4275 : | =over 4 | ||
4276 : | |||
4277 : | =item fieldName | ||
4278 : | |||
4279 : | Field name to fix. | ||
4280 : | |||
4281 : | =item RETURN | ||
4282 : | |||
4283 : | Returns the fixed-up field name. | ||
4284 : | |||
4285 : | =back | ||
4286 : | |||
4287 : | =cut | ||
4288 : | |||
4289 : | sub _FixName { | ||
4290 : | parrello | 1.10 | # Get the parameter. |
4291 : | my ($fieldName) = @_; | ||
4292 : | # Replace its minus signs with underscores. | ||
4293 : | $fieldName =~ s/-/_/g; | ||
4294 : | # Return the result. | ||
4295 : | return $fieldName; | ||
4296 : | parrello | 1.1 | } |
4297 : | |||
4298 : | parrello | 1.70 | =head3 _FixNames |
4299 : | parrello | 1.1 | |
4300 : | Fix all the field names in a list. | ||
4301 : | |||
4302 : | This is a static method. | ||
4303 : | |||
4304 : | =over 4 | ||
4305 : | |||
4306 : | =item field1, field2, field3, ... fieldn | ||
4307 : | |||
4308 : | List of field names to fix. | ||
4309 : | |||
4310 : | =item RETURN | ||
4311 : | |||
4312 : | Returns a list of fixed-up versions of the incoming field names. | ||
4313 : | |||
4314 : | =back | ||
4315 : | |||
4316 : | =cut | ||
4317 : | |||
4318 : | sub _FixNames { | ||
4319 : | parrello | 1.10 | # Create the result list. |