
Diff of /FigWebPages/Attributes.html


revision 1.3, Sat Mar 12 20:30:39 2005 UTC revision 1.4, Tue Jul 12 14:34:15 2005 UTC
# Line 1  Line 1 
1  <h1 style="text-align: center">Attributes</h1>  <h1 style="text-align: center">Attributes</h1>
2    <h2 style="text-align: center">Updated July 11th, 2005. Rob Edwards</h2>
3    
4    
5                    <ul>
6                    <h3 style="text-align: center">Contents</h3>
7                            <li><a href="#overview">Overview</a></li>
8                            <li><a href="#definitions">Definitions</a></li>
9                            <li><a href="#methods">Methods for accessing attributes</a></li>
10                            <li><a href="#get_attributes">get_attributes</a></li>
11                            <li><a href="#add_attribute">add_attribute</a></li>
12                            <li><a href="#delete_attribute">delete_attribute</a></li>
13                            <li><a href="#change_attribute">change_attribute</a></li>
14                            <li><a href="#erase_attribute_entirely">erase_attribute_entirely</a></li>
15                            <li><a href="#get_keys">get_keys</a></li>
16                            <li><a href="#get_values">get_values</a></li>
17                            <li><a href="#key_info">key_info</a></li>
18                            <li><a href="#get_key_value">get_key_value</a></li>
19                            <li><a href="#guess_value_format">guess_value_format</a></li>
20                            <li><a href="#attribute_location">attribute_location</a></li>
21                    </ul>
22    
23  <p>I have added attributes to the database in a more significant way. This page is to document those attributes and ways to access/modify them. The page has two sections, a non-technical section for general discussion and overview, and a technical section for behind-the-scenes type information.</p>  <p>I have added attributes to the database in a more significant way. This page is to document those attributes and ways to access/modify them. The page has two sections, a non-technical section for general discussion and overview, and a technical section for behind-the-scenes type information.</p>
24    
25  <p>Most people should read the first section and ignore the second section.</p>  <p>Most people should read the first section and ignore the second section.</p>
26    
27  <p>A comment on nomenclature: I use the term tag/value pairs and attributes interchangeably. Something can have an attribute, and you have to say what that is and what its value is. We also have an idea that an attribute can be a URL, and if so, it should be presented as a URL. So we actually have tag, value, value. You will see this in action. The third element is called URL, but we always check and make sure that it begins http before turning it into a URL, and so we reserve the option of renaming this and making it something else. That will be mentioned here.</p>  <p>A comment on nomenclature: I use the term key/value pairs and attributes interchangeably. Something can have an attribute, and you have to say what that is and what its value is. We also have an idea that an attribute can be a URL, and if so, it should be presented as a URL. So we actually have key, value, value. You will see this in action. The third element is called URL, but we always check and make sure that it begins http before turning it into a URL, and so we reserve the option of renaming this and making it something else. That will be mentioned here.</p>
28    
29    <h3><a name="overview">Overview</a></h3>
30    
31  <h2>Non-technical Section</h2>  <p>We have extended the notion of key/value pairs beyond things associated with a peg and into the arena of anything. Any feature such as peg, prophage, rna, insertion element, and so on, can have a key/value pair associated with it. In addition, <em>genomes</em> have key/value pairs associated with them. In this sense, we can annotate the organisms from which the genomes were derived and begin to ask complex questions of the type "show me all organisms that are motile but don't have any flagellar genes". We are working on this interface.</p>
32    
33  <p>We have extended the notion of tag/value pairs beyond things associated with a peg and into the arena of anything. Any feature such as peg, prophage, rna, insertion element, and so on, can have a tag value pair associated with it. In addition, <em>genomes</em> have tag/value pairs associated with them. In this sense, we can annotate the organisms from which the genomes were derived and begin to ask complex questions of the type "show me all organisms that are motile but don't have any flagellar genes". We are working on this interface.<p>  <p>The key/value pairs are designed to be "lightweight" objects ideal for data mining rather than the rich, complex objects associated with annotations. If you are curating individual proteins you should probably do that using the annotation links on <a href="/FIG/protein.cgi">protein.cgi</a> since those allow tracking of who does what, and when. In contrast the key/value pairs will likely be loaded in batch from the command line without regard for overwriting other values!</p>
34    
35  <p>Try the following exercises to see key/value pairs in action:</p>  <p>Try the following exercises to see key/value pairs in action:</p>
36    
# Line 20  Line 42 
42    
43  <li>Now choose WIDTH from the same pull down menu, and click show spreadsheet. Because width is a numeric variable, I grouped these key/value pairs in 1/10ths of the maximum. If you look at the Color Descriptions box you will see ranges (this is not perfect at the moment, but it is on the way).</li>  <li>Now choose WIDTH from the same pull down menu, and click show spreadsheet. Because width is a numeric variable, I grouped these key/value pairs in 1/10ths of the maximum. If you look at the Color Descriptions box you will see ranges (this is not perfect at the moment, but it is on the way).</li>
44    
45  <li>Now reset the WIDTH pull-down menu to empty (the first option in the list), and choose PIRSF from the menu labelled "color columns by each PEGs attribute" and click show spreadsheet. This is the same as before, but hopefully we can add more tags here and color other things.</li>  <li>Now reset the WIDTH pull-down menu to empty (the first option in the list), and choose PIRSF from the menu labelled "color columns by each PEGs attribute" and click show spreadsheet. This is the same as before, but hopefully we can add more keys here and color other things.</li>
46    
47  <li>From one of the PEGs that is colored as having a PIRSF link click on the link to get to the protein page. There is the attributes box (as before), and a new "Edit Attributes" button. When you click this, you will get three fields, key, value, and URL. If you go to a protein that does not have any attributes yet, you still get the edit box to let you add some attributes.</li>  <li>From one of the PEGs that is colored as having a PIRSF link click on the link to get to the protein page. There is the attributes box (as before), and a new "Edit Attributes" button. When you click this, you will get three fields, key, value, and URL. If you go to a protein that does not have any attributes yet, you still get the edit box to let you add some attributes.</li>
48    
 <ul><p>The rules that apply here are:</p>  
 <li>the text is free form and can be whatever you like.</li>  
 <li>the key is case insensitive (at the moment generally uppercase, but I may change this to Sentence Case)</li>  
 <li>if the URL is a webpage, the key/value pair will be visible on the protein page. The URL doesn't have to be a webpage, and as I mentioned before, will probably become a flag for many other things.</li>  
 <li>you can add, edit, or delete individual key/value pairs here.</li>  
 <li>if you have a lot of key value pairs, you can send them to me and I'll load them in batch.</li>  
 </ul>  
49    
50  </ul>  </ul>
51    
52    
53  <h2>Technical stuff</h2>  <h3><a name="definitions">Definitions</a></h3>
54    
55    This section defines attributes in the SEED and describes the locations and implementations of the files and directories used to store and retrieve them.
56    
 <p>Rightly or wrongly I moved some of the methods in FIG.pm associated with attributes. I also renamed a couple of them. The old names are still valid, they are just pointers to the new routines. Apparently nothing breaks, but let me know if it does.</p>  
57    
 <p>There are now four base methods for handling attributes:</p>  
58  <ol>  <ol>
59  <li>add_attribute    - to add a new attribute to an object</li>          <li style="font-weight: 700">Attributes have the following four fields</li>
60  <li>delete_attribute - to remove an existing attribute from an object</li>          <ul>
61  <li>change_attribute - to modify an existing attribute</li>                  <li><em>ID</em>. This is usually a gene id or genome id but doesn't <i>have</i> to be.</li>
62  <li>get_attributes   - to get attributes for an object</li>                  <li><em>Key</em>. This is the key. The key should be unique (but doesn't have to be) and we will provide a method through the clearinghouse to allow you to register a key and/or check whether someone else has assigned a key.
63                    <ul>
64                            <li>The key does not have to be unique, but this will assist in the exchange of data between machines.</li>
65                            <li>Keys are case sensitive</li>
66                            <li>An optional mapping is provided between a key and an explanation of what the key means (see below)</li>
67                            <li>By default, any key can have multiple values. If a key is to have only one value then a boolean can be set (see below) to limit this behavior</li>
68                    </ul>
69                    <li><em>Value</em>. The value is free form and there are no limitations on what is contained in the value.</li>
70                    <li><em>URL</em>. The URL is optional, and not required for any data set.</li>
71            </ul>
72            <br>
73            <li style="font-weight: 700">File Locations</li>
74            <ul>
75                    <li><em>General Attributes</em> Attributes are stored in the following locations:</li>
76                    <ul>
77                            <li>$FIG_Config::organisms/xxxxx/Attributes contains the genome and organism attributes</li>
78                            <li>$FIG_Config::organisms/xxxxx/Features/peg/Attributes contains the attributes for pegs</li>
79                            <li>$FIG_Config::organisms/xxxxx/Features/rna/Attributes contains the attributes for rnas... etc</li>
80                            <li>Note that no general attributes should be stored in $FIG_Config::global (see below)</li>
81                    </ul>
82                    <br>
83                    <li><em>Deleted Attributes</em>
84                    <ul>
85                            <li>Deleted attributes are stored in the text file $FIG_Config::global/Attributes/deleted_attributes. The only information that is stored here is the ID and the key. Note that this will currently delete all occurrences of this key from this ID (hence with multiple values, all will be deleted).</li>
86                    </ul>
87                    <br>
88                    <li><em>Metadata</em></li>
89                    <ul>
90                            <li>Metadata associated with a key is stored in $FIG_Config::global/Attributes/attribute_keys</li>
91                            <li>This file has the following format, with the columns separated by tabs:</li>
92                            <ol>
93                            <li>key</li>
94                            <li>Single datum only. A boolean that, if set, limits the data associated with the key to a single datum; otherwise the key is assumed to allow multiple values. Note that this is for information only, and we will still store all the data associated with a key.</li>
95                            <li>Other information about the key (e.g. name of experiment, experimental details, etc).</li>
96                            </ol>
97                    </ul>
98            </ul>
99  </ol>  </ol>
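<p>As a rough illustration of the metadata format described above, here is a minimal sketch of reading one line of the attribute_keys file. The helper name <code>parse_key_metadata_line</code> and the sample line are hypothetical and are not part of FIG.pm; the sketch also assumes the boolean is stored as 0 or 1, which this page does not state.</p>

```perl
use strict;
use warnings;

# Hypothetical helper (not part of FIG.pm): split one line of the
# attribute_keys metadata file into its three tab-separated columns:
# key, single-datum boolean, and free-text explanation.
sub parse_key_metadata_line {
    my ($line) = @_;
    chomp $line;

    # Limit the split to 3 fields so that tabs inside the explanation survive.
    my ($key, $single, $explanation) = split /\t/, $line, 3;

    return {
        key         => $key,
        single      => $single ? 1 : 0,    # assumption: stored as 0 or 1
        explanation => defined $explanation ? $explanation : '',
    };
}

# Made-up sample line, for illustration only.
my $meta = parse_key_metadata_line("PIRSF\t0\tPIR superfamily assignment\n");
print "$meta->{key}: single=$meta->{single}; $meta->{explanation}\n";
```

<p>Returning a hash reference keeps the three columns named, so callers do not depend on column order.</p>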
100    
101    
 <p>In addition, there are some methods that make specific calls I am using:</p>  
 <ol>  
 <li>get_tags         - get tags for either all known objects or a selected type of object (peg, rna, genome, etc)</li>  
 </ol>  
102    
103    <h3><a name="methods">Methods for accessing attributes</a></h3>
104    <p>The attributes methods have now been rewritten for handling all kinds of attributes. The key/value pairs can be associated with a feature like a peg, rna, or prophage, or a genome.</p>
105    <p>There are several base attribute methods:</p>
106    <pre>
107     get_attributes
108     add_attribute
109     delete_attribute
110     change_attribute</pre>
111    <p>There are also methods for more complex things:</p>
112    <pre>
113     get_keys
114     get_values
115     guess_value_format</pre>
116    <p>By default all keys are case sensitive, and all keys have leading and trailing white space removed.</p>
117    <p>Attributes are not a 1:1 mapping: a single key can have several values.</p>
118    <p>
119    </p>
120    <h3><a name="get_attributes">get_attributes</a></h3>
121    <p>Get attributes requires one of four keys:
122    fid (which can be genome, peg, rna, or other id),
123    key,
124    value,
125    url</p>
126    <p>It will find any attribute that has the characteristics that you request, and if any values match it will return a 4-tuple of:
127    [fid, key, value, url]</p>
128    <p>You can request an E. coli key like this
129    $fig-&gt;get_attributes('83333.1');</p>
130    <p>You can request any PIRSF key like this
131    $fig-&gt;get_attributes('', 'PIRSF');</p>
132    <p>You can request any google url like this
133    $fig-&gt;get_attributes('', '', '', 'http://www.google.com');</p>
134    <p>NOTE: If there are no attributes an empty array will be returned. You need to check for this and not assume that it will be undef.</p>
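<p>The matching behaviour described above can be sketched without a SEED installation. The function <code>match_attributes</code> below is a hypothetical stand-in, not the real <code>get_attributes</code>: it filters an in-memory list of [fid, key, value, url] tuples, treats an empty or missing argument as "match anything", and assumes exact string comparison. The sample data are made up.</p>

```perl
use strict;
use warnings;

# Hypothetical stand-in for the matching that get_attributes is described
# as doing. The real method reads from the SEED; this sketch only filters
# a list of [fid, key, value, url] tuples held in memory.
sub match_attributes {
    my ($attributes, @wanted) = @_;
    my @found;

    ATTRIBUTE: for my $attr (@$attributes) {
        for my $i (0 .. 3) {
            my $want = $wanted[$i];

            # An empty or missing criterion matches anything.
            next unless defined $want && length $want;

            my $have = defined $attr->[$i] ? $attr->[$i] : '';
            next ATTRIBUTE unless $have eq $want;    # assumption: exact match
        }
        push @found, $attr;
    }
    return @found;    # an empty list (not undef) when nothing matches
}

# Made-up sample data, for illustration only.
my @attributes = (
    ['83333.1',           'motile', 'yes',         ''],
    ['fig|83333.1.peg.1', 'PIRSF',  'PIRSF000001', 'http://www.example.org/'],
);

my @pirsf = match_attributes(\@attributes, '', 'PIRSF');
print scalar(@pirsf), " PIRSF attribute(s)\n";

# Check for an empty array rather than for undef.
my @none = match_attributes(\@attributes, '', 'NO_SUCH_KEY');
print "no attributes found\n" unless @none;
```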
135    <p>
136    </p>
137    <h3><a name="add_attribute">add_attribute</a></h3>
138    <p>Add a new key/value pair to something. Something can be a genome id, a peg, an rna, prophage, whatever.</p>
139    <p>Arguments:</p>
140    <pre>
141            feature id, this can be a peg, genome, etc,
142            key name. This is case sensitive and has the leading and trailing white space removed
143            value
144            optional URL to add
145            optional file to store the attributes in.</pre>
146    <p>A note on file names. At the moment the file assigned_attributes is used to store new attributes by default, and load_attributes loads that file last so any changes will overwrite existing keys. However, this is not quite true since we can now have multiple key/values for a single peg. Using this method you can define a filename to store the attributes in. The directory structure will be figured out for you, so you can use something like "pirsf" as the file name.</p>
147    <p>
148    </p>
149    <h3><a name="delete_attribute">delete_attribute</a></h3>
150    <p>Remove a key from a feature.</p>
151    <pre>
152     Arguments:
153            feature id, this can be a peg, genome, etc,
154            key name to delete</pre>
155    <pre>
156     Deleted attributes are stored in global/deleted_attributes</pre>
157    <p>
158    </p>
159    <h3><a name="change_attribute">change_attribute</a></h3>
160    <pre>
161     Change the value of a key/value pair (and optionally its url).</pre>
162    <pre>
163     Arguments:
164            feature id, this can be a peg, genome, etc,
165            key name whose value to replace
166            value to replace it with
167            optional URL to add
168            optional file to store the changes in.</pre>
169    <p>See the note in add_attribute about files. You should almost always omit this argument so that the default (assigned_attributes) is used, as it is loaded last. However, this allows you to change the file if you wish.</p>
170    <p>Returns 0 on error and 1 on success.</p>
171    <p>
172    </p>
173    <h3><a name="erase_attribute_entirely">erase_attribute_entirely</a></h3>
174    <p>This method will remove any notion of the attribute that you give it. It is different from delete as that just removes a single attribute associated with a peg. This will remove the files and uninstall the attributes from the database so there is no memory of that type of attribute. All of the attribute files are moved to FIG_Tmp/Attributes/deleted_attributes, and so you can recover the data for a while. Still, you should probably use this carefully!</p>
175    <p>I use this to clean out old PIR superfamily attributes immediately before installing the new correspondence table.</p>
176    <p>e.g. my $status=$fig-&gt;erase_attribute_entirely("pirsf");</p>
177    <p>This will return the number of files that were moved to the new location</p>
178    <p>
179    </p>
180    <h3><a name="get_keys">get_keys</a></h3>
181    <p>Get all the keys that we know about.</p>
182    <p>Without any arguments:</p>
183    <p>Returns a reference to a hash, where the key is the type of feature (peg, genome, rna, prophage, etc), and the value is a reference to a hash where the key is the key name and the value is a reference to an array of all features with that key.</p>
184    <p>e.g.</p>
185    <p>print "There are ", scalar @{$fig-&gt;get_keys-&gt;{'peg'}-&gt;{'PIRSF'}}, " PIRSF keys in the database\n";</p>
186    <pre>my $keys=$fig-&gt;get_keys;
187    foreach my $type (keys %$keys)
188    {
189     foreach my $label (keys %{$keys-&gt;{$type}})
190     {
191      foreach my $peg (@{$keys-&gt;{$type}-&gt;{$label}})
192      {
193        # ... do something to each peg and genome here
194      }
195     }
196    }</pre>
197    <p>With an argument (that should be a recognized type like peg, rna, genome, etc):</p>
198    <p>Returns a reference to a hash where the key is the key name and the value is the reference to the array. This should use less memory than above.
199    The argument should be (currently) peg, rna, pp, genome, or any other recognized feature type (generally defined as the .peg. in the fid). The default is to return all keys, and this can also be specified with all</p>
200    <p>
201    </p>
202    <h3><a name="get_values">get_values</a></h3>
203    <p>Get all the values that we know about</p>
204    <p>Without any arguments:</p>
205    <p>Returns a reference to a hash, where the key is the type of feature (peg, genome, rna, prophage, etc), and the value is a reference to a hash where the key is the value and the value is the number of occurrences.</p>
206    <p>e.g. print "There are ", $fig-&gt;get_values-&gt;{'peg'}-&gt;{'100'}, " keys with the value 100 in the database\n";</p>
207    <p>With a single argument:</p>
208    <p>The argument is assumed to be the type (rna, peg, genome, etc).</p>
209    <p>With two arguments:</p>
210    <p>The first argument is the type (rna, peg, genome, etc), and the second argument is the key.</p>
211    <p>In each case it will return a reference to a hash.</p>
212    <p>E.g.</p>
213    <pre>
214            $fig-&gt;get_values(); # will get all values</pre>
215    <pre>
216            $fig-&gt;get_values('peg'); # will get all values for pegs</pre>
217    <pre>
218            $fig-&gt;get_values('peg', 'pirsf'); # will get all values for pegs with attribute pirsf</pre>
219    <pre>
220            $fig-&gt;get_values(undef, 'pirsf'); # will get all values for anything with that attribute</pre>
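<p>To make the shape of the returned structure concrete, the sketch below builds a type =&gt; value =&gt; count hash from an in-memory list of attributes. Both <code>feature_type</code> and <code>count_values</code> are hypothetical helpers, not FIG.pm methods. The sketch assumes that feature ids look like <code>fig|83333.1.peg.1</code> and that anything not of that form is a genome id; the sample data are made up.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: take the feature type from the ".peg." style
# component of a feature id. Assumption: ids that do not look like
# "fig|genome.type.number" are treated as genome ids.
sub feature_type {
    my ($id) = @_;
    return $id =~ /\.([A-Za-z_]+)\.\d+$/ ? $1 : 'genome';
}

# Hypothetical helper: count how often each value occurs per feature type,
# optionally restricted to a single key. Returns a reference to a hash of
# the form { type => { value => number of occurrences } }.
sub count_values {
    my ($attributes, $only_key) = @_;
    my %count;
    for my $attr (@$attributes) {
        my ($fid, $key, $value) = @$attr;
        next if defined $only_key && $key ne $only_key;
        $count{ feature_type($fid) }{$value}++;
    }
    return \%count;
}

# Made-up sample data, for illustration only.
my @attributes = (
    ['fig|83333.1.peg.1', 'WIDTH',  '100', ''],
    ['fig|83333.1.peg.2', 'WIDTH',  '100', ''],
    ['fig|83333.1.rna.1', 'WIDTH',  '50',  ''],
    ['83333.1',           'motile', 'yes', ''],
);

my $values = count_values(\@attributes);
print "There are $values->{peg}{100} peg attributes with the value 100\n";
```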
221    <p>
222    </p>
223    <h3><a name="key_info">key_info</a></h3>
224    <p>Access a reference to an array of [single, explanation]</p>
225    <p>Single is a boolean; if it is true, only the last value returned should be used. Note that the other methods will still return all the values; it is up to the implementer to ensure that only the last value is used.</p>
226    <p>Explanation is a user-derived explanation that can be defined.</p>
227    <p>If a reference to an array is provided, along with the key, those values will be set.</p>
228    <p>e.g.
229    $fig-&gt;key_info($key, \@data); # set the data
230    $data=$fig-&gt;key_info($key); # get the data</p>
231    <p>
232    </p>
233    <h3><a name="get_key_value">get_key_value</a></h3>
234    <p>Given a key and a value, this will return anything that has both.</p>
235    <p>E.g.</p>
236    <pre>
237            my @nonmotile_genomes = $fig-&gt;get_key_value('motile', 'non-motile');
238            my @bluepegs          = $fig-&gt;get_key_value('color', 'blue');</pre>
239    <p>If either the key or the value is omitted, it will return all the matching sets.</p>
240    <p>
241    </p>
242    <h3><a name="guess_value_format">guess_value_format</a></h3>
243    <p>There are occasions where I want to know what kind of value a key has. I have three scenarios right now:</p>
244    <pre>
245     1. strings
246     2. numbers
247     3. percentiles ( a type of number, I know)</pre>
248    <p>In these cases, I may want to know something about them and do something interesting with them. This will try to guess what the values are for a given key so that you can try to limit what people add. At the moment this is pure guesswork; I suppose we could put some restrictions on key/value pairs, but I don't feel like it.</p>
249    <p>This method will return a reference to an array. If the value is a string there will only be one element in that array, the word "string". If the value is a number, there will be three elements: the word "float" in position 0, and then the minimum and maximum values. You can figure out if it is a percent :-)</p>
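<p>The return convention above can be sketched as follows. This is not the FIG.pm implementation: <code>guess_format</code> and <code>looks_like_percent</code> are hypothetical names, the sketch works on a plain list of values rather than on a key, and the percent test (everything between 0 and 100) is only one possible reading of "you can figure out if it is a percent".</p>

```perl
use strict;
use warnings;
use List::Util qw(min max);
use Scalar::Util qw(looks_like_number);

# Hypothetical sketch of the guessing step: returns ['string'] if any value
# is non-numeric (or if there are no values), otherwise
# ['float', minimum, maximum].
sub guess_format {
    my (@values) = @_;
    my $non_numeric = grep { !looks_like_number($_) } @values;
    return ['string'] if !@values || $non_numeric;
    return ['float', min(@values), max(@values)];
}

# Assumption: treat a numeric key as a percentage when all of its values
# fall between 0 and 100.
sub looks_like_percent {
    my ($format) = @_;
    return 0 unless $format->[0] eq 'float';
    return ($format->[1] >= 0 && $format->[2] <= 100) ? 1 : 0;
}

my $format = guess_format(12.5, 80, 3);
print join(', ', @$format), "\n";
print "could be a percentage\n" if looks_like_percent($format);
```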
250    <p>
251    </p>
252    <h3><a name="attribute_location">attribute_location</a></h3>
253    <p>This is just an internal method to find the appropriate location of the attributes file depending on whether it is a peg, an rna, or a genome or whatever.</p>
254    <p>
255    </p>
