<h1 style="text-align: center">Attributes</h1>
<h2 style="text-align: center">Updated July 11th, 2005. Rob Edwards</h2>

<h3 style="text-align: center">Contents</h3>
<ul>
<li><a href="#overview">Overview</a></li>
<li><a href="#definitions">Definitions</a></li>
<li><a href="#methods">Methods for accessing attributes</a></li>
<li><a href="#get_attributes">get_attributes</a></li>
<li><a href="#add_attribute">add_attribute</a></li>
<li><a href="#delete_attribute">delete_attribute</a></li>
<li><a href="#change_attribute">change_attribute</a></li>
<li><a href="#erase_attribute_entirely">erase_attribute_entirely</a></li>
<li><a href="#get_keys">get_keys</a></li>
<li><a href="#get_values">get_values</a></li>
<li><a href="#key_info">key_info</a></li>
<li><a href="#get_key_value">get_key_value</a></li>
<li><a href="#guess_value_format">guess_value_format</a></li>
<li><a href="#attribute_location">attribute_location</a></li>
</ul>

<p>I have added attributes to the database in a more significant way. This page documents those attributes and the ways to access and modify them. It has two parts: a non-technical part for general discussion and overview, and a technical part with behind-the-scenes information.</p>

<p>Most people should read the first part and ignore the second.</p>

<p>A comment on nomenclature: I use the terms "key/value pair" and "attribute" interchangeably. Something can have an attribute, and you have to say what that attribute is and what its value is. An attribute can also carry a URL, and if so it should be presented as a link. So we actually have three elements: key, value, and URL. You will see this in action. The third element is called URL, but we always check that it begins with http before turning it into a link, so we reserve the option of renaming this element and using it for something else. Any such change will be noted here.</p>
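<p>The "begins with http" check can be sketched as follows. This is only an illustration: render_url_field is a hypothetical helper, not a method in the SEED code, and it does no HTML escaping.</p>

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the SEED code): present the value as a
# link only when the third element begins with "http"; otherwise return
# the value unchanged.
sub render_url_field {
    my ($value, $url) = @_;
    return $value unless defined $url && $url =~ /^http/;
    return qq(<a href="$url">$value</a>);
}

print render_url_field('PIRSF000001', 'http://www.example.org/PIRSF000001'), "\n";
print render_url_field('motile', ''), "\n";
```

<p>Anything that fails the check is shown as plain text, which is what leaves room to reuse the third element for something other than a URL.</p>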

<h3><a name="overview">Overview</a></h3>

<p>We have extended the notion of key/value pairs beyond things associated with a peg and into the arena of anything. Any feature, such as a peg, prophage, rna, insertion element, and so on, can have a key/value pair associated with it. In addition, <em>genomes</em> have key/value pairs associated with them. In this sense, we can annotate the organisms from which the genomes were derived and begin to ask complex questions of the type "show me all organisms that are motile but don't have any flagellar genes". We are working on this interface.</p>

<p>The key/value pairs are designed to be "lightweight" objects ideal for data mining, rather than the rich, complex objects associated with annotations. If you are curating individual proteins you should probably do that using the annotation links on <a href="/FIG/protein.cgi">protein.cgi</a>, since those allow tracking of who does what, and when. In contrast, the key/value pairs will likely be loaded in batch from the command line without regard for overwriting other values!</p>

<p>Try the following exercises to see key/value pairs in action:</p>

<ul>
<li>Choose an organism from the FIG search page and select statistics to see the list. There is an option at the bottom of the page to edit the key/value pairs, and this will pull up a table where you can enter the information for an organism.
</li>

<li>Open the <a href="http://localhost/FIG/subsys.cgi?user=&ssa_name=Flagellum&request=show_ssa&can_alter=">Flagellum subsystem</a>, and scroll to the checkboxes/buttons at the bottom. There are two pulldown lists. From the first one (labeled "color rows by each organism's attribute") choose MOTILE, and click show spreadsheet. The sheet is now highlighted with motile and non-motile organisms that have flagella. This view is also helped by decreasing the text size from the view menu. There is a key at the bottom, just above the "show spreadsheet" button, so you know which color is which; in this case there are only motile and non-motile. This key is also an active link that will limit the display of the spreadsheet to just those particular organisms that you have highlighted.</li>

<li>Now choose WIDTH from the same pull-down menu, and click show spreadsheet. Because width is a numeric variable, I grouped these key/value pairs in 1/10ths of the maximum. If you look at the Color Descriptions box you will see ranges (this is not perfect at the moment, but it is on the way).</li>

<li>Now reset the WIDTH pull-down menu to empty (the first option in the list), choose PIRSF from the menu labelled "color columns by each PEGs attribute", and click show spreadsheet. This is the same as before, but hopefully we can add more keys here and color other things.</li>

<li>From one of the PEGs that is colored as having a PIRSF link, click on the link to get to the protein page. There is the attributes box (as before), and a new "Edit Attributes" button. When you click this, you will get three fields: key, value, and URL. If you go to a protein that does not have any attributes yet, you still get the edit box to let you add some attributes.</li>

</ul>
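<p>The grouping of a numeric attribute such as WIDTH into 1/10ths of the maximum might look roughly like this. It is a sketch under assumptions, not the spreadsheet code: tenth_bins is a hypothetical helper, the ids and widths are invented, and values are assumed to be non-negative with at least one value present.</p>

```perl
use strict;
use warnings;
use List::Util qw(max);
use POSIX qw(floor);

# Hypothetical helper: assign each id to one of ten bins, each one tenth of
# the maximum wide. Bin 0 covers [0, max/10), bin 1 covers [max/10, 2*max/10),
# and so on; the maximum itself is placed in the last bin (9).
sub tenth_bins {
    my (%value_for) = @_;
    my $max = max(values %value_for);
    my %bin_for;
    for my $id (keys %value_for) {
        my $bin = $max > 0 ? floor(10 * $value_for{$id} / $max) : 0;
        $bin = 9 if $bin > 9;
        $bin_for{$id} = $bin;
    }
    return \%bin_for;
}

# Invented ids and widths, for illustration only.
my $bins = tenth_bins(genome_a => 1, genome_b => 5, genome_c => 10);
print "$_ => bin $bins->{$_}\n" for sort keys %$bins;
```

<p>Each bin would then map to one color and one range in the Color Descriptions box.</p>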


<h3><a name="definitions">Definitions</a></h3>

<p>This section defines attributes in the SEED and describes the locations and implementations of the files and directories used to store and retrieve them.</p>

<ol>
<li><span style="font-weight: 700">Attributes have the following four fields</span>
<ul>
<li><em>ID</em>. This is usually a gene id or genome id but doesn't <i>have</i> to be.</li>
<li><em>Key</em>. This is the key. We will provide a method through the clearinghouse to allow you to register a key and/or check whether someone else has assigned a key.
<ul>
<li>The key does not have to be unique, but a unique key will assist in the exchange of data between machines.</li>
<li>Keys are case sensitive.</li>
<li>An optional mapping is provided between a key and an explanation of what the key means (see below).</li>
<li>By default, any key can have multiple values. If a key is to have only one value then a boolean can be set (see below) to limit this behavior.</li>
</ul>
</li>
<li><em>Value</em>. The value is free form and there are no limitations on what is contained in the value.</li>
<li><em>URL</em>. The URL is optional, and not required for any data set.</li>
</ul>
<br>
</li>
<li><span style="font-weight: 700">File Locations</span>
<ul>
<li><em>General Attributes</em>. Attributes are stored in the following locations:
<ul>
<li>$FIG_Config::organisms/xxxxx/Attributes contains the genome and organism attributes</li>
<li>$FIG_Config::organisms/xxxxx/Features/peg/Attributes contains the attributes for pegs</li>
<li>$FIG_Config::organisms/xxxxx/Features/rna/Attributes contains the attributes for rnas... etc</li>
<li>Note that no general attributes should be stored in $FIG_Config::global (see below)</li>
</ul>
<br>
</li>
<li><em>Deleted Attributes</em>
<ul>
<li>Deleted attributes are stored in the text file $FIG_Config::global/Attributes/deleted_attributes. The only information that is stored here is the ID and the key. Note that this will currently delete all occurrences of this key from this ID (hence, with multiple values, all will be deleted).</li>
</ul>
<br>
</li>
<li><em>Metadata</em>
<ul>
<li>Metadata associated with a key is stored in $FIG_Config::global/Attributes/attribute_keys</li>
<li>This file has the following format, with the columns separated by tabs:
<ol>
<li>Key.</li>
<li>Single datum only. A boolean which, if set, limits the data associated with the key to a single datum; otherwise the key is assumed to allow multiple data. Note that this is for information only, and we will store all the data associated with a key.</li>
<li>Other information about the key (e.g. name of experiment, experimental details, etc).</li>
</ol>
</li>
</ul>
</li>
</ul>
</li>
</ol>
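<p>As a rough illustration of the three-column, tab-separated metadata format described above, one line of the attribute_keys file could be parsed like this. parse_key_metadata is a hypothetical helper, the sample line is invented, and the assumption that the boolean is written as 1 or 0 is mine; the page does not say how it is encoded.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: parse one line of the key metadata file.
# Columns: key <TAB> single-datum boolean <TAB> free-text explanation.
# Assumption: the boolean is stored as something Perl treats as true/false.
sub parse_key_metadata {
    my ($line) = @_;
    chomp $line;
    # Limit the split to 3 fields so tabs inside the explanation survive.
    my ($key, $single, $explanation) = split /\t/, $line, 3;
    return {
        key         => $key,
        single      => $single ? 1 : 0,
        explanation => defined $explanation ? $explanation : '',
    };
}

# Invented sample line, for illustration only.
my $meta = parse_key_metadata("MOTILE\t1\tWhether the organism is motile\n");
print "$meta->{key}: single=$meta->{single}; $meta->{explanation}\n";
```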



<h3><a name="methods">Methods for accessing attributes</a></h3>
<p>The attribute methods have now been rewritten to handle all kinds of attributes. The key/value pairs can be associated with a feature (such as a peg, rna, or prophage) or with a genome.</p>
<p>There are several base attribute methods:</p>
<pre>
get_attributes
add_attribute
delete_attribute
change_attribute</pre>
<p>There are also methods for more complex things:</p>
<pre>
get_keys
get_values
guess_value_format</pre>
<p>By default all keys are case sensitive, and all keys have leading and trailing white space removed.</p>
<p>The mapping from keys to values is not one-to-one: a single key can have several values.</p>
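<p>The key handling described above (case preserved, surrounding white space removed) amounts to something like the following. normalize_key is a hypothetical name used only for this sketch.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading and trailing white space from a key.
# Case is left alone because keys are case sensitive.
sub normalize_key {
    my ($key) = @_;
    $key =~ s/^\s+|\s+$//g;
    return $key;
}

print normalize_key("  PIRSF \t") eq 'PIRSF' ? "trimmed\n" : "not trimmed\n";
print normalize_key('pirsf') eq normalize_key('PIRSF') ? "same key\n" : "different keys\n";
```

<p>Because case is preserved, PIRSF and pirsf are two different keys.</p>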
<h3><a name="get_attributes">get_attributes</a></h3>
<p>get_attributes requires at least one of four arguments, given in this order:</p>
<pre>
fid (which can be a genome, peg, rna, or other id)
key
value
url</pre>
<p>It will find any attribute that has the characteristics that you request, and for each match it will return a four-element tuple of:</p>
<pre>
[fid, key, value, url]</pre>
<p>You can request the attributes of E. coli like this:</p>
<pre>
$fig-&gt;get_attributes('83333.1');</pre>
<p>You can request any PIRSF key like this:</p>
<pre>
$fig-&gt;get_attributes('', 'PIRSF');</pre>
<p>You can request any google url like this:</p>
<pre>
$fig-&gt;get_attributes('', '', '', 'http://www.google.com');</pre>
<p>NOTE: If there are no attributes, an empty array will be returned. You need to check for this and not assume that the result will be undef.</p>
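<p>The matching behaviour and the empty-list check can be illustrated without a database. In this sketch find_attributes is a hypothetical stand-in for get_attributes, the rows are invented, the fid fig|83333.1.peg.1 is only an illustrative id, and exact string matching is an assumption.</p>

```perl
use strict;
use warnings;

# Invented rows, each one [fid, key, value, url].
my @ATTRIBUTES = (
    ['83333.1',           'MOTILE', 'motile',      ''],
    ['fig|83333.1.peg.1', 'PIRSF',  'PIRSF000001', 'http://www.example.org/PIRSF000001'],
);

# Hypothetical stand-in for get_attributes: an empty or undefined argument
# matches anything; any other argument must match exactly (an assumption).
sub find_attributes {
    my @want = @_;
    my @found;
    ROW: for my $row (@ATTRIBUTES) {
        for my $i (0 .. 3) {
            next unless defined $want[$i] && length $want[$i];
            next ROW if $want[$i] ne $row->[$i];
        }
        push @found, $row;
    }
    return @found;
}

my @hits = find_attributes('', 'PIRSF');
print "PIRSF hits: ", scalar(@hits), "\n";

# No match gives an empty list, not undef, so test the list itself.
my @none = find_attributes('', 'NO_SUCH_KEY');
print "nothing found\n" unless @none;
```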
<h3><a name="add_attribute">add_attribute</a></h3>
<p>Add a new key/value pair to something. Something can be a genome id, a peg, an rna, a prophage, whatever.</p>
<p>Arguments:</p>
<pre>
feature id; this can be a peg, genome, etc.
key name; this is case sensitive and has leading and trailing white space removed
value
optional URL to add
optional file to store the attributes in</pre>
<p>A note on file names. At the moment the file assigned_attributes is used to store new attributes by default, and load_attributes loads that file last so any changes will overwrite existing keys. However, this is not quite true, since we can now have multiple key/values for a single peg. Using this method you can define a filename to store the attributes in. The directory structure will be figured out for you, so you can use something like "pirsf" as the file name.</p>
<h3><a name="delete_attribute">delete_attribute</a></h3>
<p>Remove a key from a feature.</p>
<p>Arguments:</p>
<pre>
feature id; this can be a peg, genome, etc.
key name to delete</pre>
<p>Deleted attributes are recorded in the deleted_attributes file (see <a href="#definitions">Definitions</a> for its location).</p>
<h3><a name="change_attribute">change_attribute</a></h3>
<p>Change the value of a key/value pair (and optionally its url).</p>
<p>Arguments:</p>
<pre>
feature id; this can be a peg, genome, etc.
key name whose value to replace
value to replace it with
optional URL to add
optional file to store the changes in</pre>
<p>See the note in add_attribute about files. You should almost always omit the file argument so that the default (assigned_attributes) is used, as it is loaded last. However, this allows you to change the file if you wish.</p>
<p>Returns 0 on error and 1 on success.</p>
<h3><a name="erase_attribute_entirely">erase_attribute_entirely</a></h3>
<p>This method will remove any notion of the attribute that you give it. It is different from delete, as that just removes a single attribute associated with a peg. This will remove the files and uninstall the attributes from the database so there is no memory of that type of attribute. All of the attribute files are moved to FIG_Tmp/Attributes/deleted_attributes, so you can recover the data for a while. Still, you should probably use this carefully!</p>
<p>I use this to clean out old PIR superfamily attributes immediately before installing the new correspondence table.</p>
<p>e.g.</p>
<pre>
my $status=$fig-&gt;erase_attribute_entirely("pirsf");</pre>
<p>This will return the number of files that were moved to the new location.</p>
<h3><a name="get_keys">get_keys</a></h3>
<p>Get all the keys that we know about.</p>
<p>Without any arguments:</p>
<p>Returns a reference to a hash, where the key is the type of feature (peg, genome, rna, prophage, etc), and the value is a reference to a hash where the key is the key name and the value is a reference to an array of all features with that key.</p>
<p>e.g.</p>
<pre>
print "There are " , scalar @{$fig-&gt;get_keys-&gt;{'peg'}-&gt;{'PIRSF'}}, " PIRSF keys in the database\n";</pre>
<pre>
my $keys=$fig-&gt;get_keys;
foreach my $type (keys %$keys)
{
    foreach my $label (keys %{$keys-&gt;{$type}})
    {
        foreach my $peg (@{$keys-&gt;{$type}-&gt;{$label}})
        {
            # ... do something to each peg and genome here
        }
    }
}</pre>
<p>With an argument (that should be a recognized type like peg, rna, genome, etc):</p>
<p>Returns a reference to a hash where the key is the key name and the value is the reference to the array. This should use less memory than the call without arguments.
The argument should (currently) be peg, rna, pp, genome, or any other recognized feature type (generally defined as the .peg. in the fid). The default is to return all keys, and this can also be requested explicitly with all.</p>
<h3><a name="get_values">get_values</a></h3>
<p>Get all the values that we know about.</p>
<p>Without any arguments:</p>
<p>Returns a reference to a hash, where the key is the type of feature (peg, genome, rna, prophage, etc), and the value is a reference to a hash where the key is the value and the value is the number of occurrences.</p>
<p>e.g.</p>
<pre>
print "There are " , $fig-&gt;get_values-&gt;{'peg'}-&gt;{'100'}, " keys with the value 100 in the database\n";</pre>
<p>With a single argument:</p>
<p>The argument is assumed to be the type (rna, peg, genome, etc).</p>
<p>With two arguments:</p>
<p>The first argument is the type (rna, peg, genome, etc), and the second argument is the key.</p>
<p>In each case it will return a reference to a hash.</p>
<p>E.g.</p>
<pre>
$fig-&gt;get_values(); # will get all values</pre>
<pre>
$fig-&gt;get_values('peg'); # will get all values for pegs</pre>
<pre>
$fig-&gt;get_values('peg', 'pirsf'); # will get all values for pegs with attribute pirsf</pre>
<pre>
$fig-&gt;get_values(undef, 'pirsf'); # will get all values for anything with that attribute</pre>
<h3><a name="key_info">key_info</a></h3>
<p>Access a reference to an array of [single, explanation].</p>
<p>Single is a boolean; if it is true, only the last value returned should be used. Note that the other methods will still return all the values, and it is up to the implementer to ensure that only the last value is used.</p>
<p>Explanation is a user-supplied explanation of the key.</p>
<p>If a reference to an array is provided along with the key, those values will be set.</p>
<p>e.g.</p>
<pre>
$fig-&gt;key_info($key, \@data); # set the data
$data=$fig-&gt;key_info($key); # get the data</pre>
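<p>Since honouring the single flag is left to the caller, calling code might apply it along these lines. effective_values is a hypothetical helper, and the flag and values would in practice come from key_info and one of the retrieval methods.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: when the "single" flag is true keep only the last
# value; otherwise return every value unchanged.
sub effective_values {
    my ($single, @values) = @_;
    return @values unless $single && @values;
    return ($values[-1]);
}

# Invented values, for illustration only.
my @all = ('0.5', '0.7', '0.9');
print "single:   ", join(',', effective_values(1, @all)), "\n";
print "multiple: ", join(',', effective_values(0, @all)), "\n";
```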
<h3><a name="get_key_value">get_key_value</a></h3>
<p>Given a key and a value, this will return anything that has both.</p>
<p>E.g.</p>
<pre>
my @nonmotile_genomes = $fig-&gt;get_key_value('motile', 'non-motile');
my @bluepegs = $fig-&gt;get_key_value('color', 'blue');</pre>
<p>If either the key or the value is omitted, it will return all the matching sets.</p>
<h3><a name="guess_value_format">guess_value_format</a></h3>
<p>There are occasions where I want to know what kind of value a key has. I have three scenarios right now:</p>
<pre>
1. strings
2. numbers
3. percentiles (a type of number, I know)</pre>
<p>In these cases, I may want to know something about the values and do something interesting with them. This method will try to guess what the values are for a given key so that you can try to limit what people add. At the moment this is pure guesswork; I suppose we could put some restrictions on key/value pairs, but I don't feel like doing that.</p>
<p>This method will return a reference to an array. If the value is a string there will be only one element in that array, the word "string". If the value is a number, there will be three elements: the word "float" in position 0, and then the minimum and maximum values. You can figure out if it is a percent :-)</p>
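<p>A guess of this kind could be made roughly as follows. This is a hypothetical re-implementation of the idea, not the method's actual code; guess_format and could_be_percent are names invented for the sketch, treating an empty value list as "string" is my choice, and the 0 to 100 test is just one way to "figure out if it is a percent".</p>

```perl
use strict;
use warnings;
use List::Util qw(min max);
use Scalar::Util qw(looks_like_number);

# Hypothetical re-implementation of the guessing idea: returns ['string'] if
# any value is non-numeric (or there are no values), otherwise
# ['float', minimum, maximum].
sub guess_format {
    my @values = @_;
    return ['string'] if !@values || grep { !looks_like_number($_) } @values;
    return ['float', min(@values), max(@values)];
}

# One possible percent test (an assumption): every value lies in 0..100.
sub could_be_percent {
    my ($format) = @_;
    return $format->[0] eq 'float' && $format->[1] >= 0 && $format->[2] <= 100;
}

# Invented values, for illustration only.
my $width = guess_format(0.5, 0.8, 2.0);
print "width: ", join(' ', @$width), "\n";
print "percent? ", (could_be_percent(guess_format(12, 99.5)) ? 'maybe' : 'no'), "\n";
print "motile: ", guess_format('motile', 'non-motile')->[0], "\n";
```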
<h3><a name="attribute_location">attribute_location</a></h3>
<p>This is just an internal method to find the appropriate location of the attributes file depending on whether it is a peg, an rna, or a genome or whatever.</p>
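<p>Using the directory layout listed under Definitions, the lookup could be sketched like this. attribute_dir is a hypothetical helper and not the internal method itself; the fid pattern fig|GENOME.TYPE.NUMBER is an assumption about id format, and /path/to/organisms stands in for $FIG_Config::organisms.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: map an id to the directory holding its attributes.
# Assumptions: feature ids look like fig|GENOME.TYPE.NUMBER and genome ids
# look like NUMBER.NUMBER.
sub attribute_dir {
    my ($organisms_dir, $id) = @_;
    if ($id =~ /^fig\|(\d+\.\d+)\.([^.]+)\.\d+$/) {
        return "$organisms_dir/$1/Features/$2/Attributes";
    }
    if ($id =~ /^\d+\.\d+$/) {
        return "$organisms_dir/$id/Attributes";
    }
    return;    # unrecognized id
}

print attribute_dir('/path/to/organisms', '83333.1'), "\n";
print attribute_dir('/path/to/organisms', 'fig|83333.1.peg.4'), "\n";
```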
