Revision 1.1
Wed Jul 1 01:01:18 2009 UTC (10 years, 3 months ago) by redwards
Branch: MAIN
CVS Tags: rast_rel_2014_0912, rast_rel_2010_0928, rast_rel_2009_0925, rast_rel_2010_0827, rast_rel_2014_0729, rast_rel_2009_07_09, myrast_33, rast_rel_2011_0928, rast_rel_2010_0526, rast_rel_2010_1206, rast_rel_2010_0118, rast_rel_2011_0119, HEAD
Adding the cgi, js, html, and Web services files for the real time metagenomics pages.

<html>
<head>
<title>Real Time Metagenomics</title>
  <link rel="stylesheet" type="text/css" href="css/RTMg.css" />
  <script language="JavaScript" src="./css/sorttable.js" > </script>
 </head>

<body>
<div id="header"> &nbsp; </div>
<div id="header"><a href="../RTMg.cgi" class="help">Return to the server</a></div>
<div id="title">Real Time Metagenomics FAQ!</div>

<h2>What is it?</h2>

<p>Tired of waiting for your metagenome to be annotated? So were we. Luckily, Argonne has some of the fastest computers and best brains. Here we present a new way to annotate metagenomes - the real time metagenomics annotation server.</p>

<h2>How does it work?</h2>

<p>You upload your fasta formatted file, and our servers get to work, chunking it up and annotating your data. We separate it out into smaller blocks, do what we do, assign a function to your sequences, and send the results back ASAP! It's that simple! You get to see what functions, subsystems, and organisms are in your metagenome within minutes of uploading it to our system. You can export the data to your favorite editor, and you can also get a list of all of your sequence IDs and what they are similar to.</p>
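<p>The server's own chunking code is not shown on this page. The Python sketch below only illustrates the general idea of reading fasta records and grouping them into smaller blocks. The function names <code>read_records</code> and <code>blocks</code>, and the block size of 100, are assumptions made for illustration.</p>

```python
def read_records(lines):
    """Yield (header, sequence) pairs from an iterable of fasta lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif line and header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def blocks(records, size=100):
    """Group records into lists of at most `size` records (size is an assumed value)."""
    block = []
    for record in records:
        block.append(record)
        if len(block) == size:
            yield block
            block = []
    if block:
        yield block
```

<p>Each block could then be annotated independently, which is what would let results start coming back before the whole file has been processed.</p>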

<h2>Why doesn't my sequence work?</h2>

<p>By far the most common failure we see is data that is not in valid fasta format. Please save your file as a text file (not binary, not Word, not OpenOffice), preferably with Unix line breaks. At this time, we will not report back if your file is not in valid fasta format; we may just leave you thinking we're thinking...</p>
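<p>Since the server does not report format problems, it can help to check a file before uploading it. The Python sketch below is one way to do that. The set of allowed sequence characters (IUPAC nucleotide codes plus <code>-</code> and <code>*</code>) and the function name <code>check_fasta</code> are assumptions; the server's own rules are not documented here.</p>

```python
import re

# Assumed whitelist: IUPAC nucleotide codes plus gap and stop symbols.
VALID_SEQ = re.compile(r"^[ACGTUNRYSWKMBDHVacgtunryswkmbdhv*-]+$")


def check_fasta(text):
    """Return a list of problems found in fasta text (an empty list means none found)."""
    problems = []
    if "\r" in text:
        problems.append("carriage returns found; save with Unix line breaks")
    pending_header = None  # line number of a header still waiting for sequence data
    seen_header = False
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if pending_header is not None:
                problems.append(f"line {pending_header}: header has no sequence")
            if len(line) == 1 or line[1].isspace():
                problems.append(f"line {number}: header has no sequence ID")
            pending_header = number
            seen_header = True
        elif not seen_header:
            problems.append(f"line {number}: sequence data before the first '>' header")
        else:
            if not VALID_SEQ.match(line):
                problems.append(f"line {number}: unexpected characters in sequence")
            pending_header = None
    if pending_header is not None:
        problems.append(f"line {pending_header}: header has no sequence")
    if not seen_header:
        problems.append("no '>' header lines found")
    return problems
```

<p>Reading the file with <code>open(path, newline="").read()</code> keeps any carriage returns in place so the line-break check can see them.</p>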

<h2>Do you have a sample file that works?</h2>

<p>Try <a href="http://bioseed.mcs.anl.gov/~redwards/51.hits_small.fa">this sample</a>, which is just a small part of a metagenome.</p>

<h2>What's the difference between <i>Assigning functions to sequences</i> and <i>Assigning functions to subsystems</i>?</h2>

<p>First, we assign a function to your sequence. That is done quickly and efficiently using our huge compute power. Then we aggregate just those functions into the subsystems that they represent. Because we don't know what subsystems we need to look up, this is slightly slower. We are optimizing the process, and soon they will both be instant!</p>


<h2>Do you save my data?</h2>

<p>No, we don't. Everything is deleted once we process the data. Nothing is saved.</p>

<h2>Does that mean this service is secure?</h2>

<p>No. We delete the files, but that doesn't mean we won't look at them as they are being processed to see what's going on. We're not making any assertions about the security of your data.</p>

<h2>Can I rerun my sample?</h2>

<p>Yes. It's easy, just upload the fasta file again and it will get re-run. Since we don't save the data, we don't have a mechanism to rerun it for you!</p>

<h2>Where do I enter my username?</h2>

<p>You don't! No registration is needed. Just upload a fasta file. We don't save your data, and we don't password protect it.</p>

<h2>Where do I get my data once it is processed?</h2>

<p>The data is processed in real time. If you close your browser window, the data will be lost. Leave the window open and the data will appear.</p>
 
<h2>Can I see some sample data sets?</h2>

<p>Yes! We have run all the <a href="http://metagenomics.nmpdr.org/">public metagenomes</a> through this server, and saved the results. They will be available as soon as Rob can figure out where to put them! The format of these is essentially the same as the text dump from the output, although we added job number details.</p>

<p>We also counted how long it took to annotate these metagenomes, so you can get an approximate idea of how long it will take to process your sample. Notice that 3,847 seconds is slightly over an hour for a 400 Mbp metagenome. However, because the results start coming back right away, that hour passes in mere minutes <small>(about 60 of them, plus/minus).</small></p>
<table class='sortable' id='timing'>
<thead>
<tr><th>Run</th><th>Number of Sequences</th><th>bp of sequence data</th><th>Time to run (s)</th></tr>
</thead>
<tr><td>1</td><td>4,645</td><td>465,209</td><td>6.05</td></tr>
<tr><td>2</td><td>715</td><td>755,429</td><td>7.56</td></tr>
<tr><td>3</td><td>730</td><td>796,793</td><td>8.03</td></tr>
<tr><td>4</td><td>12,446</td><td>1,190,841</td><td>16.61</td></tr>
<tr><td>5</td><td>2,947</td><td>2,380,900</td><td>20.57</td></tr>
<tr><td>6</td><td>39,807</td><td>3,375,494</td><td>29.39</td></tr>
<tr><td>7</td><td>6,797</td><td>6,091,740</td><td>46.84</td></tr>
<tr><td>8</td><td>12,686</td><td>8,016,534</td><td>71.34</td></tr>
<tr><td>9</td><td>9,017</td><td>8,764,614</td><td>72.52</td></tr>
<tr><td>10</td><td>12,821</td><td>8,214,974</td><td>73.43</td></tr>
<tr><td>11</td><td>94,915</td><td>10,283,401</td><td>93.54</td></tr>
<tr><td>12</td><td>9,958</td><td>14,499,070</td><td>131.95</td></tr>
<tr><td>13</td><td>85,527</td><td>18,994,386</td><td>133.83</td></tr>
<tr><td>14</td><td>267,640</td><td>27,366,887</td><td>226.71</td></tr>
<tr><td>15</td><td>289,723</td><td>30,795,962</td><td>304.3</td></tr>
<tr><td>16</td><td>154,069</td><td>35,762,224</td><td>330.34</td></tr>
<tr><td>17</td><td>399,343</td><td>41,653,979</td><td>345.49</td></tr>
<tr><td>18</td><td>50,096</td><td>52,667,848</td><td>444.82</td></tr>
<tr><td>19</td><td>61,020</td><td>64,230,062</td><td>599.17</td></tr>
<tr><td>20</td><td>770,825</td><td>80,663,537</td><td>658.85</td></tr>
<tr><td>21</td><td>101,558</td><td>105,196,135</td><td>1003.29</td></tr>
<tr><td>22</td><td>138,347</td><td>154,475,569</td><td>1332.65</td></tr>
<tr><td>23</td><td>189,052</td><td>205,008,796</td><td>1937.84</td></tr>
<tr><td>24</td><td>293,065</td><td>290,371,756</td><td>2354</td></tr>
<tr><td>25</td><td>296,355</td><td>315,151,139</td><td>2870.37</td></tr>
<tr><td>26</td><td>359,152</td><td>391,694,924</td><td>3847.02</td></tr>
</table>
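<p>For a rough estimate of how long your own sample might take, you can extrapolate from the table. The Python sketch below assumes that run time grows linearly with the number of bp, which the table only roughly supports, and uses four rows copied from it. The function name <code>estimate_seconds</code> is made up for this example.</p>

```python
# (bp of sequence data, seconds) pairs copied from runs 1, 13, 20 and 26 in the table.
TIMINGS = [
    (465_209, 6.05),
    (18_994_386, 133.83),
    (80_663_537, 658.85),
    (391_694_924, 3847.02),
]


def estimate_seconds(bp, timings=TIMINGS):
    """Estimate run time, assuming time scales linearly with the amount of sequence."""
    total_bp = sum(size for size, _ in timings)
    total_seconds = sum(seconds for _, seconds in timings)
    return bp * total_seconds / total_bp


# Example: a hypothetical 50 Mbp metagenome.
print(round(estimate_seconds(50_000_000)), "seconds (rough estimate)")
```

<p>Treat the result as a ballpark figure only. The per-run rates in the table vary quite a bit, and server load is not taken into account.</p>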

<h2>When I export my data, what do those columns mean?</h2>

<p>The columns are:</p>
<ol>
<li><span id="label">Sequence ID</span> The sequence IDs from the fasta file that you supplied. According to fasta format, this is the run of non-space characters immediately following the &gt;.</li>
<li><span id="label">Start of hit</span> The start position on your sequence for the match to a protein in our database</li>
<li><span id="label">End of hit</span> The end position on your sequence for the match to a protein in our database. If the end is less than the start, the match is on the opposite strand!</li>
<li><span id="label">OTU</span> The organism that your sequence is most similar to. Note that occasionally we can't accurately place a hit on the taxonomic tree, because it matches proteins from many different things. In the web page, we explicitly say that, but in the table it just says "null" so that you can compute on the numbers more easily!</li>
<li><span id="label">Function</span> The function that the protein is doing. This is what we propose your sequence is doing.</li>
<li><span id="label">Level 1</span> The top-level subsystem classification.</li>
<li><span id="label">Level 2</span> The second-level subsystem classification. Note that the second level classification does not necessarily have to be unique, but the tuple of [Level 1, Level 2] is unique. Don't worry if you don't really understand what that means; ask your programmer friend!</li>
<li><span id="label">Subsystem</span> The subsystem that the function plays a role in.</li>
</ol>
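<p>If you want to read the exported data in a script, the columns above map naturally onto a small record type. The Python sketch below assumes the export is tab-separated text with exactly these eight columns in this order and no header row; that layout is an assumption, so check it against your own export. The names <code>Hit</code> and <code>parse_hits</code> are made up for this example.</p>

```python
from typing import NamedTuple


class Hit(NamedTuple):
    sequence_id: str
    start: int
    end: int
    otu: str        # "null" when the hit could not be placed on the taxonomic tree
    function: str
    level1: str
    level2: str
    subsystem: str

    @property
    def strand(self):
        # An end position smaller than the start means the match is on the opposite strand.
        return "-" if self.end < self.start else "+"


def parse_hits(text):
    """Parse export text into Hit records (assumed layout: 8 tab-separated columns)."""
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 8:
            raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
        seq_id, start, end, *rest = fields
        hits.append(Hit(seq_id, int(start), int(end), *rest))
    return hits
```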


<h2>Why don't the numbers add up?</h2>

<p>We can't count!</p>
<p>In addition, each sequence can be assigned more than one function, and each function can be in more than one subsystem. If you look at the raw data using the "Export Hits" button, you will occasionally see that a sequence ID appears more than once, and if you scan along the row you will see examples of both these problems: more than one function, and more than one subsystem. How you use the data is up to you. Typically, we count all the functions, and all the subsystems, and then normalize to the total number. That's what the web pages are set up to display. However, we suggest that you grab the raw data, and parse it out yourself.</p>
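<p>If you do parse the raw data yourself, counting and normalizing takes only a few lines. The Python sketch below is one possible approach, not necessarily what the web pages do. It assumes each row is a list of the eight exported columns in the order described earlier, and it de-duplicates rows so that a hit listed once per subsystem contributes its function only once. The name <code>normalized_counts</code> is made up for this example.</p>

```python
from collections import Counter

# Zero-based column positions, assuming the exported column order described earlier.
FUNCTION, SUBSYSTEM = 4, 7


def normalized_counts(rows, column):
    """Return each value's share of the total for one column.

    Rows are de-duplicated on (sequence ID, start, end, value) first, so a hit
    that appears in several rows is counted once per distinct value.
    """
    unique = {(row[0], row[1], row[2], row[column]) for row in rows}
    counts = Counter(value for *_, value in unique)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}
```

<p>Calling <code>normalized_counts(rows, FUNCTION)</code> and <code>normalized_counts(rows, SUBSYSTEM)</code> on the same rows will generally give totals based on different numbers of hits, which is one reason the numbers on the page do not add up.</p>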


<h2>Why are some of my sequence IDs missing?</h2>

<p>We forgot about them!</p>
<p>In addition, not every one of your sequences will be assigned a function. Depending on the source of your sample it may be as few as &lt; 1% or as many as &gt; 50% of your sequences.</p>
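<p>To see which of your sequences did not get a function, you can compare the IDs in your fasta file with the IDs in the exported hits. The Python sketch below assumes the export is tab-separated with the sequence ID in the first column; the function name <code>unassigned_ids</code> is made up for this example.</p>

```python
import re


def unassigned_ids(fasta_text, export_text):
    """Return IDs present in the fasta input but absent from the export, in input order."""
    # First tab-separated field of every non-empty export line (assumed layout).
    hit_ids = {line.split("\t", 1)[0] for line in export_text.splitlines() if line.strip()}
    # A sequence ID is the run of non-space characters immediately after '>'.
    all_ids = re.findall(r"^>(\S+)", fasta_text, flags=re.MULTILINE)
    return [seq_id for seq_id in all_ids if seq_id not in hit_ids]
```

<p>Dividing the length of the returned list by the number of sequences in the file gives the unassigned fraction for your sample.</p>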

<h2>Can you add my metagenome to the list of publicly available metagenomes?</h2>

<p>Yes. The simplest way is to run it through the <a href="http://metagenomics.nmpdr.org/">metagenomics RAST</a> site, and make it publicly available there. Next time we compute these, we'll run it and include it.</p>
 


</body>
</html>