
Revision 1.5 - Thu Apr 7 19:16:10 2005 UTC by redwards
Branch: MAIN
Changes since 1.4: +3 -1 lines
many changes to key/value pairs to make them work

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
	<title>SEED/PIR Help Information</title>
	<link rel='stylesheet' title='help' href='css/help.css' type='text/css'>
<h1>Help On SEED/PIR Comparisons</h1>
<center>Rob Edwards April 1, 2005</center>

<p>The SEED/PIR comparisons are controlled by the script <a href="/FIG/pir.cgi">pir.cgi</a>. This page describes the functions that script provides.</p>

<p>There are many ways of comparing the data in the SEED with the PIR superfamilies. You can choose a comparison directly from the list menu, you can ask for the proteins that are in many superfamilies or in many subsystems, and you can start from the subsystem spreadsheets. This help describes some of these entry points and what the data mean. It is linked from the relevant pages so that you can refer back to it when necessary.</p>

<p>First there is a simple comparison of PIR and SEED functions. The list summarizes the PIR superfamilies and the number of PEGs that map directly to each superfamily. There should be a many-to-one relationship here, because each superfamily contains many PEGs.<br />

Immediately below the list are some options that control which superfamilies are displayed:</p>

<a name="menu"><h2>Control menu contents</h2></a>
<p>You can set the minimum number of PEGs that a superfamily must contain. With this option you can limit the list to superfamilies that have multiple PEGs in the SEED database, and whose correspondences are therefore likely to be more consistent.
<span class="example">Minimum number of pegs per PIR superfamily shown in list &nbsp; <input type='text' name='min' value='10' size=3 /></span></p>
<p>You can also choose to show all superfamilies (this is equivalent to setting the minimum number to 1): <span class="example">or show all PIR superfamilies: <input type="checkbox" name="showall" value="on" /></span></p>
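<p>The following Python sketch illustrates how the two options above interact. It is not taken from pir.cgi. It assumes, hypothetically, that the PEG-to-superfamily mappings are available as (PEG, PIRSF) pairs, and the function names are illustrative only.</p>

```python
from collections import Counter


def pegs_per_superfamily(mappings):
    """Count the PEGs mapped to each PIR superfamily (many PEGs -> one superfamily).

    `mappings` is a hypothetical iterable of (peg_id, pirsf_id) pairs.
    """
    return Counter(pirsf for _peg, pirsf in mappings)


def superfamilies_for_menu(counts, min_pegs=10, show_all=False):
    """Return (pirsf_id, n_pegs) pairs with at least `min_pegs` PEGs, largest first.

    show_all=True is treated exactly like min_pegs=1, as the help text describes.
    """
    threshold = 1 if show_all else min_pegs
    kept = [(pirsf, n) for pirsf, n in counts.items() if n >= threshold]
    # Sort by descending PEG count, then by PIRSF id so the order is deterministic.
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

<p>For example, <code>superfamilies_for_menu(counts, min_pegs=10)</code> keeps only superfamilies with ten or more PEGs, which matches the default value shown in the menu.</p>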
<p>The next option shows the subsystem counts in the menu, that is, the correspondence between PIR superfamilies and SEED subsystems. This need not be a one-to-one relationship, because one superfamily can be represented in more than one subsystem. <br />For example, "Pyrophosphate--fructose 6-phosphate 1-phosphotransferase, alpha subunit (EC" is in the subsystems "Embden-Meyerhof and Gluconeogenesis" and "Fructose_and_Mannose_metabolism".
Checking the box shows the correspondence. On some machines this may take a minute or two to compute.<br />
<span class="example"><input type="checkbox" name="showsubsys" value="on" />Show subsystem counts in list</span></p>
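<p>The one-to-many relationship between a superfamily and subsystems can be pictured with this small Python sketch. It is a hypothetical illustration, not the script's code. It assumes two lookup tables: one from PEG to PIRSF id, and one from PEG to the subsystems that PEG is in.</p>

```python
from collections import defaultdict


def subsystems_per_superfamily(peg_to_pirsf, peg_to_subsystems):
    """Map each superfamily to the sorted list of subsystems its PEGs belong to.

    peg_to_pirsf: hypothetical dict PEG id -> PIRSF id
    peg_to_subsystems: hypothetical dict PEG id -> list of subsystem names
    Superfamilies whose PEGs are in no subsystem do not appear in the result.
    """
    result = defaultdict(set)
    for peg, pirsf in peg_to_pirsf.items():
        for subsystem in peg_to_subsystems.get(peg, []):
            result[pirsf].add(subsystem)
    return {pirsf: sorted(subs) for pirsf, subs in result.items()}
```

<p>A superfamily whose list has more than one entry, like the pyrophosphate-fructose 6-phosphate example above, is the case this menu option reveals.</p>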

<p>By default the list shows only the fully annotated PIR superfamilies, not the preliminary ones. You can reverse this and display only the preliminary superfamilies if desired; the correspondences in preliminary superfamilies may be less well developed. <span class="example"><input type="checkbox" name="preliminary" value="on" />Show only preliminary PIR superfamilies</span></p>

<p>You can also limit the PIR superfamilies shown in the list with a text filter. The match is case-insensitive, and you can search for a word such as "glutamate" or a superfamily number such as "729".</p>
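<p>The preliminary-superfamily option and the text filter can be sketched together in Python. This is an assumption-laden illustration, not the script's implementation. It supposes that each superfamily is a (PIRSF id, name) pair and that names carry a "(Full)" or "(Preliminary)" prefix, as in the example table further down this page.</p>

```python
def filter_superfamilies(superfamilies, preliminary_only=False, text=""):
    """Filter (pirsf_id, name) pairs the way the menu options describe.

    Assumes names start with "(Full)" or "(Preliminary)", as in the example table.
    The text filter is a case-insensitive substring match on the id or the name.
    """
    wanted_prefix = "(Preliminary)" if preliminary_only else "(Full)"
    needle = text.casefold()
    kept = []
    for pirsf, name in superfamilies:
        if not name.startswith(wanted_prefix):
            continue  # full and preliminary superfamilies are shown exclusively
        if needle and needle not in pirsf.casefold() and needle not in name.casefold():
            continue  # text given, but it matches neither the number nor the name
        kept.append((pirsf, name))
    return kept
```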

<p>Three buttons are then available. The first updates the view, redisplaying the same page with the options you have selected here. The second shows the correspondence between the PIR superfamily and the SEED. The third resets the list to its original values. The correspondence is <a href="#correspondence">described below</a>.</p>

<a name="datatables"><h2>Generate Data Tables</h2></a>

<p>Generating the tables takes about five minutes, so be patient and wait for the results. Resist the temptation to click the button again.</p>
<p>Two options control the data that are presented. See below for an example of the data that will be returned.</p>

<p>The first option determines which correspondences you see. If the box is checked (the default), you will see only those superfamilies that contain proteins that are in subsystems. A superfamily contains several proteins, and some of them may be in subsystems as well as in the superfamily. However, there are also superfamilies none of whose proteins are in subsystems. These are hidden by default because we make no assertions about the annotations of those proteins.</p>

<p>The second option sets the sort order of the returned table: either by the "Number of annotations in subsystems" or by the "Number of SEED annotations". These columns are described below.</p>
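<p>The two table options amount to a filter followed by a sort. The Python sketch below is hypothetical. It assumes the per-superfamily counts have already been computed, and it also assumes a descending sort so that the superfamilies needing the most attention come first. The row type and its field names are illustrative, not the script's.</p>

```python
from typing import NamedTuple


class TableRow(NamedTuple):
    """One hypothetical row of the generated data table."""
    pirsf: str
    name: str
    n_subsystem_annotations: int
    n_seed_annotations: int
    subsystems: tuple


# The two sort choices offered on the page.
SORT_KEYS = {
    "subsystems": lambda row: row.n_subsystem_annotations,
    "seed": lambda row: row.n_seed_annotations,
}


def build_table(rows, subsystems_only=True, sort_by="subsystems"):
    """Optionally drop superfamilies with no proteins in subsystems, then sort descending."""
    if subsystems_only:
        rows = [row for row in rows if row.subsystems]
    return sorted(rows, key=SORT_KEYS[sort_by], reverse=True)
```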

<p>The returned table will look something like this:</p>

<div class="example">
<table border=1>
	<tr>
		<th>Number of annotations in subsystems</th>
		<th>Number of SEED annotations</th>
		<th>PIRSF<br><small>(Link goes to SEED/PIR comparison)</small></th>
		<th>Superfamily name</th>
		<th>Subsystems in superfamily</th>
	</tr>
	<tr>
		<td></td>
		<td></td>
		<td><a href="/FIG/pir.cgi?pirsf=PIRSF001370&ssonly='1'&user=''">PIRSF001370</a></td>
		<td>(Full) thiamine diphosphate-dependent enzyme, acetolactate synthase type</td>
		<td>Valine_Biosynthesis; Acetoin_metabolism; Valine_Synthesis; Xanthine_to_Glycine; Allantoin_degradation; Inositol_catabolism</td>
	</tr>
	<tr>
		<td></td>
		<td></td>
		<td><a href="/FIG/pir.cgi?pirsf=PIRSF002891&ssonly='1'&user=''">PIRSF002891</a></td>
		<td>(Preliminary) rod protein flgF</td>
	</tr>
	<tr>
		<td></td>
		<td></td>
		<td><a href="/FIG/pir.cgi?pirsf=PIRSF005419&ssonly='1'&user=''">PIRSF005419</a></td>
		<td>(Preliminary) Type III secretion system/flagellar apparatus protein, InvA/LcrD/FlhA type</td>
		<td>Vibrio_Experimental_Type_III_secretion_system_; Flagellum; Type_III_secretion_system</td>
	</tr>
	<tr>
		<td></td>
		<td></td>
		<td><a href="/FIG/pir.cgi?pirsf=PIRSF004862&ssonly='1'&user=''">PIRSF004862</a></td>
		<td>(Preliminary) probable flagellar basal-body M ring protein</td>
	</tr>
	<tr>
		<td></td>
		<td></td>
		<td><a href="/FIG/pir.cgi?pirsf=PIRSF006184&ssonly='1'&user=''">PIRSF006184</a></td>
		<td>(Preliminary) flagellar basal body P-ring protein flgI</td>
	</tr>
</table>
</div>

<p style="font-size: smaller">Note that the header is repeated throughout the table to keep it clear which column is which.</p>

<p>The table contains the following information:</p>
<ul>
	<li>Number of annotations in subsystems
	<p>This is the number of different annotations in this superfamily, counting only those annotations that are in subsystems. A large number indicates that the superfamily covers proteins with different roles in the subsystems, and that there is probably a conflict between the superfamily and the SEED. These are the superfamilies, subsystems, or annotations that need the most attention.</p></li>
	<li>Number of SEED annotations
	<p>This is the total number of different annotations that this superfamily encompasses in the SEED, including those of proteins that are not in subsystems. Here, too, a lower number is better, and the excess over the first column probably represents proteins that have not yet been included in subsystems.</p></li>
	<li>PIRSF
	<p>The number of the PIR superfamily. The link takes you to the correspondence between this superfamily and the SEED database. See the <a href="#correspondence">correspondence</a> help below.</p></li>
	<li>Superfamily name
	<p>The name of the superfamily.</p></li>
	<li>Subsystems in superfamily
	<p>The subsystems that proteins in this superfamily belong to.</p></li>
</ul>
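<p>The relationship between the first two columns can be made concrete with a short Python sketch. It is a hypothetical illustration, not the script's code. It assumes each PEG in a superfamily is given as a (function, subsystems) pair. The first column counts distinct functions among PEGs that are in at least one subsystem. The second counts distinct functions among all PEGs.</p>

```python
def annotation_counts(proteins):
    """Return (annotations in subsystems, all SEED annotations) for one superfamily.

    proteins: hypothetical list of (function, subsystems) pairs, one per PEG;
    `subsystems` is an empty list when the PEG is in no subsystem.
    """
    in_subsystems = {function for function, subsystems in proteins if subsystems}
    everything = {function for function, _subsystems in proteins}
    return len(in_subsystems), len(everything)
```

<p>Applied to the ribosomal protein L16 rows in the correspondence example below, this gives one annotation in subsystems and two SEED annotations. The extra annotation comes from proteins that are not yet in a subsystem.</p>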

<a name="correspondence"><h2>Correspondence between SEED and PIR</h2></a>
<p>An example of the correspondence table is shown below:</p>

<div class="example">
<table border>
	<caption><b>Correspondence between SEED and PIR</b></caption>
	<tr>
		<th>PIR Superfamily<br><small>Link goes to PIR</small></th>
		<th>Genome</th>
		<th>UniProt ID</th>
		<th>PEG</th>
		<th>FIG Function</th>
		<th>FIG Subsystem</th>
	</tr>
	<tr>
		<td><a href='http://pir.georgetown.edu/sfcs-cgi/new/pirclassif.pl?id=SF002185'>PIRSF002185</a>(Preliminary) Escherichia coli ribosomal protein L16</td>
		<td>Buchnera aphidicola str. APS (Acyrthosiphon pisum)</td>
		<td><a href="http://www.pir.uniprot.org/cgi-bin/upEntry?id=P57584">uni|P57584</a></td>
		<td><a href="/FIG/protein.cgi?prot=fig|107806.1.peg.492&user='master:RobE'">492</a></td>
		<td style="background: #C0C0C0">LSU ribosomal protein L16p (L10e)</td>
		<td><a href="subsys.cgi?&amp;user='master:RobE'&amp;ssa_name=Ribosome_LSU_bacterial&amp;request=show_ssa">Ribosome LSU bacterial</a></td>
	</tr>
	<tr>
		<td><a href='http://pir.georgetown.edu/sfcs-cgi/new/pirclassif.pl?id=SF002185'>PIRSF002185</a>(Preliminary) Escherichia coli ribosomal protein L16</td>
		<td>Acanthamoeba castellanii</td>
		<td><a href="http://www.pir.uniprot.org/cgi-bin/upEntry?id=P46768">uni|P46768</a></td>
		<td><a href="/FIG/protein.cgi?prot=fig|5755.1.peg.26&user='master:RobE'">26</a></td>
		<td style="background: #FFE4B5">LSU ribosomal protein L16</td>
	</tr>
	<tr>
		<td><a href='http://pir.georgetown.edu/sfcs-cgi/new/pirclassif.pl?id=SF002185'>PIRSF002185</a>(Preliminary) Escherichia coli ribosomal protein L16</td>
		<td>Naegleria gruberi</td>
		<td><a href="http://www.pir.uniprot.org/cgi-bin/upEntry?id=Q9G8Q3">uni|Q9G8Q3</a></td>
		<td><a href="/FIG/protein.cgi?prot=fig|5762.1.peg.26&user='master:RobE'">26</a></td>
		<td style="background: #FFE4B5">LSU ribosomal protein L16</td>
	</tr>
	<tr>
		<td><a href='http://pir.georgetown.edu/sfcs-cgi/new/pirclassif.pl?id=SF002185'>PIRSF002185</a>(Preliminary) Escherichia coli ribosomal protein L16</td>
		<td>Aquifex aeolicus VF5</td>
		<td><a href="http://www.pir.uniprot.org/cgi-bin/upEntry?id=O66438">uni|O66438</a></td>
		<td><a href="/FIG/protein.cgi?prot=fig|224324.1.peg.11&user='master:RobE'">11</a></td>
		<td style="background: #C0C0C0">LSU ribosomal protein L16p (L10e)</td>
		<td><a href="subsys.cgi?&amp;user='master:RobE'&amp;ssa_name=Ribosome_LSU_bacterial&amp;request=show_ssa">Ribosome LSU bacterial</a></td>
	</tr>
	<tr>
		<td><a href='http://pir.georgetown.edu/sfcs-cgi/new/pirclassif.pl?id=SF002185'>PIRSF002185</a>(Preliminary) Escherichia coli ribosomal protein L16</td>
		<td>Guillardia theta</td>
		<td><a href="http://www.pir.uniprot.org/cgi-bin/upEntry?id=O46901">uni|O46901</a></td>
		<td><a href="/FIG/protein.cgi?prot=fig|55529.1.peg.117&user='master:RobE'">117</a></td>
		<td style="background: #FFE4B5">LSU ribosomal protein L16</td>
	</tr>
</table>
</div>


<p>The table contains the following information:</p>
<ul>
	<li>PIR Superfamily
	<p>The name of the superfamily, with a link to the PIR web page that describes it.</p></li>
	<li>Genome
	<p>The name of the genome.</p></li>
	<li>UniProt ID
	<p>The UniProt ID, with a link to the PIR web page that describes the protein.</p></li>
	<li>PEG
	<p>Only the PEG number within the genome is shown, which keeps the link short: for example, 43 instead of the full FIG ID fig|83333.1.peg.43. The link takes you to the SEED protein page.</p></li>
	<li>FIG Function
	<p>The function of the protein in the SEED database. Identical functions share the same color, so you can easily see which proteins have the same function and which do not.</p></li>
	<li>FIG Subsystem
	<p>All of the subsystems that the protein is present in.</p></li>
</ul>
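<p>Two display details of this table, the short PEG label and the shared color per function, can be sketched in Python as follows. This is a hypothetical illustration. The palette is borrowed from the colors that appear in the examples on this page and is not the script's actual palette. Colors are reused once the palette is exhausted.</p>

```python
import itertools

# Hypothetical palette; the first two entries are the colors used in the example above.
PALETTE = ["#C0C0C0", "#FFE4B5", "#00FF80", "#C0FF00", "#FF40C0", "#FF8040"]


def peg_number(fig_id):
    """Shorten a FIG id such as 'fig|83333.1.peg.43' to its PEG number '43'."""
    return fig_id.rsplit(".", 1)[-1]


def colors_by_function(functions):
    """Give each distinct function one color, in order of first appearance.

    Identical function strings always get the same color; after the palette
    runs out, colors are reused from the start.
    """
    colors = {}
    palette = itertools.cycle(PALETTE)
    for function in functions:
        if function not in colors:
            colors[function] = next(palette)
    return colors
```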

<p>At the top of the page there is a link to either <span class="example">Show All Matches</span> or <span class="example">Show only matches with a subsystem</span>. The former shows every match between the PIR superfamily and the SEED database; the latter shows only those proteins that are present in subsystems.</p>

<a name="updates"><h2>Update Data</h2></a>
<p>You can download new data from the PIR ftp site and install it directly. This is a two-step process. First, we check whether the remote file is more recent than the local one (there is no point in updating otherwise, unless, of course, you have added new genomes). Second, we download and install the data.</p>

<p>Click the <input type="submit" name="submit" value="Check for updates" /> button to see whether updates are available. You will then see a page something like this (of course, the times will be different!):</p>

<p>This example is for files that are all current.</p>
<div class="example">
<p>The local file is up to date and there is no need to update your source PIR superfamilies.</p><p>The remote file was modified on Fri Jan 14 13:26:47 2005</p><p>The local file was modified on Sat Jan 29 11:31:55 2005</p>
<input type="submit" name="submit" value="Update Anyway" />
</div>

<p>If your files are not current, then the message will be something like this:</p>
<div class="example">
<p>The remote file ftp://ftp.pir.georgetown.edu/pir_databases/pirsf/data/pirsfinfo.dat is newer than your current file. You should proceed with the update.</p>
<p>The remote file was modified on Fri Jan 14 13:26:47 2005</p>
<p>The local file was modified on Wed Dec 29 11:31:55 2004</p>
<input type="submit" name="submit" value="Update Data" />
</div>

<p>If there is a problem with the internet connection, or the file cannot be accessed for some other reason (e.g. the name is wrong), you will see an error like this and will not be given the option to proceed with the update:</p>
<div class="example">

<p class="error">Could not connect to PIR to check the status of the PIR file. Please check the location of ftp://ftp.pir.georgetown.edu/pir_databases/pirsf/data/pirsfinfo.dat</p>
</div>
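<p>The three outcomes above can be sketched in Python. The sketch rests on several assumptions. It is not how pir.cgi performs the check. It assumes the PIR ftp server answers the standard FTP <code>MDTM</code> command (RFC 3659), and it hard-codes no host or path, although the ones named in the messages above could be passed in. The helper names are hypothetical. Only the pure functions <code>parse_mdtm</code> and <code>update_status</code> are needed to decide which message to show.</p>

```python
import ftplib
import os
from datetime import datetime, timezone


def parse_mdtm(reply):
    """Parse an FTP MDTM reply such as '213 20050114132647' into a UTC datetime."""
    code, _, stamp = reply.partition(" ")
    if code != "213":
        raise ValueError(f"unexpected MDTM reply: {reply!r}")
    # MDTM times are UTC; ignore any fractional seconds after the first 14 digits.
    return datetime.strptime(stamp.strip()[:14], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def remote_mtime(host, path, timeout=30):
    """Modification time of a file on an FTP server, or None if it cannot be found."""
    try:
        with ftplib.FTP(host, timeout=timeout) as ftp:
            ftp.login()  # anonymous login
            return parse_mdtm(ftp.sendcmd("MDTM " + path))
    except ftplib.all_errors + (ValueError,):
        return None


def local_mtime(filename):
    """Modification time of the local copy, or None if there is no local copy."""
    try:
        return datetime.fromtimestamp(os.path.getmtime(filename), tz=timezone.utc)
    except OSError:
        return None


def update_status(remote, local):
    """Choose which of the three pages shown above applies."""
    if remote is None:
        return "error"    # could not reach PIR: no update button is offered
    if local is None or remote > local:
        return "update"   # remote file is newer: "Update Data"
    return "current"      # local file is up to date: "Update Anyway"
```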


<p>Clicking the "Update Data" or "Update Anyway" button starts the download and reruns the comparison of the SEED data with the PIR data. Because this takes a significant amount of time and resources, the download and installation run in the background using the script 'load_pirsf'. You can monitor progress in the <a href="/FIG/seed_ctl.cgi">SEED control panel</a>. While the data are being installed you should not use the PIR superfamilies: although they will still appear, they are being edited, added, and deleted, and are therefore unstable. Installation should take about 10-15 minutes.</p>

<p>Once the update has run, you will see the front page again, with a message telling you that the update is complete.</p>

<a name="spreadsheet"><h2>Subsystem Spreadsheets</h2></a>

<p>The correspondence between PIR and SEED is highlighted in the subsystem spreadsheets. A sample of a few columns is shown below. <small>Note that this table is for demonstration purposes only, and the correspondence will likely change.</small></p>

<div class="example">
<table border=1>
	<tr>
		<td><b>Genome ID</b></td>
		<td><b>Variant Code</b></td>
	</tr>
	<tr>
		<td><input type="text" name="genome214092.1" value="214092.1" size="15" /></td>
		<td>Yersinia pestis CO92 [B]</td>
		<td><input type="text" name="vcode214092.1" value="0311" size="10" /></td>
		<td bgcolor="#FFFFFF"><a href=/FIG/protein.cgi?prot=fig|214092.1.peg.2277&user=master:RobE>2277</a></td>
		<td bgcolor="#00FF80"><a href=/FIG/protein.cgi?prot=fig|214092.1.peg.3343&user=master:RobE>3343</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF000544">10</a>] </sup> </td>
		<td bgcolor="#C0C0C0"><a href=/FIG/protein.cgi?prot=fig|214092.1.peg.3345&user=master:RobE>3345</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF002936">1</a>] </sup> </td>
		<td bgcolor="#FFFFFF"><a href=/FIG/protein.cgi?prot=fig|214092.1.peg.3349&user=master:RobE>3349</a></td>
		<td bgcolor="#C0FF00"><a href=/FIG/protein.cgi?prot=fig|214092.1.peg.3350&user=master:RobE>3350</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF000207">9</a>] </sup> </td>
		<td bgcolor="#FF40C0"><a href=/FIG/protein.cgi?prot=fig|214092.1.peg.3344&user=master:RobE>3344</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF003010">2</a>] </sup> </td>
		<td bgcolor="#FFFFFF"><a href=/FIG/protein.cgi?prot=fig|214092.1.peg.3504&user=master:RobE>3504</a></td>
		<td bgcolor="#FF8040"><a href=/FIG/protein.cgi?prot=fig|214092.1.peg.3079&user=master:RobE>3079</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF001536">3</a>] </sup> </td>
	</tr>
	<tr>
		<td><input type="text" name="genome223926.1" value="223926.1" size="15" /></td>
		<td>Vibrio parahaemolyticus RIMD 2210633 [B]</td>
		<td><input type="text" name="vcode223926.1" value="0200" size="10" /></td>
		<td bgcolor="#FFFFFF"><a href=/FIG/protein.cgi?prot=fig|223926.1.peg.1101&user=master:RobE>1101</a></td>
		<td bgcolor="#00FF80"><a href=/FIG/protein.cgi?prot=fig|223926.1.peg.296&user=master:RobE>296</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF000544">10</a>] </sup> </td>
		<td bgcolor="#C0C0C0"><a href=/FIG/protein.cgi?prot=fig|223926.1.peg.292&user=master:RobE>292</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF002936">1</a>] </sup> </td>
		<td bgcolor="#FFFFFF"><a href=/FIG/protein.cgi?prot=fig|223926.1.peg.2721&user=master:RobE>2721</a></td>
		<td bgcolor="#C0FF00"><a href=/FIG/protein.cgi?prot=fig|223926.1.peg.2722&user=master:RobE>2722</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF000207">9</a>] </sup> </td>
		<td bgcolor="#FF40C0"><a href=/FIG/protein.cgi?prot=fig|223926.1.peg.293&user=master:RobE>293</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF003010">2</a>] </sup> </td>
		<td bgcolor="#FFFFFF"></td>
		<td bgcolor="#FF8040"><a href=/FIG/protein.cgi?prot=fig|223926.1.peg.1150&user=master:RobE>1150</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF001536">3</a>] </sup> </td>
	</tr>
	<tr>
		<td><input type="text" name="genome198214.1" value="198214.1" size="15" /></td>
		<td>Shigella flexneri 2a str. 301 [B]</td>
		<td><input type="text" name="vcode198214.1" value="1311" size="10" /></td>
		<td bgcolor="#FFFFFF"><a href=/FIG/protein.cgi?prot=fig|198214.1.peg.1222&user=master:RobE>1222</a></td>
		<td bgcolor="#00FF80"><a href=/FIG/protein.cgi?prot=fig|198214.1.peg.2595&user=master:RobE>2595</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF000544">10</a>] </sup> </td>
		<td bgcolor="#C0C0C0"><a href=/FIG/protein.cgi?prot=fig|198214.1.peg.2597&user=master:RobE>2597</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF002936">1</a>] </sup> </td>
		<td bgcolor="#C08080"><a href=/FIG/protein.cgi?prot=fig|198214.1.peg.2601&user=master:RobE>2601</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF000244">8</a>] </sup> </td>
		<td bgcolor="#C0FF00"><a href=/FIG/protein.cgi?prot=fig|198214.1.peg.2602&user=master:RobE>2602</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF000207">9</a>] </sup> </td>
		<td bgcolor="#FF40C0"><a href=/FIG/protein.cgi?prot=fig|198214.1.peg.2596&user=master:RobE>2596</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF003010">2</a>] </sup> </td>
		<td bgcolor="#FFFFFF"><a href=/FIG/protein.cgi?prot=fig|198214.1.peg.4023&user=master:RobE>4023</a></td>
		<td bgcolor="#FF8040"><a href=/FIG/protein.cgi?prot=fig|198214.1.peg.444&user=master:RobE>444</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF001536">3</a>] </sup> </td>
	</tr>
	<tr>
		<td><input type="text" name="genome272558.1" value="272558.1" size="15" /></td>
		<td>Bacillus halodurans C-125 [B]</td>
		<td><input type="text" name="vcode272558.1" value="2200" size="10" /></td>
		<td bgcolor="#FFFFFF"></td>
		<td bgcolor="#00FF80"><a href=/FIG/protein.cgi?prot=fig|272558.1.peg.1489&user=master:RobE>1489</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF000544">10</a>] </sup> </td>
		<td bgcolor="#FFFFFF"></td>
		<td bgcolor="#C08080"><a href=/FIG/protein.cgi?prot=fig|272558.1.peg.610&user=master:RobE>610</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF000244">8</a>] </sup> </td>
		<td bgcolor="#C0FF00"><a href=/FIG/protein.cgi?prot=fig|272558.1.peg.609&user=master:RobE>609</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF000207">9</a>] </sup> </td>
		<td bgcolor="#FFFFFF"></td>
		<td bgcolor="#FFFFFF"></td>
		<td bgcolor="#FFC040"><a href=/FIG/protein.cgi?prot=fig|272558.1.peg.111&user=master:RobE>111</a>, <a href=http://theseed.uchicago.edu/FIG/protein.cgi?prot=fig|272558.1.peg.112&user=master:RobE>112</a> &nbsp; <sup> [<a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF005520">5</a>, <a href="pir.cgi?&amp;user=master:RobE&amp;pirsf=PIRSF001536">3</a>] </sup> </td>
	</tr>
</table>
</div>

<p>The columns of the table are colored based on the superfamilies that the proteins are in, and in theory each column should be the same color and complete throughout. 
<br />
Note that the small numbers that are slightly superscripted <span class="example"><sup style="margin: 1px"> [<a href="/FIG/pir.cgi?&user=master:RobE&pirsf=PIRSF005520">5</a>] </SUP></span> are linked to the PIR <a href="#correspondence">correspondence</a> table so you can click through and see proteins missing from either side as described above.</p>
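<p>The idea that each column should be one color and complete throughout can be checked mechanically. The Python sketch below is hypothetical and is not a feature of the spreadsheet code. For one spreadsheet column, it takes the PIRSF ids found in each genome's cell, picks the most common superfamily, and lists the genomes whose cell lacks it. If superfamilies tie, the one encountered first wins.</p>

```python
from collections import Counter


def column_consistency(cells):
    """Check one spreadsheet column for a single, complete PIR superfamily.

    cells: hypothetical dict genome id -> list of PIRSF ids linked from that
    genome's cell (an empty list for a white cell or a cell with no PEG).
    Returns (majority PIRSF id or None, sorted genome ids missing it).
    """
    counts = Counter(pirsf for pirsfs in cells.values() for pirsf in set(pirsfs))
    if not counts:
        return None, sorted(cells)
    majority, _count = counts.most_common(1)[0]
    missing = sorted(genome for genome, pirsfs in cells.items() if majority not in pirsfs)
    return majority, missing
```

<p>For the column whose Shigella and Bacillus cells link to PIRSF000244 in the example above, this reports Yersinia (214092.1) and Vibrio (223926.1) as the genomes missing from that superfamily.</p>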

<p>This example demonstrates the different aspects of the PIR/SEED interactions:</p>
<ul>
<li>Protein in a SEED subsystem but not in a PIR superfamily.<br />
cysB is a conserved family in the SEED, but there is no corresponding PIR superfamily for it yet.</li>
<li>Protein in a SEED subsystem and in a PIR superfamily.<br />
cysC and cysJ are consistent across both PIR superfamilies and SEED subsystems.</li>
<li>Proteins in a PIR superfamily but not in a SEED subsystem.<br />
You can't see these in this table; you have to look at the <a href="#correspondence">correspondence</a> table.</li>
<li>Proteins in a SEED subsystem but missing from the PIR superfamily.<br />
For example, cysI is missing from the superfamily for the first two organisms.</li>
<li>Duplicate proteins.<br />
Some roles, like cysS, either have two proteins in one cell or have one protein that maps to two superfamilies. These are shown as more than one entry in the superscript.</li>
</ul>
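<p>The categories listed above reduce to set operations. The following Python sketch is a hypothetical illustration, not the spreadsheet's implementation. It assumes two inputs: the set of PEGs in a subsystem, and a map from every PEG to the superfamilies it belongs to. As the list notes, the second input comes from the correspondence data rather than the spreadsheet.</p>

```python
def classify_pegs(subsystem_pegs, peg_to_superfamilies):
    """Sort PEGs into the PIR/SEED categories described above.

    subsystem_pegs: hypothetical set of PEG ids in the subsystem
    peg_to_superfamilies: hypothetical dict PEG id -> set of PIRSF ids
    """
    superfamily_pegs = {peg for peg, pirsfs in peg_to_superfamilies.items() if pirsfs}
    return {
        "subsystem_and_superfamily": sorted(subsystem_pegs & superfamily_pegs),
        "subsystem_only": sorted(subsystem_pegs - superfamily_pegs),
        "superfamily_only": sorted(superfamily_pegs - subsystem_pegs),
        "several_superfamilies": sorted(
            peg for peg, pirsfs in peg_to_superfamilies.items() if len(pirsfs) > 1
        ),
    }


def cells_with_several_pegs(cells):
    """Spreadsheet cells holding more than one PEG (the other kind of duplicate).

    cells: hypothetical dict (genome id, role) -> list of PEG ids in that cell
    """
    return sorted(key for key, pegs in cells.items() if len(pegs) > 1)
```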

