
View of /FigKernelScripts/svr_find_regulatory_proteins.pl

Revision 1.1
Sun Nov 7 15:09:00 2010 UTC by overbeek
Branch: MAIN
CVS Tags: mgrast_dev_08112011, mgrast_dev_08022011, rast_rel_2014_0912, myrast_rel40, mgrast_dev_05262011, mgrast_dev_04082011, mgrast_version_3_2, mgrast_dev_12152011, mgrast_dev_06072011, rast_rel_2014_0729, mgrast_dev_02212011, rast_rel_2010_1206, mgrast_release_3_0, mgrast_dev_03252011, rast_rel_2011_0119, mgrast_release_3_0_4, mgrast_release_3_0_2, mgrast_release_3_0_3, mgrast_release_3_0_1, mgrast_dev_03312011, mgrast_release_3_1_2, mgrast_release_3_1_1, mgrast_release_3_1_0, mgrast_dev_04132011, mgrast_dev_04012011, myrast_33, rast_rel_2011_0928, mgrast_dev_04052011, mgrast_dev_02222011, mgrast_dev_10262011, HEAD
a simple tool to find potential regulatory proteins
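The script that follows is essentially a two-stage filter on the tab-separated lines that `svr_assign_using_figfams` produces. A line is kept only if its first field (`$hits` in the script) is at least 5 and its assigned function matches one of the keyword patterns listed after `__DATA__`. What follows is a minimal standalone sketch of that filter. It assumes the `hits<TAB>id<TAB>function` line shape that the script splits on. The `filter_line` helper, `$MIN_HITS`, the four-pattern subset, and the sample IDs are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

# Illustrative subset of the keyword patterns from the script's __DATA__ section.
my @patterns = ('repressor', 'activator', 'histidine kinase', 'sigma.*factor');

# Compile each pattern once, case-insensitively (the script uses /$pattern/i).
my @regexes = map { qr/$_/i } @patterns;

# Hypothetical name for the threshold; 5 mirrors the value in the script.
my $MIN_HITS = 5;

# Hypothetical helper: given one "hits<TAB>id<TAB>function" line, return
# "id<TAB>function" if it passes both filters, otherwise undef.
sub filter_line {
    my ($line) = @_;
    chomp $line;
    my ($hits, $peg, $func) = split /\t/, $line;

    # Skip malformed lines and assignments with too few hits.
    return unless defined $func && $hits =~ /^\d+$/ && $hits >= $MIN_HITS;

    # Keep the line as soon as any keyword pattern matches the function.
    for my $re (@regexes) {
        return "$peg\t$func" if $func =~ $re;
    }
    return;
}

# Made-up input lines for illustration only (not real svr_assign_using_figfams output).
for my $line ("7\tpeg1\tTranscriptional repressor\n",
              "2\tpeg2\tSigma factor RpoS\n",
              "9\tpeg3\tAlanine racemase\n") {
    my $kept = filter_line($line);
    print "$kept\n" if defined $kept;    # prints only: peg1<TAB>Transcriptional repressor
}
```

Each pattern is compiled once with `qr//i`, so the filter does not rebuild a regex from a string on every comparison. The `defined $func` and numeric checks also skip malformed lines instead of comparing `undef`. In real use, the full pattern list from `__DATA__` would replace the four sample patterns.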

use strict;
use Data::Dumper;
use Carp;

#
# This is a SAS Component
#


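=head1 svr_find_regulatory_proteins

Find potential regulatory proteins.

The script runs C<svr_assign_using_figfams> on a FASTA file of protein
sequences. For each protein whose assignment has a hit count of at least 5
and whose assigned function matches one of the keyword patterns listed after
C<__DATA__> (for example "repressor", "histidine kinase", or "sigma.*factor";
matching is case-insensitive), it prints the protein ID and function,
separated by a tab.

------

Example:

    svr_find_regulatory_proteins fasta.of.protein.sequences > ids.and.functions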

------

=cut

use SeedUtils;

my $in = shift @ARGV;
($in && (-s $in))
    || die "You need to specify an input file of protein sequences in fasta format (as a command line argument)";

# Keyword patterns, one per line, from the __DATA__ section below.
my @patterns = map { chomp; $_ } <DATA>;
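# Assign functions with svr_assign_using_figfams; its stderr is discarded.
open(FIND,"svr_assign_using_figfams < $in 2> /dev/null |")
    || die "could not run svr_assign_using_figfams on $in: $!";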
while (defined($_ = <FIND>))
{
    chomp;
    # Fields: number of hits, protein ID, assigned function (tab-separated).
    my($hits,$peg,$func) = split(/\t/,$_);
    # Keep assignments supported by at least 5 hits whose function matches
    # any of the regulatory keyword patterns (case-insensitive).
    if ($hits >= 5)
    {
	if (grep { $func =~ /$_/i } @patterns)
	{
	    print "$peg\t$func\n";
	}
    }
}

__DATA__
repressor
activator
Transcription factor
regulator
transcriptional reg
transcription reg
regulat.*protein
regulator of
histidine kinase
signal.*transduct
response regulator
two[- ]components?.*system
cAMP signaling
Adenylate cyclases, cAMP-binding domains
adenylate cyclase
diguanylate cyclase
GGDEF domain
PAS/PAC sensor
EAL domain
Methyl-accepting
MCP-domain
protein kinase
protein phosphatase
Phytochrome
sigma.*factor
stringent response
ppgpp
guanosine-3\',5\'\-bis
