[Bio] / FigKernelPackages / SAPtutorial.pm Repository:
ViewVC logotype

Diff of /FigKernelPackages/SAPtutorial.pm

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

revision 1.5, Thu Aug 6 16:00:43 2009 UTC revision 1.6, Mon Aug 10 17:42:33 2009 UTC
# Line 239  Line 239 
239  for the source type. In this case, however, care must be taken, because the same  for the source type. In this case, however, care must be taken, because the same
240  identifier can have different meanings in different databases.  identifier can have different meanings in different databases.
241    
242  =head2 Normal Mode Examples  =head2 Two Normal-Mode Examples
243    
244  The following examples use the Sapling Server in normal mode: that is, data  The following examples use the Sapling Server in normal mode: that is, data
245  is sent to the server in batches and the results are stitched together  is sent to the server in batches and the results are stitched together
# Line 495  Line 495 
495      360108.3    RNA processing and degradation, bacterial   85840820, 85841283      360108.3    RNA processing and degradation, bacterial   85840820, 85841283
496      360108.3    Recycling of Peptidoglycan Amino Acids      85842019, 85842272      360108.3    Recycling of Peptidoglycan Amino Acids      85842019, 85842272
497    
498    =head2 A More Complicated Example: Operon Upstream Regions
499    
500    In this example we'll string several services together to perform a more
501    complex task: locating the upstream regions of operons involved in a particular
502    metabolic pathway. The theory is that we can look for a common pattern in
503    these regions.
504    
505    A metabolic pathway is a subsystem, so we'll enter our database via the
506    subsystems. To keep the data manageable, we'll limit our results to
507    specific genomes. The program we write will take as input a subsystem name
508    and a file of genome IDs.
509    
510    The worst part of the task is finding the operon for a gene. This involves
511    finding all the genes in a neighborhood and isolating the ones that point in
512    the correct direction. Fortunately, there is a Sapling Server function--
513    L<SAP/make_runs>-- that specifcally performs this task.
514    
515    To start our program, we create a L<SAPserver> object and pull the subsystem
516    name from the command-line parameters. This program is going to be doing a
517    lot of complicated, long-running stuff, so we'll usually want to deal with one
518    result at a time. To facilitate that, we construct the server helper in
519    singleton mode.
520    
521        use strict;
522        use SAPserver;
523    
524        my $sapServer = SAPserver->new(singleton => 1);
525        # Get the subsystem name.
526        my $ssName = $ARGV[0];
527    
528    Our main loop will read through the list of genomes from the standard input
529    and call a method I<PrintUpstream> to process each one. We're going to be a bit
530    cheesy here and allow our genome loop to stop on the first blank line.
531    
532        while (my $genomeID = <STDIN>) {
533            chomp $genomeID;
534            PrintUpstream($sapServer, $ssName, $genomeID);
535        }
536    
537    Now we need to write I<PrintUpstream>. Its first task is to find all the
538    genes in the genome that belong to the subsystem. A single call to
539    L<SAP/ids_in_subsystems> will get this information. We then feed the
540    results into L<SAP/make_runs> to get operons and call L<SAP/upstream> for
541    each operon. The program is given below.
542    
543        sub PrintUpstream {
544            my ($sapServer, $ssName, $genomeID) = @_;
545            # Because we specify "roleForm => none", we get back one long
546            # gene list.
547            my $geneList = $sapServer->ids_in_subsystems(subsystems => $ssName,
548                                                         genome => $genomeID,
549                                                         roleForm => 'none');
550            # Convert the gene list to a comma-delimited string.
551            my $geneString = join(", ", @$geneList);
552            # Get a list of operons.
553            my $opList = $sapServer->make_runs(groups => $geneString);
554            # Loop through the operons.
555            for my $opString (@$opList) {
556                # Get the first feature's ID.
557                my ($front) = split /\s*,/, $opString, 2;
558                # Grab its upstream region. We'll include the operon string as the
559                # comment text.
560                my $fasta = $sapServer->upstream(ids => $front,
561                                                 comments => { $front => $opString },
562                                                 skipGene => 1);
563                # Print the result.
564                print "$fasta\n";
565            }
566        }
567    
568    
569  =cut  =cut
570    
571  1;  1;

Legend:
Removed from v.1.5  
changed lines
  Added in v.1.6

MCS Webmaster
ViewVC Help
Powered by ViewVC 1.0.3