Purpose: This tutorial demonstrates how to annotate a set of closely related genomes using KBase tools. For example, the genus Geobacter can be used to study the pan-genome and construct metabolic models to clarify differences between strains.
Required Prerequisite Activities: Getting Started with KBase
Suggested Prerequesite Activities: Basic Exercises Using KBase
Related Tutorials: None
all_entities_Genome -f scientific_name | grep GeobacterGeobacter sulfurreducens KN400 is a good example genome to begin with. Its KBase id is kb|g.2860. Locate it in the list of genomes. Obtain a local copy of the contigs of that genome using the step below.
echo 'kb|g.2860' | genomes_to_contigs | contigs_to_sequences > g.2860.contigs
mkdir g.2860 mv g.2860.contigs g.2860 cd g.2860 lsThe first command creates the subdirectory, the second moves our contigs into the subdirectory, the third moves our "position" to the subdirectory, and the last displays the contents of the subdirectory. We urge you to verify that it all works as described.
fasta_to_genome 'Geobacter sulfurreducens KN400' Bacteria 11 < g.2860.contigs > genomeThe above command creates a "genome object" in the file genome. The object contains the contigs, the scientific name of the organism, the domain (we specified 'B' for Bacteria), and the genetic code for the organism (i.e., 11, which is the code most commonly used with prokaryotic genomes). Use ls to see the file, and then click on it to see the encoded fields). Note that we have named our project subdirectory g.2860, which reflects the name of the genome whose contigs we copied. We are going to re-annotate the contigs, generating a whole new genome, which is what fasta_to_genome does; it registers a new genome ID that will not be used by anyone else.
annotate_genome < genome > annotated.genomeThis causes an initial annotation to be generated. It may take several minutes for large genomes. You can issue other commands while you wait, and the completion message will display below the command when the annotation is complete. When it completes, use ls to see the generated file, and click on it to see the encoded annotations. Alternatively, to the the features generated and placed in the annotated genome object (stored in annotated.genome), try
genomeTO_to_feature_data < annotated.genome > features.txtwhich produces a tab-separated table containing
[feature-id,location,type,assigned-function]Use ls to see it and explore the contents.
genomeTO_to_reconstructionTO < annotated.genome > reconstructionTo see the roles that were found, use
reconstructionTO_to_roles < reconstruction > rolesTo see the subsystems (along with the variant codes that seemed appropriate), use
reconstructionTO_to_subsystems < reconstruction > subsystems
all_roles_used_in_models > all.roles a_and_b roles all.roles > roles.for.models
echo 'kb|g.9032' | genomes_to_fids CDS | fids_to_roles 2> /dev/null |Note that the command fids_to_roles writes error messages when it cannot match a fid to any Roles.
cut -f 3 > roles.in.g.9032
a_and_b roles.in.g.9032 all.roles > roles.for.models.g.9032 a_not_b roles.for.models roles.for.models.g.9032 > roles.to.search.forto create a file of Roles in kb|g.9032 that are not yet found in the annotations we got back for our new version of g.2860.
genome_to_fbamodel < annotated.genome > initial.model
After a minute or two, a metabolic model object is stored in initial.model.
fbamodel_to_html < initial.model > initial.model.html
After that command completes, use ls and click on the generated html.
runfba < initial.model > solution.html
The results from the FBA are printed in HTML in the output solution file. Simply run the ls command and click on the HTML file to view the solution. At the top of this file is a table of the parameters used to run the FBA. Directly below this table is another table showing the objective function and value. In all likelihood, your model did not grow at all. This is because the model is missing pathways needed for biomass production. When a model fails to grow, the FBA command attempst to diagnose the problem by identifying biomass components that cannot be produced. This is done by maximizing the production of each individual biomass component, one at a time. You can see these analysis results in your solution HTML file (see the metabolite production table). Note that some of your biomass components have no numbers in this table. These are the components that cannot be produced. Now try to fix the model by adding reactions to enable production of these components.
gapfill_fbamodel < initial.model > gapfilled.model
Depending on the size and state of the genome, this could take minutes to hours, but when the analysis is complete, your model will have additional reactions in it, reflecting the ideal solution identified by the gapfilling algorithm to enable growth. Then rerun the flux balance analysis to determine the biomass production pathways of the organism.
runfba < gapfilled.model > solution.html
Once this command completes, use ls and click on the HTML produced by the FBA command. Your FBA solution should now include a nonzero objective value as well as numerous compound and reaction fluxes.
fbamodel_to_sbml < gapfilled.model > gapfilled.model.sbml
After the command completes, use ls and select the SBML file for download.