GNP is an integrated online genomics and chemoinformatics platform for natural product discovery. The GNP natural product discovery platform has three main components: genome search, scaffold library generation, and compound identification with iSNAP database search.
Step 1: Upload sequence file
Upload a whole genome, a DNA cluster or contig, or an amino acid cluster. Sequences must be in FASTA format.
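Before uploading, it can help to confirm locally that a file really is FASTA. The following is a minimal sketch, not part of GNP: the function name `read_fasta` and the example sequences are hypothetical, and GNP's own validation rules may be stricter or looser than the checks shown here.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Raises ValueError if the text does not look like FASTA.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    if not records:
        raise ValueError("no FASTA records found")
    for name, seq in records:
        if not seq:
            raise ValueError(f"record {name!r} has no sequence")
    return records


# Hypothetical two-record example; real input would be read from your file.
example = ">contig_1 example\nATGGCGTTA\nGGCTAA\n>contig_2\nATGAAA\n"
for name, seq in read_fasta(example):
    print(name, len(seq))
```

To check a file on disk, pass its contents, for example `read_fasta(open("genome.fasta").read())`, where the file name is a placeholder.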
Step 2: Scaffold identification
Predicted chemical scaffolds for nonribosomal peptide and polyketide gene clusters will be loaded in the scaffold library generation screen automatically when the genome search has finished executing. These structures can be combinatorialized using the scaffold library generator.
For each predicted scaffold, R groups can be added using the JSME molecule editor. See the JSME help page for more information regarding the molecule editor.
Step 1: Scaffold input
Scaffolds will be automatically generated for each gene cluster found by GNP's genome search. However, you can also input your own scaffold or library of scaffolds for combinatorialization by navigating to GNP's scaffold screen.
A single scaffold can be entered in SMILES format or multiple scaffolds can be uploaded in SMILES format, with each molecule on its own line. Uploaded scaffolds should not contain R groups, as the addition of R groups is only supported after upload.
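A multi-scaffold upload file can be prepared with a few lines of Python. This is a sketch under stated assumptions: the function name `format_scaffold_file` is hypothetical, and the placeholder patterns it rejects (`[R1]`-style labels and the `*` wildcard atom) are only a guess at how R groups might appear in a SMILES string. It does not check that each SMILES string is chemically valid.

```python
import re

# Assumed R-group placeholder patterns: [R], [R1], [R2], ... or a '*' atom.
# Element symbols such as [Ru] or [Rb] are deliberately not matched.
R_GROUP_PATTERN = re.compile(r"\[R\d*\]|\*")


def format_scaffold_file(smiles_list):
    """Return upload-ready text with one SMILES string per line."""
    lines = []
    for raw in smiles_list:
        smiles = raw.strip()
        if not smiles:
            continue  # skip empty entries
        if any(ch.isspace() for ch in smiles):
            raise ValueError(f"whitespace inside SMILES: {smiles!r}")
        if R_GROUP_PATTERN.search(smiles):
            raise ValueError(f"scaffold already contains an R group: {smiles!r}")
        lines.append(smiles)
    if not lines:
        raise ValueError("no scaffolds supplied")
    return "\n".join(lines) + "\n"


# Hypothetical scaffolds, for illustration only.
print(format_scaffold_file(["CC(=O)NC(C)C(=O)O", " c1ccccc1 "]), end="")
```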
Step 2: Library generation
To add R groups to a predicted scaffold, select 'X' in the editor, and type R and your desired R-group number (for example, R4). Each scaffold must contain at least one R group. R group numbers must be a series of consecutive integers beginning at 1. Once you have added all R groups for a given scaffold, you can move on to the next by selecting 'Next'. If you do not wish to include a given scaffold in your library, select 'Remove'.
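The numbering rule above (at least one R group, numbered consecutively from 1) can be expressed as a short check. This is an illustrative sketch only: the function name is hypothetical, and it assumes that each label appears at most once on a given scaffold, which the text above does not state.

```python
import re


def r_group_labels_are_valid(labels):
    """Return True if labels are R1..Rn with no gaps, in any order.

    Assumes each label is used at most once per scaffold.
    """
    numbers = []
    for label in labels:
        match = re.fullmatch(r"R(\d+)", label)
        if match is None:
            return False  # not of the form R<number>
        numbers.append(int(match.group(1)))
    if not numbers:
        return False  # each scaffold needs at least one R group
    return sorted(numbers) == list(range(1, len(numbers) + 1))


print(r_group_labels_are_valid(["R1", "R2", "R3"]))  # consecutive from 1
print(r_group_labels_are_valid(["R1", "R3"]))        # R2 is missing
```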
To define R groups, continue selecting 'Next' until you reach the end of the list of predicted scaffolds, at which point the 'Next' button will read 'Add R Group'. Then draw your R group using the molecule editor. Your R group must contain the pseudo-atom 'A', which signifies the site of attachment. If your first R group is a hydrogen atom, for example, draw the molecule A–H in the molecule editor. You can draw the pseudo-atom 'A' by selecting 'X' in the editor and entering the letter A.
To continue adding R groups, select 'Add R Group' again.
For your R group to be combinatorialized onto predicted scaffolds, you must check the box corresponding to the appropriate R group or R groups on the scaffolds. For example, if you wish to combinatorialize R-Group #1 onto R1, R3 and R5, these boxes must be checked.
Once all R groups have been added, select 'Submit'.
If the name of a scaffold or R group is displayed in red text along the right side, it is missing an R group or a site of attachment in its structure. This must be corrected before submitting the form to generate your library.
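To get a feel for how the check boxes determine library size, the enumeration can be sketched with `itertools.product`. This is not GNP's or SmiLib's implementation. The function name, the data layout, and the R-group names are hypothetical, and the sketch only enumerates which R group goes on which site, without building structures.

```python
from itertools import product


def enumerate_assignments(scaffold_sites, checked_sites):
    """List every way of placing the checked R groups on a scaffold.

    scaffold_sites: site labels on the scaffold, e.g. ["R1", "R2"].
    checked_sites:  maps each R-group name to the set of sites whose
                    boxes were checked for it.
    """
    choices = []
    for site in scaffold_sites:
        allowed = sorted(
            name for name, sites in checked_sites.items() if site in sites
        )
        if not allowed:
            raise ValueError(f"no R group is checked for site {site}")
        choices.append(allowed)
    # One library member per combination of per-site choices.
    return [dict(zip(scaffold_sites, combo)) for combo in product(*choices)]


# Hypothetical example: two sites, three R groups.
checked = {
    "H": {"R1", "R2"},
    "methyl": {"R1"},
    "hydroxyl": {"R2"},
}
library = enumerate_assignments(["R1", "R2"], checked)
print(len(library))
for member in library:
    print(member)
```

Because the library size is the product of the number of R groups checked for each site, it grows quickly as sites and R groups are added.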
Libraries will be automatically loaded into the iSNAP database search screen for querying against LC-MS data. You can also download your full library in iSNAP database format by selecting the 'download' link beside the generated scaffold library option within the database selection settings.
iSNAP Database Search
iSNAP database search lets you perform both dereplication and prediction-guided discovery of natural products. A variety of mass spectrometry and theoretical fragmentation settings can be adjusted to suit the quality of the LC-MS data and the fragmentation rules applied to the natural product database. Please refer to Ibrahim et al. and Wyatt et al. for more information.
Step 1: Convert instrument data to .mzXML
Instrument vendors usually provide free software that converts native acquisitions to standard formats. For instance, ReAdW converts ThermoFinnigan raw files and CompassXport converts Bruker raw files.
Third-party tools can also simplify the conversion. ProteoWizard's msconvert converts Agilent, Bruker, Thermo, Waters and AB Sciex file formats into mzXML.
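If ProteoWizard is installed locally, the conversion can be scripted. The sketch below builds an msconvert command line using its `--mzXML` and `-o` options and, on request, runs it with `subprocess`. The helper names, file names, and output directory are placeholders, and the sketch assumes `msconvert` is on your PATH. Vendor-format support depends on your ProteoWizard build and platform.

```python
import shutil
import subprocess


def build_msconvert_command(raw_file, output_dir):
    """Return the argument list for converting one raw file to mzXML."""
    return ["msconvert", raw_file, "--mzXML", "-o", output_dir]


def convert_to_mzxml(raw_file, output_dir):
    """Run msconvert; raises if it is missing or exits with an error."""
    if shutil.which("msconvert") is None:
        raise FileNotFoundError("msconvert was not found on PATH")
    subprocess.run(build_msconvert_command(raw_file, output_dir), check=True)


# Placeholder file and directory names; this only prints the command.
print(" ".join(build_msconvert_command("extract_01.raw", "converted")))
```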
Step 2: Input a .mzXML file
Click “Choose File” and select your .mzXML file in the pop-up dialog. If you don't have an .mzXML file of NRP compounds, we provide a test example, the LC-MS/MS of a Bacillus sp. fermentation, which can be downloaded by clicking the link “Example .mzXML file”.
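Since the search works on MS2 scans, it can be worth confirming that a converted file actually contains them before uploading. The sketch below counts `scan` elements by their `msLevel` attribute using Python's standard XML parser. The function name is hypothetical, and the inline document is a toy fragment for illustration, not a complete mzXML file.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_scans_by_level(source):
    """Count <scan> elements per msLevel in an mzXML file or file object."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("start",)):
        # Drop the XML namespace, if present, to get the bare tag name.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "scan":
            counts[int(elem.get("msLevel", 0))] += 1
    return counts


# Toy fragment: one MS1 scan containing two nested MS2 scans.
toy = (
    b'<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2">'
    b'<msRun><scan num="1" msLevel="1">'
    b'<scan num="2" msLevel="2"/><scan num="3" msLevel="2"/>'
    b"</scan></msRun></mzXML>"
)
counts = count_scans_by_level(io.BytesIO(toy))
print("MS1 scans:", counts[1], "MS2 scans:", counts[2])
```

For a real file, pass its path instead of the `io.BytesIO` object.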
Step 3: Select database
By default, iSNAP will use only its internal database of ~1100 nonribosomal peptides. Users can also select from curated databases of lantibiotics or ribosomal peptides. Alternatively, users can upload their own database, or a library of predicted chemical structures generated by the GNP platform. See Wyatt et al. for applications of prediction-guided discovery.
Step 4: Select mass spectrometry and fragmentation settings
Users can select the intensity cut-off for each MS2 fragment used by iSNAP's matching algorithm, the mass window (tolerance), the precursor charge, and the search mode, precise or analog (see Ibrahim et al. and Johnston et al., respectively). Fragmentation rules, fragment mass tolerance, and structure charge can also be selected.
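The effect of an intensity cut-off and a fragment mass tolerance can be illustrated with a toy peak list. This sketch is not iSNAP's matching algorithm. It assumes the cut-off is a fraction of the most intense peak and that the tolerance is an absolute m/z window, neither of which is specified here, and all names and values are hypothetical.

```python
def filter_by_intensity(peaks, relative_cutoff):
    """Keep (mz, intensity) peaks at or above a fraction of the base peak."""
    if not peaks:
        return []
    base_intensity = max(intensity for _mz, intensity in peaks)
    threshold = relative_cutoff * base_intensity
    return [(mz, intensity) for mz, intensity in peaks if intensity >= threshold]


def match_fragments(peaks, theoretical_mz, tolerance):
    """Pair each theoretical m/z with observed peaks within the tolerance."""
    matches = []
    for theoretical in theoretical_mz:
        for mz, _intensity in peaks:
            if abs(mz - theoretical) <= tolerance:
                matches.append((theoretical, mz))
    return matches


# Hypothetical MS2 peaks as (m/z, intensity) pairs.
observed = [(100.05, 1000.0), (200.10, 40.0), (300.20, 500.0)]
kept = filter_by_intensity(observed, relative_cutoff=0.05)
matched = match_fragments(kept, theoretical_mz=[100.0, 300.5], tolerance=0.1)
print(kept)
print(matched)
```

A stricter cut-off removes weak peaks that are likely noise, while a wider tolerance admits more matches at the cost of more false positives. For low-quality data both settings usually need adjusting together.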
Step 5: Submit the search
After you submit your task, iSNAP will perform NRP dereplication with our built-in database containing about 1100 NRP structures. Please keep the web page open until your files have finished uploading. A link to the page where your results will appear will then be displayed. The progress of the iSNAP search will be shown on that page until the analysis has completed.
Step 6: Prediction-guided discovery
The prediction-guided discovery plot identifies the MS2 scans in your LC-MS chromatogram that most closely match your user-defined or predicted library, if one was supplied.
Step 7: Understand reports
The results of the iSNAP database search will be summarized in a report for your inspection once iSNAP finishes the analysis. MS/MS scans identified as NRP compounds are listed in the report, sorted by P1 score. Both the P1 and P2 scores indicate the confidence of the identification but are calculated in different ways (see Ibrahim et al.). All columns can be sorted. The brief report on the web page shows only the identifications with relatively high P1 and P2 scores. A complete NRP Search Report, which provides detailed information for each input MS/MS scan, can also be downloaded.
To view the matched fragments for a given scan, select the scan number. This will open a new window containing the mass and structure of all matched fragments including each fragment ion’s intensity.
Reports can additionally be downloaded as Excel spreadsheets.
Your results will be stored by the iSNAP server for 60 days, after which time they will be automatically deleted. The iSNAP report includes the date on which your results will expire.
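To note the deletion date in your own records, the 60-day window is simple date arithmetic. This sketch assumes the window is counted from the day the results are produced, which is not stated above, so the expiry date printed on the iSNAP report remains authoritative. The function name is hypothetical.

```python
from datetime import date, timedelta


def expiry_date(results_date, retention_days=60):
    """Return the date on which results would be deleted."""
    return results_date + timedelta(days=retention_days)


print(expiry_date(date(2024, 1, 1)))
```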
The GNP platform is an integrated platform for the genomic discovery of natural products. GNP is the first software package to use biosynthetic structural predictions from genome data to generate a database of putative polyketide and/or nonribosomal peptide assembly-line products, which is then used to search real LC-MS/MS chromatograms. GNP uses statistical operations based on the iSNAP dereplication algorithm to identify the genetically encoded or ‘cryptic’ metabolite from the prediction database generated from the genome and/or by user input. GNP integrates the Chemistry Development Kit¹, JSME², SmiLib³, Glimmer⁴, HMMER⁵, BLAST⁶ and iSNAP⁷ to analyze microbial genomes and their extracts for cryptic natural products.
Note: the GNP server goes down for maintenance every Saturday at 4 AM EST. All jobs running at that time will be aborted.
1 Steinbeck, C., Hoppe, C., Kuhn, S., Floris, M., Guha, R., & Willighagen, E. L. Recent developments of the chemistry development kit (CDK)—an open-source java library for chemo- and bioinformatics. Current Pharmaceutical Design, 12(17), 2111-2120 (2006). doi:10.2174/138161206777585274
3 Schüller, A., Hänke, V., & Schneider, G. SmiLib v2.0: a Java-based tool for rapid combinatorial library enumeration. QSAR & Combinatorial Science 26, 407-410 (2007). doi:10.1002/qsar.200630101
4 Delcher, A. L., Bratke, K. A., Powers, E. C., & Salzberg, S. L. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics, 23(6), 673-679 (2007). doi:10.1093/bioinformatics/btm009
5 Finn, R. D., Clements, J., & Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Research, 39(suppl 2), W29-W37 (2011). doi:10.1093/nar/gkr367
6 Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. Basic local alignment search tool. Journal of Molecular Biology, 215(3), 403-410 (1990). doi:10.1016/S0022-2836(05)80360-2
7 Ibrahim, A., Yang, L., Johnston, C., Liu, X., Ma, B., & Magarvey, N. A. Dereplicating nonribosomal peptides using an informatic search algorithm for natural products (iSNAP) discovery. PNAS, 109(47), 19196-19201 (2012). doi:10.1073/pnas.1206376109
GNP: from Genes to Natural Products
An integrated online genomics and chemoinformatics platform for natural product discovery