USR-VS
a webserver for ligand-based virtual screening powered by ultrafast shape recognition techniques
USR-VS [1] is the first webserver for large-scale prospective virtual screening using USR [2,3] and USRCAT [4], two ultrafast ligand-based 3D molecular similarity methods that have been retrospectively validated [2-4] (a number of successful prospective virtual screening applications have also been reported for USR [5-8]).
USR compares the 3D shape of two molecules, whereas USRCAT compares not only the 3D shape but also the spatial distribution of atom types relevant for molecular recognition (aromatic, hydrogen bond donor, hydrogen bond acceptor and hydrophobic atoms). Both methods are invariant to spatial rotation and translation, so they do not require structural alignment to operate. The 100 most similar molecules out of the 23 million screened are returned, and only these 100 are aligned to the query molecule. The implementation of both methods at USR-VS is highly optimized and can currently screen more than 50 million 3D conformers per second. The similarity score between two molecules is normalized to (0, 1], with 1 indicating maximum similarity and values approaching 0 indicating minimum similarity.
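To make the alignment-free comparison concrete, here is a minimal sketch of a USR-style descriptor and score in plain Python. It follows the published formulation [2] as commonly described: distances from every atom to four reference points (the centroid, the atom closest to the centroid, the atom farthest from the centroid, and the atom farthest from that atom) are summarized by three moments each, giving 12 numbers per conformer. The function names are hypothetical, and the exact moment definitions used here (mean, standard deviation, signed cube root of the third central moment) are an assumption; the optimized implementation at USR-VS may differ.

```python
import math


def _moments(dists):
    """Summarize a distance distribution by three moments (assumed variant)."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    # Signed cube root keeps the third moment in distance units.
    cbrt = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(var), cbrt]


def usr_descriptors(coords):
    """Return 12 shape descriptors for a list of (x, y, z) atom coordinates."""
    n = len(coords)
    ctd = tuple(sum(c[k] for c in coords) / n for k in range(3))  # centroid
    d_ctd = [math.dist(c, ctd) for c in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # atom closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # atom farthest from the centroid
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # atom farthest from fct
    desc = []
    for ref in (ctd, cst, fct, ftf):
        desc.extend(_moments([math.dist(c, ref) for c in coords]))
    return desc


def usr_score(a, b):
    """Similarity in (0, 1]: inverse of 1 plus the mean absolute difference."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))


# Hypothetical 5-atom conformer, then the same conformer rotated 90 degrees
# about the z axis and translated. No alignment step is needed to compare them.
mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.1, 1.2, 0.0),
       (3.3, 1.0, 0.8), (4.0, 2.5, 1.1)]
moved = [(-y + 1.0, x + 2.0, z + 3.0) for x, y, z in mol]
score = usr_score(usr_descriptors(mol), usr_descriptors(moved))
```

Because the descriptors are built only from interatomic distances, the rotated and translated copy scores within rounding error of 1. USRCAT, as described in [4], extends the same idea by repeating the moment calculation for each of the atom-type subsets listed above.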
To run a virtual screen, only three simple steps are required, as illustrated in Figure 1:
Two example query molecules are provided: fluspirilene (ZINC ID: 00537755) and vemurafenib (PubChem CID: 42611257).
Figure 1: Running a virtual screen by following the three steps highlighted in blue.
Users can obtain an SDF file of a 3D conformer of the selected molecule from public databases and use it as the query molecule (i.e. search template):
Users can search these third-party websites for their desired molecules using the compound name or the SMILES string, and then download the SDF file for the 3D conformer of the structure.
If no SDF file is available for download, users may instead provide a SMILES string to quickly generate an energy-minimized 3D conformer on the fly using the ETKDG algorithm:
Alternatively, users may use our in-house standalone program embed on Linux to generate energy-minimized 3D conformers from SMILES strings in batch. It reads SMILES from standard input and writes SDF to standard output. It accepts an optional argument that specifies the output file for a 2D SVG drawing of the compound. Example usage: embed toluene.svg <<< Cc1ccccc1
Note that the user is responsible for protonating the query molecule before building the 3D conformer, so that identical query and hit molecules obtain a Tanimoto score of 1.
Once a virtual screen is submitted, the user will be redirected to the result webpage, which has a unique URL that is only available to the user. Users are advised to bookmark the result webpage if they want to browse the results later.
Figure 2 shows the result webpage of using ZINC00537755.sdf as the query molecule and choosing USR as the ranking score. In this webpage, a table at the top provides a link to the input file for download and shows the results of the virtual screen, including the submission time, execution time, completion time, and screening speed (this comes from dividing the 94 million 3D conformers by the total execution time of the query, including computing their similarity scores to the query molecule, sorting the scores, aligning the top hits to the query molecule, writing the aligned top hits to the output files, and drawing a 2D chemical structure of the query molecule).
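The reported screening speed is a simple ratio, sketched below. The function name and the 4-second execution time are hypothetical values chosen for illustration; only the 94 million conformer count comes from the text above.

```python
def screening_speed(num_conformers, execution_seconds):
    """Conformers screened per second over the total execution time."""
    if execution_seconds <= 0:
        raise ValueError("execution time must be positive")
    return num_conformers / execution_seconds


# Hypothetical run: 94 million conformers processed in 4 seconds overall.
speed = screening_speed(94_000_000, 4.0)
```

Since the total execution time also covers sorting, aligning the top hits, writing output files and drawing the query structure, a speed computed this way is lower than the raw scoring throughput.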
Figure 2: Visualizing the query (left) and hit molecules (right). By pressing the numbered buttons below the right canvas, the user can select different top hits for visualization. In this case, the most similar hit (number 0) is the query molecule itself, which USR identified among the 23 million compounds in the screening library.
When the virtual screen is completed, the 100 molecules most similar to the query molecule and their similarity scores are written to two output files for download (hits.csv and hits.sdf). Regardless of which ranking score was selected, both USR and USRCAT similarities are calculated for the top hits. In addition, the 2D Tanimoto similarity based on Morgan fingerprints is calculated, with the maximum score of 1 indicating identical molecules. This score is useful not only for detecting identical molecules, but also for quantifying the degree of dissimilarity between the chemical structures of the query and hit molecules, which is indicative of chemical scaffold hopping.
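For readers unfamiliar with the Tanimoto coefficient, the sketch below shows how it is computed from two binary fingerprints, each represented as a set of on-bit indices. The function name and the bit sets are hypothetical; they are not real Morgan fingerprints, which would normally be generated with a cheminformatics toolkit.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Undefined for two empty fingerprints; raising is a choice made here.
        raise ValueError("both fingerprints are empty")
    return len(a & b) / len(union)


# Hypothetical fingerprints given as sets of on-bit positions.
query_bits = {1, 2, 3, 4}
hit_bits = {3, 4, 5, 6}
similarity = tanimoto(query_bits, hit_bits)  # 2 shared bits out of 6 distinct
```

A hit with a high USR or USRCAT score but a low Tanimoto score has a similar 3D shape yet a different 2D structure, which is the scaffold-hopping situation described above.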
Using the WebGL visualizer iview [9], the query molecule is shown in the left canvas and the hit molecules are shown in the right canvas (this may take a few seconds to load, depending on the internet bandwidth). Although neither USR nor USRCAT requires aligning the query molecule against each database molecule, the hit molecules are roughly aligned to the query molecule, using the four reference points as an atom mapping, to facilitate interactive inspection by the user. The user can switch among the top 100 hit molecules (numbered from 0 to 99) by pressing the numbered buttons below the right canvas, and can interactively translate, rotate and zoom the 3D structure of the selected hit molecule to match the orientation of the query molecule if needed. This makes it possible to assess the degree of 3D similarity between the two molecules.
Also displayed are USR, USRCAT and Tanimoto scores, chemical properties, and a link to different options to purchase the hit molecule. This stage is intended to help the user decide which hits to purchase and how to purchase them to experimentally measure their activity against selected targets of the query molecule. Note that these targets can be both molecular (e.g. a protein of a known pathway) and non-molecular (e.g. a given cancer cell line).
In this example run, the first hit molecule shown in the right canvas is identical to the query molecule shown in the left canvas, as the query molecule is one of the 23 million molecules of the screening library. This demonstrates USR's capability of correctly retrieving an existing molecule from the database, thus validating the method. Such validity also applies to USRCAT, as can be seen in the corresponding result webpage. Note that the similarity scores are close to 1, but not exactly 1, because of floating-point rounding error.
For the second hit molecule, there is a big difference between the USR result webpage and the USRCAT result webpage. In the case of USR, the second hit molecule has a similar shape to the query molecule, regardless of their atom types. In the case of USRCAT, the second hit molecule is similar to the query molecule in terms of both shape and pharmacophoric features.
To purchase the selected molecule, the user can click the ZINC ID to redirect to ZINC's substance webpage, where the latest list of Vendors along with their corresponding IDs for this compound can be found, as illustrated in Figure 3. Alternatively, the vendors & annotations link can be clicked on the USR-VS results page to go directly to this information. Furthermore, external links to known targets and other information about the molecule may be available under the Annotations heading.
Figure 3: Reviewing the vendors and annotations of hit molecules in order to purchase them for wet-lab validation.
[1] Hongjian Li, Kwong-Sak Leung, Man-Hon Wong and Pedro J. Ballester. USR-VS: a web server for large-scale prospective virtual screening using ultrafast shape recognition techniques. Nucleic Acids Research, 44(W1):W436-W441, 2016. DOI: 10.1093/nar/gkw320
[2] Pedro J. Ballester and W. Graham Richards. Ultrafast shape recognition to search compound databases for similar molecular shapes. Journal of Computational Chemistry, 28(10):1711-1723, 2007. DOI: 10.1002/jcc.20681
[3] Pedro J. Ballester. Ultrafast shape recognition: method and applications. Future Medicinal Chemistry, 3(1):65-78, 2011. DOI: 10.4155/fmc.10.280
[4] Adrian M. Schreyer and Tom Blundell. USRCAT: real-time ultrafast shape recognition with pharmacophoric constraints. Journal of Cheminformatics, 4(1):27, 2012. DOI: 10.1186/1758-2946-4-27
[5] Birgit Hoeger, Maren Diether, Pedro J. Ballester and Maja Köhn. Biochemical evaluation of virtual screening methods reveals a cell-active inhibitor of the cancer-promoting phosphatases of regenerating liver. European Journal of Medicinal Chemistry, 88:89-100, 2014. DOI: 10.1016/j.ejmech.2014.08.060
[6] Sachin P. Patil, Pedro J. Ballester and Cassidy R. Kerezsi. Prospective virtual screening for novel p53-MDM2 inhibitors using ultrafast shape recognition. Journal of Computer-Aided Molecular Design, 28(2):89-97, 2014. DOI: 10.1007/s10822-014-9732-4
[7] Pedro J. Ballester, Martina Mangold, Nigel I. Howard, Richard L. Marchese Robinson, Chris Abell, Jochen Blumberger and John B. O. Mitchell. Hierarchical virtual screening for the discovery of new molecular scaffolds in antibacterial hit identification. Journal of The Royal Society Interface, 9(77):3196-3207, 2012. DOI: 10.1098/rsif.2012.0569
[8] Pedro J. Ballester, Isaac Westwood, Nicola Laurieri, Edith Sim and W. Graham Richards. Prospective virtual screening with Ultrafast Shape Recognition: the identification of novel inhibitors of arylamine N-acetyltransferases. Journal of The Royal Society Interface, 7(43):335-342, 2009. DOI: 10.1098/rsif.2009.0170
[9] Hongjian Li, Kwong-Sak Leung, Takanori Nakane and Man-Hon Wong. iview: an interactive WebGL visualizer for protein-ligand complex. BMC Bioinformatics, 15(1):56, 2014. DOI: 10.1186/1471-2105-15-56