Chemical biology is a rapidly growing interdisciplinary field whose primary focus is the study of biological functions and pathways using small molecules. Although chemical biology relies heavily on new experimental technologies and data, computational approaches can help address some of its key tasks. Compared with drug discovery, where computational methods play an important role, chemical biology has partly different requirements for computational analysis and design.
What is chemical biology?
It is often difficult to define fast-moving interdisciplinary fields. In ‘Voices of chemical biology’, Brent R. Stockwell referred to a classical definition of chemical biology as the ‘use of synthetic organic chemistry to create small-molecule probes of biological processes’ and further stated ‘I think of chemical biology as the use of chemical methods to understand biological processes’. Matthew Bogyo offered a similar definition and added ‘I know many people would argue that it is no different from classical pharmacology. However, I feel the difference is that new design methods and technologies have enabled chemicals, which formerly could only provide phenomenological data in biologic systems, to be used to understand complex systems and pathways in biology’.
These statements provide a fairly clear picture. It is evident that chemical biology is mostly concerned with the use of small molecular probes to interrogate biological functions in cellular environments. It is also clear that new scientific approaches and technologies play a pivotal role in shaping this field. A primary goal of chemical biology is the identification of proteins that are responsible for interesting biological phenomena.
Let us compare the requirements for small molecules in chemical biology, referred to in the following as chemical biology probes (CBPs), with those for synthetic drug candidates. Drugs must be efficacious and safe and have limited side effects. However, they need not be selective (or even specific) for a therapeutic target, although this may be required in some instances.
In addition, it is advantageous if drugs have a defined mechanism of action (MoA), but this is not a stringent requirement either. In fact, the efficacy of many drugs depends on interactions with multiple targets. By contrast, CBPs do not need to have drug-like properties. However, to dissect biological functions or pathways, they must ultimately be selective/specific for functionally relevant targets. In some cases, exquisite specificity might be required to unambiguously determine MoAs.
Chemical biology is a highly interdisciplinary experimental science, so why might there be a need for computational approaches? In drug discovery, computational methods that aid in the identification and optimization of active compounds are a mainstay. Such methods can also be applied in chemical biology, but the focus shifts. For example, chemical biology relies heavily on phenotypic or high-content screening of compounds, whereas most screening in drug discovery settings continues to be target-based. Target-based assays can be readily complemented with computational searches for active compounds; for phenotypic screens, where the molecular target is unknown, this is much more difficult.
One can identify at least three areas where computational approaches can have an immediate impact on chemical biology. First, the identification of targets of compounds that display desirable activity in phenotypic assays can be supported computationally by deriving ligand-target hypotheses, which provide a basis for follow-up experiments. Second, since screening hits in chemical biology are rarely target-specific, they often need to be evolved into CBPs that display at least target selectivity in a cellular environment. From a computational viewpoint, this translates into selectivity prediction for small molecules. Third, given the data-intensive nature of chemical biology, computational methods are required to process large data volumes, for example images from high-content screens or gene expression profiles, and to extract knowledge from these data. Computational approaches that are suitable for these tasks are discussed below.
Target Identification and the Data Deluge
Target hypotheses for compounds with interesting activity in phenotypic screens can be derived computationally in different ways. A popular approach is to prioritize targets on the basis of ligand similarity (figure 1, top). For a given screening hit, the most similar known active compounds are identified, and it is assumed that similar ligands bind to the same or closely related targets. Putative targets of a novel active compound can be inferred directly from ligand similarity, e.g. by statistically weighted similarity searching, or, more indirectly, using machine learning models. In the latter case, known active and inactive compounds are used to train classification models for different targets. These classifiers are then used to predict whether a compound of interest is likely to be active against a given target. In practice, compounds are profiled in silico against arrays of classifiers, and the most likely targets are prioritized. For in silico profiling, probabilistic classifiers based on Bayesian statistics are popular, given their ease of derivation and computational efficiency. Ligand similarity can also be combined with the similarity of compound activity patterns across different assays, encoded as fingerprints. Furthermore, in silico compound profiling has been attempted with computational ligand docking, using three-dimensional protein structures as templates (figure 1, bottom). Structure-based compound profiling is applied less frequently than ligand similarity methods, because much more ligand activity data are available than target structure information. However, computational target identification on the basis of ligand similarity or docking is only applicable to targets for which sufficient information is available. Truly novel targets cannot be predicted using such computational approaches and must be identified experimentally, e.g. using affinity-based methods or mass spectrometry.
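The nearest-neighbor idea behind similarity-based target prioritization can be sketched in a few lines of Python. This is a minimal sketch, not the statistically weighted method cited above: it assumes fingerprints have already been computed and are stored as sets of "on" bit positions, scores each target by the maximum Tanimoto similarity between the screening hit and that target's known actives, and ranks targets by this score. The target names and bit sets are hypothetical toy data.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # positions of "on" bits in a binary fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) coefficient: shared on-bits / all on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_targets(query: Fingerprint,
                 actives_by_target: Dict[str, List[Fingerprint]]) -> List[Tuple[str, float]]:
    """Rank targets by the highest similarity of the query to any of their known actives."""
    scores = {
        target: max(tanimoto(query, fp) for fp in actives)
        for target, actives in actives_by_target.items()
        if actives
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical reference data: known actives per target, as on-bit sets.
reference = {
    "target_A": [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5})],
    "target_B": [frozenset({10, 11, 12})],
    "target_C": [frozenset({3, 4, 20, 21})],
}
hit = frozenset({1, 2, 3, 5})  # hypothetical phenotypic screening hit

for target, score in rank_targets(hit, reference):
    print(f"{target}: {score:.2f}")
```

Max-similarity scoring is simple but ignores how many actives support a target and how often unrelated ligand sets look similar by chance; statistically weighted approaches correct for such effects. In practice, fingerprints would be generated with a cheminformatics toolkit rather than written by hand.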
Biological data generated for target identification go far beyond activity measurements. In particular, many cellular profiling experiments are carried out, e.g. to determine differential gene expression resulting from small-molecule treatment. Computation is essential for the analysis of such profiling data. By comparing the expression profiles of test compounds with those of compounds with known target annotations, matching profiles are identified using statistical correlation and/or pattern matching algorithms, and target hypotheses are derived. Data normalization across many experiments and statistical significance testing of profile matches are essential for such analyses. Computational protocols have also been developed to identify MoAs by correlating compound sensitivity across many cell lines with gene expression data. Like ligand-based approaches, however, the analysis of compound-based gene expression profiles can only identify known targets.
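Correlation-based profile matching with a significance estimate can be illustrated as follows. The sketch assumes that profiles are vectors of per-gene expression changes (e.g. log fold changes) measured on the same gene set; it z-scores each profile, computes the Pearson correlation with each annotated reference profile, and estimates a one-sided p-value by permuting the query. The reference names and data are synthetic and hypothetical, and Pearson correlation is a stand-in chosen for brevity: published resources such as the Connectivity Map use their own rank-based scoring.

```python
import numpy as np


def zscore(profile: np.ndarray) -> np.ndarray:
    """Normalize a profile to zero mean and unit variance."""
    return (profile - profile.mean()) / profile.std()


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation, computed as the mean product of z-scored profiles."""
    return float(np.mean(zscore(a) * zscore(b)))


def match_profiles(query, references, n_perm=1000, seed=0):
    """Score a query profile against annotated reference profiles.

    Returns (name, correlation, permutation p-value) tuples sorted by correlation.
    """
    rng = np.random.default_rng(seed)
    results = []
    for name, ref in references.items():
        r = pearson(query, ref)
        # Null distribution: correlations after shuffling the query's gene order.
        null = np.array([pearson(rng.permutation(query), ref) for _ in range(n_perm)])
        # One-sided empirical p-value; the +1 terms avoid reporting p = 0.
        p = (np.count_nonzero(null >= r) + 1) / (n_perm + 1)
        results.append((name, r, p))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Synthetic log fold-change profiles over 200 genes (hypothetical annotations).
rng = np.random.default_rng(1)
references = {
    "inhibitor_of_target_X": rng.normal(size=200),
    "inhibitor_of_target_Y": rng.normal(size=200),
}
# The query is constructed to resemble the target X reference plus noise.
query = references["inhibitor_of_target_X"] + rng.normal(scale=0.3, size=200)

for name, r, p in match_profiles(query, references):
    print(f"{name}: r = {r:.2f}, p = {p:.4f}")
```

When a query is compared against many reference profiles, the resulting p-values would additionally need a multiple-testing correction before target hypotheses are derived.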
Once target hypotheses have been derived, follow-up experiments using selective or specific CBPs are required. Obtaining high-quality CBPs typically requires chemical optimization, similar to drug leads. The use of computational methods for the analysis and prediction of compound selectivity is still in its infancy. To train computational models for this purpose, carefully assembled benchmark systems are required that consist of many known compounds with different selectivity. Selectivity is often the net result of differential potency against related targets: if a compound is much more potent against a given target than against related ones, it is selective for this target over the others (figure 2). For machine learning, selectivity information must first be extracted from large volumes of compound activity data, which itself requires computational analysis. For example, formal concept analysis, a mathematical method rooted in lattice theory, has been adapted to extract selectivity profiles from available data and identify compounds with desired selectivity. An alternative approach to selectivity prediction is proteochemometrics. In this case, ligand and target descriptors are combined pairwise and associated with known activities. A single computational model is then derived from binding data for multiple ligands and targets, and potency values for new compound-target combinations are predicted using this model. Predictions for a given compound against multiple targets yield a putative selectivity profile.
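A proteochemometric model and the selectivity profile derived from it can be sketched with NumPy. Under the assumptions of this toy example, each compound-target pair is encoded as the Kronecker product of [1, ligand descriptors] and [1, target descriptors], which contains intercept, ligand, target and ligand-by-target cross terms; a single ridge regression model is fit to synthetic, noise-free pKi values; and the model then predicts a potency profile for a new compound across all targets. Selectivity for a target is expressed as the difference in log units between its predicted potency and that of the most potent off-target. All descriptors and weights are random placeholders, and the 2 log unit (100-fold) selectivity cutoff is an assumed convention, not one taken from the text.

```python
import numpy as np


def pair_features(lig: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Hypothetical PCM pair descriptor: kron([1, lig], [1, tgt]) yields an intercept,
    ligand terms, target terms and all ligand x target cross terms."""
    return np.kron(np.append(1.0, lig), np.append(1.0, tgt))


def fit_ridge(X: np.ndarray, y: np.ndarray, alpha: float = 1e-6) -> np.ndarray:
    """Closed-form ridge regression weights: (X^T X + alpha I)^-1 X^T y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


def selectivity(pki: np.ndarray, primary: int) -> float:
    """Potency margin (log units) of the primary target over the most potent off-target."""
    return float(pki[primary] - np.delete(pki, primary).max())


rng = np.random.default_rng(42)
targets = np.array([[0.2, 1.0], [0.9, -0.5], [-0.7, 0.3]])  # placeholder target descriptors
ligands = rng.normal(size=(20, 3))                          # placeholder ligand descriptors
true_w = rng.normal(size=4 * 3)                             # synthetic "true" model weights
true_w[0] += 6.0                                            # shift pKi values into a typical range

# One training row per compound-target pair, labeled with a noise-free synthetic pKi.
X = np.array([pair_features(lig, tgt) for lig in ligands for tgt in targets])
y = X @ true_w
w = fit_ridge(X, y)

# Predict a potency profile for an unseen compound against all three targets.
new_ligand = rng.normal(size=3)
X_new = np.array([pair_features(new_ligand, tgt) for tgt in targets])
pred = X_new @ w

best = int(np.argmax(pred))
margin = selectivity(pred, best)
print("predicted pKi profile:", np.round(pred, 2))
print(f"target {best}: margin {margin:.2f} log units,",
      "selective" if margin >= 2.0 else "not selective", "(assumed 100-fold cutoff)")
```

The cross terms are what make a selectivity profile possible here: with ligand and target terms alone, the predicted pKi would be a ligand contribution plus a target contribution, so every compound would rank the targets in the same order.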
Selectivity analysis methods are difficult to apply when one aims to chemically optimize a CBP toward high selectivity. For this purpose, other approaches should be considered, e.g. methods for predicting compound affinity, an area where drug discovery and chemical biology meet again. For both drug design and chemical biology, accurate prediction of binding free energies of compounds, or of relative free energies resulting from chemical modifications, would be a milestone. In free energy calculations, chemical modifications of compounds bound to target structures are modeled dynamically, and the resulting changes in the free energy of binding are estimated. Computationally demanding simulations to predict relative free energies have been carried out for a long time, with overall limited success. Recently, increases in computational simulation capacity and methodological refinements have brought further progress, indicating that free energy simulations can yield more accurate estimates across various targets and compound classes. Such advances would impact CBP development. If free energy calculations could be applied efficiently on a more routine basis, the door would open to optimizing CBPs for a primary target over others, guided by parallel simulations.
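The link between free energy simulations and selectivity can be made concrete with the thermodynamic cycle used for relative binding free energies, ΔΔG_bind(A→B) = ΔG_complex(A→B) − ΔG_solvent(A→B), combined with the Zwanzig free energy perturbation estimator ΔG = −kT ln⟨exp(−ΔU/kT)⟩_A. The sketch below applies both to synthetic energy differences for one modification A→B, evaluated in parallel for a primary target and an off-target. It is a one-step estimator chosen for illustration; production protocols, such as the one described by Wang et al., typically use many intermediate alchemical states, enhanced sampling and more robust estimators, and the energy samples here are random numbers rather than simulation output.

```python
import numpy as np

KB = 0.0019872  # Boltzmann constant in kcal/(mol*K)


def zwanzig(delta_u, temperature=300.0):
    """One-step free energy perturbation (Zwanzig) estimate of dG(A->B) in kcal/mol.

    delta_u: samples of U_B - U_A evaluated on configurations drawn from state A.
    dG = -kT ln < exp(-dU/kT) >_A, computed with a log-sum-exp shift for stability.
    """
    kt = KB * temperature
    x = -np.asarray(delta_u, dtype=float) / kt
    shift = x.max()
    return float(-kt * (shift + np.log(np.mean(np.exp(x - shift)))))


def relative_binding(du_complex, du_solvent, temperature=300.0):
    """Thermodynamic cycle: ddG_bind(A->B) = dG_complex(A->B) - dG_solvent(A->B)."""
    return zwanzig(du_complex, temperature) - zwanzig(du_solvent, temperature)


def affinity_factor(ddg, temperature=300.0):
    """Ratio K_a(B) / K_a(A) implied by ddG; values > 1 mean B binds more tightly."""
    return float(np.exp(-ddg / (KB * temperature)))


# Synthetic energy differences (kcal/mol) for one modification A->B; illustrative only.
rng = np.random.default_rng(7)
solvent = rng.normal(2.0, 0.3, size=500)  # solvent leg: no protein, shared by both targets
primary = relative_binding(rng.normal(1.2, 0.3, size=500), solvent)
off_target = relative_binding(rng.normal(2.5, 0.3, size=500), solvent)

print(f"ddG primary:    {primary:+.2f} kcal/mol (x{affinity_factor(primary):.1f} affinity)")
print(f"ddG off-target: {off_target:+.2f} kcal/mol (x{affinity_factor(off_target):.2f} affinity)")
# Positive values mean the modification shifts selectivity toward the primary target.
print(f"selectivity gain: {off_target - primary:+.2f} kcal/mol")
```

Because the solvent leg does not involve the protein, it is shared between targets and cancels in the selectivity gain; only the complex legs have to be simulated separately for each target.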
The author thanks Dr. Dagmar Stumpfe for help with illustrations.
Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
[No authors listed]: Nat. Chem. Biol. 11, 378 (2015)
Jürgen Bajorath: Pushing the boundaries of computational approaches: special focus issue on computational chemistry and computer-aided drug discovery, Future Med. Chem. 7, 2415 (2015)
Christian Laggner, David Kokel, Vincent Setola, Alexandra Tolia, Henry Lin, John J. Irwin, Michael J. Keiser, Chung Yan J. Cheung, Daniel L. Minor Jr., Bryan L. Roth, Randall T. Peterson, Brian K. Shoichet: Chemical informatics and target identification in a zebrafish phenotypic screen, Nat. Chem. Biol. 8, 144 (2012), DOI: 10.1038/nchembio.732
Daniel W. Young, Andreas Bender, Jonathan Hoyt, Elizabeth McWhinnie, Gung-Wei Chirn, Charles Y. Tao, John A. Tallarico, Mark Labow, Jeremy L. Jenkins, Timothy J. Mitchison, Yan Feng: Integrating high-content screening and ligand-target prediction to identify mechanism of action, Nat. Chem. Biol. 4, 59 (2008), DOI: 10.1038/nchembio.2007.53
Anne Mai Wassermann, Eugen Lounkine, Laszlo Urban, Steven Whitebread, Shanni Chen, Kevin Hughes, Hongqiu Guo, Elena Kutlina, Alexander Fekete, Martin Klumpp, Meir Glick: A Screening Pattern Recognition Method Finds New and Divergent Targets for Drugs and Natural Products, ACS Chem. Biol. 9, 1622 (2014), DOI: 10.1021/cb5001839
J. Lamb, E.D. Crawford, D. Peck, J.W. Modell, I.C. Blat, M.J. Wrobel, J. Lerner, J.P. Brunet, A. Subramanian, K.N. Ross, M. Reich, H. Hieronymus, G. Wei, S.A. Armstrong, S.J. Haggarty, P.A. Clemons, R. Wei, S.A. Carr, E.S. Lander, T.R. Golub: The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease, Science 313, 1929 (2006)
M. G. Rees et al.: Correlating chemical sensitivity and basal gene expression reveals mechanism of action, Nat. Chem. Biol. 12, 109 (2016), DOI: 10.1038/nchembio.1986
Dagmar Stumpfe, Eugen Lounkine, Jürgen Bajorath: Molecular Test Systems for Computational Selectivity Studies and Systematic Analysis of Compound Selectivity Profiles, Methods Mol. Biol. 672, 503 (2011), DOI: 10.1007/978-1-60761-839-3_20
Eugen Lounkine, Dagmar Stumpfe, Jürgen Bajorath: Molecular Formal Concept Analysis for Compound Selectivity Profiling in Biologically Annotated Databases, J. Chem. Inf. Model. 49, 1359 (2009), DOI: 10.1021/ci900095v
J. E. S. Wikberg et al.: Ann. N. Y. Acad. Sci. 994, 21 (2003)
Lingle Wang et al.: Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field, J. Am. Chem. Soc. 137, 2695 (2015), DOI: 10.1021/ja512751q
Prof. Dr. Jürgen Bajorath