Optimization Studies
Social Aspects of Drug Discovery, Development and Commercialization 114
5.3 IN SILICO MODELS IN DRUG DISCOVERY AND DESIGN
5.3.1 Virtual Screening
The increasing difficulty of biologically screening billions of compounds has motivated the use of computer-aided screening approaches in drug discovery, known as virtual screening (VS). VS is a set of computational methods that screen large chemical databases in a virtual HTS (from a virtual library) to select compounds for synthesis, allowing structure-based screening of up to 100,000 molecules per day on parallel processing clusters. This approach has been applied in the early stages of discovery to prioritize chemicals for biological evaluation, to enable a mechanistic understanding of the predictions, and to estimate potential activity levels. VS was initially intended to save costs and to substitute for HTS, but the two are increasingly seen as complementary. Recent developments have taken advantage of both, particularly in finding compound leads [32]. The role of VS in hit (bioactive molecule) identification should not be underestimated: its utility is becoming more obvious, and HTS and VS have been actively applied as an integrated technique for hit and lead identification [33].
In VS, a library of about 10^5–10^7 molecules is downsized by a computational algorithm to about 10^0–10^3 compounds that are tested experimentally. Compounds can be ranked according to pharmaceutical relevance because VS predicts the putative binding affinities between small molecules and biological receptors with potential therapeutic qualities [34]. A parallel approach allows multiple protocols to be carried out simultaneously to generate integrated data. VS has also served a unique role in identifying compounds when biological assays for HTS are lacking; in such cases homology modeling is used to produce a receptor-based pharmacophore model that allows molecules to be selected for validation.
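The downsizing step described above can be sketched as a simple rank-and-truncate filter. This is only an illustration: the `ScoredCompound` type, the compound identifiers, and the scores are hypothetical, and the convention that a lower (more negative) score means stronger predicted binding is an assumption borrowed from typical docking scores rather than something the text specifies.

```python
from typing import NamedTuple


class ScoredCompound(NamedTuple):
    compound_id: str           # hypothetical identifier
    predicted_affinity: float  # hypothetical score; lower = stronger predicted binding


def downsize_library(library, keep):
    """Rank compounds by predicted affinity and keep the best `keep` of them."""
    ranked = sorted(library, key=lambda c: c.predicted_affinity)
    return ranked[:keep]


# Toy library with made-up scores; a real library would hold 10^5-10^7 entries.
library = [
    ScoredCompound("cmpd-001", -6.2),
    ScoredCompound("cmpd-002", -9.1),
    ScoredCompound("cmpd-003", -4.8),
    ScoredCompound("cmpd-004", -8.3),
    ScoredCompound("cmpd-005", -7.0),
]

shortlist = downsize_library(library, keep=2)
for compound in shortlist:
    print(compound.compound_id, compound.predicted_affinity)
```

In practice the scoring function is the expensive part; the ranking itself is trivial, which is why the work parallelizes well across a cluster.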
Focused or knowledge-based screening selects smaller subsets of molecules from a chemical library. The selected candidates are those likely to show some activity at the target protein, based on what is known about the protein from earlier reports or the literature; this has led to the use of pharmacophores and molecular modeling to conduct virtual screens of compound databases [35]. VS depends on how well the known structural information about both the target and the small molecules being docked can guide effective screening. The target is probed for an appropriate binding pocket [36] using known target-ligand cocrystal structures, or using in silico methods to identify novel binding sites [37]. For fragment screening, high concentration screens generate small molecular weight compound libraries with high quality and viable protein structures that progress through the discovery path. The academic community and small pharmaceutical companies have benefited from VS because it is less expensive than HTS while being as efficient or even more so.
There are a number of challenges associated with VS technology. For example, medicinal chemists are not fully involved in the initial optimization process that involves VS. This can lead to errors that are only discovered further downstream in the drug discovery process. Publishing such compounds as successes masks underlying problems that might only be discovered later, which undermines the effectiveness of the approach [38].
Recent changes include the increasing incorporation of medicinal chemists into academic programs to promote cross-functional or multidisciplinary interaction and active engagement that provide complementary skills.
Limited understanding of the complexity of living organisms hinders the building of ideal compound libraries, which also depends on the nature, stage, and goal of the project being pursued. Lessons from the past have raised further questions whose answers could allow the creation of viable library collections [39]. The high cost and time needed to screen compounds might not yield commensurate outcomes. Success also depends on the quality of the compounds screened [39–41]. In addition, the management of the facility, the expertise of the personnel handling the computational projects, and the appropriateness of data handling are concerns [42].
5.3.2 Computer-Aided Drug Design
Computer-aided drug design (CADD) has been credited with shaping modern patterns of compound characterization in drug discovery since its inception in 1981 [43]. It represents an advance over HTS in that it requires minimal compound design or prior knowledge, yet can yield multiple hit compounds from which promising candidates are selected. The typical role of CADD in drug discovery is to filter large compound libraries into smaller clusters of predicted active compounds (Figure 5.1), to enable optimization of lead compounds by improving their biological properties (such as affinity and ADMET), and to build chemotypes from a nucleating site by combining fragments with optimized function.
Clustering has been applied as a means to select representatives from screening libraries [44]. Screening hits include molecules that bind specifically to the target, along with a greater number of nonspecific compounds that a triaging process must filter out (Figure 5.1). Thus, a large library that contains a number of possible hits is further downsized and clustered into series.
Computational chemistry algorithms have been developed to group hits based on structural similarity, which is necessary to ensure that compounds are adequately sorted over a broad spectrum of chemical classes. Thus, selection of hits is based on chemical cluster, potency, and factors such as ligand efficiency (which gives an idea of how well a compound binds for its size).
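Ligand efficiency can be made concrete with a short calculation. The sketch below assumes the commonly used definition, binding free energy per heavy (non-hydrogen) atom, with the free energy estimated from a dissociation constant as ΔG = RT ln(Kd). The text does not give a formula, so the definition, the function name, and the example values are assumptions for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ligand_efficiency(kd_molar, heavy_atoms, temperature_k=298.15):
    """Return -deltaG / heavy-atom count, in kcal/mol per heavy atom.

    deltaG is estimated as R * T * ln(Kd); a tighter binder (smaller Kd)
    gives a more negative deltaG and therefore a higher efficiency.
    """
    if kd_molar <= 0 or heavy_atoms <= 0:
        raise ValueError("kd_molar and heavy_atoms must be positive")
    delta_g = R_KCAL * temperature_k * math.log(kd_molar)
    return -delta_g / heavy_atoms


# Two hypothetical hits with the same potency (Kd = 1 nM) but different sizes.
large_hit = ligand_efficiency(1e-9, heavy_atoms=30)
small_hit = ligand_efficiency(1e-9, heavy_atoms=15)
print(f"30 heavy atoms: {large_hit:.2f}  15 heavy atoms: {small_hit:.2f}")
```

At equal potency the smaller compound scores twice as high, which is the sense in which the metric rewards binding "for its size".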
The increasing application of diverse computerized methods in drug discovery has enabled better handling of the data associated with the large number of compounds screened against target molecules or proteins for leads.
Computational tools help to define and elaborate the strength of interaction between ligands and targets, and have been instrumental in the identification of lead molecules from databases. Nevertheless, the lack of specificity leads to a low hit rate for HTS, which could limit its applicability and efficiency in screening large compound libraries. CADD is a more targeted approach to the generation of “hits” than traditional HTS. It enables the elucidation of the molecular basis of therapeutic activity and of possible derivatives, and of those variables that could be applied or improved to generate an optimal drug compound. Actives can thus be prioritized without the extensive development and validation prior to use that an HTS assay requires. The CADD approach has played a vital role in the search for and optimization of potential lead compounds, with a considerable gain in time and cost. It has been applied during various stages in drug discovery: target identification, validation, molecular design, and interactions of drug candidates with targets of interest [45].
Figure 5.1 Screening of Drug-Like Compounds into Activity Criteria.
CADD can be structure based or ligand based. Structure-based CADD uses knowledge of the target protein structure to determine the interaction levels of all compounds being examined. Ligand-based CADD relies on chemical similarity criteria and on the predictive quantitative structure–activity relationship (QSAR) models that it builds from molecules known to be active or inactive [46]. QSAR modeling enables an understanding of the influence of structural factors on biological activity; the models and that understanding are then used to construct compounds with improved and optimal biological profiles. Other methods for the quantitative description of structural change are comparative molecular field analysis (CoMFA) and GRID.
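One widely used chemical similarity criterion is the Tanimoto coefficient computed on molecular fingerprints. The text does not name a specific measure, so its use here is an assumption, and the fingerprints below are made-up sets of "on" bit positions rather than output from any real fingerprinting tool.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


# Hypothetical fingerprints: one known active and three library candidates.
known_active = {1, 4, 7, 9, 12}
candidates = {
    "cand-A": {1, 4, 7, 9, 13},   # shares most bits with the active
    "cand-B": {2, 3, 5, 8},       # shares none
    "cand-C": {1, 4, 20, 21},     # shares a few
}

# Rank candidates by similarity to the known active, most similar first.
ranking = sorted(
    candidates,
    key=lambda name: tanimoto(known_active, candidates[name]),
    reverse=True,
)
print(ranking)
```

The same pairwise measure can feed a clustering step, or serve as a nearest-neighbor baseline against which a fitted QSAR model is compared.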
Structure-based CADD is the preferred choice for soluble proteins that can be crystallized, while ligand-based CADD is better suited to designing compounds with high binding affinity to the target, devoid of off-target effects, and with minimal free energy and favorable drug metabolism and pharmacokinetic/ADMET properties [47]. In general, ligand-based CADD is better suited for occasions where sparse structural information is available. This is usually the case for membrane protein targets.
5.3.3 Structure–Activity Relationship in Drug Discovery
Structure–activity relationships (SAR) explore the relationship between a molecule’s biological activity and its three-dimensional structure; computational chemistry is employed only if the target structure is known. Structure-aided drug design uses crystal structures in the design of molecules. It is often used as an adjunct to other screening strategies within big pharma. A compound is docked into the crystal structure in order to predict where modifications could be added to increase potency or selectivity.
Since the early 1980s, drug discovery has relied principally on a target protein’s structure to aid drug design. One outstanding strategy is to design compounds through a design-make-test-analyze cycle, in which expert designers from varied pharmaceutical disciplines interact to streamline the selection of a candidate compound that is taken forward to synthesis. This involves medicinal, computational, synthetic, and physical chemists as well as pharmacokinetics (PK) experts on each project [48–50]. Molecular modeling software packages are often useful for identifying binding site interactions, but when there is no previous knowledge of the target structure, traditional noncomputational methods of studying SARs are employed.
Drug molecules typically contain several functional groups, which can interact with certain groups in the biological target. Biological targets respond to compound properties rather than to structure as such, but these have to be determined by downstream processes for clinical applications. Structure-based computer-aided drug design (SBCADD) is a promising approach for analyzing the 3D structures of biological molecules. Its operating principle is that a molecule’s ability to interact with a specific protein and exert a desired biological effect relies on its ability to interact favorably with a particular binding site on that protein, and that any molecule with this ability could exert a similar biological effect. This has allowed the elucidation of novel compounds based on their interaction potential with the protein’s binding site. Structural information about the target is a prerequisite for any SBCADD project.
Computational tools offer the opportunity to exploit a database of new, potential compounds predicted using QSAR models.
A combinatorial approach to constructing chemical libraries results in a chemical space different from that of known drugs and natural products. Because of this, QSAR has been useful in guiding combinatorial library synthesis, producing libraries that can be screened for targeted drug classes and that cover chemical space widely with a higher likelihood of targeted hits [51].
Pharmaceutical researchers use 3D structural techniques such as protein docking and pharmacophore similarity to identify similar biological activities. An added benefit is that results can be visualized: compounds can be viewed docked into the protein structure. This enables the selection of highly promising compounds that exhibit favorable protein docking and hence potency. Different structures can act at the same biological target to elicit the same biological action, but the chemical structures, which are mostly atoms separated by bonds, are not what is recognized by a biological target. Two molecules devoid of any structural relationship to one another could mimic each other in occupying the same site on a common target, yet could exhibit very diverse activities at the target. The electrostatic fields are dynamic and change with the orientation or shape of the drug compound, and it is the alignment of the resultant electrostatic field with that of the protein target that accounts for target-drug selectivity [52]. The results of SAR studies can provide information on the intermolecular interactions that are established at the binding site.
5.3.3.1 Limitations of SAR Application
Increasing sophistication in drug discovery tools and technologies has still not completely resolved the most crucial need: preclinical ADME (absorption, distribution, metabolism, and excretion), safety, and toxicity problems, which cause compounds either to fail to win Food and Drug Administration (FDA) approval or to lead to unanticipated health issues once on the market. ADME prediction with the available models has been an arduous task; even when the models are not predictive enough, they can at least guide the drug developer into safer chemical space [53].
For example, building a chemotype has become a challenge for the chemist because the probability of success is uncertain: structural features do not give precise information about compound activity or about all the required attributes of a candidate drug [52]. This requires a huge investment, which might be lost if no alternative viable chemotypes are found. Individual (Q)SAR in silico toxicological methods do not consider dose and exposure unless the exposure–response relationship is part of the study; they therefore predict toxicity independently of exposure. For example, an aromatic nitro group, which triggers a structural alert for carcinogenicity, may pass undetected if the fragment is contained in a chemical that is minimally exposed in the system; the prediction therefore remains questionable. Prediction of a toxicology endpoint is most often performed on the parent chemical structure. A study that excludes the metabolite can be misleading, especially where the metabolite is the principal source of toxicity, and this has implications for the drug discovery process [54,55].
5.3.4 Pharmacophore Models of Drug Targeting
The pharmacophore is a description of the main molecular features necessary for biological activity and their relative positions in space.
Knowing the key binding site interactions and the pharmacophore, the structural features of lead compounds can be modified to give desired properties. A pharmacophore model of the target binding site is based on the sum of the steric and electronic features that underpin optimal interaction of a ligand with its target. Common features used to define pharmacophore maps are acidic and basic groups, partial charges of hydrogen bond acceptors and donors, aliphatic hydrophobic moieties, and aromatic hydrophobic moieties [56]. A pharmacophore feature map is carefully constructed to categorize all functional groups with similar physicochemical properties (i.e., similar ionizability or hydrogen bonding behavior) as one. Feature definitions may instead be specific atom types at specific locations. Broad feature categories allow the identification of novel scaffolds but also lead to an increase in false positives. These molecular attributes have been applied in VS, de novo design, and lead optimization [57].
Pharmacophore algorithm software packages used for ligand-based pharmacophore generation include Molecular Operating Environment (MOE), Catalyst [58], the Genetic Algorithm Similarity Program (GASP) [59], and Phase [60].
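The idea of a pharmacophore as feature types plus their relative positions can be illustrated with a small distance-based matcher. This is a simplified sketch and not the algorithm used by any of the packages named above: the feature labels, coordinates, and the 1.0 Å tolerance are hypothetical, only one conformation per molecule is considered, and comparing pairwise distances alone cannot distinguish mirror-image arrangements.

```python
import math
from itertools import permutations


def matches_pharmacophore(query, features, tolerance=1.0):
    """Check whether some subset of `features` reproduces the query geometry.

    `query` and `features` are lists of (feature_type, (x, y, z)) tuples.
    A match needs the same feature types with every pairwise distance
    within `tolerance` of the corresponding query distance.
    """
    n = len(query)
    for candidate in permutations(features, n):
        if any(c[0] != q[0] for c, q in zip(candidate, query)):
            continue  # feature types do not line up in this assignment
        distances_ok = all(
            abs(
                math.dist(candidate[i][1], candidate[j][1])
                - math.dist(query[i][1], query[j][1])
            )
            <= tolerance
            for i in range(n)
            for j in range(i + 1, n)
        )
        if distances_ok:
            return True
    return False


# Hypothetical three-point pharmacophore (coordinates in angstroms).
query = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.0, 0.0, 0.0)),
    ("aromatic", (0.0, 4.0, 0.0)),
]

# Same geometry shifted in space, plus an extra feature: should match.
molecule_a = [
    ("hydrophobic", (9.0, 9.0, 9.0)),
    ("donor", (1.0, 1.0, 1.0)),
    ("acceptor", (4.0, 1.0, 1.0)),
    ("aromatic", (1.0, 5.0, 1.0)),
]

# Acceptor far too distant from the donor: should not match.
molecule_b = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (8.0, 0.0, 0.0)),
    ("aromatic", (0.0, 4.0, 0.0)),
]

print(matches_pharmacophore(query, molecule_a))
print(matches_pharmacophore(query, molecule_b))
```

Loosening the tolerance or merging feature types admits more scaffolds at the cost of more false positives, which mirrors the trade-off noted in the text.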
5.3.4.1 How Does the Protein Recognize its Binding Partner?
A protein recognizes its cognate drug compound through grooves, indentations, or complementary moieties, just like a lock and key, as in antigen–antibody interactions. Recognition also involves electron density, or electrostatic interactions from nearby charges, which produce electrostatic fields that change along with the conformation of the drug molecule. The hydrophobic effect causes molecules to aggregate so as to exclude polar water molecules; “Log P” is an index for quantifying this effect. Two molecules from different classes can engage the same biological site when they have the same type of field pattern. Comparing these fields can therefore match up or find similar biological activities, and this is widely applicable in drug discovery and development. Any active compound displays a field pattern, and this pattern, rather than the structure, can be used to virtually screen a database, since field patterns are not based on structures. The library would consist of commercially available molecules that could be further developed into leads.
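For reference, Log P is conventionally the base-10 logarithm of a compound's partition coefficient between octanol and water. The text does not spell out this definition, so the sketch below states it as an assumption; the concentrations are made-up values in arbitrary but matching units.

```python
import math


def log_p(conc_octanol, conc_water):
    """log10 of the octanol/water partition coefficient of the neutral compound."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


# A hypothetical compound 100 times more concentrated in octanol is lipophilic.
lipophilic = log_p(100.0, 1.0)
# A hypothetical compound 1000 times more concentrated in water is hydrophilic.
hydrophilic = log_p(1.0, 1000.0)
print(lipophilic, hydrophilic)
```

Positive values indicate a preference for the nonpolar phase and negative values a preference for water.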
5.3.4.2 Limitations of the Pharmacophore Approaches
Because these methods use a statistical approach, molecular descriptors and experimental data are used to model complex biological processes [61]. The rules for drug-likeness [5,6], which rely on simple physicochemical properties, are also well known and widely implemented [62]. Their limitation is a reliance on the quality of experimental data, which might not always be available [63].
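A simple rule-based drug-likeness filter might look like the sketch below. It assumes that the rules referred to are of the Lipinski rule-of-five type (molecular weight at most 500 Da, log P at most 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors, with one violation tolerated); the text does not name the rules, so this is an assumption. The `Descriptors` type and both example compounds are hypothetical, and the descriptor values are taken as precomputed inputs.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed physicochemical properties of one compound (hypothetical)."""
    mol_weight: float       # daltons
    log_p: float            # octanol/water partition coefficient (log10)
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d):
    """Count how many of the four rule-of-five limits the compound exceeds."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def is_drug_like(d, max_violations=1):
    """Treat a compound as drug-like if it breaks at most `max_violations` limits."""
    return rule_of_five_violations(d) <= max_violations


# Made-up descriptor values for two hypothetical compounds.
small_polar = Descriptors(mol_weight=350.0, log_p=2.5, h_bond_donors=2, h_bond_acceptors=5)
large_greasy = Descriptors(mol_weight=720.0, log_p=6.3, h_bond_donors=2, h_bond_acceptors=5)

print(is_drug_like(small_polar), is_drug_like(large_greasy))
```

Filters of this kind are cheap to apply to a whole library, but they inherit the limits of the experimental data from which the thresholds were derived.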
Strategies that consider protein flexibility include the use of 3D structures of ADMET proteins, molecular docking, and others. These indicate that such proteins are difficult to investigate, partly because of their huge and flexible ligand-binding cavities, which can interact with a wide range of ligands. In silico toxicology prediction has been a demanding endeavor because toxicological effects result from changes in multiple physiological processes [64].
Both internal and external solutions have been proposed, including regulatory guidance. Another proposed solution is an integrated workflow that combines data extraction, quantitative structure–activity relationships, and read-across methods.
Advances in this field are leading to a transition to a new paradigm of the discovery process, as exemplified by the Toxicity Testing in the 21st Century initiative [64].
5.3.5 Cheminformatics and Bioinformatics Technology
Computer technology and biology, particularly genomics, have revolutionized drug discovery. This combination has been termed bioinformatics, while the specialized computational programs that deal with chemical data, and the associated tools for prioritizing drug compounds, are known as cheminformatics. Bioinformatics and cheminformatics approaches have been very useful at crucial points in the drug discovery process for handling the data associated with complex molecular mechanisms, which requires extensive integration of structure and bioactivity data generated at various points. Their particular benefits stem from their ability to scrutinize large data volumes for data uniformity, security, and management, so that islands of data developed and maintained within each department can be exchanged.
These techniques have challenged drug developers to learn more about computers and the aspects that interface with the chemistry or biology of drug discovery. Successful integration of such knowledge has resulted in the availability of some of the largest databases in the world. Celera, Inc.’s database of human genome information is reportedly the largest private database in the United States.
Through bioinformatics, there is better control of sequence data derived from the Human Genome Project, irrespective of size and level of complexity [65]. Its application in data mining has enabled the finding of disease targets, drawing on a wide range of data sources associated with the biological approaches. One example links genotype with phenotype: a genetic polymorphism and the risk of disease progression, where amyloid precursor proteins lead to increased formation of the A beta peptide, whose deposition in the brain is associated with Alzheimer’s disease [66].
These technologies offer a great deal of confidence in handling the complex data associated with intricate molecular mechanisms, which require extensive integration of structure and bioactivity data generated at various points in the product lifecycle. However, there are still concerns about managing the enormous volume of data generated by analyzing multiple large databases simultaneously, which places special demands on software, hardware, and behavior developments. Before the inception of bioinformatics and cheminformatics, handling even large individual databases was a daunting task. Informatics technologies are now addressing this, although the challenge of efficiently correlating such diversified information remains [67].
The United States National Center for Biotechnology Information (NCBI) has been responsible for keeping databases for genome projects. GenBank is part of an international collaboration of three databases, run by the United States, the European Molecular Biology Laboratory (EMBL), and the DNA Data Bank of Japan (DDBJ), that constantly exchange database information. This exchange has created problems that require continual updating of records in the older databases, where terms and tags are not consistent with the new format. Private companies are building web interfaces for their database offerings. Celera and Incyte offer web subscriptions to their proprietary databases and customized analysis tools. It appears that the bioinformatics child will continue to speak the web language.
BankIt is a web submission program that includes the top 100 most-accessed information. Sequin is GenBank’s submission program; it runs on several platforms, handles complicated entries, and can locate errors such as missing organism information, incorrect coding region lengths, mismatched amino acids, internal stop codons in coding regions, and more.
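Two of the checks mentioned above, coding region length and internal stop codons, are simple enough to sketch. The code below is not Sequin's implementation; it is a minimal illustration that assumes the standard genetic code, a DNA sequence already in the correct reading frame, and a hypothetical function name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def check_coding_region(cds):
    """Return a list of problems found in a DNA coding sequence.

    Checks that the length is a multiple of three and that no stop codon
    occurs before the final codon.
    """
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Split into complete codons, ignoring any trailing partial codon.
    complete_length = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, complete_length, 3)]
    # Every codon except the last one is "internal".
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"internal stop codon {codon} at codon {index + 1}")
    return problems


print(check_coding_region("ATGAAATAA"))     # well-formed toy sequence
print(check_coding_region("ATGTAGAAATAA"))  # stop codon in the middle
print(check_coding_region("ATGAAATA"))      # truncated sequence
```

A production validator would also handle alternative genetic codes, ambiguous bases, and partial coding regions, none of which are covered here.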
Information exchange in drug discovery would have been more constrained if it were not for the increasing application of informatics [68].
5.4 FROM HIT TO LEAD: SUMMARY OF COMPOUND