Docking & Scoring in Drug Discovery
1. Introduction
The number of proteins with known 3-D structure is growing steadily, thanks to improvements in structure-determination techniques.
These proteins are chosen as therapeutic targets.
One key methodology, docking small molecules into protein binding sites, is an active area of research on virtual screening techniques in computer-aided drug discovery.
Uses of docking: hit identification, lead optimization, drug metabolism analysis.
2. Terminology
Docking: the prediction of ligand conformation and orientation (posing) within a targeted binding site.
Its 2 aims are accurate structural modeling and correct prediction of activity.
3 basic categories of ligand-flexibility treatment: systematic methods, random methods, simulation methods.
Scoring: the evaluation and ranking of predicted ligand conformations, a crucial aspect of structure-based virtual screening.
2 uses of scoring: pose score and rank score.
3 types of scoring functions: force-field-based scoring, empirical scoring, knowledge-based scoring.
Virtual Screening: structure-based virtual screening (SBVS) is a proven technique for lead discovery. 3 processes of an SBVS tool: ligand conformational search protocols, varying site-point definitions, alternating sampling variables. (http://www.lib.uchicago.edu/cinf/220nm/slides/220nm03/220nm03.ppt)
Posing: determining whether a given conformation and orientation of a ligand fits the active site. This is usually a fuzzy procedure that returns many alternative results.
Ranking: a more advanced process than pose scoring that typically takes several results from an initial scoring phase and re-evaluates them. It usually attempts to estimate the free energy of binding as accurately as possible, typically with more elaborate calculations such as entropic contributions.
3. Docking: Procedure & Methods
Docking:
1) Application of docking algorithms that pose small molecules in the active site.
2) Complementing the algorithms with scoring functions designed to predict biological activity by evaluating the interactions between compounds and potential targets.
3) Complications in accurately predicting binding conformations and compound activity.
3 basic representations of the receptor: atomic, surface, grid.
Atomic: only used in conjunction with a potential energy function, often only during final ranking procedures. [13]
Surface-based: typically used in protein-protein docking with a rigid-body approximation. [14, 15]
Potential energy grids: the receptor’s energetic contributions, basically 2 types of potential (electrostatic and van der Waals), are stored on grid points so that they only need to be read during ligand scoring. [Box2, 19]
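To make the potential energy grid idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method of any particular docking program: the function names (build_grids, interpolate, score_ligand), the single probe atom type with fixed Lennard-Jones parameters, the 0.5 Angstrom distance clamp, and the trilinear interpolation are all choices made for this example. The receptor potentials are computed once on the grid, and ligand scoring then only reads values from it.

```python
import math

COULOMB_K = 332.06  # kcal*Angstrom/(mol*e^2); a commonly used conversion factor

def build_grids(receptor_atoms, origin, spacing, shape, eps=0.15, sigma=3.5):
    """Precompute electrostatic and van der Waals potentials on a regular grid.

    receptor_atoms: list of (x, y, z, partial_charge) tuples.
    eps, sigma: assumed Lennard-Jones parameters for one probe atom type.
    """
    nx, ny, nz = shape
    elec = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    vdw = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                node = (origin[0] + i * spacing,
                        origin[1] + j * spacing,
                        origin[2] + k * spacing)
                for x, y, z, q in receptor_atoms:
                    # Clamp the distance to avoid the singularity at r = 0.
                    r = max(math.dist(node, (x, y, z)), 0.5)
                    elec[i][j][k] += COULOMB_K * q / r  # potential for a unit charge
                    s6 = (sigma / r) ** 6
                    vdw[i][j][k] += 4.0 * eps * (s6 * s6 - s6)
    return elec, vdw

def interpolate(grid, origin, spacing, point):
    """Trilinear interpolation; assumes the point lies inside the grid."""
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    base, frac = [], []
    for d in range(3):
        f = (point[d] - origin[d]) / spacing
        i0 = min(max(int(math.floor(f)), 0), dims[d] - 2)
        base.append(i0)
        frac.append(f - i0)
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                w = ((frac[0] if di else 1.0 - frac[0])
                     * (frac[1] if dj else 1.0 - frac[1])
                     * (frac[2] if dk else 1.0 - frac[2]))
                value += w * grid[base[0] + di][base[1] + dj][base[2] + dk]
    return value

def score_ligand(ligand_atoms, elec, vdw, origin, spacing):
    """Sum grid lookups over ligand atoms given as (x, y, z, partial_charge)."""
    total = 0.0
    for x, y, z, q in ligand_atoms:
        total += q * interpolate(elec, origin, spacing, (x, y, z))
        total += interpolate(vdw, origin, spacing, (x, y, z))
    return total
```

The cost of build_grids grows with the number of grid points times the number of receptor atoms, but it is paid once; score_ligand touches only eight grid values per ligand atom, which is the point of storing the receptor contribution in advance.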
Algorithms to treat ligand flexibility:
1) Systematic Search: tries to explore all the degrees of freedom in a molecule, but ultimately faces the problem of combinatorial explosion. As a result, ligands are often incrementally grown into active sites.
Steps:
a) Dock various molecular fragments into the active-site region and link them covalently
or
a) Divide the ligand into a rigid part (core fragment) and flexible parts. Once the rigid cores have been defined, dock them into the active site.
b) Add the flexible regions in an incremental fashion (DOCK, FlexX)
or
b) Rebuild the ligand from fragments that have acceptable initial scores (Hammerhead algorithm)
or
Use libraries of pre-generated conformations to reduce the search problem to a rigid-body docking procedure (FLOG).
2) Random Search: makes random changes to either a single ligand or a population of ligands. Each newly obtained ligand is evaluated on the basis of a pre-defined probability function. (Monte Carlo, genetic algorithms)
3) Simulation methods: Molecular dynamics, energy minimization methods.
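The random search in item 2) can be sketched with a Metropolis Monte Carlo loop. This is a minimal illustration under stated assumptions: the pose is reduced to three translational coordinates, toy_score is a hypothetical stand-in for a real scoring function, and the acceptance rule (the Metropolis criterion with an assumed kT) is one common choice of "pre-defined probability function", not necessarily the one used by any program named in these notes.

```python
import math
import random

def toy_score(pose):
    """Hypothetical scoring function: a quadratic well centred on (1, -2, 0.5)."""
    target = (1.0, -2.0, 0.5)
    return sum((p - t) ** 2 for p, t in zip(pose, target))

def metropolis_search(score, start, steps=5000, step_size=0.5, kT=0.6, seed=0):
    """Random search over ligand poses with Metropolis acceptance."""
    rng = random.Random(seed)
    current = list(start)
    e_current = score(current)
    best, e_best = list(current), e_current
    for _ in range(steps):
        # Random change to the current pose.
        trial = [c + rng.uniform(-step_size, step_size) for c in current]
        e_trial = score(trial)
        delta = e_trial - e_current
        # Always accept improvements; accept worse poses with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best

best_pose, best_energy = metropolis_search(toy_score, (5.0, 5.0, 5.0))
```

Occasionally accepting a worse pose is what lets the search leave local minima; a genetic algorithm applies the same evaluate-and-select idea to a population of ligands instead of a single one.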
Algorithms to treat protein flexibility: molecular dynamics, Monte Carlo calculations, rotamer libraries, protein ensemble grids.
Docking programs grouped by ligand-flexibility search method:
Random/stochastic: AutoDock, MOE-Dock, GOLD, PRO_LEADS
Systematic: DOCK (incremental), FlexX (incremental), Glide (incremental), Hammerhead (incremental), FLOG (database)
Simulation: DOCK, Glide, MOE-Dock, AutoDock, Hammerhead
4. Scoring: Procedure & Methods
1) Force-field-based scoring: considers only a single protein conformation, which makes it possible to omit the calculation of internal protein energy and greatly simplifies scoring. [26, 45, 46]
2) Empirical scoring functions: fitted to reproduce experimental data, such as binding energies and/or conformations, as a sum of several parameterized functions. [48, 50, 51, 52, 53]
3) Knowledge-based scoring functions: protein–ligand complexes are modelled using relatively simple atomic interaction-pair potentials. They attempt to implicitly capture binding effects that are difficult to model explicitly.
4) Consensus scoring: combines information from different scores to balance errors in single scores and improve the probability of identifying ‘true’ ligands.[59, 60]
5) Evaluating scoring schemes:
a) Force-field terms generally perform better than the PMF knowledge-based function. [linear discriminant analysis, 61]
b) Simple contact scores outperformed force-field treatment under pharmacophore constraints and conformational flexibility. [62, 63, 64]
c) Knowledge-based and force-field-based scoring methods have very similar abilities to predict the correct binding modes. [65]
d) A combination of scoring functions improves performance over purely force-field or purely knowledge-based scoring. [59, 66]
e)
6) Posing versus scoring:
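Returning to consensus scoring in item 4) above, the idea of combining several scores can be sketched as a rank-by-rank scheme. This is only one possible combination rule, chosen here for illustration: each scoring function ranks the compounds, and the ranks are averaged. The compound names and score values are invented for the example, lower scores are assumed to be better, and ties are not handled.

```python
def ranks(scores):
    """Map each compound to its rank under one scoring function (1 = best)."""
    ordered = sorted(scores, key=scores.get)  # lower score is assumed better
    return {name: position + 1 for position, name in enumerate(ordered)}

def consensus_rank(score_tables):
    """Average the per-function ranks and order compounds by mean rank."""
    per_function = [ranks(table) for table in score_tables]
    names = list(score_tables[0])
    mean_rank = {
        name: sum(r[name] for r in per_function) / len(per_function)
        for name in names
    }
    return sorted(names, key=mean_rank.get), mean_rank

# Hypothetical scores for four compounds from three kinds of scoring function.
force_field = {"A": -9.1, "B": -7.4, "C": -8.0, "D": -5.2}
empirical = {"A": -6.0, "B": -7.5, "C": -7.0, "D": -3.1}
knowledge_based = {"A": -40.0, "B": -35.0, "C": -42.0, "D": -20.0}

order, mean_rank = consensus_rank([force_field, empirical, knowledge_based])
```

In this made-up data no single function ranks compound C first every time, yet C has the best mean rank, which shows how averaging can balance the errors of individual scores.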
5. Conclusion and Future Research