2017-03-06 Macromolecular X-ray crystallography is one of the main experimental techniques to visualize protein–ligand interactions. The high complexity of the ligand universe, however, has delayed the development of efficient methods for the automated identification, fitting and validation of ligands in their electron-density clusters. The identification and fitting are primarily based on the density itself and do not take into account the protein environment, which is a step that is only taken during the validation of the proposed binding mode. Here, a new approach, based on the estimation of the major energetic terms of protein–ligand interaction, is introduced for the automated identification of crystallographic ligands in the indicated binding site with ARP/wARP. The applicability of the method to the validation of protein–ligand models from the Protein Data Bank is demonstrated by the detection of models that are 'questionable' and the pinpointing of unfavourable interatomic contacts.
2017-03-06 Coot is a molecular-graphics program primarily aimed at model building using X-ray data. Recently, tools for the manipulation and representation of ligands have been introduced. Here, these new tools for ligand validation and comparison are described. Ligands in the wwPDB have been scored by density-fit, distortion and atom-clash metrics. The distributions of these scores can be used to assess the relative merits of the particular ligand in the protein–ligand complex of interest by means of 'sliders' akin to those now available for each accession code on the wwPDB websites.
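The idea of placing one ligand's score within a reference distribution can be illustrated with a small percentile-rank calculation. This is a minimal sketch and not Coot's or the wwPDB's implementation: the function name `percentile_rank`, the half-weight treatment of ties and the example numbers are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(reference_scores, score, higher_is_better=True):
    """Place `score` within a reference distribution, on a 0-100 scale.

    Hypothetical helper: ties are counted with half weight, which is one
    common convention and not necessarily the one used by any real tool.
    """
    ordered = sorted(reference_scores)
    if not ordered:
        raise ValueError("reference_scores must not be empty")
    below = bisect_left(ordered, score)           # scores strictly lower
    ties = bisect_right(ordered, score) - below   # scores exactly equal
    rank = 100.0 * (below + 0.5 * ties) / len(ordered)
    # For metrics where small values are good (e.g. clashes), flip the scale.
    return rank if higher_is_better else 100.0 - rank


# Illustrative numbers only, not real wwPDB statistics.
density_fit_reference = [0.5, 0.6, 0.7, 0.8, 0.9]
density_fit_slider = percentile_rank(density_fit_reference, 0.8)
clash_slider = percentile_rank([1, 2, 3, 4], 1, higher_is_better=False)
```

A value near 100 would then sit at the favourable end of a slider for that metric, and each of the three metrics named above would get its own reference distribution.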
2017-02-28 The de facto commoditization of biomolecular crystallography as a result of almost disruptive instrumentation automation and continuing improvement of software allows any sensibly trained structural biologist to conduct crystallographic studies of biomolecules with reasonably valid outcomes: that is, models based on properly interpreted electron density. Robust validation has led to major mistakes in the protein part of structure models becoming rare, but some depositions of protein–peptide complex structure models, which generally carry significant interest to the scientific community, still contain erroneous models of the bound peptide ligand. Here, the protein small-molecule ligand validation tool Twilight is updated to include peptide ligands. (i) The primary technical reasons and potential human factors leading to problems in ligand structure models are presented; (ii) a new method used to score peptide-ligand models is presented; (iii) a few instructive and specific examples, including an electron-density-based analysis of peptide-ligand structures that do not contain any ligands, are discussed in detail; (iv) means to avoid such mistakes and the implications for database integrity are discussed and (v) some suggestions as to how journal editors could help to expunge errors from the Protein Data Bank are provided.
2017-02-22 Metals are essential in many biological processes, and metal ions are modeled in roughly 40% of the macromolecular structures in the Protein Data Bank (PDB). However, a significant fraction of these structures contain poorly modeled metal-binding sites. CheckMyMetal (CMM) is an easy-to-use metal-binding site validation server for macromolecules that is freely available at http://csgid.org/csgid/metal_sites. The CMM server can detect incorrect metal assignments as well as geometrical and other irregularities in the metal-binding sites. Guidelines for metal-site modeling and validation in macromolecules are illustrated by several practical examples grouped by the type of metal. These examples show CMM users (and crystallographers in general) problems they may encounter during the modeling of a specific metal ion.
2017-02-22 The Cambridge Structural Database (CSD) is the worldwide resource for the dissemination of all published three-dimensional structures of small-molecule organic and metal–organic compounds. This paper briefly describes how this collection of crystal structures can be used en masse in the context of macromolecular crystallography. Examples highlight how the CSD and associated software aid protein–ligand complex validation, and show how the CSD could be further used in the generation of geometrical restraints for protein structure refinement.
2017-02-22 Many ligand-discovery stories tell of the use of structures of protein–ligand complexes, but the contribution of structural chemistry is such a core part of finding and improving ligands that it is often overlooked. More than 800 000 crystal structures are available to the community through the Cambridge Structural Database (CSD). Individually, these structures can be of tremendous value and the collection of crystal structures is even more helpful. This article provides examples of how small-molecule crystal structures have been used to complement those of protein–ligand complexes to address challenges ranging from affinity, selectivity and bioavailability through to solubility.
2017-03-06 The steady expansion in the capacity of modern beamlines for high-throughput data collection, enabled by increasing X-ray brightness, capacity of robotics and detector speeds, has pushed the bottleneck upstream towards sample preparation. Even in ligand-binding studies using crystal soaking, the experiment best able to exploit beamline capacity, a primary limitation is the need for gentle and nontrivial soaking regimens such as stepwise concentration increases, even for robust and well characterized crystals. Here, the use of acoustic droplet ejection for the soaking of protein crystals with small molecules is described, and it is shown that it is both gentle on crystals and allows very high throughput, with 1000 unique soaks easily performed in under 10 min. In addition to having very low compound consumption (tens of nanolitres per sample), the positional precision of acoustic droplet ejection enables the targeted placement of the compound/solvent away from crystals and towards drop edges, allowing gradual diffusion of solvent across the drop. This ensures both an improvement in the reproducibility of X-ray diffraction and increased solvent tolerance of the crystals, thus enabling higher effective compound-soaking concentrations. The technique is detailed here with examples from the protein target JMJD2D, a histone lysine demethylase with roles in cancer and the focus of active structure-based drug-design efforts.
2017-03-06 Although noncovalent binding by small molecules cannot be assumed a priori to be stoichiometric in the crystal lattice, occupancy refinement of ligands is often avoided by convention. Occupancies tend to be set to unity, requiring the occupancy error to be modelled by the B factors, and residual weak density around the ligand is necessarily attributed to 'disorder'. Where occupancy refinement is performed, the complementary, superposed unbound state is rarely modelled. Here, it is shown that superior accuracy is achieved by modelling the ligand as partially occupied and superposed on a ligand-free 'ground-state' model. Explicit incorporation of this model of the crystal, obtained from a reference data set, allows constrained occupancy refinement with minimal fear of overfitting. Better representation of the crystal also leads to more meaningful refined atomic parameters such as the B factor, allowing more insight into dynamics in the crystal. An outline of an approach for algorithmically generating ensemble models of crystals is presented, assuming that data sets representing the ground state are available. The applicability of various electron-density metrics to the validation of the resulting models is assessed, and it is concluded that ensemble models consistently score better than the corresponding single-state models. Furthermore, it appears that ignoring the superposed ground state becomes the dominant source of model error, locally, once the overall model is accurate enough; modelling the local ground state properly is then more meaningful than correcting all remaining model errors globally, especially for low-occupancy ligands. Implications for the simultaneous refinement of B factors and occupancies, and for future evaluation of the limits of the approach, in particular its behaviour at lower data resolution, are discussed.
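The constraint behind a superposed two-state model can be shown with a toy calculation: if the bound state has occupancy q, the ground state is assigned 1 - q, so only one parameter is free. The sketch below is an assumption-laden illustration and not the refinement procedure of the paper. It treats density as a plain list of values sampled at the same points for each state, and the function name `constrained_occupancy` and the least-squares fit are hypothetical choices.

```python
def constrained_occupancy(observed, bound, ground):
    """Estimate the bound-state occupancy q under the constraint q + (1 - q) = 1.

    Toy model (assumption): observed ~ q * bound + (1 - q) * ground at every
    sample point. Returns (q_bound, q_ground), with q clipped to [0, 1].
    """
    if not (len(observed) == len(bound) == len(ground)):
        raise ValueError("all three density samples must have the same length")
    # Closed-form least-squares solution for the single free parameter q.
    numerator = sum((o - g) * (b - g) for o, b, g in zip(observed, bound, ground))
    denominator = sum((b - g) ** 2 for b, g in zip(bound, ground))
    if denominator == 0:
        raise ValueError("bound and ground states are indistinguishable")
    q = min(1.0, max(0.0, numerator / denominator))
    return q, 1.0 - q


# Synthetic example: density built from 30% bound state and 70% ground state.
ground_state = [1.0, 0.0, 2.0]
bound_state = [0.0, 3.0, 2.0]
observed_density = [0.3 * b + 0.7 * g for b, g in zip(bound_state, ground_state)]
q_bound, q_ground = constrained_occupancy(observed_density, bound_state, ground_state)
```

Because the two occupancies are tied together, the fit has a single degree of freedom, which is the sense in which a constrained model limits the room for overfitting.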
2017-02-24 XChemExplorer (XCE) is a data-management and workflow tool to support large-scale simultaneous analysis of protein–ligand complexes during structure-based ligand discovery (SBLD). The user interfaces of established crystallographic software packages such as CCP4 [Winn et al. (2011), Acta Cryst. D67, 235–242] or PHENIX [Adams et al. (2010), Acta Cryst. D66, 213–221] have entrenched the paradigm that a 'project' is concerned with solving one structure. This does not hold for SBLD, where many almost identical structures need to be solved and analysed quickly in one batch of work. Functionality to track progress and annotate structures is essential. XCE provides an intuitive graphical user interface which guides the user from data processing, initial map calculation, ligand identification and refinement through to data dissemination. It provides multiple entry points depending on the needs of each project, enables batch processing of multiple data sets and records metadata, progress and annotations in an SQLite database. XCE is freely available and works on any Linux and Mac OS X system; its only dependency is an installation of the latest version of CCP4. The design and usage of this tool are described here, and its usefulness is demonstrated in the context of fragment-screening campaigns at the Diamond Light Source. It is routinely used to analyse projects comprising 1000 data sets or more, and therefore scales well to even very large ligand-design projects.
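Recording per-data-set progress and annotations in SQLite can be sketched with Python's standard `sqlite3` module. The table name `datasets`, its columns, the stage labels and the helper functions below are hypothetical and chosen only for illustration. They are not XCE's actual schema or API.

```python
import sqlite3


def open_tracking_db(path=":memory:"):
    """Open (or create) a small tracking database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS datasets (
               sample_id  TEXT PRIMARY KEY,
               compound   TEXT,
               stage      TEXT NOT NULL DEFAULT 'collected',
               annotation TEXT
           )"""
    )
    return conn


def record_progress(conn, sample_id, stage, compound=None, annotation=None):
    """Create the row if needed, then update its stage and optional fields."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO datasets (sample_id) VALUES (?)", (sample_id,)
        )
        # COALESCE keeps the stored value when no new value is supplied.
        conn.execute(
            """UPDATE datasets
                  SET stage = ?,
                      compound = COALESCE(?, compound),
                      annotation = COALESCE(?, annotation)
                WHERE sample_id = ?""",
            (stage, compound, annotation, sample_id),
        )


def count_by_stage(conn):
    """Summarize how many data sets have reached each stage."""
    return dict(conn.execute("SELECT stage, COUNT(*) FROM datasets GROUP BY stage"))


conn = open_tracking_db()
record_progress(conn, "x0001", "collected", compound="frag-1")
record_progress(conn, "x0002", "refined")
record_progress(conn, "x0001", "refined", annotation="ligand placed")
summary = count_by_stage(conn)
```

Keeping one row per data set and updating it in place is one simple way to make batch progress queryable. A real tool would likely track many more fields than this sketch does.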
2017-02-24 In this work, two freely available web-based interactive computational tools that facilitate the analysis and interpretation of protein–ligand interaction data are described. The first, WONKA, assists in uncovering interesting and unusual features (for example residue motions) within ensembles of protein–ligand structures and enables the facile sharing of observations between scientists. The second, OOMMPPAA, incorporates protein–ligand activity data with protein–ligand structural data using three-dimensional matched molecular pairs. OOMMPPAA highlights nuanced structure–activity relationships (SAR) and summarizes available protein–ligand activity data in the protein context. In this paper, the background that led to the development of both tools is described. Their implementation is outlined and their utility is described using in-house Structural Genomics Consortium (SGC) data sets and openly available data from the PDB and ChEMBL. Both tools are freely available to use and download at http://wonka.sgc.ox.ac.uk/WONKA/ and http://oommppaa.sgc.ox.ac.uk/OOMMPPAA/.