Preview: Quantitative Structure-Activity Relationships

Molecular Informatics

Wiley Online Library : Molecular Informatics

Published: 2018-01-01T00:00:00-05:00


Prediction of Protein−compound Binding Energies from Known Activity Data: Docking-score-based Method and its Applications


We used protein−compound docking simulations to develop a structure-based quantitative structure−activity relationship (QSAR) model. The prediction model used docking scores as descriptors. The binding free energy was approximated by a weighted average of docking scores for multiple proteins. This approximation was based on a pharmacophore model of receptor pockets and compounds. The weights of the docking scores were restricted to small values to avoid unrealistic weights by a regularization term. Additional outlier elimination improved the results. We applied this method to two groups of targets. The first target was the kinase family. The cross-validation results of 107 kinase proteins showed that the RMSE of predicted binding free energies was 1.1 kcal/mol. The second target was the matrix metalloproteinase (MMP) family, which has been difficult for docking programs. MMPs require metal-binding groups in their inhibitor structures in many cases. A quantum effect contributes to the metal−ligand interaction. Despite this difficulty, the present method worked well for the MMPs. This method showed that the RMSE of predicted binding free energies was 1.1 kcal/mol. In comparison, with the original docking method the RMSE was 1.7 kcal/mol. The results suggest that the present QSAR model should be applied to general target proteins.
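The weighted-average idea above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the regularization term is an L2 (ridge) penalty, that there is no intercept, and that the docking scores and "experimental" energies below are made-up numbers. The names `fit_ridge_weights`, `predict` and `rmse` are hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ridge_weights(scores, energies, lam):
    """scores[i][j] is the docking score of compound i against protein j.

    Returns weights w minimizing sum_i (y_i - w . s_i)^2 + lam * |w|^2
    (assumed L2 penalty; the abstract only says weights are kept small).
    """
    p = len(scores[0])
    xtx = [[sum(row[a] * row[b] for row in scores) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * y for row, y in zip(scores, energies)) for a in range(p)]
    return solve_linear(xtx, xty)

def predict(scores, weights):
    """Predicted binding free energy = weighted sum of docking scores."""
    return [sum(w * s for w, s in zip(weights, row)) for row in scores]

def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

# Hypothetical docking scores (kcal/mol) of four compounds against two proteins.
scores = [[-8.0, -6.0], [-7.0, -9.0], [-5.0, -4.0], [-9.5, -7.5]]
energies = [0.7 * a + 0.3 * b for a, b in scores]  # synthetic target values

weights = fit_ridge_weights(scores, energies, lam=1.0)
print(weights, rmse(predict(scores, weights), energies))
```

Raising `lam` shrinks the weights towards zero at the cost of a larger training error, which is the trade-off a regularization term is meant to control.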

Insights from Ion Binding Site Network Analysis into Evolution and Functions of Proteins


Many biological phenomena can be represented as complex networks. Using a protein binding site comparison approach, we generated a network of ion binding sites on the scale of all known protein structures from the Protein Data Bank. We found that this ion binding site similarity network is scale-free, indicating a network in which a few ion binding site scaffolds are the network hubs, and these are connected to hundreds of nodes, whereas the vast majority of nodes have only a few neighbors. Enrichment and statistical analysis of the network components and communities yielded insights into underlying processes from the functional and the structural perspective. The largest components and communities were observed to be closely related to basic metabolic processes and some of the most common structural folds, which, from the evolutionary point of view, indicates that they may be the oldest ones. Further, we derived the first comprehensive map of ion interchangeability, based on binding site similarity. Several highly interchangeable protein-ion binding site pairs emerged (e.g., Ca2+ and Mg2+), as well as structurally distinct ones. The constructed network of ion binding site similarities will aid in understanding the general principles of protein-ion binding site structure, function and evolution. We demonstrate potential uses of the network on proteins involved in cancer development and immune response, where individual ions play prominent roles in disease development.

Tox21 Enricher: Web-based Chemical/Biological Functional Annotation Analysis Tool Based on Tox21 Toxicity Screening Platform


The US Toxicology Testing in the 21st Century (Tox21) program was established to develop more efficient and human-relevant toxicity assessment methods. The Tox21 program screens >10,000 chemicals using quantitative high-throughput screening (qHTS) of assays that measure effects on toxicity pathways. To date, more than 70 assays have yielded >12 million concentration-response curves. The patterns of activity across assays can be used to define similarity between chemicals. Assuming chemicals with similar activity profiles have similar toxicological properties, we may infer the toxicological properties of a chemical from its neighbourhood. One approach to inference is chemical/biological annotation enrichment analysis. Here, we present Tox21 Enricher, a web-based chemical annotation enrichment tool for the Tox21 toxicity screening platform. Tox21 Enricher identifies over-represented chemical/biological annotations among lists of chemicals (neighbourhoods), facilitating the identification of the toxicological properties and mechanisms in the chemical set.

Development of Predictive QSAR Models of 4-Thiazolidinones Antitrypanosomal Activity using Modern Machine Learning Algorithms


This paper presents novel QSAR models for the prediction of antitrypanosomal activity among thiazolidines and related heterocycles. The performance of four machine learning algorithms (Random Forest regression, stochastic gradient boosting, multivariate adaptive regression splines and Gaussian processes regression) has been studied in order to reach better levels of predictivity. The results for Random Forest and Gaussian processes regression are comparable and outperform the other studied methods. Preliminary descriptor selection with the Boruta method improved the outcome of the machine learning methods. The two novel QSAR models developed with the Random Forest and Gaussian processes regression algorithms have good predictive ability, which was confirmed by external evaluation on the test set with corresponding Q2ext=0.812 and Q2ext=0.830. The obtained models can be used further for in silico screening of virtual libraries in the same chemical domain in order to find new antitrypanosomal agents. Thorough analysis of descriptor influence in the QSAR models and interpretation of their chemical meaning allows a number of structure-activity relationships to be highlighted. The presence of phenyl rings with electron-withdrawing atoms or groups in para-position, increased number of aromatic rings, high branching but short chains, high HOMO energy, and the introduction of 1-substituted 2-indolyl fragment into the molecular structure have been recognized as trypanocidal activity prerequisites.

Protocols for the Design of Kinase-Focused Compound Libraries


Protocols for the design of kinase-focused compound libraries are presented. Kinase-focused compound libraries can be differentiated based on the design goal. Depending on whether the library should be a discovery library specific for one particular kinase, a general discovery library for multiple distinct kinase projects, or even phenotypic screening, there exists today a variety of in silico methods to design candidate compound libraries. We address the following scenarios: 1) Datamining of SAR databases and kinase focused vendor catalogues; 2) Predictions and virtual screening; 3) Structure-based design of combinatorial kinase inhibitors; 4) Design of covalent kinase inhibitors; 5) Design of macrocyclic kinase inhibitors; and 6) Design of allosteric kinase inhibitors and activators.

Unbinding of Kinesin from Microtubule in the Strongly Bound States Enhances under Assisting Forces


The ability to predict the cellular dynamics of intracellular transport has enormous potential to impact human health. A key transporter is kinesin-1, an ATP-driven molecular motor that shuttles cellular cargos along microtubules (MTs). The dynamics of kinesins depends critically on their unbinding rate from MT, which varies depending on the direction of the force applied to the motor, i.e. the force-unbinding rate relation is asymmetric. However, it remains unclear how changing the force direction from resisting (applied against the motion direction) to assisting (applied in the motion direction) alters the kinesin's unbinding and stepping. Here, we propose a theoretical model for the influence of the force direction on the stepping dynamics of a single kinesin. The model shows that the asymmetry of the force-unbinding rate relation is independent of ATP concentration. It also reveals that the synthesis of ATP from backward stepping under assisting forces is less likely than under resisting forces. It then finds that the unbinding of kinesin in the strongly MT-bound kinetic states is enhanced under assisting forces.

In Silico Studies of Mammalian δ-ALAD Interactions with Selenides and Selenoxides


Previous studies have shown that the mammalian δ-aminolevulinic acid dehydratase (δ-ALAD) is inhibited by selenides and selenoxides, which can involve thiol oxidation. However, the precise molecular interaction of selenides and selenoxides with the active center of the enzyme is unknown. Here, we try to explain the interaction of selenides and the respective selenoxides with human δ-ALAD by in silico molecular docking. The in silico data indicated that the Se atoms of selenoxides have a higher electrophilic character than those of their respective selenides. Further, the presence of oxygen increased the interaction of selenoxides with the δ-ALAD active site by O…Zn coordination. The interaction of the S atom from Cys124 with the Se atom indicated the importance of the nucleophilic attack of the enzyme thiolate on the organoselenium molecules. These observations help us to understand the interaction of target proteins with organoselenium compounds.

An Improved Binary Differential Evolution Algorithm for Feature Selection in Molecular Signatures


The discovery of biomarkers from high-dimensional data is a very challenging task in cancer diagnosis. On the one hand, biomarker discovery is a so-called high-dimensional small-sample problem. On the other hand, these data are redundant and noisy. In recent years, biomarker discovery from high-throughput biological data has become an increasingly important topic in the field of bioinformatics. In this study, we propose a binary differential evolution algorithm for feature selection. Firstly, we suggest using a two-stage approach, where three filter methods, namely the Fisher score, T-statistics, and information gain, are used to generate the feature pool for input to differential evolution (DE). Secondly, in order to improve the performance of the differential evolution algorithm for feature selection, a new variant of binary DE called BDE is proposed. Three optimization strategies are incorporated into the BDE. The first strategy is a heuristic method in the initial stage, the second is self-adaptive parameter control, and the third is a minimum change value that improves the exploration behaviour and thus enhances diversity. Finally, a support vector machine (SVM) is used as the classifier with 10-fold cross-validation. The experimental results of our proposed algorithm on some benchmark datasets demonstrate the effectiveness of our algorithm. In addition, the BDE developed in this study has great potential in feature selection problems.
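As an illustration of the filter stage only, the sketch below ranks features by Fisher score and keeps the top k as a candidate pool. It is a hedged example: the abstract does not state which Fisher score variant is used, so the common between-class over within-class variance form is assumed, and the function names and toy data are hypothetical. The T-statistics and information gain filters and the BDE search itself are not shown.

```python
from statistics import mean, pvariance

def fisher_score(values, labels):
    """Between-class scatter divided by within-class scatter for one feature.

    Assumed form: sum_k n_k (mu_k - mu)^2 / sum_k n_k var_k.
    """
    overall = mean(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == cls]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    return between / within if within > 0 else float("inf")

def top_k_features(matrix, labels, k):
    """matrix[i][j] is the value of feature j in sample i; returns feature indices."""
    n_features = len(matrix[0])
    scores = [fisher_score([row[j] for row in matrix], labels)
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]

# Toy expression-like data: feature 0 separates the classes, feature 1 does not.
samples = [[1.0, 2.0], [1.2, 3.0], [3.0, 2.1], [3.2, 2.9]]
classes = [0, 0, 1, 1]
print(top_k_features(samples, classes, k=1))
```

In a two-stage scheme of the kind described, a pool like this would be passed on to the evolutionary search rather than used as the final feature set.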

R-based Tool for a Pairwise Structure-Activity Relationship Analysis


Structure-Activity Relationship (SAR) analysis is a complex process that can be enhanced by computational techniques. This article describes a simple tool for SAR analysis that has a graphical user interface and a flexible approach towards the input of molecular data. The application calculates molecular similarity, represented by the Tanimoto index and the Euclidean distance, and determines activity cliffs by means of the Structure-Activity Landscape Index. The calculation is performed in a pairwise manner either for the reference compound and other compounds or for all possible pairs in the data set. The results of SAR analysis are visualized using two types of plot. The application's capability is demonstrated by the analysis of a set of COX2 inhibitors with respect to Isoxicam. This tool is available online: it includes a manual and input file examples.
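The quantities named above are simple to compute. The sketch below is in Python rather than R and is not the tool's code; it assumes fingerprints are given as sets of "on" bit positions, and the compound names, bits and activity values are invented for illustration. It uses the usual definitions: Tanimoto index = |A ∩ B| / |A ∪ B| and SALI(i, j) = |activity_i − activity_j| / (1 − similarity(i, j)).

```python
import math
from itertools import combinations

def tanimoto(bits_a, bits_b):
    """Tanimoto index of two fingerprints given as sets of on-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 1.0  # convention assumed here for two empty fingerprints
    return len(a & b) / len(union)

def euclidean(x, y):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.dist(x, y)

def sali(act_i, act_j, similarity):
    """Structure-Activity Landscape Index; infinite for identical fingerprints."""
    if similarity >= 1.0:
        return math.inf
    return abs(act_i - act_j) / (1.0 - similarity)

def pairwise_sali(compounds):
    """compounds maps name -> (fingerprint bits, activity). Highest SALI first."""
    rows = []
    for (ni, (fi, ai)), (nj, (fj, aj)) in combinations(compounds.items(), 2):
        sim = tanimoto(fi, fj)
        rows.append((ni, nj, sim, sali(ai, aj, sim)))
    return sorted(rows, key=lambda r: r[3], reverse=True)

# Hypothetical compounds: (fingerprint bits, activity such as pIC50).
data = {
    "a": ({1, 2, 3, 4}, 7.0),
    "b": ({1, 2, 3, 5}, 5.0),
    "c": ({1, 2, 3, 4, 6}, 6.9),
}
for name_i, name_j, sim, value in pairwise_sali(data):
    print(name_i, name_j, round(sim, 2), round(value, 2))
```

Pairs with a high SALI are structurally similar yet differ strongly in activity, which is what marks them as candidate activity cliffs.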

Importance of an Orchestrate Participation of Each Individual Residue Present at a Catalytic Site


GTP hydrolysis is indispensable to keep a living cell healthy. Nature has evolved many enzymes to enhance the slow GTP hydrolysis. Rab GTPases have evolved to regulate vesicle trafficking. GTPase activating proteins (GAPs) accelerate their intrinsically slow GTP hydrolysis in order to maintain the sustainability between cellular events. Any malfunction/interference in this hydrolysis disrupts normal cellular events and causes severe diseases. In this study, the GTP hydrolysis mechanism of Rab33B catalyzed by the TBC-domain GAP protein Gyp1p has been decoded using extensive ab initio QM/MM metadynamics simulations. An organized coupled movement of individual residues present at the catalytic site is found to be the key factor for this reaction. An unorganized coupled movement leads the hydrolysis through very high energy pathways. This also reveals that the chemical transformations occurring at a catalytic site are residue specific.

Ligand-based Modeling for the Prediction of Pharmacophore Features for Multi-targeted Inhibition of the Arachidonic Acid Cascade


Single-target drugs against the arachidonic acid inflammatory pathway are associated with serious side effects. Hence, as a first step towards multi-target drugs, we have studied the pharmacophoric features common to the inhibitors of 5-lipoxygenase-activating protein (FLAP), microsomal prostaglandin E-synthase 1 (mPGES-1) and leukotriene A4 hydrolase (LTA4H). FLAP and mPGES-1 shared subfamily-specific positions (SSPs) and four mPGES-1 inhibitors binding to them mapped onto the pharmacophore derived from FLAP inhibitors (Ph-FLAP). The reactions of mPGES-1 and LTA4H had high structural similarity. The pharmacophore derived from two substrate mimic inhibitors of LTA4H (Ph-LTA4H) also mapped onto three mPGES-1 inhibitors. Screening of an in-house database for Ph-FLAP and Ph-LTA4H identified one compound, C1. It inhibited the production of the mPGES-1 product, prostaglandin E2 (PGE2), by 97.8±1.6 % at 50 μM in HeLa cells and can be a starting point for designing molecules inhibiting all three targets simultaneously.

Virtual Screening Approach of Bacterial Peptide Deformylase Inhibitors Results in New Antibiotics


The increasing resistance of bacteria to antibacterial therapy poses an enormous health problem and makes the development of new antibacterial agents with novel mechanisms of action an urgent need. Peptide deformylase, a metalloenzyme which catalytically removes the N-formyl group from the N-terminal methionine of newly synthesized polypeptides, is an important target in antibacterial drug discovery. In this study, we report the structure-based virtual screening of the ZINC database in order to discover potential hits as bacterial peptide deformylase inhibitors with higher affinity than GSK1322322, a previously known inhibitor. After virtual screening, fifteen compounds of the top predicted hits were purchased and evaluated in vitro for their antibacterial activities against one Gram-positive (Staphylococcus aureus) and three Gram-negative (Escherichia coli, Pseudomonas aeruginosa and Klebsiella pneumoniae) bacteria at different concentrations by the disc diffusion method. Out of these, three compounds, ZINC00039650, ZINC03872971 and ZINC00126407, exhibited a significant zone of inhibition. The results obtained were confirmed using the dilution method. Thus, these proposed compounds may aid the development of more efficient antibacterial agents.

RJSplot: Interactive Graphs with R


Data visualization techniques provide new methods for the generation of interactive graphs. These graphs allow a better exploration and interpretation of data but their creation requires advanced knowledge of graphical libraries. Recent packages have enabled the integration of interactive graphs in R. However, R provides limited graphical packages that allow the generation of interactive graphs for computational biology applications. The present project has joined the analytical power of R with the interactive graphical features of JavaScript in a new R package (RJSplot). It enables the easy generation of interactive graphs in R, provides new visualization capabilities, and contributes to the advance of computational biology analytical methods. At present, 16 interactive graphics are available in RJSplot, such as the genome viewer, Manhattan plots, 3D plots, heatmaps, dendrograms, networks, and so on. The RJSplot package is freely available online at

Multi-Objective Optimization of Benzamide Derivatives as Rho Kinase Inhibitors


Despite recent advances in Computer Aided Drug Discovery and High Throughput Screening, the attrition rates of drug candidates continue to be high, underscoring the inherent complexity of the drug discovery paradigm. Indeed, a compromise between several objectives is often required to obtain successful clinical drugs. The present manuscript details a multi-objective workflow that integrates the 4D-QSAR and molecular docking methods in the simultaneous modeling of the Rho Kinase inhibitory activity and acute toxicity of benzamide derivatives. To this end, the pIC50/pLD50 ratio is considered as the response variable, permitting the concurrent modeling of both properties and representing a shift from classical step-by-step evaluations. The 4D-QSAR strategy is used to generate the Grid Cell Occupancy Descriptors (GCODs), and the Stochastic Gradient Boosting (SGB) and Partial Least Squares (PLS) methods are used as the model fitting techniques. While the statistical parameters for the PLS model do not meet established criteria for acceptability, the SGB model yields satisfactory performance, with correlation coefficients r2=0.95 and r2pred=0.65 for the training and test set, respectively. Subsequently, the structural interpretation of the most relevant GCODs according to the SGB model is performed, allowing for the proposal of 139 novel benzamide derivatives, which are then screened using the same model. Of these, 9 compounds were predicted to possess pIC50/pLD50 ratio values higher than those for the employed dataset. Finally, in order to corroborate the results obtained with the SGB model, a docking simulation was performed to evaluate the binding affinity of the proposed molecules to the ROCK2 active site, and 3 chemical structures (i.e., p6, p14 and p131) showed higher binding affinity than the most active compound in the training set, while the rest generally demonstrated comparable behavior.
It may therefore be concluded that the consensus models that intertwine the 4D-QSAR and molecular docking methods contribute to more reliable virtual screening and compound optimization experiments. Additionally, the use of multi-objective modeling schemes permits the simultaneous evaluation of different chemical and biological profiles, which should contribute to the control a priori of causative factors for the high attrition rates in later drug discovery phases.

Design, Synthesis, SAR and Molecular Modeling Studies of Novel Imidazo[2,1-b][1,3,4]Thiadiazole Derivatives as Highly Potent Antimicrobial Agents


In this study, a novel series of phenyl substituted imidazo[2,1-b][1,3,4]thiadiazole derivatives was synthesized, characterized and explored for antibacterial activity against Gram-negative Escherichia coli, Gram-positive Staphylococcus aureus and Bacillus subtilis and antifungal activity against Candida albicans. Most of the synthesized compounds exhibited remarkable antimicrobial activities, with some being ten times more potent than the positive controls. The most promising compound showed excellent activity with an MIC value of 0.03 μg/ml against both S. aureus and B. subtilis (MIC values of the positive control chloramphenicol are 0.4 μg/ml and 0.85 μg/ml, respectively). Furthermore, the structure-activity relationship was investigated with the help of computational tools. Some physicochemical and ADME properties of the compounds were also calculated. The combination of electronic structure calculations performed at the PM6 level and molecular docking simulations using Glide extra-precision mode showed that the hydrophobic nature of the keto aryl ring with no electron-withdrawing substituents at the para position enhances activity, while electron-donating substituents at the second aryl ring are detrimental to activity.

Cover Picture: (Mol. Inf. 1-2/2018)


Artistic representation of a generative neural network for molecular design (copyright: Max Pillong, with kind permission)

Cheminformatics and the Mean


Classifiers and their Metrics Quantified


Molecular modeling frequently constructs classification models for the prediction of two-class entities, such as compound bio(in)activity, chemical property (non)existence, protein (non)interaction, and so forth. The models are evaluated using well known metrics such as accuracy or true positive rates. However, these frequently used metrics applied to retrospective and/or artificially generated prediction datasets can potentially overestimate true performance in actual prospective experiments. Here, we systematically consider metric value surface generation as a consequence of data balance, and propose the computation of an inverse cumulative distribution function taken over a metric surface. The proposed distribution analysis can aid in the selection of metrics when formulating study design. In addition to theoretical analyses, a practical example in chemogenomic virtual screening highlights the care required in metric selection and interpretation.
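The dependence of a metric on data balance can be made concrete with a short sketch. The code below is an illustration under stated assumptions, not the procedure from the article: it fixes the true positive rate (TPR) and true negative rate (TNR) and varies the fraction of actives (`prevalence`), and `fraction_at_least` is one possible, hypothetical reading of an inverse cumulative distribution taken over a metric surface sampled on a regular TPR/TNR grid.

```python
def confusion_fractions(tpr, tnr, prevalence):
    """Expected confusion-matrix cells as fractions of the whole dataset."""
    tp = tpr * prevalence
    fn = (1.0 - tpr) * prevalence
    tn = tnr * (1.0 - prevalence)
    fp = (1.0 - tnr) * (1.0 - prevalence)
    return tp, fp, tn, fn

def accuracy(tpr, tnr, prevalence):
    tp, fp, tn, fn = confusion_fractions(tpr, tnr, prevalence)
    return tp + tn

def precision(tpr, tnr, prevalence):
    tp, fp, tn, fn = confusion_fractions(tpr, tnr, prevalence)
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def fraction_at_least(metric, prevalence, threshold, steps=20):
    """Share of grid points on the (TPR, TNR) surface with metric >= threshold.

    Hypothetical helper: an empirical inverse cumulative distribution of the
    metric over a uniformly sampled surface at one fixed data balance.
    """
    grid = [i / steps for i in range(steps + 1)]
    values = [metric(tpr, tnr, prevalence) for tpr in grid for tnr in grid]
    return sum(v >= threshold for v in values) / len(values)

# Same classifier quality (TPR = TNR = 0.9), different data balance.
for prevalence in (0.5, 0.1, 0.01):
    print(prevalence,
          round(accuracy(0.9, 0.9, prevalence), 3),
          round(precision(0.9, 0.9, prevalence), 3))
```

With TPR and TNR held at 0.9, accuracy stays at 0.9 for every balance, while precision falls from 0.9 at a 50 % active fraction to roughly 0.08 at 1 %, which is the kind of gap between retrospective and prospective settings the abstract warns about.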

Transductive Ridge Regression in Structure-activity Modeling


In this article we consider the application of the Transductive Ridge Regression (TRR) approach to structure-activity modeling. An original procedure of the TRR parameters optimization is suggested. Calculations performed on 3 different datasets involving two types of descriptors demonstrated that TRR outperforms its non-transductive analogue (Ridge Regression) in more than 90 % of cases. The most significant transductive effect was observed for small datasets. This suggests that transduction may be particularly useful when the data are expensive or difficult to collect.

Deep Generative Models for Molecular Science


Generative deep machine learning models now rival traditional quantum-mechanical computations in predicting properties of new structures, and they come with a significantly lower computational cost, opening new avenues in computational molecular science. In the last few years, a variety of deep generative models have been proposed for modeling molecules, which differ in both their model structure and choice of input features. We review these recent advances within deep generative models for predicting molecular properties, with particular focus on models based on the probabilistic autoencoder (or variational autoencoder, VAE) approach in which the molecular structure is embedded in a latent vector space from which its properties can be predicted and its structure can be restored.

De Novo Design of Bioactive Small Molecules by Artificial Intelligence


Generative artificial intelligence offers a fresh view on molecular design. We present the first-time prospective application of a deep learning model for designing new druglike compounds with desired activities. For this purpose, we trained a recurrent neural network to capture the constitution of a large set of known bioactive compounds represented as SMILES strings. By transfer learning, this general model was fine-tuned on recognizing retinoid X and peroxisome proliferator-activated receptor agonists. We synthesized five top-ranking compounds designed by the generative model. Four of the compounds revealed nanomolar to low-micromolar receptor modulatory activity in cell-based assays. Apparently, the computational model intrinsically captured relevant chemical and biological knowledge without the need for explicit rules. The results of this study advocate generative artificial intelligence for prospective de novo molecular design, and demonstrate the potential of these methods for future medicinal chemistry.

Generative Recurrent Networks for De Novo Drug Design


Generative artificial intelligence models present a fresh approach to chemogenomics and de novo drug design, as they provide researchers with the ability to narrow down their search of the chemical space and focus on regions of interest. We present a method for molecular de novo design that utilizes generative recurrent neural networks (RNN) containing long short-term memory (LSTM) cells. This computational model captured the syntax of molecular representation in terms of SMILES strings with close to perfect accuracy. The learned pattern probabilities can be used for de novo SMILES generation. This molecular design concept eliminates the need for virtual compound library enumeration. By employing transfer learning, we fine-tuned the RNN's predictions for specific molecular targets. This approach enables virtual compound design without requiring secondary or external activity prediction, which could introduce error or unwanted bias. The results obtained advocate this generative RNN-LSTM system for high-impact use cases, such as low-data drug discovery, fragment-based molecular design, and hit-to-lead optimization for diverse drug targets.

Application of Generative Autoencoder in De Novo Molecular Design


A major challenge in computational chemistry is the generation of novel molecular structures with desirable pharmacological and physicochemical properties. In this work, we investigate the potential use of autoencoders, a deep learning methodology, for de novo molecular design. Various generative autoencoders were used to map molecular structures into a continuous latent space and vice versa, and their performance as structure generators was assessed. Our results show that the latent space preserves the chemical similarity principle and thus can be used for the generation of analogue structures. Furthermore, the latent space created by the autoencoders was searched systematically to generate novel compounds with predicted activity against dopamine receptor type 2, and compounds similar to known active compounds not included in the training set were identified.

Active Search for Computer-aided Drug Design


We consider lead discovery as active search in a space of labelled graphs. In particular, we extend our recent data-driven adaptive Markov chain approach, and evaluate it on a focused drug design problem, where we search for an antagonist of an αv integrin, the target protein that belongs to a group of Arg−Gly−Asp integrin receptors. This group of integrin receptors is thought to play a key role in idiopathic pulmonary fibrosis, a chronic lung disease of significant pharmaceutical interest. As an in silico proxy of the binding affinity, we use a molecular docking score to an experimentally determined αvβ6 protein structure. The search is driven by a probabilistic surrogate of the activity of all molecules from that space. As the process evolves and the algorithm observes the activity scores of the previously designed molecules, the hypothesis of the activity is refined. The algorithm is guaranteed to converge in probability to the best hypothesis from an a priori specified hypothesis space. In our empirical evaluations, the approach achieves a large structural variety of designed molecular structures for which the docking score is better than the desired threshold. Some novel molecules, suggested to be active by the surrogate model, provoke a significant interest from the perspective of medicinal chemistry and warrant prioritization for synthesis. Moreover, the approach discovered 19 out of the 24 active compounds which are known to be active from previous biological assays.

Identification of Bioactive Scaffolds Based on QSAR Models


In medicinal chemistry, the molecular scaffolds commonly found in compounds with preferable biological activities are called bioactive scaffolds. They are important because if present in a structure, it is more likely that the compound will be bioactive. Traditionally, medicinal chemists use their knowledge to identify bioactive scaffolds from a given data set after systematic extraction of all candidate scaffolds. However, manually sorting all the scaffolds is not practical as the number of compounds in a data set is often very large. Herein, we propose a method to systematically identify bioactive scaffolds based on a structure generator and a QSAR model. Two proof-of-concept studies showed that known bioactive scaffolds as well as scaffolds containing important substructures were extracted. The proposed method does not depend on scaffold frequencies in a data set, which is different from currently used methods for bioactive scaffold identification.

Monitoring of the Conformational Space of Dipeptides by Generative Topographic Mapping


This work describes a procedure to build generative topographic maps (GTM) as 2D representations of the conformational space (CS) of dipeptides. GTMs with excellent propensities to support highly predictive landscapes of various conformational properties were reported for three dipeptides (AA, KE and KR). CS monitoring via GTM proceeds through the projection of conformer ensembles on the map, producing cumulated responsibility (CR) vectors characteristic of the CS areas covered by the ensemble. Overlap of the CS areas visited by two distinct simulations can be expressed by the Tanimoto coefficient Tc of the associated CRs. This idea was used to monitor the reproducibility of the stochastic evolutionary conformer generation process implemented in S4MPLE. It could be shown that conformers produced by <500 S4MPLE runs reproducibly cover the relevant CS zone at a given setup of the driving force field. The propensity of a simulation to visit the native CS zone can thus be quantitatively estimated, as the Tc score with respect to the “native” CR, as defined by the ensemble of dipeptide geometries extracted from PDB proteins. It could be shown that low-energy CS regions were indeed found to fall within the native zone. The Tc overlap score behaved as a smooth function of force field parameters. This opens the perspective of a novel force field parameter tuning procedure, bound to simultaneously optimize the behavior of the in silico simulations for every possible dipeptide.
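The overlap measure described above can be sketched as follows. This is an assumption-laden illustration, not the published procedure: it takes the CR vector to be the plain sum of per-conformer responsibility vectors over the map nodes, uses the real-valued Tanimoto form Tc = a·b / (|a|² + |b|² − a·b), and works on invented three-node responsibilities. The function names are hypothetical.

```python
def cumulated_responsibility(responsibilities):
    """Sum per-conformer responsibility vectors node by node (assumed definition)."""
    n_nodes = len(responsibilities[0])
    return [sum(vec[k] for vec in responsibilities) for k in range(n_nodes)]

def tanimoto_continuous(a, b):
    """Real-valued Tanimoto coefficient of two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    if denom == 0:
        return 0.0  # both vectors are all-zero; convention assumed here
    return dot / denom

# Two hypothetical simulations projected on a map with three nodes.
run_1 = [[0.5, 0.5, 0.0], [1.0, 0.0, 0.0]]
run_2 = [[0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]

cr_1 = cumulated_responsibility(run_1)
cr_2 = cumulated_responsibility(run_2)
print(cr_1, cr_2, round(tanimoto_continuous(cr_1, cr_2), 3))
```

A score near 1 would indicate that two ensembles populate the same map regions, and a score near 0 that they visit disjoint parts of the conformational space.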