The American Heart Association is committed to convening the best collaborators in science, technology, computational biology, engineering and other fields to accelerate the development of lifesaving treatments for patients. We’ll accomplish our goal by using the most advanced and powerful systems at the intersection of science and technology.
Together the AHA and Lawrence Livermore National Laboratory are creating datasets and tools that take advantage of unique national resources. For example, they are using one of the world's top-ranked supercomputers to rapidly discover potential protein-drug target interactions in an unbiased fashion and to predict adverse drug reactions for candidate compounds. The result will be new therapies that come to fruition more quickly. By leveraging world-class, leading-edge technology, we can create a comprehensive, open-access reference atlas of cell-protein targets to accelerate and hone drug discovery.
The Protein Binding Atlas is a robust database of protein models and their associated docking interactions with a small-molecule library. Since 2017, the Center for Accelerated Drug Discovery has engaged in research resulting in:
- A significant database of nearly 12,000 human protein models, instrumented for molecular screening. These models represent biologically relevant protein assemblies; about 9,000 of them represent homo- and hetero-dimers and larger constructs.
- High-performance artificial intelligence/machine learning software optimized for high-performance computing platforms, enabling modeling of small-molecule binding to proteins.
- A library of about 2 million small molecules available for binding calculations; currently docking/binding calculations against the entire protein database have been completed for approximately 500,000 molecules.
As mentioned above, the database holds nearly 12,000 human protein models and 2 million small molecules. Data for each molecule includes a set of calculated features – “molecular descriptors,” from the simple (e.g., molecular weight) to the complex (e.g., polar surface area). The molecules were drawn from these compound sets:
- ChEMBL 28: 1.9 million.
- Broad approved and developmental drugs: 6,550.
- Foodome: 24,144.
- Marine: 867.
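As a quick check on the "about 2 million" figure, the set sizes listed above can be tallied. This is a minimal sketch: the ChEMBL 28 count is rounded in the list, and any overlap between the sets is ignored, so the total is only approximate.

```python
# Compound-set sizes as listed above. The ChEMBL 28 figure is rounded,
# and overlap between sets is ignored, so the total is approximate.
compound_sets = {
    "ChEMBL 28": 1_900_000,
    "Broad approved and developmental drugs": 6_550,
    "Foodome": 24_144,
    "Marine": 867,
}

total = sum(compound_sets.values())
print(f"Approximate library size: {total:,}")

# Share of the library contributed by each set.
for name, count in compound_sets.items():
    print(f"{name}: {count / total:.2%}")
```

The tally comes to roughly 1.93 million, consistent with the rounded figure of 2 million.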
Over time additional protein models can be added to the Protein Binding Atlas. The library of small molecules also can be augmented and docking/binding calculations can be performed against the proteins.
The CADD Protein Binding Atlas is particularly valuable because it pairs with the computational and software tools the Center for Accelerated Drug Discovery has created. The Protein Binding Atlas is curated to model interactions between proteins and small-molecule compounds, which differentiates it from other resources in the space.
The Protein Binding Atlas can be accessed through this link. Currently the site is open to all AHA Professional Members.
General protein search:
From the protein query page, enter or select search criteria and click the Search button. Click on the protein result that meets your search criteria. See the Docked Compounds list.
UniProt ID protein search:
From the protein query page, enter the human reference UniProt ID in the Protein Search text box and click the Search button. Click on the protein result that meets your search criteria. See the Docked Compounds list.
UniProt IDs “P04035,Q9UHC9” example:
From the protein query page, enter the human reference UniProt IDs “P04035,Q9UHC9” in the Protein Search text box and click the Search button. Click on “NPC1-like intracellular cholesterol transporter 1”. See the Docked Compounds list. Note that ezetimibe is a known drug for this target.
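If you are preparing a longer list of IDs, it can help to check their format before pasting them into the Protein Search text box. The sketch below uses the accession pattern published in the UniProt documentation; the helper name `parse_uniprot_ids` is hypothetical and is not part of the portal.

```python
import re

# Accession pattern published in the UniProt documentation.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def parse_uniprot_ids(text):
    """Split a comma-separated string into IDs and validate each one.

    Hypothetical helper for illustration only.
    """
    ids = [part.strip().upper() for part in text.split(",") if part.strip()]
    invalid = [i for i in ids if not UNIPROT_ACCESSION.fullmatch(i)]
    if invalid:
        raise ValueError(f"Not valid UniProt accessions: {invalid}")
    return ids


# Build a clean, comma-separated string for the search box.
query = ",".join(parse_uniprot_ids(" P04035, Q9UHC9 "))
print(query)
```

The check only covers the format of each ID. It does not confirm that the protein is among the models in the Atlas.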
“Cholesterol Biosynthesis” Pathway example:
From the protein query page, select the “cholesterol biosynthesis” Pathway and click the Search button. Click on “7-dehydrocholesterol reductase”. See the Docked Compounds list. Note that coenzyme I is a known compound for this target.
First try searching by the human reference UniProt ID, as described in UniProt ID protein search under the previous question.
It’s possible your protein of interest may not be available in the set of 12,000 human protein models. Please contact us about your protein of interest.
The Protein Binding Atlas is a snapshot of knowledge at a point in time. It contains about 200 terabytes of data, so the full dataset will not fit on a USB drive. Well-filtered or curated subsets of the data, however, can be downloaded (enabled through better search tools in the data portal) and then loaded onto a USB drive or another server.
From the compound, protein, or predict protein binding query and detail pages, the “Download All” button or links will provide a metadata CSV file for the search or listing results. From the protein and predict protein binding pages, the “Download Structure or PDB File” buttons will provide PDB-formatted files.
Please see Downloads on info page.
You can use the download options listed in the previous question to download and save the results. You can also copy the query URL to recreate the search results.
Multiple docking poses were assessed. Poses with scores that meet the cutoffs (Docking: < -7.5 kcal/mol, CoherentFusion > 7.5, GBSA < -30) have higher confidence.
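The cutoffs above can be applied to downloaded results as a simple filter. The sketch below is illustrative only: the column names and sample rows are made up and do not reflect the actual layout of the portal's CSV file, and requiring all three cutoffs at once is an assumption.

```python
import csv
import io

# Cutoffs quoted above.
DOCKING_MAX = -7.5  # kcal/mol
FUSION_MIN = 7.5
GBSA_MAX = -30.0


def meets_cutoffs(row):
    """Return True when a pose passes all three cutoffs.

    Requiring all three together is an assumption made for this sketch.
    """
    return (
        float(row["docking_score"]) < DOCKING_MAX
        and float(row["coherent_fusion"]) > FUSION_MIN
        and float(row["gbsa"]) < GBSA_MAX
    )


# Made-up rows standing in for a downloaded metadata CSV.
# The column names are hypothetical.
sample_csv = """compound,docking_score,coherent_fusion,gbsa
compound_a,-8.2,7.9,-35.1
compound_b,-7.0,8.1,-40.0
compound_c,-9.1,6.8,-28.4
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
confident = [row["compound"] for row in rows if meets_cutoffs(row)]
print(confident)
```

In this made-up sample only `compound_a` passes all three cutoffs. To use the sketch on a real download, replace the column names with those in your file.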
Please cite the AHA Protein Portal DOI: 10.11578/1969730.
Please contact us with your inquiry. We’re interested in discussing potential collaborations.
Please contact us to share about your experience. We’re interested in feedback that may help address your use case.