Medicinal Chemistry Data for QSAR Modeling & Drug Discovery

Quantitative structure-activity relationship (QSAR) modeling is a computational method for revealing relationships between the structural properties of chemical compounds and their biological activities. Combined with machine learning, it reduces the number of compounds that need to be synthesized by helping you select the most promising candidates first.
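As a rough illustration of the idea, the sketch below fits the simplest possible QSAR model, a straight line relating one structural descriptor to activity, and then ranks untested candidates by predicted activity. Everything in it is an assumption made for illustration: the descriptor (a logP-like value), the activity scale (pIC50-like), the function names, and the data, which are synthetic and not taken from GOSTAR. Real QSAR work typically uses many descriptors and more capable models.

```python
# Toy QSAR sketch: fit activity = slope * descriptor + intercept by ordinary
# least squares, then rank candidates by predicted activity.
# All values are synthetic and illustrative only; they are NOT GOSTAR data.

def fit_linear_qsar(descriptors, activities):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(descriptors)
    if n < 2 or n != len(activities):
        raise ValueError("need at least two paired observations")
    mean_x = sum(descriptors) / n
    mean_y = sum(activities) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptors)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(descriptors, activities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, descriptor):
    """Predicted activity for a single descriptor value."""
    slope, intercept = model
    return slope * descriptor + intercept


def rank_candidates(model, candidates):
    """Sort (name, descriptor) pairs by predicted activity, best first."""
    return sorted(candidates, key=lambda c: predict(model, c[1]), reverse=True)


# Synthetic training set: hypothetical descriptor values vs. measured activity.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [5.0, 5.5, 6.0, 6.5]
model = fit_linear_qsar(train_x, train_y)

# Hypothetical, not-yet-synthesized candidates with computed descriptors.
candidates = [("cand_A", 2.5), ("cand_B", 3.5), ("cand_C", 1.5)]
ranked = rank_candidates(model, candidates)
print([name for name, _ in ranked])  # ['cand_B', 'cand_A', 'cand_C']
```

The ranking step is where the saving comes from: only the top-ranked candidates would be prioritized for synthesis and testing.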

A critical bottleneck in QSAR modeling for drug discovery is the availability of high-quality, clean, annotated datasets on which to train algorithms and search for novel compounds. GOSTAR provides a massive, normalized, thoroughly quality-controlled dataset that is available in part or in its entirety as a flat file. It can also connect via API and integrate into your systems, providing a continuously updated stream of bioactivity information. GOSTAR is uniquely suited to the needs of today’s AI/ML-based drug discovery programs.

“GOSTAR provides a range of different types of information – a range of different types of data – and that allows you to create a range of different types of predictions, and in drug discovery having many different predictive options is certainly helpful.”
Stephen MacKinnon
VP of Research and Development, Cyclica

Optimize Your Chemotypes

Take a quantum leap forward in building your predictive models with GOSTAR’s activity data on over 9,000,000 compounds. Data at this scale allows your artificial intelligence and machine learning algorithms to identify patterns clearly and extrapolate those findings into potential new areas of discovery.

Ensure Patentability

Never worry about whether your newly identified structures are actually novel. GOSTAR has the deepest patent coverage of any structure-activity relationship (SAR) database, helping you determine the novelty of your scaffolds or compounds. With its vast coverage of exemplified compounds from patents, it also aids you in patent busting.

Content coverage from patents

  • Patent Documents – 87K
  • Compounds – 6.7M
  • SAR data points – 19.3M

Work Confidently

Our scientific experts curate, excerpt, enhance, and enrich every data point from many different types of sources into a relational data format. Content is subjected to a three-tiered, QMS-ISO certified quality control process, then standardized and normalized before being added to the user system. Our process is both rigorous and expedient: new information is captured and added within just weeks of publication, ensuring you have an up-to-date view of the chemical space at all times.

Get Ahead

Better data – and more data – means better predictions. Better predictions mean more promising pipelines.

Find your next breakthrough. Start with GOSTAR.