SpaceGrow Dataset
Reaction-driven combinatorial libraries, known as chemical fragment spaces, give access to synthetically accessible compounds far beyond the reach of enumerable databases. The SpaceGrow dataset[1] was compiled for 3D shape-based virtual screening, to compare approaches that search chemical spaces with conventional approaches that search enumerated databases. The dataset comprises 160 ligands from a list of known drugs[2] available in the PDBbind database.[3] For 56 ligands selected as references, ligands binding in the same active site were superimposed based on the binding site alignment[4] to form homologous ligand pairs. Both ligands of each pair were fragmented by cutting all acyclic bonds, and the resulting fragments form a chemical space, the validation space. Fully enumerated, the validation space yields the validation library of 34,134 molecules.
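The fragmentation step (cutting all acyclic bonds) can be illustrated on a plain molecular graph. The following is a minimal sketch, not the tooling used to build the dataset: the function names, the atom-index representation, and the toy benzylamine graph are assumptions made for illustration, and the sketch ignores the link-atom bookkeeping that a real fragment space needs in order to recombine fragments.

```python
from collections import deque


def still_connected(adj, start, goal, skipped_bond):
    """Breadth-first search from start to goal that never uses skipped_bond."""
    seen = {start}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if atom == goal:
            return True
        for nbr in adj[atom]:
            if nbr in seen or frozenset((atom, nbr)) == skipped_bond:
                continue
            seen.add(nbr)
            queue.append(nbr)
    return False


def fragment_at_acyclic_bonds(n_atoms, bonds):
    """Cut every bond that is not part of a ring (hypothetical helper).

    Returns (fragments, cut_bonds), where fragments is a list of sorted
    atom-index lists and cut_bonds lists the acyclic bonds that were removed.
    """
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)

    # A bond is cyclic if its two atoms stay connected without it.
    ring_bonds, cut_bonds = [], []
    for a, b in bonds:
        if still_connected(adj, a, b, frozenset((a, b))):
            ring_bonds.append((a, b))
        else:
            cut_bonds.append((a, b))

    # Keep ring bonds only, then collect connected components.
    ring_adj = {i: set() for i in range(n_atoms)}
    for a, b in ring_bonds:
        ring_adj[a].add(b)
        ring_adj[b].add(a)

    fragments, assigned = [], set()
    for atom in range(n_atoms):
        if atom in assigned:
            continue
        component, stack = set(), [atom]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(ring_adj[current] - component)
        assigned |= component
        fragments.append(sorted(component))
    return fragments, cut_bonds


# Toy heavy-atom graph of benzylamine: ring atoms 0-5, CH2 = 6, NH2 = 7.
benzylamine_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7)]
fragments, cut_bonds = fragment_at_acyclic_bonds(8, benzylamine_bonds)
print(fragments)  # [[0, 1, 2, 3, 4, 5], [6], [7]]
print(cut_bonds)  # [(0, 6), (6, 7)]
```

In this toy case the ring stays intact as one fragment while the two acyclic bonds are cut, leaving the CH2 and NH2 atoms as separate fragments.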
The dataset includes the validation space and the validation library. Both support the benchmarking of screening tools: each reference molecule can be used as a query against the space or the library, and a tool is then assessed by the rank it assigns to the paired ligand and by the RMSD between that ligand's alignment to the reference drug and its binding site-based alignment.
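The two benchmark quantities, rank and RMSD, can be computed along the following lines. This is a minimal sketch under stated assumptions: the function names and the toy data are hypothetical, both poses list the same atoms in the same order, higher scores are better, and the RMSD is taken in place (no re-superposition and no symmetry correction), because the binding site-based alignment already fixes the coordinate frame.

```python
import math


def in_place_rmsd(pose, reference_pose):
    """RMSD between two poses given as equal-length lists of (x, y, z) tuples."""
    if not pose or len(pose) != len(reference_pose):
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose, reference_pose)
    )
    return math.sqrt(squared_sum / len(pose))


def rank_of(ligand_id, scored_hits):
    """1-based rank of ligand_id in (id, score) hits, or None if it is absent."""
    ordered = sorted(scored_hits, key=lambda hit: hit[1], reverse=True)
    for rank, (hit_id, _score) in enumerate(ordered, start=1):
        if hit_id == ligand_id:
            return rank
    return None


# Toy data: a predicted pose shifted by 1 Angstrom along z from the
# binding site-based pose, and a small hit list with made-up scores.
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
site_based = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
hits = [("lig_a", 0.9), ("lig_b", 0.7), ("lig_c", 0.8)]

print(in_place_rmsd(predicted, site_based))  # 1.0
print(rank_of("lig_b", hits))                # 3
```

A tool that reports lower-is-better scores would need `reverse=False` in the sort, and ligands with symmetric substructures would call for a symmetry-aware atom mapping before the RMSD is taken.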
The dataset can be downloaded at https://fiona.uni-hamburg.de/37ebe22f/spacegrowdataset.zip.
[1] Hönig, S. M. N.; Flachsenberg, F.; Ehrt, C.; Neumann, A.; Schmidt, R.; Lemmen, C.; Rarey, M. SpaceGrow: Efficient Shape-Based Virtual Screening of Billion-Sized Combinatorial Fragment Spaces. J Comput Aided Mol Des 2024, 38 (1), 13. DOI: https://doi.org/10.1007/s10822-024-00551-7
[2] Neumann, A.; Marrison, L.; Klein, R. Relevance of the Trillion-Sized Chemical Space "eXplore" as a Source for Drug Discovery. ACS Med Chem Lett 2023, 14 (4), 466-472. DOI: https://doi.org/10.1021/acsmedchemlett.3c00021
[3] Wang, R.; Fang, X.; Lu, Y.; Wang, S. The PDBbind Database: Collection of Binding Affinities for Protein-Ligand Complexes with Known Three-Dimensional Structures. J Med Chem 2004, 47 (12), 2977-2980. DOI: https://doi.org/10.1021/jm030580l
[4] Bietz, S.; Rarey, M. SIENA: Efficient Compilation of Selective Protein Binding Site Ensembles. J Chem Inf Model 2016, 56 (1), 248-259. DOI: https://doi.org/10.1021/acs.jcim.5b00588