HELLS Dataset
HELLS - A Dataset of Lead-Like Molecules
The Hamburg Enumerated Lead-Like Set (HELLS)[1] is a collection of 503,974,653 lead-like molecules generated by recombining fragments of approved drug molecules. It was built from the "Approved Drugs" set of DrugBank[2] using the enumeration tool FSees and the BRICS[3] fragmentation rules.
The initial fragment space contained 1214 fragments from 1009 molecules. Of these, 183 fragments served as starting points for the enumeration: each contains at least two linkers and at least one ring of size 5 or larger.
The full HELLS dataset is available for download as a single gzipped file in SMILES format (file size: 1.6 GB).
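Because the set holds roughly half a billion molecules, it is usually more practical to stream the gzipped file than to decompress it or load it into memory. The following is a minimal sketch using only the Python standard library. It assumes one molecule per line with the SMILES string as the first whitespace-separated field; the exact file layout is not documented here, so check it against the downloaded file. The file name hells.smi.gz and the helper names are hypothetical.

```python
import gzip
from itertools import islice
from typing import Iterator, List


def iter_smiles(path: str) -> Iterator[str]:
    """Yield one SMILES string per non-empty line of a gzipped text file."""
    # "rt" decompresses on the fly and decodes to text, so only one
    # line at a time is held in memory.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if fields:
                # Assumption: the SMILES is the first field on the line;
                # any trailing fields (e.g. an identifier) are ignored.
                yield fields[0]


def count_molecules(path: str) -> int:
    """Count the molecules in the file without storing them."""
    return sum(1 for _ in iter_smiles(path))


def head(path: str, n: int = 5) -> List[str]:
    """Return the first n SMILES strings, reading no further than needed."""
    return list(islice(iter_smiles(path), n))
```

With the downloaded file saved under the hypothetical name hells.smi.gz, head("hells.smi.gz", 10) gives a quick look at the first ten entries, and count_molecules("hells.smi.gz") can be compared against the molecule count stated above as a check that the download is complete. A full pass over the file takes a while, since every line has to be decompressed.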
[1] Lauck, F.; Rarey, M. FSees: Customized Enumeration of Chemical Subspaces with Limited Main Memory Consumption. J Chem Inf Model 2016, 56 (9), 1641-1653. DOI: https://doi.org/10.1021/acs.jcim.6b00117
[2] DrugBank. https://go.drugbank.com/
[3] Degen, J.; Wegscheid-Gerlach, C.; Zaliani, A.; Rarey, M. On the Art of Compiling and Using 'Drug-Like' Chemical Fragment Spaces. ChemMedChem 2008, 3 (10), 1503-1507. DOI: https://doi.org/10.1002/cmdc.200800178