A copy of the current list is available here.
Right now, it is almost entirely the EPA's high production volume chemical list.
The entries beside the names are SMILES strings: neat, compact descriptors
of molecules (unique once written in a canonical form). Further information is detailed below.
Just what do you need a list of cheap chemicals for?
I am interested in the properties of the set of all cheap, synthetically accessible
compounds. I can generate the molecules of that space on a computer (up to connectivity,
which is what those SMILES strings encode) by iteratively applying chemical
reactions to the initial set of cheap chemicals. For the initial set, think of compounds
like water, ethanol, vanillin, sodium bisulfite, glucose, lysine, etc.
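A minimal sketch of that iterative expansion, to make the idea concrete. Everything in it is an assumption made for illustration: the "reaction" is a toy string rewrite on SMILES text (a real implementation would work on molecular graphs with a cheminformatics toolkit), the names `amide_coupling` and `expand` are hypothetical, and the two starting molecules are stand-ins rather than entries checked against the list.

```python
from itertools import product

ACID = "C(=O)O"   # toy pattern: SMILES text ends in a carboxylic acid
AMINE = "N"       # toy pattern: SMILES text starts with an amine nitrogen


def amide_coupling(a, b):
    """Toy reaction: join the acid end of `a` to the amine start of `b`.

    Returns the product string, or None if the text patterns do not match.
    This is plain string surgery, not real reaction chemistry.
    """
    if a.endswith(ACID) and b.startswith(AMINE):
        return a[:-1] + b   # drop the acid's OH oxygen, then append b
    return None


REACTIONS = [amide_coupling]


def expand(start, reactions, max_rounds):
    """Map each reachable molecule to the first round in which it appears.

    In every round, each reaction is tried on every ordered pair of
    molecules found so far (starting materials and earlier products).
    """
    first_seen = {m: 0 for m in start}
    for rnd in range(1, max_rounds + 1):
        current = list(first_seen)
        new = set()
        for rxn in reactions:
            for a, b in product(current, repeat=2):
                p = rxn(a, b)
                if p is not None and p not in first_seen:
                    new.add(p)
        if not new:          # nothing new: the set is closed under the reactions
            break
        for p in new:
            first_seen[p] = rnd
    return first_seen


# Glycine and alanine, with stereochemistry ignored.
cheap = ["NCC(=O)O", "NC(C)C(=O)O"]
for n in range(3):
    print(n, len(expand(cheap, REACTIONS, n)))
# prints:
# 0 2
# 1 6
# 2 30
```

Even with two starting materials and one reaction, the count goes 2, 6, 30 over two rounds. Deduplication here relies on each molecule having exactly one string in this toy; with general SMILES, the strings would need to be canonicalised first, since one molecule can be written in many ways.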
What is a cheap synthetically accessible compound?
There is of course no one definition. I may decide to change the criteria later,
but essentially it is any compound that can be made from the set of cheap starting materials
in 5 high-yield steps, using only "easy" reactions. It is important to remember that
"cheap" is a consequence of supply and demand, and is unpredictably time-dependent.
What is an easy reaction?
That's a tough one, given that chemical reactions display unpredictable sensitivity
to the local context of molecular connectivity and the global context of ambient
conditions. A place to start would be the set of "click"1 reactions,
i.e. reactions that tend to have high, relatively context-independent exergonicities.
What do you want to do with this set of cheap synthetically accessible compounds?
I am primarily interested in statistical descriptors of that set, to answer questions like:
- What is the size of the set?
- How does this set compare, under a given metric, to the set of natural products or drugs?
- What are the statistical descriptors of the respective set of synthetic routes?
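As one hypothetical way to make "compare under a given metric" concrete, the sketch below summarises two sets of SMILES strings by a per-molecule descriptor and scores how much the two distributions overlap. The descriptor (a crude heavy-atom count that only handles simple, bracket-free SMILES), the histogram-intersection score, and both example sets are assumptions chosen for illustration; none of them is a choice made above.

```python
from collections import Counter


def heavy_atoms(smiles):
    """Crude heavy-atom count for simple, bracket-free SMILES strings.

    Counts letters, treating the two-letter symbols Cl and Br as one atom.
    Assumption: no bracket atoms such as [Na+] and no explicit hydrogens.
    """
    s = smiles.replace("Cl", "X").replace("Br", "X")
    return sum(ch.isalpha() for ch in s)


def histogram(smiles_set, descriptor):
    """Normalised histogram of descriptor values over a set of molecules."""
    counts = Counter(descriptor(s) for s in smiles_set)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def overlap(p, q):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint."""
    return sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in set(p) | set(q))


# Hypothetical stand-ins for a generated set and a reference set.
generated = ["CCO", "CC(=O)O", "NCC(=O)O", "CC(=O)OCC"]
reference = ["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1"]

score = overlap(histogram(generated, heavy_atoms),
                histogram(reference, heavy_atoms))
print(len(generated), len(reference), round(score, 3))
```

Any other descriptor function with the same signature could be passed to `histogram` in place of `heavy_atoms`; the size question is just `len` of the deduplicated set.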
Won't you run out of computing time / space before you could ever generate the complete set?
I don't know, yet. This is the part of the problem where some clever approximation will have to be employed.
1 Angew. Chem. Int. Ed. 2001, 40, 2004–2021