Hi there, I am trying to make a complete list of cheap "process-ready" starting materials. I would like your help.

Can you think of 5 small organic molecules that are widely available, pure and cheap?
Try to think of ones that other people may not have thought of.
You can enter the common or IUPAC names on separate lines here (feel free to put in more or fewer than 5):

Optionally, you can provide:

Your name (so I can give you credit):

Email (if you want a copy of the final list):



A copy of the current list is available here. Right now, it consists almost entirely of the EPA's High Production Volume (HPV) chemical list. The strings beside the names are SMILES strings: neat, compact text descriptors of molecules, which in canonical form give each molecule a single unique string. Further details are below.
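Because SMILES strings are plain text, even a few lines of code can pull simple information out of them. The sketch below is a deliberately minimal, hypothetical helper (`heavy_atom_count` is not from any library) that counts the atoms written in a SMILES string. It understands only bracket atoms and the organic-subset elements, assumes no explicit hydrogen atoms such as `[H]`, and does no validation. For real work, a cheminformatics toolkit such as RDKit (`Chem.MolFromSmiles`) parses SMILES properly and can also write canonical SMILES.

```python
import re

# Matches one atom in a SMILES string: a bracket atom such as [Na+] or [O-],
# a two-letter organic-subset element (Cl, Br), a one-letter organic-subset
# element, or a lowercase aromatic atom. Bonds, branches and ring-closure
# digits are deliberately not matched, so they are skipped.
ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def heavy_atom_count(smiles: str) -> int:
    """Count written atoms; implicit hydrogens are not written in SMILES.

    Assumes no explicit hydrogen atoms like [H], which would also be counted.
    """
    return len(ATOM.findall(smiles))


print(heavy_atom_count("CCO"))               # ethanol: prints 3
print(heavy_atom_count("COc1cc(C=O)ccc1O"))  # vanillin, C8H8O3: prints 11
print(heavy_atom_count("[Na+].OS([O-])=O"))  # sodium bisulfite: prints 5
```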


Just what do you need a list of cheap chemicals for?

I am interested in the properties of the set of all cheap, synthetically accessible compounds. I can generate the molecules in that space on a computer (up to connectivity, which is what the SMILES strings capture) by iteratively applying chemical reactions to the initial set of cheap chemicals: compounds like water, ethanol, vanillin, sodium bisulfite, glucose, lysine, etc.
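The iterative generation described above can be sketched as a breadth-first closure: in each round, apply every reaction to everything known so far and keep only what is new. This is a minimal sketch under strong assumptions. Molecules are plain SMILES strings, and `oxidize`, `esterify` and `expand` are hypothetical toy functions (string rewrites that only work for simple unbranched molecules), not a real reaction engine. A real implementation would use reaction templates and canonical SMILES from a cheminformatics toolkit, so that one molecule written two different ways is not counted twice.

```python
import re
from itertools import product

# Toy string-rewrite "reactions" (hypothetical, for illustration only).
# They are valid only for unbranched primary alcohols written as C...CO
# (e.g. "CCO") and for carboxylic acids written ending in "C(=O)O".
# Byproducts such as water are ignored.
ALCOHOL = re.compile(r"^C+O$")


def oxidize(mol):
    """Primary alcohol -> carboxylic acid, e.g. CCO -> CC(=O)O."""
    return [mol[:-2] + "C(=O)O"] if ALCOHOL.match(mol) else []


def esterify(acid, alcohol):
    """Acid + primary alcohol -> ester, e.g. CC(=O)O + CCO -> CC(=O)OCC."""
    if acid.endswith("C(=O)O") and ALCOHOL.match(alcohol):
        return [acid + alcohol[:-1]]
    return []


def expand(start, unimolecular, bimolecular, steps):
    """Apply every reaction to every molecule (or ordered pair) for up to `steps` rounds.

    Returns a dict mapping each molecule to the round in which it first
    appeared (0 for starting materials).
    """
    seen = {m: 0 for m in start}
    for step in range(1, steps + 1):
        current = list(seen)  # snapshot: only molecules known before this round react
        new = set()
        for rxn in unimolecular:
            for m in current:
                new.update(rxn(m))
        for rxn in bimolecular:
            for a, b in product(current, repeat=2):
                new.update(rxn(a, b))
        fresh = new.difference(seen)  # iterating a dict yields its keys
        if not fresh:
            break  # closure reached: nothing new can be made
        for m in fresh:
            seen[m] = step
    return seen


library = expand(["CO", "CCO"], [oxidize], [esterify], steps=5)
for mol, step in sorted(library.items(), key=lambda kv: (kv[1], kv[0])):
    print(step, mol)
```

Starting from methanol and ethanol, the two acids appear in round 1, the four esters in round 2, and nothing new appears in round 3, so the loop stops early. The round in which a molecule first appears gives a rough step count to compare against a step limit.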

What is a cheap synthetically accessible compound?

There is, of course, no single definition. I may change the criteria later, but essentially it is any compound that can be made from the set of cheap starting materials in at most 5 high-yield steps, using only "easy" reactions. It is important to remember that "cheap" is a consequence of supply and demand, and so changes over time in unpredictable ways.
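Requiring every step to be high-yielding matters because yields multiply along a route. As a worked example, assuming (this threshold is not part of the stated criteria) that "high yield" means 90% per step:

```latex
Y_{\text{route}} = \prod_{i=1}^{n} y_i, \qquad n \le 5

% with an assumed y_i = 0.9 for every step:
Y_{\text{route}} = 0.9^{5} \approx 0.59
```

So even a 5-step route made of individually good reactions loses roughly 40% of the material.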

What is an easy reaction?

That's a tough one, because chemical reactions are unpredictably sensitive both to the local context of molecular connectivity and to the global context of ambient conditions. A place to start would be the set of "click"¹ reactions, i.e. reactions that tend to be highly exergonic in a way that depends relatively little on context.
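One way to see why a large exergonicity helps is the standard relation between the standard Gibbs energy change and the equilibrium constant. As a worked example at 298 K, taking ΔG° = -20 kJ/mol purely as an illustrative value (not a figure from the cited paper):

```latex
K = \exp\!\left(-\frac{\Delta G^\circ}{RT}\right)
  = \exp\!\left(\frac{20\,000\ \text{J mol}^{-1}}{8.314\ \text{J mol}^{-1}\,\text{K}^{-1} \times 298\ \text{K}}\right)
  \approx e^{8.07} \approx 3 \times 10^{3}
```

This only says the equilibrium lies far toward products; whether a reaction actually gets there in a reasonable time is a kinetic question that the relation does not address.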

What do you want to do with this set of cheap synthetically accessible compounds?

I am primarily interested in statistical descriptors of that set, to answer questions like:

Won't you run out of computing time / space before you could ever generate the complete set?

I don't know yet. This is the part of the problem where some clever approximation will be needed.
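One possible approximation, offered only as an illustration and not as the approach this project will necessarily use, is to sample instead of enumerate: follow many random routes of at most 5 steps and estimate statistics from the products visited. The toy reaction rules and the names `one_step_products`, `sample_routes` and `heavy_atoms` are hypothetical. Such samples are biased toward molecules that many routes reach, so real estimates would need some reweighting.

```python
import random
import re
from statistics import mean

# Hypothetical toy rules as string rewrites: valid only for unbranched
# primary alcohols "C...CO" and for acids written ending in "C(=O)O".
ALCOHOL = re.compile(r"^C+O$")


def one_step_products(pool):
    """All products of one toy reaction applied to molecules in `pool`."""
    out = set()
    for a in pool:
        if ALCOHOL.match(a):  # toy oxidation: CCO -> CC(=O)O
            out.add(a[:-2] + "C(=O)O")
        if a.endswith("C(=O)O"):  # toy esterification with every alcohol in the pool
            out.update(a + b[:-1] for b in pool if ALCOHOL.match(b))
    return sorted(out)  # sorted so that a seeded run is reproducible


def sample_routes(start, max_steps, n_walks, seed=0):
    """Monte Carlo sketch: follow random routes instead of enumerating them all."""
    rng = random.Random(seed)
    sampled = []
    for _ in range(n_walks):
        pool = set(start)
        for _ in range(rng.randint(1, max_steps)):
            options = one_step_products(pool)
            if not options:
                break
            chosen = rng.choice(options)
            pool.add(chosen)
            sampled.append(chosen)
    return sampled


def heavy_atoms(smiles):
    """Atom count valid only for this toy alphabet of C and O."""
    return sum(ch in "CO" for ch in smiles)


sample = sample_routes(["CO", "CCO"], max_steps=5, n_walks=1000)
print(len(sample), round(mean(heavy_atoms(s) for s in sample), 2))
```

The cost now scales with the number of walks rather than with the size of the reachable set, at the price of statistical error and sampling bias.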


¹ Angew. Chem. Int. Ed. 2001, 40, 2004–2021.