Ask a RIFM Scientist: Can artificial intelligence help evaluate safety?


Artificial intelligence, or AI, once the stuff of science fiction, has seeped into the fabric of everyday life—it drives a whole host of processes that we mostly take for granted, from Google’s search algorithms to electronic health records used to gauge patient risk.

In last week’s Ask a RIFM Scientist, we hinted at the impact that artificial intelligence is beginning to have on our understanding of chemicals. This week, we take a deeper dive.

Manoj Kumar, PhD, joined the Research Institute for Fragrance Materials (RIFM) in February 2020 as a Computational Chemist. His background includes extensive use of AI in chemical clustering (organizing chemicals into structurally similar groups).

Q: What role might artificial intelligence have in evaluating the safety of fragrance materials?

The ultimate goal of applying AI in RIFM’s work is to develop supervised and unsupervised machine-learning models that will speed up the read-across process (using data from one material to accurately predict how other similar materials may affect human health and the environment).

We will develop a specific machine-learning model for each health aspect, or “endpoint,” that RIFM assesses (e.g., reproductive toxicity, skin sensitization). Each of these models will predict endpoint-specific read-across options for a given chemical with high accuracy. This will significantly reduce the time needed to complete the read-across process, making the safety assessment writing process more efficient.

Data science skills will also play a crucial role in automating the tedious job of transferring data from one format to another. RIFM scientists are already using Python-written code to transfer exposure data from the Creme RIFM Aggregate Exposure Model, saving them many hours that they can now focus on other projects.
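
As a rough illustration of this kind of format-to-format transfer, the sketch below reads tabular text and re-emits it as structured JSON using only Python's standard library. It is not RIFM's actual code: the column names (material, product_type, exposure_mg_per_kg_day), the function name, and the sample rows are all hypothetical, and the real export format of the Creme RIFM model is not described here.

```python
import csv
import io
import json


def exposure_csv_to_records(csv_text):
    """Parse CSV text into a list of dicts with numeric exposure values.

    The column names are assumptions made for this sketch only.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        records.append({
            "material": row["material"].strip(),
            "product_type": row["product_type"].strip(),
            # Convert the exposure value from text to a number.
            "exposure_mg_per_kg_day": float(row["exposure_mg_per_kg_day"]),
        })
    return records


# Made-up example rows; these are not real exposure data.
SAMPLE_CSV = """material,product_type,exposure_mg_per_kg_day
Material A,body lotion,0.012
Material B,fine fragrance,0.0031
"""

records = exposure_csv_to_records(SAMPLE_CSV)
print(json.dumps(records, indent=2))
```

Because the conversion is a function rather than a manual copy step, the same code can be rerun on every new export without retyping any values.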

Data science skills will play an even more significant role in automating natural complex substance (NCS) safety assessments. NCS assessments involve many chemicals, and transferring the data for all of them by hand slows down the process and increases the risk of human error.

Q: What does the process of developing AI look like?

AI is 80% cleaning and extracting data and 20% model building. For RIFM, this will involve collecting, cleaning, and formatting data, and then leveraging that data to build machine-learning models to make endpoint-specific predictions.
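
To give a sense of what the cleaning step can involve, here is a minimal sketch that normalizes names and labels, drops incomplete rows, and removes duplicates before any model is built. The field names (name, genotoxic), the accepted label values, and the sample records are assumptions for illustration, not RIFM's data or pipeline.

```python
def clean_records(raw_records):
    """Normalize names and labels, drop incomplete rows, remove duplicates."""
    cleaned = []
    seen = set()
    for rec in raw_records:
        name = (rec.get("name") or "").strip()
        label = (rec.get("genotoxic") or "").strip().lower()
        # Skip rows with a missing name or an unrecognized label.
        if not name or label not in {"yes", "no"}:
            continue
        key = name.lower()
        if key in seen:  # keep only the first occurrence of each material
            continue
        seen.add(key)
        cleaned.append({"name": name, "genotoxic": label == "yes"})
    return cleaned


# Made-up records showing typical problems in extracted data.
RAW = [
    {"name": " Material A ", "genotoxic": "No"},
    {"name": "material a", "genotoxic": "no"},   # duplicate of the first row
    {"name": "Material B", "genotoxic": None},   # missing label
    {"name": "Material C", "genotoxic": " YES"},
]

cleaned = clean_records(RAW)
print(cleaned)
```

Only two of the four raw rows survive here, which is typical of why cleaning tends to take more effort than the modeling that follows it.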

RIFM scientists have already completed safety assessments covering 85% of the fragrance industry’s volume of use. These assessments themselves provide a wealth of information that we can feed into machine-learning models. RIFM can use these models to make useful predictions—for instance, whether a given chemical is potentially genotoxic or a skin sensitizer.

We will also draw from the RIFM Database, which is the most comprehensive source worldwide of toxicology data, literature, and general information on fragrance and flavor raw materials. Finally, we will be able to leverage AI to select read-across options for a given chemical, either by clustering structurally similar chemicals or by using a classification algorithm.
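
One simple way to picture the classification route is a nearest-neighbor approach: rank known chemicals by structural similarity to the query, treat the closest ones as read-across candidates, and let them vote on the predicted label. The sketch below uses the Tanimoto coefficient, a widely used similarity measure in cheminformatics, on fingerprints represented as sets of integers. It is a toy illustration under stated assumptions, not RIFM's model: the fingerprints, material names, labels, and function names are all invented.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def nearest_analogs(query, library, k=2):
    """Return the k library entries most similar to the query fingerprint."""
    scored = [
        (tanimoto(query, fp), name, label)
        for name, (fp, label) in library.items()
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def predict_label(query, library, k=3):
    """Majority vote over the labels of the k nearest analogs."""
    votes = Counter(label for _, _, label in nearest_analogs(query, library, k))
    return votes.most_common(1)[0][0]


# Hypothetical fingerprints (sets of "on" bits) and endpoint labels.
LIBRARY = {
    "Material A": ({1, 2, 3, 4}, "sensitizer"),
    "Material B": ({1, 2, 3, 5, 6}, "sensitizer"),
    "Material C": ({7, 8, 9}, "non-sensitizer"),
    "Material D": ({4, 6, 10}, "non-sensitizer"),
}
query = {1, 2, 3, 6}

analogs = nearest_analogs(query, LIBRARY, k=2)
print([name for _, name, _ in analogs])   # candidate read-across materials
print(predict_label(query, LIBRARY, k=3))
```

In practice the fingerprints would come from a cheminformatics toolkit and the labels from curated assessments; the point here is only that the same similarity ranking yields both the read-across candidates and an endpoint prediction.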

Ultimately, every step in the safety assessment process may be guided in some way by predictive algorithms.

Q: Are there any other extensive collections of information from which RIFM might draw?

In addition to RIFM’s safety assessments and the RIFM Database, we will also look for useful data from several publicly accessible sources, including those maintained by the Environmental Protection Agency, Food and Drug Administration, National Toxicology Program, and National Institutes of Health (including NIH’s PubChem database).

Q: Is AI an expensive tool? When will RIFM begin to see the results of its initial AI work?

Python, the primary programming language that RIFM is using to develop AI, is open-source and free to use. It is robust while also being readable and easy to learn. These attributes make it suitable for programmers of all backgrounds, which is why Python is one of the most widely used programming languages.

RIFM will start to see the impact of AI-based predictive algorithms by the beginning of summer.

Currently, we are working on extracting, parsing, and preparing the data for model building. Our initial interactions with other RIFM scientists suggest that the data we are extracting from safety documents are of immense value to all toxicologists. This information will be made accessible via web apps, which will be useful not only to RIFM staff but also to RIFM members and other fragrance safety stakeholders.

Read more about RIFM’s Safety Assessment program here.