
Autoencoder Predicting Estrogenic Chemical Substances (APECS): An improved approach for screening potentially estrogenic chemicals using in vitro assays and deep learning

Lyle D. Burgoon

US Army Engineer Research and Development Center, RTP, NC 27711, United States

Article info

Article history:

Received 28 January 2017

Received in revised form 7 March 2017

Accepted 12 March 2017

Available online 14 March 2017

Keywords: Deep learning; Autoencoder; Endocrine disruptor; Estrogenic; Predictive toxicology; Computational toxicology; High throughput screening

Abstract

In 2015 the US Environmental Protection Agency published a computational toxicology approach to screen chemicals for potential estrogenic activity. This complicated approach requires several steps, including concentration-response modeling (which includes fitting several different models and identifying the best model), application of a multi-factor mathematical model that attempts to model the concentration-response data, calculation of the area under the concentration-modeled response curve, and finally standardizing the area under the concentration-modeled response curve to that of 17-beta estradiol. Toxicologists will find it difficult to implement this approach on their own, creating a need for a more straightforward tool. Recently, it has been shown that deep learning approaches can be simpler and faster than more complicated approaches while maintaining or improving overall algorithmic performance. In this paper we examine the Autoencoder Predicting Estrogenic Chemical Substances (APECS). APECS consists of two deep autoencoder models that achieve at least the same performance as the US EPA’s approach while being less complicated for an average toxicologist to use. Our deep autoencoders achieved accuracies of 91% vs 86% and 93% vs 93% on the in vivo and in vitro datasets, respectively, used by the US EPA in validating their approach. Users can make predictions from their own assay data using our open source Java desktop applications. APECS has a simple push-button interface.

Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Introduction

Governments around the world want to protect their citizens and environments from endocrine disrupting chemicals. These chemicals can either act as mimics of endocrine active substances, or disrupt endocrine signaling [4,5]. Depending upon the timing of exposure, the impacts of endocrine disrupting chemicals may be permanent or transient [5]. At the same time, there is interest in replacing animal models with lower cost and higher throughput in vitro assays.

The US Environmental Protection Agency recently developed a complicated, multistep algorithm and mathematical model to predict if a chemical is an endocrine disruptor using data from the ToxCast program [1]. Beyond the complicated nature of the algorithm, the approach is also somewhat subjective in nature. For instance, the approach uses either the Hill model or the Gain-Loss model. However, the Hill model is known to not fit all sigmoidal shapes well, and a generalized sigmoidal model may perform better generally [3]. In addition, non-sigmoidal relationships may exist in assay concentration-response data, which are best fit with other models, such as exponential or linear models [3]. Thus, a data-driven non-parametric approach to curve fitting is likely more appropriate [2].

In addition, the EPA’s approach uses the area under the concentration-response curve (AUC) to calculate similarity between a chemical’s concentration-response curve and 17-beta estradiol. The problem is that curves with very different shapes can all share the same AUC. For instance, a chemical with a sigmoidal concentration-response curve with an AUC of 75 units would be called similar to another chemical that is best fit with a quartic equation and an AUC of 75, or a chemical with an exponential concentration-response curve and an AUC of 75. These shapes are all very different, but yield the same AUC, and have been seen in Tox21 data before. A more robust alternative is to use Pearson correlation, which is sensitive to shape.
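The AUC argument can be illustrated with a minimal sketch. The 75-unit AUC is taken from the text; the concentration grid and the three curve formulas are made-up assumptions chosen only to give a sigmoidal, a quartic, and an exponential shape, and they are not ToxCast or Tox21 data.

```python
import math

def trapezoid_auc(x, y):
    """Area under y(x) by the trapezoid rule."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))

def rescale_to_auc(x, y, target):
    """Scale a curve so that its trapezoid-rule AUC equals `target`."""
    factor = target / trapezoid_auc(x, y)
    return [v * factor for v in y]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical log10 concentrations, symmetric around zero (assumption).
log_conc = [i * 0.1 for i in range(-25, 26)]

# Three made-up curve shapes, each rescaled to an AUC of 75 units.
sigmoid = rescale_to_auc(log_conc, [1 / (1 + math.exp(-2 * c)) for c in log_conc], 75)
quartic = rescale_to_auc(log_conc, [(6.25 - c * c) ** 2 for c in log_conc], 75)
exponential = rescale_to_auc(log_conc, [math.exp(c) for c in log_conc], 75)

# Same AUC for every curve, but different correlation with the sigmoid.
for name, curve in [("sigmoid", sigmoid), ("quartic", quartic), ("exponential", exponential)]:
    auc = trapezoid_auc(log_conc, curve)
    r = pearson(sigmoid, curve)
    print(f"{name}: AUC={auc:.2f}, Pearson r vs sigmoid={r:.2f}")
```

In this construction all three curves have the same AUC, yet the quartic is essentially uncorrelated with the sigmoid and the exponential is only partly correlated with it, which is the sense in which correlation is sensitive to shape.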

The United States Army has interests in developing predictive computational toxicology models that use in vitro high throughput assays to more quickly identify promising new chemicals of military interest that are less toxic to people and the environment, yet still acceptable to regulators. The Autoencoder Predicting Estrogenic Chemical Substances (APECS) are deep learning autoencoders that use in vitro high throughput screening assay data to predict if a chemical is estrogenic. Deep learning and autoencoder methods have been used previously to create reduced order models, that is, models that are less computationally intense, yet still yield good predictions [9,10]. Deep autoencoders have also been used as a nonlinear replacement for principal components analysis or singular value decomposition, especially as an unsupervised or semi-supervised pattern recognition approach.

Computational Toxicology. http://dx.doi.org/10.1016/j.comtox.2017.03.002. 2468-1113/Published by Elsevier B.V.

This paper will examine the APECS models and demonstrate that they perform generally at least as well as the US EPA’s method for predicting if a chemical is estrogenic, while also having the advantage of being simpler to implement and use.

Materials and methods

Data

Data were obtained from the US EPA’s ToxCast invitrodb_v2 database (https://www.epa.gov/chemical-research/toxicity-forecaster-toxcasttm-data). We obtained the in vitro and in vivo estrogenicity “ground truth” calls for chemicals from Browne et al. [1].

Software

All analysis and model building was performed in R (v3.2.4). H2O (v3.8.3.3) was used for model development. One model was built using the in vitro ground truth information (APECS-vitro) and another model was built using the in vivo ground truth information (APECS-vivo). Deep autoencoders are simply deep neural networks. Plain old Java objects (POJOs) were exported from the H2O server. These POJOs contain the neuron weights, neural network architecture, and bias factors.

A JavaFX graphical user interface was built in Java (v1.8.0_05) that uses the POJOs to make predictions on user-supplied in vitro data. The open source graphical user interface code is available at GitHub (https://github.com/DataSciBurgoon/apecs_vivo and https://github.com/DataSciBurgoon/apecs_vitro). The executable desktop applications can be downloaded from GitHub (APECS-vivo: https://github.com/DataSciBurgoon/apecs_vivo/releases and APECS-vitro: https://github.com/DataSciBurgoon/apecs_vitro/releases).

Analysis

Data for 10 of the assays reported in Browne et al. [1] were used. These 10 assays were: 1) NVS_NR_hER, 2) OT_ER_ERaERa_0480, 3) OT_ER_ERaERa_1440, 4) OT_ER_ERaERb_0480, 5) OT_ER_ERaERb_1440, 6) OT_ER_ERbERb_0480, 7) OT_ER_ERbERb_1440, 8) TOX21_ERa_BLA_Agonist, 9) ATG_TRANS, 10) ATG_CIS. We had difficulty finding the other assays listed by Browne et al. [1] within the ToxCast database. This is not a concern given that APECS’ performance matched or surpassed that reported in Browne et al. [1] and the aim was to develop a reduced order model, not to perform a direct reproduction of the Browne et al. [1] study.

Loess was used to fit a nonlinear model to the concentration-response data for each chemical and assay combination. Pearson correlation was used to measure the similarity of the concentration-response curves for each chemical and assay combination to the concentration-response curve for 17-beta estradiol in each assay. The resulting matrix (chemicals as rows, assays as columns, and correlation in each cell) was fed into the autoencoder function from H2O.
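The smoothing and correlation steps can be sketched as follows. This is a simplified stand-in rather than the R code used in the study: the `loess` function is a basic local linear smoother with tricube weights, not R's `loess` implementation; the assay names and responses are hypothetical; and it assumes the chemical and 17-beta estradiol were tested at the same distinct concentrations, which the text does not state.

```python
import math

def loess(x, y, frac=0.75):
    """Smooth y(x) with local linear fits and tricube weights (a basic LOESS).
    Assumes the x values are distinct."""
    k = min(len(x), max(3, math.ceil(frac * len(x))))  # neighbours per window
    fitted = []
    for x0 in x:
        bandwidth = sorted(abs(xi - x0) for xi in x)[k - 1]
        w = [max(0.0, 1 - (abs(xi - x0) / bandwidth) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def pearson(a, b):
    """Pearson correlation; returns 0.0 for a flat curve (an assumption)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0

def similarity_row(conc, chemical, reference):
    """One matrix row: correlation of smoothed curves, one value per assay."""
    return [pearson(loess(conc, chemical[assay]), loess(conc, reference[assay]))
            for assay in sorted(reference)]

# Made-up responses at eight hypothetical log10 concentrations.
log_conc = [-4, -3, -2, -1, 0, 1, 2, 3]
estradiol = {
    "assay_1": [1 / (1 + math.exp(-c)) for c in log_conc],
    "assay_2": [1 / (1 + math.exp(-(c + 1))) for c in log_conc],
}
chemical = {
    "assay_1": [0.5 * v for v in estradiol["assay_1"]],  # same shape, lower efficacy
    "assay_2": [1 - v for v in estradiol["assay_2"]],    # opposite trend
}
row = similarity_row(log_conc, chemical, estradiol)
print(row)
```

Because correlation ignores scale, the half-efficacy curve in `assay_1` still correlates perfectly with the reference, while the reversed curve in `assay_2` correlates negatively. Stacking one such row per chemical gives the chemicals-by-assays matrix described above.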

For the chemicals that are estrogenic in vitro and their negative controls from the Browne et al. study [1], the autoencoder had 3 hidden layers with 10, 2, and 10 neurons, respectively. For the estrogenic in vivo chemicals and their negative controls, also from the Browne et al. study [1], the autoencoder had 7 hidden layers with 43, 20, 5, 2, 5, 20, and 43 neurons, respectively. The number of neurons and the number of hidden layers in both cases was chosen using a grid search, with an eye toward optimal separation of the chemicals based on their classification as estrogenic or not.
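To make the two architectures concrete, the sketch below builds networks with the hidden-layer sizes given above and passes one row of ten correlations through the encoder half to obtain a 2-value code. It is an illustration only: the weights are random and untrained, the tanh activation is an assumption (the text does not name the activation function), and the input values are made up. The actual models were trained with H2O.

```python
import math
import random

def dense(inputs, weights, biases):
    """One fully connected layer with tanh activation (activation is assumed)."""
    return [math.tanh(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def random_layer(n_in, n_out, rng):
    """Placeholder weights; a trained model would supply these."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_autoencoder(n_inputs, hidden, rng):
    """Layers map input -> hidden[0] -> ... -> hidden[-1] -> reconstruction."""
    sizes = [n_inputs] + list(hidden) + [n_inputs]
    return [random_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def bottleneck_code(layers, hidden, inputs):
    """Activations of the narrowest hidden layer (the 2-neuron middle layer)."""
    stop = hidden.index(min(hidden)) + 1
    out = inputs
    for weights, biases in layers[:stop]:
        out = dense(out, weights, biases)
    return out

HIDDEN_VITRO = [10, 2, 10]                # from the text
HIDDEN_VIVO = [43, 20, 5, 2, 5, 20, 43]   # from the text

rng = random.Random(0)
vitro_net = build_autoencoder(10, HIDDEN_VITRO, rng)  # 10 inputs: one per assay
vivo_net = build_autoencoder(10, HIDDEN_VIVO, rng)

# Made-up correlations to 17-beta estradiol for one chemical across 10 assays.
example_row = [0.9, 0.8, 0.85, 0.7, 0.75, 0.6, 0.65, 0.95, 0.5, 0.55]
code_vitro = bottleneck_code(vitro_net, HIDDEN_VITRO, example_row)
code_vivo = bottleneck_code(vivo_net, HIDDEN_VIVO, example_row)
print(code_vitro, code_vivo)
```

Both networks compress the ten correlations to two numbers at the middle layer, which is what allows each chemical to be drawn as a point on a plane.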

The middle (2 neurons) hidden layer was projected into a Cartesian plane for each autoencoder. This 2-dimensional projection serves as a nonlinear unsupervised clustering of the chemical data. A Euclidean distance that results in the best classification accuracy was chosen for each autoencoder. This is similar to choosing a circular decision boundary centered on 17-beta estradiol. For the in vitro autoencoder the optimal distance was 1.35 units, and for the in vivo autoencoder the optimal distance was 1.50 units.
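The distance-based call can be sketched as a scan over candidate radii around the reference point. The coordinates, labels, and candidate radii below are made-up illustrations, and the tie-breaking rule (smallest radius wins) is an assumption; the text only states that the distance giving the best accuracy was chosen.

```python
import math

def classify(points, reference, radius):
    """Call a chemical estrogenic if its 2-D code lies within `radius`
    of the reference chemical's code."""
    return [math.hypot(x - reference[0], y - reference[1]) <= radius
            for x, y in points]

def accuracy(predicted, truth):
    """Fraction of calls that agree with the ground truth labels."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

def best_radius(points, reference, truth, candidates):
    """Radius with the highest accuracy; ties go to the smallest radius."""
    return max(sorted(candidates),
               key=lambda r: accuracy(classify(points, reference, r), truth))

# Hypothetical 2-D codes, with the reference chemical placed at the origin.
reference = (0.0, 0.0)
points = [(0.2, 0.1), (1.0, 0.5), (1.3, 0.9), (2.0, 1.0), (0.5, -1.0), (-1.6, 0.2)]
truth = [True, True, False, False, True, False]  # made-up ground truth labels

candidates = [0.25 * k for k in range(1, 11)]    # 0.25, 0.50, ..., 2.50
radius = best_radius(points, reference, truth, candidates)
calls = classify(points, reference, radius)
print(radius, accuracy(calls, truth))
```

A small radius misses true positives and a large one sweeps in negatives, so accuracy peaks at an intermediate distance. A screening tool could also weight false negatives more heavily than false positives when choosing the radius.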

Results

The autoencoder approach achieved marginally higher accuracy than the ToxCast ER Model (Table 1). For the in vivo data, the autoencoder achieved 91% accuracy vs 86% for the ToxCast ER Model. The autoencoder did a better job at identifying true negatives, resulting in fewer false positives, while achieving the same performance for true positives. For the in vitro data, the autoencoder achieved the same accuracy as the ToxCast ER Model (93% accuracy for both). Here, the autoencoder did a better job of identifying true positives, resulting in no false negatives. The autoencoder misclassified three true negatives as false positives, versus the ToxCast ER Model which misclassified only one, resulting in a lower specificity for the autoencoder. Having higher sensitivity is something we typically want to achieve in a screening assay, even at the expense of specificity.
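For reference, sensitivity, specificity, and accuracy follow from the confusion-matrix counts in the usual way. The sketch below applies the standard definitions to the in vitro counts reported in Table 1; the function and variable names are illustrative.

```python
def screening_metrics(tp, tn, fp, fn):
    """Standard screening metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positives among all positives
        "specificity": tn / (tn + fp),   # true negatives among all negatives
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# In vitro counts as reported in Table 1.
apecs_vitro = screening_metrics(tp=28, tn=9, fp=3, fn=0)
toxcast_vitro = screening_metrics(tp=26, tn=11, fp=1, fn=2)

for name, metrics in [("APECS in vitro", apecs_vitro), ("ToxCast ER in vitro", toxcast_vitro)]:
    print(name, {key: f"{value:.3f}" for key, value in metrics.items()})
```

Both in vitro models classify 37 of 40 chemicals correctly, so their accuracy is the same (0.925), while APECS trades specificity (0.75) for full sensitivity (1.0).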

One of the advantages of the autoencoder approach is that we can generate visualizations from the autoencoder that help us see the results (Figs. 1 and 2). In Figs. 1 and 2, we can see the impact of moving the decision boundary to greater than or less than 1.50 units (plots generated in R). This also allows us to see which chemicals have the most similar and the most different behaviors in the ToxCast assays compared to 17-beta estradiol.

Table 1

In Vivo and In Vitro Autoencoder Model Performance vs ToxCast ER Model Performance.

Performance        In Vivo APECS*    ToxCast ER Model In Vivo    In Vitro APECS*    ToxCast ER Model In Vitro
True Positives     29                29                          28                 26
True Negatives     10                8                           9                  11
False Positives    3                 5                           3                  1
False Negatives    1                 1                           0                  2
Sensitivity        97%               97%                         100%               93%
Specificity        80%               67%                         75%                92%
Accuracy           91%               86%                         93%                93%

*


For instance, in the in vivo dataset, we see that 4-nonylphenol (chemical 13) does not behave much like 17-beta estradiol in the ToxCast assays, despite being called estrogenic based on the in vivo assays. In the in vitro dataset, procymidone (chemical 38) tends to behave more like 17-beta estradiol in the ToxCast assays compared to other non-estrogenic chemicals.

Next, the APECS in vivo and in vitro models were validated (see Table 3). The validation set consisted of 8 chemicals that were not used in the training of the model. These 8 chemicals were identified from the curated uterotrophic database at the National Toxicology Program Interagency Center for Evaluation of Alternative Toxicological Methods (NICEATM) [6]. The chemicals are tamoxifen, bisphenol F, benzophenone, 1,3-dinitrobenzene, oxybenzone, clofibrate, benzoic acid, and carbendazim. According to the curated NICEATM database, the first three are estrogenic and uterotrophic while the rest are not.

In our validation, we see that the in vivo APECS and in vitro APECS perform quite well. Again, we are less concerned about the lower performance on the specificity end since this is a screening assay. The two misclassified chemicals in the in vitro APECS model were oxybenzone and clofibrate.

It is important to note that we do not have validation performance characteristics for the US EPA ToxCast ER model reported in Browne et al.; the US EPA did not publish any validation performance data. As far as we can tell, they have only published the results for their training data. Thus, we cannot compare our validation performance against validation performance characteristics for the US EPA model.

Discussion

Chemical producers, including the US Army, are interested in lower cost and faster alternatives to animal studies that allow them to screen new chemicals early in the development process. Likewise, risk assessors and risk managers are interested in faster alternatives to animal studies that allow them to make their decisions more quickly and more accurately for human and environmental health.

This creates a unique opportunity for data scientists to create decision-support tools that mine large databases such as ToxCast and PubChem to make predictions about a chemical’s potential risks.

Fig. 1. Visualization of the autoencoder results for the in vivo data. The axes represent the outputs from the two center neurons in the autoencoder. The numbers represent specific chemicals (see Table 2). Blue dots represent chemicals that APECS-vivo called estrogenic, while orange dots represent chemicals that APECS-vivo called not estrogenic.

Fig. 2. Visualization of the autoencoder results for the in vitro data. The axes represent the outputs from the two center neurons in the autoencoder. The numbers represent

However, for these tools to be useful, toxicologists, risk assessors and risk managers need easy to use and transparent computational tools. The US EPA published their ToxCast ER Model, which is a computational approach that uses the ToxCast data to predict if chemicals are estrogenic. However, their approach is rather mathematically and computationally complicated to implement. In addition, the ToxCast ER Model has a small number of parametric model choices for concentration-response analysis that require expert judgment for use, meaning multiple experts can justify different parametric models and parameters, making justification difficult. There are also challenges associated with using AUC for making calls of how similar a concentration-response curve is to 17-beta estradiol.

APECS overcomes the challenges associated with the ToxCast ER Model by using data-driven, non-parametric, nonlinear modeling (LOESS) and Pearson correlation instead of AUC to identify similar concentration-response curves. In addition, APECS is a reduced order model that is built on a deep learning autoencoder. This facilitated the development of a simple push-button graphical user interface that runs in Java to implement our prediction model. A user only needs to select their input file for the model to run.

APECS achieved marginally better performance than the EPA’s ToxCast ER Model. The model evaluated using in vivo ground truth achieved higher accuracy and specificity, while the model evaluated using the in vitro ground truth data achieved higher sensitivity while maintaining the same accuracy. However, unlike the EPA, we have also validated our model against a completely different data set, and showed comparable performance levels on non-training data.

The in vitro dataset false positives for APECS in the training set were procymidone, spironolactone and corticosterone. This means that in the in vitro dataset, procymidone, spironolactone and corticosterone were ground truthed as non-estrogenic, but APECS called all three highly similar to 17-beta estradiol based on ToxCast assay performance, and thus estrogenic. It has been shown that procymidone may indirectly activate estrogen receptor, leading to increases in vitellogenin expression in rainbow trout cells in vitro, as well as proliferation of MCF-7 cells [7,8]. Thus, it may not be too surprising that procymidone is being called estrogenic by APECS, and perhaps this reflects a true positive in vitro result. Spironolactone shows some estrogen receptor agonist/partial-agonist activity in Tox21 assays at PubChem (https://pubchem.ncbi.nlm.nih.gov/assay/bioactivity.html?cid=5833). Corticosterone has shown some activity in some estrogen receptor HTS assays in Tox21, but it is largely seen as inconclusive or inactive in many assays.

In the in vivo dataset, 4-nonylphenol represented a ground truth positive that APECS called as a negative. Browne et al.[1] report that 4-nonylphenol is active in 5 assays and inactive in 4 assays. This suggests that 4-nonylphenol should probably be regarded as inconclusive, rather than as a positive. Based on the ToxCast data, APECS called this a negative, and that may be justified. Regardless, these results suggest that the data on 4-nonylphenol should be scrutinized further.

Conclusions

APECS further demonstrates that data science and deep learning approaches have utility in predictive toxicology. Technologies such as APECS will facilitate the future adoption of in vitro approaches to predictive toxicology and risk assessment, with the potential to replace animal models moving forward. Here we demonstrated that semi-supervised deep learning autoencoders were capable of predicting estrogenic chemicals based solely on data from 10 in vitro assays. This yields promise that reduced order models based on deep learning approaches may be able to replace more complicated computational systems biology models with easier to implement and faster to develop computational toxicology approaches based on data science and machine intelligence.

Table 2

List of Chemicals Used in the In Vivo and In Vitro Datasets.

Chemical Name                         CASRN        In Vivo ID
2,2',4,4'-Tetrahydroxybenzophenone    131-55-5     5
2,4-Dihydroxybenzophenone             131-56-6     6
4,4'-Sulfonyldiphenol                 80-09-1      14
5alpha-Dihydrotestosterone            521-18-6     15    7
Amitrole                              61-82-5      16
Apigenin                              520-36-5     8
Atrazine                              1912-24-9    17    9
Bisphenol A                           80-05-7      18    10
Bisphenol AF                          1478-61-1    19
Bisphenol B                           77-40-7      20    11
Butyl benzyl phthalate                85-68-7      12
Butylparaben                          94-26-8      21
Di(2-ethylhexyl) phthalate            117-81-7     23    17
Dibutyl phthalate                     84-74-2      24    18
Dicofol                               115-32-2     19

Table 3

In Vivo and In Vitro APECS Performance on the Validation Set.

Performance on Validation Set    In Vivo APECS    In Vitro APECS

Acknowledgements

The US Army Environmental Quality and Installations (EQI) Research Program supported this work. The US Army EQI Research Program had no role in the study design, collection, analysis or interpretation of the data, the writing of the report, or in the decision to publish this work. Permission has been granted by the Chief of Engineers for publication. The use of trade, product, or firm names is for descriptive purposes only and does not imply the endorsement of the United States Government.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.comtox.2017.03.002.

References

[1] P. Browne et al., Screening chemicals for estrogen receptor bioactivity using a computational model, Environ. Sci. Technol. 49 (2015) 8804–8814.

[2] L.D. Burgoon et al., Using in vitro high-throughput screening data for predicting Benzo[k]fluoranthene human health hazards, Risk Anal. (2016).

[3] L.D. Burgoon, T.R. Zacharewski, Automated quantitative dose-response modeling and point of departure determination for large toxicogenomic and high-throughput screening data sets, Toxicol. Sci. Off. J. Soc. Toxicol. 104 (2008) 412–418.

[4] D. Crews et al., Endocrine disruptors: present issues, future directions, Q. Rev. Biol. 75 (2000) 243–260.

[5] A.C. Gore et al., Endocrine disruption for endocrinologists (and others), Endocrinology 147 (2006) s1–s3.

[6] N.C. Kleinstreuer et al., A curated database of rodent uterotrophic bioactivity, Environ. Health Perspect. 124 (2016) 556–562.

[7] S. Radice et al., Estrogenic activity of procymidone in primary cultured rainbow trout hepatocytes (Oncorhynchus mykiss), Toxicol. Vitro Int. J. Publ. Assoc. BIBRA 16 (2002) 475–480.

[8] S. Radice et al., Estrogenic effect of procymidone through activation of MAPK in MCF-7 breast carcinoma cell line, Life Sci. 78 (2006) 2716–2723.

[9] M. Wang, H.X. Li, W. Shen, Deep auto-encoder in model reduction of large-scale spatiotemporal dynamics, in: 2016 International Joint Conference on Neural Networks (IJCNN), 2016, pp. 3180–3186.