Accuracy of a Deep Learning System for Classification of Papilledema Severity on Ocular Fundus Photographs

(1)

RESEARCH ARTICLE

Accuracy of a Deep Learning System for

Classi ﬁ cation of Papilledema Severity on Ocular Fundus Photographs

Caroline Vasseneix, MD, Raymond P. Najjar, PhD, Xinxing Xu, PhD, Zhiqun Tang, PhD,

Jing Liang Loo, MBBS, MMed, FRCS(Ed), Shweta Singhal, MBBS, Sharon Tow, MBBS, Leonard Milea, Daniel Shu Wei Ting, MD, PhD, Yong Liu, PhD, Tien Y. Wong, MD, PhD, Nancy J. Newman, MD, Valerie Biousse, MD, and Dan Milea, MD, PhD, on behalf of the BONSAI Group

Neurology

®

2021;97:e369-e377. doi:10.1212/WNL.0000000000012226

Correspondence Dr. D. Milea

[email protected]

Abstract

Objective

To evaluate the performance of a deep learning system (DLS) in classifying the severity of papilledema associated with increased intracranial pressure on standard retinal fundus photographs.

Methods

A DLS was trained to automatically classify papilledema severity in 965 patients (2,103 mydriatic fundus photographs), representing a multiethnic cohort of patients with confirmed elevated intracranial pressure. Training was performed on 1,052 photographs with mild/moderate papilledema (MP) and 1,051 photographs with severe papilledema (SP) classified by a panel of experts. The performance of the DLS and that of 3 independent neuro-ophthalmologists were tested in 111 patients (214 photographs, 92 with MP and 122 with SP) by calculating the area under the receiver operating characteristics curve (AUC), accuracy, sensitivity, and specificity.

Kappa agreement scores between the DLS and each of the 3 graders and among the 3 graders were calculated.

Results

The DLS successfully discriminated between photographs of MP and SP, with an AUC of 0.93 (95% confidence interval [CI] 0.89–0.96) and an accuracy, sensitivity, and specificity of 87.9%, 91.8%, and 86.2%, respectively. This performance was comparable with that of the 3 neuro- ophthalmologists (84.1%, 91.8%, and 73.9%,p = 0.19, p = 1, p= 0.09, respectively). Mis- classification by the DLS was mainly observed for moderate papilledema (Frisén grade 3).

Agreement scores between the DLS and the neuro-ophthalmologists’evaluation was 0.62 (95%

CI 0.57–0.68), whereas the intergrader agreement among the 3 neuro-ophthalmologists was 0.54 (95% CI 0.47–0.62).

Conclusions

Our DLS accurately classiﬁed the severity of papilledema on an independent set of mydriatic fundus photographs, achieving a comparable performance with that of independent neuro- ophthalmologists.

Classification of Evidence

This study provides Class II evidence that a DLS using mydriatic retinal fundus photographs accurately classiﬁed the severity of papilledema associated in patients with a diagnosis of increased intracranial pressure.

MORE ONLINE

Class of Evidence Criteria for rating therapeutic and diagnostic studies

NPub.org/coe

From the Singapore Eye Research Institute (C.V., R.P.N., Z.T., J.L.L., S.S., S.T., D.S.W.T., T.Y.W., D.M.); Duke–NUS Medical School (R.P.N., J.L.L., S.S., S.T., T.Y.W., D.M.); Institute of High Performance Computing (X.X., Y.L.), Agency for Science, Technology and Research (A*STAR); Singapore National Eye Centre (J.L.L., S.S., S.T., D.S.W.T., T.Y.W., D.M.); University of Berkeley (L.M.), CA; Departments of Ophthalmology and Neurology (N.J.N., V.B.), Emory University School of Medicine, Atlanta, GA; and Copenhagen University Hospital (D.M.), Denmark.

Go to Neurology.org/N for full disclosures. Funding information and disclosures deemed relevant by the authors, if any, are provided at the end of the article.

BONSAI Group coinvestigators are listed at http://links.lww.com/WNL/B432.

(2)

Papilledema, defined as optic nerve head swelling associated with any cause of intracranial hypertension, can result in permanent vision loss.^1,2Papilledema severity at presentation is the most important prognostic factor for subsequent visual outcomes.^3-9 Patients with severe papilledema may have progressive vision loss and visual field constriction due to retinal nervefiber loss, thus requiring closer monitoring and more invasive treatment, whereas those with mild papilledema and no optic atrophy usually have good visual outcomes.^5,8,9 However, the evaluation of papilledema severity, based on the 5-grade modified Frisén scale classification mainly used in clinical trials (with 1 being very mild papilledema and 5 very severe papilledema),¹⁰is difficult to apply and subject to high variability.^10-14Hence, neurologists, especially those not confident in performing ophthalmos- copy,¹⁵ usually rely on ophthalmologists to determine the severity of papilledema.^16,17Fundus photography is now in- creasingly used in various clinical settings for screening purposes,^18,19 and may be augmented with artificial intelligence deep learning techniques for automated image interpretation.^20,21Recently, the Brain and Optic Nerve Study with Artificial Intelligence (BONSAI) deep learning system (DLS)²² was shown to accurately discriminate papilledema from normal and other abnormal optic discs on fundus photographs, with a performance comparable to that of expert neuro-ophthalmologists.²³

The aim of the current study was to develop, train, and test a new DLS to automatically classify the severity of papilledema and to compare the performance of this DLS with the clas- siﬁcation performance of 3 neuro-ophthalmologists.

Methods

This study, performed by the BONSAI Consortium,²² included investigators and patients from 14 countries.

Our primary research questions, aiming to provide a Class II level of evidence, were the following: (1) Is a DLS capable of discriminating mild to moderate from severe papilledema on mydriatic fundus photographs? (2) Is the DLS’s performance in classifying the severity of papilledema on fundus photographs comparable to that of neuro-ophthalmologists?

Standard Protocol Approvals, Registrations, and Patient Consents

The study was approved by the Centralized Institutional Review Board of SingHealth, Singapore, and each contribut- ing institution for any experiments using human subjects, and

was conducted in accordance with the Declaration of Hel- sinki. Informed consent was exempted given the retrospective nature of the study and the use of de-identiﬁed ocular fundus photographs.

Inclusion/Exclusion Criteria

The study included unaltered, de-identified digital ocular fundus photographs obtained in patients with confirmed intracranial hypertension and papilledema from 19 neuroophthalmology centers participating in the BONSAI consortium.²²The fundus photographs, taken at variousfields of view (20°–45°) including the optic disc, were obtained after pupillary dilation, using 15 different cameras, mydriatic or nonmydriatic, depending on the center (table 1).

Experts from each participating center provided photographs of patients with confirmed papilledema (criteria previously published²²). Intracranial hypertension was confirmed in every patient by brain imaging (e.g., showing an intracranial mass or venous sinus thrombosis) or elevated CSF opening pressure and follow-up visits. Papilledema was diagnosed only if the optic disc swelling was related to confirmed raised intracranial pressure (ICP). Patients with a diagnosis of idiopathic intracranial hypertension met the modified Dandy criteria.²⁴

The fundus photographs were divided into 2 datasets representing a mix of consecutive and convenience samples. The training dataset, used to train the DLS to classify the severity of papilledema, was composed of all papilledema images previously included in the training cohort of ourﬁrst BONSAI study.²²The testing dataset was obtained by choosing randomly 222 images from 4 participating centers of the same study, independent from the training dataset, in order to test the performance of the DLS after training, and to test 3 independent neuro-ophthalmologists for comparison with the DLS.

Two experts (V.B., N.J.N.) independently reviewed all fundus photographs of the training (2,524 photographs) and testing (222 photographs) datasets, and classiﬁed papilledema severity according to a simple 2-grade classiﬁcation (see below).

Classiﬁcation by the experts was performed under standard conditions on a computer screen (LG-34WK650, 100%

brightness, 80% contrast) using semi-automated software developed by L.M. and R.P.N.²⁵; the severity scores were automatically included into an Excel spreadsheet. In the case of discordance between the 2 experts, the classiﬁcation was adjudicated by 2 additional neuro-ophthalmologists (D.M., C.V.), and a consensus was obtained for all images, used as reference standard. Only patients with active papilledema

Glossary

AUC = area under the receiver operating characteristic curve; BONSAI = Brain and Optic Nerve Study with Artiﬁcial Intelligence;CI= conﬁdence interval;DLS= deep learning system;GCC= ganglion cell complex;ICP= intracranial pressure;

IIH= idiopathic intracranial hypertension;OCT= optical coherence tomography.

(3)

were included, and optic discs with atrophic papilledema (defined as definite atrophy with no active swelling) were excluded from the study (394/2,524 [15.6%] photographs excluded in the training dataset and 8/222 [3.6%] in the testing dataset). Fundus photographs of insufficient quality were also excluded (27/2,524 [1.1%] in the training dataset, none in the testing dataset). A total of 2,103 and 214 fundus photographs were included in the training and testing datasets, respectively (figure 1).

Papilledema Severity Classification

We created a simple 2-grade papilledema severity classification (figure 2): (1) mild to moderate papilledema, corresponding to Frisén grades 1–3, defined as disc edema with no obscuration of major blood vessels (arteries and veins) on the disc; (2) severe papilledema, that is, Frisén grades 4 and 5, defined as disc edema associated with any obscuration of major blood vessels (arteries and/or veins) on the disc. The presence of hemorrhages and exudates did not influence the classification, and optic discs with atrophic papilledema were excluded from the study.

Study Population Training Dataset

A total of 2,103 fundus photographs of 965 patients, 1,052 mild to moderate papilledema and 1,051 severe papilledema,

were included in the training dataset, collected from 16 participating centers of BONSAI (table 2).²²A total of 685 patients had a photograph taken in both eyes and 146 in 1 eye.

Testing Dataset

The testing dataset included 214 photographs (92 MP, 122 SP) from 111 patients (103 patients with both eyes imaged and 8 with 1 eye), randomly collected from 4 participating centers (Bangkok, Thailand; Freiburg, Germany; Tehran, Iran; Angers, France) of the BONSAI study (table 2).²² Deep Learning System

Deep learning is a technique of machine learning, which consists of multiple layers of convolutional neural networks with the capability to learn image features and classify images without using hand-crafted features.²⁰ DLSs need to be trained on large datasets, and the classiﬁcation performance is subsequently evaluated on part of the training dataset (validation) or on an independent external dataset (testing dataset).²⁶

In our study, the DLS was composed of a segmentation network and a classiﬁcation network. First, the optic disc was automatically located on the fundus photograph by the segmentation network (U-Net²⁷) as previously described for the BONSAI-DLS.²²The segmentation network is based on U- Table 1 Cameras Used in the Participating Centers

City, country Camera brand Model Characteristics

Angers, France Topcon TRC-NW6S Nonmydriatic

Atlanta, USA Topcon 50DX Mydriatic

Baltimore, USA Zeiss FF4 Mydriatic

Bordeaux, France Zeiss VISUCAM Nonmydriatic

Bangkok, Thailand Kowa WX3D Nonmydriatic

Bologna, Italy Topcon DRI OCT Triton Nonmydriatic

Coimbra, Portugal Topcon TRC-NW7SF Mark II Mydriatic and nonmydriatic

Chennai, India Zeiss FF450 Plus IR Mydriatic

Freiburg, Germany Zeiss SF 420 Mydriatic

Geneva, Switzerland Zeiss FF450 Plus Mydriatic

Grenoble, France Topcon/Canon TRC NW6S/CR2 Nonmydriatic

Hong Kong, China Topcon TRC 50DX Mydriatic

London, UK Topcon/Canon TRC 50DX/CR2 Mydriatic and nonmydriatic

Manila, Philippines Zeiss/Meditec VISUCAM 500/NMFA Nonmydriatic

Paris, France Canon CRDGI Nonmydriatic

Sydney, Australia Zeiss VISUCAM 500 Nonmydriatic

Syracuse, USA Topcon/Zeiss TRC NW8/TRC NW400/FF 450 Mydriatic and nonmydriatic

Singapore, Singapore Topcon TRC 50DX Mydriatic

Tehran, Iran Canon CR2 Nonmydriatic

(4)

Net,²⁷which was widely used for biomedical image segmentation tasks. The U-Net was trained to localize the optic disc region based on a total of 6,370 fundus images with masks annotated in pixel level. The trained U-Net was then applied to test the full fundus image to generate the optic disc region automatically, which was further used as the input to the classiﬁcation network.

Then, the classification network classified the optic disc into 1 of 2 classes: mild to moderate or severe papilledema (classification network, VGGNet²⁸). The classification network is based on the convolutional neural networks. At the last

convolutional layer of the VGGNet, 2 dense layers were added with a SoftMax layer to obtain the 2-class outputs. The classification network was initialized using weights pretrained on ImageNet²⁹ andfine-tuned in an end-to-end manner to achieve the optimal performance. The classification network was trained on 2,103 fundus photographs (training dataset) to automatically classify papilledema’s severity into the 2 classes defined as the reference standard (mild to moderate and severe papilledema). The network weights were updated itera- tively based on the difference signals between the outputs of DLS and the clinical reference standard on the severity levels using backpropagation algorithm.³⁰The trained segmentation Figure 1Flowchart of Inclusion and Exclusion of Fundus Photographs

Process of inclusion and exclusion of fundus photographs in the training and testing datasets.

Figure 2Papilledema Severity Classification

This figure represents the 2-grade papilledema severity classification system used in our study, separating mild to moderate papilledema (Fris´en grade 1 to 3) and severe papilledema (Fris´en grade 4 and 5). Mild to moderate papilledema was defined as disc edema with no obscuration of major blood vessels (arteries and veins) on the disc and severe papilledema as disc edema with any obscuration of major blood vessels on the disc.

(5)

network and the classiﬁcation network were tested on another 214 images from the external testing dataset to get the prediction outcome. To report performance characteristics for this model, the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and speciﬁcity were calculated. Diagnosis was provided at eye level for each included image.

Testing of 3 Independent Neuro-

ophthalmologists for Comparison With the DLS In order to compare the performance of the DLS with the classiﬁcation performance of neuro-ophthalmologists, we tested 3 independent neuro-ophthalmologists (S.S., J.L.L., S.T.) who were asked to independently classify the papilledema severity into one of the 2 previously described classes, after a brief training session. The testing was performed on the same 214 images from the testing dataset used for the DLS.

For this purpose, images were presented to the 3 neuro- ophthalmologists on the same individual computer screen (LG-34WK650, 100% brightness, 80% contrast), using the semiautomated software described above.²⁵ The neuro- ophthalmologists were masked to patients’ clinical information and to the classification assigned by the other evaluators and by the DLS. For comparisons and statistical analysis purposes, we used a majority agreement grade, defined as the severity classification reported by at least 2 of the 3 participating neuro-ophthalmologists, for each image.

Statistical Analysis

The performance characteristics of the DLS and of the 3 neuro-ophthalmologists were evaluated by calculating the AUC, sensitivity, speciﬁcity, and accuracy in the external testing dataset.

A pairwise McNemar test with Bonferroni correction was used to compare the accuracies, sensitivities, and speciﬁcities between the DLS and each neuro-ophthalmologist and between the DLS and the majority agreement among the 3 neuro-ophthalmologists.

Cohen kappa agreement scores were used for comparison between the DLS and each of the 3 neuro-ophthalmologists and between the DLS and the majority agreement among the 3 neuro-ophthalmologists and Fleiss kappa score³¹was used for intergrader agreement analysis among the 3 neuro- ophthalmologists. Kappa agreement scores were interpreted according to previously published scale (0–0.20: no agreement, 0.21–0.39: minimal; 0.40–0.59: weak; 0.60–0.79:

moderate; 0.80–0.90: strong; >0.90: almost perfect).³² Apvalue less than 0.05 was considered statistically signiﬁcant.

Data Availability

Anonymized data will be shared by justiﬁed request from any qualiﬁed investigator.

Results

Patient and Image Characteristics

Theﬁnal training dataset included 965 patients (representing 1777 eyes and 2,103 images); 397 patients (731 eyes) had mild to moderate papilledema, 431 (772 eyes) had severe papilledema, and 137 (274 eyes) had mild to moderate papilledema in one eye and severe papilledema in the other eye. Among those 965 patients, 822 (85%) had papilledema due to idiopathic intracranial hypertension (IIH) and 143 (15%) presented with secondary causes of intracranial hypertension such as cerebral venous sinus thrombosis, meningitis, or brain tumor. Patient demographics and image characteristics are described in table 3.

The testing dataset included 111 patients (representing 214 eyes or images), of whom 40 (77 eyes or images) had mild to moderate papilledema, 56 (107 eyes or images) had severe Table 2 List of Participating Centers and Number of

Photographs Included per Center

Participating center Number of photographs included Primary training dataset

Angers, France 286

Atlanta, USA 1,157

Baltimore, USA 136

Bologna, Italy 11

Bordeaux, France 26

Chennai, India 183

Coimbra, Portugal 17

Freiburg, Germany 10

Geneva, Switzerland 12

Grenoble, France 11

Hong Kong, China 4

London, UK 35

Manila, Philippines 18

Paris, France 86

Singapore 18

Sydney, Australia 65

Syracuse, USA 28

Total training dataset 2,103 External testing dataset

Angers, France 10

Bangkok, Thailand 86

Freiburg, Germany 35

Tehran, Iran 83

Total testing dataset 214

(6)

papilledema, and 15 (30 eyes or images) had mild to moderate papilledema in one eye and severe papilledema in the other eye. Seventy-ﬁve patients (68%) had papilledema from IIH and the remaining from secondary causes.

Papilledema Severity Classification by the DLS and Neuro-ophthalmologists

In the testing dataset, the DLS successfully discriminated severe from mild to moderate papilledema with an AUC of 0.93 (95%

conﬁdence interval [CI] 0.89–0.96), an accuracy of 87.9% (95%

CI 82.7%–91.9%), a sensitivity of 91.8% (95% CI 86.9%–96.7%), and a specificity of 82.6% (95% CI 74.9%–90.4%) (figures 3 and 4). Among the 26 images misclassified by the DLS, 16 cases of mild to moderate papilledema were misclassified as severe; 14 of these 16 (87.5%) were photographs of papilledema Frisén grade 3, and 2 of them Frisén grade 2; 10 images of severe papilledema were misclassified as mild to moderate, all Frisén grade 4 with minimal vessel obscuration on the disc or with hemorrhages and cotton-wool spots on the disc (eFigures 1 and 2, data available from Dryad, doi.org/10.5061/dryad.66t1g1k1x). The ability of the 3 independent neuro-ophthalmologists to discriminate severe from mild to moderate papilledema was comparable to the DLS, with an accuracy of the majority agreement among neuro- ophthalmologists of 84.1% (95% CI 78.5%–88.7%,p= 0.19), a sensitivity of 91.8% (95% CI 86.9%–96.7%,p= 1), and a specificity of 73.9% (95% CI 64.9%–82.9%,p= 0.09) (figures 3 and 4).

Agreement scores between the DLS and neuro- ophthalmologist 1, 2, and 3 were 0.72 (95% CI 0.67–0.76), 0.43 (95% CI 0.37–0.49), and 0.60 (95% CI 0.55–0.65), respectively, and between the DLS and the majority agreement of neuro-ophthalmologists, 0.62 (95% CI 0.57–0.68). A weak intergrader agreement of 0.54 (95% CI 0.47–0.62) was found among the 3 neuro-ophthalmologists. Disagreement among neuro-ophthalmologists was observed for 67 photographs, among them 46 (68.6%) photographs of moderate papilledema with a Fris´en grade of 3, the other 21 photographs of severe papilledema with a Fris´en grade of 4.

Discussion

In this study, a DLS trained on 2,103 ocular fundus photographs to classify the severity of papilledema from intracranial hypertension discriminated between mild to moderate and

severe papilledema on an independent testing dataset of 214 fundus photographs, with a comparable performance to that of 3 neuro-ophthalmologists.

Papilledema^33,34 is the only objective clinical sign of intracranial hypertension.^35,36 Because the degree of papilledema at presentation is a reliable indicator of subsequent visual outcomes from secondary optic atrophy,^3-9the severity of papilledema influences the management and the frequency of visual monitoring during follow-up of patients with elevated ICP.^37,38Our simple binary classification of papilledema severity aimed at identification of patients with lower risk (mild to moderate papilledema) or higher risk (severe Table 3Demographics and Image Characteristics in the Training and Testing Datasets

Characteristics Training dataset Testing dataset

Number of patients (images) 965 (2,103) 111 (214)

Age, y, mean (SD) 31.7 (12.8) 32.0 (15.4)

Female, n (%) 715 (74.9) 77 (69.4)

Images with mild to moderate papilledema, n (%) 1,052 (50.0) 92 (43.0)

Images with severe papilledema, n (%) 1,051 (50.0) 122 (57.0)

Figure 3Performance (Area Under the Receiver Operating Characteristic [ROC] Curve) of the Deep Learning System (DLS) and the 3 Neuro-ophthalmologists in Classifying Papilledema Severity

ROC curve representing the performance of the DLS, 3 neuro-ophthalmology graders, and the majority agreement among the 3 neuro-ophthalmologists (obtained when at least 2 graders agreed on the grade) in discriminating severe papilledema (SP) from mild to moderate papilledema (MP) on 214 fundus photographs (92 MP and 122 SP). The optimal performance of the DLS (blue dot) was calculated with the Youden index for best cutoff between sensitivity and specificity.

(7)

papilledema) of visual loss.^5,6,8,9Hence, a patient with mild to moderate papilledema from IIH might be managed with weight loss and oral medications as an outpatient, whereas a patient with severe papilledema should have closer follow-up, sometimes inpatient, and beneﬁt from timely surgical in- tervention, especially in those cases with visual loss.³⁸A DLS, capable of automatically classifying the severity of papilledema on fundus photographs, could assist nonexperts with a more accurate prognosis, treatment strategy, and eﬀect of treatment, as a diagnostic or prognostic tool along with the clinical examination and additional tests such as computer-assisted perimetry or optical coherence tomography (OCT).

It could be argued that ophthalmology consultation would obviate the need for automated fundus photographic interpretation by a DLS. However, despite the simplification of the papilledema severity classification in our study, the intergrader agreement among the 3 neuro-ophthalmologists was relatively weak (0.54), confirming the variability of human evaluation of papilledema severity, particularly for moderate papilledema (Frisén grade 3).¹³Nevertheless, our DLS could accurately distinguish between mild to moderate and severe papilledema on fundus photographs. The performance of the DLS was similar to that of 3 independent neuro- ophthalmologists with a comparable accuracy and sensitivity, and with a nonsignificantly higher specificity (82.6% for the DLS vs 73.9% for the majority agreement among the 3 neuro- ophthalmologists,p= 0.09). The agreement scores between the DLS and the majority agreement among the 3 neuro- ophthalmologists (κ= 0.62) was higher than the intergrader agreement among the 3 neuro-ophthalmologists (κ= 0.54).

Moderate intergrader agreement scores were also observed in

studies involving glaucoma or retinal diseases, for example, for glaucomatous damage assessment of the optic disc under stereoscopic conditions by 6 glaucoma experts³⁹(κ= 0.50), or for plus disease retinopathy of prematurity diagnosis by 9 experts⁴⁰(κ= 0.59 to 0.92). Similar results were previously described with a machine learning technique in a small study,⁴¹which showed that a computer-aided image analyses, used to analyze features of papilledema on fundus photographs, could automatically grade papilledema with a sub- stantial agreement with one expert grading (κ= 0.71).

Our results might have implications for the management of raised ICP and papilledema in the future. However, further prospective validation studies are needed, at best in non- ophthalmologic clinical settings (i.e., neurology clinics, neurosurgery clinics, or emergency departments), and ideally using nonmydriatic digital cameras.^15,42If those studies con- ﬁrm its applicability, a DLS, connected to a camera on site²¹or remotely, could be used for the assessment of papilledema severity.

Our study has inherent limitations. As it was a retrospective data collection, visual function was not available and objective data such as OCT retinal nervefiber thickness or macular ganglion cell complex (GCC) analysis were not systematically collected. Moreover, atrophic papilledema was excluded from this study, as we only trained the DLS to grade the severity of papilledema, not to identify associated atrophy, a difficult assessment even for the most experienced ophthalmologists.¹³Hence, some of the patients with longstanding raised ICP who had both atrophy and residual papilledema atfirst presentation were excluded. In a future project, GCC-OCT, Figure 4Performance (Accuracy, Sensitivity, Specificity) of the Deep Learning System (DLS) and the 3 Neuro-ophthalmologists

in Classifying Papilledema Severity

Comparison of the performance (accuracy, sensitivity, specificity) of the DLS to (A) the detailed performance of each grader and (B) the performance of the majority agreement among the 3 graders in discriminating severe papilledema (SP) from mild to moderate papilledema (MP) on 214 fundus photographs (92 MP and 122 SP). Statistical significance of DLS vs human graders *p< 0.05; **p< 0.01; ***p< 0.001. Data are represented as mean (95% confidence interval [CI]). 95% CIs were calculated using the asymptotic method.

(8)

which has proven useful in the detection of optic atrophy associated with papilledema,^43,44could also be incorporated into the DLS strategy, as already done in some glaucoma studies.⁴⁵We used a simplified classification of papilledema into 2 grades, instead of the 5 grades used in the Frisén scale.

The Frisén scale is notoriously difficult to use, especially when attempting to differentiate grades 3 and 4.^13,14The few cases of misclassification by the DLS were mainly observed for moderate papilledema (Frisén grade 3) or for Frisén grade 4 papilledema with mild vessel obscuration on the optic disc or associated with hemorrhages or cotton-wool spots. Those images were challenging to classify by the experts as well. The lack of reproducibility and the inability of the Frisén scale to discern optic disc changes over time was demonstrated by Sinclair et al.,¹³who proposed an alterna- tive ranking of optic disc appearance related to papilledema severity that incorporates the development of secondary optic atrophy, with an improvement of complete agreement among reviewers from 1.6% of photographs with Frisén scale to 44.6% with their optic disc ranking scheme. In the prospective Intracranial Hypertension Treatment Trial, experts initially agreed on Frisén scale classification for only 42% of images, and intragrader agreement rates varied from 55% to 73%.¹⁴Our simplified binary classification was designed to signal the presence of severe papilledema, a finding that should influence the acute management of these patients by nonophthalmologists.

We developed, trained, and tested a DLS that accurately discriminated mild to moderate from severe papilledema on mydriatic fundus photographs. In a subsequent comparison, the DLS had a performance comparable to 3 independent neuro-ophthalmologists. The automated recognition of severe papilledema by a DLS could be helpful in neurology, neurosurgery, and emergency settings for the management of patients with raised ICP. Additional prospective studies are needed to conﬁrm the applicability of this DLS in real-life clinical settings.

Study Funding

Singapore National Medical Research Council (Clinician Scientist Individual Research grant CIRG18Nov-0013), The Duke-NUS Medical School, Ophthalmology and Visual Sci- ences Academic Clinical Program grant (05/FY2019/P2/06- A60) departmental grant, NIH/NEI core grant P30-EY06360 (Department of Ophthalmology, Emory University School of Medicine, Atlanta, GA), NIH/NINDS (RO1NSO89694).

Disclosure

C. Vasseneix, R.P. Najjar, X. Xu, Z. Tang, J.L. Loo, S. Singhal, S. Tow, L. Milea, D. Ting, Y. Liu, and T.Y. Wong report no disclosures relevant to the manuscript. N.J. Newman is consultant for GenSight Biologics and Neurophoenix, Santhera Pharmaceuticals/Chiesi, and Stealth BioTherapeutics. V.

Biousse is consultant for GenSight Biologics and Neuro- phoenix. D. Milea reports no disclosures relevant to the manuscript. Go to Neurology.org/N for full disclosures.

Publication History

Received by Neurology November 9, 2020. Accepted in ﬁnal form April 19, 2021.

Appendix 1Authors

Name Location Contribution

Caroline Vasseneix, MD

Singapore Eye Research Institute

Designed and conceptualized study, major role in the acquisition of data, analyzed and interpreted the data, drafted the manuscript for intellectual content Raymond P.

Najjar, PhD

Singapore Eye Research Institute; Duke-NUS Medical School, Singapore

Designed and conceptualized study, analyzed and interpreted the data, revised the manuscript for intellectual content

Xinxing Xu, PhD

Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore

Major role in the acquisition of data, interpreted the data, revised the manuscript for intellectual content

Zhiqun Tan, PhD

Singapore Eye Research Institute

Interpreted the data

Jing Liang Loo, MBBS, MMed, FRCS(Ed)

Singapore National Eye Centre

Major role in the acquisition of data

Shweta Singhal, MBBS, PhD

Singapore National Eye Centre; Singapore Eye Research Institute

Sharon Tow, MBBS, FRCS(Ed)

Singapore National Eye Centre; Duke-NUS Medical School

Leonard Milea UC Berkeley, CA Design and conceptualized study, major role in the acquisition of data Daniel Shu

Wei Ting, MD, PhD

Singapore National Eye Centre; Singapore Eye Research Institute

Design and conceptualized study

Yong Liu, PhD Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore

Major role in the acquisition of data, interpreted the data, revised the manuscript for intellectual content

Tien Y. Wong, MD, PhD

Singapore National Eye Centre; Singapore Eye Research Institute; Duke- NUS Medical School

Revised the manuscript for intellectual content

Nancy J.

Newman, MD

Ophthalmology and Neurology Department, Emory University School of Medicine, Atlanta, GA

Designed and conceptualized study, major role in the acquisition of data, interpreted the data, revised the manuscript for intellectual content Valerie

Biousse, MD

Ophthalmology and Neurology Department, Emory University School of Medicine, Atlanta, GA

Designed and conceptualized study, major role in the acquisition of data, interpreted the data, revised the manuscript for intellectual content

(9)

References

1. Friedman DI. The pseudotumor cerebri syndrome.Neurol Clin.2014;32(2):363-396.

2. Corbett JJ. Visual loss in pseudotumor cerebri: follow-up of 57 patients fromﬁve to 41 years and a proﬁle of 14 patients with permanent severe visual loss.Arch Neurol.1982;

39(8):461-474.

3. Orcutt JC, Page NGR, Sanders MD. Factors aﬀecting visual loss in benign intracranial hypertension.Ophthalmology.1984;91(11):1303-1312.

4. Liu KC, Bhatti MT, Chen JJ, et al. Presentation and progression of papilledema in cerebral venous sinus thrombosis.Am J Ophthalmol.2020;213:1-8.

5. Chen JJ, Thurtell MJ, Longmuir RA, et al. Causes and prognosis of visual acuity loss at the time of initial presentation in idiopathic intracranial hypertension.Invest Oph- thalmol Vis Sci.2015;56(6):3850-3859.

6. Gospe SM, Bhatti MT, El-Dairi MA. Anatomic and visual function outcomes in paediatric idiopathic intracranial hypertension. Br J Ophthalmol. 2016;100(4):

505-509.

7. Takkar A, Goyal MK, Bansal R, Lal V. Clinical and neuro-ophthalmologic predictors of visual outcome in idiopathic intracranial hypertension.Neuroophthalmology.2018;

42(4):201-208.

8. Wall M, Falardeau J, Fletcher WA, et al. Risk factors for poor visual outcome in patients with idiopathic intracranial hypertension.Neurology.2015;85(9):799-805.

9. Micieli JA, Bruce BB, Vasseneix C, et al. Optic nerve appearance as a predictor of visual outcome in patients with idiopathic intracranial hypertension.Br J Ophthalmol.

2019;103(10):1429-1435.

10. Scott CJ. Diagnosis and grading of papilledema in patients with raised intracranial pressure using optical coherence tomography vs clinical expert assessment using a clinical staging scale.Arch Ophthalmol.2010;128(6):705-711.

11. Frisen L. Swelling of the optic nerve head: a staging scheme.J Neurol Neurosurg Psychiatry.1982;45(1):13-18.

12. Kardon R. Optical coherence tomography in papilledema: what am I missing?

J Neuroophthalmol.2014;34(suppl S10-7):S10-S17.

13. Sinclair AJ, Burdon MA, Nightingale PG, et al. Rating papilloedema: an evaluation of the Fris´en classiﬁcation in idiopathic intracranial hypertension.J Neurol.2012;259(7):

1406-1412.

14. Fischer WS, Wall M, McDermott MP, Kupersmith MJ, Feldon SE. Photographic reading center of the idiopathic intracranial hypertension treatment trial (IIHTT):

methods and baseline results.Invest Ophthalmol Vis Sci.2015;56(5):3292-3303.

15. Bruce BB, Lamirel C, Wright DW, et al. Nonmydriatic ocular fundus photography in the emergency department.N Engl J Med.2011;364(4):387-389.

16. Friesner D, Rosenman R, Lobb BM, Tanne E. Idiopathic intracranial hypertension in the USA: the role of obesity in establishing prevalence and healthcare costs.Obes Rev.

2011;12(5):e372-e380.

17. Mollan SP, Aguiar M, Evison F, Frew E, Sinclair AJ. The expanding burden of idiopathic intracranial hypertension.Eye.2019;33(3):478-485.

18. Bruce BB, Thulasi P, Fraser CL, et al. Diagnostic accuracy and use of nonmydriatic ocular fundus photography by emergency physicians: phase II of the FOTO-ED study.Ann Emerg Med.2013;62(1):28-33.

19. Ivan Y, Ramgopal S, Cardenas-Villa M, et al. Feasibility of the digital retinography system camera in the pediatric emergency department.Pediatr Emerg Care.2018;

34(7):488-491.

20. Ting DSW, Pasquale LR, Peng L, et al. Artiﬁcial intelligence and deep learning in ophthalmology.Br J Ophthalmol.2019;103(2):167-175.

21. AbràmoffMD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI- based diagnostic system for detection of diabetic retinopathy in primary care offices.

NPJ Digit Med.2018;1:39.

22. Milea D, Najjar RP, Zhubo J, et al. Artiﬁcial intelligence to detect papilledema from ocular fundus photographs.N Engl J Med.2020;382(18):1687-1695.

23. Biousse V, Newman NJ, Najjar RP, et al. Optic disc classiﬁcation by deep learning versus expert neuro‐ophthalmologists.Ann Neurol.2020;88(4):785-795.

24. Friedman DI, Liu GT, Digre KB. Revised diagnostic criteria for the pseudotumor cerebri syndrome in adults and children.Neurology.2013;81(13):1159-1165.

25. Milea L, Najjar RP.Classif-Eye: A Semi-automated Image Classiﬁcation Application;

2020. GitHub repository. github.com/milealeonard/Classif-Eye/

26. Milea D, Singhal S, Najjar RP. Artiﬁcial intelligence for detection of optic disc ab- normalities.Curr Opin Neurol.2020;33(1):106-110.

27. Falk T, Mai D, Bensch R, et al. U-Net: deep learning for cell counting, detection, and morphometry.Nat Methods.2019;16(1):67-70.

28. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. Presented at the International Conference on Learning Representations (ICLR); April 10, 2015; San Diego, CA.

29. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: a large-scale hierarchical image database. Presented at the Conference on Computer Vision and Pattern Recognition (CVPR); June 20–25, 2009; Miami, FL.

30. LeCun Y, Bengio Y, Hinton G. Deep learning.Nature.2015;521(7553):436-444.

31. Vanbelle S. Asymptotic variability of (multilevel) multirater kappa coeﬃcients.Stat Methods Med Res.2019;28(10-11):3012-3026.

32. McHugh ML. Interrater reliability: the kappa statistic.Biochem Med.2012;22(3):

276-282.

33. Friedman DI, Quiros PA, Subramanian PS, et al. Headache in idiopathic intracranial hypertension:ﬁndings from the Idiopathic Intracranial Hypertension Treatment Trial.Headache.2017;57(8):1195-1205.

34. Fisayo A, Bruce BB, Newman NJ, Biousse V. Overdiagnosis of idiopathic intracranial hypertension.Neurology.2016;86(4):341-350.

35. Radojicic A, Vukovic-Cvetkovic V, Pekmezovic T, Trajkovic G, Zidverc-Trajkovic J, Jensen RH. Predictive role of presenting symptoms and clinicalﬁndings in idiopathic intracranial hypertension.J Neurol Sci.2019;399:89-93.

36. Crum OM, Kilgore KP, Sharma R, et al. Etiology of papilledema in patients in the eye clinic setting.JAMA Netw Open.2020;3(6):e206625.

37. Onyia CU, Ogunbameru IO, Dada OA, et al. Idiopathic intracranial hypertension:

proposal of a stratiﬁcation strategy for monitoring risk of disease progression.Clin Neurol Neurosurg.2019;179:35-41.

38. Mollan SP, Davies B, Silver NC, et al. Idiopathic intracranial hypertension: consensus guidelines on management.J Neurol Neurosurg Psychiatry.2018;89(10):1088-1100.

39. Varma R, Steinmann WC, Scott IU. Expert agreement in evaluating the optic disc for glaucoma.Ophthalmology.1992;99(2):215-221.

40. Brown JM, Campbell JP, Beers A, et al; Imaging and Informatics in Retinopathy of Prematurity (i-ROP) Research Consortium. Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks.JAMA Oph- thalmol.2018;136(7):803-810.

41. Echegaray S, Zamora G, Yu H, Luo W, Soliz P, Kardon R. Automated analysis of optic nerve images for detection and staging of papilledema.Invest Ophthalmol Vis Sci.2011;

52(10):7470-7478.

42. Irani NK, Bidot S, Peragallo JH, Esper GJ, Newman NJ, Biousse V. Feasibility of a nonmydriatic ocular fundus camera in an outpatient neurology clinic.Neurologist.

2020;25(2):19-23.

43. Moreno-Ajona D, McHugh JA, Hoﬀmann J. An update on imaging in idiopathic intracranial hypertension.Front Neurol.2020;11:453.

44. Athappilly G, Garc´ıa-Basterra I, Machado-Miller F, Hedges TR, Mendoza-Santies- teban C, Vuong L. Ganglion cell complex analysis as a potential indicator of early neuronal loss in idiopathic intracranial hypertension.Neuroophthalmology.2018;

43(1):10-17.

45. Medeiros FA, Jammal AA, Thompson AC. From machine to machine.Ophthalmology.

2019;126(4):513-521.

Appendix 1 (continued)

Name Location Contribution

Dan Milea, MD, PhD

Singapore National Eye Centre; Singapore Eye Research Institute; Duke- NUS Medical School, Singapore; Copenhagen University Hospital, Denmark

Designed and conceptualized study, major role in the acquisition of data, interpreted the data, revised the manuscript for intellectual content

Appendix 2Coinvestigators

Coinvestigators are listed at links.lww.com/WNL/B432.