RESEARCH ARTICLE
Accuracy of a Deep Learning System for
Classi fi cation of Papilledema Severity on Ocular Fundus Photographs
Caroline Vasseneix, MD, Raymond P. Najjar, PhD, Xinxing Xu, PhD, Zhiqun Tang, PhD,
Jing Liang Loo, MBBS, MMed, FRCS(Ed), Shweta Singhal, MBBS, Sharon Tow, MBBS, Leonard Milea, Daniel Shu Wei Ting, MD, PhD, Yong Liu, PhD, Tien Y. Wong, MD, PhD, Nancy J. Newman, MD, Valerie Biousse, MD, and Dan Milea, MD, PhD, on behalf of the BONSAI Group
Neurology
®
2021;97:e369-e377. doi:10.1212/WNL.0000000000012226Correspondence Dr. D. Milea
Abstract
Objective
To evaluate the performance of a deep learning system (DLS) in classifying the severity of papilledema associated with increased intracranial pressure on standard retinal fundus photographs.
Methods
A DLS was trained to automatically classify papilledema severity in 965 patients (2,103 myd- riatic fundus photographs), representing a multiethnic cohort of patients with confirmed ele- vated intracranial pressure. Training was performed on 1,052 photographs with mild/moderate papilledema (MP) and 1,051 photographs with severe papilledema (SP) classified by a panel of experts. The performance of the DLS and that of 3 independent neuro-ophthalmologists were tested in 111 patients (214 photographs, 92 with MP and 122 with SP) by calculating the area under the receiver operating characteristics curve (AUC), accuracy, sensitivity, and specificity.
Kappa agreement scores between the DLS and each of the 3 graders and among the 3 graders were calculated.
Results
The DLS successfully discriminated between photographs of MP and SP, with an AUC of 0.93 (95% confidence interval [CI] 0.89–0.96) and an accuracy, sensitivity, and specificity of 87.9%, 91.8%, and 86.2%, respectively. This performance was comparable with that of the 3 neuro- ophthalmologists (84.1%, 91.8%, and 73.9%,p = 0.19, p = 1, p= 0.09, respectively). Mis- classification by the DLS was mainly observed for moderate papilledema (Fris´en grade 3).
Agreement scores between the DLS and the neuro-ophthalmologists’evaluation was 0.62 (95%
CI 0.57–0.68), whereas the intergrader agreement among the 3 neuro-ophthalmologists was 0.54 (95% CI 0.47–0.62).
Conclusions
Our DLS accurately classified the severity of papilledema on an independent set of mydriatic fundus photographs, achieving a comparable performance with that of independent neuro- ophthalmologists.
Classification of Evidence
This study provides Class II evidence that a DLS using mydriatic retinal fundus photographs accurately classified the severity of papilledema associated in patients with a diagnosis of increased intracranial pressure.
MORE ONLINE
Class of Evidence Criteria for rating therapeutic and diagnostic studies
NPub.org/coe
From the Singapore Eye Research Institute (C.V., R.P.N., Z.T., J.L.L., S.S., S.T., D.S.W.T., T.Y.W., D.M.); Duke–NUS Medical School (R.P.N., J.L.L., S.S., S.T., T.Y.W., D.M.); Institute of High Performance Computing (X.X., Y.L.), Agency for Science, Technology and Research (A*STAR); Singapore National Eye Centre (J.L.L., S.S., S.T., D.S.W.T., T.Y.W., D.M.); University of Berkeley (L.M.), CA; Departments of Ophthalmology and Neurology (N.J.N., V.B.), Emory University School of Medicine, Atlanta, GA; and Copenhagen University Hospital (D.M.), Denmark.
Go to Neurology.org/N for full disclosures. Funding information and disclosures deemed relevant by the authors, if any, are provided at the end of the article.
BONSAI Group coinvestigators are listed at http://links.lww.com/WNL/B432.
Papilledema, defined as optic nerve head swelling associated with any cause of intracranial hypertension, can result in permanent vision loss.1,2Papilledema severity at presentation is the most important prognostic factor for subsequent visual outcomes.3-9 Patients with severe papilledema may have progressive vision loss and visual field constriction due to retinal nervefiber loss, thus requiring closer monitoring and more invasive treatment, whereas those with mild papil- ledema and no optic atrophy usually have good visual outcomes.5,8,9 However, the evaluation of papilledema se- verity, based on the 5-grade modified Fris´en scale classifica- tion mainly used in clinical trials (with 1 being very mild papilledema and 5 very severe papilledema),10is difficult to apply and subject to high variability.10-14Hence, neurologists, especially those not confident in performing ophthalmos- copy,15 usually rely on ophthalmologists to determine the severity of papilledema.16,17Fundus photography is now in- creasingly used in various clinical settings for screening purposes,18,19 and may be augmented with artificial in- telligence deep learning techniques for automated image interpretation.20,21Recently, the Brain and Optic Nerve Study with Artificial Intelligence (BONSAI) deep learning system (DLS)22 was shown to accurately discriminate papilledema from normal and other abnormal optic discs on fundus photographs, with a performance comparable to that of expert neuro-ophthalmologists.23
The aim of the current study was to develop, train, and test a new DLS to automatically classify the severity of papilledema and to compare the performance of this DLS with the clas- sification performance of 3 neuro-ophthalmologists.
Methods
This study, performed by the BONSAI Consortium,22 in- cluded investigators and patients from 14 countries.
Our primary research questions, aiming to provide a Class II level of evidence, were the following: (1) Is a DLS capable of discriminating mild to moderate from severe papilledema on mydriatic fundus photographs? (2) Is the DLS’s performance in classifying the severity of papilledema on fundus photo- graphs comparable to that of neuro-ophthalmologists?
Standard Protocol Approvals, Registrations, and Patient Consents
The study was approved by the Centralized Institutional Review Board of SingHealth, Singapore, and each contribut- ing institution for any experiments using human subjects, and
was conducted in accordance with the Declaration of Hel- sinki. Informed consent was exempted given the retrospective nature of the study and the use of de-identified ocular fundus photographs.
Inclusion/Exclusion Criteria
The study included unaltered, de-identified digital ocular fundus photographs obtained in patients with confirmed intracranial hypertension and papilledema from 19 neuro- ophthalmology centers participating in the BONSAI con- sortium.22The fundus photographs, taken at variousfields of view (20°–45°) including the optic disc, were obtained after pupillary dilation, using 15 different cameras, mydriatic or nonmydriatic, depending on the center (table 1).
Experts from each participating center provided photographs of patients with confirmed papilledema (criteria previously pub- lished22). Intracranial hypertension was confirmed in every pa- tient by brain imaging (e.g., showing an intracranial mass or venous sinus thrombosis) or elevated CSF opening pressure and follow-up visits. Papilledema was diagnosed only if the optic disc swelling was related to confirmed raised intracranial pressure (ICP). Patients with a diagnosis of idiopathic intracranial hy- pertension met the modified Dandy criteria.24
The fundus photographs were divided into 2 datasets repre- senting a mix of consecutive and convenience samples. The training dataset, used to train the DLS to classify the severity of papilledema, was composed of all papilledema images previously included in the training cohort of ourfirst BONSAI study.22The testing dataset was obtained by choosing randomly 222 images from 4 participating centers of the same study, independent from the training dataset, in order to test the performance of the DLS after training, and to test 3 independent neuro-ophthalmologists for comparison with the DLS.
Two experts (V.B., N.J.N.) independently reviewed all fundus photographs of the training (2,524 photographs) and testing (222 photographs) datasets, and classified papilledema se- verity according to a simple 2-grade classification (see below).
Classification by the experts was performed under standard conditions on a computer screen (LG-34WK650, 100%
brightness, 80% contrast) using semi-automated software developed by L.M. and R.P.N.25; the severity scores were automatically included into an Excel spreadsheet. In the case of discordance between the 2 experts, the classification was adjudicated by 2 additional neuro-ophthalmologists (D.M., C.V.), and a consensus was obtained for all images, used as reference standard. Only patients with active papilledema
Glossary
AUC = area under the receiver operating characteristic curve; BONSAI = Brain and Optic Nerve Study with Artificial Intelligence;CI= confidence interval;DLS= deep learning system;GCC= ganglion cell complex;ICP= intracranial pressure;
IIH= idiopathic intracranial hypertension;OCT= optical coherence tomography.
were included, and optic discs with atrophic papilledema (defined as definite atrophy with no active swelling) were excluded from the study (394/2,524 [15.6%] photographs excluded in the training dataset and 8/222 [3.6%] in the testing dataset). Fundus photographs of insufficient quality were also excluded (27/2,524 [1.1%] in the training dataset, none in the testing dataset). A total of 2,103 and 214 fundus photographs were included in the training and testing data- sets, respectively (figure 1).
Papilledema Severity Classification
We created a simple 2-grade papilledema severity classification (figure 2): (1) mild to moderate papilledema, corresponding to Fris´en grades 1–3, defined as disc edema with no obscuration of major blood vessels (arteries and veins) on the disc; (2) severe papilledema, that is, Fris´en grades 4 and 5, defined as disc edema associated with any obscuration of major blood vessels (arteries and/or veins) on the disc. The presence of hemorrhages and exudates did not influence the classification, and optic discs with atrophic papilledema were excluded from the study.
Study Population Training Dataset
A total of 2,103 fundus photographs of 965 patients, 1,052 mild to moderate papilledema and 1,051 severe papilledema,
were included in the training dataset, collected from 16 par- ticipating centers of BONSAI (table 2).22A total of 685 pa- tients had a photograph taken in both eyes and 146 in 1 eye.
Testing Dataset
The testing dataset included 214 photographs (92 MP, 122 SP) from 111 patients (103 patients with both eyes imaged and 8 with 1 eye), randomly collected from 4 participating centers (Bangkok, Thailand; Freiburg, Germany; Tehran, Iran; Angers, France) of the BONSAI study (table 2).22 Deep Learning System
Deep learning is a technique of machine learning, which consists of multiple layers of convolutional neural networks with the capability to learn image features and classify images without using hand-crafted features.20 DLSs need to be trained on large datasets, and the classification performance is subsequently evaluated on part of the training dataset (vali- dation) or on an independent external dataset (testing dataset).26
In our study, the DLS was composed of a segmentation network and a classification network. First, the optic disc was automatically located on the fundus photograph by the seg- mentation network (U-Net27) as previously described for the BONSAI-DLS.22The segmentation network is based on U- Table 1 Cameras Used in the Participating Centers
City, country Camera brand Model Characteristics
Angers, France Topcon TRC-NW6S Nonmydriatic
Atlanta, USA Topcon 50DX Mydriatic
Baltimore, USA Zeiss FF4 Mydriatic
Bordeaux, France Zeiss VISUCAM Nonmydriatic
Bangkok, Thailand Kowa WX3D Nonmydriatic
Bologna, Italy Topcon DRI OCT Triton Nonmydriatic
Coimbra, Portugal Topcon TRC-NW7SF Mark II Mydriatic and nonmydriatic
Chennai, India Zeiss FF450 Plus IR Mydriatic
Freiburg, Germany Zeiss SF 420 Mydriatic
Geneva, Switzerland Zeiss FF450 Plus Mydriatic
Grenoble, France Topcon/Canon TRC NW6S/CR2 Nonmydriatic
Hong Kong, China Topcon TRC 50DX Mydriatic
London, UK Topcon/Canon TRC 50DX/CR2 Mydriatic and nonmydriatic
Manila, Philippines Zeiss/Meditec VISUCAM 500/NMFA Nonmydriatic
Paris, France Canon CRDGI Nonmydriatic
Sydney, Australia Zeiss VISUCAM 500 Nonmydriatic
Syracuse, USA Topcon/Zeiss TRC NW8/TRC NW400/FF 450 Mydriatic and nonmydriatic
Singapore, Singapore Topcon TRC 50DX Mydriatic
Tehran, Iran Canon CR2 Nonmydriatic
Net,27which was widely used for biomedical image segmen- tation tasks. The U-Net was trained to localize the optic disc region based on a total of 6,370 fundus images with masks annotated in pixel level. The trained U-Net was then applied to test the full fundus image to generate the optic disc region automatically, which was further used as the input to the classification network.
Then, the classification network classified the optic disc into 1 of 2 classes: mild to moderate or severe papilledema (classi- fication network, VGGNet28). The classification network is based on the convolutional neural networks. At the last
convolutional layer of the VGGNet, 2 dense layers were added with a SoftMax layer to obtain the 2-class outputs. The classification network was initialized using weights pretrained on ImageNet29 andfine-tuned in an end-to-end manner to achieve the optimal performance. The classification network was trained on 2,103 fundus photographs (training dataset) to automatically classify papilledema’s severity into the 2 classes defined as the reference standard (mild to moderate and se- vere papilledema). The network weights were updated itera- tively based on the difference signals between the outputs of DLS and the clinical reference standard on the severity levels using backpropagation algorithm.30The trained segmentation Figure 1Flowchart of Inclusion and Exclusion of Fundus Photographs
Process of inclusion and exclusion of fundus photographs in the training and testing datasets.
Figure 2Papilledema Severity Classification
This figure represents the 2-grade papilledema severity classification system used in our study, separating mild to moderate papilledema (Fris´en grade 1 to 3) and severe papilledema (Fris´en grade 4 and 5). Mild to moderate pap- illedema was defined as disc edema with no obscuration of major blood vessels (arteries and veins) on the disc and se- vere papilledema as disc edema with any obscuration of major blood vessels on the disc.
network and the classification network were tested on another 214 images from the external testing dataset to get the prediction outcome. To report performance characteristics for this model, the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity were calculated. Diagnosis was provided at eye level for each included image.
Testing of 3 Independent Neuro-
ophthalmologists for Comparison With the DLS In order to compare the performance of the DLS with the classification performance of neuro-ophthalmologists, we tested 3 independent neuro-ophthalmologists (S.S., J.L.L., S.T.) who were asked to independently classify the papil- ledema severity into one of the 2 previously described classes, after a brief training session. The testing was performed on the same 214 images from the testing dataset used for the DLS.
For this purpose, images were presented to the 3 neuro- ophthalmologists on the same individual computer screen (LG-34WK650, 100% brightness, 80% contrast), using the semiautomated software described above.25 The neuro- ophthalmologists were masked to patients’ clinical in- formation and to the classification assigned by the other evaluators and by the DLS. For comparisons and statistical analysis purposes, we used a majority agreement grade, de- fined as the severity classification reported by at least 2 of the 3 participating neuro-ophthalmologists, for each image.
Statistical Analysis
The performance characteristics of the DLS and of the 3 neuro-ophthalmologists were evaluated by calculating the AUC, sensitivity, specificity, and accuracy in the external testing dataset.
A pairwise McNemar test with Bonferroni correction was used to compare the accuracies, sensitivities, and specificities between the DLS and each neuro-ophthalmologist and be- tween the DLS and the majority agreement among the 3 neuro-ophthalmologists.
Cohen kappa agreement scores were used for comparison between the DLS and each of the 3 neuro-ophthalmologists and between the DLS and the majority agreement among the 3 neuro-ophthalmologists and Fleiss kappa score31was used for intergrader agreement analysis among the 3 neuro- ophthalmologists. Kappa agreement scores were interpreted according to previously published scale (0–0.20: no agree- ment, 0.21–0.39: minimal; 0.40–0.59: weak; 0.60–0.79:
moderate; 0.80–0.90: strong; >0.90: almost perfect).32 Apvalue less than 0.05 was considered statistically significant.
Data Availability
Anonymized data will be shared by justified request from any qualified investigator.
Results
Patient and Image Characteristics
Thefinal training dataset included 965 patients (representing 1777 eyes and 2,103 images); 397 patients (731 eyes) had mild to moderate papilledema, 431 (772 eyes) had severe papil- ledema, and 137 (274 eyes) had mild to moderate papilledema in one eye and severe papilledema in the other eye. Among those 965 patients, 822 (85%) had papilledema due to idiopathic in- tracranial hypertension (IIH) and 143 (15%) presented with secondary causes of intracranial hypertension such as cerebral venous sinus thrombosis, meningitis, or brain tumor. Patient demographics and image characteristics are described in table 3.
The testing dataset included 111 patients (representing 214 eyes or images), of whom 40 (77 eyes or images) had mild to moderate papilledema, 56 (107 eyes or images) had severe Table 2 List of Participating Centers and Number of
Photographs Included per Center
Participating center Number of photographs included Primary training dataset
Angers, France 286
Atlanta, USA 1,157
Baltimore, USA 136
Bologna, Italy 11
Bordeaux, France 26
Chennai, India 183
Coimbra, Portugal 17
Freiburg, Germany 10
Geneva, Switzerland 12
Grenoble, France 11
Hong Kong, China 4
London, UK 35
Manila, Philippines 18
Paris, France 86
Singapore 18
Sydney, Australia 65
Syracuse, USA 28
Total training dataset 2,103 External testing dataset
Angers, France 10
Bangkok, Thailand 86
Freiburg, Germany 35
Tehran, Iran 83
Total testing dataset 214
papilledema, and 15 (30 eyes or images) had mild to mod- erate papilledema in one eye and severe papilledema in the other eye. Seventy-five patients (68%) had papilledema from IIH and the remaining from secondary causes.
Papilledema Severity Classification by the DLS and Neuro-ophthalmologists
In the testing dataset, the DLS successfully discriminated severe from mild to moderate papilledema with an AUC of 0.93 (95%
confidence interval [CI] 0.89–0.96), an accuracy of 87.9% (95%
CI 82.7%–91.9%), a sensitivity of 91.8% (95% CI 86.9%–96.7%), and a specificity of 82.6% (95% CI 74.9%–90.4%) (figures 3 and 4). Among the 26 images misclassified by the DLS, 16 cases of mild to moderate papilledema were misclassified as severe; 14 of these 16 (87.5%) were photographs of papilledema Fris´en grade 3, and 2 of them Fris´en grade 2; 10 images of severe papilledema were misclassified as mild to moderate, all Fris´en grade 4 with minimal vessel obscuration on the disc or with hemorrhages and cotton-wool spots on the disc (eFigures 1 and 2, data available from Dryad, doi.org/10.5061/dryad.66t1g1k1x). The ability of the 3 independent neuro-ophthalmologists to discriminate severe from mild to moderate papilledema was comparable to the DLS, with an accuracy of the majority agreement among neuro- ophthalmologists of 84.1% (95% CI 78.5%–88.7%,p= 0.19), a sensitivity of 91.8% (95% CI 86.9%–96.7%,p= 1), and a speci- ficity of 73.9% (95% CI 64.9%–82.9%,p= 0.09) (figures 3 and 4).
Agreement scores between the DLS and neuro- ophthalmologist 1, 2, and 3 were 0.72 (95% CI 0.67–0.76), 0.43 (95% CI 0.37–0.49), and 0.60 (95% CI 0.55–0.65), re- spectively, and between the DLS and the majority agreement of neuro-ophthalmologists, 0.62 (95% CI 0.57–0.68). A weak intergrader agreement of 0.54 (95% CI 0.47–0.62) was found among the 3 neuro-ophthalmologists. Disagreement among neuro-ophthalmologists was observed for 67 photographs, among them 46 (68.6%) photographs of moderate papil- ledema with a Fris´en grade of 3, the other 21 photographs of severe papilledema with a Fris´en grade of 4.
Discussion
In this study, a DLS trained on 2,103 ocular fundus photo- graphs to classify the severity of papilledema from intracranial hypertension discriminated between mild to moderate and
severe papilledema on an independent testing dataset of 214 fundus photographs, with a comparable performance to that of 3 neuro-ophthalmologists.
Papilledema33,34 is the only objective clinical sign of in- tracranial hypertension.35,36 Because the degree of papil- ledema at presentation is a reliable indicator of subsequent visual outcomes from secondary optic atrophy,3-9the severity of papilledema influences the management and the frequency of visual monitoring during follow-up of patients with ele- vated ICP.37,38Our simple binary classification of papilledema severity aimed at identification of patients with lower risk (mild to moderate papilledema) or higher risk (severe Table 3Demographics and Image Characteristics in the Training and Testing Datasets
Characteristics Training dataset Testing dataset
Number of patients (images) 965 (2,103) 111 (214)
Age, y, mean (SD) 31.7 (12.8) 32.0 (15.4)
Female, n (%) 715 (74.9) 77 (69.4)
Images with mild to moderate papilledema, n (%) 1,052 (50.0) 92 (43.0)
Images with severe papilledema, n (%) 1,051 (50.0) 122 (57.0)
Figure 3Performance (Area Under the Receiver Operating Characteristic [ROC] Curve) of the Deep Learning System (DLS) and the 3 Neuro-ophthalmologists in Classifying Papilledema Severity
ROC curve representing the performance of the DLS, 3 neuro-ophthalmology graders, and the majority agreement among the 3 neuro-ophthalmologists (obtained when at least 2 graders agreed on the grade) in discriminating severe papilledema (SP) from mild to moderate papilledema (MP) on 214 fundus photographs (92 MP and 122 SP). The optimal performance of the DLS (blue dot) was calculated with the Youden index for best cutoff between sensitivity and specificity.
papilledema) of visual loss.5,6,8,9Hence, a patient with mild to moderate papilledema from IIH might be managed with weight loss and oral medications as an outpatient, whereas a patient with severe papilledema should have closer follow-up, sometimes inpatient, and benefit from timely surgical in- tervention, especially in those cases with visual loss.38A DLS, capable of automatically classifying the severity of papilledema on fundus photographs, could assist nonexperts with a more accurate prognosis, treatment strategy, and effect of treat- ment, as a diagnostic or prognostic tool along with the clinical examination and additional tests such as computer-assisted perimetry or optical coherence tomography (OCT).
It could be argued that ophthalmology consultation would obviate the need for automated fundus photographic in- terpretation by a DLS. However, despite the simplification of the papilledema severity classification in our study, the intergrader agreement among the 3 neuro-ophthalmologists was relatively weak (0.54), confirming the variability of hu- man evaluation of papilledema severity, particularly for moderate papilledema (Fris´en grade 3).13Nevertheless, our DLS could accurately distinguish between mild to moderate and severe papilledema on fundus photographs. The perfor- mance of the DLS was similar to that of 3 independent neuro- ophthalmologists with a comparable accuracy and sensitivity, and with a nonsignificantly higher specificity (82.6% for the DLS vs 73.9% for the majority agreement among the 3 neuro- ophthalmologists,p= 0.09). The agreement scores between the DLS and the majority agreement among the 3 neuro- ophthalmologists (κ= 0.62) was higher than the intergrader agreement among the 3 neuro-ophthalmologists (κ= 0.54).
Moderate intergrader agreement scores were also observed in
studies involving glaucoma or retinal diseases, for example, for glaucomatous damage assessment of the optic disc under stereoscopic conditions by 6 glaucoma experts39(κ= 0.50), or for plus disease retinopathy of prematurity diagnosis by 9 experts40(κ= 0.59 to 0.92). Similar results were previously described with a machine learning technique in a small study,41which showed that a computer-aided image analyses, used to analyze features of papilledema on fundus photo- graphs, could automatically grade papilledema with a sub- stantial agreement with one expert grading (κ= 0.71).
Our results might have implications for the management of raised ICP and papilledema in the future. However, further prospective validation studies are needed, at best in non- ophthalmologic clinical settings (i.e., neurology clinics, neu- rosurgery clinics, or emergency departments), and ideally using nonmydriatic digital cameras.15,42If those studies con- firm its applicability, a DLS, connected to a camera on site21or remotely, could be used for the assessment of papilledema severity.
Our study has inherent limitations. As it was a retrospective data collection, visual function was not available and objective data such as OCT retinal nervefiber thickness or macular ganglion cell complex (GCC) analysis were not systematically collected. Moreover, atrophic papilledema was excluded from this study, as we only trained the DLS to grade the severity of papilledema, not to identify associated atrophy, a difficult assessment even for the most experienced ophthalmolo- gists.13Hence, some of the patients with longstanding raised ICP who had both atrophy and residual papilledema atfirst presentation were excluded. In a future project, GCC-OCT, Figure 4Performance (Accuracy, Sensitivity, Specificity) of the Deep Learning System (DLS) and the 3 Neuro-ophthalmologists
in Classifying Papilledema Severity
Comparison of the performance (accuracy, sensitivity, specificity) of the DLS to (A) the detailed performance of each grader and (B) the performance of the majority agreement among the 3 graders in discriminating severe papilledema (SP) from mild to moderate papilledema (MP) on 214 fundus photographs (92 MP and 122 SP). Statistical significance of DLS vs human graders *p< 0.05; **p< 0.01; ***p< 0.001. Data are represented as mean (95% confidence interval [CI]). 95% CIs were calculated using the asymptotic method.
which has proven useful in the detection of optic atrophy associated with papilledema,43,44could also be incorporated into the DLS strategy, as already done in some glaucoma studies.45We used a simplified classification of papilledema into 2 grades, instead of the 5 grades used in the Fris´en scale.
The Fris´en scale is notoriously difficult to use, especially when attempting to differentiate grades 3 and 4.13,14The few cases of misclassification by the DLS were mainly observed for moderate papilledema (Fris´en grade 3) or for Fris´en grade 4 papilledema with mild vessel obscuration on the optic disc or associated with hemorrhages or cotton-wool spots. Those images were challenging to classify by the ex- perts as well. The lack of reproducibility and the inability of the Fris´en scale to discern optic disc changes over time was demonstrated by Sinclair et al.,13who proposed an alterna- tive ranking of optic disc appearance related to papilledema severity that incorporates the development of secondary optic atrophy, with an improvement of complete agreement among reviewers from 1.6% of photographs with Fris´en scale to 44.6% with their optic disc ranking scheme. In the pro- spective Intracranial Hypertension Treatment Trial, experts initially agreed on Fris´en scale classification for only 42% of images, and intragrader agreement rates varied from 55% to 73%.14Our simplified binary classification was designed to signal the presence of severe papilledema, a finding that should influence the acute management of these patients by nonophthalmologists.
We developed, trained, and tested a DLS that accurately discriminated mild to moderate from severe papilledema on mydriatic fundus photographs. In a subsequent comparison, the DLS had a performance comparable to 3 independent neuro-ophthalmologists. The automated recognition of se- vere papilledema by a DLS could be helpful in neurology, neurosurgery, and emergency settings for the management of patients with raised ICP. Additional prospective studies are needed to confirm the applicability of this DLS in real-life clinical settings.
Study Funding
Singapore National Medical Research Council (Clinician Scientist Individual Research grant CIRG18Nov-0013), The Duke-NUS Medical School, Ophthalmology and Visual Sci- ences Academic Clinical Program grant (05/FY2019/P2/06- A60) departmental grant, NIH/NEI core grant P30-EY06360 (Department of Ophthalmology, Emory University School of Medicine, Atlanta, GA), NIH/NINDS (RO1NSO89694).
Disclosure
C. Vasseneix, R.P. Najjar, X. Xu, Z. Tang, J.L. Loo, S. Singhal, S. Tow, L. Milea, D. Ting, Y. Liu, and T.Y. Wong report no disclosures relevant to the manuscript. N.J. Newman is con- sultant for GenSight Biologics and Neurophoenix, Santhera Pharmaceuticals/Chiesi, and Stealth BioTherapeutics. V.
Biousse is consultant for GenSight Biologics and Neuro- phoenix. D. Milea reports no disclosures relevant to the manuscript. Go to Neurology.org/N for full disclosures.
Publication History
Received by Neurology November 9, 2020. Accepted in final form April 19, 2021.
Appendix 1Authors
Name Location Contribution
Caroline Vasseneix, MD
Singapore Eye Research Institute
Designed and conceptualized study, major role in the acquisition of data, analyzed and interpreted the data, drafted the manuscript for intellectual content Raymond P.
Najjar, PhD
Singapore Eye Research Institute; Duke-NUS Medical School, Singapore
Designed and conceptualized study, analyzed and interpreted the data, revised the manuscript for intellectual content
Xinxing Xu, PhD
Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore
Major role in the acquisition of data, interpreted the data, revised the manuscript for intellectual content
Zhiqun Tan, PhD
Singapore Eye Research Institute
Interpreted the data
Jing Liang Loo, MBBS, MMed, FRCS(Ed)
Singapore National Eye Centre
Major role in the acquisition of data
Shweta Singhal, MBBS, PhD
Singapore National Eye Centre; Singapore Eye Research Institute
Major role in the acquisition of data
Sharon Tow, MBBS, FRCS(Ed)
Singapore National Eye Centre; Duke-NUS Medical School
Major role in the acquisition of data
Leonard Milea UC Berkeley, CA Design and conceptualized study, major role in the acquisition of data Daniel Shu
Wei Ting, MD, PhD
Singapore National Eye Centre; Singapore Eye Research Institute
Design and conceptualized study
Yong Liu, PhD Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore
Major role in the acquisition of data, interpreted the data, revised the manuscript for intellectual content
Tien Y. Wong, MD, PhD
Singapore National Eye Centre; Singapore Eye Research Institute; Duke- NUS Medical School
Revised the manuscript for intellectual content
Nancy J.
Newman, MD
Ophthalmology and Neurology Department, Emory University School of Medicine, Atlanta, GA
Designed and conceptualized study, major role in the acquisition of data, interpreted the data, revised the manuscript for intellectual content Valerie
Biousse, MD
Ophthalmology and Neurology Department, Emory University School of Medicine, Atlanta, GA
Designed and conceptualized study, major role in the acquisition of data, interpreted the data, revised the manuscript for intellectual content
References
1. Friedman DI. The pseudotumor cerebri syndrome.Neurol Clin.2014;32(2):363-396.
2. Corbett JJ. Visual loss in pseudotumor cerebri: follow-up of 57 patients fromfive to 41 years and a profile of 14 patients with permanent severe visual loss.Arch Neurol.1982;
39(8):461-474.
3. Orcutt JC, Page NGR, Sanders MD. Factors affecting visual loss in benign intracranial hypertension.Ophthalmology.1984;91(11):1303-1312.
4. Liu KC, Bhatti MT, Chen JJ, et al. Presentation and progression of papilledema in cerebral venous sinus thrombosis.Am J Ophthalmol.2020;213:1-8.
5. Chen JJ, Thurtell MJ, Longmuir RA, et al. Causes and prognosis of visual acuity loss at the time of initial presentation in idiopathic intracranial hypertension.Invest Oph- thalmol Vis Sci.2015;56(6):3850-3859.
6. Gospe SM, Bhatti MT, El-Dairi MA. Anatomic and visual function outcomes in paediatric idiopathic intracranial hypertension. Br J Ophthalmol. 2016;100(4):
505-509.
7. Takkar A, Goyal MK, Bansal R, Lal V. Clinical and neuro-ophthalmologic predictors of visual outcome in idiopathic intracranial hypertension.Neuroophthalmology.2018;
42(4):201-208.
8. Wall M, Falardeau J, Fletcher WA, et al. Risk factors for poor visual outcome in patients with idiopathic intracranial hypertension.Neurology.2015;85(9):799-805.
9. Micieli JA, Bruce BB, Vasseneix C, et al. Optic nerve appearance as a predictor of visual outcome in patients with idiopathic intracranial hypertension.Br J Ophthalmol.
2019;103(10):1429-1435.
10. Scott CJ. Diagnosis and grading of papilledema in patients with raised intracranial pressure using optical coherence tomography vs clinical expert assessment using a clinical staging scale.Arch Ophthalmol.2010;128(6):705-711.
11. Frisen L. Swelling of the optic nerve head: a staging scheme.J Neurol Neurosurg Psychiatry.1982;45(1):13-18.
12. Kardon R. Optical coherence tomography in papilledema: what am I missing?
J Neuroophthalmol.2014;34(suppl S10-7):S10-S17.
13. Sinclair AJ, Burdon MA, Nightingale PG, et al. Rating papilloedema: an evaluation of the Fris´en classification in idiopathic intracranial hypertension.J Neurol.2012;259(7):
1406-1412.
14. Fischer WS, Wall M, McDermott MP, Kupersmith MJ, Feldon SE. Photographic reading center of the idiopathic intracranial hypertension treatment trial (IIHTT):
methods and baseline results.Invest Ophthalmol Vis Sci.2015;56(5):3292-3303.
15. Bruce BB, Lamirel C, Wright DW, et al. Nonmydriatic ocular fundus photography in the emergency department.N Engl J Med.2011;364(4):387-389.
16. Friesner D, Rosenman R, Lobb BM, Tanne E. Idiopathic intracranial hypertension in the USA: the role of obesity in establishing prevalence and healthcare costs.Obes Rev.
2011;12(5):e372-e380.
17. Mollan SP, Aguiar M, Evison F, Frew E, Sinclair AJ. The expanding burden of idiopathic intracranial hypertension.Eye.2019;33(3):478-485.
18. Bruce BB, Thulasi P, Fraser CL, et al. Diagnostic accuracy and use of nonmydriatic ocular fundus photography by emergency physicians: phase II of the FOTO-ED study.Ann Emerg Med.2013;62(1):28-33.
19. Ivan Y, Ramgopal S, Cardenas-Villa M, et al. Feasibility of the digital retinography system camera in the pediatric emergency department.Pediatr Emerg Care.2018;
34(7):488-491.
20. Ting DSW, Pasquale LR, Peng L, et al. Artificial intelligence and deep learning in ophthalmology.Br J Ophthalmol.2019;103(2):167-175.
21. Abr`amoffMD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI- based diagnostic system for detection of diabetic retinopathy in primary care offices.
NPJ Digit Med.2018;1:39.
22. Milea D, Najjar RP, Zhubo J, et al. Artificial intelligence to detect papilledema from ocular fundus photographs.N Engl J Med.2020;382(18):1687-1695.
23. Biousse V, Newman NJ, Najjar RP, et al. Optic disc classification by deep learning versus expert neuro‐ophthalmologists.Ann Neurol.2020;88(4):785-795.
24. Friedman DI, Liu GT, Digre KB. Revised diagnostic criteria for the pseudotumor cerebri syndrome in adults and children.Neurology.2013;81(13):1159-1165.
25. Milea L, Najjar RP.Classif-Eye: A Semi-automated Image Classification Application;
2020. GitHub repository. github.com/milealeonard/Classif-Eye/
26. Milea D, Singhal S, Najjar RP. Artificial intelligence for detection of optic disc ab- normalities.Curr Opin Neurol.2020;33(1):106-110.
27. Falk T, Mai D, Bensch R, et al. U-Net: deep learning for cell counting, detection, and morphometry.Nat Methods.2019;16(1):67-70.
28. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. Presented at the International Conference on Learning Representations (ICLR); April 10, 2015; San Diego, CA.
29. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: a large-scale hierarchical image database. Presented at the Conference on Computer Vision and Pattern Recognition (CVPR); June 20–25, 2009; Miami, FL.
30. LeCun Y, Bengio Y, Hinton G. Deep learning.Nature.2015;521(7553):436-444.
31. Vanbelle S. Asymptotic variability of (multilevel) multirater kappa coefficients.Stat Methods Med Res.2019;28(10-11):3012-3026.
32. McHugh ML. Interrater reliability: the kappa statistic.Biochem Med.2012;22(3):
276-282.
33. Friedman DI, Quiros PA, Subramanian PS, et al. Headache in idiopathic intracranial hypertension:findings from the Idiopathic Intracranial Hypertension Treatment Trial.Headache.2017;57(8):1195-1205.
34. Fisayo A, Bruce BB, Newman NJ, Biousse V. Overdiagnosis of idiopathic intracranial hypertension.Neurology.2016;86(4):341-350.
35. Radojicic A, Vukovic-Cvetkovic V, Pekmezovic T, Trajkovic G, Zidverc-Trajkovic J, Jensen RH. Predictive role of presenting symptoms and clinicalfindings in idiopathic intracranial hypertension.J Neurol Sci.2019;399:89-93.
36. Crum OM, Kilgore KP, Sharma R, et al. Etiology of papilledema in patients in the eye clinic setting.JAMA Netw Open.2020;3(6):e206625.
37. Onyia CU, Ogunbameru IO, Dada OA, et al. Idiopathic intracranial hypertension:
proposal of a stratification strategy for monitoring risk of disease progression.Clin Neurol Neurosurg.2019;179:35-41.
38. Mollan SP, Davies B, Silver NC, et al. Idiopathic intracranial hypertension: consensus guidelines on management.J Neurol Neurosurg Psychiatry.2018;89(10):1088-1100.
39. Varma R, Steinmann WC, Scott IU. Expert agreement in evaluating the optic disc for glaucoma.Ophthalmology.1992;99(2):215-221.
40. Brown JM, Campbell JP, Beers A, et al; Imaging and Informatics in Retinopathy of Prematurity (i-ROP) Research Consortium. Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks.JAMA Oph- thalmol.2018;136(7):803-810.
41. Echegaray S, Zamora G, Yu H, Luo W, Soliz P, Kardon R. Automated analysis of optic nerve images for detection and staging of papilledema.Invest Ophthalmol Vis Sci.2011;
52(10):7470-7478.
42. Irani NK, Bidot S, Peragallo JH, Esper GJ, Newman NJ, Biousse V. Feasibility of a nonmydriatic ocular fundus camera in an outpatient neurology clinic.Neurologist.
2020;25(2):19-23.
43. Moreno-Ajona D, McHugh JA, Hoffmann J. An update on imaging in idiopathic intracranial hypertension.Front Neurol.2020;11:453.
44. Athappilly G, Garc´ıa-Basterra I, Machado-Miller F, Hedges TR, Mendoza-Santies- teban C, Vuong L. Ganglion cell complex analysis as a potential indicator of early neuronal loss in idiopathic intracranial hypertension.Neuroophthalmology.2018;
43(1):10-17.
45. Medeiros FA, Jammal AA, Thompson AC. From machine to machine.Ophthalmology.
2019;126(4):513-521.
Appendix 1 (continued)
Name Location Contribution
Dan Milea, MD, PhD
Singapore National Eye Centre; Singapore Eye Research Institute; Duke- NUS Medical School, Singapore; Copenhagen University Hospital, Denmark
Designed and conceptualized study, major role in the acquisition of data, interpreted the data, revised the manuscript for intellectual content
Appendix 2Coinvestigators
Coinvestigators are listed at links.lww.com/WNL/B432.