
RESEARCH ARTICLE

Accuracy of a Deep Learning System for Classification of Papilledema Severity on Ocular Fundus Photographs

Caroline Vasseneix, MD, Raymond P. Najjar, PhD, Xinxing Xu, PhD, Zhiqun Tang, PhD, Jing Liang Loo, MBBS, MMed, FRCS(Ed), Shweta Singhal, MBBS, Sharon Tow, MBBS, Leonard Milea, Daniel Shu Wei Ting, MD, PhD, Yong Liu, PhD, Tien Y. Wong, MD, PhD, Nancy J. Newman, MD, Valerie Biousse, MD, and Dan Milea, MD, PhD, on behalf of the BONSAI Group

Neurology® 2021;97:e369-e377. doi:10.1212/WNL.0000000000012226

Correspondence Dr. D. Milea

[email protected]

Abstract

Objective

To evaluate the performance of a deep learning system (DLS) in classifying the severity of papilledema associated with increased intracranial pressure on standard retinal fundus photographs.

Methods

A DLS was trained to automatically classify papilledema severity in 965 patients (2,103 mydriatic fundus photographs), representing a multiethnic cohort of patients with confirmed elevated intracranial pressure. Training was performed on 1,052 photographs with mild/moderate papilledema (MP) and 1,051 photographs with severe papilledema (SP) classified by a panel of experts. The performance of the DLS and that of 3 independent neuro-ophthalmologists were tested in 111 patients (214 photographs, 92 with MP and 122 with SP) by calculating the area under the receiver operating characteristics curve (AUC), accuracy, sensitivity, and specificity. Kappa agreement scores between the DLS and each of the 3 graders and among the 3 graders were calculated.

Results

The DLS successfully discriminated between photographs of MP and SP, with an AUC of 0.93 (95% confidence interval [CI] 0.89–0.96) and an accuracy, sensitivity, and specificity of 87.9%, 91.8%, and 82.6%, respectively. This performance was comparable with that of the 3 neuro-ophthalmologists (84.1%, 91.8%, and 73.9%; p = 0.19, p = 1, and p = 0.09, respectively). Misclassification by the DLS was mainly observed for moderate papilledema (Frisén grade 3). The agreement score between the DLS and the neuro-ophthalmologists' evaluation was 0.62 (95% CI 0.57–0.68), whereas the intergrader agreement among the 3 neuro-ophthalmologists was 0.54 (95% CI 0.47–0.62).

Conclusions

Our DLS accurately classified the severity of papilledema on an independent set of mydriatic fundus photographs, achieving a comparable performance with that of independent neuro-ophthalmologists.

Classification of Evidence

This study provides Class II evidence that a DLS using mydriatic retinal fundus photographs accurately classified the severity of papilledema in patients with a diagnosis of increased intracranial pressure.

MORE ONLINE

Class of Evidence Criteria for rating therapeutic and diagnostic studies

NPub.org/coe

From the Singapore Eye Research Institute (C.V., R.P.N., Z.T., J.L.L., S.S., S.T., D.S.W.T., T.Y.W., D.M.); Duke–NUS Medical School (R.P.N., J.L.L., S.S., S.T., T.Y.W., D.M.); Institute of High Performance Computing (X.X., Y.L.), Agency for Science, Technology and Research (A*STAR); Singapore National Eye Centre (J.L.L., S.S., S.T., D.S.W.T., T.Y.W., D.M.); University of Berkeley (L.M.), CA; Departments of Ophthalmology and Neurology (N.J.N., V.B.), Emory University School of Medicine, Atlanta, GA; and Copenhagen University Hospital (D.M.), Denmark.

Go to Neurology.org/N for full disclosures. Funding information and disclosures deemed relevant by the authors, if any, are provided at the end of the article.

BONSAI Group coinvestigators are listed at http://links.lww.com/WNL/B432.


Papilledema, defined as optic nerve head swelling associated with any cause of intracranial hypertension, can result in permanent vision loss.1,2 Papilledema severity at presentation is the most important prognostic factor for subsequent visual outcomes.3-9 Patients with severe papilledema may have progressive vision loss and visual field constriction due to retinal nerve fiber loss, thus requiring closer monitoring and more invasive treatment, whereas those with mild papilledema and no optic atrophy usually have good visual outcomes.5,8,9 However, the evaluation of papilledema severity, based on the 5-grade modified Frisén scale classification mainly used in clinical trials (with 1 being very mild papilledema and 5 very severe papilledema),10 is difficult to apply and subject to high variability.10-14 Hence, neurologists, especially those not confident in performing ophthalmoscopy,15 usually rely on ophthalmologists to determine the severity of papilledema.16,17 Fundus photography is now increasingly used in various clinical settings for screening purposes,18,19 and may be augmented with artificial intelligence deep learning techniques for automated image interpretation.20,21 Recently, the Brain and Optic Nerve Study with Artificial Intelligence (BONSAI) deep learning system (DLS)22 was shown to accurately discriminate papilledema from normal and other abnormal optic discs on fundus photographs, with a performance comparable to that of expert neuro-ophthalmologists.23

The aim of the current study was to develop, train, and test a new DLS to automatically classify the severity of papilledema and to compare the performance of this DLS with the classification performance of 3 neuro-ophthalmologists.

Methods

This study, performed by the BONSAI Consortium,22 included investigators and patients from 14 countries.

Our primary research questions, aiming to provide a Class II level of evidence, were the following: (1) Is a DLS capable of discriminating mild to moderate from severe papilledema on mydriatic fundus photographs? (2) Is the DLS's performance in classifying the severity of papilledema on fundus photographs comparable to that of neuro-ophthalmologists?

Standard Protocol Approvals, Registrations, and Patient Consents

The study was approved by the Centralized Institutional Review Board of SingHealth, Singapore, and each contributing institution for any experiments using human subjects, and was conducted in accordance with the Declaration of Helsinki. Informed consent was exempted given the retrospective nature of the study and the use of de-identified ocular fundus photographs.

Inclusion/Exclusion Criteria

The study included unaltered, de-identified digital ocular fundus photographs obtained in patients with confirmed intracranial hypertension and papilledema from 19 neuro-ophthalmology centers participating in the BONSAI consortium.22 The fundus photographs, taken at various fields of view (20°–45°) including the optic disc, were obtained after pupillary dilation, using 15 different cameras, mydriatic or nonmydriatic, depending on the center (table 1).

Experts from each participating center provided photographs of patients with confirmed papilledema (criteria previously published22). Intracranial hypertension was confirmed in every patient by brain imaging (e.g., showing an intracranial mass or venous sinus thrombosis) or elevated CSF opening pressure and follow-up visits. Papilledema was diagnosed only if the optic disc swelling was related to confirmed raised intracranial pressure (ICP). Patients with a diagnosis of idiopathic intracranial hypertension met the modified Dandy criteria.24

The fundus photographs were divided into 2 datasets representing a mix of consecutive and convenience samples. The training dataset, used to train the DLS to classify the severity of papilledema, was composed of all papilledema images previously included in the training cohort of our first BONSAI study.22 The testing dataset was obtained by randomly choosing 222 images from 4 participating centers of the same study, independent from the training dataset, in order to test the performance of the DLS after training, and to test 3 independent neuro-ophthalmologists for comparison with the DLS.

Two experts (V.B., N.J.N.) independently reviewed all fundus photographs of the training (2,524 photographs) and testing (222 photographs) datasets, and classified papilledema severity according to a simple 2-grade classification (see below).

Glossary

AUC = area under the receiver operating characteristic curve; BONSAI = Brain and Optic Nerve Study with Artificial Intelligence; CI = confidence interval; DLS = deep learning system; GCC = ganglion cell complex; ICP = intracranial pressure; IIH = idiopathic intracranial hypertension; OCT = optical coherence tomography.

Classification by the experts was performed under standard conditions on a computer screen (LG-34WK650, 100% brightness, 80% contrast) using semi-automated software developed by L.M. and R.P.N.25; the severity scores were automatically included into an Excel spreadsheet. In the case of discordance between the 2 experts, the classification was adjudicated by 2 additional neuro-ophthalmologists (D.M., C.V.), and a consensus was obtained for all images, used as reference standard. Only patients with active papilledema were included, and optic discs with atrophic papilledema (defined as definite atrophy with no active swelling) were excluded from the study (394/2,524 [15.6%] photographs excluded in the training dataset and 8/222 [3.6%] in the testing dataset). Fundus photographs of insufficient quality were also excluded (27/2,524 [1.1%] in the training dataset, none in the testing dataset). A total of 2,103 and 214 fundus photographs were included in the training and testing datasets, respectively (figure 1).
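The inclusion counts reported for the two datasets follow directly from the exclusion numbers. As a quick consistency check, the sketch below recomputes them from the figures given in the text; the dictionary layout and function names are illustrative and are not part of the study's software.

```python
# Photograph counts reported in the text for each dataset.
datasets = {
    "training": {"reviewed": 2524, "atrophic": 394, "poor_quality": 27},
    "testing": {"reviewed": 222, "atrophic": 8, "poor_quality": 0},
}


def included(counts):
    """Photographs remaining after both exclusion steps."""
    return counts["reviewed"] - counts["atrophic"] - counts["poor_quality"]


def percent(part, whole):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


for name, counts in datasets.items():
    print(
        name,
        "included:", included(counts),
        "atrophic %:", percent(counts["atrophic"], counts["reviewed"]),
        "poor quality %:", percent(counts["poor_quality"], counts["reviewed"]),
    )
```

Running it reproduces the 2,103 training and 214 testing photographs, along with the reported exclusion percentages.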

Papilledema Severity Classification

We created a simple 2-grade papilledema severity classification (figure 2): (1) mild to moderate papilledema, corresponding to Frisén grades 1–3, defined as disc edema with no obscuration of major blood vessels (arteries and veins) on the disc; (2) severe papilledema, that is, Frisén grades 4 and 5, defined as disc edema associated with any obscuration of major blood vessels (arteries and/or veins) on the disc. The presence of hemorrhages and exudates did not influence the classification, and optic discs with atrophic papilledema were excluded from the study.

Study Population

Training Dataset

A total of 2,103 fundus photographs of 965 patients, 1,052 mild to moderate papilledema and 1,051 severe papilledema, were included in the training dataset, collected from 16 participating centers of BONSAI (table 2).22 A total of 685 patients had a photograph taken in both eyes and 146 in 1 eye.

Testing Dataset

The testing dataset included 214 photographs (92 MP, 122 SP) from 111 patients (103 patients with both eyes imaged and 8 with 1 eye), randomly collected from 4 participating centers (Bangkok, Thailand; Freiburg, Germany; Tehran, Iran; Angers, France) of the BONSAI study (table 2).22

Deep Learning System

Deep learning is a technique of machine learning, which consists of multiple layers of convolutional neural networks with the capability to learn image features and classify images without using hand-crafted features.20 DLSs need to be trained on large datasets, and the classification performance is subsequently evaluated on part of the training dataset (validation) or on an independent external dataset (testing dataset).26

In our study, the DLS was composed of a segmentation network and a classification network. First, the optic disc was automatically located on the fundus photograph by the segmentation network (U-Net27) as previously described for the BONSAI-DLS.22 The segmentation network is based on U-Net,27 which is widely used for biomedical image segmentation tasks. The U-Net was trained to localize the optic disc region based on a total of 6,370 fundus images with masks annotated at the pixel level. The trained U-Net was then applied to the full fundus image to generate the optic disc region automatically, which was further used as the input to the classification network.

Table 1 Cameras Used in the Participating Centers

| City, country | Camera brand | Model | Characteristics |
| --- | --- | --- | --- |
| Angers, France | Topcon | TRC-NW6S | Nonmydriatic |
| Atlanta, USA | Topcon | 50DX | Mydriatic |
| Baltimore, USA | Zeiss | FF4 | Mydriatic |
| Bordeaux, France | Zeiss | VISUCAM | Nonmydriatic |
| Bangkok, Thailand | Kowa | WX3D | Nonmydriatic |
| Bologna, Italy | Topcon | DRI OCT Triton | Nonmydriatic |
| Coimbra, Portugal | Topcon | TRC-NW7SF Mark II | Mydriatic and nonmydriatic |
| Chennai, India | Zeiss | FF450 Plus IR | Mydriatic |
| Freiburg, Germany | Zeiss | SF 420 | Mydriatic |
| Geneva, Switzerland | Zeiss | FF450 Plus | Mydriatic |
| Grenoble, France | Topcon/Canon | TRC NW6S/CR2 | Nonmydriatic |
| Hong Kong, China | Topcon | TRC 50DX | Mydriatic |
| London, UK | Topcon/Canon | TRC 50DX/CR2 | Mydriatic and nonmydriatic |
| Manila, Philippines | Zeiss/Meditec | VISUCAM 500/NMFA | Nonmydriatic |
| Paris, France | Canon | CRDGI | Nonmydriatic |
| Sydney, Australia | Zeiss | VISUCAM 500 | Nonmydriatic |
| Syracuse, USA | Topcon/Zeiss | TRC NW8/TRC NW400/FF 450 | Mydriatic and nonmydriatic |
| Singapore, Singapore | Topcon | TRC 50DX | Mydriatic |
| Tehran, Iran | Canon | CR2 | Nonmydriatic |

Figure 1 Flowchart of Inclusion and Exclusion of Fundus Photographs

Process of inclusion and exclusion of fundus photographs in the training and testing datasets.

Figure 2 Papilledema Severity Classification

This figure represents the 2-grade papilledema severity classification system used in our study, separating mild to moderate papilledema (Frisén grade 1 to 3) and severe papilledema (Frisén grade 4 and 5). Mild to moderate papilledema was defined as disc edema with no obscuration of major blood vessels (arteries and veins) on the disc and severe papilledema as disc edema with any obscuration of major blood vessels on the disc.

Then, the classification network classified the optic disc into 1 of 2 classes: mild to moderate or severe papilledema (classification network, VGGNet28). The classification network is based on convolutional neural networks. Following the last convolutional layer of the VGGNet, 2 dense layers were added with a SoftMax layer to obtain the 2-class outputs. The classification network was initialized using weights pretrained on ImageNet29 and fine-tuned in an end-to-end manner to achieve the optimal performance. The classification network was trained on 2,103 fundus photographs (training dataset) to automatically classify papilledema severity into the 2 classes defined as the reference standard (mild to moderate and severe papilledema). The network weights were updated iteratively based on the difference signals between the outputs of the DLS and the clinical reference standard on the severity levels using the backpropagation algorithm.30 The trained segmentation network and the classification network were tested on another 214 images from the external testing dataset to get the prediction outcome. To report performance characteristics for this model, the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity were calculated. Diagnosis was provided at eye level for each included image.
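To make the final classification step and the weight update more concrete, the sketch below shows a 2-class SoftMax and the "difference signal" between the network output and the reference standard. It assumes a cross-entropy loss, which the text does not specify, and the logit values are hypothetical; it is a plain-Python illustration, not the BONSAI implementation.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_grad(logits, target_index):
    """Gradient of an (assumed) cross-entropy loss with respect to the logits.

    It equals the predicted probability minus the one-hot reference label,
    which is the difference signal that backpropagation sends through the
    earlier layers.
    """
    probs = softmax(logits)
    return [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]


CLASSES = ("mild to moderate", "severe")
logits = [0.4, 1.9]  # hypothetical outputs of the last dense layer
probs = softmax(logits)
predicted = CLASSES[probs.index(max(probs))]
grad = cross_entropy_grad(logits, target_index=1)  # reference standard: severe
print(predicted, [round(p, 3) for p in probs], [round(g, 3) for g in grad])
```

When the prediction already matches the reference label, the gradient components are small, so the weights change little for that image.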

Testing of 3 Independent Neuro-ophthalmologists for Comparison With the DLS

In order to compare the performance of the DLS with the classification performance of neuro-ophthalmologists, we tested 3 independent neuro-ophthalmologists (S.S., J.L.L., S.T.) who were asked to independently classify the papilledema severity into one of the 2 previously described classes, after a brief training session. The testing was performed on the same 214 images from the testing dataset used for the DLS.

For this purpose, images were presented to the 3 neuro-ophthalmologists on the same individual computer screen (LG-34WK650, 100% brightness, 80% contrast), using the semiautomated software described above.25 The neuro-ophthalmologists were masked to patients' clinical information and to the classification assigned by the other evaluators and by the DLS. For comparisons and statistical analysis purposes, we used a majority agreement grade, defined as the severity classification reported by at least 2 of the 3 participating neuro-ophthalmologists, for each image.
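The majority agreement grade can be expressed in a few lines. The sketch below is illustrative only: the "MP"/"SP" labels and the function name are assumptions, and the example grades are hypothetical.

```python
from collections import Counter

VALID_GRADES = {"MP", "SP"}  # mild to moderate vs severe papilledema


def majority_grade(grades):
    """Return the grade reported by at least 2 of the 3 graders for one image."""
    if len(grades) != 3:
        raise ValueError("expected exactly 3 grades per image")
    if not set(grades) <= VALID_GRADES:
        raise ValueError("grades must be 'MP' or 'SP'")
    # With 2 possible classes and 3 graders, one class always has >= 2 votes.
    grade, _votes = Counter(grades).most_common(1)[0]
    return grade


# Hypothetical grades from graders 1, 2 and 3 for three images.
images = [("MP", "SP", "SP"), ("MP", "MP", "MP"), ("SP", "MP", "MP")]
print([majority_grade(g) for g in images])
```

Because the classification is binary and the number of graders is odd, a tie cannot occur, which is one practical advantage of using 3 graders.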

Statistical Analysis

The performance characteristics of the DLS and of the 3 neuro-ophthalmologists were evaluated by calculating the AUC, sensitivity, specificity, and accuracy in the external testing dataset.
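As an illustration of what the AUC measures, the sketch below uses its rank-based (Mann-Whitney) interpretation: the probability that a randomly chosen severe image receives a higher score than a randomly chosen mild to moderate image. The scores are hypothetical, and this is only one possible way to compute an AUC, not necessarily the procedure used in the study.

```python
def auc(scores_severe, scores_mild):
    """Rank-based AUC: share of (severe, mild) pairs ranked correctly.

    Ties between a severe and a mild score count as half a correct pair.
    """
    correct = 0.0
    for s in scores_severe:
        for m in scores_mild:
            if s > m:
                correct += 1.0
            elif s == m:
                correct += 0.5
    return correct / (len(scores_severe) * len(scores_mild))


# Hypothetical DLS probabilities of "severe" for a handful of images.
severe_scores = [0.95, 0.80, 0.62, 0.40]
mild_scores = [0.55, 0.30, 0.20, 0.10]
print(round(auc(severe_scores, mild_scores), 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.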

A pairwise McNemar test with Bonferroni correction was used to compare the accuracies, sensitivities, and specificities between the DLS and each neuro-ophthalmologist and between the DLS and the majority agreement among the 3 neuro-ophthalmologists.
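A minimal sketch of a paired comparison of this kind is given below. It assumes the exact (binomial) form of the McNemar test, since the text does not say which variant was used, and every discordant-pair count is hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value from the discordant pair counts.

    b: images classified correctly by the first rater only
    c: images classified correctly by the second rater only
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bonferroni(p_values):
    """Multiply each p value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts: (DLS correct only, human grader correct only).
comparisons = {
    "grader 1": (12, 9),
    "grader 2": (25, 8),
    "grader 3": (15, 11),
    "majority": (14, 6),
}
raw = [mcnemar_exact(b, c) for b, c in comparisons.values()]
adjusted = bonferroni(raw)
for name, p_raw, p_adj in zip(comparisons, raw, adjusted):
    print(f"{name}: p = {p_raw:.3f}, Bonferroni-adjusted p = {p_adj:.3f}")
```

Only images on which the two raters disagree in correctness contribute to the test, which is why the concordant counts do not appear in the function.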

Cohen kappa agreement scores were used for comparison between the DLS and each of the 3 neuro-ophthalmologists and between the DLS and the majority agreement among the 3 neuro-ophthalmologists, and the Fleiss kappa score31 was used for intergrader agreement analysis among the 3 neuro-ophthalmologists. Kappa agreement scores were interpreted according to a previously published scale (0–0.20: no agreement; 0.21–0.39: minimal; 0.40–0.59: weak; 0.60–0.79: moderate; 0.80–0.90: strong; >0.90: almost perfect).32 A p value less than 0.05 was considered statistically significant.
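The sketch below shows how a Cohen kappa can be computed for two raters on a binary MP/SP task and mapped onto the interpretation scale quoted above. The function names and the 2x2 counts are hypothetical; only the scale thresholds come from the text.

```python
def cohen_kappa(both_sp, a_sp_b_mp, a_mp_b_sp, both_mp):
    """Cohen kappa for two raters from the four cells of a 2x2 table."""
    n = both_sp + a_sp_b_mp + a_mp_b_sp + both_mp
    observed = (both_sp + both_mp) / n
    a_sp = (both_sp + a_sp_b_mp) / n  # share of images rater A calls SP
    b_sp = (both_sp + a_mp_b_sp) / n  # share of images rater B calls SP
    expected = a_sp * b_sp + (1 - a_sp) * (1 - b_sp)
    if expected == 1.0:
        return 1.0  # both raters used a single class for every image
    return (observed - expected) / (1 - expected)


def interpret(kappa):
    """Map a kappa value onto the scale quoted in the text."""
    if kappa <= 0.20:
        return "no agreement"
    if kappa < 0.40:
        return "minimal"
    if kappa < 0.60:
        return "weak"
    if kappa < 0.80:
        return "moderate"
    if kappa <= 0.90:
        return "strong"
    return "almost perfect"


# Hypothetical 2x2 table for a DLS-versus-grader comparison.
kappa = cohen_kappa(both_sp=100, a_sp_b_mp=20, a_mp_b_sp=14, both_mp=80)
print(round(kappa, 2), interpret(kappa))
```

Kappa corrects the raw agreement rate for the agreement expected by chance, so two raters who both call most images severe can have high raw agreement but only a modest kappa.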

Data Availability

Anonymized data will be shared by justified request from any qualified investigator.

Results

Patient and Image Characteristics

The final training dataset included 965 patients (representing 1,777 eyes and 2,103 images); 397 patients (731 eyes) had mild to moderate papilledema, 431 (772 eyes) had severe papilledema, and 137 (274 eyes) had mild to moderate papilledema in one eye and severe papilledema in the other eye. Among those 965 patients, 822 (85%) had papilledema due to idiopathic intracranial hypertension (IIH) and 143 (15%) presented with secondary causes of intracranial hypertension such as cerebral venous sinus thrombosis, meningitis, or brain tumor. Patient demographics and image characteristics are described in table 3.

The testing dataset included 111 patients (representing 214 eyes or images), of whom 40 (77 eyes or images) had mild to moderate papilledema, 56 (107 eyes or images) had severe papilledema, and 15 (30 eyes or images) had mild to moderate papilledema in one eye and severe papilledema in the other eye. Seventy-five patients (68%) had papilledema from IIH and the remaining from secondary causes.

Table 2 List of Participating Centers and Number of Photographs Included per Center

| Participating center | Number of photographs included |
| --- | --- |
| Primary training dataset | |
| Angers, France | 286 |
| Atlanta, USA | 1,157 |
| Baltimore, USA | 136 |
| Bologna, Italy | 11 |
| Bordeaux, France | 26 |
| Chennai, India | 183 |
| Coimbra, Portugal | 17 |
| Freiburg, Germany | 10 |
| Geneva, Switzerland | 12 |
| Grenoble, France | 11 |
| Hong Kong, China | 4 |
| London, UK | 35 |
| Manila, Philippines | 18 |
| Paris, France | 86 |
| Singapore | 18 |
| Sydney, Australia | 65 |
| Syracuse, USA | 28 |
| Total training dataset | 2,103 |
| External testing dataset | |
| Angers, France | 10 |
| Bangkok, Thailand | 86 |
| Freiburg, Germany | 35 |
| Tehran, Iran | 83 |
| Total testing dataset | 214 |

Papilledema Severity Classification by the DLS and Neuro-ophthalmologists

In the testing dataset, the DLS successfully discriminated severe from mild to moderate papilledema with an AUC of 0.93 (95% confidence interval [CI] 0.89–0.96), an accuracy of 87.9% (95% CI 82.7%–91.9%), a sensitivity of 91.8% (95% CI 86.9%–96.7%), and a specificity of 82.6% (95% CI 74.9%–90.4%) (figures 3 and 4). Among the 26 images misclassified by the DLS, 16 cases of mild to moderate papilledema were misclassified as severe; 14 of these 16 (87.5%) were photographs of papilledema Frisén grade 3, and 2 of them Frisén grade 2; 10 images of severe papilledema were misclassified as mild to moderate, all Frisén grade 4 with minimal vessel obscuration on the disc or with hemorrhages and cotton-wool spots on the disc (eFigures 1 and 2, data available from Dryad, doi.org/10.5061/dryad.66t1g1k1x). The ability of the 3 independent neuro-ophthalmologists to discriminate severe from mild to moderate papilledema was comparable to the DLS, with an accuracy of the majority agreement among neuro-ophthalmologists of 84.1% (95% CI 78.5%–88.7%, p = 0.19), a sensitivity of 91.8% (95% CI 86.9%–96.7%, p = 1), and a specificity of 73.9% (95% CI 64.9%–82.9%, p = 0.09) (figures 3 and 4).
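The reported DLS sensitivity, specificity, and accuracy can be recovered from the misclassification counts given in this paragraph, treating severe papilledema as the positive class. The sketch below does this arithmetic; the function and variable names are illustrative.

```python
def classification_rates(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy as percentages (1 decimal)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return tuple(round(100 * x, 1) for x in (sensitivity, specificity, accuracy))


# Counts taken from the text; severe papilledema (SP) is the positive class.
SP_TOTAL, MP_TOTAL = 122, 92
SP_MISSED = 10      # SP images classified as mild to moderate
MP_OVERCALLED = 16  # mild to moderate (MP) images classified as severe

sens, spec, acc = classification_rates(
    tp=SP_TOTAL - SP_MISSED,
    fn=SP_MISSED,
    tn=MP_TOTAL - MP_OVERCALLED,
    fp=MP_OVERCALLED,
)
print(sens, spec, acc)  # 91.8 82.6 87.9
```

The 26 misclassified images (10 + 16) out of 214 account for the 87.9% accuracy.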

Agreement scores between the DLS and neuro-ophthalmologist 1, 2, and 3 were 0.72 (95% CI 0.67–0.76), 0.43 (95% CI 0.37–0.49), and 0.60 (95% CI 0.55–0.65), respectively, and between the DLS and the majority agreement of neuro-ophthalmologists, 0.62 (95% CI 0.57–0.68). A weak intergrader agreement of 0.54 (95% CI 0.47–0.62) was found among the 3 neuro-ophthalmologists. Disagreement among neuro-ophthalmologists was observed for 67 photographs, among them 46 (68.6%) photographs of moderate papilledema with a Frisén grade of 3, the other 21 photographs of severe papilledema with a Frisén grade of 4.
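For readers unfamiliar with the Fleiss kappa used for the intergrader agreement, the sketch below implements the standard formula for a fixed number of raters per image. The input layout (one row of per-class vote counts per image) and the example votes are assumptions made for illustration; the study's per-image grades are not reproduced here.

```python
def fleiss_kappa(vote_counts):
    """Fleiss kappa for a fixed number of raters per item.

    vote_counts: one row per image, each row giving how many raters chose
    each class, e.g. [1, 2] means 1 vote for MP and 2 votes for SP.
    """
    n_items = len(vote_counts)
    n_raters = sum(vote_counts[0])
    n_classes = len(vote_counts[0])
    # Overall proportion of all votes that went to each class.
    class_share = [
        sum(row[j] for row in vote_counts) / (n_items * n_raters)
        for j in range(n_classes)
    ]
    # Per-image agreement: share of rater pairs that agree on that image.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in vote_counts
    ]
    observed = sum(per_item) / n_items
    expected = sum(p * p for p in class_share)
    return (observed - expected) / (1 - expected)


# Hypothetical votes of 3 graders on 6 images, as [MP votes, SP votes].
votes = [[3, 0], [0, 3], [2, 1], [0, 3], [1, 2], [3, 0]]
print(round(fleiss_kappa(votes), 2))
```

Unlike Cohen kappa, this statistic handles more than 2 raters at once, which is why it suits the comparison among the 3 neuro-ophthalmologists.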

Discussion

In this study, a DLS trained on 2,103 ocular fundus photographs to classify the severity of papilledema from intracranial hypertension discriminated between mild to moderate and severe papilledema on an independent testing dataset of 214 fundus photographs, with a comparable performance to that of 3 neuro-ophthalmologists.

Papilledema33,34 is the only objective clinical sign of intracranial hypertension.35,36 Because the degree of papilledema at presentation is a reliable indicator of subsequent visual outcomes from secondary optic atrophy,3-9 the severity of papilledema influences the management and the frequency of visual monitoring during follow-up of patients with elevated ICP.37,38 Our simple binary classification of papilledema severity aimed at identification of patients with lower risk (mild to moderate papilledema) or higher risk (severe papilledema) of visual loss.5,6,8,9 Hence, a patient with mild to moderate papilledema from IIH might be managed with weight loss and oral medications as an outpatient, whereas a patient with severe papilledema should have closer follow-up, sometimes inpatient, and benefit from timely surgical intervention, especially in those cases with visual loss.38 A DLS, capable of automatically classifying the severity of papilledema on fundus photographs, could assist nonexperts with a more accurate prognosis, treatment strategy, and effect of treatment, as a diagnostic or prognostic tool along with the clinical examination and additional tests such as computer-assisted perimetry or optical coherence tomography (OCT).

Table 3 Demographics and Image Characteristics in the Training and Testing Datasets

| Characteristics | Training dataset | Testing dataset |
| --- | --- | --- |
| Number of patients (images) | 965 (2,103) | 111 (214) |
| Age, y, mean (SD) | 31.7 (12.8) | 32.0 (15.4) |
| Female, n (%) | 715 (74.9) | 77 (69.4) |
| Images with mild to moderate papilledema, n (%) | 1,052 (50.0) | 92 (43.0) |
| Images with severe papilledema, n (%) | 1,051 (50.0) | 122 (57.0) |

Figure 3 Performance (Area Under the Receiver Operating Characteristic [ROC] Curve) of the Deep Learning System (DLS) and the 3 Neuro-ophthalmologists in Classifying Papilledema Severity

ROC curve representing the performance of the DLS, 3 neuro-ophthalmology graders, and the majority agreement among the 3 neuro-ophthalmologists (obtained when at least 2 graders agreed on the grade) in discriminating severe papilledema (SP) from mild to moderate papilledema (MP) on 214 fundus photographs (92 MP and 122 SP). The optimal performance of the DLS (blue dot) was calculated with the Youden index for best cutoff between sensitivity and specificity.
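The Figure 3 caption mentions that the DLS operating point was chosen with the Youden index. As a brief illustration, the sketch below picks the cutoff that maximizes sensitivity + specificity - 1 from a list of ROC points; the cutoffs and rates are hypothetical and are not taken from the study's ROC curve.

```python
def youden_index(sensitivity, specificity):
    """Youden J statistic: 0 for a useless test, 1 for a perfect one."""
    return sensitivity + specificity - 1


def best_cutoff(roc_points):
    """Return the (cutoff, sensitivity, specificity) with the highest J."""
    return max(roc_points, key=lambda pt: youden_index(pt[1], pt[2]))


# Hypothetical ROC points: (probability cutoff, sensitivity, specificity).
roc_points = [
    (0.30, 0.97, 0.60),
    (0.50, 0.92, 0.83),
    (0.70, 0.80, 0.90),
]
cutoff, sens, spec = best_cutoff(roc_points)
print(cutoff, sens, spec, round(youden_index(sens, spec), 2))
```

The Youden index weights sensitivity and specificity equally; a clinical deployment that prioritizes not missing severe papilledema might prefer a cutoff with higher sensitivity.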

It could be argued that ophthalmology consultation would obviate the need for automated fundus photographic interpretation by a DLS. However, despite the simplification of the papilledema severity classification in our study, the intergrader agreement among the 3 neuro-ophthalmologists was relatively weak (0.54), confirming the variability of human evaluation of papilledema severity, particularly for moderate papilledema (Frisén grade 3).13 Nevertheless, our DLS could accurately distinguish between mild to moderate and severe papilledema on fundus photographs. The performance of the DLS was similar to that of 3 independent neuro-ophthalmologists with a comparable accuracy and sensitivity, and with a nonsignificantly higher specificity (82.6% for the DLS vs 73.9% for the majority agreement among the 3 neuro-ophthalmologists, p = 0.09). The agreement score between the DLS and the majority agreement among the 3 neuro-ophthalmologists (κ = 0.62) was higher than the intergrader agreement among the 3 neuro-ophthalmologists (κ = 0.54).

Moderate intergrader agreement scores were also observed in studies involving glaucoma or retinal diseases, for example, for glaucomatous damage assessment of the optic disc under stereoscopic conditions by 6 glaucoma experts39 (κ = 0.50), or for plus disease retinopathy of prematurity diagnosis by 9 experts40 (κ = 0.59 to 0.92). Similar results were previously described with a machine learning technique in a small study,41 which showed that computer-aided image analysis, used to analyze features of papilledema on fundus photographs, could automatically grade papilledema with a substantial agreement with one expert grading (κ = 0.71).

Our results might have implications for the management of raised ICP and papilledema in the future. However, further prospective validation studies are needed, preferably in nonophthalmologic clinical settings (i.e., neurology clinics, neurosurgery clinics, or emergency departments), and ideally using nonmydriatic digital cameras.15,42 If those studies confirm its applicability, a DLS, connected to a camera on site21 or remotely, could be used for the assessment of papilledema severity.

Our study has inherent limitations. As it was a retrospective data collection, visual function was not available and objective data such as OCT retinal nerve fiber thickness or macular ganglion cell complex (GCC) analysis were not systematically collected. Moreover, atrophic papilledema was excluded from this study, as we only trained the DLS to grade the severity of papilledema, not to identify associated atrophy, a difficult assessment even for the most experienced ophthalmologists.13 Hence, some of the patients with longstanding raised ICP who had both atrophy and residual papilledema at first presentation were excluded. In a future project, GCC-OCT, which has proven useful in the detection of optic atrophy associated with papilledema,43,44 could also be incorporated into the DLS strategy, as already done in some glaucoma studies.45

We used a simplified classification of papilledema into 2 grades, instead of the 5 grades used in the Frisén scale. The Frisén scale is notoriously difficult to use, especially when attempting to differentiate grades 3 and 4.13,14 The few cases of misclassification by the DLS were mainly observed for moderate papilledema (Frisén grade 3) or for Frisén grade 4 papilledema with mild vessel obscuration on the optic disc or associated with hemorrhages or cotton-wool spots. Those images were challenging to classify by the experts as well. The lack of reproducibility and the inability of the Frisén scale to discern optic disc changes over time was demonstrated by Sinclair et al.,13 who proposed an alternative ranking of optic disc appearance related to papilledema severity that incorporates the development of secondary optic atrophy, with an improvement of complete agreement among reviewers from 1.6% of photographs with Frisén scale to 44.6% with their optic disc ranking scheme. In the prospective Intracranial Hypertension Treatment Trial, experts initially agreed on Frisén scale classification for only 42% of images, and intragrader agreement rates varied from 55% to 73%.14 Our simplified binary classification was designed to signal the presence of severe papilledema, a finding that should influence the acute management of these patients by nonophthalmologists.

Figure 4 Performance (Accuracy, Sensitivity, Specificity) of the Deep Learning System (DLS) and the 3 Neuro-ophthalmologists in Classifying Papilledema Severity

Comparison of the performance (accuracy, sensitivity, specificity) of the DLS to (A) the detailed performance of each grader and (B) the performance of the majority agreement among the 3 graders in discriminating severe papilledema (SP) from mild to moderate papilledema (MP) on 214 fundus photographs (92 MP and 122 SP). Statistical significance of DLS vs human graders *p < 0.05; **p < 0.01; ***p < 0.001. Data are represented as mean (95% confidence interval [CI]). 95% CIs were calculated using the asymptotic method.
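The Figure 4 caption states that 95% CIs were calculated using the asymptotic method. Assuming this refers to the usual normal-approximation (Wald) interval for a proportion, the sketch below reproduces the reported intervals for the DLS sensitivity and specificity from the counts implied by the text (112 of 122 severe and 76 of 92 mild to moderate images classified correctly). The function name is illustrative.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Normal-approximation 95% CI for a proportion, clipped to [0, 1]."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def as_percent(interval):
    """Format an interval as percentages rounded to one decimal place."""
    return tuple(round(100 * bound, 1) for bound in interval)


sensitivity_ci = as_percent(wald_ci(112, 122))  # severe images detected
specificity_ci = as_percent(wald_ci(76, 92))    # mild to moderate images
print(sensitivity_ci, specificity_ci)  # (86.9, 96.7) (74.9, 90.4)
```

The Wald interval is simple but can behave poorly for proportions near 0 or 1 or for small samples, where Wilson or exact intervals are commonly preferred.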

We developed, trained, and tested a DLS that accurately discriminated mild to moderate from severe papilledema on mydriatic fundus photographs. In a subsequent comparison, the DLS had a performance comparable to 3 independent neuro-ophthalmologists. The automated recognition of severe papilledema by a DLS could be helpful in neurology, neurosurgery, and emergency settings for the management of patients with raised ICP. Additional prospective studies are needed to confirm the applicability of this DLS in real-life clinical settings.

Study Funding

Singapore National Medical Research Council (Clinician Scientist Individual Research grant CIRG18Nov-0013), The Duke-NUS Medical School, Ophthalmology and Visual Sciences Academic Clinical Program grant (05/FY2019/P2/06-A60) departmental grant, NIH/NEI core grant P30-EY06360 (Department of Ophthalmology, Emory University School of Medicine, Atlanta, GA), NIH/NINDS (RO1NSO89694).

Disclosure

C. Vasseneix, R.P. Najjar, X. Xu, Z. Tang, J.L. Loo, S. Singhal, S. Tow, L. Milea, D. Ting, Y. Liu, and T.Y. Wong report no disclosures relevant to the manuscript. N.J. Newman is consultant for GenSight Biologics and Neurophoenix, Santhera Pharmaceuticals/Chiesi, and Stealth BioTherapeutics. V. Biousse is consultant for GenSight Biologics and Neurophoenix. D. Milea reports no disclosures relevant to the manuscript. Go to Neurology.org/N for full disclosures.

Publication History

Received by Neurology November 9, 2020. Accepted in final form April 19, 2021.

Appendix 1Authors

Name Location Contribution

Caroline Vasseneix, MD

Singapore Eye Research Institute

Designed and conceptualized study, major role in the acquisition of data, analyzed and interpreted the data, drafted the manuscript for intellectual content Raymond P.

Najjar, PhD

Singapore Eye Research Institute; Duke-NUS Medical School, Singapore

Designed and conceptualized study, analyzed and interpreted the data, revised the manuscript for intellectual content

Xinxing Xu, PhD

Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore

Major role in the acquisition of data, interpreted the data, revised the manuscript for intellectual content

Zhiqun Tan, PhD

Singapore Eye Research Institute

Interpreted the data

Jing Liang Loo, MBBS, MMed, FRCS(Ed)

Singapore National Eye Centre

Major role in the acquisition of data

Shweta Singhal, MBBS, PhD

Singapore National Eye Centre; Singapore Eye Research Institute

Major role in the acquisition of data

Sharon Tow, MBBS, FRCS(Ed)

Singapore National Eye Centre; Duke-NUS Medical School

Major role in the acquisition of data

Leonard Milea UC Berkeley, CA Design and conceptualized study, major role in the acquisition of data Daniel Shu

Wei Ting, MD, PhD

Singapore National Eye Centre; Singapore Eye Research Institute

Design and conceptualized study

Yong Liu, PhD Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore

Major role in the acquisition of data, interpreted the data, revised the manuscript for intellectual content

Tien Y. Wong, MD, PhD

Singapore National Eye Centre; Singapore Eye Research Institute; Duke- NUS Medical School

Revised the manuscript for intellectual content

Nancy J. Newman, MD

Ophthalmology and Neurology Department, Emory University School of Medicine, Atlanta, GA

Designed and conceptualized study, major role in the acquisition of data, interpreted the data, revised the manuscript for intellectual content

Valerie Biousse, MD

Ophthalmology and Neurology Department, Emory University School of Medicine, Atlanta, GA

Designed and conceptualized study, major role in the acquisition of data, interpreted the data, revised the manuscript for intellectual content


References

1. Friedman DI. The pseudotumor cerebri syndrome. Neurol Clin. 2014;32(2):363-396.

2. Corbett JJ. Visual loss in pseudotumor cerebri: follow-up of 57 patients from five to 41 years and a profile of 14 patients with permanent severe visual loss. Arch Neurol. 1982;39(8):461-474.

3. Orcutt JC, Page NGR, Sanders MD. Factors affecting visual loss in benign intracranial hypertension. Ophthalmology. 1984;91(11):1303-1312.

4. Liu KC, Bhatti MT, Chen JJ, et al. Presentation and progression of papilledema in cerebral venous sinus thrombosis. Am J Ophthalmol. 2020;213:1-8.

5. Chen JJ, Thurtell MJ, Longmuir RA, et al. Causes and prognosis of visual acuity loss at the time of initial presentation in idiopathic intracranial hypertension. Invest Ophthalmol Vis Sci. 2015;56(6):3850-3859.

6. Gospe SM, Bhatti MT, El-Dairi MA. Anatomic and visual function outcomes in paediatric idiopathic intracranial hypertension. Br J Ophthalmol. 2016;100(4):505-509.

7. Takkar A, Goyal MK, Bansal R, Lal V. Clinical and neuro-ophthalmologic predictors of visual outcome in idiopathic intracranial hypertension. Neuroophthalmology. 2018;42(4):201-208.

8. Wall M, Falardeau J, Fletcher WA, et al. Risk factors for poor visual outcome in patients with idiopathic intracranial hypertension. Neurology. 2015;85(9):799-805.

9. Micieli JA, Bruce BB, Vasseneix C, et al. Optic nerve appearance as a predictor of visual outcome in patients with idiopathic intracranial hypertension. Br J Ophthalmol. 2019;103(10):1429-1435.

10. Scott CJ. Diagnosis and grading of papilledema in patients with raised intracranial pressure using optical coherence tomography vs clinical expert assessment using a clinical staging scale. Arch Ophthalmol. 2010;128(6):705-711.

11. Frisen L. Swelling of the optic nerve head: a staging scheme. J Neurol Neurosurg Psychiatry. 1982;45(1):13-18.

12. Kardon R. Optical coherence tomography in papilledema: what am I missing? J Neuroophthalmol. 2014;34(suppl S10-7):S10-S17.

13. Sinclair AJ, Burdon MA, Nightingale PG, et al. Rating papilloedema: an evaluation of the Frisén classification in idiopathic intracranial hypertension. J Neurol. 2012;259(7):1406-1412.

14. Fischer WS, Wall M, McDermott MP, Kupersmith MJ, Feldon SE. Photographic reading center of the idiopathic intracranial hypertension treatment trial (IIHTT): methods and baseline results. Invest Ophthalmol Vis Sci. 2015;56(5):3292-3303.

15. Bruce BB, Lamirel C, Wright DW, et al. Nonmydriatic ocular fundus photography in the emergency department. N Engl J Med. 2011;364(4):387-389.

16. Friesner D, Rosenman R, Lobb BM, Tanne E. Idiopathic intracranial hypertension in the USA: the role of obesity in establishing prevalence and healthcare costs. Obes Rev. 2011;12(5):e372-e380.

17. Mollan SP, Aguiar M, Evison F, Frew E, Sinclair AJ. The expanding burden of idiopathic intracranial hypertension. Eye. 2019;33(3):478-485.

18. Bruce BB, Thulasi P, Fraser CL, et al. Diagnostic accuracy and use of nonmydriatic ocular fundus photography by emergency physicians: phase II of the FOTO-ED study. Ann Emerg Med. 2013;62(1):28-33.

19. Ivan Y, Ramgopal S, Cardenas-Villa M, et al. Feasibility of the digital retinography system camera in the pediatric emergency department. Pediatr Emerg Care. 2018;34(7):488-491.

20. Ting DSW, Pasquale LR, Peng L, et al. Artificial intelligence and deep learning in ophthalmology. Br J Ophthalmol. 2019;103(2):167-175.

21. Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med. 2018;1:39.

22. Milea D, Najjar RP, Zhubo J, et al. Artificial intelligence to detect papilledema from ocular fundus photographs. N Engl J Med. 2020;382(18):1687-1695.

23. Biousse V, Newman NJ, Najjar RP, et al. Optic disc classification by deep learning versus expert neuro-ophthalmologists. Ann Neurol. 2020;88(4):785-795.

24. Friedman DI, Liu GT, Digre KB. Revised diagnostic criteria for the pseudotumor cerebri syndrome in adults and children. Neurology. 2013;81(13):1159-1165.

25. Milea L, Najjar RP. Classif-Eye: A Semi-automated Image Classification Application; 2020. GitHub repository. github.com/milealeonard/Classif-Eye/

26. Milea D, Singhal S, Najjar RP. Artificial intelligence for detection of optic disc abnormalities. Curr Opin Neurol. 2020;33(1):106-110.

27. Falk T, Mai D, Bensch R, et al. U-Net: deep learning for cell counting, detection, and morphometry. Nat Methods. 2019;16(1):67-70.

28. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. Presented at the International Conference on Learning Representations (ICLR); April 10, 2015; San Diego, CA.

29. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: a large-scale hierarchical image database. Presented at the Conference on Computer Vision and Pattern Recognition (CVPR); June 20–25, 2009; Miami, FL.

30. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436-444.

31. Vanbelle S. Asymptotic variability of (multilevel) multirater kappa coefficients. Stat Methods Med Res. 2019;28(10-11):3012-3026.

32. McHugh ML. Interrater reliability: the kappa statistic. Biochem Med. 2012;22(3):276-282.

33. Friedman DI, Quiros PA, Subramanian PS, et al. Headache in idiopathic intracranial hypertension: findings from the Idiopathic Intracranial Hypertension Treatment Trial. Headache. 2017;57(8):1195-1205.

34. Fisayo A, Bruce BB, Newman NJ, Biousse V. Overdiagnosis of idiopathic intracranial hypertension. Neurology. 2016;86(4):341-350.

35. Radojicic A, Vukovic-Cvetkovic V, Pekmezovic T, Trajkovic G, Zidverc-Trajkovic J, Jensen RH. Predictive role of presenting symptoms and clinical findings in idiopathic intracranial hypertension. J Neurol Sci. 2019;399:89-93.

36. Crum OM, Kilgore KP, Sharma R, et al. Etiology of papilledema in patients in the eye clinic setting. JAMA Netw Open. 2020;3(6):e206625.

37. Onyia CU, Ogunbameru IO, Dada OA, et al. Idiopathic intracranial hypertension: proposal of a stratification strategy for monitoring risk of disease progression. Clin Neurol Neurosurg. 2019;179:35-41.

38. Mollan SP, Davies B, Silver NC, et al. Idiopathic intracranial hypertension: consensus guidelines on management. J Neurol Neurosurg Psychiatry. 2018;89(10):1088-1100.

39. Varma R, Steinmann WC, Scott IU. Expert agreement in evaluating the optic disc for glaucoma. Ophthalmology. 1992;99(2):215-221.

40. Brown JM, Campbell JP, Beers A, et al; Imaging and Informatics in Retinopathy of Prematurity (i-ROP) Research Consortium. Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks. JAMA Ophthalmol. 2018;136(7):803-810.

41. Echegaray S, Zamora G, Yu H, Luo W, Soliz P, Kardon R. Automated analysis of optic nerve images for detection and staging of papilledema. Invest Ophthalmol Vis Sci. 2011;52(10):7470-7478.

42. Irani NK, Bidot S, Peragallo JH, Esper GJ, Newman NJ, Biousse V. Feasibility of a nonmydriatic ocular fundus camera in an outpatient neurology clinic. Neurologist. 2020;25(2):19-23.

43. Moreno-Ajona D, McHugh JA, Hoffmann J. An update on imaging in idiopathic intracranial hypertension. Front Neurol. 2020;11:453.

44. Athappilly G, García-Basterra I, Machado-Miller F, Hedges TR, Mendoza-Santiesteban C, Vuong L. Ganglion cell complex analysis as a potential indicator of early neuronal loss in idiopathic intracranial hypertension. Neuroophthalmology. 2018;43(1):10-17.

45. Medeiros FA, Jammal AA, Thompson AC. From machine to machine. Ophthalmology. 2019;126(4):513-521.

Appendix 1 (continued)

Name Location Contribution

Dan Milea, MD, PhD

Singapore National Eye Centre; Singapore Eye Research Institute; Duke-NUS Medical School, Singapore; Copenhagen University Hospital, Denmark

Designed and conceptualized study, major role in the acquisition of data, interpreted the data, revised the manuscript for intellectual content

Appendix 2 Coinvestigators

Coinvestigators are listed at links.lww.com/WNL/B432.
