Image Synthesis and Anomaly Detection on Chest Radiographs using Generative Adversarial Networks

Deep learning, one of the artificial intelligence technologies that has been in the spotlight recently, has shown promising results in several computer vision tasks. Despite their promising results, generative adversarial networks (GANs) have been less explored in the medical field. In the first topic (a), we proposed a three-step method that used a deep learning-based classification network (classifier).

In the second topic (b), we explored and discovered semantic representations of predefined lung disease patterns in the latent space of PGGAN. In the third topic (c), we proposed an unsupervised anomaly detection system for chest radiograph (CXR) images using PGGAN. In the evaluation, we demonstrated that aberrant patterns in CXR images can be detected sensitively without the need for disease annotations.

Meanwhile, the generator learns the distribution of the training data and meaningful representations of the dataset. Previous research on decomposing semantic feature representations in the latent space of GANs achieved conditional manipulation of a given synthetic image [25,27-32]. The main concept of anomaly detection is that abnormal samples can be distinguished in the latent space of a GAN that is trained only with normal samples.

Datasets

High-resolution image synthesis, supervised image synthesis (image manipulation), and anomaly detection in CXR images can each be exploited to address existing problems in supervised learning on medical datasets, such as patient privacy, unbalanced datasets, and insufficient annotations. Each case was used to generate synthetic CXR images with and without lung abnormalities. A total of 111,163 CXR images were used for the high-resolution image synthesis and controllable image synthesis experiments: 20,000 normal cases, subsampled from 72,938, and 91,163 abnormal cases.

The CXR scans were converted to 8-bit Portable Network Graphics (PNG) format and 99th percentile normalization was performed for all converted images. The CXR images were collected from two medical centers, AMC, Seoul, South Korea, and SNUBH, Bundang, South Korea. The dataset consists of 6,069 normal CXR scans and 3,417 CXR scans of the patients at AMC, including and 331 cases with nodule[s], consolidation, interstitial opacities, pleural effusion, and pneumothorax, respectively.
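The conversion step can be illustrated with a minimal sketch. It reads "99th percentile normalization" as clipping intensities at the 99th percentile and rescaling to the 8-bit range; the use of the image minimum as the lower bound, the nearest-rank percentile rule, and the function name are assumptions, since the text does not specify them.

```python
import math


def percentile_normalize_to_8bit(pixels, percentile=99.0):
    """Clip a flat list of intensities at a percentile and rescale to 0..255.

    Assumption: the lower bound is the minimum intensity; the source only
    states that 99th percentile normalization was applied.
    """
    if not pixels:
        return []
    ordered = sorted(pixels)
    # Nearest-rank percentile: the value at rank ceil(p / 100 * n).
    rank = max(1, math.ceil(percentile / 100.0 * len(ordered)))
    upper = ordered[rank - 1]
    lower = ordered[0]
    if upper == lower:
        # Constant image: nothing to rescale.
        return [0 for _ in pixels]
    scale = 255.0 / (upper - lower)
    # Values above the percentile are clipped before rescaling.
    return [int(round((min(p, upper) - lower) * scale)) for p in pixels]
```

Clipping before rescaling keeps a few very bright pixels, such as metal implants or burned-in markers, from compressing the rest of the intensity range.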

In addition, abnormal cases with pleural effusion and pneumothorax were determined by consensus of two thoracic radiologists with the corresponding chest CT images. For the experiment of high-fidelity image synthesis and its evaluation, we divided all the data into two categories of normal and abnormal. Therefore, there were 7,104 normal CXR images from 7,104 healthy subjects and 10,234 abnormal CXR images from 7,821 patients when multilabeled cases were counted.

On the other hand, for the supervised image synthesis experiment, the normal cases and the five above-mentioned lung disease classes were used, giving a 6-class classification problem.

Figure 1. A flowchart of dataset selection for GAN training

Evaluating GAN generator

To train a CNN classifier, we chose the ResNet-50 architecture [33], which is based on the deep residual network (ResNet) [34], because it performs well and has often been used to solve real-world classification problems. Two classifiers were trained, one on real images and one on synthetic images, with the same strategy, including the identical architecture [33] and hyperparameters. For a fair comparison, the same number of images and the same ratio of normal to abnormal cases were used.

Finally, the evaluation was performed using the real CXR images for both classifiers trained on real and synthetic datasets. To begin with, a classifier was trained using the real data set of normal and abnormal CXR images. We then created pseudo-labels for the PGGAN-generated images according to the classification results of a classifier trained on a real dataset, i.e.

We set probability thresholds of higher than 0.7 and lower than 0.3 for labeling normal and abnormal samples, respectively. Finally, both classifiers (real and synthetic) were evaluated on a real test set and their predictions were compared.
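The thresholding rule can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the score is the classifier's predicted probability for the first-named class (normal, following the order of the sentence above) and that samples between the two thresholds are left unlabeled. The function and parameter names are hypothetical.

```python
def pseudo_label(prob, high=0.7, low=0.3,
                 label_if_high="normal", label_if_low="abnormal"):
    """Assign a pseudo-label to one generated image, or None if uncertain.

    Assumption: `prob` is the predicted probability of the class named by
    `label_if_high`. Swap the two label arguments if the classifier outputs
    the probability of the abnormal class instead.
    """
    if prob > high:
        return label_if_high
    if prob < low:
        return label_if_low
    # Scores in [low, high] are treated as ambiguous and left unlabeled.
    return None


def build_pseudo_labeled_set(probs):
    """Keep only the (index, label) pairs that received a confident label."""
    labeled = []
    for index, prob in enumerate(probs):
        label = pseudo_label(prob)
        if label is not None:
            labeled.append((index, label))
    return labeled
```

Leaving the middle band unlabeled trades dataset size for label confidence, which matters when the pseudo-labels are later used to train a second classifier.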

Disentanglement of the latent space in GAN

The regression slopes for each disease class are considered the axes of the disease models. PGGAN was chosen to generate synthetic CXR images because it reconstructed the global structure and fine details at high resolution better than other GAN variants [31,36,37]. PGGAN was intended to learn meaningful feature representations of the pulmonary disease patterns in the real CXR images during training.

After training, the generator network of PGGAN can generate synthetic CXR images with random disease patterns from a 512-dimensional random noise vector. We used a high-performance multi-label CNN classifier to obtain lung disease classification outputs for the synthetic CXR images. The base model architecture is ResNet-50, with the last softmax layer replaced by six sigmoid layers to handle CXR images with multiple disease patterns, that is, a multi-label classification problem.
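The difference between a softmax output and six independent sigmoid outputs can be shown with a small sketch. It operates on plain logits rather than a real ResNet-50, and the 0.5 decision threshold and the class ordering are assumptions made for illustration.

```python
import math

# Class names as listed in the text; the ordering is an assumption.
CLASS_NAMES = ["normal", "nodule", "consolidation",
               "interstitial opacity", "pleural effusion", "pneumothorax"]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def multi_label_predict(logits, threshold=0.5):
    """Map six logits to the classes whose sigmoid probability passes the threshold.

    Each class is scored independently, so several disease patterns can be
    active for one image, which a softmax output would not allow.
    """
    if len(logits) != len(CLASS_NAMES):
        raise ValueError("expected one logit per class")
    probs = [sigmoid(v) for v in logits]
    return {name: p for name, p in zip(CLASS_NAMES, probs) if p >= threshold}
```

For example, `multi_label_predict([-3.0, 2.0, -1.0, -4.0, 1.5, -2.0])` flags both "nodule" and "pleural effusion" for the same image.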

The classifier can simultaneously detect and classify five disease patterns (nodules, consolidation, interstitial opacity, pleural effusion, and pneumothorax) on CXR images. The 30,000 generated synthetic CXR images were given to the classifier and classified into six classes (five disease patterns and normal), with outputs encoded as numbers between 0 and 5. In this way, the CNN classifier predicts lung abnormalities in the randomly generated CXR images across the normal class and the five abnormal classes.

Feature representations of disease patterns can be encoded into the latent vector of PGGAN. With regression of the latent vectors against the classification results of the synthetic CXR images, the disentangled feature axis of each disease pattern can be detected. Also, if PGGAN is well trained, it generates synthetic images of different diseases that reflect the incidence distribution of each disease pattern in the training dataset.
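The regression idea can be sketched with ordinary least squares: regress the classifier outputs on the latent vectors, take the slope vector for each class as its axis, and shift a latent vector along that axis to manipulate the image. This is a minimal sketch under stated assumptions (a linear model with an intercept, unit-normalized axes, and hypothetical function names); the text does not specify the exact regression used.

```python
import numpy as np


def fit_disease_axes(latents, scores):
    """Fit one linear regression per class and return unit-norm slope vectors.

    latents: array of shape (n_samples, latent_dim), e.g. latent_dim = 512.
    scores:  array of shape (n_samples, n_classes) of classifier outputs.
    Returns an array of shape (n_classes, latent_dim).
    """
    n = latents.shape[0]
    # Append a column of ones so the fit includes an intercept term.
    design = np.hstack([latents, np.ones((n, 1))])
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    slopes = coef[:-1].T  # drop the intercept row, one slope vector per class
    norms = np.linalg.norm(slopes, axis=1, keepdims=True)
    return slopes / np.where(norms == 0, 1.0, norms)


def move_along_axis(z, axis, step):
    """Shift a latent vector along a disease axis by `step` units."""
    return z + step * axis
```

Feeding `move_along_axis(z, axes[k], step)` to the trained generator would then be expected to strengthen (positive step) or weaken (negative step) disease pattern `k` in the synthesized image.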

To reduce the natural bias of the regression result derived from the imbalance of the data, we added weights to the rare disease pattern classes according to the ratio of the number of images per class.
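One common way to realize such weighting is inverse class frequency, sketched below. Whether the authors used exactly this ratio is an assumption; the function name is hypothetical, and the class counts in the usage note are made up for illustration.

```python
def inverse_frequency_weights(class_counts):
    """Weight each class by (largest class count) / (its own count).

    The most frequent class gets weight 1.0 and rarer classes get
    proportionally larger weights.
    """
    if not class_counts:
        raise ValueError("class_counts must not be empty")
    if any(count <= 0 for count in class_counts.values()):
        raise ValueError("every class needs at least one image")
    largest = max(class_counts.values())
    return {name: largest / count for name, count in class_counts.items()}
```

With illustrative counts such as `{"consolidation": 100, "pneumothorax": 25}`, the rarer class receives a weight of 4.0, so each of its samples counts four times as much in the regression loss.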

Figure 4. An overview of the experimental setting for disentangling the latent space of PGGAN and manipulating the synthetic images.

Anomaly detection system with GAN

Evaluation of image fidelity of PGGAN-generated data

ROC curve and AUROC score of the (real) classifier on the real test set. ROC curve and AUROC score of the (synthetic) classifier on the synthetic test set. ROC curve and AUROC score of the (synthetic) classifier on the real test set.

Figure 7. ROC curve with AUROC score for the classifier (real) on the real test set.
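The AUROC summarized in these ROC figures equals the probability that a randomly chosen abnormal case receives a higher score than a randomly chosen normal case, with ties counted as one half. The sketch below computes it directly from that definition; it is an illustrative implementation with a hypothetical function name, not the evaluation code used in the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve from binary labels (1 = abnormal) and scores.

    Uses the pairwise definition: the fraction of (abnormal, normal) pairs
    in which the abnormal case has the higher score, counting ties as 0.5.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

The pairwise form is quadratic in the number of cases, which is fine for a sketch; a rank-based computation gives the same value more efficiently on large test sets.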

Disentangled pulmonary representations in PGGAN

Anomaly detection with PGGAN

One possible explanation for the results could be that the CXR images generated by the GAN are so realistic that even a classifier trained on the synthetic dataset can learn anatomical variations with lung abnormalities comparable to those on the real dataset.

Visual comparison of abnormal case prediction results using unsupervised and supervised methods in anomaly detection. Visual comparison of normal case prediction results using unsupervised and supervised anomaly detection methods.

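In this study, we investigated GANs, in particular PGGAN, in the tasks of evaluating the GAN generator, controlling image synthesis, and detecting anomalies in CXR images. In addition, the disentanglement of the latent space of PGGAN can be used to generate data for rare disease patterns for which data are insufficient. Deep learning, one of the artificial intelligence technologies that has recently been in the spotlight, has shown promising results in numerous computer vision tasks.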

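However, in the medical field it is difficult to build high-quality datasets. GANs can learn to generate plausible new cases from diverse data. GANs have demonstrated their potential in a variety of tasks, including domain matching, super-resolution, image-to-image translation, image style transfer, and anomaly detection.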

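Despite the promising results of GANs, only a few studies have been conducted in the medical field. In this study, we use the progressive growing generative adversarial network (PGGAN), specifically in the field of medical imaging. We propose several unsupervised learning methods for application to chest X-ray (CXR) images.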

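In steps 1 and 2, we compare the performance of two classifiers trained separately on real and synthetic CXR images. In step 3, we evaluate binary classification (normal or abnormal) performance on the same test dataset, which consists of real CXR images. The synthetic CXR images generated by PGGAN were shown to preserve radiological information comparable to that of real images. In the second topic (b), we explore and discover semantic representations of predefined lung disease patterns in the latent space of PGGAN.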

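This was carried out qualitatively and quantitatively, respectively, through visual scoring by expert radiologists and through an index computed with the classifier presented in (a). In the third topic (c), PGGAN is again the model used.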