Detect anomalies in Chest X-Ray scans using Artificial Intelligence - Digital Solutions, IT Services & Consulting - Payoda

Artificial neural networks have revolutionized image processing in a variety of domains, and medical imaging is one of them. In this blog, we’ll look at how to create an application that works with Chest X-Ray (CXR) images. At the end, you’ll find working code to experiment with.

The use case is to identify X-ray images that contain an anomaly. This is a classification problem in which we will deal with two classes: ‘anomaly’ and ‘normal.’

Which anomaly are we trying to spot?

We will determine whether “effusion” is present in the lungs. Because the term is specific to the medical domain, it may be unfamiliar. Pleural effusion is a condition in which an abnormal amount of fluid accumulates between the lungs and the chest wall. It can be caused by a variety of conditions ranging from mild to severe, such as tuberculosis and pneumonia. According to the American Thoracic Society, approximately 1 million cases of pleural effusion are diagnosed in the United States each year.

The methodology discussed in this blog can also be used to detect a variety of other illnesses that show up on X-ray scans. For the purposes of this demonstration, we will deal with “effusion” in particular.

What do these anomalies look like?

The image below depicts a chest X-ray scan. The two large dark regions are the right and left lungs, and the white semi-triangular area at the lower corner of the left lung is the heart. Note the white area at the bottom of the right lung, which is caused by an excess accumulation of fluid; this is the effusion condition.

What are the advantages of using AI?

  1. Faster diagnosis: For example, in the case of tuberculosis diagnosis, the results typically arrive in 3 days. An AI application, by contrast, returns its result almost immediately.
  2. Increased accuracy: The AI model has narrow intelligence. Given a large amount of data, its only task is to learn to distinguish between anomalous and normal images. On this narrow task, such models have been reported to outperform the average professional.
  3. Pre-diagnosis of X-rays: The results suggested by AI applications can be used as a pre-diagnosis step, allowing a doctor or medical worker to proceed with the next step in diagnosis or report creation.

Application Development

This is a computer vision application whose core is a deep learning algorithm that classifies images. Deep learning is a subset of machine learning in which the machine (or algorithm) learns to approximate the true result given sufficient data and training. Depending on the problem at hand, there are numerous machine learning algorithms and methods to choose from.

We’re working on a binary classification problem, which calls for supervised learning. In this approach, the learning algorithm is fed labeled data, that is, input features paired with a target class, and produces a trained model that can be used for prediction. Because the input data is an image, convolutional neural networks are an excellent choice; more specifically, we will use a residual network (ResNet) architecture. Without going into too much technical detail, let’s use a simplified block diagram to explain the application development process.

The three high-level stages of application development are data preprocessing, model building and evaluation, and model deployment and serving. Before moving on to the processes, we’ll take a quick look at the data’s composition.

Explore Dataset

We have two types of Chest X-Ray (CXR) images, which are organized into two folders: ‘effusion’ and ‘nofinding.’ Let’s look at one sample from each of the two folders and note the parameters. The dataset and Python code are available in the additional resources section below. In the left image, fluid has accumulated around one of the lungs, which is why the dark region has shrunk.

Class-0 — effusion, Class-1 — nofinding, Image Dimension — 1080x1080, Number of Channels — 1


The image is represented as a matrix when we read it into our application. The image has a resolution of 1080x1080 pixels, and each pixel is one data point, so 1,166,400 (1080x1080) data points are needed to represent a single image. Thankfully, the X-ray images are single-channel grayscale. An RGB image would have three channels, resulting in a threefold increase in data size.
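
As a quick check of the arithmetic above, the sketch below builds blank arrays with the stated dimensions. The zero-filled arrays are stand-ins for real scans, not data from the dataset.

```python
import numpy as np

# Stand-in for a single-channel 1080x1080 scan (all zeros, not real data).
gray = np.zeros((1080, 1080), dtype=np.uint8)
# The same resolution with three colour channels, for comparison.
rgb = np.zeros((1080, 1080, 3), dtype=np.uint8)

print(gray.shape, gray.size)  # (1080, 1080) 1166400
print(rgb.size // gray.size)  # 3
```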

It is not necessary to retain the original resolution of the image. It can be scaled down to speed up model training. We must, however, proceed with caution before deciding to rescale the image. For example, in a scan used to diagnose a brain tumor, clot, or lung nodule, the features are tiny, and the detail may be lost in the process of scaling down. We will rescale the image to 256x256 pixels because the features on a chest X-ray scan are spread over a wide area. Rescaling is also useful for standardizing image sizes.
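
The rescaling step could be sketched roughly as follows. This is a minimal nearest-neighbour downscale written in plain NumPy so that it stays self-contained; the function name `resize_nearest` and the random stand-in image are assumptions for illustration, and in practice an interpolating resize routine from an image library is usually a better choice.

```python
import numpy as np

def resize_nearest(image, out_h=256, out_w=256):
    """Downscale a 2-D image by nearest-neighbour sampling."""
    in_h, in_w = image.shape
    # Map each output row/column back to a source row/column.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols]

# Stand-in scan: random values in place of a real X-ray.
rng = np.random.default_rng(0)
scan = rng.integers(0, 256, size=(1080, 1080)).astype(np.uint8)
small = resize_nearest(scan)
print(small.shape)  # (256, 256)
```

Nearest-neighbour sampling simply drops pixels, so it is the crudest option; it is used here only because it makes the size reduction easy to follow.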


Often, the amount of data available is insufficient to perform the classification task adequately. In these cases, we augment the data. Image data can be augmented in a variety of ways, such as flipping, rotation, cropping, translation, changes in illumination, scaling, and adding noise. Training on these augmented images exposes the model to such variations, which can improve its accuracy.

The choice of augmentation techniques should be intuitive, taking into account the variations in which the image is likely to appear in real-world use. We should especially avoid augmentations that are inappropriate for our dataset. We have some specific constraints for the CXR data. A vertical flip must be avoided because CXR images have a natural top-to-bottom orientation. We should not center-crop CXR images because the anomaly may lie outside the cropped portion of the image. Although the lungs are not symmetric, a horizontal flip is acceptable.
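
A minimal sketch of augmentation under these constraints might look like the following. It applies only a random horizontal flip and a small horizontal shift, and never a vertical flip or a crop. The function name `augment`, the flip probability of 0.5, and the `max_shift` value are illustrative assumptions, not settings taken from the original code.

```python
import numpy as np

def augment(image, rng, max_shift=10):
    """Return a randomly flipped and horizontally shifted copy of a 2-D image."""
    out = image.copy()
    # Horizontal (left-right) flip with probability 0.5; never a vertical flip.
    if rng.random() < 0.5:
        out = np.fliplr(out)
    # Small horizontal translation; vacated columns are filled with zeros.
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.zeros_like(out)
    if shift > 0:
        shifted[:, shift:] = out[:, :-shift]
    elif shift < 0:
        shifted[:, :shift] = out[:, -shift:]
    else:
        shifted = out
    return shifted

rng = np.random.default_rng(42)
scan = rng.integers(0, 256, size=(256, 256)).astype(np.uint8)  # stand-in image
print(augment(scan, rng).shape)  # (256, 256)
```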


Normalization is an important step in preprocessing because it allows us to standardize the images across the dataset. Ideally, we should account for variations in images such as contrast and lighting conditions, as well as different machine settings when taking images. Normalized images also allow for much better gradient propagation.

Because image pixels naturally range from 0 to 255, dividing the pixel values by 255 squashes them into the range 0 to 1, which is the preferred range for model training. However, instead of simply dividing by 255, we should normalize based on the image’s actual minimum and maximum values. Dividing by 255 distorts the data because X-ray scans are non-natural images whose pixel values usually occupy a specific range (e.g., some MRI regions have pixels that never reach 255, so 255 is an arbitrary divisor).
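
The difference between the two approaches can be seen in a small sketch. The helper name `min_max_normalize`, the `eps` guard, and the four-pixel example image are assumptions for illustration.

```python
import numpy as np

def min_max_normalize(image, eps=1e-8):
    """Scale pixel values to [0, 1] using the image's own min and max."""
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    # eps guards against division by zero for a constant image.
    return (image - lo) / (hi - lo + eps)

# Tiny made-up image whose brightest pixel is 180, not 255.
scan = np.array([[30, 60], [90, 180]], dtype=np.uint8)

normalized = min_max_normalize(scan)
divided = scan / 255.0
print(normalized.min(), normalized.max())  # spans roughly the full 0..1 range
print(divided.min(), divided.max())        # stays well inside 0..1
```

With min-max scaling the image uses the whole target range, whereas dividing by 255 leaves the top of the range unused for this image.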

Architect Model

The residual network architecture is a significant step forward in overcoming the vanishing gradient problem in deep neural networks. Before it, a network’s accuracy would begin to deteriorate with each additional layer once a certain depth was reached. The introduction of residual connections addresses this problem. For now, let’s return to model building and leave the rest of the story for another day.

We will use a residual network architecture that differs slightly from the original ResNet architecture and was proposed in a paper published on arXiv. We used an 18-layer architecture. The input layer is a convolutional layer that takes a 256x256x1 image. Each residual block consists of a 256x256x64 convolutional layer followed by batch normalization and ReLU activation, then a set of two weight (convolutional) layers whose output is added to the residual from the previous layer via a skip connection. Finally, the output of the final residual block is flattened and sent to the last layer, a dense layer that uses a softmax activation function to produce a probability for each of the two image classes. The code contains more detailed information about the architecture.
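
The idea of a skip connection can be illustrated with a deliberately simplified sketch. This is not the architecture described above: plain matrix multiplications stand in for the convolutional layers, batch normalization is omitted, and the function and variable names are assumptions. It only shows how the block’s input is added back to the output of the two weight layers.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two weight layers with a ReLU between."""
    out = relu(x @ w1)    # first weight layer followed by activation
    out = out @ w2        # second weight layer
    return relu(out + x)  # skip connection adds the block's input back

x = np.array([[1.0, 2.0, 3.0]])
zeros = np.zeros((3, 3))
# With all-zero weights F(x) = 0, so the non-negative input passes straight through.
print(residual_block(x, zeros, zeros))  # [[1. 2. 3.]]
```

Because the block only has to learn a correction on top of its input, adding more blocks does not force the signal through ever more transformations, which is the intuition behind training deeper networks this way.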

Following the construction of the network, we must compile it with an appropriate loss function, optimization algorithm, and evaluation metric. In this case, our first choices are categorical cross-entropy loss, the Adam optimizer, and accuracy as the metric. These can be changed during the model-building process.

Model Training

In most cases, especially with image data, model training is done in batches. Aside from the memory benefits, batch training is also important in creating an efficient model. To train the model, we use Python’s generator functionality to fetch a limited amount of data at a time. The batch size should be determined before training starts.
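
A minimal sketch of such a generator is shown below. The function name `batch_generator` and the file names are hypothetical; a real pipeline would load and preprocess each image inside the loop instead of yielding paths.

```python
def batch_generator(items, batch_size):
    """Yield consecutive batches of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical file names standing in for the training images.
paths = [f"scan_{i}.png" for i in range(10)]
for batch in batch_generator(paths, batch_size=4):
    print(len(batch))  # prints 4, 4, then 2
```

Because the generator yields one batch at a time, only that batch needs to be held in memory.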

Ablation Run

The ablation experiment is the first step in model training. It is done to ensure that our code is functioning properly. Ablation runs follow the fail-fast principle: we systematically modify certain parts of the input and check that the output changes accordingly. We fit our model on a small portion of the training data and fix obvious model failures early, which reduces the number of times the model has to be re-architected and re-trained. For example, if the model is unable to overfit a small subset of the data, it is unlikely that it will learn from the full dataset.
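
The “can it overfit a tiny subset?” check can be sketched as follows. To keep the example self-contained, a small logistic-regression model trained with plain gradient descent stands in for the residual network, and the eight samples are synthetic. The function name, learning rate, and step count are all assumptions for illustration.

```python
import numpy as np

def train_logistic(X, y, lr=0.5, steps=2000):
    """Fit a logistic-regression stand-in model by plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of class 1
        grad = p - y                             # gradient of the log loss w.r.t. the logits
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

rng = np.random.default_rng(0)
X_small = rng.normal(size=(8, 16))            # 8 synthetic samples, 16 features each
y_small = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # arbitrary labels to memorise

w, b = train_logistic(X_small, y_small)
train_acc = np.mean(((X_small @ w + b) > 0) == (y_small == 1))
print(f"training accuracy on the tiny subset: {train_acc:.2f}")
```

If the training accuracy on such a tiny subset stays well below 100%, the likely culprit is a bug in the data pipeline, the labels, or the training loop rather than a lack of data.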

Model Evaluation

Accuracy is the most commonly used evaluation metric in machine learning. However, relying on accuracy alone in classification problems is risky because it can be misleading. In our case, the prevalence ratio between the two classes is 10:1. That is, we have 10 times as much data in the ‘nofinding’ class as in the ‘effusion’ class. Even if the model predicts every image as ‘nofinding,’ it will exceed 90% accuracy (10 out of every 11 images are correct), though it will never correctly identify an ‘effusion’ image.
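
The pitfall is easy to reproduce with made-up numbers. In the sketch below, the class counts are invented to give a 10:1 ratio, and the “model” is a constant predictor that always answers ‘nofinding.’

```python
import numpy as np

# Labels follow the article's convention: 0 = effusion, 1 = nofinding.
# The counts are made up to give a 10:1 ratio.
y_true = np.array([1] * 1000 + [0] * 100)
y_pred = np.ones_like(y_true)  # a "model" that always predicts nofinding

accuracy = np.mean(y_pred == y_true)
# Recall for effusion: the share of true effusion cases that were caught.
effusion = y_true == 0
recall = np.mean(y_pred[effusion] == 0)

print(f"accuracy={accuracy:.3f}, effusion recall={recall:.3f}")
# accuracy=0.909, effusion recall=0.000
```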

When selecting a metric for medical images with a prevalence problem, we prioritize recall and precision over accuracy. Recall matters most here because we don’t want to overlook any effusion cases. A weighted cross-entropy loss is a common solution to the low prevalence rate problem. The loss is modified so that misclassifications of the low-prevalence class are penalized more severely than misclassifications of the other class.
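
One way to write such a loss is sketched below in plain NumPy. The function name and the class weights of 10 and 1 are assumptions chosen to mirror the 10:1 prevalence ratio; they are not values taken from the original code.

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Mean cross-entropy where each sample is scaled by its class weight."""
    eps = 1e-12  # avoids log(0)
    # Probability the model assigned to each sample's true class.
    p_true = probs[np.arange(len(labels)), labels]
    return float(np.mean(class_weights[labels] * -np.log(p_true + eps)))

# Columns are [P(effusion), P(nofinding)]; labels: 0 = effusion, 1 = nofinding.
probs = np.array([[0.2, 0.8],   # an effusion case the model gets wrong
                  [0.2, 0.8]])  # a nofinding case the model gets right
labels = np.array([0, 1])
weights = np.array([10.0, 1.0])  # assumed weights, roughly the inverse of prevalence

unweighted = weighted_cross_entropy(probs, labels, np.array([1.0, 1.0]))
weighted = weighted_cross_entropy(probs, labels, weights)
print(unweighted, weighted)
```

With the weights applied, the missed effusion case dominates the loss, so the optimizer is pushed harder to fix it. Some implementations divide by the sum of the weights instead of the number of samples; which one to use is a design choice.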

Final Thoughts

This is an era of artificial intelligence. Data is exploding, and we need AI to guide us through it. Especially in light of the current healthcare crisis, we must harness the power of AI to improve diagnostic accuracy, classify the severity of conditions, and discover new drugs and their side effects. In fact, experiments are being carried out with chest X-ray scans to diagnose COVID-19 cases. By adopting this technical innovation, we could bring about a change.

Additional Resources
