03 Details

Details of the idea (Data preparation ~ Machine learning)


written by hyojung chang


3. Data Preparation

In order to predict the area of disease using Faster R-CNN, bounding boxes and labels for the lung disease shown in each CT image are required. Below is a description of how to derive them and compose the annotation file.


3-1) Collect data (CT images and mask images)

Data reference 

Total number of images: 16,614 for cancer, 1,104 for COVID-19, and 588 for nodule (18,306 in total)


3-2) Extract RoI from CT image / Compose annotation file

Here, RoI (region of interest) means the area corresponding to the bounding box. Each RoI must consist of [x0, y0, x1, y1].

Tools

  • labelImg

        When we use labelImg, we mark the RoI directly on the image and get an XML file with the labeled information.


  • Python

        When we use Python, we compute the coordinates of the RoI from the masking information in the mask image and save them in a CSV file. Each row of the file consists of [filename, minX, minY, maxX, maxY, classname].

        The CSV file is required to use torchvision.models.detection.faster_rcnn, the Faster R-CNN module in TorchVision.

        How to extract the RoI from the images is described in detail in the [All for nothing] section. If you are curious, please refer to that note. A rough sketch of the idea is shown after this list.
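
As a rough illustration only (not the project's actual code), deriving one box from a binary mask with NumPy and appending it to the CSV could look like this; the file names and the single-box-per-image assumption are hypothetical.

    # Minimal sketch: turn the non-zero region of a mask image into one
    # [minX, minY, maxX, maxY] box and append a CSV row for it.
    import csv
    import numpy as np
    from PIL import Image

    def mask_to_box(mask_path):
        """Return (minX, minY, maxX, maxY) of the non-zero pixels of a mask."""
        mask = np.array(Image.open(mask_path).convert("L"))
        ys, xs = np.nonzero(mask)          # pixel coordinates of the diseased area
        if len(xs) == 0:
            return None                    # empty mask: no RoI in this image
        return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

    # hypothetical file names, used only for illustration
    with open("annotations.csv", "a", newline="") as f:
        writer = csv.writer(f)
        box = mask_to_box("cancer_0001_mask.png")
        if box is not None:
            writer.writerow(["cancer_0001.png", *box, "cancer"])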



3-3) Convert file format .csv to .json

For the comparison between TorchVision and Detectron2, we converted the CSV file into a JSON file that follows the format required by DatasetCatalog and MetadataCatalog in Detectron2.
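
A minimal sketch of that conversion, assuming the CSV layout from 3-2, a fixed 512 x 512 image size, and an illustrative label-to-id mapping (none of which are confirmed by the project code), could look like this:

    # Minimal sketch: convert CSV annotation rows into Detectron2-style
    # dataset dicts and dump them to a JSON file.
    import csv
    import json

    CLASS_IDS = {"cancer": 0, "covid-19": 1, "nodule": 2}   # assumed label mapping

    records = []
    with open("annotations.csv") as f:
        for image_id, row in enumerate(csv.reader(f)):
            filename, min_x, min_y, max_x, max_y, classname = row
            records.append({
                "file_name": filename,
                "image_id": image_id,
                "height": 512,          # assumed fixed CT image size
                "width": 512,
                "annotations": [{
                    "bbox": [float(min_x), float(min_y), float(max_x), float(max_y)],
                    "bbox_mode": 0,     # BoxMode.XYXY_ABS in detectron2.structures
                    "category_id": CLASS_IDS[classname],
                }],
            })

    with open("annotations.json", "w") as f:
        json.dump(records, f)

At training time, a function that loads this JSON back is registered with DatasetCatalog.register, and the class names are attached through MetadataCatalog.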


4. Machine learning and simulation

4-1) Split data

We split the data into training, validation, and test sets. As a result, there are 10,989 images for training, 3,638 images for validation, and 3,679 images for testing.
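
Those counts correspond roughly to a random 60 / 20 / 20 split of the 18,306 annotated images; the exact criteria are not restated here, but a sketch under that assumed ratio could look like this:

    # Minimal sketch (assumed 60 / 20 / 20 ratio): shuffle the file list once
    # and cut it into training, validation, and test portions.
    import random

    def split_dataset(filenames, seed=0):
        files = list(filenames)
        random.Random(seed).shuffle(files)
        n_train = int(0.6 * len(files))
        n_val = int(0.2 * len(files))
        return files[:n_train], files[n_train:n_train + n_val], files[n_train + n_val:]

    # hypothetical file names, used only for illustration
    train_files, val_files, test_files = split_dataset(f"img_{i:05d}.png" for i in range(18306))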


4-2) Choose method of object detection

To detect lung disease, we compared the object detection methods Faster R-CNN, YOLO, and SSD, and adopted Faster R-CNN, which best fit our goal.


4-3) Choose implementation of Faster R-CNN algorithm based on PyTorch

Before training the model with Faster R-CNN, we compared Detectron2 and TorchVision, both of which are based on PyTorch. The comparison was carried out with the same data set and the same variable settings.
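
As a reference for how these shared settings (listed under each comparison below) can be expressed on the Detectron2 side, a rough sketch follows; the dataset names, JSON file names, iteration count, and class count are illustrative assumptions rather than the project's actual configuration.

    # Minimal sketch: register the JSON annotations from 3-3 and run a
    # Detectron2 Faster R-CNN with a ResNet-50 FPN backbone. The ResNeXt-101
    # variant only swaps the config file
    # ("COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml").
    import json
    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.data import DatasetCatalog, MetadataCatalog
    from detectron2.engine import DefaultTrainer

    for name, path in [("lung_train", "train.json"), ("lung_val", "val.json")]:
        DatasetCatalog.register(name, lambda p=path: json.load(open(p)))
        MetadataCatalog.get(name).set(thing_classes=["cancer", "covid-19", "nodule"])

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
    cfg.DATASETS.TRAIN = ("lung_train",)
    cfg.DATASETS.TEST = ("lung_val",)
    cfg.DATALOADER.NUM_WORKERS = 4       # the number of workers
    cfg.SOLVER.IMS_PER_BATCH = 4         # batch size
    cfg.SOLVER.BASE_LR = 0.005           # learning rate
    cfg.SOLVER.MAX_ITER = 250            # ~10 epochs over the 100-image sample set
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3  # cancer, covid-19, nodule

    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)
    trainer.train()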


4-3-1) Compare with sample data set 1

Data set

  • Sample data set (100 for training, 30 for validation, 30 for test)

Variable setting

  • epoch = 10
  • batch size = 4
  • the number of workers = 4
  • learning rate = 0.005

Accuracy and Total training time

  • Detectron2 (pre-trained model: original ResNet-50): AP = 0.48%, total training time = 2m 37s
  • Detectron2 (pre-trained model: ResNeXt-101-32x8d trained with Caffe2): AP = 0.02%, total training time = 2m 38s
  • TorchVision (pre-trained model: original ResNet-50): AP = 0.76%, total training time = 4m 1s


4-3-2) Compare with sample data set 2

Data set

  • Sample data set (1000 for training, 300 for validation, 300 for test)

Variable setting

  • epoch = 10
  • batch size = 4
  • the number of workers = 4
  • learning rate = 0.005

Accuracy and Total training time

  • Detectron2 (pre-trained model: original ResNet-50): AP = 7.46%, total training time = 1h 25m
  • Detectron2 (pre-trained model: ResNeXt-101-32x8d trained with Caffe2): AP = 1.54%, total training time = 1h 16m
  • TorchVision (pre-trained model: original ResNet-50): AP = 9.97%, total training time = 1h 59m


4-3-3) Compare with our original data set

Data set

  • Our original data set

Variable setting

  • epoch = 3
  • batch size = 4
  • the number of workers = 4
  • learning rate = 0.005

Accuracy and Total training time

  • Detectron2 (pre-trained model: original ResNet-50): AP = 18.58%, total training time = 5h 10m
  • Detectron2 (pre-trained model: ResNeXt-101-32x8d trained with Caffe2): AP = 8.5%, total training time = 4h 47m
  • TorchVision (pre-trained model: original ResNet-50): AP = 19.25%, total training time = 6h 3m


In this comparison, TorchVision had the highest accuracy in every setting, but it also had the longest total training time. After much discussion, TorchVision was finally adopted, because the opinion that model accuracy matters more than training time prevailed.

(Note) The reason we did not use prediction time as a comparison target is that the prediction times are almost the same across implementations.


4-4) Conduct machine learning using TorchVision

We conducted machine learning with the Faster R-CNN implementation provided by TorchVision. To speed up training and evaluation, we used Colab's GPU.

The process of performing machine learning is as follows:

1) Set up the Colab environment

2) Define the custom Dataset according to the structure required by TorchVision

3) Train the model

4) Evaluate the model

5) Save and test the model

Detailed descriptions of this process are posted in the [AI application] section.
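
Purely as a shape reference for steps 2) and 3), and not the project's actual code from [AI application], a minimal TorchVision sketch could look like this (the dummy image, the single box, and the 4-class head including background are illustrative assumptions):

    # Minimal sketch: a TorchVision Faster R-CNN with a ResNet-50 FPN backbone,
    # adapted to 4 classes (background + cancer, covid-19, nodule) and trained
    # for a single step on the GPU if one is available.
    import torch
    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=4)
    model.to(device)

    # The custom Dataset must yield (image, target) pairs where target holds
    # "boxes" as [x0, y0, x1, y1] tensors and "labels" as class indices.
    images = [torch.rand(3, 512, 512, device=device)]
    targets = [{"boxes": torch.tensor([[100., 120., 200., 220.]], device=device),
                "labels": torch.tensor([1], device=device)}]

    optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
    model.train()
    loss_dict = model(images, targets)     # returns a dict of losses in training mode
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()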


4-5) Adjust detailed parameters of machine learning based on validation results

In order to select the optimal model, we adjusted the detailed training parameters and compared the validation results.

  • epoch = 50, batch size = 16, the number of workers = 4, learning rate = 0.01: AP = 38.98%, total training time = 20h 16m
  • epoch = 40, batch size = 16, the number of workers = 4, learning rate = 0.05: AP = 47.32%, total training time = 15h 38m
  • epoch = 30, batch size = 16, the number of workers = 4, learning rate = 0.1: AP = 21.66%, total training time = 12h 51m


In this comparison, the accuracy was highest when the parameters were set to epoch = 40, batch size = 16, the number of workers = 4, and learning rate = 0.05. Therefore, that model was finally selected.

(Note) The reason we did not use prediction time as a comparison target is that the prediction times are almost the same across settings.



5. Link website with trained model

When users access our website and upload their CT images, the site should report whether a lung disease is present and how confident the model is. For this purpose, we linked our website with the trained model.
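
The web framework itself is not described here; a minimal, framework-agnostic sketch of how an uploaded image could be passed through the saved model might look like this (the weight file name, image handling, and score threshold are assumptions):

    # Minimal sketch: load the saved Faster R-CNN weights once, then run the
    # model on an uploaded CT image and keep only confident detections.
    import torch
    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
    from torchvision.transforms import functional as F
    from PIL import Image

    def load_model(weights_path, num_classes=4):
        model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False)
        in_features = model.roi_heads.box_predictor.cls_score.in_features
        model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
        model.load_state_dict(torch.load(weights_path, map_location="cpu"))
        model.eval()
        return model

    def diagnose(model, image_path, score_threshold=0.5):
        image = F.to_tensor(Image.open(image_path).convert("RGB"))
        with torch.no_grad():
            output = model([image])[0]     # boxes, labels, scores for one image
        keep = output["scores"] >= score_threshold
        return output["boxes"][keep], output["labels"][keep], output["scores"][keep]

    # hypothetical file names, used only for illustration
    model = load_model("fasterrcnn_lung.pth")
    boxes, labels, scores = diagnose(model, "uploaded_ct.png")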


6. Summary

The following is the structure of what has been implemented.