Written by Hyojung Chang
3. Data Preparation
In order to predict the diseased area using Faster R-CNN, the bounding boxes and labels of the lung disease shown in each CT image are required. Below is a description of how to derive them and compose the annotation file.
3-1) Collect data (CT images and mask images)
Data reference
Total number of images: 16,614 cancer, 1,104 COVID-19, 588 nodule
3-2) Extract RoI from CT image / Compose annotation file
Here, RoI (region of interest) means the area corresponding to the bounding box. Each RoI must consist of [x0, y0, x1, y1].
Tools
When we use labelImg, we mark the RoI directly on the image and obtain an XML file with the labeled information.
When we use Python, we compute the coordinates of the RoI on the image from the masking information and save them in a CSV file. Each row of the file consists of [filename, minX, minY, maxX, maxY, classname].
The CSV file is required to use torchvision.models.detection.faster_rcnn, a module in TorchVision.
How to extract the RoI from the images is described in detail in the [All for nothing] section. If you are curious, please refer to that note.
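A minimal sketch of the Python approach above, assuming the mask is already loaded as a 2-D NumPy array (e.g. via PIL or OpenCV); the function names are illustrative, not from the project:

```python
import csv
import numpy as np

def mask_to_roi(mask):
    """Return [minX, minY, maxX, maxY] of the nonzero region of a 2-D mask array."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # this mask contains no lesion
    return [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]

def write_annotations(rows, out_csv):
    """rows: iterable of [filename, minX, minY, maxX, maxY, classname] lists."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "minX", "minY", "maxX", "maxY", "classname"])
        writer.writerows(rows)
```

One CSV row is written per lesion, so an image with several lesions appears in several rows.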
3-3) Convert file format from .csv to .json
For the comparison between TorchVision and Detectron2, we converted the CSV file to a JSON file according to the format required by DatasetCatalog and MetadataCatalog in Detectron2.
4. Machine learning and simulation
4-1) Split data
We split the data according to the following criteria. As a result, there are 10,989 images for training, 3,638 images for validation, and 3,679 images for testing.
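The split itself can be sketched as a shuffled 60/20/20 partition; the exact criteria used in the project may differ (e.g. splitting at patient level), and the fractions here simply reproduce the rough ratio of 10,989 / 3,638 / 3,679:

```python
import random

def split_dataset(filenames, seed=42, train_frac=0.6, val_frac=0.2):
    """Shuffle and split file names into train/validation/test lists."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    names = list(filenames)
    rng.shuffle(names)
    n_train = int(len(names) * train_frac)
    n_val = int(len(names) * val_frac)
    return (names[:n_train],
            names[n_train:n_train + n_val],
            names[n_train + n_val:])
```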
4-2) Choose method of object detection
To detect lung disease, we compared the object detection methods Faster R-CNN, YOLO, and SSD, and adopted Faster R-CNN, which best fits our goal.
4-3) Choose implementation of Faster R-CNN algorithm based on PyTorch
Before training a model with Faster R-CNN, we compared Detectron2 and TorchVision, both based on PyTorch. The comparison used the same data set and variable settings.
4-3-1) Compare with sample data set 1
Data set
Variable setting
Accuracy and Total training time
Detectron2 with pre-trained original ResNet-50: AP = 0.48%, total training time = 2m 37s
Detectron2 with pre-trained ResNeXt-101-32x8d (trained with Caffe2): AP = 0.02%, total training time = 2m 38s
TorchVision with pre-trained original ResNet-50: AP = 0.76%, total training time = 4m 1s
4-3-2) Compare with sample data set 2
Data set
Variable setting
Accuracy and Total training time
Detectron2 with pre-trained original ResNet-50: AP = 7.46%, total training time = 1h 25m
Detectron2 with pre-trained ResNeXt-101-32x8d (trained with Caffe2): AP = 1.54%, total training time = 1h 16m
TorchVision with pre-trained original ResNet-50: AP = 9.97%, total training time = 1h 59m
4-3-3) Compare with our original data set
Data set
Variable setting
Accuracy and Total training time
Detectron2 with pre-trained original ResNet-50: AP = 18.58%, total training time = 5h 10m
Detectron2 with pre-trained ResNeXt-101-32x8d (trained with Caffe2): AP = 8.5%, total training time = 4h 47m
TorchVision with pre-trained original ResNet-50: AP = 19.25%, total training time = 6h 3m
In every comparison, TorchVision showed the highest accuracy, but its total training time was the longest. After much discussion, TorchVision was finally adopted, because the view that model accuracy matters more than training time prevailed.
(Note) We did not include prediction time as a comparison target because the prediction times were almost identical.
4-4) Conduct machine learning using TorchVision
We conducted machine learning through the Faster R-CNN library provided by TorchVision. To speed up training and evaluation, we used Colab's GPU.
The process of performing machine learning is as follows:
1) Set up the Colab environment
2) Define the custom Dataset according to the structure required by TorchVision
3) Train the model
4) Evaluate the model
5) Save and test the model
Detailed descriptions of this content are posted in the [AI application] section.
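Step 2 above (the custom Dataset) can be sketched as follows. The record structure with pre-loaded image tensors is an illustrative assumption; the actual Dataset would read the CT images and the annotation CSV from disk:

```python
import torch
from torch.utils.data import Dataset

class CTDataset(Dataset):
    """Returns (image, target) pairs in the structure TorchVision's
    Faster R-CNN expects: boxes as float32 [x0, y0, x1, y1], int64 labels."""

    def __init__(self, records):
        # records: list of dicts like
        # {"image": FloatTensor(C, H, W), "boxes": [[x0, y0, x1, y1], ...], "labels": [...]}
        self.records = records

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]
        target = {
            "boxes": torch.as_tensor(rec["boxes"], dtype=torch.float32),
            "labels": torch.as_tensor(rec["labels"], dtype=torch.int64),
            "image_id": torch.tensor([idx]),
        }
        return rec["image"], target
```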
4-5) Adjust detailed parameters of machine learning based on validation results
In order to select the optimal model, we adjusted the detailed parameters of machine learning and compared the validation results.
epoch = 50, batch size = 16, number of workers = 4, learning rate = 0.01: AP = 38.98%, total training time = 20h 16m
epoch = 40, batch size = 16, number of workers = 4, learning rate = 0.05: AP = 47.32%, total training time = 15h 38m
epoch = 30, batch size = 16, number of workers = 4, learning rate = 0.1: AP = 21.66%, total training time = 12h 51m
In this comparison, accuracy was highest when the parameters were set to epoch = 40, batch size = 16, number of workers = 4, and learning rate = 0.05. Therefore, that model was finally selected.
(Note) We did not include prediction time as a comparison target because the prediction times were almost identical.
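A sketch of how the selected setting could be wired up. Only epoch, batch size, worker count, and learning rate are stated above, so the optimizer type (SGD with momentum) and its momentum/weight-decay values are assumptions:

```python
import torch
from torch.utils.data import DataLoader

# The finally selected hyperparameters
NUM_EPOCHS = 40
BATCH_SIZE = 16
NUM_WORKERS = 4
LEARNING_RATE = 0.05

def make_optimizer(model):
    params = [p for p in model.parameters() if p.requires_grad]
    # SGD with momentum is a common choice for Faster R-CNN (an assumption here)
    return torch.optim.SGD(params, lr=LEARNING_RATE,
                           momentum=0.9, weight_decay=5e-4)

def make_loader(dataset):
    # Detection targets vary in size per image, so batches are collated as tuples
    return DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True,
                      num_workers=NUM_WORKERS,
                      collate_fn=lambda batch: tuple(zip(*batch)))
```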
5. Link website with trained model
When users access our website and upload their CT images, they should receive a diagnosis showing the presence of lung disease along with the model's confidence. For this purpose, we linked our website with the trained model.
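On the server side, the uploaded image can be run through the trained model roughly as follows; the class list, score threshold, and helper name are illustrative, not taken from the project:

```python
import torch

# Illustrative label mapping; index 0 is TorchVision's implicit background class
CLASSES = ["background", "cancer", "covid-19", "nodule"]

@torch.no_grad()
def diagnose(model, image_tensor, score_threshold=0.5):
    """Run the trained detector on one CT image tensor (C, H, W) and return
    (classname, score, box) tuples for detections above the threshold."""
    model.eval()
    outputs = model([image_tensor])[0]
    results = []
    for box, label, score in zip(outputs["boxes"], outputs["labels"], outputs["scores"]):
        if score >= score_threshold:
            results.append((CLASSES[int(label)], float(score), box.tolist()))
    return results
```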
6. Summary
The following is the structure of what has been implemented.