Written by Hyojung Chang
We will run an object detection experiment with the Faster R-CNN implementation provided by Torchvision. To speed up training and evaluation, we use Colab's GPU.
1. Set up the Colab environment
1) First, we need to enable GPUs for the notebook
Navigate to Edit → Notebook Settings and select GPU from the Hardware Accelerator drop-down.
2) Install some requirements for torchvision
%%shell
pip install cython
# Install pycocotools, replacing the version installed by default in Colab
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
%%shell
# Download the TorchVision repo to use some files from references/detection
git clone https://github.com/pytorch/vision.git
cd vision
git checkout v0.3.0
cp references/detection/utils.py ../
cp references/detection/transforms.py ../
cp references/detection/coco_eval.py ../
cp references/detection/engine.py ../
cp references/detection/coco_utils.py ../
Now you can do all the necessary imports:
import os
import numpy as np
import torch
import torch.utils.data
from PIL import Image
import pandas as pd
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from engine import train_one_epoch, evaluate
import utils
import transforms as T
When using PyTorch on Colab, a bug related to the installed numpy version may occur. To prevent this, downgrade numpy:
pip install numpy==1.17.4
3) Mount Google Drive in Colab
from google.colab import drive
drive.mount('/content/drive')
Once mounted, the contents of our Google Drive are accessible under '/content/drive/My Drive/'.
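If you prefer to work relative to your project folder, you can make it the working directory. A minimal sketch, assuming the project lives in a 'test' folder on your Drive (the folder name matches the paths used later in this note, but adjust it to your own layout):

import os
# Assumed project folder on the mounted Drive
os.chdir('/content/drive/My Drive/test')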
2. Define the custom Dataset
Dataset structure required by Torchvision:
- boxes (FloatTensor[N, 4]): the coordinates of the N bounding boxes in [x0, y0, x1, y1] format, ranging from 0 to W and 0 to H
- labels (Int64Tensor[N]): the label for each bounding box. 0 always represents the background class.
- image_id (Int64Tensor[1]): an image identifier. It should be unique between all the images in the dataset, and is used during evaluation.
- area (Tensor[N]): the area of the bounding box. This is used during evaluation with the COCO metric, to separate the metric scores between small, medium and large boxes.
- iscrowd (UInt8Tensor[N]): instances with iscrowd=True will be ignored during evaluation.
- (optionally) masks (UInt8Tensor[N, H, W]): the segmentation masks for each one of the objects
- (optionally) keypoints (FloatTensor[N, K, 3]): for each one of the N objects, it contains the K keypoints in [x, y, visibility] format, defining the object. visibility=0 means that the keypoint is not visible. Note that for data augmentation, the notion of flipping a keypoint is dependent on the data representation, and you should probably adapt references/detection/transforms.py for your new keypoint representation.
We don't need masks or keypoints because we will only use Faster R-CNN.
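As a concrete illustration, here is what a valid target dict for an image with two boxes could look like (all values below are made up for the example):

import torch

# Illustrative target for one image with two annotated boxes
target = {
    "boxes": torch.tensor([[10., 20., 100., 150.],
                           [30., 40., 80., 90.]]),   # [x0, y0, x1, y1]
    "labels": torch.tensor([1, 2], dtype=torch.int64),
    "image_id": torch.tensor([0]),
    "area": torch.tensor([11700., 2500.]),           # (x1 - x0) * (y1 - y0)
    "iscrowd": torch.zeros((2,), dtype=torch.uint8),
}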
1) Prepare annotation.csv file
We will use an annotation.csv file as input. Each row has the structure [filename, minX, maxX, minY, maxY, classname]; the order of the columns does not matter, because they are selected by name. Don't forget that you need the original image for each entry as well as the annotation.csv file!
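For reference, here is how a hypothetical annotation.csv could be built with pandas (the filenames and classnames are placeholders, not from the source):

import pandas as pd

# Two boxes on one image, one box on another (all values are placeholders)
annotations = pd.DataFrame({
    "filename":  ["image_0001", "image_0001", "image_0002"],
    "minX":      [10, 30, 5],
    "maxX":      [100, 80, 60],
    "minY":      [20, 40, 15],
    "maxY":      [150, 90, 70],
    "classname": ["classname1", "classname2", "classname1"],
})
annotations.to_csv("annotation.csv", index=False)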
How to extract RoIs from images is described in detail in another note; refer to it if you are curious.
2) Parse the annotations of one image from the annotation.csv file
def parse_one_annot(filepath, filename):
    # Load the annotations and collect the position and classname of each RoI.
    # Classnames are converted to integer labels starting from 1,
    # because 0 is reserved for the background class.
    data = pd.read_csv(filepath)
    boxes_array = data[data["filename"] == filename][["minX", "minY", "maxX", "maxY"]].values
    classnames = data[data["filename"] == filename][["classname"]]
    classes = []
    for i in range(len(classnames)):
        if classnames.iloc[i, 0] == 'classname1':
            classes.append(1)
        elif classnames.iloc[i, 0] == 'classname2':
            classes.append(2)
        elif classnames.iloc[i, 0] == 'classname3':
            classes.append(3)
    return boxes_array, classes
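If you have many classes, the if/elif chain becomes unwieldy. An equivalent sketch using a lookup dict (the classnames are the same placeholders as above):

# Map each classname to its integer label (placeholder classnames)
CLASS_TO_LABEL = {'classname1': 1, 'classname2': 2, 'classname3': 3}

def parse_one_annot_dict(filepath, filename):
    data = pd.read_csv(filepath)
    rows = data[data["filename"] == filename]
    boxes_array = rows[["minX", "minY", "maxX", "maxY"]].values
    classes = [CLASS_TO_LABEL[name] for name in rows["classname"]]
    return boxes_array, classes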
3) Define our custom Dataset class
- root: the path where the images are stored
- df_path: the path where the annotation.csv file is stored
class OpenDataset(torch.utils.data.Dataset):
    # Dataset class that feeds our images and annotations to the DataLoader.
    # transforms defines the preprocessing applied to each image
    # (e.g., random horizontal flip during training).
    def __init__(self, root, df_path, transforms=None):
        self.root = root
        self.transforms = transforms
        self.df = df_path
        names = pd.read_csv(df_path)[['filename']]
        names = names.drop_duplicates()
        self.imgs = list(np.array(names['filename'].tolist()))

    def __getitem__(self, idx):
        # Load the image and its annotations
        img_path = os.path.join(self.root, self.imgs[idx])
        if img_path.split('.')[-1] != 'png':
            img_path += '.png'
        img = Image.open(img_path).convert("RGB")
        box_list, classes = parse_one_annot(self.df, self.imgs[idx])

        # Convert to the tensor types expected by the model
        boxes = torch.as_tensor(box_list, dtype=torch.float32)
        labels = torch.as_tensor(classes, dtype=torch.int64)
        image_id = torch.tensor([idx])

        # area is the area of each RoI, used by the COCO metric
        area_list = [(box[2] - box[0]) * (box[3] - box[1]) for box in box_list]
        areas = torch.as_tensor(area_list, dtype=torch.float32)

        # iscrowd=1 marks instances that should be ignored during
        # evaluation; we treat every instance as valid, so set all to 0
        iscrowd = torch.zeros((len(boxes),), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["image_id"] = image_id
        target["area"] = areas
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.imgs)
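A quick sanity check that the dataset returns what we expect. The train_root variable and the CSV path follow the paths used later in this note and are assumptions about your Drive layout:

# Without transforms, __getitem__ returns a PIL image and a target dict
dataset = OpenDataset(train_root, '/content/drive/My Drive/test/train.csv')
img, target = dataset[0]
print(img.size, target["boxes"].shape, target["labels"])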
Now we can instantiate our training and validation datasets and assign them to DataLoaders, which control how images are loaded during training and evaluation (batch size, etc.).
def get_transform(train):
    transforms = []
    # Convert the PIL image into a PyTorch Tensor
    transforms.append(T.ToTensor())
    if train:
        # Flip the image horizontally with 50% probability during training
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)
dataset_train = OpenDataset(train_root, '/content/drive/My Drive/test/train.csv',
                            transforms=get_transform(train=True))
dataset_val = OpenDataset(val_root, '/content/drive/My Drive/test/val.csv',
                          transforms=get_transform(train=False))

# Randomly reorder the images in each dataset
torch.manual_seed(1)
indices_train = torch.randperm(len(dataset_train)).tolist()
indices_val = torch.randperm(len(dataset_val)).tolist()
dataset_train = torch.utils.data.Subset(dataset_train, indices_train)
dataset_val = torch.utils.data.Subset(dataset_val, indices_val)

# Define the DataLoaders
data_loader = torch.utils.data.DataLoader(
    dataset_train, batch_size=4, shuffle=True, num_workers=4,
    collate_fn=utils.collate_fn)
data_loader_val = torch.utils.data.DataLoader(
    dataset_val, batch_size=1, shuffle=False, num_workers=4,
    collate_fn=utils.collate_fn)

print("We have: {} examples, {} are training and {} testing".format(
    len(dataset_train) + len(dataset_val), len(dataset_train), len(dataset_val)))
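To verify that the DataLoader produces the expected format, you can pull one batch. utils.collate_fn groups samples into tuples rather than stacking them, since image sizes may differ:

# Each batch is a tuple of images and a tuple of target dicts
images, targets = next(iter(data_loader))
print(len(images), images[0].shape, targets[0]["boxes"].shape)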
Unlike the above (where we define separate datasets for training and validation), what if you want to split a single dataset into training and validation sets?
dataset_train = OpenDataset(train_root, '/content/drive/My Drive/test/train.csv',
                            transforms=get_transform(train=True))
dataset_val = OpenDataset(train_root, '/content/drive/My Drive/test/train.csv',
                          transforms=get_transform(train=False))

# Split the dataset into training and validation parts:
# 40 images are used for validation and the rest for training.
torch.manual_seed(1)
indices = torch.randperm(len(dataset_train)).tolist()
dataset_train = torch.utils.data.Subset(dataset_train, indices[:-40])
dataset_val = torch.utils.data.Subset(dataset_val, indices[-40:])

data_loader = torch.utils.data.DataLoader(
    dataset_train, batch_size=4, shuffle=True, num_workers=4,
    collate_fn=utils.collate_fn)
data_loader_val = torch.utils.data.DataLoader(
    dataset_val, batch_size=1, shuffle=False, num_workers=4,
    collate_fn=utils.collate_fn)

print("We have: {} examples, {} are training and {} testing".format(
    len(dataset_train) + len(dataset_val), len(dataset_train), len(dataset_val)))
3. Train the model
1) Download and adjust the model
If you want to start from a model pre-trained on COCO and fine-tune it for your particular classes, below is a possible way of doing it.
def get_instance_segmentation_model(num_classes):
    # Load a model pre-trained on COCO
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    # Get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # Replace the pre-trained head with a new one that has
    # the user-defined number of classes
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model
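A quick check that the head was replaced correctly (num_classes=4 matches the setup in the next step; test_model is a throwaway name):

# The new classification head should output one score per class
test_model = get_instance_segmentation_model(num_classes=4)
print(test_model.roi_heads.box_predictor.cls_score.out_features)  # prints 4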
2) Set up the model
# Use the GPU for training; fall back to the CPU if no GPU is available
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

num_classes = 4  # 3 classes (number of classnames) + 1 background class
model = get_instance_segmentation_model(num_classes)
# Move the model to the GPU or CPU
model.to(device)

# Construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)

# Construct a learning rate scheduler that
# decreases the learning rate by 10x every 5 epochs
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
To do the training, we now write our own for loop over the number of epochs we wish to train for, call the train_one_epoch function from engine.py (copied earlier from the torchvision references), step the learning rate scheduler, and finally evaluate once per epoch.
num_epochs = 10
for epoch in range(num_epochs):
    # Train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
    # Update the learning rate
    lr_scheduler.step()
    # Evaluate on the validation data
    evaluate(model, data_loader_val, device=device)
3) Save the model
torch.save(model.state_dict(), "./model.pth")
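If you want to be able to resume training later, you can also save the optimizer and scheduler state alongside the model; a minimal sketch (the filename is an assumption):

# Save everything needed to resume training
torch.save({
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "lr_scheduler": lr_scheduler.state_dict(),
    "epoch": num_epochs,
}, "./checkpoint.pth")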
4. Test the model
1) Load the model
model = get_instance_segmentation_model(num_classes)
# PATH is the path to the saved weights, e.g. "./model.pth"
model.load_state_dict(torch.load(PATH))
# Move the loaded model to the same device as the inputs
model.to(device)
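If the weights were saved on a GPU machine but you are loading them on a CPU-only runtime, pass map_location so the tensors are remapped to the current device:

# Remap GPU-saved tensors onto whatever device we are using now
model.load_state_dict(torch.load(PATH, map_location=device))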
2) Draw prediction on image
from PIL import ImageDraw

def drawPrediction(img, label_boxes, prediction):
    image = Image.fromarray(img.mul(255).permute(1, 2, 0).byte().numpy())
    draw = ImageDraw.Draw(image)
    # Draw the ground-truth boxes in green
    for elem in range(len(label_boxes)):
        draw.rectangle([(label_boxes[elem][0], label_boxes[elem][1]),
                        (label_boxes[elem][2], label_boxes[elem][3])],
                       outline="green", width=3)
    # Draw the predicted boxes in red, labelled with their confidence scores
    for element in range(len(prediction[0]["boxes"])):
        boxes = prediction[0]["boxes"][element].cpu().numpy()
        score = np.round(prediction[0]["scores"][element].cpu().numpy(), decimals=4)
        draw.rectangle([(boxes[0], boxes[1]), (boxes[2], boxes[3])],
                       outline="red", width=3)
        draw.text((boxes[0], boxes[1]), text=str(score))
    return image
3) Test the model
from IPython.display import display

dataset_test = OpenDataset(test_root, '/content/drive/My Drive/test/test.csv',
                           transforms=get_transform(train=False))

# Put the model in evaluation mode
model.eval()
for i in range(len(dataset_test)):
    img, target = dataset_test[i]
    label_boxes = np.array(target["boxes"])
    with torch.no_grad():
        prediction = model([img.to(device)])
    result = drawPrediction(img, label_boxes, prediction)
    # display() is needed to show images produced inside a loop in Colab
    display(result)
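The model returns every proposal together with its confidence score, so the drawing can get cluttered. If you want to keep only confident detections, you can filter before calling drawPrediction; a minimal sketch to use inside the loop above (the 0.5 threshold is an assumption, not from the source):

# Inside the test loop: keep only predictions above a score threshold
keep = prediction[0]["scores"] > 0.5
filtered = [{
    "boxes": prediction[0]["boxes"][keep],
    "scores": prediction[0]["scores"][keep],
    "labels": prediction[0]["labels"][keep],
}]
result = drawPrediction(img, label_boxes, filtered)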