Written by Hyojung Chang
We will run an object detection experiment with the Faster R-CNN implementation provided by Torchvision. To speed up training and evaluation, we use Colab's GPU.
1. Set up the Colab environment
1) First, we need to enable GPUs for the notebook
Navigate to Edit → Notebook Settings and select GPU from the Hardware Accelerator drop-down.
2) Install some requirements for torchvision
%%shell
pip install cython
# Install pycocotools, replacing the version installed by default in Colab
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
%%shell
# Download the TorchVision repo to use some files from references/detection
git clone https://github.com/pytorch/vision.git
cd vision
git checkout v0.3.0
cp references/detection/utils.py ../
cp references/detection/transforms.py ../
cp references/detection/coco_eval.py ../
cp references/detection/engine.py ../
cp references/detection/coco_utils.py ../
Now you can do all the necessary imports:
import os
import numpy as np
import torch
import torch.utils.data
from PIL import Image
import pandas as pd
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from engine import train_one_epoch, evaluate
import utils
import transforms as T
When using PyTorch on Colab, a bug related to the installed numpy version may occur. To prevent this, downgrade numpy:
pip install numpy==1.17.4
3) Mount Google Drive in Colab
from google.colab import drive
drive.mount('/content/drive')
Once mounted, the contents of our Google Drive are accessible under '/content/drive/My Drive/'.
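If you prefer to work relative to your project folder, you can make it the working directory. A minimal sketch, assuming the project lives in a 'test' folder on your Drive (the folder name matches the paths used later in this note, but adjust it to your own layout):

import os
# Assumed project folder on the mounted Drive
os.chdir('/content/drive/My Drive/test')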
2. Define the custom Dataset
Dataset structure required by Torchvision:
- boxes (FloatTensor[N, 4]): the coordinates of the N bounding boxes in [x0, y0, x1, y1] format, ranging from 0 to W and 0 to H
- labels (Int64Tensor[N]): the label for each bounding box. 0 always represents the background class.
- image_id (Int64Tensor[1]): an image identifier. It should be unique between all the images in the dataset, and is used during evaluation.
- area (Tensor[N]): the area of the bounding box. This is used during evaluation with the COCO metric, to separate the metric scores between small, medium and large boxes.
- iscrowd (UInt8Tensor[N]): instances with iscrowd=True will be ignored during evaluation.
- (optionally) masks (UInt8Tensor[N, H, W]): the segmentation masks for each one of the objects
- (optionally) keypoints (FloatTensor[N, K, 3]): for each one of the N objects, it contains the K keypoints in [x, y, visibility] format, defining the object. visibility=0 means that the keypoint is not visible. Note that for data augmentation, the notion of flipping a keypoint is dependent on the data representation, and you should probably adapt references/detection/transforms.py for your new keypoint representation.
We don't need masks or keypoints because we will only use Faster R-CNN.
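As a concrete illustration, here is what a valid target dict for an image with two boxes could look like (all values below are made up for the example):

import torch

# Illustrative target for one image with two annotated boxes
target = {
    "boxes": torch.tensor([[10., 20., 100., 150.],
                           [30., 40., 80., 90.]]),   # [x0, y0, x1, y1]
    "labels": torch.tensor([1, 2], dtype=torch.int64),
    "image_id": torch.tensor([0]),
    "area": torch.tensor([11700., 2500.]),           # (x1 - x0) * (y1 - y0)
    "iscrowd": torch.zeros((2,), dtype=torch.uint8),
}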
1) Prepare annotation.csv file
We will use an annotation.csv file as input. Each row has the structure [filename, minX, maxX, minY, maxY, classname]; the order of the columns does not matter, because they are selected by name. Don't forget that you need the original image for each entry as well as the annotation.csv file!
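For reference, here is how a hypothetical annotation.csv could be built with pandas (the filenames and classnames are placeholders, not from the source):

import pandas as pd

# Two boxes on one image, one box on another (all values are placeholders)
annotations = pd.DataFrame({
    "filename":  ["image_0001", "image_0001", "image_0002"],
    "minX":      [10, 30, 5],
    "maxX":      [100, 80, 60],
    "minY":      [20, 40, 15],
    "maxY":      [150, 90, 70],
    "classname": ["classname1", "classname2", "classname1"],
})
annotations.to_csv("annotation.csv", index=False)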
How to extract RoIs from images is described in detail in another note; refer to it if you are curious.
2) Parse the annotations of one image from the annotation.csv file
def parse_one_annot(filepath, filename):
    # Load the annotations and collect the position and classname of each RoI.
    # Classnames are converted to integer labels starting from 1,
    # because 0 is reserved for the background class.
    data = pd.read_csv(filepath)
    boxes_array = data[data["filename"] == filename][["minX", "minY", "maxX", "maxY"]].values
    classnames = data[data["filename"] == filename][["classname"]]
    classes = []
    for i in range(len(classnames)):
        if classnames.iloc[i, 0] == 'classname1':
            classes.append(1)
        elif classnames.iloc[i, 0] == 'classname2':
            classes.append(2)
        elif classnames.iloc[i, 0] == 'classname3':
            classes.append(3)
    return boxes_array, classes
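If you have many classes, the if/elif chain becomes unwieldy. An equivalent sketch using a lookup dict (the classnames are the same placeholders as above):

# Map each classname to its integer label (placeholder classnames)
CLASS_TO_LABEL = {'classname1': 1, 'classname2': 2, 'classname3': 3}

def parse_one_annot_dict(filepath, filename):
    data = pd.read_csv(filepath)
    rows = data[data["filename"] == filename]
    boxes_array = rows[["minX", "minY", "maxX", "maxY"]].values
    classes = [CLASS_TO_LABEL[name] for name in rows["classname"]]
    return boxes_array, classes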
3) Define our custom Dataset class
- root: the path where the images are stored
- df_path: the path where the annotation.csv file is stored
class OpenDataset(torch.utils.data.Dataset):
    # Dataset class that feeds our images and annotations to the DataLoader.
    # transforms defines the preprocessing applied to each image
    # (e.g., random horizontal flip during training).
    def __init__(self, root, df_path, transforms=None):
        self.root = root
        self.transforms = transforms
        self.df = df_path
        names = pd.read_csv(df_path)[['filename']]
        names = names.drop_duplicates()
        self.imgs = list(np.array(names['filename'].tolist()))

    def __getitem__(self, idx):
        # Load the image and its annotations
        img_path = os.path.join(self.root, self.imgs[idx])
        if img_path.split('.')[-1] != 'png':
            img_path += '.png'
        img = Image.open(img_path).convert("RGB")
        box_list, classes = parse_one_annot(self.df, self.imgs[idx])

        # Convert to the tensor types expected by the model
        boxes = torch.as_tensor(box_list, dtype=torch.float32)
        labels = torch.as_tensor(classes, dtype=torch.int64)
        image_id = torch.tensor([idx])

        # area is the area of each RoI, used by the COCO metric
        area_list = [(box[2] - box[0]) * (box[3] - box[1]) for box in box_list]
        areas = torch.as_tensor(area_list, dtype=torch.float32)

        # iscrowd=1 marks instances that should be ignored during
        # evaluation; we treat every instance as valid, so set all to 0
        iscrowd = torch.zeros((len(boxes),), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["image_id"] = image_id
        target["area"] = areas
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.imgs)
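A quick sanity check that the dataset returns what we expect. The train_root variable and the CSV path follow the paths used later in this note and are assumptions about your Drive layout:

# Without transforms, __getitem__ returns a PIL image and a target dict
dataset = OpenDataset(train_root, '/content/drive/My Drive/test/train.csv')
img, target = dataset[0]
print(img.size, target["boxes"].shape, target["labels"])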
Now we can instantiate our training and validation datasets and assign them to DataLoaders, which control how images are loaded during training and evaluation (batch size, etc.).
def get_transform(train):
    transforms = []
    # Convert the PIL image into a PyTorch Tensor
    transforms.append(T.ToTensor())
    if train:
        # Flip the image horizontally with 50% probability during training
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)
dataset_train = OpenDataset(train_root, '/content/drive/My Drive/test/train.csv',
                            transforms=get_transform(train=True))
dataset_val = OpenDataset(val_root, '/content/drive/My Drive/test/val.csv',
                          transforms=get_transform(train=False))

# Randomly reorder the images in each dataset
torch.manual_seed(1)
indices_train = torch.randperm(len(dataset_train)).tolist()
indices_val = torch.randperm(len(dataset_val)).tolist()
dataset_train = torch.utils.data.Subset(dataset_train, indices_train)
dataset_val = torch.utils.data.Subset(dataset_val, indices_val)

# Define the DataLoaders
data_loader = torch.utils.data.DataLoader(
    dataset_train, batch_size=4, shuffle=True, num_workers=4,
    collate_fn=utils.collate_fn)
data_loader_val = torch.utils.data.DataLoader(
    dataset_val, batch_size=1, shuffle=False, num_workers=4,
    collate_fn=utils.collate_fn)

print("We have: {} examples, {} are training and {} testing".format(
    len(dataset_train) + len(dataset_val), len(dataset_train), len(dataset_val)))
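To verify that the DataLoader produces the expected format, you can pull one batch. utils.collate_fn groups samples into tuples rather than stacking them, since image sizes may differ:

# Each batch is a tuple of images and a tuple of target dicts
images, targets = next(iter(data_loader))
print(len(images), images[0].shape, targets[0]["boxes"].shape)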
Unlike the above (where we define separate datasets for training and validation), what if you want to split a single dataset into training and validation sets?
dataset_train = OpenDataset(train_root, '/content/drive/My Drive/test/train.csv',
                            transforms=get_transform(train=True))
dataset_val = OpenDataset(train_root, '/content/drive/My Drive/test/train.csv',
                          transforms=get_transform(train=False))

# Split the dataset into training and validation parts:
# 40 images are used for validation and the rest for training.
torch.manual_seed(1)
indices = torch.randperm(len(dataset_train)).tolist()
dataset_train = torch.utils.data.Subset(dataset_train, indices[:-40])
dataset_val = torch.utils.data.Subset(dataset_val, indices[-40:])

data_loader = torch.utils.data.DataLoader(
    dataset_train, batch_size=4, shuffle=True, num_workers=4,
    collate_fn=utils.collate_fn)
data_loader_val = torch.utils.data.DataLoader(
    dataset_val, batch_size=1, shuffle=False, num_workers=4,
    collate_fn=utils.collate_fn)

print("We have: {} examples, {} are training and {} testing".format(
    len(dataset_train) + len(dataset_val), len(dataset_train), len(dataset_val)))
3. Train the model
1) Download and adjust the model
If you want to start from a model pre-trained on COCO and fine-tune it for your particular classes, below is a possible way of doing it.
def get_instance_segmentation_model(num_classes):
    # Load a model pre-trained on COCO
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    # Get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # Replace the pre-trained head with a new one that has
    # the user-defined number of classes
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model
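A quick check that the head was replaced correctly (num_classes=4 matches the setup in the next step; test_model is a throwaway name):

# The new classification head should output one score per class
test_model = get_instance_segmentation_model(num_classes=4)
print(test_model.roi_heads.box_predictor.cls_score.out_features)  # prints 4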
2) Set up the model
# Use the GPU for training; fall back to the CPU if no GPU is available
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

num_classes = 4  # 3 classes (number of classnames) + 1 background class
model = get_instance_segmentation_model(num_classes)
# Move the model to the GPU or CPU
model.to(device)

# Construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)

# Construct a learning rate scheduler that
# decreases the learning rate by 10x every 5 epochs
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
To do the training, we now write our own for loop over the number of epochs we wish to train for, call the train_one_epoch function from engine.py (copied earlier from the torchvision references), step the learning rate scheduler, and finally evaluate once per epoch.
num_epochs = 10
for epoch in range(num_epochs):
    # Train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
    # Update the learning rate
    lr_scheduler.step()
    # Evaluate on the validation data
    evaluate(model, data_loader_val, device=device)
3) Save the model
torch.save(model.state_dict(), "./model.pth")
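If you want to be able to resume training later, you can also save the optimizer and scheduler state alongside the model; a minimal sketch (the filename is an assumption):

# Save everything needed to resume training
torch.save({
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "lr_scheduler": lr_scheduler.state_dict(),
    "epoch": num_epochs,
}, "./checkpoint.pth")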
4. Test the model
1) Load the model
model = get_instance_segmentation_model(num_classes)
# PATH is the path to the saved weights, e.g. "./model.pth"
model.load_state_dict(torch.load(PATH))
# Move the loaded model to the same device as the inputs
model.to(device)
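If the weights were saved on a GPU machine but you are loading them on a CPU-only runtime, pass map_location so the tensors are remapped to the current device:

# Remap GPU-saved tensors onto whatever device we are using now
model.load_state_dict(torch.load(PATH, map_location=device))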
2) Draw prediction on image
from PIL import ImageDraw

def drawPrediction(img, label_boxes, prediction):
    image = Image.fromarray(img.mul(255).permute(1, 2, 0).byte().numpy())
    draw = ImageDraw.Draw(image)
    # Draw the ground-truth boxes in green
    for elem in range(len(label_boxes)):
        draw.rectangle([(label_boxes[elem][0], label_boxes[elem][1]),
                        (label_boxes[elem][2], label_boxes[elem][3])],
                       outline="green", width=3)
    # Draw the predicted boxes in red, labelled with their confidence scores
    for element in range(len(prediction[0]["boxes"])):
        boxes = prediction[0]["boxes"][element].cpu().numpy()
        score = np.round(prediction[0]["scores"][element].cpu().numpy(), decimals=4)
        draw.rectangle([(boxes[0], boxes[1]), (boxes[2], boxes[3])],
                       outline="red", width=3)
        draw.text((boxes[0], boxes[1]), text=str(score))
    return image
3) Test the model
from IPython.display import display

dataset_test = OpenDataset(test_root, '/content/drive/My Drive/test/test.csv',
                           transforms=get_transform(train=False))

# Put the model in evaluation mode
model.eval()
for i in range(len(dataset_test)):
    img, target = dataset_test[i]
    label_boxes = np.array(target["boxes"])
    with torch.no_grad():
        prediction = model([img.to(device)])
    result = drawPrediction(img, label_boxes, prediction)
    # display() is needed to show images produced inside a loop in Colab
    display(result)
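The model returns every proposal together with its confidence score, so the drawing can get cluttered. If you want to keep only confident detections, you can filter before calling drawPrediction; a minimal sketch to use inside the loop above (the 0.5 threshold is an assumption, not from the source):

# Inside the test loop: keep only predictions above a score threshold
keep = prediction[0]["scores"] > 0.5
filtered = [{
    "boxes": prediction[0]["boxes"][keep],
    "scores": prediction[0]["scores"][keep],
    "labels": prediction[0]["labels"][keep],
}]
result = drawPrediction(img, label_boxes, filtered)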