# Instance Segmentation of Different Types of Cancer Cells
## About the Project
### Dataset
The dataset that I used is known as PanNuke (Pan-Cancer Nuclei). It contains semi-automatically generated nuclei instance segmentation and classification images with exhaustive nuclei labels across 19 different tissue types. The dataset consists of 481 visual fields, of which 312 are randomly sampled from more than 20K whole-slide images at different magnifications, from multiple data sources.
In total, the dataset contains 205,343 labeled nuclei, each with an instance segmentation mask. Models trained on PanNuke can aid in whole-slide image tissue type segmentation and generalize to new tissues.
The 19 tissue types and their corresponding share of the whole dataset are shown in the distribution figure above.
Each tissue type in the PanNuke dataset is annotated with five nuclei types:
1. neoplastic
2. inflammatory
3. softtissue
4. dead
5. epithelial
Part 1 of the dataset that I used is hosted on Kaggle and is available here. The notebook that I used to analyze the dataset is available here.
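To get a feel for the data, here is a minimal sketch of loading and inspecting the part-1 arrays. The file names, paths, and shapes are assumptions based on the Kaggle part-1 layout and may differ in your copy:

```python
import numpy as np

# Assumed Kaggle part-1 layout: one .npy file each for images and masks (hypothetical paths).
images = np.load("Fold 1/images/images.npy")
masks = np.load("Fold 1/masks/masks.npy")

print(images.shape)              # expected (N, 256, 256, 3): RGB visual fields
print(masks.shape)               # expected (N, 256, 256, C): one instance map per nucleus class
print(np.unique(masks[..., 0]))  # instance ids present in the first class channel
```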
### Model
For this project, I used two different architectures for instance segmentation: Hover-Net and Mask R-CNN. Hover-Net is designed for computational pathology and is therefore the natural fit for the PanNuke dataset. Mask R-CNN, on the other hand, is a general-purpose architecture: it can be used on any instance segmentation dataset as long as the annotations are converted to COCO format.
#### Hover-Net Training/Validation
To use Hover-Net as our main architecture, the dataset first needs to be converted into the format Hover-Net can process. Below is the code block that converts the PanNuke dataset to arrays of shape `(items, img_size, img_size, channels)`, where channels 0:3 hold the RGB values, channel 3 the instance map, and channel 4 the class type. The notebook used for the Hover-Net format data conversion can be viewed here.

```python
# This function is modified from https://github.com/meszlili96/PanNukeChallenge.git
import os
import numpy as np
from tqdm import tqdm

def transform(images, masks, path, out_dir, start, finish):
    fold_path = out_dir + path
    try:
        os.mkdir(fold_path)
    except FileExistsError:
        pass
    # interpret start/finish as fractions of the dataset
    start = int(images.shape[0] * start)
    finish = int(images.shape[0] * finish)
    for i in tqdm(range(start, finish)):
        np_file = np.zeros((256, 256, 5), dtype='int16')
        # add rgb channels to array
        img_int = np.array(images[i], np.int16)
        for j in range(3):
            np_file[:, :, j] = img_int[:, :, j]
        # convert inst and type format for mask
        msk = masks[i]
        inst = np.zeros((256, 256))
        for j in range(5):
            # copy value from mask channel if value is not equal 0
            inst = np.where(msk[:, :, j] != 0, msk[:, :, j], inst)
        map_inst(inst)  # remap instance ids (helper from the PanNukeChallenge repo)
        types = np.zeros((256, 256))
        for j in range(5):
            # write type index if mask is not equal 0 and value is still 0
            types = np.where((msk[:, :, j] != 0) & (types == 0), j + 1, types)
        # add inst and types as channels 3 and 4
        np_file[:, :, 3] = inst
        np_file[:, :, 4] = types
        np.save(fold_path + '/' + '%d.npy' % (i), np_file)
```
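For reference, a hypothetical invocation might look like the following; the split fractions and directory names are illustrative, not necessarily the ones I used:

```python
# Convert the first 80% of the fold for training, the rest for validation.
# `images` and `masks` are the arrays loaded from the Kaggle .npy files.
transform(images, masks, "train", "hovernet_data/", 0.0, 0.8)
transform(images, masks, "valid", "hovernet_data/", 0.8, 1.0)
```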
After conversion, we proceed to training the model. We used the official implementation of Hover-Net available on GitHub. Usage is easy since the repository is properly documented; I only had to change the configuration to match my use of PanNuke (e.g. epochs, directories, etc.). Below are the validation results of Hover-Net.

```
------valid-np_acc    : 0.94211
------valid-np_dice   : 0.83099
------valid-tp_dice_0 : 0.96235
------valid-tp_dice_1 : 0.63813
------valid-tp_dice_2 : 0.55646
------valid-tp_dice_3 : 0.46966
------valid-tp_dice_4 : 0.00026
------valid-tp_dice_5 : 0.06760
------valid-hv_mse    : 0.04192
```
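For context, the `dice` metrics above are Dice coefficients between predicted and ground-truth pixels (per class for `tp_dice_*`). A minimal numpy sketch of the metric itself, not Hover-Net's internal implementation:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2*|A intersect B| / (|A| + |B|) for two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```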
Sample Hover-Net predictions: 1056.png, 1037.png, 105.png. Color representation: RED = neoplastic, GREEN = inflammatory, BLUE = softtissue, BROWN = dead, ORANGE = epithelial.
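A rough sketch of how such a color overlay can be produced from the class channel of the converted arrays (channel 4); the exact rendering code in the notebooks may differ:

```python
import numpy as np

# class index (1-5, as written by the conversion above) -> RGB color
PALETTE = {
    1: (255, 0, 0),    # neoplastic: red
    2: (0, 255, 0),    # inflammatory: green
    3: (0, 0, 255),    # softtissue: blue
    4: (139, 69, 19),  # dead: brown
    5: (255, 165, 0),  # epithelial: orange
}

def colorize_types(type_map):
    """Map an (H, W) array of class indices to an (H, W, 3) RGB image."""
    rgb = np.zeros((*type_map.shape, 3), dtype=np.uint8)
    for cls, color in PALETTE.items():
        rgb[type_map == cls] = color
    return rgb
```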
#### Mask R-CNN Training/Validation
For the Mask R-CNN architecture, we first convert the dataset into COCO format. COCO format is commonly used for instance segmentation datasets, so it is not difficult to convert PanNuke into it. Basically, a COCO dataset is saved as a json file containing a dictionary of images and annotations. The basic structure of an entry in the annotations section is shown below.

```
# annotations section
{
    "segmentation": [[239.97, 260.24, 222.04, …]],
    "area": 2765.1486500000005,
    "iscrowd": 0,
    "image_id": 558840,
    "bbox": [199.84, 200.46, 77.71, 70.88],
    "category_id": 58,
    "id": 156
}
```
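In our conversion, each nucleus mask becomes one such annotation. The pycococreatortools library (used below) builds these entries internally; a simplified sketch of the same idea, with polygon extraction done via OpenCV, might look like this:

```python
import cv2
import numpy as np

def mask_to_coco_annotation(binary_mask, image_id, category_id, ann_id):
    """Encode one binary nucleus mask as a COCO polygon annotation."""
    contours, _ = cv2.findContours(binary_mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # each contour becomes a flat [x1, y1, x2, y2, ...] polygon
    segmentation = [c.flatten().astype(float).tolist() for c in contours if len(c) >= 3]
    ys, xs = np.nonzero(binary_mask)
    bbox = [float(xs.min()), float(ys.min()),
            float(xs.max() - xs.min() + 1), float(ys.max() - ys.min() + 1)]
    return {
        "segmentation": segmentation,
        "area": float(binary_mask.sum()),
        "iscrowd": 0,
        "image_id": image_id,
        "bbox": bbox,
        "category_id": category_id,
        "id": ann_id,
    }
```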
The data conversion code from PanNuke to COCO is shown below; it automatically saves the result to a json file. The notebook I used for the conversion is available here.

```python
# code from https://github.com/nolancardozo13/pathology_maskrcnn.git
import os
import json
import time
import numpy as np
from PIL import Image
from tqdm import tqdm
from pycococreatortools import pycococreatortools

def pannuke_to_coco_format(image_path, output_path,
                           categories=["neoplastic", "inflammatory", "softtissue", "dead", "epithelial"],
                           dataset_name="pannuke"):
    '''
    This function converts the PanNuke dataset format to the COCO format,
    which makes it easier to apply Detectron2 algorithms on it.
    '''
    images_name = os.listdir(image_path)
    cocoformat = {"licenses": [], "info": [], "images": [], "annotations": [], "categories": []}
    for i in range(len(categories)):
        cocoformat["categories"].append({"id": int(i + 1),
                                         "name": categories[i],
                                         "supercategory": dataset_name})
    m_id = 1  # running annotation id
    for i, img in tqdm(enumerate(images_name)):
        image = Image.open(image_path + img + "/images/" + img + ".jpg")
        image_info = pycococreatortools.create_image_info(int(i + 1), img + ".jpg", image.size)
        cocoformat["images"].append(image_info)
        c_types = os.listdir(image_path + img + "/masks/")
        for c in c_types:
            masks = os.listdir(image_path + img + "/masks/" + c)
            for msk in masks:
                category_info = {'id': int(categories.index(c) + 1), 'is_crowd': False}
                m_image = np.asarray(Image.open(image_path + img + "/masks/" + c + "/" + msk)
                                     .convert('1')).astype(np.uint8)
                annotation_info = pycococreatortools.create_annotation_info(
                    m_id, int(i + 1), category_info, m_image, image.size, tolerance=2)
                m_id = m_id + 1
                if annotation_info is not None:
                    cocoformat["annotations"].append(annotation_info)
        time.sleep(0.2)
    with open(output_path, "w") as f:
        json.dump(cocoformat, f)
```
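A hypothetical call, assuming the per-image directory layout the function expects (an `images/` and a `masks/<class>/` folder per visual field; the paths are illustrative):

```python
pannuke_to_coco_format(image_path="pannuke_images/",
                       output_path="pannuke_coco.json")

# Quick sanity check of the result with pycocotools.
from pycocotools.coco import COCO
coco = COCO("pannuke_coco.json")
print(len(coco.getImgIds()), "images,", len(coco.getAnnIds()), "annotations")
print(coco.cats)  # should list the five PanNuke categories
```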
The training is done with the MMDetection toolbox. As usual, the configuration is changed to match the PanNuke dataset (a sketch of the kind of override involved is shown below). The notebook used for training can be viewed here, and the validation results follow.
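For illustration only: the gist of the MMDetection override is to start from a Mask R-CNN base config and swap the 80 COCO classes for the five PanNuke classes. The base config name and file paths here are placeholders, not necessarily the ones I used:

```python
# pannuke_mask_rcnn.py: a sketch of an MMDetection (2.x-style) config
_base_ = 'mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py'

# five PanNuke nucleus classes instead of the 80 COCO classes
model = dict(
    roi_head=dict(
        bbox_head=dict(num_classes=5),
        mask_head=dict(num_classes=5)))

classes = ('neoplastic', 'inflammatory', 'softtissue', 'dead', 'epithelial')
data = dict(
    train=dict(classes=classes, ann_file='pannuke_coco_train.json', img_prefix='images/'),
    val=dict(classes=classes, ann_file='pannuke_coco_val.json', img_prefix='images/'))
```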
```
2022-05-08 22:54:52,456 - mmdet - INFO -
 Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100  ] = 0.150
 Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.304
 Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.134
 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.142
 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.300
```
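These AP lines are the standard COCO evaluation summary, which MMDetection produces via pycocotools. A standalone sketch of the same evaluation (file names illustrative):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("pannuke_coco_val.json")        # ground-truth annotations
coco_dt = coco_gt.loadRes("predictions.json")  # model detections in COCO result format
evaluator = COCOeval(coco_gt, coco_dt, iouType="segm")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints an AP table like the one above
```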
Sample Mask R-CNN predictions: 1056.png, 1037.png, 105.png (same color representation as above).
## Usage
You don’t need to install anything to run the project. The notebooks I used are all hosted on Kaggle and are attached in this repository. If you want to run or view the code, here are the links:
- PanNuke Data Analysis: Pan-Cancer Nuclei Data Analysis
- PanNuke to Hover-Net conversion: Pan-Cancer Nuclei Data Conversion - #1
- PanNuke to COCO conversion: Pan-Cancer Nuclei Data Conversion - #2
- Hover-Net Training: Pan-Cancer Nuclei Instance Segmentation - #1
- Mask R-CNN Training: Pan-Cancer Nuclei Instance Segmentation - #2
- Inference: Pan-Cancer Nuclei Inference
- Output Usage: Pan-Cancer Nuclei Usage
## License
This repo is licensed under the Apache License.