Large-Scale Computations on Histology Images Reveal Grade-Differentiating
Parameters for Breast Cancer
Abstract -
Tumor classification, currently based on morphological evaluation,
is inexact and depends largely on qualitative pathological
examination of images of tumor tissue slides. In this
study, we have developed an automated image processing method to
detect and identify the clinically relevant microscopic
structures that are observed on histology images. The
microstructure elements identified in our study on the
Haematoxylin and Eosin (H&E) stained slides include adipose
tissue, extracellular matrix, three morphologically distinct cell
nuclei types used in grading cancer, cross sections of breast
ducts, and the tubular organization of cells around them. The
image processing adopted in our analysis is based on gray-scale
segmentation, morphological operations, feature extraction,
supervised learning, and subsequent training and clustering. The
automated processing system developed has an accuracy of 89% ± 0.8
in correctly identifying the three different nuclei types
observed in H&E stained histology slides. Computations for
each histology slide image identify the spatial positions of
hundreds of thousands of cell nuclei and thousands of tubular
sections, and subsequently allow objective local microstructure
features, such as the closest distance between cell nuclei and the
type of neighboring cell nuclei, to be evaluated from the image. Histology
image classification by applying a series of clustering techniques
to the extracted feature vectors identified the number density of
breast duct cross-sections and the number density of cell nuclei
with dispersed chromatin as important tumor-grade-differentiating
features. The predictive value of the image processing and
classification scheme presented here is expected to increase when
data from multiple platforms (global expression profiles,
chromosome aberrations, and global methylation scans) are combined
to complement the histology image data.
Index Terms - Automated identification, breast cancer, cell nuclei morphology, supervised learning.
The tissue components identified are:
NM_1 (Nucleus Morphology): nuclei of inflammatory cells, which include lymphocytes,
NM_2: nuclei of cells of epithelial origin having nearly uniform chromatin distribution. These nuclei are significantly larger than NM_1 nuclei,
NM_3: nuclei of cancer cells with non-uniform chromatin distribution, usually large in size and weak in intensity,
ECM (Extracellular Matrix): the collagen-based structure supporting the cells in the stroma,
AT (Adipose Tissue): areas representing water, carbohydrate, lipid or gas.
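
The local microstructure features named in the abstract (closest distance between cell nuclei, type of the neighboring nucleus, and number density of a nucleus type) can be illustrated with a minimal sketch. It assumes each detected nucleus is available as an (x, y, type) tuple; the function names, the pixel coordinates, and the toy values are hypothetical and are not taken from the study.

```python
import math

def nearest_neighbor_features(nuclei):
    """Return (closest distance, type of closest neighbor) for each nucleus.

    nuclei: list of (x, y, type_label) tuples (assumed input format).
    Brute-force O(n^2) search; a spatial index would be needed for the
    hundreds of thousands of nuclei found on a real slide.
    """
    features = []
    for i, (xi, yi, _) in enumerate(nuclei):
        best_dist, best_type = math.inf, None
        for j, (xj, yj, tj) in enumerate(nuclei):
            if i == j:
                continue  # a nucleus is not its own neighbor
            d = math.hypot(xi - xj, yi - yj)
            if d < best_dist:
                best_dist, best_type = d, tj
        features.append((best_dist, best_type))
    return features

def number_density(nuclei, type_label, area):
    """Number of nuclei of one type per unit image area."""
    return sum(1 for _, _, t in nuclei if t == type_label) / area

# Toy coordinates for illustration only (not data from the study).
nuclei = [
    (0.0, 0.0, "NM_1"),
    (3.0, 4.0, "NM_3"),
    (3.0, 5.0, "NM_2"),
    (10.0, 10.0, "NM_3"),
]
features = nearest_neighbor_features(nuclei)
density_nm3 = number_density(nuclei, "NM_3", area=100.0)
```

Here `features[i]` pairs the nearest-neighbor distance of nucleus `i` with the type label of that neighbor, and `density_nm3` is the NM_3 count divided by the (assumed) image area.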

Identified Tissue Microstructures


Designed Graphical User Interface
Original H&E Image (left), Resulting Image (right)

Scatter plots for the classification of histology slides and their section images using NM_3 and DT as
feature parameters. A) Classification of histology slides. B) Clustering of section images from all the histology slides used in this study.
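
Grouping slides in a two-parameter feature space, as in the scatter plots, can be sketched with a plain k-means routine. This is an assumption made for illustration: the source mentions only "a series of clustering techniques" and does not say that k-means was used. The feature values below are invented placeholders standing in for the NM_3 and DT parameters, and the initial centroids are chosen by hand.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Lloyd's k-means on 2-D points with caller-supplied initial centroids."""
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Placeholder (NM_3, DT) feature pairs, one per slide; not data from the study.
points = [
    (0.10, 0.80), (0.15, 0.75), (0.20, 0.90),
    (0.80, 0.20), (0.85, 0.10), (0.90, 0.15),
]
labels, centroids = kmeans(points, [points[0], points[-1]])
```

The returned `labels` assign each slide to a cluster index. Relating a cluster to a tumor grade would still require the pathologists' grading, which this sketch does not model.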