A transfer learning-based system for grading breast invasive ductal carcinoma

Breast carcinoma is a type of cancer that begins in the breast. Breast cancer cells usually form a tumour that can often be seen on an X-ray or felt as a lump. Despite advances in screening, treatment, and monitoring that have improved patient survival rates, breast carcinoma is the most commonly diagnosed cancer and the second leading cause of cancer mortality among women. Invasive ductal carcinoma (IDC) is the most common breast cancer, accounting for about 80% of all diagnosed cases. Numerous studies have shown that artificial intelligence has tremendous capabilities, which is why it is used in various sectors, especially health care. Mammography is used in the initial phase of diagnosis, and finding cancer in a dense breast is challenging. Advances in deep learning, applied to such findings, help with earlier tracking and treatment. In the present work, the authors use deep learning concepts to grade breast IDC using transfer learning. Five transfer learning approaches are used, namely VGG16, VGG19, InceptionResNetV2, DenseNet121, and DenseNet201, each trained for 50 epochs on the Google Colab platform, which provides a single 12 GB NVIDIA Tesla K80 graphics processing unit (GPU) that can be used for up to 12 h continuously. The dataset used for this work can be openly accessed from http://databiox.com. The accuracies obtained are as follows: VGG16 92.5%, VGG19 89.77%, InceptionResNetV2 84.46%, DenseNet121 92.64%, and DenseNet201 85.22%. These results show that DenseNet121 gives the highest cancer-grading accuracy, whereas InceptionResNetV2 gives the lowest.


INTRODUCTION
The human body comprises trillions of cells that normally grow and divide over one's lifetime as needed. When cells become abnormal or grow old, they usually die. Cancer begins when this process breaks down: cells keep making new cells, and old or abnormal cells do not die when they should. As cancerous cells grow out of control, they can crowd out normal cells. For many people, cancer can be treated successfully, and more people than ever before live full lives after cancer treatment [1]. Breast cancer occurs predominantly in women, although men can get breast cancer too. It is important to understand that most breast lumps are benign (non-cancerous). Benign breast tumours are abnormal growths, but they do not spread outside the breast. They are not life-threatening, but some types of benign breast lumps can increase a woman's risk of getting breast cancer. Any breast lump or change should be checked by a doctor to determine whether it is benign or malignant and whether it might affect future cancer risk [2]. Breast cancer is a highly heterogeneous disease rooted in a genetic basis, influenced by external stimuli, and reflected in clinical behaviour.
The diversity of breast cancer status and the expression of surface molecules have guided treatment choices for decades; however, subtype-specific treatment often yields different responses because of varying tumour growth and malignant potential. Although the mechanisms behind breast cancer heterogeneity are unknown, available evidence suggests that studying breast cancer metabolism can provide essential insights into the causes of these variations as well as suitable targets for intervention [3]. Moreover, early and accurate diagnosis of cancer plays a significant role in choosing the proper treatment and improves patient survival rates.
Our main contributions are as follows:
• We have designed a model that classifies the grade of a patient's breast cancer by applying transfer learning (TL), and it is the first of its kind built with this dataset [4].
• Given a histopathological microscopy image as input, the system predicts the grade, which can help clinicians take action before the patient's condition worsens.
This paper is structured as follows: Section 1 gives a brief overview of cancer, then of breast cancer, and of how AI can support its diagnosis and prognosis; Section 2 presents earlier work in this area; Section 3 presents the materials, methods, and approaches used in this work; Section 4 presents the experiments with detailed results and discussion; and Section 5 concludes the paper.

RELATED WORKS
The Deep Multi-Magnification Network (DMMN) combines several layers to classify tissue as carcinoma, benign epithelial, background, stroma, necrotic, or adipose. The work in [5] builds DMMN on encoder-decoder network structures such as SegNet and U-Net and reports outstanding performance that could support patient treatment in the near future. In [6], a convolutional neural network (CNN) is deployed for breast density classification, asymmetry detection, calcification detection, and mass detection. Researchers in [7] worked with a magnetic resonance imaging (MRI) database, applying noise reduction followed by breast region segmentation, and then K-means-based lesion segmentation to refine the analysis. Lesions are characterized using features such as histogram, shape, grey-level co-occurrence matrix, grey-level run-length matrix, grey-level dependence matrix, and neighbouring grey-tone difference matrix, and are categorized as either benign or malignant (cancer). Machine learning (ML) algorithms such as support vector machine (SVM), random forest (RF), naive Bayes (NB), and k-nearest neighbour (kNN) are used for classification. Lesion classification with Fisher-score feature ranking and selection yields higher accuracy under 10-fold cross-validation than using all features. A Kaggle dataset of IDC-positive and IDC-negative images is considered in [8]. Data augmentation (DA) is applied to overcome the data imbalance, followed by Google's cloud-based AutoML model [9], with 80% of the data used for training and 10% for validation and testing; part of the data is also used as a held-out test set. Compared with the state of the art, AutoML shows a higher average accuracy of 91.6%.
Reference [10] considered the BreaKHis dataset, which holds histopathological images labelled benign or malignant, with each main category further subdivided into four subclasses. A ResNet-18 TL model is taken as the backbone and fine-tuned to achieve higher accuracy under both magnification-dependent and magnification-independent criteria. Threefold DA is applied to the 80% training set, followed by hyperparameter optimization. The authors obtained an accuracy of 98.42% in the magnification-independent case and between 98.08% and 99.25% in the magnification-dependent case. The IDC dataset is also considered in [11], with a selective TL process that prunes less critical filters during training before the learning algorithms are applied. TL approaches such as VGG19, ResNet34, and ResNet50 were used, with accuracies of 91.25%, 91.80%, and 92.07%, respectively. In [12], the IDC dataset is used with a CNN and an autoencoder model; the ridge regression (RR) approach helps yield better features, and linear discriminant analysis is applied for classification.
The Wisconsin breast cancer database is considered in [13]. Further randomizing the RF algorithm yields an extra-trees classifier, and the decision tree (DT) is the base of both algorithms. Classification proceeds in four steps: identifying the input, finding optimal trees, voting-based analysis, and the final decision of malignant or benign. Reference [14] surveyed AI applications for breast tumour detection, grading, subtyping, prediction, and treatment. IDC is the most common type of breast cancer threatening human life. In [15], whole slide histopathological images are collected, and deep learning (DL) models such as ResNet50 and DenseNet161 are trained on them; DenseNet161 achieves higher balanced accuracy, while ResNet50 achieves the best F-score.
Assessing malignancy is a cumbersome task: haematoxylin and eosin (H&E) stained images vary even at the same level of malignancy. In [16], features are grouped into nuclei, colour, and texture regions before SVM classification into four malignancy levels: normal, in situ, benign, and invasive. The work used two datasets with different numbers of images and, with 5-fold cross-validation, reached accuracies of around 75% and 61%, depending on the nature of the images.
Classification and prediction from haematoxylin and eosin histological images are still not optimal for clinical diagnosis. Events such as the grand challenge on breast cancer histology images were organized to raise awareness of breast cancer image analysis. Reference [17] reports the cumulative work of this challenge, covering data preparation, training, and testing, followed by result analysis; most of the teams used DL architectures. In [18], phylogenetic diversity indexes computed from histopathological images are used to build models that classify images as normal, in situ, benign, or invasive, using multilayer perceptron (MLP), RF, and XGBoost classifiers. Content-based image retrieval is applied to obtain the results, and unlabelled images are mapped to classes.
In [19], a 3-D CNN is used to design computer-aided detection as part of an automated breast ultrasound system; simplified 3-D VGG16 and DenseNet models and an ensemble of them are applied, and the ensemble performs well. A thorough analysis of DL use across the phases of breast cancer diagnosis and treatment is carried out in [20]. MR images can help reveal occult invasive disease, and core needle biopsy helps find ductal carcinoma in situ. In one approach, a GoogLeNet pretrained on ImageNet is used as a feature extractor, and a polynomial-kernel SVM classifier is used for learning [21]; the area under the receiver operating characteristic curve is used as the performance measure. CNN architectures such as Xception, InceptionV3, InceptionResNetV2, DenseNet121, DenseNet169, and NASNetMobile were trained for 500 epochs on maximum intensity projections of dynamic contrast-enhanced MR images; InceptionResNetV2 performs best, and the interpretations of humans and the CNN agree [22]. The DL model used in [23] is similar to DenseNet and consists of convolution, batch normalization, and pooling layers. Its 120 layers are pre-trained on 1.28 million natural images rather than ultrasonography images, and the model helps categorize sentinel and non-sentinel lymph nodes.
The work in [24] uses 335 MRIs collected from 335 patients. ResNet50 performs automatic feature extraction, and the resulting feature map is fed to a densely connected layer with five neurons for categorization into normal, IDC, malignancy, other benign, and other malignant. In [25], whole slide breast histopathology images are used, and the ultimate task is detecting and diagnosing relevant regions. A first phase progressively removes the irrelevant outside areas from the images; another CNN then categorizes the remaining regions based on ductal proliferation. Region-of-interest detection and classification, followed by visualization, provide a sound understanding of the results. Mammography screening was carried out on a cohort of women to analyze IDC in situ [26].
Breast cancer diagnosis is challenging because of numerous morphological and textural variations, and processing such images is computationally heavy. The approach proposed in [27] combines a CNN with a two-level Haar wavelet decomposition and processes the images efficiently without any degradation in performance. In addition, TL with DA achieved an accuracy of around 98% on the International Conference on Image Analysis and Recognition (ICIAR) grand challenge dataset. In [28], the input MRI was blurred and unclear, making it difficult to extract exact tumour boundaries, so enhancement processing, tissue discretization, and isolated-island elimination were applied, followed by a finite-difference time-domain model, to precisely locate early-stage breast tumours. In [29], a hybrid structure showed higher values on all performance measures.
Annotation provides more insight into images, but annotating medical images is a very difficult task. Researchers in [30] proposed a minor modification to the ResNet-18 structure so that it performs segmentation and classification simultaneously. In [31], an extensive survey of ML and DL algorithms for breast cancer classification is presented. The authors describe a general analysis architecture, applications of learning models to cancer, various image modalities, prominent medical image datasets, commonly used pre-processing methods, segmentation techniques, types of classifiers, and regularization. Their detailed discussion finds that image quality and large-scale data are key to a model's accuracy. Researchers in [32] worked with breast histopathological datasets for two-class and four-class classification in two modes: one decision per patch, and one decision for all patches of an image by majority voting. The second mode performs considerably better. Cancer proliferation is assessed from the mitotic count, but the dataset holds many more non-mitotic than mitotic nuclei, which causes class imbalance and, in turn, affects the accuracy measures of a classification model. To overcome this, [33] uses a two-phase CNN-based model with blue-ratio processing and applies DA to the successive-phase datasets to minimize the class imbalance. Early breast cancer prediction for new patients is urgently needed in health care, because the number of people affected keeps increasing due to lifestyle changes. In [34], a CNN-SVM-based hybrid threshold segmentation approach is introduced to distinguish and characterize brain tumours using the BRATS MRI dataset. In [49], the authors used TL and weakly supervised learning to develop whole slide image (WSI) classification models for IDC.
In [50], a DA technique is used to balance and expand the dataset so that the models can be trained properly, which significantly improves their predictions, and the TL technique further improves their performance. A stacking method with a gated recurrent unit (GRU) as the meta-classifier over three base classifiers is used to obtain better classification and accurate differentiation between cancerous and non-cancerous breast images.

System working
Figure 4 shows how the scheme works. For the current work, we have considered the dataset from [4], a histopathological microscopy image dataset of 922 images, from which we use the 40× magnified images. The high-power (40×) objective lens is ideal for observing fine details within a specimen: combined with a 10× eyepiece, it gives a total magnification of 400×, producing a highly detailed picture of the specimen on the slide [35]. Figures 1 to 3 show sample images from the dataset.
As shown in Figure 2, three features of the cancer cells are assessed, and each is given a score. The scores are then added to give a total between 3 and 9, which is used to assign a grade of 1, 2, or 3 that is recorded on the pathology report.
• Grade 1, or well differentiated (scores 3-5): the cells are slower-growing and look much like normal tissue.
• Grade 2, or moderately differentiated (scores 6-7): the cells grow at a moderate speed and look intermediate between grade 1 and grade 3 cells.
• Grade 3, or poorly differentiated (scores 8-9): the cells look very different from normal cells and are likely to grow and spread faster.
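The scoring rule above can be sketched as a small function. This is a minimal illustration that assumes, as in the standard Nottingham grading system, that each of the three features (tubule formation, nuclear pleomorphism, and mitotic count) is scored 1, 2, or 3. The function name `nottingham_grade` is hypothetical and is not part of the authors' pipeline, which predicts the grade directly from images.

```python
def nottingham_grade(scores):
    """Return (total, grade) for three component scores, each 1, 2 or 3."""
    if len(scores) != 3:
        raise ValueError("expected exactly three component scores")
    if any(s not in (1, 2, 3) for s in scores):
        raise ValueError("each component score must be 1, 2 or 3")
    total = sum(scores)  # always between 3 and 9
    if total <= 5:
        grade = 1  # well differentiated (3-5)
    elif total <= 7:
        grade = 2  # moderately differentiated (6-7)
    else:
        grade = 3  # poorly differentiated (8-9)
    return total, grade


print(nottingham_grade([1, 2, 2]))  # (5, 1)
print(nottingham_grade([2, 2, 3]))  # (7, 2)
print(nottingham_grade([3, 3, 2]))  # (8, 3)
```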
We first take the data from [4] and then apply five well-known TL approaches, namely VGG16, VGG19, InceptionResNetV2, DenseNet121, and DenseNet201. To train the system properly, we split the dataset in an 80:20 ratio: 80% of the dataset was used to train each TL model, and 20% was used to test it. We then evaluated the performance of each TL model with the evaluation parameters loss, accuracy, precision, AUC, and recall. Figure 4 presents the overall system functionality.
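A minimal sketch of an 80:20 random split is given below. The function `split_80_20`, the fixed seed, and the placeholder file names are illustrative assumptions; the text does not say whether the split was stratified by grade or how randomness was controlled.

```python
import random


def split_80_20(items, seed=42):
    """Shuffle a copy of `items` and return (train, test) in an 80:20 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG: reproducible, no global side effects
    cut = round(0.8 * len(shuffled))       # index where the test portion starts
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for the 40x images.
paths = [f"img_{i:03d}.png" for i in range(100)]
train, test = split_80_20(paths)
print(len(train), len(test))  # 80 20
```

Shuffling a copy with a seeded `random.Random` keeps the split reproducible without touching the global random state; a per-grade (stratified) split would be a reasonable alternative when the grades are imbalanced.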

TL
TL is the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned. The goal of TL is to improve learning in the target task by leveraging knowledge from the source task. There are three common measures by which transfer might improve learning. The first is the initial performance achievable in the target task using only the transferred knowledge, before any further learning is done, compared with the initial performance of an ignorant agent. The second is the amount of time it takes to fully learn the target task given the transferred knowledge, compared with the amount of time needed to learn it from scratch. The third is the final performance level achievable in the target task compared with the final level without transfer [36]. TL has become a sizeable subfield of DL. It is appealing conceptually, since transfer is a significant part of human learning, and practically, since it can make DL more efficient. As computing power increases and researchers apply DL to more complex problems, knowledge transfer is likely to become even more attractive. The main benefits of TL include saving resources and improving efficiency when training new models. It can also help train models when only unlabelled datasets are available, since most of the model will already be pre-trained [48].
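The three measures above can be made concrete by comparing two learning curves, one trained with transfer and one from scratch. In this sketch the time to learn is approximated as the first epoch at which a chosen threshold is reached. The curves are invented numbers, not results from this study, and `transfer_measures` is a hypothetical helper.

```python
def transfer_measures(with_tl, without_tl, threshold):
    """Compare per-epoch performance curves (higher is better).

    Returns (jumpstart, epochs_with, epochs_without, final_gap):
    - jumpstart: initial performance with transfer minus without;
    - epochs_with / epochs_without: first epoch reaching `threshold` (None if never);
    - final_gap: final performance with transfer minus without.
    """

    def epochs_to(curve):
        for epoch, value in enumerate(curve, start=1):
            if value >= threshold:
                return epoch
        return None

    jumpstart = with_tl[0] - without_tl[0]
    final_gap = with_tl[-1] - without_tl[-1]
    return jumpstart, epochs_to(with_tl), epochs_to(without_tl), final_gap


# Invented accuracy curves, for illustration only.
with_tl = [0.60, 0.75, 0.85, 0.90, 0.92]
without_tl = [0.35, 0.50, 0.65, 0.78, 0.86]
print(transfer_measures(with_tl, without_tl, threshold=0.85))
```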

VGG16
VGG16 [37] is one of the best-known pretrained models for image classification. Introduced at the ILSVRC 2014 challenge, it remains a widely used baseline today. Developed by the Visual Geometry Group at the University of Oxford, VGG16 outperformed the then-standard AlexNet and was quickly adopted by researchers and industry for image classification tasks [38]. VGG16 has 13 convolutional layers, 5 pooling layers, and 3 dense layers; it has a size of 528 MB, a top-1 accuracy of 0.713, a top-5 accuracy of 0.901, a depth of 23, and a total of 14,789,955 parameters, of which 75,267 are trainable and 14,714,688 are non-trainable [39]. Figure 5 shows the model. The output size of a convolutional layer is given by Equation (1):

$$\text{Output} = \frac{\text{input} - \text{kernel\_size} + 2 \times \text{padding}}{\text{stride}} + 1 \qquad (1)$$
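Equation (1) can be checked against VGG16's shapes with a short sketch. It assumes 224 × 224 inputs, the standard VGG16 layout (3 × 3 convolutions with padding 1 and stride 1, and five 2 × 2 max-pooling layers with stride 2), and floor division, which is how frameworks round when the stride does not divide evenly. The trainable count of 75,267 quoted above is consistent with a frozen convolutional base followed by a Flatten layer and a single three-unit dense layer (one unit per grade); that head configuration is our inference from the numbers, not something the text states.

```python
def conv_output_size(inp, kernel_size, padding, stride):
    """Equation (1): output width/height of a convolution or pooling layer."""
    return (inp - kernel_size + 2 * padding) // stride + 1  # floor division


size = 224  # assumed input resolution (224 x 224)

# VGG16 convolutions: 3x3 kernel, padding 1, stride 1 keep the size unchanged.
assert conv_output_size(size, kernel_size=3, padding=1, stride=1) == size

# Each of the five 2x2 max-pooling layers (stride 2, no padding) halves the size.
for _ in range(5):
    size = conv_output_size(size, kernel_size=2, padding=0, stride=2)
print(size)  # 7

# Assumed head: Flatten the frozen 7x7x512 feature map, then Dense(3).
flat_features = size * size * 512
head_params = flat_features * 3 + 3  # weights + one bias per grade
print(head_params)  # 75267
print(head_params + 14_714_688)  # 14789955, the stated total
```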

InceptionResNetV2
Very deep CNNs have recently been central to advances in image recognition performance. The Inception architecture has been shown to perform well at relatively low computational cost. Inception-v4 is an improved successor of Inception-v3, and the Inception-ResNet variants combine Inception modules with residual connections. In the Inception-ResNet models, batch normalization is applied only on top of the conventional convolutional layers, not on top of the residual summations; the memory this saves allows the number of Inception blocks to be increased [41]. Inception-ResNet-v2 has a computational cost similar to that of Inception-v4, while Inception-ResNet-v1 and Inception-ResNet-v2 use different stems [42]. InceptionResNetV2 has a size of 215 MB, a top-1 accuracy of 0.803, a top-5 accuracy of 0.953, a depth of 572, and a total of 54,451,939 parameters, of which 115,203 are trainable and 54,336,736 are non-trainable [39]. In ResNet, when dimensions differ, the identity shortcut is replaced by a linear projection that increases the number of shortcut channels to match the residual. This allows the input c and F(c) to be combined as the input to the next layer, as depicted in Figure 7. Equation (3) is used when F(c) and c have different dimensionality, for instance, 30 × 30 and 35 × 35. This P i term
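The shortcut described above can be illustrated with the ResNet formulation y = F(c) + c for an identity shortcut and y = F(c) + W_s c when a projection matrix W_s is needed to match dimensions. The sketch works on a single channel vector, which is what a 1 × 1 convolution does at each spatial position. The residual function `toy_f` and the projection matrix `W_s` are hypothetical toy values, and we do not claim this is the exact form of Equation (3).

```python
def matvec(w, x):
    """Multiply a matrix `w` (list of rows) by a vector `x`."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def residual_output(c, residual_fn, w_s=None):
    """y = F(c) + c (identity shortcut) or y = F(c) + W_s c (projection shortcut)."""
    f = residual_fn(c)
    shortcut = list(c) if w_s is None else matvec(w_s, c)
    if len(shortcut) != len(f):
        raise ValueError("F(c) and the shortcut differ in size; pass a projection w_s")
    return [a + b for a, b in zip(f, shortcut)]


# Hypothetical residual branch that maps 2 input channels to 3 output channels.
def toy_f(c):
    return [c[0], c[1], c[0] + c[1]]


# Hypothetical 3x2 projection that lifts the 2-channel shortcut to 3 channels.
W_s = [[1.0, 0.0],
       [0.0, 1.0],
       [0.5, 0.5]]

print(residual_output([1.0, 2.0], toy_f, W_s))  # [2.0, 4.0, 4.5]
```

Without `W_s`, the same call raises an error, which mirrors why the projection is needed only when F(c) and c differ in dimensionality.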

DenseNet121
DenseNets [43] are the next step in continuing to increase the depth of deep CNNs. Instead of drawing representational power from extremely deep or wide architectures, DenseNets exploit the network's potential through feature reuse. DenseNets require fewer parameters than an equivalent conventional CNN, as there is no need to learn redundant feature maps. DenseNet layers are very narrow (e.g. 12 filters), and each adds only a small set of new feature maps [44]. DenseNet121 has a size of 33 MB, a top-1 accuracy of 0.750, a top-5 accuracy of 0.923, a depth of 121, and a total of 7,188,035 parameters, of which 150,531 are trainable and 7,037,504 are non-trainable [39]. Figure 8 illustrates the model. The number of filters changes between the dense blocks, increasing the channel dimension. The growth rate (G) determines how many feature maps each layer, up to the mth layer, contributes, and thus controls the amount of new information added at each layer, as presented in (4).
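The effect of the growth rate can be illustrated with the relation from [43]: the mth layer in a dense block receives k_0 + G × (m - 1) input channels, so a block of n layers outputs k_0 + G × n channels. The sketch assumes the standard ImageNet DenseNet-BC settings (k_0 = 64, G = 32, compression 0.5 at each transition) and block sizes of 6, 12, 24, 16 for DenseNet121 and 6, 12, 48, 32 for DenseNet201; `densenet_channels` is a hypothetical helper. The final 1,024 channels of DenseNet121 on a 7 × 7 map also make the trainable count of 150,531 consistent with a Flatten plus three-unit dense head, which is our inference rather than a stated detail.

```python
def densenet_channels(block_sizes, k0=64, growth_rate=32, compression=0.5):
    """Channel count at the output of each dense block.

    Inside a block, layer m receives k0 + G * (m - 1) channels, so a block of
    n layers ends with k0 + G * n channels. A transition layer between blocks
    then multiplies the channel count by `compression`.
    """
    channels = k0
    outputs = []
    for i, n_layers in enumerate(block_sizes):
        channels += growth_rate * n_layers  # concatenated feature maps
        outputs.append(channels)
        if i < len(block_sizes) - 1:  # no transition after the last block
            channels = int(channels * compression)
    return outputs


print(densenet_channels([6, 12, 24, 16]))  # DenseNet121: [256, 512, 1024, 1024]
print(densenet_channels([6, 12, 48, 32]))  # DenseNet201: [256, 512, 1792, 1920]

# Assumed head on DenseNet121: Flatten the 7x7x1024 map, then Dense(3).
print(7 * 7 * 1024 * 3 + 3)  # 150531, matching the trainable count above
```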

DenseNet201
DenseNet201 is another member of the DenseNet family [43] of models for image classification. It differs from DenseNet121 mainly in depth (201 versus 121 layers), and hence in size and accuracy. The models were originally trained in Torch, and their creators converted them to the Caffe format. Every DenseNet model has been pretrained on the ImageNet image dataset [45]. Figure 9 illustrates the model.

RESULT AND DISCUSSION
Performance evaluation

Accuracy
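Accuracy is the proportion of correct classifications among all classifications made:

$$\text{Accuracy} = \frac{\text{number of correct classifications}}{\text{total number of classifications}}$$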

Recall
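Recall is the ratio of true positives to all actual positives in the ground truth:

$$\text{Recall} = \frac{TP}{TP + FN}$$

where TP and FN are the numbers of true positives and false negatives.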
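Accuracy, precision, and recall can all be read off a confusion matrix such as the one in Table 1. The sketch below computes them for any square matrix; the 3 × 3 matrix is hypothetical and does not reproduce Table 1. Precision, which the analysis reports but which is not defined above, is computed as TP / (TP + FP), the column-wise counterpart of recall.

```python
def metrics_from_confusion(cm):
    """Accuracy and per-class precision/recall from a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total  # correct / all
    precision, recall = [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum = TP + FP
        actual_k = sum(cm[k])                          # row sum = TP + FN
        precision.append(tp / predicted_k if predicted_k else 0.0)
        recall.append(tp / actual_k if actual_k else 0.0)
    return accuracy, precision, recall


# Hypothetical counts for grades 1-3 (rows: true grade, columns: predicted grade).
cm = [[18, 2, 0],
      [1, 25, 4],
      [0, 3, 22]]
accuracy, precision, recall = metrics_from_confusion(cm)
print(round(accuracy, 3))  # 0.867 (65 correct out of 75)
print([round(p, 3) for p in precision])
print([round(r, 3) for r in recall])
```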

Loss
The loss function is the objective function used to evaluate a candidate solution, that is, a particular set of model weights; lower values indicate a better fit [46].
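The text does not name the loss function. For a three-class softmax classifier, categorical cross-entropy is the usual choice, so the sketch below assumes it; `categorical_cross_entropy` is a hypothetical helper written from the standard definition, not taken from the authors' code.

```python
import math


def categorical_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean categorical cross-entropy for one-hot labels and predicted probabilities."""
    total = 0.0
    for t_row, p_row in zip(y_true, y_prob):
        # Only the true class (t = 1) contributes; eps guards against log(0).
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(y_true)


# Two hypothetical samples, one-hot over grades 1-3.
y_true = [[1, 0, 0], [0, 1, 0]]
y_prob = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]]
print(categorical_cross_entropy(y_true, y_prob))  # about 0.29
```

A perfectly confident correct prediction contributes zero loss, and the loss grows without bound as the probability assigned to the true grade approaches zero, which is why lower values indicate a better model.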

AUC
AUC (the area under the ROC curve) measures a classifier's ability to distinguish between classes and is used as a summary of the ROC curve. The higher the AUC, the better the model distinguishes between the positive and negative classes [47].
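One way to compute AUC without plotting the ROC curve is its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below uses that definition for a binary problem; for the three grades, one would apply it one-vs-rest per grade, and how the reported AUC was averaged across grades is not stated in the text. The helper name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """Binary ROC AUC via its rank interpretation.

    Returns the probability that a randomly chosen positive (label 1) scores
    higher than a randomly chosen negative (label 0); ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical one-vs-rest example: label 1 = "grade 3", 0 = any other grade.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise loop costs O(P × N) comparisons, which is fine for illustration; library implementations sort the scores instead.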

Analysis
In the present work, we have used five TL approaches, namely VGG16, VGG19, InceptionResNetV2, DenseNet121, and DenseNet201, each trained for 50 epochs on the Google Colab platform, which provides a single 12 GB NVIDIA Tesla K80 GPU that can be used for up to 12 h continuously. Although Colab can execute in GPU mode, factors such as network speed and dataset size affect how long the process takes to complete. Figure 10 plots the training accuracy of each TL model over the epochs, and Figure 11 plots the precision. Up to the 20th epoch, all the TL models show similar precision. From the 30th epoch, VGG16's precision rises steeply but inconsistently.
Around the 50th epoch, InceptionResNetV2 has lower precision than the other approaches, and towards the end, VGG16 shows higher precision.
Figure 12 plots the recall of each TL model over the epochs. Up to the 20th epoch, all the TL models show similar recall.
At the 30th epoch, DenseNet121's recall drops sharply, while the others follow a similar pattern. At the 40th epoch, however, DenseNet201's recall suddenly drops further.
Around the 50th epoch, all models except InceptionResNetV2 and DenseNet201 achieve good recall values. In comparison, DenseNet121 performs consistently throughout training. From the recall perspective too, DenseNet121 is the best suited to the breast cancer grading (BCG) system.
Figure 13 plots the training loss of each TL model over the epochs. Up to the 20th epoch, all the TL models show a similar loss trend.
After that, the loss values decline gradually and steadily; the lower the loss, the better the model. At the 30th epoch, InceptionResNetV2 shows a higher loss, which suggests it is not a good model. At the 40th epoch, however, DenseNet201's loss suddenly rises and keeps increasing. Around the 50th epoch, all models except DenseNet201 and InceptionResNetV2 reach a minimal loss.
In comparison, DenseNet121's loss decreases consistently throughout training and reaches a minimal value. From the loss perspective too, DenseNet121 is well suited to the BCG system.
Figure 14 plots the AUC of each TL model over the epochs. Up to the 20th epoch, all TL models except InceptionResNetV2 show similar AUC.
At the 40th epoch, however, the AUC of VGG19 and VGG16 suddenly drops and then rises again. Around the 50th epoch, all models except DenseNet201 achieve good AUC.
In comparison, VGG16 performs consistently throughout training and has a better AUC; the higher the AUC, the better the system. For the BCG system, all models except DenseNet201 show a positive response.
DenseNet121 performs well on every measure across the full analysis; its confusion matrix is given in Table 1.

CONCLUSION
There are three grades of breast IDC, namely Grade 1 (low), Grade 2 (moderate), and Grade 3 (high). Grade 1 cancer cells look similar to normal cells, usually grow slowly, and are less likely to spread. Grade 2 cancer cells look more abnormal and grow somewhat faster than Grade 1 cells. Grade 3 cancer cells look very different from normal cells and may grow faster than Grade 1 or Grade 2 cells. In the present work, we have used DL concepts to grade breast IDC using TL. The grading accuracies obtained with the TL approaches, in descending order, are: DenseNet121 with 92.64% > VGG16 with 92.5% > VGG19 with 89.77% > DenseNet201 with 85.22% > InceptionResNetV2 with 84.46%. Across the entire analysis, DenseNet121 shows remarkable performance on all measures: its accuracy is the highest, its loss is minimal, and its high recall indicates a low false-negative rate. High scores on both precision and recall show that the system's positive predictions are mostly correct and that it finds most of the true positive cases. In future work, we will try to optimize the process with bio-inspired methods for better results. We also plan to consider the patient's full historical data so that richer inferences can be drawn.

CONFLICT OF INTEREST
There are no conflicts of interest among the authors.

FUNDING INFORMATION
None.

DATA AVAILABILITY STATEMENT
No new datasets were generated during the current study; the dataset analysed is openly accessible from http://databiox.com [4].