Biopsy quality is essential for preoperative prognostication in oral tongue cancer

A role for incisional biopsy in preoperative prognostication is increasingly being advocated in oral tongue squamous cell carcinomas (OTSCC). Biopsies at two locations were compared, and prognostic factors in biopsies and their corresponding resections were evaluated. A total of 138 OTSCC biopsy slides from Finland and Saudi Arabia were compared for size (horizontal and vertical) and invasive front. The Finnish cases were assessed for tumor stroma ratio (TSR) and tumor‐infiltrating lymphocytes (TILs) using light microscopy and digital image analysis assessment and compared. Furthermore, TSR, TILs, and previously analyzed budding and depth of invasion (BD) score in biopsies were compared with their evaluation in the corresponding resections. Fifty‐nine percent of Finnish and 42% of Saudi Arabian biopsies were ≥ 5 mm deep, while 98% of Saudi Arabian and 76% of Finnish biopsies were ≥ 5 mm wide. Assessment of invasion front was possible in 72% of Finnish in comparison with 40% of Saudi Arabian biopsies. There was 86.8% agreement between TSR and 75% agreement between TIL evaluation using light microscopy and digital assessment. Significant agreement was obtained on comparing the TSR (p = 0.04) and BD (p < 0.001) values in biopsies and resections. Biopsies of ≥ 5 mm depth from representative OTSCC areas are essential for prognostic information. Clinical pathologists are advised to assess BD score and TSR for prognostic features in such biopsies.


INTRODUCTION
Squamous cell carcinoma of the oral (mobile) tongue (OTSCC) contributes the highest number of cases of oral squamous cell carcinoma (OSCC) and is associated with the lowest relative survival of the different subsites within the oral cavity (1,2).
Incisional biopsy remains an indispensable diagnostic tool for oral cancers. Cancer diagnosis is mandatorily followed by preoperative staging (usually by clinical and imaging modalities) in order to determine the risk category and the most appropriate treatment plan for the patient. The need for extending the role of the biopsy from being only diagnostic to also include prognostication has recently been highlighted by some investigators (3,4). In OSCC (and OTSCC), tumor grade of differentiation has not been found to be particularly useful in prognostication despite being the only universal routinely assessed histological feature in preoperative biopsies by pathologists (5)(6)(7).
In resection samples, several tumor-related histological features such as depth of invasion (DOI), worst pattern of invasion (WPOI), tumor budding, and budding and depth of invasion (BD) score, as well as stromal features such as tumor stroma ratio (TSR) and tumor-infiltrating lymphocytes (TILs), have been identified in several studies as having significant prognostic potential in OTSCC (8)(9)(10)(11). Notably, the latest AJCC/UICC TNM classification of head and neck tumors has significantly redefined the T staging, making DOI an integral part of the pathologic TNM classification, with every 5 mm increment in depth upstaging the tumor to the next T category (T1 to T2 at >5 mm; T2 to T3 at >10 mm). Therefore, DOI is now as important as the horizontal dimension of the primary tumor in the staging of T1-T3 tumors (12).
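The DOI cut-offs quoted above (>5 mm and >10 mm) can be illustrated with a minimal sketch. It encodes only the simplified rule as stated in this paragraph, assuming the size-based T category is already known; the function names are hypothetical, and the full staging criteria in the AJCC/UICC manual contain further conditions that are not modeled here.

```python
def t_category_from_doi(doi_mm: float) -> int:
    """Minimum T category implied by DOI alone (simplified illustration)."""
    if doi_mm < 0:
        raise ValueError("DOI cannot be negative")
    if doi_mm > 10:
        return 3  # DOI > 10 mm: at least T3
    if doi_mm > 5:
        return 2  # DOI > 5 mm and <= 10 mm: at least T2
    return 1      # DOI <= 5 mm: no upstaging from DOI


def combined_t_category(size_based_t: int, doi_mm: float) -> int:
    """Take the higher of the size-based and the DOI-based category (T1-T3 only)."""
    if size_based_t not in (1, 2, 3):
        raise ValueError("this sketch only covers T1-T3")
    return max(size_based_t, t_category_from_doi(doi_mm))
```

For example, `combined_t_category(1, 6.0)` returns 2: a tumor that is T1 by surface size is upstaged once its DOI exceeds 5 mm. This is an illustration of the text, not a staging tool.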
In order to minimize cases of undertreatment or unnecessary escalation in management, particularly regarding clinically negative neck (cN0) in early stage lesions, deeper biopsies are needed. They are necessary to explore as many histological prognostic details as possible before decisions are made regarding treatment. The assessment of several tumor and stromal prognostic features from an adequate biopsy coupled with adequate tumor imaging will result in a better characterized lesion and strongly enhance the probability that the best management method will be planned up-front (3).
This study aimed to answer three questions: (i) Do OTSCC biopsies obtained from two different locations (Finland and Saudi Arabia) show similarities in width and depth? (ii) In biopsies with optimal dimensions, can stromal features (namely TSR and TILs) be assessed similarly by light microscopy and by digital assessment (machine learning applied to whole slide imaging, WSI), and how well do the two methods agree? (iii) How do these stromal features, as well as our previously published budding (B), depth of invasion (D), and combined (BD) scores from the same biopsies, compare with those obtained from the corresponding resection specimens?

METHODS
The study was approved by the local institutional review boards of the three participating institutions: Universities of Helsinki, Eastern Finland and King Saud, Saudi Arabia.
Histological slides of biopsies of OTSCC cases were randomly retrieved from the archives of two Finnish university hospitals, Helsinki (HUH) and Kuopio (KUH), and from the King Saud University Oral Histopathology Laboratory (KSU-OHL) in Saudi Arabia. Ninety-three cases were obtained from Finland (HUH, 74 cases; KUH, 19 cases), while KSU-OHL contributed 45 cases, making 138 cases in total available for comparison. The inclusion criteria were a histological diagnosis of SCC of the mobile tongue (irrespective of clinical staging) and the presence of tumor tissue in the patient slides.

Biopsy dimensions measurement
The widest biopsy dimensions were measured in horizontal and vertical planes. A biopsy of at least 5 × 5 mm after tissue processing was considered optimal, including biopsies with tumor extending beyond their margins (Fig. 1).
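The adequacy criterion above can be expressed as a short sketch. The 5 mm cut-off in both planes comes from the text; the class name, function names, and descriptive labels are hypothetical and serve only to illustrate the rule.

```python
from dataclasses import dataclass

OPTIMAL_MM = 5.0  # minimum width and depth after tissue processing (from the text)


@dataclass(frozen=True)
class BiopsyDimensions:
    width_mm: float  # widest horizontal dimension
    depth_mm: float  # widest vertical dimension


def is_optimal(biopsy: BiopsyDimensions) -> bool:
    """A biopsy is optimal when it is at least 5 mm in both planes."""
    return biopsy.width_mm >= OPTIMAL_MM and biopsy.depth_mm >= OPTIMAL_MM


def describe(biopsy: BiopsyDimensions) -> str:
    """Illustrative label; the wording of the labels is an assumption."""
    wide = biopsy.width_mm >= OPTIMAL_MM
    deep = biopsy.depth_mm >= OPTIMAL_MM
    if wide and deep:
        return "optimal"
    if wide:
        return "wide but shallow"
    if deep:
        return "deep but narrow"
    return "small in both planes"
```

A biopsy measuring 8 mm wide and 3 mm deep would be labeled "wide but shallow" by this sketch.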

Cancer tissue analyses
The DOI and carcinoma invasive front were evaluated, where possible, for all cases. DOI was defined as a vertical line dropped from just underneath the epithelium (or its representation) to the end of the most invasive part of the main tumor. Invasive front was taken at the advancing front (deepest point of invasion) of the tumor in the connective tissue. Comparisons were made between the Finnish and Saudi Arabian cases. In addition, previously published data of the BD score of the same biopsies and corresponding resections by our group (for the Finnish cases) were retrieved and included (4).

Stromal features
For the Finnish cases, two stromal features, TSR and TILs, were evaluated by light microscope and then digitally using WSI as described below (Fig. 2). The technique for assessing TSR has been previously well described (10,13): The area with the highest amount of stroma, with tumor islands present at all borders, was selected under low magnification (×4) and thereafter scored at higher magnification (×10). The percentage of stroma relative to carcinoma tissue was then evaluated as the TSR score. TILs were evaluated at the invasive front, as also previously described, as the percentage of stroma occupied by lymphocytes. Scanning was done at low magnification (×4), and the TILs were assessed under high magnification (×20) (11). Based on previous studies, TSR and TILs were divided into high and low categories using 50% and 20% as the thresholds, respectively (10,11). Light microscopic and digital assessment measurements were compared. The same process was applied to the resection slides using light microscopy only. TSR and TILs values from the biopsies were then compared with the corresponding values from the resection sections. Only 42 resection histological slides were available, corresponding to the 53 biopsies that were initially analyzed.
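The dichotomization described above can be sketched as follows. The 50% and 20% thresholds are from the text; treating a value exactly at the threshold as "high" is an assumption made here, as are the function names.

```python
TSR_THRESHOLD = 50.0   # percent stroma (from the text)
TILS_THRESHOLD = 20.0  # percent of stroma occupied by lymphocytes (from the text)


def categorize(percent: float, threshold: float) -> str:
    """Return 'high' or 'low'; values at the threshold count as 'high' (assumption)."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    return "high" if percent >= threshold else "low"


def categorize_case(tsr_percent: float, tils_percent: float) -> dict:
    """Dichotomize both stromal features for one slide."""
    return {
        "TSR": categorize(tsr_percent, TSR_THRESHOLD),
        "TILs": categorize(tils_percent, TILS_THRESHOLD),
    }
```

With these assumptions, a slide scored at 60% stroma and 10% lymphocytes would be categorized as TSR high and TILs low.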

Digital assessment of stromal features
For digital computational assessment, the slides were scanned digitally (WSI) and analyzed using QuPath (14).
QuPath is an open-source machine learning-based software for digital pathology image analysis. QuPath's ability to classify cell types within each tissue was applied to distinguish between cancer cells and stromal TILs (Fig. 2). The performance of a section image analysis was measured by applying cell segmentation to entire tumor images using the cell detection algorithm in QuPath with default settings. Tumor areas and stromal TILs were separated from the stroma by manually drawing lines with the computer mouse on the images. Choice of assessment areas was made using the same principles as used with light microscopy. Cellular detection settings were selected under consultation with a pathologist and the evaluation carried out by a computer scientist (image analyst).

Statistical analysis
Frequency tables were made for comparison of biopsy quality between Finnish and Saudi Arabian cases. The chi-square test or Fisher's exact test was used for evaluating statistical significance. Additionally, the mean age of the patients was compared using the independent samples t-test. Cross-tabulation of light microscopy versus digital assessment of TSR and TILs was performed. Agreement between the measurements was evaluated using percentage agreement, Kappa, and AC1 statistics (15). Accuracy, sensitivity, and specificity were calculated between stromal features in biopsies and resections, and the chi-square test was used for evaluation of significance. Analyses were done using IBM SPSS version 23 and Matlab R 2016.
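The three agreement measures named above can be computed from a 2 × 2 cross-tabulation. The sketch below uses the standard two-rater, two-category formulas for Cohen's kappa and Gwet's AC1; it is not the authors' SPSS or Matlab code, and the function name and argument layout are hypothetical.

```python
def agreement_stats(a: int, b: int, c: int, d: int) -> dict:
    """Agreement between two methods on a high/low rating.

    a: both methods 'high'           b: method 1 'high', method 2 'low'
    c: method 1 'low', method 2 'high'   d: both methods 'low'
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty table")
    p_obs = (a + d) / n  # percentage agreement, as a proportion

    # Cohen's kappa: chance agreement from the product of the marginals.
    p_high_1, p_high_2 = (a + b) / n, (a + c) / n
    pe_kappa = p_high_1 * p_high_2 + (1 - p_high_1) * (1 - p_high_2)
    kappa = float("nan") if pe_kappa == 1 else (p_obs - pe_kappa) / (1 - pe_kappa)

    # Gwet's AC1: chance agreement from the mean prevalence of 'high'.
    pi_high = (p_high_1 + p_high_2) / 2
    pe_ac1 = 2 * pi_high * (1 - pi_high)
    ac1 = (p_obs - pe_ac1) / (1 - pe_ac1)

    return {"percent_agreement": 100 * p_obs, "kappa": kappa, "ac1": ac1}
```

Calling `agreement_stats(15, 3, 2, 20)` on made-up counts (not the study data) gives 87.5% agreement. AC1 is often reported alongside kappa because kappa can be depressed when one category is much more common than the other.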

RESULTS
Demographic and biopsy characteristics
The Saudi Arabian patients were significantly younger than those of Finland with their mean age about 7 years lower than the latter (p = 0.008, 95% CI: 1.84-11.7). Two-thirds of Saudi Arabian patients were male, while no sex predilection was observed in Finnish patients (Table 1). No significant difference was observed regarding the WHO histological grading between the two countries.

Biopsy dimensions and invasive front
The vertical dimension of the biopsies was adequate (≥ 5 mm) in 74 cases (54%) and < 5 mm in 64 cases (46%) (Table 1, Fig. 1). Fifty-nine percent of biopsies from Finland and 42% of those from Saudi Arabia (KSA) were ≥ 5 mm deep. Most cases (84%) from both centers were 'optimal' in horizontal dimension. Clinicians from Saudi Arabia appeared to take wide biopsies (≥ 5 mm) more often than those from Finland (98% KSA vs 76% Finnish). The invasive front was microscopically demonstrable in 62% of all the biopsy samples and was more often identified in Finnish biopsies than in those from Saudi Arabia (72% Finnish vs 40% KSA). The relationship between the invasive front and the DOI is shown in Fig. 3. An optimal biopsy did not automatically have an assessable DOI. Some biopsies with depth < 5 mm had superficially invasive cancer from which the invasive front and stromal features could be adequately analyzed. Many others with vertical length > 5 mm did not have a DOI or invasive front that could be assessed, owing to tumor tissue extending beyond the biopsy margins (Fig. 3).

Stromal features
Of the 93 tumors assessed for TSR and TILs, only 53 biopsies were suitable for assessment by light microscopy. One of these cases could not be analyzed digitally, leaving 52 cases available for the latter (Fig. 2). Agreement between light microscopy and digital measurements was evaluated using Cohen's kappa coefficient and was higher for TSR (kappa value, 0.70) than for TILs (kappa value, 0.52). The AC1 values were better than or comparable to the kappa values. The percentage agreement (87% for TSR and 75% for TILs) illustrates substantial agreement between the two methods (Table 2). A notable issue with digital analysis is the inclusion of unwanted cells (e.g., degenerated muscle cells counted as tumor cells, lymphocytes counted as tumor cells, or some tumor cell nuclei counted as lymphocytes) or the exclusion of relevant cells (e.g., some lymphocytes in TILs evaluation) (Fig. 2).

Comparison of stromal features in biopsies and resection specimens
For the 42 cases that had both biopsy and resection specimens available, a good level of agreement was observed for TSR in biopsies compared with resections (71% accuracy, p = 0.04). For TILs, agreement between biopsies and resections was low (64% accuracy, p = 0.116) (Table 3).
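The accuracy figures above come from comparing the dichotomized biopsy score with the resection score for each case. A minimal sketch of that calculation follows, assuming the resection is treated as the reference and "high" as the positive class; both choices, and the function name, are assumptions for illustration rather than details confirmed by the text.

```python
import math


def diagnostic_metrics(biopsy, resection, positive="high"):
    """Accuracy, sensitivity, and specificity of biopsy labels against resection labels."""
    if len(biopsy) != len(resection) or not biopsy:
        raise ValueError("need two non-empty, equally long label lists")
    pairs = list(zip(biopsy, resection))
    tp = sum(b == positive and r == positive for b, r in pairs)
    tn = sum(b != positive and r != positive for b, r in pairs)
    fp = sum(b == positive and r != positive for b, r in pairs)
    fn = sum(b != positive and r == positive for b, r in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Undefined ratios (no positive or no negative references) are reported as NaN.
        "sensitivity": tp / (tp + fn) if tp + fn else math.nan,
        "specificity": tn / (tn + fp) if tn + fp else math.nan,
    }
```

As a usage example with made-up labels, biopsies `["high", "high", "low", "low", "low"]` against resections `["high", "low", "low", "low", "high"]` agree in three of five cases.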

Combination of carcinoma and stromal features in biopsies and resections
Using the same biopsy and resection samples, we have previously shown that BD score (which combines the depth of invasion and the budding of the cancer cells) has a good agreement between adequate biopsies and their resections (83%, p < 0.001) (4). As shown in Table 4, this provides strong evidence for advising clinical pathologists to evaluate BD score (4) and TSR (present study) in biopsies.

DISCUSSION
The primary role of OSCC biopsy is diagnostic, but there is increasing advocacy for a prognostic role as well (3,4,16,17). Only optimal biopsies (≥ 5 mm deep after tissue processing) can serve this role. Biopsies close to 10 mm in depth have been suggested to accurately predict the true DOI in at least 80% of OSCC patients, although surgeons consistently take biopsies < 5 mm in depth irrespective of their experience, tumor accessibility, and size (3). Despite good accessibility, lack of obstructive structures (e.g., bone), and minimal risk of damage to important structures, the mean depth of OTSCC biopsies was still < 5 mm (3). Many biopsies in the present study were sufficiently wide but relatively shallow, especially those from Saudi Arabia (KSA). DOI is now a clearly defined parameter in the pathologic T staging of OSCC by AJCC/UICC. Clinicians have even advocated estimating DOI in OSCC using bimanual palpation for clinical staging (18). Stromal features like TSR and TILs are better assessed in deeper biopsies, where the invasive front, including the growth pattern of the cancer cells, may be more clearly identified. A shallow biopsy may only be useful for the estimation of these features in superficially invasive tumors (Fig. 3).
In this study, only 62% of biopsies had a determinable invasive front. There was a clear contrast between the Finnish and Saudi Arabian biopsies, with the invasive front observed far more often in the Finnish samples. Since the clinical staging of the KSA tumors was not available for this study, it is plausible that the KSA tumors comprised more advanced cases in which the invasive front is more difficult to obtain. Notably, most of the Finnish tumors (73%) were low-stage. The value of imaging methods in evaluating the DOI has been reported. However, imaging methods may overestimate the DOI by as much as 3 mm when compared with anatomic pathological assessment (19,20). A combination of clinical estimation and preoperative imaging evaluation of DOI, supplemented by biopsy evaluation, will greatly increase the accuracy of tumor staging. In some cases, however, it will still not be possible to measure DOI from biopsies because of problems such as tissue fragmentation, a distorted plane of sectioning, and samples with only epithelial components and no connective tissue interface. Even in resection samples, technical difficulties that may hamper the precise measurement of DOI have been highlighted in some studies (21,22). Dhanda et al. (3) have suggested that standardizing all oral cancer biopsies on the punch technique is likely to increase the accuracy of the assessment of stromal to epithelial ratio, in addition to the punch being a more proficient technique than the scalpel biopsy. All the biopsies assessed here were taken by scalpel blades, mostly in a wedge form with the surface forming the base of a triangle or the larger end of a trapezium. This method often leaves a very small area for interface analysis if the biopsy is taken from the central parts of the tumor.
To reduce this problem, the general advice is to take the biopsy at the periphery of the lesion, especially in verrucopapillary lesions, alongside an adjacent 'normal' tissue. However, taking more than one specimen may seem preferable, including the most severe (avoiding necrosis), internal regions with the deepest cancer growth (23,24). In practice, since biopsies shrink after fixation, it is advisable to make the depth of the biopsy slightly more than the required 5 mm. Some of these recommendations are illustrated in Fig. 4.
The possibility of assessing stromal histological prognostic features of OTSCC (e.g., TSR, TILs) offers an intriguing prospect for its preoperative staging and management planning (3). The prognostic potential of these features in OTSCC has been well documented (10,11). In this study, 43% of cases included in the evaluation of stromal features were found not to be suitable for such an assessment. Many more cases would have been suitable if a biopsy technique that reliably samples both epithelial and stromal tissue had been used (3). Many clinicians still rely on scalpel biopsies, and the only procedural modification needed may be to make deeper incisions (24). An important finding in this study is that digital analysis can be used for assessing the stromal features in biopsies and that its agreement with light microscopic assessment is considerable. In the biopsies, percentage agreement for the stromal features was between 75% and 87%. All the digital measurements for this study were made by an image analyst with no prior pathology training. A pathologist was only on hand to advise on where measurements should be made. It could be argued that if a pathologist proficient in digital measurement had made the measurements, the agreement rate could have been much higher. Many histopathology laboratories now have access to digital pathology, which could ultimately be integrated into the histopathology workflow rather than being limited to research, teaching, and external quality assurance practice (25). Ultimately, using QuPath for stromal assessment could improve preoperative tumor staging and risk stratification, helping to ensure that the most appropriate treatment is rendered and obviating the need for several unplanned interventions thereafter.
The problem of lack of complete concordance between digital diagnosis and light microscopy has been highlighted in some recent systematic reviews of the subject (26,27). Problems identified with digital microscopic diagnosis include limited image resolution, difficulties in identifying cell types and cellular structures, poor biopsy quality, lack of immunohistochemistry and special stains, poor assessment of nuclear atypia, grading of dysplasia and malignancy, and poor identification of small objects (26,27). In essence, digital microscopy has yet to reach the level presently attainable with light microscopy. Interestingly, in OSCC resections, a recent study proposed a relatively objective digital evaluation of TILs abundance (TILAb) in WSI by segmentation into tissue types (e.g., tumor and lymphocytes) and then using a deep convolutional neural network and binary classifier of tumor lymphocyte colocalization to estimate TILAb. TILAb was found to be a strong prognostic indicator of disease-free survival and better than manually assessed TILAb (28). Digital tumor staging and risk assessment is a more advanced procedure (despite the issues raised above being carried over into it) than digital diagnostics using WSI. Digital measurements as used in this work therefore need to be further refined to eliminate unwanted cells and include relevant cells that were excluded.
Fig. 4 (partial caption). [...] It is recommended that at least one should be taken with the tumor border. However, the diameter of 'normal' tissue must not exceed that of the tumor tissue while doing this. (C) Small tissue can be excised but the depth must be at least 5 mm. In general, the same rules apply in scalpel biopsies as in punch biopsies. (D) In verrucopapillary or exophytic lesions, biopsies are better taken at the periphery of the tumor tissue so that the stromal (connective) tissue is included.
Additionally, immunohistochemical staining as an adjunct may be very helpful in enabling QuPath to identify tumor cells and lymphocytes more efficiently when assessing TSR and TILs. It is a limitation of this study that the issues currently raised with digital microscopy could not be sufficiently addressed, and this may have contributed to the discordance noted between light microscopic and digital measurements.
Regarding the comparison between biopsies and resections, only TSR showed significant agreement. The explanation is that TSR measurement is based on finding a single spot on the tumor slide at which the measurement is taken. Conversely, TILs measurement depends on scanning the whole of the slide's tumor invasive front to arrive at an aggregated score. Pathologists are generally familiar with variations in lymphocytic host response in different parts of a resection. TILs in biopsies may therefore not be truly representative of the real picture; they provide only a snapshot of the TILs population in the area of the tumor from which the biopsy was taken. Similarly, owing to the lack of uniformity across the tumor, discordant evaluation of the WHO histological grade in biopsies and resections has been previously reported (5). In general, a high TILs score in a biopsy indicates that the score will be high in the resection, while a low value in a biopsy does not always imply a low value in the resection. TSR appears not to be affected by the issues noted with TILs and therefore could possibly be added to the previously reported tumor histological factors of prognostic importance that can be evaluated in biopsies (Table 4).
In conclusion, this study showed that stromal prognostic features can be evaluated in optimal and representative OTSCC biopsies using both light microscopy and digital measurements. Despite the small sample size available for this pilot study, there was relatively good agreement between the two methods and also with the corresponding resection specimens. Digital evaluation needs further refinement, and oral and maxillofacial pathologists need greater familiarity with it, for easy and relatively accurate evaluation of stromal features. Finally, we suggest that clinicians take representative deep biopsies (>5 mm) and that clinical pathologists evaluate BD and TSR scores from those biopsies. While complications such as bleeding and functional deficits may be an important concern in biopsies and excisions at other oral cavity subsites, they are often less pronounced in the oral tongue (24). Clinicians should make the preparations needed to obtain an adequate biopsy while ensuring that morbidity is preempted, especially in older patients (often with comorbid conditions), who constitute the bulk of patients with OTSCC.