An importance of clinical proteomics for personalized medicine of lung cancer subtypes

Mass spectrometry-based clinical proteomic analysis, combined with the collection of targeted cancerous cells by laser microdissection (LMD) from formalin-fixed paraffin-embedded (FFPE) tissues, is a promising approach for unveiling both the proteins expressed and their functional networks in lung cancer subtypes. Among lung cancer subtypes, large cell neuroendocrine carcinoma (LCNEC) of the lung and small cell lung carcinoma (SCLC) are both now classified as neuroendocrine tumors (NET), but pre-therapeutic histological distinction between LCNEC and SCLC has so far been problematic, leading to adverse clinical outcomes. Interestingly, the protein biomarker candidates found for LCNEC are known as cancer stem cell (CSC) markers, including aldehyde dehydrogenase 1 family member A1 (ALDH1A1), whereas those for SCLC included the novel NET marker candidates brain acid soluble protein 1 (BASP1) and secretagogin (SEGN), together with a known NET marker, neural cell adhesion molecule (CD56). For three types of lung adenocarcinoma (AC), namely adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), and lepidic predominant invasive adenocarcinoma (LEP), preliminary clinical proteomic analyses revealed protein expression profiles that characterize these subtypes. STRING protein-protein interaction (PPI) network analysis followed by gene set enrichment (GSE) for the proteins significantly expressed in the three lung AC subtypes showed characteristic associations with cancer-related pathways, which might act in concert in disease progression and which would be useful for understanding the carcinogenic processes of lung adenocarcinoma. Thus, clinical proteomic analysis not only reveals biomarker protein candidates significantly expressed in a disease but also helps to elucidate disease-oriented PPI networks, including functional networks predicted from experimentally obtained proteome datasets.
Abbreviations: LCNEC: Large-Cell Neuroendocrine Lung Carcinoma, OLC: Other Large-Cell Lung Cancer, SCLC: Small-Cell Lung Cancer, GGO: Focal Ground-Glass Opacity, AIS: Adenocarcinoma In Situ, MIA: Minimally Invasive Adenocarcinoma, LEP: Lepidic Predominant Invasive Adenocarcinoma, FFPE: Formalin-Fixed and Paraffin-Embedded tissue sections, MS: Mass Spectrometry, PPI: Protein-Protein Interaction


Introduction
Lung cancer is the leading cause of cancer-related mortality worldwide [1]. In Japan, annual deaths from lung cancer have been increasing and have reached about 70,000 [2]; in the United States they have reached 160,000, even with a recent decreasing trend [3]. Recent advances in chest high-resolution computed tomography (HRCT) have made it possible to find small lung adenocarcinomas, which show an increasing trend worldwide [4], at an earlier and potentially more curable stage than was previously possible [5]. There are 90 million current and ex-smokers in the United States who are at increased risk of lung cancer. The published data from the National Lung Screening Trial (NLST) suggest that yearly screening with low-dose thoracic CT in heavy smokers can reduce lung cancer mortality by 20% and all-cause mortality by 7% [6].
In large-cell carcinoma (LCC), large cells grow without any distinctive tissue construct. Small-cell lung cancer (SCLC) is an aggressive neuroendocrine tumor subtype consisting of small, bare-nuclei cells. In 1991, Travis et al. [7] proposed a new subtype of large-cell lung carcinoma, named large cell neuroendocrine carcinoma (LCNEC). Currently, both LCNEC and SCLC belong to the neuroendocrine tumors (NET) of the lung in the 2015 WHO classification [8]. LCNEC exhibits morphology similar to that of other LCC (OLC) but shows neuroendocrine differentiation like SCLC, which can be judged by the expression of at least one of three representative neuroendocrine proteins: neural cell adhesion molecule (NCAM1, also known as CD56), synaptophysin (Syn), and chromogranin A (CGA). The developmental history of the tissue of origin is currently unknown for these three types of lung cancer. LCNEC has a poor prognosis similar to that of SCLC, with a survival rate of just 18% for stage IA disease treated by resection alone [9,10]. Currently, as for non-small-cell lung carcinoma (NSCLC), resection is the first choice, and the adjuvant therapy that follows is selected as for SCLC. In many series, surgical resection of LCNEC has been reported with a 5-year actuarial survival far worse than that reported for other histological variants of NSCLC. There has been considerable debate on whether these tumors should be classified and treated as NSCLC or SCLC.
A large-scale epidemiologic study compared the presenting and prognostic characteristics of patients with LCNEC with those of patients with SCLC or other large cell carcinomas (OLCs), with respect to overall survival (OS) and lung cancer-specific survival (LCSS) rates for patients undergoing definitive resection without radiotherapy (S-NoRT). Its authors concluded that LCNEC should continue to be classified and treated as a large cell carcinoma because the clinical, histopathologic, and biologic features of LCNEC are more similar to those of OLC than to those of SCLC [11,12].

Lung adenocarcinoma classification
In 2011, a new pathologic classification of lung adenocarcinoma was proposed by the International Association for the Study of Lung Cancer (IASLC), the American Thoracic Society (ATS), and the European Respiratory Society (ERS) [13]. In the new classification, the concepts of adenocarcinoma in situ (AIS) and minimally invasive adenocarcinoma (MIA) were newly introduced and the term bronchioloalveolar carcinoma (BAC) was abolished. Additionally, invasive adenocarcinomas were categorized into 6 subtypes (lepidic, acinar, papillary, micropapillary, solid, and variants) according to the predominant histologic pattern. Both AIS and MIA were defined as tumors ≤ 3 cm in size. AIS is a preinvasive lesion showing pure lepidic growth without invasion. MIA is also a lepidic predominant tumor but with ≤ 5 mm invasion. Lepidic predominant invasive adenocarcinoma (LEP) is an invasive adenocarcinoma showing the former nonmucinous BAC pattern with > 5 mm invasion. These 3 lepidic type adenocarcinomas are speculated to show step-wise progression from AIS through MIA to LEP. After complete resection of AIS or MIA, a recurrence-free 5-year survival of 100% can usually be obtained [13], while some recurrent cases are found after resection of LEP [14][15][16]. Since postoperative prognoses differ between the AIS plus MIA group and LEP, differential protein expression associated with the invasiveness of cancer cells in each subtype should play an important role in determining local recurrence and survival.
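The size and invasion thresholds quoted above can be written as a small decision rule. The following is only a sketch under stated assumptions: the function name and its inputs are hypothetical, the tumor is assumed to be lepidic predominant and nonmucinous, and the case of a tumor larger than 3 cm with ≤ 5 mm invasion is left unassigned because the text does not cover it. It is not a diagnostic tool.

```python
from typing import Optional


def lepidic_subtype(tumor_size_cm: float, invasion_mm: float) -> Optional[str]:
    """Label a lepidic-type adenocarcinoma using only the thresholds in the text.

    Hypothetical helper: assumes a lepidic predominant tumor; real diagnosis
    relies on histologic findings that are not modeled here.
    """
    if tumor_size_cm <= 0 or invasion_mm < 0:
        raise ValueError("tumor size must be positive and invasion non-negative")
    if invasion_mm > 5:
        return "LEP"  # invasive component larger than 5 mm
    if tumor_size_cm <= 3:
        # No invasion -> AIS; invasion of at most 5 mm -> MIA
        return "AIS" if invasion_mm == 0 else "MIA"
    return None  # larger than 3 cm with <= 5 mm invasion: not covered by the text


if __name__ == "__main__":
    for size, invasion in [(2.0, 0.0), (2.5, 3.0), (2.0, 7.0)]:
        print(size, invasion, lepidic_subtype(size, invasion))
```

Returning `None` for the uncovered case, rather than guessing a label, keeps the sketch faithful to the criteria as stated here.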
Recent advancements in shotgun sequencing and quantitative mass spectrometry (MS) have made proteomics amenable to unveiling proteins significantly expressed in clinical specimens of a disease [17,18]. Figure 1 illustrates a workflow of clinical proteomic analysis utilizing a variety of archived formalin-fixed paraffin-embedded (FFPE) cancer tissues, from which laser microdissection (LMD) makes it possible to collect the target cells to be investigated [19][20][21][22][23][24][25][26]. This paper describes clinical proteomic analysis that can profile proteins expressed in cancerous cells derived from lung cancer subtypes. The proteomic datasets obtained are used to elucidate biomarker candidates and to gain knowledge of the dynamic protein-protein interaction (PPI) networks tightly linked to a disease mechanism.

Clinical proteomic analysis
Millions of clinical samples are obtained every day for use in diagnostic tests that support clinical decision making. Clinical samples (tissues, biopsies, blood, etc.) can also be archived into repositories for use in future studies investigating the etiology of diseases using omics approaches. Therefore, infrastructure buildup of standardized biobanking is increasingly needed within the clinical omics community because the samples themselves have intrinsic values in the determination of outcomes of clinical trials [27][28][29][30][31][32]. The samples can be retrieved from pathology laboratories with the approval from ethical committees of medical institutes and hospitals. Many types of disease specimens exist, such as frozen and FFPE tissues; biopsies; and body fluids including blood, serum, plasma, and urine; interstitial fluid; cyst material; ascites fluid; and pancreatic juice.

Laser microdissection and protein solubilization
In hospitals and medical institutes, tumor tissues obtained by surgical resection are typically fixed in 4% paraformaldehyde and routinely processed for paraffin sectioning. Cancerous lesions can be identified on serial tissue sections stained with hematoxylin and eosin (HE). Figure 2 shows (A) focal ground-glass opacity (GGO) on chest HRCT, lesions of which are identified as AIS, MIA, or LEP, and (B) a representative HE-stained image of LEP. Laser microdissection (LMD) makes it possible to collect target cells from a variety of FFPE cancer tissues. For shotgun proteomic analysis, 10-μm sections prepared from the same tissue block are attached onto DIRECTOR™ slides (OncoPlexDx, Rockville, MD, USA), de-paraffinized twice with xylene for 5 min, rehydrated with graded ethanol solutions and distilled water, and stained with hematoxylin [33][34][35][36][37]. Slides are air dried and subjected to LMD with a Leica LMD7000 (Leica Micro-systems GmbH, Ernst-Leitz-Strasse, Wetzlar, Germany). Typically, ca. 30,000 cells (ca. 8 mm²) per tissue sample are transferred directly to a 1.5-mL low-binding plastic tube. Figure 3 exemplifies the hematoxylin-stained LEP tissue before and after LMD (C-1 and C-2, respectively).
Proteins/peptides from dissected cells can be extracted by following several protocols [33,35]. For example, according to the protocol of a Liquid Tissue™ MS Protein Prep kit (OncoPlexDx, Rockville, MD, USA) [33], the cellular material, suspended in the liquid tissue buffer, is incubated at 95°C for 90 min, cooled on ice (3 min), and subsequently enzymatically digested, followed by reduction and alkylation. The liquid tissue digests can be stored at −20°C until proteomic analysis.
Recent advances in mass spectrometry (MS) have made proteomics amenable to in-depth exploratory and targeted quantitative analysis of proteins expressed in a complex clinical specimen [38,39].
Compared with other analytical methods, MS is greatly advantageous because of its extremely high capability for capturing, identifying, and sequencing proteins and peptides expressed in a complex clinical specimen with high sensitivity and high precision [18].

Exploratory ShotGun proteomic analysis
An exploratory proteomic analysis typically comprises extraction and/or direct tryptic digestion of all expressed proteins in a complex biological sample. The peptide mixture obtained is then subjected to a liquid chromatography (LC)/electrospray ionization-tandem MS analytical platform, and the peptides are sequenced by searching against protein sequence databases; this is referred to as ShotGun proteomic analysis. Protein identification in shotgun (bottom-up) proteomic approaches can now be performed by four peptide sequencing strategies using MS/MS spectra: (A) database search, (B) spectral library matching, (C) hybrid approaches using sequence-tag determination followed by database search, and (D) de novo sequencing, as illustrated in Figure 4. Several hundred to several thousand (more than 10,000 in some cases) different protein species can typically be identified in such exploratory clinical proteomic studies [39][40][41][42][43][44][45][46], in which label-free semi-quantitative comparison with statistical evaluation is mainly performed to elucidate proteins specifically relevant to a disease subtype.
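As an illustration of the first step of a database search, the sketch below performs an in-silico tryptic digestion of a protein sequence, producing the candidate peptides against which MS/MS spectra would be matched. It assumes the commonly used trypsin rule (cleavage on the C-terminal side of K or R, except when the next residue is P) and no missed cleavages; the function name and the `min_length` parameter are hypothetical, and actual search engines apply further settings not described in this paper.

```python
from typing import List


def tryptic_digest(sequence: str, min_length: int = 1) -> List[str]:
    """Split a protein sequence into fully tryptic peptides (no missed cleavages).

    Assumed rule: cleave after K or R unless the following residue is P.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # The C-terminal peptide is whatever remains after the last cleavage site
    peptides.append(sequence[start:])
    return [p for p in peptides if len(p) >= min_length]


if __name__ == "__main__":
    # Arbitrary toy sequence, not a real protein
    print(tryptic_digest("AAKPGRCCKDD"))
```

In the toy sequence, the K followed by P is not cleaved, which is why the first peptide spans both basic residues.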

Targeted protein detection and quantitation
To detect and quantify targeted peptides/proteins, selected-reaction monitoring (SRM) assays are often used in tandem mass spectrometry. A targeted ion of a particular mass, usually a doubly-charged peptide ion, is selected in the first stage of a tandem mass spectrometer, and a singly-charged product of a fragmentation reaction of that doubly-charged precursor ion is selected in the second stage for detection. This precursor/product pair is referred to as the SRM transition. The SRM technology complements the discovery capabilities of ShotGun proteomic strategies through its reliable quantification of low-abundance peptides/proteins in a complex clinical sample [19,21].
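To make the notion of an SRM transition concrete, the following sketch computes the m/z of a doubly-charged precursor ion and of singly-charged y-type product ions for a peptide, using standard monoisotopic residue masses. It is a minimal illustration under stated assumptions: the function names are hypothetical, cysteine is treated as unmodified (no alkylation mass added), only y ions are considered, and the test peptide is an arbitrary string rather than a peptide from the studies cited.

```python
from typing import Dict, List, Tuple

# Standard monoisotopic residue masses (Da); cysteine is unmodified here
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added to the residue sum
PROTON = 1.007276   # mass of one proton per unit charge


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def precursor_mz(sequence: str, charge: int = 2) -> float:
    """m/z of the protonated precursor ion at the given charge."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


def y_ion_mz(sequence: str, n: int, charge: int = 1) -> float:
    """m/z of the y_n ion (the n C-terminal residues) at the given charge."""
    if not 1 <= n < len(sequence):
        raise ValueError("n must be between 1 and len(sequence) - 1")
    fragment = sum(RESIDUE_MASS[aa] for aa in sequence[-n:]) + WATER
    return (fragment + charge * PROTON) / charge


def srm_transitions(sequence: str) -> List[Tuple[str, float, float]]:
    """List (label, precursor m/z at 2+, product m/z at 1+) for all y ions."""
    q1 = precursor_mz(sequence, charge=2)
    return [(f"y{n}", q1, y_ion_mz(sequence, n)) for n in range(1, len(sequence))]


if __name__ == "__main__":
    for label, q1, q3 in srm_transitions("PEPTIDE"):
        print(f"{label}: Q1 = {q1:.4f}, Q3 = {q3:.4f}")
```

In practice, a few transitions per peptide are chosen empirically for intensity and specificity; the sketch only enumerates the candidates.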

Biomarker candidates for lung cancer subtypes
In the preliminary proteomic study conducted for three lung cancer subtypes, OLC (n=5), SCLC (n=5), and LCNEC (n=4), where n denotes the number of patients, representative proteins up-regulated in LCNEC were aldehyde dehydrogenase 1 family member A1 (ALDH1A1), aldo-keto reductase family 1 members C1 (AK1C1) and C3 (AK1C3), and CD44 [37]. On the other hand, those up-regulated in SCLC were brain acid soluble protein 1 (BASP1), secretagogin (SEGN), and neural cell adhesion molecule (CD56). BASP1 is a newly reported NET marker candidate, known as a cofactor of WT1 (the Wilms' tumor suppressor) that suppresses the transcriptional activation function of WT1 [47]. BASP1 is a significant regulator of WT1 that is recruited to WT1-binding sites and suppresses WT1-mediated transcriptional activation at several WT1 target genes, and it has been reported that WT1 and BASP1 cooperate to induce the differentiation of K562 cells to a neuronal-like morphology [48,49]. Thus, BASP1 might be related to transcriptional reprogramming and morphological changes to a neuroendocrine phenotype in lung cancer.
Discovery of biomarker candidates is carried out by semi-quantitative comparisons based on spectral counting and G-statistics [20][21][22][23][24][25][26]. G-values are obtained in pairwise group comparisons of the spectral counts of a protein, where the spectral count is the number of MS/MS spectra assigned to that protein. A G-value greater than 3.84 corresponds to a significance p-value of less than 0.05. Figure 5 illustrates a 3D scatter plot with the X axis indicating G-statistic values (G-values) for the LCNEC vs. OLC analysis, the Y axis for OLC vs. SCLC, and the Z axis for LCNEC vs. SCLC. The proteins expressed specifically in LCNEC will therefore be present in the region (X > 3.84, Z > 3.84, corresponding to p < 0.05 each) on the X-Z plane, those in SCLC in the region (Y > 3.84, Z > 3.84) on the Y-Z plane, and those in OLC in the region (X > 3.84, Y > 3.84). This resulted in the identification of four proteins, ALDH1A1, AK1C1, AK1C3, and CD44, that were expressed in LCNEC more than in SCLC and OLC with high probability. These proteomic findings, obtained from a limited number of patients, were confirmed by routine immunohistochemistry with additional patients [37]. CD44 is a cell-surface glycoprotein involved in cell-cell interactions including adhesion and migration, and thus in tumor growth and progression [50]. These proteins, ALDH1A1 [51,52], AK1C1 [53], AK1C3 [54], and CD44 [55], have been proposed as markers of cancer stem cells. Their expression in tumor cells could correlate with aggressive biological behavior, drug resistance, and poor prognosis, which are common characteristics of LCNEC and SCLC. Previous studies suggested that these redox enzymes were present in a variety of malignant tumor cells. In particular, AK1C1 and AK1C3 have been reported in human non-small cell lung carcinoma (A549) cells [56], and high expression of ALDH1A1 has been reported in lung cancer cell lines, especially in AC cell lines compared with OLC and SCLC cell lines [57][58][59].
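The pairwise comparison described above can be sketched as a likelihood-ratio G-test on spectral counts. This is only an illustration under stated assumptions, not the exact procedure of the cited studies: the expected counts are derived here from the total spectral counts of each group, no continuity correction is applied, and all function names are hypothetical. The p-value uses the chi-square distribution with one degree of freedom, for which 3.84 is the 0.05 critical value. Note that the G-value itself carries no direction, so the sign of the difference has to be checked separately.

```python
import math
from typing import List


def g_statistic(count_a: int, count_b: int, total_a: int, total_b: int) -> float:
    """G-value for one protein compared between two groups.

    count_a, count_b: spectral counts of the protein in groups A and B.
    total_a, total_b: total spectral counts of all proteins in each group
    (assumed normalization; the source does not specify it).
    """
    pooled = count_a + count_b
    if pooled == 0:
        return 0.0
    grand_total = total_a + total_b
    g = 0.0
    for observed, total in ((count_a, total_a), (count_b, total_b)):
        expected = pooled * total / grand_total
        if observed > 0:  # the term 0 * ln(0) is taken as 0
            g += observed * math.log(observed / expected)
    return max(2.0 * g, 0.0)


def p_value(g: float) -> float:
    """Upper-tail probability of chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(g / 2.0))


def candidate_regions(x: float, y: float, z: float, threshold: float = 3.84) -> List[str]:
    """Map G-values (X: LCNEC vs. OLC, Y: OLC vs. SCLC, Z: LCNEC vs. SCLC)
    to the subtype regions described in the text."""
    regions = []
    if x > threshold and z > threshold:
        regions.append("LCNEC")
    if y > threshold and z > threshold:
        regions.append("SCLC")
    if x > threshold and y > threshold:
        regions.append("OLC")
    return regions


if __name__ == "__main__":
    g = g_statistic(10, 0, 1000, 1000)  # made-up counts for illustration
    print(f"G = {g:.3f}, p = {p_value(g):.5f}")
    print(candidate_regions(5.0, 1.0, 6.0))
```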
Figure 3. The hematoxylin-stained LEP tissue before and after LMD (C-1 and C-2, respectively). The DIRECTOR® slide is similar to a standard glass (uncharged) microscope slide, but has an energy transfer coating on one side of the slide. Tissue sections are mounted on top of the energy transfer coating, and when the slide is turned over, the tissue faces down under the microdissection system. Targeting of cells or tissue areas of interest is carried out on a computer display. The laser energy is converted to kinetic energy upon striking the coating, vaporizing it and instantly propelling the selected tissue features into the collection tube.
Semi-quantitation of ALDH1A1 by the selected-reaction monitoring (SRM) mass spectrometric method has been examined in comparison with antibody-based %-immunostaining measurements, where both sets of data were obtained from FFPE tissue specimens of the same patient. Figure 6 shows the plot of SRM AUCs against the antibody-based %-values of immunostaining of ALDH1A1 for 14 cases across the three subtypes: LCNEC (n=4), OLC (n=5), and SCLC (n=5). All tissues of the SCLC group showed 0% ALDH1A1 antibody immunostaining, whereas the LCNEC group gave %-antibody immunostaining higher than 30%. Moreover, the reasonably good correlation (R² > 0.87) observed between these two different assays confirmed that ALDH1A1 is a promising candidate characteristic of LCNEC. That BASP1, a biomarker candidate for SCLC, is a potential tumor suppressor [60] suggests that different mechanisms of tumor growth could operate in LCNEC and SCLC. Another SCLC-specific protein, SEGN, is a novel neuroendocrine marker that has a distinct expression pattern, being negative in LCNEC, with a reported rate of positive staining in SCLC [61]. It has recently been reported that ALDH1A1 plays an important role in the Notch pathway [62].
Although no effective chemotherapy has so far been available for LCNEC, sorafenib, a tyrosine kinase inhibitor in the MAP kinase pathway, is reported to be effective against malignant tumor cells with ALDH1A [63].

Protein-protein interaction networks in lung adenocarcinoma subtypes
As described in the Introduction, lepidic type adenocarcinomas (ACs) comprise three subtypes: adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), and lepidic predominant invasive adenocarcinoma (LEP). Although these subtypes are speculated to show sequential progression from a preinvasive lesion to invasive lung cancer, changes in protein expression during these processes have not yet been fully studied. A total of 840 proteins were identified by proteomic analysis of cancerous cells laser-microdissected from FFPE AC tissues. Spectral counting-based semi-quantitative comparisons of all identified proteins from AIS through LEP revealed that the protein expression profile of LEP was significantly differentiated from those of the other subtypes. LEP-type marker candidates included HPX, CTTN, CDH1, EGFR, and MUC1. MIA-type marker candidates included CRABP2, and AIS-type marker candidates included LTA4H and SOD2 [64].
The most important information for evaluating biomarker candidates and therapeutic targets is how the proteins significantly expressed in a disease subtype interplay with other key proteins and pathways. Several open PPI databases are available, including Reactome [65] and BioGRID [66], and PPI network analysis can be performed with dedicated network construction algorithms using, for example, the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database [67] and Cytoscape, a software environment for integrated models of biomolecular interaction networks [68]. The PPI networks elucidated so far consist of nodes and edges, where the nodes are experimentally identified proteins and the edges are functional associations predicted from primary databases including the Kyoto Encyclopedia of Genes and Genomes (KEGG) and gene ontology (GO) [69], the primary literature, and so on. Thus, it has become possible to elucidate protein networks relevant to a disease subtype from its proteomic datasets. The concept of biomarkers has been changing from conventional biomarkers, that is, single proteins, to specific protein networks or dynamically varying networks [70], since diseases can also be regarded as dynamic network disorders. PPI networks dynamically activated or deactivated in a disease subtype of interest would be directly associated with the molecular mechanisms responsible for it, which would lead to the discovery of therapeutic and druggable targets.
Figure 5. AK1C1 and AK1C3 (orange), ALDH1A1 (purple), and CD44 (red), located very near or on the X-Z plane, are isolated as candidates for specific LCNEC markers. SEGN (yellow), already known as one of the SCLC-specific markers, was located on the Y-Z plane [37].
Figure 7. The STRING PPI networks elucidated for A) MIA and B) LEP from the respective proteome datasets, which were obtained from the clinical proteomic analysis of cancerous cells laser-microdissected (LMD) from lung cancer FFPE tissue specimens, and C) results of the STRING gene set enrichments (GSEs) for LEP, MIA, and AIS obtained against the 24 cancer-related KEGG pathways (significance rank p < 0.05 after correction by FDR), which revealed how the functional participation of expressed proteins alters dramatically throughout disease stages, reflecting the mechanisms of disease progression [64].
The STRING gene set enrichment (GSE) resulting from the PPI network analysis suggested that AIS was associated mainly with the pathways of focal adhesion, adherens junction, tight junction, and leukocyte transendothelial migration, whereas MIA was strongly and predominantly associated with the pathways of proteoglycans in cancer and PI3K-Akt. In contrast, LEP was associated broadly with numerous tumor-progression pathways including ErbB, Ras, Rap1, and HIF-1 signaling. Figure 7 shows the STRING PPI networks for A) MIA and B) LEP developed from the respective proteome datasets obtained [64]. Proteoglycans are known to be important molecular effectors of cell surface and pericellular microenvironments and to have multiple functions in cancer and angiogenesis by interacting with both ligands and receptors that regulate neoplastic growth and neovascularization [71]. Molecules participating in the proteoglycan-related cancer pathway are denoted by red circles in Figure 7A. The ErbB signaling pathway is associated with several cancer pathways. The ErbB family represents the epidermal growth factor receptors, which play an important role in tumor growth. Overexpression of EGFR occurs in around 60% of NSCLCs, with the highest frequency in patients with AC [72]. Hypoxia-inducible factors (HIFs) regulate the transcription of genes that mediate the response to hypoxia (reduced O2 availability) [73]. Diverse consequences of HIF-1 action, such as induction of the Met protein (the hepatocyte growth factor receptor) followed by Met receptor activation, may result in the poor prognosis associated with hypoxic tumors, which are indeed more aggressive than their well-oxygenated counterparts. Molecules participating in the ErbB and HIF-1 signaling pathways are denoted by orange and red circles in Figure 7B, respectively [64].
Figure 7C illustrates the results of the STRING gene set enrichments for LEP, MIA, and AIS obtained for the 24 cancer-related KEGG pathways, which were elucidated with their significance rank p < 0.05 after correction by false discovery rate (FDR) [64]. It was revealed how functional participation of expressed proteins alters dramatically throughout disease stages, reflecting the mechanisms of disease progression.
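The kind of calculation behind a pathway enrichment with FDR correction can be sketched as follows. This is a generic over-representation analysis under stated assumptions, not a reproduction of the STRING procedure used in [64]: each pathway is scored with a one-sided hypergeometric test, and the p-values are then adjusted by the Benjamini-Hochberg method. The function names and the numbers in the usage example are hypothetical.

```python
from math import comb
from typing import List


def hypergeom_pvalue(population: int, pathway_size: int,
                     sample_size: int, overlap: int) -> float:
    """P(X >= overlap) when sample_size proteins are drawn from a population
    that contains pathway_size members of the pathway."""
    upper = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    tail = sum(
        comb(pathway_size, i) * comb(population - pathway_size, sample_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up numbers: 20000 background proteins, 840 identified,
    # and three pathways given as (pathway size, overlap)
    pathways = [(200, 25), (150, 8), (90, 12)]
    raw = [hypergeom_pvalue(20000, size, 840, k) for size, k in pathways]
    for (size, k), p, q in zip(pathways, raw, benjamini_hochberg(raw)):
        print(f"size={size} overlap={k} p={p:.3e} FDR={q:.3e}")
```

Pathways whose adjusted value falls below 0.05 would be reported as significantly enriched under these assumptions.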

Somatic mutations and cellular pathways
The identification of recurrent mutations in EGFR and of fusions involving ALK and other receptor tyrosine kinases has greatly transformed the standard of treatment for patients with lung ACs. Current guidelines recommend that molecular genotyping of ACs routinely include the EGFR and ALK status; these alterations are found in ca. 25% of patients with AC, who benefit more from approved targeted inhibitor therapies than from conventional chemotherapy. Such somatic alterations, mutations, and fusions in lung cancers frequently affect the activities of cellular pathways involved in lung cancer subtypes. Figure 8 summarizes the cellular pathways whose activities are affected by somatic alterations in the lung cancer subtypes AC, squamous cell carcinoma, and SCLC [72]. It should be noted that somatic mutations and cellular pathways in disease subtypes are intrinsically connected, so both need to be unveiled to understand the molecular mechanisms of a disease subtype. In lung cancer, numerous genes acquire mutations, frequently involving EGFR and KRAS, and the unavailability of drugs or resistance to the available drugs is the major problem. For instance, an attractive strategy is to induce apoptosis in SCLC cells that carry mutated EGF receptors and have become addicted to AKT/PKB signals by depriving them of these signals.

Summary and perspectives
Clinical proteomic analysis of lung cancer subtypes has proven to be a feasible technology for revealing their characteristic protein expression: ALDH1A1, AK1C1, AK1C3, and CD44 would be characteristic of the LCNEC phenotype, whereas BASP1, SEGN, and CD56 would be characteristic of SCLC. These proteins would be useful targets for distinguishing LCNEC from SCLC and OLC immunohistochemically. Regarding lung AC subtypes, the molecular biological background predisposing LEP to a worse prognosis than AIS and MIA may be due in part to the altered protein expression found. Proteins appearing in the step from AIS to MIA are probably important at the initial step of microinvasion. As LEP has the characteristics of mature lung cancer, it is reasonable that LEP expresses a variety of proteins associated with cancer invasion. Some of these proteins would be candidates for molecular targeted therapy to suppress local invasion or distant metastases. In the new adenocarcinoma subtyping, the prognoses of solid or micropapillary predominant invasive adenocarcinomas were reported to be apparently worse than those of other subtypes, including lepidic type adenocarcinomas [16,74,75]. Clinical proteomic analyses will help to elucidate the protein expression that determines the malignant grade of various lung adenocarcinoma subtypes, which will in turn provide important knowledge for understanding the carcinogenic process and tumor lineages of lung adenocarcinomas, for the benefit of patients through more efficient diagnosis and treatment of these tumors.
In this decade, drug efficacy has frequently been found to differ among races and patient groups, and so driver mutations specific to each race and/or patient group have been actively investigated. Both protein-protein interaction (PPI) networks and the cellular pathways affected by key somatic mutations should be investigated in relation to subgroups of lung cancer patients with acquired resistance. The MS-based proteomic approach utilizing clinical specimens would make it possible to reveal molecular networks relevant to a disease subgroup, drug responders or nonresponders, good or poor prognosis, and drug resistance, and will thus provide a powerful solution for the stratification of patients and for target discovery. In Japan, high-quality clinical specimens with detailed clinical and pathological information, including even early stage lung cancers, have been archived for many years within medical institutes and hospitals owing to the national healthcare system. When a distinct clinical study design using these valuable clinical samples, so to speak national assets, is established and conducted in close partnership with clinicians, innovative treatments and target/drug discovery for lung cancer can be delivered from Japan.