Data mining has emerged as a powerful tool for extracting information. In the present study, RB1, a tumor suppressor gene, has been studied. Data mining is first done at Ground Level Mining, in which relevant data sets are collected and then reduced to the minimum size possible through statistical representation. The Chromosome Report is studied to determine the chromosome location of the gene under study. RB1 regulates the cell cycle as a check point for p53 and for other genes as well, specifying cell fate. The mRNA Report gives detailed information regarding the gene for its identification and transcript sequences. The Peptide Report reveals highly descriptive data regarding the protein, SNP regions, positions and alleles. Special emphasis is given to protein interactions and blastp results for the study of homologous protein sequences. A phylogenetic tree is generated using homologous protein sequences for Phylogenetic Inference and homology with other taxa. The preclinical data on cyclosporine suggest that it has a significant role in treating retinoblastoma malignancies, but the binding of Actinomycin-D with Rb is far better, as evident from its lower e-total value, which signifies greater affinity. Actinomycin-D is therefore suggested as a potential future drug for the treatment of retinoblastoma and associated malignancies.
retinoblastoma, Actinomycin-D, Data mining, RB1 gene
RB1, the first tumor suppressor gene to be discovered, has a crucial role in the cell cycle as a major check point. More specifically, Rb family members have overlapping roles in promoting cellular growth arrest. Targeted inactivation of all three RB-related genes in embryonic stem cells causes deregulated G1/S transition, G1 arrest and cellular immortalization. Most noteworthy is the fact that Rb is the only pocket protein known to exhibit features of a bona fide tumor suppressor. Accordingly, RB heterozygous mice are predisposed to the onset of pituitary and thyroid cancer, and deregulation of the Rb signaling pathway or inactivation of RB itself is a hallmark of nearly all human tumors, as reported by Hanahan et al. [1].
In addition to Rb, myogenesis requires p300, a regulatory factor for myogenic transcription; p300 and CREB-binding protein (CBP) are known to regulate the activity of numerous transcription factors [2]. p300/CBP possess conserved functions in cell growth, transformation and development, as stated by Goodman & Smolik (2000). P/CAF is likewise known to play a role in cell differentiation and tumorigenesis [3,4]. Current models propose that p300/CBP and P/CAF can form a multimeric complex with MyoD, which recruits these co-activators to muscle-specific promoters [3,5,6]. P/CAF and p300/CBP are believed to hyperacetylate the surrounding nucleosomes, thus increasing the accessibility of MyoD target promoters to additional transcription factors [7]. The proteins p300/CBP and P/CAF have now been demonstrated to acetylate a wide variety of non-histone proteins, including many DNA-binding proteins, general transcription factors, cytoplasmic proteins and tumor suppressors [8]. Recently, the co-activator p300 was shown to acetylate pRb in vitro [9].
Retinoblastoma is a rapidly developing cancer that arises in the cells of the retina, the light-detecting tissue of the eye. In the developed world, more than nine out of every ten patients are cured of retinoblastoma and survive into adulthood [10]. The same authors also described two forms of the disease: a genetic, heritable form and a non-genetic, non-heritable form. Approximately 55% of children with retinoblastoma have the non-genetic form. If there is no history of the disease within the family, the disease is labeled "sporadic", but this does not necessarily indicate that it is the non-genetic form. In humans, the protein is encoded by the RB1 gene, located on the q arm of chromosome 13 at band position q14.1-q14.2. It is a 4840 bp long gene with 54 K single nucleotide polymorphisms. If both alleles of this gene are mutated early in life, the protein is inactivated, which results in the development of retinoblastoma, hence the name Rb. Rb, the retinoblastoma-associated protein, is 928 residues long and has a molecular weight of 106,159.11 Da. Its isoelectric point is 8.04. Rb has two domains, Domain A and Domain B, separated by a spacer, and has a pocket region, which is the region where regulatory factors such as E1A bind.
Data mining
Data mining uses computational techniques from statistics, machine learning and pattern recognition. These techniques are an automated means of reducing the complexity of data in large bioinformatics databases and of discovering meaningful, useful patterns and relationships in data. It includes the following steps: Data Characterization, Consistency Analysis, Domain Analysis, Data Enrichment, Frequency and Distribution Analysis, Normalization and Missing Value Analysis.
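Three of the steps listed above (missing value analysis, frequency analysis and normalization) can be illustrated on a toy column of values. The following Python sketch is only an illustration: the function names and the sample values are hypothetical and are not taken from the study.

```python
from collections import Counter


def missing_value_report(values):
    """Return (count of missing entries, fraction missing); None marks a missing value."""
    missing = sum(1 for v in values if v is None)
    return missing, missing / len(values)


def frequency_table(values):
    """Count how often each non-missing value occurs."""
    return Counter(v for v in values if v is not None)


def min_max_normalize(values):
    """Rescale non-missing numbers to the range [0, 1]; missing entries stay None."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    return [None if v is None else ((v - lo) / span if span else 0.0) for v in values]


# Hypothetical feature lengths with one missing observation.
lengths = [120, 200, None, 120, 280]
print(missing_value_report(lengths))
print(frequency_table(lengths))
print(min_max_normalize(lengths))
```

Missing entries are kept in place rather than dropped, so the normalized column stays aligned with the original records.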
Data Mining Methods
The process of data mining is concerned with extracting patterns from the data by using techniques such as: Classification which involves mapping of data into one of several predefined or newly discovered classes.
Evaluation
The patterns identified by the data mining analysis are interpreted and typical evaluation ranges from simple statistical analysis and complex numerical analysis of sequences and structures to determining the clinical relevance of the findings.
Visualization
Visualization of evaluation results can range from simple pie charts to 3-D virtual reality displays that can be manipulated by haptic (force feedback) controllers.
Multiple sequence alignments of protein sequences are important tools in studying sequences. The basic information they provide is the identification of conserved sequence regions. Sequences can be aligned across their entire length (global alignment) or only in certain regions (local alignment). Global alignments need to use gaps (representing insertions/deletions), while local alignments can avoid them by aligning the regions between gaps. CLUSTALW is a fully automatic program for global multiple alignment of DNA and protein sequences. The alignment is progressive and takes sequence redundancy into account.
A phylogenetic tree can also be calculated from multiple alignments. Evolutionary relationships can be seen by viewing the cladogram and phylogram.
Docking is the process of calculating the e-value, a measure of the pairing of hydrogen bonds between the protein and the drug molecule, so as to determine their active site affinity. The lower the e-value, the stronger the bonding. “Hex” is an interactive molecular graphics program for calculating and displaying feasible docking modes of pairs of protein and DNA molecules. Docking is frequently used to predict the binding orientation of small molecule drug candidates to their protein targets in order to predict the affinity and activity of the small molecule. Hence docking plays an important role in rational drug design and is therefore of biological and pharmaceutical significance.
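The difference between global and local alignment can be made concrete with a pairwise example. The sketch below implements the standard Needleman-Wunsch (global) and Smith-Waterman (local) scoring recurrences in Python. It is a teaching illustration, not the ClustalW algorithm, and the scoring values (match 1, mismatch -1, gap -2) and test sequences are assumptions chosen for readability.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch: best score when a and b are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman: best score over any pair of substrings (never below 0)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


# Hypothetical sequences that share only the region "ACGT".
print(global_score("GGGACGT", "ACGTCCC"))
print(local_score("GGGACGT", "ACGTCCC"))
```

The local score rewards only the shared region, whereas the global score is penalized for the unrelated flanking residues, which is why local alignment is preferred when only part of two sequences is conserved.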
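Distance-based tree construction starts from pairwise distances computed over the aligned sequences. The Python sketch below computes a simple p-distance (fraction of differing sites, ignoring gapped columns) for every pair in an alignment. The sequence fragments and species labels are hypothetical placeholders rather than real RB1 data, and the tree-building step itself is not shown.

```python
def p_distance(seq_a, seq_b):
    """Fraction of differing positions among columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a, seq_b):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(alignment):
    """Return a nested dict of pairwise p-distances for a {name: aligned sequence} mapping."""
    names = list(alignment)
    return {a: {b: p_distance(alignment[a], alignment[b]) for b in names} for a in names}


# Hypothetical aligned fragments (not real RB1 sequences).
toy_alignment = {"human": "MPPKTPRK", "mouse": "MPPKAPRR", "frog": "MP-KSPKR"}
matrix = distance_matrix(toy_alignment)
for name, row in matrix.items():
    print(name, row)
```

A matrix of this kind is the usual input to clustering methods that produce a dendrogram.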
Tools
The various tools and their URLs are:
- Basic Local Alignment Search Tool (BLAST): URL: www.ncbi.nih.nlm.gov/blast
- CLUSTAL W: URL: http://www.ebi.ac.uk/clustalw , URL: http://www.align.genome.jp/
- Open Reading Frame Finder (ORF): URL: http://www.ncbi.nlm.nih.gov/gorf/gorf.html
Resources
The various databases and software used through NCBI are listed below with their URLs:
S.No | Data Base | URL
1 | Map View | http://www.ncbi.nlm.nih.gov/mapview
2 | Pub Med | http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed
3 | Ensembl | http://www.ensembl.org/
4 | Kyoto Encyclopedia of Genes and Genomes (KEGG) | http://www.genome.jp/kegg/
5 | Universal Protein Resource (UniProt) | http://www.ebi.uniprot.org/
6 | Swissprot | http://www.ebi.ac.uk/swissprot/
7 | Pfam | http://www.sanger.ac.uk/Software/Pfam/
8 | Protein Data Bank (PDB) | http://www.pdb.org/
Ground Level Mining
To perform ground level mining, log on to www.ncbi.nlm.nih.gov/mapview and search Homo sapiens for RB1. Procure the genomic view and query results for RB1 of Homo sapiens. Go to RB1 for the Celera gene, which is the reference verified by NCBI and supported by PubMed sequences. Obtain the Master Map by clicking on Genes RB1 highlighted on Celera. Click on the highlighted RB1 in the master map and collect detailed information on Summary, Functions and Genomic Regions. The Gene Table is opened to get the number of exons and introns. Taxonomic Links are followed to get taxonomic IDs, and SNP Links are followed for nucleotide information. UniGene IDs, Markers, Gene & genotype Links and pathways are downloaded from the main page. Note the references of abstracts and interactions from the corresponding PubMed links. Figures and information about metabolism are extracted from Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and Reactome events from the respective links.
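A gene-level summary of this kind can also be requested programmatically. The Python sketch below assumes NCBI's public E-utilities `esummary` endpoint and uses the GeneID 5925 that the Chromosome Report lists for RB1. It is offered as a possible alternative to the manual Map Viewer navigation described above, not as the procedure used in this study, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: NCBI E-utilities document summary service.
EUTILS_ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def gene_summary_url(gene_id):
    """Build an esummary URL for one Entrez Gene ID, requesting JSON output."""
    query = urlencode({"db": "gene", "id": str(gene_id), "retmode": "json"})
    return f"{EUTILS_ESUMMARY}?{query}"


def fetch_gene_summary(gene_id, timeout=30):
    """Download and decode the JSON summary (requires network access)."""
    with urlopen(gene_summary_url(gene_id), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only the URL is printed here; call fetch_gene_summary(5925) to retrieve the record.
    print(gene_summary_url(5925))
```

Keeping URL construction separate from the network call makes the query easy to inspect and to log alongside the mined data.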
Chromosome Report
Log on to www.ensembl.org and click on the Homo sapiens icon. Search for the gene RB1 and select the gene under study.
The Gene Report is taken from the side menu of the NCBI main page. The transcript sequence is taken directly from the transcript information of Ensembl. The details regarding exons and introns are collected from the Gene Table of NCBI. The ORF Finder tool is used to predict ORFs. A list of SNPs is obtained by integrating information from Ensembl and NCBI. The transcript structure is again collected from the transcript information of Ensembl. The orthologue prediction table is procured directly from Ensembl.
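The idea behind ORF prediction can be sketched in a few lines of Python: scan each reading frame for a start codon (ATG) followed in frame by a stop codon. The sketch below is a simplified illustration that covers only the three forward-strand frames and uses a hypothetical test sequence; it is not the implementation of the NCBI ORF Finder tool.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) tuples for ATG...stop ORFs on the forward strand.

    start is the 0-based index of the ATG, end is exclusive and includes the
    stop codon, and min_codons counts codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG seen
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next ATG
    return orfs


# Hypothetical sequence with a single short ORF in the third frame.
sequence = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(sequence):
    print(frame, start, end, sequence[start:end])
```

A fuller treatment would also scan the reverse complement so that all six reading frames are covered.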
Peptide Report
The description, function and feature table for Rb are collected from UniProt. Peptide statistics and sequence are extracted from Ensembl. Interactions are studied in detail by referring to BIND and PubMed links. The structure is taken from PDB: http://www.pdb.org/explore.do?structureId=2AZE.
Phylogenetic Inference
Multiple sequence alignment is carried out using ClustalW to predict RB homologous sequences in different organisms and to generate phylogenetic trees such as the cladogram, phylogram and dendrogram.
Docking
In the field of molecular modeling, docking is a method which predicts the preferred orientation of one molecule relative to a second, and it is performed as follows: the Rb image is downloaded as a ‘jpg’ image. Drug structures are downloaded as crystalline structures. The protein and the drugs are subjected to docking via the Hex software to obtain the e-values.
Genome view
RB1 is located on chromosome 13, as visualized in the karyotype (Figure 1).
Figure 1. Homo sapiens (Human) genome view.
Ideogram
The details of chromosome 13 are illustrated below, with the retinoblastoma-affected sequence highlighted as a red band (Figure 2).
Figure 2. Ideogram of 13th chromosome showing the abnormal gene.
Map viewer
It shows the RB1 regions and gives a detailed summary as below.
Map 1: Celera Genes On Sequence (Celera) Table View
Region Displayed | 29,800K-30,240K bp
Total Celera Genes On Chromosome | 718
Celera Genes Labelled | 9
Total Celera Genes in Region | 9
Map 2: Homo sapiens UniGene Clusters (Celera) Table View
Region Displayed | 29,800K-30,240K bp
Total Transcript alignments On Chromosome | 3172
UniGene Clusters Labelled | 25
Total Transcript alignments in Region | 39
Histogram Data: Tick Width | 597bp/pixel
Histogram Data: Max Height | 846 transcripts
Map 3: Genes On Sequence (Celera) Table View
Region Displayed | 29,800K-30,240K bp
Total Genes On Chromosome | 642
Genes Labelled | 5
Total Genes in Region | 5
Chromosome Report
Summary
- Official Full Name: retinoblastoma 1 (including osteosarcoma)
- GeneID: 5925
- Ensembl GeneID: ENSG00000139687
- RefSeq status: Validated
- Chromosome address: Chromosome: 13; Location: 13q14.2
- Organism: Homo sapiens
- Gene symbol: RB1
- Gene aliases: RB; OSRC
- Description: Retinoblastoma 1 (including osteosarcoma)
- Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae; Homo
- Gene type: protein coding
- Isoforms: none
Functions
Retinoblastoma (RB) is an embryonic malignant neoplasm of retinal origin. It almost always presents in early childhood and is often bilateral. Spontaneous regression ('cure') occurs in some cases.
Genomic Region
Contig region: AL 392048.9.1.137342 ; Location of gene : 47775912-47954023; NM number: NM_000321.1; NP number: NP_000312.1
Genomic Context
Number of exons: 27; Number of introns: 26
Taxonomic ID: 9606
Entrez Regions: UniGene: Hs.408528
SNPs
A SNP is a small genetic change or variation that can occur within a DNA sequence. The functional significance of SNP variants in relation to environmental carcinogens is largely unknown. The overall objective is to determine whether SNPs in human bypass DNA polymerase genes affect responses to environmental exposure. Bypass DNA polymerases can copy past a variety of DNA modifications induced by exogenous and endogenous agents (Table 1).
Table 1.
Markers
Number of markers: 100 (Links: UniSTS); Gene genotype links: same as SNPs; Gene RIFs: (Gene References Into Function): Related Articles in PubMed: 191
Pathways
KEGG pathway: Cell cycle 04110. The metabolic pathway shows the check point p53 and its associated pathways (Figure 3).
Figure 3. Metabolic pathway shows the check point p53 and its associated pathways.
Chromogram
The RB1 gene is located on the q arm of chromosome 13 at band position 14.2 and spans the contig region 47,775,912 - 47,954,123, which refers to the wild type of an individual (Table 2).
Gene report
GENE DESCRIPTION-RB1 regulates proliferation, cell fate specification and differentiation in all cell types. Rb, the expressed product of RB1, inhibits cell proliferation, and its absence is considered the main cause of retinoblastoma, as the role of Rb in normal cells is to suppress tumor formation. Rb is found in all cells of the body; under normal conditions it acts as a brake on the cell division cycle by preventing certain regulatory proteins from triggering DNA replication. If Rb is missing, a cell can replicate itself over and over in an uncontrolled manner, resulting in tumor formation (Table 3).
Peptide report
Peptide sequence (Table 4).
Description
- Protein Name: Retinoblastoma-associated protein
- Synonyms: (PP110) (P105-RB) (RB).
- Protein ID Uniprot : P06400
- No. of residues: 928 aa
- Molecular weight: 106,159.11 Daltons
Phylogenetic inference
A homology search is performed for RB1 with Entrez, and a phylogenetic tree for RB1 is generated as a dendrogram (Figure 4).
Figure 4. Transcript Sequence.
Docking
The affinity between the Rb protein and the drugs is assessed by docking, and the bond strength is calculated as the e-value. Docking is done with ‘Hex’, entering Rb as the receptor and the drug as the ligand in the software. The results are given below:
- Binding of cyclosporine with Rb, as shown in Figure 5
- Binding of actinomycin-D with Rb, as shown in Figure 6
Docking summary report for Rb receptor and cyclosporine ligand
Clst | Soln | Models | Etotal | Eshape | Eforce | Vshape | Vclash | Lig(A,B,G) | Rec(B,G) | R12 | Bmp | RMS
1 | 1 | 000:000 | -63.7 | -63.7 | 0.0 | 0.0 | 0.0 | 281.250 44.316 83.044 | 124.494 180.000 | 42.00 | -1 | -1.00
Docking summary report for Rb receptor and actinomycin-D ligand
Clst | Soln | Models | Etotal | Eshape | Eforce | Vshape | Vclash | Lig(A,B,G) | Rec(B,G) | R12 | Bmp | RMS
1 | 1 | 000:000 | -446.7 | -446.7 | 0.0 | 0.0 | 0.0 | 348.750 58.283 108.000 | 104.273 126.293 | 27.50 | -1 | -1.00
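The comparison of the two summary rows can be reproduced with a short Python sketch. It assumes that the first eight whitespace-separated fields of each row correspond to the columns Clst through Vclash, as in the reports above. The parsing helper is hypothetical and is not part of the Hex software.

```python
# Leading columns of a docking summary row, in the order shown in the reports.
LEADING_COLUMNS = ["Clst", "Soln", "Models", "Etotal", "Eshape", "Eforce", "Vshape", "Vclash"]


def parse_summary_row(row):
    """Map the leading fields of a summary row to column names; energies become floats."""
    fields = row.split()
    record = dict(zip(LEADING_COLUMNS, fields))
    for key in LEADING_COLUMNS[3:]:
        record[key] = float(record[key])
    return record


# Rows copied from the two docking summary reports.
rows = {
    "cyclosporine": "1 1 000:000 -63.7 -63.7 0.0 0.0 0.0 281.250 44.316 83.044 "
                    "124.494 180.000 42.00 -1 -1.00",
    "actinomycin-D": "1 1 000:000 -446.7 -446.7 0.0 0.0 0.0 348.750 58.283 108.000 "
                     "104.273 126.293 27.50 -1 -1.00",
}

etotal = {ligand: parse_summary_row(row)["Etotal"] for ligand, row in rows.items()}
best_ligand = min(etotal, key=etotal.get)  # lowest Etotal is taken as strongest binding
print(etotal)
print(best_ligand)
```

Selecting the minimum Etotal mirrors the rule stated earlier that a lower e-value indicates stronger bonding.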
Figure 5. Peptide flow sequence.
Figure 6. Rooted Dendrogram.
The process of data mining is concerned with the extraction of relevant information from the enormous amount of data available in various databases. Mining is performed in order to create a high-level description of the nature and content of the data for the specified gene, RB1. RB1 is a tumour suppressor gene and regulates other genes as well, specifying cell fate.
For the above purpose the data is mined from various websites such as NCBI, PubMed, KEGG, Ensembl, Uniprot, Swissprot, InterPro, Pfam, PDB and the tools used for extracting information are BLAST, CLUSTALW and ORF FINDER.
GROUND LEVEL MINING results in reduction of data to the minimum size possible through statistical representation, whereas the CHROMOSOME REPORT is studied through map viewer and the chromosome location of RB1 is visualized.
Lee et al. [11] showed in their studies that RB tumors grown in vitro express highly specific photoreceptor cell genes. They did not find any marker genes specific to rod cells, but the gene report shows that the specified gene regions are associated with marker genes, as found through the Entrez search.
The mRNA REPORT is deciphered to gather detailed information regarding the gene for its identification; it revealed the exon and transcription regions, and the PEPTIDE REPORT revealed highly descriptive data regarding Rb. The observations are in agreement with Yokota et al., who found markedly reduced amounts of Rb transcript in some small cell carcinomas. Special emphasis was given to protein interactions and blastp results to obtain homologous protein sequences.
Our results agree with the findings of DeCaprio et al., Buchkovich et al. [12] and Chen et al. [13], who demonstrated that the RB1 gene product has the properties of a cell cycle regulatory element and that its function is modulated by a phosphorylation/dephosphorylation mechanism during cell proliferation and differentiation, as evident from the KEGG pathway.
A phylogenetic tree is generated using homologous protein sequences for PHYLOGENETIC INFERENCE. The results show that RB homologous genes were found in various organisms ranging from Bacteria to Homo sapiens. These organisms can be used as model organisms for in vitro and in silico studies related to novel drug discovery. Lee et al. [14] used a rabbit antiserum against RB for studying all cell lines expressing normal Rb mRNA. Sivakumaran et al. also conducted a comprehensive survey of sequence variation in the RB1 gene in diverse human populations and primates.
The preclinical data for the treatment of retinoblastoma with cyclosporin are given by Finger et al. [15]. In practice, retinoblastoma is treated using techniques such as chemotherapy and radiotherapy. The drugs inhibiting the activity of Rb are used for chemotherapy in treating retinoblastoma as well as retinoblastoma-associated malignancies. The preclinical data on cyclosporine suggest that it has a significant role in treating retinoblastoma malignancies. The docking report, however, shows a far better binding capacity for actinomycin-D with Rb than for cyclosporine. This is evident from the lower e-total value of actinomycin-D; the drug is therefore suggested as a potential future drug for the treatment of retinoblastoma and associated malignancies.
Actinomycin-D has been used successfully for the treatment of liver-associated cancers, but it has not been used to treat retinoblastoma to date. No previous docking data for Actinomycin-D with the retinoblastoma protein were found.
The mined data were therefore characterized and organized in a manner that can be easily interpreted for future use, and preclinical trials can be carried out with actinomycin-D.
Figure 7. Rb binds with cyclosporine after docking.
Figure 8. Rb and actinomycin–D in bond condition after docking.
- Hanahan D, Weinberg RA (2000) The hallmarks of cancer. Cell 100: 57-70.
- Vo N, Goodman RH (2001) CREB-binding protein and p300 in transcriptional regulation. J Biol Chem 276: 13505-13508.
- Yang XJ, Ogryzko VV, Nishikawa J, Howard BH, Nakatani Y (1996) A p300/CBP-associated factor that competes with the adenoviral oncoprotein E1A. Nature 382: 319-324.
- Yuan W, Condorelli G, Caruso M, Felsani A, Giordano A (1996) Human p300 protein is a coactivator for the transcription factor MyoD. J Biol Chem 271: 9009-9013.
- Schiltz RL, Nakatani Y (2000) The PCAF acetylase complex as a potential tumor suppressor. Biochim Biophys Acta 1470: M37-53.
- Eckner R, Yao TP, Oldread E, Livingston DM (1996) Interaction and functional collaboration of p300/CBP and bHLH proteins in muscle and B-cell differentiation. Genes Dev 10: 2478-2490.
- Puri PL, Sartorelli V, Yang XJ, Hamamori Y, Ogryzko VV, et al. (1997) Differential roles of p300 and PCAF acetyltransferases in muscle differentiation. Mol Cell 1: 35-45.
- McKinsey TA, Zhang CL, Olson EN (2001) Control of muscle development by dueling HATs and HDACs. Curr Opin Genet Dev 11: 497-504.
- Kouzarides T (2000) Acetylation: a regulatory modification to rival phosphorylation? EMBO J 19: 1176-1179.
- Chan HM, Krstic-Demonacos M, Smith L, Demonacos C, La Thangue NB (2001) Acetylation control of the retinoblastoma tumour-suppressor protein. Nat Cell Biol 3: 667-674.
- MacCarthy A, Birch JM, Draper GJ, Hungerford JL, Kingston JE, et al. (2009) Retinoblastoma in Great Britain 1963-2002. Br J Ophthalmol 93: 33-37.
- Lee MH, Williams BO, Mulligan G, Mukai S, Bronson RT, et al. (1996) Targeted disruption of p107: functional overlap between p107 and Rb. Genes Dev 10: 1621-1632.
- Buchkovich K, Duffy LA, Harlow E (1989) The retinoblastoma protein is phosphorylated during specific phases of the cell cycle. Cell 58: 1097-1105.
- Chen PL, Scully P, Shew JY, Wang JY, Lee WH (1989) Phosphorylation of the retinoblastoma gene product is modulated during the cell cycle and cellular differentiation. Cell 58: 1193-1198.
- Lee EY, Chang CY, Hu N, Wang YC, Lai CC, et al. (1992) Mice deficient for Rb are nonviable and show defects in neurogenesis and haematopoiesis. Nature 359: 288-294.
- Finger PT, Nadal GB (1999) Preclinical data for the treatment of retinoblastoma with Cyclosporin crystalline structure. Cell 24: 17-105.