A Quantitative Understanding of Human Sex Chromosomal Genes

Over the last few decades, the human allosomes have received intensive attention from researchers. The allosomes have now been sequenced, and about 2000 and 78 genes have been found in the human X and Y chromosomes respectively. The hemizygosity of the human X chromosome in males exposes recessive disease alleles, and this phenomenon has prompted decades of intensive study of X-linked disorders. By contrast, the small size of the human Y chromosome and its prominent long-arm heterochromatic region suggested an absence of function beyond sex determination. The present problem is to determine whether a given sequence of nucleotides, i.e. a DNA sequence, is a human X or Y chromosomal gene, without any biological experimental support. In our view, a proper quantitative understanding of these genes is required to justify or nullify such a claim. In this paper, some of the X and Y chromosomal genes are quantified at the genomic and proteomic levels through Fractal Geometric and Mathematical Morphometric analysis. Using the proposed quantitative model, one can make a probable justification, or a deterministic nullification, of whether a given sequence of nucleotides is a human X or Y chromosomal gene, without any biological experiment. Of course, a further biological experiment is essential to validate the sequence as a probable human X or Y chromosomal gene homologue. This study should enable biologists to understand these genes in a more quantitative manner, beyond their qualitative features.


Introduction
In the last ten years, Genomics has revolutionized the study of evolution. Evolution changes the sequence of DNA molecules, and comparing DNA sequences allows us to reconstruct evolutionary events from the past. The availability of DNA sequences from multiple vertebrates has confirmed that the process of sex chromosome evolution, as envisioned by theorists, has played out multiple times in the evolution of vertebrate sex chromosomes.
However, complete, high-quality sequences of sex chromosomes have led to discoveries that were unanticipated by existing theory. The next stage of genomic research will begin to derive meaningful knowledge from these genes. A quantitative genomic understanding will have a major impact in the fields of medicine, biotechnology, and the life sciences [1 and 2].
One of the foremost contemporary challenges is to revolutionize medical science by introducing Genetic Therapy [3]. Gene therapy is an experimental technique that uses genes to treat or prevent disease. This method of therapy would allow us to treat a disorder by inserting a gene into a patient's cells instead of using drugs or surgery. The most commonly practiced approaches to gene therapy include:
• Replacing a mutated gene that causes disease with a healthy copy of the gene.
• Inactivating, or "knocking out," a mutated gene that is functioning improperly.
• Introducing a new gene into the body to help fight a disease.
Although gene therapy is a promising treatment option for a number of diseases (including inherited disorders, some types of cancer, and certain viral infections), the technique remains risky and is still under study to make sure that it will be safe and effective. Gene therapy is currently being tested for the treatment of diseases that have no other cures [3, 4, 5 and 6]. Before gene therapy can become a practical approach for treating diseases, many technical challenges must be overcome. One of the nontrivial challenges is to gain quantitative insight into genes, which would help in the precise characterization of a particular DNA sequence. Such a quantitative study can serve as an additional genetic signature of a gene.
In the present study, a mathematical quantification of human X and Y chromosomal genes [7, 8, 9, and 10, 12 and 13] has been carried out using Fractal Geometry [14, 15 and 16]. Using the proposed quantitative model, one can make a probable justification (or a deterministic nullification) of whether a given sequence of nucleotides is a human X/Y chromosomal gene or its homologue, without any biological experiment.
This study should help researchers to differentiate these genes from each other through their nucleotide syntax alone.

Model Decomposition and Representation
(A) DNA 4-Colored Representation: Let a DNA sequence be given as a sequence of the four nucleotide letters A, T, G and C (Fig. 1A). The sequence shown in Fig. 1A is converted into a function (Fig. 1B) depicting the colors Red, Blue, Green, and Yellow for A, T, G, and C respectively [17,18]. This function takes at most 4 colors, i.e. its values lie between 0 and 3.
(C) 4-Adic Representation: We also consider a DNA sequence as a string of the four variables 0, 1, 2 and 3, corresponding to A, T, C and G respectively. We call this string the 4-adic string of DNA [17,18].

Fig. 1 (A): A DNA string of four variables A T C and G of SRY

… Weight (MW), and Polarity (P) of the human X and Y chromosomal genes have been considered.

Fig. 4: Accessible residues (AR-Protein Plot) of SRY
All protein plots are generated from the gene sequences using Matlab (bioinformatics toolbox) (Fig. 4). The box-counting dimension of each protein plot is then calculated using BENOIT™.
In the next section we elaborate the methods applied to the DNA string to extract the quantitative details.

Methods
The quantitative details of the X and Y chromosomal genes have been studied in the light of fractal dimension. The most basic such measure is the box-counting dimension, which is described below.

Box-Counting Method:
The most practical and commonly used measure of fractal dimension is the box-counting dimension, mainly because it is easy to calculate mathematically and easy to estimate empirically. We note that the number of line segments of length $\delta$ needed to cover a line of length $l$ is $l/\delta$, that the number of squares with side length $\delta$ needed to cover a square with area $A$ is $A/\delta^2$, and that the number of cubes with side length $\delta$ needed to cover a cube with volume $V$ is $V/\delta^3$. In general, the box-counting dimension (or just "box dimension") of a set $S \subset \mathbb{R}^n$ is defined as follows. For any $\varepsilon > 0$, let $N_\varepsilon(S)$ be the minimum number of $n$-dimensional cubes of side length $\varepsilon$ needed to cover $S$. If there is a number $D$ so that $N_\varepsilon(S) \sim m/\varepsilon^D$ as $\varepsilon \to 0$ for some positive constant $m$, then $D$ is the box-counting dimension of $S$. Taking logarithms gives
$$D = \lim_{\varepsilon \to 0} \frac{\log m - \log N_\varepsilon(S)}{\log \varepsilon} = -\lim_{\varepsilon \to 0} \frac{\log N_\varepsilon(S)}{\log \varepsilon}.$$
Note that the $\log m$ term drops out, because it is constant while the denominator becomes infinite as $\varepsilon \to 0$. Also, since $0 < \varepsilon < 1$, $\log \varepsilon$ is negative, so $D$ is positive.
But in practice, this method computes the number of cells required to entirely cover an object, with grids of cells of varying size. Practically, this is performed by superimposing regular grids over an object and by counting the number of occupied cells. The logarithm of N(r), the number of occupied cells, versus the logarithm of 1/r, where r is the size of one cell, gives a line whose gradient corresponds to the box dimension [14].
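The grid-counting procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the BENOIT implementation used in this study: the function name `box_counting_dimension`, the choice of power-of-two box sizes, and the zero padding of the image are all assumptions made for the example.

```python
import numpy as np

def box_counting_dimension(image, box_sizes=None):
    """Estimate the box dimension of a 2D array (nonzero entries = occupied)."""
    img = np.asarray(image) != 0
    if not img.any():
        raise ValueError("image has no occupied pixels")
    if box_sizes is None:
        # Assumed default: powers of two up to half the smaller image side.
        max_pow = int(np.log2(min(img.shape)))
        box_sizes = [2 ** k for k in range(max_pow)]
    if len(box_sizes) < 2:
        raise ValueError("need at least two box sizes to fit a line")
    counts = []
    for r in box_sizes:
        # Pad with empty pixels so both sides are multiples of r.
        rows = -(-img.shape[0] // r) * r
        cols = -(-img.shape[1] // r) * r
        padded = np.zeros((rows, cols), dtype=bool)
        padded[:img.shape[0], :img.shape[1]] = img
        # Tile into r-by-r boxes and count boxes with any occupied pixel.
        boxes = padded.reshape(rows // r, r, cols // r, r)
        counts.append(int(boxes.any(axis=(1, 3)).sum()))
    # Gradient of log N(r) against log(1/r) is the box dimension estimate.
    log_inv_r = np.log(1.0 / np.array(box_sizes, dtype=float))
    slope, _ = np.polyfit(log_inv_r, np.log(counts), 1)
    return float(slope)
```

As a sanity check on the sketch, a completely filled square should give an estimate close to 2 and a single straight line an estimate close to 1.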

FD of DNA walks of the genes
The DNA walk is defined as the sequence of partial sums $s_n = \sum_{i=1}^{n} x_i$, $n = 1, 2, \dots, N$, with $x_i \in \{1, 2, 3, 4\}$, i.e. the cumulative sum over the DNA string representation. The walk is plotted in the plane through two coordinate functions, each defined as a difference of two sine terms.
Here we compute the fractal dimension of the DNA walk of the 4-adic string of every sex gene. The plot of the DNA walk for the SRY string is shown in Fig. 5.
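The cumulative-sum part of the DNA walk can be illustrated with a short Python sketch. The coding A=1, T=2, C=3, G=4 is an assumption made for this example (the text only states that the terms lie in {1, 2, 3, 4}), and the names `CODE` and `dna_walk` are hypothetical. The plotting coordinates are not covered here.

```python
from itertools import accumulate

# Assumed coding: the text only states that terms lie in {1, 2, 3, 4}.
CODE = {"A": 1, "T": 2, "C": 3, "G": 4}

def dna_walk(sequence):
    """Return the cumulative sums s_1, ..., s_N of the coded sequence."""
    values = [CODE[base] for base in sequence.upper()]
    return list(accumulate(values))

# Example: the four-letter string ATGC.
print(dna_walk("ATGC"))  # [1, 3, 7, 10]
```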

Hurst Exponent of the DNA sequences
The Hurst exponent is referred to as the "index of dependence": it is the relative tendency of a time series either to regress strongly to the mean or to cluster in a direction. It is a measure of the long-range correlation of a one-dimensional time series [19,20].
Let us consider a string $H = \{h_i\}$, $i = 1, 2, \dots, n$. The Hurst exponent $H$ is defined as $H = \log(R/S) / \log(n)$, where $n$ is the length of the string and $R/S$ is the corresponding rescaled range. The ranges in which the Hurst exponent indicates negative and positive auto-correlation are $0 < H < 0.5$ and $0.5 < H < 1$ respectively. A value of $H = 0.5$ indicates a true random walk, where it is equally likely that a decrease or an increase will follow from any particular value [20].
Here we consider 2-adic strings of DNA for computation of Hurst exponent.
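A rescaled-range estimate of the Hurst exponent can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily the procedure used in this study: it assumes the single-scale estimate H = log(R/S)/log(n), treats a 2-adic string as a plain sequence of 0s and 1s, and uses the hypothetical function name `hurst_rescaled_range`.

```python
import numpy as np

def hurst_rescaled_range(series):
    """Single-scale R/S estimate: H = log(R/S) / log(n) (assumed form)."""
    x = np.asarray(series, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two values")
    s = x.std()  # population standard deviation S
    if s == 0:
        raise ValueError("constant series has no defined rescaled range")
    # Cumulative sums of deviations from the mean.
    cumulative = np.cumsum(x - x.mean())
    r = cumulative.max() - cumulative.min()  # range R
    return float(np.log(r / s) / np.log(n))
```

On a strictly alternating string 0101… the estimate is 0, the anti-persistent extreme, while a string of 0s followed by an equal block of 1s gives a value well above 0.5, consistent with the interpretation of H given above.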

Succolarity
The degree of percolation of an image (how much a given fluid can flow through this image) can be measured through Succolarity, a fractal parameter [21].
The succolarity of a binary image is defined following [21]. Similarly, we have computed the succolarity of the decomposed images for all sex genes.

Statistical Autocorrelations
The variance is one of several descriptors of how far the values lie from the mean (expected value).
For a given sequence $\{Y_1, Y_2, \dots, Y_N\}$, the variance $\sigma^2$ and the variance at distance $N-k$ are computed. The variance of the string in Fig. 1(C) is readily computed to be 1.29.

Mean and SD Ordering of Gene Sequences
A gene is a string consisting of different permutations of the bases A, C, T and G, where repetition of a base is allowed. We can classify the gene sequences based on the ordering of the poly-string means of A, C, T, and G in the string. Given a string X, we calculate the mean of the poly-strings consisting only of A, only of C, only of T and only of G separately [15,16]. According to the non-decreasing order of these means, we have classified all the genes into different classes. The mean order of the sequence in Fig. 1(A) is ATGC, i.e. the mean of the poly-strings of A is less than that of T, and so on.
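The mean ordering can be illustrated with a short Python sketch. It assumes that a poly-string is a maximal run of one repeated base and that its "mean" is the mean run length; both readings, as well as the names `poly_string_means` and `mean_order` and the tie-breaking by the fixed order A, C, T, G, are assumptions made for this example.

```python
from itertools import groupby

BASES = "ACTG"  # ties in the ordering are broken by this fixed order

def poly_string_means(sequence):
    """Mean length of the maximal runs (assumed poly-strings) of each base."""
    runs = {base: [] for base in BASES}
    for base, group in groupby(sequence.upper()):
        if base in runs:
            runs[base].append(len(list(group)))
    # A base that never occurs is given mean 0.0.
    return {b: (sum(v) / len(v) if v else 0.0) for b, v in runs.items()}

def mean_order(sequence):
    """Bases sorted by non-decreasing mean poly-string length."""
    means = poly_string_means(sequence)
    return "".join(sorted(BASES, key=lambda b: means[b]))

# Example: runs are AA, T, GGG, C, A.
print(poly_string_means("AATGGGCA"))  # {'A': 1.5, 'C': 1.0, 'T': 1.0, 'G': 3.0}
print(mean_order("AATGGGCA"))         # CTAG
```

An ordering by standard deviation could be obtained the same way by replacing the mean of the run lengths with their standard deviation.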

Results and Discussions
We now present in detail the results obtained for all sex genes using the methods stated above.

Hurst Exponent of 2-adic DNA strings
We

Succolarity Indices
Succolarity measures how much a given fluid can flow through an image, treating the set of pixels with a defined color (e.g. white) as obstacles in 2D image analysis. In other words, it is a measure of the continuous density of a 2D pattern.
The succolarity of A for all sex genes lies in the interval (0.000003, 0.2584), so the texture of A for each sex gene evidently has low density (Fig. 1).
The succolarity indices of the genomic texture of T of all the human sex genes and their corresponding homologues are spread over the interval (0.000001, 0.3077) (Fig. 1). The succolarity, i.e. the continuous density, of the genomic texture of T is very low, as in the case of the genomic texture of A. The succolarities of all 93 genes are centred at 0.03, with little deviation among the indices.
For the genomic texture of G, the succolarity indices of all sex genes lie in the interval (0, 1.78), as shown in Fig. 6. The succolarity indices of the genomic texture of C were also computed for all genes; they range from 0 to 0.2459.

Fig. 6: Descriptive Statistics of Succolarities of genomic texture A, T, C and G.
It is seen that the human sex genes and their corresponding homologues have almost the same succolarity. Also, the succolarity indices of the human sex genes are higher than those of their corresponding homologues in other species. For example, the succolarity of A of the human sex gene SOY is greater than that of the SOY of Macaca mulatta (monkey) and of chimpanzee.
The correlation among the succolarities of A, T, C and G is shown in Tab. 4.

Tab. 4: Correlation coefficients for Succolarities of A, T, C and G
The correlation coefficient between Suc-A and Suc-T is high, whereas that between Suc-G and Suc-C is low.

Statistical Autocorrelations
The statistical autocorrelations (σ) of the 4-adic representations of all human sex genes, along with their corresponding homologues, have been determined. The values are found to lie in the interval (1.06, 1.945).

Tab. 5: Descriptive statistics of statistical autocorrelations
The sigma values of all human sex genes and their homologues are normally distributed, as shown in Tab. 5.

Fractal Dimensions of Threshold Decomposition Matrices
Here we consider the four threshold decomposition matrices, namely the templates of A, T, C and G, as in 1.1 (D), for each of the sex genes (Fig. 6). We then determine the fractal dimension of each threshold decomposition matrix.
In the Tab

Tab. 6: Descriptive statistics of fractal dimension of threshold decompositions
The fractal dimensions of the decomposed templates of A and T follow a normal distribution, whereas the fractal dimensions of the other templates do not. From Fig. 7, it is seen that the fractal dimensions of the threshold decomposition matrices for A, T, G and C of the genes ZFX (from the human X chromosome) and ZFY (from the human Y chromosome) are almost the same, although the genomic templates are entirely different in terms of the ordering of nucleotides.
The same holds for all one-to-one corresponding genes from the X and Y chromosomes.

Fractal Dimensions of Skeleton of Threshold Decompositions
In the previous subsection, the fractal dimensions of the threshold decomposition matrices were found. We now find the fractal dimension of the morphological skeleton of each decomposed threshold matrix. For the ZFX and ZFY genes, the fractal dimensions of the morphological skeletons of the decomposed threshold matrices are almost the same. Interestingly, the same is true for all human X and Y chromosomal genes, as is evident from the quantitative details (Supp. Met. 1).
… Weight (MW) and Polarity (P) follow a normal distribution (Fig. 9).

Fig. 10: Fractal dimension of Protein Plots of ZFX, ZFY and UTX, UTY.
For the gene pairs (ZFX, ZFY) and (UTX, UTY), the fractal dimensions agree for almost all of the protein plots, with one or two exceptions (Fig. 10). In our view, these disagreements are what distinguish them as X and Y chromosomal genes. Interestingly, this holds for all the human X and Y chromosomal homologues, as is evident from the quantitative data (Supp. Met-2).

Conclusion and Future Endeavours
In this paper, a quantitative and deterministic description is outlined through which a given string of nucleotides can be inferred to be a human X or Y chromosomal gene without any biological experiment. This would help in screening any given stretch of nucleotides of specific length as a human sex-gene homologue. These quantitative details of the genomic imprints of sex genes should enable biologists to understand them more precisely at the level of genomic composition, and such understanding is the next challenge of current Genomics. The proposed deterministic model is not meant only for human sex genes or their homologues; it can also be treated as a standard prototype for other genes and genomes. In future work we would like to validate the model through biological experiments.
Authors' Contributions: Sk. S. Hassan conceptualized the problem and experiments and performed the entire research with the rest of the authors of the article.