Computational analysis to identify deleterious nsSNPs and its impact on IL-6 protein in inflammation

IL-6 is a pro-inflammatory cytokine that drives inflammation and the acute-phase immune response. Hence, it is important to predict SNPs that could affect IL-6 stability and thus inflammatory conditions. This study was therefore undertaken to identify the functional nsSNPs in IL-6. Out of a total of 243 SNPs, 37 were non-synonymous (nsSNPs), 7 occurred in the mRNA 3' UTR, 7 occurred in the 5' UTR, 193 occurred in intronic regions, and the rest were other types of SNPs. Among the predicted nsSNPs, rs2069860 and rs11544633 were identified as deleterious and damaging by different programs. Additionally, I-Mutant showed a decrease in stability for these nsSNPs upon mutation. Protein structural analysis of these amino acid variants was performed using I-Mutant and Swiss PDB viewer to examine their molecular dynamics and energy minimization. We also identified several IL-6 sites that may undergo post-translational modification, including sites that coincide with the location of high-risk nsSNPs. This study suggests that the D162V and P119L variants of IL-6 could directly or indirectly destabilize amino acid interactions, and that these variants could be useful in genotype association studies of inflammatory diseases.


Introduction
Single nucleotide polymorphisms (SNPs) are DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome of an organism is altered. SNPs occur frequently, roughly once every 100-300 bp of genomic sequence [1]. Both coding (gene) and non-coding regions of the genome contain SNPs. SNPs have been shown to have different effects in different cell types, and many SNPs have been found to confer susceptibility to disease. SNPs in coding regions are of two types: synonymous and non-synonymous (nsSNPs). Because nsSNPs lie in protein-coding regions, they can cause an amino acid substitution in the corresponding protein product and thereby alter the stability of the protein. Because nsSNPs can alter the structure, stability, or function of proteins, they are often associated with human disease. nsSNPs are suggested to be responsible for almost half of the known genetic variations related to human inherited disease [2]. IL-6 is a pro-inflammatory cytokine that activates inflammatory reactions [3]. A number of studies have recommended IL-6 and CRP measurements as major markers for predicting systemic inflammation in diverse pathologic conditions [4][5][6]. Different types of inflammatory insult result in IL-6 secretion. Although many clinical factors are known to activate the secretion of IL-6, recent studies have reported that genetic variations can also influence IL-6 production. IL-6 promoter polymorphisms such as -597 (G/A), -572 (C/G), -174 (G/C), -634 (C/G) and -190 (C/T) are reported to be associated with the clinical outcome of many inflammatory conditions, such as rheumatoid arthritis [7], lipid abnormalities [8], and bone mineral density [9]. The IL-6 -174 G/C polymorphism alters the transcriptional activity and thus the secretion of IL-6 [10,11]. The CC genotype of the IL-6 -174 G/C SNP is also reported to be associated with cardiovascular events in hemodialysis patients [12]. The IL-6 -634 C/G polymorphism has been shown to be associated with increased secretion of IL-6 by peripheral blood mononuclear cells [13,14], and the IL-6 -634 G allele has been described as an initiating factor in the development and progression of kidney disease [14,15].
Many tools have been suggested as useful for predicting the deleterious SNPs of a gene before a population study is planned [18]. In addition, reported associations of SNPs with disease may not be confirmed by subsequent independent studies [16][17][18][19]. Therefore, independent confirmation of the functionality of SNPs identified by prediction tools would be valuable for distinguishing true associations from false positives [20], as shown recently by the functional SNP analysis of the BRCA1, ABL1, TRIM22, TNF-α and EGFR genes [21][22][23][24][25][26]. Although several studies have reported associations of SNPs in the IL-6 gene with different types of inflammatory disease, no in silico analysis has yet been performed on the structural and functional consequences of SNPs in IL-6. We performed a computational analysis using several algorithms, namely Sorting Intolerant from Tolerant (SIFT), Polymorphism Phenotyping (PolyPhen), PhD-SNP, PMUT and SNPs&GO, to identify likely deleterious SNPs that could affect protein function. Since these mutations are very likely to affect protein stability, we modeled the structures of the mutant proteins and examined their anomalies by comparison with the native structure.

Structural analysis
Full-length protein models were built with I-TASSER (http://zhanglab.ccmb.med.umich.edu/I-TASSER), which assembles them from continuous fragments excised from threading alignments. The confidence score (C-score) reported by I-TASSER estimates the quality of a predicted model. The C-score typically lies in the range of -5 to 2, where higher values denote higher confidence in the predicted model. Swiss PDB viewer was used to introduce the mutations identified by the above tools and to perform energy minimization. The models were further validated with a Ramachandran plot generated by the Ramaplot algorithm.

Protein stability analysis
I-Mutant version 2.0 was used to assess the changes in protein stability induced by nsSNPs [34]. I-Mutant predicts protein stability by estimating the free energy change between the wild-type and mutant structures. It also predicts the sign (increase or decrease) of the free energy change value (DDG), along with a reliability index for the result (RI: 0-10, where 0 is the lowest reliability and 10 is the highest). A DDG < 0 corresponds to a decrease in protein stability, whereas a DDG > 0 corresponds to an increase in protein stability.
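The sign convention above can be written down as a short Python sketch. The function name and the example values are illustrative assumptions, not I-Mutant output and not values from this study; the code only encodes the rule that DDG < 0 means decreased stability and that the RI lies between 0 and 10.

```python
def stability_effect(ddg, ri):
    """Interpret a predicted free energy change (DDG) and its reliability index (RI).

    Follows the convention stated in the text: DDG < 0 is a decrease in
    stability, DDG > 0 is an increase. The RI must lie in the range 0-10.
    """
    if not 0 <= ri <= 10:
        raise ValueError("RI must be between 0 and 10")
    if ddg < 0:
        return "decrease"
    if ddg > 0:
        return "increase"
    return "no change"


# Placeholder values chosen only to exercise the function (hypothetical).
example_ddg, example_ri = -1.0, 5
print(stability_effect(example_ddg, example_ri))
```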

Results and Discussion
SNPs are the most frequent type of genetic variation found in the genomic sequence of any organism. Since the Human Genome Project, many SNPs have been identified in humans, and these could be very useful for studying genotype-phenotype associations. However, the mechanisms by which a SNP may result in phenotypic change are still not clear. Many of the SNPs found to cause inherited disease are nsSNPs [2]. Because nsSNPs are likely to be associated with inherited disease through destabilization of protein structure, we analyzed only the nsSNPs of IL-6 in this study.

SNP dataset
Genetic variation data for the IL-6 gene were retrieved from the NCBI dbSNP database, the Ensembl genome browser and the UniProt database, as shown in Figure 1 [27][28][29]. According to these databases, the IL-6 gene contains 37 nsSNPs, 193 SNPs in intronic regions, 7 SNPs in its 5' UTR, and 7 SNPs in its 3' UTR. To determine whether a given mutation affects IL-6 function, we subjected these 37 nsSNPs to a variety of SNP prediction algorithms. Predictions were returned for only four nsSNPs, as shown in Table 1; the remaining nsSNPs were not found by the algorithms used in this study.

Non-synonymous SNP analysis
As IL-6 is a pro-inflammatory cytokine involved in inflammatory reactions, nsSNPs in the IL-6 gene may play an important role in predisposition to inflammatory diseases. Therefore, an effort was made to identify nsSNPs that are deleterious for the IL-6 gene.
We identified deleterious nsSNPs using the SNP prediction algorithms PolyPhen-2, SIFT, PhD-SNP, PMUT, and SNPs&GO [30][31][32][33]. PolyPhen predicted that only 1 nsSNP is damaging to IL-6 function. SIFT analysis suggested that 2 nsSNPs are deleterious to IL-6 function and 2 nsSNPs are tolerated (Table 1). PhD-SNP predicted that only rs11544633 is pathological, and the remaining nsSNPs were found to be neutral. The PMUT algorithm predicted that 2 nsSNPs are pathological and 2 nsSNPs are neutral (Table 1). SNPs&GO analysis, which includes information from Gene Ontology annotation, predicted that 2 nsSNPs cause disease and 2 nsSNPs are neutral. Since each algorithm assesses nsSNPs using different parameters, the SNPs that are flagged by more algorithms are more likely to be deleterious.
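The consensus idea in the last sentence can be sketched in Python as a simple vote count across tools. Everything in this sketch is an assumption made for illustration: the function names, the set of labels treated as deleterious, and the tool and SNP identifiers in the example, which are placeholders rather than the predictions in Table 1.

```python
from collections import Counter

# Labels counted as a "positive" (deleterious) call; an assumed mapping that
# covers the different vocabularies used by the prediction tools.
DELETERIOUS_LABELS = {"deleterious", "damaging", "pathological", "disease"}


def count_deleterious_calls(predictions):
    """Count, for each SNP, how many tools flagged it as deleterious.

    predictions maps tool name -> {SNP identifier -> predicted label}.
    """
    votes = Counter()
    for calls in predictions.values():
        for snp, label in calls.items():
            # Adds 1 for a deleterious call and 0 otherwise, so every SNP
            # that was scored keeps an entry even with zero votes.
            votes[snp] += label.lower() in DELETERIOUS_LABELS
    return votes


def rank_by_consensus(predictions):
    """Return (SNP, votes) pairs, most frequently flagged first."""
    votes = count_deleterious_calls(predictions)
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical input: placeholder tools and SNP names, not data from this study.
example = {
    "tool_1": {"snp_A": "deleterious", "snp_B": "tolerated"},
    "tool_2": {"snp_A": "damaging", "snp_B": "neutral"},
}
for snp, n_votes in rank_by_consensus(example):
    print(snp, n_votes)
```

A plain vote count treats all tools as equally reliable; weighting tools by their reported accuracy would be a possible refinement, but it is not something the study describes.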

Structural Analysis of Mutant Structures
To assess whether D162V and P119L changed the structure of IL-6, we individually substituted each nsSNP into the wild-type IL-6 sequence and submitted the sequences to I-TASSER and I-Mutant for structural analysis. We then compared each modeled mutant 3D structure with the predicted 3D model of wild-type IL-6 using Swiss PDB viewer (Figure 2).
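The substitution step can be illustrated with a minimal Python sketch that applies a variant written in the D162V style to a protein sequence and checks that the stated wild-type residue is actually present at that position. The function name is hypothetical and the sequence in the example is a five-residue toy string, not the IL-6 sequence.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter mutant residue.
_VARIANT_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(sequence, variant):
    """Return the sequence with a single amino acid substitution applied."""
    match = _VARIANT_PATTERN.match(variant)
    if match is None:
        raise ValueError(f"unrecognized variant format: {variant!r}")
    wild_type, position_text, mutant = match.groups()
    position = int(position_text)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    # Guard against numbering mismatches between the variant and the sequence.
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + mutant + sequence[position:]


# Toy sequence for illustration only (hypothetical, not IL-6).
toy_sequence = "MADPK"
print(apply_substitution(toy_sequence, "D3V"))  # prints MAVPK
```

The wild-type check matters in practice because variant positions may be numbered against a precursor or a mature form of the protein, and a mismatch is easier to catch here than after modeling.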
The native IL-6 protein structure is essential for testing the structural and functional impact of activating mutations, which are increasingly being identified in several human inflammatory diseases. Therefore, in the present study, we predicted the secondary structure of the native IL-6 protein using I-TASSER and examined the effect of highly damaging nsSNPs on the structural and functional aspects of the protein.
We assessed the Ramachandran plot (Figure 3) to corroborate the quality of our IL-6 secondary structure.
We compared the total energy values (kcal/mol) of the native structure and the modeled mutant structures of the IL-6 gene variants. The total energy of the wild-type structure after energy minimization differed from that of the gene variants (Table 2). Additionally, the structural stability of the protein was confirmed by the Ramachandran plot (Figure 3): 164 residues (97%) were found in the favored region, 5 residues (3%) were in the allowed region, and no residue (0%) was found in the outlier region.
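As a quick arithmetic check of the Ramachandran statistics, the percentages can be recomputed from the residue counts reported above. This is a minimal Python sketch; the function name is an assumption, and only the three counts come from the text.

```python
def region_percentages(counts):
    """Convert per-region residue counts into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues to summarize")
    return {region: 100.0 * n / total for region, n in counts.items()}


# Residue counts as reported in the text for the Ramachandran plot.
ramachandran_counts = {"favored": 164, "allowed": 5, "outlier": 0}

for region, pct in region_percentages(ramachandran_counts).items():
    print(f"{region}: {pct:.1f}%")
```

With 169 residues evaluated in total, the favored and allowed fractions round to the reported 97% and 3%.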

Prediction of protein structural stability
Changes in protein stability caused by single-site mutations were analyzed using I-Mutant. It reports free energy change values, which are calculated by the FOLD-X energy-based web server. By combining the FOLD-X estimates with those of I-Mutant, high accuracy and precision can be achieved. The two mutations of the IL-6 gene (162, D→V and 119, P→L) were selected on the basis of the predictions made by the different algorithms. These variants were submitted to the I-Mutant web server to predict the protein stability and reliability index (RI) upon mutation (Table 3).

Prediction of post-translational modification sites in IL-6
We used several prediction algorithms to identify putative post-translational modification (PTM) sites in the IL-6 protein, which can help in examining how nsSNPs may influence the PTM of IL-6. PTMs are implicated in various biological processes, such as immunity-related pathways and cell signaling pathways. PTMs are also crucial for regulating the structure and function of many proteins. Accordingly, the UbPred, SUMOplot, and SUMOsp 2.0 programs were used in this study. For prediction of phosphorylation sites in the IL-6 protein, the GPS 2.1 and NetPhos 2.0 servers were used [35][36][37][38][39]. UbPred predicted that residue 37 in IL-6 undergoes ubiquitylation (Table 4). Residues 74, 92, 94, 98 and 174 were also found to be potential candidates for ubiquitylation with medium confidence. SUMOplot predicted that 2 residues in IL-6 undergo sumoylation (Table 5).
We found that 6 serine, 2 threonine and 1 tyrosine residues were potential targets for phosphorylation in the IL-6 protein. IL-6 protein stability could also be decreased by several high-risk and low-risk nsSNPs located at putative PTM sites. A decrease in the stability of a protein results in decreased net function. Detailed analysis is required to examine the effects of these nsSNPs on the structure and function of the IL-6 protein. Pertinent IL-6 residues that are predicted to be highly deleterious and/or to undergo PTMs are shown in Figure 4.
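Checking whether variant positions coincide with predicted PTM sites amounts to a set intersection per PTM type, which can be sketched in Python as follows. The function name is hypothetical. The variant positions (119 and 162) and the ubiquitylation residues are taken from the text; the sumoylation, phosphorylation and glycosylation positions are not listed in the text, so they are left out here and would have to be added from the corresponding tables.

```python
def ptm_sites_hit_by_variants(variant_positions, ptm_sites):
    """Return, per PTM type, the predicted sites that coincide with a variant.

    variant_positions: iterable of residue positions carrying an nsSNP.
    ptm_sites: dict mapping PTM type -> iterable of predicted site positions.
    PTM types without any coinciding position are omitted from the result.
    """
    variants = set(variant_positions)
    hits = {}
    for ptm_type, positions in ptm_sites.items():
        shared = sorted(variants & set(positions))
        if shared:
            hits[ptm_type] = shared
    return hits


# Positions of the two high-risk variants named in the text.
high_risk_positions = [119, 162]

# Only the ubiquitylation residues are listed in the text; other PTM types
# are omitted because their positions appear only in the tables.
predicted_sites = {"ubiquitylation": [37, 74, 92, 94, 98, 174]}

print(ptm_sites_hit_by_variants(high_risk_positions, predicted_sites))
```

An exact-position match is the strictest criterion; a variant next to a modified residue could also affect recognition of the site, so a windowed comparison may be worth considering.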

Prediction of glycosylated residues of IL-6
Glycosylation is one of the most abundant and significant post-translational modifications of proteins. Glycosylated proteins (glycoproteins) are involved in many cellular biological processes, such as protein folding, cell-cell communication, cell signaling and host-pathogen interactions. Eukaryotic glycoproteins can be useful in therapeutics and drug design. Hence, characterizing the residues of IL-6 that are prone to glycosylation would be very helpful for studying the many inflammatory conditions caused by glycosylation of IL-6. We therefore predicted the glycosylated residues of the IL-6 protein, which are shown in Table 6.

Conclusions
Our findings show that many nsSNPs in the inflammatory IL-6 gene may be deleterious to IL-6 structure and/or function. In this study, we show that 2 high-risk nsSNPs can disrupt the putative structure of the IL-6 protein.
The structural analysis showed that the amino acid substitutions with the greatest impact on the stability of the IL-6 protein were D162V (rs2069860) and P119L (rs11544633). On the basis of these findings, we conclude that these SNPs are potential candidates for causing inflammatory diseases related to IL-6 gene malfunction. In addition to this analysis, we identified many IL-6 sites that are likely to undergo post-translational modification, including sites that coincide with the location of high-risk nsSNPs. This study is the first extensive computational analysis of nsSNPs in the IL-6 gene.

Figure 2. (a) Native structure of the protein. (b) Mutant modeled structure showing the valine residue at position 162; Deep view of the superimposed wild-type and mutant residues at position 162. (c) Mutant modeled structure showing the leucine residue at position 119; Deep view of the superimposed wild-type and mutant residues at position 119.

Figure 4. Putative sites for phosphorylation at different positions of IL-6 protein.