SOGA2, also known as Suppressor of glucose autophagy associated 2 or CCDC165, is a protein that in humans is encoded by the SOGA2 gene.[5][6]
SOGA2 has two human paralogs, SOGA1 and SOGA3.[7][8]
In humans, the gene coding sequence is 151,349 base pairs long, with an mRNA of 6092 base pairs, and a protein sequence of 1586 amino acids. The SOGA2 gene is conserved in gorilla, baboon, galago, rat, mouse, cat, and more. There is distant conservation seen in organisms such as zebra finches and anoles.[9]
SOGA2 is ubiquitously expressed in humans, with especially high expression in brain (particularly the cerebellum and hippocampus), colon, pituitary gland, small intestine, spinal cord, testis and fetal brain.[10]
There are two main paralogs of SOGA2: human protein SOGA1 and human protein SOGA3.[9] SOGA1 has been shown to be involved in suppression of glucose by autophagy.[12] The rate at which orthologs diverge from human SOGA2 (measured by percent identity) places the duplication event that separated SOGA1 from SOGA2 at approximately 254.1 MYA and the duplication event that separated SOGA3 from SOGA2 at approximately 329.1 MYA.
| Protein name | Accession number | Sequence length (aa) | Sequence identity to human protein | Notes |
| --- | --- | --- | --- | --- |
| SOGA3 | NP_001012279.1 | 947 | 58% | conserved in ~500 N-terminal aa |
| SOGA1 isoform 2 | NP_954650.2 | 1016 | 65% | conserved in first ~900 aa |
| SOGA1 isoform 1 | NP_542194.2 | 1661 | 41% | conserved across the length of sequence except ~950-1150 |
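The identity figures listed for the paralogs are the kind of value that can be computed from a pairwise alignment. The sketch below is illustrative only: the source does not state which alignment tool or which denominator convention was used, so the function name `percent_identity`, the choice to skip gapped columns, and the toy sequences are all assumptions rather than the method behind the reported numbers.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of a pairwise alignment.

    Assumption: identity = matching columns / columns without a gap.
    Other conventions (e.g. dividing by the full alignment length) exist
    and give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real SOGA sequences).
toy_identity = percent_identity("MKLE-LRA", "MKIEQLRA")
```

Here six of the seven ungapped columns match, so `toy_identity` is 600/7 percent.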
A comparison of multiple sequence alignments of the N-terminal and C-terminal regions of distantly related SOGA2 orthologs. The N-terminal region is well conserved in organisms such as the clawed frog (FROG_SOGA2), but the C-terminal region is not. Location 19 is an example of one of the 7 leucine residues that are conserved across all orthologs.
SOGA2 is rich in glycine (ratio r of SOGA2 composition to average human protein is 1.723), glutamate (r = 1.647), and arginine (r = 1.357). It also has a lower than usual composition of tyrosine (r = 0.3406), isoleucine (r = 0.4430), phenylalanine (r = 0.5808), and valine (r = 0.6161).[14][15]
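The ratio r described above is the fraction of a residue in the protein divided by its fraction in a reference set. A minimal sketch of that calculation follows; the function name `composition_ratios`, the toy sequence, and the reference fractions are hypothetical placeholders, not the SOGA2 sequence or the average human composition used by the cited tools.

```python
from collections import Counter


def composition_ratios(sequence, reference_freqs):
    """Return {residue: r}, where r = observed fraction / reference fraction.

    `reference_freqs` maps one-letter residue codes to their fraction in a
    reference set (for example, an average over human proteins).
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    counts = Counter(seq)
    total = len(seq)
    ratios = {}
    for residue, ref_fraction in reference_freqs.items():
        observed_fraction = counts.get(residue, 0) / total
        ratios[residue] = observed_fraction / ref_fraction
    return ratios


# Toy input: made-up sequence and made-up reference fractions.
toy_seq = "GGGGEERRAL"
toy_reference = {"G": 0.2, "E": 0.1, "Y": 0.05}
ratios = composition_ratios(toy_seq, toy_reference)
```

A value of r above 1 marks a residue as enriched relative to the reference, and a value below 1 marks it as depleted, which is how the figures for glycine and tyrosine above should be read.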
SOGA2 has 4 isoforms: Q9Y4B5-1, Q9Y4B5-2, Q9Y4B5-3, Q9Y4B5-4.[16]
A graphic depicting the 4 different isoforms of SOGA2. Isoform 1 is canonical. Modification key: * E → ELRGPPVLPEQSVSIEELQGQLVQAARLHQEETETFTNKIHK; ** Q → QNCCGYPRINIEEETLGFTRLPAGSTVKTLKSLGLQRLE; *** NQTVLLTAPWGL → ELPCSALAPS...LHGLSQYNSL
SOGA2 contains Domain of Unknown Function 4201 (DUF4201) from aa 16-235. This domain is specific to the Coiled Coil Domain Containing family of proteins in eukaryotes.[17] It also contains two copies of Domain of Unknown Function 3166 (DUF3166): one from aa 140-235 and one from aa 269-364.[11]
The prediction tools PELE,[21] GOR4,[22] and SOSUICoil agree that the secondary structure of SOGA2 is dominated by alpha helices with interspersed regions of random coil. GOR4 predicted only 5.61% of residues in an extended strand (parallel or antiparallel beta-sheet) conformation, compared with 47.79% in alpha helices and 46.6% in random coil.
Secondary structure of human SOGA2 predicted by the GOR4 tool. h corresponds to alpha helices, c corresponds to random coils, and e corresponds to extended strands.
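Percentages like those reported for GOR4 can be tallied directly from a per-residue state string that uses the h/e/c codes described in the caption. The sketch below assumes such a string is available as plain text; the function name `secondary_structure_percentages` and the toy prediction are hypothetical, and the toy string is not the actual SOGA2 prediction.

```python
from collections import Counter

# One-letter state codes, as described for the GOR4 output.
STATE_NAMES = {"h": "alpha helix", "e": "extended strand", "c": "random coil"}


def secondary_structure_percentages(prediction):
    """Return the percentage of residues assigned to each state.

    Characters other than h, e, and c (spaces, line breaks, digits)
    are ignored, so a wrapped or annotated string can be passed as-is.
    """
    states = [ch for ch in prediction.lower() if ch in STATE_NAMES]
    if not states:
        raise ValueError("no h/e/c states found in prediction string")

    counts = Counter(states)
    return {
        name: 100.0 * counts.get(code, 0) / len(states)
        for code, name in STATE_NAMES.items()
    }


# Toy 20-residue prediction (hypothetical).
toy_prediction = "cchhhhhhhhcceecccchh"
percentages = secondary_structure_percentages(toy_prediction)
```

For the toy string, 10 of 20 residues are helix, 2 are extended strand, and 8 are coil, giving 50%, 10%, and 40%.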
SOGA2 shares sequence features in its highly conserved N-terminal region. This homology allows prediction of its tertiary structure on the basis of homology to published 3D structures via Phyre2[24] and NCBI Structure.[25]
SOGA2's 3D structure predicted by Phyre2.[24] The structure is based on the crystal structure of tropomyosin at 7 angstrom resolution, with 12% identity. 283 residues match, in the coiled-coil-containing N-terminal region.
3D structure of 1I84 S, Heavy Meromyosin Subfragment Of Chicken Gizzard Smooth Muscle Myosin With Regulatory Light Chain In The Dephosphorylated State. The highlighted region is conserved in SOGA2.[25]
The EST profile shows that, in humans, SOGA2 is highly expressed in many sites throughout the body, including bone, brain, ear, eye, and many others.[26] There are a large number of transcripts in liver cancer samples. Human microarray data show that SOGA2 is moderately expressed, with especially high expression in brain (particularly the cerebellum and hippocampus), colon, pituitary gland, small intestine, spinal cord, testis and fetal brain.[10] Brain-tissue-specific microarray data show that SOGA2 has high expression throughout the posterior lobe of the cerebellar hemispheres and the posterior lobe of the vermis in the mouse brain. There is low expression in most other areas of the brain.[27]
In humans, the SOGA2 gene produces 17 different transcripts, 8 of which form a protein product (one undergoes nonsense-mediated decay). The main transcript in humans is transcript ID ENST00000359865, or SOGA2-001.[28]
Protein complex co-immunoprecipitation (Co-IP) experiments revealed interacting proteins such as cell death regulators, ATP-binding cassette (ABC) transporters and protein kinase A binding proteins.[30]
K-nearest neighbor analysis by WoLF PSORT indicates that human SOGA2 localizes mainly to the nucleus, the cytoplasm, and the cytonuclear space. There is a small chance that it localizes to the Golgi.[31]
A number of protein interactants were also identified via the STRING database, including MARK2, MARK4, and PPP2R2B.
Blom N, Gammeltoft S, Brunak S (December 1999). "Sequence and structure-based prediction of eukaryotic protein phosphorylation sites". J. Mol. Biol. 294 (5): 1351–62. doi:10.1006/jmbi.1999.3310. PMID 10600390.
Beausoleil SA, Villén J, Gerber SA, et al. (2006). "A probability-based approach for high-throughput protein phosphorylation analysis and site localization". Nat. Biotechnol. 24 (10): 1285–92. doi:10.1038/nbt1240. PMID 16964243. S2CID 14294292.