The ORF8 proteins of SARS-CoV and SARS-CoV-2 are highly divergent, with less than 20% sequence identity.[1] The full-length ORF8 in SARS-CoV encodes a protein of 122 residues. In many SARS-CoV isolates it is split into ORF8a and ORF8b, which separately express a 39-residue ORF8a protein and an 84-residue ORF8b protein.[6] It has been suggested that the ORF8a and ORF8b proteins may form a protein complex.[2][9] The cysteine residue responsible for dimerization of the SARS-CoV-2 protein is not conserved in the SARS-CoV sequence.[1] The ORF8ab protein has also been reported to form disulfide-linked multimers.[10]
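As a rough illustration of what a sequence identity figure measures, the following minimal sketch computes percent identity over two pre-aligned sequences. The function name `percent_identity` and the toy fragments are hypothetical and are not real ORF8 sequences. The choice of denominator (here, aligned columns without a gap in either sequence) is an assumption; other conventions exist and give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        # Skip columns that contain a gap in either sequence.
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, NOT real ORF8 sequences).
toy_a = "MKLLV-FLAC"
toy_b = "MRFLVAF-GC"
print(percent_identity(toy_a, toy_b))
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch only covers the counting step.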
Like the genes for other accessory proteins, the ORF8 gene is located near those encoding the structural proteins, at the 3' end of the coronavirus RNA genome. Along with ORF6, ORF7a, and ORF7b, ORF8 is located between the membrane (M) and nucleocapsid (N) genes.[6][4] The SARS-CoV-2 ORF8 protein has a signal sequence for trafficking to the endoplasmic reticulum (ER)[4] and has been experimentally localized to the ER.[11] It is probably a secreted protein.[4][3]
There are variable reports in the literature regarding the localization of SARS-CoV ORF8a, ORF8b, or ORF8ab proteins.[6] It is unclear if ORF8b is expressed at significant levels under natural conditions.[10][12] The full-length ORF8ab appears to localize to the ER.[12]
The function of the ORF8 protein is unknown. It is not essential for viral replication in either SARS-CoV[6] or SARS-CoV-2,[4] though there is conflicting evidence on whether loss of ORF8 affects the efficiency of viral replication.[13]
In SARS-CoV, the ORF8 region is thought to have originated through recombination among ancestral bat coronaviruses.[3][6][5][20] Among the most distinctive features of this region in SARS-CoV is the emergence of a 29-nucleotide deletion that split the full-length open reading frame into two smaller ORFs, ORF8a and ORF8b. Viral isolates from early in the SARS epidemic have a full-length, intact ORF8, but the split structure emerged later in the epidemic.[3][6] Similar split structures have since been observed in bat coronaviruses.[21] Mutations and deletions have also been seen in SARS-CoV-2 variants.[2][19] Based on observations in SARS-CoV, it has been suggested that changes in ORF8 may be related to host adaptation, but it is possible that ORF8 does not affect fitness in human hosts.[19][5] In SARS-CoV, a high dN/dS ratio has been observed in ORF8, consistent with positive selection or with relaxed selection.[5]
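One way to see why a deletion of 29 nucleotides can disrupt an open reading frame is that 29 is not a multiple of 3, so everything downstream of the deletion is read in a shifted frame and may run into an early stop codon. The sketch below illustrates this on a toy sequence. The sequence, the deletion position, and the helper names `delete` and `codons_before_stop` are hypothetical; this does not model the real SARS-CoV genome or how ORF8b is expressed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(seq: str):
    """Number of codons read from position 0 before the first in-frame stop codon.

    Returns None if no in-frame stop codon is found.
    """
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def delete(seq: str, start: int, length: int) -> str:
    """Return seq with `length` nucleotides removed, starting at 0-based `start`."""
    return seq[:start] + seq[start + length:]


# Toy ORF (hypothetical): start codon, 24 sense codons, then a stop codon.
toy_orf = "ATG" + "GCTGAC" * 12 + "TAA"

print(29 % 3)                                       # remainder 2: frame is shifted
print(codons_before_stop(toy_orf))                  # intact toy ORF
print(codons_before_stop(delete(toy_orf, 12, 29)))  # 29-nt deletion: early stop
print(codons_before_stop(delete(toy_orf, 12, 30)))  # 30-nt deletion: frame kept
```

In this toy case the 29-nucleotide deletion truncates the reading frame shortly after the deletion site, whereas a 30-nucleotide deletion at the same position only shortens it by ten codons.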
ORF8 encodes a protein whose immunoglobulin (Ig) domain has distant similarity to that of ORF7a.[1] It has been suggested that ORF8 may have evolved from ORF7a through gene duplication,[2][7][8] though some bioinformatics analyses suggest the similarity may be too low to support duplication, which is relatively uncommon in viruses.[19] Immunoglobulin domains are uncommon in coronaviruses; other than the subset of betacoronaviruses with ORF8 and ORF7a, only a small number of bat alphacoronaviruses have been identified as containing likely Ig domains, while they are absent from gammacoronaviruses and deltacoronaviruses.[2][8] ORF8 is notably absent in MERS-CoV.[8] The betacoronavirus and alphacoronavirus Ig domains may be independent acquisitions, where ORF8 and ORF7a may have been acquired from host proteins.[2] It is also possible that the absence of ORF8 reflects gene loss in those lineages.[8]