RefSeq






From Wikipedia, the free encyclopedia
 


RefSeq

Content
Description: curated non-redundant sequence database of genomes.

Contact
Research center: National Center for Biotechnology Information
Primary citation: Pruitt KD et al. (2005)[1]

Access
Website: https://www.ncbi.nlm.nih.gov/RefSeq

The Reference Sequence (RefSeq) database[1] is an open-access, annotated and curated collection of publicly available nucleotide sequences (DNA, RNA) and their protein products. RefSeq was introduced in 2000.[2][3] The database is built by the National Center for Biotechnology Information (NCBI) and, unlike GenBank, provides only a single record for each natural biological molecule (i.e. DNA, RNA or protein) for major organisms ranging from viruses to bacteria to eukaryotes.

For each model organism, RefSeq aims to provide separate and linked records for the genomic DNA, the gene transcripts, and the proteins arising from those transcripts. RefSeq is limited to major organisms for which sufficient data are available (121,461 distinct "named" organisms as of July 2022),[4] while GenBank includes sequences for any organism submitted (approximately 504,000 formally described species).[5]

RefSeq categories

The RefSeq collection comprises different data types with different origins, so standard categories and identifiers are needed to store each data type. The most important categories are:

RefSeq accession categories and molecule types

| Category | Description |
|----------|-------------|
| NC | Complete genomic molecules |
| NG | Incomplete genomic region |
| NM | mRNA |
| NR | ncRNA |
| NP | Protein |
| XM | Predicted mRNA model |
| XR | Predicted ncRNA model |
| XP | Predicted protein model (eukaryotic sequences) |
| WP | Predicted protein model (prokaryotic sequences) |

For more details and more categories, see Table 1 in Chapter 18 of the book The Reference Sequence (RefSeq) Database.
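The category prefixes listed above can be read directly off an accession string. The following is a minimal sketch, not NCBI code: it assumes accessions follow the layout of two uppercase letters, an underscore, digits and an optional ".version" suffix, and the names `REFSEQ_CATEGORIES`, `classify_accession` and `is_predicted_model` are hypothetical helpers introduced only for illustration. The sample accessions are illustrative strings and are not checked against the database.

```python
import re

# Prefix-to-description mapping copied from the category table above.
REFSEQ_CATEGORIES = {
    "NC": "Complete genomic molecule",
    "NG": "Incomplete genomic region",
    "NM": "mRNA",
    "NR": "ncRNA",
    "NP": "Protein",
    "XM": "Predicted mRNA model",
    "XR": "Predicted ncRNA model",
    "XP": "Predicted protein model (eukaryotic sequences)",
    "WP": "Predicted protein model (prokaryotic sequences)",
}

# Assumed accession layout: two uppercase letters, an underscore, digits,
# and an optional ".version" suffix (for example "NM_000546.6").
ACCESSION_PATTERN = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def classify_accession(accession):
    """Return the category description for an accession, or None if unknown."""
    match = ACCESSION_PATTERN.match(accession.strip())
    if match is None:
        return None
    return REFSEQ_CATEGORIES.get(match.group(1))


def is_predicted_model(accession):
    """True when the table lists the accession's category as a predicted model."""
    description = classify_accession(accession)
    return description is not None and description.startswith("Predicted")


if __name__ == "__main__":
    for example in ("NC_000001.11", "NM_000546.6", "XP_123456.1", "AB_000001"):
        print(example, "->", classify_accession(example))
```

Prefixes that are not in the table (the source notes that more categories exist) simply return `None` here, so the mapping would need extending before use on real data.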

RefSeq Projects

The NCBI is currently developing several projects to improve RefSeq services, often in collaboration with research centers such as EMBL-EBI.

Statistics

According to RefSeq release 213 (July 2022), the number of species represented in the database, counted by distinct taxonomic IDs, is as follows:[4]

| Taxonomic ID | Species |
|--------------|---------|
| Archaea | 1443 |
| Bacteria | 69122 |
| Fungi | 16869 |
| Invertebrate | 5715 |
| Mitochondrion | 13648 |
| Plant | 9177 |
| Plasmid | 6073 |
| Plastid | 9430 |
| Protozoa | 746 |
| Vertebrate (mammalian) | 1509 |
| Viral | 11620 |
| Vertebrate (other) | 5237 |
| Other | 4 |
| Complete | 121461 |
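A quick arithmetic check on the figures above shows that the "Complete" row is not the sum of the other rows. The sketch below only adds up the numbers from the table; the variable names are hypothetical. One plausible reading, which the source does not state, is that a single taxonomic ID can be counted in more than one row (for example, an organism with both a nuclear genome and an organelle record).

```python
# Species counts per row, copied from the release 213 table above.
species_by_group = {
    "Archaea": 1443,
    "Bacteria": 69122,
    "Fungi": 16869,
    "Invertebrate": 5715,
    "Mitochondrion": 13648,
    "Plant": 9177,
    "Plasmid": 6073,
    "Plastid": 9430,
    "Protozoa": 746,
    "Vertebrate (mammalian)": 1509,
    "Viral": 11620,
    "Vertebrate (other)": 5237,
    "Other": 4,
}

# Value of the "Complete" row in the same table.
complete_total = 121_461

# Sum of the individual rows, and how far it exceeds the "Complete" row.
row_sum = sum(species_by_group.values())
excess = row_sum - complete_total

print(f"Sum of rows:    {row_sum:,}")
print(f"Complete row:   {complete_total:,}")
print(f"Excess of rows: {excess:,}")
```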

The counts of accessions and base pairs (or residues) per molecule type are:[4]

| Molecule type | Accessions | Base pairs/residues |
|---------------|------------|---------------------|
| Genomic | 40,758,769 | 2.923212393984×10^12 |
| RNA | 45,781,716 | 1.22253022047×10^11 |
| Protein | 234,520,053 | 9.129062394×10^10 |
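The two columns above can be combined to estimate an average record length per molecule type. This is a rough sketch using only the figures in the table, with the scientific-notation totals written out as integers; the names `release_counts` and `mean_length` are hypothetical, and the averages say nothing about how lengths are actually distributed.

```python
# (accessions, total base pairs or residues) per molecule type, from the table above.
# The totals are the table's scientific-notation values expanded to integers.
release_counts = {
    "Genomic": (40_758_769, 2_923_212_393_984),
    "RNA": (45_781_716, 122_253_022_047),
    "Protein": (234_520_053, 91_290_623_940),
}


def mean_length(accessions, total_units):
    """Average number of base pairs (or residues) per accession."""
    return total_units / accessions


mean_lengths = {
    molecule: mean_length(accessions, total)
    for molecule, (accessions, total) in release_counts.items()
}

for molecule, value in mean_lengths.items():
    print(f"{molecule}: about {value:,.0f} units per accession")
```

On these figures, genomic records average on the order of tens of thousands of base pairs, RNA records a few thousand bases, and protein records a few hundred residues.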

References

  1. Pruitt KD, Tatusova T, Maglott DR (January 2005). "NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins". Nucleic Acids Research. 33 (Database issue): D501–D504. doi:10.1093/nar/gki025. PMC 539979. PMID 15608248.
  2. Maglott DR, Katz KS, Sicotte H, Pruitt KD (January 2000). "NCBI's LocusLink and RefSeq". Nucleic Acids Research. 28 (1): 126–128. doi:10.1093/nar/28.1.126. PMC 102393. PMID 10592200.
  3. Pruitt KD, Katz KS, Sicotte H, Maglott DR (January 2000). "Introducing RefSeq and LocusLink: curated human genome resources at the NCBI". Trends in Genetics. 16 (1): 44–47. doi:10.1016/s0168-9525(99)01882-x. PMID 10637631.
  4. RefSeq Release 213 Statistics (Report). National Library of Medicine. 11 July 2022. Retrieved 20 July 2022.
  5. Sayers EW, Cavanaugh M, Clark K, Pruitt KD, Schoch CL, Sherry ST, Karsch-Mizrachi I (January 2022). "GenBank". Nucleic Acids Research. 50 (D1): D161–D164. doi:10.1093/nar/gkab1135. PMC 8690257. PMID 34850943.
  6. Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, et al. (July 2009). "The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes". Genome Research. 19 (7): 1316–1323. doi:10.1101/gr.080531.108. PMC 2704439. PMID 19498102.
  7. Pujar S, O'Leary NA, Farrell CM, Loveland JE, Mudge JM, Wallin C, et al. (January 2018). "Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation". Nucleic Acids Research. 46 (D1): D221–D228. doi:10.1093/nar/gkx1031. PMC 5753299. PMID 29126148.
  8. Farrell CM, Goldfarb T, Rangwala SH, Astashyn A, Ermolaeva OD, Hem V, et al. (January 2022). "RefSeq Functional Elements as experimentally assayed nongenic reference standards and functional interactions in human and mouse". Genome Research. 32 (1): 175–188. doi:10.1101/gr.275819.121. PMC 8744684. PMID 34876495.
  9. Gulley ML, Braziel RM, Halling KC, Hsi ED, Kant JA, Nikiforova MN, et al. (June 2007). "Clinical laboratory reports in molecular pathology". Archives of Pathology & Laboratory Medicine. 131 (6): 852–863. doi:10.5858/2007-131-852-CLRIMP. PMID 17550311.
  10. "NCBI RefSeq Targeted Loci Project". www.ncbi.nlm.nih.gov. Retrieved 2022-07-27.
  11. Hatcher EL, Zhdanov SA, Bao Y, Blinkova O, Nawrocki EP, Ostapchuck Y, et al. (January 2017). "Virus Variation Resource - improved response to emergent viral outbreaks". Nucleic Acids Research. 45 (D1): D482–D490. doi:10.1093/nar/gkw1065. PMC 5210549. PMID 27899678.
  12. "NCBI RefSeq Select". www.ncbi.nlm.nih.gov. Retrieved 2022-07-27.
  13. Morales J, Pujar S, Loveland JE, Astashyn A, Bennett R, Berry A, et al. (April 2022). "A joint NCBI and EMBL-EBI transcript set for clinical genomics and research". Nature. 604 (7905): 310–315. doi:10.1038/s41586-022-04558-8. PMC 9007741. PMID 35388217.