Threading (protein sequence)






From Wikipedia, the free encyclopedia
 


In molecular biology, protein threading, also known as fold recognition, is a method of protein modeling used to model proteins that have the same fold as proteins of known structure but have no homologous proteins of known structure. It differs from the homology modeling method of structure prediction in that threading is used for proteins whose homologs have no structures deposited in the Protein Data Bank (PDB), whereas homology modeling is used for proteins whose homologs do. Threading works by using statistical knowledge of the relationship between the structures deposited in the PDB and the sequence of the protein which one wishes to model.

The prediction is made by "threading" (i.e. placing, aligning) each amino acid in the target sequence to a position in the template structure, and evaluating how well the target fits the template. After the best-fit template is selected, the structural model of the sequence is built based on the alignment with the chosen template. Protein threading is based on two basic observations: that the number of different folds in nature is fairly small (approximately 1300); and that 90% of the new structures submitted to the PDB in the past three years have similar structural folds to ones already in the PDB.

Classification of protein structure


The Structural Classification of Proteins (SCOP) database provides a detailed and comprehensive description of the structural and evolutionary relationships of proteins of known structure. Proteins are classified to reflect both structural and evolutionary relatedness. Many levels exist in the hierarchy, but the principal levels are family, superfamily, and fold.

Method


A general paradigm of protein threading consists of the following four steps:

  1. The construction of a structure template database: Select protein structures from the protein structure databases as structural templates. This generally involves selecting protein structures from databases such as PDB, FSSP, SCOP, or CATH, after removing protein structures with high sequence similarities.
  2. The design of the scoring function: Design a good scoring function to measure the fit between target sequences and templates, based on the known relationships between structures and sequences. A good scoring function should contain mutation potential, environment fitness potential, pairwise potential, secondary structure compatibilities, and gap penalties. The quality of the scoring function is closely related to the prediction accuracy, especially the alignment accuracy.
  3. Threading alignment: Align the target sequence with each of the structure templates by optimizing the designed scoring function. When the scoring function takes the pairwise contact potential into account, this step is one of the major tasks of threading-based structure prediction programs; otherwise, a dynamic programming algorithm can solve it.
  4. Threading prediction: Select the threading alignment that is statistically most probable as the threading prediction. Then construct a structure model for the target by placing the backbone atoms of the target sequence at their aligned backbone positions of the selected structural template.
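
The four steps above can be sketched in a few lines for the simple case in which no pairwise potential is used, so that dynamic programming suffices for the alignment. The following is a minimal illustration, not any published program: the template library is assumed to be a dictionary of burial strings ("B" for buried, "E" for exposed), the scoring function `env_score` is a toy environment-fitness term invented for this example, the gap penalty of -2 is arbitrary, and the template is chosen by raw score rather than by statistical significance.

```python
# Toy threading by dynamic programming (no pairwise potential).
# All names, scores and the hydrophobic set are illustrative assumptions.
HYDROPHOBIC = set("AVLIMFWC")


def env_score(residue, env):
    """Toy environment fitness: +1 when a hydrophobic residue sits in a
    buried position or a polar residue in an exposed one, otherwise -1."""
    buried = env == "B"
    return 1 if (residue in HYDROPHOBIC) == buried else -1


def thread(target, template_env, gap=-2):
    """Globally align a target sequence to a template burial string.

    Returns (score, pairs), where pairs lists the aligned
    (target_index, template_index) positions.
    """
    n, m = len(target), len(template_env)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + env_score(target[i - 1], template_env[j - 1]),
                score[i - 1][j] + gap,  # target residue left unaligned
                score[i][j - 1] + gap,  # template position left unfilled
            )
    # Trace back from the bottom-right corner to recover the alignment.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        diag = score[i - 1][j - 1] + env_score(target[i - 1], template_env[j - 1])
        if score[i][j] == diag:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


def best_template(target, library):
    """Return the name of the template with the highest alignment score."""
    return max(library, key=lambda name: thread(target, library[name])[0])


if __name__ == "__main__":
    library = {"t1": "BEBE", "t2": "EBEB"}  # hypothetical template library
    print(best_template("LKLK", library), thread("LKLK", library["t1"]))
```

The aligned pairs returned by `thread` are what step 4 would use to copy backbone coordinates from the selected template onto the target.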

Comparison with homology modeling


Homology modeling and protein threading are both template-based methods, and there is no rigorous boundary between them in terms of prediction techniques. They differ, however, in the targets they address. Homology modeling is for targets that have homologous proteins of known structure (usually of the same family), while protein threading is for targets for which only fold-level homology is found. In other words, homology modeling is for "easier" targets and protein threading is for "harder" targets.

Homology modeling treats the template in an alignment as a sequence, and only sequence homology is used for prediction. Protein threading treats the template in an alignment as a structure, and both sequence and structure information extracted from the alignment are used for prediction. When there is no significant homology found, protein threading can make a prediction based on the structure information. That also explains why protein threading may be more effective than homology modeling in many cases.

In practice, when the sequence identity in a sequence-sequence alignment is low (i.e. <25%), homology modeling may not produce a significant prediction. In this case, if distant homology is found for the target, protein threading can generate a good prediction.
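
As a rough illustration of the 25% rule of thumb, the sketch below computes percent identity from a pairwise alignment and suggests which method to try. The function names are hypothetical, "-" is assumed to mark a gap, and identity is taken over columns where both sequences have a residue; other definitions of the denominator are in use and give different numbers.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns with no gap in either row."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def suggest_method(identity, threshold=25.0):
    """Apply the rule of thumb from the text; the threshold is a guideline."""
    return "homology modeling" if identity >= threshold else "protein threading"


if __name__ == "__main__":
    identity = percent_identity("ACDE", "ACFG")
    print(identity, suggest_method(identity))
```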

More about threading


Fold recognition methods can be broadly divided into two types: those that derive a 1-D profile for each structure in the fold library and align the target sequence to these profiles; and those that consider the full 3-D structure of the protein template. A simple example of a profile representation would be to take each amino acid in the structure and simply label it according to whether it is buried in the core of the protein or exposed on the surface. More elaborate profiles might take into account the local secondary structure (e.g. whether the amino acid is part of an alpha helix) or even evolutionary information (how conserved the amino acid is). In the 3-D representation, the structure is modeled as a set of inter-atomic distances, i.e. the distances are calculated between some or all of the atom pairs in the structure. This is a much richer and far more flexible description of the structure, but is much harder to use in calculating an alignment. The profile-based fold recognition approach was first described by Bowie, Lüthy and David Eisenberg in 1991.[1] The term threading was first coined by David Jones, William R. Taylor and Janet Thornton in 1992,[2] and originally referred specifically to the use of a full 3-D structure atomic representation of the protein template in fold recognition. Today, the terms threading and fold recognition are frequently (though somewhat incorrectly) used interchangeably.
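
The two template representations described above can be contrasted with a small sketch. It assumes the template is given as a list of (x, y, z) coordinates, one per residue (for example C-alpha atoms), and it uses a neighbor count as a crude stand-in for burial; real methods typically use solvent accessibility, and the radius and neighbor cutoff here are arbitrary illustrative values. `math.dist` requires Python 3.8 or later.

```python
import math


def distance_matrix(coords):
    """3-D representation: all pairwise inter-residue distances."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def burial_profile(coords, radius=10.0, min_neighbors=8):
    """1-D representation: label each residue 'B' (buried) or 'E' (exposed).

    A residue counts as buried when at least `min_neighbors` other residues
    lie within `radius`; this proxy is an assumption of the sketch.
    """
    labels = []
    for i, row in enumerate(distance_matrix(coords)):
        neighbors = sum(1 for j, d in enumerate(row) if j != i and d <= radius)
        labels.append("B" if neighbors >= min_neighbors else "E")
    return "".join(labels)


if __name__ == "__main__":
    # Hypothetical coordinates: three residues close together, one far away.
    coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (50, 0, 0)]
    print(burial_profile(coords, radius=2.0, min_neighbors=2))
```

The profile is a string that can be aligned with ordinary sequence-alignment machinery, whereas the distance matrix keeps the pairwise information that makes alignment harder.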

Fold recognition methods are widely used and effective because there is believed to be a strictly limited number of different protein folds in nature, mostly as a result of evolution but also due to constraints imposed by the basic physics and chemistry of polypeptide chains. There is, therefore, a good chance (currently 70-80%) that a protein which has a similar fold to the target protein has already been studied by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy and can be found in the PDB. Currently there are nearly 1300 different protein folds known, but new folds are still being discovered every year due in significant part to the ongoing structural genomics projects.

Many different algorithms have been proposed for finding the correct threading of a sequence onto a structure, though many make use of dynamic programming in some form. For full 3-D threading, the problem of identifying the best alignment is very difficult (it is an NP-hard problem for some models of threading).[citation needed] Researchers have made use of many combinatorial optimization methods such as conditional random fields, simulated annealing, branch and bound and linear programming, searching to arrive at heuristic solutions. It is interesting to compare threading methods to methods which attempt to align two protein structures (protein structural alignment), and indeed many of the same algorithms have been applied to both problems.
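
To show what a pairwise term looks like, the sketch below scores a threading with a toy contact potential and searches only gapless placements of the target along the template. This restriction is an assumption made to keep the search trivial: the hardness discussed above arises when variable-length gaps are allowed, which this example deliberately avoids. The contact list, the potential (-1 for a hydrophobic-hydrophobic contact, 0 otherwise) and all function names are hypothetical.

```python
HYDROPHOBIC = set("AVLIMFWC")


def pair_energy(a, b):
    """Toy contact potential: reward hydrophobic-hydrophobic contacts."""
    return -1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0


def threading_energy(target, contacts, offset):
    """Energy when target residue k occupies template position offset + k.

    `contacts` lists pairs of template positions that touch in 3-D.
    Contacts involving positions not covered by the target are ignored.
    """
    total = 0.0
    for i, j in contacts:
        a, b = i - offset, j - offset
        if 0 <= a < len(target) and 0 <= b < len(target):
            total += pair_energy(target[a], target[b])
    return total


def best_gapless_threading(target, template_length, contacts):
    """Try every gapless offset and return the one with the lowest energy."""
    if template_length < len(target):
        raise ValueError("template is shorter than the target")
    offsets = range(template_length - len(target) + 1)
    return min(offsets, key=lambda off: threading_energy(target, contacts, off))


if __name__ == "__main__":
    contacts = [(1, 4), (0, 2)]  # hypothetical template contacts
    print(best_gapless_threading("LKKL", 6, contacts))
```

Because the energy of one residue's placement depends on where its contact partners are placed, the score no longer decomposes position by position, which is why plain dynamic programming does not apply directly once gaps are introduced.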

Protein threading software


See also


References

  1. Bowie JU, Lüthy R, Eisenberg D (1991). "A method to identify protein sequences that fold into a known three-dimensional structure". Science. 253 (5016): 164–170. Bibcode:1991Sci...253..164B. doi:10.1126/science.1853201. PMID 1853201.
  2. Jones DT, Taylor WR, Thornton JM (1992). "A new approach to protein fold recognition". Nature. 358 (6381): 86–89. Bibcode:1992Natur.358...86J. doi:10.1038/358086a0. PMID 1614539. S2CID 4266346.
  3. Peng J, Xu J (2011). "RaptorX: exploiting structure information for protein alignment by statistical inference". Proteins. 79 (Suppl 10): 161–171. doi:10.1002/prot.23175. PMC 3226909. PMID 21987485.
  4. Peng J, Xu J (2010). "Low-homology protein threading". Bioinformatics. 26 (12): i294–i300. doi:10.1093/bioinformatics/btq192. PMC 2881377. PMID 20529920.
  5. Peng J, Xu J (April 2011). "A multiple-template approach to protein threading". Proteins. 79 (6): 1930–1939. doi:10.1002/prot.23016. PMC 3092796. PMID 21465564.
  6. Ma J, Wang S, Xu J (June 2012). "A conditional neural fields model for protein threading". Bioinformatics. 28 (12): i59–i66. doi:10.1093/bioinformatics/bts213. PMC 3371845. PMID 22689779.
  7. Wu S, Zhang Y (2008). "MUSTER: Improving protein sequence profile–profile alignments by using multiple sources of structure information". Proteins. 72 (2): 547–556. doi:10.1002/prot.21945. PMC 2666101. PMID 18247410.
  8. Yang Y, Faraggi E, Zhao H, Zhou Y (2011). "Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates". Bioinformatics. 27 (15): 2076–2082. doi:10.1093/bioinformatics/btr350. PMC 3137224. PMID 21666270.
  9. Gront D, Blaszczyk M, Wojciechowski P, Kolinski A (2012). "BioShell Threader: protein homology detection based on sequence profiles and secondary structure profiles". Nucleic Acids Research. 40 (W1): W257–W262. doi:10.1093/nar/gks555. PMC 3394251. PMID 22693216.

Further reading

  • Lathrop RH (1994). "The protein threading problem with sequence amino acid interaction preferences is NP-complete". Protein Eng. 7 (9): 1059–1068. CiteSeerX 10.1.1.367.9081. doi:10.1093/protein/7.9.1059. PMID 7831276.
  • Jones DT, Hadley C (2000). "Threading methods for protein structure prediction". In Higgins D, Taylor WR (eds.). Bioinformatics: Sequence, structure and databanks. Heidelberg: Springer-Verlag. pp. 1–13.
  • Xu J, Li M, Kim D, Xu Y (2003). "RAPTOR: Optimal Protein Threading by Linear Programming, the inaugural issue". J Bioinform Comput Biol. 1 (1): 95–117. CiteSeerX 10.1.1.5.4844. doi:10.1142/S0219720003000186. PMID 15290783.
  • Xu J, Li M, Lin G, Kim D, Xu Y (2003). "Protein threading by linear programming". Pac Symp Biocomput: 264–275. PMID 12603034.

    This page was last edited on 18 July 2024, at 07:08 (UTC).
