Jump to content

Main menu Navigation ●Main page ●Contents ●Current events ●Random article ●About Wikipedia ●Contact us ●Donate Contribute ●Help ●Learn to edit ●Community portal ●Recent changes ●Upload file

●Create account ●Log in ●Create account ● Log in Pages for logged out editors learn more ●Contributions ●Talk

(Top) 1 Terminology 2 Background 2.1 Input 2.2 Output 3 Process 3.1 Constraining the envelopes 3.2 Parameter training 4 In practice 5 References 6 External links

Stemloc

Add links ●Article ●Talk ●Read ●Edit ●View history Tools Actions ●Read ●Edit ●View history General ●What links here ●Related changes ●Upload file ●Special pages ●Permanent link ●Page information ●Cite this page ●Get shortened URL ●Download QR code ●Wikidata item Print/export ●Download as PDF ●Printable version Appearance From Wikipedia, the free encyclopedia

Stemloc
Developer(s)	Ian Holmes (UC Berkeley)

Stable release	1

Written in	Dart
Operating system	UNIX, Linux, Mac, CygwinonWindows XP
Type	Bioinformatics tool
Licence	Open source
Website	Stemloc homepage

Inbioinformatics, Stemloc is an open source software for multiple RNA sequence alignment and RNA structure prediction based on probabilistic models of RNA structure known as Pair stochastic context-free grammars (also probabilistic context-free grammars). Stemloc attempts to simultaneously predict and align the structure of RNA sequences with an improved time and space cost compared to previous methods with the same motive. The resulting software implements constrained versions of the Sankoff algorithm by introducing both fold and alignment constraints, which reduces processor and memory usage and allows for larger RNA sequences to be analyzed on commodity hardware. Stemloc was written in 2004 by Ian Holmes.

Stemloc can be downloaded as part of the DART software package. It accepts input files in either FASTAorStockholm format.

Terminology

[edit]

Fold: RNA folding is the process by which an RNA molecule acquires secondary structure through intra-molecular interactions.
Fold envelope: The set of candidate folds to be considered in the algorithm
Alignment envelope: The set of candidate alignments to be considered in the algorithm

Background

[edit]

A previously developed algorithm by David Sankoff in 1985 uses dynamic programming to simultaneously align and predict multiple RNA structures. The Sankoff Algorithm takes time and space in big O notation $O(L^{3N})$ and $O(L^{2N})$ respectively for $N$ sequences of length $L$ . This is observantly expensive, and thus is the motivation to create better RNA analysis tools like Stemloc. The initial goal of Stemloc was to reduce the time and space cost of simultaneous alignment and structure prediction of two RNA sequences by using a stochastic context-free grammar (SCFG) scoring scheme and by implementing constrained versions of the Sankoff Algorithm.

Stemloc uses alignment envelopes and fold envelopes to simultaneously constrain both the alignment and the secondary structures of the sequences being compared. Fold envelopes can be used to "prune" the search over secondary structures and determine the subsequences of two RNA sequences that can be considered in the algorithm. For example, including or excluding specific nitrogen-bonded base pairings. Alignment envelopes can be used to "prune" the search over the alignments and determine possible "cutpoints" in the alignment of the two sequences. For example, including or excluding specific residue-level homologies. Fold envelopes are pre-calculated for each sequence individually, and alignment envelopes are pre-calculated by comparing the two sequences while ignoring secondary structures. Both global and local alignment is supported.

Input

[edit]

Input in Stemloc can either be in FASTA or Stockholm format (see above for descriptions of each). Sample input shown below:

stemloc --local dynalign.trna

The "--local" command analyzes the file in local alignment mode. Using "--global" will use global alignment mode.

Output

[edit]

This output is in Stockholm format. It shows the sequence names, the co-ordinates of the matches, the alignment, the consensus primary sequence, the secondary structure of each sequence, the consensus secondary structure, and the log-odds score of the alignment in bits. The "//" line is used to separate alignments or indicate end of file. Sample output shown below:

# STOCKHOLM 1.0
#=GR RD0260/26-67 SS ..<<<<<.......>>>>>.....(<<<<.......>>>>).
RD0260/26-67   UACUCCCCUGUCACGGGAGAGAAUGUGGGUUCAAAUCCCAUC
#=GC PS_cons   UAC..CCCUGUCACGG..G.GA..G.GGGUUC.AAUCCC..C
RD0500/26-66   UACGACCCUGUCACGGUCGUGA-CGCGGGUUCgAAUCCCGCC
#=GR RD0500/26-66 SS ..<<<<<.......>>>>>...-.<<<<<.......>>>>>.
#=GC SS_cons   ..<<<<<.......>>>>>.....<<<<<.......>>>>>.
#=GF SC      31.872
//

Process

[edit]

Stemloc relies heavily on stochastic context-free grammars, which can be seen as a scoring scheme for the algorithm. Because Sankoff's algorithm considers all possible folds and all possible alignments it is quite accurate and thorough, but it takes a measurable amount of time to obtain any results or output. To better this, Stemloc allows the user to constrain the total number of folds and alignments to be considered. More specifically, each sequence can be pre-folded individually in $O(L^{3})$ time and pre-aligned, ignoring secondary structure in $O(L^{2})$ time. For example, the using the "-fast" command below will only consider the 100 best RNA structures rather than analyzing all possible folds. Using the "-log DOTPLOT" command will output a visual representation of the fold and alignment envelopes.

stemloc nanos-tiny.rna -fast -log DOTPLOT

Constraining the envelopes

[edit]

The main idea of Stemloc is being able to set a threshold for the number of folds and alignments that are sampled to create the envelopes. This can be done with the options "-nf" and "-na", which sets the number of folds and alignments to be considered. (Using a -1 will unlimited the number of folds and alignments sampled, thus using -1 for both parameters will run the Sankoff algorithm on the input dataset.

stemloc nanos-tiny.rna -nf -1 -na -1

Parameter training

[edit]

Another feature of Stemloc is its ability to parameterize probabilistic models like stochastic context-free grammars from data. Stemloc utilizes the Inside-Outside algorithm and stochastic context-free grammars to maximize the likelihood of a training set. This is useful because the default parameters for Stemloc were trained on a selection of pairwise alignments of between 30% and 40% sequence identity from Rfam (database) version 5.0. These parameters however, are not always effective which is why being able to train parameters as a user can be helpful.

In practice

[edit]

Stemloc has since been used in a variety of research publications in RNA structure analysis. Most notably in the study of optimal multiple sequence alignment.

References

[edit]

Holmes I. (2005) Accelerated probabilistic inference of RNA structure evolution. BMC Bioinformatics. 2005 Mar 24;6:73.
Sankoff D. (1985) Simultaneous Solution of the RNA Folding, Alignment and Protosequence Problems. SIAM Journal on Applied Mathematics. 1985 Oct;45:5.Sankoff D. (1985) Simultaneous Solution of the RNA Folding, Alignment and Protosequence Problems. SIAM Journal on Applied Mathematics. 1985 Oct;45:5.

External links

[edit]

Stemloc homepage and tutorial

Retrieved from "https://en.wikipedia.org/w/index.php?title=Stemloc&oldid=1191414520" Categories: ●Bioinformatics software ●Free software programmed in Dart ●Cross-platform free software Hidden categories: ●Articles with short description ●Short description is different from Wikidata ●This page was last edited on 23 December 2023, at 11:43 (UTC). ●Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization. ●Privacy policy ●About Wikipedia ●Disclaimers ●Contact Wikipedia ●Code of Conduct ●Developers ●Statistics ●Cookie statement ●Mobile view