Jump to content

Main menu Navigation ●Main page ●Contents ●Current events ●Random article ●About Wikipedia ●Contact us ●Donate Contribute ●Help ●Learn to edit ●Community portal ●Recent changes ●Upload file

●Create account ●Log in ●Create account ● Log in Pages for logged out editors learn more ●Contributions ●Talk

(Top) 1 Description 2 C Code for Raita algorithm 3 Example 4 Complexity 5 See also 6 References 7 External links

Raita algorithm

●Français ●Српски / srpski Edit links ●Article ●Talk ●Read ●Edit ●View history Tools Actions ●Read ●Edit ●View history General ●What links here ●Related changes ●Upload file ●Special pages ●Permanent link ●Page information ●Cite this page ●Get shortened URL ●Download QR code ●Wikidata item Print/export ●Download as PDF ●Printable version Appearance From Wikipedia, the free encyclopedia

This article has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these template messages)

The topic of this article may not meet Wikipedia's general notability guideline. Please help to demonstrate the notability of the topic by citing reliable secondary sources that are independent of the topic and provide significant coverage of it beyond a mere trivial mention. If notability cannot be shown, the article is likely to be merged, redirected, or deleted.
Find sources: "Raita algorithm" – news · newspapers · books · scholar · JSTOR (March 2015) (Learn how and when to remove this message)

This article relies largely or entirely on a single source. Relevant discussion may be found on the talk page. Please help improve this articlebyintroducing citations to additional sources.
Find sources: "Raita algorithm" – news · newspapers · books · scholar · JSTOR (March 2015)

(Learn how and when to remove this message)

In computer science, the Raita algorithm is a string searching algorithm which improves the performance of Boyer–Moore–Horspool algorithm. This algorithm preprocesses the string being searched for the pattern, which is similar to Boyer–Moore string-search algorithm. The searching pattern of particular sub-string in a given string is different from Boyer–Moore–Horspool algorithm. This algorithm was published by Timo Raita in 1991.^[1]

Description

[edit]

Raita algorithm searches for a pattern "P" in a given text "T" by comparing each character of pattern in the given text. Searching will be done as follows. Window for a text "T" is defined as the length of "P".

First, last character of the pattern is compared with the rightmost character of the window.
If there is a match, first character of the pattern is compared with the leftmost character of the window.
If they match again, it compares the middle character of the pattern with middle character of the window.

If everything in the pre-check is successful, then the original comparison starts from the second character to last but one. If there is a mismatch at any stage in the algorithm, it performs the bad character shift function which was computed in pre-processing phase. Bad character shift function is identical to the one proposed in Boyer–Moore–Horspool algorithm.^[1]

A modern formulation of a similar pre-check is found in std::string::find, a linear/quadratic string-matcher, in libc++ and libstdc++. Assuming a well-optimized version of memcmp, not skipping characters in the "original comparison" tends to be more efficient as the pattern is likely to be aligned.^[2]

C Code for Raita algorithm

[edit]

#include <limits.h>
#include <stddef.h>

#define ALPHABET_SIZE (1 << CHAR_BITS) /* typically 256 */

/* Preprocessing: the BMH bad-match table. */
static inline void preBmBc(char *pat, size_t lpat, ptrdiff_t bmBc[]) {
  size_t i;
  for (i = 0; i < ALPHABET_SIZE; ++i)
    bmBc[i] = lpat;
  for (i = 0; i < lpat - 1; ++i)
    bmBc[pat[i]] = lpat - i - 1;
}

void RAITA(char *pat, size_t lpat, char *s, size_t n) {
  ptrdiff_t bmBc[ALPHABET_SIZE];

  /* Quick edge cases. */
  if (lpat == 0 || lpat > n)
    return;

  if (lpat == 1) {
    char *match_ptr = s;
    while (match_ptr < s + n) {
      match_ptr = memchr(match_ptr, pat[0], n - (match_ptr - s));
      if (match_ptr != NULL) {
        OUTPUT(match_ptr - s);
        match_ptr++;
      } else
        return;
    }
  }

  preBmBc(pat, lpat, bmBc);

  /* The prematch-window. */
  char firstCh = pat[0];
  char middleCh = pat[lpat / 2];
  char lastCh = pat[lpat - 1];

  /* Searching */
  ptrdiff_t j = 0;
  while (j <= n - m) {
    char c = s[j + lpat - 1];
    /* This could harm data locality on long patterns. For these consider reducing
     * the number of pre-tests, or using more clustered indices. */
    if (lastCh == c && middleCh == s[j + lpat / 2] && firstCh == s[j] &&
        memcmp(&pat[1], &s[j+1], lpat - 2) == 0)
      OUTPUT(j);
    j += bmBc[c];
  }
}

Example

[edit]

Pattern: abddb

Text:abbaabaabddbabadbb

Pre- Processing stage:

  a b d
  4 3 1

 Attempt 1:
 abbaabaabddbabadbb
 ....b
 Shift by 4 (bmBc[a])

Comparison of last character of pattern to rightmost character in the window. It's a mismatch and shifted by 4 according to the value in pre-processing stage.

 Attempt 2:
 abbaabaabddbabadbb
     A.d.B
 Shift by 3 (bmBc[b])

Here last and first character of the pattern are matched but middle character is a mismatch. So the pattern is shifted according to the pre-processing stage.

 Attempt 3:
 abbaabaabddbabadbb
        ABDDB
 Shift by 3 (bmBc[b])

We found exact match here but the algorithm continues until it can't move further.

 Attempt 4:
 abbaabaABDDBabadbb
           ....b
 Shift by 4 (bmBc[a])

At this stage, we need to shift by 4 and we can't move the pattern by 4. So, the algorithm terminates. Letters in capital letter are exact match of the pattern in the text.

Complexity

[edit]

Pre-processing stage takes O(m) time where "m" is the length of pattern "P".
Searching stage takes O(mn) time complexity where "n" is the length of text "T".

References

[edit]

^ ^a ^b RAITA T., 1992, Tuning the Boyer–Moore–Horspool string searching algorithm, Software - Practice & Experience, 22(10):879-884 [1]

^ "⚙ D27068 Improve string::find". LLVM Code Review.

External links

[edit]

Applet animation and Description for Raita Algorithm

v t e Strings
String metric	Approximate string matching Bitap algorithm Damerau–Levenshtein distance Edit distance Gestalt pattern matching Hamming distance Jaro–Winkler distance Lee distance Levenshtein automaton Levenshtein distance Wagner–Fischer algorithm
String-searching algorithm	Apostolico–Giancarlo algorithm Boyer–Moore string-search algorithm Boyer–Moore–Horspool algorithm Knuth–Morris–Pratt algorithm Rabin–Karp algorithm Raita algorithm Trigram search Two-way string-matching algorithm Zhu–Takaoka string matching algorithm
Multiple string searching	Aho–Corasick Commentz-Walter algorithm
Regular expression	Comparison of regular-expression engines Regular grammar Thompson's construction Nondeterministic finite automaton
Sequence alignment	BLAST Hirschberg's algorithm Needleman–Wunsch algorithm Smith–Waterman algorithm
Data structure	DAFSA Suffix array Suffix automaton Suffix tree Generalized suffix tree Rope Ternary search tree Trie
Other	Parsing Pattern matching Compressed pattern matching Longest common subsequence Longest common substring Sequential pattern mining Sorting String rewriting systems String operations

Retrieved from "https://en.wikipedia.org/w/index.php?title=Raita_algorithm&oldid=1157291016" Category: ●String matching algorithms Hidden categories: ●Articles with topics of unclear notability from March 2015 ●All articles with topics of unclear notability ●Articles needing additional references from March 2015 ●All articles needing additional references ●Articles with multiple maintenance issues ●This page was last edited on 27 May 2023, at 17:11 (UTC). ●Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization. ●Privacy policy ●About Wikipedia ●Disclaimers ●Contact Wikipedia ●Code of Conduct ●Developers ●Statistics ●Cookie statement ●Mobile view