Jump to content
 







Main menu
   


Navigation  



Main page
Contents
Current events
Random article
About Wikipedia
Contact us
Donate
 




Contribute  



Help
Learn to edit
Community portal
Recent changes
Upload file
 








Search  

































Create account

Log in
 









Create account
 Log in
 




Pages for logged out editors learn more  



Contributions
Talk
 



















Contents

   



(Top)
 


1 History  





2 How the Algorithm Works  





3 Time Complexity  





4 Alternative Algorithm  





5 References  














Commentz-Walter algorithm






Српски / srpski
Українська
 

Edit links
 









Article
Talk
 

















Read
Edit
View history
 








Tools
   


Actions  



Read
Edit
View history
 




General  



What links here
Related changes
Upload file
Special pages
Permanent link
Page information
Cite this page
Get shortened URL
Download QR code
Wikidata item
 




Print/export  



Download as PDF
Printable version
 
















Appearance
   

 






From Wikipedia, the free encyclopedia
 


Incomputer science, the Commentz-Walter algorithm is a string searching algorithm invented by Beate Commentz-Walter.[1] Like the Aho–Corasick string matching algorithm, it can search for multiple patterns at once. It combines ideas from Aho–Corasick with the fast matching of the Boyer–Moore string-search algorithm. For a text of length n and maximum pattern length of m, its worst-case running time is O(mn), though the average case is often much better.[2]

GNU grep once implemented a string matching algorithm very similar to Commentz-Walter.[3]

History[edit]

The paper on the algorithm was first published by Beate Commentz-Walter in 1979 through the Saarland University and typed by "R. Scherner".[1] The paper detailed two differing algorithms she claimed combined the idea of the Aho-Corasick and Boyer-Moore algorithms, which she called algorithms B and B1. The paper mostly focuses on algorithm B, however.

How the Algorithm Works[edit]

A sample Reversed Pattern Tree

The Commentz-Walter algorithm combines two known algorithms in order to attempt to better address the multi-pattern matching problem. These two algorithms are the Boyer-Moore, which addresses single pattern matching using filtering, and the Aho-Corasick. To do this, the algorithm implements a suffix automaton to search through patterns within an input string, while also using reverse patterns, unlike in the Aho-Corasick.[4]

Commentz-Walter has two phases it must go through, these being a pre-computing phase and a matching phase. For the first phase, the Commentz-Walter algorithm uses a reversed pattern to build a pattern tree, this is considered the pre-computing phase. The second phase, known as the matching phase, takes into account the other two algorithms. Using the Boyer-Moore’s technique of shifting and the Aho-Corasick's technique of finite automata, the Commentz-Walter algorithm can begin matching.[4]

The Commentz-Walter algorithm will scan backwards throughout an input string, checking for a mismatch. If and when the algorithm does find a mismatch, the algorithm will already know some of the characters that are matches, and then use this information as an index. Using the index, the algorithm checks the pre-computed table to find a distance that it must shift, after this, the algorithm once more begins another matching attempt.

Time Complexity[edit]

Comparing the Aho-Corasick to the Commentz-Walter Algorithm yields results with the idea of time complexity. Aho-Corasick is considered linear O(m+n+k) where k is the number of matches. Commentz-Walter may be considered quadratic O(mn). The reason for this lies in the fact that Commentz-Walter was developed by adding the shifts within the Boyer–Moore string-search algorithm to the Aho-Corasick, thus moving its complexity from linear to quadratic.

According to a study done in “The Journal of National Science Foundation of Sri Lanka 46” Commentz-Walter seems to be generally faster than the Aho–Corasick string matching algorithm. This, according to the journal, only exists when using long patterns. However, the journal does state that there is no critical analysis on this statement and that there is a lack of general agreement on the performance of the algorithm.[5]

As seen in a visualization of the algorithm’s running time done in a study by “The International Journal of Advanced Computer Science and Information Technology” the performance of the algorithm increased linearly as the shortest pattern within the pattern set increased.[4]

Alternative Algorithm[edit]

In the original Commentz-Walter paper, an alternative algorithm was also created. This algorithm, known as B1, operates similarly to the main Commentz-Walter algorithm with the only difference being in the way the pattern tree is used during the scanning phase.

The paper also claims this algorithm performs better at the cost of increasing the running time and space of both the preprocessing phase and search phase. This algorithm has not been formally tested in other studies however, so its actual performance is unknown.[1]

References[edit]

  1. ^ a b c Commentz-Walter, Beate (1979). A String Matching Algorithm Fast on the Average (PDF). International Colloquium on Automata, Languages and Programming. LNCS. Vol. 71. Graz, Austria: Springer. pp. 118–132. doi:10.1007/3-540-09510-1_10. ISBN 3-540-09510-1. Archived from the original (PDF) on 2017-10-10.
  • ^ Watson, Bruce William (1995-09-15). Taxonomies and toolkits of regular language algorithms. Eindhoven University of Technology. doi:10.6100/IR444299. ISBN 90-386-0396-7.
  • ^ "grep: remove Commentz-Walter code". GNU grep. 2017-01-18. Retrieved 2022-05-15.
  • ^ a b c Akinul Islam Jony (2014). "Analysis of Multiple String Pattern Matching Algorithms" (PDF). International Journal of Advanced Computer Science and Information Technology (IJACSIT). 3 (4): 344–353. ISSN 2296-1739.
  • ^ Dewasurendra, SD; Vidanagamachchi, SM (2018). "Average time complexity analysis of Commentz-Walter algorithm". Journal of the National Science Foundation of Sri Lanka. 46 (4): 547–557. doi:10.4038/jnsfsr.v46i4.8630.

  • Retrieved from "https://en.wikipedia.org/w/index.php?title=Commentz-Walter_algorithm&oldid=1198878641"

    Category: 
    String matching algorithms
     



    This page was last edited on 25 January 2024, at 08:09 (UTC).

    Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.



    Privacy policy

    About Wikipedia

    Disclaimers

    Contact Wikipedia

    Code of Conduct

    Developers

    Statistics

    Cookie statement

    Mobile view



    Wikimedia Foundation
    Powered by MediaWiki