Skip to content

Substring Search

April 16, 2013

Finding a substring inside a larger text using brute force takes O(MN), where M is the length the substring and N is the length of the text. The reason for that is every character in N is compared with as much of M that it matches with. However, we can do better than this with linear algorithm, and some even sublinear.

Knuth-Morris-Pratt

This algorithm builds a finite state machine to represent the characters of the substring seen so far. As you read, without any backtracking, you encounter each character once and update the state machine as you go. The number of states in the machine is equal to the number of characters in the search pattern.

Boyer-Moore

This algorithm achieves a better than linear run time of O(N/M): as the pattern size increases, the performance of the algorithm does as well. The algorithm focuses on aligning the pattern to the text by focusing on comparing the pattern to the text from the last character of the pattern going forward.

Rabin-Karp

This algorithm is based on modular hashing or rolling hash. An interesting use of this algorithm is for the detection of plagiarism since it can be made to support multi-pattern search.

Advertisements

From → Algorithms

Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: