Skip to content

Substring Search

April 16, 2013

Finding a substring inside a larger text using brute force takes O(MN), where M is the length the substring and N is the length of the text. The reason for that is every character in N is compared with as much of M that it matches with. However, we can do better than this with linear algorithm, and some even sublinear.


This algorithm builds a finite state machine to represent the characters of the substring seen so far. As you read, without any backtracking, you encounter each character once and update the state machine as you go. The number of states in the machine is equal to the number of characters in the search pattern.


This algorithm achieves a better than linear run time of O(N/M): as the pattern size increases, the performance of the algorithm does as well. The algorithm focuses on aligning the pattern to the text by focusing on comparing the pattern to the text from the last character of the pattern going forward.


This algorithm is based on modular hashing or rolling hash. An interesting use of this algorithm is for the detection of plagiarism since it can be made to support multi-pattern search.


From → Algorithms

Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: