String matching “KMP algorithm”

Introduction

As we all know, strings have a large proportion in OI and other computer fields. What we are talking about today is an algorithm for matching strings – “KMP algorithm”.

 

0x00

KMP The algorithm is used to solve such a problem: Given a text string T and a pattern string S, you are required to find the number and position of S appearing in T (we define the position as the position where the first character in S matches in T).

Of course, there are many other uses. Specific can be reflected through a topic HDU 3336

 

0x01

We all know the simple algorithm of matching strings, and do not know the following example.

Text string: ABADCADCABPattern string: ADCAB
  1. A Match with A, compare the next one. Until you find that B and D do not match.
  2. The pattern string is shifted one bit after mismatch.
  3. A Do not match B, repeat the previous process until the match is complete.

But we found the following situation.

ABABCADCAB
  ADCAB

The last B and D do not match. But according to the naive algorithm, the pattern string will be moved one by one.

Until the following situation arises

ABABCADCAB
     ADCAB

Imagine if we could move the first bit of the pattern string directly to the fourth bit after the fifth bit mismatch.

Although we can see that this is very simple, how do we make the computer see it?

Here we introduce a $next $array, popularly, $next [i]$means the longest string that the prefix and suffix of the pattern string with a length of $i $share.

Let’s not talk about how to find $next$arrays first, let’s talk about how it works.

Do you have a tongue twister?

Let’s see what this sentence means.

Pattern string: ADCABPrefix: A, AD, ADC, ADCASuffix: B, AB, CAB, DCAB

This is all prefix and suffix in a prefix of length $5$.

We look at these suffixes which are satisfied with common elements.

Only the following two.

AD,AB

They return a value of $next $of $1, which means that after the fifth mismatch, the first bit jumps to the fourth to continue the match.

 

0x02

How do we figure out the next array?

We try to make the pattern string match itself.

Speaking of which, I probably know why I didn’t tell you how to get the $next$array before.

Because this $next$array will be used if it is used. Compared to the two, the “use” aspect is relatively simple and easy to understand.

Look at the code for the $next$array.

inline void Getnext() {
    for(int i=2; i<=n; i++) {
        p = nxt[i-1];
        //iWho will jump before the mismatch?
        while(p && s[p+1] != s[i]) p = nxt[p];
        //If next[i] is still unable to match s[i], then continue to jump forward.//In this sentence, P! =0 means that if you jump to the first place, you can skip.
        if(s[p+1] == s[i]) nxt[i] = p+1;
        //If you find a match, update next[i].
        else nxt[i] = 0;
        //No, it's 0.
    }
}

Here’s a complete code.

#include <iostream>
#include <cstdio>
#include <cstring>
const int maxn = 1e6+3;
using namespace std;
int n, m, next[maxn], p;
char T[maxn], S[maxn];
inline void Getnext() {
    for(int i=2; i<=n; i++) {
        p = next[i-1];
        while(p && s[p+1] != s[i]) p = next[p];
        if(s[p+1] == s[i]) next[i] = p+1;
        else next[i] = 0;
    }
}
int main() {
    scanf("%s%s", T+1, S+1);
    n = strlen(T+1), m = strlen(S+1);
    Getnx();
    p = 0;
    for(int i=1; i<=n; i++) {
        while (p && T[i] != S[p+1]) p = next[p];
        if(T[i] == S[p+1]) p++;
        else p = 0;
        if(p == m) printf("%d\n", i-m+1), p = next[p];
    }
    for(int i=1; i<=m; i++) {
        printf("%d ", next[i]);
    }
}

 

Leave a Reply

Your email address will not be published. Required fields are marked *