Levenshtein distance


The Levenshtein distance is a measure of the number of changes required to get one string from another. The possiblities are insertion, deletion, and replacement. Below we calculate the distance between two random 14 letter words. Thus the Levenshtein distance must be between 0 and 14. That makes fifteen possiblities for the correct answer. Let's see how many function calls it takes with this recursive implementation to determine the answer.

#include<iostream>
#include<cstring>

using namespace std;

typedef unsigned long long int u64;
u64 nFc=0;

u64 lev(char *s, char *t){
 nFc++;
 if(!*s) return strlen(t);
 if(!*t) return strlen(s);
 u64 min=1+lev(s+1,t);
 u64 c = lev(s,t+1);
 if(min>c) min=c;
 u64 d=(*s!=*t);
 c=d+lev(s+1,t+1);
 if(min>c) min=c;
 return min;
}

int main(){
 cout << lev("backscattering","ichthyological") << endl;
 cout << nFc << endl;
 return 0;
}

13 character changes between "backscattering" and "ichthyological".

This implementation only took 11,885,772,379 calls and 92 seconds to determine the correct answer. Amazing. Now if we were to build a table of all possible changes, it would be a 15 by 15 matrix. 255 possible manipulations. This would take approximately 255 function calls to generate or .000002% of the recursive implementation. Total time to run: .0000195 seconds. Not even enough to register with the Unix time command.

The Levenshtein Distance problem is a classic example of dynamic programming. For a more detailed analysis checkout Wikipedia.