function [result, K] = subseq_count(s,t,p) %SUBSEQ_COUNT % Finds the non-contiguous subsequence match count between strings s and t % by implementing a dynamic matrix, where the length of the subsequence is p. % % The following algorithm is used: % K[0](x,y) = 1 for all x,y % K[p](s,'empty string') = 0 for all p > 0, all s % K[p](sa,t) = K[p](s,t) + [Summation from 1 to i]K[p-1](s,t(1:i-1) [t(i) == a] % % Simply prompting the function will return the value K(s,t), however % using the function as [result,K] = K(s,t) will also return the matrix K[p]. % % Example: subseq_count('abc','abbbbc', 3) returns a value of 4. % (Note that subseq_count('abc','abbbbc')=subseq_count('abbbbc','abc') since K(s,t,p) = K(t,s,p) ). % Example: subseq_count('a','a', 1) returns a value of 1. % Example: subseq_count('a','a', 0) returns a value of 1 (for the empty string). % Example: subseq_count('a','b', 1) returns a value of 0. % Example: subseq_count('a','b', 0) returns a value of 1. % Example: subseq_count('ab','ab', 2) returns a value of 1. % % % %USAGE: scalar = subseq_count('string1','string2', p); (where p is the length of the subsequence) % % [scalar, matrix] = subseq_count('string1,'string2', p); % % %For more information, visit http://www.kernel-methods.net/ % %Written and tested in Matlab 6.0, Release 12. %Copyright 2003, Manju M. Pai 4/2003 %manju@kernel-methods.net %------------------------------------------------------------------------------------------ %Obtain lengths of strings n = length(s); m = length(t); %Allow one extra row & column for empty string %Initially set every matrix index to -1 to show value has not yet been found ans = repmat(-1, [n+1, m+1, p]); %If p is equal to zero, just return 1 for the empty string and quit program if p == 0 result = 1; K = repmat(1, [n+1, m+1]); return; end; %Fill in the rest of the matrix using the function subseq_count(s,t) for h=1:p for i=1:n+1 for j=1:m+1 ans(i,j,h) = subseq_count_kernel(s(1:i-1), t(1:j-1), ans, h); end; end; end; result = ans(n+1,m+1,p); K = ans(:,:,p); %------------------------------------------------------------------------------------------ function ans = subseq_count_kernel(sa, t, K, p) %This function is called by subseq_count(s,t,p). %Type 'help subseq_count' for a description of the program. % %------------------------------------------------------------------------------------------ %Obtain lengths of both strings n = length(sa); m = length(t); %Start algorithm: % 1) Base case: % K[p](sa, 'empty string') == K[p]('empty string', t) == zero if (m == 0) | (n == 0) ans = 0; return end; % 2) Split main algorithm into two parts: % a) K[p](s,t) %truncate last character of string s = sa(1:n-1); if( K( length(s)+1, length(t) + 1, p) == -1 ) % Value has not yet been calculated ans = subseq_count_kernel(s,t,K,p); else % Value has already been calculated ans = K( length(s)+1, length(t) + 1, p); end; % b) Summation of K[p-1](s,t(1:i-1)), where t(i) == a %this is the letter (a) that was truncated off the string letter = sa(n); %We need this 'for' loop to go through all strings of t that end in a. %(If you're wondering why there are alot of '+ 1' and '- 1' in the algorithm, % its because the length of the rows and the columns are 1 bigger than the length % of the strings because of the addition of the empty string to the matrix.) pos_array = find(t == letter); %array which consists of all indices of t where t(i) == a for index = 1:length(pos_array) i = pos_array(index); if (p-1) == 0 %This is a base case where 1 should always be returned if p = 0; ans = ans + 1; elseif( K( length(s) + 1 , length(t(1:i-1)) + 1, p-1) == -1) % Value has not yet been calculated ans = ans + subseq_count_kernel( s , t(1:i-1), K, p-1 ); else % Value has already been calculated. ans = ans + K( length(s) + 1, length(t(1:i-1)) + 1, p-1 ); end; end; return % End of algorithm