<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Python Damerau-Levenshtein distance implementation</title>
	<atom:link href="http://mwh.geek.nz/2009/04/26/python-damerau-levenshtein-distance/feed/" rel="self" type="application/rss+xml" />
	<link>http://mwh.geek.nz/2009/04/26/python-damerau-levenshtein-distance/</link>
	<description></description>
	<lastBuildDate>Wed, 04 Apr 2012 21:27:55 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<item>
		<title>By: Lucid Imagination &#187; Creating a spellchecker with Solr</title>
		<link>http://mwh.geek.nz/2009/04/26/python-damerau-levenshtein-distance/comment-page-1/#comment-993</link>
		<dc:creator>Lucid Imagination &#187; Creating a spellchecker with Solr</dc:creator>
		<pubDate>Fri, 15 Apr 2011 22:27:09 +0000</pubDate>
		<guid isPermaLink="false">http://mwh.geek.nz/?p=85#comment-993</guid>
		<description>[...] the previous listing I have to make some remarks. In line 2 and 3 we use third party libraries for Levenshtein distance and metaphone algorithms. In line 8 we are collecting a list of 70 candidates. This particular [...]</description>
		<content:encoded><![CDATA[<p>[...] the previous listing I have to make some remarks. In line 2 and 3 we use third party libraries for Levenshtein distance and metaphone algorithms. In line 8 we are collecting a list of 70 candidates. This particular [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: emmaespina</title>
		<link>http://mwh.geek.nz/2009/04/26/python-damerau-levenshtein-distance/comment-page-1/#comment-975</link>
		<dc:creator>emmaespina</dc:creator>
		<pubDate>Mon, 31 Jan 2011 15:59:08 +0000</pubDate>
		<guid isPermaLink="false">http://mwh.geek.nz/?p=85#comment-975</guid>
		<description>[...] the previous listing I have to make some remarks. In line 2 and 3 we use third party libraries for Levenshtein distance and metaphone algorithms. In line 8 we are collecting a list of 70 candidates. This particular [...]</description>
		<content:encoded><![CDATA[<p>[...] the previous listing I have to make some remarks. In line 2 and 3 we use third party libraries for Levenshtein distance and metaphone algorithms. In line 8 we are collecting a list of 70 candidates. This particular [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael</title>
		<link>http://mwh.geek.nz/2009/04/26/python-damerau-levenshtein-distance/comment-page-1/#comment-970</link>
		<dc:creator>Michael</dc:creator>
		<pubDate>Thu, 20 Jan 2011 21:47:23 +0000</pubDate>
		<guid isPermaLink="false">http://mwh.geek.nz/?p=85#comment-970</guid>
		<description>Thanks, OneEyedMan. Those changes should be all that&#039;s required, and they&#039;re what 2to3 will give you if you&#039;re converting automatically. The Python 3 version should run fine on Python 2 as well, with just a little waste around the range() calls.</description>
		<content:encoded><![CDATA[<p>Thanks, OneEyedMan. Those changes should be all that&#8217;s required, and they&#8217;re what 2to3 will give you if you&#8217;re converting automatically. The Python 3 version should run fine on Python 2 as well, with just a little waste around the range() calls.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: OneEyedMan</title>
		<link>http://mwh.geek.nz/2009/04/26/python-damerau-levenshtein-distance/comment-page-1/#comment-969</link>
		<dc:creator>OneEyedMan</dc:creator>
		<pubDate>Thu, 20 Jan 2011 18:36:07 +0000</pubDate>
		<guid isPermaLink="false">http://mwh.geek.nz/?p=85#comment-969</guid>
		<description>I wanted to port this to be compatible with Python 3. I&#039;m new to 3 myself, but I think I&#039;ve pulled it off, having checked a few strings across both versions.
Here are the changes:
1) thisrow = range(1, len(seq2) + 1) + [0]
becomes
    thisrow = list(range(1, len(seq2) + 1)) + [0]
2)     for x in xrange(len(seq1)):
becomes 
    for x in range(len(seq1)):
3)         for y in xrange(len(seq2)):
becomes
        for y in range(len(seq2)):
 
I checked &#039;spam&#039; vs. &#039;ham&#039; (2)
&#039;hasddddddddddddddddam&#039; vs &#039;ham&#039; (18)
Here is the full updated code:
____________________________________________
def dameraulevenshtein(seq1, seq2):
    &quot;&quot;&quot;Calculate the Damerau-Levenshtein distance between sequences.

    This distance is the number of additions, deletions, substitutions,
    and transpositions needed to transform the first sequence into the
    second. Although generally used with strings, any sequences of
    comparable objects will work.

    Transpositions are exchanges of *consecutive* characters; all other
    operations are self-explanatory.

    This implementation is O(N*M) time and O(M) space, for N and M the
    lengths of the two sequences.

    &gt;&gt;&gt; dameraulevenshtein(&#039;ba&#039;, &#039;abc&#039;)
    2
    &gt;&gt;&gt; dameraulevenshtein(&#039;fee&#039;, &#039;deed&#039;)
    2

    It works with arbitrary sequences too:
    &gt;&gt;&gt; dameraulevenshtein(&#039;abcd&#039;, [&#039;b&#039;, &#039;a&#039;, &#039;c&#039;, &#039;d&#039;, &#039;e&#039;])
    2
    &quot;&quot;&quot;
    # codesnippet:D0DE4716-B6E6-4161-9219-2903BF8F547F
    # Conceptually, this is based on a (len(seq1) + 1) x (len(seq2) + 1) matrix.
    # However, only the current and two previous rows are needed at once,
    # so we only store those.
    oneago = None
    thisrow = list(range(1, len(seq2) + 1)) + [0]
    for x in range(len(seq1)):
        # Python lists wrap around for negative indices, so put the
        # leftmost column at the *end* of the list. This matches with
        # the zero-indexed strings and saves extra calculation.
        twoago, oneago, thisrow = oneago, thisrow, [0] * len(seq2) + [x + 1]
        for y in range(len(seq2)):
            delcost = oneago[y] + 1
            addcost = thisrow[y - 1] + 1
            subcost = oneago[y - 1] + (seq1[x] != seq2[y])
            thisrow[y] = min(delcost, addcost, subcost)
            # This block deals with transpositions
            if (x &gt; 0 and y &gt; 0 and seq1[x] == seq2[y - 1]
                and seq1[x-1] == seq2[y] and seq1[x] != seq2[y]):
                thisrow[y] = min(thisrow[y], twoago[y - 2] + 1)
    return thisrow[len(seq2) - 1]</description>
		<content:encoded><![CDATA[<p>I wanted to port this to be compatible with Python 3. I&#8217;m new to 3 myself, but I think I&#8217;ve pulled it off, having checked a few strings across both versions.<br />
Here are the changes:<br />
1) thisrow = range(1, len(seq2) + 1) + [0]<br />
becomes<br />
    thisrow = list(range(1, len(seq2) + 1)) + [0]<br />
2)     for x in xrange(len(seq1)):<br />
becomes<br />
    for x in range(len(seq1)):<br />
3)         for y in xrange(len(seq2)):<br />
becomes<br />
        for y in range(len(seq2)):</p>
<p>I checked &#8216;spam&#8217; vs. &#8216;ham&#8217; (2)<br />
&#8216;hasddddddddddddddddam&#8217; vs &#8216;ham&#8217; (18)<br />
Here is the full updated code:<br />
____________________________________________<br />
def dameraulevenshtein(seq1, seq2):<br />
    """Calculate the Damerau-Levenshtein distance between sequences.</p>
<p>    This distance is the number of additions, deletions, substitutions,<br />
    and transpositions needed to transform the first sequence into the<br />
    second. Although generally used with strings, any sequences of<br />
    comparable objects will work.</p>
<p>    Transpositions are exchanges of *consecutive* characters; all other<br />
    operations are self-explanatory.</p>
<p>    This implementation is O(N*M) time and O(M) space, for N and M the<br />
    lengths of the two sequences.</p>
<p>    &gt;&gt;&gt; dameraulevenshtein('ba', 'abc')<br />
    2<br />
    &gt;&gt;&gt; dameraulevenshtein('fee', 'deed')<br />
    2</p>
<p>    It works with arbitrary sequences too:<br />
    &gt;&gt;&gt; dameraulevenshtein('abcd', ['b', 'a', 'c', 'd', 'e'])<br />
    2<br />
    """<br />
    # codesnippet:D0DE4716-B6E6-4161-9219-2903BF8F547F<br />
    # Conceptually, this is based on a (len(seq1) + 1) x (len(seq2) + 1) matrix.<br />
    # However, only the current and two previous rows are needed at once,<br />
    # so we only store those.<br />
    oneago = None<br />
    thisrow = list(range(1, len(seq2) + 1)) + [0]<br />
    for x in range(len(seq1)):<br />
        # Python lists wrap around for negative indices, so put the<br />
        # leftmost column at the *end* of the list. This matches with<br />
        # the zero-indexed strings and saves extra calculation.<br />
        twoago, oneago, thisrow = oneago, thisrow, [0] * len(seq2) + [x + 1]<br />
        for y in range(len(seq2)):<br />
            delcost = oneago[y] + 1<br />
            addcost = thisrow[y - 1] + 1<br />
            subcost = oneago[y - 1] + (seq1[x] != seq2[y])<br />
            thisrow[y] = min(delcost, addcost, subcost)<br />
            # This block deals with transpositions<br />
            if (x &gt; 0 and y &gt; 0 and seq1[x] == seq2[y - 1]<br />
                and seq1[x-1] == seq2[y] and seq1[x] != seq2[y]):<br />
                thisrow[y] = min(thisrow[y], twoago[y - 2] + 1)<br />
    return thisrow[len(seq2) - 1]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: flow</title>
		<link>http://mwh.geek.nz/2009/04/26/python-damerau-levenshtein-distance/comment-page-1/#comment-518</link>
		<dc:creator>flow</dc:creator>
		<pubDate>Sat, 07 Aug 2010 21:00:19 +0000</pubDate>
		<guid isPermaLink="false">http://mwh.geek.nz/?p=85#comment-518</guid>
		<description>i am trying to convert your code to Cython and have already obtained 20x speed gains. sadly, my translation still has some problems; so i opened a question on http://stackoverflow.com/questions/3431933/how-to-correct-algorithm-malloc-related-bugs-in-this-damerau-levenshtein-edit-di</description>
		<content:encoded><![CDATA[<p>i am trying to convert your code to Cython and have already obtained 20x speed gains. sadly, my translation still has some problems; so i opened a question on <a href="http://stackoverflow.com/questions/3431933/how-to-correct-algorithm-malloc-related-bugs-in-this-damerau-levenshtein-edit-di" rel="nofollow">http://stackoverflow.com/questions/3431933/how-to-correct-algorithm-malloc-related-bugs-in-this-damerau-levenshtein-edit-di</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: numerix</title>
		<link>http://mwh.geek.nz/2009/04/26/python-damerau-levenshtein-distance/comment-page-1/#comment-500</link>
		<dc:creator>numerix</dc:creator>
		<pubDate>Thu, 27 May 2010 13:12:53 +0000</pubDate>
		<guid isPermaLink="false">http://mwh.geek.nz/?p=85#comment-500</guid>
		<description>Thanks Michael, for pointing that out. I was not aware of that difference!</description>
		<content:encoded><![CDATA[<p>Thanks Michael, for pointing that out. I was not aware of that difference!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael</title>
		<link>http://mwh.geek.nz/2009/04/26/python-damerau-levenshtein-distance/comment-page-1/#comment-496</link>
		<dc:creator>Michael</dc:creator>
		<pubDate>Tue, 25 May 2010 02:49:28 +0000</pubDate>
		<guid isPermaLink="false">http://mwh.geek.nz/?p=85#comment-496</guid>
		<description>numerix, it works perfectly well. You need to note the difference between the Levenshtein distance (described in the SPOJ problem and the calculator you link to) and the Damerau-Levenshtein distance, which allows transpositions.

Both algorithms give the correct answer for that pair: delete H, U, R, replace B with K, transpose O and H, replace P with Z. Six steps.</description>
		<content:encoded><![CDATA[<p>numerix, it works perfectly well. You need to note the difference between the Levenshtein distance (described in the SPOJ problem and the calculator you link to) and the Damerau-Levenshtein distance, which allows transpositions.</p>
<p>Both algorithms give the correct answer for that pair: delete H, U, R, replace B with K, transpose O and H, replace P with Z. Six steps.</p>
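<p>A minimal sketch of the difference described above, using a pair chosen for illustration (not the pair from the SPOJ problem). The names <code>levenshtein</code> and <code>osa_distance</code> are hypothetical; the second function uses the restricted, adjacent-swap form of the Damerau-Levenshtein recurrence, which is assumed here to be the same variant as the code in the post.</p>

```python
def levenshtein(a, b):
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein: also allows swapping adjacent items."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = int(a[i - 1] != b[j - 1])
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            # Transposition of two adjacent items counts as one step.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


print(levenshtein('ab', 'ba'))   # two substitutions
print(osa_distance('ab', 'ba'))  # one transposition
```

<p>For 'ab' and 'ba' the plain Levenshtein distance is 2 while the transposition-aware distance is 1, so a calculator that implements only the former will disagree with this post's function whenever a swap of neighbours is cheaper.</p>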
]]></content:encoded>
	</item>
	<item>
		<title>By: numerix</title>
		<link>http://mwh.geek.nz/2009/04/26/python-damerau-levenshtein-distance/comment-page-1/#comment-495</link>
		<dc:creator>numerix</dc:creator>
		<pubDate>Mon, 24 May 2010 11:56:19 +0000</pubDate>
		<guid isPermaLink="false">http://mwh.geek.nz/?p=85#comment-495</guid>
		<description>J. Reagle is right about the speed of the algorithm, but the algorithm doesn&#039;t work correctly.
You can find more information about that in my comment on this code: http://www.guyrutenberg.com/2008/12/15/damerau-levenshtein-distance-in-python/comment-page-1/</description>
		<content:encoded><![CDATA[<p>J. Reagle is right about the speed of the algorithm, but the algorithm doesn&#8217;t work correctly.<br />
You can find more information about that in my comment on this code: <a href="http://www.guyrutenberg.com/2008/12/15/damerau-levenshtein-distance-in-python/comment-page-1/" rel="nofollow">http://www.guyrutenberg.com/2008/12/15/damerau-levenshtein-distance-in-python/comment-page-1/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph Reagle</title>
		<link>http://mwh.geek.nz/2009/04/26/python-damerau-levenshtein-distance/comment-page-1/#comment-251</link>
		<dc:creator>Joseph Reagle</dc:creator>
		<pubDate>Fri, 09 Oct 2009 15:48:10 +0000</pubDate>
		<guid isPermaLink="false">http://mwh.geek.nz/?p=85#comment-251</guid>
		<description>btw: Any idea how to get a ratio function (normalization) akin to that in difflib? I tried: return float((max_length - DL_distance(seq1, seq2)) / max_length)

but it&#039;s not as nice as the one in difflib.</description>
		<content:encoded><![CDATA[<p>btw: Any idea how to get a ratio function (normalization) akin to that in difflib? I tried: return float((max_length - DL_distance(seq1, seq2)) / max_length)</p>
<p>but it&#8217;s not as nice as the one in difflib.</p>
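<p>One possible reason the attempt above looks off: on Python 2, dividing two integers truncates before <code>float()</code> is applied, so the result collapses to 0.0 or 1.0. A minimal sketch of a normalized similarity follows, assuming the distance has already been computed; the helper name <code>dl_ratio</code> and its parameters are hypothetical and not part of the post.</p>

```python
def dl_ratio(distance, len1, len2):
    """Map an edit distance to a similarity in the range 0.0 to 1.0.

    distance: edit distance between the two sequences (already computed)
    len1, len2: lengths of the two sequences
    """
    max_length = max(len1, len2)
    if max_length == 0:
        # Two empty sequences are treated as identical.
        return 1.0
    # Convert to float *before* dividing so Python 2 does not truncate.
    return (max_length - distance) / float(max_length)


# 'spam' vs 'ham' has distance 2 according to the check reported earlier
# in this thread.
print(dl_ratio(2, len('spam'), len('ham')))
```

<p>This will not reproduce difflib's numbers: <code>SequenceMatcher.ratio()</code> is defined as 2.0*M/T over matching elements, which is a different measure from a normalized edit distance.</p>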
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph Reagle</title>
		<link>http://mwh.geek.nz/2009/04/26/python-damerau-levenshtein-distance/comment-page-1/#comment-250</link>
		<dc:creator>Joseph Reagle</dc:creator>
		<pubDate>Fri, 09 Oct 2009 14:47:37 +0000</pubDate>
		<guid isPermaLink="false">http://mwh.geek.nz/?p=85#comment-250</guid>
		<description>Thanks for this. It is faster than: http://www.guyrutenberg.com/2008/12/15/damerau-levenshtein-distance-in-python/comment-page-1/</description>
		<content:encoded><![CDATA[<p>Thanks for this. It is faster than: <a href="http://www.guyrutenberg.com/2008/12/15/damerau-levenshtein-distance-in-python/comment-page-1/" rel="nofollow">http://www.guyrutenberg.com/2008/12/15/damerau-levenshtein-distance-in-python/comment-page-1/</a></p>
]]></content:encoded>
	</item>
</channel>
</rss>

