A ruby a day!

Comparing Python and Ruby

Yesterday I sat together with a friend, who was preparing for her CS finals. I was asking her questions about algorithm theorie to help her preparing and after a while I got bored with all the theorie, so I asked her if she wanted to implement the edit-distance algorithm we were talking about. I proposed she'd do it in Python because I wanted to learn a bit of Python and she knows the language. (Even though its not her "native" language ;) We sat down and wrote up the algorithm. Just for fun I added a direct Ruby translation of it. The idea was to compare how these two VHL languages capture the algorithm.

We wanted to show the time savings of a scratchpad approach and implemented a direct recursive function and a scratchpad recursive function.

And look - it makes a difference:

                                    user     system      total        real
Distance between "ruby" and "python"
rb: recursive (100 times)       0.000000   0.010000   5.160000 (  5.177772)
py: recursive (100 times)       0.010000   0.010000   1.810000 (  1.805827)
rb: notizblock (100 times)      0.000000   0.000000   3.490000 (  3.497941)
py: notizblock (100 times)      0.000000   0.020000   1.700000 (  1.683048)
rb: dynamic (100 times)         0.000000   0.010000   3.450000 (  3.461004)
py: dynamic (100 times)         0.010000   0.010000   1.650000 (  1.645892)

Distance between "ruby: a programmers best friend" and "python: batteries included"
rb: notizblock (100 times)      0.010000   0.010000   6.200000 (  6.197857)
py: notizblock (100 times)      0.000000   0.010000   2.870000 (  2.881867)
rb: dynamic (100 times)         0.000000   0.010000   4.840000 (  4.833780)
py: dynamic (100 times)         0.010000   0.010000   2.290000 (  2.284406)

Distance between "ruby: a programmers best friend" * 15 and "python: batteries included" * 15
rb: dynamic (10 times)          0.000000   0.000000  32.050000 ( 32.877103)
py: dynamic (10 times)          0.000000   0.000000  16.260000 ( 19.627328)

(For the exact benchmarking method see time_it.rb)

Ruby is quite a bit slower than python in this example - but as you can also see, the important factor is the choice of the algorithm. The dumb algorithms is not able to calculate the differences between the two long sentences anyhow, and the fast one is so fast it doesn't matter much.

At least for this simple example the language chosen is more a matter of taste. Though I dislike the something names and variables and also prefer "string".length about len("string"). And I'm not so fond of passing an explicit self to the functions. Seems not very DRY to me. I clearly am biased siding there, but I can understand the Python guys, they also have a very expressive language.

Being able to use the return value of an assign as return value for the function cut the code down quite a bit. But it stresses less the fact that we have a return here. I'm not so shure yet, if I really like my practice of ommiting the return keyword where possible.

One astonishing thing was, that we didn't find the "x ? y : z" operator in python, it shure hid itself nicely away.

have fun,

Brian

Recursive edit-distance calculation

PythonRuby
#!/usr/bin/python

def edit_distance_ij(a, b, i, j):
    if i == 0:
        return j
    if j == 0:
        return i
    if a[i-1] == b[j-1]:
        cij = 0
    else:
        cij = 1
    return min(edit_distance_ij(a, b, i-1, j  ) + 1, 
               edit_distance_ij(a, b, i,   j-1) + 1, 
               edit_distance_ij(a, b, i-1, j-1) + cij)

def edit_distance(astring, bstring):
    return edit_distance_ij(astring, bstring, len(astring), len(bstring))

import sys
if len(sys.argv) != 3:
  print "Usage: editierdistanz WORT_1 WORT_2"
else:
  print edit_distance(sys.argv[1], sys.argv[2])
#!/usr/bin/ruby

def edit_distance_ij(a, b, i, j)
  return j if i == 0
  return i if j == 0
  cij = a[i-1] == b[j-1] ? 0 : 1
  [edit_distance_ij(a, b, i-1, j  ) + 1, 
   edit_distance_ij(a, b, i,   j-1) + 1, 
   edit_distance_ij(a, b, i-1, j-1) + cij].min
end

def edit_distance(a_string, b_string)
  edit_distance_ij(a_string, b_string, a_string.length, b_string.length)
end

if ARGV.length != 2
  puts "Usage: editierdistanz WORT_1 WORT_2"
else
  puts edit_distance(ARGV[0], ARGV[1])
end

Scratchpad recursive edit-distance calculation

PythonRuby
#!/usr/bin/python
import string

class table:
    def __init__(self, m, n):
        self.t = [[-1 for y in range(n)] for x in range(m)]

    def __getitem__(self, i):
        return self.t[i]
      
    def to_s(self):
      return string.join(
          [string.join(
            [`cell` for cell in row], " ") for row in self.t], "\n")

def edit_distance_ij(a, b, i, j, table):
    if i == 0:
        return (j,"e"*j)
    if j == 0:
        return (i,"l"*i)
    if not table[i-1][j-1] == -1:
        return table[i-1][j-1]
    if a[i-1] == b[j-1]:
        cij = 0
        c = ' '
    else:
        cij = 1
        c = 'c'        
    (d1, a1) = edit_distance_ij(a, b, i-1, j,   table)
    (d2, a2) = edit_distance_ij(a, b, i,   j-1, table)
    (d3, a3) = edit_distance_ij(a, b, i-1, j-1, table)
    entry = min((d1 + 1, a1 + "d"), (d2 + 1, a2 + "i"), (d3 + cij, a3 + c))
    table[i-1][j-1] = entry
    return entry

def edit_distance(astring, bstring):
    m = len(astring)
    n = len(bstring)
    return edit_distance_ij(astring, bstring, m, n, table(m, n))

import sys
if len(sys.argv) != 3:
  print "Usage: editierdistanz WORT_1 WORT_2"
else:
  print edit_distance(sys.argv[1], sys.argv[2])
#!/usr/bin/ruby

class Table
  def initialize(m, n)
    @t = Array.new(m) { Array.new(n) { nil } }
  end

  def to_s
    @t.map{|row| row.join(" ") }.join("\n")
  end

  def [](i, j)      
    @t[i][j]
  end

  def []=(i, j, v)
    @t[i][j] = v
  end
end

def edit_distance_ij(a, b, i, j, table)
  return j, "e" * j if i == 0
  return i, "l" * i if j == 0
  return table[i-1, j-1] if table[i-1, j-1]
  if a[i-1] == b[j-1]
    cij = 0
    c = ' '
  else
    cij = 1
    c = 'c'        
  end
  d1, a1 = edit_distance_ij(a, b, i-1, j,   table)
  d2, a2 = edit_distance_ij(a, b, i,   j-1, table)
  d3, a3 = edit_distance_ij(a, b, i-1, j-1, table)
  table[i-1, j-1] = [[d1 + 1, a1 + "d"], [d2 + 1, a2 + "i"], [d3 + cij, a3+c]].min
end

def edit_distance(a_string, b_string)
  m = a_string.length
  n = b_string.length
  edit_distance_ij(a_string, b_string, m, n, Table.new(m,n))
end

if ARGV.length != 2
  puts "Usage: editierdistanz WORT_1 WORT_2"
else
  puts edit_distance(ARGV[0], ARGV[1])
end

Dynamic programming edit distance calculation

PythonRuby
#!/usr/bin/python

class distanceTable:
    def __init__(self, n, m):
      self.t = [[-1 for y in range(m)] for x in range(n)]

    # Missing the [i,j] operator 
    def __call__(self, i, j):
      if i < 0:
        return j+1
      if j < 0:
        return i+1
      return self.t[i][j]

    # Missing the [i,j]=v operator
    def set(self, i, j, value):
      self.t[i][j] = value

    # I have not found the join operator, so I did it with injection
    def to_s(self):
      return string.join(
          [string.join(
            [`cell` for cell in row], " ") for row in self.t], "\n")
      
def edit_distance(a, b):
  m = len(a)
  n = len(b)
  distances = distanceTable(m, n)  

  for i in range(m):
    for j in range(n):    

      if a[i] == b[j]:
        cij = 0
      else:
        cij = 1
        
      distances.set(i, j, min(
        distances(i-1, j  ) + 1, 
        distances(i,   j-1) + 1, 
        distances(i-1, j-1) + cij
      ))

  return distances(m-1, n-1)

import sys
if len(sys.argv) != 3:
  print "Usage: editierdistanz WORT_1 WORT_2"
else:
  print edit_distance(sys.argv[1], sys.argv[2])
#!/usr/bin/ruby

class DistanceTable
  def initialize(m, n)
    @t = Array.new(m) { Array.new(n) { nil } }
  end

  def to_s
    @t.map{ |row| row.join(" ") }.join("\n")
  end

  def [](i, j)
    return j+1 if i < 0
    return i+1 if j < 0
    @t[i][j]
  end

  def []=(i, j, v)
    @t[i][j] = v
  end
end

def edit_distance(a, b)  
  m = a.length
  n = b.length
  distances = DistanceTable.new(m, n)

  for i in 0...m
    for j in 0...n
      distances[i,j] = [
        distances[i-1, j  ] + 1, 
        distances[i,   j-1] + 1, 
        distances[i-1, j-1] + (a[i] == b[j] ? 0 : 1)
      ].min
    end
  end

  distances[m-1, n-1]
end

if ARGV.length != 2
  puts "Usage: editierdistanz WORT_1 WORT_2"
else
  puts edit_distance(ARGV[0], ARGV[1])
end

Browse the code in single files:

Download the code

Return to main page.