Comparing Python and Ruby
Yesterday I sat together with a friend, who was preparing for her CS finals. I was asking her questions about algorithm theorie to help her preparing and after a while I got bored with all the theorie, so I asked her if she wanted to implement the edit-distance algorithm we were talking about. I proposed she'd do it in Python because I wanted to learn a bit of Python and she knows the language. (Even though its not her "native" language ;) We sat down and wrote up the algorithm. Just for fun I added a direct Ruby translation of it. The idea was to compare how these two VHL languages capture the algorithm.
We wanted to show the time savings of a scratchpad approach and implemented a direct recursive function and a scratchpad recursive function.
And look - it makes a difference:
user system total real Distance between "ruby" and "python" rb: recursive (100 times) 0.000000 0.010000 5.160000 ( 5.177772) py: recursive (100 times) 0.010000 0.010000 1.810000 ( 1.805827) rb: notizblock (100 times) 0.000000 0.000000 3.490000 ( 3.497941) py: notizblock (100 times) 0.000000 0.020000 1.700000 ( 1.683048) rb: dynamic (100 times) 0.000000 0.010000 3.450000 ( 3.461004) py: dynamic (100 times) 0.010000 0.010000 1.650000 ( 1.645892) Distance between "ruby: a programmers best friend" and "python: batteries included" rb: notizblock (100 times) 0.010000 0.010000 6.200000 ( 6.197857) py: notizblock (100 times) 0.000000 0.010000 2.870000 ( 2.881867) rb: dynamic (100 times) 0.000000 0.010000 4.840000 ( 4.833780) py: dynamic (100 times) 0.010000 0.010000 2.290000 ( 2.284406) Distance between "ruby: a programmers best friend" * 15 and "python: batteries included" * 15 rb: dynamic (10 times) 0.000000 0.000000 32.050000 ( 32.877103) py: dynamic (10 times) 0.000000 0.000000 16.260000 ( 19.627328)
(For the exact benchmarking method see time_it.rb)
Ruby is quite a bit slower than python in this example - but as you can also see, the important factor is the choice of the algorithm. The dumb algorithms is not able to calculate the differences between the two long sentences anyhow, and the fast one is so fast it doesn't matter much.
At least for this simple example the language chosen is more a matter of taste. Though I dislike the something names and variables and also prefer "string".length about len("string"). And I'm not so fond of passing an explicit self to the functions. Seems not very DRY to me. I clearly am biased siding there, but I can understand the Python guys, they also have a very expressive language.
Being able to use the return value of an assign as return value for the function cut the code down quite a bit. But it stresses less the fact that we have a return here. I'm not so shure yet, if I really like my practice of ommiting the return keyword where possible.
One astonishing thing was, that we didn't find the "x ? y : z" operator in python, it shure hid itself nicely away.
have fun,
Brian
Recursive edit-distance calculation
| Python | Ruby |
|---|---|
#!/usr/bin/python def edit_distance_ij(a, b, i, j): if i == 0: return j if j == 0: return i if a[i-1] == b[j-1]: cij = 0 else: cij = 1 return min(edit_distance_ij(a, b, i-1, j ) + 1, edit_distance_ij(a, b, i, j-1) + 1, edit_distance_ij(a, b, i-1, j-1) + cij) def edit_distance(astring, bstring): return edit_distance_ij(astring, bstring, len(astring), len(bstring)) import sys if len(sys.argv) != 3: print "Usage: editierdistanz WORT_1 WORT_2" else: print edit_distance(sys.argv[1], sys.argv[2]) |
#!/usr/bin/ruby def edit_distance_ij(a, b, i, j) return j if i == 0 return i if j == 0 cij = a[i-1] == b[j-1] ? 0 : 1 [edit_distance_ij(a, b, i-1, j ) + 1, edit_distance_ij(a, b, i, j-1) + 1, edit_distance_ij(a, b, i-1, j-1) + cij].min end def edit_distance(a_string, b_string) edit_distance_ij(a_string, b_string, a_string.length, b_string.length) end if ARGV.length != 2 puts "Usage: editierdistanz WORT_1 WORT_2" else puts edit_distance(ARGV[0], ARGV[1]) end |
Scratchpad recursive edit-distance calculation
| Python | Ruby |
|---|---|
#!/usr/bin/python import string class table: def __init__(self, m, n): self.t = [[-1 for y in range(n)] for x in range(m)] def __getitem__(self, i): return self.t[i] def to_s(self): return string.join( [string.join( [`cell` for cell in row], " ") for row in self.t], "\n") def edit_distance_ij(a, b, i, j, table): if i == 0: return (j,"e"*j) if j == 0: return (i,"l"*i) if not table[i-1][j-1] == -1: return table[i-1][j-1] if a[i-1] == b[j-1]: cij = 0 c = ' ' else: cij = 1 c = 'c' (d1, a1) = edit_distance_ij(a, b, i-1, j, table) (d2, a2) = edit_distance_ij(a, b, i, j-1, table) (d3, a3) = edit_distance_ij(a, b, i-1, j-1, table) entry = min((d1 + 1, a1 + "d"), (d2 + 1, a2 + "i"), (d3 + cij, a3 + c)) table[i-1][j-1] = entry return entry def edit_distance(astring, bstring): m = len(astring) n = len(bstring) return edit_distance_ij(astring, bstring, m, n, table(m, n)) import sys if len(sys.argv) != 3: print "Usage: editierdistanz WORT_1 WORT_2" else: print edit_distance(sys.argv[1], sys.argv[2]) |
#!/usr/bin/ruby class Table def initialize(m, n) @t = Array.new(m) { Array.new(n) { nil } } end def to_s @t.map{|row| row.join(" ") }.join("\n") end def [](i, j) @t[i][j] end def []=(i, j, v) @t[i][j] = v end end def edit_distance_ij(a, b, i, j, table) return j, "e" * j if i == 0 return i, "l" * i if j == 0 return table[i-1, j-1] if table[i-1, j-1] if a[i-1] == b[j-1] cij = 0 c = ' ' else cij = 1 c = 'c' end d1, a1 = edit_distance_ij(a, b, i-1, j, table) d2, a2 = edit_distance_ij(a, b, i, j-1, table) d3, a3 = edit_distance_ij(a, b, i-1, j-1, table) table[i-1, j-1] = [[d1 + 1, a1 + "d"], [d2 + 1, a2 + "i"], [d3 + cij, a3+c]].min end def edit_distance(a_string, b_string) m = a_string.length n = b_string.length edit_distance_ij(a_string, b_string, m, n, Table.new(m,n)) end if ARGV.length != 2 puts "Usage: editierdistanz WORT_1 WORT_2" else puts edit_distance(ARGV[0], ARGV[1]) end |
Dynamic programming edit distance calculation
| Python | Ruby |
|---|---|
#!/usr/bin/python class distanceTable: def __init__(self, n, m): self.t = [[-1 for y in range(m)] for x in range(n)] # Missing the [i,j] operator def __call__(self, i, j): if i < 0: return j+1 if j < 0: return i+1 return self.t[i][j] # Missing the [i,j]=v operator def set(self, i, j, value): self.t[i][j] = value # I have not found the join operator, so I did it with injection def to_s(self): return string.join( [string.join( [`cell` for cell in row], " ") for row in self.t], "\n") def edit_distance(a, b): m = len(a) n = len(b) distances = distanceTable(m, n) for i in range(m): for j in range(n): if a[i] == b[j]: cij = 0 else: cij = 1 distances.set(i, j, min( distances(i-1, j ) + 1, distances(i, j-1) + 1, distances(i-1, j-1) + cij )) return distances(m-1, n-1) import sys if len(sys.argv) != 3: print "Usage: editierdistanz WORT_1 WORT_2" else: print edit_distance(sys.argv[1], sys.argv[2]) |
#!/usr/bin/ruby class DistanceTable def initialize(m, n) @t = Array.new(m) { Array.new(n) { nil } } end def to_s @t.map{ |row| row.join(" ") }.join("\n") end def [](i, j) return j+1 if i < 0 return i+1 if j < 0 @t[i][j] end def []=(i, j, v) @t[i][j] = v end end def edit_distance(a, b) m = a.length n = b.length distances = DistanceTable.new(m, n) for i in 0...m for j in 0...n distances[i,j] = [ distances[i-1, j ] + 1, distances[i, j-1] + 1, distances[i-1, j-1] + (a[i] == b[j] ? 0 : 1) ].min end end distances[m-1, n-1] end if ARGV.length != 2 puts "Usage: editierdistanz WORT_1 WORT_2" else puts edit_distance(ARGV[0], ARGV[1]) end |
Browse the code in single files:
- editierdistanz-dynamic.py
- editierdistanz-dynamic.rb
- editierdistanz-notizblock.py
- editierdistanz-notizblock.rb
- editierdistanz-recursive.py
- editierdistanz-recursive.rb
- time_it.rb