Commit 3e63a31

Convert diff module to gem

Committed by Adam Gardiner (0 parents)

10 files changed: 529 additions, 0 deletions

.gitignore

Lines changed: 4 additions & 0 deletions
```diff
@@ -0,0 +1,4 @@
+.DS_Store
+doc
+pkg
+.yardoc
```

LICENSE

Lines changed: 22 additions & 0 deletions
```diff
@@ -0,0 +1,22 @@
+Copyright (c) 2013, Adam Gardiner
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+* Redistributions of source code must retain the above copyright notice, this
+  list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright notice,
+  this list of conditions and the following disclaimer in the documentation
+  and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
```

README.md

Lines changed: 76 additions & 0 deletions
```diff
@@ -0,0 +1,76 @@
+# ColorConsole
+
+ColorConsole is a small cross-platform library for outputting text to the console.
+
+
+## Usage
+
+ColorConsole is supplied as a gem, and has no dependencies. To use it, simply:
+```
+gem install color-console
+```
+
+ColorConsole provides methods for outputting lines of text in different colors, using the `Console.write` and `Console.puts` functions.
+
+```ruby
+require 'color-console'
+
+Console.puts "Some text" # Outputs text using the current console colors
+Console.puts "Some other text", :red # Outputs red text with the current background
+Console.puts "Yet more text", nil, :blue # Outputs text using the current foreground and a blue background
+
+# The following lines output "Blue Red Green" on a single line, each word in the appropriate color
+Console.write "Blue ", :blue
+Console.write "Red ", :red
+Console.write "Green", :green
+```
+
+## Features
+
+In addition to `Console.puts` and `Console.write` for outputting text in color, ColorConsole also supports:
+* __Setting the console title__: The title bar of the console window can be set using `Console.title = 'My title'`.
+* __Status messages__: Status messages (i.e. a line of text at the current scroll position) can be output and
+  updated at any time. The status message will remain at the current scroll point even as new text is output
+  using `Console.puts`.
+* __Progress bars__: A progress bar can be rendered like a status message, but with a pseudo-graphical representation
+  of the current completion percentage:
+
+  ```ruby
+  (0..100).each do |i|
+    Console.show_progress('Processing data', i)
+  end
+  Console.clear_progress
+  ```
+  Output:
+  ```
+  [============== 35% ] Processing data
+  ```
+* __Tables__: Data can be output in a tabular representation:
+
+  ```ruby
+  HEADER_ROW = ['Column 1', 'Column 2', 'Column 3', 'Column 4']
+  MIXED_ROW = [17,
+               'A somewhat longer column',
+               'A very very very long column that should wrap multiple lines',
+               'Another medium length column']
+  SECOND_ROW = [24,
+                'Lorem ipsum',
+                'Some more text',
+                'Lorem ipsum dolor sit amet']
+
+  Console.display_table([HEADER_ROW, MIXED_ROW, SECOND_ROW], width: 100,
+                        col_sep: '|', row_sep: '-')
+  ```
+  Output:
+  ```
+  +----------+--------------------------+-----------------------------+-----------------------------+
+  | Column 1 | Column 2 | Column 3 | Column 4 |
+  +----------+--------------------------+-----------------------------+-----------------------------+
+  | 17 | A somewhat longer column | A very very very long | Another medium length |
+  | | | column that should wrap | column |
+  | | | multiple lines | |
+  +----------+--------------------------+-----------------------------+-----------------------------+
+  | 24 | Lorem ipsum | Some more text | Lorem ipsum dolor sit amet |
+  +----------+--------------------------+-----------------------------+-----------------------------+
+  ```
+
```

Rakefile

Lines changed: 10 additions & 0 deletions
```diff
@@ -0,0 +1,10 @@
+require 'rubygems'
+require 'rubygems/package_task'
+
+load 'csv-diff.gemspec'
+
+Gem::PackageTask.new(GEMSPEC) do |pkg|
+  pkg.need_tar = false
+end
+
+task :default => :gem
```

csv-diff.gemspec

Lines changed: 29 additions & 0 deletions
```diff
@@ -0,0 +1,29 @@
+GEMSPEC = Gem::Specification.new do |s|
+  s.name = "csv-diff"
+  s.version = "0.1"
+  s.authors = ["Adam Gardiner"]
+  s.date = "2014-05-30"
+  s.summary = "CSV Diff is a library for generating diffs from data in CSV format"
+  s.description = <<-EOQ
+This library performs diffs of CSV files.
+
+Unlike a standard diff that compares line by line, and is sensitive to the
+ordering of records, CSV-Diff identifies common lines by key field(s), and
+then compares the contents of the fields in each line.
+
+Data may be supplied in the form of CSV files, or as an array of arrays. The
+diff process provides a fine level of control over what to diff, and can
+optionally ignore certain types of changes (e.g. changes in position).
+
+CSV-Diff is particularly well suited to data in parent-child format. Parent-
+child data does not lend itself well to standard text diffs, as small changes
+in the organisation of the tree at an upper level can lead to big movements
+in the position of descendant records. By instead matching records by key,
+CSV-Diff avoids this issue, while still being able to detect changes in
+sibling order.
+  EOQ
+  s.email = "[email protected]"
+  s.homepage = 'https://github.com/agardiner/csv-diff'
+  s.require_paths = ['lib']
+  s.files = ['README.md', 'LICENSE'] + Dir['lib/**/*.rb']
+end
```
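
The description contrasts matching rows by key with a line-by-line diff. The following is a minimal sketch of that idea, not the gem's implementation: it assumes both inputs are arrays of arrays that share the same header row and a single key column, and `key_diff` is a hypothetical helper name.

```ruby
# Hypothetical sketch of key-based row matching (not the csv-diff API).
# Assumes both inputs share the same header row and that +key_field+
# uniquely identifies each row.
def key_diff(left, right, key_field)
  header = left.first
  key_idx = header.index(key_field)

  # Index data rows by key value: { key => { field_name => value } }
  index_rows = lambda do |rows|
    rows.drop(1).to_h { |row| [row[key_idx], header.zip(row).to_h] }
  end
  lhs = index_rows.call(left)
  rhs = index_rows.call(right)

  result = { adds: rhs.keys - lhs.keys, deletes: lhs.keys - rhs.keys, updates: {} }

  # Rows present on both sides are compared field by field, regardless of
  # where they appear in each input.
  (lhs.keys & rhs.keys).each do |key|
    changes = header.each_with_object({}) do |field, acc|
      acc[field] = [lhs[key][field], rhs[key][field]] if lhs[key][field] != rhs[key][field]
    end
    result[:updates][key] = changes unless changes.empty?
  end
  result
end

left  = [%w[id name], %w[1 Alice], %w[2 Bob]]
right = [%w[id name], %w[2 Bobby], %w[1 Alice], %w[3 Carol]]
p key_diff(left, right, 'id')
```

Although rows 1 and 2 swap places, neither is reported for that reason: only row 3 appears as an add and row 2 as an update to `name`. A line-based diff of the same input would flag the reordering as well.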

lib/csv-diff.rb

Lines changed: 4 additions & 0 deletions
```diff
@@ -0,0 +1,4 @@
+require 'csv-diff/csv_source'
+require 'csv-diff/algorithm'
+require 'csv-diff/csv_diff'
+
```

lib/csv-diff/algorithm.rb

Lines changed: 122 additions & 0 deletions
```diff
@@ -0,0 +1,122 @@
+class CSVDiff
+
+  # Implements the CSV diff algorithm.
+  module Algorithm
+
+    # Diffs two CSVSource structures.
+    #
+    # @param left [CSVSource] A CSVSource object containing the contents of
+    #   the left/from input.
+    # @param right [CSVSource] A CSVSource object containing the contents of
+    #   the right/to input.
+    # @param key_fields [Array] An array containing the names of the field(s)
+    #   that uniquely identify each row.
+    # @param diff_fields [Array] An array containing the names of the fields
+    #   to be diff-ed; +options+ accepts :include_moves and :include_deletes.
+    def diff_sources(left, right, key_fields, diff_fields, options = {})
+      left_index = left.index
+      left_values = left.lines
+      left_keys = left_values.keys
+      right_index = right.index
+      right_values = right.lines
+      right_keys = right_values.keys
+      parent_fields = left.parent_fields.length
+
+      include_moves = options.fetch(:include_moves, true)
+      include_deletes = options.fetch(:include_deletes, true)
+
+      diffs = Hash.new{ |h, k| h[k] = {} }
+      right_keys.each_with_index do |key, right_row_id|
+        key_vals = key.split('~')
+        parent = key_vals[0...parent_fields].join('~')
+        child = key_vals[parent_fields..-1].join('~')
+        left_parent = left_index[parent]
+        right_parent = right_index[parent]
+        left_value = left_values[key]
+        right_value = right_values[key]
+        left_idx = left_parent && left_parent.index(key)
+        right_idx = right_parent && right_parent.index(key)
+
+        id = {}
+        id[:row] = right_row_id + 1
+        id[:sibling_position] = right_idx + 1
+        key_fields.each do |field_name|
+          id[field_name] = right_value[field_name]
+        end
+        if left_idx && right_idx
+          if include_moves
+            left_common = left_parent & right_parent
+            right_common = right_parent & left_parent
+            left_pos = left_common.index(key)
+            right_pos = right_common.index(key)
+            if left_pos != right_pos
+              # Move
+              diffs[key].merge!(id.merge(:action => 'Move',
+                                         :sibling_position => [left_idx + 1, right_idx + 1]))
+              #puts "Move #{left_idx} -> #{right_idx}: #{key}"
+            end
+          end
+          if changes = diff_row(left_values[key], right_values[key], diff_fields)
+            diffs[key].merge!(id.merge(changes.merge(:action => 'Update')))
+            #puts "Change: #{key}"
+          end
+        elsif right_idx
+          # Add
+          diffs[key].merge!(id.merge(right_values[key].merge(:action => 'Add')))
+          #puts "Add: #{key}"
+        end
+      end
+
+      # Now identify deletions
+      if include_deletes
+        (left_keys - right_keys).each do |key|
+          # Delete
+          key_vals = key.split('~')
+          parent = key_vals[0...parent_fields].join('~')
+          child = key_vals[parent_fields..-1].join('~')
+          left_parent = left_index[parent]
+          left_value = left_values[key]
+          left_idx = left_parent && left_parent.index(key)
+          next unless left_idx
+          id = {}
+          id[:row] = left_keys.index(key) + 1
+          id[:sibling_position] = left_idx + 1
+          key_fields.each do |field_name|
+            id[field_name] = left_value[field_name]
+          end
+          diffs[key].merge!(id.merge(:action => 'Delete'))
+          #puts "Delete: #{key}"
+        end
+      end
+      diffs
+    end
+
+
+    # Identifies the fields that are different between two versions of the
+    # same row, considering only the given +fields+.
+    #
+    # @param left_row [Hash] The version of the CSV row from the left/from
+    #   file.
+    # @param right_row [Hash] The version of the CSV row from the right/to
+    #   file.
+    # @return [Hash<String, Array>, nil] A Hash whose keys are the fields that
+    #   contain differences, and whose values are a two-element array of
+    #   [left/from, right/to] values, or nil if there are no differences.
+    def diff_row(left_row, right_row, fields)
+      diffs = {}
+      fields.each do |attr|
+        right_val = right_row[attr]
+        right_val = nil if right_val == ""
+        left_val = left_row[attr]
+        left_val = nil if left_val == ""
+        if left_val != right_val
+          diffs[attr] = [left_val, right_val]
+          #puts "#{attr}: #{left_val} -> #{right_val}"
+        end
+      end
+      diffs if diffs.size > 0
+    end
+
+  end
+
+end
```
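
Three details of `diff_sources` and `diff_row` are easy to miss: composite keys are split on `~` into a parent part (the first `parent_fields` components) and a child part; a row is reported as a move only if its position changes among the siblings present on both sides, so an insertion elsewhere under the same parent does not shift it; and empty strings are treated as `nil` when comparing fields. The standalone helpers below (`split_key`, `moved?` and `changed_fields`, all hypothetical names) restate that logic on plain arrays and hashes. They are a sketch for experimentation, not part of the gem.

```ruby
# Hypothetical helpers restating logic from CSVDiff::Algorithm on plain data.

# Split a '~'-joined key into [parent, child], where the first
# +parent_fields+ components identify the parent.
def split_key(key, parent_fields)
  vals = key.split('~')
  [vals[0...parent_fields].join('~'), vals[parent_fields..-1].join('~')]
end

# A key has moved if its index differs among the siblings common to both
# sides. Array#& keeps the receiver's order, so each side retains its own
# ordering of the shared siblings.
def moved?(left_siblings, right_siblings, key)
  left_common = left_siblings & right_siblings
  right_common = right_siblings & left_siblings
  left_common.index(key) != right_common.index(key)
end

# Field-level comparison as in diff_row: "" and nil are treated as equal.
# Returns nil when nothing changed.
def changed_fields(left_row, right_row, fields)
  diffs = {}
  fields.each do |field|
    left_val = left_row[field] == '' ? nil : left_row[field]
    right_val = right_row[field] == '' ? nil : right_row[field]
    diffs[field] = [left_val, right_val] if left_val != right_val
  end
  diffs unless diffs.empty?
end

siblings = %w[P~a P~b P~c]
p split_key('P~a', 1)                           # => ["P", "a"]
p moved?(siblings, %w[P~x P~a P~b P~c], 'P~a')  # inserted sibling: not a move
p moved?(siblings, %w[P~b P~a P~c], 'P~a')      # swapped with P~b: a move
p changed_fields({ 'a' => '', 'b' => '1' }, { 'a' => nil, 'b' => '2' }, %w[a b])
```

Comparing raw indexes instead would report `P~a` as moved in the first case (index 0 on the left, 1 on the right), which is the noise the common-sibling intersection avoids.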

lib/csv-diff/csv_diff.rb

Lines changed: 112 additions & 0 deletions
```diff
@@ -0,0 +1,112 @@
+# This library performs diffs of flat file content that contains structured data
+# in fields, with rows provided in a parent-child format.
+#
+# Parent-child data does not lend itself well to standard text diffs, as small
+# changes in the organisation of the tree at an upper level (e.g. re-ordering of
+# two ancestor nodes) can lead to big movements in the position of descendant
+# records - particularly when the parent-child data is generated by a hierarchy
+# traversal.
+#
+# Additionally, simple line-based diffs can identify that a line has changed,
+# but not which field(s) in the line have changed.
+#
+# Data may be supplied in the form of CSV files, or as an array of arrays. The
+# diff process provides a fine level of control over what to diff, and
+# can optionally ignore certain types of changes (e.g. changes in order).
+class CSVDiff
+
+  # @return [CSVSource] CSVSource object containing details of the left/from
+  #   input.
+  attr_reader :left
+  alias_method :from, :left
+  # @return [CSVSource] CSVSource object containing details of the right/to
+  #   input.
+  attr_reader :right
+  alias_method :to, :right
+  # @return [Hash<String, Hash>] A Hash of differences, keyed by row key.
+  attr_reader :diffs
+
+
+  # Generates a diff between two hierarchical tree structures, provided
+  # as +left+ and +right+, each of which consists of an array of lines in CSV
+  # format.
+  # An array of field indexes can also be specified as +key_fields+;
+  # a minimum of one field index must be specified; the last index is the
+  # child id, and the remaining fields (if any) are the parent field(s) that
+  # uniquely qualify the child instance.
+  #
+  # @param left [Array<Array<String>>] An Array of lines, each of which is in
+  #   turn an Array containing fields.
+  # @param right [Array<Array<String>>] An Array of lines, each of which is in
+  #   turn an Array containing fields.
+  # @param options [Hash] A hash containing options.
+  # @option options [Array<String>] :field_names An Array of field names for
+  #   each field in +left+ and +right+. If not provided, the first row is
+  #   assumed to contain field names.
+  # @option options [Boolean] :ignore_header If true, the first line of each
+  #   file is ignored. This option can only be true if :field_names is
+  #   specified.
+  # @option options [Array] :ignore_fields The names of any fields to be
+  #   ignored when performing the diff.
+  def initialize(left, right, options = {})
+    @left = CSVSource.new(left, options)
+    raise "No field names found in left (from) source" unless @left.field_names && @left.field_names.size > 0
+    @right = CSVSource.new(right, options)
+    raise "No field names found in right (to) source" unless @right.field_names && @right.field_names.size > 0
+    @warnings = []
+    @diff_fields = get_diff_fields(@left.field_names, @right.field_names, options.fetch(:ignore_fields, []))
+    @key_fields = @left.key_fields.map{ |kf| @left.field_names[kf] }
+    diff(options)
+  end
+
+
+  # Performs a diff with the specified +options+.
+  def diff(options = {})
+    @diffs = diff_sources(@left, @right, @key_fields, @diff_fields, options)
+  end
+
+
+  # Returns a summary of the number of adds, deletes, moves, and updates.
+  def summary
+    summ = Hash.new{ |h, k| h[k] = 0 }
+    @diffs.each{ |k, v| summ[v[:action]] += 1 }
+    summ
+  end
+
+
+  [:adds, :deletes, :updates, :moves].each do |mthd|
+    define_method mthd do
+      action = mthd.to_s.chomp('s')
+      @diffs.select{ |k, v| v[:action].downcase == action }
+    end
+  end
+
+
+  # @return [Array<String>] an array of warning messages generated during the
+  #   diff process.
+  def warnings
+    @left.warnings + @right.warnings + @warnings
+  end
+
+
+  private
+
+
+  # Given two sets of field names, determines the common set of fields present
+  # in both, on which members can be diffed.
+  def get_diff_fields(left_fields, right_fields, ignore_fields)
+    diff_fields = []
+    right_fields.each do |fld|
+      if left_fields.include?(fld)
+        diff_fields << fld unless ignore_fields.include?(fld)
+      else
+        @warnings << "Field '#{fld}' is missing from the left (from) file, and won't be diffed"
+      end
+    end
+    diff_fields
+  end
+
+
+  include Algorithm
+
+end
```
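
`get_diff_fields` keeps only the fields present in both sources (minus any `:ignore_fields`) and records a warning for right-hand fields missing on the left; `summary` tallies the diffs by `:action`; and the generated `adds`/`deletes`/`updates`/`moves` readers filter on that same `:action`. A minimal sketch of these steps as free functions, useful for checking the behaviour without building `CSVSource` objects: `common_fields`, `summarize` and `filter_by` are hypothetical names, and warnings go into a caller-supplied array instead of `@warnings`.

```ruby
# Hypothetical free-function versions of CSVDiff#get_diff_fields,
# CSVDiff#summary and the generated filter readers, on plain data.

# Fields on the right that also exist on the left, minus ignored fields.
# Right-only fields are reported in +warnings+ instead of being diffed.
def common_fields(left_fields, right_fields, ignore_fields, warnings)
  right_fields.each_with_object([]) do |fld, diff_fields|
    if left_fields.include?(fld)
      diff_fields << fld unless ignore_fields.include?(fld)
    else
      warnings << "Field '#{fld}' is missing from the left (from) file, and won't be diffed"
    end
  end
end

# Count diffs per action, e.g. { 'Add' => 2, 'Update' => 1 }.
def summarize(diffs)
  diffs.each_with_object(Hash.new(0)) do |(_key, diff), counts|
    counts[diff[:action]] += 1
  end
end

# Select diffs whose action matches a pluralised name such as :adds.
def filter_by(diffs, plural)
  action = plural.to_s.chomp('s')
  diffs.select { |_key, diff| diff[:action].downcase == action }
end

warnings = []
p common_fields(%w[id name dept], %w[id name email dept], %w[dept], warnings)
p warnings
sample = { '1' => { action: 'Add' }, '2' => { action: 'Update' }, '3' => { action: 'Add' } }
p summarize(sample)
p filter_by(sample, :adds).keys
```

Here `email` exists only on the right, so it produces a warning, and `dept` is dropped because it is ignored, leaving `id` and `name` to be diffed.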
