Commit 731657c

Merge pull request #47 from davidanthoff/remove-dataarray-dataframe
Remove dependency on DataArray and DataFrame
2 parents 89b76f9 + ed7fde2 commit 731657c

7 files changed: +54 -362 lines changed


LICENSE.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 The ExcelReaders.jl package is licensed under the MIT "Expat" License:

-> Copyright (c) 2016: David Anthoff.
+> Copyright (c) 2016-2018: David Anthoff.
 >
 > Permission is hereby granted, free of charge, to any person obtaining
 > a copy of this software and associated documentation files (the

NEWS.md

Lines changed: 6 additions & 0 deletions
@@ -1,3 +1,9 @@
+# ExcelReaders.jl v0.9.0 Release Notes
+* Drop support for DataFrames.
+* Use Dates.Time.
+* Use DataValue for missing values.
+* Fix deprecated syntax.
+
 # ExcelReaders.jl v0.8.2 Release Notes
 * Fix bug in readxlsheet

README.md

Lines changed: 9 additions & 35 deletions
@@ -11,6 +11,14 @@

 ExcelReaders is a package that provides functionality to read Excel files.

+**WARNING**: Version v0.9.0 removed all support for [DataFrames.jl](https://github.com/JuliaData/DataFrames.jl)
+from this package. The [ExcelFiles.jl](https://github.com/davidanthoff/ExcelFiles.jl)
+package now provides functionality to read data from an Excel file into
+a ``DataFrame`` (or any other table type), and users are encouraged to use
+that package for tabular data going forward. Version v0.9.0 also no longer
+uses [DataArrays.jl](https://github.com/JuliaStats/DataArrays.jl), but instead
+is based on [DataValues.jl](https://github.com/davidanthoff/DataValues.jl).
+
 ## Installation

 Use ``Pkg.add("ExcelReaders")`` in Julia to install ExcelReaders and its dependencies.
@@ -31,7 +39,7 @@ using ExcelReaders
 data = readxl("Filename.xlsx", "Sheet1!A1:C4")
 ````

-This will return a ``DataMatrix{Any}`` with all the data in the cell range A1 to C4 on Sheet1 in the Excel file Filename.xlsx.
+This will return an array with all the data in the cell range A1 to C4 on Sheet1 in the Excel file Filename.xlsx.

 If you expect to read multiple ranges from the same Excel file you can get much better performance by opening the Excel file only once:

@@ -62,37 +70,3 @@ This will read all content on Sheet1 in the file Filename.xlsx. Eventual blank r
 - ``ncols`` accepts either ``:all`` (default) or a postiive integer. With ``:all``, all columns (except skipped ones) are read. An integer specifies the exact number of columns to be read.

 ``readxlsheet`` also accepts an ExcelFile (as obtained from ``openxl``) as its first argument.
-
-## Reading into a DataFrame
-
-To read into a DataFrame:
-
-````julia
-using ExcelReaders
-using DataFrames
-
-df = readxl(DataFrame, "Filename.xlsx", "Sheet1!A1:C4")
-````
-
-This code will use the first row in the range A1:C4 as the column names in the DataFrame.
-
-To read in data without a header row use
-
-````julia
-df = readxl(DataFrame, "Filename.xlsx", "Sheet1!A1:C4", header=false)
-````
-
-This will auto-generate column names. Alternatively you can specify your own names:
-
-````julia
-df = readxl(DataFrame, "Filename.xlsx", "Sheet1!A1:C4",
-    header=false, colnames=[:name1, :name2, :name3])
-````
-
-You can also combine ``header=true`` and a custom ``colnames`` list, in that case the first row in the specified range will just be skipped.
-
-To read the whole sheet into a DataFrame (respective keyword arguments (`header`, `skipstartrows` etc.) should work as expected):
-
-```julia
-df = readxlsheet(DataFrame, "Filename.xlsx", "Sheet1")
-```

REQUIRE

Lines changed: 1 addition & 2 deletions
@@ -1,4 +1,3 @@
 julia 0.6
-DataArrays
-DataFrames
+DataValues
 PyCall 1.5

src/ExcelReaders.jl

Lines changed: 10 additions & 129 deletions
@@ -2,9 +2,7 @@ __precompile__()

 module ExcelReaders

-using PyCall, DataArrays, DataFrames
-
-import Base.show
+using PyCall, DataValues

 export openxl, readxl, readxlsheet, ExcelErrorCell, ExcelFile, readxlnames, readxlrange

@@ -23,7 +21,7 @@ A handle to an open Excel file.

 You can create an instance of an ``ExcelFile`` by calling ``openxl``.
 """
-type ExcelFile
+mutable struct ExcelFile
     workbook::PyObject
     filename::AbstractString
 end
@@ -36,25 +34,21 @@ An Excel cell that has an Excel error.
 You cannot create ``ExcelErrorCell`` objects, they are returned if a cell in an
 Excel file has an Excel error.
 """
-type ExcelErrorCell
+mutable struct ExcelErrorCell
     errorcode::Int
 end

-# TODO Remove this type once there is a Time type in Dates
-immutable Time
-    hours::Int
-    minutes::Int
-    seconds::Int
-end
-
-function show(io::IO, o::ExcelFile)
+function Base.show(io::IO, o::ExcelFile)
     print(io, "ExcelFile <$(o.filename)>")
 end

-function show(io::IO, o::ExcelErrorCell)
+function Base.show(io::IO, o::ExcelErrorCell)
     print(io, xlrd[:error_text_from_code][o.errorcode])
 end

+Base.promote_rule(::Type{DataValue{T}}, ::Type{ExcelErrorCell}) where {T}= Any
+Base.promote_rule(::Type{ExcelErrorCell}, ::Type{DataValue{T}}) where {T} = Any
+
 """
     openxl(filename)

@@ -219,7 +213,7 @@ function get_cell_value(ws, row, col, wb)
     elseif celltype == xlrd[:XL_CELL_DATE]
         date_year,date_month,date_day,date_hour,date_minute,date_sec = xlrd[:xldate_as_tuple](cellval, wb[:datemode])
         if date_month==0
-            return Time(date_hour, date_minute, date_sec)
+            return Base.Dates.Time(date_hour, date_minute, date_sec)
         else
             return DateTime(date_year, date_month, date_day, date_hour, date_minute, date_sec)
         end
@@ -241,7 +235,7 @@ function readxl_internal(file::ExcelFile, sheetname::AbstractString, startrow::I
         return get_cell_value(ws, startrow, startcol, wb)
     else

-        data = DataArray(Any, endrow-startrow+1,endcol-startcol+1)
+        data = Array{Any}(endrow-startrow+1,endcol-startcol+1)

         for row in startrow:endrow
             for col in startcol:endcol
@@ -253,119 +247,6 @@ function readxl_internal(file::ExcelFile, sheetname::AbstractString, startrow::I
     end
 end

-function readxl(::Type{DataFrame}, filename::AbstractString, range::AbstractString; header::Bool=true, colnames::Vector{Symbol}=Symbol[])
-    excelfile = openxl(filename)
-
-    readxl(DataFrame, excelfile, range, header=header, colnames=colnames)
-end
-
-function readxl(::Type{DataFrame}, file::ExcelFile, range::AbstractString; header::Bool=true, colnames::Vector{Symbol}=Symbol[])
-    sheetname, startrow, startcol, endrow, endcol = convert_ref_to_sheet_row_col(range)
-
-    readxl_internal(DataFrame, file, sheetname, startrow, startcol, endrow, endcol, header=header, colnames=colnames)
-end
-
-function readxlsheet(::Type{DataFrame}, filename::AbstractString, sheetindex::Int; header::Bool=true, colnames::Vector{Symbol}=Symbol[], args...)
-    excelfile = openxl(filename)
-    readxlsheet(DataFrame, excelfile, sheetindex; args...)
-end
-
-function readxlsheet(::Type{DataFrame}, excelfile::ExcelFile, sheetindex::Int; header::Bool=true, colnames::Vector{Symbol}=Symbol[], args...)
-    sheetname = excelfile.workbook[:sheet_names]()[sheetindex]
-    readxlsheet(DataFrame, excelfile, sheetname; args...)
-end
-
-function readxlsheet(::Type{DataFrame}, filename::AbstractString, sheetname::AbstractString; header::Bool=true, colnames::Vector{Symbol}=Symbol[], args...)
-    excelfile = openxl(filename)
-    readxlsheet(DataFrame, excelfile, sheetname; header=header, colnames=colnames, args...)
-end
-
-function readxlsheet(::Type{DataFrame}, excelfile::ExcelFile, sheetname::AbstractString; header::Bool=true, colnames::Vector{Symbol}=Symbol[], args...)
-    sheet = excelfile.workbook[:sheet_by_name](sheetname)
-    startrow, startcol, endrow, endcol = convert_args_to_row_col(sheet; args...)
-    readxl_internal(DataFrame, excelfile, sheetname, startrow, startcol, endrow, endcol; header=header, colnames=colnames)
-end
-
-function readxl_internal(::Type{DataFrame}, file::ExcelFile, sheetname::AbstractString, startrow::Int, startcol::Int, endrow::Int, endcol::Int; header::Bool=true, colnames::Vector{Symbol}=Symbol[])
-    data = readxl_internal(file, sheetname, startrow, startcol, endrow, endcol)
-
-    nrow, ncol = size(data)
-
-    if length(colnames)==0
-        if header
-            headervec = data[1, :]
-            NAcol = Bool.(isna.(headervec))
-            headervec[NAcol] = DataFrames.gennames(countnz(NAcol))
-
-            # This somewhat complicated conditional makes sure that column names
-            # that are integer numbers end up without an extra ".0" as their name
-            colnames = [isa(i, AbstractFloat) ? ( modf(i)[1]==0.0 ? Symbol(Int(i)) : Symbol(string(i)) ) : Symbol(i) for i in vec(headervec)]
-        else
-            colnames = DataFrames.gennames(ncol)
-        end
-    elseif length(colnames)!=ncol
-        error("Length of colnames must equal number of columns in selected range")
-    end
-
-    columns = Array{Any}(ncol)
-
-    for i=1:ncol
-        if header
-            vals = data[2:end,i]
-        else
-            vals = data[:,i]
-        end
-
-        # Check whether all non-NA values in this column
-        # are of the same type
-        all_one_type = true
-        found_first_type = false
-        type_of_el = Any
-        NAs_present = false
-        for val=vals
-            if !found_first_type
-                if !isna(val)
-                    type_of_el = typeof(val)
-                    found_first_type = true
-                end
-            elseif !isna(val) && (typeof(val)!=type_of_el)
-                all_one_type = false
-                if NAs_present
-                    break
-                end
-            end
-            if isna(val)
-                NAs_present = true
-                if all_one_type == false
-                    break
-                end
-            end
-        end
-
-        if all_one_type
-            if NAs_present
-                # TODO use the following line instead of the shim once upstream
-                # bug is fixed
-                #columns[i] = convert(DataArray{type_of_el},vals)
-                shim_newarray = DataArray(type_of_el, length(vals))
-                for l=1:length(vals)
-                    shim_newarray[l] = vals[l]
-                end
-                columns[i] = shim_newarray
-            else
-                # TODO Decide whether this should be converted to Array instead of DataArray
-                columns[i] = convert(DataArray{type_of_el},vals)
-            end
-        else
-            columns[i] = vals
-        end
-    end
-
-    df = DataFrame(columns, colnames)
-
-    return df
-end
-
 function readxlnames(f::ExcelFile)
     return [lowercase(i[:name]) for i in f.workbook[:name_obj_list] if i[:hidden]==0]
 end

src/package_documentation.jl

Lines changed: 1 addition & 36 deletions
@@ -33,7 +33,7 @@ using ExcelReaders
 data = readxl("Filename.xlsx", "Sheet1!A1:C4")
 ````

-This will return a ``DataMatrix{Any}`` with all the data in the cell range A1 to
+This will return an array with all the data in the cell range A1 to
 C4 on Sheet1 in the Excel file Filename.xlsx.

 If you expect to read multiple ranges from the same Excel file you can get much
@@ -74,40 +74,5 @@ all columns (except skipped ones) are read. An integer specifies the exact numbe

 ``readxlsheet`` also accepts an ``ExcelFile`` (as obtained from ``openxl``) as its
 first argument.
-
-## Reading into a DataFrame
-
-To read into a DataFrame:
-
-````julia
-using ExcelReaders, DataFrames
-df = readxl(DataFrame, "Filename.xlsx", "Sheet1!A1:C4")
-````
-
-This code will use the first row in the range A1:C4 as the column names in the
-DataFrame.
-
-To read in data without a header row use
-
-````julia
-df = readxl(DataFrame, "Filename.xlsx", "Sheet1!A1:C4", header=false)
-````
-
-This will auto-generate column names. Alternatively you can specify your own names:
-
-````julia
-df = readxl(DataFrame, "Filename.xlsx", "Sheet1!A1:C4",
-    header=false, colnames=[:name1, :name2, :name3])
-````
-
-You can also combine ``header=true`` and a custom ``colnames`` list, in that
-case the first row in the specified range will just be skipped.
-
-To read the whole sheet into a DataFrame (respective keyword arguments (header, skipstartrows etc.)
-should work as expected):
-
-````julia
-df = readxlsheet(DataFrame, "Filename.xlsx", "Sheet1")
-````
 """
 tutorial = nothing
