CSV.jl fails to parse a file that DuckDB is fine with #1143

Open
@asinghvi17

MWE:

import CSV, QuackIO
using DataFrames

file = download("https://raw.githubusercontent.com/newzealandpaul/Maritime-Pirate-Attacks/refs/heads/main/data/csv/pirate_attacks.csv")

# try QuackIO first
dataset = QuackIO.read_csv(DataFrame, file) # works

# now try CSV
CSV.read(file, DataFrame) # errors

The error:

ERROR: TaskFailedException

    nested task error: thread = 7 fatal error, encountered an invalidly quoted field while parsing around row = 4573, col = 12: ""03.10.2018: 2330 UTC: Posn: 38:49.2N – 118:14.5E, Tianjin Anchorage, China.
    ", error=INVALID: OK | QUOTED | EOF | INVALID_QUOTED_FIELD , check your `quotechar` arguments or manually fix the field in the file itself
    
    Stacktrace:
     [1] fatalerror(buf::Vector{UInt8}, pos::Int64, len::Int64, code::Int16, row::Int64, col::Int64)
       @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:590
     [2] parsevalue!(::Type{…}, buf::Vector{…}, pos::Int64, len::Int64, row::Int64, rowoffset::Int64, i::Int64, col::CSV.Column, ctx::CSV.Context)
       @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:798
     [3] parserow
       @ ~/.julia/packages/CSV/cwX2w/src/file.jl:640 [inlined]
     [4] parsefilechunk!(ctx::CSV.Context, pos::Int64, len::Int64, rowsguess::Int64, rowoffset::Int64, columns::Vector{…}, ::Type{…})
       @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:550
     [5] multithreadparse(ctx::CSV.Context, pertaskcolumns::Vector{…}, rowchunkguess::Int64, i::Int64, rows::Vector{…}, wholecolumnslock::ReentrantLock)
       @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:360
     [6] (::CSV.var"#34#39"{CSV.Context, Vector{Vector{CSV.Column}}, Int64, Int64, Vector{Int64}, ReentrantLock})()
       @ CSV ~/.julia/packages/WorkerUtilities/ey0fP/src/WorkerUtilities.jl:384
Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base ./task.jl:455
 [2] macro expansion
   @ ./task.jl:487 [inlined]
 [3] CSV.File(ctx::CSV.Context, chunking::Bool)
   @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:240
 [4] File
   @ ~/.julia/packages/CSV/cwX2w/src/file.jl:227 [inlined]
 [5] #File#32
   @ ~/.julia/packages/CSV/cwX2w/src/file.jl:223 [inlined]
 [6] CSV.File(source::String)
   @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:162
 [7] read(source::String, sink::Type; copycols::Bool, kwargs::@Kwargs{})
   @ CSV ~/.julia/packages/CSV/cwX2w/src/CSV.jl:117
 [8] read(source::String, sink::Type)
   @ CSV ~/.julia/packages/CSV/cwX2w/src/CSV.jl:113
 [9] top-level scope
   @ REPL[223]:1
Some type information was truncated. Use `show(err)` to see complete types.

I tried tracking down the error, but that part of the file looks fine to me, both at the row mentioned in the message and wherever the quoted text appears.
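One way to narrow this down further is to look at the raw quote characters around the reported text, since the message complains about an invalidly quoted field. The sketch below uses only Base Julia. `quote_report` is a hypothetical helper (not part of CSV.jl), and the three-line sample file is made up for illustration; it is not taken from `pirate_attacks.csv`. It lists every physical line that contains a search string, together with how many `"` characters that line holds, so a stray or doubled quote is easier to spot.

```julia
# Hypothetical helper, not part of CSV.jl: return (line number, number of
# double-quote characters, line text) for every physical line containing `needle`.
function quote_report(path::AbstractString, needle::AbstractString)
    hits = Tuple{Int,Int,String}[]
    for (n, line) in enumerate(eachline(path))
        if occursin(needle, line)
            push!(hits, (n, count(==('"'), line), line))
        end
    end
    return hits
end

# Made-up sample data, only to make the sketch self-contained.
path, io = mktemp()
write(io, "id,note\n1,\"fine\"\n2,\"\"Tianjin Anchorage, China.\n\",x\n")
close(io)

for (n, nquotes, line) in quote_report(path, "Tianjin Anchorage")
    println("line ", n, ": ", nquotes, " quote characters: ", line)
end
```

Against the real download, the call would be `quote_report(file, "Tianjin Anchorage")` with `file` from the MWE. Note that physical line numbers need not match the `row` in the error if quoted fields span several lines. It may also be worth rerunning with `CSV.read(file, DataFrame; ntasks=1)`, assuming the `ntasks` keyword behaves as documented, to see whether single-task parsing reports the same row.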
