Incorrect escaping of & without a trailing space #31

Closed
@TimG1964

Maybe this is somehow connected with #17. Or maybe I'm simply misunderstanding something.

I'm using XML.jl to work directly with Excel files. A string like this is a valid cell entry in Excel:

"https://myhouse.sharepoint.com/:i:/r/sites/CorporateServices-COR08PublicAffairsandRelations/Shared%20Documents/COR08%20Public%20Affairs%20and%20Relations/Case%20Studies/Photos/My%20favourite%20society/MAT_5946%20(1).jpg?csf=1&web=1&e=tIywhD"

I can paste it directly into an Excel workbook by hand.

But I cannot use XML.jl to add it to a cell programmatically, because XML.escape() doesn't escape the two ampersands. The regex in the escape function only matches an ampersand that is immediately followed by whitespace, so these two aren't found.

julia> const escape_chars = ('&' => "&amp;", '<' => "&lt;", '>' => "&gt;", "'" => "&apos;", '"' => "&quot;")
('&' => "&amp;", '<' => "&lt;", '>' => "&gt;", "'" => "&apos;", '"' => "&quot;")

julia> function escape(x::String)
           result = replace(x, r"&(?=\s;)" => "&amp;")
           for (pat, r) in escape_chars[2:end]
                   result = replace(result, pat => r)
           end
           return result
       end
escape (generic function with 1 method)

julia> x = "https://myhouse.sharepoint.com/:i:/r/sites/CorporateServices-COR08PublicAffairsandRelations/Shared%20Documents/COR08%20Public%20Affairs%20and%20Relations/Case%20Studies/Photos/My%20favourite%20society/MAT_5946%20(1).jpg?csf=1&web=1&e=tIywhD"
"https://myhouse.sharepoint.com/:i:/r/sites/CorporateServices-COR08PublicAffairsandRelations/Shared%20Documents/COR08%20Public%20Affairs%20and%20Relations/Case%20Studies/Photos/My%20favourite%20society/MAT_5946%20(1).jpg?csf=1&web=1&e=tIywhD"

julia> escape(x)
"https://myhouse.sharepoint.com/:i:/r/sites/CorporateServices-COR08PublicAffairsandRelations/Shared%20Documents/COR08%20Public%20Affairs%20and%20Relations/Case%20Studies/Photos/My%20favourite%20society/MAT_5946%20(1).jpg?csf=1&web=1&e=tIywhD"

A simple change of the regex to r"&(?!amp;|quot;|apos;|gt;|lt;)" seems to work for me, but I'm not sure if this is a general solution.
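To make the proposed change concrete, here is a minimal sketch of an escape function built around that negative lookahead. The name `escape_unescaped` and the constant `OTHER_ESCAPES` are hypothetical, chosen for illustration only; this is not XML.jl's implementation, and I haven't checked it against anything beyond the cases shown.

```julia
# Sketch only: `escape_unescaped` and `OTHER_ESCAPES` are hypothetical names,
# not part of XML.jl.
const OTHER_ESCAPES = ('<' => "&lt;", '>' => "&gt;", '\'' => "&apos;", '"' => "&quot;")

function escape_unescaped(x::AbstractString)
    # Handle '&' first, so the entities produced by the later replacements
    # are not escaped a second time.
    # Negative lookahead: match '&' unless it already starts one of the five
    # predefined XML entities.
    result = replace(x, r"&(?!amp;|quot;|apos;|gt;|lt;)" => "&amp;")
    for (pat, r) in OTHER_ESCAPES
        result = replace(result, pat => r)
    end
    return result
end

escape_unescaped("?csf=1&web=1&e=tIywhD")   # "?csf=1&amp;web=1&amp;e=tIywhD"
escape_unescaped("a &amp; b")               # already escaped, left as "a &amp; b"
escape_unescaped("a & b < c")               # "a &amp; b &lt; c"
```

One case this lookahead does not cover is numeric character references such as `&#38;`: the ampersand there would be escaped again, giving `&amp;#38;`. Whether that matters probably depends on whether the input is meant to be raw text or text that may already be partly escaped.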
