BUG: pd.NA.__format__ fails with format_specs #34740

topper-123 · 2020-06-13T08:22:28Z

pd.NA fails if passed to a format string and format parameters are supplied. This is different behaviour than np.nan and makes converting arrays containing pd.NA to strings very brittle and annoying.

Examples:

>>> format(pd.NA)
'<NA>'  # master and PR, ok
>>> format(pd.NA, ".1f")
TypeError  # master
'<NA>'  # this PR
>>> format(pd.NA, ">5")
TypeError  # master
' <NA>'  # this PR, tries to behave like a string, then falls back to '<NA>', like np.nan

The new behaviour mirrors the behaviour of np.nan.
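For reference, a minimal sketch of the float behaviour being mirrored. It uses float("nan") as a stand-in for np.nan (np.nan is a plain Python float), so no NumPy import is needed; the variable names are only illustrative.

```python
# np.nan is an ordinary float, so float("nan") formats the same way.
nan = float("nan")

plain = format(nan)               # no format_spec
fixed = format(nan, ".1f")        # precision is accepted and ignored for nan
padded = format(nan, ">5")        # width and alignment are honoured
combined = format(nan, ">10.1f")  # both together also work

print(repr(plain), repr(fixed), repr(padded), repr(combined))
# 'nan' 'nan' '  nan' '       nan'
```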

jorisvandenbossche · 2020-06-13T08:53:39Z

@topper-123 Thanks for looking into this!

Personally, instead of relying on a try/except of NaN to check what is supported, I would rather try to understand how and what works for NaN, and try to implement the same logic here.

For example, I suppose that format(pd.NA, ">10.1f") will fail on this branch? While for NaN this works.

Now, properly implementing __format__ manually might be too complicated though, and the "fallback" of formatting the string might already be useful anyway.

jorisvandenbossche · 2020-06-13T09:06:42Z

Hmm, np.nan is just a float, so I think it uses the builtin float.__format__, which is probably a bit complicated to replicate ...

Another idea: how robust would it be if we format some other value (eg np.nan), and then replace "nan" with "<NA>" in the result? We would need a bit of logic to potentially replace " nan" instead of "nan" if possible, but for the rest it might work in many cases?

topper-123 · 2020-06-13T09:42:37Z

> Another idea: how robust would it be if we format some other value (eg np.nan), and then replace "nan" with "<NA>" in the result?

Wouldn't work out of the box, e.g. "nantes_{}".format(np.nan). I don't think adding logic to get the correct "nan" is the right approach; it's too complicated IMO.

Another idea: pd.NA is supposed to work with all dtypes, not just floats, so it probably shouldn't be restricted to the format_specs accepted by float. How about a simple:

def __format__(self, format_spec):
    try:
        return self.__repr__().__format__(format_spec)
    except ValueError:
        return self.__repr__()

This would allow string format_specs to work (as they already do for floats) and make self.__repr__() a fallback that always works.
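A self-contained sketch of how that proposal behaves, using a hypothetical stand-in class (NALike is not a pandas name) so it runs without pandas. String specs are applied to the repr; anything str.__format__ rejects falls back to the bare repr.

```python
class NALike:
    """Hypothetical stand-in for pd.NA, only for illustrating the proposal."""

    def __repr__(self) -> str:
        return "<NA>"

    def __format__(self, format_spec: str) -> str:
        try:
            # Treat the spec as a string spec applied to the repr.
            return format(repr(self), format_spec)
        except ValueError:
            # e.g. ".1f" is not valid for str, so fall back to the repr.
            return repr(self)


na = NALike()
print(repr(format(na)))            # '<NA>'
print(repr(format(na, ">5")))      # ' <NA>'
print(repr(format(na, ".1f")))     # '<NA>'
print(repr(format(na, ">10.1f")))  # '<NA>' (width is lost: the whole spec is rejected)
```

The last line shows the limitation of this approach: a combined width and float spec is rejected as a whole, so the padding is dropped rather than raising.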

jorisvandenbossche · 2020-06-13T09:49:25Z

> Wouldn't work out of the box, e.g. "nantes_{}".format(np.nan),

I don't fully know how the inner Python details of this method work, but I suppose the above would end up calling pd.NA.__format__("")? As long as the nan -> NA replacement happens inside the __format__ function, I would expect the above to work fine.

> How about a simple:

I think that is certainly better (avoiding only accepting the rules valid for float), but that still wouldn't work for the example I gave of format(pd.NA, ">10.1f") (I think).

(now, it's certainly already fixing a set of use cases, so could also be a good start)

jorisvandenbossche · 2020-06-13T09:58:13Z

Very quick try with

    def __format__(self, format_spec) -> str:
        res = format(np.nan, format_spec)
        return res.replace("nan", "<NA>")

works for the example you gave, and also for the example I gave:

In [1]: "nantes_{}".format(pd.NA)  
Out[1]: 'nantes_<NA>'

In [3]: format(pd.NA, ">10.1f")
Out[3]: '       <NA>'

Of course, the above still needs to 1) take the 1-char length difference into account in case there is whitespace padding (like the second example) and 2) still fall back to formatting the string repr and finally to the plain <NA> string repr (like your example impl at #34740 (comment)).
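A rough sketch of what those two additions could look like together. This is not what was merged; NASketch is a hypothetical stand-in class, and the padding logic only handles space fill. Other fill characters, and uppercase specs such as "F" (which render nan as "NAN"), are left unhandled here.

```python
class NASketch:
    """Hypothetical stand-in for pd.NA; not the pandas implementation."""

    def __repr__(self) -> str:
        return "<NA>"

    def __format__(self, format_spec: str) -> str:
        try:
            # Let float.__format__ interpret numeric specs such as ">10.1f".
            res = format(float("nan"), format_spec)
        except ValueError:
            # Not a valid float spec: try it as a string spec, then give up.
            try:
                return format(repr(self), format_spec)
            except ValueError:
                return repr(self)
        # "<NA>" is one character longer than "nan", so absorb one adjacent
        # space of padding (if any) to keep the requested width.
        for padded in (" nan", "nan "):
            if padded in res:
                return res.replace(padded, "<NA>", 1)
        return res.replace("nan", "<NA>", 1)


na = NASketch()
print(repr(format(na, ">10.1f")))      # '      <NA>' (width 10 preserved)
print(repr(format(na, "<8")))          # '<NA>    '
print(repr("nantes_{}".format(na)))    # 'nantes_<NA>'
print(repr(format(na, "d")))           # '<NA>' (neither float nor str accepts "d")
```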

topper-123 · 2020-06-13T10:22:22Z

Yeah, __format__ only works inside the brackets, so you're right there.
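A small sketch that makes this visible: a hypothetical SpecRecorder class logs every format_spec it receives. The literal text around the braces never reaches __format__; only the spec inside the replacement field does.

```python
class SpecRecorder:
    """Hypothetical helper that records the format_spec passed to __format__."""

    def __init__(self) -> None:
        self.seen = []

    def __format__(self, format_spec: str) -> str:
        self.seen.append(format_spec)
        return "<NA>"


rec = SpecRecorder()
out = "nantes_{}".format(rec)     # empty spec: __format__("") is called
padded = "{:>10.1f}".format(rec)  # spec after the colon is passed through

print(out)       # nantes_<NA>
print(rec.seen)  # ['', '>10.1f']
```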

The length format spec would be one special case that would need to be handled, but are there others? I don't think so for floats, but there could be for other format_specs?

topper-123 · 2020-06-13T13:28:31Z

I've made the simpler implementation that I suggested. I'm a bit hesitant that adding the special cases will make this too complex.

jreback

lgtm cc @jorisvandenbossche

jorisvandenbossche

Yeah, I am fine with the simplest solution that at least fixes the basic formatting, for now. I still think it wouldn't be hard to support proper floating point / numeric formatting (with the NaN formatting and replace afterwards)

jreback · 2020-06-15T21:36:42Z

thanks @topper-123

BUG: format pd.NA

dd19a1d

topper-123 force-pushed the format_na branch from 586ce30 to b7bfd18 on June 13, 2020 08:24

changes

9c97376

topper-123 force-pushed the format_na branch from b7bfd18 to 9c97376 on June 13, 2020 08:25

make NA.__format__ more permissive

c3f236d

jreback added the Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) and Output-Formatting (__repr__ of pandas objects, to_string) labels Jun 14, 2020

jreback added this to the 1.1 milestone Jun 14, 2020

jreback approved these changes Jun 14, 2020

jorisvandenbossche approved these changes Jun 15, 2020

jreback merged commit 594dc2a into pandas-dev:master Jun 15, 2020

topper-123 deleted the format_na branch June 15, 2020 22:34
