ENH: support new-style float_format string in to_csv #49580

Open
@joooeey

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

The float_format parameter in pd.DataFrame.to_csv does not yet support new-style format strings (str.format syntax, available since Python 2.6).

The documentation says about the float_format parameter:

float_format : str, Callable, default None
    Format string for floating point numbers. If a Callable is given, it takes
    precedence over other numeric formatting parameters, like decimal.

The term "format string" here apparently means an old-style %-format string like "%.6f". However, the Python docs tend to use that term for a new-style format string like "{:.6f}".
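For reference, here is a minimal plain-Python sketch of the two styles being distinguished, independent of pandas. The variable names are only illustrative.

```python
value = 0.1

# Old-style (printf-style) formatting with the % operator.
old_style = "%.6f" % value

# New-style formatting with str.format (available since Python 2.6).
new_style = "{:.6f}".format(value)

# The bound method "{:.6f}".format is itself a callable taking the value.
formatter = "{:.6f}".format
via_callable = formatter(value)

print(old_style, new_style, via_callable)
```

All three expressions produce the same text, so the difference is purely in how the template is written and applied.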

import pandas as pd

df = pd.DataFrame([0.1, 0.2])

for float_format in ["%.6f", "{:.6f}".format, "{:.6f}"]:
    print(float_format, ":\n", df.to_csv(float_format=float_format))

Out:

%.6f :
 ,0
0,0.100000
1,0.200000

<built-in method format of str object at 0x7f62be4e01f0> :
 ,0
0,0.100000
1,0.200000

{:.6f} :
 ,0
0,{:.6f}
1,{:.6f}

Feature Description

pseudo code:

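class DataFrame:
    ...
    def to_csv(self, ..., float_format=None, ...):
        if isinstance(float_format, str):
            if "%" in float_format:
                # Old-style %-format string. Keep the template under its own
                # name so the lambda does not refer to itself after rebinding.
                template = float_format
                float_format = lambda x: template % x
            else:
                # New-style format string such as "{:.6f}".
                float_format = float_format.format
        ...
        out = float_format(value)
        ...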
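The dispatch in the pseudo code above could be sketched as a standalone function along the following lines. The name normalize_float_format and its type alias are hypothetical and not part of pandas; this only illustrates the idea and is not how pandas handles float_format internally. The sketch binds the template under a separate name so the lambda does not end up calling itself.

```python
from typing import Callable, Optional, Union

# Hypothetical alias for the kinds of values float_format may take.
FloatFormat = Union[str, Callable[[float], str], None]


def normalize_float_format(
    float_format: FloatFormat,
) -> Optional[Callable[[float], str]]:
    """Hypothetical helper: turn a float_format argument into a callable."""
    if float_format is None or callable(float_format):
        # Nothing to convert: no formatting requested, or already a callable.
        return float_format
    if isinstance(float_format, str):
        if "%" in float_format:
            # Treated as an old-style template, e.g. "%.6f".
            template = float_format
            return lambda x: template % x
        # Treated as a new-style template, e.g. "{:.6f}".
        return float_format.format
    raise TypeError("float_format must be a str, a callable, or None")


for spec in ["%.6f", "{:.6f}", "{:.6f}".format]:
    print(normalize_float_format(spec)(0.1))
```

One limitation of the `"%" in float_format` check is that a new-style template using the percent presentation type, such as "{:.1%}", would be misclassified as old-style, so a real implementation would likely need a more careful test.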

Alternative Solutions

Workaround

df.to_csv(float_format="{:.6f}".format)

Documentation change

Document that:

  • only old-style % format strings are supported.
  • one can pass in a string's format method (e.g. float_format = "{:.6f}".format) if one wants to use a modern format string.

Additional Context

If this feature is implemented and used, one could get a minor speed-up in to_csv on Python 3.10 compared to using %-strings. The same speed-up is already available through the workaround described above:

from timeit import timeit

num = 0.1

print("%:", timeit('"%.6f" % 0.1'))
print("format:", timeit('"{:.6f}".format(0.1)'))

setup = '''
from numpy.random import rand
import pandas as pd
df = pd.DataFrame(rand(1000, 1000))
'''

print("% to_csv:", timeit(
    'df.to_csv(float_format="%.6f")', setup=setup, number=10))
print("format to_csv:", timeit(
    'df.to_csv(float_format="{:.6f}".format)', setup=setup, number=10))

Output:

%: 0.10213060600017343
format: 0.10648653099997318
% to_csv: 7.168424273999335
format to_csv: 5.367143424999995
