Dropna Subnet changes timestamp format in to_csv() #29711

Open
grleblanc opened this issue Nov 19, 2019 · 4 comments
Labels: Bug, Datetime (Datetime data dtype), Output-Formatting (__repr__ of pandas objects, to_string)

Comments

@grleblanc

Code Sample, a copy-pastable example if possible

import pandas as pd
import datetime

date_example_string = "1911180945"
ts = datetime.datetime.strptime(
    date_example_string, "%y%m%d%H%M%S"
)
test_json = [
    {
        "created_at": "2019-11-18 16:28:42.932887",
        "foo": "bar",
    }
]

df = pd.DataFrame(test_json)
df["baz"] = ts

print("=== before ===")
print(df.to_csv())
df["created_at"] = pd.to_datetime(
    df["created_at"], infer_datetime_format=True, errors="coerce"
)

print("=== after ===")
print(df.to_csv())

print("=== dropna ===")
df = df.dropna(subset=["created_at"])
print(df.to_csv())

Problem description

Calling DataFrame.dropna() changes the format in which to_csv() writes a datetime column.

=== before ===
,created_at,foo,baz
0,2019-11-18 16:28:42.932887,bar,2019-11-18 09:04:05

=== after ===
,created_at,foo,baz
0,2019-11-18 16:28:42.932887,bar,2019-11-18 09:04:05

=== dropna ===
,created_at,foo,baz
0,2019-11-18 16:28:42.932887,bar,2019-11-18 09:04:05.000000

As you can see, after calling dropna the baz column is written as 2019-11-18 09:04:05.000000, with trailing zeros for the microseconds.

Expected Output

=== before ===
,created_at,foo,baz
0,2019-11-18 16:28:42.932887,bar,2019-11-18 09:04:05

=== after ===
,created_at,foo,baz
0,2019-11-18 16:28:42.932887,bar,2019-11-18 09:04:05

=== dropna ===
,created_at,foo,baz
0,2019-11-18 16:28:42.932887,bar,2019-11-18 09:04:05
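The expected behavior can also be phrased as a small check. This is only a sketch under assumptions: the helper `last_field` is a hypothetical name introduced here, and the timestamp is built directly instead of with `strptime`. It compares how the baz value is written before and after `dropna`, and makes no claim about which result a given pandas version produces.

```python
import datetime

import pandas as pd


def last_field(csv_text):
    """Return the final comma-separated field of the first data row."""
    return csv_text.splitlines()[1].split(",")[-1]


df = pd.DataFrame([{"created_at": "2019-11-18 16:28:42.932887", "foo": "bar"}])
df["baz"] = datetime.datetime(2019, 11, 18, 9, 4, 5)
df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce")

# Text written for baz without and with the dropna call.
before = last_field(df.to_csv())
after = last_field(df.dropna(subset=["created_at"]).to_csv())

print("before dropna:", before)
print("after dropna: ", after)
print("formats match:", before == after)
```

If the last line prints False, the frame is affected by the behavior described in this report.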

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Darwin
OS-release : 18.6.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 0.25.3
numpy : 1.16.2
pytz : 2018.9
dateutil : 2.8.0
pip : 19.0.3
setuptools : 40.8.0
Cython : 0.29.13
pytest : 5.2.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

@mroeschke
Member

Getting a similar result on master. I suppose these results should align. Investigation and PRs welcome!

mroeschke added the Bug, Output-Formatting, and Datetime labels on Nov 20, 2019
@prakhar987
Contributor

prakhar987 commented Nov 20, 2019

Working on this.

@prakhar987
Contributor

I think the issue occurs after the call to BlockManager's consolidate(), though I am unable to pinpoint the exact cause. Would appreciate any leads.

@burkbre

burkbre commented Dec 10, 2019

I looked into this as well, and I agree with @prakhar987. It seems to stem from BlockManager grouping the two datetime columns of the DataFrame into a single block, so the second column is formatted like the first (hence the trailing zeros). If you want every datetime in the CSV formatted the same way, you can pass the date_format parameter to to_csv(); it is handed to the CSVFormatter, which then formats the datetimes however you like.
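A minimal sketch of that workaround, assuming a frame shaped like the one in the report (the column values are copied from it; the format string is just one possible choice):

```python
import datetime

import pandas as pd

df = pd.DataFrame(
    {
        "created_at": pd.to_datetime(["2019-11-18 16:28:42.932887"]),
        "foo": ["bar"],
        "baz": [datetime.datetime(2019, 11, 18, 9, 4, 5)],
    }
)
df = df.dropna(subset=["created_at"])

# An explicit date_format is applied to every datetime column, so the
# output no longer depends on how the columns are stored internally.
csv_text = df.to_csv(date_format="%Y-%m-%d %H:%M:%S")
print(csv_text)
```

Note that this format string drops the microseconds from created_at as well. Using "%Y-%m-%d %H:%M:%S.%f" would keep them, at the cost of writing .000000 for baz too, so a single date_format cannot reproduce the mixed output shown under Expected Output.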
