BUG: file hash uint64's converted to float64's in darshan.report.mod_read_all_records()  #438

@nawtrey

As mentioned here, we are seeing some file system hashes that are floats despite the name records (report.data["name_records"]) containing integer values.

I looked into this, and the issue appears to be log-dependent: some logs generate pandas dataframes with ids/hashes of type int64, while other logs return dataframes with ids/hashes of type float64. I traced this back to darshan.report.mod_read_all_records(), in the if dtype == pandas code block. Each record (pre-concatenation) has ids that are either int64's or uint64's, but upon concatenation the uint64's get converted to float64's. This is apparently expected behavior for pandas: np.int64 cannot represent the largest uint64 values, so a mix of the two types is promoted to float64. I was able to find some discussion about it here.
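The promotion described above can be reproduced outside of darshan. This is only a minimal sketch: the two single-row frames and the "id" column name are hypothetical stand-ins for the per-record dataframes, not the actual structures built in mod_read_all_records().

```python
import numpy as np
import pandas as pd

# Hypothetical per-record frames; in darshan these would come from a log.
rec_a = pd.DataFrame({"id": np.array([42], dtype=np.int64)})
rec_b = pd.DataFrame({"id": np.array([2**63 + 1], dtype=np.uint64)})

# NumPy has no integer type that holds both int64 and uint64 ranges,
# so the common type for the pair is a float.
common = np.result_type(np.int64, np.uint64)
print(common)

# pandas applies the same promotion when the frames are concatenated.
combined = pd.concat([rec_a, rec_b], ignore_index=True)
print(combined["id"].dtype)
```

Frames whose "id" columns already share one dtype should not trigger the promotion, which would be consistent with the behavior being log-dependent.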

This is unwanted behavior since we want to be able to make safe comparisons between the file hashes stored in report.data["name_records"] and the counter/fcounter dataframes. As far as solutions go, I don't think we can force pandas to keep the uint64 type when it concatenates the dataframes, but we could use astype() to change the float64's back to uint64's after the concatenation step.
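One thing that may be worth checking before settling on the astype() approach: float64 has a 53-bit mantissa, so a hash above 2**53 might not survive the round trip through float64. The sketch below compares casting after concatenation with casting each frame to uint64 before concatenation. The frames, the "id" column name, and the hash value are hypothetical, and the pre-cast variant assumes the int64 ids are never negative.

```python
import numpy as np
import pandas as pd

big_hash = 2**63 + 1  # hypothetical hash value larger than 2**53

# Hypothetical per-record frames with mismatched integer dtypes.
rec_a = pd.DataFrame({"id": np.array([42], dtype=np.int64)})
rec_b = pd.DataFrame({"id": np.array([big_hash], dtype=np.uint64)})

# Variant 1: concatenate first (promotes to float64), then cast back.
after = pd.concat([rec_a, rec_b], ignore_index=True)
after["id"] = after["id"].astype(np.uint64)
roundtrip_ok = int(after["id"].iloc[1]) == big_hash

# Variant 2: cast every frame to uint64 first, then concatenate.
# Assumes no negative ids, since those would wrap around as uint64.
frames = [df.astype({"id": np.uint64}) for df in (rec_a, rec_b)]
before = pd.concat(frames, ignore_index=True)
direct_ok = int(before["id"].iloc[1]) == big_hash

print(roundtrip_ok, direct_ok)
```

In this sketch the pre-cast variant keeps the column as uint64 throughout, so the hash is never routed through a float.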
