BUG: file hash uint64s converted to float64s in darshan.report.mod_read_all_records() #438

As mentioned [here](https://github.com/darshan-hpc/darshan/pull/397#issuecomment-893984211), we are seeing some file hashes stored as floats, even though the name records (`report.data["name_records"]`) contain integer values.

I looked into this, and the issue appears to be log-dependent: some logs produce `pandas` dataframes whose ids/hashes have dtype `int64`, while other logs produce dataframes whose ids/hashes have dtype `float64`. I traced this back to `darshan.report.mod_read_all_records()`, specifically the `if dtype == pandas` [code block](https://github.com/darshan-hpc/darshan/blob/main/darshan-util/pydarshan/darshan/report.py#L620). Before concatenation, each record has ids that are either `int64` or `uint64`, but when the records are concatenated, the `uint64` ids are converted to `float64`. This is expected behavior for `pandas`: an `np.int64` cannot hold the largest `uint64` values (and a `uint64` cannot hold negative `int64` values), so the common dtype chosen is `float64`. There is some discussion about it [here](https://github.com/pandas-dev/pandas/issues/34356).
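A minimal reproduction, independent of Darshan, might look like the sketch below. The two single-row frames stand in for per-record dataframes; the column name `id` and the value `2**63 + 1` are illustrative assumptions, not taken from a real log. It also shows that the `float64` column cannot round-trip large hashes exactly, because `float64` has a 53-bit mantissa.

```python
import numpy as np
import pandas as pd

# Illustrative hash: too large for int64, but a valid uint64
big_hash = 2**63 + 1

# Stand-ins for two per-record dataframes with different id dtypes
a = pd.DataFrame({"id": np.array([42], dtype=np.int64)})
b = pd.DataFrame({"id": np.array([big_hash], dtype=np.uint64)})

combined = pd.concat([a, b], ignore_index=True)
print(combined["id"].dtype)  # float64

# Converting back after the fact loses the low bits of large hashes
restored = combined["id"].astype(np.uint64)
print(int(restored.iloc[1]) == big_hash)  # False
```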

This is unwanted behavior, since we want to be able to make safe comparisons between the file hashes stored in `report.data["name_records"]` and those in the counter/fcounter dataframes. As far as solutions go, I don't think we can force `pandas` to keep the `uint64` type when it concatenates the dataframes, but we *could* use `astype()` to convert the `float64` values back to `uint64` after the concatenation step.
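One caveat with casting after concatenation: a `float64` represents integers exactly only up to 2**53, so larger hashes may already have been rounded by the time `astype()` runs. A possible alternative, sketched below, is to cast each frame's id column to `uint64` *before* concatenating, so every input shares one dtype and `pandas` has no reason to upcast. `concat_records` is a hypothetical helper, not part of pydarshan, and it assumes the id column is named `id` and that any `int64` ids are non-negative.

```python
import numpy as np
import pandas as pd


def concat_records(frames, id_col="id"):
    """Concatenate per-record DataFrames, keeping ``id_col`` as uint64.

    Hypothetical helper (not part of pydarshan). Each frame's id column is
    cast to uint64 *before* pd.concat(), so all inputs share one dtype and
    pandas never upcasts to float64. Assumes int64 ids are non-negative;
    a negative value would wrap around when cast to uint64.
    """
    cast = [df.astype({id_col: np.uint64}) for df in frames]
    return pd.concat(cast, ignore_index=True)


big_hash = 2**63 + 1  # illustrative value, too large for int64
a = pd.DataFrame({"id": np.array([42], dtype=np.int64)})
b = pd.DataFrame({"id": np.array([big_hash], dtype=np.uint64)})

records = concat_records([a, b])
print(records["id"].dtype)  # uint64
print(int(records["id"].iloc[1]) == big_hash)  # True
```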
