
read_plink returns bytes for variant_alleles not unicode #1209

@jeromekelleher


I don't think there's any good reason to return bytes rather than UTF-8 unicode strings here. It can only lead to bugs in user code and inconsistencies in string handling (anyone remember Python 2???)

This is based on the "example" plink dataset in the test suite:

    import sgkit.io.plink

    # path points at the "example" plink dataset from the test suite
    sg_ds = sgkit.io.plink.read_plink(path=path)
    print(sg_ds.variant_allele.values)
    print(sg_ds.variant_allele)

Gives

[[b'A' b'G']
 [b'T' b'C']]
<xarray.DataArray 'variant_allele' (variants: 2, alleles: 2)>
dask.array<astype, shape=(2, 2), dtype=|S1, chunksize=(2, 1), chunktype=numpy.ndarray>
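
For anyone hitting this in the meantime, here is a minimal sketch of a user-side workaround, not a proposed fix for `read_plink` itself. It uses plain NumPy on a hand-built stand-in for the array printed above (the names `alleles_bytes` and `alleles_str` are made up for illustration), and assumes the alleles are valid UTF-8. The reason it matters: in Python 3, `b"A" == "A"` is `False`, so user code comparing alleles against ordinary strings silently fails to match.

```python
import numpy as np

# Hypothetical stand-in for sg_ds.variant_allele.values as printed above:
# a (variants, alleles) array of single-byte strings, dtype |S1.
alleles_bytes = np.array([[b"A", b"G"], [b"T", b"C"]], dtype="S1")

# Decode every element from bytes to str; the result has a unicode dtype.
alleles_str = np.char.decode(alleles_bytes, "utf-8")

print(alleles_bytes.dtype.kind)  # kind of the original array
print(alleles_str.dtype.kind)    # kind of the decoded array
print(alleles_str.tolist())
```

After decoding, comparisons such as `alleles_str == "A"` behave the way most user code would expect.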

Labels: bug (Something isn't working)
