Indiscriminate conversion of string fields to categorical is problematic #799

@jpn--

Describe the bug
Most, but not all, fields initially encoded as strings are actually categorical. For those that are, conversion to an explicit categorical type is efficient. For fields that are not categorical (e.g. escort tour participants), or that are only loosely categorical with potentially many categories (vehicle type / age / fuel), the conversion is not efficient.

In particular, converting non-categorical data to categorical ruins sharrow performance by triggering excessive recompilation, because each distinct categorical encoding is treated as a unique data type. For example, if a "categorical" escort tour participants column appears in a chooser table, recompilation will happen nearly every time the model runs.
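
To illustrate the dtype issue, here is a minimal sketch using pandas only. It does not show how sharrow keys its compiled code; it only shows that two categorical columns built from different values end up with different dtypes, while a true categorical with a stable set of values does not. The column values are made up for the example.

```python
import pandas as pd

# Hypothetical escort participant strings from two different model runs.
participants_run_a = pd.Series(["1_2", "3", "1_2"]).astype("category")
participants_run_b = pd.Series(["4", "2_5", "4"]).astype("category")

# The categories differ, so the two categorical dtypes compare unequal.
participants_same = participants_run_a.dtype == participants_run_b.dtype

# A genuinely categorical field draws from the same small set each run.
mode_run_a = pd.Series(["walk", "drive", "walk"]).astype("category")
mode_run_b = pd.Series(["drive", "walk"]).astype("category")

# Same set of categories, so the dtypes compare equal.
modes_same = mode_run_a.dtype == mode_run_b.dtype

print(participants_same, modes_same)
```

Anything that treats the dtype as part of a cache key would see the participants column as a new type on each run, which is consistent with the recompilation described above.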

A fix will require not converting these fields to categorical data types.
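
One possible shape for such a fix, sketched under assumptions: convert only columns that appear on an explicit list with a fixed set of categories, and leave every other string column alone. The names `KNOWN_CATEGORIES` and `convert_known_categoricals` and the column names are hypothetical, not part of ActivitySim or sharrow, and this is not necessarily the approach the project will take.

```python
import pandas as pd

# Hypothetical allow-list: column name -> fixed, complete set of categories.
KNOWN_CATEGORIES = {
    "tour_mode": ["drive", "transit", "walk"],
}


def convert_known_categoricals(df, known=KNOWN_CATEGORIES):
    """Convert only allow-listed columns; leave other string columns as-is."""
    out = df.copy()
    for col, cats in known.items():
        if col in out.columns:
            # A fixed category list gives the same dtype regardless of the data.
            out[col] = out[col].astype(pd.CategoricalDtype(categories=cats))
    return out


tours = pd.DataFrame(
    {
        "tour_mode": ["walk", "drive"],
        "escort_participants": ["1_2", "3"],  # not categorical, left untouched
    }
)
converted = convert_known_categoricals(tours)
print(converted.dtypes)
```

Because the categories are declared up front rather than inferred from the data, the resulting dtype is the same on every run, and fields like escort tour participants never become categorical at all.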

This is quite possibly the problem in #756.

Labels: Bug
