
ENH: Support reading value labels for Stata formats 108 (Stata 6) and earlier #58154

Closed
1 of 3 tasks
cmjcharlton opened this issue Apr 5, 2024 · 0 comments · Fixed by #58155
Labels
Enhancement IO Stata read_stata, to_stata

Comments

@cmjcharlton
Contributor

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Currently, pandas supports reading value labels only for data files saved in format 111 (Stata 7 SE) and later. It would be nice if this could be extended to all supported format versions.

Feature Description

This could be implemented by extending the function _read_value_labels in pandas/io/stata.py.

Value labels in the 108 format use the same structure as later versions, except that label names are restricted to 8 characters, plus a null terminator [1].
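As a rough illustration of the 108 layout, here is a minimal standalone sketch that parses one value label record from a bytes buffer. The field widths (a 4-byte table length, a 9-byte label name, 3 padding bytes, then counts, offsets, values and a null-terminated text table) are assumptions drawn from my reading of the format description and should be checked against dta_108.txt. The function name `parse_value_label_108` is hypothetical and is not part of pandas.

```python
import struct


def parse_value_label_108(buf, byteorder="<", name_len=9):
    """Parse one value label record (assumed 108-style layout).

    Returns (label_name, {value: text}, bytes_consumed).
    """
    pos = 0
    # Assumed: 4-byte length of the table that follows the name (unused here).
    (_table_len,) = struct.unpack_from(f"{byteorder}i", buf, pos)
    pos += 4
    # Assumed: 8 characters plus a null terminator, then 3 padding bytes.
    name = buf[pos:pos + name_len].split(b"\0", 1)[0].decode("latin-1")
    pos += name_len + 3
    # Number of entries and length of the text table.
    n, txtlen = struct.unpack_from(f"{byteorder}ii", buf, pos)
    pos += 8
    offsets = struct.unpack_from(f"{byteorder}{n}i", buf, pos)
    pos += 4 * n
    values = struct.unpack_from(f"{byteorder}{n}i", buf, pos)
    pos += 4 * n
    txt = buf[pos:pos + txtlen]
    pos += txtlen

    labels = {}
    for value, off in zip(values, offsets):
        end = txt.find(b"\0", off)
        if end == -1:  # tolerate a missing terminator on the last string
            end = len(txt)
        labels[value] = txt[off:end].decode("latin-1")
    return name, labels, pos
```

Under these assumptions, the only difference from the later formats would be the `name_len` argument (9 here rather than the longer name field used by newer versions), which is why extending the existing reader looks feasible.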

Value labels prior to the 108 format used a simple structure for each label, containing a list of codes followed by a list of 8-character strings corresponding to each code [2].
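The older structure could be read along the following lines. This is a sketch only: the 2-byte entry count, 9-byte label name, single padding byte and 2-byte codes are assumptions on my part and should be verified against dta_105.txt. The name `parse_value_label_old` is hypothetical.

```python
import struct


def parse_value_label_old(buf, byteorder="<"):
    """Parse one value label record (assumed pre-108 layout).

    Returns (label_name, {code: text}, bytes_consumed).
    """
    pos = 0
    # Assumed: 2-byte count of entries in this label.
    (n,) = struct.unpack_from(f"{byteorder}H", buf, pos)
    pos += 2
    # Assumed: 9-byte null-terminated name followed by 1 padding byte.
    name = buf[pos:pos + 9].split(b"\0", 1)[0].decode("latin-1")
    pos += 9 + 1
    # List of codes (assumed 2-byte signed integers).
    codes = struct.unpack_from(f"{byteorder}{n}h", buf, pos)
    pos += 2 * n
    # One fixed-width 8-byte string per code, in the same order.
    labels = {}
    for code in codes:
        raw = buf[pos:pos + 8]
        labels[code] = raw.split(b"\0", 1)[0].decode("latin-1")
        pos += 8
    return name, labels, pos
```

Because every string is a fixed 8 bytes, no offset table is needed; the i-th string simply belongs to the i-th code.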

References:
[1] Description of the 108 .dta format, section 5.6 Value Labels (dta_108.txt)
[2] Description of the 105 .dta format, section 5.6 Value Labels (dta_105.txt)

Alternative Solutions

Currently the only way to import these labels is to open the file in another piece of software that does support reading them, and then re-save it in a more recent format for which pandas has value label support.

Additional Context

No response

@cmjcharlton added the Enhancement and Needs Triage labels on Apr 5, 2024
@rhshadrach added the IO Stata and Needs Discussion labels, and removed the Needs Triage and Needs Discussion labels, on Apr 7, 2024
2 participants