BUG: read_xml iterparse doesn't handle multiple toplevel elements with lxml parser #47422
Comments
I cannot reproduce this issue with your exact code. What version of lxml are you using?
    import lxml
    print(lxml.__version__)
The pandas minimum recommended version for optional dependency for
4.9.0
Edit: this Dockerfile reproduces the issue for me.
Thanks. Oddly, I can now reproduce the issue with 4.5.2; I may have been using a development version of the code. Thinking about it more, this may not be a pandas issue but an lxml issue, especially since we follow the same setup as the lxml docs. I will raise an issue on their mailing list with an iterparse reprex. It may be the unusual case of a comment before the root element. They may even provide guidance, since the iterparse / modifying the tree docs include a note on the
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Note this feature is not yet released and planned for 1.5.0: #45724 (comment)
Expected Behavior
No exception is thrown and the document is parsed successfully. Perhaps don't try to delete if the element has no parent?
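The suggestion above could look roughly like the following. This is only a sketch of the guard, not the pandas implementation: `free_element`, `parse_rows`, and the sample document are hypothetical names made up for illustration, and it assumes lxml is installed. The idea is to skip deleting preceding siblings when `getparent()` returns `None`, which is the case for the root element even when a comment precedes it.

```python
from io import BytesIO

from lxml import etree

# Hypothetical sample document with a comment before the root element.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<!-- a comment before the root element -->
<data>
  <row><a>1</a><b>x</b></row>
  <row><a>2</a><b>y</b></row>
</data>"""


def free_element(elem):
    """Release memory held by an already processed element."""
    elem.clear()
    parent = elem.getparent()
    # The root has no parent, but a top-level comment can still show up
    # as its previous sibling, so only delete when a parent exists.
    if parent is not None:
        while elem.getprevious() is not None:
            del parent[0]


def parse_rows(source, row_tag):
    """Collect the child tag/text pairs of every `row_tag` element."""
    rows = []
    for _event, elem in etree.iterparse(source, events=("end",)):
        is_root = elem.getparent() is None
        if elem.tag == row_tag:
            rows.append({child.tag: child.text for child in elem})
        if elem.tag == row_tag or is_root:
            free_element(elem)
    return rows


rows = parse_rows(BytesIO(SAMPLE), "row")
print(rows)
```

Checking the parent once before the loop, rather than inside it, also avoids looping forever on an element that has a previous sibling but no parent.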
Installed Versions
5465f54