
Class HTMLParser may not be initialized properly as method ParserBase.__init__ is not called from its __init__ method #95813


Closed
MyreMylar opened this issue Aug 9, 2022 · 0 comments · Fixed by #95874
Labels
type-bug An unexpected behavior, bug, or error

@MyreMylar

Bug report

Hello,

I receive the error in the title from LGTM.com when subclassing the HTMLParser class. Should HTMLParser technically be calling super().__init__(), or something similar, in its own initialiser? I know adding this call won't change any functionality, since the only thing the ParserBase initialiser does is raise a RuntimeError when ParserBase itself is instantiated directly, but I guess it would stop annoying alerts like this.
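For context, the LGTM check is about cooperative inheritance: an `__init__` that never calls `super().__init__()` silently skips every initialiser that comes after it in the MRO. A minimal sketch of the effect, using hypothetical classes (`Base`, `NonCooperative`, `Cooperative`, `LoggingMixin` are illustrative names, not standard-library classes):

```python
# Hypothetical classes illustrating what LGTM's "__init__ not called" check
# is about; none of these names come from the standard library.

class Base:
    def __init__(self):
        self.base_initialised = True


class NonCooperative(Base):
    # Mirrors an __init__ that sets its own state but never calls
    # super().__init__(), like HTMLParser.__init__ quoted below.
    def __init__(self):
        self.own_state = "set"


class Cooperative(Base):
    # Mirrors the suggested change: chain to the next class in the MRO first.
    def __init__(self):
        super().__init__()
        self.own_state = "set"


class LoggingMixin(Base):
    # A cooperative mixin that a user might combine with a parser class.
    def __init__(self):
        super().__init__()
        self.log = []


class A(NonCooperative, LoggingMixin):   # MRO: A, NonCooperative, LoggingMixin, Base
    pass


class B(Cooperative, LoggingMixin):      # MRO: B, Cooperative, LoggingMixin, Base
    pass


a = A()
b = B()
print(hasattr(a, "log"), hasattr(a, "base_initialised"))  # False False
print(hasattr(b, "log"), hasattr(b, "base_initialised"))  # True True
```

With `A`, the mixin's `__init__` never runs because the chain stops at `NonCooperative`; with `B`, every class in the MRO is initialised exactly once.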

Relevant code is here:

```python
# Lib/html/parser.py (excerpt)
import _markupbase


class HTMLParser(_markupbase.ParserBase):
    """Find tags and other markup and call handler functions.

    Usage:
        p = HTMLParser()
        p.feed(data)
        ...
        p.close()

    Start tags are handled by calling self.handle_starttag() or
    self.handle_startendtag(); end tags by self.handle_endtag(). The
    data between tags is passed from the parser to the derived class
    by calling self.handle_data() with the data as argument (the data
    may be split up in arbitrary chunks). If convert_charrefs is
    True the character references are converted automatically to the
    corresponding Unicode character (and self.handle_data() is no
    longer split in chunks), otherwise they are passed by calling
    self.handle_entityref() or self.handle_charref() with the string
    containing respectively the named or numeric reference as the
    argument.
    """

    CDATA_CONTENT_ELEMENTS = ("script", "style")

    def __init__(self, *, convert_charrefs=True):
        """Initialize and reset this instance.

        If convert_charrefs is True (the default), all character references
        are automatically converted to the corresponding Unicode characters.
        """
        # Note: no super().__init__() call here, which is what LGTM flags.
        self.convert_charrefs = convert_charrefs
        self.reset()
```
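For reference, the docstring above describes the subclassing protocol: override the `handle_*` methods, then call `feed()` and `close()`. A minimal sketch of such a subclass, assuming a hypothetical `LinkCollector` class that records `href` values; note that the subclass itself chains to `HTMLParser.__init__`, since without that call `reset()` never runs and `feed()` fails on the missing `rawdata` attribute:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical subclass that records href values of <a> tags."""

    def __init__(self):
        # Chain to HTMLParser.__init__, which sets convert_charrefs and
        # calls reset(); skipping it leaves feed() without its buffer.
        super().__init__(convert_charrefs=True)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


parser = LinkCollector()
# Data may be fed in arbitrary chunks, as the docstring notes.
parser.feed('<p>See <a href="https://docs.python.org/">the docs</a>')
parser.feed(' and <a href="/faq">the FAQ</a>.</p>')
parser.close()
print(parser.links)  # ['https://docs.python.org/', '/faq']
```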

And here:

```python
# Lib/_markupbase.py (excerpt)
class ParserBase:
    """Parser base class which provides some common support methods used
    by the SGML/HTML and XHTML parsers."""

    def __init__(self):
        # The only work done here: refuse direct instantiation of the base.
        if self.__class__ is ParserBase:
            raise RuntimeError(
                "_markupbase.ParserBase must be subclassed")
```
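Because the guard compares `self.__class__` against the base class itself, calling `super().__init__()` from a subclass initialiser can never trip it. A minimal self-contained sketch of that reasoning, using hypothetical stand-ins (`GuardedBase` mirrors the `ParserBase` guard, `Parser` mirrors an `HTMLParser.__init__` that chains up); it does not import the private `_markupbase` module:

```python
class GuardedBase:
    """Hypothetical stand-in for the _markupbase.ParserBase guard."""

    def __init__(self):
        # Refuse direct instantiation, but let subclasses through.
        if self.__class__ is GuardedBase:
            raise RuntimeError("GuardedBase must be subclassed")


class Parser(GuardedBase):
    """Hypothetical stand-in for an HTMLParser that calls super().__init__()."""

    def __init__(self, *, convert_charrefs=True):
        # Inside this call self.__class__ is Parser (or a subclass of it),
        # never GuardedBase, so chaining up cannot trigger the guard.
        super().__init__()
        self.convert_charrefs = convert_charrefs


try:
    GuardedBase()
except RuntimeError as exc:
    print("direct:", exc)  # direct: GuardedBase must be subclassed

p = Parser()
print(p.convert_charrefs)  # True
```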

@MyreMylar MyreMylar added the type-bug An unexpected behavior, bug, or error label Aug 9, 2022
corona10 added a commit to corona10/cpython that referenced this issue Aug 11, 2022
@ezio-melotti ezio-melotti self-assigned this Aug 11, 2022
corona10 added a commit to corona10/cpython that referenced this issue Aug 11, 2022
corona10 added a commit to corona10/cpython that referenced this issue Aug 17, 2022
corona10 added a commit to corona10/cpython that referenced this issue Aug 17, 2022
ezio-melotti pushed a commit that referenced this issue Aug 18, 2022
* gh-95813: Improve HTMLParser from the view of inheritance

* gh-95813: Add unittest

* Address code review
tiran pushed a commit to tiran/cpython that referenced this issue Aug 19, 2022
…on#95874)

* pythongh-95813: Improve HTMLParser from the view of inheritance

* pythongh-95813: Add unittest

* Address code review
Projects
Status: Done
2 participants