[3.14] gh-131535: Fix stale example in html.parser docs, make examples doctests (GH-131551) #133589

Merged
merged 2 commits into from
May 7, 2025
51 changes: 37 additions & 14 deletions Doc/library/html.parser.rst
@@ -43,7 +43,9 @@ Example HTML Parser Application

As a basic example, below is a simple HTML parser that uses the
:class:`HTMLParser` class to print out start tags, end tags, and data
-as they are encountered::
+as they are encountered:
+
+.. testcode::

from html.parser import HTMLParser

@@ -63,7 +65,7 @@ as they are encountered::

The output will then be:

-.. code-block:: none
+.. testoutput::

Encountered a start tag: html
Encountered a start tag: head
@@ -230,7 +232,9 @@ Examples
--------

The following class implements a parser that will be used to illustrate more
-examples::
+examples:
+
+.. testcode::

from html.parser import HTMLParser
from html.entities import name2codepoint
@@ -266,13 +270,17 @@ examples::

parser = MyHTMLParser()

-Parsing a doctype::
+Parsing a doctype:
+
+.. doctest::

>>> parser.feed('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
... '"http://www.w3.org/TR/html4/strict.dtd">')
Decl : DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"

-Parsing an element with a few attributes and a title::
+Parsing an element with a few attributes and a title:
+
+.. doctest::

>>> parser.feed('<img src="python-logo.png" alt="The Python logo">')
Start tag: img
@@ -285,7 +293,9 @@ Parsing an element with a few attributes and a title::
End tag : h1

The content of ``script`` and ``style`` elements is returned as is, without
-further parsing::
+further parsing:
+
+.. doctest::

>>> parser.feed('<style type="text/css">#python { color: green }</style>')
Start tag: style
@@ -300,35 +310,48 @@ further parsing::
Data : alert("<strong>hello!</strong>");
End tag : script

-Parsing comments::
+Parsing comments:
+
+.. doctest::

->>> parser.feed('<!-- a comment -->'
+>>> parser.feed('<!--a comment-->'
... '<!--[if IE 9]>IE-specific content<![endif]-->')
-Comment :  a comment
+Comment : a comment
Comment : [if IE 9]>IE-specific content<![endif]

Parsing named and numeric character references and converting them to the
-correct char (note: these 3 references are all equivalent to ``'>'``)::
+correct char (note: these 3 references are all equivalent to ``'>'``):
+
+.. doctest::

+>>> parser = MyHTMLParser()
+>>> parser.feed('&gt;&#62;&#x3E;')
+Data : >>>
+
+>>> parser = MyHTMLParser(convert_charrefs=False)
>>> parser.feed('&gt;&#62;&#x3E;')
Named ent: >
Num ent : >
Num ent : >
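
The two outputs above can be cross-checked without relying on printed text. The sketch below records handler calls in a list instead of printing them. ``RecordingParser``, its ``events`` attribute, and the ``record`` function are hypothetical names introduced for illustration; only ``HTMLParser``, its *convert_charrefs* argument, and the three handler methods come from the standard library.

```python
from html.parser import HTMLParser


class RecordingParser(HTMLParser):
    """Hypothetical helper: records handler calls instead of printing them."""

    def __init__(self, *, convert_charrefs=True):
        super().__init__(convert_charrefs=convert_charrefs)
        self.events = []  # (handler name, argument) pairs, in call order

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        # Only called when convert_charrefs is False.
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        # Only called when convert_charrefs is False.
        self.events.append(("charref", name))


def record(markup, *, convert_charrefs):
    """Feed markup to a fresh parser and return the recorded events."""
    parser = RecordingParser(convert_charrefs=convert_charrefs)
    parser.feed(markup)
    parser.close()
    return parser.events


converted = record("&gt;&#62;&#x3E;", convert_charrefs=True)
raw = record("&gt;&#62;&#x3E;", convert_charrefs=False)
```

With *convert_charrefs* enabled, the three references reach ``handle_data`` as already-decoded text. With it disabled, each one goes to ``handle_entityref`` or ``handle_charref`` under its undecoded name.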

Feeding incomplete chunks to :meth:`~HTMLParser.feed` works, but
:meth:`~HTMLParser.handle_data` might be called more than once
-(unless *convert_charrefs* is set to ``True``)::
+(unless *convert_charrefs* is set to ``True``):

->>> for chunk in ['<sp', 'an>buff', 'ered ', 'text</s', 'pan>']:
+.. doctest::
+
+>>> for chunk in ['<sp', 'an>buff', 'ered', ' text</s', 'pan>']:
... parser.feed(chunk)
...
Start tag: span
Data : buff
Data : ered
-Data : text
+Data :  text
End tag : span
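
Because ``handle_data`` can receive one logical run of text in several pieces, a handler that needs the whole text generally has to buffer it. The following is a minimal sketch, assuming a hypothetical ``TextCollector`` class (not part of the documented examples) that joins the pieces when the end tag arrives. It passes ``convert_charrefs=False`` to match the parser used in the example above.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: joins the data pieces seen inside each element."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self._pieces = []  # data received since the last start tag
        self.texts = []    # (tag, joined text) pairs, one per end tag

    def handle_starttag(self, tag, attrs):
        # Start a fresh buffer for the new element.
        self._pieces = []

    def handle_data(self, data):
        # May be called several times for one run of text.
        self._pieces.append(data)

    def handle_endtag(self, tag):
        self.texts.append((tag, "".join(self._pieces)))
        self._pieces = []


collector = TextCollector()
for chunk in ['<sp', 'an>buff', 'ered', ' text</s', 'pan>']:
    collector.feed(chunk)
collector.close()
```

The sketch resets its buffer at every start tag, so it only suits flat markup like this example; nested elements would need a stack of buffers.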

-Parsing invalid HTML (e.g. unquoted attributes) also works::
+Parsing invalid HTML (e.g. unquoted attributes) also works:
+
+.. doctest::

>>> parser.feed('<p><a class=link href=#main>tag soup</p ></a>')
Start tag: p