Commit 864c3d9

Add benchmark for Docutils (#216)
This adds a benchmark of Docutils as an application. I thought a reasonable test load was Docutils' own docs (takes ~4.5-5s on my computer). I haven't submitted a benchmark before---I don't know the best way of storing the input data, so for speed I copied the documentation into git here (the docs are public domain).
1 parent 2082c53 commit 864c3d9

81 files changed: +37435 -0 lines changed

doc/benchmarks.rst

+9
Original file line number | Diff line number | Diff line change
@@ -174,6 +174,15 @@ Pseudo-code of the benchmark::
174174
See the `Dulwich project <https://www.dulwich.io/>`_.
175175

176176

177+
178+
docutils
179+
--------
180+
181+
Use Docutils_ to convert Docutils' documentation to HTML.
182+
Representative of building a medium-sized documentation set.
183+
184+
.. _Docutils: https://docutils.sourceforge.io/
185+
177186
fannkuch
178187
--------
179188

pyperformance/data-files/benchmarks/MANIFEST

+1
@@ -18,6 +18,7 @@ deepcopy <local>
1818
deltablue <local>
1919
django_template <local>
2020
dulwich_log <local>
21+
docutils <local>
2122
fannkuch <local>
2223
float <local>
2324
genshi <local>
@@ -0,0 +1,352 @@
1+
========================
2+
The Docutils Publisher
3+
========================
4+
5+
:Author: David Goodger
6+
7+
:Date: $Date$
8+
:Revision: $Revision$
9+
:Copyright: This document has been placed in the public domain.
10+
11+
.. contents::
12+
13+
14+
The ``docutils.core.Publisher`` class is the core of Docutils,
15+
managing all the processing and relationships between components. See
16+
`PEP 258`_ for an overview of Docutils components.
17+
18+
The ``docutils.core.publish_*`` convenience functions are the normal
19+
entry points for using Docutils as a library.
20+
21+
See `Inside A Docutils Command-Line Front-End Tool`_ for an overview
22+
of a typical Docutils front-end tool, including how the Publisher
23+
class is used.
24+
25+
.. _PEP 258: ../peps/pep-0258.html
26+
.. _Inside A Docutils Command-Line Front-End Tool: ../howto/cmdline-tool.html
27+
28+
29+
Publisher Convenience Functions
30+
===============================
31+
32+
Each of these functions sets up a ``docutils.core.Publisher`` object,
33+
then calls its ``publish`` method. ``docutils.core.Publisher.publish``
34+
handles everything else. There are several convenience functions in
35+
the ``docutils.core`` module:
36+
37+
:_`publish_cmdline()`: for command-line front-end tools, like
38+
``rst2html.py``. There are several examples in the ``tools/``
39+
directory. A detailed analysis of one such tool is in `Inside A
40+
Docutils Command-Line Front-End Tool`_.
41+
42+
:_`publish_file()`: for programmatic use with file-like I/O. In
43+
addition to writing the encoded output to a file, also returns the
44+
encoded output as a string.
45+
46+
:_`publish_string()`: for programmatic use with string I/O. Returns
47+
the encoded output as a string.
48+
49+
:_`publish_parts()`: for programmatic use with string input; returns a
50+
dictionary of document parts. Dictionary keys are the names of
51+
parts, and values are Unicode strings; encoding is up to the client.
52+
Useful when only portions of the processed document are desired.
53+
See `publish_parts() Details`_ below.
54+
55+
There are usage examples in the `docutils/examples.py`_ module.
56+
57+
:_`publish_doctree()`: for programmatic use with string input; returns a
58+
Docutils document tree data structure (doctree). The doctree can be
59+
modified, pickled & unpickled, etc., and then reprocessed with
60+
`publish_from_doctree()`_.
61+
62+
:_`publish_from_doctree()`: for programmatic use to render from an
63+
existing document tree data structure (doctree); returns the encoded
64+
output as a string.
65+
66+
:_`publish_programmatically()`: for custom programmatic use. This
67+
function implements common code and is used by ``publish_file``,
68+
``publish_string``, and ``publish_parts``. It returns a 2-tuple:
69+
the encoded string output and the Publisher object.
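
The doctree round trip described for ``publish_doctree()`` and ``publish_from_doctree()`` can be sketched as follows. This is a minimal illustration, not part of the original document: it assumes Docutils is installed, the sample source text is made up, and the ``writer_name`` keyword and the ``'unicode'`` pseudo-encoding are used as found in common Docutils releases.

```python
from docutils.core import publish_doctree, publish_from_doctree

# Hypothetical sample input, used only for illustration.
SOURCE = """\
Sample Title
============

A paragraph with *emphasis*.
"""

# Parse the reStructuredText source into a document tree (doctree).
doctree = publish_doctree(SOURCE)

# The doctree could be inspected, modified, or pickled at this point.

# Render the existing doctree. The "unicode" output encoding asks for
# a str result instead of encoded bytes.
html = publish_from_doctree(
    doctree,
    writer_name="html",
    settings_overrides={"output_encoding": "unicode"},
)
```

Splitting parsing from rendering this way lets one parsed document feed several writers without being parsed again.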
70+
71+
.. _Inside A Docutils Command-Line Front-End Tool: ../howto/cmdline-tool.html
72+
.. _docutils/examples.py: ../../docutils/examples.py
73+
74+
75+
Configuration
76+
-------------
77+
78+
To pass application-specific setting defaults to the Publisher
79+
convenience functions, use the ``settings_overrides`` parameter. Pass
80+
a dictionary of setting names & values, like this::
81+
82+
overrides = {'input_encoding': 'ascii',
83+
'output_encoding': 'latin-1'}
84+
output = publish_string(..., settings_overrides=overrides)
85+
86+
Settings from command-line options override configuration file
87+
settings, which in turn override application defaults. For details, see
88+
`Docutils Runtime Settings`_. See `Docutils Configuration`_ for
89+
details about individual settings.
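
A complete version of the ``settings_overrides`` call might look like the sketch below. It assumes Docutils is installed; the sample source and the choice of the HTML writer are illustrative assumptions, not taken from the original text.

```python
from docutils.core import publish_string

# Hypothetical ASCII-only input, passed as bytes so that the
# input_encoding setting is actually used for decoding.
SOURCE = b"A *short* paragraph.\n"

# Application-specific setting defaults, as in the example above.
overrides = {"input_encoding": "ascii",
             "output_encoding": "latin-1"}

# With a real output encoding (not "unicode"), the result is encoded bytes.
output = publish_string(SOURCE, writer_name="html",
                        settings_overrides=overrides)
```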
90+
91+
.. _Docutils Runtime Settings: ./runtime-settings.html
92+
.. _Docutils Configuration: ../user/config.html
93+
94+
95+
Encodings
96+
---------
97+
98+
The default output encoding of Docutils is UTF-8.
99+
Docutils may introduce some non-ASCII text if you use
100+
`auto-symbol footnotes`_ or the `"contents" directive`_.
101+
102+
.. _auto-symbol footnotes:
103+
../ref/rst/restructuredtext.html#auto-symbol-footnotes
104+
.. _"contents" directive:
105+
../ref/rst/directives.html#table-of-contents
106+
107+
108+
``publish_parts()`` Details
109+
===========================
110+
111+
The ``docutils.core.publish_parts()`` convenience function returns a
112+
dictionary of document parts. Dictionary keys are the names of parts,
113+
and values are Unicode strings.
114+
115+
Each Writer component may publish a different set of document parts,
116+
described below. Not all writers implement all parts.
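
As a rough illustration of working with the returned dictionary, the sketch below picks individual parts out of an HTML rendering. It assumes Docutils is installed; the sample source is hypothetical, and the part names used are those documented in the sections that follow.

```python
from docutils.core import publish_parts

# Hypothetical sample input, used only for illustration.
SOURCE = """\
Sample Title
============

Body text with *emphasis*.
"""

# Returns a dictionary mapping part names to Unicode strings.
parts = publish_parts(SOURCE, writer_name="html")

title = parts["title"]        # title text, without the enclosing <h1> tags
fragment = parts["fragment"]  # body without title, docinfo, header, footer
```

This is convenient when the output is embedded in a page that supplies its own ``<head>`` and layout, since only the needed pieces are used.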
117+
118+
119+
Parts Provided By All Writers
120+
-----------------------------
121+
122+
_`encoding`
123+
The output encoding setting.
124+
125+
_`version`
126+
The version of Docutils used.
127+
128+
_`whole`
129+
``parts['whole']`` contains the entire formatted document.
130+
131+
132+
Parts Provided By the HTML Writers
133+
----------------------------------
134+
135+
HTML4 Writer
136+
````````````
137+
138+
_`body`
139+
``parts['body']`` is equivalent to parts['fragment_']. It is
140+
*not* equivalent to parts['html_body_'].
141+
142+
_`body_prefix`
143+
``parts['body_prefix']`` contains::
144+
145+
</head>
146+
<body>
147+
<div class="document" ...>
148+
149+
and, if applicable::
150+
151+
<div class="header">
152+
...
153+
</div>
154+
155+
_`body_pre_docinfo`
156+
``parts['body_pre_docinfo']`` contains (as applicable)::
157+
158+
<h1 class="title">...</h1>
159+
<h2 class="subtitle" id="...">...</h2>
160+
161+
_`body_suffix`
162+
``parts['body_suffix']`` contains::
163+
164+
</div>
165+
166+
(the end-tag for ``<div class="document">``), the footer division
167+
if applicable::
168+
169+
<div class="footer">
170+
...
171+
</div>
172+
173+
and::
174+
175+
</body>
176+
</html>
177+
178+
_`docinfo`
179+
``parts['docinfo']`` contains the document bibliographic data, the
180+
docinfo field list rendered as a table.
181+
182+
_`footer`
183+
``parts['footer']`` contains the document footer content, meant to
184+
appear at the bottom of a web page, or repeated at the bottom of
185+
every printed page.
186+
187+
_`fragment`
188+
``parts['fragment']`` contains the document body (*not* the HTML
189+
``<body>``). In other words, it contains the entire document,
190+
less the document title, subtitle, docinfo, header, and footer.
191+
192+
_`head`
193+
``parts['head']`` contains ``<meta ... />`` tags and the document
194+
``<title>...</title>``.
195+
196+
_`head_prefix`
197+
``parts['head_prefix']`` contains the XML declaration, the DOCTYPE
198+
declaration, the ``<html ...>`` start tag and the ``<head>`` start
199+
tag.
200+
201+
_`header`
202+
``parts['header']`` contains the document header content, meant to
203+
appear at the top of a web page, or repeated at the top of every
204+
printed page.
205+
206+
_`html_body`
207+
``parts['html_body']`` contains the HTML ``<body>`` content, less
208+
the ``<body>`` and ``</body>`` tags themselves.
209+
210+
_`html_head`
211+
``parts['html_head']`` contains the HTML ``<head>`` content, less
212+
the stylesheet link and the ``<head>`` and ``</head>`` tags
213+
themselves. Since ``publish_parts`` returns Unicode strings and
214+
does not know about the output encoding, the "Content-Type" meta
215+
tag's "charset" value is left unresolved, as "%s"::
216+
217+
<meta http-equiv="Content-Type" content="text/html; charset=%s" />
218+
219+
The interpolation should be done by client code.
220+
221+
_`html_prolog`
222+
``parts['html_prolog']`` contains the XML declaration and the
223+
doctype declaration. The XML declaration's "encoding" attribute's
224+
value is left unresolved, as "%s"::
225+
226+
<?xml version="1.0" encoding="%s" ?>
227+
228+
The interpolation should be done by client code.
229+
230+
_`html_subtitle`
231+
``parts['html_subtitle']`` contains the document subtitle,
232+
including the enclosing ``<h2 class="subtitle">`` & ``</h2>``
233+
tags.
234+
235+
_`html_title`
236+
``parts['html_title']`` contains the document title, including the
237+
enclosing ``<h1 class="title">`` & ``</h1>`` tags.
238+
239+
_`meta`
240+
``parts['meta']`` contains all ``<meta ... />`` tags.
241+
242+
_`stylesheet`
243+
``parts['stylesheet']`` contains the embedded stylesheet or
244+
stylesheet link.
245+
246+
_`subtitle`
247+
``parts['subtitle']`` contains the document subtitle text and any
248+
inline markup. It does not include the enclosing ``<h2>`` &
249+
``</h2>`` tags.
250+
251+
_`title`
252+
``parts['title']`` contains the document title text and any inline
253+
markup. It does not include the enclosing ``<h1>`` & ``</h1>``
254+
tags.
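
The client-side interpolation mentioned for ``html_head`` and ``html_prolog`` could look like the sketch below. It needs only the standard library; the two template strings are copied from the part descriptions above, while the helper name ``resolve_encoding`` is hypothetical.

```python
# Template strings as shown in the html_head and html_prolog descriptions.
HTML_HEAD_META = '<meta http-equiv="Content-Type" content="text/html; charset=%s" />'
HTML_PROLOG_XML = '<?xml version="1.0" encoding="%s" ?>'


def resolve_encoding(part_text: str, encoding: str) -> str:
    """Fill the first unresolved "%s" placeholder with the chosen encoding.

    Hypothetical helper: str.replace is used instead of the % operator so
    that any other percent signs in the part text are left untouched.
    """
    return part_text.replace("%s", encoding, 1)


meta_tag = resolve_encoding(HTML_HEAD_META, "utf-8")
xml_decl = resolve_encoding(HTML_PROLOG_XML, "utf-8")
```

The encoding passed in should match the one the client uses when it finally encodes the assembled page.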
255+
256+
257+
PEP/HTML Writer
258+
```````````````
259+
260+
The PEP/HTML writer provides the same parts as the `HTML4 writer`_,
261+
plus the following:
262+
263+
_`pepnum`
264+
``parts['pepnum']`` contains
265+
266+
267+
S5/HTML Writer
268+
``````````````
269+
270+
The S5/HTML writer provides the same parts as the `HTML4 writer`_.
271+
272+
273+
HTML5 Writer
274+
````````````
275+
276+
The HTML5 writer provides the same parts as the `HTML4 writer`_.
277+
However, it uses semantic HTML5 elements for the document, header and
278+
footer.
279+
280+
281+
Parts Provided by the LaTeX2e Writer
282+
------------------------------------
283+
284+
See the template files for examples of how these parts can be combined
285+
into a valid LaTeX document.
286+
287+
abstract
288+
``parts['abstract']`` contains the formatted content of the
289+
'abstract' docinfo field.
290+
291+
body
292+
``parts['body']`` contains the document's content. In other words, it
293+
contains the entire document, except the document title, subtitle, and
294+
docinfo.
295+
296+
This part can be included into another LaTeX document body using the
297+
``\input{}`` command.
298+
299+
body_pre_docinfo
300+
``parts['body_pre_docinfo']`` contains the ``\maketitle`` command.
301+
302+
dedication
303+
``parts['dedication']`` contains the formatted content of the
304+
'dedication' docinfo field.
305+
306+
docinfo
307+
``parts['docinfo']`` contains the document bibliographic data, the
308+
docinfo field list rendered as a table.
309+
310+
With ``--use-latex-docinfo``, the 'author', 'organization', 'contact',
311+
'address' and 'date' info is moved to titledata.
312+
313+
'dedication' and 'abstract' are always moved to separate parts.
314+
315+
fallbacks
316+
``parts['fallbacks']`` contains fallback definitions for
317+
Docutils-specific commands and environments.
318+
319+
head_prefix
320+
``parts['head_prefix']`` contains the declaration of
321+
documentclass and document options.
322+
323+
latex_preamble
324+
``parts['latex_preamble']`` contains the argument of the
325+
``--latex-preamble`` option.
326+
327+
pdfsetup
328+
``parts['pdfsetup']`` contains the PDF properties
329+
("hyperref" package setup).
330+
331+
requirements
332+
``parts['requirements']`` contains required packages and setup
333+
before the stylesheet inclusion.
334+
335+
stylesheet
336+
``parts['stylesheet']`` contains the embedded stylesheet(s) or
337+
stylesheet loading command(s).
338+
339+
subtitle
340+
``parts['subtitle']`` contains the document subtitle text and any
341+
inline markup.
342+
343+
title
344+
``parts['title']`` contains the document title text and any inline
345+
markup.
346+
347+
titledata
348+
``parts['titledata']`` contains the combined title data in
349+
``\title``, ``\author``, and ``\date`` macros.
350+
351+
With ``--use-latex-docinfo``, this includes the 'author',
352+
'organization', 'contact', 'address' and 'date' docinfo items.
