
Add context to return from pandas.io.html.read_html #4469

@cancan101

Currently pandas.io.html.read_html returns a list of DataFrames. The list carries no information about where in the source HTML each table was found. For example, a user might want the title or caption of a table.

For example, an SEC 10-Q filing (see: http://apps.shareholder.com/sec/viewerContent.aspx?companyid=GMCR&docid=9277772#A13-6685_110Q_HTM_UNAUDITEDCONSOLIDATEDSTATEMENTSOF_223103) contains many tables, and the user might be interested in only one or a few of them. The tables are not returned in any "standard" order, so without the title given in the HTML document it is difficult to find the desired table in the returned list.

My suggestion is to offer some way of linking the returned table to the context in which it was found.
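
To make the request concrete, here is a minimal sketch of the kind of context that could accompany each table. It is not a proposed pandas API: the names TableContextParser and table_contexts are hypothetical, it uses only the standard library, and it assumes the useful context is either a <caption> element or the nearest heading (h1 to h6) that precedes the table. Real filings often put titles in other elements, so the heading lookup is only a heuristic.

```python
from html.parser import HTMLParser


class TableContextParser(HTMLParser):
    """Hypothetical helper: record a caption and preceding heading per <table>."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.contexts = []       # one dict per <table>, in document order
        self._open_tables = []   # stack of indexes into self.contexts
        self._last_heading = None
        self._capture = None     # "caption" or "heading" while collecting text
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # Nested tables get their own entry, like any other <table>.
            self.contexts.append({"caption": None, "heading": self._last_heading})
            self._open_tables.append(len(self.contexts) - 1)
        elif tag == "caption" and self._open_tables:
            self._capture, self._buffer = "caption", []
        elif tag in self.HEADINGS and not self._open_tables:
            self._capture, self._buffer = "heading", []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Collapse runs of whitespace in whatever text was collected.
        text = " ".join("".join(self._buffer).split())
        if tag == "caption" and self._capture == "caption" and self._open_tables:
            self.contexts[self._open_tables[-1]]["caption"] = text
            self._capture = None
        elif tag in self.HEADINGS and self._capture == "heading":
            self._last_heading = text
            self._capture = None
        elif tag == "table" and self._open_tables:
            self._open_tables.pop()


def table_contexts(html):
    """Return a list of {"caption", "heading"} dicts, one per <table>."""
    parser = TableContextParser()
    parser.feed(html)
    parser.close()
    return parser.contexts


sample = """
<h2>Balance Sheets</h2>
<table><tr><td>Assets</td><td>100</td></tr></table>
<h2>Statements of Operations</h2>
<table><caption>Unaudited, in thousands</caption>
<tr><td>Net sales</td><td>200</td></tr></table>
"""
contexts = table_contexts(sample)
for index, context in enumerate(contexts):
    print(index, context)
```

Pairing these entries by position with the list returned by read_html would be a further assumption: it only works if both passes see the same tables in the same order, which is one reason the context would be more reliable coming from read_html itself.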

Labels: API Design, Enhancement, IO Data, IO HTML
