
parse dtd/entity #12

@daviehh


I'm not sure whether this is within the scope of this package, but the DTD does not currently seem to be parsed correctly; in particular, entity declarations are not applied. For example, take this file as test.xml:

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE note [
<!ENTITY nbsp "&#xA0;">
<!ENTITY writer "Writer: Donald Duck.">
<!ENTITY copyright "Copyright: W3Schools.">
]>

<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
<footer>&writer;&nbsp;&copyright;</footer>
</note>

Using EzXML.jl or a browser, the footer is parsed with its entities expanded: "Writer: Donald Duck. Copyright: W3Schools."

using EzXML
doc = readxml("test.xml")
# Print the text content of the last child element of the root (<footer>)
doc.root |> eachelement |> collect |> last |> nodecontent |> println
doc.node.owner = TextNode("") # skip gc

But with XML.jl, the entity references are left as the verbatim string &writer;&nbsp;&copyright;

using XML
doc2 = read("test.xml", Node)
# doc2[end] is <note>, [end] is <footer>, [1] is its text node
println(doc2[end][end][1].value)
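For illustration, the expansion that EzXML.jl and browsers perform can be approximated in a few lines. This is only a minimal sketch, not XML.jl's or EzXML.jl's implementation: `parse_entities` and `expand_entities` are hypothetical names, only double-quoted internal general entities are handled, and there is no protection against cyclic or exponentially nested definitions.

```julia
# Hypothetical helpers, not part of XML.jl or EzXML.jl.

# Collect `<!ENTITY name "value">` declarations (double-quoted values only;
# parameter entities such as `<!ENTITY % name "...">` are skipped).
function parse_entities(dtd::AbstractString)
    entities = Dict{String,String}()
    for m in eachmatch(r"""<!ENTITY\s+([^\s%]+)\s+"([^"]*)"\s*>""", dtd)
        entities[m[1]] = m[2]
    end
    return entities
end

# Replace `&name;`, `&#NNN;` and `&#xHHH;` references in `text`.
function expand_entities(text::AbstractString, entities::AbstractDict)
    function resolve(ref)
        name = chop(ref; head = 1, tail = 1)   # drop leading '&' and trailing ';'
        if startswith(name, "#x")
            return string(Char(parse(UInt32, name[3:end]; base = 16)))
        elseif startswith(name, "#")
            return string(Char(parse(UInt32, name[2:end])))
        elseif haskey(entities, name)
            # Replacement text may itself contain references (e.g. nbsp -> &#xA0;)
            return expand_entities(entities[name], entities)
        else
            return String(ref)                 # unknown entity: leave untouched
        end
    end
    return replace(text, r"&(#x[0-9A-Fa-f]+|#[0-9]+|[^;&#\s]+);" => resolve)
end

dtd = """
<!ENTITY nbsp "&#xA0;">
<!ENTITY writer "Writer: Donald Duck.">
<!ENTITY copyright "Copyright: W3Schools.">
"""
entities = parse_entities(dtd)
footer = expand_entities("&writer;&nbsp;&copyright;", entities)
println(footer)
```

Here the two sentences in `footer` end up separated by U+00A0 (a non-breaking space), not an ordinary space.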

In addition, glancing over doc2, the DTD itself does not appear to be parsed correctly, e.g. doc2[2] is

Node DTD <!DOCTYPE note [
<!ENTITY nbsp "&#xA0;">

i.e. the parser matches the next ">" instead of the ">" that closes "<!DOCTYPE":

j = findnext(==(UInt8('>')), data, i)
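One possible way to find the right delimiter is to skip over the bracketed internal subset and any quoted literals while scanning. The sketch below is an assumption about how that could look, not a patch against XML.jl: `doctype_end` is a hypothetical helper, and it ignores comments and processing instructions inside the DTD, which can also contain quotes or ">".

```julia
# Hypothetical helper, not part of XML.jl: return the index of the '>' that
# closes a "<!DOCTYPE" declaration starting at byte index `i` in `data`,
# or `nothing` if the declaration is unterminated.
function doctype_end(data::AbstractVector{UInt8}, i::Integer)
    depth = 0            # nesting level of '[' ... ']' (the internal subset)
    quotechar = 0x00     # active quote byte, or 0x00 when outside a literal
    for j in i:lastindex(data)
        b = data[j]
        if quotechar != 0x00
            # Inside a quoted literal: only the matching quote ends it
            b == quotechar && (quotechar = 0x00)
        elseif b == UInt8('"') || b == UInt8('\'')
            quotechar = b
        elseif b == UInt8('[')
            depth += 1
        elseif b == UInt8(']')
            depth -= 1
        elseif b == UInt8('>') && depth == 0
            return j
        end
    end
    return nothing
end

data = Vector{UInt8}("""
<!DOCTYPE note [
<!ENTITY nbsp "&#xA0;">
<!ENTITY writer "Writer: Donald Duck.">
]>
<note/>""")

naive = findnext(==(UInt8('>')), data, 1)   # stops inside the internal subset
j = doctype_end(data, 1)                    # stops at the ">" after "]"
println(String(data[1:j]))
```

With the same input, `naive` points at the ">" ending the first `<!ENTITY ...>` line, which matches the truncated `doc2[2]` shown above.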

Thanks!
