Description
While investigating libwww-perl/WWW-Mechanize#125, I noticed that HTML::TokeParser parses the following HTML incorrectly.
<input type="hidden" name="foo" value>
According to the HTML spec, an attribute name that is not followed by an equals sign (=) uses the empty attribute syntax,
so the value attribute on this input element should be parsed as an empty string.
Empty attribute syntax
Just the attribute name. The value is implicitly the empty string.
Instead of an empty string, we set the value to the attribute's own name, "value".
I looked into it and got as far as finding that get_tag returns a data structure containing the wrong value:
\ [
    [0] "input",
    [1] {
        /       "/",
        name    "foo",
        type    "hidden",
        value   "value"
    },
    [2] [
        [0] "type",
        [1] "name",
        [2] "value",
        [3] "/"
    ],
    [3] "<input type="hidden" name="foo" value />"
]
Unfortunately, I am out of my depth with the actual C code for the parser. But I think we should be returning an empty string for the value attribute, as well as for all other empty attributes.
I wrote the following test to demonstrate the problem.
use strict;
use warnings;

use HTML::TokeParser ();
use Test::More;
use Data::Dumper;

ok(
    !get_tag(q{})->{value},
    'No value when there was no value'
);    # key does not exist

{
    # this fails because value is 'value'
    my $t = get_tag(q{value});
    ok(
        !$t->{value},
        'No value when value attr has no value'
    ) or diag Dumper $t;
}

ok(
    !get_tag(q{value=""})->{value},
    'No value when value attr is an empty string'
);    # value is an empty string

is(
    get_tag(q{value="bar"})->{value},
    'bar',
    'Value is bar'
);    # this obviously works

# Parse a single <input> tag with the given attribute text and
# return its attribute hash (element [1] of the get_tag result).
sub get_tag {
    my $attr = shift;
    return HTML::TokeParser->new(\qq{<input type="hidden" name="foo" $attr />})->get_tag->[1];
}

done_testing;
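
As a possible stopgap for callers, here is a minimal sketch. It assumes the boolean_attribute_value method that HTML::Parser documents is inherited by HTML::TokeParser and accepts an empty string. The helper name get_tag_empty_boolean is hypothetical and only mirrors get_tag from the test above. This does not change the parser's default, which is what this issue is about.

```perl
use strict;
use warnings;

use HTML::TokeParser ();
use Test::More;

# Hypothetical helper (not part of HTML::TokeParser): parse one <input>
# tag and return its attribute hash, asking the parser to report
# attributes written without a value as the empty string.
sub get_tag_empty_boolean {
    my $attr = shift;
    my $html = qq{<input type="hidden" name="foo" $attr />};

    my $p = HTML::TokeParser->new( \$html );

    # Inherited from HTML::Parser. By default the attribute name is
    # reported as the value; here we request q{} instead (assumption:
    # an empty string is honoured rather than treated as "unset").
    $p->boolean_attribute_value(q{});

    return $p->get_tag->[1];
}

is( get_tag_empty_boolean(q{value})->{value},
    q{}, 'attribute without a value is reported as empty' );
is( get_tag_empty_boolean(q{value="bar"})->{value},
    'bar', 'explicit value is kept' );

done_testing;
```

Because the setting is per parser object, every caller would have to opt in, so it works around the behaviour rather than fixing it.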