#!/usr/bin/env perl
use v5.38;
use HTML::TreeBuilder;
my $indent = 3;
my $content = do { local $/; <DATA> };
my $tree = HTML::TreeBuilder->new();
$tree->parse_content($content);
visit($tree);
sub visit($x) {
    my $depth = $x->depth;
    my $in = ' ' x ($indent * $depth);
    foreach my $e ($x->content_list) {
        # element
        if (ref ($e)) {
            say $in . $e->starttag;
            visit($e);
            say $in . $e->endtag;
        }
        # text
        else {
            say $in . $e;
        }
    }
}
__DATA__
5/5/61 Bob & Jerry - Arroyo Lounge, Stanford University, Palo Alto, CA
(Robert Hunter and Jerry Garcia; source: McNally, Jackson research)
5/26/61 Bob & Jerry - Barbara Meier's 16th birthday party, Menlo Park, CA
Follow The Drinking Gourd, John Henry, Santy Anno*, Poor Paddy Works On The Railway
(*included on
Before The Dead
;
birthday doodle for Barbara by Jerry
;
the master tape
)
My problem is that each <br> is output as a start tag followed by an end tag, </br>.
Both cause new lines to be rendered. I was surprised that endtag generated anything at all in the case of tag br (and img).
I avoided using HTML::Element's traverse method because the documentation discourages its use:
[I]f you want to recursively visit every node in the tree, it's almost
always simpler to write a subroutine does just that, than it is to
bundle up the pre- and/or post-order code in callbacks for the
traverse method.
There are no examples given, so the above is what I cooked up.
Am I using starttag and endtag correctly? Should I detect when I'm displaying a tag that doesn't take an end tag and avoid calling endtag? What's the right/best/simplest way to traverse an HTML tree and prettify it?
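One possible answer to the void-tag question, as a minimal sketch rather than a definitive implementation: skip both the recursion and the end tag for any element listed in %HTML::Tagset::emptyElement, the table of tags such as br and img that take no end tag. The sample markup, the explicit $depth parameter, and returning lines instead of printing them directly are assumptions made for illustration. Text nodes are emitted as-is, without entity encoding.

```perl
#!/usr/bin/env perl
use v5.38;
use HTML::TreeBuilder;
use HTML::Tagset ();

my $indent = 3;

# Return the pretty-printed lines for everything below $node.
sub visit ($node, $depth = 0) {
    my @lines;
    my $pad = ' ' x ($indent * $depth);
    for my $child ($node->content_list) {
        if (ref $child) {
            # element: always print the start tag
            push @lines, $pad . $child->starttag;
            # void elements (br, img, hr, ...) have no content and no end tag
            next if $HTML::Tagset::emptyElement{ $child->tag };
            push @lines, visit($child, $depth + 1);
            push @lines, $pad . $child->endtag;
        }
        else {
            # text node: a plain string
            push @lines, $pad . $child;
        }
    }
    return @lines;
}

# Hypothetical sample input, not the data from the question.
my $tree = HTML::TreeBuilder->new;
$tree->parse_content('<p>one<br>two <b>bold</b></p>');
say for visit($tree);
$tree->delete;
```

With this check in place, endtag is only ever called for elements that can actually have a closing tag.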
Update:
As suggested by Stephen Ullrich, I tried to use as_HTML() for formatting:
#!/usr/bin/env perl
use v5.38;
use HTML::TreeBuilder;
say "\%HTML::Element::optionalEndTag= ",
join ', ', keys %HTML::Element::optionalEndTag;
my $content = do { local $/; <DATA> };
my $tree = HTML::TreeBuilder->new();
$tree->parse_content($content);
# don't encode any entities; indent with three spaces;
say $tree->as_HTML('', '   ');
__DATA__
5/5/61 Bob & Jerry - Arroyo Lounge, Stanford University, Palo Alto, CA
(Robert Hunter and Jerry Garcia; source: McNally, Jackson research)
5/26/61 Bob & Jerry - Barbara Meier's 16th birthday party, Menlo Park, CA
Follow The Drinking Gourd, John Henry, Santy Anno*, Poor Paddy Works On The Railway
(*included on
Before The Dead
;
birthday doodle for Barbara by Jerry
;
the master tape
)
Output:
%HTML::Element::optionalEndTag= dt, dd, li, p
5/5/61 Bob & Jerry - Arroyo Lounge, Stanford University, Palo Alto, CA
(Robert Hunter and Jerry Garcia; source: McNally, Jackson research)
5/26/61 Bob & Jerry - Barbara Meier's 16th birthday party, Menlo Park, CA
Follow The Drinking Gourd, John Henry, Santy Anno*, Poor Paddy Works On The Railway
(*included on Before The Dead ; birthday doodle for Barbara by Jerry ; the master tape )
Unfortunately, this isn't "pretty" enough. I don't understand why the indenting stops after the first couple of levels. However, I do note that it doesn't generate </br> or </img>, despite the fact that neither of these tags is mentioned in %HTML::Element::optionalEndTag!
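The missing end tags can be probed with a small experiment, offered as a sketch. It assumes that as_HTML consults %HTML::Tagset::emptyElement, a table separate from %HTML::Element::optionalEndTag, which would explain why br and img never get an end tag. It also passes a hash reference as the third argument to as_HTML, which names the tags whose end tags may be omitted; an empty hash should force them all to be written. The pretty() helper and the sample markup are hypothetical.

```perl
#!/usr/bin/env perl
use v5.38;
use HTML::TreeBuilder;
use HTML::Tagset ();

# Format $html with as_HTML: no entity encoding, three-space indent.
# $optional_end is a hash ref of tags whose end tag may be omitted;
# undef falls back to %HTML::Element::optionalEndTag.
sub pretty ($html, $optional_end = undef) {
    my $tree = HTML::TreeBuilder->new;
    $tree->parse_content($html);
    my $out = $tree->as_HTML('', '   ', $optional_end);
    $tree->delete;
    return $out;
}

say 'optional end tags: ',
    join ', ', sort keys %HTML::Element::optionalEndTag;
say 'br is void:  ', $HTML::Tagset::emptyElement{br}  ? 'yes' : 'no';
say 'img is void: ', $HTML::Tagset::emptyElement{img} ? 'yes' : 'no';

my $sample = '<p>one<br>two<p>three';   # hypothetical input
say pretty($sample);        # default: optional end tags may be omitted
say pretty($sample, {});    # empty hash: no end tag is treated as optional
```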