NAME
    DataFlow::Proc::HTMLFilter - An HTML filtering processor

VERSION
    version 1.112100

SYNOPSIS
        use DataFlow::Proc::HTMLFilter;

        my $filter_html = DataFlow::Proc::HTMLFilter->new(
            search_xpath => '//td',
            result_type  => 'HTML',
        );

        my $filter_value = DataFlow::Proc::HTMLFilter->new(
            search_xpath => '//td',
            result_type  => 'VALUE',
        );

        my $input = <<EOM;
        <table>
          <tr><td>Line 1</td><td>L1, Column 2</td></tr>
          <tr><td>Line 2</td><td>L2, Column 2</td></tr>
        </table>
        EOM

        $filter_html->process( $input );
        # @result == '<td>Line 1</td>', ... '<td>L2, Column 2</td>'

        $filter_value->process( $input );
        # @result == q{Line 1}, ... q{L2, Column 2}

DESCRIPTION
    This processor type provides a filter for HTML content. Each item is
    treated as HTML content and is filtered using HTML::TreeBuilder::XPath.

ATTRIBUTES
  search_xpath
    This attribute is an XPath string used to filter down the HTML content.
    The "search_xpath" attribute is mandatory.

  result_type
    This attribute is a string, but its value must be one of: "HTML",
    "VALUE", "NODE". The default is "HTML".

    *   HTML

        The result will be the HTML content specified by "search_xpath".

    *   VALUE

        The result will be the literal value enclosed by the tag and/or
        attribute specified by "search_xpath".

    *   NODE

        The result will be a list of HTML::Element objects, as returned by
        the "findnodes" method of the HTML::TreeBuilder::XPath class.

    Most people will probably use "HTML" or "VALUE"; "NODE" is provided for
    those who want to manipulate the HTML elements directly.

  ref_result
    This attribute is a boolean. It signals whether the result list should
    be added to the output queue as a list of items, or as a single
    reference to an array of items. The default is 0 (false).

    There is a semantic subtlety here: if "ref_result" is 1 (true), then
    each HTML item (input) generates at most one ArrayRef item (output),
    i.e. it is a one-to-one mapping. On the other hand, with "ref_result"
    left as 0 (false), one HTML item may produce any number of items as
    result, i.e. it is a one-to-many mapping.
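The three "result_type" values described above can be approximated by calling HTML::TreeBuilder::XPath directly. The following is only a sketch of that correspondence, not the module's actual implementation: the helper name "filter_sketch" is hypothetical, and mapping "HTML" to "as_HTML" and "VALUE" to "findvalues" is an assumption.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper (not part of DataFlow): returns the matches for
# $xpath in $html, shaped according to $result_type.
sub filter_sketch {
    my ( $html, $xpath, $result_type ) = @_;

    # The tree is deliberately not deleted here, so that nodes returned
    # for 'NODE' remain usable by the caller.
    my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

    # Assumed mapping from result_type to an extraction strategy.
    my %extract = (
        HTML  => sub { map { $_->as_HTML } $tree->findnodes($xpath) },
        VALUE => sub { $tree->findvalues($xpath) },
        NODE  => sub { $tree->findnodes($xpath) },
    );

    my $extractor = $extract{$result_type}
        or die "Unknown result_type: $result_type";
    return $extractor->();
}

my $html = '<table><tr><td>Line 1</td><td>L1, Column 2</td></tr></table>';

# Call in list context to receive one entry per matching node.
my @values = filter_sketch( $html, '//td', 'VALUE' );
my @nodes  = filter_sketch( $html, '//td', 'NODE' );

print "$_\n" for @values;
```

A dispatch table keeps the three cases side by side, and an unknown "result_type" fails loudly instead of silently falling back to a default.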
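The difference between the two "ref_result" settings can be illustrated in plain Perl, without DataFlow. This is a minimal sketch: the helper "shape_results" is hypothetical, and it assumes that an empty match list yields no output item when "ref_result" is true, which the text above implies ("one or zero") but does not spell out.

```perl
use strict;
use warnings;

# Hypothetical helper: shapes a list of matches the way "ref_result"
# is described, without touching any DataFlow internals.
sub shape_results {
    my ( $ref_result, @results ) = @_;

    # ref_result false: every match becomes its own output item.
    return @results unless $ref_result;

    # ref_result true: all matches travel together as one array reference.
    # Assumption: no matches means no output item at all.
    return @results ? ( [@results] ) : ();
}

my @cells = ( 'Line 1', 'L1, Column 2' );

my @flat   = shape_results( 0, @cells );    # two separate items
my @nested = shape_results( 1, @cells );    # one item holding both cells

printf "%d item(s) without ref_result\n", scalar @flat;
printf "%d item(s) with ref_result\n",    scalar @nested;
```

With "ref_result" enabled, a downstream processor can tell which results came from the same input document, at the cost of having to dereference each item.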