NAME

    Pod::Simple::Words - Parse words and locations from a POD document

VERSION

    version 0.07

SYNOPSIS

     use Pod::Simple::Words;
     
     my $parser = Pod::Simple::Words->new;
     
     $parser->callback(sub {
       my($type, $filename, $line, $input) = @_;
     
       if($type eq 'word')
       {
         # $input is a human language word
       }
       elsif($type eq 'stopword')
       {
         # $input is a stopword in tech speak
       }
       elsif($type eq 'module')
       {
         # $input is a CPAN module (e.g. FFI::Platypus)
       }
       elsif($type eq 'url_link')
       {
         # $input   is the URL
       }
       elsif($type eq 'pod_link')
       {
         my($podname, $section) = @$input;
         # $podname is the POD document (undef for current)
         # $section is the section      (can be undef)
       }
       elsif($type eq 'man_link')
       {
         my($manname, $section) = @$input;
         # $manname is the MAN document
         # $section is the section      (can be undef)
       }
       elsif($type eq 'section')
       {
         # $input is the name of a documentation section
       }
       elsif($type eq 'error')
       {
         # $input is a POD error
       }
     });
     
     $parser->parse_file('lib/Foo.pm');

DESCRIPTION

    This Pod::Simple parser extracts words from POD, along with the file
    name and line number where each was found. A few other event types are
    supported for convenience. The output is intended to be fed into a
    spell checker. Notes:

    stopwords

      This module recognizes inlined stopwords. These are words that
      shouldn't be considered misspelled for the POD document.

    head1 is normalized to lowercase

      Since the convention is to uppercase =head1 elements in POD, and most
      spell checkers consider this a spelling error, we convert =head1
      elements to lower case.

    comments in verbatim blocks

      Comments are extracted from verbatim blocks and their words are
      included, because misspelled words in the synopsis comments can be
      embarrassing!

    unicode

      Unicode should be handled correctly, provided the =encoding
      directive is set correctly.
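
    As a rough illustration of the stopword and verbatim comment notes
    above, the sketch below parses a small in-memory document. It assumes
    that inlined stopwords use the =for stopwords convention familiar from
    Pod::Spell, and it relies on parse_string_document, which is inherited
    from Pod::Simple. The POD text and the Widget class are made up for the
    example.

```perl
use strict;
use warnings;
use Pod::Simple::Words;

# Hypothetical POD document, built line by line so that this example
# does not itself contain live POD directives.
my $pod = join "\n",
  '=encoding utf8',
  '',
  '=head1 SYNOPSIS',
  '',
  ' my $w = Widget->new;  # construct the widget',
  '',
  '=head1 DESCRIPTION',
  '',
  '=for stopwords frobnicate',   # assumed stopword syntax
  '',
  'This module can frobnicate a widget.',
  '',
  '=cut',
  '';

my(@words, @stopwords);

my $parser = Pod::Simple::Words->new;
$parser->callback(sub {
  my($type, $filename, $line, $input) = @_;
  push @words,     $input if $type eq 'word';
  push @stopwords, $input if $type eq 'stopword';
});
$parser->parse_string_document($pod);

print "stopwords: @stopwords\n";
print "words:     @words\n";
```

    A spell checker built on top of this would typically add the collected
    stopwords to its dictionary before checking the collected words.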

CONSTRUCTOR

 new

     my $parser = Pod::Simple::Words->new;

    This creates an instance of the parser.

PROPERTIES

 callback

     $parser->callback(sub {
       my($type, $filename, $line, $input) = @_;
       ...
     });

    This defines the callback that is invoked each time one of the
    following input items is found. Types:

    word

      Regular human language word.

    stopword

      Word that should not be considered misspelled. This is often for
      technical jargon which is spelled correctly but not in the regular
      human language dictionary.

    module

      A CPAN Perl module name of the form Foo::Bar. As a special case,
      Foo::Bar's is recognized as the possessive of the Foo::Bar module.

    url_link

      A regular internet URL link.

    pod_link

       my($podname, $section) = @$input;

      A link to another POD document. Usually a module or a script. The
      $podname is the name of the pod document to link to. If this is
      undef, it means that the link is to a section inside the current
      document. The $section is the section of the document to link to. The
      $section will be undef if not linking to a specific section.

    man_link

       my($manname, $section) = @$input;

      A link to a UNIX man page. The $manname is the name of the man page.
      The $section is the section of the man page to link to, which will be
      undef if not linking to a specific section.

    section

      A section inside of the current document which can be linked to
      externally or internally. This is usually the title of a header like
      =head1, =head2, etc.

    error

      An error that was detected during parsing. This allows the spell
      checker to check the correctness of the POD at the same time if it so
      chooses.

    Beyond these, additional arbitrary types can be introduced by the
    splitter class.
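
    To make the shape of $input for the different event types listed above
    more concrete, here is a minimal sketch of a callback that turns every
    event into a one-line report. The POD text is hypothetical, and the
    placeholder strings used for a missing file name or section are
    choices made for this example only.

```perl
use strict;
use warnings;
use Pod::Simple::Words;

# Hypothetical POD with one link of each kind.
my $pod = join "\n",
  '=head1 SEE ALSO',
  '',
  'See L<Foo::Bar/METHODS>, L<crontab(5)> and L<https://metacpan.org>.',
  '',
  '=cut',
  '';

my @events;

my $parser = Pod::Simple::Words->new;
$parser->callback(sub {
  my($type, $filename, $line, $input) = @_;

  # pod_link and man_link pass an array reference [ $name, $section ];
  # either element may be undef. Other types are treated as plain strings.
  my $text = ref($input) eq 'ARRAY'
    ? join(' / ', map { defined $_ ? $_ : '(none)' } @$input)
    : $input;

  # The file name may be undefined when parsing from a string.
  push @events, sprintf('%s:%s [%s] %s',
    $filename // '(string)', $line // '?', $type, $text // '');
});
$parser->parse_string_document($pod);

print "$_\n" for @events;
```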

 splitter

     $parser->splitter($splitter);

    The $splitter is an instance of Text::HumanComputerWords, or something
    that implements a split method exactly like it does. It is used to
    split text into human and computer words. The default is reasonable for
    Perl.
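
    The following sketch shows one way a custom splitter might be plugged
    in. It assumes that Text::HumanComputerWords->new accepts type/code
    reference pairs, that each code reference receives a single
    whitespace-delimited token as its first argument, and that
    default_perl returns the stock pairs for Perl documentation. The ticket
    type and the GH#123 token are hypothetical and exist only for this
    example.

```perl
use strict;
use warnings;
use Pod::Simple::Words;
use Text::HumanComputerWords;

# Assumed constructor shape: a list of type => coderef pairs, checked in
# order, so the hypothetical "ticket" rule is placed before the defaults.
my $splitter = Text::HumanComputerWords->new(
  ticket => sub { $_[0] =~ /^GH#[0-9]+$/ },
  Text::HumanComputerWords->default_perl,
);

# Hypothetical POD mentioning a ticket number in running text.
my $pod = join "\n",
  '=head1 CHANGES',
  '',
  'Fixes GH#123 in the parser.',
  '',
  '=cut',
  '';

my @tickets;

my $parser = Pod::Simple::Words->new;
$parser->splitter($splitter);
$parser->callback(sub {
  my($type, $filename, $line, $input) = @_;
  push @tickets, $input if $type eq 'ticket';
});
$parser->parse_string_document($pod);

print "tickets: @tickets\n";
```

    Tokens claimed by a custom type are reported under that type rather
    than as word events, so a spell checker can ignore them or handle them
    separately.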

METHODS

 skip_sections

     $parser->skip_sections(@sections);

    Skip the given =head1 level sections. Note that words from the section
    header itself will be included, but the content of the section will
    not. This is useful for skipping CONTRIBUTOR or similar sections which
    are usually mostly names and shouldn't be spell checked against a human
    language dictionary.
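
    A short sketch of skipping a section follows. The section name is
    given in lower case on the assumption that matching follows the
    lowercase-normalized =head1 titles; the document and the contributor
    name are invented for the example.

```perl
use strict;
use warnings;
use Pod::Simple::Words;

# Hypothetical POD with a section that consists only of a name.
my $pod = join "\n",
  '=head1 DESCRIPTION',
  '',
  'Parses documentation.',
  '',
  '=head1 CONTRIBUTORS',
  '',
  'Qwertz Uiopson',
  '',
  '=cut',
  '';

my @words;

my $parser = Pod::Simple::Words->new;

# Assumption: names are matched against the lowercased =head1 title.
$parser->skip_sections('contributors');

$parser->callback(sub {
  my($type, $filename, $line, $input) = @_;
  push @words, $input if $type eq 'word';
});
$parser->parse_string_document($pod);

print "words: @words\n";
```

    The words from the CONTRIBUTORS header itself are still reported, as
    described above; only the body of the section is left out.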

SEE ALSO

    Pod::Spell

      and other modules parse POD in a similar way to find potentially
      misspelled words, at least internally. They usually explicitly
      exclude comments from verbatim blocks, and often split words on the
      wrong boundaries.

AUTHOR

    Graham Ollis <plicease@cpan.org>

COPYRIGHT AND LICENSE

    This software is copyright (c) 2021 by Graham Ollis.

    This is free software; you can redistribute it and/or modify it under
    the same terms as the Perl 5 programming language system itself.