NAME Hustle::Table - Pattern dispatch to code with optional optimisation SYNOPSIS use Hustle::Table; #1. Create a new object my $table=Hustle::Table->new; #2. Add entries which contain: # matcher: required. The matcher (ie regex, string, number) to test with # sub: required. The sub to call when matcher is 'true' when testing input # label: optional. For identification # count: optional. Used as priority when adding (larger number=> higher priority) # Can either be a array ref or hash ref $table->add( #entry as a hash ref { matcher => qr/regex (match)/, sub => sub{ #can access regex capture }}, #entry as array ref, with label, priority values and context set [qr/another/, sub{}, "my label", 10,{}], #undef matcher is default match all. Must be array ref format [undef,sub {},"default",undef,undef] ); #3. Prepare a dispatcher: # A dispatcher is sub reference, which is called directly with input to match my $dispatch = $table->prepare_dispatcher( type => "online", #either "online" (default="online") or "offline" cache => {}, #the hash to use as a cache reorder => 1, #reorder (default=1) table before building dispatcher reset => 0 #Reset (default=undef) entry counts ); #4. Dispatch input $dispatch->("thing to match", "optional", "arguments"); #The dispatched sub is called with the match entry and then all argument from the second onwards #5. Offline optimisation is also possible DESCRIPTION This module provides a small class to constructs a dispatch table and builds a dispatcher for it. All interactions are done via the object/class methods so no exports are defined. It supports straightforward optimisation of the dispatch table and an optional cache to squeeze even more performance out. Notable features include: * Captures in regex are available in dispatched subs * Arguments supplied to dispatcher (except first) are available to executed vector sub * Cached pre-matching (optional) * Basic hit count and optimising (optional) * Fall through/catch all matching The dispatch table is essentially a list of at least one entry which maps a matcher to a subroutine which is called on a successful match of the input. Conceptually the list is looped over, applying the matchers sequentially to the input until a match is found. In practice it isn't a loop, but generated code reference with potentially optimised ordering of the entries. In the case of no successful match, a default catch all dispatch vector is called. Matching performance is optionally boosted by using a hash as a cache. Hash lookup is much quicker than repeated conditional testing. Controlling which inputs are removed from the cached is dictated by the return value of the dispatch vector/sub VERSION DIFFERENCES Version 0.4.0 and later "given"/"when" and thus the use of smart matching is removed. This gives a nice performance improvement of up to 20% over previous versions, however does mean the features of smart matching (i.e. allowing string equality test ) is no longer supported. To match an exact string you will need to make a regex matcher like "^exact$". Version 0.3.0 and later The input string (the first arg to the dispatcher) is removed and replaced with the matching table entry when calling a dispatched sub. In Previous versions, all arguments would be passed to the dispatched sub unchanged. If you need the input string that was being tested, pass it a second time to the dispatcher: #From v0.3.0 Onwards $dispatcher->($input, $input, $another, $arg. ..._) CREATING A TABLE Simply calling the class constructor returns a new Table. There are no arguments required for the constructor: my $table=Hustle::Table->new; In this case, a default catch all entry (an empty sub) is added automatically. Alternatively, a "sub" and "ctx" argument can be provided: my $table=Hustle::Table->new(sub{}, $ctx); This will overwrite the default entry in the table The table is a blessed array reference and can be manipulated as such. It's not recommended to access the table directly other than dumping the contents to storage. ENTRIES MANAGEMENT STRUCTURE An entry contains the following fields matcher "matcher" can be anything that smart-matching can handle. However the focus on this module is on strings and regular expressions "match exaclty this stirng" qr|match and capture (this|that)| When "matcher" is a regex, any capturing is accessible in the target "sub" via the dynamically scoped numbered capture group variables (i.e. $1 et al.) sub The "sub" has access to any capture groups used in the matcher (if applicable). It is also passed the arguments supplied to call the dispatcher, except for the first. The first is replaced with the matching entry in the table, to allow tracing. $dispatcher->("my input","optional", "arguments", "to", "dispatched", "sub"); The return value of the sub indicates if the input matched is to be removed from the cache. label "label" is a user defined item to allow identification of the entry. This is useful for saving/loading from configuration files etc. count "count" This is a dual purpose attributes. When adding entries to the list, it is used as a priority. Higher numeric values are a higher priority. The list is sorted in descending order of priority, meaning the highest priority is the first element in the list. During running of the dispatch, this is a tally recording how many times,the entry has been matched. This information can be then used as a priority later to reorder the list. ctx "ctx" Represents a user context which is applicable to the matcher. It would usually be a weak reference but can be any scalar. The dispatched subroutine has access to this value via the matcher entry, which is the first argument to the called subroutine ADDING Entries are added in anonymous hash, anonymous array or flattened format, using the "add" method. Array entries must contain four elements, in the order of: $table->add([$matcher, $sub, $label, $count, $ctx]); Hashes ref format only need to specify the matcher and sub pairs $table->add({matcher=>$matcher, sub=>$sub, label=>$label, count=>$count, ctx=>$ctx}); Single flattened format takes a list directly. It must contain 4 elements $table->add(matcher=>$matcher, sub => $sub); Single simple format takes two elements $table->add(qr{some matcher}=>sub { say "run me" }) Or add multiple at once using mixed formats together $table->add( [$matcher, $sub, $label, $count, $ctx], {matcher=> $matcher, sub=>$sub}, matcher=>$matcher, sub=>$sub ); In any case,$matcher and $sub are the only items which must be defined. $sub must also be a CODE reference. If a label is not specified, one will be generated automatically. All the labels used in a call to "add" are returned in the same order as the input arguments. Auto generated labels are only unique in the lifetime of a single table. If permanent/globally unique labels are need, the user will need to generate them. REMOVING Removal of entries is via the "remove" method. It takes a list of labels to remove $table->remove("label1", "funky-name"); It returns a list of all entries removed. DUMPING The table contents be accessed can like a normal array reference $table->@*; @$table; It is not recommended to manipulate the entires directly THE DEFAULT MATCHER Each list has a default matcher that will unconditionally match the input. This entry is specified by using "undef" as the matcher when adding an entry. When set this way only the array format can be used. To make it more explicit, the it can also be changed via the "set_default" method. The default subroutine of the 'default' entry does nothing. PREPARING A DISPATCHER Once all the entries required are added to the table, the dispatcher can be constructed by calling "prepare_dispatcher": my $dispatcher=$table->prepare_dispatcher(%args); Arguments to this method include: type The type of dispatcher. Either "online" or "offline". If not specified, or an invalid value is supplied, an "online" dispatcher is created online Dispatcher which calls the dispatch vector, increases count statistics. This is what you normally would want to use in your code. Online mode. offline Dispatcher which only updates the count statistics. DOES NOT call dispatch vector. Useful for running a input data set through to obtain priority levels. Offline training mode. cache The hash ref to use as the dispatchers cache reorder Flag indicating the table should be reordered/optimised before building the dispatcher. reset Flag specifying if counter statistics should be reset before building a dispatcher If no arguments are provided, the dispatcher will be created with the following defaults: my $dispatcher=$table->prepare_dispatcher(type=>"online", cache=>undef, reset=>undef, reorder=>1); USING A DISPATCHER The dispatcher is simply a sub. Any arguments passed to it are passed to the dispatch vector as is. $dispatcher->("input", "argument1",2); PERFORMANCE CACHING AND CACHE CONTROL The return code of the dispatched sub is used to control if a successful match should be removed from the cache, or not be removed; Any 'true' value returned will means the input, as a key to the cache, is to be removed from the cache. Any 'false' value indicates the input is to remain in the cache or be added if it doesn't exist. {matcher => qr/r(.)gexp/, sub => sub { return }}; #Returns undef so not removed from cache. {matcher => qr/re(.)gxp/, sub => sub { return 1}}; #Returns 'true' so removed from cache {matcher => qr/reg(.)xp/, sub => sub { say $1;}}; #Returns true. Last statement is say and returns 1 OPTIMISATION The concept is to perform less searching to find the dispatch vector. However the type of input does play an important role in determining the best way perform the search. For example a uniformly distributed input will not gain benefits from reordering the entries in the list. On the other hand when the distributing becomes 'centred', the reordering of the table can greatly improve the search time. On my laptop, the simple benchmark script, (with 6 dispatch entries) (regex and string) shows around 1.4M dispatches/s for non cached and over 2M dispatches/s for cached dispatcher. Your results may vary. COMPARISON TO OTHER MODULES There a couple of other dispatching modules on CPAN. Notably Smart::Dispatch has a nice syntax, probably more flexibility and the ability to return values instead of just executing a sub. This module is smaller/(much)faster in my basic tests and also supports direct access to the regex capture groups and dispatch arguments. BUGS Probably. Please report via github TODO * Write tests for removing entries * Write tests for offline dispatcher * Override array STORE and FETCH interface to ensure default is preserved * Add a more concrete performance discussion * Document how to run offline dispatcher for optimising AUTHOR Ruben Westerberg, COPYRIGHT AND LICENSE Copyright (C) 2021 by Ruben Westerberg Licensed under MIT and GNU DISCLAIMER OF WARRANTIES THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.