Text-Balanced-2.06/0000755000104006017510000000000014247103162014766 5ustar AdministratorssteveText-Balanced-2.06/Artistic0000644000104006017510000001426114247103162016477 0ustar Administratorssteve The "Artistic License" Preamble The intent of this document is to state the conditions under which a Package may be copied, such that the Copyright Holder maintains some semblance of artistic control over the development of the package, while giving the users of the package the right to use and distribute the Package in a more-or-less customary fashion, plus the right to make reasonable modifications. Definitions: "Package" refers to the collection of files distributed by the Copyright Holder, and derivatives of that collection of files created through textual modification. "Standard Version" refers to such a Package if it has not been modified, or has been modified in accordance with the wishes of the Copyright Holder as specified below. "Copyright Holder" is whoever is named in the copyright or copyrights for the package. "You" is you, if you're thinking about copying or distributing this Package. "Reasonable copying fee" is whatever you can justify on the basis of media cost, duplication charges, time of people involved, and so on. (You will not be required to justify it to the Copyright Holder, but only to the computing community at large as a market that must bear the fee.) "Freely Available" means that no fee is charged for the item itself, though there may be fees involved in handling the item. It also means that recipients of the item may redistribute it under the same conditions they received it. 1. You may make and give away verbatim copies of the source form of the Standard Version of this Package without restriction, provided that you duplicate all of the original copyright notices and associated disclaimers. 2. You may apply bug fixes, portability fixes and other modifications derived from the Public Domain or from the Copyright Holder. A Package modified in such a way shall still be considered the Standard Version. 3. You may otherwise modify your copy of this Package in any way, provided that you insert a prominent notice in each changed file stating how and when you changed that file, and provided that you do at least ONE of the following: a) place your modifications in the Public Domain or otherwise make them Freely Available, such as by posting said modifications to Usenet or an equivalent medium, or placing the modifications on a major archive site such as uunet.uu.net, or by allowing the Copyright Holder to include your modifications in the Standard Version of the Package. b) use the modified Package only within your corporation or organization. c) rename any non-standard executables so the names do not conflict with standard executables, which must also be provided, and provide a separate manual page for each non-standard executable that clearly documents how it differs from the Standard Version. d) make other distribution arrangements with the Copyright Holder. 4. You may distribute the programs of this Package in object code or executable form, provided that you do at least ONE of the following: a) distribute a Standard Version of the executables and library files, together with instructions (in the manual page or equivalent) on where to get the Standard Version. b) accompany the distribution with the machine-readable source of the Package with your modifications. c) give non-standard executables non-standard names, and clearly document the differences in manual pages (or equivalent), together with instructions on where to get the Standard Version. d) make other distribution arrangements with the Copyright Holder. 5. You may charge a reasonable copying fee for any distribution of this Package. You may charge any fee you choose for support of this Package. You may not charge a fee for this Package itself. However, you may distribute this Package in aggregate with other (possibly commercial) programs as part of a larger (possibly commercial) software distribution provided that you do not advertise this Package as a product of your own. You may embed this Package's interpreter within an executable of yours (by linking); this shall be construed as a mere form of aggregation, provided that the complete Standard Version of the interpreter is so embedded. 6. The scripts and library files supplied as input to or produced as output from the programs of this Package do not automatically fall under the copyright of this Package, but belong to whoever generated them, and may be sold commercially, and may be aggregated with this Package. If such scripts or library files are aggregated with this Package via the so-called "undump" or "unexec" methods of producing a binary executable image, then distribution of such an image shall neither be construed as a distribution of this Package nor shall it fall under the restrictions of Paragraphs 3 and 4, provided that you do not represent such an executable image as a Standard Version of this Package. 7. C subroutines (or comparably compiled subroutines in other languages) supplied by you and linked into this Package in order to emulate subroutines and variables of the language defined by this Package shall not be considered part of this Package, but are the equivalent of input as in Paragraph 6, provided these subroutines do not change the language in any way that would cause it to fail the regression tests for the language. 8. Aggregation of this Package with a commercial distribution is always permitted provided that the use of this Package is embedded; that is, when no overt attempt is made to make this Package's interfaces visible to the end user of the commercial distribution. Such use shall not be construed as a distribution of this Package. 9. The name of the Copyright Holder may not be used to endorse or promote products derived from this software without specific prior written permission. 10. THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. The End Text-Balanced-2.06/Changes0000644000104006017510000003134014247103162016262 0ustar AdministratorssteveRevision history for Perl distribution Text-Balanced. 2.06 2022-06-05 - Released with no further code changes. 2.05_01 2022-05-29 - Fix direct use of _match_codeblock by e.g. Switch. [mohawk2, CPAN RT#142923] - Fix resetting of whether "/" or "?" allowed to open regex. [mohawk2, CPAN RT#142922] - Fix false negative on /.../ regex after "and". [mohawk2, GH#7] 2.05 2022-05-22 - Released with no further code changes. 2.04_02 2022-03-09 - Fix missed case of spotting expression ending ")" or "]". [mohawk2] - Fix too-loosely allowing ?...? as RE (often actually conditional op). [mohawk2] 2.04_01 2022-03-05 - Performance optimizations. [mohawk2, PR#5] - Fix "<<=" being seen as heredoc, misparsing of "y=>". [mohawk2, PR#6] - Update documentation to clarify extract_tagged() takes regexes. [Jay Hannah, GH#3] - Modernize tests. [mohawk2, GH#2] - Fix extract_variable() not recognising ${var} end of string. [Ed J, CPAN RT#70007] - Fix string-comparing $@ causing exception. [Ed J, CPAN RT#74994) - Update documentation to correct CSV example. [djerius@cpan.org, CPAN RT#140408] - Fix extract_codeblock() being confused by //. [Ed J, CPAN RT#78313] - Improve here-doc detection. [Ed J, CPAN RT#74714] - Fix extract_multiple() to track whether to allow /.../ as quotelike. [Ed J, CPAN RT#5722] 2.04 2020-12-11 - Fixed INSTALLDIRS to account for the @INC reordering change in Perl 5.12. See Perl RT#116479 for details. (Text-Balanced entered the perl core in Perl 5.7.1 so that's what the lower bound of the check should strictly be, but since we only support Perl 5.8.1 and higher anyway we do not need to check the lower bound. The upper bound is correctly Perl 5.11.0 since the @INC reordering change in question (Perl core commit #b9ba2fadb1) first appeared in Perl 5.11.0.) - Removed superfluous loading of SelfLoader. [, CPAN RT#85572] - Minor documentation patch. [, CPAN RT#52623] - Typo fixes. [, CPAN RT#85373] - Added optional Changes testing (skipped unless AUTHOR_TESTING). - Reformatted Changes file as per CPAN::Changes::Spec. - Added optional POD coverage testing (skipped unless AUTHOR_TESTING). - Added optional Perl::Critic testing (skipped unless AUTHOR_TESTING). - Made code Perl::Critic clean. - Included GitHub repository URLs in metadata now that source code has been uploaded to GitHub. - Included META.json file in addition to META.yml. - Set minimum required ExtUtils::MakeMaker version to 6.64 to ensure that all parameters used are supported, to save jumping through hoops to support earlier versions. (This should not be a problem since ExtUtils::MakeMaker 6.64 is easily installed into Perl 5.8.1 and above, that being the whole point of the new choice of minimum supported Perl version.) - Set minimum required Perl version to 5.8.1. This is in line with the minimum requirement of the "Perl Toolchain". 2.03 2015-03-04 - Removed test boilerplate code to synchronize with bleadperl. 2.02 2009-07-29 - Fixed the mixed "Damian Conway " AUTHOR setting. For the record, I am NOT the author, I'm just the maintainer. Unfortunately, Makefile.PL does not have a MAINTAINER setting and this way all the emails about this module come to me. 2.01 2009-07-28 - Taken over by Adam Kennedy to move it to a long-term maintenance mode. - Removing use warnings to restore 5.005 compatibility. - Removing Module::Build for higher back-compatibility. - Removing version.pm for higher back-compatibility. - use Exporter -> use Exporter () to avoid some pathalogical cases. - Upgraded the bundled author tests to be friendlier. - Changes order now a more normal reverse chronological. 2.0.0 2006-12-20 10:50:24 - Added patches from bleadperl version (thanks Rafael!). - Fixed bug in second bracketed delimiters (thanks David). 1.99.1 2006-11-16 09:29:14 - Included dependency on version.pm (thanks Andy). 1.99.0 2006-11-16 07:32:06 - Removed reliance on expensive $& variable (thanks John). - Made Makefile.PL play nice with core versions (thanks Schwern!). 1.98 2006-05-05 14:58:49 - Reinstated full test suite (thanks Steve!). 1.97 2006-05-01 21:58:04 - Removed three-part version number and dependency on version.pm. 1.96.0 2006-05-01 21:52:37 - Fixed major bug in extract_multiple handling of unknowns. - Fixed return value on failure (thanks Eric). - Fixed bug differentiating heredocs and left-shift operators (thanks Anthony). 1.95 2003-04-28 22:04 - Constrainted _match_quote to only match at word boundaries (so "exemplum(hic)" doesn't match "m(hic)") (thanks Craig). 1.94 2003-04-13 02:18:41 - Rereleased in attempt to fix CPAN problems. 1.91 2003-03-28 23:19:17 - Fixed error count on t/extract_variable.t. - Fixed bug in extract_codelike when non-standard delimiters used. 1.90 2003-03-25 11:14:38 - Fixed subtle bug in gen_extract_tagged (thanks Martin). - Doc fix: removed suggestion that extract_tagged defaults to matching HTML tags. - Doc fix: clarified general matching behaviour. - Fixed bug in parsing /.../ after a (. - Doc fix: documented extract_variable. - Fixed extract_variable handling of $h{qr}, $h{tr}, etc. (thanks, Briac). - Fixed incorrect handling of $::var (thanks Tim). 1.89 2001-11-18 22:49:50 - Fixed extvar.t tests. 1.87 2001-11-15 21:25:35 - Made extract_multiple aware of skipped prefixes returned by subroutine extractors (such as extract_quotelike, etc.). - Made extract_variable aware of punctuation variables. - Corified tests. 1.86 2001-09-03 06:57:08 - Revised licence for inclusion in core distribution. - Consolidated POD in .pm file. - Renamed tests to let DOS cope with them. 1.85 2001-06-03 07:47:18 - Fixed bug in extract_variable recognizing method calls that start with an underscore (thanks Jeff). 1.84 2001-04-26 11:58:13 - Fixed bug in certain extractions not matching strings with embedded newlines (thanks Robin). 1.83 2001-01-15 12:43:12 - Fixed numerous bugs in here doc extraction (many thanks Tim). 1.82 2001-01-14 14 16:56:04 - Fixed nit in extract_variable.t (tested more cases than it promised to). - Fixed bug extracting prefix in extract_quotelike (Thanks Michael). - Added handling of Perl 4 package qualifier: $Package'var, etc. - Added handling of here docs (see documentation for limitations). - Added reporting of failure position via $@->{pos} (see documentation). 1.81 2000-09-13 11:58:49 - Fixed test count in extract_codeblock.t. - Fixed improbable bug with trailing ->'s in extract_variable. - Fixed (HT|X)ML tag extraction in extract_tagged (thanks, Tim). - Added explanatory note about prefix matching (thanks again, Tim). - Added handling of globs and sub refs to extract_variable. - Pod tweak (thanks Abigail). - Allowed right tags to be run-time evaluated, so extract_tagged($text, '/([a-z]+)', '/end$1') works as expected. - Added optional blessing of matches via extract_multiple - Fixed bug in autogeneration of closing tags in extract_tagged (Thanks, Coke). - Fixed bug in interaction between extract_multiple and gen_extract_tagged (Thanks Anthony). 1.77 1999-11-22 06:08:23 - Fixed major bug in extract_codeblock (would not terminate if there was trailing whitespace). - Improved /.../ pattern parsing within codeblocks. 1.76 1999-11-19 06:51:54 - IMPORTANT: Now requires 5.005 or better. - IMPORTANT: Made extract methods sensitive to the pos() value of the text they are parsing. In other words, all extract subroutines now act like patterns of the form /\G.../gc. See documentation for details. - IMPORTANT: Changed semantics of extract_multiple, in line with the above change, and to simplify the semantics to something vaguely predictable. See documentation for details. - Added ability to use qr/../'s and raw strings as extractors in extract_multiple. See documentation. - Added fourth argument to extract_codeblock to allow outermost brackets to be separately specified. See documentation for details. - Reimplemented internals of all extraction subroutines for significantx speed-ups (between 100% and 2000% improvement). - Fixed nasty bug in extract_variable and extract_codeblock (they were returning prefix as well in scalar context). - Allowed read-only strings to be used as arguments in scalar contexts. - Renamed delimited_pat to gen-delimited pat (in line with gen_extract_tagged). Old name still works, but is now deprecated. - Tweaked all extraction subs so they correctly handle zero-length prefixx matches after another zero-length match. 1.66 1999-07-02 13:29:22 - Added ability to use quotelike operators in extract_bracketed. - Fixed bug under 5.003 ('foreach my $func' not understood). - Added escape specification as fourth arg to &extract_delimited. - Fixed handling of &delimited_pat and &extract_delimited when delimiter is same as escape. - Fixed handling of ->, =>, and >> in &extract_code when delimiters are "<>". 1.52 1999-03-04 12:43:38 - Added CSV parsing example to documentation of extract_multiple. - Fixed a bug with extract_codeblock in "RecDescent" mode (it would accept "subrule(s?)" and "subrule(?)", but not "subrule(s)"). Thanks, Jan. 1.51 1999-02-13 10:31:55 - Fixed bugs in prefix matching in extract_variable: * Incorrectly used default if '' specified. * Now handles $#array correctly. - Fixed bugs in extract_codeblock: * Now handles !~ properly. * Now handles embedded comments better. * Now handles "raw" pattern matches better. - Added support for single strings or qr's as 'reject' and 'ignore' args to extract_tagged(). - Added gen_extract_tagged() to "precompile" a specific tag extractor for repeated use (approximately 3 times faster!). 1.50 1998-08-27 09:20:19 - Improved the structure of the regex generated by delimited_pat (and used in extract_delimited). It's considerably more complex, but also more robust and much faster in the worst case. - Altered extract_variable to accept whitespace in variables, e.g. '$ a -> {'b'} -> [2]'. 1.41 1998-08-10 14:51:50 - Reinstated change to extract_codeblock from 1.36 which were mysteriously lost in 1.40. 1.40 1998-08-04 13:54:52 - Added (optional) handling of embedded quoted text to extract_delimited (see revised entry in Balanced.pod). - Added extract_tagged which extracts text between arbitrary, optionally nested start and end tags (see new entry in Balanced.pod). - Added delimited_pat which builds a pattern which matches a string delimited by any of the delimiters specified (see new entry in Balanced.pod). - Added test.pl. 1.36 1998-07-14 12:26:04 - Reinstated POD file missing from previous distribution. - Added undocumented fourth parameter to extract_codeblock so as to correctly handle (?) and (s?) modifiers in RecDescent grammars. 1.35 1998-06-24 09:53:31 - Fixed handling of :: quantifiers in extract_variable(). - Numerous trivial lexical changes to make xemacs happy. 1.24 1997-12-12 - Changed behaviour in scalar contexts. Scalar contexts now return the extracted string _and_ remove it from the first argument (or $_). - Changed return values on failure (all contexts return undef for invalid return fields) - Fixed some lurking bugs with trailing modifier handling. - Added :ALL tag to simplify wholesale importing of functions. - Fixed serious bug with embedded division operators ("/"). This now also allows the ?...? form of pattern matching! 1.23 1997-10-17 10:26:38-05:00 - Changed behaviour in scalar and void contexts. Scalar contexts now return only the extracted string. Void contexts now remove the extracted string from the first argument (or $_). 1.21 1997-10-04 17:21:54-05:00 - Synchronised with Parse::RecDescent distribution (version number will now reflect that package). 1.10 1997-09-30 17:23:23-05:00 - Reworked extract_quotelike to correct handling of some obscure cases. 1.01 1997-09-08 18:09:18-05:00 - Changed "quotemeta" to "quotemeta $_" to work around bug in Perl 5.002 and 5.003. 1.00 1997-08-11 12:42:56 - Original version. Text-Balanced-2.06/Copying0000644000104006017510000003053014247103162016322 0ustar Administratorssteve GNU GENERAL PUBLIC LICENSE Version 1, February 1989 Copyright (C) 1989 Free Software Foundation, Inc. 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The license agreements of most software companies try to keep users at the mercy of those companies. By contrast, our General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. The General Public License applies to the Free Software Foundation's software and to any other program whose authors commit to using it. You can use it for your programs, too. When we speak of free software, we are referring to freedom, not price. Specifically, the General Public License is designed to make sure that you have the freedom to give away or sell copies of free software, that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of a such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must tell them their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License Agreement applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any work containing the Program or a portion of it, either verbatim or with modifications. Each licensee is addressed as "you". 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this General Public License and to the absence of any warranty; and give any other recipients of the Program a copy of this General Public License along with the Program. You may charge a fee for the physical act of transferring a copy. 2. You may modify your copy or copies of the Program or any portion of it, and copy and distribute such modifications under the terms of Paragraph 1 above, provided that you also do the following: a) cause the modified files to carry prominent notices stating that you changed the files and the date of any change; and b) cause the whole of any work that you distribute or publish, that in whole or in part contains the Program or any part thereof, either with or without modifications, to be licensed at no charge to all third parties under the terms of this General Public License (except that you may choose to grant warranty protection to some or all third parties, at your option). c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the simplest and most usual way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this General Public License. d) You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. Mere aggregation of another independent work with the Program (or its derivative) on a volume of a storage or distribution medium does not bring the other work under the scope of these terms. 3. You may copy and distribute the Program (or a portion or derivative of it, under Paragraph 2) in object code or executable form under the terms of Paragraphs 1 and 2 above provided that you also do one of the following: a) accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Paragraphs 1 and 2 above; or, b) accompany it with a written offer, valid for at least three years, to give any third party free (except for a nominal charge for the cost of distribution) a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Paragraphs 1 and 2 above; or, c) accompany it with the information you received as to where the corresponding source code may be obtained. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form alone.) Source code for a work means the preferred form of the work for making modifications to it. For an executable file, complete source code means all the source code for all modules it contains; but, as a special exception, it need not include source code for modules which are standard libraries that accompany the operating system on which the executable file runs, or for standard header files or definitions files that accompany that operating system. 4. You may not copy, modify, sublicense, distribute or transfer the Program except as expressly provided under this General Public License. Any attempt otherwise to copy, modify, sublicense, distribute or transfer the Program is void, and will automatically terminate your rights to use the Program under this License. However, parties who have received copies, or rights to use copies, from you under this General Public License will not have their licenses terminated so long as such parties remain in full compliance. 5. By copying, distributing or modifying the Program (or any work based on the Program) you indicate your acceptance of this license to do so, and all its terms and conditions. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. 7. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of the license which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the license, you may choose any version ever published by the Free Software Foundation. 8. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 9. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 10. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS Appendix: How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to humanity, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) 19yy This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 1, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston MA 02110-1301 USA Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) 19xx name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (a program to direct compilers to make passes at assemblers) written by James Hacker. , 1 April 1989 Ty Coon, President of Vice That's all there is to it! Text-Balanced-2.06/INSTALL0000644000104006017510000000124114247103162016015 0ustar AdministratorsstevePREREQUISITES Perl Perl version 5.8.1 or later. The latest version of Perl is available from https://www.perl.com/. Perl Modules There are no non-standard Perl modules required by this module. INSTALLATION To install this module, cd to the directory that contains this INSTALL file and type the following: perl Makefile.PL make make test make install Use the appropriate program name instead of "make" in the above commands if your perl was built with a different make program. To determine which make program was used to build your perl type the following: perl -V:make Text-Balanced-2.06/lib/0000755000104006017510000000000014247103156015537 5ustar AdministratorssteveText-Balanced-2.06/lib/Text/0000755000104006017510000000000014247103162016460 5ustar AdministratorssteveText-Balanced-2.06/lib/Text/Balanced.pm0000644000104006017510000023561614247103162020524 0ustar Administratorssteve# Copyright (C) 1997-2001 Damian Conway. All rights reserved. # Copyright (C) 2009 Adam Kennedy. # Copyright (C) 2015, 2022 Steve Hay and other contributors. All rights # reserved. # This module is free software; you can redistribute it and/or modify it under # the same terms as Perl itself, i.e. under the terms of either the GNU General # Public License or the Artistic License, as specified in the F file. package Text::Balanced; # EXTRACT VARIOUSLY DELIMITED TEXT SEQUENCES FROM STRINGS. # FOR FULL DOCUMENTATION SEE Balanced.pod use 5.008001; use strict; use Exporter (); use vars qw { $VERSION @ISA %EXPORT_TAGS }; BEGIN { $VERSION = '2.06'; @ISA = 'Exporter'; %EXPORT_TAGS = ( ALL => [ qw{ &extract_delimited &extract_bracketed &extract_quotelike &extract_codeblock &extract_variable &extract_tagged &extract_multiple &gen_delimited_pat &gen_extract_tagged &delimited_pat } ], ); } Exporter::export_ok_tags('ALL'); our $RE_PREREGEX_PAT = qr#( [!=]~ | split|grep|map | not|and|or|xor )#x; our $RE_EXPR_PAT = qr#( (?:\*\*|&&|\|\||<<|>>|//|[-+*x%^&|.])=? | /(?:[^/]) | =(?!>) | return | [\(\[] )#x; our $RE_NUM = qr/\s*[+\-.0-9][+\-.0-9e]*/i; # numerical constant our %ref2slashvalid; # is quotelike /.../ pattern valid here for given textref? our %ref2qmarkvalid; # is quotelike ?...? pattern valid here for given textref? # HANDLE RETURN VALUES IN VARIOUS CONTEXTS sub _failmsg { my ($message, $pos) = @_; $@ = bless { error => $message, pos => $pos, }, 'Text::Balanced::ErrorMsg'; } sub _fail { my ($wantarray, $textref, $message, $pos) = @_; _failmsg $message, $pos if $message; return (undef, $$textref, undef) if $wantarray; return; } sub _succeed { $@ = undef; my ($wantarray,$textref) = splice @_, 0, 2; my ($extrapos, $extralen) = @_ > 18 ? splice(@_, -2, 2) : (0, 0); my ($startlen, $oppos) = @_[5,6]; my $remainderpos = $_[2]; if ( $wantarray ) { my @res; while (my ($from, $len) = splice @_, 0, 2) { push @res, substr($$textref, $from, $len); } if ( $extralen ) { # CORRECT FILLET my $extra = substr($res[0], $extrapos-$oppos, $extralen, "\n"); $res[1] = "$extra$res[1]"; eval { substr($$textref,$remainderpos,0) = $extra; substr($$textref,$extrapos,$extralen,"\n")} ; #REARRANGE HERE DOC AND FILLET IF POSSIBLE pos($$textref) = $remainderpos-$extralen+1; # RESET \G } else { pos($$textref) = $remainderpos; # RESET \G } return @res; } else { my $match = substr($$textref,$_[0],$_[1]); substr($match,$extrapos-$_[0]-$startlen,$extralen,"") if $extralen; my $extra = $extralen ? substr($$textref, $extrapos, $extralen)."\n" : ""; eval {substr($$textref,$_[4],$_[1]+$_[5])=$extra} ; #CHOP OUT PREFIX & MATCH, IF POSSIBLE pos($$textref) = $_[4]; # RESET \G return $match; } } # BUILD A PATTERN MATCHING A SIMPLE DELIMITED STRING ## no critic (Subroutines::ProhibitSubroutinePrototypes) sub gen_delimited_pat($;$) # ($delimiters;$escapes) { my ($dels, $escs) = @_; return "" unless $dels =~ /\S/; $escs = '\\' unless $escs; $escs .= substr($escs,-1) x (length($dels)-length($escs)); my @pat = (); my $i; for ($i=0; $i\0-\377/[[(({{</; my $posbug = pos; $ldel = join('|', map { quotemeta $_ } split('', $ldel)); $rdel = join('|', map { quotemeta $_ } split('', $rdel)); pos = $posbug; @{ $eb_delim_cache{$ldel_orig} = [ qr/\G($ldel)/, $qdel && qr/\G([$qdel])/, $quotelike, qr/\G($rdel)/ ] }; } sub extract_bracketed (;$$$) { my $textref = defined $_[0] ? \$_[0] : \$_; $ref2slashvalid{$textref} = 1, $ref2qmarkvalid{$textref} = 0 if !pos($$textref); # reset my $ldel = defined $_[1] ? $_[1] : '{([<'; my $pre = defined $_[2] ? qr/\G$_[2]/ : qr/\G\s*/; my $wantarray = wantarray; my @ret = _eb_delims($ldel); unless (@ret) { return _fail $wantarray, $textref, "Did not find a suitable bracket in delimiter: \"$_[1]\"", 0; } my $startpos = pos $$textref || 0; my @match = _match_bracketed($textref, $pre, @ret); return _fail ($wantarray, $textref) unless @match; return _succeed ( $wantarray, $textref, $match[2], $match[5]+2, # MATCH @match[8,9], # REMAINDER @match[0,1], # PREFIX ); } sub _match_bracketed # $textref, $pre, $ldel, $qdel, $quotelike, $rdel { my ($textref, $pre, $ldel, $qdel, $quotelike, $rdel) = @_; my ($startpos, $ldelpos, $endpos) = (pos $$textref = pos $$textref||0); unless ($$textref =~ m/$pre/gc) { _failmsg "Did not find prefix: /$pre/", $startpos; return; } $ldelpos = pos $$textref; unless ($$textref =~ m/$ldel/gc) { _failmsg "Did not find opening bracket after prefix: \"$pre\"", pos $$textref; pos $$textref = $startpos; return; } my @nesting = ( $1 ); my $textlen = length $$textref; while (pos $$textref < $textlen) { next if $$textref =~ m/\G\\./gcs; if ($$textref =~ m/$ldel/gc) { push @nesting, $1; } elsif ($$textref =~ m/$rdel/gc) { my ($found, $brackettype) = ($1, $1); if ($#nesting < 0) { _failmsg "Unmatched closing bracket: \"$found\"", pos $$textref; pos $$textref = $startpos; return; } my $expected = pop(@nesting); $expected =~ tr/({[/; if ($expected ne $brackettype) { _failmsg qq{Mismatched closing bracket: expected "$expected" but found "$found"}, pos $$textref; pos $$textref = $startpos; return; } last if $#nesting < 0; } elsif ($qdel && $$textref =~ m/$qdel/gc) { $$textref =~ m/\G[^\\$1]*(?:\\.[^\\$1]*)*(\Q$1\E)/gsc and next; _failmsg "Unmatched embedded quote ($1)", pos $$textref; pos $$textref = $startpos; return; } elsif ($quotelike && _match_quotelike($textref,qr/\G()/,$ref2slashvalid{$textref},$ref2qmarkvalid{$textref})) { $ref2slashvalid{$textref} = $ref2qmarkvalid{$textref} = 1; # back-compat next; } else { $$textref =~ m/\G(?:[a-zA-Z0-9]+|.)/gcs } } if ($#nesting>=0) { _failmsg "Unmatched opening bracket(s): " . join("..",@nesting)."..", pos $$textref; pos $$textref = $startpos; return; } $endpos = pos $$textref; return ( $startpos, $ldelpos-$startpos, # PREFIX $ldelpos, 1, # OPENING BRACKET $ldelpos+1, $endpos-$ldelpos-2, # CONTENTS $endpos-1, 1, # CLOSING BRACKET $endpos, length($$textref)-$endpos, # REMAINDER ); } sub _revbracket($) { my $brack = reverse $_[0]; $brack =~ tr/[({/; return $brack; } my $XMLNAME = q{[a-zA-Z_:][a-zA-Z0-9_:.-]*}; my $et_default_ldel = '<\w+(?:' . gen_delimited_pat(q{'"}) . '|[^>])*>'; sub extract_tagged (;$$$$$) # ($text, $opentag, $closetag, $pre, \%options) { my $textref = defined $_[0] ? \$_[0] : \$_; $ref2slashvalid{$textref} = 1, $ref2qmarkvalid{$textref} = 0 if !pos($$textref); # reset my $ldel = $_[1]; my $rdel = $_[2]; my $pre = defined $_[3] ? qr/\G$_[3]/ : qr/\G\s*/; my %options = defined $_[4] ? %{$_[4]} : (); my $omode = defined $options{fail} ? $options{fail} : ''; my $bad = ref($options{reject}) eq 'ARRAY' ? join('|', @{$options{reject}}) : defined($options{reject}) ? $options{reject} : '' ; my $ignore = ref($options{ignore}) eq 'ARRAY' ? join('|', @{$options{ignore}}) : defined($options{ignore}) ? $options{ignore} : '' ; $ldel = $et_default_ldel if !defined $ldel; $@ = undef; my @match = _match_tagged($textref, $pre, $ldel, $rdel, $omode, $bad, $ignore); return _fail(wantarray, $textref) unless @match; return _succeed wantarray, $textref, $match[2], $match[3]+$match[5]+$match[7], # MATCH @match[8..9,0..1,2..7]; # REM, PRE, BITS } sub _match_tagged # ($$$$$$$) { my ($textref, $pre, $ldel, $rdel, $omode, $bad, $ignore) = @_; my $rdelspec; my ($startpos, $opentagpos, $textpos, $parapos, $closetagpos, $endpos) = ( pos($$textref) = pos($$textref)||0 ); unless ($$textref =~ m/$pre/gc) { _failmsg "Did not find prefix: /$pre/", pos $$textref; goto failed; } $opentagpos = pos($$textref); unless ($$textref =~ m/\G$ldel/gc) { _failmsg "Did not find opening tag: /$ldel/", pos $$textref; goto failed; } $textpos = pos($$textref); if (!defined $rdel) { $rdelspec = substr($$textref, $-[0], $+[0] - $-[0]); unless ($rdelspec =~ s/\A([[(<{]+)($XMLNAME).*/ quotemeta "$1\/$2". _revbracket($1) /oes) { _failmsg "Unable to construct closing tag to match: $rdel", pos $$textref; goto failed; } } else { ## no critic (BuiltinFunctions::ProhibitStringyEval) $rdelspec = eval "qq{$rdel}" || do { my $del; for (qw,~ ! ^ & * ) _ + - = } ] : " ; ' > . ? / | ',) { next if $rdel =~ /\Q$_/; $del = $_; last } unless ($del) { use Carp; croak "Can't interpolate right delimiter $rdel" } eval "qq$del$rdel$del"; }; } while (pos($$textref) < length($$textref)) { next if $$textref =~ m/\G\\./gc; if ($$textref =~ m/\G(\n[ \t]*\n)/gc ) { $parapos = pos($$textref) - length($1) unless defined $parapos; } elsif ($$textref =~ m/\G($rdelspec)/gc ) { $closetagpos = pos($$textref)-length($1); goto matched; } elsif ($ignore && $$textref =~ m/\G(?:$ignore)/gc) { next; } elsif ($bad && $$textref =~ m/\G($bad)/gcs) { pos($$textref) -= length($1); # CUT OFF WHATEVER CAUSED THE SHORTNESS goto short if ($omode eq 'PARA' || $omode eq 'MAX'); _failmsg "Found invalid nested tag: $1", pos $$textref; goto failed; } elsif ($$textref =~ m/\G($ldel)/gc) { my $tag = $1; pos($$textref) -= length($tag); # REWIND TO NESTED TAG unless (_match_tagged(@_)) # MATCH NESTED TAG { goto short if $omode eq 'PARA' || $omode eq 'MAX'; _failmsg "Found unbalanced nested tag: $tag", pos $$textref; goto failed; } } else { $$textref =~ m/./gcs } } short: $closetagpos = pos($$textref); goto matched if $omode eq 'MAX'; goto failed unless $omode eq 'PARA'; if (defined $parapos) { pos($$textref) = $parapos } else { $parapos = pos($$textref) } return ( $startpos, $opentagpos-$startpos, # PREFIX $opentagpos, $textpos-$opentagpos, # OPENING TAG $textpos, $parapos-$textpos, # TEXT $parapos, 0, # NO CLOSING TAG $parapos, length($$textref)-$parapos, # REMAINDER ); matched: $endpos = pos($$textref); return ( $startpos, $opentagpos-$startpos, # PREFIX $opentagpos, $textpos-$opentagpos, # OPENING TAG $textpos, $closetagpos-$textpos, # TEXT $closetagpos, $endpos-$closetagpos, # CLOSING TAG $endpos, length($$textref)-$endpos, # REMAINDER ); failed: _failmsg "Did not find closing tag", pos $$textref unless $@; pos($$textref) = $startpos; return; } sub extract_variable (;$$) { my $textref = defined $_[0] ? \$_[0] : \$_; return ("","","") unless defined $$textref; $ref2slashvalid{$textref} = 1, $ref2qmarkvalid{$textref} = 0 if !pos($$textref); # reset my $pre = defined $_[1] ? qr/\G$_[1]/ : qr/\G\s*/; my @match = _match_variable($textref,$pre); return _fail wantarray, $textref unless @match; return _succeed wantarray, $textref, @match[2..3,4..5,0..1]; # MATCH, REMAINDER, PREFIX } sub _match_variable { # $# # $^ # $$ my ($textref, $pre) = @_; my $startpos = pos($$textref) = pos($$textref)||0; unless ($$textref =~ m/$pre/gc) { _failmsg "Did not find prefix: /$pre/", pos $$textref; return; } my $varpos = pos($$textref); unless ($$textref =~ m{\G\$\s*(?!::)(\d+|[][&`'+*./|,";%=~:?!\@<>()-]|\^[a-z]?)}gci) { unless ($$textref =~ m/\G((\$#?|[*\@\%]|\\&)+)/gc) { _failmsg "Did not find leading dereferencer", pos $$textref; pos $$textref = $startpos; return; } my $deref = $1; unless ($$textref =~ m/\G\s*(?:::|')?(?:[_a-z]\w*(?:::|'))*[_a-z]\w*/gci or _match_codeblock($textref, qr/\G()/, '\{', qr/\G\s*(\})/, '\{', '\}', 0, 1) or $deref eq '$#' or $deref eq '$$' or pos($$textref) == length $$textref ) { _failmsg "Bad identifier after dereferencer", pos $$textref; pos $$textref = $startpos; return; } } while (1) { next if $$textref =~ m/\G\s*(?:->)?\s*[{]\w+[}]/gc; next if _match_codeblock($textref, qr/\G\s*->\s*(?:[_a-zA-Z]\w+\s*)?/, qr/[({[]/, qr/\G\s*([)}\]])/, qr/[({[]/, qr/[)}\]]/, 0, 1); next if _match_codeblock($textref, qr/\G\s*/, qr/[{[]/, qr/\G\s*([}\]])/, qr/[{[]/, qr/[}\]]/, 0, 1); next if _match_variable($textref,qr/\G\s*->\s*/); next if $$textref =~ m/\G\s*->\s*\w+(?![{([])/gc; last; } $ref2slashvalid{$textref} = $ref2qmarkvalid{$textref} = 0; my $endpos = pos($$textref); return ($startpos, $varpos-$startpos, $varpos, $endpos-$varpos, $endpos, length($$textref)-$endpos ); } my %ec_delim_cache; sub _ec_delims { my ($ldel_inner, $ldel_outer) = @_; return @{ $ec_delim_cache{$ldel_outer}{$ldel_inner} } if $ec_delim_cache{$ldel_outer}{$ldel_inner}; my $rdel_inner = $ldel_inner; my $rdel_outer = $ldel_outer; my $posbug = pos; for ($ldel_inner, $ldel_outer) { tr/[]()<>{}\0-\377/[[((<<{{/ds } for ($rdel_inner, $rdel_outer) { tr/[]()<>{}\0-\377/]]))>>}}/ds } for ($ldel_inner, $ldel_outer, $rdel_inner, $rdel_outer) { $_ = '('.join('|',map { quotemeta $_ } split('',$_)).')' } pos = $posbug; @{ $ec_delim_cache{$ldel_outer}{$ldel_inner} = [ $ldel_outer, qr/\G\s*($rdel_outer)/, $ldel_inner, $rdel_inner ] }; } sub extract_codeblock (;$$$$$) { my $textref = defined $_[0] ? \$_[0] : \$_; $ref2slashvalid{$textref} = 1, $ref2qmarkvalid{$textref} = 0 if !pos($$textref); # reset my $wantarray = wantarray; my $ldel_inner = defined $_[1] ? $_[1] : '{'; my $pre = !defined $_[2] ? qr/\G\s*/ : qr/\G$_[2]/; my $ldel_outer = defined $_[3] ? $_[3] : $ldel_inner; my $rd = $_[4]; my @delims = _ec_delims($ldel_inner, $ldel_outer); my @match = _match_codeblock($textref, $pre, @delims, $rd, 1); return _fail($wantarray, $textref) unless @match; return _succeed($wantarray, $textref, @match[2..3,4..5,0..1] # MATCH, REMAINDER, PREFIX ); } sub _match_codeblock { my ($textref, $pre, $ldel_outer, $rdel_outer, $ldel_inner, $rdel_inner, $rd, $no_backcompat) = @_; $rdel_outer = qr/\G\s*($rdel_outer)/ if !$no_backcompat; # Switch calls this func directly my $startpos = pos($$textref) = pos($$textref) || 0; unless ($$textref =~ m/$pre/gc) { _failmsg qq{Did not match prefix /$pre/ at"} . substr($$textref,pos($$textref),20) . q{..."}, pos $$textref; return; } my $codepos = pos($$textref); unless ($$textref =~ m/\G($ldel_outer)/gc) # OUTERMOST DELIMITER { _failmsg qq{Did not find expected opening bracket at "} . substr($$textref,pos($$textref),20) . q{..."}, pos $$textref; pos $$textref = $startpos; return; } my $closing = $1; $closing =~ tr/([<{/)]>}/; my $matched; $ref2slashvalid{$textref} = 1, $ref2qmarkvalid{$textref} = 0 if !pos($$textref) or !defined $ref2slashvalid{$textref}; # default, or reset while (pos($$textref) < length($$textref)) { if ($rd && $$textref =~ m#\G(\Q(?)\E|\Q(s?)\E|\Q(s)\E)#gc) { $ref2slashvalid{$textref} = $ref2qmarkvalid{$textref} = 0; next; } if ($$textref =~ m/\G\s*#.*/gc) { next; } if ($$textref =~ m/$rdel_outer/gc) { unless ($matched = ($closing && $1 eq $closing) ) { next if $1 eq '>'; # MIGHT BE A "LESS THAN" _failmsg q{Mismatched closing bracket at "} . substr($$textref,pos($$textref),20) . qq{...". Expected '$closing'}, pos $$textref; } last; } if (_match_variable($textref,qr/\G\s*/) || _match_quotelike($textref,qr/\G\s*/,$ref2slashvalid{$textref},$ref2qmarkvalid{$textref}) ) { $ref2slashvalid{$textref} = $ref2qmarkvalid{$textref} = 0; next; } if ($$textref =~ m#\G\s*(?!$ldel_inner)(?:$RE_PREREGEX_PAT|$RE_EXPR_PAT)#gc) { $ref2slashvalid{$textref} = $ref2qmarkvalid{$textref} = 1; next; } if ( _match_codeblock($textref, qr/\G\s*/, $ldel_inner, qr/\G\s*($rdel_inner)/, $ldel_inner, $rdel_inner, $rd, 1) ) { $ref2slashvalid{$textref} = $ref2qmarkvalid{$textref} = 1; next; } if ($$textref =~ m/\G\s*$ldel_outer/gc) { _failmsg q{Improperly nested codeblock at "} . substr($$textref,pos($$textref),20) . q{..."}, pos $$textref; last; } $ref2slashvalid{$textref} = $ref2qmarkvalid{$textref} = 0; $$textref =~ m/\G\s*(\w+|[-=>]>|.|\Z)/gc; } continue { $@ = undef } unless ($matched) { _failmsg 'No match found for opening bracket', pos $$textref unless $@; return; } $ref2slashvalid{$textref} = $ref2qmarkvalid{$textref} = undef; my $endpos = pos($$textref); return ( $startpos, $codepos-$startpos, $codepos, $endpos-$codepos, $endpos, length($$textref)-$endpos, ); } my %mods = ( 'none' => '[cgimsox]*', 'm' => '[cgimsox]*', 's' => '[cegimsox]*', 'tr' => '[cds]*', 'y' => '[cds]*', 'qq' => '', 'qx' => '', 'qw' => '', 'qr' => '[imsx]*', 'q' => '', ); sub extract_quotelike (;$$) { my $textref = $_[0] ? \$_[0] : \$_; $ref2slashvalid{$textref} = 1, $ref2qmarkvalid{$textref} = 0 if !pos($$textref); # reset my $wantarray = wantarray; my $pre = defined $_[1] ? qr/\G$_[1]/ : qr/\G\s*/; my @match = _match_quotelike($textref,$pre,$ref2slashvalid{$textref},$ref2qmarkvalid{$textref}); return _fail($wantarray, $textref) unless @match; return _succeed($wantarray, $textref, $match[2], $match[18]-$match[2], # MATCH @match[18,19], # REMAINDER @match[0,1], # PREFIX @match[2..17], # THE BITS @match[20,21], # ANY FILLET? ); }; my %maybe_quote = map +($_=>1), qw(" ' `); sub _match_quotelike { my ($textref, $pre, $allow_slash_match, $allow_qmark_match) = @_; $ref2slashvalid{$textref} = 1, $ref2qmarkvalid{$textref} = 0 if !pos($$textref) or !defined $ref2slashvalid{$textref}; # default, or reset my ($textlen,$startpos, $preld1pos,$ld1pos,$str1pos,$rd1pos, $preld2pos,$ld2pos,$str2pos,$rd2pos, $modpos) = ( length($$textref), pos($$textref) = pos($$textref) || 0 ); unless ($$textref =~ m/$pre/gc) { _failmsg qq{Did not find prefix /$pre/ at "} . substr($$textref, pos($$textref), 20) . q{..."}, pos $$textref; return; } my $oppos = pos($$textref); my $initial = substr($$textref,$oppos,1); if ($initial && $maybe_quote{$initial} || $allow_slash_match && $initial eq '/' || $allow_qmark_match && $initial eq '?') { unless ($$textref =~ m/\G \Q$initial\E [^\\$initial]* (\\.[^\\$initial]*)* \Q$initial\E /gcsx) { _failmsg qq{Did not find closing delimiter to match '$initial' at "} . substr($$textref, $oppos, 20) . q{..."}, pos $$textref; pos $$textref = $startpos; return; } $modpos= pos($$textref); $rd1pos = $modpos-1; if ($initial eq '/' || $initial eq '?') { $$textref =~ m/\G$mods{none}/gc } my $endpos = pos($$textref); $ref2qmarkvalid{$textref} = $ref2slashvalid{$textref} = 0; return ( $startpos, $oppos-$startpos, # PREFIX $oppos, 0, # NO OPERATOR $oppos, 1, # LEFT DEL $oppos+1, $rd1pos-$oppos-1, # STR/PAT $rd1pos, 1, # RIGHT DEL $modpos, 0, # NO 2ND LDEL $modpos, 0, # NO 2ND STR $modpos, 0, # NO 2ND RDEL $modpos, $endpos-$modpos, # MODIFIERS $endpos, $textlen-$endpos, # REMAINDER ); } unless ($$textref =~ m{\G(\b(?:m|s|qq|qx|qw|q|qr|tr|y)\b(?=\s*\S)|<<(?=[a-zA-Z]|\s*['"`;,]))}gc) { _failmsg q{No quotelike operator found after prefix at "} . substr($$textref, pos($$textref), 20) . q{..."}, pos $$textref; pos $$textref = $startpos; return; } my $op = $1; $preld1pos = pos($$textref); if ($op eq '<<') { $ld1pos = pos($$textref); my $label; if ($$textref =~ m{\G([A-Za-z_]\w*)}gc) { $label = $1; } elsif ($$textref =~ m{ \G ' ([^'\\]* (?:\\.[^'\\]*)*) ' | \G " ([^"\\]* (?:\\.[^"\\]*)*) " | \G ` ([^`\\]* (?:\\.[^`\\]*)*) ` }gcsx) { $label = $+; } else { $label = ""; } my $extrapos = pos($$textref); $$textref =~ m{.*\n}gc; $str1pos = pos($$textref)--; unless ($$textref =~ m{.*?\n(?=\Q$label\E\n)}gc) { _failmsg qq{Missing here doc terminator ('$label') after "} . substr($$textref, $startpos, 20) . q{..."}, pos $$textref; pos $$textref = $startpos; return; } $rd1pos = pos($$textref); $$textref =~ m{\Q$label\E\n}gc; $ld2pos = pos($$textref); $ref2qmarkvalid{$textref} = $ref2slashvalid{$textref} = 0; return ( $startpos, $oppos-$startpos, # PREFIX $oppos, length($op), # OPERATOR $ld1pos, $extrapos-$ld1pos, # LEFT DEL $str1pos, $rd1pos-$str1pos, # STR/PAT $rd1pos, $ld2pos-$rd1pos, # RIGHT DEL $ld2pos, 0, # NO 2ND LDEL $ld2pos, 0, # NO 2ND STR $ld2pos, 0, # NO 2ND RDEL $ld2pos, 0, # NO MODIFIERS $ld2pos, $textlen-$ld2pos, # REMAINDER $extrapos, $str1pos-$extrapos, # FILLETED BIT ); } $$textref =~ m/\G\s*/gc; $ld1pos = pos($$textref); $str1pos = $ld1pos+1; if ($$textref !~ m/\G(\S)/gc) # SHOULD USE LOOKAHEAD { _failmsg "No block delimiter found after quotelike $op", pos $$textref; pos $$textref = $startpos; return; } elsif (substr($$textref, $ld1pos, 2) eq '=>') { _failmsg "quotelike $op was actually quoted by '=>'", pos $$textref; pos $$textref = $startpos; return; } pos($$textref) = $ld1pos; # HAVE TO DO THIS BECAUSE LOOKAHEAD BROKEN my ($ldel1, $rdel1) = ("\Q$1","\Q$1"); if ($ldel1 =~ /[[(<{]/) { $rdel1 =~ tr/[({/; defined(_match_bracketed($textref,qr/\G/,qr/\G($ldel1)/,"","",qr/\G($rdel1)/)) || do { pos $$textref = $startpos; return }; $ld2pos = pos($$textref); $rd1pos = $ld2pos-1; } else { $$textref =~ /\G$ldel1[^\\$ldel1]*(\\.[^\\$ldel1]*)*$ldel1/gcs || do { pos $$textref = $startpos; return }; $ld2pos = $rd1pos = pos($$textref)-1; } my $second_arg = $op =~ /s|tr|y/ ? 1 : 0; if ($second_arg) { my ($ldel2, $rdel2); if ($ldel1 =~ /[[(<{]/) { unless ($$textref =~ /\G\s*(\S)/gc) # SHOULD USE LOOKAHEAD { _failmsg "Missing second block for quotelike $op", pos $$textref; pos $$textref = $startpos; return; } $ldel2 = $rdel2 = "\Q$1"; $rdel2 =~ tr/[({/; } else { $ldel2 = $rdel2 = $ldel1; } $str2pos = $ld2pos+1; if ($ldel2 =~ /[[(<{]/) { pos($$textref)--; # OVERCOME BROKEN LOOKAHEAD defined(_match_bracketed($textref,qr/\G/,qr/\G($ldel2)/,"","",qr/\G($rdel2)/)) || do { pos $$textref = $startpos; return }; } else { $$textref =~ /[^\\$ldel2]*(\\.[^\\$ldel2]*)*$ldel2/gcs || do { pos $$textref = $startpos; return }; } $rd2pos = pos($$textref)-1; } else { $ld2pos = $str2pos = $rd2pos = $rd1pos; } $modpos = pos $$textref; $$textref =~ m/\G($mods{$op})/gc; my $endpos = pos $$textref; $ref2qmarkvalid{$textref} = $ref2slashvalid{$textref} = undef; return ( $startpos, $oppos-$startpos, # PREFIX $oppos, length($op), # OPERATOR $ld1pos, 1, # LEFT DEL $str1pos, $rd1pos-$str1pos, # STR/PAT $rd1pos, 1, # RIGHT DEL $ld2pos, $second_arg, # 2ND LDEL (MAYBE) $str2pos, $rd2pos-$str2pos, # 2ND STR (MAYBE) $rd2pos, $second_arg, # 2ND RDEL (MAYBE) $modpos, $endpos-$modpos, # MODIFIERS $endpos, $textlen-$endpos, # REMAINDER ); } my $def_func = [ sub { extract_variable($_[0], '') }, sub { extract_quotelike($_[0],'') }, sub { extract_codeblock($_[0],'{}','') }, ]; my %ref_not_regex = map +($_=>1), qw(CODE Text::Balanced::Extractor); sub _update_patvalid { my ($textref, $text) = @_; if ($ref2slashvalid{$textref} && $text =~ m/(?:$RE_NUM|[\)\]])\s*$/) { $ref2slashvalid{$textref} = $ref2qmarkvalid{$textref} = 0; } elsif (!$ref2slashvalid{$textref} && $text =~ m/$RE_PREREGEX_PAT\s*$/) { $ref2slashvalid{$textref} = $ref2qmarkvalid{$textref} = 1; } elsif (!$ref2slashvalid{$textref} && $text =~ m/$RE_EXPR_PAT\s*$/) { $ref2slashvalid{$textref} = 1; $ref2qmarkvalid{$textref} = 0; } } sub extract_multiple (;$$$$) # ($text, $functions_ref, $max_fields, $ignoreunknown) { my $textref = defined($_[0]) ? \$_[0] : \$_; $ref2slashvalid{$textref} = 1, $ref2qmarkvalid{$textref} = 0 if !pos($$textref); # reset my $posbug = pos; my ($lastpos, $firstpos); my @fields = (); #for ($$textref) { my @func = defined $_[1] ? @{$_[1]} : @{$def_func}; my $max = defined $_[2] && $_[2]>0 ? $_[2] : 1_000_000_000; my $igunk = $_[3]; pos $$textref ||= 0; unless (wantarray) { use Carp; carp "extract_multiple reset maximal count to 1 in scalar context" if $^W && defined($_[2]) && $max > 1; $max = 1 } my @class; foreach my $func ( @func ) { push @class, undef; ($class[-1], $func) = %$func if ref($func) eq 'HASH'; $func = qr/\G$func/ if !$ref_not_regex{ref $func}; } my $unkpos; FIELD: while (pos($$textref) < length($$textref)) { foreach my $i ( 0..$#func ) { my ($field, $pref); my ($class, $func) = ($class[$i], $func[$i]); $lastpos = pos $$textref; if (ref($func) eq 'CODE') { ($field,undef,$pref) = $func->($$textref) } elsif (ref($func) eq 'Text::Balanced::Extractor') { $field = $func->extract($$textref) } elsif( $$textref =~ m/$func[$i]/gc ) { $field = defined($1) ? $1 : substr($$textref, $-[0], $+[0] - $-[0]) } $pref ||= ""; if (defined($field) && length($field)) { if (!$igunk) { $unkpos = $lastpos if length($pref) && !defined($unkpos); if (defined $unkpos) { push @fields, substr($$textref, $unkpos, $lastpos-$unkpos).$pref; $firstpos = $unkpos unless defined $firstpos; undef $unkpos; last FIELD if @fields == $max; } } push @fields, $class ? bless(\$field, $class) : $field; _update_patvalid($textref, $fields[-1]); $firstpos = $lastpos unless defined $firstpos; $lastpos = pos $$textref; last FIELD if @fields == $max; next FIELD; } } if ($$textref =~ /\G(.)/gcs) { $unkpos = pos($$textref)-1 unless $igunk || defined $unkpos; _update_patvalid($textref, substr $$textref, $unkpos, pos($$textref)-$unkpos); } } if (defined $unkpos) { push @fields, substr($$textref, $unkpos); $firstpos = $unkpos unless defined $firstpos; $lastpos = length $$textref; } last; } pos $$textref = $lastpos; return @fields if wantarray; $firstpos ||= 0; eval { substr($$textref,$firstpos,$lastpos-$firstpos)=""; pos $$textref = $firstpos }; return $fields[0]; } sub gen_extract_tagged # ($opentag, $closetag, $pre, \%options) { my $ldel = $_[0]; my $rdel = $_[1]; my $pre = defined $_[2] ? qr/\G$_[2]/ : qr/\G\s*/; my %options = defined $_[3] ? %{$_[3]} : (); my $omode = defined $options{fail} ? $options{fail} : ''; my $bad = ref($options{reject}) eq 'ARRAY' ? join('|', @{$options{reject}}) : defined($options{reject}) ? $options{reject} : '' ; my $ignore = ref($options{ignore}) eq 'ARRAY' ? join('|', @{$options{ignore}}) : defined($options{ignore}) ? $options{ignore} : '' ; $ldel = $et_default_ldel if !defined $ldel; my $posbug = pos; for ($ldel, $bad, $ignore) { $_ = qr/$_/ if $_ } pos = $posbug; my $closure = sub { my $textref = defined $_[0] ? \$_[0] : \$_; my @match = _match_tagged($textref, $pre, $ldel, $rdel, $omode, $bad, $ignore); return _fail(wantarray, $textref) unless @match; return _succeed wantarray, $textref, $match[2], $match[3]+$match[5]+$match[7], # MATCH @match[8..9,0..1,2..7]; # REM, PRE, BITS }; bless $closure, 'Text::Balanced::Extractor'; } package Text::Balanced::Extractor; sub extract($$) # ($self, $text) { &{$_[0]}($_[1]); } package Text::Balanced::ErrorMsg; use overload '""' => sub { "$_[0]->{error}, detected at offset $_[0]->{pos}" }, fallback => 1; 1; __END__ =pod =head1 NAME Text::Balanced - Extract delimited text sequences from strings. =head1 SYNOPSIS use Text::Balanced qw ( extract_delimited extract_bracketed extract_quotelike extract_codeblock extract_variable extract_tagged extract_multiple gen_delimited_pat gen_extract_tagged ); # Extract the initial substring of $text that is delimited by # two (unescaped) instances of the first character in $delim. ($extracted, $remainder) = extract_delimited($text,$delim); # Extract the initial substring of $text that is bracketed # with a delimiter(s) specified by $delim (where the string # in $delim contains one or more of '(){}[]<>'). ($extracted, $remainder) = extract_bracketed($text,$delim); # Extract the initial substring of $text that is bounded by # an XML tag. ($extracted, $remainder) = extract_tagged($text); # Extract the initial substring of $text that is bounded by # a C...C pair. Don't allow nested C tags ($extracted, $remainder) = extract_tagged($text,"BEGIN","END",undef,{bad=>["BEGIN"]}); # Extract the initial substring of $text that represents a # Perl "quote or quote-like operation" ($extracted, $remainder) = extract_quotelike($text); # Extract the initial substring of $text that represents a block # of Perl code, bracketed by any of character(s) specified by $delim # (where the string $delim contains one or more of '(){}[]<>'). ($extracted, $remainder) = extract_codeblock($text,$delim); # Extract the initial substrings of $text that would be extracted by # one or more sequential applications of the specified functions # or regular expressions @extracted = extract_multiple($text, [ \&extract_bracketed, \&extract_quotelike, \&some_other_extractor_sub, qr/[xyz]*/, 'literal', ]); # Create a string representing an optimized pattern (a la Friedl) # that matches a substring delimited by any of the specified characters # (in this case: any type of quote or a slash) $patstring = gen_delimited_pat(q{'"`/}); # Generate a reference to an anonymous sub that is just like extract_tagged # but pre-compiled and optimized for a specific pair of tags, and # consequently much faster (i.e. 3 times faster). It uses qr// for better # performance on repeated calls. $extract_head = gen_extract_tagged('',''); ($extracted, $remainder) = $extract_head->($text); =head1 DESCRIPTION The various C subroutines may be used to extract a delimited substring, possibly after skipping a specified prefix string. By default, that prefix is optional whitespace (C), but you can change it to whatever you wish (see below). The substring to be extracted must appear at the current C location of the string's variable (or at index zero, if no C position is defined). In other words, the C subroutines I extract the first occurrence of a substring anywhere in a string (like an unanchored regex would). Rather, they extract an occurrence of the substring appearing immediately at the current matching position in the string (like a C<\G>-anchored regex would). =head2 General Behaviour in List Contexts In a list context, all the subroutines return a list, the first three elements of which are always: =over 4 =item [0] The extracted string, including the specified delimiters. If the extraction fails C is returned. =item [1] The remainder of the input string (i.e. the characters after the extracted string). On failure, the entire string is returned. =item [2] The skipped prefix (i.e. the characters before the extracted string). On failure, C is returned. =back Note that in a list context, the contents of the original input text (the first argument) are not modified in any way. However, if the input text was passed in a variable, that variable's C value is updated to point at the first character after the extracted text. That means that in a list context the various subroutines can be used much like regular expressions. For example: while ( $next = (extract_quotelike($text))[0] ) { # process next quote-like (in $next) } =head2 General Behaviour in Scalar and Void Contexts In a scalar context, the extracted string is returned, having first been removed from the input text. Thus, the following code also processes each quote-like operation, but actually removes them from $text: while ( $next = extract_quotelike($text) ) { # process next quote-like (in $next) } Note that if the input text is a read-only string (i.e. a literal), no attempt is made to remove the extracted text. In a void context the behaviour of the extraction subroutines is exactly the same as in a scalar context, except (of course) that the extracted substring is not returned. =head2 A Note About Prefixes Prefix patterns are matched without any trailing modifiers (C etc.) This can bite you if you're expecting a prefix specification like '.*?(?=

)' to skip everything up to the first

tag. Such a prefix pattern will only succeed if the

tag is on the current line, since . normally doesn't match newlines. To overcome this limitation, you need to turn on /s matching within the prefix pattern, using the C<(?s)> directive: '(?s).*?(?=

)' =head2 Functions =over 4 =item C The C function formalizes the common idiom of extracting a single-character-delimited substring from the start of a string. For example, to extract a single-quote delimited string, the following code is typically used: ($remainder = $text) =~ s/\A('(\\.|[^'])*')//s; $extracted = $1; but with C it can be simplified to: ($extracted,$remainder) = extract_delimited($text, "'"); C takes up to four scalars (the input text, the delimiters, a prefix pattern to be skipped, and any escape characters) and extracts the initial substring of the text that is appropriately delimited. If the delimiter string has multiple characters, the first one encountered in the text is taken to delimit the substring. The third argument specifies a prefix pattern that is to be skipped (but must be present!) before the substring is extracted. The final argument specifies the escape character to be used for each delimiter. All arguments are optional. If the escape characters are not specified, every delimiter is escaped with a backslash (C<\>). If the prefix is not specified, the pattern C<'\s*'> - optional whitespace - is used. If the delimiter set is also not specified, the set C is used. If the text to be processed is not specified either, C<$_> is used. In list context, C returns a array of three elements, the extracted substring (I), the remainder of the text, and the skipped prefix (if any). If a suitable delimited substring is not found, the first element of the array is the empty string, the second is the complete original text, and the prefix returned in the third element is an empty string. In a scalar context, just the extracted substring is returned. In a void context, the extracted substring (and any prefix) are simply removed from the beginning of the first argument. Examples: # Remove a single-quoted substring from the very beginning of $text: $substring = extract_delimited($text, "'", ''); # Remove a single-quoted Pascalish substring (i.e. one in which # doubling the quote character escapes it) from the very # beginning of $text: $substring = extract_delimited($text, "'", '', "'"); # Extract a single- or double- quoted substring from the # beginning of $text, optionally after some whitespace # (note the list context to protect $text from modification): ($substring) = extract_delimited $text, q{"'}; # Delete the substring delimited by the first '/' in $text: $text = join '', (extract_delimited($text,'/','[^/]*')[2,1]; Note that this last example is I the same as deleting the first quote-like pattern. For instance, if C<$text> contained the string: "if ('./cmd' =~ m/$UNIXCMD/s) { $cmd = $1; }" then after the deletion it would contain: "if ('.$UNIXCMD/s) { $cmd = $1; }" not: "if ('./cmd' =~ ms) { $cmd = $1; }" See L<"extract_quotelike"> for a (partial) solution to this problem. =item C Like C<"extract_delimited">, the C function takes up to three optional scalar arguments: a string to extract from, a delimiter specifier, and a prefix pattern. As before, a missing prefix defaults to optional whitespace and a missing text defaults to C<$_>. However, a missing delimiter specifier defaults to C<'{}()[]EE'> (see below). C extracts a balanced-bracket-delimited substring (using any one (or more) of the user-specified delimiter brackets: '(..)', '{..}', '[..]', or '<..>'). Optionally it will also respect quoted unbalanced brackets (see below). A "delimiter bracket" is a bracket in list of delimiters passed as C's second argument. Delimiter brackets are specified by giving either the left or right (or both!) versions of the required bracket(s). Note that the order in which two or more delimiter brackets are specified is not significant. A "balanced-bracket-delimited substring" is a substring bounded by matched brackets, such that any other (left or right) delimiter bracket I the substring is also matched by an opposite (right or left) delimiter bracket I. Any type of bracket not in the delimiter list is treated as an ordinary character. In other words, each type of bracket specified as a delimiter must be balanced and correctly nested within the substring, and any other kind of ("non-delimiter") bracket in the substring is ignored. For example, given the string: $text = "{ an '[irregularly :-(] {} parenthesized >:-)' string }"; then a call to C in a list context: @result = extract_bracketed( $text, '{}' ); would return: ( "{ an '[irregularly :-(] {} parenthesized >:-)' string }" , "" , "" ) since both sets of C<'{..}'> brackets are properly nested and evenly balanced. (In a scalar context just the first element of the array would be returned. In a void context, C<$text> would be replaced by an empty string.) Likewise the call in: @result = extract_bracketed( $text, '{[' ); would return the same result, since all sets of both types of specified delimiter brackets are correctly nested and balanced. However, the call in: @result = extract_bracketed( $text, '{([<' ); would fail, returning: ( undef , "{ an '[irregularly :-(] {} parenthesized >:-)' string }" ); because the embedded pairs of C<'(..)'>s and C<'[..]'>s are "cross-nested" and the embedded C<'E'> is unbalanced. (In a scalar context, this call would return an empty string. In a void context, C<$text> would be unchanged.) Note that the embedded single-quotes in the string don't help in this case, since they have not been specified as acceptable delimiters and are therefore treated as non-delimiter characters (and ignored). However, if a particular species of quote character is included in the delimiter specification, then that type of quote will be correctly handled. for example, if C<$text> is: $text = 'link'; then @result = extract_bracketed( $text, '<">' ); returns: ( '', 'link', "" ) as expected. Without the specification of C<"> as an embedded quoter: @result = extract_bracketed( $text, '<>' ); the result would be: ( 'link', "" ) In addition to the quote delimiters C<'>, C<">, and C<`>, full Perl quote-like quoting (i.e. q{string}, qq{string}, etc) can be specified by including the letter 'q' as a delimiter. Hence: @result = extract_bracketed( $text, '' ); would correctly match something like this: $text = ''; See also: C<"extract_quotelike"> and C<"extract_codeblock">. =item C C extracts any valid Perl variable or variable-involved expression, including scalars, arrays, hashes, array accesses, hash look-ups, method calls through objects, subroutine calls through subroutine references, etc. The subroutine takes up to two optional arguments: =over 4 =item 1. A string to be processed (C<$_> if the string is omitted or C) =item 2. A string specifying a pattern to be matched as a prefix (which is to be skipped). If omitted, optional whitespace is skipped. =back On success in a list context, an array of 3 elements is returned. The elements are: =over 4 =item [0] the extracted variable, or variablish expression =item [1] the remainder of the input text, =item [2] the prefix substring (if any), =back On failure, all of these values (except the remaining text) are C. In a scalar context, C returns just the complete substring that matched a variablish expression. C is returned on failure. In addition, the original input text has the returned substring (and any prefix) removed from it. In a void context, the input text just has the matched substring (and any specified prefix) removed. =item C C extracts and segments text between (balanced) specified tags. The subroutine takes up to five optional arguments: =over 4 =item 1. A string to be processed (C<$_> if the string is omitted or C) =item 2. A string specifying a pattern (i.e. regex) to be matched as the opening tag. If the pattern string is omitted (or C) then a pattern that matches any standard XML tag is used. =item 3. A string specifying a pattern to be matched at the closing tag. If the pattern string is omitted (or C) then the closing tag is constructed by inserting a C after any leading bracket characters in the actual opening tag that was matched (I the pattern that matched the tag). For example, if the opening tag pattern is specified as C<'{{\w+}}'> and actually matched the opening tag C<"{{DATA}}">, then the constructed closing tag would be C<"{{/DATA}}">. =item 4. A string specifying a pattern to be matched as a prefix (which is to be skipped). If omitted, optional whitespace is skipped. =item 5. A hash reference containing various parsing options (see below) =back The various options that can be specified are: =over 4 =item C $listref> The list reference contains one or more strings specifying patterns that must I appear within the tagged text. For example, to extract an HTML link (which should not contain nested links) use: extract_tagged($text, '', '', undef, {reject => ['']} ); =item C $listref> The list reference contains one or more strings specifying patterns that are I to be treated as nested tags within the tagged text (even if they would match the start tag pattern). For example, to extract an arbitrary XML tag, but ignore "empty" elements: extract_tagged($text, undef, undef, undef, {ignore => ['<[^>]*/>']} ); (also see L<"gen_delimited_pat"> below). =item C $str> The C option indicates the action to be taken if a matching end tag is not encountered (i.e. before the end of the string or some C pattern matches). By default, a failure to match a closing tag causes C to immediately fail. However, if the string value associated with is "MAX", then C returns the complete text up to the point of failure. If the string is "PARA", C returns only the first paragraph after the tag (up to the first line that is either empty or contains only whitespace characters). If the string is "", the default behaviour (i.e. failure) is reinstated. For example, suppose the start tag "/para" introduces a paragraph, which then continues until the next "/endpara" tag or until another "/para" tag is encountered: $text = "/para line 1\n\nline 3\n/para line 4"; extract_tagged($text, '/para', '/endpara', undef, {reject => '/para', fail => MAX ); # EXTRACTED: "/para line 1\n\nline 3\n" Suppose instead, that if no matching "/endpara" tag is found, the "/para" tag refers only to the immediately following paragraph: $text = "/para line 1\n\nline 3\n/para line 4"; extract_tagged($text, '/para', '/endpara', undef, {reject => '/para', fail => MAX ); # EXTRACTED: "/para line 1\n" Note that the specified C behaviour applies to nested tags as well. =back On success in a list context, an array of 6 elements is returned. The elements are: =over 4 =item [0] the extracted tagged substring (including the outermost tags), =item [1] the remainder of the input text, =item [2] the prefix substring (if any), =item [3] the opening tag =item [4] the text between the opening and closing tags =item [5] the closing tag (or "" if no closing tag was found) =back On failure, all of these values (except the remaining text) are C. In a scalar context, C returns just the complete substring that matched a tagged text (including the start and end tags). C is returned on failure. In addition, the original input text has the returned substring (and any prefix) removed from it. In a void context, the input text just has the matched substring (and any specified prefix) removed. =item C C generates a new anonymous subroutine which extracts text between (balanced) specified tags. In other words, it generates a function identical in function to C. The difference between C and the anonymous subroutines generated by C, is that those generated subroutines: =over 4 =item * do not have to reparse tag specification or parsing options every time they are called (whereas C has to effectively rebuild its tag parser on every call); =item * make use of the new qr// construct to pre-compile the regexes they use (whereas C uses standard string variable interpolation to create tag-matching patterns). =back The subroutine takes up to four optional arguments (the same set as C except for the string to be processed). It returns a reference to a subroutine which in turn takes a single argument (the text to be extracted from). In other words, the implementation of C is exactly equivalent to: sub extract_tagged { my $text = shift; $extractor = gen_extract_tagged(@_); return $extractor->($text); } (although C is not currently implemented that way). Using C to create extraction functions for specific tags is a good idea if those functions are going to be called more than once, since their performance is typically twice as good as the more general-purpose C. =item C C attempts to recognize, extract, and segment any one of the various Perl quotes and quotelike operators (see L) Nested backslashed delimiters, embedded balanced bracket delimiters (for the quotelike operators), and trailing modifiers are all caught. For example, in: extract_quotelike 'q # an octothorpe: \# (not the end of the q!) #' extract_quotelike ' "You said, \"Use sed\"." ' extract_quotelike ' s{([A-Z]{1,8}\.[A-Z]{3})} /\L$1\E/; ' extract_quotelike ' tr/\\\/\\\\/\\\//ds; ' the full Perl quotelike operations are all extracted correctly. Note too that, when using the /x modifier on a regex, any comment containing the current pattern delimiter will cause the regex to be immediately terminated. In other words: 'm / (?i) # CASE INSENSITIVE [a-z_] # LEADING ALPHABETIC/UNDERSCORE [a-z0-9]* # FOLLOWED BY ANY NUMBER OF ALPHANUMERICS /x' will be extracted as if it were: 'm / (?i) # CASE INSENSITIVE [a-z_] # LEADING ALPHABETIC/' This behaviour is identical to that of the actual compiler. C takes two arguments: the text to be processed and a prefix to be matched at the very beginning of the text. If no prefix is specified, optional whitespace is the default. If no text is given, C<$_> is used. In a list context, an array of 11 elements is returned. The elements are: =over 4 =item [0] the extracted quotelike substring (including trailing modifiers), =item [1] the remainder of the input text, =item [2] the prefix substring (if any), =item [3] the name of the quotelike operator (if any), =item [4] the left delimiter of the first block of the operation, =item [5] the text of the first block of the operation (that is, the contents of a quote, the regex of a match or substitution or the target list of a translation), =item [6] the right delimiter of the first block of the operation, =item [7] the left delimiter of the second block of the operation (that is, if it is a C, C, or C), =item [8] the text of the second block of the operation (that is, the replacement of a substitution or the translation list of a translation), =item [9] the right delimiter of the second block of the operation (if any), =item [10] the trailing modifiers on the operation (if any). =back For each of the fields marked "(if any)" the default value on success is an empty string. On failure, all of these values (except the remaining text) are C. In a scalar context, C returns just the complete substring that matched a quotelike operation (or C on failure). In a scalar or void context, the input text has the same substring (and any specified prefix) removed. Examples: # Remove the first quotelike literal that appears in text $quotelike = extract_quotelike($text,'.*?'); # Replace one or more leading whitespace-separated quotelike # literals in $_ with "" do { $_ = join '', (extract_quotelike)[2,1] } until $@; # Isolate the search pattern in a quotelike operation from $text ($op,$pat) = (extract_quotelike $text)[3,5]; if ($op =~ /[ms]/) { print "search pattern: $pat\n"; } else { print "$op is not a pattern matching operation\n"; } =item C C can successfully extract "here documents" from an input string, but with an important caveat in list contexts. Unlike other types of quote-like literals, a here document is rarely a contiguous substring. For example, a typical piece of code using here document might look like this: <<'EOMSG' || die; This is the message. EOMSG exit; Given this as an input string in a scalar context, C would correctly return the string "<<'EOMSG'\nThis is the message.\nEOMSG", leaving the string " || die;\nexit;" in the original variable. In other words, the two separate pieces of the here document are successfully extracted and concatenated. In a list context, C would return the list =over 4 =item [0] "<<'EOMSG'\nThis is the message.\nEOMSG\n" (i.e. the full extracted here document, including fore and aft delimiters), =item [1] " || die;\nexit;" (i.e. the remainder of the input text, concatenated), =item [2] "" (i.e. the prefix substring -- trivial in this case), =item [3] "<<" (i.e. the "name" of the quotelike operator) =item [4] "'EOMSG'" (i.e. the left delimiter of the here document, including any quotes), =item [5] "This is the message.\n" (i.e. the text of the here document), =item [6] "EOMSG" (i.e. the right delimiter of the here document), =item [7..10] "" (a here document has no second left delimiter, second text, second right delimiter, or trailing modifiers). =back However, the matching position of the input variable would be set to "exit;" (i.e. I the closing delimiter of the here document), which would cause the earlier " || die;\nexit;" to be skipped in any sequence of code fragment extractions. To avoid this problem, when it encounters a here document whilst extracting from a modifiable string, C silently rearranges the string to an equivalent piece of Perl: <<'EOMSG' This is the message. EOMSG || die; exit; in which the here document I contiguous. It still leaves the matching position after the here document, but now the rest of the line on which the here document starts is not skipped. To prevent from mucking about with the input in this way (this is the only case where a list-context C does so), you can pass the input variable as an interpolated literal: $quotelike = extract_quotelike("$var"); =item C C attempts to recognize and extract a balanced bracket delimited substring that may contain unbalanced brackets inside Perl quotes or quotelike operations. That is, C is like a combination of C<"extract_bracketed"> and C<"extract_quotelike">. C takes the same initial three parameters as C: a text to process, a set of delimiter brackets to look for, and a prefix to match first. It also takes an optional fourth parameter, which allows the outermost delimiter brackets to be specified separately (see below), and a fifth parameter used only by L. Omitting the first argument (input text) means process C<$_> instead. Omitting the second argument (delimiter brackets) indicates that only C<'{'> is to be used. Omitting the third argument (prefix argument) implies optional whitespace at the start. Omitting the fourth argument (outermost delimiter brackets) indicates that the value of the second argument is to be used for the outermost delimiters. Once the prefix and the outermost opening delimiter bracket have been recognized, code blocks are extracted by stepping through the input text and trying the following alternatives in sequence: =over 4 =item 1. Try and match a closing delimiter bracket. If the bracket was the same species as the last opening bracket, return the substring to that point. If the bracket was mismatched, return an error. =item 2. Try to match a quote or quotelike operator. If found, call C to eat it. If C fails, return the error it returned. Otherwise go back to step 1. =item 3. Try to match an opening delimiter bracket. If found, call C recursively to eat the embedded block. If the recursive call fails, return an error. Otherwise, go back to step 1. =item 4. Unconditionally match a bareword or any other single character, and then go back to step 1. =back Examples: # Find a while loop in the text if ($text =~ s/.*?while\s*\{/{/) { $loop = "while " . extract_codeblock($text); } # Remove the first round-bracketed list (which may include # round- or curly-bracketed code blocks or quotelike operators) extract_codeblock $text, "(){}", '[^(]*'; The ability to specify a different outermost delimiter bracket is useful in some circumstances. For example, in the Parse::RecDescent module, parser actions which are to be performed only on a successful parse are specified using a Cdefer:...E> directive. For example: sentence: subject verb object Parse::RecDescent uses CE')> to extract the code within the Cdefer:...E> directive, but there's a problem. A deferred action like this: 10) {$count--}} > will be incorrectly parsed as: because the "less than" operator is interpreted as a closing delimiter. But, by extracting the directive using SE')>> the '>' character is only treated as a delimited at the outermost level of the code block, so the directive is parsed correctly. =item C The C subroutine takes a string to be processed and a list of extractors (subroutines or regular expressions) to apply to that string. In an array context C returns an array of substrings of the original string, as extracted by the specified extractors. In a scalar context, C returns the first substring successfully extracted from the original string. In both scalar and void contexts the original string has the first successfully extracted substring removed from it. In all contexts C starts at the current C of the string, and sets that C appropriately after it matches. Hence, the aim of a call to C in a list context is to split the processed string into as many non-overlapping fields as possible, by repeatedly applying each of the specified extractors to the remainder of the string. Thus C is a generalized form of Perl's C subroutine. The subroutine takes up to four optional arguments: =over 4 =item 1. A string to be processed (C<$_> if the string is omitted or C) =item 2. A reference to a list of subroutine references and/or qr// objects and/or literal strings and/or hash references, specifying the extractors to be used to split the string. If this argument is omitted (or C) the list: [ sub { extract_variable($_[0], '') }, sub { extract_quotelike($_[0],'') }, sub { extract_codeblock($_[0],'{}','') }, ] is used. =item 3. An number specifying the maximum number of fields to return. If this argument is omitted (or C), split continues as long as possible. If the third argument is I, then extraction continues until I fields have been successfully extracted, or until the string has been completely processed. Note that in scalar and void contexts the value of this argument is automatically reset to 1 (under C<-w>, a warning is issued if the argument has to be reset). =item 4. A value indicating whether unmatched substrings (see below) within the text should be skipped or returned as fields. If the value is true, such substrings are skipped. Otherwise, they are returned. =back The extraction process works by applying each extractor in sequence to the text string. If the extractor is a subroutine it is called in a list context and is expected to return a list of a single element, namely the extracted text. It may optionally also return two further arguments: a string representing the text left after extraction (like $' for a pattern match), and a string representing any prefix skipped before the extraction (like $` in a pattern match). Note that this is designed to facilitate the use of other Text::Balanced subroutines with C. Note too that the value returned by an extractor subroutine need not bear any relationship to the corresponding substring of the original text (see examples below). If the extractor is a precompiled regular expression or a string, it is matched against the text in a scalar context with a leading '\G' and the gc modifiers enabled. The extracted value is either $1 if that variable is defined after the match, or else the complete match (i.e. $&). If the extractor is a hash reference, it must contain exactly one element. The value of that element is one of the above extractor types (subroutine reference, regular expression, or string). The key of that element is the name of a class into which the successful return value of the extractor will be blessed. If an extractor returns a defined value, that value is immediately treated as the next extracted field and pushed onto the list of fields. If the extractor was specified in a hash reference, the field is also blessed into the appropriate class, If the extractor fails to match (in the case of a regex extractor), or returns an empty list or an undefined value (in the case of a subroutine extractor), it is assumed to have failed to extract. If none of the extractor subroutines succeeds, then one character is extracted from the start of the text and the extraction subroutines reapplied. Characters which are thus removed are accumulated and eventually become the next field (unless the fourth argument is true, in which case they are discarded). For example, the following extracts substrings that are valid Perl variables: @fields = extract_multiple($text, [ sub { extract_variable($_[0]) } ], undef, 1); This example separates a text into fields which are quote delimited, curly bracketed, and anything else. The delimited and bracketed parts are also blessed to identify them (the "anything else" is unblessed): @fields = extract_multiple($text, [ { Delim => sub { extract_delimited($_[0],q{'"}) } }, { Brack => sub { extract_bracketed($_[0],'{}') } }, ]); This call extracts the next single substring that is a valid Perl quotelike operator (and removes it from $text): $quotelike = extract_multiple($text, [ sub { extract_quotelike($_[0]) }, ], undef, 1); Finally, here is yet another way to do comma-separated value parsing: $csv_text = "a,'x b',c"; @fields = extract_multiple($csv_text, [ sub { extract_delimited($_[0],q{'"}) }, qr/([^,]+)/, ], undef,1); # @fields is now ('a', "'x b'", 'c') The list in the second argument means: I<"Try and extract a ' or " delimited string, otherwise extract anything up to a comma...">. The undef third argument means: I<"...as many times as possible...">, and the true value in the fourth argument means I<"...discarding anything else that appears (i.e. the commas)">. If you wanted the commas preserved as separate fields (i.e. like split does if your split pattern has capturing parentheses), you would just make the last parameter undefined (or remove it). =item C The C subroutine takes a single (string) argument and builds a Friedl-style optimized regex that matches a string delimited by any one of the characters in the single argument. For example: gen_delimited_pat(q{'"}) returns the regex: (?:\"(?:\\\"|(?!\").)*\"|\'(?:\\\'|(?!\').)*\') Note that the specified delimiters are automatically quotemeta'd. A typical use of C would be to build special purpose tags for C. For example, to properly ignore "empty" XML elements (which might contain quoted strings): my $empty_tag = '<(' . gen_delimited_pat(q{'"}) . '|.)+/>'; extract_tagged($text, undef, undef, undef, {ignore => [$empty_tag]} ); C may also be called with an optional second argument, which specifies the "escape" character(s) to be used for each delimiter. For example to match a Pascal-style string (where ' is the delimiter and '' is a literal ' within the string): gen_delimited_pat(q{'},q{'}); Different escape characters can be specified for different delimiters. For example, to specify that '/' is the escape for single quotes and '%' is the escape for double quotes: gen_delimited_pat(q{'"},q{/%}); If more delimiters than escape chars are specified, the last escape char is used for the remaining delimiters. If no escape char is specified for a given specified delimiter, '\' is used. =item C Note that C was previously called C. That name may still be used, but is now deprecated. =back =head1 DIAGNOSTICS In a list context, all the functions return C<(undef,$original_text)> on failure. In a scalar context, failure is indicated by returning C (in this case the input text is not modified in any way). In addition, on failure in I context, the C<$@> variable is set. Accessing C<$@-E{error}> returns one of the error diagnostics listed below. Accessing C<$@-E{pos}> returns the offset into the original string at which the error was detected (although not necessarily where it occurred!) Printing C<$@> directly produces the error message, with the offset appended. On success, the C<$@> variable is guaranteed to be C. The available diagnostics are: =over 4 =item C The delimiter provided to C was not one of C<'()[]EE{}'>. =item C A non-optional prefix was specified but wasn't found at the start of the text. =item C C or C was expecting a particular kind of bracket at the start of the text, and didn't find it. =item C C didn't find one of the quotelike operators C, C, C, C, C, C or C at the start of the substring it was extracting. =item C C, C or C encountered a closing bracket where none was expected. =item C C, C or C ran out of characters in the text before closing one or more levels of nested brackets. =item C C attempted to match an embedded quoted substring, but failed to find a closing quote to match it. =item C C was unable to find a closing delimiter to match the one that opened the quote-like operation. =item C C, C or C found a valid bracket delimiter, but it was the wrong species. This usually indicates a nesting error, but may indicate incorrect quoting or escaping. =item C C or C found one of the quotelike operators C, C, C, C, C, C or C without a suitable block after it. =item C C was expecting one of '$', '@', or '%' at the start of a variable, but didn't find any of them. =item C C found a '$', '@', or '%' indicating a variable, but that character was not followed by a legal Perl identifier. =item C C failed to find any of the outermost opening brackets that were specified. =item C A nested code block was found that started with a delimiter that was specified as being only to be used as an outermost bracket. =item C C or C found one of the quotelike operators C, C or C followed by only one block. =item C C failed to find a closing bracket to match the outermost opening bracket. =item C C did not find a suitable opening tag (after any specified prefix was removed). =item C C matched the specified opening tag and tried to modify the matched text to produce a matching closing tag (because none was specified). It failed to generate the closing tag, almost certainly because the opening tag did not start with a bracket of some kind. =item C C found a nested tag that appeared in the "reject" list (and the failure mode was not "MAX" or "PARA"). =item C C found a nested opening tag that was not matched by a corresponding nested closing tag (and the failure mode was not "MAX" or "PARA"). =item C C reached the end of the text without finding a closing tag to match the original opening tag (and the failure mode was not "MAX" or "PARA"). =back =head1 EXPORTS The following symbols are, or can be, exported by this module: =over 4 =item Default Exports I. =item Optional Exports C, C, C, C, C, C, C, C, C, C. =item Export Tags =over 4 =item C<:ALL> C, C, C, C, C, C, C, C, C, C. =back =back =head1 KNOWN BUGS See L. =head1 FEEDBACK Patches, bug reports, suggestions or any other feedback is welcome. Patches can be sent as GitHub pull requests at L. Bug reports and suggestions can be made on the CPAN Request Tracker at L. Currently active requests on the CPAN Request Tracker can be viewed at L. Please test this distribution. See CPAN Testers Reports at L for details of how to get involved. Previous test results on CPAN Testers Reports can be viewed at L. Please rate this distribution on CPAN Ratings at L. =head1 AVAILABILITY The latest version of this module is available from CPAN (see L for details) at L or L or L. The latest source code is available from GitHub at L. =head1 INSTALLATION See the F file. =head1 AUTHOR Damian Conway ELE. Steve Hay ELE is now maintaining Text::Balanced as of version 2.03. =head1 COPYRIGHT Copyright (C) 1997-2001 Damian Conway. All rights reserved. Copyright (C) 2009 Adam Kennedy. Copyright (C) 2015, 2020, 2022 Steve Hay and other contributors. All rights reserved. =head1 LICENCE This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself, i.e. under the terms of either the GNU General Public License or the Artistic License, as specified in the F file. =head1 VERSION Version 2.06 =head1 DATE 05 Jun 2022 =head1 HISTORY See the F file. =cut Text-Balanced-2.06/LICENCE0000644000104006017510000000223414247103162015754 0ustar AdministratorssteveThis distribution is free software; you can redistribute it and/or modify it under the terms of either: a) the GNU General Public License as published by the Free Software Foundation; either version 1, or (at your option) any later version; or b) the "Artistic License" which comes with this distribution. This distribution is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See either the GNU General Public License or the Artistic License for more details. You should have received a copy of the GNU General Public License along with this distribution in the file named "Copying". If not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA or visit their web page on the internet at https://www.gnu.org/copyleft/gpl.html or the Perl web page at https://dev.perl.org/licenses/gpl1.html. You should also have received a copy of the Artistic License with this distribution, in the file named "Artistic". If not, visit the Perl web page on the internet at https://dev.perl.org/licenses/artistic.html. Text-Balanced-2.06/Makefile.PL0000644000104006017510000001127214247103162016743 0ustar Administratorssteve#!perl #=============================================================================== # # Makefile.PL # # DESCRIPTION # Makefile creation script. # # COPYRIGHT # Copyright (C) 2015, 2020 Steve Hay. All rights reserved. # # LICENCE # This script is free software; you can redistribute it and/or modify it under # the same terms as Perl itself, i.e. under the terms of either the GNU # General Public License or the Artistic License, as specified in the LICENCE # file. # #=============================================================================== use 5.008001; use strict; use warnings; use ExtUtils::MakeMaker; use ExtUtils::MakeMaker qw(WriteMakefile); #=============================================================================== # MAIN PROGRAM #=============================================================================== MAIN: { WriteMakefile( NAME => 'Text::Balanced', ABSTRACT_FROM => 'lib/Text/Balanced.pm', AUTHOR => 'Damian Conway , Adam Kennedy , Steve Hay ', LICENSE => 'perl_5', VERSION_FROM => 'lib/Text/Balanced.pm', META_MERGE => { 'meta-spec' => { version => 2 }, resources => { repository => { type => 'git', web => 'https://github.com/steve-m-hay/Text-Balanced' } }, optional_features => { changestest => { description => 'Changes testing', prereqs => { test => { requires => { 'Test::CPAN::Changes' => '0' } } } }, critictest => { description => 'Perl::Critic testing', prereqs => { test => { requires => { 'Test::Perl::Critic' => '0' } } } }, metatest => { description => 'META testing', prereqs => { test => { requires => { 'Test::CPAN::Meta' => '0.12' } } } }, pmvtest => { description => 'Perl minimum version testing', prereqs => { test => { requires => { 'Perl::MinimumVersion' => '1.20', 'Test::MinimumVersion' => '0.101082' } } } }, podtest => { description => 'POD testing', prereqs => { test => { requires => { 'Pod::Simple' => '3.07', 'Test::Pod' => '1.26' } } } }, podcoveragetest => { description => 'POD coverage testing', prereqs => { test => { requires => { 'Test::Pod::Coverage' => '0.08' } } } } } }, MIN_PERL_VERSION => '5.008001', CONFIGURE_REQUIRES => { 'ExtUtils::MakeMaker' => '6.64', 'perl' => '5.008001', 'strict' => '0', 'warnings' => '0' }, TEST_REQUIRES => { 'Test::More' => '0.88', # done_testing 'vars' => '0' }, PREREQ_PM => { 'Carp' => '0', 'Exporter' => '0', 'overload' => '0', 'strict' => '0', 'vars' => '0' }, INSTALLDIRS => ($] < 5.011 ? 'perl' : 'site'), dist => { PREOP => 'find $(DISTVNAME) -type d -print|xargs chmod 0755 && ' . 'find $(DISTVNAME) -type f -print|xargs chmod 0644', TO_UNIX => 'find $(DISTVNAME) -type f -print|xargs dos2unix' } ); } #=============================================================================== Text-Balanced-2.06/MANIFEST0000644000104006017510000000311014247103162016112 0ustar AdministratorssteveArtistic The "Artistic License" Changes Differences from previous version Copying The GNU General Public License INSTALL Detailed installation instructions lib/Text/Balanced.pm Text::Balanced Perl module LICENCE The Licence Makefile.PL Makefile writer MANIFEST This list of files MANIFEST.SKIP Manifest skip specs README The Instructions t/01_compile.t Test script t/02_extbrk.t Test script t/03_extcbk.t Test script t/04_extdel.t Test script t/05_extmul.t Test script t/06_extqlk.t Test script t/07_exttag.t Test script t/08_extvar.t Test script t/09_gentag.t Test script t/94_changes.t See if Changes file format is OK t/95_critic.t See if coding style is OK t/96_pmv.t Test script t/97_pod.t See if POD is OK t/98_pod_coverage.t See if POD coverage is OK META.yml Module YAML meta-data (added by MakeMaker) META.json Module JSON meta-data (added by MakeMaker) Text-Balanced-2.06/MANIFEST.SKIP0000644000104006017510000000044314247103162016665 0ustar Administratorssteve# Source control system files ^\.git/ ^\.gitignore$ # Files generated by Makefile.PL ^Makefile$ ^MYMETA\.json$ ^MYMETA\.yml$ # Files generated by *make ^blib/ ^pm_to_blib$ # Files generated by *make clean ^Makefile\.old$ # Files generated by *make dist ^Text-Balanced-\d\.\d\d\.tar\.gz$ Text-Balanced-2.06/META.json0000644000104006017510000000602314247103162016410 0ustar Administratorssteve{ "abstract" : "Extract delimited text sequences from strings.", "author" : [ "Damian Conway , Adam Kennedy , Steve Hay " ], "dynamic_config" : 1, "generated_by" : "ExtUtils::MakeMaker version 7.34, CPAN::Meta::Converter version 2.150010", "license" : [ "perl_5" ], "meta-spec" : { "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec", "version" : 2 }, "name" : "Text-Balanced", "no_index" : { "directory" : [ "t", "inc" ] }, "optional_features" : { "changestest" : { "description" : "Changes testing", "prereqs" : { "test" : { "requires" : { "Test::CPAN::Changes" : "0" } } } }, "critictest" : { "description" : "Perl::Critic testing", "prereqs" : { "test" : { "requires" : { "Test::Perl::Critic" : "0" } } } }, "metatest" : { "description" : "META testing", "prereqs" : { "test" : { "requires" : { "Test::CPAN::Meta" : "0.12" } } } }, "pmvtest" : { "description" : "Perl minimum version testing", "prereqs" : { "test" : { "requires" : { "Perl::MinimumVersion" : "1.20", "Test::MinimumVersion" : "0.101082" } } } }, "podcoveragetest" : { "description" : "POD coverage testing", "prereqs" : { "test" : { "requires" : { "Test::Pod::Coverage" : "0.08" } } } }, "podtest" : { "description" : "POD testing", "prereqs" : { "test" : { "requires" : { "Pod::Simple" : "3.07", "Test::Pod" : "1.26" } } } } }, "prereqs" : { "build" : { "requires" : { "ExtUtils::MakeMaker" : "0" } }, "configure" : { "requires" : { "ExtUtils::MakeMaker" : "6.64", "perl" : "5.008001", "strict" : "0", "warnings" : "0" } }, "runtime" : { "requires" : { "Carp" : "0", "Exporter" : "0", "overload" : "0", "perl" : "5.008001", "strict" : "0", "vars" : "0" } }, "test" : { "requires" : { "Test::More" : "0.88", "vars" : "0" } } }, "release_status" : "stable", "resources" : { "repository" : { "type" : "git", "web" : "https://github.com/steve-m-hay/Text-Balanced" } }, "version" : "2.06", "x_serialization_backend" : "JSON::PP version 4.02" } Text-Balanced-2.06/META.yml0000644000104006017510000000224314247103162016240 0ustar Administratorssteve--- abstract: 'Extract delimited text sequences from strings.' author: - 'Damian Conway , Adam Kennedy , Steve Hay ' build_requires: ExtUtils::MakeMaker: '0' Test::More: '0.88' vars: '0' configure_requires: ExtUtils::MakeMaker: '6.64' perl: '5.008001' strict: '0' warnings: '0' dynamic_config: 1 generated_by: 'ExtUtils::MakeMaker version 7.34, CPAN::Meta::Converter version 2.150010' license: perl meta-spec: url: http://module-build.sourceforge.net/META-spec-v1.4.html version: '1.4' name: Text-Balanced no_index: directory: - t - inc optional_features: changestest: description: 'Changes testing' critictest: description: 'Perl::Critic testing' metatest: description: 'META testing' pmvtest: description: 'Perl minimum version testing' podcoveragetest: description: 'POD coverage testing' podtest: description: 'POD testing' requires: Carp: '0' Exporter: '0' overload: '0' perl: '5.008001' strict: '0' vars: '0' resources: repository: https://github.com/steve-m-hay/Text-Balanced version: '2.06' x_serialization_backend: 'CPAN::Meta::YAML version 0.018' Text-Balanced-2.06/README0000644000104006017510000000766314247103162015662 0ustar AdministratorssteveNAME Text::Balanced - Extract delimited text sequences from strings. SYNOPSIS use Text::Balanced qw ( extract_delimited extract_bracketed extract_quotelike extract_codeblock extract_variable extract_tagged extract_multiple gen_delimited_pat gen_extract_tagged ); # Extract the initial substring of $text that is delimited by # two (unescaped) instances of the first character in $delim. ($extracted, $remainder) = extract_delimited($text,$delim); # Extract the initial substring of $text that is bracketed # with a delimiter(s) specified by $delim (where the string # in $delim contains one or more of '(){}[]<>'). ($extracted, $remainder) = extract_bracketed($text,$delim); # Extract the initial substring of $text that is bounded by # an XML tag. ($extracted, $remainder) = extract_tagged($text); # Extract the initial substring of $text that is bounded by # a C...C pair. Don't allow nested C tags ($extracted, $remainder) = extract_tagged($text,"BEGIN","END",undef,{bad=>["BEGIN"]}); # Extract the initial substring of $text that represents a # Perl "quote or quote-like operation" ($extracted, $remainder) = extract_quotelike($text); # Extract the initial substring of $text that represents a block # of Perl code, bracketed by any of character(s) specified by $delim # (where the string $delim contains one or more of '(){}[]<>'). ($extracted, $remainder) = extract_codeblock($text,$delim); # Extract the initial substrings of $text that would be extracted by # one or more sequential applications of the specified functions # or regular expressions @extracted = extract_multiple($text, [ \&extract_bracketed, \&extract_quotelike, \&some_other_extractor_sub, qr/[xyz]*/, 'literal', ]); # Create a string representing an optimized pattern (a la Friedl) # that matches a substring delimited by any of the specified characters # (in this case: any type of quote or a slash) $patstring = gen_delimited_pat(q{'"`/}); # Generate a reference to an anonymous sub that is just like extract_tagged # but pre-compiled and optimized for a specific pair of tags, and # consequently much faster (i.e. 3 times faster). It uses qr// for better # performance on repeated calls. $extract_head = gen_extract_tagged('',''); ($extracted, $remainder) = $extract_head->($text); DESCRIPTION The various extract_... subroutines may be used to extract a delimited substring, possibly after skipping a specified prefix string. By default, that prefix is optional whitespace (/\s*/), but you can change it to whatever you wish (see below). The substring to be extracted must appear at the current pos location of the string's variable (or at index zero, if no pos position is defined). In other words, the extract_... subroutines *don't* extract the first occurrence of a substring anywhere in a string (like an unanchored regex would). Rather, they extract an occurrence of the substring appearing immediately at the current matching position in the string (like a \G-anchored regex would). INSTALLATION See the INSTALL file. COPYRIGHT Copyright (C) 1997-2001 Damian Conway. All rights reserved. Copyright (C) 2009 Adam Kennedy. Copyright (C) 2015, 2020, 2022 Steve Hay and other contributors. All rights reserved. LICENCE This distribution is free software; you can redistribute it and/or modify it under the same terms as Perl itself, i.e. under the terms of either the GNU General Public License or the Artistic License, as specified in the LICENCE file. Text-Balanced-2.06/t/0000755000104006017510000000000014247103162015231 5ustar AdministratorssteveText-Balanced-2.06/t/01_compile.t0000644000104006017510000000014214247103162017343 0ustar Administratorssteveuse 5.008001; use strict; use warnings; use Test::More tests => 1; use_ok( 'Text::Balanced' ); Text-Balanced-2.06/t/02_extbrk.t0000644000104006017510000000364614247103162017227 0ustar Administratorssteveuse 5.008001; use strict; use warnings; use Test::More; use Text::Balanced qw ( extract_bracketed ); our $DEBUG; sub debug { print "\t>>>",@_ if $DEBUG } ## no critic (BuiltinFunctions::ProhibitStringyEval) my $cmd = "print"; my $neg = 0; my $str; while (defined($str = )) { chomp $str; if ($str =~ s/\A# USING://) { $neg = 0; $cmd = $str; next; } elsif ($str =~ /\A# TH[EI]SE? SHOULD FAIL/) { $neg = 1; next; } elsif (!$str || $str =~ /\A#/) { $neg = 0; next } my $orig_str = $str; $str =~ s/\\n/\n/g; debug "\tUsing: $cmd\n"; debug "\t on: [$str]\n"; my $var = eval "() = $cmd"; debug "\t list got: [$var]\n"; debug "\t list left: [$str]\n"; ($neg ? \&isnt : \&is)->(substr($str,pos($str)||0,1), ';', "$orig_str matched list"); diag $@ if $@ && $DEBUG; pos $str = 0; $var = eval $cmd; $var = "" unless defined $var; debug "\t scalar got: [$var]\n"; debug "\t scalar left: [$str]\n"; ($neg ? \&unlike : \&like)->( $str, qr/\A;/, "$orig_str matched scalar"); diag $@ if $@ && $DEBUG; } done_testing; __DATA__ # USING: extract_bracketed($str); {a nested { and } are okay as are () and <> pairs and escaped \}'s }; {a nested\n{ and } are okay as are\n() and <> pairs and escaped \}'s }; # USING: extract_bracketed($str,'{}'); {a nested { and } are okay as are unbalanced ( and < pairs and escaped \}'s }; # THESE SHOULD FAIL {an unmatched nested { isn't okay, nor are ( and < }; {an unbalanced nested [ even with } and ] to match them; # USING: extract_bracketed($str,'<"`q>'); " unbalanced right bracket of /(q>)/ either sort (`>>>""">>>>`) is okay >; # USING: extract_bracketed($str,'<">'); " unbalanced right bracket is okay >; # USING: extract_bracketed($str,'<"`>'); " unbalanced right bracket of either sort (`>>>""">>>>`) is okay >; # THIS SHOULD FAIL ' unbalanced right bracket is bad >; Text-Balanced-2.06/t/03_extcbk.t0000644000104006017510000000527714247103162017213 0ustar Administratorssteveuse 5.008001; use strict; use warnings; use Test::More; use Text::Balanced qw ( extract_codeblock ); our $DEBUG; sub debug { print "\t>>>",@_ if $DEBUG } ## no critic (BuiltinFunctions::ProhibitStringyEval) my $cmd = "print"; my $neg = 0; my $str; while (defined($str = )) { chomp $str; if ($str =~ s/\A# USING://) { $neg = 0; $cmd = $str; next; } elsif ($str =~ /\A# TH[EI]SE? SHOULD FAIL/) { $neg = 1; next; } elsif (!$str || $str =~ /\A#/) { $neg = 0; next } my $orig_str = $str; $str =~ s/\\n/\n/g; debug "\tUsing: $cmd\n"; debug "\t on: [$str]\n"; my @res; my $var = eval "\@res = $cmd"; is $@, '', 'no error'; debug "\t list got: [" . join("|", map {defined $_ ? $_ : ''} @res) . "]\n"; debug "\t list left: [$str]\n"; ($neg ? \&isnt : \&is)->(substr($str,pos($str)||0,1), ';', "$orig_str matched list"); pos $str = 0; $var = eval $cmd; is $@, '', 'no error'; $var = "" unless defined $var; debug "\t scalar got: [$var]\n"; debug "\t scalar left: [$str]\n"; ($neg ? \&unlike : \&like)->( $str, qr/\A;/, "$orig_str matched scalar"); } my $grammar = <<'EOF'; given 2 { when __ < 1 { ok(0) } else { ok(1) } } EOF pos $grammar = 8; my ($out) = Text::Balanced::_match_codeblock(\$grammar,qr/\s*/,qr/\{/,qr/\}/,qr/\{/,qr/\}/,undef); ok $out, 'Switch error from calling _match_codeblock'; $grammar = <<'EOF'; comment: m/a/ enum_list: (/b/) EOF pos $grammar = 10; ($out) = Text::Balanced::extract_quotelike($grammar); is $out, 'm/a/', 'PRD error (setup for real error)'; pos $grammar = 26; ($out) = extract_codeblock($grammar,'{([',undef,'(',1); is $out, '(/b/)', 'PRD error'; done_testing; __DATA__ # USING: extract_codeblock($str,'(){}',undef,'()'); (Foo(')')); # USING: extract_codeblock($str); { $data[4] =~ /['"]/; }; {1<<2}; {1<<2};\n {1<<2};\n\n { $a = /\}/; }; { sub { $_[0] /= $_[1] } }; # / here { 1; }; { $a = 1; }; # USING: extract_codeblock($str,'<>'); < %x = ( try => "this") >; < %x = () >; < %x = ( $try->{this}, "too") >; < %'x = ( $try->{this}, "too") >; < %'x'y = ( $try->{this}, "too") >; < %::x::y = ( $try->{this}, "too") >; # THIS SHOULD FAIL < %x = do { $try > 10 } >; # USING: extract_codeblock($str, '()'); (($x || 2)); split /z/, $y (($x // 2)); split /z/, $y # USING: extract_codeblock($str,undef,'=*'); ========{$a=1}; # USING: extract_codeblock($str,'{}<>'); < %x = do { $try > 10 } >; # USING: extract_codeblock($str,'{}',undef,'<>'); < %x = do { $try > 10 } >; # USING: extract_codeblock($str,'{}'); { $a = $b; # what's this doing here? \n };' { $a = $b; \n $a =~ /$b/; \n @a = map /\s/ @b }; # THIS SHOULD FAIL { $a = $b; # what's this doing here? };' { $a = $b; # what's this doing here? ;' Text-Balanced-2.06/t/04_extdel.t0000644000104006017510000000372114247103162017211 0ustar Administratorssteveuse 5.008001; use strict; use warnings; use Test::More; use Text::Balanced qw ( extract_delimited extract_multiple ); our $DEBUG; sub debug { print "\t>>>",@_ if $DEBUG } ## no critic (BuiltinFunctions::ProhibitStringyEval) my $cmd = "print"; my $neg = 0; my $str; while (defined($str = )) { chomp $str; if ($str =~ s/\A# USING://) { $neg = 0; $cmd = $str; next; } elsif ($str =~ /\A# TH[EI]SE? SHOULD FAIL/) { $neg = 1; next; } elsif (!$str || $str =~ /\A#/) { $neg = 0; next } my $orig_str = $str; $str =~ s/\\n/\n/g; debug "\tUsing: $cmd\n"; debug "\t on: [$str]\n"; my $var = eval "() = $cmd"; is $@, '', 'no error'; debug "\t list got: [$var]\n"; debug "\t list left: [$str]\n"; ($neg ? \&isnt : \&is)->(substr($str,pos($str)||0,1), ';', "$orig_str matched list"); pos $str = 0; $var = eval $cmd; is $@, '', 'no error'; $var = "" unless defined $var; debug "\t scalar got: [$var]\n"; debug "\t scalar left: [$str]\n"; ($neg ? \&unlike : \&like)->( $str, qr/\A;/, "$orig_str matched scalar"); } my $text = 'while($a == "test"){ print "true";}'; my ($extracted, $remainder) = extract_delimited($text, '#'); ok '' ne $@, 'string overload should not crash'; $text = "a,'x b',c"; my @fields = extract_multiple($text, [ sub { extract_delimited($_[0],q{'"}) }, qr/([^,]+)/, ], undef,1); is_deeply \@fields, ['a', "'x b'", 'c'] or diag 'got: ', explain \@fields; done_testing; __DATA__ # USING: extract_delimited($str,'/#$',undef,'/#$'); /a/; /a///; #b#; #b###; $c$; $c$$$; # TEST EXTRACTION OF DELIMITED TEXT WITH ESCAPES # USING: extract_delimited($str,'/#$',undef,'\\'); /a/; /a\//; #b#; #b\##; $c$; $c\$$; # TEST EXTRACTION OF DELIMITED TEXT # USING: extract_delimited($str); 'a'; "b"; `c`; 'a\''; 'a\\'; '\\a'; "a\\"; "\\a"; "b\'\"\'"; `c '\`abc\`'`; # TEST EXTRACTION OF DELIMITED TEXT # USING: extract_delimited($str,'/#$','-->'); -->/a/; -->#b#; -->$c$; # THIS SHOULD FAIL $c$; Text-Balanced-2.06/t/05_extmul.t0000644000104006017510000002253414247103162017246 0ustar Administratorssteveuse 5.008001; use strict; use warnings; use Test::More; use Text::Balanced qw ( :ALL ); our $DEBUG; sub debug { print "\t>>>",@_ if $DEBUG } sub expect { my ($l1, $l2) = @_; is_deeply $l1, $l2 or do { diag 'got:', explain $l1; diag 'expected:', explain $l2; }; } sub divide { my ($text, @index) = @_; my @bits = (); unshift @index, 0; push @index, length($text); for ( my $i= 0; $i < $#index; $i++) { push @bits, substr($text, $index[$i], $index[$i+1]-$index[$i]); } pop @bits; return @bits; } my $stdtext1 = q{$var = do {"val" && $val;};}; my $text = $stdtext1; expect [ extract_multiple($text,undef,1) ], [ divide $stdtext1 => 4 ]; expect [ pos $text], [ 4 ]; expect [ $text ], [ $stdtext1 ]; $text = $stdtext1; expect [ scalar extract_multiple($text,undef,1) ], [ divide $stdtext1 => 4 ]; expect [ pos $text], [ 0 ]; expect [ $text ], [ substr($stdtext1,4) ]; $text = $stdtext1; expect [ extract_multiple($text,undef,2) ], [ divide($stdtext1 => 4, 10) ]; expect [ pos $text], [ 10 ]; expect [ $text ], [ $stdtext1 ]; $text = $stdtext1; expect [ eval{local$^W;scalar extract_multiple($text,undef,2)} ], [ substr($stdtext1,0,4) ]; expect [ pos $text], [ 0 ]; expect [ $text ], [ substr($stdtext1,4) ]; $text = $stdtext1; expect [ extract_multiple($text,undef,3) ], [ divide($stdtext1 => 4, 10, 26) ]; expect [ pos $text], [ 26 ]; expect [ $text ], [ $stdtext1 ]; $text = $stdtext1; expect [ eval{local$^W;scalar extract_multiple($text,undef,3)} ], [ substr($stdtext1,0,4) ]; expect [ pos $text], [ 0 ]; expect [ $text ], [ substr($stdtext1,4) ]; $text = $stdtext1; expect [ extract_multiple($text,undef,4) ], [ divide($stdtext1 => 4, 10, 26, 27) ]; expect [ pos $text], [ 27 ]; expect [ $text ], [ $stdtext1 ]; $text = $stdtext1; expect [ eval{local$^W;scalar extract_multiple($text,undef,4)} ], [ substr($stdtext1,0,4) ]; expect [ pos $text], [ 0 ]; expect [ $text ], [ substr($stdtext1,4) ]; $text = $stdtext1; expect [ extract_multiple($text,undef,5) ], [ divide($stdtext1 => 4, 10, 26, 27) ]; expect [ pos $text], [ 27 ]; expect [ $text ], [ $stdtext1 ]; $text = $stdtext1; expect [ eval{local$^W;scalar extract_multiple($text,undef,5)} ], [ substr($stdtext1,0,4) ]; expect [ pos $text], [ 0 ]; expect [ $text ], [ substr($stdtext1,4) ]; my $stdtext2 = q{$var = "val" && (1,2,3);}; $text = $stdtext2; expect [ extract_multiple($text) ], [ divide($stdtext2 => 4, 7, 12, 24) ]; expect [ pos $text], [ 24 ]; expect [ $text ], [ $stdtext2 ]; $text = $stdtext2; expect [ scalar extract_multiple($text) ], [ substr($stdtext2,0,4) ]; expect [ pos $text], [ 0 ]; expect [ $text ], [ substr($stdtext2,4) ]; $text = $stdtext2; expect [ extract_multiple($text,[\&extract_bracketed]) ], [ substr($stdtext2,0,16), substr($stdtext2,16,7), substr($stdtext2,23) ]; expect [ pos $text], [ 24 ]; expect [ $text ], [ $stdtext2 ]; $text = $stdtext2; expect [ scalar extract_multiple($text,[\&extract_bracketed]) ], [ substr($stdtext2,0,16) ]; expect [ pos $text], [ 0 ]; expect [ $text ], [ substr($stdtext2,15) ]; $text = $stdtext2; expect [ extract_multiple($text,[\&extract_variable]) ], [ substr($stdtext2,0,4), substr($stdtext2,4) ]; expect [ pos $text], [ length($text) ]; expect [ $text ], [ $stdtext2 ]; $text = $stdtext2; expect [ scalar extract_multiple($text,[\&extract_variable]) ], [ substr($stdtext2,0,4) ]; expect [ pos $text], [ 0 ]; expect [ $text ], [ substr($stdtext2,4) ]; $text = $stdtext2; expect [ extract_multiple($text,[\&extract_quotelike]) ], [ substr($stdtext2,0,7), substr($stdtext2,7,5), substr($stdtext2,12) ]; expect [ pos $text], [ length($text) ]; expect [ $text ], [ $stdtext2 ]; $text = $stdtext2; expect [ scalar extract_multiple($text,[\&extract_quotelike]) ], [ substr($stdtext2,0,7) ]; expect [ pos $text], [ 0 ]; expect [ $text ], [ substr($stdtext2,6) ]; $text = $stdtext2; expect [ extract_multiple($text,[\&extract_quotelike],2,1) ], [ substr($stdtext2,7,5) ]; expect [ pos $text], [ 23 ]; expect [ $text ], [ $stdtext2 ]; $text = $stdtext2; expect [ eval{local$^W;scalar extract_multiple($text,[\&extract_quotelike],2,1)} ], [ substr($stdtext2,7,5) ]; expect [ pos $text], [ 6 ]; expect [ $text ], [ substr($stdtext2,0,6). substr($stdtext2,12) ]; $text = $stdtext2; expect [ extract_multiple($text,[\&extract_quotelike],1,1) ], [ substr($stdtext2,7,5) ]; expect [ pos $text], [ 12 ]; expect [ $text ], [ $stdtext2 ]; $text = $stdtext2; expect [ scalar extract_multiple($text,[\&extract_quotelike],1,1) ], [ substr($stdtext2,7,5) ]; expect [ pos $text], [ 6 ]; expect [ $text ], [ substr($stdtext2,0,6). substr($stdtext2,12) ]; my $stdtext3 = "a,b,c"; $_ = $stdtext3; expect [ extract_multiple(undef, [ sub { /\G[a-z]/gc && $& } ]) ], [ divide($stdtext3 => 1,2,3,4,5) ]; expect [ pos ], [ 5 ]; expect [ $_ ], [ $stdtext3 ]; $_ = $stdtext3; expect [ scalar extract_multiple(undef, [ sub { /\G[a-z]/gc && $& } ]) ], [ divide($stdtext3 => 1) ]; expect [ pos ], [ 0 ]; expect [ $_ ], [ substr($stdtext3,1) ]; $_ = $stdtext3; expect [ extract_multiple(undef, [ qr/\G[a-z]/ ]) ], [ divide($stdtext3 => 1,2,3,4,5) ]; expect [ pos ], [ 5 ]; expect [ $_ ], [ $stdtext3 ]; $_ = $stdtext3; expect [ scalar extract_multiple(undef, [ qr/\G[a-z]/ ]) ], [ divide($stdtext3 => 1) ]; expect [ pos ], [ 0 ]; expect [ $_ ], [ substr($stdtext3,1) ]; $_ = $stdtext3; expect [ extract_multiple(undef, [ q/([a-z]),?/ ]) ], [ qw(a b c) ]; expect [ pos ], [ 5 ]; expect [ $_ ], [ $stdtext3 ]; $_ = $stdtext3; expect [ scalar extract_multiple(undef, [ q/([a-z]),?/ ]) ], [ divide($stdtext3 => 1) ]; expect [ pos ], [ 0 ]; expect [ $_ ], [ substr($stdtext3,2) ]; # Fails in Text-Balanced-1.95 with result ['1 ', '""', '1234'] $_ = q{ ""1234}; expect [ extract_multiple(undef, [\&extract_quotelike]) ], [ ' ', '""', '1234' ]; my $not_here_doc = "sub f {\n my \$pa <<= 2;\n}\n\n"; # wrong in 2.04 expect [ extract_multiple($not_here_doc, [ { DONT_MATCH => \&extract_quotelike } ]) ], [ "sub f {\n my \$pa <<= 2;\n}\n\n" ]; my $y_falsematch = <<'EOF'; # wrong in 2.04 my $p = {y => 1}; { $pa=ones(3,3,3); my $f = do { my $i=1; my $v=$$p{y}-$i; $pb = $pa(,$i,) }; } EOF expect [ extract_multiple($y_falsematch, [ \&extract_variable, { DONT_MATCH => \&extract_quotelike } ]) ], [ 'my ', '$p', " = {y => 1};\n{ ", '$pa', '=ones(3,3,3); my ', '$f', ' = do { my ', '$i', '=1; my ', '$v', qw(= $$p{y} - $i), '; ', '$pb', ' = ', '$pa', '(,', '$i', ",) }; }\n", ]; my $slashmatch = <<'EOF'; # wrong in 2.04 my $var = 10 / 3; if ($var !~ /\./) { decimal() ;} EOF my @expect_slash = ('my ', '$var', ' = 10 / 3; if (', '$var', " !~ ", '/\\./', ") { decimal() ;}\n" ); expect [ extract_multiple($slashmatch, [ \&extract_variable, \&extract_quotelike, ]) ], \@expect_slash; $slashmatch = <<'EOF'; # wrong in 2.04 my $var = 10 / 3; if ($var =~ /\./) { decimal() ;} EOF $expect_slash[4] = " =~ "; expect [ extract_multiple($slashmatch, [ \&extract_variable, \&extract_quotelike, ]) ], \@expect_slash; $slashmatch = <<'EOF'; # wrong in 2.04 my $var = 10 / 3; if ($var =~ # a comment /\./) { decimal() ;} EOF my $comment = qr/(?t->(($a))->sever; wantarray ? 1 : 0; $min = $var ? 0; EOF expect [ extract_multiple($slashmatch, [ \&extract_variable, $id, \&extract_quotelike, ]) ], [ '$x->t->(($a))->sever', ";\n", 'wantarray', ' ? ', '1', ' : ', '0', '; ', '$min', ' = ', '$var', ' ? ', '0', ";\n", ]; $slashmatch = <<'EOF'; # wrong in 2.04_01 $var //= 'default'; $x = 1 / 2; EOF expect [ extract_multiple($slashmatch, [ \&extract_variable, \&extract_quotelike, ]) ], [ '$var', ' //= ', '\'default\'', '; ', '$x', " = 1 / 2;\n" ]; $slashmatch = <<'EOF'; # wrong in 2.04_01 $m; return wantarray ? ($m, $i) : $var ? $m : 0; EOF expect [ extract_multiple($slashmatch, [ \&extract_variable, \&extract_quotelike, ]) ], [ '$m', '; return wantarray ? (', '$m', ', ', '$i', ') : ', '$var', ' ? ', '$m', " : 0;\n" ]; $slashmatch = <<'EOF'; # wrong in 2.05 $_ = 1 unless defined $_ and /\d\b/; EOF expect [ extract_multiple($slashmatch, [ \&extract_variable, \&extract_quotelike, ]) ], [ '$_', ' = 1 unless defined ', '$_', ' and ', '/\\d\\b/', ";\n" ]; done_testing; Text-Balanced-2.06/t/06_extqlk.t0000644000104006017510000000655714247103162017250 0ustar Administratorssteveuse 5.008001; use strict; use warnings; use Test::More; use Text::Balanced qw ( extract_quotelike ); our $DEBUG; sub debug { print "\t>>>",@_ if $DEBUG } sub esc { my $x = shift||''; $x =~ s/\n/\\n/gs; $x } ## no critic (BuiltinFunctions::ProhibitStringyEval) my $cmd = "print"; my $neg = 0; my $str; while (defined($str = )) { chomp $str; if ($str =~ s/\A# USING://) { $neg = 0; $cmd = $str; next; } elsif ($str =~ /\A# TH[EI]SE? SHOULD FAIL/) { $neg = 1; next; } elsif (!$str || $str =~ /\A#/) { $neg = 0; next } my $setup_cmd = ($str =~ s/\A\{(.*)\}//) ? $1 : ''; my $tests = 'sl'; my $orig_str = $str; $str =~ s/\\n/\n/g; my $orig = $str; eval $setup_cmd if $setup_cmd ne ''; is $@, '', 'no error'; if($tests =~ /l/) { debug "\tUsing: $cmd\n"; debug "\t on: [" . esc($setup_cmd) . "][" . esc($str) . "]\n"; my @res; eval qq{\@res = $cmd; }; is $@, '', 'no error'; debug "\t got:\n" . join "", map { "\t\t\t$_: [" . esc($res[$_]) . "]\n"} (0..$#res); debug "\t left: [" . esc($str) . "]\n"; debug "\t pos: [" . esc(substr($str,pos($str))) . "...]\n"; ($neg ? \&isnt : \&is)->(substr($str,pos($str)||0,1), ';', "$orig_str matched list"); } eval $setup_cmd if $setup_cmd ne ''; is $@, '', 'no error'; if($tests =~ /s/) { $str = $orig; debug "\tUsing: scalar $cmd\n"; debug "\t on: [" . esc($str) . "]\n"; my $var = eval $cmd; $var = "" unless defined $var; debug "\t scalar got: [" . esc($var) . "]\n"; debug "\t scalar left: [" . esc($str) . "]\n"; ($neg ? \&unlike : \&like)->( $str, qr/\A;/, "$orig_str matched scalar"); } } # fails in Text::Balanced 1.95 $_ = qq(s{}{}); my @z = extract_quotelike(); isnt $z[0], ''; @z = extract_quotelike("<<, 1; done()\nline1\nline2\n\n and next"); like $z[1], qr/\A,/, 'implied heredoc with ,' or do { diag "error: '$@'\ngot: ", explain \@z; }; done_testing; __DATA__ # USING: extract_quotelike($str); ''; ""; "a"; 'b'; `cc`; < pairs and escaped \}'s }; q/slash/; q # slash #; qr qw qx; s/x/y/; s/x/y/cgimsox; s{a}{b}; s{a}\n {b}; s(a){b}; s(a)/b/; s/'/\\'/g; tr/x/y/; y/x/y/; # fails on Text-Balanced-1.95 { $tests = 'l'; pos($str)=6 }012345<{pat}>{$self->{sub}}; # CAN'T HANDLE '>' in '->' s-$self->{pap}-$self->{sub}-; # CAN'T HANDLE '-' in '->' <>>",@_ if $DEBUG } ## no critic (BuiltinFunctions::ProhibitStringyEval) my $cmd = "print"; my $neg = 0; my $str; while (defined($str = )) { chomp $str; if ($str =~ s/\A# USING://) { $neg = 0; $cmd = $str; next; } elsif ($str =~ /\A# TH[EI]SE? SHOULD FAIL/) { $neg = 1; next; } elsif (!$str || $str =~ /\A#/) { $neg = 0; next } my $orig_str = $str; $str =~ s/\\n/\n/g; debug "\tUsing: $cmd\n"; debug "\t on: [$str]\n"; my @res; my $var = eval "\@res = $cmd"; is $@, '', 'no error'; debug "\t list got: [" . join("|",map {defined $_ ? $_ : ''} @res) . "]\n"; debug "\t list left: [$str]\n"; ($neg ? \&isnt : \&is)->(substr($str,pos($str)||0,1), ';', "$orig_str matched list"); pos $str = 0; $var = eval $cmd; is $@, '', 'no error'; $var = "" unless defined $var; debug "\t scalar got: [$var]\n"; debug "\t scalar left: [$str]\n"; ($neg ? \&unlike : \&like)->( $str, qr/\A;/, "$orig_str matched scalar"); } done_testing; __DATA__ # USING: gen_extract_tagged("BEGIN([A-Z]+)",'END$1',"(?s).*?(?=BEGIN)")->($str); ignore\n this and then BEGINHERE at the ENDHERE; ignore\n this and then BEGINTHIS at the ENDTHIS; # USING: extract_tagged($str,"BEGIN([A-Z]+)",'END$1',"(?s).*?(?=BEGIN)"); ignore\n this and then BEGINHERE at the ENDHERE; ignore\n this and then BEGINTHIS at the ENDTHIS; # USING: extract_tagged($str,"BEGIN([A-Z]+)",'END$1',"(?s).*?(?=BEGIN)"); ignore\n this and then BEGINHERE at the ENDHERE; ignore\n this and then BEGINTHIS at the ENDTHIS; # THIS SHOULD FAIL ignore\n this and then BEGINTHIS at the ENDTHAT; # USING: extract_tagged($str,"BEGIN","END","(?s).*?(?=BEGIN)"); ignore\n this and then BEGIN at the END; # USING: extract_tagged($str); some text; # USING: extract_tagged($str,qr/<[A-Z]+>/,undef, undef, {ignore=>["
"]});
aaabbb
ccc
ddd
; # USING: extract_tagged($str,"BEGIN","END"); BEGIN at the BEGIN keyword and END at the END; BEGIN at the beginning and end at the END; # USING: extract_tagged($str,undef,undef,undef,{ignore=>["<[^>]*/>"]}); aaabbb
ccc
ddd
; # USING: extract_tagged($str,";","-",undef,{reject=>[";"],fail=>"MAX"}); ; at the ;-) keyword # USING: extract_tagged($str,"<[A-Z]+>",undef, undef, {ignore=>["
"]}); aaabbb
ccc
ddd
; # THESE SHOULD FAIL BEGIN at the beginning and end at the end; BEGIN at the BEGIN keyword and END at the end; # TEST EXTRACTION OF TAGGED STRINGS # USING: extract_tagged($str,"BEGIN","END",undef,{reject=>["BEGIN","END"]}); # THESE SHOULD FAIL BEGIN at the BEGIN keyword and END at the end; # USING: extract_tagged($str,";","-",undef,{reject=>[";"],fail=>"PARA"}); ; at the ;-) keyword # USING: extract_tagged($str); some text; some textother text; some textother text; some text; # THESE SHOULD FAIL some text some textother text; some textother text; Text-Balanced-2.06/t/08_extvar.t0000644000104006017510000000425514247103162017244 0ustar Administratorssteveuse 5.008001; use strict; use warnings; use Test::More; use Text::Balanced qw ( extract_variable ); our $DEBUG; sub debug { print "\t>>>",@_ if $DEBUG } ## no critic (BuiltinFunctions::ProhibitStringyEval) my $cmd = "print"; my $neg = 0; my $str; while (defined($str = )) { chomp $str; if ($str =~ s/\A# USING://) { $neg = 0; $cmd = $str; next; } elsif ($str =~ /\A# TH[EI]SE? SHOULD FAIL/) { $neg = 1; next; } elsif (!$str || $str =~ /\A#/) { $neg = 0; next } my $orig_str = $str; $str =~ s/\\n/\n/g; debug "\tUsing: $cmd\n"; debug "\t on: [$str]\n"; my @res; my $var = eval "\@res = $cmd"; is $@, '', 'no error'; debug "\t list got: [" . join("|",map {defined $_ ? $_ : ''} @res) . "]\n"; debug "\t list left: [$str]\n"; ($neg ? \&isnt : \&is)->(substr($str,pos($str)||0,1), ';', "$orig_str matched list"); pos $str = 0; $var = eval $cmd; is $@, '', 'no error'; $var = "" unless defined $var; debug "\t scalar got: [$var]\n"; debug "\t scalar left: [$str]\n"; ($neg ? \&unlike : \&like)->( $str, qr/\A;/, "$orig_str matched scalar"); } my @res = extract_variable('${a}'); is $res[0], '${a}' or diag "error was: $@"; done_testing; __DATA__ # USING: extract_variable($str); # THESE SHOULD FAIL $a->; $a (1..3) { print $a }; # USING: extract_variable($str); $::obj; $obj->nextval; *var; *$var; *{var}; *{$var}; *var{cat}; \&var; \&mod::var; \&mod'var; $a; $_; $a[1]; $_[1]; $a{cat}; $_{cat}; $a->[1]; $a->{"cat"}[1]; @$listref; @{$listref}; $obj->nextval; $obj->_nextval; $obj->next_val_; @{$obj->nextval}; @{$obj->nextval($cat,$dog)->{new}}; @{$obj->nextval($cat?$dog:$fish)->{new}}; @{$obj->nextval(cat()?$dog:$fish)->{new}}; $ a {'cat'}; $a::b::c{d}->{$e->()}; $a'b'c'd{e}->{$e->()}; $a'b::c'd{e}->{$e->()}; $#_; $#array; $#{array}; $var[$#var]; $1; $11; $&; $`; $'; $+; $*; $.; $/; $|; $,; $"; $;; $#; $%; $=; $-; $~; $^; $:; $^L; $^A; $?; $!; $^E; $@; $$; $<; $>; $(; $); $[; $]; $^C; $^D; $^F; $^H; $^I; $^M; $^O; $^P; $^R; $^S; $^T; $^V; $^W; ${^WARNING_BITS}; ${^WIDE_SYSTEM_CALLS}; $^X; # THESE SHOULD FAIL $a->; @{$; $ a :: b :: c $ a ' b ' c # USING: extract_variable($str,'=*'); ========$a; Text-Balanced-2.06/t/09_gentag.t0000644000104006017510000000536314247103162017202 0ustar Administratorssteveuse 5.008001; use strict; use warnings; use Text::Balanced qw ( gen_extract_tagged ); use Test::More; our $DEBUG; sub debug { print "\t>>>",@_ if $DEBUG } ## no critic (BuiltinFunctions::ProhibitStringyEval) my $cmd = "print"; my $neg = 0; my $str; while (defined($str = )) { chomp $str; my $orig_str = $str; $str =~ s/\\n/\n/g; if ($str =~ s/\A# USING://) { $neg = 0; eval { # Capture "Subroutine main::f redefined" warning my @warnings; local $SIG{__WARN__} = sub { push @warnings, shift; }; *f = eval $str || die; }; is $@, '', 'no error'; next; } elsif ($str =~ /\A# TH[EI]SE? SHOULD FAIL/) { $neg = 1; next; } elsif (!$str || $str =~ /\A#/) { $neg = 0; next } $str =~ s/\\n/\n/g; debug "\tUsing: $cmd\n"; debug "\t on: [$str]\n"; my @res; my $var = eval { @res = f($str) }; is $@, '', 'no error'; debug "\t list got: [" . join("|",map {defined $_ ? $_ : ''} @res) . "]\n"; debug "\t list left: [$str]\n"; ($neg ? \&isnt : \&is)->(substr($str,pos($str)||0,1), ';', "$orig_str matched list"); pos $str = 0; $var = eval { scalar f($str) }; is $@, '', 'no error'; $var = "" unless defined $var; debug "\t scalar got: [$var]\n"; debug "\t scalar left: [$str]\n"; ($neg ? \&unlike : \&like)->( $str, qr/\A;/, "$orig_str matched scalar"); } done_testing; __DATA__ # USING: gen_extract_tagged('{','}'); { a test }; # USING: gen_extract_tagged(qr/<[A-Z]+>/,undef, undef, {ignore=>["
"]});
aaabbb
ccc
ddd
; # USING: gen_extract_tagged("BEGIN","END"); BEGIN at the BEGIN keyword and END at the END; BEGIN at the beginning and end at the END; # USING: gen_extract_tagged(undef,undef,undef,{ignore=>["<[^>]*/>"]}); aaabbb
ccc
ddd
; # USING: gen_extract_tagged(";","-",undef,{reject=>[";"],fail=>"MAX"}); ; at the ;-) keyword # USING: gen_extract_tagged("<[A-Z]+>",undef, undef, {ignore=>["
"]}); aaabbb
ccc
ddd
; # THESE SHOULD FAIL BEGIN at the beginning and end at the end; BEGIN at the BEGIN keyword and END at the end; # TEST EXTRACTION OF TAGGED STRINGS # USING: gen_extract_tagged("BEGIN","END",undef,{reject=>["BEGIN","END"]}); # THESE SHOULD FAIL BEGIN at the BEGIN keyword and END at the end; # USING: gen_extract_tagged(";","-",undef,{reject=>[";"],fail=>"PARA"}); ; at the ;-) keyword # USING: gen_extract_tagged(); some text; some textother text; some textother text; some text; # THESE SHOULD FAIL some text some textother text; some textother text; Text-Balanced-2.06/t/94_changes.t0000644000104006017510000000233014247103162017340 0ustar Administratorssteve#!perl #=============================================================================== # # t/94_changes.t # # DESCRIPTION # Test script to check CPAN::Changes conformance. # # COPYRIGHT # Copyright (C) 2015 Steve Hay. All rights reserved. # # LICENCE # This script is free software; you can redistribute it and/or modify it under # the same terms as Perl itself, i.e. under the terms of either the GNU # General Public License or the Artistic License, as specified in the LICENCE # file. # #=============================================================================== use 5.008001; use strict; use warnings; use Test::More; #=============================================================================== # MAIN PROGRAM #=============================================================================== MAIN: { plan skip_all => 'Author testing only' unless $ENV{AUTHOR_TESTING}; my $ok = eval { require Test::CPAN::Changes; Test::CPAN::Changes->import(); 1; }; if (not $ok) { plan skip_all => 'Test::CPAN::Changes required to test Changes'; } else { changes_ok(); } } #=============================================================================== Text-Balanced-2.06/t/95_critic.t0000644000104006017510000000236114247103162017212 0ustar Administratorssteve#!perl #=============================================================================== # # t/95_critic.t # # DESCRIPTION # Test script to check Perl::Critic conformance. # # COPYRIGHT # Copyright (C) 2015 Steve Hay. All rights reserved. # # LICENCE # This script is free software; you can redistribute it and/or modify it under # the same terms as Perl itself, i.e. under the terms of either the GNU # General Public License or the Artistic License, as specified in the LICENCE # file. # #=============================================================================== use 5.008001; use strict; use warnings; use Test::More; #=============================================================================== # MAIN PROGRAM #=============================================================================== MAIN: { plan skip_all => 'Author testing only' unless $ENV{AUTHOR_TESTING}; my $ok = eval { require Test::Perl::Critic; Test::Perl::Critic->import(-profile => ''); 1; }; if (not $ok) { plan skip_all => 'Test::Perl::Critic required to test with Perl::Critic'; } else { all_critic_ok('.'); } } #=============================================================================== Text-Balanced-2.06/t/96_pmv.t0000644000104006017510000000117414247103162016541 0ustar Administratorssteve#!/usr/bin/perl # Test that our declared minimum Perl version matches our syntax use 5.008001; use strict; use warnings; use Test::More; my @MODULES = ( 'Perl::MinimumVersion 1.20', 'Test::MinimumVersion 0.101082', ); # Don't run tests for installs use Test::More; unless ( $ENV{AUTHOR_TESTING} ) { plan( skip_all => "Author testing only" ); } # Load the testing modules foreach my $MODULE ( @MODULES ) { ## no critic (BuiltinFunctions::ProhibitStringyEval) eval "use $MODULE"; if ( $@ ) { plan( skip_all => "$MODULE not available for testing" ); } } all_minimum_version_from_mymetayml_ok(); Text-Balanced-2.06/t/97_pod.t0000644000104006017510000000110714247103162016516 0ustar Administratorssteve#!/usr/bin/perl # Test that the syntax of our POD documentation is valid use 5.008001; use strict; use warnings; use Test::More; my @MODULES = ( 'Pod::Simple 3.07', 'Test::Pod 1.26', ); # Don't run tests for installs use Test::More; unless ( $ENV{AUTHOR_TESTING} ) { plan( skip_all => "Author testing only" ); } # Load the testing modules foreach my $MODULE ( @MODULES ) { ## no critic (BuiltinFunctions::ProhibitStringyEval) eval "use $MODULE"; if ( $@ ) { plan( skip_all => "$MODULE not available for testing" ); } } all_pod_files_ok(); Text-Balanced-2.06/t/98_pod_coverage.t0000644000104006017510000000256414247103162020402 0ustar Administratorssteve#!perl #=============================================================================== # # t/99_pod_coverage.t # # DESCRIPTION # Test script to check POD coverage. # # COPYRIGHT # Copyright (C) 2015 Steve Hay. All rights reserved. # # LICENCE # This script is free software; you can redistribute it and/or modify it under # the same terms as Perl itself, i.e. under the terms of either the GNU # General Public License or the Artistic License, as specified in the LICENCE # file. # #=============================================================================== use 5.008001; use strict; use warnings; use Test::More; #=============================================================================== # MAIN PROGRAM #=============================================================================== MAIN: { plan skip_all => 'Author testing only' unless $ENV{AUTHOR_TESTING}; my $ok = eval { require Test::Pod::Coverage; Test::Pod::Coverage->import(); 1; }; if (not $ok) { plan skip_all => 'Test::Pod::Coverage required to test POD coverage'; } elsif ($Test::Pod::Coverage::VERSION < 0.08) { plan skip_all => 'Test::Pod::Coverage 0.08 or higher required to test POD coverage'; } else { all_pod_coverage_ok(); } } #===============================================================================