txt2html-3.0/ 0000775 0000000 0000000 00000000000 14337466404 0013155 5 ustar 00root root 0000000 0000000 txt2html-3.0/.github/ 0000775 0000000 0000000 00000000000 14337466404 0014515 5 ustar 00root root 0000000 0000000 txt2html-3.0/.github/workflows/ 0000775 0000000 0000000 00000000000 14337466404 0016552 5 ustar 00root root 0000000 0000000 txt2html-3.0/.github/workflows/test1.txt 0000664 0000000 0000000 00000000034 14337466404 0020350 0 ustar 00root root 0000000 0000000 Hi!
This is a simple test.
txt2html-3.0/.github/workflows/test2.txt 0000664 0000000 0000000 00000000213 14337466404 0020350 0 ustar 00root root 0000000 0000000 Hi,
This is an FTP and HTTPS test.
ftp://example.com
My site is https://github.com/resurrecting-open-source-projects/txt2html. Try it!
txt2html-3.0/.github/workflows/test3.txt 0000664 0000000 0000000 00000000062 14337466404 0020353 0 ustar 00root root 0000000 0000000 Hi,
This is my protocol XYZ.
xyz://example.com
txt2html-3.0/.github/workflows/tests.yml 0000664 0000000 0000000 00000002562 14337466404 0020444 0 ustar 00root root 0000000 0000000 name: full-check
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: install
        run: |
          sudo apt install -y libmodule-build-perl libtest-distribution-perl libyaml-syck-perl
          /usr/bin/perl Build.PL --installdirs vendor --config "optimize=-g -O2 -ffile-prefix-map=/PKGS/txt2html/txt2html=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2" --config "ld=x86_64-linux-gnu-gcc -g -O2 -ffile-prefix-map=/PKGS/txt2html/txt2html=. -fstack-protector-strong -Wformat -Werror=format-security -Wl,-z,relro"
          /usr/bin/perl Build
          sudo /usr/bin/perl Build install --create_packlist 0
      - name: test1
        run: |
          txt2html .github/workflows/test1.txt | egrep '
          This is a simple test.
          '
      - name: test2
        run: >
          txt2html .github/workflows/test2.txt |
          egrep 'ftp://example.com' -C5 |
          egrep '
          My site is https://github.com/resurrecting-open-source-projects/txt2html. Try it!'
      - name: test3
        run: >
          txt2html --links_dictionaries .github/workflows/xyz.dict .github/workflows/test3.txt |
          egrep '
          xyz://example.com'
txt2html-3.0/.github/workflows/xyz.dict 0000664 0000000 0000000 00000000036 14337466404 0020250 0 ustar 00root root 0000000 0000000 |xyz:[\w/\.:+\-]+| -> $&
txt2html-3.0/Build.PL 0000664 0000000 0000000 00000002375 14337466404 0014460 0 ustar 00root root 0000000 0000000
use strict;
use warnings;
use Module::Build 0.3601;
my %module_build_args = (
"build_requires" => {
"Module::Build" => "0.3601"
},
"configure_requires" => {
"Module::Build" => "0.3601"
},
"dist_abstract" => "convert plain text file to HTML.",
"dist_author" => [
"Kathryn Andersen "
],
"dist_name" => "txt2html",
"dist_version" => "3.0",
"license" => "gpl",
"module_name" => "txt2html",
"recommends" => {},
"recursive_test_files" => 1,
"requires" => {
"File::Basename" => 0,
"Getopt::Long" => 0,
"Pod::Usage" => 0,
"YAML::Syck" => 0,
"constant" => 0,
"perl" => "v5.8.1",
"strict" => 0
},
"script_files" => [
"scripts/txt2html"
],
"test_requires" => {
"File::Find" => 0,
"File::Temp" => 0,
"Test::More" => 0,
"warnings" => 0
}
);
unless ( eval { Module::Build->VERSION(0.4004) } ) {
my $tr = delete $module_build_args{test_requires};
my $br = $module_build_args{build_requires};
for my $mod ( keys %$tr ) {
if ( exists $br->{$mod} ) {
$br->{$mod} = $tr->{$mod} if $tr->{$mod} > $br->{$mod};
}
else {
$br->{$mod} = $tr->{$mod};
}
}
}
my $build = Module::Build->new(%module_build_args);
$build->create_build_script;
txt2html-3.0/CONTRIBUTING.md 0000664 0000000 0000000 00000001775 14337466404 0015420 0 ustar 00root root 0000000 0000000 ## HOW TO CONTRIBUTE TO TXT2HTML DEVELOPMENT
txt2html is available at
https://github.com/resurrecting-open-source-projects/txt2html
If you are interested in contributing to txt2html development, please
follow these steps:
1. Send me a patch that fixes an issue or implements a new feature.
Alternatively, you can open a 'pull request'[1] on GitHub.
[1] https://help.github.com/articles/using-pull-requests
2. Ask to join the txt2html project on GitHub if you want to work
officially. Note that this second step is not compulsory. However,
before I can accept you into the project, I need to see some prior collaboration.
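
One possible way to prepare the patch mentioned in step 1 is `git format-patch`. The sketch below is only an illustration: it builds a throwaway local repository so that it can be run without network access, and the file name, identity and commit messages are placeholders. In practice you would make the change in your own clone of the txt2html repository.

```shell
#!/bin/sh
# Sketch only: the scratch repository, file name, identity and commit
# messages are placeholders, not part of the txt2html project.
set -eu

work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"

# Identity for the throwaway repository (placeholder values).
git config user.name "Example Contributor"
git config user.email "contributor@example.org"

# A first commit standing in for the existing project history.
printf 'Hi!\nThis is a simpel test.\n' > note.txt
git add note.txt
git commit -q -m "Initial import"

# The change you want to contribute.
printf 'Hi!\nThis is a simple test.\n' > note.txt
git commit -q -a -m "Fix typo in note.txt"

# Export the most recent commit as a patch file that can be sent by e-mail.
patch_file=$(git format-patch -1 -o "$work/patches")
echo "Patch written to: $patch_file"
```

The resulting file carries the commit message and the diff together, so it can be attached to an e-mail or applied by the maintainer with `git am`.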
To find issues and bugs to fix, you can check these addresses:
- https://github.com/resurrecting-open-source-projects/txt2html/issues
- https://bugs.debian.org/cgi-bin/pkgreport.cgi?dist=unstable;package=txt2html
- https://bugs.launchpad.net/ubuntu/+source/txt2html/+bugs
If you want to join, please contact me: eriberto at eriberto.pro.br
-- Eriberto, Thu, 01 Aug 2019 21:55:53 -0300
txt2html-3.0/ChangeLog 0000664 0000000 0000000 00000060317 14337466404 0014736 0 ustar 00root root 0000000 0000000 Revision History for txt2html
=============================
3.0 2022-11-23
[ Bruce Momjian ]
* Fixed --links_dictionaries option.
[ Francesco (Francicoria) ]
* Reformatted README.md.
[ Joao Eriberto Mota Filho ]
* Added two example files to doc/ (sample.txt and txt2html.dict).
* Created CI tests for GitHub.
2.53 2019-08-02
* New repository for this project (now in GitHub).
https://github.com/resurrecting-open-source-projects/txt2html
* Added CONTRIBUTING.md file.
* Added README.md and renamed old README file to README.txt2html.
* Fixed the licensing in LICENSE file.
* Fixed some spelling errors.
* Fixed the path for Perl in scripts/txt2html.
* Unified the changelogs.
| ----------------------- |
| ---- OLD CHANGELOG ---- |
\ / ---- OLD UPSTREAM ---- \ /
· ----------------------- ·
2.52.01 (aka 2.5201) 2013-05-21
-------------------------------
* 2013-05-21 16:25:10 +1000
rebuilding with new version
* 2012-05-11 14:20:41 +1000
Updating website.
2.51 Sun 4th March 2008
- fixed bug with underscores in links
- fixed docs about escape_chars (should be escapechars)
- fixed docs about DOCTYPE
2.50 Sat 22nd December 2007
- fixed bug with formatting and punctuation
- removed old reference-to-an-array argument method
- made --xhtml true by default (used to be false)
- moved the debugging options to global variables
2.46 Fri 9th November 2007
- updated docs on custom_heading_regexp
- fixed bug with xhtml output
- documented all undocumented functions
2.45 Fri 26th January 2007
- fixed bug with umlauts
- fixed bug with UTF-8 characters
- added --underline_delimiter option.
2.44 Tue 17th January 2006
- fixed bug with delimiter tables
- minor documentation fixes
2.43 Fri 7th October 2005
- fixed bug with interaction between #bolding# and #anchor links
2.42 Wed 10th August 2005
- new option to txt2html script: --instring, which enables one
to process a string instead of a file.
- new options to HTML::TextToHTML:
* instring, as above
* inhandle, which enables one to pass in input file handles to process
* outhandle, which enables one to pass in an output file handle
2.41 Sun 8th May 2005
- solved the system links dictionary problem! No longer uses
an external file at all, uses the DATA handle. This means that there is
no longer a --system_link_dict option.
- changed versioning scheme; now bugfix versions will be
things like 2.4101
- removed the run_txt2html command; it isn't needed when the txt2html script is part of the package and does things much nicer.
- generate the README from the PoD
2.40 Sat 12th February 2005
- much improved speed
2.37 Thu 10th February 2005
- another fix to installation
- fixed CPAN module problem
2.36 Sun 23rd January 2005
- slight fix to installation
2.35 Wed 18th January 2005
- fixed bug where a DOS file was processed on a Unix system.
- removed Makefile.PL; not needed with modern perls
2.34 Thu 6th January 2005
- fixed another bug with demoronize code (gah!)
2.33 Wed 5th January 2005
- darn, left out some files!
2.32 Wed 5th January 2005
- fixed bug with demoronize code
- fixed documentation for lower_case_tags
- changed around installation so that no custom Module::Builder
is needed; also moved files into more customary places, and now use
Filter::Simple to change the system dictionary path in the module
and script files for installation.
2.31 Tue 21st September 2004
- fixed bug with install
- did some changes to help version-changing
2.30 Fri 27th August 2004
- changed build system over to Module::Build to eliminate problems
with different versions of ExtUtils::MakeMaker and the installation of
the system links dictionary, as well as hoping to make the installation
more portable.
- changed build so that the system links dictionary (in defaults
and documentation) is automatically set to where the system links
dictionary is going to be installed (yay!)
- improved the INSTALL document (thanks to Mark Schmidt)
- improved the bold and italic processing (now it's not messed up
by short lines putting a <BR> in)
- bug fix for paragraphs with "0" (thanks to Andrew Williams)
2.25 Sun 23rd May 2004
- the default location for the system links dictionary (txt2html.dict) is
now "/usr/local/share/txt2html/txt2html.dict". (It uses SITEPREFIX not PREFIX now)
- added --bold_delimiter and --italic_delimiter options; now the bold and
italic matching is done separately, not as part of the links-dictionary stuff.
This means it's much easier to change or turn off if you don't want it.
- links-dictionary matching is now done on segments without structural tags in them
so matching for things like '^' should be replaced with just '^'. I don't think
this will break anything, and will prevent certain kinds of illegal substitutions.
- tidied up some of the links in the links dictionary
2.24 Sun 16th May 2004
- bug fixes for preformatting: fixed up loss of trailing PRE, and
now preformats and lists no longer have a trailing blank line
- documented the 'quote_mail' and 'quote_explicit' classes, and added
the 'mail_header' class for mail-header paragraphs.
- made the #bold# and *italic* matches more aggressive.
- added Alan Jackson's "demoronize" patch from John Walker's demoronize script
at
2.23 Wed 25th February 2004
- oops! Forgot to include one of the test files!
2.22 Sun 22nd February 2004
- bug fix with delimiter table
2.21 Sat 10th January 2004
- bugfixes with a stricter perl 5.8.2
- bugfix for processing empty file
2.20 Sun 7th December 2003
- added --table_type option to say which types of table will
be recognised and parsed if the --make_tables option is true. In other
words, there are now new types of table which can be parsed.
* ALIGN is the original space-aligned table type
* PGSQL is the type of table you get from a Postgresql query
* BORDER is a table with +-----+ lines around it as a border
* DELIM is delimited columns
2.10 Sat 6th December 2003
- changed process_para method to assume it has only one paragraph;
and added process_chunk method to deal with processing strings which may
contain more than one paragraph in them.
- a fair few internal changes to make more things think in terms of
paragraphs rather than lines; this changes the parsing of a few things,
but doesn't break the conventions.
- fixed a bug where unordered lists were allowed to have bullets
which were more than one character wide. Well, I considered it a bug,
it was annoying me.
- added --bullets and --bullets_ordered options to enable the
user to define the bullet characters used for unordered and ordered
lists.
- hooray! Figured out a way of having multi-paragraph list items!
- woo hoo! Added definition lists! And managed to do it so that
they are treated like the other lists, that is, you can nest lists in
definition lists and vice versa.
2.06 Sun 15th November 2003
- fixed bug with processing STDIN (ie piping input to txt2html)
- moved test input files into separate tfiles directory
- added in most of Seth's old test files to the testing; and fixed
resulting bugs that were flushed out of cover
2.05 Wed 5th November 2003
- error in fix to PREFIX! Argh!
2.04 Tue 4th November 2003
- changed Makefile.PL to fix PREFIX
- use "#!/usr/bin/env perl" trick (courtesy of Sami Haahtinen)
instead of using ExtUtils::configPL module
- made Getopt::ArgvFile an official prerequisite
- enabled CAPS tagging to be optional (just give an empty caps_tag)
2.03 Tue 15th July 2003
- fixed bug with para tests (it didn't fail on my system
because it was using the already installed system links dictionary)
2.02 Sun 13th July 2003
- fixed bug in documentation about custom headings
- moved tests into t/ directory
- added is_fragment option to process_para to enable it to
process a fragment without assuming that it was a paragraph.
2.01 Sun 1st June 2003
- fixed up a few documentation/reference things that I'd forgotten,
changing names in sample.txt for example.
2.00 Sat 31st May 2003
- merge of HTML::TextToHTML and the official txt2html - hence
the version jump.
* the distribution name is now txt2html
* renamed texthyper to txt2html, TextToHTML.dict to txt2html.dict
* merged change history
* split README into README, INSTALL and LICENSE files
* updated DEVNOTES
- merged in the changes from 1.28 to 1.35 as much as I could
* corrected bugs that still applied
* added --style_url option
* added --body_deco option
* added two CSS classes: quote_mail and quote_explicit
- Did Not add the following, as it was rather more complicated:
* multi-paragraph list items
* the heading_callback_customize stuff
==================================================
1.12 Sat 15th February 2003
- removed heavily spammed email address from documentation
and examples.
1.11 Fri 20th December 2002
- fixed bug in texthyper script which was giving warnings.
1.10 Wed 18th December 2002
- removed all dependency on AppConfig
* all of the existing options now must use their full names.
* However, one now has the choice between passing options
as a hash, or the old way, as a reference to an array.
- removed the do_help method; if you want the documentation
of the module, use perldoc HTML::TextToHTML
- moved the texthyper script into this distribution
* It now uses Getopt::Long and Getopt::ArgvFile.
* The format of .texthyperrc has changed to conform with
Getopt::ArgvFile rather than AppConfig.
* Changed the version number so that it was bigger than
either the script or the module so that both could have the same
version number (that's why the big jump).
- the included system link dictionary (TextToHTML.dict)
is now installed in /usr/share/txt2html as part of the install process.
0.09 Wed 20th November 2002
- improved the XHTML mode, so that open paragraphs get closed
sooner. This fixed a bug related to paragraphs inside lists.
0.08 Wed 20th November 2002
- CPAN testers complained about a lack of explicitly stating
all the dependencies of AppConfig, which either means that AppConfig
has changed desperately, or their testing methods have changed, since
I didn't think it was possible to get the AppConfig module without getting
all its dependent modules, but, oh well.
0.07 Sun 17th November 2002
- fixed a bug in process_para to ensure that if one is using it
standalone, any open lists will be closed
- added --lower_case_tags option to force the tags to be output
in lower-case
- first pass at XHTML conformance; added --xhtml option.
It isn't that pretty-looking, but the sample does pass the scrutiny of "tidy".
When turned on it:
* forces lower-case on
* makes empty tags have the empty marker (eg
)
* closes all open P and LI tags where they should be closed
* table cell alignments are done as style attributes
0.06 Mon 2nd September 2002
- fine-tuned some of the links in the default links dictionary
- some internal rewriting
0.05 Wed 5th June 2002
- fixed minor bugs
0.04 Sun 2nd June 2002
- fixed bug with detection of paragraphs by indentation
- added --indent_par_break and --preserve_indent options
- fixed error in documentation
- fixed bug with nonexistent link dictionaries
0.03 Sun 26th May 2002
- documented the format of the Link Dictionary
- added the do_help method, and changed the behaviour of --help
and --manpage
- added the --make_anchors option, which enables one to disable
the making of anchors, so that if one prefers another method of
anchor-making (such as that in HTML::GenToc) then one can use that
instead.
- altered the #bold# pattern in the link dictionary to only need one
hash. This should still hopefully allow things like #1 without turning it
bold, and being able to use ### as a separator.
- gratuitous self-promotion: added HTML::GenToc to the
sample links dictionary
- removed the need for getline(), but rather pass the lines in to
the methods, in order to parse by-paragraph and then by-line (mucho rewrite)
which enabled me to:
* implement the table-parsing from Gareth Rees's HTML::FromText
module -- added the --make_tables option
* now the links dictionary does multiline matches (useful for
things like italics which break over lines)
* enable converting passed-in strings rather than just files
0.02 Wed 15th May 2002
- fixed bug with link dictionary parsing
- improved the tests
- updated link dictionary to fix a few bugs (eg underlines)
and add a few things (like using double # for ##bold## text).
0.01 Sun 12th May 2002
- conversion of Seth Golub's txt2html (version 1.28) to a module
- made all global settings options (eg the location of the system
link dictionary)
- added "outfile" option
- added use_mosaic_header option
- changed the dynamic code generation completely.
- removed the evil $* variable
-----------------------------------------------------------------
=====================================================
(cvs log on Sourceforge)
revision 1.35
date: 2002/12/03 03:47:46; author: suntong; state: Exp; lines: +14 -13
- Defines two CSS Style: quote_mail & quote_explicit
- Mail quote mode and quote mode won't interfere with each other
----------------------------
revision 1.34
date: 2002/12/03 01:51:41; author: suntong; state: Exp; lines: +8 -7
- preformat should be dealt with before other cases
----------------------------
revision 1.33
date: 2002/12/02 23:44:13; author: suntong; state: Exp; lines: +7 -7
Bug fixing (details in http://groups.yahoo.com/group/txt2html/message/89):
- Changed incorrect entity names of "fraqfrac*" to "frac*".
- redundant \| in [\||:] removed.
- change misleading --style help to '--style '
----------------------------
revision 1.32
date: 2002/12/01 19:59:09; author: suntong; state: Exp; lines: +39 -5
- Add Callback functions for HTML header handling
so that users can customise their own heading,
add horizon lines, change colors or write their own toc, etc
- User can define/keep their callbacks locally without tampering main
distribution
- 'mailstuff' priority should be higher than 'preformat'
----------------------------
revision 1.31
date: 2002/12/01 18:59:30; author: suntong; state: Exp; lines: +5 -5
- solve the problem that txt2html can't html-ify its own anchor.
----------------------------
revision 1.30
date: 2002/12/01 18:51:20; author: suntong; state: Exp; lines: +53 -10
- All my previous patches are lost because yahoogroups doesn't keep my patch
attachments. So, here they are in a big chunk. Major updates are:
- Able to use a style sheet for the generated html file
- Able to use body decoration for the generated html file
customize your own background color/image/sound, etc...
- Misc enhancements.
----------------------------
revision 1.29
date: 2002/12/01 18:28:35; author: suntong; state: Exp; lines: +3 -2
- Apply patch from http://groups.yahoo.com/group/txt2html/message/32
,-----
| I don't know when I'll get around to releasing a new version of
| txt2html, but I have a few fixes I've been sitting on. I thought
| I'd send them out on this list so people could take advantage of
| them without having to wait until I finally package up a new release.
|
| * Changed incorrect entity names of "fraq*" to "frac*".
|
| * Allow paragraphs within lists by permitting blank lines within
| list elements, as long as the following text has the same
| indentation level.
`-----
=======================================================
1.28
----
- bugfix: reserved characters in titles created with --titlefirst are
now escaped properly.
- bugfix: when preformatting entire document, each line was
getting its own container (introduced
with explicit preformatting feature in 1.26).
- dict: added some characters to those allowed in http urls (=&;,).
- dict: added "-" to allowed characters within *emphasized-pattern*.
1.27
----
- Changed names of default link dictionaries to txt2html.dict
1.26 (not released)
-------------------
- Added -8 (for 8-bit-clean) to disable conversion of non-ASCII
characters to their corresponding Latin-1 character entities.
- Added -pm to allow explicit marking of preformatted text in source
- Changes => to , in mapping, to stay compatible with Perl 4
- Added debug flag 4, for observing link rules in action
- Fixed length checking bug in header underline analysis
- Change a regexp so Perl 5.6 doesn't complain.
- No longer add space after tags
- Allow unindented lists to start after CAPS lines
- Use · as a bullet character
- Fixed bug that dropped a character when certain actions were
taken on the last line of input that didn't end with a newline.
- Added more aggressive regexps for _underlined_ and *emphasized* text.
- Improved character markup rules
- Added link rule for news URLs. (This must have been
accidentally deleted at some point.)
- Added link rule for common explicit url markup:
1.25
----
- Changed the official home page to
(the old page will have a working redirect indefinitely.)
- Added a LICENSE to the distribution. (modified BSD-style)
- When no title is specified, an empty title element is inserted.
(The old behavior was to omit the title element, which is
forbidden by the spec.)
- Made heading anchors appear inside the heading, rather than
surrounding it (which is forbidden by the HTML spec)
- Changed the DTD name
- Added the --linkonly option so people can use the links
dictionary feature without doing any other markup. This is
useful for adding links to HTML fragments or documents.
- Added the --prepend_body option for prepending HTML to the body.
- Made in_link_context smarter so it won't link on attributes or
tag names. (This is good for adding hyperlinks, but may screw
up some clever uses of the linking code.)
- Added link rules for _underlined text_ and *emphasized text*
- Added --noescapechars to suppress converting "&" "<" and
">" into "&amp;" "&lt;" and "&gt;"
- Changed pattern rules to handle non-ascii letters properly in
matching patterns.
- Added conversion of non-ascii letters into character entities.
- Lots of upgrades to the links dictionary patterns
1.24
----
- Changed behavior of custom headers to something much more
useful: Header levels are assigned by regex in order seen.
When a line matches a custom header regex, it is tagged as a
header. If it's the first time that particular regex has
matched, the next available header level is associated with it
and applied to the line. Any later matches of that regex will
use the same header level.
- Added the -EH / --explicit-headings option
- Added some unnecessary initialization to avoid warnings when
perl is run with the -w switch.
1.23
----
- Added handling for when the consistent formatting of numbered
lists is the position of the non-numeric character, not the
amount of whitespace preceding the number. (The numbers
grow to the left instead of the right.)
1.22
----
- Fixed bug in unhyphenation
- Changed HTML version in default doctype line to 3.2
1.21
----
- Added
1.20
----
- Added DOCTYPE tag and --doctype options.
- Syntax change to get rid of Perl 5 warning
- Added ability to use the first line of the text as the title
- Fixed some (unused) grossness in links dict file
1.19
----
- Added --append_head
- Mail and News name anchor surrounds just the first word
("Newsgroups:" or "From"), and not the whole line. That way,
newsgroup names and email addresses get HREF'd as normal.
1.18
----
- Cleaned up nested list handling & fixed a bug under Perl 5.
- Changed a couple minor things to get rid of some of the Perl 5 warnings.
1.17
----
- Lists can start even when not indented and not preceded by a
blank line if the previous line was short or a header.
- New flag "o" added for dictionary entries. Specifies that the
link should only be done the first time a match is found.
1.16
----
- Added anchoring of custom headers
- Took the changelog out of the script
- Tweaked $line_indent in sub liststuff
- Insert before each mail/news message
1.15
----
- Fixed options handling for -e/+e , -r
- Added "Newsgroups:" to trigger mail headers
- Fixed anchor naming
- took out -T option, since it isn't implemented yet. Whoops..
- Fixed bug in endpreformat
1.14
----
- Fixed +l/--nolink option handling
- Fixed major bug in dynamic_make_dictionary_links that allowed
nested links under some circumstances.
1.13
----
- Fixed usage message so it matches options. (whoops)
- Added custom heading style feature
1.12
----
- Fixed bug in heading regexp
- Changed underline tolerance parameters from min & max length
difference to length difference & offset difference
- Centralized line reading, added handling of DOS carriage returns
- Switched to heading style stack. Styles still very limited.
- Changed heading anchor names from a simple count to a hierarchical
section number.
1.11
----
- Blank lines are never considered underlined
- Shortline breaking slightly more intelligent (or at least different)
- Paragraph breaks much more intelligent
- Lowercased tags. Style is so fickle.
- Added links dictionaries, link making, etc.
- Allow repeated bullet chars for unordered lists. (Tiny mod to regexp)
- switched order of caps & liststuff in main()
- improved untabify() so it converts the whole line, not just beginning
- split up all lines >79 characters to avoid common downloading error
(people would sometimes copy the script off the display,
inadvertently adding a few newlines in bad places in the code)
- Handles option "--" now.
- Accepts named files as input as alternative to stdin
- Deals with stdin properly (no more extra EOFs needed)
- Improved mail handling
1.10
====
- Added --extract, etc.
1.9
---
- Changed from #!/usr/local/bin/perl to the more clever version in
the man page. (How did I manage not to read this for so long?)
- Swapped hrule & header back to handle double lines. Why should
this order screw up headers?
1.8
---
- put mail_anchor back in. (Why did I take this out?)
- Finally added handling of lettered lists (ordered lists marked with
letters)
- Added title option (--title, -t)
- Shortline now looks at how long the line was before txt2html
started adding tags. ($line_length)
- Changed list references to scalars where appropriate. (@foo[0] -> $foo[0])
- Added untabify() to homogenize leading indentation for list
prefixes and functions that use line length
- Added "underline tolerance" for when underlines are not exactly the
same length as what they underline.
- Added error message for unrecognized options
- removed \w matching on --capstag
- Tagline now removes leading & trailing whitespace before tagging
- swapped order of caps & heading in main loop
- Cleaned up code for speed and to get rid of warnings
- Added more restrictions to something being a mail header
- Added indentation for lists, just to make the output more readable.
- Fixed major bug in lists: $OL and $UL were never set, so when a
list was ended "" was *always* used!
- swapped order of hrule & header to properly handle long underlines
1.7
---
- Added to comments in options section
- renamed blank to is_blank
- Page break is converted to horizontal rule
- moved usage subroutine up top so people who look through code see
it sooner
1.6
---
- Creates anchors at each heading
1.5
---
- Fixed minor bug in Headers
- Preformatting can be set to only start/stop when TWO lines of
[non]formatted-looking-text are encountered. Old behavior is still
possible through command line options (-pb 1 -pe 1).
- Can preformat entire document (-pb 0) or disable preformatting
completely (-pe 0).
- Fixed minor bug in CAPS handling (paragraph breaks broke)
- Puts paragraph tags *before* paragraphs, not just between them.
1.4
---
- Allow ':' for numbered lists (e.g. "1: Figs")
- Whitespace at end of line will not start or end preformatting
- Mailmode is now off by default
- Doesn't break short lines if they are the first line in a list
item. It *should* break them anyway if the next line is a
continuation of the list item, but I haven't dealt with this yet.
- Added action on lines that are all capital letters. You can change
how these lines get tagged, as well as the minimum number of
consecutive capital letters required to fire off this action.
1.3
---
- Tiny bugfix in unhyphenation
1.2
---
- Added unhyphenation
seth@aigeek.com
txt2html-3.0/DEVNOTES 0000664 0000000 0000000 00000004726 14337466404 0014340 0 ustar 00root root 0000000 0000000 Developer Notes
===============
This uses Dist::Zilla for building/testing/release.
Changing the version
--------------------
1. Edit the dist.ini file to increment the version
2. Replace the old version number with the new version in the tfiles files.
3. "dzil test"
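
Step 2 can be scripted. The following is only a sketch, assuming the old version string appears literally in the files under tfiles/. The version numbers, the scratch directory and the sample line are made up for illustration and are not part of the project's tooling.

```shell
#!/bin/sh
# Sketch: replace an old version string with a new one in every file of a
# directory. OLD, NEW, the scratch directory and the sample line are
# illustrative placeholders.
set -eu

OLD='2.53'
NEW='3.0'

# Work on a scratch directory so the sketch is safe to run anywhere.
dir=$(mktemp -d)
printf '<meta name="generator" content="txt2html v2.53">\n' > "$dir/good_sample.html"

# Escape the dots so that sed treats the version as a literal string.
old_re=$(printf '%s' "$OLD" | sed 's/\./\\./g')

for f in "$dir"/*; do
    # Write to a temporary file first, because "sed -i" is not portable.
    sed "s/$old_re/$NEW/g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

cat "$dir/good_sample.html"
```

To use the same loop on the real test files, point `dir` at the tfiles directory instead of a scratch directory, and review the result with "git diff" before running "dzil test".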
Generating README
--------------------------
Make the README changes in HTML/TextToHTML.pm
Dist::Zilla will generate the README from that.
Adding Options
--------------
All new options need to be added in five places:
- lib/HTML/TextToHTML.pm init_our_args(), to initialize the default
- lib/HTML/TextToHTML.pm args() in the parse-the-array-ref part, unless
it is a simple option-with-a-value. If it is a boolean option,
it also needs to be added in its "no" form.
- lib/HTML/TextToHTML.pm OPTIONS documentation
- scripts/txt2html Getopt call, including its type and possible shortnames
- scripts/txt2html OPTIONS documentation
Changing the Global Link Dictionary
-----------------------------------
The contents of the global link dictionary are kept inside
lib/HTML/TextToHTML.pm in the __DATA__ section.
Any changes or updates to it must be done there.
Release Notes
=============
Before releasing, don't forget to run
dzil test
dzil build # for a sanity check of the build
dzil release
(and usually "dzil install" as well)
In the past, this bundle was released in three different external places:
- CPAN
- Sourceforge
- FreshMeat
Currently, txt2html is available on GitHub.
CPAN Release
------------
Dist::Zilla will do the CPAN release as part of the release process.
Sourceforge Release
-------------------
Do a git push if you haven't already.
Go to the Admin/Files section and follow the instructions.
You will need to ftp upload the .tar.gz file to upload.sourceforge.net
/incoming directory before doing certain steps.
One also needs to update the web-page. Cd to txt2html/web/htdocs on
your home machine, and edit index.html to update the version number.
Then do
make del_cpfiles
make cpfiles
make
git add
git commit
git push
cd ..
make install
(The last copies it over with rsync)
One should also post a message to the txt2html mailing list.
FreshMeat Release
-----------------
Log in to freshmeat.net, and follow the instructions for new releases.
The "changes this release" will need to be a summary of the changes.
Since the file is on sourceforge, you won't need to alter the download
URL.
GitHub
------
Please go to https://github.com/resurrecting-open-source-projects/txt2html
txt2html-3.0/LICENSE 0000664 0000000 0000000 00000000362 14337466404 0014163 0 ustar 00root root 0000000 0000000 Copyright 1994-2000 Seth Golub seth AT aigeek.com
Copyright 2002-2013 Kathryn Andersen
Copyright 2018-2019 Joao Eriberto Mota Filho
This program is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
txt2html-3.0/MANIFEST 0000664 0000000 0000000 00000003561 14337466404 0014313 0 ustar 00root root 0000000 0000000 Build.PL
ChangeLog
CONTRIBUTING.md
DEVNOTES
doc/UPDATE-CHECK
doc/README.samples
doc/sample.txt
doc/txt2html.dict
LICENSE
MANIFEST
MANIFEST.SKIP
META.yml
README.md
README.txt2html
TODO
lib/HTML/TextToHTML.pm
scripts/txt2html
t/00-compile.t
t/10para.t
t/20tfiles.t
t/25handles.t
t/30sample.t
t/50xsample.t
t/70bugs.t
t/release-distmeta.t
t/release-has-version.t
t/release-pod-coverage.t
t/release-pod-syntax.t
t/release-portability.t
tfiles/custom-headers.txt
tfiles/custom-headers2.txt
tfiles/empty.txt
tfiles/good_custom-headers.html
tfiles/good_custom-headers2.html
tfiles/good_empty.html
tfiles/good_heading1.html
tfiles/good_hyphens.html
tfiles/good_links.html
tfiles/good_links2.html
tfiles/good_links3.html
tfiles/good_links4.html
tfiles/good_list-2.html
tfiles/good_list-3.html
tfiles/good_list-4.html
tfiles/good_list-5.html
tfiles/good_list-advanced.html
tfiles/good_list-custom.html
tfiles/good_list-styles.html
tfiles/good_list.html
tfiles/good_mixed.html
tfiles/good_news.html
tfiles/good_pre.html
tfiles/good_pre2.html
tfiles/good_punct.html
tfiles/good_robo.html
tfiles/good_sample.html
tfiles/good_table-align.html
tfiles/good_table-border.html
tfiles/good_table-delim.html
tfiles/good_table-pgsql.html
tfiles/good_table-pgsql2.html
tfiles/good_umlauttest.html
tfiles/good_utf8.html
tfiles/good_xhtml_sample.html
tfiles/heading1.txt
tfiles/hyphens.txt
tfiles/links.txt
tfiles/links2.txt
tfiles/links3.txt
tfiles/links4.txt
tfiles/list-2.txt
tfiles/list-3.txt
tfiles/list-4.txt
tfiles/list-5.txt
tfiles/list-advanced.txt
tfiles/list-custom.txt
tfiles/list-styles.txt
tfiles/list.txt
tfiles/mixed.txt
tfiles/news.txt
tfiles/pre.txt
tfiles/pre2.txt
tfiles/punct.txt
tfiles/robo.txt
tfiles/sample.foot
tfiles/sample.foot2
tfiles/sample.txt
tfiles/table-align.txt
tfiles/table-border.txt
tfiles/table-delim.txt
tfiles/table-pgsql.txt
tfiles/table-pgsql2.txt
tfiles/umlauttest.txt
tfiles/utf8.txt
txt2html-3.0/MANIFEST.SKIP 0000664 0000000 0000000 00000000511 14337466404 0015050 0 ustar 00root root 0000000 0000000 # version control files
\bRCS\b
\bCVS\b
,v$
\.svn\b
# archives
.*\.tar\.gz$
# the TODO file
^\.todo$
# other files
# MakeMaker files
^Makefile$
^blib/
^MakeMaker-\d
pm_to_blib
# Module::Build files
^_build/
^Build$
# development files
^dist.ini$
^profiling/
# temp, old, backup files
~$
\.old$
\.bak$
\.tmp$
\.#
\.swp$
^#.*#$
txt2html-3.0/META.yml 0000664 0000000 0000000 00000001133 14337466404 0014424 0 ustar 00root root 0000000 0000000 ---
abstract: 'convert plain text file to HTML.'
author:
  - 'Kathryn Andersen '
build_requires:
  File::Find: 0
  File::Temp: 0
  Module::Build: 0.3601
  Test::More: 0
  warnings: 0
configure_requires:
  Module::Build: 0.3601
dynamic_config: 0
generated_by: 'Dist::Zilla version 4.300034, CPAN::Meta::Converter version 2.120921'
license: gpl
meta-spec:
  url: http://module-build.sourceforge.net/META-spec-v1.4.html
  version: 1.4
name: txt2html
requires:
  File::Basename: 0
  Getopt::Long: 0
  Pod::Usage: 0
  YAML::Syck: 0
  constant: 0
  perl: v5.8.1
  strict: 0
version: 3.0
txt2html-3.0/README.md 0000664 0000000 0000000 00000005436 14337466404 0014444 0 ustar 00root root 0000000 0000000 # txt2html
**Convert plain text file to HTML**
1. **[HELP THIS PROJECT](https://github.com/resurrecting-open-source-projects/txt2html/blob/master/README.md#1-help-this-project)**
2. **[WHAT IS TXT2HTML?](https://github.com/resurrecting-open-source-projects/txt2html/blob/master/README.md#2-what-is-txt2html)**
3. **[WHAT IS TXT2HTML NOT?](https://github.com/resurrecting-open-source-projects/txt2html/blob/master/README.md#3-what-is-txt2html-not)**
4. **[HOW TO INSTALL AND USE](https://github.com/resurrecting-open-source-projects/txt2html/blob/master/README.md#4-how-to-install-and-use)**
## 1. HELP THIS PROJECT
txt2html needs your help. **If you are a Perl programmer** and you want
to help a nice project, this is your opportunity.
My name is Eriberto and **I am not a Perl developer**. I imported txt2html
from its old repositories[1][2] to GitHub (the original developer is
inactive[3]). After this, I applied all patches found in the Debian project and
other places for this program. All my work is recorded in the ChangeLog
file (version 2.53 and later releases). I also maintain the txt2html package in
Debian[4].
If you are interested in helping txt2html, read the [CONTRIBUTING.md](CONTRIBUTING.md) file.
[1] http://txt2html.sourceforge.net
[2] https://metacpan.org/release/txt2html
[3] Kathryn Andersen (RUBYKAT) told me in a private email message that she is
inactive because of a personal problem. So, txt2html needs help!
[4] https://tracker.debian.org/pkg/txt2html
## 2. WHAT IS TXT2HTML?
txt2html is a Perl program that converts plain text to HTML, using the
HTML::TextToHTML Perl module.
It supports headings, lists, simple character markup, and hyperlinking, and
is highly customizable. It recognizes some of the apparent structure of the
source document (mostly whitespace and typographic layout), and attempts to
mark that structure explicitly using HTML.
The purpose of this tool is to provide an easier way of converting existing
text documents to HTML format, giving something nicer than just whapping the
text into a big PRE block. txt2html can also be used to aid in writing new
HTML documents, but there are probably better ways of doing that.
## 3. WHAT IS TXT2HTML NOT?
txt2html is not a program to convert wordprocessor files or other marked-up
document formats. It is also not a program to convert HTML to text. Most HTML
browsers do that.
If you need to convert something other than plain text to HTML, or you need to
convert from HTML, you should look for a more appropriate tool.
txt2html is not a program for automatically generating a table-of-contents from
a file. If you want that, then use txt2html to generate an HTML file, and then
use htmltoc or hypertoc on the HTML file.
## 4. HOW TO INSTALL AND USE
Please read the README.txt2html file and the generated manpages txt2html(1) and
HTML::TextToHTML(3).
txt2html-3.0/README.txt2html 0000664 0000000 0000000 00000130063 14337466404 0015625 0 ustar 00root root 0000000 0000000 NAME
HTML::TextToHTML - convert plain text file to HTML
VERSION
version 3.0
SYNOPSIS
From the command line:
    txt2html arguments
From Scripts:
    use HTML::TextToHTML;
    # create a new object
    my $conv = new HTML::TextToHTML();
    # convert a file
    $conv->txt2html(infile=>[$text_file],
                    outfile=>$html_file,
                    title=>"Wonderful Things",
                    mailmode=>1,
                    );
    # reset arguments
    $conv->args(infile=>[], mailmode=>0);
    # convert a string
    $newstring = $conv->process_chunk($mystring);
DESCRIPTION
HTML::TextToHTML converts plain text files to HTML. The txt2html script
uses this module to do the same from the command-line.
It supports headings, tables, lists, simple character markup, and
hyperlinking, and is highly customizable. It recognizes some of the
apparent structure of the source document (mostly whitespace and
typographic layout), and attempts to mark that structure explicitly
using HTML. The purpose for this tool is to provide an easier way of
converting existing text documents to HTML format, giving something
nicer than just whapping the text into a big PRE block.
History
The original txt2html script was written by Seth Golub (see
http://www.aigeek.com/txt2html/), and converted to a perl module by
Kathryn Andersen (see http://www.katspace.com/tools/text_to_html/) and
made into a sourceforge project by Sun Tong (see
http://sourceforge.net/projects/txt2html/). Earlier versions of the
HTML::TextToHTML module called the included script texthyper so as not
to clash with the original txt2html script, but now the projects have
all been merged. Update: the project is currently hosted on GitHub
at https://github.com/resurrecting-open-source-projects/txt2html
OPTIONS
All arguments can be set when the object is created, and further options
can be set when calling the actual txt2html method. Methods take their
arguments as a hash.
Note that all option-names must match exactly -- no abbreviations are
allowed. The argument-keys are expected to have values matching those
required for that argument -- whether that be a boolean, a string, a
reference to an array or a reference to a hash. These will replace any
value for that argument that might have been there before.
append_file
append_file=>filename
If you want something appended by default, put the filename here.
The appended text will not be processed at all, so make sure it's
plain text or correct HTML. i.e. do not have things like: Mary
Andersen <kitty@example.com> but instead, have: Mary Andersen
&lt;kitty@example.com&gt;
(default: nothing)
append_head
append_head=>filename
If you want something appended to the head by default, put the
filename here. The appended text will not be processed at all, so
make sure it's plain text or correct HTML. i.e. do not have things
like: Mary Andersen <kitty@example.com> but instead, have: Mary
Andersen &lt;kitty@example.com&gt;
(default: nothing)
body_deco
body_deco=>string
Body decoration string: a string to be added to the BODY tag so that
one can set attributes to the BODY (such as class, style, bgcolor
etc). For example, "class='withimage'".
bold_delimiter
bold_delimiter=>string
This defines what character (or string) is taken to be the delimiter
of text which is to be interpreted as bold (that is, to be given a
STRONG tag). If this is empty, then no bolding of text will be done.
(default: #)
bullets
bullets=>string
This defines what single characters are taken to be "bullet"
characters for unordered lists. Note that because this is used as a
character class, if you use '-' it must come first.
(default:-=o*\267)
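As a rough illustration of why '-' has to come first, the sketch below
interpolates a bullets string into a regexp character class. The variable
names and the surrounding pattern are assumptions made for this example;
the module's real list detection is more involved. The \267 character
from the default is left out to keep the example ASCII-only.

```perl
use strict;
use warnings;

# Hypothetical illustration: build a character class from a bullets string.
my $bullets   = '-=o*';                    # '-' first, so it is a literal
my $bullet_re = qr/^\s*[$bullets]\s+\S/;   # bullet, whitespace, then text

my @lines     = ('- first item', '* second item', 'o third item', 'plain text');
my @is_bullet = map { $_ =~ $bullet_re ? 1 : 0 } @lines;

print "$lines[$_]: $is_bullet[$_]\n" for 0 .. $#lines;
```

If '-' sat between two other characters (for example '=-o'), Perl would
read it as a range rather than as a literal hyphen.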
bullets_ordered
bullets_ordered=>string
This defines what single characters are taken to be "bullet"
placeholder characters for ordered lists. Ordered lists are normally
marked by a number or letter followed by '.' or ')' or ']' or ':'.
If an ordered bullet is used, then it simply indicates that this is
an ordered list, without giving explicit numbers.
Note that because this is used as a character class, if you use '-'
it must come first. (default:nothing)
caps_tag
caps_tag=>tag
Tag to put around all-caps lines. (default: STRONG) If an empty tag
is given, then no tag will be put around all-caps lines.
custom_heading_regexp
custom_heading_regexp=>\@custom_headings
Add patterns for headings. Header levels are assigned by regexp in
the order seen in the input text. When a line matches a custom
header regexp, it is tagged as a header. If it's the first time that
particular regexp has matched, the next available header level is
associated with it and applied to the line. Any later matches of
that regexp will use the same header level. Therefore, if you want
to match numbered header lines, you could use something like this:
my @custom_headings = ('^ *\d+\. \w+',
'^ *\d+\.\d+\. \w+',
'^ *\d+\.\d+\.\d+\. \w+');
...
custom_heading_regexp=>\@custom_headings,
...
Then lines like
" 1. Examples "
" 1.1. Things"
and " 4.2.5. Cold Fusion"
would be marked as H1, H2, and H3 (assuming they were found in that
order, and that no other header styles were encountered). If you
prefer that the first one specified always be H1, the second always
be H2, the third H3, etc, then use the "explicit_headings" option.
This expects a reference to an array of strings.
(default: none)
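The level assignment described above can be sketched in plain Perl. This
is only a simulation of the documented rule (first match of a pattern
claims the next free level); the helper name and bookkeeping variables
are hypothetical and are not taken from the module.

```perl
use strict;
use warnings;

# Patterns from the example above.
my @custom_headings = ('^ *\d+\. \w+',
                       '^ *\d+\.\d+\. \w+',
                       '^ *\d+\.\d+\.\d+\. \w+');

# Hypothetical bookkeeping: pattern index => heading level.
my %level_of;
my $next_level = 1;

# Return the heading level for a line, or 0 if no custom pattern matches.
sub heading_level_for {
    my ($line) = @_;
    for my $i (0 .. $#custom_headings) {
        next unless $line =~ /$custom_headings[$i]/;
        # First match of this pattern claims the next free level.
        $level_of{$i} = $next_level++ unless exists $level_of{$i};
        return $level_of{$i};
    }
    return 0;
}

my @levels = map { heading_level_for($_) }
    (' 1. Examples ', ' 1.1. Things', ' 4.2.5. Cold Fusion', ' 2. More');
print "@levels\n";
```

Under this rule, had a line such as " 1.1. Things" appeared first in the
input, its pattern would have claimed H1 instead; "explicit_headings"
exists to avoid that dependence on input order.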
default_link_dict
default_link_dict=>filename
The name of the default "user" link dictionary. (default:
"$ENV{'HOME'}/.txt2html.dict" -- this is the same as for the
txt2html script. If there is no $ENV{HOME} then it is just
'.txt2html.dict')
demoronize
demoronize=>1
Convert Microsoft-generated character codes that are non-ISO codes
into something more reasonable. (default:true)
doctype
doctype=>doctype
This gets put in the DOCTYPE field at the top of the document,
unless it's empty.
Default : '-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd'
If xhtml is true, the contents of this is ignored, unless it's
empty, in which case no DOCTYPE declaration is output.
eight_bit_clean
eight_bit_clean=>1
If false, convert Latin-1 characters to HTML entities. If true, this
conversion is disabled; also "demoronize" is set to false, since
this also changes 8-bit characters. (default: false)
escape_HTML_chars
escape_HTML_chars=>1
turn & < > into &amp; &lt; &gt; (default: true)
explicit_headings
explicit_headings=>1
Don't try to find any headings except the ones specified in the
--custom_heading_regexp option. Also, the custom headings will not
be assigned levels in the order they are encountered in the
document, but in the order they are specified on the
custom_heading_regexp option. (default: false)
extract
extract=>1
Extract Mode; don't put HTML headers or footers on the result, just
the plain HTML (thus making the result suitable for inserting into
another document, or as part of the output of a CGI script).
(default: false)
hrule_min
hrule_min=>n
Minimum number of consecutive '-' characters for an HRule. (default: 4)
indent_width
indent_width=>n
Indents this many spaces for each level of a list. (default: 2)
indent_par_break
indent_par_break=>1
Treat paragraphs marked solely by indents as breaks with indents.
That is, instead of taking a three-space indent as a new paragraph,
put in a <BR> and three non-breaking spaces instead. (See also
--preserve_indent.) (default: false)
infile
infile=>\@my_files
infile=>['chapter1.txt', 'chapter2.txt']
The name of the input file(s). This expects a reference to an array
of filenames.
The special filename '-' designates STDIN.
See also "inhandle" and "instring".
(default:-)
inhandle
inhandle=>\@my_handles
inhandle=>[\*MYINHANDLE, \*STDIN]
An array of input filehandles; use this instead of "infile" or
"instring" to use a filehandle or filehandles as input.
instring
instring=>\@my_strings
instring=>[$string1, $string2]
An array of input strings; use this instead of "infile" or
"inhandle" to use a string or strings as input.
italic_delimiter
italic_delimiter=>string
This defines what character (or string) is taken to be the delimiter
of text which is to be interpreted as italic (that is, to be given
an EM tag). If this is empty, no italicising of text will be done.
(default: *)
underline_delimiter
underline_delimiter=>string
This defines what character (or string) is taken to be the delimiter
of text which is to be interpreted as underlined (that is, to be
given a U tag). If this is empty, no underlining of text will be
done. (default: _)
links_dictionaries
links_dictionaries=>\@my_link_dicts
links_dictionaries=>['url_links.dict', 'format_links.dict']
File(s) to use as a link-dictionary. There can be more than one of
these. These are in addition to the Global Link Dictionary and the
User Link Dictionary. This expects a reference to an array of
filenames.
link_only
link_only=>1
Do no escaping or marking up at all, except for processing the links
dictionary file and applying it. This is useful if you want to use
the linking feature on an HTML document. If the HTML is a complete
document (includes HTML,HEAD,BODY tags, etc) then you'll probably
want to use the --extract option also. (default: false)
lower_case_tags
lower_case_tags=>1
Force all tags to be in lower-case.
mailmode
mailmode=>1
Deal with mail headers & quoted text. The mail header paragraph is
given the class 'mail_header', and mail-quoted text is given the
class 'quote_mail'. (default: false)
make_anchors
make_anchors=>0
Should we try to make anchors in headings? (default: true)
make_links
make_links=>0
Should we try to build links? If this is false, then the links
dictionaries are not consulted and only structural text-to-HTML
conversion is done. (default: true)
make_tables
make_tables=>1
Should we try to build tables? If true, spots tables and marks them
up appropriately. See "Input File Format" for information on how
tables should be formatted.
This overrides the detection of lists; if something looks like a
table, it is taken as a table, and list-checking is not done for
that paragraph.
(default: false)
min_caps_length
min_caps_length=>n
Minimum number of sequential capital letters for an all-caps line.
(default: 3)
outfile
outfile=>filename
The name of the output file. If it is "-" then the output goes to
Standard Output. (default: - )
outhandle
The output filehandle; if this is given then the output goes to this
filehandle instead of to the file given in "outfile".
par_indent
par_indent=>n
Minimum number of spaces indented in first lines of paragraphs. Only
used when there's no blank line preceding the new paragraph.
(default: 2)
preformat_trigger_lines
preformat_trigger_lines=>n
How many lines of preformatted-looking text are needed to switch to
<PRE>:
    <= 0 : Preformat entire document
       1 : one line triggers
    >= 2 : two lines trigger
(default: 2)
endpreformat_trigger_lines
endpreformat_trigger_lines=>n
How many lines of unpreformatted-looking text are needed to switch
from <PRE>:
    <= 0 : Never preformat within document
       1 : one line triggers
    >= 2 : two lines trigger
(default: 2)
NOTE for preformat_trigger_lines and endpreformat_trigger_lines: A
zero takes precedence. If one is zero, the other is ignored. If both
are zero, entire document is preformatted.
preformat_start_marker
preformat_start_marker=>regexp
What flags the start of a preformatted section if
--use_preformat_marker is true.
(default: "^(:?(:?&lt;)|<)PRE(:?(:?&gt;)|>)\$")
preformat_end_marker
preformat_end_marker=>regexp
What flags the end of a preformatted section if
--use_preformat_marker is true.
(default: "^(:?(:?&lt;)|<)/PRE(:?(:?&gt;)|>)\$")
preformat_whitespace_min
preformat_whitespace_min=>n
Minimum number of consecutive whitespace characters to trigger
normal preformatting. NOTE: Tabs are expanded to spaces before this
check is made. That means if tab_width is 8 and this is 5, then one
tab may be expanded to 8 spaces, which is enough to trigger
preformatting. (default: 5)
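The interaction between tab_width and preformat_whitespace_min can be
sketched as follows. This is an illustration only: the helper name is
hypothetical, tabs are assumed to expand to the next tab stop, and only
spaces are counted after expansion. The module itself may differ in
detail.

```perl
use strict;
use warnings;

my $tab_width                = 8;    # documented default
my $preformat_whitespace_min = 5;    # documented default

# Hypothetical helper: expand each tab to the next tab stop, then look for
# a run of at least $preformat_whitespace_min consecutive spaces.
sub has_preformat_whitespace {
    my ($line) = @_;
    1 while $line =~ s/^([^\t]*)\t/$1 . ' ' x ($tab_width - length($1) % $tab_width)/e;
    return index($line, ' ' x $preformat_whitespace_min) >= 0 ? 1 : 0;
}

my @samples = (
    "a\tb",                        # tab expands to 7 spaces
    'name' . ' ' x 4 . 'value',    # only 4 spaces
    "abcdef\tg",                   # tab expands to 2 spaces
);
my @flags = map { has_preformat_whitespace($_) } @samples;
print "@flags\n";
```

Under these assumptions a single tab near the start of a line is enough
to reach the threshold, while a tab that lands close to a tab stop is
not.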
prepend_file
prepend_file=>filename
If you want something prepended to the processed body text, put the
filename here. The prepended text will not be processed at all, so
make sure it's plain text or correct HTML.
(default: nothing)
preserve_indent
preserve_indent=>1
Preserve the first-line indentation of paragraphs marked with
indents by replacing the spaces of the first line with non-breaking
spaces. (default: false)
short_line_length
short_line_length=>n
Lines this short (or shorter) must be intentionally broken and are
kept that short. (default: 40)
style_url
style_url=>url
This gives the URL of a stylesheet; a LINK tag will be added to the
output.
tab_width
tab_width=>n
How many spaces equal a tab? (default: 8)
table_type
table_type=>{ ALIGN=>0, PGSQL=>0, BORDER=>1, DELIM=>0 }
This determines which types of tables will be recognised when
"make_tables" is true. The possible types are ALIGN, PGSQL, BORDER
and DELIM. (default: all types are true)
title
title=>title
You can specify a title. Otherwise it will use a blank one.
(default: nothing)
titlefirst
titlefirst=>1
Use the first non-blank line as the title. (See also "title")
underline_length_tolerance
underline_length_tolerance=>n
How much longer or shorter can underlines be and still be
underlines? (default: 1)
underline_offset_tolerance
underline_offset_tolerance=>n
How far offset can underlines be and still be underlines? (default:
1)
unhyphenation
unhyphenation=>0
Enables unhyphenation of text. (default: true)
use_mosaic_header
use_mosaic_header=>1
Use this option if you want to force the heading styles to match
what Mosaic outputs. (Underlined with "***"s is H1, with "==="s is
H2, with "+++" is H3, with "---" is H4, with "~~~" is H5 and with
"..." is H6) This was the behavior of txt2html up to version 1.10.
(default: false)
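The Mosaic-style mapping listed above can be written as a small lookup
table. This is a sketch under assumptions: the helper name is made up,
and the requirement of three or more repeated characters is chosen for
the example rather than taken from the module.

```perl
use strict;
use warnings;

# Underline character => heading level, as listed above.
my %mosaic_level = (
    '*' => 1, '=' => 2, '+' => 3,
    '-' => 4, '~' => 5, '.' => 6,
);

# Hypothetical helper: return the level for an underline made of one
# repeated character (three or more), or 0 if it is not recognised.
sub mosaic_heading_level {
    my ($underline) = @_;
    return 0 unless $underline =~ /^\s*(\S)\1{2,}\s*$/;
    return exists $mosaic_level{$1} ? $mosaic_level{$1} : 0;
}

my @mosaic_levels = map { mosaic_heading_level($_) }
    ('*****', '=====', '-----', '####', 'abc');
print "@mosaic_levels\n";
```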
use_preformat_marker
use_preformat_marker=>1
Turn on preformatting when encountering "<PRE>" on a line by itself,
and turn it off when there's a line containing only "</PRE>". When
such preformatted text is detected, the PRE tag will be given the
class 'quote_explicit'. (default: off)
xhtml
xhtml=>1
Try to make the output conform to the XHTML standard, including
closing all open tags and marking empty tags correctly. This turns
on --lower_case_tags and overrides the --doctype option. Note that
if you add a header or a footer file, it is up to you to make it
conform; the header/footer isn't touched by this. Likewise, if you
make link-dictionary entries that break XHTML, then this won't fix
them, except to the degree of putting all tags into lower-case.
(default: true)
DEBUGGING
There are global variables for setting types and levels of debugging.
These should only be used by developers.
$HTML::TextToHTML::Debug
$HTML::TextToHTML::Debug = 1;
Enable copious debugging output. (default: false)
$HTML::TextToHTML::DictDebug
$HTML::TextToHTML::DictDebug = n;
Debug mode for link dictionaries. Bitwise-Or what you want to see:
1: The parsing of the dictionary
2: The code that will make the links
4: When each rule matches something
8: When each tag is created
(default: 0)
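Combining the bit values might look like the sketch below. The constant
names are invented for readability and are not defined by the module;
only the numeric values come from the list above.

```perl
use strict;
use warnings;

# Bit values as listed above; the names are assumptions for this sketch.
use constant {
    DEBUG_PARSE => 1,    # the parsing of the dictionary
    DEBUG_CODE  => 2,    # the code that will make the links
    DEBUG_MATCH => 4,    # when each rule matches something
    DEBUG_TAG   => 8,    # when each tag is created
};

# Show dictionary parsing and rule matches, but nothing else.
my $dict_debug = DEBUG_PARSE | DEBUG_MATCH;
print "DictDebug value: $dict_debug\n";

# In a real script the value would then be assigned to the global:
#   $HTML::TextToHTML::DictDebug = $dict_debug;
```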
METHODS
new
$conv = new HTML::TextToHTML()
$conv = new HTML::TextToHTML(titlefirst=>1,
...
);
Create a new object with new. If arguments are given, these arguments
will be used in invocations of other methods.
See "OPTIONS" for the possible values of the arguments.
args
$conv->args(short_line_length=>60,
titlefirst=>1,
....
);
Updates the current arguments/options of the HTML::TextToHTML object.
Takes a hash of arguments, which will be used in invocations of other
methods. See "OPTIONS" for the possible values of the arguments.
process_chunk
$newstring = $conv->process_chunk($mystring);
Convert a string to an HTML fragment. This assumes that this string is at
the least a single paragraph, but it can contain more than that. This
returns the processed string. If you want to pass arguments to alter the
behaviour of this conversion, you need to do that earlier, either when
you create the object, or with the "args" method.
$newstring = $conv->process_chunk($mystring,
close_tags=>0);
If there are open tags (such as lists) in the input string,
process_chunk will automatically close them, unless you specify not to,
with the close_tags option.
$newstring = $conv->process_chunk($mystring,
is_fragment=>1);
If you want this string to be treated as a fragment, and not assumed to
be a paragraph, set is_fragment to true. If there is more than one
paragraph in the string (ie it contains blank lines) then this option
will be ignored.
process_para
$newstring = $conv->process_para($mystring);
Convert a string to an HTML fragment. This assumes that this string is at
the most a single paragraph, with no blank lines in it. If you don't
know whether your string will contain blank lines or not, use the
"process_chunk" method instead.
This returns the processed string. If you want to pass arguments to
alter the behaviour of this conversion, you need to do that earlier,
either when you create the object, or with the "args" method.
$newstring = $conv->process_para($mystring,
close_tags=>0);
If there are open tags (such as lists) in the input string, process_para
will automatically close them, unless you specify not to, with the
close_tags option.
$newstring = $conv->process_para($mystring,
is_fragment=>1);
If you want this string to be treated as a fragment, and not assumed to
be a paragraph, set is_fragment to true.
txt2html
$conv->txt2html(%args);
Convert a text file to HTML. Takes a hash of arguments. See "OPTIONS"
for the possible values of the arguments. Arguments which have already
been set with new or args will remain as they are, unless they are
overridden.
PRIVATE METHODS
These are methods used internally, only of interest to developers.
init_our_data
$self->init_our_data();
Initializes the internal object data.
deal_with_options
$self->deal_with_options();
Do extra processing related to particular options.
escape
$newtext = escape($text);
Escape & < and >
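A minimal sketch of this kind of escaping is shown below. It is not the
module's own code; the function name escape_html is hypothetical and
chosen so that it does not clash with the private escape function.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the escaping step.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first, or it would re-escape the others
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

my $escaped = escape_html('a < b && c > d');
print "$escaped\n";
```

The ampersand is handled first because the other two replacements
introduce new ampersands of their own.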
demoronize_char
$newtext = demoronize_char($text);
Convert Microsoft character entities into characters.
Added by Alan Jackson, alan at ajackson dot org, and based on the
demoronize script by John Walker, http://www.fourmilab.ch/
demoronize_code
$newtext = demoronize_code($text);
Convert Microsoft character entities into HTML code.
get_tag
$tag = $self->get_tag($in_tag);
$tag = $self->get_tag($in_tag, tag_type=>TAG_START, inside_tag=>'');
Output the tag wanted (add the <> and the / if necessary):
- output in lower or upper case
- do tag-related processing
Options:
    tag_type=>TAG_START | tag_type=>TAG_END | tag_type=>TAG_EMPTY
    (default: start)
    inside_tag=>string (default: empty)
close_tag
$tag = $self->close_tag($in_tag);
Close the open tag.
hrule
$self->hrule(para_lines_ref=>$para_lines,
para_action_ref=>$para_action,
ind=>0);
Deal with horizontal rules.
shortline
$self->shortline(line_ref=>$line_ref,
line_action_ref=>$line_action_ref,
prev_ref=>$prev_ref,
prev_action_ref=>$prev_action_ref,
prev_line_len=>$prev_line_len);
Deal with short lines.
is_mailheader
if ($self->is_mailheader(rows_ref=>$rows_ref))
{
...
}
Is this a mailheader line?
mailheader
$self->mailheader(rows_ref=>$rows_ref);
Deal with a mailheader.
mailquote
$self->mailquote(line_ref=>$line_ref,
line_action_ref=>$line_action_ref,
prev_ref=>$prev_ref,
prev_action_ref=>$prev_action_ref,
next_ref=>$next_ref);
Deal with quoted mail.
subtract_modes
$newvector = subtract_modes($vector, $mask);
Subtracts modes listed in $mask from $vector.
paragraph
$self->paragraph(line_ref=>$line_ref,
line_action_ref=>$line_action_ref,
prev_ref=>$prev_ref,
prev_action_ref=>$prev_action_ref,
line_indent=>$line_indent,
prev_indent=>$prev_indent,
is_fragment=>$is_fragment,
ind=>$ind);
Detect paragraph indentation.
listprefix
($prefix, $number, $rawprefix, $term) = $self->listprefix($line);
Detect and parse a list item.
startlist
$self->startlist(prefix=>$prefix,
number=>0,
rawprefix=>$rawprefix,
term=>$term,
para_lines_ref=>$para_lines_ref,
para_action_ref=>$para_action_ref,
ind=>0,
prev_ref=>$prev_ref,
total_prefix=>$total_prefix);
Start a list.
endlist
$self->endlist(num_lists=>0,
prev_ref=>$prev_ref,
line_action_ref=>$line_action_ref);
End N lists
continuelist
$self->continuelist(para_lines_ref=>$para_lines_ref,
para_action_ref=>$para_action_ref,
ind=>0,
term=>$term);
Continue a list.
liststuff
$self->liststuff(para_lines_ref=>$para_lines_ref,
para_action_ref=>$para_action_ref,
para_line_indent_ref=>$para_line_indent_ref,
ind=>0,
prev_ref=>$prev_ref);
Process a list (higher-level method).
get_table_type
$table_type = $self->get_table_type(rows_ref=>$rows_ref,
para_len=>0);
Figure out the table type of this table, if any
is_aligned_table
if ($self->is_aligned_table(rows_ref=>$rows_ref, para_len=>0))
{
...
}
Check if the given paragraph-array is an aligned table
is_pgsql_table
if ($self->is_pgsql_table(rows_ref=>$rows_ref, para_len=>0))
{
...
}
Check if the given paragraph-array is a PostgreSQL table (the ASCII
format produced by PostgreSQL)
A PGSQL table can start with an optional table-caption,
then it has a row of column headings separated by |
then it has a row of ------+-----
then it has one or more rows of column values separated by |
then it has a row-count (N rows)
is_border_table
if ($self->is_border_table(rows_ref=>$rows_ref, para_len=>0))
{
...
}
Check if the given paragraph-array is a Border table.
A BORDER table can start with an optional table-caption,
then it has a row of +------+-----+
then it has a row of column headings separated by |
then it has a row of +------+-----+
then it has one or more rows of column values separated by |
then it has a row of +------+-----+
is_delim_table
if ($self->is_delim_table(rows_ref=>$rows_ref, para_len=>0))
{
...
}
Check if the given paragraph-array is a Delimited table.
A DELIM table can start with an optional table-caption, then it has at
least two rows which start and end and are punctuated by a
non-alphanumeric delimiter.
| val1 | val2 |
| val3 | val4 |
tablestuff
$self->tablestuff(table_type=>0,
rows_ref=>$rows_ref,
para_len=>0);
Process a table.
make_aligned_table
$self->make_aligned_table(rows_ref=>$rows_ref,
para_len=>0);
Make an Aligned table.
make_pgsql_table
$self->make_pgsql_table(rows_ref=>$rows_ref,
para_len=>0);
Make a PGSQL table.
make_border_table
$self->make_border_table(rows_ref=>$rows_ref,
para_len=>0);
Make a BORDER table.
make_delim_table
$self->make_delim_table(rows_ref=>$rows_ref,
para_len=>0);
Make a Delimited table.
is_preformatted
if ($self->is_preformatted($line))
{
...
}
Returns true if the passed string is considered to be preformatted.
split_end_explicit_preformat
$front = $self->split_end_explicit_preformat(para_ref=>$para_ref);
Modifies the given string, and returns the front preformatted part.
endpreformat
$self->endpreformat(para_lines_ref=>$para_lines_ref,
para_action_ref=>$para_action_ref,
ind=>0,
prev_ref=>$prev_ref);
End a preformatted section.
preformat
$self->preformat(mode_ref=>$mode_ref,
line_ref=>$line_ref,
line_action_ref=>$line_action_ref,
prev_ref=>$prev_ref,
next_ref=>$next_ref,
prev_action_ref=>$prev_action_ref);
Detect and process a preformatted section.
make_new_anchor
$anchor = $self->make_new_anchor($heading_level);
Make a new anchor.
anchor_mail
$self->anchor_mail($line_ref);
Make an anchor for a mail section.
anchor_heading
$self->anchor_heading($heading_level, $line_ref);
Make an anchor for a heading.
heading_level
$self->heading_level($style);
Add a new heading style if this is a new heading style.
is_ul_list_line
if ($self->is_ul_list_line($line))
{
...
}
Tests if this line starts a UL list item.
is_heading
if ($self->is_heading(line_ref=>$line_ref, next_ref=>$next_ref))
{
...
}
Tests if this line is a heading. Needs to take account of the next line,
because a standard heading is defined by "underlining" the text of the
heading.
heading
$self->heading(line_ref=>$line_ref,
next_ref=>$next_ref);
Make a heading. Assumes is_heading is true.
is_custom_heading
if ($self->is_custom_heading($line))
{
...
}
Check if the given line matches a custom heading.
custom_heading
$self->custom_heading(line_ref=>$line_ref);
Make a custom heading. Assumes is_custom_heading is true.
unhyphenate_para
$self->unhyphenate_para($para_ref);
Join up hyphenated words that are split across lines.
tagline
$self->tagline($tag, $line_ref);
Put the given tag around the given line.
iscaps
if ($self->iscaps($line))
{
...
}
Check if a line is all capitals.
caps
$self->caps(line_ref=>$line_ref,
line_action_ref=>$line_action_ref);
Detect and deal with an all-caps line.
do_delim
$self->do_delim(line_ref=>$line_ref,
line_action_ref=>$line_action_ref,
delim=>'*',
tag=>'STRONG');
Deal with a line which has words delimited by the given delimiter; this
is used to deal with italics, bold and underline formatting.
glob2regexp
$regexp = glob2regexp($glob);
Convert very simple globs to regexps
add_regexp_to_links_table
$self->add_regexp_to_links_table(label=>$label,
pattern=>$pattern,
url=>$url,
switches=>$switches);
Add the given regexp "link definition" to the links table.
add_literal_to_links_table
$self->add_literal_to_links_table(label=>$label,
pattern=>$pattern,
url=>$url,
switches=>$switches);
Add the given literal "link definition" to the links table.
add_glob_to_links_table
$self->add_glob_to_links_table(label=>$label,
pattern=>$pattern,
url=>$url,
switches=>$switches);
Add the given glob "link definition" to the links table.
parse_dict
$self->parse_dict($dictfile, $dict);
Parse the dictionary file. (see also load_dictionary_links, for things
that were stripped)
setup_dict_checking
$self->setup_dict_checking();
Set up the dictionary checking.
in_link_context
if ($self->in_link_context($match, $before))
{
...
}
Check if we are inside a link (<a ...>); certain kinds of substitution
are not allowed here.
apply_links
$self->apply_links(para_ref=>$para_ref,
para_action_ref=>$para_action_ref);
Apply links and formatting to this paragraph.
check_dictionary_links
$self->check_dictionary_links(line_ref=>$line_ref,
line_action_ref=>$line_action_ref);
Check (and alter if need be) the bits in this line matching the patterns
in the link dictionary.
load_dictionary_links
$self->load_dictionary_links();
Load the dictionary links.
do_file_start
$self->do_file_start($outhandle, $para);
Extra stuff needed for the beginning: HTML headers, and prepending a
file if desired.
do_init_call
$self->do_init_call();
Certain things, like reading link dictionaries, need to be done only
once.
FILE FORMATS
Two files affect the outcome of the conversion. One is the link
dictionary, which contains patterns (describing how to recognise http
links and other things) and how to convert them. The other is,
naturally, the input file itself.
Link Dictionary
A link dictionary file contains patterns to match, and what to convert
them to. It is called a "link" dictionary because it was intended to be
something which defined what a href link was, but it can be used for
more than that. However, if you wish to define your own links, it is
strongly advised to read up on regular expressions (regexes) because
this relies heavily on them.
The file consists of comments (which are lines starting with #) and
blank lines, and link entries. Each entry consists of a regular
expression, a -> separator (with optional flags), and a link "result".
In the simplest case, with no flags, the regular expression defines the
pattern to look for, and the result says which part of the match is the
actual link target. The generated link uses that result as its href and
the whole matched pattern as its visible text. The first character of
the regular expression is taken to be the delimiter for the regex, so
one can use either the traditional / delimiter or something else such
as | (which can be helpful with URLs, which are full of / characters).
So, for example, an ftp URL might be defined as:
|ftp:[\w/\.:+\-]+| -> $&
This takes the whole pattern as the href, and the resultant link has the
same thing in the href as in the contents of the anchor.
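The effect of that entry can be imitated with an ordinary Perl substitution. This is only a sketch: link_ftp_urls is a hypothetical name, and the module's own link code does more than this (for example, it checks whether the match is already inside a link).

```perl
use strict;
use warnings;

# Sketch only: mimic the effect of the dictionary entry
#   |ftp:[\w/\.:+\-]+| -> $&
# The whole match is used both as the href and as the visible text.
sub link_ftp_urls {
    my ($text) = @_;
    $text =~ s{(ftp:[\w/\.:+\-]+)}{<A HREF="$1">$1</A>}g;
    return $text;
}

my $linked = link_ftp_urls("Files are at ftp://example.com/pub today");
print "$linked\n";
```

Note that the character class includes the full stop, so a URL at the very end of a sentence would take the trailing full stop with it.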
But sometimes the href isn't the whole pattern.
/&lt;URL:\s*(\S+?)\s*&gt;/ -> $1
With the above regex, a () grouping marks the first subexpression, which
is represented as $1 (rather than $&, the whole expression). This entry
matches a URL which was marked explicitly as a URL with the pattern
<URL:foo>. Note that the < is shown as the &lt; entity, not the actual
character. This is because by the time the links dictionary is checked,
all such things have already been converted to their HTML entity forms
(unless, of course, the escape_HTML_chars option was turned off). This
would give us a link in the form <A HREF="foo">&lt;URL:foo&gt;</A>
The h flag
However, if we want more control over the way the link is constructed,
we can construct it ourselves. If one gives the h flag, then the "result"
part of the entry is taken not to contain the href part of the link, but
the whole link.
For example, the entry:
/&lt;URL:\s*(\S+?)\s*&gt;/ -h-> <A HREF="$1">$1</A>
will take <URL:foo> and give us <A HREF="foo">foo</A>
However, this is a very powerful mechanism, because it can be used to
construct custom tags which aren't links at all. For example, to flag
*italicised words* the following entry will surround the words with EM
tags.
/\B\*([a-z][a-z -]*[a-z])\*\B/ -hi-> <EM>$1</EM>
The i flag
This turns on ignore case in the pattern matching.
The e flag
This turns on execute in the pattern substitution. This really only
makes sense if h is turned on too. In that case, the "result" part of
the entry is taken as perl code to be executed, and the result of that
code is what replaces the pattern.
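The idea of "execute" is close to Perl's own /e modifier on a substitution, where the replacement is code and its value is what gets inserted. The following sketch shows only that underlying Perl behaviour; shout_todo is a hypothetical name, and how the module evaluates the "result" part internally is not shown here.

```perl
use strict;
use warnings;

# Illustration of Perl's /e modifier: the replacement part is evaluated
# as code, and the value it returns replaces the matched text.
sub shout_todo {
    my ($text) = @_;
    $text =~ s{\b(todo)\b}{'<STRONG>' . uc($1) . '</STRONG>'}gie;
    return $text;
}

my $marked = shout_todo("Remember the todo list");
print "$marked\n";
```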
The o flag
This marks the entry as a once-only link. This will convert the first
instance of a matching pattern, and ignore any others further on.
For example, the following pattern will take the first mention of
HTML::TextToHTML and convert it to a link to the module's home page.
"HTML::TextToHTML" -io-> http://www.katspace.com/tools/text_to_html/
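To make the entry layout concrete, here is a minimal sketch that splits one dictionary line into its pattern, flags and result. The name parse_dict_entry and the returned hash layout are assumptions made for this example; this is not the module's parse_dict, and it assumes the delimiter does not appear before the real end of the pattern.

```perl
use strict;
use warnings;

# Hypothetical parser for a single link-dictionary line. Returns nothing
# for comments, blank lines and lines that do not look like entries.
sub parse_dict_entry {
    my ($line) = @_;
    return if $line =~ /^\s*(?:#|$)/;

    # The first non-blank character is the pattern delimiter.
    my ($delim) = $line =~ /^\s*(\S)/;
    my $d = quotemeta $delim;

    # pattern, then "-flags->", then the result.
    my ($pattern, $flags, $result) =
        $line =~ /^\s*$d(.*?)$d\s*-([a-z]*)->\s*(.*?)\s*$/
        or return;

    return {
        pattern => $pattern,
        result  => $result,
        flags   => { map { $_ => 1 } split //, $flags },
    };
}

my $entry = parse_dict_entry(
    '"HTML::TextToHTML" -io-> http://www.katspace.com/tools/text_to_html/');
print "pattern: $entry->{pattern}\n";
print "flags:   ", join(',', sort keys %{ $entry->{flags} }), "\n";
print "result:  $entry->{result}\n";
```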
Input File Format
For the most part, this module tries to use intuitive conventions for
determining the structure of the text input. Unordered lists are marked
by bullets; ordered lists are marked by numbers or letters; in either
case, an increase in indentation marks a sub-list contained in the outer
list.
Headers (apart from custom headers) are distinguished by "underlines"
underneath them; headers in all-capitals are distinguished from those in
mixed case. All headers, both normal and custom headers, are expected to
start at the first line in a "paragraph".
In other words, the following is a header:
I am Head Man
-------------
But the following does not have a header:
I am not a head Man, man
I am Head Man
-------------
Tables require a more rigid convention. A table must be marked as a
separate paragraph, that is, it must be surrounded by blank lines.
Tables come in different types. For a table to be parsed, its
--table_type option must be on, and the --make_tables option must be
true.
ALIGN Table Type
Columns must be separated by two or more spaces (this prevents
accidental incorrect recognition of a paragraph where interword spaces
happen to line up). If there are two or more rows in a paragraph and all
rows share the same set of (two or more) columns, the paragraph is
assumed to be a table. For example
-e  File exists.
-z  File has zero size.
-s  File has nonzero size (returns size).
becomes
<TABLE>
<TR><TD>-e</TD><TD>File exists.</TD></TR>
<TR><TD>-z</TD><TD>File has zero size.</TD></TR>
<TR><TD>-s</TD><TD>File has nonzero size (returns size).</TD></TR>
</TABLE>
This guesses for each column whether it is intended to be left, centre
or right aligned.
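The recognition rule can be approximated in a few lines. This sketch is a simplification under stated assumptions: split_align_rows is a hypothetical name, it compares only the number of columns per row (the module looks at where the columns sit), and it makes no attempt to guess alignment.

```perl
use strict;
use warnings;

# Hypothetical sketch of the ALIGN rule: split each row on runs of two or
# more spaces and require at least two rows that all have the same number
# of columns (at least two).
sub split_align_rows {
    my ($para) = @_;
    my @rows;
    for (split /\n/, $para) {
        my $line = $_;
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        next unless length $line;
        push @rows, [ split / {2,}/, $line ];
    }
    return unless @rows >= 2;
    my $cols = @{ $rows[0] };
    return unless $cols >= 2;
    for my $row (@rows) {
        return unless @$row == $cols;
    }
    return \@rows;
}

my $table = split_align_rows(
    "-e  File exists.\n-z  File has zero size.\n");
print join(' | ', @$_), "\n" for @$table;
```

An ordinary paragraph has single spaces between words, so every line yields one column and the paragraph is rejected.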
BORDER Table Type
This table type has nice borders around it, and will be rendered with a
border, like so:
+---------+---------+
| Column1 | Column2 |
+---------+---------+
| val1    | val2    |
| val3    | val3    |
+---------+---------+
The above becomes
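<TABLE border="1">
<THEAD><TR><TH>Column1</TH><TH>Column2</TH></TR></THEAD>
<TBODY>
<TR><TD>val1</TD><TD>val2</TD></TR>
<TR><TD>val3</TD><TD>val3</TD></TR>
</TBODY>
</TABLE>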
It can also have an optional caption at the start.
My Caption
+---------+---------+
| Column1 | Column2 |
+---------+---------+
| val1    | val2    |
| val3    | val3    |
+---------+---------+
PGSQL Table Type
This format of table is what one gets from the output of a Postgresql
query.
 Column1 | Column2
---------+---------
 val1    | val2
 val3    | val3
(2 rows)
This can also have an optional caption at the start. This table is also
rendered with a border and table-headers like the BORDER type.
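The layout is regular enough that a short sketch can pull it apart. The name parse_pgsql_table and the returned structure are assumptions for this example, and the optional caption is deliberately not handled.

```perl
use strict;
use warnings;

# Hypothetical sketch: read a psql-style table into a header row and
# data rows. The optional caption is not handled here.
sub parse_pgsql_table {
    my ($para) = @_;
    my @lines = grep { /\S/ } split /\n/, $para;

    # Drop the trailing "(N rows)" footer if present.
    pop @lines if @lines && $lines[-1] =~ /^\s*\(\d+ rows?\)\s*$/;

    # Expect a header line followed by a ---+--- separator line.
    return unless @lines >= 2 && $lines[1] =~ /^\s*-+(?:\+-+)+\s*$/;

    my @table;
    for my $i (0, 2 .. $#lines) {
        my @cells = map { s/^\s+//; s/\s+$//; $_ }
                    split /\|/, $lines[$i], -1;
        push @table, \@cells;
    }
    my $header = shift @table;
    return { header => $header, rows => \@table };
}

my $parsed = parse_pgsql_table(join "\n",
    ' Column1 | Column2',
    '---------+---------',
    ' val1    | val2',
    ' val3    | val3',
    '(2 rows)');
print "header: @{ $parsed->{header} }\n";
print "row:    @$_\n" for @{ $parsed->{rows} };
```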
DELIM Table Type
This table type is delimited by non-alphanumeric characters, and has to
have at least two rows and two columns before it's recognised as a
table.
This one is delimited by the '|' character:
| val1 | val2 |
| val3 | val3 |
But one can use almost any suitable character such as : # $ % + and so
on. This is clever enough to figure out what you are using as the
delimiter if you have your data set up like a table. Note that the line
has to both begin and end with the delimiter, as well as using it to
separate values.
This can also have an optional caption at the start.
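One way to picture the delimiter guessing is the sketch below. It is an illustration under assumptions, not the module's code: split_delim_rows is a hypothetical name, the delimiter is taken to be the first non-blank character of the first row, and captions are not handled.

```perl
use strict;
use warnings;

# Hypothetical sketch of the DELIM rule: guess the delimiter from the
# first row, then require every row to begin and end with it.
sub split_delim_rows {
    my ($para) = @_;
    my @lines = grep { /\S/ } split /\n/, $para;
    return unless @lines >= 2;                 # at least two rows

    # The delimiter must not be alphanumeric (or whitespace).
    my ($delim) = $lines[0] =~ /^\s*([^\w\s])/ or return;
    my $d = quotemeta $delim;

    my @rows;
    for my $line (@lines) {
        return unless $line =~ /^\s*$d(.*)$d\s*$/;
        my @cells = map { s/^\s+//; s/\s+$//; $_ } split /$d/, $1, -1;
        return unless @cells >= 2;             # at least two columns
        return if @rows && @cells != @{ $rows[0] };
        push @rows, \@cells;
    }
    return \@rows;
}

my $rows = split_delim_rows(":Fred:Nurk:58:\n:George:Washington:62:\n");
print join(', ', @$_), "\n" for @$rows;
```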
EXAMPLES
use HTML::TextToHTML;
Create a new object
my $conv = new HTML::TextToHTML();
my $conv = new HTML::TextToHTML(title=>"Wonderful Things",
default_link_dict=>$my_link_file,
);
Add further arguments
$conv->args(short_line_length=>60,
preformat_trigger_lines=>4,
caps_tag=>"strong",
);
Convert a file
$conv->txt2html(infile=>[$text_file],
outfile=>$html_file,
title=>"Wonderful Things",
mail=>1
);
Make a pipeline
open(IN, "ls |") or die "could not open!";
$conv->txt2html(inhandle=>[\*IN],
outfile=>'-',
);
NOTES
* If the underline used to mark a header is off by more than 1, then
that part of the text will not be picked up as a header unless you
change the value of --underline_length_tolerance and/or
--underline_offset_tolerance. People tend to forget this.
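To see why a short underline is rejected, here is a minimal sketch of a tolerance check. It is hypothetical throughout: looks_like_underline, its length and offset parameters, and the set of underline characters are assumptions for this example, with both tolerances defaulting to 1 to match the note above. The module's real test may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical check (not the module's code): does $under underline
# $title, within the given length and offset tolerances?
sub looks_like_underline {
    my ($title, $under, %tol) = @_;
    my $len_tol = defined $tol{length} ? $tol{length} : 1;
    my $off_tol = defined $tol{offset} ? $tol{offset} : 1;

    # An underline is a run of one repeated punctuation character.
    my ($u_indent, $u_run) = $under =~ /^(\s*)(([-=*~+.])\3*)\s*$/
        or return 0;

    my ($t_indent, $t_text) = $title =~ /^(\s*)(.*?)\s*$/;
    return 0 unless length $t_text;
    return 0 if abs(length($t_text) - length($u_run)) > $len_tol;
    return 0 if abs(length($t_indent) - length($u_indent)) > $off_tol;
    return 1;
}

my $title = 'I am Head Man';
for my $len (13, 9) {
    my $verdict = looks_like_underline($title, '-' x $len)
        ? 'header' : 'not a header';
    print "$len dashes: $verdict\n";
}
```

Passing a larger tolerance, for example length => 4, makes the nine-dash underline acceptable in this sketch, which is the kind of adjustment the two options are for.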
REQUIRES
HTML::TextToHTML requires Perl 5.8.1 or later.
For installation, it needs:
Module::Build
The txt2html script needs:
Getopt::Long
Getopt::ArgvFile
Pod::Usage
File::Basename
For testing, it also needs:
Test::More
For debugging, it also needs:
YAML::Syck
INSTALLATION
Make sure you have the dependencies installed first! (see REQUIRES
above)
Some of those modules come standard with more recent versions of perl,
but I thought I'd mention them anyway, just in case you may not have
them.
If you don't know how to install these, try using the CPAN module, an
easy way of auto-installing modules from the Comprehensive Perl Archive
Network, where the above modules reside. Do "perldoc perlmodinstall" or
"perldoc CPAN" for more information.
To install this module type the following:
perl Build.PL
./Build
./Build test
./Build install
Or, if you're on a platform (like DOS or Windows) that doesn't like the
"./" notation, you can do this:
perl Build.PL
perl Build
perl Build test
perl Build install
In order to install somewhere other than the default, such as in a
directory under your home directory, like "/home/fred/perl", go
perl Build.PL --install_base /home/fred/perl
as the first step instead.
This will install the files underneath /home/fred/perl.
You will then need to alter the PERL5LIB variable so that perl can find
the modules, and the PATH variable so that your shell can find the
script. First, add /home/fred/perl/script (where the script will be) to
your path:
PATH=/home/fred/perl/script:${PATH}
Then add /home/fred/perl/lib to the PERL5LIB variable:
PERL5LIB=/home/fred/perl/lib:${PERL5LIB}
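If changing the environment is inconvenient, a script of your own can extend its module search path with Perl's core lib pragma instead. This is a small sketch that assumes the same example install base, /home/fred/perl.

```perl
use strict;
use warnings;

# Assumes the example install base /home/fred/perl used above; the lib
# pragma prepends the directory to @INC at compile time.
use lib '/home/fred/perl/lib';

# Show which search-path entries come from that install base.
print "$_\n" for grep { m{^/home/fred/perl} } @INC;
```

This affects only the script that contains the line; the txt2html script itself still relies on PERL5LIB or a standard install location.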
Note that the system links dictionary will be installed as
"/home/fred/perl/share/txt2html/txt2html.dict"
If you want to install in a temporary install directory (such as if you
are building a package) then instead of going
perl Build install
go
perl Build install destdir=/my/temp/dir
and it will be installed there, with a directory structure under
/my/temp/dir the same as it would be if it were installed plain. Note
that this is NOT the same as setting --install_base, because certain
things are done at build-time which use the install_base info.
See "perldoc perlrun" for more information on PERL5LIB, and see "perldoc
Module::Build" for more information on installation options.
BUGS
Please send bug reports to
https://github.com/resurrecting-open-source-projects/txt2html/issues
SEE ALSO
perl, txt2html.
AUTHOR
Kathryn Andersen (RUBYKAT)
perlkat AT katspace dot com
http://www.katspace.com/
based on txt2html by Seth Golub
Current homepage is
https://github.com/resurrecting-open-source-projects/txt2html
COPYRIGHT AND LICENCE
Original txt2html script copyright (c) 1994-2000 Seth Golub
Copyright (c) 2002-2005 by Kathryn Andersen
Copyright (c) 2018-2019 Joao Eriberto Mota Filho
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
txt2html-3.0/TODO 0000664 0000000 0000000 00000002131 14337466404 0013642 0 ustar 00root root 0000000 0000000 TODO list for txt2html
======================
5. try to make this more thread-safe
Added:19/01/07, 09:29 Priority: high
6. add in an "obfuscate email addresses" facility
Added:08/11/04, 15:19 Priority: medium
7. convert (c) (r) and (tm) to the correct entities (add to links)
Added:08/11/04, 15:20 Priority: medium
8. add optional "enhanced" definition lists detection, which does Term:
Definition
Added:08/11/04, 15:20 Priority: medium
9. recognise and reproduce different Ordered List styles.
Added:08/11/04, 15:20 Priority: medium
10. links-per-section
Added:08/11/04, 15:21 Priority: medium
11. tables with multi-line cells
Added:08/11/04, 15:21 Priority: medium
12. option to add section-numbers to headers (not just header anchors)
Added:08/11/04, 15:22 Priority: medium
1. allow starting section number to be user-defined
Added:08/11/04, 15:22 Priority: medium
13. make this more efficient
Added:23/01/05, 14:42 Priority: medium
14. make a proper CGI script as an example
Added:17/05/05, 22:09 Priority: medium
txt2html-3.0/doc/ 0000775 0000000 0000000 00000000000 14337466404 0013722 5 ustar 00root root 0000000 0000000 txt2html-3.0/doc/README.samples 0000664 0000000 0000000 00000000220 14337466404 0016237 0 ustar 00root root 0000000 0000000 txt2html.dict is an example file taken from
http://web.mit.edu/wwwdev/src/txt2html/txt2html.dict
sample.txt was copied from tfiles/ directory.
txt2html-3.0/doc/UPDATE-CHECK 0000664 0000000 0000000 00000000243 14337466404 0015441 0 ustar 00root root 0000000 0000000 When updating, change the following files (if needed):
- ChangeLog
- Change version in all needed files (use rpl for tfiles/*)
- Update MANIFEST
- Test in Debian
txt2html-3.0/doc/sample.txt 0000664 0000000 0000000 00000022250 14337466404 0015745 0 ustar 00root root 0000000 0000000 txt2html/HTML::TextToHTML Sample Conversion
This sample is based hugely on the original sample.txt produced
by Seth Golub for txt2html.
I used the following options to convert this document:
-titlefirst -mailmode -make_tables
--custom_heading_regexp '^ *--[\w\s]+-- *$'
--system_link_dict txt2html.dict
--append_body sample.foot --infile sample.txt --outfile sample.html
This has either been done at the command line with:
perl -MHTML::TextToHTML -e run_txt2html -- *options*
or using the script
txt2html *options*
or from a (test) perl script with:
use HTML::TextToHTML;
my $conv = new HTML::TextToHTML();
$conv->txt2html([*options*]);
======================================================================
From bozo@clown.wustl.edu
Return-Path:
Message-Id: <9405102200.AA04736@clown.wustl.edu>
Content-Length: 1070
From: bozo@clown.wustl.edu (Bozo the Clown)
To: kitty@example.com (Kathryn Andersen)
Subject: Re: HTML::TextToHTML
Date: Sun, 12 May 2002 10:01:10 -0500
Bozo wrote:
BtC> Can you post an example text file with its html'ed output?
BtC> That would provide a much better first glance at what it does
BtC> without having to look through and see what the perl code does.
Good idea. I'll write something up.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
The header lines were kept separate because they looked like mail
headers and I have mailmode on. The same thing applies to Bozo's
quoted text. Mailmode doesn't screw things up very often, but since
most people are usually converting non-mail, it's off by default.
Paragraphs are handled ok. In fact, this one is here just to
demonstrate that.
THIS LINE IS VERY IMPORTANT!
(Ok, it wasn't *that* important)
EXAMPLE HEADER
==============
Since this is the first header noticed (all caps, underlined with an
"="), it will be a level 1 header. It gets an anchor named
"section_1".
Another example
===============
This is the second type of header (not all caps, underlined with "=").
It gets an anchor named "section_1_1".
Yet another example
===================
This header was in the same style, so it was assigned the same header
tag. Note the anchor names in the HTML. (You probably can't see them
in your current document view.) Its anchor is named "section_1_2".
Get the picture?
-- This is a custom header --
You can define your own custom header patterns if you know what your
documents look like.
Features of HTML::TextToHTML
============================
* Handles different kinds of lists
1. Bulleted
2. Numbered
- You can nest them as far as you want.
- It's pretty decent about figuring out which level of list it
is supposed to be on.
- You don't need to change bullet markers to start a new list.
3. Lettered
A. Finally handles lettered lists
B. Upper and lower case both work
a) Here's an example
b) I've been meaning to add this for some time.
C. HTML without CSS can't specify how ordered lists should be
indicated, so it will be a numbered list in most browsers.
4. Definition lists (see below)
* Doesn't screw up mail-ish things
* Spots preformatted text
It just needs to have enough whitespace in the line.
Surrounding blank lines aren't necessary. If it sees enough
whitespace in a line, it preformats it. How much is enough?
Set it yourself at command line if you want.
* You can append a file automatically to all converted files. This
is handy for adding signatures to your documents.
* Deals with paragraphs decently.
Looks for short lines in the middle of paragraphs and keeps them
short with the use of breaks (<BR>). How short the lines need to
be is configurable.
Unhyphenates split words that are in the middle of para-
graphs. Let me know if trailing punctuation isn't handled "prop-
erly". It should be.
One can also have multi-paragraph list items, like this one.
* Puts anchors at all headers and, if you're using the mail header
features, at the beginning of each mail message. The anchor names
for headings are based on guessed section numbers.
- You can turn off this option too, if you don't like it.
* Groks Mosaic-style "formatted text" headers (like the one below)
* Can hyperlink things according to a dictionary file.
The sample dictionary handles URLs like http://www.aigeek.com/ and
and also shows how to do simpler
things such as linking the word txt2html the first time it appeared.
* One can also use the link-dictionary to define custom tags, for
example using the star character to indicate *italics*.
* Recognises and parses tables of different types:
o DELIM: A table determined by delimiters.
o ALIGN: No need for fancy delimiters, this figures out
a table by looking at the layout, the spacing of the cells.
o BORDER: has a nice border around the table
o PGSQL: the same format as Postgresql query results.
* Also with XHTML! Turn on the --xhtml option and it will ensure that
all paragraphs and list items have end-tags, all tags are in
lower-case, and the doctype is for XHTML.
Example of short lines
----------------------
We're the knights of the round table
We dance whene'er we're able
We do routines and chorus scenes
With footwork impeccable.
We dine well here in Camelot
We eat ham and jam and spam a lot.
Example of varied formatting
----------------------------
If I want to *emphasize* something, then I'd use stars to wrap
around the words, *even if there were more than one*, *that's*
what I'd do. But I could also _underline_ words, so long as
the darn thing was not a_variable_name, in which case I wouldn't
want to lose the underscores in something which thought it was
underlining. Though we might want to _underline more than one word_
in a sentence. Especially if it is _The Title Of A Book_.
For another kind of emphasis, let's go and #put something in bold#.
But it doesn't even need to be that simple. Something which is *really
exciting* is coping with italics and similar things *spread across
multiple lines*.
Example of Long Preformatting
-----------------------------
(extract from Let It Rain by Kristen Hall)
I have given, I have given and got none
Still I'm driven by something I can't explain
It's not a cross, it is a choice
I cannot help but hear his voice
I only wish that I could listen without shame
Let it rain, let it rain, on me
Let it rain, oh let it rain,
Let it rain, on me
I have been a witness to the perfect crime
Wipe the grin off of my face to hide the pain
It isn't worth the tears you cry
To have a perfect alibi
Now I'm beaten at the hands of my own game
Let it rain, let it rain, on me
Let it rain, oh let it rain,
Let it rain, on me
Definition Lists
----------------
A definition list comprises the following:
Term:
The term part of a DL item is a word on a line by itself, ending
with a colon.
Definition:
The definition part of a DL item is at least one paragraph following
the term.
If one has more than one paragraph in the definition, the first line of
the next paragraph needs to be indented two spaces from where the term
starts, otherwise we don't know that it belongs to the definition.
Examples of Tables
------------------
ALIGN
~~~~~
Here is a simple ALIGN table:
-e  File exists.
-z  File has zero size.
-s  File has nonzero size (returns size).
Here are some of the conditions of ALIGN tables:
#Context:# A table needs to be surrounded by blank lines.
#Length:# A table must contain at least two rows.
#Width:# A table must contain at least two columns.
#Spacing:# There needs to be at least two spaces between the columns,
otherwise there might be some random paragraph which
could have inter-word spacing that lined up by accident.
#Cell Size:# If you have more than one line (as just above) then
you will simply get empty cells where the other column is empty.
#Alignment:# Alignment of cells is attempted to be preserved.
BORDER
~~~~~~
This is a table with a border.
+---------+-----+
| Food | Qty |
+---------+-----+
| Bread | 1 |
| Milk | 1 |
| Oranges | 3 |
| Apples | 6 |
+---------+-----+
PGSQL
~~~~~~
This is the same table as Postgresql would produce it.
Food | Qty
---------+-----
Bread | 1
Milk | 1
Oranges | 3
Apples | 6
(4 rows)
DELIM
~~~~~
A delimited table needs to have its delimiters at the start and end,
just to be sure that this is a table.
:Fred:Nurk:58:
:George:Washington:62:
:Mary:Quant:35:
And one can have almost any delimiter one wishes.
| Darcy, Fitzwilliam | hero |
| Bennet, Elizabeth | heroine |
| Wickham, George | villain |
THINGS TO DO
============
There are some things which this module doesn't handle yet which
I would like to implement.
A. I would like to be able to preserve lettered lists, that is:
a) recognise that they are letters and not numbers (which it already
does)
b) display the correct OL properties with CSS so as to preserve
that information.
----------------------------------------
The footer is everything from the end of this sentence to the