2020-08-19 Wed  Jari Aalto

        * bin/t2html.pl: Code style update. Remove extra spaces from
        statements like if ( x ) => if (x) and the like.

2019-05-05 Sun  Jari Aalto

        * ALL: update copyright years. Point URLs to GitHub.

        * Makefile (html): Add doc/conversion/index.html

2016-10-20 Thu  Jari Aalto

        * bin/t2html.pl: (POD): Fix typos.

        * doc/index.html: Remove call to
        http://tiny-tools.sourceforge.net/bmt-time.js

2015-10-19  Jari Aalto

        * bin/t2html.pl: (Help): Remove eval and use plain
        "use Pod::Man" at the beginning.

2014-03-05 Wed  Jari Aalto

        * bin/t2html.pl (Help::POD::FORMAT DESCRIPTION): Fix typo.
        (HandleCommandLineArgs): Remove defined() test from array.
        (Help::POD): Add UTF-8 encoding tag.

2012-01-11 Wed  Jari Aalto

        * bin/t2html.pl (XlatPicture): Improve regexp.
        (XlatUrlInline): Support multiple #URL tags on line.

2010-12-17 Fri  Jari Aalto

        * bin/t2html.pl (XlatUrl): Fix https support.

2010-12-16 Fri  Jari Aalto

        * bin/t2html.pl (XlatRef): Support https.
        (AcceptUrl): Ignore example.net, example.org
        (XlatUrl): Support https.

2010-12-09 Thu  Jari Aalto

        * bin/t2html.pl (Main): Read $TITLE variable set in command line.

2010-12-05 Sun  Jari Aalto

        * bin/t2html.pl (HandleCommandLineArgs): Correct
        --Link-check-single to --link-check-single.

2010-05-01 Sat  Jari Aalto

        * bin/t2html.pl (HandleCommandLineArgs): Remove "require_order"
        from Getopt::Long::config().

2010-04-15 Thu  Jari Aalto

        * bin/t2html.pl (Help): rewrite eval Pod::Man call.

2010-04-13 Tue  Jari Aalto

        * bin/t2html.pl: Add 'use 5.10' because "Named Capture Buffers"
        are used.

2010-03-24 Wed  Jari Aalto

        * README (at the end): Add Copyright and License.

2010-03-13 Sat Jari Aalto * t2html.pl: (top level): rearrange globals and "use" commands. (Version*): new. 2010-03-12 Sat Jari Aalto * t2html.pl: (top level): English qw( -no_match_vars ) 2010-03-02 Tue Jari Aalto * t2html.pl: (top level): update copyright template. (Help): Change all mixed case options like --Link-check to lowercase --link-check. (DESCRIPTION): improve documentation and mention pm-doc example page. (FORMAT DESCRIPTION): rewrite the layout example: make sure all the columns line up with the presented text. * conversion/index.txt: Add index.css * conversion/index.css: New file. 2010-02-16 Jari Aalto * t2html.pl (Initialize): add "http://www.w3.org/TR/html4/loose.dtd" to doctype to fix W3C validator informational message. (HtmlFixes): remove duplicate

occurrences. 2010-02-13 Sat Jari Aalto * t2html.pl (PrintEnd): at 'This file has been automatically generated' do not surround with

tags as HTML standard does not allow it Use only opening tag

. 2009-08-31 Jari Aalto * t2html.pl (Help): Fix Perl 5.x problem where Pod::Text is broken. (HandleCommandLineArgs): Add --help-css. Change all --help-* options to lowercase. 2009-03-16 Jari Aalto * t2html.pl (HtmlFixes): Correct raw HTML entity translations # => # 2009-02-11 Jari Aalto * t2html.pl (XlatDirectives): Adjust #t2html-* directive handling. Do not delete whole line, but only text starting at the directive. This makes it possible to embed them beside lines: example text #t2html-comment .... => example text 2008-09-25 Jari Aalto * t2html.pl (HELP::Special text markings): Document subscript support. (XlatWordMarkup): Add subscript support. (Initialize): Add subscript support. (MakeComment): Remove extra spaces. (PrintEnd): Fix by removing class=... 2008-09-21 Jari Aalto * t2html.pl (HandleOneFile): Write proper file:/// URL. 2008-09-20 Jari Aalto * t2html.pl (JavaScript): Remove EOL whitespace from output. 2008-09-19 Jari Aalto * t2html.pl (CssData): Move closing qq() so that no extra white spaces are written to EOL. (MakeToc): , remove id=toc, because it is already defined one line below. W3C validator error. 2008-03-25 Jari Aalto * t2html.pl (HandleCommandLineArgs): Use 3way open() call. (WriteFile): Use 3way open() call. (UrlInclude): Use 3way open() call. (LinkCache): Use 3way open() call. 2007-11-08 Jari Aalto * t2html.pl (MakeMetaTags): If no description was given, do not output description http-equiv tag. 2007-10-23 Jari Aalto * t2html.pl (CssData::border-width): Change value from 94% to 'thin'. This makes the dashed table display correctly in Opera. 2007-10-19 Jari Aalto * t2html.pl (HtmlFixes): Correct grouping expressions that did not catch correct $1. This was causes by recent change where the case sensitive options were put in '(?:)' => '((?:))' Fix column 12, 'Note:' and html tag rendering. Change to '' in order to render it as wished. 
2007-10-12 Jari Aalto * t2html.pl (HELP::Referring to local documents): Removed documentation of option '#URL-AS-IS-'. This is superceded by generic '#URL' * t2html.pl (UrlInclude): Fixed reading of function argument. This made 'raw:' include mode to work again. 2004-08-28 Sat Jari Aalto * txt/conversion.txt (2.1 Preface, Jan 1997): 1.7 Removed. The pfficial description is in t2html.pl --help. (2.3 Other text to HTML tools): 1.7 Added t2php project. 2007-09-30 Jari Aalto * txt/COPYING.GNU-FDL: New. * txt/COPYING.GNU-GPL: New. * txt/LICENSE.txt: New. 2007-09-18 Jari Aalto RELEASE: 20071002 * t2html.pl (HtmlFixes): Delete trailing whitespaces. (TestPage): Output to current directory, not $HOME/tmp. 2007-05-26 Jari Aalto * t2html.pl (UrlInclude): Added new option $mode.x (DoLine): Added new feature#INCLUDE-raw: to embed content as is. 2007-05-25 Jari Aalto * t2html.pl (MakeHeadingName): Increased word grab from 5 to 8 2007-05-24 Jari Aalto * t2html.pl (XlatTag2html): Added few french and spanish symbols. 2007-05-23 Jari Aalto * t2html.pl (UrlInclude): Added calls to process the included file: #URL substitution etc. (XlatUrlInline): Added 'm'ultiline modifier. 2007-03-09 Jari Aalto * t2html.pl (XlatTag2html): Added raquo, laquo 2007-03-01 Jari Aalto * t2html.pl -- changed some HERE documents to use qq() to that Emacs cperl-mode fontifies the code correctly. (DoLine): Fixed long standing bug where first paragraph after heading was not surrounded by the usual

. (ReadLinksBasic): Improved regexp to match link's last character better. (DOC::HELP): Mention that XHTML strict cannot be achived, because program was never designed for that. (TestPage): Improved error handling. (TestStyle): New. (WriteFile): Accept ARRAY or SCALAR(string). (TestPage): Added 4th test to demontrate external style file. Removed unnecessary exmaples. 2007-02-27 Jari Aalto * t2html.pl (top level): added 'use locale'. (MakeHeadingName): More words: from 5 to 8. Also delete punctuation. Handle Latin-1 conversions better. (IsHeading): Use [:upper:] instead of [A-Z] (XlatDirectives): Fixed empty #t2html-comment and number "1" error in output. 2007-02-21 Jari Aalto * t2html.pl (XlatUrl): If A HEREF was already handled then do nothing. This fixes bug where XlatUrlInline() already converted the URL. 2006-12-05 Jari Aalto * t2html.pl (XlatWordMarkup): Corrected superscripting in 'words[like this]'. 2006-10-27 Jari Aalto * t2html.pl (HtmlTable): EOF => "EOF". Update to new perl style. (HtmlFixes): Add necessary double quotes for tags like: (was: ) and (was:
). 2006-10-26 Jari Aalto * t2html.pl 1.178: All the
elements were removed. It is better to use CSS for marginal settings. * t2html.pl (MakeToc): Moved
the the beginning of '

Table of contents

' (MakeToc): Move last
to the correct location. 2006-10-25 Jari Aalto * t2html.pl (DoLine): Added input arg $file. Use it to contruct path name for relative #INCLUDE directive. (Doline::#include): Parse current $file and derive $dir to use for include. (HandleOneFile): Send $file argument to DoLine(). (UrlInclude): Converted into HASH args. Added input arg $dir. 2006-04-20 Jari Aalto * t2html.pl (HELP::OPTIONS): --css-font-type example fixed. Added single quotes. 2006-02-25 Sat Jari Aalto * t2html.pl (sub MakeToc): 1.171 Added div around the TOC. 2006-02-23 Thu Jari Aalto * t2html.pl (sub CssData): 1.171 Added 2em margin to blockquote. Added margin to PRE-tags. 2006-02-15 Jari Aalto * t2html.pl (HandleCommandLineArgs): Added missing backslash reference-indicator to `$deleteDefault'. 2006-02-03 Fri Jari Aalto * t2html.pl (sub HandleCommandLineArgs): 1.168 Added support to include multiple stylesheets by repeating --css-file option. (sub MakeMetaTags): 1.168 Added \n at the end of http-equiv="Content-Type" output. (sub HtmlFixes): 1.168 There wa missing

after each paragraph, Added fix for it. (sub CssData): 1.168 Added CSS for 'table' element: border: none; width: 100%; cellpadding: 10px; cellspacing: 0px; (sub HtmlTable): 1.168 Removed settings of table attributes like cellpadding. Now controlled by stylesheet. (sub HtmlFixes): 1.168 updated regexp, whic searched

and attributes 'cellpadding' etc. which are now gone. (sub DoLine): 1.168 The small !! marker that drew
tag is now defined in CSS with 'class=special'. 2006-01-25 Jari Aalto * t2html.pl (HtmlFixes): Fix double closing which was incorrect according to http://validator.w3.org/ (MakeMetaTags): Use qq() instead of HERE document for $charset. (MakeToc): Added closing to the Table of Contents. Now HTML 4.01 valid. 2005-05-29 Jari Aalto * t2html.pl (HELP::Embedding pictures): Corrected 'length=' to 'height=' for correct HTML picture attribute. 2005-02-16 Jari Aalto Updated Copyright year in all files. 2005-01-26 Wed Jari * t2html.pl (CssData CSS:p.column6): Added Unix font 'New Century Schoolbook' (CssData CSS:shade-note-attrib): Added Unix font 'New Century Schoolbook' 2005-02-08 Jari Aalto * t2html.pl (CssData): Added new styles `color-blue-light', `color-blue-medium'. 2005-02-06 Jari Aalto * t2html.pl (PrintEnd): Changed lincence to Creative Commons. 2004-12-13 Mon Jari Aalto * t2html.pl (sub WriteFile): 1.161 Added binmode. (LinkCache): Removed 'local *FILE', and used more modern 'open my $FILE'Added binmode. 2004-11-30 Tue Jari Aalto * t2html.pl (sub MakeUrlPicture): 1.153 Picture text did not contans spaces before the actual text. Like 'Pic 1.Text follows' => 'Pic 1. Text follows'. 2004-11-27 Sat Jari Aalto * t2html.pl (XlatHtml2tag): Added 1/2-sign, law-sign, pound-sign. (XlatTag2html): Added 1/2-sign, law-sign, pound-sign. 2004-11-16 Tue Jari Aalto * t2html.pl - Many, many changes. All HTMl tags are now lowercase due to xhtml compatibility. The XHTM tags are now in a hash table instead of hard coded to the program. - Versions after this date are probably unstable and may produce incorrect HTML. (Getopt::Long): 1.153 Added proper HTML_DOCTYPE for xhtml. Use
for xhtml. Check for simultaneous --html-frame and --Xhtml options. Only one can be used. (sub Initialize): 1.153 New variable %HTML_HASH. (HELP::BUGS): 1.153 Added note that --Xhtml option is broken and does not produce valid markup. (sub GetFile): 1.153 Added -f $file check against accidental directory read. 2004-11-15 Mon Jari Aalto * t2html.pl (sub InitArgs): 1.153 Removed `local @ARGV'. 2004-11-14 Sun Jari Aalto * t2html.pl (HtmlFixes): 1.153 The special 'Note:' paragraph at column 12 did not go through works markup substitution. Fixed and added debug. (HandleCommandLineArgs): 1.153 Added default value 'Note:' for $$CSS_CODE_STYLE_NOTE. 2004-11-13 Sat Jari Aalto * t2html.pl (sub XlatWordMarkup): 1.147 Added staticBegQuote, staticEndQuote. (sub XlatWordMarkup): 1.147 Respect $type -basic. (sub XlatWordMarkup): 1.147 Corrected $prefix to include beginning-of-line anchor (^). (sub XlatPicture): 1.147 Now markup can be used in #PIC element text too. Added call to XlatWordMarkup(). 2004-11-11 Thu Jari Aalto -- output now conforms http://validator.w3.org/ as HTML 4.01 Transitional. All checks passed. * t2html.pl (CssData): Added `color-beige3' and changed color of `shade-normal-attrib' color from #EFEFEF to #F1F1F1 which is slightly lighter (HandleCommandLineArgs): 1.147 Added new global variable `$HTML_TAG_PAGE'. (PrintEnd): Do not output

, which is not legal HTML 4.01. (MakeHeadingHtml): Do not output

, which is not legal HTML 4.01. (t2html::td:bgcolor): 1.147 Added newline after PRE tag. Was: '
   text here'.
        (DoLine::staticPreMode): 1.147 Removed extra \n when html was
        glued after 
. This makes text to follow PRE immedately.
        (sub HtmlCodeSectionStart): 1.147 Added \n after 
.
        (sub HtmlFixes): 1.147 The closing 
tag was lost. This was a bug in the s() operator. Now includes
which is stored. 2004-11-06 Sat Jari Aalto * t2html.pl (ReadLinksLinkExtractor): 1.150 Added 'defined' test. (Html2txt): 1.150 Added 'defined' test for $arrayRef. (ReadLinksBasic): 1.150 Added 'defined' test for $arrayRef. (LinkCheckMain): 1.150 Added 'defined' test for $arrayRef. (PrintHtmlDoc): 1.150 Added 'defined' test for $arrayRef. (KillToc): 1.150 Added 'defined' test for $arrayRef. (DoLine): 1.150 Added 'defined' test for $arrayRef. (HandleOneFile): 1.150 Added 'defined' test for $txt. 2004-11-05 Fri Jari Aalto * t2html.pl (PrintArray): 1.147 Added 'defined' test for `$arrayRef' (IsHTML): 1.150 Added 'defined' test (WriteFile): 1.150 Added 'defined' test. Removed 'local *FILE' and used new perl 'my $FILE'. 2004-11-04 Thu Jari Aalto * t2html.pl (CheckEmail): 1.147 Added debug. Removed unnecessary variable $die. 2004-10-30 Sat Jari Aalto * t2html.pl (sub Help): 1.144 Incorrect variable name $is => $id 2004-10-27 Wed Jari Aalto * t2html.pl (sub Help): 1.144 Added debug calls. 2004-10-26 Tue Jari Aalto * t2html.pl (sub InitArgs): Changed inpur paratemer call syntax: now takes a HASH, nstead of ARRAY. 1.144 Check if `argvRef' is defined before using it. 2004-10-23 Sat Jari Aalto * t2html.pl (HELP::ENVIRONMENT): 1.137 Added missing '=over 4' and +back directives. 2004-10-22 Fri Jari Aalto * etc/makefile/vars.mk: (prefix): changed from /usr to /usr/local which is more appropriate for 3rd party packages. * t2html.pl (Getopt::DEBUG"): 1.137 Fixed warning 'Name Getopt::DEBUG used only once: ...' 2004-10-20 Wed Jari Aalto * t2html.pl (sub XlatUrl): 1.137 No more add qq(target="_top") to the A HREF, because Opera does not display it correctly. (sub ReadLinksBasic): 1.137 BASE url compositon was wrong. Fixed. 2004-10-18 Jari Aalto * t2html.pl (GetFile): Improved error message when file couldn't be opened. (Main): Added PANIC in case cwd() fails to return correct directory. 2004-10-14 Jari Aalto * t2html.pl (LinkCheckMain): Rename. Was LinkCheck(). 
(ReadLinksMain): Renamed. Was ReadLinks() (ReadLinksBasic): Added variable $tag. Added IMG SRC and LINK HREF checking. Added $root variable for links, that have href=/this/page.htm. Multiple links at the same line were not found; now fixed. (LinkCheckExternal): Exclude example.net, .org, .biz and .info (LinkCache): Removed $staticActive requirement from -add Added imput parameter -code to accept HTTP return value. Write out only URL links with code 200 (ok). (LinkCheckLwp): Cache all URLs, so that if same URL is seen again, no request is sent to remote site. The previous status code is reused from cache. Return HTTP error codes and not just 1 or 0 status code. Disabled GET request and used solely HEAD. 2004-10-12 Tue Jari Aalto * t2html.pl (sub ReadLinksBasic): 1.132 Clarified regex by introducing variables `urlset' and `quote'. Added one more debug 5 level print. (sub ReadLinksBasic): 1.132 Fixed mismatches like http://example.org/links.html> => http://example.org/links.html 2004-09-11 Jari Aalto * t2html.pl (PrintHtmlDoc): typos corrected in manual page. 2004-08-29 Sun Jari Aalto * t2html.pl (sub XlatWordMarkup): 1.131 Changed [ \t] to [\s] which is more general. Code now uses more readable s{}{}gx and new variable $prefix. 2004-08-12 Thu Jari Aalto * t2html.pl (sub ReadLinksBasic): 1.130 Incorrect regexp to find links. Added \s to exclude characters that do not belong to address. 2004-06-28 Mon Jari Aalto * t2html.pl (Help): 1.129 Added more documentation to --language option. 2004-03-30 Jari Aalto * t2html.pl Extra newlines were removed from whole program. Sort of clean up work. (CssData): Change word-small from 0.8 to 0.7, because if the web page is zoomed 200% or more the font with 0.8 looks too big. (DoLine): Corrected bug, where right hand angle bracked and text after it in tag #URL-AS-IS was not outputted. The regexp must include >, not exclude it in output. 
2004-02-11 Wed Jari Aalto * t2html.pl (sub IsHeading): 1.122 Wrong regexp did not recognize heading without numbering. "Like This", but wanter "1.1 Like This". (HandleCommandLineArgs): 1.123 Options --css-code-bg and --css-code-bg2 has mistakenly taken option `:s'. Removed. 2004-02-06 Fri Jari Aalto * t2html.pl (CssData() => .shade-note-attrib): Added `font-family' and `font-size'. It is not 0.8 smaller. (sub InitArgs): 1.121 Fixed local Hash() function. Always list @values was returned and then assignef to scalar. That caused the count of members. Now return single scalar. This fixes E.g 1 problem. 2004-02-03 Tue Jari Aalto * t2html.pl (sub XlatTag2html): 1.121 Special HTML entities that had number at the end, like ³ were translated incorrectly as &sup3;. Fixed. 2004-02-02 Mon Jari Aalto * t2html.pl (sub DoLineUserTags): 1.121 Added debug. (sub XlatTag2htmlSpecial): 1.121 Added debug. (sub XlatTag2html): 1.121 Added debug. (sub XlatPicture): 1.121 Added debug. 2004-01-27 Tue Jari Aalto * t2html.pl - Added many new lines of debug code all over the program. (Getopt::Long): 1.120 Option --Auto-detect mistakenly had `:s' option request which ate the file name following the option. (sub IsHTML): 1.120 Searching first 100 for is too much. Reduced to 10 lines. 2004-01-24 Sat Jari Aalto * t2html.pl (sub Main): 1.117 Hash('option') CAll did not check values correctly. It always returned try, so all user's command line options were cleared. The Hash-function must not return 'undef' or any other values, because it is stored to an array. The only correct value to return is empty list (). This fix corrects e.g. option --name-uniq. 2003-12-14 Jari Aalto * t2html.pl (XlatDirectives): Typo corrected if $line = /\S/ => if $line =~ /\S/ 2003-12-05 Fri Jari Aalto * t2html.pl (XlatDirectives): 1.116 Incorrectly required that all #t2html- tags start from left. Regexp included '^\s*'. The anchor is now removed. 
2003-11-11 Jari Aalto * t2html.pl (XlatDirectives): If the directove line was empty line plain '#t2html-comment' the line was not stripped. Now fixed. No directives are left in text. 2003-09-26 Fri Jari Aalto * t2html.pl (unless ( /\S/ )): 1.115 --script-file no longer dies on error if file was not found. New option --css-file to import external 2003-09-21 Jari Aalto * t2html.pl (AcceptUrl($)): 'foo' excluded also valid links like 'foorum'. Fixed by aadding \b. 2003-08-19 Tue Jari Aalto * t2html.pl (sub ReadLinksBasic): 1.113 the BASE value included extra /, which made all links invalid. Now regexp correctly removed only trailing slash. Ignore `mailto' HREFS. 2003-08-17 Sun Jari Aalto * t2html.pl (Help): New user option #INCLUDE-. With this is is possible to include other files into this current one. Either via http://address or just plain files. The included portion is copied to the point verbatim. No translations happen. (sub RemoveHTMLaround): 1.109 New. (sub UrlInclude): 1.109 New. (sub EnvExpand): 1.112 New. 2003-08-12 Tue Jari Aalto * t2html.pl (CssData): 1.107 Changed colors to dimmer. [samp.word] is no longer `blue', but lighter #4C9CD4 + bold. [shade-note-attrib] is no longer #E0E0F0 but lighter #E5ECF3. (CheckModuleLWP): Removed. (CheckModuleLinkExtractor): Removed. (ReadLinksLinkExtractor): New. If HTML::LinkExtractor is around, use it and skip program's own imperfect link extracting routine. (TestDriverLinkExtractor): 1.108 New. 2003-08-03 Sun Jari Aalto * t2html.pl (sub GetFile): 1.106 HTTP get did not preserve newlined. Added for-loop to split(). (sub ReadLinks): 1.106 Whole logic rewritten. There were serious URL link reading errors in this whole function. The current implementation is not very robust, but should be adequate for simple external links (http://). Added support for detecting local 'A HREF=local.html' links by prepending BASE to them. 2003-07-31 Thu Jari Aalto * t2html.pl (sub HandleCommandLineArgs): 1.103 Added empty HOME test. 
(Top level;): 1.103 Added qw($HOME), to satisfy Windows users, who do not have that set. Bug reported by vimma@jippii.fi (sub Help): 1.103 Documented new option --button-heading-top. Added new topic `BUGS'. (sub HandleCommandLineArgs): 1.103 New option `button-heading-top'. (sub MakeHeadingHtml): 1.103 Added support for $OPT_HEADING_TOP_BUTTON (The feature was previously commented out). (sub HandleCommandLineArgs): 1.103 --Version option now prints to stdout instead of stderr. In addition, print $PROGRAM_NAME; full path. (sub MakeHeadingHtml): 1.104 Now the [toc] link respects the --Language setting. 2003-07-22 Tue Jari Aalto * t2html.pl (sub MakeToc): 1.102 Simplified FRAMESET main() html file, so that there is no CSS any more, because it is only needed in the xxxx-body.html file. 2003-05-02 Fri Jari Aalto RELEASED: 2003.0502 * t2html.pl (sub XlatUrlInline): 1.100 New feature, see inline #URL from the --help section. (sub XlatUrl): 1.100 Added Backward lookahead test so that XlatUrlInline() cases are skipped. 2003-01-24 Fri Jari Aalto * t2html.pl (sub ReadLinks): 1.97 Do not check links that are prefixed with a minus character, like -http://this.is/example.html 2003-01-12 Sun Jari Aalto * t2html.pl (DoLine): 1.97 Corrected #URL-AS-IS regexp. Do not count in the last character `>', like in <#URL-AS-IS-this.html> 2003-01-02 Thu Jari Aalto * t2html.pl (MakeHeadingHtml): 1.97 If --as-is option is active, do not output
code at all. If --Xhtml option has been give, output
2002-12-22 Jari Aalto * t2html.pl (MakeHeadingName): Multiple underscores are squeezed to one. 2002-11-22 Jari Aalto * t2html.pl (CssData): Missing required semicolon from CSS defineition at `p.column7' => `font-weight: bold;' 2002-10-10 Jari Aalto * t2html.pl (MakeUrlPicture): Changed picture ALT text. Now display the image name instead of the description text. 2002-10-03 Thu Jari Aalto * t2html.pl (CssData): 1.95 Changed quote7 CSS spec: 0.7 => 1em. 2002-09-17 Tue Jari Aalto * t2html.pl (HtmlFixes): 1.94 Remove P-tag just before
    and
      . Not needed. (MakeHeadingHtml): 1.94 Added special character handling to header string. Call `XlatTag2htmlSpecial'. (DoLine): 1.94 Fixed mysterious '--' which was not converted into –. Added to the top of function, next to (XlatWordMarkup): 1.94 Added return immediately if ARG is empty (XlatTag2html): 1.94 Added return immediately if ARG is empty (XlatTag2htmlSpecial): 1.94 Added return immediately if ARG is empty 2002-09-16 Mon Jari Aalto * t2html.pl (t2html::td:bgcolor): Corrected 'else if' => 'elsif' 2002-09-01 Sun Jari Aalto * t2html.pl (CssData): 1.90 Changed quote7 from 1.2em to 0.7em. The font is not that loud any more. 2002-08-31 Sat Jari Aalto * examples/: The other test files (2, 3) were not identical to (1). Copied. * t2html.pl (HandleCommandLineArgs): 1.90 Removed non-descriptive --Butt, --Buttp, --Butt options. Use the long names instead. (HandleCommandLineArgs): 1.90 Removed non-descriptive --mk and --md options. Use --meta-description and --meta-keywords instead. 2002-08-30 Jari Aalto * t2html.pl (CssData): Added 'color: Navy;' to many defaul CSS style. Most useful in #t2html:: code column 12 directives. * t2html.pl (HtmlFixes):
       code cleanup required
              in whitespace before newline. This was incorrect. There
              can be zero space before newline. Fixed. In layout, this
              causes extra newline just before CODE column 12.
      
      2002-08-30  Jari Aalto   
      
              * t2html.pl (__DATA__): Changed default text: wording and
              some paragraphs rewritten.
      
      2002-08-30  Jari Aalto   
      
      
              * admin.bashrc (function sfperl2html_docexamples ()):
              Example 4 was not generated correctly. Bash got mixed up
              with the embedded single quotes in definition
              --css-code='(?:Note|Notice)' => reduced to simple
              --css-code=Note: which does not use single quotes. This
              is only a problem inside bash command file, which uses `eval'.
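
The quoting issue described in the entry above can be illustrated with a
minimal sketch. This is not the actual admin.bashrc code: `show_args' is a
hypothetical stand-in for the real converter call and only prints the
arguments it receives. It shows, under that assumption, that embedded single
quotes are passed through literally on plain expansion but are consumed as
shell syntax once the text goes through `eval'.

```shell
#!/bin/sh
# Hypothetical stand-in for the real converter call (assumption, not from
# admin.bashrc): prints each received argument inside brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# An option kept in a variable, with embedded single quotes.
opt="--css-code='(?:Note|Notice)'"

# Plain expansion: the quotes are ordinary characters and reach the program.
literal=$(show_args "$opt")

# eval parses the text a second time, so the quotes act as shell syntax
# and are removed before the program sees the argument.
evaled=$(eval "show_args $opt")

# The simplified value contains nothing that needs quoting, so it survives
# eval unchanged.
simple=$(eval "show_args --css-code=Note:")

printf '%s\n%s\n%s\n' "$literal" "$evaled" "$simple"
```

A value without shell metacharacters, such as Note:, sidesteps the question of
how many parsing passes the command goes through, which is presumably why the
entry settled on it.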
      
      2002-08-28  Jari Aalto   
      
              * t2html.pl (XlatTag2htmlSpecial): New.
      
      2002-08-27  Jari Aalto  
      
              * t2html.pl (HtmlFixes): Corrected 
       tag fix to delete
              all newlines after the tag. This shortens the gap that is shown
              at column from regular text.
              (XlatWordMarkup): Corrected superscripting: special characters
              are ignored '\"!?;,.(<> before superscript.
      
      2002-08-19  Jari Aalto  
      
              * t2html.pl (CssData): Redesigned styles `table.solid',
              new style `table.dashed'.
              (__DATA__): Added new table rendering examples for --test-page.
              (Help): Added POD documentation to new section
              `Controlling CSS generation (HTML tables)'. Long story
              about how 
and
tags can be controlled. * html/index.html: Added news for 2002-08-19 version, which contains lot of impreovements. * examples/t2html.pl-1.txt: Spell cheked file with Emacs M-x flyspell-buffer. (Table rendering examples): New section to demontrate the new table controlling features. 2002-08-18 Jari Aalto * t2html.pl (HtmlFixes): Whent the new feature --css-code-note was put to a test (big documnet), it had serious flaws in regexps (too greedy). Fixed. Now coloring should be ok. ADDEd NEW FEATURES. It is now posisble to say exactly what kind of table redering is wanted. See manual for directive #t2html:: 2002-08-18 Jari Aalto KIT RELEASED 2002.0818 * t2html.pl - NEW FEATURE: It is possible to mark special 'notice' paragraphs with option --css-code-note=REGEXP. See --test-page for demonstration. (DATA): Example updated. (HtmlFixes): New features. Now render with different color if there is special keyword. (HtmlTable): New. (HandleCommandLineArgs): Added documentation for options --css-code-bg and --css-code-note=REGEXP (HandleCommandLineArgs): Added support new option --css-code-note=REGEXP 2002-08-14 Jari Aalto * t2html.pl (Initialize): 1.76 Changed all conatant names to uppercase. E.g. OutputSimple => OUTPUT_TYPE_SIMPLE. Changed contant values to strings also, and removed non-descriptive numbers (used to be type 1, now uses bareword -simple) 2002-08-13 Jari Aalto * t2html.pl (XlatHtml2tag): Added more conversion: Swedish, Norweigian, German. (MakeHeadingName): With foreign languages, like in Finnish the Heading may contain non-7bit characters and the NAME tag generated is composed form ä, which is not good. Not substitutes ä with simple 'a'. Same for Swedish, Norweigian, and German. (UpdateHeaderArray): Adde dmore debug. The substitution removed crucial & and ; characters that delimit special HTML tokens. Do not remove those, they are handled elswhere. (HandleCommandLineArgs): environment variable $EMAIL is no longer used. User must supply --email EMAIL. 
(HandleCommandLineArgs): New opion --Auto-detect. Only files that 'should be converted' are converted to HTML. (OutputDir): Check if OUTPUT_DIR is defined. (PrintEndSimple): Check OPT_EMAIL for value, before inderting (MakeMetaTags): Check $email and $author for values before using them. (HeaderArrayClear): New. (HandleOneFile): Added call to HeaderArrayClear() 2002-08-08 Thu Jari Aalto * admin.bashrc (function sfperl2html_ask ()): 1.6 New. (function sfperl2html_ask ()): 1.6 New. (function sfperl2html_release_check ()): 1.6 New. (function sfperl2html_release ()): 1.6 Call sfperl2html_release_check() 2002-08-07 Wed Jari Aalto * admin.bashrc (function sfperl2html_html ()): 1.5 Renamed. (function sfperl2html_doc ()): 1.5 Renamed. * t2html.pl - NEW REVOLUTIONARY FEATURE ADDED. - It is now possible to embed command line options directly into text file. Like if you would like to set Code sections grey-shade on, Add line: #T2HTML-OPTION --css-code-bg #T2HTML-OPTION --as-is Ran through Perl 5.8.0 -w => fixed errors. (HandleCommandLineArgs): Added `outputUndefined' to set initial value of OUTPUT_TYPE. (DoLine): 1.75 in 'study line' Move regexp /^( +)[^ ]/ inside if. It may not necessarily match. Initialise $spaces to 0. (CssData): 1.75 Initialize $ARG if not passed to function. (JavaScript): 1.75 Check defined $JAVA_CODE. (PrintEnd): 1.75 Check defined $file. (PrintEnd): 1.75 Set $url to initial value. (PrintStart): 1.75 initialise @ret; (MakeMetaTags): 1.75 initialise variables: author, email, kwd, desc (CssData): 1.75 Check defined $CSS_FONT_TYPE (PrintStart): 1.75 initialise bodyAttr, email and all %arg parameters. (PrintStart): 1.75 (DoLine): 1.75 Correctd staticPreMode => $staticPreMode (XlatTag2html): 1.75 Do not translate " in PURE HTML directives. (Main): 1.75 Created local function Arr() to access new data structure. (XlatDirectives): 1.75 New data structure. The hash VALUE is now anonymous array. 
(GetFile): 1.75 Do not exit from subroutine via `next' (PrintArray): 1.75 Wrong offset. $count must be -1, due to zero offset. (MakeHeadingHtml): 1.75 Initalised variable $button. (MakeMetaTags): 1.75 Changed URL to sourceforge. (MakeToc): 1.75 Initialised variables $stylee, $styleb (MakeToc): 1.75 Corrected two calls to MakeMetaTags(). Must be hash. (Main): 1.75 Fixed multiple passig of $verb to command line parser. It must be reseted to original value in InitArgs() (MakeToc): 1.75 Corrected reference generation. If there is o frame name, do not add it to the . (HtmlFixes): 1.75 New. Now --css-code-bg works correctly. (HtmlCodeSectionStart): 1.75 Added
 inside 
        so that -css-code-bg works correctly.
        (HandleCommandLineArgs): 1.75 Added options
        --t2html-tags and turn off option --not2html-tags
        (PrintHash): 1.75 Added ability to print anonymous arrays
        (HandleCommandLineArgs): 1.75 Fixed bug where --options-file=FILE
        was only allowed option. Now rest of the options in ARGV are merged
        with the options from file.
        (XlatTag2html): 1.75 Fixed << and >> shell redirection
        parsing. They must be translated i <>, but they should
        not be translated in `cat >> file'.
        (XlatTag2html): 1.75 Added German translations Uml and SS.

2002-08-05 Mon  Jari Aalto  

        * admin.bashrc (function sfperl2htmldoc ()): Improved
        Perl file handling. Added two for-loops. The last one removes
        unneeded POD temporary files.
        (function sfperl2htmlinit ()): Added SF_UPLOAD_DIRECTORY

        * t2html.pl (HandleOneFile): Changed debug to verb
        so that --Out will print where the destination file
        is written.
        (HandleCommandLineArgs): OUTPUT_AUTOMATIC (--Out) must also be
        turned on if user supplies --Out-dir.
        (CssData): Removed `HIDE BROWSER' tags because they interfered
        with CSS handling (IE 5.5). Corrected code that added semicolon
        at the end of SIZE and FONT styles. (added double ;;)
        (HandleCommandLineArgs): No longer set default
        CSS_FONT_SIZE, but let the browser decide font size.

        * html/index.html: Removed Copyright year
        from the end of page.

        * /html/index.html (

Links and download ): Added CHANGELOG printing from CVS. Rewote teh example perl commands. Had to copy files from doc/examples/ to doc/html directory, because otherwise it is not possible to locally browse doc/html/index.html with the example links in the page. Updated NEWS section to mention about new #T2HTML features. Chnages CPAN text to say, that it no longer the primary source - warning, that the CPAN file may not be up to date. Removed copyright year from the end of file. 2002-08-03 Sat Jari Aalto * t2html.pl (HandleCommandLineArgs): Die, if library LWP is not available and user requests link check. 2002-08-02 Fri Jari Aalto * t2html.pl (HandleCommandLineArgs): Incorect setting of $verb. couldn't pass --verbose 3, because it was set to 0. fixed. (HandleCommandLineArgs): Added to the documentation Seaction about extra `Directives' that can be embedded inside file with tag #T2HTML- (PrintStart): Added support for many new input parameters instead of using globals. (PrintHtmlDoc): Added support for many new input parameters instead of using globals. (HandleOneFile): Added support for many new input parameters instead of using globals. (Main): Added reading directove codes from source file. 2002-07-26 Jari Aalto * t2html.pl (Help): Ran --help-man and fixed the documentation, because too many paragraphs were too long when read through 2002-05-07 Tue Jari Aalto * t2html.pl (sub XlatTag2html): 1.69 Added support for embedding HTML code into the text. Now surround HTML code with double- lt-gt tags in format: <> 2002-04-28 Sun Jari Aalto * t2html.pl (HandleCommandLineArgs): 1.68 Added new option --help-man (sub Help): 1.68 Now print Unix manual page with argument -man * admin.bashrc (function sfperl2htmldoc): 1.2 New. Updates the t2html.html page and now generates Unix manuala page t2html.1 2002-04-20 Sat Jari Aalto * t2html.pl (sub XlatTag2html): 1.66 Added lot more HTML token conversions. 
        Copyright sign (C), Registered trade mark sign (R), Plus
        minus sign +-, long dash --. Added support to embed direct
        html tokens like ×

2002-02-27 Wed Jari Aalto

        * t2html.pl: 1.62 Added superscript feature: Person said[1].
        The bracket MUST BE next to the word.

2002-02-04 Mon Jari Aalto

        * t2html.pl (sub OutputDir): 1.62 Corrected --Out options.
        Mistakenly put files to /, now looks at cwd().

2002-02-02 Sat Jari Aalto

        * t2html.pl (sub MakeToc): 1.61 Converted to use HASH input
        parameters.
        (sub LinkCheckExternal): 1.61 Converted to use HASH input
        parameters.
        (sub LinkCheck): 1.61 Converted to use HASH input parameters.
        (sub MakeMetaTags): 1.61 Converted to use HASH input
        parameters.
        (sub PrintStart): 1.61 Converted to use HASH input parameters.
        (sub PrintEnd): 1.61 Converted to use HASH input parameters.
        (sub PrintHtmlDoc): 1.61 Converted to use HASH input
        parameters.
        (sub MakeHeadingHtml): 1.61 Converted to use HASH input
        parameters.
        (sub DoLine): 1.61 Converted to use HASH input parameters.
        (sub HandleOneFile): 1.61 Converted to use HASH input
        parameters.

2002-02-01 Fri Jari Aalto

        * t2html.pl (sub Main): 1.61 Split into smaller parts.
        (sub GetFile): 1.61 New.
        (sub OutputDir): 1.61 New.

2002-01-31 Thu Jari Aalto

        * t2html.pl -- Many big internal changes
        (sub HandleCommandLineArgs): 1.57 Added `Link-cache' options.
        (sub Help): 1.57 Wrote `Link-cache' docs.
        (sub Min): 1.57 New.
        (sub IsHTML): 1.57 New.
        (sub LinkCache): 1.57 New. Now can use local cache for OK
        checked urls.
        (sub LinkHash): 1.57 New.
        (sub LinkCheckLwp): 1.57 New. More abstraction.
        (sub LinkCheckExternal): 1.57 Exploded function to smaller
        pieces.
        (sub LinkCheck): 1.57 Converted to use HASH function
        arguments.
        (sub Main): 1.57 Send more globals to HandleOneFile().
        (sub HandleOneFile): 1.57 Converted to use HASH function
        arguments. Removed globals due to new architecture.

2002-01-24 Thu Jari Aalto * t2html.pl (sub LinkCheckExternal): 1.57 Added reegxp to exclude some non-readl links like example.com and {foo,bar,baz}.site.com. perl-text2html-master/INSTALL000066400000000000000000000017741371714776500163260ustar00rootroot00000000000000INTALL: Perl Text to HTML converter ------------------------------------ System wide install Run makefile with appropriate parameters. The program is installed without the .pl file suffix make DESTDIR= prefix=/usr/local install To test the installation (to see how files are installed): make install-test find -type f tmp/ Manual install 1. Copy bin/*.pl somewhere along $PATH 2. Copy bin/*.1 somewhere along $MANPATH Optional In order to use the link checking feature (--Link* option), extra Perl modules are needed. Check with these commands if they are already installed. perl -MHTML::FormatText -e 'print ok' perl -MHTML::Parse -e 'print ok' perl -MLWP::UserAgent -e 'print ok' To install them, visit http://cpan.perl.org or if you have administrative rights to install software, use commands: perl -MCPAN -e shell cpan> install End of file perl-text2html-master/Makefile000077500000000000000000000116301371714776500167300ustar00rootroot00000000000000#!/usr/bin/make -f # # Copyright information # # Copyright (C) 2002-2020 Jari Aalto # # License # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . ifneq (,) This makefile requires GNU Make. 
endif PATH := bin:$(PATH) PACKAGE = t2html DESTDIR = prefix = /usr exec_prefix = $(prefix) man_prefix = $(prefix)/share mandir = $(man_prefix)/man bindir = $(exec_prefix)/bin sharedir = $(prefix)/share BINDIR = $(DESTDIR)$(bindir) DOCDIR = $(DESTDIR)$(sharedir)/doc/$(PACKAGE) LOCALEDIR = $(DESTDIR)$(sharedir)/locale SHAREDIR = $(DESTDIR)$(sharedir)/$(PACKAGE) LIBDIR = $(DESTDIR)$(prefix)/lib/$(PACKAGE) SBINDIR = $(DESTDIR)$(exec_prefix)/sbin ETCDIR = $(DESTDIR)/etc/$(PACKAGE) # 1 = regular, 5 = conf, 6 = games, 8 = daemons MANDIR = $(DESTDIR)$(mandir) MANDIR1 = $(MANDIR)/man1 MANDIR5 = $(MANDIR)/man5 MANDIR6 = $(MANDIR)/man6 MANDIR8 = $(MANDIR)/man8 TAR = tar TAR_OPT_NO = --exclude='.build' \ --exclude='.sinst' \ --exclude='.inst' \ --exclude='tmp' \ --exclude='*.bak' \ --exclude='*[~\#]' \ --exclude='.\#*' \ --exclude='CVS' \ --exclude='.svn' \ --exclude='.git' \ --exclude='.bzr' \ --exclude='*.tar*' \ --exclude='*.tgz' # E.g. --verbose TAR_OPT_USER = INSTALL = /usr/bin/install INSTALL_BIN = $(INSTALL) -m 755 INSTALL_DATA = $(INSTALL) -m 644 INSTALL_SUID = $(INSTALL) -m 4755 DIST_DIR = ../build-area DATE = `date +"%Y.%m%d"` VERSION = $(DATE) RELEASE = $(PACKAGE)-$(VERSION) BIN = $(PACKAGE) PL_SCRIPT = bin/$(BIN).pl INSTALL_OBJS_BIN = $(PL_SCRIPT) INSTALL_OBJS_DOC = README INSTALL_OBJS_MAN = bin/*.1 all: @echo "Nothing to compile." 
@echo "Try 'make help' or 'make -n DESTDIR= prefix=/usr/local install'" # Rule: help - display Makefile rules help: @grep "^# Rule:" Makefile | sort # Rule: clean - remove temporary files clean: # clean -rm -f *[#~] *.\#* \ *.x~~ pod*.tmp rm -rf tmp distclean: clean realclean: clean dist-git: rm -f $(DIST_DIR)/$(RELEASE)* git archive --format=tar --prefix=$(RELEASE)/ master | \ gzip --best > $(DIST_DIR)/$(RELEASE).tar.gz chmod 644 $(DIST_DIR)/$(RELEASE).tar.gz tar -tvf $(DIST_DIR)/$(RELEASE).tar.gz | sort -k 5 ls -la $(DIST_DIR)/$(RELEASE).tar.gz # The "gt" is maintainer's program frontend to Git # Rule: dist-snap - [maintainer] release snapshot from Git repository dist-snap: echo gt tar -q -z -p $(PACKAGE) -c -D master # Rule: dist - [maintainer] release from Git repository dist: dist-git dist-ls: @ls -1tr $(DIST_DIR)/$(PACKAGE)* # Rule: dist - [maintainer] list of release files ls: dist-ls bin/$(PACKAGE).1: $(PL_SCRIPT) $(PERL) $< --help-man > $@ @-rm -f *.x~~ pod*.tmp doc/manual/index.html: $(PL_SCRIPT) $(PERL) $< --help-html > $@ @-rm -f *.x~~ pod*.tmp doc/manual/index.txt: $(PL_SCRIPT) $(PERL) $< --help > $@ @-rm -f *.x~~ pod*.tmp doc/conversion/index.html: doc/conversion/index.txt perl -S t2html.pl --auto-detect --out --print-url $< @-rm -f *.x~~ pod*.tmp # Rule: man - Generate or update manual page man: bin/$(PACKAGE).1 html: doc/manual/index.html doc/conversion/index.html txt: doc/manual/index.txt # Rule: doc - Generate or update all documentation doc: man html txt # Internal: perl-test - Check program syntax perl-test: # perl-test - Check syntax perl -cw $(PL_SCRIPT) podchecker $(PL_SCRIPT) # Rule: test - Run tests test: perl-test check: test install-doc: # Rule install-doc - Install documentation $(INSTALL_BIN) -d $(DOCDIR) [ ! "$(INSTALL_OBJS_DOC)" ] || \ $(INSTALL_DATA) $(INSTALL_OBJS_DOC) $(DOCDIR) $(TAR) -C doc $(TAR_OPT_NO) $(TAR_OPT_USER) --create --file=- . 
| \ $(TAR) -C $(DOCDIR) --extract --file=- install-man: man # install-man - Install manual pages $(INSTALL_BIN) -d $(MANDIR1) $(INSTALL_DATA) $(INSTALL_OBJS_MAN) $(MANDIR1) install-bin: # install-bin - Install programs $(INSTALL_BIN) -d $(BINDIR) for f in $(INSTALL_OBJS_BIN); \ do \ dest=$$(basename $$f | sed -e 's/\.pl$$//' -e 's/\.py$$//' ); \ $(INSTALL_BIN) $$f $(BINDIR)/$$dest; \ done # Rule: install - Standard install install: install-bin install-man install-doc # Rule: install-test - [maintainer] test installation in tmp directory install-test: rm -rf tmp make DESTDIR=`pwd`/tmp prefix=/usr install find tmp | sort .PHONY: clean distclean realclean .PHONY: install install-bin install-man .PHONY: all man doc test install-test perl-test .PHONY: dist dist-git dist-ls ls # End of file perl-text2html-master/README000066400000000000000000000032721371714776500161500ustar00rootroot00000000000000README: Perl Text to HTML converter ----------------------------------- Convert text file into HTML 4.01/CSS2 format. The is written in natural white paper format by using standard headings and indented paragraphs at standard tab position column 8. The text can *contain* _ASCII_ =markup= `tokens'. Embedding HTML is also possible via INCLUDE directives. admin/ Administrative files for the project bin/ The program and manual page *.1 doc/ Documentation Important files doc/license/ Licensing information ChangeLog Project change records Project details Homepage https://github.com/jaalto/project--perl-text2html http://savannah.nongnu.org/projects/perl-text2html (backup) Reporting bugs See homepage Source code repository See homepage Depends Perl (any version) External Perl CPAN library dependencies LWP::UserAgent [1a] HTML::FormatText [1b] HTML::Parse [1c] HTML::LinkExtractor [2] [1a] Required if option --link-check is used. Included in library libwww-perl in Debian: [1b] Same as 1a. Included in Debian library libhtml-format-perl [1c] Same as 1a. 
Included in Debian library libhtml-linkextractor-perl [2] Optional. Used only if available to extract links from document for use with --link-check. If library is not available, the internal extractor is used. Copyright Copyright (C) 1996-2020 Jari Aalto License This program is free software; you can redistribute and/or modify program under the terms of GNU General Public license either version 2 of the License, or (at your option) any later version. End of file perl-text2html-master/bin/000077500000000000000000000000001371714776500160345ustar00rootroot00000000000000perl-text2html-master/bin/t2html.1000066400000000000000000001725421371714776500173430ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 4.11 (Pod::Simple 3.35) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' . ds C` . ds C' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. 
.ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is >0, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .\" .\" Avoid warning from groff about undefined register 'F'. .de IX .. .nr rF 0 .if \n(.g .if rF .nr rF 1 .if (\n(rF:(\n(.g==0)) \{\ . if \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . if !\nF==2 \{\ . nr % 0 . nr F 2 . \} . \} .\} .rr rF .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . 
\" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "T2HTML 1" .TH T2HTML 1 "2020-08-19" "perl v5.30.3" "Perl Text to HTML Converter" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" t2html \- Simple text to HTML converter. Relies on text indentation rules. .SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 1 \& t2html [options] file.txt > file.html .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" Convert pure text files into nice looking, possibly framed, \s-1HTML\s0 pages. An example of conversion: .PP .Vb 2 \& 1. Plain text source code \& http://pm\-doc.git.sourceforge.net/git/gitweb.cgi?p=pm\-doc/pm\-doc;a=blob_plain;f=doc/index.txt;hb=HEAD \& \& 2. reusult of conversion with custom \-\-css\-file option: \& http://pm\-doc.sourceforge.net/pm\-tips.html \& http://pm\-doc.sourceforge.net/pm\-tips.css \& \& 3. An Emacs mode tinytf.el for writing the text files (optional) \& https://github.com/jaalto/project\-\-emacs\-tiny\-tools .Ve .PP \&\fBRequirements for the input ascii files\fR .PP The file must be written in Technical Format, whose layout is described in the this manual. Basically the idea is simple and there are only two heading levels: one at column 0 and the other at column 4 (halfway between the tab width). Standard text starts at column 8 (the position after pressed tab-key). 
.PP The idea of technical format is that each column represents different rendering layout in the generated \s-1HTML.\s0 There is no special markup needed in the text file, so you can use the text version as a master copy of a \s-1FAQ\s0 etc. Bullets, numbered lists, word emphasis and quotation etc. can expressed in natural way. .PP \&\fB\s-1HTML\s0 description\fR .PP The generated \s-1HTML\s0 includes embedded Cascading Style Sheet 2 (\s-1CSS2\s0) and a small piece of Java code. The \s-1CSS2\s0 is used to colorize the page loyout and to define suitable printing font sizes. The generated \s-1HTML\s0 also takes an approach to support \s-1XHTML.\s0 See page http://www.w3.org/TR/xhtml1/#guidelines where the backward compatibility recommendations are outlined: .PP .Vb 4 \& Legal HTML XHTML requires \&

<P>             <p> ..</p>
\&    <BR>            <br></br>
\&    <HR>            <hr></hr>
.Ve .PP \&\s-1XHTML\s0 does not support fragment identifiers #foo, with the \f(CW\*(C`name\*(C'\fR element, but uses \f(CW\*(C`id\*(C'\fR instead. For backward compatibility both elements are defined: .PP .Vb 1 \& < ..name="tag"> Is now <.. name="tag" id="tag"> .Ve .PP \&\s-1NOTE:\s0 This program was never designed to be used for \s-1XHTML\s0 and the strict \s-1XHTML\s0 validity is not to be expected. .PP \&\fBMotivation\fR .PP The easiest format to write large documents, like FAQs, is text. A text file offers WysiWyg editing and it can be turned easily into \s-1HTML\s0 format. Text files are easily maintained and there is no requirements for special text editors. Any text editor like notepad, vi, Emacs can be used to maintain the documents. .PP Text files are also the only sensible format if documents are kept under version control like \s-1RCS, CVS, SVN,\s0 Arch, Perforce, ClearCase. They can be asily compared with diff and patches can be easily received and sent to them. .PP To help maintining large documents, there is also available an \&\fIEmacs\fR minor mode, package called \fItinytf.el\fR, which offers text fontification with colors, Indentation control, bullet filling, heading renumbering, word markup, syntax highlighting etc. See https://github.com/jaalto/project\*(--emacs\-tiny\-tools .SH "OPTIONS" .IX Header "OPTIONS" .SS "Html: Header and Footer options" .IX Subsection "Html: Header and Footer options" .IP "\fB\-\-as\-is\fR" 4 .IX Item "--as-is" Any extra \s-1HTML\s0 formatting or text manipulation is suppressed. Text is preserved as it appears in file. Use this option if you plan to deliver or and print the text as seen. .Sp .Vb 2 \& o If file contains "Table of Contents" it is not removed \& o Table of Content block is not created (it usually would) .Ve .IP "\fB\-\-author \-a \s-1STR\s0\fR" 4 .IX Item "--author -a STR" Author of document e.g. 
\fB\-\-author \*(L"John Doe\*(R"\fR .IP "\fB\-\-disclaimer\-file\fR \s-1FILE\s0" 4 .IX Item "--disclaimer-file FILE" The text that appears at the footer is read from this file. If not given the default copyright text is added. Options \f(CW\*(C`\-\-quiet\*(C'\fR and \&\f(CW\*(C`\-\-simple\*(C'\fR suppress disclaimers. .IP "\fB\-\-document \s-1FILE\s0\fR" 4 .IX Item "--document FILE" \&\fBName\fR of the document or filename. You could list all alternative URLs to the document with this option. .IP "\fB\-\-email \-e \s-1EMAIL\s0\fR" 4 .IX Item "--email -e EMAIL" The contact address of the author of the document. Must be pure email address with no \*(L"<\*(R" and \*(L">\*(R" characters included. Eg. \&\fB\-\-email foo@example.com\fR .Sp .Vb 2 \& \-\-email "" WRONG \& \-\-email "me@here.com" right .Ve .IP "\fB\-\-simple\fR \fB\-s\fR" 4 .IX Item "--simple -s" Print minimum footer only: contact, email and date. Use \f(CW\*(C`\-\-quiet\*(C'\fR to completely discard footer. .IP "\fB\-\-t2html\-tags\fR" 4 .IX Item "--t2html-tags" Allow processing embedded #T2HTML\- directives inside file. See full explanation by reading topic \f(CW\*(C`EMBEDDED DIRECTIVES INSIDE TEXT\*(C'\fR. By default, you do not need to to supply this option \- it is \*(L"on\*(R" by default. .Sp To disregard embedded directives in text file, supply \*(L"no\*(R" option: \&\fB\-\-not2html\-tags\fR. .IP "\fB\-\-title \s-1STR\s0\fR \fB\-t \s-1STR\s0\fR" 4 .IX Item "--title STR -t STR" The title text that appears in top frame of browser. .IP "\fB\-\-url \s-1URL\s0\fR" 4 .IX Item "--url URL" .PP Location of the \s-1HTML\s0 file. When \fB\-\-document\fR gave the name, this gives the location. This information is printed at the Footer. .SS "Html: Navigation urls" .IX Subsection "Html: Navigation urls" .IP "\fB\-\-base \s-1URL\s0\fR" 4 .IX Item "--base URL" \&\s-1URL\s0 location of the \s-1HTML\s0 file in the \fBdestination site\fR where it will be put available. 
This option is needed only if the document is hosted on a \&\s-1FTP\s0 server (rare, but possible). A \s-1FTP\s0 server based document cannot use Table Of Contents links (fragment \fI#tag\fR identifiers) unless \s-1HTML\s0 tag \s-1BASE\s0 is also defined. .Sp The argument can be full \s-1URL\s0 to the document: .Sp .Vb 2 \& \-\-base ftp://ftp.example.com/file.html \& \-\-base ftp://ftp.example.com/ .Ve .IP "\fB\-\-button\-heading\-top\fR" 4 .IX Item "--button-heading-top" Add additional \fB[toc]\fR navigation button to the end of each heading. This may be useful in long non-framed \s-1HTML\s0 files. .IP "\fB\-\-button\-top \s-1URL\s0\fR" 4 .IX Item "--button-top URL" Buttons are placed at the top of document in order: [previous][top][next] and \fI\-\-button\-*\fR options define the URLs. .Sp If \s-1URL\s0 is string \fInone\fR then no button is inserted. This may be handy if the buttons are defined by a separate program. And example using Perl: .Sp .Vb 1 \& #!/usr/bin/perl \& \& my $top = "index.html"; # set defaults \& my $prev = "none"; \& my $next = "none"; \& \& # ... somewhere $prev or $next may get set, or then not \& \& qx(t2html \-\-button\-top "$top" \-\-button\-prev "$prev" \-\-button\-next "$next" ...); \& \& # End of sample program .Ve .IP "\fB\-\-button\-prev \s-1URL\s0\fR" 4 .IX Item "--button-prev URL" \&\s-1URL\s0 to go to previous document or string \fInone\fR. .IP "\fB\-\-button\-next \s-1URL\s0\fR" 4 .IX Item "--button-next URL" \&\s-1URL\s0 to go to next document or string \fInone\fR. .IP "\fB\-\-reference tag=value\fR" 4 .IX Item "--reference tag=value" You can add any custom references (tags) inside text and get them expand to any value. This option can be given multiple times and every occurrence of \s-1TAG\s0 is replaced with \s-1VALUE. E\s0.g. 
when given following options: .Sp .Vb 2 \& \-\-reference "#HOME\-URL=http://www.example.com/dir" \& \-\-reference "#ARCHIVE\-URL=http://www.example.com/dir/dir2" .Ve .Sp When referenced in text, the generated \s-1HTML\s0 includes expanded expanded to values. An example text: .Sp .Vb 2 \& The homepage is #HOME\-URL/page.html and the mirrot page it at \& #ARCHIVE\-URL/page.html where you can find the latest version. .Ve .IP "\fB\-R, \-\-reference\-separator \s-1STRING\s0\fR" 4 .IX Item "-R, --reference-separator STRING" See above. String that is used to split the \s-1TAG\s0 and \s-1VALUE.\s0 Default is equal sign \*(L"=\*(R". .IP "\fB\-T, \-\-toc\-url\-print\fR" 4 .IX Item "-T, --toc-url-print" Display URLs (constructed from headings) that build up the Table of Contents (\s-1NAME AHREF\s0 tags) in a document. The list is outputted to stderr, so that it can be separated: .Sp .Vb 1 \& % t2html \-\-toc\-url\-print tmp.txt > file.html 2> toc\-list.txt .Ve .Sp Where would you need this? If you want to know the fragment identifies for your file, you need the list of names. .Sp .Vb 1 \& http://www.example.com/myfile.html#fragment\-identifier .Ve .SS "Html: Controlling \s-1CSS\s0 generation (\s-1HTML\s0 tables)" .IX Subsection "Html: Controlling CSS generation (HTML tables)" .IP "\fB\-\-css\-code\-bg\fR" 4 .IX Item "--css-code-bg" This option affects how the code section (column 12) is rendered. Normally the section is surrounded with a
<pre>..</pre>
codes, but with this options, something more fancier is used. The code is wrapped inside a

<table> ... </table>
and the background color is set to a shade of gray. .ie n .IP "\fB\-\-css\-code\-note ""\s-1REGEXP""\s0 \fR" 4 .el .IP "\fB\-\-css\-code\-note ``\s-1REGEXP''\s0 \fR" 4 .IX Item "--css-code-note REGEXP " Option \fB\-\-css\-code\-bg\fR is required to activate this option. A special word defined using regexp (default is 'Note:') will mark code sections specially. The \f(CW\*(C`first word\*(C'\fR is matched against the supplied Perl regexp. .Sp The supplied regexp must not, repeat, must not, include any matching group operators. This simply means, that grouping parenthesis like \&\f(CW\*(C`(one|two|three)\*(C'\fR are not allowed. You must use the Perl non-grouping ones like \f(CW\*(C`(?:one|two|three)\*(C'\fR. Please refer to perl manual page [perlre] if this short introduction did not give enough rope. .Sp With this options, instead of rendering column 12 text with
<pre>..</pre>
, the text appears just like regular text, but with a twist. The background color of the text has been changed to darker grey to visually stand out form the text. .Sp An example will clarify. Suppose that you passed options \fB\-\-css\-code\-bg\fR and \fB\-\-css\-code\-note='(?:Notice|Note):'\fR, which instructed to treat the first paragraphs at column 12 differently. Like this: .Sp .Vb 2 \& This is the regular text that appears somewhere at column 8. \& It may contain several lines of text in this paragraph. \& \& Notice: Here is the special section, at column 12, \& and the first word in this paragraph is \*(AqNotice:\*(Aq. \& Only that makes this paragraph at column 12 special. \& \& Now, we have some code to show to the user: \& \& for ( i = 0; i++; i < 10 ) \& { \& // Doing something in this loop \& } .Ve .Sp One note, text written with initial special word, like \f(CW\*(C`Notice:\*(C'\fR, must all fit in one full pragraph. Any other paragraphs that follow, are rendered as code sections. Like here: .Sp .Vb 2 \& This is the regular text that appears somewhere \& It may contain several lines of text in this paragraph \& \& Notice: Here is the special section, at column 12, \& and the first word in this paragraph is \*(AqNotice:\*(Aq \& which makes it special \& \& Hoewver, this paragraph IS NOT rendered specially \& any more. Only the first paragraph above. \& \& for ( i = 0; i++; i < 10 ) \& { \& // Doing something in this loop \& } .Ve .Sp As if this were not enough, there are some special table control directives that let you control the ..
which is put around the code section at column 12. Here are few examples: .Sp .Vb 1 \& Here is example 1 \& \& #t2html::td:bgcolor=#F7F7DE \& \& for ( i = 0; i++; i < 10 ) \& { \& // Doing something in this loop \& } \& \& Here is example 2 \& \& #t2html::td:bgcolor=#F7F7DE:tableborder:1 \& \& for ( i = 0; i++; i < 10 ) \& { \& // Doing something in this loop \& } \& \& Here is example 3 \& \& #t2html::td:bgcolor="#FFFFFF":tableclass:dashed \& \& for ( i = 0; i++; i < 10 ) \& { \& // Doing something in this loop \& } \& \& Here is example 4 \& \& #t2html::td:bgcolor="#FFFFFF":table:border=1_width=94%_border=0_cellpadding="10"_cellspacing="0" \& \& for ( i = 0; i++; i < 10 ) \& { \& // Doing something in this loop \& } .Ve .Sp Looks cryptic? Cannot help that and in order for you to completely understand what these directives do, you need to undertand what elements can be added to the and
tokens. Refer to \s-1HTML\s0 specification for available attributes. Here is briefing what you can do: .Sp The start command is: .Sp .Vb 4 \& #t2html:: \& | \& After this comes attribute pairs in form key:value \& and multiple ones as key1:value1:key2:value2 ... .Ve .Sp The \f(CW\*(C`key:value\*(C'\fR pairs can be: .Sp .Vb 3 \& td:ATTRIBUTES \& | \& This is converted into \& \& table:ATTRIBUTES \& | \& This is converted into .Ve .Sp There can be no spaces in the \s-1ATTRIBUTES,\s0 because the \f(CW\*(C`First\-word\*(C'\fR must be one contiguous word. An underscore can be used in place of space: .Sp .Vb 3 \& table:border=1_width=94% \& | \& Interpreted as
.Ve .Sp It is also possible to change the default \s-1CLASS\s0 style with word \&\f(CW\*(C`tableclass\*(C'\fR. In order the \s-1CLASS\s0 to be useful, its \s-1CSS\s0 definitions must be either in the default configuration or supplied from a external file. See option \fB\-\-script\-file\fR. .Sp .Vb 3 \& tableclass:name \& | \& Interpreted as
.Ve .Sp For example, there are couple of default styles that can be used: .Sp .Vb 1 \& 1) Here is CLASS "dashed" example \& \& #t2html::tableclass:dashed \& \& for ( i = 0; i++; i < 10 ) \& { \& // Doing something in this loop \& } \& \& 2) Here is CLASS "solid" example: \& \& #t2html::tableclass:solid \& \& for ( i = 0; i++; i < 10 ) \& { \& // Doing something in this loop \& } .Ve .Sp You can change any individual value of the default table definition which is: .Sp .Vb 1 \&
.Ve .Sp To change e.g. only value cellpadding, you would say: .Sp .Vb 1 \& #t2html::table:tablecellpadding:2 .Ve .Sp If you are unsure what all of these were about, simply run program with \&\fB\-\-test\-page\fR and look at the source and generated \s-1HTML\s0 files. That should offer more rope to experiment with. .IP "\fB\-\-css\-file \s-1FILE\s0\fR" 4 .IX Item "--css-file FILE" Include <\s-1LINK ...\s0> which refers to external \s-1CSS\s0 style definition source. This option is ignored if \fB\-\-script\-file\fR option has been given, because that option imports whole content inside \s-1HEAD\s0 tag. This option can appear multiple times and the external \s-1CSS\s0 files are added in listed order. .IP "\fB\-\-css\-font\-type CSS-DEFINITION\fR" 4 .IX Item "--css-font-type CSS-DEFINITION" Set the \s-1BODY\s0 element's font definition to CSS-DEFINITION. The default value used is the regular typeset used in newspapers and books: .Sp .Vb 1 \& \-\-css\-font\-type=\*(Aqfont\-family: "Times New Roman", serif;\*(Aq .Ve .IP "\fB\-\-css\-font\-size CSS-DEFINITION\fR" 4 .IX Item "--css-font-size CSS-DEFINITION" Set the body element's font size to CSS-DEFINITION. The default font size is expressed in points: .Sp .Vb 1 \& \-\-css\-font\-size="font\-size: 12pt;" .Ve .SS "Html: Controlling the body of document" .IX Subsection "Html: Controlling the body of document" .IP "\fB\-\-delete \s-1REGEXP\s0\fR" 4 .IX Item "--delete REGEXP" Delete lines matching perl \s-1REGEXP.\s0 This is useful if you use some document tool that uses navigation tags in the text file that you do not want to show up in generated \s-1HTML.\s0 .IP "\fB\-\-delete\-email\-headers\fR" 4 .IX Item "--delete-email-headers" Delete email headers at the beginning of file, until first empty line that starts the body. If you keep your document ready for Usenet news posting, they may contain headers and body: .Sp .Vb 4 \& From: ... \& Newsgroups: ... 
\& X\-Sender\-Info: \& Summary: \& \& BODY\-OF\-TEXT .Ve .IP "\fB\-\-nodelete\-default\fR" 4 .IX Item "--nodelete-default" Use this option to suppress default text deletion (which is on). .Sp Emacs \f(CW\*(C`folding.el\*(C'\fR package and vi can be used with any text or programming language to place sections of text between tags \fB{{{\fR and \&\fB}}}\fR. You can open or close such folds. This allows keeping big documents in order and manageable quite easily. For Emacs support, see. ftp://ftp.csd.uu.se/pub/users/andersl/beta/ .Sp The default value deletes these markers and special comments \&\f(CW\*(C`#_comment\*(C'\fR which make it possible to cinlude your own notes which are not included in the generated output. .Sp .Vb 1 \& {{{ Security section \& \& #_comment Make sure you revise this section to \& #_comment the next release \& \& The seecurity is an important issue in everyday administration... \& More text ... \& \& }}} .Ve .IP "\fB\-\-html\-body \s-1STR\s0\fR" 4 .IX Item "--html-body STR" Additional attributes to add to \s-1HTML\s0 tag <\s-1BODY\s0>. You could e.g. define language of the text with \fB\-\-html\-body LANG=en\fR which would generate \&\s-1HTML\s0 tag <\s-1BODY\s0 LANG=\*(L"en\*(R"> See section \*(L"\s-1SEE ALSO\*(R"\s0 for \s-1ISO 639.\s0 .ie n .IP "\fB\-\-html\-column\-beg=""\s-1SPEC\s0 HTML-SPEC""\fR" 4 .el .IP "\fB\-\-html\-column\-beg=``\s-1SPEC\s0 HTML-SPEC''\fR" 4 .IX Item "--html-column-beg=SPEC HTML-SPEC" The default interpretation of columns 1,2,3 5,6,7,8,9,10,11,12 can be changed with \fIbeg\fR and \fIend\fR swithes. Columns 0,4 can't be changed because they are reserved for headings. Here are some samples: .Sp .Vb 2 \& \-\-html\-column\-beg="7quote " \& \-\-html\-column\-end="7quote " \& \& \-\-html\-column\-beg="10
 class=\*(Aqcolumn10\*(Aq"
\&    \-\-html\-column\-end="10    
" \& \& \-\-html\-column\-beg="quote " \& \-\-html\-column\-end="quote " .Ve .Sp \&\fBNote:\fR You can only give specifications up till column 12. If text is beyound column 12, it is interpreted like it were at column 12. .Sp In addition to column number, the \fI\s-1SPEC\s0\fR can also be one of the following strings .Sp .Vb 8 \& Spec equivalent word markup \& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \& quote \`\*(Aq \& bold _ \& emp * \& small + \& big = \& ref [] like: [Michael] referred to [rfc822] \& \& Other available Specs \& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \& 7quote When column 7 starts with double quote. .Ve .Sp For style sheet values for each color, refer to \fIclass\fR attribute and use \&\fB\-\-script\-file\fR option to import definitions. Usually /usr/lib/X11/rgb.txt lists possible color values and the \s-1HTML\s0 standard at http://www.w3.org/ defines following standard named colors: .Sp .Vb 8 \& Black #000000 Maroon #800000 \& Green #008000 Navy #000080 \& Silver #C0C0C0 Red #FF0000 \& Lime #00FF00 Blue #0000FF \& Gray #808080 Purple #800080 \& Olive #808000 Teal #008080 \& White #FFFFFF Fuchsia #FF00FF \& Yellow #FFFF00 Aqua #00FFFF .Ve .ie n .IP "\fB\-\-html\-column\-end=""\s-1COL\s0 HTML-SPEC""\fR" 4 .el .IP "\fB\-\-html\-column\-end=``\s-1COL\s0 HTML-SPEC''\fR" 4 .IX Item "--html-column-end=COL HTML-SPEC" See \fB\-\-html\-column\-beg\fR .IP "\fB\-\-html\-font \s-1SIZE\s0\fR" 4 .IX Item "--html-font SIZE" Define \s-1FONT SIZE.\s0 It might be useful to set bigger font size for presentations. .IP "\fB\-F, \-\-html\-frame [\s-1FRAME\-PARAMS\s0]\fR" 4 .IX Item "-F, --html-frame [FRAME-PARAMS]" If given, then three separate \s-1HTML\s0 files are generated. The left frame will contain \s-1TOC\s0 and right frame contains rest of the text. The \fIFRAME-PARAMS\fR can be any valid parameters for \s-1HTML\s0 tag \s-1FRAMESET.\s0 The default is \&\f(CW\*(C`cols="25%,75%"\*(C'\fR. 
.Sp Using this implies \fB\-\-out\fR option automatically, because three files cannot be printed to stdout. .Sp .Vb 1 \& file.html \& \& \-\-> file.html The Frame file, point browser here \& file\-toc.html Left frame (navigation) \& file\-body.html Right frame (content) .Ve .IP "\fB\-\-language \s-1ID\s0\fR" 4 .IX Item "--language ID" Use language \s-1ID,\s0 a two character \s-1ISO\s0 identifier like \*(L"en\*(R" for English during the generation of \s-1HTML.\s0 This only affects the text that is shown to end-user, like text \*(L"Table Of contents\*(R". The default setting is \*(L"en\*(R". See section \*(L"\s-1SEE ALSO\*(R"\s0 for standards \s-1ISO 639\s0 and \s-1ISO 3166\s0 for proper codes. .Sp The selected language changes propgram's internal arrays in two ways: 1) Instead of default \*(L"Table of ocntents\*(R" heading the national langaugage equivalent will be used 2) The text \*(L"Pic\*(R" below embedded sequentially numbered pictures will use natinal equivalent. .Sp If your languagae is not supported, please send the phrase for \*(L"Table of contents\*(R" and word \*(L"Pic\*(R" in your language to the maintainer. .IP "\fB\-\-script\-file \s-1FILE\s0\fR" 4 .IX Item "--script-file FILE" Include java code that must be complete from \s-1FILE.\s0 The code is put inside of each \s-1HTML.\s0 .Sp The \fB\-\-script\-file\fR is a general way to import anything into the \s-1HEAD\s0 element. Eg. If you want to keep separate style definitions for all, you could only import a pointer to a style sheet. See \fI14.3.2 Specifying external style sheets\fR in \s-1HTML 4.0\s0 standard. .IP "\fB\-\-meta\-keywords \s-1STR\s0\fR" 4 .IX Item "--meta-keywords STR" Meta keywords. Used by search engines. Separate kwywords like \*(L"\s-1AA, BB, CC\*(R"\s0 with commas. 
Refer to \s-1HTML 4.01\s0 specification and topic \*(L"7.4.4 Meta data\*(R" and see http://www.htmlhelp.com/reference/wilbur/ and .Sp .Vb 1 \& \-\-meta\-keywords "AA,BB,CC" .Ve .IP "\fB\-\-meta\-description \s-1STR\s0\fR" 4 .IX Item "--meta-description STR" Meta description. Include description string, max 1000 characters. This is used by search engines. Refer to \s-1HTML 4.01\s0 specification and topic \&\*(L"7.4.4 Meta data\*(R" .IP "\fB\-\-name\-uniq\fR" 4 .IX Item "--name-uniq" First 1\-4 words from the heading are used for the \s-1HTML\s0 \fIname\fR tags. However, it is possible that two same headings start with exactly the same 1\-4 words. In those cases you have to turn on this option. It will use counter 00 \- 999 instead of words from headings to construct \s-1HTML\s0 \fIname\fR references. .Sp Please use this option only in emergencies, because referring to jump block \&\fIname\fR via .Sp .Vb 1 \& httpI://example.com/doc.html#header_name .Ve .Sp is more convenient than using obscure reference .Sp .Vb 1 \& httpI://example.com/doc.html#11 .Ve .Sp In addition, each time you add a new heading the number changes, whereas the symbolic name picked from heading stays as long as you do not change the heading. Think about welfare of your netizens who bookmark you pages. Try to make headings to not have same subjects and you do not need this option. .SS "Document maintenance and batch job commands" .IX Subsection "Document maintenance and batch job commands" .IP "\fB\-A, \-\-auto\-detect\fR" 4 .IX Item "-A, --auto-detect" Convert file only if tag \f(CW\*(C`#T2HTML\-\*(C'\fR is found from file. This option is handy if you run a batch command to convert all files to \s-1HTML,\s0 but only if they look like \s-1HTML\s0 base files: .Sp .Vb 2 \& find . \-name "*.txt" \-type f \e \& \-exec t2html \-\-auto\-detect \-\-verbose \-\-out {} \e; .Ve .Sp The command searches all *.txt files under current directory and feeds them to conversion program. 
The \fB\-\-auto\-detect\fR only converts files which include \f(CW\*(C`#T2HTML\-\*(C'\fR directives. Other text files are not converted. .IP "\fB\-\-link\-check \-l\fR" 4 .IX Item "--link-check -l" Check all http and ftp links. \&\fIThis option is supposed to be run standalone\fR Option \fB\-\-quiet\fR has special meaning when used with link check. .Sp With this option you can regularly validate your document and remove dead links or update moved links. Problematic links are outputted to \fIstderr\fR. This link check feature is available only if you have the \s-1LWP\s0 web library installed. Program will check if you have it at runtime. .Sp Links that are big, e.g. which match \fItar.gz .zip ...\fR or that run programs (links with ? character) are ignored because the \s-1GET\s0 request used in checking would return whole content of the link and it would. be too expensive. .Sp A suggestion: When you put binary links to your documents, add them with space: .Sp .Vb 1 \& http://example.com/dir/dir/ filename.tar.gz .Ve .Sp Then the program \fIdoes\fR check the http addresses. Users may not be able to get the file at one click, checker can validate at least the directory. If you are not the owner of the link, it is also possible that the file has moved of new version name has appeared. .IP "\fB\-L, \-\-link\-check\-single\fR" 4 .IX Item "-L, --link-check-single" Print condensed output in \fIgrep \-n\fR like manner \fI\s-1FILE:LINE:MESSAGE\s0\fR .Sp This option concatenates the url response text to single line, so that you can view the messages in one line. You can use programming tools (like Emacs M\-x compile) that can parse standard grep syntax to jump to locations in your document to correct the links later. .IP "\fB\-o, \-\-out\fR" 4 .IX Item "-o, --out" write generated \s-1HTML\s0 to file that is derived from the input filename. 
.Sp .Vb 3 \& \-\-out \-\-print /dir/file \-\-> /dir/file.html \& \-\-out \-\-print /dir/file.txt \-\-> /dir/file.html \& \-\-out \-\-print /dir/file.this.txt \-\-> /dir/file.this.html .Ve .IP "\fB\-\-link\-cache \s-1CACHE_FILE\s0\fR" 4 .IX Item "--link-cache CACHE_FILE" When links are checked periodically, it would be quite a rigorous to check every link every time that has already succeeded. In order to save link checking time, the \*(L"ok\*(R" links can be cached into separate file. Next time you check the links, the cache is opened and only links found that were not in the cache are checked. This should dramatically improve long searches. Consider this example, where every text file is checked recursively. .Sp .Vb 3 \& $ t2html \-\-link\-check\-single \e \& \-\-quiet \-\-link\-cache ~tmp/link.cache \e \& \`find . \-name "*.txt" \-type f\` .Ve .IP "\fB\-O, \-\-out\-dir \s-1DIR\s0\fR" 4 .IX Item "-O, --out-dir DIR" Like \fB\-\-out\fR, but chop the directory part and write output files to \&\s-1DIR.\s0 The following would generate the \s-1HTML\s0 file to current directory: .Sp .Vb 1 \& \-\-out\-dir . .Ve .Sp If you have automated tool that fills in the directory, you can use word \&\fBnone\fR to ignore this option. The following is a no-op, it will not generate output to directory \*(L"none\*(R": .Sp .Vb 1 \& \-\-out\-dir none .Ve .IP "\fB\-p, \-\-print\fR" 4 .IX Item "-p, --print" Print filename to stdout after \s-1HTML\s0 processing. Normally program prints no file names, only the generated \s-1HTML.\s0 .Sp .Vb 1 \& % t2html \-\-out \-\-print page.txt \& \& \-\-> page.html .Ve .IP "\fB\-P, \-\-print\-url\fR" 4 .IX Item "-P, --print-url" Print filename in \s-1URL\s0 format. This is useful if you want to check the layout immediately with your browser. 
.Sp .Vb 1 \& % t2html \-\-out \-\-print\-url page.txt | xargs lynx \& \& \-\-> file: /users/foo/txt/page.html .Ve .IP "\fB\-\-split \s-1REGEXP\s0\fR" 4 .IX Item "--split REGEXP" Split document into smaller pieces when \s-1REGEXP\s0 matches. \fISplit commands are standalone\fR, meaning, that it starts and quits. No \s-1HTML\s0 conversion for the file is engaged. .Sp If \s-1REGEXP\s0 is found from the line, it is a start point of a split. E.g. to split according to toplevel headings, which have no numbering, you would use: .Sp .Vb 1 \& \-\-split \*(Aq^[A\-Z]\*(Aq .Ve .Sp A sequential numbers, 3 digits, are added to the generated partials: .Sp .Vb 1 \& filename.txt\-NNN .Ve .Sp The split feature is handy if you want to generate slides from each heading: First split the document, then convert each part to \s-1HTML\s0 and finally print each part (page) separately to printer. .IP "\fB\-S1, \-\-split1\fR" 4 .IX Item "-S1, --split1" This is shorthand of \fB\-\-split\fR command. Define regexp to split on toplevel heading. .IP "\fB\-S2, \-\-split2\fR" 4 .IX Item "-S2, --split2" This is shorthand of \fB\-\-split\fR command. Define regexp to split on second level heading. .IP "\fB\-SN, \-\-split\-named\-files\fR" 4 .IX Item "-SN, --split-named-files" Additional directive for split commands. If you split e.g. by headings using \&\fB\-\-split1\fR, it would be more informative to generate filenames according to first few words from the heading name. Suppose the heading names where split occur were: .Sp .Vb 2 \& Program guidelines \& Conclusion .Ve .Sp Then the generated partial filenames would be as follows. .Sp .Vb 2 \& FILENAME\-program_guidelines \& FILENAME\-conclusion .Ve .IP "\fB\-X, \-\-xhtml\fR" 4 .IX Item "-X, --xhtml" Render using strict \s-1XHTML.\s0 This means using
<hr/>, <br/> and paragraphs use <p>..</p>
. .Sp \&\f(CW\*(C`Note: this option is experimental. See BUGS\*(C'\fR .SS "Miscellaneous options" .IX Subsection "Miscellaneous options" .IP "\fB\-\-debug \s-1LEVEL\s0\fR" 4 .IX Item "--debug LEVEL" Turn on debug with positive \s-1LEVEL\s0 number. Zero means no debug. .IP "\fB\-\-help \-h\fR" 4 .IX Item "--help -h" Print help screen. Terminates program. .IP "\fB\-\-help\-css\fR" 4 .IX Item "--help-css" Print default \s-1CSS\s0 used. Terminates program. You can copy and modify this output and instruct to use your own with \fB\-\-css\-file=FILE\fR. You can also embed the option to files with \f(CW\*(C`#T2HTML\-OPTION\*(C'\fR directive. .IP "\fB\-\-help\-html\fR" 4 .IX Item "--help-html" Print help in \s-1HTML\s0 format. Terminates program. .IP "\fB\-\-help\-man\fR" 4 .IX Item "--help-man" Print help page in Unix manual page format. You want to feed this output to \&\fBnroff \-man\fR in order to read it. Terminates program. .IP "\fB\-\-test\-page\fR" 4 .IX Item "--test-page" Print the test page: \s-1HTML\s0 and example text file that demonstrates the capabilities. .IP "\fB\-\-time\fR" 4 .IX Item "--time" Print to stderr time spent used for handling the file. .IP "\fB\-v, \-\-verbose [\s-1LEVEL\s0]\fR" 4 .IX Item "-v, --verbose [LEVEL]" Print verbose messages. .IP "\fB\-q, \-\-quiet\fR" 4 .IX Item "-q, --quiet" Print no footer at all. This option has different meaning if \&\fI\-\-link\-check\fR option is turned on: print only erroneous links. .IP "\fBV, \-\-version\fR" 4 .IX Item "V, --version" Print program version information. .SH "FORMAT DESCRIPTION" .IX Header "FORMAT DESCRIPTION" Program converts text files to \s-1HTML.\s0 The basic idea is to rely on indentation level, and the layout used is called 'Technical format' (\s-1TF\s0) where only minimal conventions are used to mark italic, bold etc. text. The Basic principles can be demonstrated below. 
Notice the column poisiton ruler at the top: .PP .Vb 1 \& \-\-//\-\- description start \& \& 123456789 123456789 123456789 123456789 123456789 column numbers \& \& Heading 1 starts with a big letter at leftmost column 1 \& \& The column positions 1,2,3 are currently undefined and may not \& format correctly. Do not place text at columns 1,2 or 3. \& \& Heading level 2 starts at half\-tab column 4 with a big letter \& \& Normal but colored text at columns 5 \& \& Normal but colored text at columns 6 \& \& Heading 3 can be considered at position TAB minus 1, column 7. \& \& "Special text at column 7 starts with double quote" \& \& Standard text starts at column 8, you can *emphatize* text or \& make it _strong_ and write =SmallText= or +BigText+ show \& variable name \`ThisIsAlsoVariable\*(Aq. You can \`_*nest*_\*(Aq \`the\*(Aq \& markup. more txt in this paragraph txt txt txt txt txt txt \& txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt \& txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt \& txt txt \& \& Strong text at column 9 \& \& Column 10 is reserved for quotations \& Column 10 is reserved for quotations \& Column 10 is reserved for quotations \& Column 10 is reserved for quotations \& \& Strong text at column 11 \& \& Column 12 and further is reserved for code examples \& Column 12 and further is reserved for code examples \& All text here are surrounded by
<PRE> HTML codes
\&           This CODE column is affected by the \-\-css\-code* options.
\&
\&     Heading 2 at column 4 again
\&
\&        If you want something like Heading level 3, use column 7 (bold)
\&
\&         Column 8. Standard tab position. txt txt txt txt txt txt txt
\&         txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt
\&         txt txt txt txt txt txt txt txt txt txt txt txt txt txt
\&         [1998\-09\-10 Mr. Foo said]:
\&
\&           cited text at column 10. cited text cited text cited text
\&           cited text cited text cited text cited text cited text
\&           cited text cited text cited text cited text cited text
\&           cited text
\&
\&
\&         *   Bullet at column 8. Notice 3 spaces after (*), so
\&             text starts at half\-tab forward at column 12.
\&         *   Bullet. txt txt txt txt txt txt txt txt txt txt txt txt
\&         *   Bullet. txt txt txt txt txt txt txt txt txt txt txt txt
\&             ,txt txt txt txt
\&
\&             Notice that previous paragraph ends to P\-comma
\&             code, it tells this paragraph to continue in
\&             bullet mode, otherwise this text at column 12
\&             would be interpreted as code section surrounded
\&             by <PRE> HTML codes.
\&
\&
\&         .   This is ordered list.
\&         .   This is ordered list.
\&         .   This is ordered list.
\&
\&
\&         .This line starts with dot and is displayed in line by itself.
\&         .This line starts with dot and is displayed in line by itself.
\&
\&         !! This adds an 
HTML code, text in line is marked with \& !! \& \& Make this email address clickable Do not \& make this email address clickable bar@example.com, because it \& is only an example and not a real address. Notice that the \& last one was not surrounded by <>. Common login names like \& foo, bar, quux, or internet site \*(Aqexample\*(Aq are ignored \& automatically. \& \& Also do not make < this@example.com> because there is extra \& white space. This may be more convenient way to disable email \& addresses temporarily. \& \& Heading1 again at column 0 \& \& Subheading at column 4 \& \& And regular text, column 8 txt txt txt txt txt txt txt txt txt \& txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt \& txt txt txt txt txt txt txt txt txt txt txt \& \& \-\-//\-\- description end .Ve .PP That is it, there is the whole layout described. More formally the rules of text formatting are secribed below. .SS "\s-1USED HEADINGS\s0" .IX Subsection "USED HEADINGS" .IP "\(bu" 4 There are only \fItwo\fR heading levels in this style. Heading columns are 0 and 4 and the heading must start with big letter (\*(L"Heading\*(R") or number (\*(L"1.0 Heading\*(R") .IP "\(bu" 4 At column 4, if the text starts with small letter, that line is interpreted as .IP "\(bu" 4 A \s-1HTML\s0
mark is added just before printing heading at level 1. .IP "\(bu" 4 The headings are gathered, the \s-1TOC\s0 is built and inserted to the beginning of \s-1HTML\s0 page. The \s-1HTML\s0 references used in \s-1TOC\s0 are the first 4 sequential words from the headings. Make sure your headings are uniquely named, otherwise there will be same \s-1NAME\s0 references in the generated \s-1HTML.\s0 Spaces are converted into underscore when joining the words. If you can not write unique headings by four words, then you must use \fB\-\-name\-uniq\fR switch .SH "TEXT PLACEMENT RULES" .IX Header "TEXT PLACEMENT RULES" .SS "General" .IX Subsection "General" The basic rules for positioning text in certain columns: .IP "\(bu" 4 Text at column 1 is undefined if it does not start with big letter or number to indicate Heading level 1. .IP "\(bu" 4 Text between columns 2 and 3 is marked with .IP "\(bu" 4 Column 4 is reserved for heading level 2 .IP "\(bu" 4 Text between columns 5\-7 is marked with .IP "\(bu" 4 Text at column 7 is if the first character is double quote. .IP "\(bu" 4 Column 10 is reserved for text. If you want to quote someone or to add reference text, place the text in this column. .IP "\(bu" 4 Text at columns 9 and 11 are marked with .PP Column 8 for text and special codes .IP "\(bu" 4 Column 8 is reserved for normal text .IP "\(bu" 4 At the start of text, at column 8, there can be DOT-code or COMMA-code. .PP Column 12 is special .IP "\(bu" 4 Column 12 is treated specially: block is started with
<PRE> and lines are
marked as code. When the last text at \fIcolumn\fR 12 is found, the
block is closed with </PRE>
. An example: .Sp .Vb 2 \& txt txt txt ;evenly placed block, fine, do it like this \& txt txt \& \& txt txt txt txt ;Can not terminate the /pre, because last \& txt txt txt txt ;column is not at 12 \& txt txt txt txt \& \& txt txt txt txt \& txt txt txt txt \& txt txt txt txt \& ;; Finalizing comment, now the text is evenly placed .Ve .SS "Additional tokens for use at column 8" .IX Subsection "Additional tokens for use at column 8" .IP "\(bu" 4 If there is \f(CW\*(C`.\*(C'\fR(dot) at the beginning of a line and immediately non-whitespace, then
code is added to the end of line. .Sp .Vb 3 \& .This line will have a
HTML tag at the end. \& While these two line are joined together \& by the browser, depending on the frame width. .Ve .IP "\(bu" 4 If there is \f(CW\*(C`,\*(C'\fR(comma) then the

code is not inserted if the previous line is empty. If you use both \f(CW\*(C`.\*(C'\fR(dot) and \f(CW\*(C`,\*(C'\fR(comma), they must be in order dot-comma. The \f(CW\*(C`,\*(C'\fR(comma) works differently if it is used in bullet .Sp A

is always added if there is separation of paragraphs, but when you are writing a bullet, there is a problem, because a bullet exist only as long as text is kept together .Sp .Vb 2 \& * This is a bullet and it has all text ketp together \& even if there is another line in the bullet. .Ve .Sp But to write bullets tat spread multiple paragraphs, you must instruct that those are to kept together and the text in next paragraph is not while it is placed at column 12 .Sp .Vb 2 \& * This is a bullet and it has all text ketp together \& ,even if there is another line in the bullet. \& \& This is new paragrah to the previous bullet and this is \& not a text sample. See continued COMMA\-code above. \& \& * This is new bullet \& \& // and this is code sample after bullet \& if ( $flag ) { ..do something.. } .Ve .SS "Special text markings" .IX Subsection "Special text markings" .IP "italic, bold, code, small, big tokens" 4 .IX Item "italic, bold, code, small, big tokens" .Vb 3 \& _this_ is interpreted as this \& *this* is interpreted as this \& \`this\*(Aq is interpreted as this \` .Ve .Sp Exra modifiers that can be mixed with the above. Usually if you want bigger font, \s-1CAPITALIZE THE WORDS.\s0 .Sp .Vb 3 \& =this= is interpreted as this \& +this+ is interpreted as this \& [this] is interpreted as this .Ve .IP "superscripting" 4 .IX Item "superscripting" .Vb 4 \& word[this] is interpreted as superscript. You can use like \& this[1], multiple[(2)] and almost any[(ab)] and \& imaginable[IV superscritps] as long as the left \& bracket is attached to the word. .Ve .IP "subscripting" 4 .IX Item "subscripting" .Vb 5 \& 12[[10]] is representation of value 12 un base 10. \& This is interpreted as subscript. You can use like \& this[[1]], multiple[[(2)]] and almost any[[(ab)]] and \& imaginable[[IV superscritps]] as long as *two* left \& brackets are attached to the word. 
.Ve .IP "embedding standard \s-1HTML\s0 tokens" 4 .IX Item "embedding standard HTML tokens" Stanadard special \s-1HTML\s0 entities can be added inside text in a normal way, either using sybolic names or the hash code. Here are exmples: .Sp .Vb 7 \& × < > ≤ ≥ ≠ √ − \& α β γ ÷ \& « » ‹ › \- – — \& ≈ ≡ ∑ ƒ ∞ \& ° ± \& ™ © ® \& € £ ¥ .Ve .IP "embedding \s-1PURE HTML\s0 into text" 4 .IX Item "embedding PURE HTML into text" \&\fBThis feature is highly experimental\fR. It is possible to embed pure \&\s-1HTML\s0 inside text in occasions, where e.g. some special formatting is needed. The idea is simple: you write \s-1HTML\s0 as usual but double every '<' and '>' characters, like: .Sp .Vb 1 \& <

> .Ve .Sp The other rule is that all \s-1PURE HTML\s0 must be kept together. There must be no line breaks between pure \s-1HTML\s0 lines. This is incorrect: .Sp .Vb 1 \& <

> \& \& <>one \& <>two \& \& <
> .Ve .Sp The pure \s-1HTML\s0 must be written without extra newlines: .Sp .Vb 4 \& <> \& <>one \& <>two \& <
> .Ve .Sp This \*(L"doubling\*(R" affects normal text writing rules as well. If you write documents, where you describe Unix styled HERE-documents, you \s-1MUST NOT\s0 put the tokens next to each other: .Sp .Vb 3 \& bash$ cat< code. any text after !! in the same line is written with and inserted just after
code, therefore the word formatting commands have no effect in this line. .SS "Http and email marking control" .IX Subsection "Http and email marking control" .IP "\(bu" 4 All http and ftp references as well as email addresses are marked clickable. Email must have surrounding <> characters to be recognized. .IP "\(bu" 4 If url is preceded with hyphen, it will not be clickable. If a string foo, bar, quux, test, site is found from url, then it is not counted as clickable. .Sp .Vb 2 \& clickable \& http://example.com clickable \& \& < me@here.com> not clickable; contains space \& <5dko56$1@news02.deltanet.com> Message\-Id, not clickable \& \& \-http://example.com hyphen, not clickable \& http://$EXAMPLE variable. not clickable .Ve .SS "Lists and bullets" .IX Subsection "Lists and bullets" .IP "\(bu" 4 The bulletin table is constructed if there is \*(L"o\*(R" or \*(L"*\*(R" at column 8 and 3 spaces after it, so that text starts at column 12. Bulleted lines are advised to be kept together; no spaces between bullet blocks. .Sp .Vb 2 \& * This is a bullet \& * This is a bullte .Ve .Sp Another example: .Sp .Vb 2 \& o This is a bullet \& o This is a bullet .Ve .Sp List example: .Sp .Vb 2 \& . This is an ordered list \& . This is an ordered list .Ve .IP "\(bu" 4 The ordered list is started with \*(L".\*(R", a dot, and written like bullet where text starts at column 12. .SS "Line breaks" .IX Subsection "Line breaks" .IP "\(bu" 4 All line breaks are visible in your document, do not use more than one line break to separate paragraphs. .IP "\(bu" 4 Very important is that there is only \fIone\fR line break after headings. .SH "EMBEDDED DIRECTIVES INSIDE TEXT" .IX Header "EMBEDDED DIRECTIVES INSIDE TEXT" .IP "Command line options" 4 .IX Item "Command line options" You can cancel obeying all embedded directives by supplying option \&\fB\-\-not2html\-tags\fR. .Sp You can include these lines anywhere in the document and their content is included in \s-1HTML\s0 output. 
Each directive line must fit in one line and it cannot be broken to separate lines. .Sp .Vb 6 \& #T2HTML\-TITLE \& #T2HTML\-EMAIL \& #T2HTML\-AUTHOR \& #T2HTML\-DOC \& #T2HTML\-METAKEYWORDS \& #T2HTML\-METADESCRIPTION .Ve .Sp You can pass command line options embedded in the file. Like if you wanted the \s-1CODE\s0 section (column 12) to be coloured with shade of gray, you could add: .Sp .Vb 1 \& #T2HTML\-OPTION \-\-css\-code\-bg .Ve .Sp Or you could request turning on particular options. Notice that each line is exactly as you have passed the argument in command line. Imagine surrounding double quoted around lines that are arguments to the associated options. .Sp .Vb 9 \& #T2HTML\-OPTION \-\-as\-is \& #T2HTML\-OPTION \-\-quiet \& #T2HTML\-OPTION \-\-language \& #T2HTML\-OPTION en \& #T2HTML\-OPTION \-\-css\-font\-type \& #T2HTML\-OPTION Trebuchet MS \& #T2HTML\-OPTION \-\-css\-code\-bg \& #T2HTML\-OPTION \-\-css\-code\-note \& #T2HTML\-OPTION (?:Note|Notice|Warning): .Ve .Sp You can also embed your own comments to the text. These are stripped away: .Sp .Vb 2 \& #T2HTML\-COMMENT You comment here \& #T2HTML\-COMMENT You another comment here .Ve .IP "Embedding files" 4 .IX Item "Embedding files" #INCLUDE\- command .Sp This is used to include the content into current current position. The \s-1URL\s0 can be a filename reference, where every \f(CW$VAR\fR is substituted from the environment variables. The tilde(~) expansion is not supported. The included filename is operating system supported path location. .Sp A prefix \f(CW\*(C`raw:\*(C'\fR disables any normal formatting. The file content is included as is. .Sp The \s-1URL\s0 can also be a \s-1HTTP\s0 reference to a remote location, whose content is included at the point. In case of remote content or when filename ends to extension \f(CW\*(C`.html\*(C'\fR or \f(CW\*(C`.html\*(C'\fR, the content is stripped in order to make the inclusion of the content possible. 
In picture below, only the lines within the \s-1BODY,\s0 marked with !!, are included: .Sp .Vb 9 \& \& \& ... \& \& \& this text !! \& and more of this !! \& \& .Ve .Sp Examples: .Sp .Vb 3 \& #INCLUDE\-$HOME/lib/html/picture1.html \& #INCLUDE\-http://www.example.com/code.html \& #INCLUDE\-raw:example/code.html .Ve .IP "Embedding pictures" 4 .IX Item "Embedding pictures" #PIC command is used to include pictures into the text .Sp .Vb 2 \& #PIC picture.png#Caption Text#Picture HTML attributes#align# \& (1) (2) (3) (4) \& \& 1. The NAME or URL address of the picture. Like image/this.png \& \& 2. The Text that appears below picture \& \& 3. Additional attributes that are attached inside tag. \& For , the line would \& read: \& \& #PIC some.png#Caption Text#width=200 length=200## \& \& 4. The position of image: "left" (default), "center", "right" .Ve .Sp Note: The \f(CW\*(C`Caption Text\*(C'\fR will also become the \s-1ALT\s0 text of the image which is used in case the browser is not capable of showing pictures. You can suppress the \s-1ALT\s0 text with option \fB\-\-no\-picture\-alt\fR. .IP "Fragment identifiers for named tags" 4 .IX Item "Fragment identifiers for named tags" #REF command is used for referring to \s-1HTML\s0 tag inside current document. The whole command must be placed on one single line and cannot be broken to multiple lines. An example: .Sp .Vb 2 \& #REF #how_to_profile;(Note: profiling); \& (1) (2) \& \& 1. The NAME HTML tag reference in current document, a single word. \& This can also be a full URL link. \& You can get NAME list by enabling \-\-toc\-url\-print option. \& \& 2. The clickable text is delimited by ; characters. .Ve .IP "Referring to external documents." 4 .IX Item "Referring to external documents." \&\f(CW\*(C`#URL\*(C'\fR tag can be used to embed URLs inline, so that the full link is not visible. 
Only the shown text is used to jump to \s-1URL.\s0 This directive cannot be broken to separate lines, .Sp .Vb 4 \& #URL \& | | \& | Displayed, clickable, text \& Must be kept together .Ve .Sp An example: .Sp .Vb 1 \& See search engine #URL .Ve .SH "TABLE OF CONTENT HEADING" .IX Header "TABLE OF CONTENT HEADING" If there is heading 1, which is named exactly \*(L"Table of Contents\*(R", then all text up to next heading are discarded from the generated \s-1HTML\s0 file. This is done because program generates its own \s-1TOC.\s0 It is supposed that you use some text formatting program to generate the toc for you in .txt file and you do not maintain it manually. For example Emacs package \fItinytf.el\fR can be used. .SH "TROUBLESHOOTING" .IX Header "TROUBLESHOOTING" .SS "Generated \s-1HTML\s0 document did not look what I intended" .IX Subsection "Generated HTML document did not look what I intended" Did you use editor that inseted TABs which inserts single ascii code (\et) and 8 spaces? check our editor's settings and prefer writing in-all-space format. .PP The most common mistake is that there are extra newlines in the document. Keeep \fIone\fR empty line between headings and text, keep \fIone\fR empty line between paragraphs, keep \fIone\fR empty line between body text and bullet. Make it your mantra: \fIone\fR \fIone\fR \fIone\fR ... .PP Next, you may have put text at wrong column position. Remember that the regular text is at column 8. .PP If generated \s-1HTML\s0 suddendly starts using only one font, eg
a monospaced code font, then
you have forgotten to close the code block. Make it read even, like this:
.PP
.Vb 4
\&    Code block
\&        Code block
\&        Code block
\&    ;;  Add empty comment here to "close" the code example at column 12
.Ve
.PP
Headings start with a big letter or number, like in \*(L"Heading\*(R", not
\&\*(L"heading\*(R". Double check the spelling.
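.PP
As a minimal sketch of how the TAB problem mentioned in this section could
be checked from a shell: the first command lists the lines that contain
TAB characters and the second writes a copy where TABs are replaced by
spaces. The names \fIfile.txt\fR and \fIfile.spaces.txt\fR are only
placeholders, and the standard \fBgrep\fR\|(1) and \fBexpand\fR\|(1)
utilities are assumed to be available.
.PP
.Vb 2
\&    grep \-n "$(printf \*(Aq\et\*(Aq)" file.txt
\&    expand file.txt > file.spaces.txt
.Ve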
.SH "EXAMPLES"
.IX Header "EXAMPLES"
To print the test page and demonstrate possibilities:
.PP
.Vb 1
\&    t2html \-\-test\-page
.Ve
.PP
To make a simple \s-1HTML\s0 page without any meta information:
.PP
.Vb 2
\&    t2html \-\-title "Html Page Title" \-\-author "Mr. Foo" \e
\&           \-\-simple \-\-out \-\-print file.txt
.Ve
.PP
If you have a periodic post in email format, use \fB\-\-delete\-email\-headers\fR to
ignore the header text:
.PP
.Vb 1
\&    t2html \-\-out \-\-print \-\-delete\-email\-headers page.txt
.Ve
.PP
To convert a page quickly, with default settings:
.PP
.Vb 1
\&    t2html \-\-out \-\-print page.txt
.Ve
.PP
To convert a page from a text document, including meta tags, buttons, colors
and frames. Pay attention to the switch \fI\-\-html\-body\fR, which defines the
document language.
.PP
.Vb 10
\&    t2html                                              \e
\&    \-\-print                                             \e
\&    \-\-out                                               \e
\&    \-\-author    "Mr. foo"                               \e
\&    \-\-email     "foo@example.com"                       \e
\&    \-\-title     "This is manual page of page BAR"       \e
\&    \-\-html\-body LANG=en                                 \e
\&    \-\-button\-prev  previous.html                        \e
\&    \-\-button\-top   index.html                           \e
\&    \-\-button\-next  next.html                            \e
\&    \-\-document  http://example.com/dir/this\-page.html   \e
\&    \-\-url       manual.html                             \e
\&    \-\-css\-code\-bg                                       \e
\&    \-\-css\-code\-note \*(Aq(?:Note|Notice|Warning):\*(Aq          \e
\&    \-\-html\-frame                                        \e
\&    \-\-disclaimer\-file   $HOME/txt/my\-html\-footer.txt    \e
\&    \-\-meta\-keywords    "language\-en,manual,program"     \e
\&    \-\-meta\-description "Bar program to do this that and more of those" \e
\&    manual.txt
.Ve
.PP
To check links and print the status of every link together with the http error
message (most verbose):
.PP
.Vb 1
\&    t2html \-\-link\-check file.txt | tee link\-error.log
.Ve
.PP
To print only problematic links:
.PP
.Vb 1
\&    t2html \-\-link\-check \-\-quiet file.txt | tee link\-error.log
.Ve
.PP
To print terse output in grep \-n like manner: line number, link and
error code:
.PP
.Vb 1
\&    t2html \-\-link\-check\-single \-\-quiet file.txt | tee link\-error.log
.Ve
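.PP
Because the terse output follows the \fI\s-1FILE:LINE:MESSAGE\s0\fR layout, it
can be post-processed with standard text tools. A possible sketch, assuming
that the messages arrive on stderr and that file names contain no colon,
prints each reported file and line number only once:
.PP
.Vb 2
\&    t2html \-\-link\-check\-single \-\-quiet file.txt 2>&1 |
\&        cut \-d: \-f1,2 | sort \-u
.Ve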
.PP
To check links from multiple pages and cache good links to a separate file,
use the \fB\-\-link\-cache\fR option. The next link check will run much faster
because cached valid links will not be fetched again. Delete the link cache
file at regular intervals to force a complete check.
.PP
.Vb 3
\&    t2html \-\-link\-check\-single \e
\&           \-\-link\-cache $HOME/tmp/link.cache \e
\&           \-\-quiet file.txt
.Ve
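.PP
The periodic cleanup could be automated, for instance, with a
\&\fBcrontab\fR\|(5) entry. This is only a sketch: it assumes that cron is
available, the schedule (03:00 on the first day of each month) is an
arbitrary choice, and the cache path is the one used in the example above.
.PP
.Vb 1
\&    0 3 1 * * rm \-f "$HOME/tmp/link.cache"
.Ve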
.PP
To split a large document into pieces, and convert each piece to \s-1HTML:\s0
.PP
.Vb 1
\&    t2html \-\-split1 \-\-split\-name file.txt | t2html \-\-simple \-\-out
.Ve
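.PP
The same job can be sketched as an explicit loop. This assumes that the
partial files are written next to the input file with the
\&\fIfilename.txt\-NNN\fR naming described for \fB\-\-split\fR, and that no
other files in the directory match the pattern:
.PP
.Vb 4
\&    t2html \-\-split1 file.txt
\&    for part in file.txt\-[0\-9][0\-9][0\-9]; do
\&        t2html \-\-simple \-\-out "$part"
\&    done
.Ve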
.SH "ENVIRONMENT"
.IX Header "ENVIRONMENT"
.IP "\fB\s-1EMAIL\s0\fR" 4
.IX Item "EMAIL"
If environment variable \fI\s-1EMAIL\s0\fR is defined, it is used in footer for
contact address. Option \fB\-\-email\fR overrides environment setting.
.IP "\fB\s-1LANG\s0\fR" 4
.IX Item "LANG"
The default language setting for switch \f(CW\*(C`\-\-language\*(C'\fR. Make sure the
first two characters contain the language definition, as in:
LANG=en.iso88591
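.Sp
A minimal illustration, assuming a Finnish locale name such as
fi_FI.UTF\-8 (the locale name is only an example; any value whose first two
characters form a supported language code should behave the same way). The
second command sets the language explicitly instead of relying on the
environment:
.Sp
.Vb 2
\&    LANG=fi_FI.UTF\-8 t2html file.txt > file.html
\&    t2html \-\-language fi file.txt > file.html
.Ve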
.SH "SEE ALSO"
.IX Header "SEE ALSO"
\&\fBasciidoc\fR\|(1)
\&\fBhtml2ps\fR\|(1)
\&\fBhtmlpp\fR\|(1)
\&\fBmarkdown\fR\|(1)
.SS "Related programs"
.IX Subsection "Related programs"
Jan K\(:arrman has written html2ps in Perl; on 2004\-11\-11 it was
available at http://www.tdb.uu.se/~jan/html2ps.html
.PP
\&\s-1HTML\s0 validator is at http://validator.w3.org/
.PP
iMATIX created htmlpp which is available from http://www.imatix.com and seen
2014\-03\-05 at http://legacy.imatix.com/html/htmlpp
.PP
An Emacs minor mode that helps in writing documents based on the \s-1TF\s0 layout is
available. See package tinytf.el in project
https://github.com/jaalto/project\-\-emacs\-tiny\-tools
.SS "Standards"
.IX Subsection "Standards"
\&\s-1RFC\s0 \fB1766\fR contains the list of language codes; see
http://www.rfc.net/
.PP
Latest \s-1HTML/XHTML\s0 and \s-1CSS\s0 specifications are at http://www.w3c.org/
.SS "\s-1ISO\s0 standards"
.IX Subsection "ISO standards"
\&\fB639\fR Code for the representation of the names of languages
http://www.oasis\-open.org/cover/iso639a.html
.PP
\&\fB3166\fR Standard Country Codes
http://www.niso.org/3166.html and
http://www.netstrider.com/tutorials/HTMLRef/standards/
.SH "BUGS"
.IX Header "BUGS"
The implementation was originally designed to work linewise, so it is
unfortunately impossible to add or modify any existing feature to look for
items that span more than one line.
.PP
As the option \fB\-\-xhtml\fR was added much later, it may not produce
completely syntactically valid markup.
.SH "SCRIPT CATEGORIES"
.IX Header "SCRIPT CATEGORIES"
CPAN/Administrative
html
.SH "PREREQUISITES"
.IX Header "PREREQUISITES"
No additional Perl \s-1CPAN\s0 modules needed for text to \s-1HTML\s0 conversion.
.SH "COREQUISITES"
.IX Header "COREQUISITES"
If the link check feature is used to validate \s-1URL\s0 links, the following
modules are needed from Perl \s-1CPAN:\s0 \f(CW\*(C`LWP::UserAgent\*(C'\fR, \f(CW\*(C`HTML::FormatText\*(C'\fR
and \f(CW\*(C`HTML::Parse\*(C'\fR.
.PP
If module \f(CW\*(C`HTML::LinkExtractor\*(C'\fR is available, it is used
instead of the included link extracting algorithm.
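.PP
One way to check beforehand whether these modules are installed is the
generic Perl one\-liner below (a common idiom, not a feature of t2html). Each
command prints nothing and exits with status 0 when the module can be loaded,
and reports an error otherwise:
.PP
.Vb 2
\&    perl \-MLWP::UserAgent \-e 1
\&    perl \-MHTML::LinkExtractor \-e 1
.Ve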
.SH "AVAILABILITY"
.IX Header "AVAILABILITY"
Homepage is at https://github.com/jaalto/project\-\-perl\-text2html
.SH "AUTHOR"
.IX Header "AUTHOR"
Copyright (C) 1996\-2020 Jari Aalto
.PP
This program is free software; you can redistribute and/or modify the
program under the terms of the \s-1GNU\s0 General Public License, either version 2
of the License, or (at your option) any later version.
.PP
This documentation may be distributed subject to the terms and
conditions set forth in \s-1GNU\s0 General Public License v2 or later; or, at
your option, distributed under the terms of \s-1GNU\s0 Free Documentation
License version 1.2 or later (\s-1GNU FDL\s0).
perl-text2html-master/bin/t2html.pl
#!/usr/bin/perl
#
#   t2html -- Perl, text2html converter. Uses Technical text format (TF)
#
#   Copyright information
#
#       Copyright (C) 1996-2020 Jari Aalto
#
#   License
#
#       This program is free software; you can redistribute it and/or modify
#       it under the terms of the GNU General Public License as published by
#       the Free Software Foundation; either version 2 of the License, or
#       (at your option) any later version.
#
#       This program is distributed in the hope that it will be useful,
#       but WITHOUT ANY WARRANTY; without even the implied warranty of
#       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
#       GNU General Public License for more details.
#
#       You should have received a copy of the GNU General Public License
#       along with this program. If not, see <http://www.gnu.org/licenses/>.
#
#   Introduction
#
#       Please start this perl script with option
#
#           --help      to get the help page
#
#   Description
#
#       The program converts text files that are written in rigid
#       (T)echnical layout (f)ormat to html pages. See --help for
#       explanation of the format.
#
#       There is an Emacs package that helps in writing and formatting
#       text files. See "Emacs Tiny Tools" project:
#
#           tinytf.el
#
#   Profiling results
#
#       The Devel::DProf profiling results for 560k text file. Time in
#       seconds is User time.
#
#           perl -d:DProf ./t2html page.txt > /dev/null
#
#           Time Seconds     #Calls sec/call Name
#           52.1   22.96      12880   0.0018 main::DoLine
#           8.31   3.660      19702   0.0002 main::IsHeading
#           5.72   2.520       9853   0.0003 main::XlatUrl
#           5.56   2.450       9853   0.0002 main::XlatMailto
#           5.22   2.300          1   2.3000 main::HandleOneFile
#           4.22   1.860       9853   0.0002 main::XlatHtml
#           4.06   1.790       9853   0.0002 main::IsBullet
#           3.18   1.400       9853   0.0001 main::XlatRef
#           1.77   0.780          1   0.7800 main::KillToc
#           1.43   0.630          1   0.6300 Text::Tabs::expand
#           1.09   0.480          1   0.4800 main::PrintEnd
#           0.61   0.270        353   0.0008 main::MakeHeadingName
#           0.57   0.250          1   0.2500 main::CODE(0x401e4fb0)
#           0.48   0.210          1   0.2100 LWP::UserAgent::CODE(0x4023394c)
#           0.41   0.180          1   0.1800 main::PrintHtmlDoc

# "Named Capture Buffers" are used
# use 5.10.0;

# ****************************************************************************
#
#   Globals
#
# ****************************************************************************

use vars qw ($VERSION);

#   This is for use of Makefile.PL and ExtUtils::MakeMaker
#
#   The following variable is updated by Emacs setup whenever
#   this file is saved.

$VERSION = '2020.0819.0706';

# ****************************************************************************
#
#   Standard perl modules
#
# ****************************************************************************

use strict;

use autouse 'Carp'          => qw(croak carp cluck confess);
use autouse 'Pod::Html'     => qw(pod2html);

# Perl 5.x bug, doesn't work
# use autouse 'Pod::Text'     => qw(pod2text);

use Pod::Man;

use locale;
use Cwd;
use English;
use File::Basename;
use Getopt::Long;
use Text::Tabs;

IMPORT:
{
    use Env;
    use vars qw
    (
	$HOME
	$TEMP
	$TEMPDIR
	$PATH
	$LANG
    );
}

# }}}
# {{{ Initial setup

# ****************************************************************************
#
#   DESCRIPTION
#
#       Ignore HERE document indentation. Use function like this
#
#           @var = Here << "EOF";
#                   Indented text
#                   Indented text
#           EOF
#
#   INPUT PARAMETERS
#
#       $str    HERE document text
#
#   RETURN VALUES
#
#       $str    Text with leading whitespace removed from every line
#
# ****************************************************************************

sub Here ($)
{
    (my $str = shift) =~ s/^\s+//gm;
    $str
}

# ****************************************************************************
#
#   DESCRIPTION
#
#       Preserve first whitespace indentation. See Perl Cookbook 1.11 p.23
#
#   INPUT PARAMETERS
#
#       $str    HERE document text
#
#   RETURN VALUES
#
#       $str    Text with the common leading marker or indentation removed
#
# ****************************************************************************

sub HereQuote ($)
{
    local $ARG = shift;

    my ($white, $lead);

    if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) #Emacs font-lock fix s//
    {
	($white, $lead) = ($2, quotemeta $1);
    }
    else
    {
	($white, $lead) = (/^(\s+)/, '');
    }

    s/^\s*?$lead(?:$white)?//gm;

    $ARG;
}
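
#   Usage sketch for HereQuote() (illustrative only; the variable name
#   and the quoted text are assumptions, not taken from this program).
#   When every line starts with the same non-word marker, such as "> ",
#   the marker and the whitespace in front of it are stripped:
#
#       my $text = HereQuote <<"EOF";
#           > First line
#           > Second line
#       EOF
#
#   The expected result is "First line\nSecond line\n". Without a common
#   marker, the indentation of the first line is used as the prefix to
#   remove instead.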

# ****************************************************************************
#
#   DESCRIPTION
#
#       Set global variables for the program
#
#   INPUT PARAMETERS
#
#       none
#
#   RETURN VALUES
#
#       none
#
# ****************************************************************************

sub Initialize ()
{
    # ........................................... internal variables ...

    use vars qw
    (
	$HTTP_CODE_OK
	$LIB
	$PROGNAME
        $LICENSE
        $AUTHOR
	$URL
	%HTML_HASH
	$debug
    );

    $PROGNAME   = "t2html";
    $LIB        = $PROGNAME;      # library where each function belongs: PROGNAME

    $LICENSE	= "GPL-2+";
    $AUTHOR     = "Jari Aalto";
    $URL        = "https://github.com/jaalto/project--perl-text2html";

    $OUTPUT_AUTOFLUSH = 1;
    $HTTP_CODE_OK     = 200;

    # ................................ globals gathered when running ...

    use vars qw
    (
	@HEADING_ARRAY
	%HEADING_HASH
	%LINK_HASH
	%LINK_HASH_CODE
    );

    @HEADING_ARRAY  = ();
    %HEADING_HASH   = ();
    %LINK_HASH      = ();   # Links that are invalid: 'link' -- errCode
    %LINK_HASH_CODE = ();   # Error code table: errCode -- 'text'

    # .................................................... constants ...

    use vars qw
    (
	$OUTPUT_TYPE_SIMPLE
	$OUTPUT_TYPE_QUIET
	$OUTPUT_TYPE_UNDEFINED

	$BULLET_TYPE_NUMBERED
	$BULLET_TYPE_NORMAL
    );

    #   Some constants:  old Perl style. New Perl uses "use constant"
    #   I like these better, because you can use "$" in front of variables.
    #   With "use constant" you cannot use "$".

    *OUTPUT_TYPE_SIMPLE    = \-simple;
    *OUTPUT_TYPE_QUIET     = \-quiet;
    *OUTPUT_TYPE_UNDEFINED = \-undefined;

    *BULLET_TYPE_NUMBERED = \-numbered;
    *BULLET_TYPE_NORMAL   = \-normal;

    use vars qw(%COLUMN_HASH);

    %COLUMN_HASH =
    (
	"" => ""

	, beg7  => qq(

) , end7 => "" , beg9 => qq(

) , end9 => "" , beg10 => qq(

) , end10 => "" , beg7quote => qq() , end7quote => "" , begemp => qq() , endemp => "" , begbold => qq() , endbold => "" , begquote => qq() , endquote => "" , begsmall => qq() , endsmall => "" , begbig => qq() , endbig => "" , begref => qq() , endref => "" , superscriptbeg => qq() , superscriptend => "" , subscriptbeg => qq() , subscriptend => "" ); # ..................................................... language ... # There are some visible LANGUAGE dependent things which must # be changed. the internal HTML, NAMES and all can be in English. use vars qw(%LANGUAGE_HASH); %LANGUAGE_HASH = ( -toc => { en => 'Table of Contents' # U.S. English -- all caps , es => 'Tabla de Contenidos' , fi => 'Sisällysluettelo' }, -pic => { en => 'Picture' , fi => 'Kuva' , de => 'Bilde' } ); # .......................................................... dtd ... sub Here($); my $doctype = Here <<"EOF"; EOF my $doctype_frame = HereQuote <<"EOF"; EOF %HTML_HASH = ( doctype => $doctype , doctype_frame => $doctype_frame , beg => "" , end => "" , br => "
" , hr => "


" , pbeg => "

" , pend => "" ); # ............................................... css properties ... use vars qw ( $CSS_BODY_FONT_TYPE_NORMAL $CSS_BODY_FONT_TYPE_READABLE $CSS_BODY_FONT_SIZE_FRAME $CSS_BODY_FONT_SIZE_NORMAL ); $CSS_BODY_FONT_TYPE_NORMAL = qq("Times New Roman", serif;); $CSS_BODY_FONT_TYPE_READABLE = qq(verdana, arial, helvetica, sans-serif;); $CSS_BODY_FONT_SIZE_FRAME = qq(0.6em; /* relative, 8pt */); $CSS_BODY_FONT_SIZE_NORMAL = qq(12pt; /* points */); # ............................................. run time globals ... use vars qw ( $ARG_PATH $ARG_FILE $ARG_DIR ); } # }}} # {{{ Args parsing # ************************************************************** &args ******* # # DESCRIPTION # # Read command line options from file. This is necessary, because # many operating systems have a limit how long and how many options # can be passed in command line. The file can have "#" comments and # options spread on multiple lines. # # Putting the options to separate file overcomes this limitation. # # INPUT PARAMETERS # # $file File where the command line call is. 
# # RETURN VALUES # # @array Like if you got the options via @ARGV # # **************************************************************************** sub HandleCommandLineArgsFromFile ($) { my $id = "$LIB.HandleCommandLineArgsFromFile"; my ($file) = @ARG; local (*FILE, $ARG); my (@arr, $line); unless (open FILE, $file) { die "$id: Cannot open file [$file] $ERRNO"; } while (defined($ARG = )) { s/#\s.*//g; # Delete comments next if /^\s*$/; # if empty line s/^\s+//; # trim leading and trailing spaces s/\s+$//; #font-lock s // $debug and warn "$id: ADD => $ARG\n"; $line .= $ARG; } # Now comes the difficult part, We can't just split()' # Because thre may be options like # # --autor "John doe" # # Which soule beome as split() # # --author # "John # Doe" # # But it should really be two arguments # # --author # John doe $ARG = $line; while ($ARG ne "") { s/^\s+//; if (/^(-+\S+)(.*)/) #font-lock s// { $debug and warn "$id: PARSE option $1\n"; push @arr, $1; $ARG = $2; } elsif (/^[\"]([^\"]*)[\"](.*)/) #font-lock s// { $debug and warn "$id: PARSE dquote $1\n"; push @arr, $1; $ARG = $2; } elsif (/^'([^']*)'(.*)/) #font-lock s'/ { $debug and warn "$id: PARSE squote $1\n"; push @arr, $1; $ARG = $2; } elsif (/^(\S+)(.*)/) #font-lock s// # { $debug and warn "$id: PARSE value $1\n"; push @arr, $1; $ARG = $2; } } close FILE; @arr; } # **************************************************************************** # # DESCRIPTION # # Return version string # # INPUT PARAMETERS # # none # # RETURN VALUES # # string # # **************************************************************************** sub Version () { "$VERSION"; } sub VersionInfo () { Version() . " $AUTHOR $LICENSE $URL" } sub VersionPrint () { print(VersionInfo() . 
"\n"); exit 0; } # ************************************************************** &args ******* # # DESCRIPTION # # Read and interpret command line arguments # # INPUT PARAMETERS # # none # # RETURN VALUES # # none # # **************************************************************************** sub HandleCommandLineArgs () { my $id = "$LIB.HandleCommandLineArgs"; local $ARG; $debug and print "$id: start\n"; # ....................................... options but not globals ... # The variables are defined in Getopt, but they are locally used # only inside this fucntion my $deleteDefault; my $versionOption; # .......................................... command line options ... # Global variables use vars qw ( $AS_IS $AUTHOR $BASE $BASE_URL $BASE_URL_ALL $BUT_NEXT $BUT_PREV $BUT_TOP $CSS_CODE_STYLE $CSS_CODE_STYLE_ATTRIBUTES $CSS_CODE_STYLE_NOTE $CSS_FONT_SIZE $CSS_FONT_TYPE $DELETE_EMAIL $DELETE_REGEXP $DISCLAIMER_FILE $DOC $DOC_URL $FONT $FORGET_HEAD_NUMBERS $FRAME $HTML_BODY_ATTRIBUTES $JAVA_CODE $LANG_ISO $LINK_CHECK $LINK_CHECK_ERR_TEXT_ONE_LINE $META_DESC $META_KEYWORDS $NAME_UNIQ $OBEY_T2HTML_DIRECTIVES $OPT_AUTO_DETECT $OPT_EMAIL $OPT_HEADING_TOP_BUTTON $OUTPUT_AUTOMATIC $OUTPUT_DIR $OUTPUT_SIMPLE $OUTPUT_TYPE $PICTURE_ALT $PRINT $PRINT_NAME_REFS $PRINT_URL $QUIET $SCRIPT_FILE $SPLIT1 $SPLIT2 $SPLIT_NAME_FILENAMES $SPLIT_REGEXP $TITLE $XHTML_RENDER @CSS_FILE %REFERENCE_HASH $debug $time $verb ); # When heading string is read, forget the numbering by default # # 1.1 heading --> "Heading" $FORGET_HEAD_NUMBERS = 1; # When gathering toc jump points, NAME AHREF="" # # NAME_UNIQ if 1, then use sequential numbers for headings # PRINT_NAME_REFS if 1, print to stderr the gathered NAME REFS. $NAME_UNIQ = 0; $PRINT_NAME_REFS = 0; $PICTURE_ALT = 1; # add ALT="picture 1" to images # ................................................... link check ... # The LWP module is optional and we raise a flag # if we were able to import it. 
See function CheckModuleLWP() # # LINK_CHECK requires that LWP module is present use vars qw ( $MODULE_LWP_OK $MODULE_LINKEXTRACTOR_OK ); $MODULE_LWP_OK = 0; $MODULE_LINKEXTRACTOR_OK = 0; # ..................................................... language ... $LANG_ISO = "en"; # Standard ISO language name, two chars if (defined $LANG and $LANG =~ /^[a-z][a-z]/i) # s/ environment var { $LANG_ISO = lc $LANG; } # ......................................................... Other ... $debug and PrintArray("$id: before options-file", \@ARGV); $ARG = join ' ', @ARGV; if (/(--options?-file(?:=|\s+)(\S+))/) # s/ { my $opt = $1; my $file = $2; my @argv; for my $arg (@ARGV) # Remove option { next if $arg eq $opt; push @argv, $arg; } # Merge options @ARGV = (@argv, HandleCommandLineArgsFromFile($file)); } my @argv = @ARGV; # Save value for debugging; $debug and PrintArray("$id: after options-file", \@ARGV); # .................................................. column-args ... # Remember that shell eats the double spaces. # --html-column-beg="10 " --> # --html-column-beg=10 my ($key, $tag, $val, $email); for (@ARGV) { if (/--html-column-(beg|end)/) { if (/--html-column-(beg|end)=(\w+) +(.+)/) #font-lock s// { ($key, $tag, $val) = ($1, $2, $3); $COLUMN_HASH{ $key . $tag } = $val; $debug and warn "$key$tag ==> $val\n"; } else { warn "Unregognized option: $ARG"; } } } @ARGV = grep ! /--html-column-/, @ARGV; $debug and PrintArray("$id: after for-loop checks", \@ARGV); $BASE = ""; my (@reference , $referenceSeparator); my ($fontNormal, $fontReadable, $linkCacheFile); my ($help, $helpHTML, $helpMan, $helpCss); my ( $version, $testpage, $code3d); my ( $codeBg, $codeBg2, $codeNote); # .................................................... read args ... 
# $Getopt::Long::debug = 1; Getopt::Long::config(qw ( no_ignore_case no_ignore_case_always )); $debug and PrintArray("$id: before GetOption", \@ARGV); # The doubling quitet '-cw' check which would say # Name "Getopt::DEBUG" used only once: possible typo at ... $Getopt::DEBUG = 1; $Getopt::DEBUG = 1; GetOptions # Getopt::Long ( "debug:i" => \$debug , "d:i" => \$debug , "h|help" => \$help , "help-html" => \$helpHTML , "help-man" => \$helpMan , "help-css" => \$helpCss , "test-page" => \$testpage , "V|version" => \$version , "verbose:i" => \$verb , "A|auto-detect" => \$OPT_AUTO_DETECT , "as-is" => \$AS_IS , "author=s" => \$AUTHOR , "email=s" => \$email , "base=s" => \$BASE , "document=s" => \$DOC , "disclaimer-file=s" => \$DISCLAIMER_FILE , "t|title=s" => \$TITLE , "language=s" => \$LANG_ISO , "button-previous=s" => \$BUT_PREV , "button-next=s" => \$BUT_NEXT , "button-top=s" => \$BUT_TOP , "button-heading-top" => \$OPT_HEADING_TOP_BUTTON , "html-body=s" => \$HTML_BODY_ATTRIBUTES , "html-font=s" => \$FONT , "F|html-frame" => \$FRAME , "script-file=s" => \$SCRIPT_FILE , "css-file=s" => \@CSS_FILE , "css-font-type=s" => \$CSS_FONT_TYPE , "css-font-size=s" => \$CSS_FONT_SIZE , "css-font-normal" => \$fontNormal , "css-font-readable" => \$fontReadable , "css-code-note=s" => \$codeNote , "css-code-3d" => \$code3d , "css-code-bg" => \$codeBg , "css-code-bg2" => \$codeBg2 , "delete-lines=s" => \$DELETE_REGEXP , "delete-email-headers" => \$DELETE_EMAIL , "delete-default!" => \$deleteDefault , "name-uniq" => \$NAME_UNIQ , "T|toc-url-print" => \$PRINT_NAME_REFS , "url=s" => \$DOC_URL , "simple" => \$OUTPUT_SIMPLE , "quiet" => \$QUIET , "print" => \$PRINT , "P|print-url" => \$PRINT_URL , "time" => \$time , "picture-alt!" => \$PICTURE_ALT , "split=s" => \$SPLIT_REGEXP , "S1|split1" => \$SPLIT1 , "S2|split2" => \$SPLIT2 , "SN|split-name-files" => \$SPLIT_NAME_FILENAMES , "t2html-tags!" 
=> \$OBEY_T2HTML_DIRECTIVES , "out" => \$OUTPUT_AUTOMATIC , "O|out-dir=s" => \$OUTPUT_DIR , "R|reference-separator=s@" => \$referenceSeparator , "reference=s@" => \@reference , "link-check" => \$LINK_CHECK , "L|link-check-single" => \$LINK_CHECK_ERR_TEXT_ONE_LINE , "link-cache=s" => \$linkCacheFile , "X|xhtml" => \$XHTML_RENDER , "meta-description=s" => \$META_DESC , "meta-keywords=s" => \$META_KEYWORDS ); $verb = 1 if defined $verb and $verb == 0; $verb = 0 if ! defined $verb; if ($debug) { warn "$id: ARGV => [@ARGV]\n"; PrintArray("$id: ARGV after getopt", \@ARGV); $verb = 10; } else { $debug = 0; } $version and VersionPrint(); $help and Help(); $helpCss and HelpCss(); $helpHTML and Help(undef, -html); $helpMan and Help(undef, -man); $testpage and TestPage(); if ($XHTML_RENDER) { my $doctype = Here <<"EOF"; EOF # xml:lang="" lang="" my $begin = qq(); $HTML_HASH{doctype} = $doctype; @HTML_HASH{qw(br hr pend)} = ("
", "


", "

"); } if (defined $OPT_HEADING_TOP_BUTTON) { $OPT_HEADING_TOP_BUTTON = 1; } if (defined $code3d) { $CSS_CODE_STYLE = -d3; $CSS_CODE_STYLE_ATTRIBUTES = $code3d if $code3d =~ /[a-z]/i } elsif (defined $codeBg) { $CSS_CODE_STYLE = -shade; $CSS_CODE_STYLE_ATTRIBUTES = $codeBg if $codeBg =~ /[a-z]/i } elsif (defined $codeBg2) { $CSS_CODE_STYLE = -shade2; $CSS_CODE_STYLE_ATTRIBUTES = $codeBg2 if $codeBg2 =~ /[a-z]/i } unless ($CSS_CODE_STYLE) { $CSS_CODE_STYLE = -notset; } if (defined $codeNote) { if ($CSS_CODE_STYLE eq -notset) { die "$id: Which css style you want with --css-code-note? " . "Please select one of -css-code-* options."; } $ARG = $codeNote; unless (/\S/) { die "$id: You must supply search regexp: --css-code-note='REGEXP'"; } if (s/\(([^?])/(?:$1/g) { $verb and warn "$id: Incorrect --css-code-note." , " Must use non-grouping parens in regexp." , " Fixed to format: $ARG "; } $CSS_CODE_STYLE_NOTE = $ARG; } else { $CSS_CODE_STYLE_NOTE = 'Note:'; } unless (defined $OBEY_T2HTML_DIRECTIVES) { $OBEY_T2HTML_DIRECTIVES = 1; } $LINK_CHECK = 1 if $LINK_CHECK_ERR_TEXT_ONE_LINE; if ($linkCacheFile) { LinkCache(-action => '-read', -arg => $linkCacheFile); } for (@reference) { my $sep = $referenceSeparator || "="; my ($key, $value) = split /$sep/, $ARG; #font-lock s/ unless ($key and $value) { die "No separator [$sep] found from --reference [$ARG]"; } $REFERENCE_HASH{ $key } = $value; $debug and warn "$id: [$ARG] Making reference [$key] => [$value]\n"; } if ($LANG_ISO !~ /^[a-z][a-z]/) #font s/ { die "$id: --language setting must contain two character ISO 639 code." } else { my $lang = substr lc $LANG_ISO, 0, 2; if (exists $LANGUAGE_HASH{-toc }{$lang}) { $LANG_ISO = $lang; } else { warn "$id: Language [$LANG_ISO] is not supported, please contact " , "maintainer. Switched to English." ; $LANG_ISO = "en"; } } if (defined $email) { $OPT_EMAIL = $email; } else { $OPT_EMAIL = ''; } if (defined $DOC_URL) { local $ARG = $DOC_URL; m,/$, and die "$id: trailing slash in --url ? 
[$DOC_URL]"; #font m" } if (defined $OUTPUT_DIR and $OUTPUT_DIR eq "none") #font m" { undef $OUTPUT_DIR; } $OUTPUT_DIR and $OUTPUT_AUTOMATIC = 1; if ($FRAME and $XHTML_RENDER) { die "$id: Conflicting options --html-frame and --xhtml. Use only one."; } if ($FRAME) { $OUTPUT_AUTOMATIC = 1; } if (not defined $deleteDefault or $deleteDefault == 1) { # Delete Emacs folding.el marks that keeps text in sections. #fl # # # {{{ Folding begin mark # # }}} Folding end mark # # Delete also comments # # #_COMMENT $DELETE_REGEXP = '^(?:#\s*)?([{]{3}|[}]{3}|(#_comment(?i)))' } if ($BASE ne '') { $BASE_URL_ALL = $BASE; # copy original local $ARG = $BASE; s,\n,,g; # No newlines # If /users/foo/ given, treat as file access protocol m,^/, and $ARG = "file:$ARG"; #font s, # To ensure that we really get filename not m,/, and die "Base must contain slash, URI [$ARG]"; #font m" warn "Base may need trailing slash: $ARG" if /file/ and not m,/$,; # Exclude the filename part $BASE_URL = $ARG; $BASE_URL = $1 if m,(.*)/,; } if (@CSS_FILE and @CSS_FILE) { $JAVA_CODE = ''; for my $file (@CSS_FILE) { $JAVA_CODE .= qq(\n); } } if (defined $SCRIPT_FILE and $SCRIPT_FILE ne '') { local *FILE; $debug and print "$id: Reading CSS and Java definitions form $SCRIPT_FILE\n"; if (open FILE, "<", $SCRIPT_FILE) { $JAVA_CODE = join '', ; close FILE; } else { warn "$id: Couldn't read [$SCRIPT_FILE] $ERRNO"; $JAVA_CODE = ""; } } if ($LINK_CHECK) { $LINK_CHECK = 1; $MODULE_LWP_OK = CheckModule('LWP::UserAgent'); # http://search.cpan.org/author/PODMASTER/HTML-LinkExtractor-0.07/LinkExtractor.pm $MODULE_LINKEXTRACTOR_OK = CheckModule('HTML::LinkExtractor'); unless ($MODULE_LWP_OK) { die "Need library LWP::UserAgent to check links."; } } $OUTPUT_TYPE = $OUTPUT_TYPE_UNDEFINED; $OUTPUT_TYPE = $OUTPUT_TYPE_SIMPLE if $OUTPUT_SIMPLE; $OUTPUT_TYPE = $OUTPUT_TYPE_QUIET if $QUIET; if (defined $OPT_AUTO_DETECT) { if ($OPT_AUTO_DETECT =~ /^$|^\d+$/) { # Default value $OPT_AUTO_DETECT = "(?i)#T2HTML-"; } } if (defined $SPLIT1) 
{ $SPLIT_REGEXP = '^([.0-9]+ )?[A-Z][a-z0-9]'; $debug and warn "$id: SPLIT_REGEXP = $SPLIT_REGEXP\n"; } if (defined $SPLIT2) { $SPLIT_REGEXP = '^ ([.0-9]+ )?[A-Z][a-z0-9]'; $debug and warn "$id: SPLIT_REGEXP = $SPLIT_REGEXP\n"; } use vars qw($HOME_ABS_PATH); if (defined $PRINT_URL) { # We can't print absolute references like: # file:/usr136/users/PM3/foo/file.html because that cannot # be swallowed by browser. We must canonilise it to $HOME # format file:///users/foo/file.html # # Find out where is HOME my $previous = cwd(); if (defined $HOME and $HOME ne '') { chdir $HOME; $HOME_ABS_PATH = cwd(); chdir $previous; } } if ($AS_IS) { $BUT_TOP = $BUT_PREV = $BUT_NEXT = ""; } # .................................................... css fonts ... unless (defined $CSS_FONT_SIZE) { # $CSS_FONT_SIZE = $CSS_BODY_FONT_SIZE_NORMAL; } unless (defined $CSS_FONT_TYPE) { $CSS_FONT_TYPE = $CSS_BODY_FONT_TYPE_NORMAL; } if ($fontNormal) { $CSS_FONT_TYPE = $CSS_BODY_FONT_TYPE_NORMAL; } elsif ($fontReadable) { $CSS_FONT_TYPE = $CSS_BODY_FONT_TYPE_READABLE } if ($AS_IS and $FRAME) { warn "$id: [WARNING] --as-is cancels option --html-frame." . " Did you mean --quiet?"; } $debug and PrintArray("$id: end [debug=$debug]", \@ARGV); } # }}} # {{{ usage/help # ***************************************************************** help **** # # DESCRIPTION # # Print help and exit. # # INPUT PARAMETERS # # $msg [optional] Reason why function was called. # # RETURN VALUES # # none # # **************************************************************************** =pod =encoding UTF-8 =head1 NAME t2html - Simple text to HTML converter. Relies on text indentation rules. =head1 SYNOPSIS t2html [options] file.txt > file.html =head1 DESCRIPTION Convert pure text files into nice looking, possibly framed, HTML pages. An example of conversion: 1. Plain text source code http://pm-doc.git.sourceforge.net/git/gitweb.cgi?p=pm-doc/pm-doc;a=blob_plain;f=doc/index.txt;hb=HEAD 2. 
reusult of conversion with custom --css-file option: http://pm-doc.sourceforge.net/pm-tips.html http://pm-doc.sourceforge.net/pm-tips.css 3. An Emacs mode tinytf.el for writing the text files (optional) https://github.com/jaalto/project--emacs-tiny-tools B The file must be written in Technical Format, whose layout is described in the this manual. Basically the idea is simple and there are only two heading levels: one at column 0 and the other at column 4 (halfway between the tab width). Standard text starts at column 8 (the position after pressed tab-key). The idea of technical format is that each column represents different rendering layout in the generated HTML. There is no special markup needed in the text file, so you can use the text version as a master copy of a FAQ etc. Bullets, numbered lists, word emphasis and quotation etc. can expressed in natural way. B The generated HTML includes embedded Cascading Style Sheet 2 (CSS2) and a small piece of Java code. The CSS2 is used to colorize the page loyout and to define suitable printing font sizes. The generated HTML also takes an approach to support XHTML. See page http://www.w3.org/TR/xhtml1/#guidelines where the backward compatibility recommendations are outlined: Legal HTML XHTML requires

..






XHTML does not support fragment identifiers #foo, with the C element, but uses C instead. For backward compatibility both elements are defined: < ..name="tag"> Is now <.. name="tag" id="tag"> NOTE: This program was never designed to be used for XHTML and the strict XHTML validity is not to be expected. B The easiest format to write large documents, like FAQs, is text. A text file offers WysiWyg editing and it can be turned easily into HTML format. Text files are easily maintained and there is no requirements for special text editors. Any text editor like notepad, vi, Emacs can be used to maintain the documents. Text files are also the only sensible format if documents are kept under version control like RCS, CVS, SVN, Arch, Perforce, ClearCase. They can be asily compared with diff and patches can be easily received and sent to them. To help maintining large documents, there is also available an I minor mode, package called I, which offers text fontification with colors, Indentation control, bullet filling, heading renumbering, word markup, syntax highlighting etc. See https://github.com/jaalto/project--emacs-tiny-tools =head1 OPTIONS =head2 Html: Header and Footer options =over 4 =item B<--as-is> Any extra HTML formatting or text manipulation is suppressed. Text is preserved as it appears in file. Use this option if you plan to deliver or and print the text as seen. o If file contains "Table of Contents" it is not removed o Table of Content block is not created (it usually would) =item B<--author -a STR> Author of document e.g. B<--author "John Doe"> =item B<--disclaimer-file> FILE The text that appears at the footer is read from this file. If not given the default copyright text is added. Options C<--quiet> and C<--simple> suppress disclaimers. =item B<--document FILE> B of the document or filename. You could list all alternative URLs to the document with this option. =item B<--email -e EMAIL> The contact address of the author of the document. 
Must be pure email address with no "<" and ">" characters included. Eg. B<--email foo@example.com> --email "" WRONG --email "me@here.com" right =item B<--simple> B<-s> Print minimum footer only: contact, email and date. Use C<--quiet> to completely discard footer. =item B<--t2html-tags> Allow processing embedded #T2HTML- directives inside file. See full explanation by reading topic C. By default, you do not need to to supply this option - it is "on" by default. To disregard embedded directives in text file, supply "no" option: B<--not2html-tags>. =item B<--title STR> B<-t STR> The title text that appears in top frame of browser. =item B<--url URL> =back Location of the HTML file. When B<--document> gave the name, this gives the location. This information is printed at the Footer. =head2 Html: Navigation urls =over 4 =item B<--base URL> URL location of the HTML file in the B where it will be put available. This option is needed only if the document is hosted on a FTP server (rare, but possible). A FTP server based document cannot use Table Of Contents links (fragment I<#tag> identifiers) unless HTML tag BASE is also defined. The argument can be full URL to the document: --base ftp://ftp.example.com/file.html --base ftp://ftp.example.com/ =item B<--button-heading-top> Add additional B<[toc]> navigation button to the end of each heading. This may be useful in long non-framed HTML files. =item B<--button-top URL> Buttons are placed at the top of document in order: [previous][top][next] and I<--button-*> options define the URLs. If URL is string I then no button is inserted. This may be handy if the buttons are defined by a separate program. And example using Perl: #!/usr/bin/perl my $top = "index.html"; # set defaults my $prev = "none"; my $next = "none"; # ... 
somewhere $prev or $next may get set, or then not qx(t2html --button-top "$top" --button-prev "$prev" --button-next "$next" ...); # End of sample program =item B<--button-prev URL> URL to go to previous document or string I. =item B<--button-next URL> URL to go to next document or string I. =item B<--reference tag=value> You can add any custom references (tags) inside text and get them expand to any value. This option can be given multiple times and every occurrence of TAG is replaced with VALUE. E.g. when given following options: --reference "#HOME-URL=http://www.example.com/dir" --reference "#ARCHIVE-URL=http://www.example.com/dir/dir2" When referenced in text, the generated HTML includes expanded expanded to values. An example text: The homepage is #HOME-URL/page.html and the mirrot page it at #ARCHIVE-URL/page.html where you can find the latest version. =item B<-R, --reference-separator STRING> See above. String that is used to split the TAG and VALUE. Default is equal sign "=". =item B<-T, --toc-url-print> Display URLs (constructed from headings) that build up the Table of Contents (NAME AHREF tags) in a document. The list is outputted to stderr, so that it can be separated: % t2html --toc-url-print tmp.txt > file.html 2> toc-list.txt Where would you need this? If you want to know the fragment identifies for your file, you need the list of names. http://www.example.com/myfile.html#fragment-identifier =back =head2 Html: Controlling CSS generation (HTML tables) =over 4 =item B<--css-code-bg> This option affects how the code section (column 12) is rendered. Normally the section is surrounded with a
<PRE>..</PRE> pair, but with this option something fancier is used. The code is wrapped inside a <TABLE>...</TABLE>
and the background color is set to a shade of gray. =item B<--css-code-note "REGEXP" > Option B<--css-code-bg> is required to activate this option. A special word defined using regexp (default is 'Note:') will mark code sections specially. The C is matched against the supplied Perl regexp. The supplied regexp must not, repeat, must not, include any matching group operators. This simply means, that grouping parenthesis like C<(one|two|three)> are not allowed. You must use the Perl non-grouping ones like C<(?:one|two|three)>. Please refer to perl manual page [perlre] if this short introduction did not give enough rope. With this options, instead of rendering column 12 text with
<PRE>..</PRE>
, the text appears just like regular text, but with a twist. The background color of the text has been changed to darker grey to visually stand out form the text. An example will clarify. Suppose that you passed options B<--css-code-bg> and B<--css-code-note='(?:Notice|Note):'>, which instructed to treat the first paragraphs at column 12 differently. Like this: This is the regular text that appears somewhere at column 8. It may contain several lines of text in this paragraph. Notice: Here is the special section, at column 12, and the first word in this paragraph is 'Notice:'. Only that makes this paragraph at column 12 special. Now, we have some code to show to the user: for (i = 0; i++; i < 10) { // Doing something in this loop } One note, text written with initial special word, like C, must all fit in one full pragraph. Any other paragraphs that follow, are rendered as code sections. Like here: This is the regular text that appears somewhere It may contain several lines of text in this paragraph Notice: Here is the special section, at column 12, and the first word in this paragraph is 'Notice:' which makes it special Hoewver, this paragraph IS NOT rendered specially any more. Only the first paragraph above. for (i = 0; i++; i < 10) { // Doing something in this loop } As if this were not enough, there are some special table control directives that let you control the ..
which is put around the code section at column 12. Here are few examples: Here is example 1 #t2html::td:bgcolor=#F7F7DE for (i = 0; i++; i < 10) { // Doing something in this loop } Here is example 2 #t2html::td:bgcolor=#F7F7DE:tableborder:1 for (i = 0; i++; i < 10) { // Doing something in this loop } Here is example 3 #t2html::td:bgcolor="#FFFFFF":tableclass:dashed for (i = 0; i++; i < 10) { // Doing something in this loop } Here is example 4 #t2html::td:bgcolor="#FFFFFF":table:border=1_width=94%_border=0_cellpadding="10"_cellspacing="0" for (i = 0; i++; i < 10) { // Doing something in this loop } Looks cryptic? Cannot help that and in order for you to completely understand what these directives do, you need to undertand what elements can be added to the and
tokens. Refer to the HTML specification for the available attributes. Here
is a briefing of what you can do.

The start command is:

    #t2html::
            |
            After this come attribute pairs in the form key:value
            and multiple ones as key1:value1:key2:value2 ...

The C<key:value> pairs can be:

    td:ATTRIBUTES
       |
       This is converted into <td ATTRIBUTES>

    table:ATTRIBUTES
          |
          This is converted into <table ATTRIBUTES>

There can be no spaces in the ATTRIBUTES, because the C<key:value> must be
one contiguous word. An underscore can be used in place of a space:

    table:border=1_width=94%
          |
          Interpreted as <table border=1 width=94%>

It is also possible to change the default CLASS style with the word
C<tableclass>. For the CLASS to be useful, its CSS definitions must be
either in the default configuration or supplied from an external file. See
option B<--script-file>.

    tableclass:name
               |
               Interpreted as <table class="name">

For example, there are a couple of default styles that can be used:

    1) Here is CLASS "dashed" example

        #t2html::tableclass:dashed

            for (i = 0; i++; i < 10)
            {
                // Doing something in this loop
            }

    2) Here is CLASS "solid" example:

        #t2html::tableclass:solid

            for (i = 0; i++; i < 10)
            {
                // Doing something in this loop
            }

You can change any individual value of the default table definition
which is:
To change e.g. only value cellpadding, you would say: #t2html::table:tablecellpadding:2 If you are unsure what all of these were about, simply run program with B<--test-page> and look at the source and generated HTML files. That should offer more rope to experiment with. =item B<--css-file FILE> Include which refers to external CSS style definition source. This option is ignored if B<--script-file> option has been given, because that option imports whole content inside HEAD tag. This option can appear multiple times and the external CSS files are added in listed order. =item B<--css-font-type CSS-DEFINITION> Set the BODY element's font definition to CSS-DEFINITION. The default value used is the regular typeset used in newspapers and books: --css-font-type='font-family: "Times New Roman", serif;' =item B<--css-font-size CSS-DEFINITION> Set the body element's font size to CSS-DEFINITION. The default font size is expressed in points: --css-font-size="font-size: 12pt;" =back =head2 Html: Controlling the body of document =over 4 =item B<--delete REGEXP> Delete lines matching perl REGEXP. This is useful if you use some document tool that uses navigation tags in the text file that you do not want to show up in generated HTML. =item B<--delete-email-headers> Delete email headers at the beginning of file, until first empty line that starts the body. If you keep your document ready for Usenet news posting, they may contain headers and body: From: ... Newsgroups: ... X-Sender-Info: Summary: BODY-OF-TEXT =item B<--nodelete-default> Use this option to suppress default text deletion (which is on). Emacs C package and vi can be used with any text or programming language to place sections of text between tags B<{{{> and B<}}}>. You can open or close such folds. This allows keeping big documents in order and manageable quite easily. For Emacs support, see. 
ftp://ftp.csd.uu.se/pub/users/andersl/beta/ The default value deletes these markers and special comments C<#_comment> which make it possible to cinlude your own notes which are not included in the generated output. {{{ Security section #_comment Make sure you revise this section to #_comment the next release The seecurity is an important issue in everyday administration... More text ... }}} =item B<--html-body STR> Additional attributes to add to HTML tag . You could e.g. define language of the text with B<--html-body LANG=en> which would generate HTML tag See section "SEE ALSO" for ISO 639. =item B<--html-column-beg="SPEC HTML-SPEC"> The default interpretation of columns 1,2,3 5,6,7,8,9,10,11,12 can be changed with I and I swithes. Columns 0,4 can't be changed because they are reserved for headings. Here are some samples: --html-column-beg="7quote " --html-column-end="7quote " --html-column-beg="10
 class='column10'"
    --html-column-end="10    
" --html-column-beg="quote " --html-column-end="quote " B You can only give specifications up till column 12. If text is beyound column 12, it is interpreted like it were at column 12. In addition to column number, the I can also be one of the following strings Spec equivalent word markup ------------------------------ quote `' bold _ emp * small + big = ref [] like: [Michael] referred to [rfc822] Other available Specs ------------------------------ 7quote When column 7 starts with double quote. For style sheet values for each color, refer to I attribute and use B<--script-file> option to import definitions. Usually /usr/lib/X11/rgb.txt lists possible color values and the HTML standard at http://www.w3.org/ defines following standard named colors: Black #000000 Maroon #800000 Green #008000 Navy #000080 Silver #C0C0C0 Red #FF0000 Lime #00FF00 Blue #0000FF Gray #808080 Purple #800080 Olive #808000 Teal #008080 White #FFFFFF Fuchsia #FF00FF Yellow #FFFF00 Aqua #00FFFF =item B<--html-column-end="COL HTML-SPEC"> See B<--html-column-beg> =item B<--html-font SIZE> Define FONT SIZE. It might be useful to set bigger font size for presentations. =item B<-F, --html-frame [FRAME-PARAMS]> If given, then three separate HTML files are generated. The left frame will contain TOC and right frame contains rest of the text. The I can be any valid parameters for HTML tag FRAMESET. The default is C. Using this implies B<--out> option automatically, because three files cannot be printed to stdout. file.html --> file.html The Frame file, point browser here file-toc.html Left frame (navigation) file-body.html Right frame (content) =item B<--language ID> Use language ID, a two character ISO identifier like "en" for English during the generation of HTML. This only affects the text that is shown to end-user, like text "Table Of contents". The default setting is "en". See section "SEE ALSO" for standards ISO 639 and ISO 3166 for proper codes. 

The selected language changes the program's internal arrays in two ways: 1)
Instead of the default "Table of contents" heading, the national language
equivalent will be used. 2) The text "Pic" below embedded, sequentially
numbered pictures will use the national equivalent.

If your language is not supported, please send the phrase for "Table of
contents" and the word "Pic" in your language to the maintainer.

=item B<--script-file FILE>

Include script code (e.g. JavaScript) from FILE. The content of FILE must
be complete, because it is put inside the HEAD element of each HTML file.

The B<--script-file> is a general way to import anything into the HEAD
element. E.g. if you want to keep separate style definitions for all
documents, you could import only a pointer to a style sheet. See I<14.3.2
Specifying external style sheets> in the HTML 4.0 standard.

=item B<--meta-keywords STR>

Meta keywords. Used by search engines. Separate keywords like "AA, BB, CC"
with commas. Refer to the HTML 4.01 specification and topic "7.4.4 Meta
data" and see http://www.htmlhelp.com/reference/wilbur/

    --meta-keywords "AA,BB,CC"

=item B<--meta-description STR>

Meta description. Include description string, max 1000 characters. This is
used by search engines. Refer to the HTML 4.01 specification and topic
"7.4.4 Meta data"

=item B<--name-uniq>

The first 1-4 words from the heading are used for the HTML I<name> tags.
However, it is possible that two headings start with exactly the same 1-4
words. In those cases you have to turn on this option. It will use a
counter 00 - 999 instead of words from headings to construct the HTML
I<name> references.

Please use this option only in emergencies, because referring to a jump
block by name via

    http://example.com/doc.html#header_name

is more convenient than using an obscure reference

    http://example.com/doc.html#11

In addition, each time you add a new heading the number changes, whereas
the symbolic name picked from the heading stays as long as you do not
change the heading. Think about the welfare of your netizens who bookmark
your pages.
Try to make headings to not have same subjects and you do not need this option. =back =head2 Document maintenance and batch job commands =over 4 =item B<-A, --auto-detect> Convert file only if tag C<#T2HTML-> is found from file. This option is handy if you run a batch command to convert all files to HTML, but only if they look like HTML base files: find . -name "*.txt" -type f \ -exec t2html --auto-detect --verbose --out {} \; The command searches all *.txt files under current directory and feeds them to conversion program. The B<--auto-detect> only converts files which include C<#T2HTML-> directives. Other text files are not converted. =item B<--link-check -l> Check all http and ftp links. I Option B<--quiet> has special meaning when used with link check. With this option you can regularly validate your document and remove dead links or update moved links. Problematic links are outputted to I. This link check feature is available only if you have the LWP web library installed. Program will check if you have it at runtime. Links that are big, e.g. which match I or that run programs (links with ? character) are ignored because the GET request used in checking would return whole content of the link and it would. be too expensive. A suggestion: When you put binary links to your documents, add them with space: http://example.com/dir/dir/ filename.tar.gz Then the program I check the http addresses. Users may not be able to get the file at one click, checker can validate at least the directory. If you are not the owner of the link, it is also possible that the file has moved of new version name has appeared. =item B<-L, --link-check-single> Print condensed output in I like manner I This option concatenates the url response text to single line, so that you can view the messages in one line. You can use programming tools (like Emacs M-x compile) that can parse standard grep syntax to jump to locations in your document to correct the links later. 

=item B<-o, --out>

Write the generated HTML to a file whose name is derived from the input
filename.

    --out --print /dir/file            --> /dir/file.html
    --out --print /dir/file.txt        --> /dir/file.html
    --out --print /dir/file.this.txt   --> /dir/file.this.html

=item B<--link-cache CACHE_FILE>

When links are checked periodically, it would be quite laborious to check
every link every time, including those that have already succeeded. In
order to save link checking time, the "ok" links can be cached into a
separate file. Next time you check the links, the cache is opened and only
links that were not in the cache are checked. This should dramatically
speed up long checks. Consider this example, where every text file is
checked recursively.

    $ t2html --link-check-single \
      --quiet --link-cache ~tmp/link.cache \
      `find . -name "*.txt" -type f`

=item B<-O, --out-dir DIR>

Like B<--out>, but chop the directory part and write output files to
DIR. The following would generate the HTML file to the current directory:

    --out-dir .

If you have an automated tool that fills in the directory, you can use the
word B<none> to ignore this option. The following is a no-op, it will not
generate output to directory "none":

    --out-dir none

=item B<-p, --print>

Print filename to stdout after HTML processing. Normally the program prints
no file names, only the generated HTML.

    % t2html --out --print page.txt

    --> page.html

=item B<-P, --print-url>

Print filename in URL format. This is useful if you want to check the
layout immediately with your browser.

    % t2html --out --print-url page.txt | xargs lynx

    --> file: /users/foo/txt/page.html

=item B<--split REGEXP>

Split document into smaller pieces when REGEXP matches. The program only
splits the file and then quits; no HTML conversion for the file is engaged.

If REGEXP is found from the line, it is a start point of a split. E.g. to
split according to toplevel headings, which have no numbering, you would
use:

    --split '^[A-Z]'

Sequential numbers, 3 digits, are added to the generated partials:

    filename.txt-NNN

The split feature is handy if you want to generate slides from each
heading: first split the document, then convert each part to HTML and
finally print each part (page) separately to a printer.

=item B<-S1, --split1>

This is shorthand of the B<--split> command. Define regexp to split on
toplevel heading.

=item B<-S2, --split2>

This is shorthand of the B<--split> command. Define regexp to split on
second level heading.

=item B<-SN, --split-named-files>

Additional directive for split commands. If you split e.g. by headings
using B<--split1>, it would be more informative to generate filenames
according to the first few words from the heading name. Suppose the heading
names where the splits occur were:

    Program guidelines
    Conclusion

Then the generated partial filenames would be as follows.

    FILENAME-program_guidelines
    FILENAME-conclusion

=item B<-X, --xhtml>

Render using strict XHTML. This means using
,
and paragraphs use

..

. C =back =head2 Miscellaneous options =over 4 =item B<--debug LEVEL> Turn on debug with positive LEVEL number. Zero means no debug. =item B<--help -h> Print help screen. Terminates program. =item B<--help-css> Print default CSS used. Terminates program. You can copy and modify this output and instruct to use your own with B<--css-file=FILE>. You can also embed the option to files with C<#T2HTML-OPTION> directive. =item B<--help-html> Print help in HTML format. Terminates program. =item B<--help-man> Print help page in Unix manual page format. You want to feed this output to B in order to read it. Terminates program. =item B<--test-page> Print the test page: HTML and example text file that demonstrates the capabilities. =item B<--time> Print to stderr time spent used for handling the file. =item B<-v, --verbose [LEVEL]> Print verbose messages. =item B<-q, --quiet> Print no footer at all. This option has different meaning if I<--link-check> option is turned on: print only erroneous links. =item B Print program version information. =back =head1 FORMAT DESCRIPTION Program converts text files to HTML. The basic idea is to rely on indentation level, and the layout used is called 'Technical format' (TF) where only minimal conventions are used to mark italic, bold etc. text. The Basic principles can be demonstrated below. Notice the column poisiton ruler at the top: --//-- description start 123456789 123456789 123456789 123456789 123456789 column numbers Heading 1 starts with a big letter at leftmost column 1 The column positions 1,2,3 are currently undefined and may not format correctly. Do not place text at columns 1,2 or 3. Heading level 2 starts at half-tab column 4 with a big letter Normal but colored text at columns 5 Normal but colored text at columns 6 Heading 3 can be considered at position TAB minus 1, column 7. 
"Special text at column 7 starts with double quote" Standard text starts at column 8, you can *emphatize* text or make it _strong_ and write =SmallText= or +BigText+ show variable name `ThisIsAlsoVariable'. You can `_*nest*_' `the' markup. more txt in this paragraph txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt Strong text at column 9 Column 10 is reserved for quotations Column 10 is reserved for quotations Column 10 is reserved for quotations Column 10 is reserved for quotations Strong text at column 11 Column 12 and further is reserved for code examples Column 12 and further is reserved for code examples All text here are surrounded by
<PRE> HTML codes
	   This CODE column in affected by the --css-code* options.

     Heading 2 at column 4 again

	If you want something like Heading level 3, use column 7 (bold)

	 Column 8. Standard tab position. txt txt txt txt txt txt txt
	 txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt
	 txt txt txt txt txt txt txt txt txt txt txt txt txt txt
	 [1998-09-10 Mr. Foo said]:

	   cited text at column 10. cited text cited text cited text
	   cited text cited text cited text cited text cited text
	   cited text cited text cited text cited text cited text
	   cited text


	 *   Bullet at column 8. Notice 3 spaces after (*), so
	     text starts at half-tab forward at column 12.
	 *   Bullet. txt txt txt txt txt txt txt txt txt txt txt txt
	 *   Bullet. txt txt txt txt txt txt txt txt txt txt txt txt
	     ,txt txt txt txt

	     Notice that previous paragraph ends to P-comma
	     code, it tells this paragraph to continue in
	     bullet mode, otherwise this text at column 12
	     would be interpreted as code section surrounded
	     by <PRE> HTML codes.


	 .   This is ordered list.
	 .   This is ordered list.
	 .   This is ordered list.


	 .This line starts with dot and is displayed in line by itself.
	 .This line starts with dot and is displayed in line by itself.

	 !! This adds an 
HTML code, text in line is marked with !! Make this email address clickable Do not make this email address clickable bar@example.com, because it is only an example and not a real address. Notice that the last one was not surrounded by <>. Common login names like foo, bar, quux, or internet site 'example' are ignored automatically. Also do not make < this@example.com> because there is extra white space. This may be more convenient way to disable email addresses temporarily. Heading1 again at column 0 Subheading at column 4 And regular text, column 8 txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt --//-- description end That is it, there is the whole layout described. More formally the rules of text formatting are secribed below. =head2 USED HEADINGS =over 4 =item * There are only I heading levels in this style. Heading columns are 0 and 4 and the heading must start with big letter ("Heading") or number ("1.0 Heading") =item * At column 4, if the text starts with small letter, that line is interpreted as =item * A HTML
mark is added just before printing heading at level 1. =item * The headings are gathered, the TOC is built and inserted to the beginning of HTML page. The HTML references used in TOC are the first 4 sequential words from the headings. Make sure your headings are uniquely named, otherwise there will be same NAME references in the generated HTML. Spaces are converted into underscore when joining the words. If you can not write unique headings by four words, then you must use B<--name-uniq> switch =back =head1 TEXT PLACEMENT RULES =head2 General The basic rules for positioning text in certain columns: =over 4 =item * Text at column 1 is undefined if it does not start with big letter or number to indicate Heading level 1. =item * Text between columns 2 and 3 is marked with =item * Column 4 is reserved for heading level 2 =item * Text between columns 5-7 is marked with =item * Text at column 7 is if the first character is double quote. =item * Column 10 is reserved for text. If you want to quote someone or to add reference text, place the text in this column. =item * Text at columns 9 and 11 are marked with =back Column 8 for text and special codes =over 4 =item * Column 8 is reserved for normal text =item * At the start of text, at column 8, there can be DOT-code or COMMA-code. =back Column 12 is special =over 4 =item * Column 12 is treated specially: block is started with
<PRE> and lines are
marked as code. When the last text at column 12 is found, the
block is closed with </PRE>
. An example: txt txt txt ;evenly placed block, fine, do it like this txt txt txt txt txt txt ;Can not terminate the /pre, because last txt txt txt txt ;column is not at 12 txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt ;; Finalizing comment, now the text is evenly placed =back =head2 Additional tokens for use at column 8 =over 4 =item * If there is C<.>(dot) at the beginning of a line and immediately non-whitespace, then
<BR> code is added to the end of line. .This line will have a <BR>
HTML tag at the end. While these two line are joined together by the browser, depending on the frame width. =item * If there is C<,>(comma) then the

code is not inserted if the previous line is empty. If you use both C<.>(dot) and C<,>(comma), they must be in order dot-comma. The C<,>(comma) works differently if it is used in bullet A

is always added if there is separation of paragraphs, but when you are writing a bullet, there is a problem, because a bullet exist only as long as text is kept together * This is a bullet and it has all text ketp together even if there is another line in the bullet. But to write bullets tat spread multiple paragraphs, you must instruct that those are to kept together and the text in next paragraph is not while it is placed at column 12 * This is a bullet and it has all text ketp together ,even if there is another line in the bullet. This is new paragrah to the previous bullet and this is not a text sample. See continued COMMA-code above. * This is new bullet // and this is code sample after bullet if ($flag) { ..do something.. } =back =head2 Special text markings =over 4 =item italic, bold, code, small, big tokens _this_ is interpreted as this *this* is interpreted as this `this' is interpreted as this ` Exra modifiers that can be mixed with the above. Usually if you want bigger font, CAPITALIZE THE WORDS. =this= is interpreted as this +this+ is interpreted as this [this] is interpreted as this =item superscripting word[this] is interpreted as superscript. You can use like this[1], multiple[(2)] and almost any[(ab)] and imaginable[IV superscritps] as long as the left bracket is attached to the word. =item subscripting 12[[10]] is representation of value 12 un base 10. This is interpreted as subscript. You can use like this[[1]], multiple[[(2)]] and almost any[[(ab)]] and imaginable[[IV superscritps]] as long as *two* left brackets are attached to the word. =item embedding standard HTML tokens Stanadard special HTML entities can be added inside text in a normal way, either using sybolic names or the hash code. Here are exmples: × < > ≤ ≥ ≠ √ − α β γ ÷ « » ‹ › - – — ≈ ≡ ∑ ƒ ∞ ° ± ™ © ® € £ ¥ =item embedding PURE HTML into text B. It is possible to embed pure HTML inside text in occasions, where e.g. some special formatting is needed. 
The idea is simple: you write HTML as usual but double every '<' and '>' characters, like: <

> The other rule is that all PURE HTML must be kept together. There must be no line breaks between pure HTML lines. This is incorrect: <

> <>one <>two <
> The pure HTML must be written without extra newlines: <> <>one <>two <
> This "doubling" affects normal text writing rules as well. If you write documents, where you describe Unix styled HERE-documents, you MUST NOT put the tokens next to each other: bash$ cat< code. any text after !! in the same line is written with and inserted just after
code, therefore the word formatting commands have no effect in this line. =back =head2 Http and email marking control =over 4 =item * All http and ftp references as well as email addresses are marked clickable. Email must have surrounding <> characters to be recognized. =item * If url is preceded with hyphen, it will not be clickable. If a string foo, bar, quux, test, site is found from url, then it is not counted as clickable. clickable http://example.com clickable < me@here.com> not clickable; contains space <5dko56$1@news02.deltanet.com> Message-Id, not clickable -http://example.com hyphen, not clickable http://$EXAMPLE variable. not clickable =back =head2 Lists and bullets =over 4 =item * The bulletin table is constructed if there is "o" or "*" at column 8 and 3 spaces after it, so that text starts at column 12. Bulleted lines are advised to be kept together; no spaces between bullet blocks. * This is a bullet * This is a bullte Another example: o This is a bullet o This is a bullet List example: . This is an ordered list . This is an ordered list =item * The ordered list is started with ".", a dot, and written like bullet where text starts at column 12. =back =head2 Line breaks =over 4 =item * All line breaks are visible in your document, do not use more than one line break to separate paragraphs. =item * Very important is that there is only I line break after headings. =back =head1 EMBEDDED DIRECTIVES INSIDE TEXT =over 4 =item Command line options You can cancel obeying all embedded directives by supplying option B<--not2html-tags>. You can include these lines anywhere in the document and their content is included in HTML output. Each directive line must fit in one line and it cannot be broken to separate lines. #T2HTML-TITLE #T2HTML-EMAIL #T2HTML-AUTHOR #T2HTML-DOC #T2HTML-METAKEYWORDS #T2HTML-METADESCRIPTION You can pass command line options embedded in the file. 
Like if you wanted the CODE section (column 12) to be coloured with shade of gray, you could add: #T2HTML-OPTION --css-code-bg Or you could request turning on particular options. Notice that each line is exactly as you have passed the argument in command line. Imagine surrounding double quoted around lines that are arguments to the associated options. #T2HTML-OPTION --as-is #T2HTML-OPTION --quiet #T2HTML-OPTION --language #T2HTML-OPTION en #T2HTML-OPTION --css-font-type #T2HTML-OPTION Trebuchet MS #T2HTML-OPTION --css-code-bg #T2HTML-OPTION --css-code-note #T2HTML-OPTION (?:Note|Notice|Warning): You can also embed your own comments to the text. These are stripped away: #T2HTML-COMMENT You comment here #T2HTML-COMMENT You another comment here =item Embedding files #INCLUDE- command This is used to include the content into current current position. The URL can be a filename reference, where every $VAR is substituted from the environment variables. The tilde(~) expansion is not supported. The included filename is operating system supported path location. A prefix C disables any normal formatting. The file content is included as is. The URL can also be a HTTP reference to a remote location, whose content is included at the point. In case of remote content or when filename ends to extension C<.html> or C<.html>, the content is stripped in order to make the inclusion of the content possible. In picture below, only the lines within the BODY, marked with !!, are included: ... this text !! and more of this !! Examples: #INCLUDE-$HOME/lib/html/picture1.html #INCLUDE-http://www.example.com/code.html #INCLUDE-raw:example/code.html =item Embedding pictures #PIC command is used to include pictures into the text #PIC picture.png#Caption Text#Picture HTML attributes#align# (1) (2) (3) (4) 1. The NAME or URL address of the picture. Like image/this.png 2. The Text that appears below picture 3. Additional attributes that are attached inside tag. 
For , the line would read: #PIC some.png#Caption Text#width=200 length=200## 4. The position of image: "left" (default), "center", "right" Note: The C
will also become the ALT text of the image which is used in case the browser is not capable of showing pictures. You can suppress the ALT text with option B<--no-picture-alt>. =item Fragment identifiers for named tags #REF command is used for referring to HTML tag inside current document. The whole command must be placed on one single line and cannot be broken to multiple lines. An example: #REF #how_to_profile;(Note: profiling); (1) (2) 1. The NAME HTML tag reference in current document, a single word. This can also be a full URL link. You can get NAME list by enabling --toc-url-print option. 2. The clickable text is delimited by ; characters. =item Referring to external documents. C<#URL> tag can be used to embed URLs inline, so that the full link is not visible. Only the shown text is used to jump to URL. This directive cannot be broken to separate lines, #URL | | | Displayed, clickable, text Must be kept together An example: See search engine #URL =back =head1 TABLE OF CONTENT HEADING If there is heading 1, which is named exactly "Table of Contents", then all text up to next heading are discarded from the generated HTML file. This is done because program generates its own TOC. It is supposed that you use some text formatting program to generate the toc for you in .txt file and you do not maintain it manually. For example Emacs package I can be used. =head1 TROUBLESHOOTING =head2 Generated HTML document did not look what I intended Did you use editor that inseted TABs which inserts single ascii code (\t) and 8 spaces? check our editor's settings and prefer writing in-all-space format. The most common mistake is that there are extra newlines in the document. Keeep I empty line between headings and text, keep I empty line between paragraphs, keep I empty line between body text and bullet. Make it your mantra: I I I ... Next, you may have put text at wrong column position. Remember that the regular text is at column 8. 
If the generated HTML suddenly starts using only one font, e.g. <PRE>, then
you have forgotten to close the block. Make it read even, like this:

    Code block
	Code block
	Code block
    ;;  Add empty comment here to "close" the code example at column 12

Headings start with a big letter or number, like in "Heading", not
"heading". Double check the spelling.

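The layout mistakes listed above can be spotted mechanically before
conversion. The following is a minimal sketch in POSIX shell and is not
part of t2html: the function names C<find_tabs> and
C<find_extra_blank_lines> are hypothetical, and the check simply assumes
that every TAB character and every second consecutive empty line deserves
a look.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with t2html.

# Print "LINE: TAB" for every line that contains a TAB character.
find_tabs() {
    awk 'index($0, "\t") { print NR ": TAB" }' "$1"
}

# Print "LINE: extra empty line" for every empty (or whitespace-only)
# line that directly follows another empty line.
find_extra_blank_lines() {
    awk '/^[ \t]*$/ { if (prev) print NR ": extra empty line"; prev = 1; next }
         { prev = 0 }' "$1"
}
```

Run both functions on the text file and review the reported line numbers
before converting the file.
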
=head1 EXAMPLES

To print the test page and demonstrate possibilities:

    t2html --test-page

To make simple HTML page without any meta information:

    t2html --title "Html Page Title" --author "Mr. Foo" \
	   --simple --out --print file.txt

If you have periodic post in email format, use B<--delete-email-headers> to
ignore the header text:

    t2html --out --print --delete-email-headers page.txt

To make a page quickly:

    t2html --out --print page.txt

To convert a text document to a page that includes meta tags, buttons,
colors and frames, use the following. Pay attention to switch
I<--html-body>, which defines the document language.

    t2html                                              \
    --print                                             \
    --out                                               \
    --author    "Mr. foo"                               \
    --email     "foo@example.com"                       \
    --title     "This is manual page of page BAR"       \
    --html-body LANG=en                                 \
    --button-prev  previous.html                        \
    --button-top   index.html                           \
    --button-next  next.html                            \
    --document  http://example.com/dir/this-page.html   \
    --url       manual.html                             \
    --css-code-bg                                       \
    --css-code-note '(?:Note|Notice|Warning):'          \
    --html-frame                                        \
    --disclaimer-file   $HOME/txt/my-html-footer.txt    \
    --meta-keywords    "language-en,manual,program"     \
    --meta-description "Bar program to do this that and more of those" \
    manual.txt

To check links and print the status of every link along with the HTTP
error message (most verbose):

    t2html --link-check file.txt | tee link-error.log

To print only problematic links:

    t2html --link-check --quiet file.txt | tee link-error.log

To print terse output in a grep -n like manner: line number, link and
error code:

    t2html --link-check-single --quiet file.txt | tee link-error.log

To check links from multiple pages and cache good links to separate file,
use B<--link-cache> option. The next link check will run much faster
because cached valid links will not be fetched again. At regular intervals
delete the link cache file to force complete check.

    t2html --link-check-single \
	   --link-cache $HOME/tmp/link.cache \
	   --quiet file.txt

To split large document into pieces, and convert each piece to HTML:

    t2html --split1 --split-name file.txt | t2html --simple --out

=head1 ENVIRONMENT

=over 4

=item B<EMAIL>

If environment variable I<EMAIL> is defined, it is used in footer for
contact address. Option B<--email> overrides environment setting.

=item B<LANG>

The default language setting for switch C<--language>. Make sure the
first two characters contain the language definition, like in:
LANG=en.iso88591

=back
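
How the first two characters of such a value could be turned into a
language code can be sketched as follows. This is an illustration only;
the helper name C<lang_code> and the fallback value "en" are assumptions
and are not taken from the program.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: take the first two letters of a LANG-style
    # value (for example "en.iso88591" or "fi_FI.UTF-8") as the language
    # code. The fallback "en" is an assumption made for this sketch.
    sub lang_code
    {
        my $lang = shift;

        return "en"  unless  defined $lang  and  $lang =~ /^([a-z]{2})/i;
        return lc $1;
    }

    print lang_code("en.iso88591"), "\n";
    print lang_code("fi_FI.UTF-8"), "\n";

```

A value such as "C", which has no two-letter prefix, falls back to the
default in this sketch.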

=head1 SEE ALSO

asciidoc(1)
html2ps(1)
htmlpp(1)
markdown(1)

=head2 Related programs

Jan Kärrman has written html2ps in Perl. As of 2004-11-11 it was
available at http://www.tdb.uu.se/~jan/html2ps.html

HTML validator is at http://validator.w3.org/

iMATIX created htmlpp, which is available from http://www.imatix.com and
was seen 2014-03-05 at http://legacy.imatix.com/html/htmlpp

Emacs minor mode to help writing documents based on TF layout is
available. See package tinytf.el in project
https://github.com/jaalto/project--emacs-tiny-tools

=head2 Standards

RFC B<1766> contains list of language codes at
http://www.rfc.net/

Latest HTML/XHTML and CSS specifications are at http://www.w3c.org/

=head2 ISO standards

B<639> Code for the representation of the names of languages
http://www.oasis-open.org/cover/iso639a.html

B<3166> Standard Country Codes
http://www.niso.org/3166.html and
http://www.netstrider.com/tutorials/HTMLRef/standards/

=head1 BUGS

The implementation was originally designed to work linewise, so it is
unfortunately impossible to add or modify any existing feature to look for
items that span more than one line.

As the option B<--xhtml> was added much later, it may not produce
completely syntactically valid markup.

=head1 SCRIPT CATEGORIES

CPAN/Administrative
html

=head1 PREREQUISITES

No additional Perl CPAN modules needed for text to HTML conversion.

=head1 COREQUISITES

If the link check feature is used to validate URL links, then the following
modules are needed from Perl CPAN: C<LWP::UserAgent>, C<HTML::Parse>
and C<HTML::FormatText>.

If you module C is available, it is used
instead of included link extracting algorithm.

=head1 AVAILABILITY

Homepage is at https://github.com/jaalto/project--perl-text2html

=head1 AUTHOR

Copyright (C) 1996-2020 

This program is free software; you can redistribute and/or modify
program under the terms of GNU General Public license either version 2
of the License, or (at your option) any later version.

This documentation may be distributed subject to the terms and
conditions set forth in GNU General Public License v2 or later; or, at
your option, distributed under the terms of GNU Free Documentation
License version 1.2 or later (GNU FDL).

=cut

sub Help (;$ $)
{
    my $id   = "$LIB.Help";
    my $msg  = shift;  # optional arg, why are we here...
    my $type = shift || "";  # optional arg, type; default avoids undef compare

    if ($type eq -html)
    {
	$debug  and  print "$id: -html option\n";
	pod2html $PROGRAM_NAME;
    }
    elsif ($type eq -man)
    {
	$debug  and  print "$id: -man option\n";

	my %options;
	$options{center} = 'Perl Text to HTML Converter';

	my $parser = Pod::Man->new(%options);
	$parser->parse_from_file($PROGRAM_NAME);
    }
    else
    {
	$debug  and  print "$id: no options\n";

	system "pod2text $PROGRAM_NAME";
    }

    if (defined $msg)
    {
	print $msg;
	exit 1;
    }

    exit 0;
}

sub HelpCss ()
{
    print "\n\n"
    , "Default CSS and JavaScript code inserted to the beginning of each file\n"
    , "See option --css-file to replace default CSS.\n"
    , JavaScript()
    ;

    exit 0;
}

# }}}
# {{{ misc

# ****************************************************************************
#
#   DESCRIPTION
#
#       Return minimum value
#
#   INPUT PARAMETERS
#
#       LIST
#
#   RETURN VALUES
#
#       $number
#
# ****************************************************************************

sub Min (@)
{
    (sort{$a <=> $b} @ARG)[0];
}
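
=begin comment

The sort-based approach of Min() can be tried in isolation with the sketch
below. It is not part of the program: C<min_by_sort> is a hypothetical
stand-alone copy written with C<@_>, so that it does not need the English
module, and the core module List::Util is shown only as a possible
alternative that gives the same result.

```perl

    use strict;
    use warnings;
    use List::Util qw(min);

    # Hypothetical stand-alone copy of the sort-based idea: sort
    # numerically and take the first element.
    sub min_by_sort
    {
        return (sort { $a <=> $b } @_)[0];
    }

    my @values = (10, 3, 7);

    print min_by_sort(@values), "\n";   # smallest value through sort
    print min(@values), "\n";           # same value from List::Util

```

Sorting costs more than a single pass over the list, which does not matter
for the short lists used here.

=end comment

=cut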

# ****************************************************************************
#
#   DESCRIPTION
#
#       Check if content looks like HTML
#
#   INPUT PARAMETERS
#
#       $arrayRef   reference to list.
#
#   RETURN VALUES
#
#       $status     True, if looks like HTML or XML
#
# ****************************************************************************

sub IsHTML ($)
{
    my $id = "$LIB.IsHTML";
    my ($arrRef) = @ARG;

    #   Search the first 10 lines, or fewer if there are not that many
    #   lines in the array.

    local $ARG;
    my    $ret = 0;

    unless (defined $arrRef)
    {
	warn "$id: [ERROR] arrRef is not defined";
	return;
    }

    for (@$arrRef[0 .. Min(10, scalar(@$arrRef) -1) ] )
    {
	if (/<\s*(HTML|XML)\s*>/i)
	{
	    $ret = 1;
	    last;
	}
    }

    $debug  and  print "$id: RET [$ret]\n";

    $ret;
}
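
=begin comment

The idea behind IsHTML() is to look only at the beginning of the text for
an opening tag. A simplified stand-alone sketch is given below; the name
C<looks_like_html> and the C<$limit> parameter are hypothetical and the
code is not used by the program.

```perl

    use strict;
    use warnings;

    # Hypothetical stand-alone variant: return 1 if one of the first
    # $limit lines (default 10) contains an <html> or <xml> tag.
    sub looks_like_html
    {
        my ($lines, $limit) = @_;
        $limit = 10  unless  defined $limit;

        my $count = @$lines;
        my $last  = ($count < $limit ? $count : $limit) - 1;

        for my $line (@$lines[0 .. $last])
        {
            return 1  if  $line =~ /<\s*(?:HTML|XML)\s*>/i;
        }

        return 0;
    }

    print looks_like_html([ "<html>\n", "<body>\n" ]), "\n";
    print looks_like_html([ "Plain text\n", "more text\n" ]), "\n";

```

Note that a tag with attributes, such as an html tag carrying a lang
attribute, is not matched by this pattern.

=end comment

=cut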

# ****************************************************************************
#
#   DESCRIPTION
#
#       Load URL support libraries
#
#   INPUT PARAMETERS
#
#       none
#
#   RETURN VALUES
#
#       0       Error
#       1       Ok, support present
#
# ****************************************************************************

sub LoadUrlSupport ()
{
    my $id       = "$LIB.LoadUrlSupport";
    my $error    = 0;

    local *LoadLib = sub ($)
    {
	my $lib            = shift;
	local $EVAL_ERROR  = '';
	eval "use $lib";

	if ($EVAL_ERROR )
	{
	    warn "$id: $lib is not available [$EVAL_ERROR]\n";
	    $error++;
	}
    };

    LoadLib("LWP::UserAgent");
    LoadLib("HTML::Parse");
    LoadLib("HTML::FormatText");

    return 0 if $error;
    1;
}
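
=begin comment

Loading optional modules at run time, as LoadUrlSupport() does, can be
sketched in a stand-alone form as follows. The helper name
C<module_available> is hypothetical; the sketch uses C<require> inside a
string eval and validates the module name first, which is a design choice
of this example and not necessarily how the program does it.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: try to load a module at run time and
    # return 1 on success, 0 on failure. The name is checked first
    # because it ends up inside a string eval.
    sub module_available
    {
        my $module = shift;

        return 0  unless  $module =~ /^\w+(?:::\w+)*$/;

        my $ok = eval "require $module; 1";
        return $ok ? 1 : 0;
    }

    for my $module (qw(File::Basename No::Such::Module::Here))
    {
        printf "%-25s %s\n", $module,
            module_available($module) ? "available" : "missing";
    }

```

File::Basename is a core module and is used here only because it is
expected to be present on any Perl installation.

=end comment

=cut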

# ****************************************************************************
#
#   DESCRIPTION
#
#       Convert to Unix or dos styled path
#
#   INPUT PARAMETERS
#
#       $path       Path to convert
#       $unix       If defined, convert to unix slashes. If missing,
#                   convert to dos paths.
#       $trail      if set, make sure there is trailing slash or backslash
#
#   RETURN VALUES
#
#       $           New path
#
# ****************************************************************************

sub PathConvert ($;$$)
{
    my $id      = "$LIB.PathConvert";
    local $ARG  = shift;
    my $unix    = shift;
    my $trail   = shift;

    if (defined $unix)
    {
	s,\\,/,g;                   #font s/

	if ($trail)
	{
	    s,/*$,/,;               #font s/
	}
	else
	{
	    s,/+$,,;
	}
    }
    else
    {
	s,/,\\,g;                   #fonct s/

	if ($trail)
	{
	    s,\\*$,\\,;
	}
	else
	{
	    s,\\+$,,;
	}
    }

    $ARG;
}
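
=begin comment

The two branches of PathConvert() can be illustrated with two small
stand-alone helpers. The names C<to_unix_path> and C<to_dos_path> are
hypothetical and the sketch is not used by the program.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: backslashes to slashes, with or without
    # one trailing slash.
    sub to_unix_path
    {
        my ($path, $trail) = @_;

        $path =~ s{\\}{/}g;

        if ($trail) { $path =~ s{/*$}{/} }
        else        { $path =~ s{/+$}{}  }

        return $path;
    }

    # Hypothetical helper: slashes to backslashes, with or without
    # one trailing backslash.
    sub to_dos_path
    {
        my ($path, $trail) = @_;

        $path =~ s{/}{\\}g;

        if ($trail) { $path =~ s{\\*$}{\\} }
        else        { $path =~ s{\\+$}{}   }

        return $path;
    }

    print to_unix_path('c:\home\foo', 1), "\n";
    print to_dos_path('/usr/local/'), "\n";

```

Single quotes are used for the sample paths so that the backslashes are
taken literally.

=end comment

=cut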

# ****************************************************************************
#
#   DESCRIPTION
#
#       Return HOME location if possible. Guess, if cannot determine.
#
#   INPUT PARAMETERS
#
#       None
#
#   RETURN VALUES
#
#       $dir
#
# ****************************************************************************

sub GetHomeDir ()
{
    my $id = "$LIB.GetHomeDir";

    my $ret;

    unless (defined $HOME )
    {
        print "$id: WARNING Please set environment variable HOME"
            , " to your home directory location. On Win32 this might be c:/home\n"
            ;
    }

    if (defined $HOME)
    {
	$ret = $HOME;
    }
    else
    {
	local $ARG;
	for (qw(~/tmp /tmp c:/temp))
	{
	    -d  and   $ret = $ARG, last;
	}
    }

    $debug   and   warn "$id: RETURN $ret\n";
    $ret;
}

# ****************************************************************************
#
#   DESCRIPTION
#
#       Debug function: Print content of an array
#
#   INPUT PARAMETERS
#
#       $title      String to name the array or other information
#       \@array     Reference to an Array
#       $fh         [optional] Filehandle
#
#   RETURN VALUES
#
#       none
#
# ****************************************************************************

sub PrintArray ($$;*)
{
    my $id = "$LIB.PrintArray";
    my ($title, $arrayRef, $fh) = @ARG;

    if (defined $arrayRef)
    {
	$fh       = $fh || \*STDERR;
	my $i     = 1;
	my $count = @$arrayRef;

	print $fh "\n ------ ARRAY BEG $title\n";

	for (@$arrayRef)
	{
	    print $fh "[$i/$count] $ARG\n";
	    $i++;
	}

	print $fh " ------ ARRAY END $title\n";
    }
}

# ****************************************************************************
#
#   DESCRIPTION
#
#       Print Array
#
#   INPUT PARAMETERS
#
#       $name       The name of the array
#       @array      array itself
#
#   RETURN VALUES
#
#       none
#
# ****************************************************************************

sub PrintArray2 ($ @)
{
    my $id = "$LIB.PrintArray2";
    my ($name, @arr) = @ARG;

    local $ARG;

    my $i     = 0;
    my $count = @arr;

    warn "$id: $name is empty"  if  not @arr;

    for (@arr)
    {
	warn "$id: $name\[$i\] = $ARG/$count\n";
	$i++;
    }
}

# ****************************************************************************
#
#   DESCRIPTION
#
#       Debug function: Print content of a hash
#
#   INPUT PARAMETERS
#
#       $title      String to name the hash or other information
#       \%hash      Reference to a hash
#       $fh         [optional] Filehandle. Default is \*STDOUT
#
#   RETURN VALUES
#
#       none
#
# ****************************************************************************

sub PrintHash ($$;*)
{
    my $id = "$LIB.PrintHash";
    my ($title, $hashRef, $fh) = @ARG;

    $fh = $fh || \*STDOUT;

    my ($i, $out) = (0, "");

    print $fh "\n ------ HASH $title -----------\n";

    for (sort keys %$hashRef)
    {
	if ($$hashRef{$ARG})
	{
	    $out = $$hashRef{ $ARG };

	    if (ref $out eq  "ARRAY")
	    {
		$out = "ARRAY => @$out";
	    }
	}
	else
	{
	    $out = "";
	}
	print $fh "$i / $ARG = $out \n";
	$i++;
    }
    print $fh " ------ END $title ------------\n";
}

# ****************************************************************************
#
#   DESCRIPTION
#
#       Check that the email address looks valid. Die if it does not.
#
#   INPUT PARAMETERS
#
#       $email
#
#   RETURN VALUES
#
#       none
#
# ****************************************************************************

sub CheckEmail ($)
{
    my $id    = "$LIB.CheckEmail";
    my $email = shift;

    $debug  and  print "$id: check [$email]\n";

    not defined $email  and  Help "--email missing";

    if  ($email =~ /^\S*$/)         # No whitespace (may also be empty)
    {
	if  ($email !~ /@/  or  $email =~ /[<>]/)
	{
	    die "Invalid EMAIL [$email]. It must not contain characters <> "
	      , "or you didn't include \@\n"
	      , "Example: me\@example.com"
	      ;
	}
    }
}
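
=begin comment

The two rules that CheckEmail() enforces, an at-sign must be present and
angle brackets must not, can be tried with the stand-alone sketch below.
The name C<email_looks_valid> is hypothetical, and unlike CheckEmail() the
sketch returns a status instead of calling die.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: return 1 if the address contains "@" and
    # has no "<" or ">" characters, otherwise 0.
    sub email_looks_valid
    {
        my $email = shift;

        return 0  unless  defined $email  and  $email =~ /\S/;
        return 0  if      $email !~ /@/   or   $email =~ /[<>]/;
        return 1;
    }

    for my $email ('me@example.com', '<me@example.com>', 'no-at-sign')
    {
        printf "%-20s %s\n", $email,
            email_looks_valid($email) ? "ok" : "rejected";
    }

```

This is a plausibility check only and does not validate the address
against any mail standard.

=end comment

=cut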

# ****************************************************************************
#
#   DESCRIPTION
#
#       Remove Headers from the text array.
#
#   INPUT PARAMETERS
#
#       \@array     Text
#
#   RETURN VALUES
#
#       \@array
#
# ****************************************************************************

sub DeleteEmailHeaders ($)
{
    my $id    = "$LIB.DeleteEmailHeaders";
    my ($txt) = @ARG;

    unless (defined $txt)
    {
	warn "$id: \$txt is not defined";
	return;
    }

    my @array;
    my $body = 0;               # Set to 1 after the header block has ended
    my $line = $$txt[0];

    if ($line !~ /^[-\w]+:|^From/)
    {
        $debug  and print "$id: Skipped, no email ", $$txt[0];
	@array = @$txt;
    }
    else
    {
	for $line (@$txt)
	{
	    next if   $body == 0  and  $line !~ /^\s*$/;

	    unless ($body)
	    {
		$body = 1;
		next;                           # Ignore one empty line
	    }

	    push @array, $line;
	}
    }

    \@array;
}
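
=begin comment

The header removal above skips everything up to and including the first
empty line. A simplified stand-alone sketch of that step follows. The name
C<strip_headers> is hypothetical, and unlike DeleteEmailHeaders() the
sketch does not first test whether the text starts with a header line.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: drop all lines up to and including the
    # first empty line and return the rest.
    sub strip_headers
    {
        my @lines = @_;
        my @body;
        my $seen_blank = 0;

        for my $line (@lines)
        {
            if ($seen_blank)
            {
                push @body, $line;
            }
            elsif ($line =~ /^\s*$/)
            {
                $seen_blank = 1;    # the separator line itself is dropped
            }
        }

        return @body;
    }

    my @mail =
    (
        "From: foo\@example.com\n",
        "Subject: Weekly post\n",
        "\n",
        "Body line 1\n",
        "Body line 2\n",
    );

    print strip_headers(@mail);

```

Empty lines that appear later in the body are kept, because only the first
one separates the headers from the text.

=end comment

=cut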

# ****************************************************************************
#
#   DESCRIPTION
#
#       Make clickable url
#
#   INPUT PARAMETERS
#
#       $ref        url reference or "none"
#       $txt        text
#       $attr       [optional] additional attributes
#
#   RETURN VALUES
#
#       $string     html code
#
# ****************************************************************************

sub MakeUrlRef ($$;$)
{
    my $id = "$LIB.MakeUrlRef";
    my($ref, $txt, $attr) = @ARG;

    $attr = defined $attr ? " $attr" : "";

    qq(<a href="$ref"$attr>$txt</a>);
}
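
=begin comment

Judging from its description, MakeUrlRef() is meant to wrap the text in an
HTML anchor. A minimal stand-alone sketch of that idea is shown below. The
name C<make_anchor> is hypothetical and the exact attribute layout is an
assumption of this example.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: build an anchor from a reference, the
    # clickable text and optional extra attributes.
    sub make_anchor
    {
        my ($ref, $txt, $attr) = @_;

        $attr = (defined $attr and length $attr) ? " $attr" : "";

        return qq(<a href="$ref"$attr>$txt</a>);
    }

    print make_anchor("http://example.com/", "Example"), "\n";
    print make_anchor("#toc", "Contents", 'class="nav"'), "\n";

```

The sketch does not escape special characters in its arguments; the
caller is assumed to pass text that is already safe to embed in HTML.

=end comment

=cut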

# ****************************************************************************
#
#   DESCRIPTION
#
#       Make Picture URL tag
#
#   INPUT PARAMETERS
#
#       $ref        url reference or "none"
#       $txt        text
#       $attr       [optional] additional IMG attributes
#       $align      [optional] How to align picture: "left", "right",
#       $count      [optional] Picture number
#
#   RETURN VALUES
#
#       $string     html code
#
# ****************************************************************************

{
    my $staticReference = "";

sub MakeUrlPicture (%)
{
    my $id = "$LIB.MakeUrlPicture";

    my %arg     = @ARG;
    my $ref     = $arg{-url};
    my $txt     = $arg{-text};
    my $attr    = $arg{-attrib};
    my $align   = $arg{-align};
    my $nbr     = $arg{-number};

    if (not defined $align  or  not $align)
    {
	$align  = "left";
    }

    unless ($staticReference)
    {
	$staticReference = Language(-pic);
    }

    my $picText;
    $picText = "$staticReference $nbr. " if $nbr;

    my $alt;
    $alt = qq(alt="[$picText $ref]")  if  $PICTURE_ALT;

    #  td     .. align="center" valign="middle"
    #  table  .. width="220" height="300"
    #  img    .. width="180" height="250"

    my $ret = << "EOF";

$picText$txt
EOF $ret; }} # **************************************************************************** # # DESCRIPTION # # Check if Module is available. # # INPUT PARAMETERS # # $module Like 'LWP::UserAgent' # # RETURN VALUES # # 0 Error # 1 Ok, Module is present # # **************************************************************************** sub CheckModule ($) { my $id = "$LIB.CheckModule"; my ($module) = @ARG; # exists $INC{ $module ); eval "use $module"; $debug and warn "$id: $module => eval [$EVAL_ERROR] \n"; return 0 if $EVAL_ERROR; 1; } # **************************************************************************** # # DESCRIPTION # # Translate html back tho HTML href # <a href="... => "; }egi; s,</a>,,gi; $ARG; } # **************************************************************************** # # DESCRIPTION # # Translate html to text # # INPUT PARAMETERS # # $line html # # RETURN VALUES # # $line text # # **************************************************************************** sub XlatHtml2tag ($) { my $id = "$LIB.XlatHtml2tag"; local $ARG = shift; # According to "Mastering regular expressions: O'Reilly", the # /i is slower than charset [] # # s/a//i is slow # s/[aA]// is faster # s,»,,g; s,&,\&,gi; s,>,>,gi; s,<,<,gi; s,",\",gi; # dummy-comment to close opened quote (") # The special alphabet conversions s,ä,\xE4,g; # 228 Finnish a s,Ä,\xC4,g; # 196 s,ö,\xF6,g; # 246 Finnish o s,Ö,\xD6,g; # 214 s,å,\xE5,g; # 229 Swedish a s,Å,\xC5,g; # 197 s,ø,\xF8,g; # 248 Norweigian o s,Ø,\xD8,g; # 216 s,Ü,\xDC,g; # German big U diaresis s,ü,\xFC,g; s,ß,\xDF,g; # German ss s,§,§,g; # Law-sign s,½,½,g; # 1/2-sign s,£,\xA3,g; $ARG; } # **************************************************************************** # # DESCRIPTION # # Translate _word_ =word= *word* markup to HTML # # INPUT PARAMETERS # # $ARG string # $type -basic, Translate only the most basic things. 
# # RETURN VALUES # # $html # # **************************************************************************** { my $staticBegBold; my $staticEndBold; my $staticBegEmp; my $staticEndEmp; my $staticBegSmall; my $staticEndSmall; my $staticBegBig; my $staticEndBig; my $staticBegRef; my $staticEndRef; my $staticBegSup; my $staticEndSup; my $staticBegSub; my $staticEndSub; my $staticBegQuote; my $staticEndQuote; sub XlatWordMarkup ($; $) { my $id = "$LIB.XlatWordMarkup"; local $ARG = shift; my $type = shift; $debug > 2 and print "$id: INPUT $ARG"; return unless $ARG; # Prevent hash lookup, when these are set once. unless ($staticBegBold) { $staticBegBold = $COLUMN_HASH{ begbold }; $staticEndBold = $COLUMN_HASH{ endbold }; $staticBegEmp = $COLUMN_HASH{ begemp }; $staticEndEmp = $COLUMN_HASH{ endemp }; $staticBegSmall = $COLUMN_HASH{ begsmall }; $staticEndSmall = $COLUMN_HASH{ endsmall }; $staticBegBig = $COLUMN_HASH{ begbig }; $staticEndBig = $COLUMN_HASH{ endbig }; $staticBegRef = $COLUMN_HASH{ begref }; $staticEndRef = $COLUMN_HASH{ endref }; $staticBegSup = $COLUMN_HASH{ superscriptbeg }; $staticEndSup = $COLUMN_HASH{ superscriptend }; $staticBegSub = $COLUMN_HASH{ subscriptbeg }; $staticEndSub = $COLUMN_HASH{ subscriptend }; $staticBegQuote = $COLUMN_HASH{ begquote }; $staticEndQuote = $COLUMN_HASH{ endquote }; } my ($beg, $end); my $prefix = '(?:[\s>=+*_\"()]|^)'; # Handle `this' text $beg = $staticBegQuote; $end = $staticEndQuote; s,($prefix)\`(\S+?)\',$1$beg$2$end,g; $debug > 3 and print "$id: after \`this' [$ARG]"; # Handle _this_ text # # The '>' is included in the start of the regexp because this # may be the end of html tag and there may not be a space # # `;' is included because the HTML is already expanded, like # quotation mark(") becomed " $beg = $staticBegBold; $end = $staticEndBold; s,($prefix)_(\S+?)_,$1$beg$2$end,g; $debug > 3 and print "$id: after _this_ [$ARG]"; # Handle *this* text $beg = $staticBegEmp; $end = $staticEndEmp; $debug > 3 and print "$id: 
after *this* [$ARG]"; if ( s,($prefix)\*(\S+?)\*,$1$beg$2$end,g) { # For debug only # warn "$id: $ARG"; # die if m,Joka,; } $debug > 3 and print "$id: after *this2* [$ARG]"; # Handle =small= text $beg = $staticBegSmall; $end = $staticEndSmall; s{ ($prefix) =(\S+)= } {$1$beg$2$end}gx; $debug > 3 and print "$id: after =this= [$ARG]"; $beg = $staticBegBig; $end = $staticEndBig; s,($prefix)\+(\S+?)\+,$1$beg$2$end,g; $debug > 3 and print "$id: after +this+ [$ARG]"; unless ($type eq -basic) { # [Mike] referred to [rfc822] $beg = $staticBegRef; $end = $staticEndRef; s{ ($prefix) \[ ([[:alpha:]]\S*) \] ([\s,.!?:;]|$) } {$1$beg\[$2\]$end$3}gx; $debug > 3 and print "$id: after [this] [$ARG]"; # [Figure: this here] s{ ([\s>]) \[ (\s*[^][\r\n]+[\s][^][\n\r]+) \] } {$1$beg\[$2\]$end}gx; $debug > 3 and print "$id: after [this here] [$ARG]"; # Value 1234[[10]] is base 10. $beg = $staticBegSub; $end = $staticEndSub; s{ ([^\s\'\",!?;.(<>]) \[\[ ([^][\r\n]+) \]\] ([\s\,.:;<>)]|$) } {$1$beg$2$end$3}gx; $debug > 3 and print "$id: after this[subscript] [$ARG]"; # Superscripts, raised to a "power" # professor John says[1] $beg = $staticBegSup; $end = $staticEndSup; s{ ([^\s\'\",!?;.(<>]) \[ ([^][\r\n]+) \] ([\s\,.:;)<>]|$) } {$1$beg$2$end$3}gx; $debug > 3 and print "$id: after this[superscript] [$ARG]"; } $debug > 2 and print "$id: RETURN $ARG"; $ARG; }} # **************************************************************************** # # DESCRIPTION # # Translate some special characters into Html codes. 
# # See "Standard Character entity" # http://www.stephstuff.com/ISOCactrs4.html # # INPUT PARAMETERS # # $line text # # RETURN VALUES # # $line html # # **************************************************************************** sub XlatTag2html ($) { my $id = "$LIB.XlatTag2html"; local $ARG = shift; my $localDebug = 1 if $debug > 5; $localDebug and print "$id: INPUT [$ARG]\n"; return unless $ARG; # Leave alone all HTML entities, like ² s,\&(?![a-zA-z][a-z]+[123]?;|#\d\d\d;),&,g; $localDebug and print "$id: -0- $ARG\n"; unless (/<<|>>/) { # You can write PURE HTML inside text like this: # # <> # # We do not want to translate this line into # # <
> s,\",",g; # dummy-coment " to fix Emacs font-lock highlighting } # Hand Debug. Turn this on, if converson does not work. # $localDebug = 1 if /<<|>>/; $localDebug and print "$id: -1- $ARG\n"; # This code uses negative look-behind and looh-ahead regexp. The idea # is that # # <> is rendered as embedded # ),<<,g; # # Because it converts: # # <
# | # Can't know that there is not yet ">" like in <
> # # Whereas this would be valid # # cat file <>,$1>>,go; s,<<($re),<<$1,go; $localDebug and print "$id: -2- $ARG\n"; s,(?)>(?!>),>,g; s,(?>, convert it into standard HTML tag. s,>>,>,g; s,<<,<,g; $localDebug and print "$id: -4- $ARG\n"; # The special alphabet conversions s,\xE4,ä,g; # 228 Finnish a s,\xC4,Ä,g; # 196 s,\xF6,ö,g; # 246 Finnish o s,\xD6,Ö,g; # 214 s,\xE5,å,g; # 229 Swedish a s,\xC5,Å,g; # 197 s,\xF8,ø,g; # 248 Norweigian o s,\xD8,Ø,g; # 216 # German characters s,\xDC,Ü,g; # big U diaresis s,\xFC,ü,g; s,\xDF,ß,g; # ss # French s,\xE9,é,g; # e + forward accent (') s,\xC9,É,g; # Spanish s,\xD1,ñ,g; # n + accent (~) s,\xF1,Ñ,g; # Other signs s,\xA7,§,g; # Law-sign s,\xBD,½,g; # 1/2-sign s,\xA3,£,g; # Pound s,\xAB,«,g; # << s,\xBB,»,g; # >> $debug and print "$id: RET [$ARG]\n"; $ARG; } # **************************************************************************** # # DESCRIPTION # # Translate convertions in this program's markup to HTML. # Like "--" will become – # # INPUT PARAMETERS # # $line text # # RETURN VALUES # # $line html # # **************************************************************************** sub XlatTag2htmlSpecial ($) { my $id = "$LIB.XlatTag2htmlSpecial"; local $ARG = shift; return unless $ARG; # -- long dash s,(\s)--(\s|$),$1–$2,g; # +-40 s,([+][-]|[-][+])(\d),±$2,g; # European Union currency: 400e s,(\d)e(\s|$),$1 €$2,g; # Some frequent tokens, like # (C) Copyright) sign, # (R) Registered trade mark # 3 (0)C Celsius degrees s,([.\,;\s\d ])\Q(C)\E([\s\w]),$1©$2,g; s,([.\,;\s\d ])\Q(0)\E([\s\w]),$1°$2,g; s,([.\,;\s\d ])\Q(R)\E([\s\w]),$1®$2,g; $debug and print "$id: RET [$ARG]\n"; $ARG; } # **************************************************************************** # # DESCRIPTION # # Translate $REF special markers to clickable html. 
# A reference link looks like # # #REF link-to; shown text; # # INPUT PARAMETERS # # $line # # RETURN VALUES # # $html # # **************************************************************************** sub XlatRef ($) { my $id = "$LIB.XlatRef"; local $ARG = shift; if (/(.*)#REF\s+(.*)\s*;(.*);(.*)/) { # There already may be absolute reference, check it first # # http:/www.example.com#referece_here # $s2 = "#$s2" if not /(\#REF.+\#)/ and /ftp:|htp:/; $debug and print "$id: #REF--> [$1]\n [$2]\n [$3]\n [$ARG]"; $ARG = $1 . MakeUrlRef($2, $3) . $4; unless ($ARG =~ /#|https?:|file:|news:|wais:|ftp:/) { warn "$id: Suspicious REF. Did you forgot # or http?\n\t$ARG" } $debug and print "$id:LINE[$ARG]"; } elsif (/#REF.+#/) { warn "$id: Suspicious #REF format [$ARG]. Must contain hash-sign(#)"; } $debug > 2 and print "$id: RET [$ARG]\n"; $ARG; } # **************************************************************************** # # DESCRIPTION # # Translate PIC special markers to pictures # # #PIC link-to; caption text; image-attributes; # # INPUT PARAMETERS # # $line # # RETURN VALUES # # $html # # **************************************************************************** { my $staticPicCount = 0; sub XlatPicture ($) { my $id = "$LIB.XlatPicture"; local $ARG = shift; if (/(.*)#PIC\s+([^#]+[^ #])\s*#\s*(.*)#\s*(.*)#\s*(.*)#(.*)/) { my ($before, $url, $text, $attr, $align, $rest) = ($1, $2, $3, $4, $5, $6); # This is used to number each picture as it appears $staticPicCount++; # There already may be absolute reference, check it first # # http:/www.example.com#referece_here $debug and warn "$id: #PIC-->\n\$1[$1]\n\$2[$2]\n\$3[$3]\nLINE[$ARG]"; my $pictureHtml = MakeUrlPicture -url => $url , -text => XlatWordMarkup($text, -basic) , -attrib => $attr , -align => $align , -number => $staticPicCount ; $ARG = $before . $pictureHtml . $rest; # Try finding .gif .jpg .png or something ... unless (m,\.[a-z][a-z][a-z],i) { warn "$id: Suspicious #PIC [$ARG]. Did you forgot .png .jpg ...?" 
} $debug and warn "$id:LINE[$ARG]"; } elsif (/#PIC.*#/) { warn "$id: Suspicious #PIC format [$ARG]. Must have 3 separators(#)"; } $debug > 2 and print "$id: RET [$ARG]\n"; $ARG; }} # **************************************************************************** # # DESCRIPTION # # Search all named directived that start with #T2HTML- # and return their values. The lines are removed from the text. # # #T2HTML-TITLE This is the HTML file title # #T2HTML-EMAIL foo@somewhere.net # ... # # INPUT PARAMETERS # # @content The HTML file. # # RETURN VALUES # # \%directives key => [ value, value ...] # @content Lines matching #T2HTML have been removed. # # **************************************************************************** sub XlatDirectives (@) { my $id = "$LIB.XlatDirectives"; my (@content) = @ARG; ! @content and die "$id: \@content is empty"; local $ARG; my (@ret, %hash); $debug and print "$id: line count: ", scalar @content, "\n"; for (@content) { if (/^(.*)\s*#T2HTML-(\S+)\s+(.*\S)/i) # Directive + value { $debug > 2 and warn "$id: if-1 [$ARG]\n"; my ($line, $name, $value) = ($1, $2, $3); $debug > 2 and warn "$id: if-2 ($name,$value,[$line])\n"; push @ret, $line . "\n" if $line =~ /\S/; $name = lc $name; next if $name =~ /comment/i; $verb > 1 and print "$id: if-3 [$name] = [$value]\n"; unless (defined $hash{$name}) { $hash{ $name } = [$value]; } else { my $arrRef = $hash{ $name }; push @$arrRef, $value; $hash{ $name } = $arrRef; } } elsif (/^(.*)\s*#T2HTML-(\S+)/i) # Plain directive { # Empty directive $debug and print "$id: elsif 1 $ARG"; my $line = $1; $debug > 2 and warn "$id: elsif 2 [$line]\n"; push @ret, $line if $line =~ /\S/; } else { push @ret, $ARG; } } $debug and PrintHash("$id: RET", \%hash); \%hash, @ret; } # **************************************************************************** # # DESCRIPTION # # Check if we accept URL. Any foo|bar|baz|quu|test or the like # is discarded. 
In exmaples, you should use "example" domain # that is Valud, but non-sensial. (See RFCs for more) # # http://www.example.com/ # ftp:/ftp.example.com/ # # INPUT PARAMETERS # # $url # # RETURN VALUES # # 1, 0 # # **************************************************************************** sub AcceptUrl($) { if ($ARG[0] !~ m,\b(foo |baz |quu[zx])\b |:/\S*\.?example\. |example\.(com|net|org) |:/test\. ,x ) { 1; } else { 0; } } # **************************************************************************** # # DESCRIPTION # # Translate URL special markers for inline texts # # #URL # # INPUT PARAMETERS # # $line # # RETURN VALUES # # $html # # **************************************************************************** sub XlatUrlInline ($) { my $id = "$LIB.XlatUrlInline"; local $ARG = shift; s { (.*?) \#URL \s* < (.+?) > \s* < (.+?) > } { my $before = $1; my $url = $2; my $inline = $3; qq($before$inline); }gmex; $debug > 2 and print "$id: RET [$ARG]\n"; $ARG; } # **************************************************************************** # # DESCRIPTION # # Translate url references to clickable html format # # INPUT PARAMETERS # # $line # # RETURN VALUES # # $html # # **************************************************************************** sub XlatUrl ($) { my $id = "$LIB.XlatUrl"; local $ARG = shift; my ($url, $pre); # Already handled? return $ARG if /a href/i; s { ([^\"]?) # Emacs font-lock comment to terminate opening " (?. New sentence starts here. [^][\s<>]+ # Beginning and "in between characters" [^\s,.!?;:<>] # End character for URL, not a sentence punctuation ) } { $pre = $1; $url = $2; $debug > 4 and print "$id: PRE=[$pre] URL=[$url]\n"; # Unfortunately the Link that is passed to us has already # gone through conversion of "<" and ">" as in # so we must treat the ending # ">" as a separate case my $last = ""; if ($url =~ /(>?.*)/i) { $last = $1; $url =~ s/>?.*//; } # Do not make -http://some.com clickable. Remove "-" in # front of the URL. 
my $clickable = 1; if ($pre =~ /-/) { $clickable = 0; $pre = ""; } $debug > 4 and print "$id: ARG=[$ARG] pre=[$pre] url=[$url] " , " click=$clickable, accept=", AcceptUrl $url, "\n"; if (not $clickable or not AcceptUrl $url) { $pre . $url . $last ; } else { # When we make HREF target to point to "_top", then # the destination page will occupy whole browser window # automatically and delete any existing frames. # # --> Destination may freely set up its own frames my $opt = qq!target="_top"! ; $opt = ''; # disabled for now. join '' , $pre , MakeUrlRef( $url, $url, $opt ) , $last ; } }egix; $debug > 2 and print "$id: RET=[$ARG]\n"; $ARG; } # **************************************************************************** # # DESCRIPTION # # Translate email references to clickable html format # # INPUT PARAMETERS # # $line # # RETURN VALUES # # $html # # **************************************************************************** sub XlatMailto ($) { my $id = "$LIB.Mailto"; local $ARG = shift; # Handle Mail references, we need while because there may be # multiple mail addresses on the line # # A special case; in text there may be written like these. They are NOT # clickable email addresses. # # References: <5dfqlm$m50@basement.replay.com> # Message-ID: <5dko56$1lv$1@news02.deltanet.com> # # Ignore certain email addresses like # foo@example.com bar@example.com ... that are used as examples # in the document. # # Ignore also any address that is like # - Leading dash # < addr@example.com> space follows character < s { (^|.) # must not start with "-" < # html < tag. ([^ \t$<>]+@[^ \t$<>]+) > } { my $pre = $1; my $url = $2; my $clickable = 1; if ($pre eq '-') { $clickable = 0; $pre = ""; } if (not $clickable or not AcceptUrl $url) { $pre . $url; } else { $pre . "" . MakeUrlRef( "mailto:$url" , $url) . 
"" } }egx; $debug > 2 and print "$id: RET [$ARG]\n"; $ARG; } # **************************************************************************** # # DESCRIPTION # # Return standard Unix date # # Tue, 20 Aug 1999 14:25:27 GMT # # The HTML 4.0 specification gives an example date in that format in # chapter "Attribute definitions". # # INPUT PARAMETERS # # $ How many days before expiring # # RETURN VALUES # # $str # # **************************************************************************** sub GetExpiryDate (;$) { my $id = "$LIB.GetExpiryDate"; my $days = shift || 60; # 60 days Expiry period, about two months gmtime(time + 60*60*24 * $days) =~ /(...)( ...)( ..)( .{8})( ....)/; "$1,$3$2$5$4 GMT"; } # **************************************************************************** # # DESCRIPTION # # Return ISO 8601 date YYYY-MM-DD HH:MM # # INPUT PARAMETERS # # none # # RETURN VALUES # # $str # # **************************************************************************** sub GetDate () { my $id = "$LIB.GetDate"; my (@time) = localtime(time); my $YY = 1900 + $time[5]; my ($DD, $MM) = @time[3..4]; my ($mm, $hh) = @time[1..2]; $debug and warn "$id: @time\n"; # Count from zero, That's why +1. sprintf "%d-%02d-%02d %02d:%02d", $YY, $MM + 1, $DD, $hh, $mm; } # **************************************************************************** # # DESCRIPTION # # Return ISO 8601 date YYYY-MM-DD HH:MM # # INPUT PARAMETERS # # none # # RETURN VALUES # # $str # # **************************************************************************** sub GetDateYear () { my $id = "$LIB.GetDateYear"; my (@time) = localtime(time); my $YY = 1900 + $time[5]; $debug and warn "$id: @time\n"; # I do not know why Month(MM) is one less that the number month # in my calendar. That's why +1. Does it count from zero? $YY; } # **************************************************************************** # # DESCRIPTION # # Return approproate sentence in requested language. 
# # INPUT PARAMETERS # # $token The name of the token to get. e.g "-toc" # # RETURN VALUES # # $string String in the set language. See --language switch # # **************************************************************************** sub Language ($) { my $id = "$LIB.Language"; XlatTag2html $LANGUAGE_HASH{ shift() }{ $LANG_ISO }; } # **************************************************************************** # # DESCRIPTION # # Add string to filename. file.html --> fileSTRING.html # # INPUT PARAMETERS # # $file filename # $string string to add to the adn of name, but before extension # $extension # # RETURN VALUES # # $file # # **************************************************************************** sub FileNameChange ($$;$) { my $id = "$LIB.FileNameChange"; my ($file, $string , $ext) = @ARG; my ($filename, $path, $extension) = fileparse $file, '\.[^.]+$'; #font ' my $ret = $path . $filename . $string . ($ext or $extension); $debug and print "$id: RET $ret\n"; $ret; } # **************************************************************************** # # DESCRIPTION # # Return frame's file name # # INPUT PARAMETERS # # $type "-frm", "-toc", "-txt" # # USE GLOBAL # # $ARG_PATH # # RETURN VALUES # # $file # # **************************************************************************** sub FileFrameName ($) { my $id = "$LIB.FileFrameName"; my $type = shift; if ($ARG_PATH ne '') { $debug and print "$id: $ARG_PATH + $type + .html\n"; FileNameChange $ARG_PATH, $type, ".html"; } } sub FileFrameNameMain() { FileFrameName "" } sub FileFrameNameToc() { FileFrameName "-toc" } sub FileFrameNameBody() { FileFrameName "-body" } # **************************************************************************** # # DESCRIPTION # # CLOSURE. Return new filename file.txt-NNN based on initial values. # Each NNN is incremented during call. # # INPUT PARAMETERS # # $file starting filename # $heading Flag. If 1, generate name from headings, instead of # numeric names. 
# # RETURN VALUES # # &Sub($) Anonymous subroutine that must be called with string. # # **************************************************************************** sub GeneratefileName ($;$) { my $id = "$LIB.GeneratefileName"; my ($file, $headings ) = @ARG; if ($headings) { return sub { my $line = shift; not defined $line and croak "You must pass one ARG"; not $line =~ /[a-z]/ and croak "ARG must contain some words. Cannot make filename"; sprintf "$file-%s", MakeHeadingName($line); } } else { my $i = 0; return sub { # this function ignores passed ARGS sprintf "$file-%03d", $i++; } } } # **************************************************************************** # # DESCRIPTION # # Write content to file # # INPUT PARAMETERS # # $file # \@content reference to array (text) or plain string. # # RETURN VALUES # # @ list of filenames # # **************************************************************************** sub WriteFile ($$) { my $id = "$LIB.WriteFile"; my ($file, $value) = @ARG; unless (defined $value) { warn "$id: \$value is not defined"; return; } open my $FILE, ">", $file or die "$id: Cannot write to [$file] $ERRNO"; binmode $FILE; my $type = ref $value; $debug and warn "$id: TYPE [$type]\n"; if ($type eq "ARRAY") { print $FILE @$value; } elsif (not $type) { print $FILE $value; } close $FILE; $debug and warn "$id: Wrote $file\n"; } # **************************************************************************** # # DESCRIPTION # # Split text into separate files file.txt-NNN, search REGEXP. # Files are ruthlessly overwritten. # # INPUT PARAMETERS # # $regexp If found. The line is discarded and anything gathered # for far is printed to file. This is the Split point. # $file Used in split mode only to generate multiple files. # $useNames Flag. If set compose filenames based on REGEXP split. 
# \@content text # # RETURN VALUES # # @ list of filenames # # **************************************************************************** sub SplitToFiles ($ $$ $) { my $id = "$LIB.SplitToFiles"; my ($regexp, $file, $useNames, $array) = @ARG; unless (defined $array) { warn "$id: [ERROR] \$array is not defined"; return; } my (@fileArray, @tmp); my $FileName = GeneratefileName $file, $useNames; local $ARG; for (@$array ) { if (/$regexp/o && @tmp) { # Get the first line that matched and use it as filename # base my ($match) = grep /$regexp/o, @tmp; my $name = &$FileName( $match); WriteFile $name, \@tmp; @tmp = (); push @tmp, $ARG; push @fileArray, $name; } else { push @tmp, $ARG; } } if ( @tmp) # last block { my $name = &$FileName($tmp[0]); WriteFile $name, \@tmp; push @fileArray, $name; } @fileArray; } # **************************************************************************** # # DESCRIPTION # # Expand environmetn variables in STRING. # # INPUT PARAMETERS # # $str String to process # # RETURN VALUES # # $out Expanded # # **************************************************************************** sub EnvExpand ($) { my $id = "$LIB.EnvExpand"; local($ARG) = @ARG; $debug and print "$id: INPUT [$ARG]\n"; # Substitution must happen so that longest match takes # precedence. my $val; for my $key (sort {length($b) <=> length($a)} keys %ENV) { $val = $ENV{$key}; s/\$$key/$val/; } $debug and print "$id: RET [$ARG]\n"; $ARG; } # **************************************************************************** # # DESCRIPTION # # Remove everything up till and after . This effectively # makes it possible to have clean HTML whis is not a "page" any more. # The portion marked with !! to the right is preserved, everything else # is stripped. # # # # ... # # # This text !! # And more of this !! 
# # # # INPUT PARAMETERS # # $str String to process # # RETURN VALUES # # $content # # **************************************************************************** sub RemoveHTMLaround ($) { my $id = "$LIB.RemoveHTML"; local($ARG) = @ARG; $debug > 2 and print "$id: [$ARG]\n"; # Remove everything up to and including the opening BODY tag # Delete everything after the closing BODY tag s,^.+<\s*body\s*>,,i; s,<\s*/\s*body\s*>.*,,i; # Malformed web pages do not even bother to use BODY, so # check for HEAD or HTML and kill those s,^.+<\s*/\s*head\s*>,,i; s,^.*<\s*html\s*>.*,,i; s,<\s*/\s*html\s*>.*,,i; $ARG; } # **************************************************************************** # # DESCRIPTION # # Return content of URL as string. # # INPUT PARAMETERS # # $url File path or HTTP URL. # # RETURN VALUES # # $content This value is empty if couldn't read URL. # # **************************************************************************** sub UrlInclude (%) { my $id = "$LIB.UrlInclude"; my %arg = @ARG; my $dir = $arg{-dir}; my $url = $arg{-url}; my $mode = $arg{-mode}; $debug and print "$id: url [$url] dir [$dir] mode [$mode]\n"; my $ret; if ($MODULE_LWP_OK and $url =~ m,http://,i) { my $ua = new LWP::UserAgent; my $req = new HTTP::Request(GET => $url); my $response = $ua->request($req); my $ok = $response->is_success(); $debug and print "$id: GET status $ok\n"; if ($ok) { $ret = $response->content(); $debug > 2 and print "$id: content BEFORE =>\n$ret\n"; $ret = RemoveHTMLaround $ret; } } else { # 1) There is no path, so use current directory # 2) It starts with a relative path ../ if ($dir and ($url !~ m,[/\\], or $url =~ m,^[.],, ) ) { $debug > 2 and print "$id: dir added: $dir + $url\n"; $url = "$dir/" . $url; } local *FILE; $url = EnvExpand $url; unless (open FILE, "<", $url) { $verb and warn "[WARN] Cannot open '$url' $ERRNO"; return; } $ret = join '', <FILE>; close FILE; if ($url =~ /\.s?html?/) { $ret = RemoveHTMLaround $ret; } $debug > 2 and print "$id: content of [$url] START:" . $ret . 
"$id: content of [$url] END:\n"; unless ($mode) { $ret = DoLineUserTags($ret); $ret = XlatTag2html $ret; $ret = XlatRef $ret; $ret = XlatUrlInline $ret; $ret = XlatUrl $ret; $ret = XlatPicture $ret; $ret = XlatMailto $ret; $ret = XlatWordMarkup $ret; } } $debug > 2 and print "$id: RET =>\n$ret\n"; $ret; } # }}} # {{{ misc - make # **************************************************************************** # # DESCRIPTION # # Return BASE. must be inside HEAD tag # # INPUT PARAMETERS # # $file html file # $attrib Additional attributes # # USES GLOBAL # # $BASE_URL # # RETURN VALUES # # $html # # **************************************************************************** sub Base (;$$) { my $id = "$LIB.Base"; my ($file, $attrib) = @ARG; if (defined $BASE_URL and $BASE_URL ne '') { qq( \n) ; } } # **************************************************************************** # # DESCRIPTION # # Return CSS Style sheet data without the tokens # The correct way to include external CSS is: # # # # RETURN VALUES # # code # # **************************************************************************** sub CssData (; $) { local ($ARG) = @ARG; $ARG = '' unless defined $ARG; my $bodyFontType = '' ; if (defined $CSS_FONT_TYPE) { # Css must end to ";", Add semicolon if it's missing. $bodyFontType = "font-family: $CSS_FONT_TYPE"; $bodyFontType .= ";" unless $bodyFontType =~ /;/; } my $bodyFontSize = ''; if (defined $CSS_FONT_SIZE) { $bodyFontSize = qq(font-size: $CSS_FONT_SIZE); $bodyFontSize .= ";" unless $bodyFontSize =~ /;/; } if (/toc/i) { $bodyFontSize = $CSS_BODY_FONT_SIZE_FRAME; } return qq( /* /////////////////////////////////////////////////////////// NOTE NOTE NOTE NOTE NOTE NOTE NOTE This is the default CSS 2.0 generated by the program, please see "t2html --help" for option --script-file to import your own CSS and Java definitions into the page. 
XHTML note: at page http://www.w3.org/TR/xhtml1/#guidelines It is recommended that CSS2 with XHTML use lowercase element and attribute names This default CSS2 has been validated according to http://jigsaw.w3.org/css-validator/validator-uri.html.en To design colors, visit: http://www.btexact.com/people/rigdence/colours/ NOTE NOTE NOTE NOTE NOTE NOTE NOTE /////////////////////////////////////////////////////////// Comments on the CSS tags: - block-width: "thin" (Netscape ok, MSIE nok) NETSCAPE 4.05 - In general does not render CSS very well. Eg font size changes does not show up in screen. - :hover property is not recognised NETSCAPE 4.75 as of 2000-10-01 - Shows garbage for stylesheet section that marked CITATION. (IE has no trouble to show it) MSIE 4.0+ - Renders CSS very well. Media types - Netscape does not transfer the CSS element definitions to the "print" media as it should. They only affect Browser or media "screen" - That is why you really have to say EM STRONG ... /STRONG EM to get that kind of text seen in printer too. You cannot just define P.column7 { ... } The \@media CSS definition is not supported by Netscape 4.05 I do not know if MSIE 4.0 supports it. So doing this would cause CSS to be ignored completely (never mind that CSS says the default CSS applies to "visual", which means both print and scree types.) \@media print, screen { P.code {..} } To work around that, we separate the definitions with P.code { .. } // For screen \@media print { P.code // for printer { .. }} And wish that some newer browser will render it right. 
*/ /* ///////////////////////////////////////////////// HEADINGS */ h1.default { font-family: bold x-large Arial,helvetica,Sans-serif; padding-top: 10pt; } h2.default { font-family: bold large Arial,Helvetica,Sans-serif; } h3.default { font-family: bold medium Arial,Helvetica,Sans-serif; } h4.default { font-family: medium Arial,Helvetica,Sans-serif; } /* ////////////////////////// Make pointing AHREF more visual */ body { $bodyFontType $bodyFontSize /* More readable font, Like Arial in MS Word The background color is grey font-family: "verdana", sans-serif; background-color: #dddddd; foreground-color: #000000; Traditional "Book" and newspaper font font-family: "Times New Roman", serif; */ } a:link { font-style: italic; } /* A name=... link references */ a.name { font-style: normal; } a:hover { color: purple; background: #AFB; text-decoration: none; } /* cancel above italic in TOC and Navigation buttons */ a.btn:link { font-style: normal; } /* each link in TOC */ a.toc { font-family: verdana, sans-serif; font-style: normal; } a.toc:link { font-style: normal; } /* [toc] heading button which appears in non-frame html */ a.btn-toc:link { font-style: normal; font-family: verdana, sans-serif; /* font-size: 0.7em; */ } /* //////////////////////////////////// Format the code sections */ /* MSIE ok, Netscape nok: Indent text to same level to the right */ blockquote { margin-right: 2em; } \@media print { BLOCKQUOTE { margin-right: 0; }} samp.code { color: Navy; } hr.special { width: 50%; text-align: left; } pre { font-family: "Courier New", monospace; font-size: 0.8em; margin-top: 1em; margin-bottom: 1em; } pre.code { color: Navy; } p.code, p.code1, p.code2 { /* margin-top: 0.4em; margin-bottom: 0.4em; line-height: 0.9em; */ font-family: "Courier New", monospace; font-size: 0.8em; color: Navy; } /* //////////////////////// tables /////////////////////////// */ table { border: none; width: 100%; cellpadding: 10px; cellspacing: 0px; } table.basic { font-family: "Courier New", 
monospace; color: Navy; } table.dashed { /* font-family: sans-serif; /* /* background: #F7DE9C; */ color: Navy; border-top: 1px #999999 solid; border-left: 1px #999999 solid; border-right: 1px #666666 solid; border-bottom: 1px #666666 solid; border-width: 94%; border-style: dashed; /* dotted */ /* line-height: 105%; */ } table.solid { font-family: "Courier New", monospace; /* afont-size: 0.8em; */ color: Navy; /* font-family: sans-serif; /* /* background: #F7DE9C; */ border-top: 1px #CCCCCC solid; border-left: 1px #CCCCCC solid; /* 999999 */ border-right: 1px #666666 solid; border-bottom: 1px #666666 solid; /* dark grey */ /* line-height: 105%; */ } /* Make 3D styled layout by thickening the boton + right. */ table.shade-3d { font-family: "Courier New", monospace; font-size: 0.8em; color: #999999; /* Navy; */ /* font-family: sans-serif; /* /* background: #F7DE9C; */ /* border-top: 1px #999999 solid; */ /* border-left: 1px #999999 solid; */ border-right: 4px #666666 solid; border-bottom: 3px #666666 solid; /* line-height: 105%; */ } .shade-3d-attrib { /* F9EDCC Light Orange FAEFD2 Even lighter Orange #FFFFCC Light yellow, lime */ background: #FFFFCC; } table tr td pre { /* Make PRE tables "airy" */ margin-top: 1em; margin-bottom: 1em; } table.shade-normal { font-family: "Courier New", monospace; /* font-size: 0.9em; */ color: Navy; } .shade-normal-attrib { /* grey: EAEAEA, F0F0F0 FFFFCC lime: F7F7DE CCFFCC pinkish: E6F1FD D8E9FB C6DEFA FFEEFF (light ... darker) slightly darker than F1F1F1: #EFEFEF; */ background: #F1F1F1; } table.shade-normal2 { font-family: "Courier New", monospace; } .shade-normal2-attrib { background: #E0E0F0; } .shade-note-attrib { /* darker is #E0E0F0; */ /* background: #E5ECF3; */ background: #E5ECF3; font-family: Georgia, "New Century Schoolbook", Palatino, Verdana, Helvetica, serif; font-size: 0.8em; } /* ..................................... colors ................. 
*/ .color-white { color: Navy; background: #FFFFFF; } .color-fg-navy { color: navy; } .color-fg-blue { color: blue; } .color-fg-teal { color: teal; } /* Nice combination: teal-dark, beige2 and beige-dark */ .color-teal-dark { color: #96EFF2; } .color-beige { color: Navy; background: #F7F7DE; } .color-beige2 { color: Navy; background: #FAFACA; } .color-beige3 { color: Navy; background: #F5F5E9; } .color-beige-dark { color: Navy; background: #CFEFBD; } .color-pink-dark { background: #E6F1FD; } .color-pink-medium { background: #D8E9FB; } .color-pink { /* grey: EAEAEA, F0F0F0 FFFFCC lime: F7F7DE CCFFCC pinkish: E6F1FD D8E9FB C6DEFA FFEEFF (light ... darker) */ background: #C6DEFA; } .color-pink-light { background: #FFEEFF; } .color-blue-light { background: #F0F0FF; } .color-blue-medium { background: #4A88BE; } /* ////////////////////////////////////////////// Format columns */ p.column3 { color: Green; } p.column5 { color: #87C0FF; /* shaded casual blue */ } p.column6 { /* #809F69 is Forest green But web safe colors are: Lighter ForestGreen: 66CC00 ForestGreen: #999966 669900 339900 669966 color: #669900; font-family: "Goudy Old Style" */ margin-left: 3em; font-family: Georgia, "New Century Schoolbook", Palatino, Verdana, Arial, Helvetica; font-size: 0.9em; } /* This is so called 3rd heading */ p.column7 { font-style: italic; font-weight: bold; } \@media print { P.column7 { font-style: italic; font-weight: bold; }} p.column8 { } p.column9 { font-weight: bold; } p.column10 { padding-top: 0; } em.quote10 { /* #FF00FF Fuchsia; #0000FF Blue #87C0FF casual blue #87CAF0 #A0FFFF Very light blue #809F69 = Forest Green , see /usr/lib/X11/rgb.txt background-color: color: #80871F ; Orange, short of # font-family: "Gill Sans", sans-serif; line-height: 0.9em; font-style: italic; font-size: 0.8em; line-height: 0.9em; color: #008080; background-color: #F5F5F5; #809F69; forest green #F5F5F5; Pale grey #FFf098; pale green ##bfefff; #ffefff; LightBlue1 background-color: #ffefff; 
................. #FFFCE7 Orange very light #FFE7BF Orange dark #FFFFBF Orange limon */ /* # See a nice page at # http://www.cs.helsinki.fi/linux/ # http://www.cs.helsinki.fi/include/tktl.css # # 3-4 of these first fonts have almost identical look # Browser will pick the one that is supported */ font-family: lucida, lucida sans unicode, verdana, arial, "Trebuchet MS", helvetica, sans-serif; background-color: #eeeeff; font-size: 0.8em; } \@media print { em.quote10 { font-style: italic; line-height: 0.9em; font-size: 0.8em; }} p.column11 { font-family: arial, verdana, helvetica, sans-serif; font-size: 0.9em; font-style: italic; color: Fuchsia; } /* /////////////////////////////////////////////// Format words */ em.word { /* #809F69 Forest green */ color: #80B06A; /*Darker Forest green */ } strong.word { } samp.word { color: #4C9CD4; font-weight: bold; font-family: "Courier New", monospace; font-size: 0.85em; } span.super { /* superscripts */ color: teal; vertical-align: super; font-family: Verdana, Arial, sans-serif; font-size: 0.8em; } span.sub { /* subscripts */ color: teal; vertical-align: sub; font-family: Verdana, Arial, sans-serif; font-size: 0.8em; } span.word-ref { color: teal; } span.word-big { color: teal; font-size: 1.2em; } span.word-small { color: #CC66FF; font-family: Verdana, Arial, sans-serif; font-size: 0.7em; } /* /////////////////////////////////////////////// Format other */ /* 7th column starting with double quote */ span.quote7 { /* color: Green; */ /* font-style: italic; */ font-family: Verdana; font-weigh: bold; font-size: 1em; } /* This appears in FRAME version: xxx-toc.html */ div.toc { font-size: 0.8em; } /* This appears in picture: the acption text beneath */ div.picture { font-style: italic; } /* This is the document info footer */ em.footer { font-size: 0.9em; } ); # end of double quote qq(); } # **************************************************************************** # # DESCRIPTION # # Return CSS Style sheet and Java Script data. 
# # USES GLOBAL # # JAVA_CODE See options. # # INPUT VALUES # # $type What page we're creating? eg: "toc" # # RETURN VALUES # # $html # # **************************************************************************** sub JavaScript (; $) { my $id = "$LIB.JavaScript"; my ($type)= @ARG; if (defined $JAVA_CODE) { $JAVA_CODE; } else { my $css = CssData $type; $css =~ s/[ \t]+$//gm; # won't work in Browsers.... # ); # end of qq() } } # **************************************************************************** # # DESCRIPTION # # Return Basic html start: doctype, head, body-start # # INPUT PARAMETERS # # $title # $baseFile [optional] The html filename at $BASE_URL # $attrib [optional] Attitional attributes # $rest [optional] Rest HTML before # # USES GLOBAL # # $BASE_URL # # RETURN VALUES # # $html # # **************************************************************************** sub HtmlStartBasic (%) { # [HTML 4.0/12.4] When present, the BASE element must appear in the # HEAD section of an HTML document, before any element that refers to # an external source. The path information specified by the BASE # element only affects URIs in the document # where the element appears. my $id = "$LIB.HtmlStartBasic"; my %arg = @ARG; my $title = $arg{-title} || '' ; my $baseFile = $arg{-file} || '' ; my $attrib = $arg{-attrib} || '' ; my $rest = $arg{-html} || '' ; $debug and print "$id: INPUT title [$title] baseFile [$baseFile] " , "attrib [$attrib] rest [$rest]\n"; my $ret = HereQuote <<"........EOF"; $HTML_HASH{doctype} $HTML_HASH{beg} $title ........EOF $ret .= join '' , JavaScript() , Base($baseFile, $attrib) , $rest , "\n\n\n" ; $ret; } # **************************************************************************** # # DESCRIPTION # # Create html tag # # Advanced net browsers can use the included LINK tags. # http://www.htmlhelp.com/reference/wilbur/ # # REL="home": indicates the location of the homepage, or # starting page in this site. 
# # REL="next" # # Indicates the location of the next document in a series, # relative to the current document. # # REL="previous" # # Indicates the location of the previous document in a series, # relative to the current document. # # NOTES # # Note, 1997-10, you should not use this function because # # a) Netscape 3.0 doesn't obey LINK HREF # b) If you supply LINK and normal HREF; then lynx would show both # which is not a good thing. # # Let's just conclude that the LINK tag is not yet handled right # in browsers. # # INPUT PARAMETERS # # $type the value of REL # $url the value for HREF # $title [optional] An advisory title for the linked resource. # # RETURN VALUES # # $string html string # # ************************************************************************** sub MakeLinkHtml ($$$) { my $id = "$LIB.MakeLinkHtml"; my($type, $url , $title) = @ARG; $title = $title ? qq(TITLE="$title") : ''; qq(\n); } # **************************************************************************** # # DESCRIPTION # # Wrap text inside comment # # INPUT PARAMETERS # # $text Text to be put inside comment block # # RETURN VALUES # # $string Html codes # # **************************************************************************** sub MakeComment ($) { my $id = "$LIB.MakeComment"; my $txt = shift; join '' , "\n\n\n\n" ; } # **************************************************************************** # # DESCRIPTION # # Create Table of contents jump table to the html page # # INPUT PARAMETERS # # \@headingArrayRef All headings in the text: 'heading', 'heading' .. # \%headingHashRef 'heading' -- 'NAME(html)' pairs # $doc [optional] Url address pointing to the document # $frame [optional] Add frame codes. # $file [optional] Needed if frame is given. 
# $author [optional] # $email [optional] # # RETURN VALUES # # @array Html codes for TOC # # **************************************************************************** sub MakeToc (%) { my $id = "$LIB.MakeToc"; my %arg = @ARG; my $headingArrayRef = $arg{-headingListRef}; my $headingHashRef = $arg{-headingHashRef}; my $doc = $arg{-doc}; my $frame = $arg{-frame}; my $file = $arg{-file}; my $author = $arg{-author}; my $email = $arg{-email}; local $ARG; my($txt, $li, $ul , $refname); my(@ret, $ref); my($styleb, $stylee, $spc, $str) = ("") x 4; my $br = $HTML_HASH{br}; my $frameFrm = basename FileFrameNameMain(); my $frameToc = basename FileFrameNameToc(); my $frameTxt = basename FileFrameNameBody(); if ($debug and $frame) { warn "$id: arg_dir $ARG_DIR $frameFrm, $frameToc, $frameTxt\n"; } if (0) # disabled now { $styleb = ""; $stylee = ""; } # ........................................................ start ... if ($frame) { push @ret, <<"........EOF"; $HTML_HASH{doctype} $HTML_HASH{beg} Navigation ........EOF push @ret, , MakeMetaTags( -author => $author, -email => $email) , qq(\n \n) , JavaScript( "toc") ; push @ret, Here <<"........EOF";
........EOF # ......................................... write frame file ... my @frame; my $head = HtmlStartBasic -title => $TITLE , -file => undef , -attrib => qq(TARGET="body") , -html => join '', MakeMetaTags(-author => $author, -email => $email) ; # push @frame, $head; # Set default value my $frameSize = qq(cols="25%,75%") if $frame !~ /=/; my $attributes = qq(frameborder="0"); # Attributes push @frame, <<"........EOF"; $HTML_HASH{beg} ........EOF WriteFile $ARG_DIR . $frameFrm, \@frame; } else { $doc = ""; my $toc = Language -toc; push @ret , MakeComment "TABLE OF CONTENT START"; push @ret, <<"........EOF";

$toc $doc

........EOF } # .................................................. print items ... $ul = 0; $frame = basename FileFrameNameBody() if $frame; for (@$headingArrayRef) { $refname = $$headingHashRef{ $ARG }; # print "\n" if not /^\s+/; $spc = ""; $spc = $1 if /^(\s+)/; $txt = $1 if /^\s*(.*)\s*$/; $li = $str = ""; if (/^ +[A-Z0-9]/) { $str = "\n
<ul>\n" if $ul == 0; $li = "\t<li>"; $ul++; } else { $str = "</ul>
\n" if $ul != 0; $ul = 0; } $ref = "#${refname}"; $ref = $frame . $ref if defined $frame; $str .= HereQuote <<"........EOF"; $spc$styleb $li $txt $stylee$br ........EOF push @ret, $str; } # The closing table element. push @ret, "\n\n"; # .......................................................... end ... if($frame) { push @ret, Here <<"........EOF";
........EOF } else { push @ret , "
\n" , MakeComment "TABLE OF CONTENT END" ; } $debug and PrintArray "$id", \@ret; @ret; } # }}} # {{{ URL Link # **************************************************************************** # # DESCRIPTION # # Link cache actions. Read, Write or check against the cache. # # INPUT PARAMETERS # # -action This can be -read, -write, -exist or -add. # Action -read is special: it enables the cache # immediately. Otherwise if -read has not been called # all the other actions are no-op. # # If argument is -write, the -arg is ignored, because a # write file request is only action. # # -arg [optional] Parameter for actions. # -code [optional] HTTP code to acctach with the URL (-arg). # Used with -add option. # # # RETURN VALUES # # If action is -check, then URL link is checked # against the cache. A true value is returned if the # link is already there. # # If -read, then a true value indicates that the # cache file could be opened and read. # # **************************************************************************** { my $staticActive = 0; my $staticFile; my %staticLinkCache; sub LinkCache (%) { my $id = "$LIB.LinkCache"; my %arg = @ARG; local $ARG = $arg{-action} || "" ; my $arg = $arg{-arg} || "" ; my $code = $arg{-code} || 200; my $ret = 1; if ($debug > 1) { print "$id: action [$ARG] arg [$arg] " , "act [$staticActive] code [$code]\n"; } if (/-read/) { $staticActive = 1; # start using cache $staticFile = $arg; local *FILE; # It is not an serious error if we can't open the cache. # This means, that user has deleted cache file and forcing # a full scan of every link. 
unless (open FILE, "<", $arg) { $verb > 1 and warn "$id: Cannot open $arg $ERRNO"; $ret = 0; } else { $verb and print "$id: reading [$arg]\n"; while (<FILE>) { # Filter out empty lines and extra spaces s/^\s+//; s/\s+$//; $staticLinkCache{ $ARG } = $HTTP_CODE_OK if $ARG; $debug > 2 and print "$id: -read => $ARG\n"; } close FILE; } } elsif ($staticActive and /-write/) { $arg = $staticFile; # Same as used in open $verb and print "$id: writing [$arg]\n"; my $stat = open my $FILE, ">", $arg; unless ($stat) { not $QUIET and warn "$id: Cannot write $arg $ERRNO"; $ret = 0; } else { binmode $FILE; # PrintHash "$id", \%staticLinkCache; while (my($url, $ccode) = each %staticLinkCache) { if ($ccode != $HTTP_CODE_OK) { $debug > 2 and print "$id: Ignored $url $ccode\n"; next; } $debug > 2 and print "$id: write => $url\n"; if ($url) { print $FILE $url, "\n"; } } close $FILE; } } elsif (/-add/) { $staticLinkCache{ $arg } = $code; $ret = 1; $debug > 1 and print "$id: added ok\n"; } elsif (/-exist/) { $ret = exists $staticLinkCache{$arg} ? $staticLinkCache{$arg} : 0; $verb > 1 and print "$id: exist status [$ret]\n"; } elsif ($staticActive) { die "$id: Unknown action [$ARG] arg [$arg]"; } $ret; }} # *************************************************************** &link ****** # # DESCRIPTION # # Update status code in link hash # # INPUT PARAMETERS # # -url string containing the link or pure URL link # -error status code of the link check # -text message that explains the status code # # RETURN VALUES # # Global %LINK_HASH is updated too with key 'link' -- 'response' # # **************************************************************************** sub LinkHash (%) { my $id = "$LIB.LinkHash"; my %arg = @ARG; my $url = $arg{-url}; my $error = $arg{-error}; my $text = $arg{-text}; $LINK_HASH{ $url } = $error; # This is a new error code, record its text. 
if (not defined $LINK_HASH_CODE{$error}) { $LINK_HASH_CODE{ $error } = $text; } } # **************************************************************************** # # DESCRIPTION # # Check if link is valid # # INPUT PARAMETERS # # $str string containing URL # # RETURN VALUES # # $nbr Error code. # $txt Error text # # **************************************************************************** sub LinkCheckLwp ($) { my $id = "$LIB.LinkCheckLwp"; my ($url) = @ARG; $debug and print "$id: processing... $url\n"; my $code = LinkCache -action => '-exist', -arg => $url; if ($code == $HTTP_CODE_OK) { # Found from cache. Last check gave OK to this link $debug > 1 and print "$id: Return; cached value $code $url\n"; return $code, "local-cache"; } # Note: 'HEAD' request doesn't actually download the # whole document. 'GET' would. # # Code 200 is "OK" response my $ua = new LWP::UserAgent; my $request = new HTTP::Request('HEAD', $url); my $obj = $ua->request($request); my $ok = $obj->is_success; my $status = $ok; my $txt = $obj->message; $debug and printf "$id: HEAD response [$ok] code [%d] msg [%s]\n" , $obj->code , $obj->message ; LinkCache -action => '-add' , -arg => $url , -code => $obj->code ; # GET request is disabled because it would call 2 time on # fialure. Trust HEAD all the way. unless (0 and $status != $HTTP_CODE_OK) { # Hm, # HEAD is not the total answer because there are still servers # that do not understand it. If the HEAD fails, revert to GET. HEAD # can only tell you that a URL has something behind it. It can't tell # you that it doesn't, necessarily. 
my $ua2 = new LWP::UserAgent; my $request2 = new HTTP::Request('GET', $url); my $obj2 = $ua2->request($request2); $status = $obj2->code; $txt = $obj2->message; $debug and printf "$id: GET response [$ok] code [%d] [%s]\n" , $obj2->code , $txt ; } unless ($status != $HTTP_CODE_OK) { LinkHash -url => $url, -error => $status, -text => $txt; } $status, $txt; } # **************************************************************************** # # DESCRIPTION # # Check if link is valid # # INPUT PARAMETERS # # -url string containing the link or pure URL link # # RETURN VALUES # # $nbr Error code. # $txt Error text # Global %LINK_HASH is updated too with key 'link' -- 'response' # # **************************************************************************** sub LinkCheckExternal (%) { my $id = "$LIB.LinkCheckExternal"; my %arg = @ARG; my $url = $arg{-url}; $debug and print "$id: Checking... $url\n"; my $regexp = 'example\.(com|org|net|info|biz)' . '|http://(localhost|127\.(0.0.)?1' . '|foo|bar|baz|quuz)\.' ; my($ret, $txt) = (0, ""); if ($url =~ /$regexp/o) { $verb and print "$id: Link [$url] excluded by regexp [$regexp]\n"; } elsif ($MODULE_LWP_OK) { ($ret, $txt) = LinkCheckLwp $url; } $debug and warn "$id: RET [$ret] URL [$url] TEXT [$txt]\n"; $ret, $txt; } # **************************************************************************** # # DESCRIPTION # # Convert html into ascii by just stripping anything between # < and > written 1996-04-21 by Michael Smith for WebGlimpse # # INPUT PARAMETERS # # \@arrayRef text lines # # RETURN VALUES # # @ # # **************************************************************************** sub Html2txt ($) { my $id = "$LIB.Html2txt"; my $arrayRef = shift; unless (defined $arrayRef) { warn "$id: [ERROR] \$arrayRef is not defined"; return; } my (@ret, $carry, $comment); for (@$arrayRef) { if (0) # enable/disable comment stripping { $comment = 1 if /<!--/; $comment = 0 if /-->/; next if $comment; } if ($carry) { # remove all until the first > next if not s/[^>]*>// ; # 
if we didn't do next, it succeeded -- reset carry $carry = 0; } while(s/<[^>]*>//g) { } if(s/<.*$//) { $carry = 1; } $ARG = XlatHtml2tag $ARG; push @ret, $ARG; } $debug and print "$id: RET => [[[@ret]]]\n"; @ret; } # **************************************************************************** # # DESCRIPTION # # Read external links. # http://search.cpan.org/author/PODMASTER/HTML-LinkExtractor-0.07/LinkExtractor.pm # INPUT PARAMETERS # # %arg Options # # RETURN VALUES # # % all found links 'line nbr' => link # # # **************************************************************************** sub ReadLinksLinkExtractor (%) { my $id = "$LIB.ReadLinksLinkExtractor"; my %arg = @ARG ; my $file = $arg{-file}; # also URL my $arrayRef = $arg{-array}; unless (defined $arrayRef) { warn "$id: [ERROR] \$arrayRef is not defined"; return; } local $ARG = join '', @$arrayRef; my (@list, $base); $base = $file if $file =~ m,http://,i; local *callback = sub { my($tag, %links) = @ARG; # Only look at "A" HREF links if ($tag eq "a" ) { while (my($key, $ref) = each %links) { # Reference to URI::URL object my $url = $ref->as_string(); push @list, $url; } } }; my $parser = HTML::LinkExtractor->new(\&callback, $base); # $debug > 2 and print "$id: Calling parse() => $ARG"; $parser->parse($ARG); # Add fake line numbers, we can't get those from LinkExtractor my %ret; my $i = 1; for my $link (@list) { $ret{$i++} = $link; } %ret; } # **************************************************************************** # # DESCRIPTION # # Read external links. Any link that is started with (-) is skipped, like # -http://skip.this.net/ # # INPUT PARAMETERS (hash) # # -array \@array, list of lines. # -file local file name or remote URL. # # RETURN VALUES # # % all found links in format NN=countXX => link, where # NN is the line number and XX is the the Nth link in the same # line. 
# # **************************************************************************** sub ReadLinksBasic (%) { my $id = "$LIB.ReadLinksBasic"; my %arg = @ARG ; my $file = $arg{-file}; my $arrayRef = $arg{-array}; unless (defined $arrayRef) { warn "$id: [ERROR] \$arrayRef is not defined"; return; } local $ARG = join '', @$arrayRef; # Make on big line my ($url, %ret, $char, $link, $tmp); # ftp links cannot be checked like HTTP links. It's too slow. # Allow http://site:PORT/page.html my $base = ''; my $root = ''; if ($file =~ m,^\s*(http://[^/\s]+),) { $base = $1 . '/'; # Add trailing slash $root = $base; $debug and print "$id: ROOT $root BASE $base\n"; } my $tag = '<\s*(?:A\s+HREF|IMG\s+SRC|LINK[^<>=]+HREF)\s*'; my $urlset = '[^][<>\"\s]'; my $lastch = '[^][(){}<>\"\':;.,\s]'; my $quote = '[\"\']'; while ( m { (.?) ( # http://URL:PORT http://[-A-Za-z.\d]+(?::\d+)? # the directory part is optional # Start with X ... until X is the last character $urlset*$lastch | $tag=\s*$quote?[^<>\"'\s]+ # (') Dummy comment to fix Emacs font loack for # quotation mark from previous line ) }gmoix ) { $char = $1; $link = $2; $tmp = $PREMATCH; $debug > 4 and print "$id: raw link [$char] [$link]\n"; # Fix mismatches http://example.org/links.html> # only GET parameters can have '?': this.php?arg=1&more=2 if ($link !~ /[?]/ and $link =~ /^(.+)&/) { $link = $1; $debug > 4 and print "$id: fixed link [$link]\n"; } if ($link =~ /mailto:/) { $link = ''; } if ($link =~ m,(?:HREF|SRC)\s*=\s*$quote?(.+),oi) { # (') Dummy comment to fix Emacs font lock quotation mark # from previous line $link = $1; $debug > 2 and print "$id: LINK $link\n"; # Not an external http:// reference, so it's local link if ($base and $link !~ m,//,) { my $glue = $base; $link =~ m,^/, and $glue = $root; $link = "$glue$link"; } } $link =~ s/\s+$//; $debug > 2 and print "$id: AFTER $link\n"; if ($char eq '-') # Ignore -http://this.is/example.html { not $QUIET and warn "$id: ignored MINUS url: $ARG\n"; next; } # Do not check the 
"tar.gz" links. or "url?args" cgi calls if ($link =~ m,\.(gz|tgz|Z|bz2|rar)$|\?,) { not $QUIET and warn "$id: ignored complex url: $ARG\n"; next if m,\?,; # forget cgi programs # but try to verify at least directory $link =~ s,(.*/),$1,; } if ($link ne '') { # What is the line number so far before match? my $i = 0; $i++ while ($tmp =~ /\n/g); # There can be many links at the the same line. # Like if page is generated with a tool, which outputs whole # page as single line. my $count = 0; my $name; while (exists $ret{ $name = sprintf "$i=count%03d", $count }) { $count++; } $debug and print "$id: ADDED $id $link\n"; $ret{ $name } = $link ; } } if ($verb > 1 and not keys %ret) { print "$id: WARNING No links found\n"; } %ret; } # **************************************************************************** # # DESCRIPTION # # Read external links. Any link that is started with (-) is skipped, like # -http://skip.this.net/ # # INPUT PARAMETERS # # -array \@lines, content of web page. # -file local file name or remote URL # # RETURN VALUES # # % all found links # # **************************************************************************** sub ReadLinksMain (%) { my $id = "$LIB.ReadLinks"; my %arg = @ARG ; if ($debug) { print "$id: file => " , $arg{-file}; $debug > 6 and print " content => CONTENT_START\n" , @{ $arg{-array} } , "\n$id: CONTENT_END" ; print "\n"; } $MODULE_LINKEXTRACTOR_OK = 0; #todo: 0.07 does not work $verb and print "$id: Parsing links\n"; my %hash; if ($MODULE_LINKEXTRACTOR_OK) { %hash = ReadLinksLinkExtractor %arg; } else { %hash = ReadLinksBasic %arg; } $debug > 4 and PrintHash $id, \%hash; %hash; } # **************************************************************************** # # DESCRIPTION # # Check all links in a file # # INPUT PARAMETERS (hash) # # -file local disk filename or remote url. 
# -array \@lines, content of the file # -cache Enable Link cache # -oneline [Not used] # # RETURN VALUES # # none # # **************************************************************************** sub LinkCheckMain (%) { my $id = "$LIB.LinkCheck"; my %arg = @ARG ; my $file = $arg{-file}; my $arrayRef = $arg{-array}; my $oneLine = $arg{-oneline}; if (not defined $arrayRef or not @$arrayRef) { warn "$id: WARNING [$file] is empty\n"; return; } my %link = ReadLinksMain -array => $arrayRef , -file => $file; $debug and PrintHash "$id: LINKS", \%link; $verb and print "$id: Validating links.\n"; local $ARG; for (sort {$a <=> $b} keys %link) { my ($i) = $ARG =~ /^(\d+)/; my $lnk = $link{ $ARG }; my($status, $err) = LinkCheckExternal -url => $lnk; not $QUIET and print "$file:$i:$lnk"; my $text = ""; if ($err and $LINK_CHECK_ERR_TEXT_ONE_LINE) { ($text = $err) =~ s/\n/./; } if (not $QUIET) { print " $status $text\n"; # this print() is continuation... } elsif ($status != 0 and $status != $HTTP_CODE_OK) { printf "$file:$i:%-4d $lnk $text\n", $status; } } } # }}} # {{{ Is, testing # **************************************************************** &test ***** # # DESCRIPTION # # Check if TEXT contains no data. Empty, only whitespaces # or "none" word is considered empty text. # # INPUT PARAMETERS # # $text string # # RETURN VALUES # # 0,1 # # **************************************************************************** sub IsEmptyText ($) { my $id = "$LIB.IsEmptyText"; my $text = shift; if (not defined $text or $text eq '' or $text =~ /^\s+$|[Nn][Oo][Nn][Ee]$/ ) { return 1; } 0; } # **************************************************************** &test ***** # # DESCRIPTION # # If LINE is heading, return level of header. # Heading starts at column 0 or 4 and the first leffter must be capital. 
# # INPUT PARAMETERS # # $line # # RETURN VALUES # # 1..2 Level of heading # 0 Was not a heading # # **************************************************************************** sub IsHeading ($) { my $id = "$LIB.IsHeading"; my $line = shift; my $ret = 0; $ret = 1 if $line =~ /^([\d.]+ )?[[:upper:]]/; $ret = 2 if $line =~ /^ {4}([\d.]+ )?[[:upper:]]/; $debug > 2 and warn "$id: [$line] RET $ret"; $ret; } # **************************************************************** &test ***** # # DESCRIPTION # # If LINE is bullet, return type of bullet # # INPUT PARAMETERS # # $line line # $textRef [returned] the bullet text # # RETURN VALUES # # $BULLET_TYPE_NUMBERED constants # $Bulletnormal # # **************************************************************************** sub IsBullet ($$) { my $id = "$LIB.IsBullet"; my($line, $textRef) = @ARG; my $type = 0; # Bullet can starters: # # . Numbered list # . Numbered list # # o Regular bullet # o Regular bullet # # * Regular bullet # * Regular bullet if ($line =~ /^ {8}([*o.]) {3}(.+)/) { $$textRef = $2; # fill return value if ($1 eq "o" or $1 eq "*") { $debug and warn "$id: BULLET_TYPE_NORMAL >>$2\n"; $type = $BULLET_TYPE_NORMAL; } elsif ($1 eq ".") { $debug and warn "$id: BULLET_TYPE_NUMBERED >>$2\n"; $type = $BULLET_TYPE_NUMBERED; } } $type; } # }}} # {{{ start, end # **************************************************************************** # # DESCRIPTION # # Return HTML string containing meta tags. # # INPUT PARAMETERS # # $author # $email # $kwd [optional] # $desc [optional] # # RETURN VALUES # # @html # # **************************************************************************** sub MakeMetaTags (%) { my $id = "$LIB.MakeMetaTags"; my %arg = @ARG; my $author = $arg{-author} || '' ; my $email = $arg{-email} || '' ; my $kwd = $arg{-keywords} || '' ; my $desc = $arg{-description}|| '' ; # META tags provide "meta information" about the document. 
    #
    #   [wilbur] You can use either HTTP-EQUIV or NAME to name the
    #   meta-information, but CONTENT must be used in both cases. By using
    #   HTTP-EQUIV, a server should use the name indicated as a header,
    #   with the specified CONTENT as its value.

    my @ret;
    my $META  = "meta http-equiv";
    my $METAN = "meta name";

    #   ............................................. meta information ...

    #   META must be inside HEAD block

    push @ret, MakeComment "META TAGS (FOR SEARCH ENGINES)";

    if ($kwd =~ /\S+/  and  $kwd !~ /^\S+$/)
    {
        #   "keywords" [according to Wilbur]
        #   Provides keywords for search engines such as Infoseek or Alta
        #   Vista. These are added to the keywords found in the document
        #   itself. If you insert a keyword more than seven times here,
        #   the whole tag will be ignored!

        if ($kwd !~ /,/)
        {
            $kwd = join ",", split ' ', $kwd;

            warn "$id: META KEYWORDS must have commas (fixed): ",
                 " [$kwd]";
        }

        push @ret, qq( <$META="keywords"\n\tCONTENT="$kwd">\n\n);
    }

    #   Bind the match to $desc. A plain "=" would match against $ARG
    #   and overwrite the description with the match result.

    if ($desc =~ /\S/)
    {
        length($desc) > 1000
            and warn "$id: META DESC over 1000 characters";

        push @ret, qq( <$META="description"\n\tcontent="$desc">\n\n);
    }

    #   ................................................. general meta ...

    my $charset = qq(<$META="Content-Type" content="text/html; charset=utf-8">\n);

    push @ret, $charset;

    push @ret, qq( <$META="Expires" )
               . qq(content=")
               . GetExpiryDate()
               . qq(">\n\n)
               ;

    if (defined $author  and  $author)
    {
        $author = qq( <$META="Author"\n\tcontent="$author">\n\n);
    }

    if (defined $email  and  $email)
    {
        $email = qq( <$META="Made"\n\tcontent="mailto:$email">\n\n);
    }

    my $gen = qq( <$METAN="Generator"\n)
              . qq(\tcontent=")             #font "
              . GetDate()
              . qq( Perl program $PROGNAME v$VERSION $URL)
              .
qq(">\n) #font " ; push @ret, "$author\n", "$email\n", "$gen\n"; @ret; } # **************************************************************************** # # DESCRIPTION # # Print start of html document # # INPUT PARAMETERS # # $doc # $author Author of the document # $title Title of the document, appears in Browser Frame # $base URL to this localtion of the document. # $butt Url Button to point to "Top" # $butp Url Button to point to "Previous" # $butn Url Button to point to "next" # $metaDesc [optional] # $metaKeywords [optional] # $bodyAttr [optional] Attributes to attach to BODY tag, # e.g. when value would be "LANG=en". # $email [optional] # # RETURN VALUES # # @ list of html lines # # **************************************************************************** sub PrintStart (%) { my $id = "$LIB.PrintStart"; my %arg = @ARG; my $doc = $arg{-doc} || ''; my $author = $arg{-author} || ''; my $title = $arg{-title} || ''; my $base = $arg{-base} || ''; my $butt = $arg{-butt} || ''; my $butn = $arg{-butn} || ''; my $butp = $arg{-butp} || ''; my $metaDesc = $arg{-metaDescription} || ''; my $metaKeywords= $arg{-metaKeywords} || ''; my $bodyAttr = $arg{-bodyAttr} || ''; my $email = $arg{-email} || ''; $debug and warn << "EOF"; $id: INPUT my \%arg = @ARG; my doc = $arg{-doc} my author = $arg{-author} my title = $arg{-title} my base = $arg{-base} my butt = $arg{-butt} my butn = $arg{-butn} my butp = $arg{-butp} my metaDesc = $arg{-metaDescription} my metaKeywords= $arg{-metaKeywords} my bodyAttr = $arg{-bodyAttr} my email = $arg{-email} EOF my($str , $tmp) = ( "", ""); my @ret = (); my $link = 0; # Flag; Do we add LINK AHREF ? my $tab = " "; $title = "No title" if $title eq ''; # ................................................ start of html ... # 1998-08 Note: Microsoft Internet Explorer can't show the html page # if the comment ' ........EOF # ... ... ... ... ... ... ... ... ... ... ... ... ... ... .. push ... 
$base = Base(basename FileFrameName ""); $base = Base(basename FileFrameNameBody()) if $FRAME; push @ret, HereQuote <<"........EOF"; $title $base ........EOF push @ret, MakeMetaTags( -author => $author , -email => $email , -keywords => $metaKeywords , -description => $metaDesc ); # ....................................................... button ... my $attr; # [wc3 html 4.0 / 6.16 Frame target names] # _top # The user agent should load the document into the full, original window # (thus cancelling all other frames). This value is equivalent to _self # if the current frame has no parent. $attr = qq(target="_top" class="btn"); push @ret, MakeComment "BUTTON DEFINITION START"; unless (IsEmptyText $butp) { $tmp = "Previous document"; $link and push @ret, $tab , MakeLinkHtml("previous","$butp", $tmp); push @ret , $tab , MakeUrlRef( $butp, "[Previous]", $attr) , "\n"; } unless (IsEmptyText $butt) { $tmp = "The homepage of site"; $link and push @ret, $tab , MakeLinkHtml("home","$butt", $tmp); push @ret , $tab , MakeUrlRef( $butt, "[home]", $attr) , "\n"; } unless (IsEmptyText $butn) { $tmp = "Next document"; $link and push @ret, $tab . 
                MakeLinkHtml("next","$butn", $tmp);

        push @ret
            , $tab
            , MakeUrlRef( $butn, "[Next]", $attr)
            , "\n";
    }

    push @ret
        , JavaScript()
        , "\n\n"
        , "\n";

    $debug  and  PrintArray "$id", \@ret;

    @ret;
}

# ****************************************************************************
#
#   DESCRIPTION
#
#       Print end of html (quiet)
#
#   INPUT PARAMETERS
#
#       none
#
#   RETURN VALUES
#
#       $html
#
# ****************************************************************************

sub PrintEndQuiet ()
{
    my $id = "$LIB.PrintEndQuiet";

    $debug  and  print "$id\n";

    join ''
        , MakeComment "DOCUMENT END BLOCK"
        , "\n"
        , "\n"
        , "\n"
        ;
}

# ****************************************************************************
#
#   DESCRIPTION
#
#       Print end of html (simple)
#
#   INPUT PARAMETERS
#
#       $doc        The document filename, defaults to "document" if empty
#
#   RETURN VALUES
#
#       $html
#
# ****************************************************************************

sub PrintEndSimple ($;$)
{
    my $id = "$LIB.PrintEndSimple";
    my ($doc, $email) = @ARG;

    $debug  and  print "$id: doc [$doc] [$email]\n";

    my $date = GetDate();

    if (defined $OPT_EMAIL  and  $OPT_EMAIL)
    {
        $email = qq(Contact: <)
                 . qq($email>$HTML_HASH{br}\n)
    }

    join ''
        , MakeComment "DOCUMENT END BLOCK"
        , "\n"
        , "$HTML_HASH{hr}\n\n"
        , qq()
        , $email
        , qq(Html date: $date$HTML_HASH{br}\n)
        , "\n"
        , "\n\n"
        , "\n"
        , "\n"
        ;
}

# ****************************************************************************
#
#   DESCRIPTION
#
#       Print end of html
#
#   INPUT PARAMETERS
#
#       $doc            The document filename, defaults to "document" if empty
#       $author         Author of the document
#       $url            Url location of the file
#       $file           [optional] The disclaimer text file
#       $email          Email contact address.
#                       Without <>
#
#   RETURN VALUES
#
#       @               list of html lines
#
# ****************************************************************************

sub PrintEnd (%)
{
    my $id = "$LIB.PrintEnd";

    my %arg    = @ARG;
    my $doc    = $arg{-doc}    || "document" ;
    my $author = $arg{-author} || "";
    my $url    = $arg{-url}    || "";
    my $file   = $arg{-file};
    my $email  = $arg{-email}  || "";

    $debug  and  warn << "EOF";
$id: INPUT
    \%arg       = @ARG;
    doc         = $arg{-doc}
    author      = $arg{-author}
    url         = $arg{-url}
    file        = $arg{-file};
    email       = $arg{-email}
EOF

    my(@ret, $str);
    my $date = GetDate();
    my $year = GetDateYear();

    my ($br, $hr, $pbeg, $pend) = @HTML_HASH{qw(br hr pbeg pend)};

    #   ................................................... disclaimer ...
    #   Set default value first
    #
    #todo: Change license

    my $disc = Here <<"........EOF";
        $pbeg
        Copyright © $year by $author. This material may be
        distributed subject to the terms and conditions set forth
        in the Creative commons Attribution-ShareAlike License.
        See http://creativecommons.org/
        $pend
........EOF

    if (defined $file)     # Read the disclaimer from separate file.
    {
        local *F;

        #   Parenthesize open() so that "or die" tests the result of
        #   open() and not the truth value of $file.

        open(F, '<', $file)  or  die "$id: Can't open [$file] $ERRNO";
        binmode F;
        $disc = join '', <F>;
        close F;
    }

    #   ....................................................... footer ...

    push @ret, MakeComment "DOCUMENT END BLOCK";

    $author ne ''  and  $author = qq(Document author: $author$br);
    $url    ne ''  and  $url    = qq(Url: $url$br);

    $email  ne ''  and  $email  = qq(Contact: <)
                                  . qq($email>$br);

    $author eq ''  and  $disc = '';

    push @ret, Here <<"........EOF";
        $hr

        $disc

        This file has been automatically generated from plain text file
        with $PROGNAME
        $br

        $author
        $url
        $email
        Last updated: $date$br
........EOF

    #   ................................................. return value ...

    @ret;
}

# ****************************************************************************
#
#   DESCRIPTION
#
#       Print whole generated html body with header and footer.
#
#   INPUT PARAMETERS
#
#       The Global variables that have been defined at the start
#       are used here
#
#       $arrayRef       Content of the body already in html
#       $lines
#       $file
#       $type
#
#   RETURN VALUES
#
#       \@      Whole html
#
# ****************************************************************************

sub PrintHtmlDoc (%)
{
    my $id = "$LIB.PrintHtmlDoc";

    my %arg         = @ARG;
    my $arrayRef    = $arg{-arrayRef};
    my $lines       = $arg{-lines};
    my $file        = $arg{-file};
    my $type        = $arg{-type};
    my $title       = $arg{-title};
    my $author      = $arg{-author};
    my $email       = $arg{-email};
    my $doc         = $arg{-doc};
    my $keywords    = $arg{-metakeywords};
    my $description = $arg{-metadescription};

    $debug  and  warn << "EOF";
$id: INPUT
    \%arg       = @ARG;
    arrayRef    = $arg{-arrayRef};
    lines       = $arg{-lines};
    file        = $arg{-file};
    type        = $arg{-type};
    title       = $arg{-title};
    author      = $arg{-author};
    email       = $arg{-email};
    doc         = $arg{-doc};
    keywords    = $arg{-metakeywords};
    description = $arg{-metadescription};
EOF

    my $str;
    my $base = $BASE;                   # With filename (single file)
    $base    = $BASE_URL if $FRAME;     # directory

    #   PrintStart() reads the description from key -metaDescription

    my @ret = PrintStart
        -doc                => $doc
        , -author           => $author
        , -title            => $title
        , -base             => $base
        , -butt             => $BUT_TOP
        , -butp             => $BUT_PREV
        , -butn             => $BUT_NEXT
        , -metaDescription  => $description
        , -metaKeywords     => $keywords
        , -bodyAttr         => $HTML_BODY_ATTRIBUTES
        , -email            => $email
        ;

    unless ($AS_IS)
    {
        my @toc = MakeToc
            -headingListRef     => \@HEADING_ARRAY
            , -headingHashRef   => \%HEADING_HASH
            , -doc              => $DOC
            , -frame            => $FRAME
            , -file             => $file
            , -author           => $AUTHOR
            , -email            => $OPT_EMAIL
            ;

        if ($FRAME)
        {
            WriteFile FileFrameNameToc(), \@toc;
        }
        else
        {
            push @ret, @toc;
        }
    }

    push @ret, @$arrayRef if defined $arrayRef;

    $debug  and  print "$id: output type [$type]\n";

    if ($type eq $OUTPUT_TYPE_SIMPLE)
    {
        push @ret, PrintEndSimple $DOC, $OPT_EMAIL;
    }
    elsif ($type eq $OUTPUT_TYPE_QUIET)
    {
        push @ret, PrintEndQuiet();
    }
    else
    {
        push @ret, PrintEnd
            -doc        => $DOC
            , -author   => $AUTHOR
            , -url      => $DOC_URL
            , -file     => $DISCLAIMER_FILE
            , -email    => $OPT_EMAIL
            ;
    }

    \@ret;
}

# }}}
# {{{ misc

# **************************************************************************** # # DESCRIPTION # # Delete section "Table of contents" from text file # # INPUT PARAMETERS # # \@arrayRef whole text # # RETURN VALUES # # @ modified text # # **************************************************************************** sub KillToc ($) { my $id = "$LIB.KillToc"; my $arrayRef = shift; unless (defined $arrayRef) { warn "$id: [ERROR] \$arrayRef is not defined"; return; } my(@ret, $flag); for (@$arrayRef) { $flag = 1 if /^Table\s+of\s+contents\s*$/i; if ($flag) { # save next header next if /^Table/; if (/^[A-Z0-9]/) { $flag = 0; } else { next; } } push @ret, $ARG; } @ret; } # **************************************************************************** # # DESCRIPTION # # Read 4 first words and make heading name. Any numbering or # special marks are removed. The result is all lowercase. # # INPUT PARAMETERS # # $lien Heading string # # RETURN VALUES # # $ Abbreviated name. Suitable eg for #NAME tag. # # **************************************************************************** { # Static variables. Only used once to make constructiong regexp easier my $w = "[.\\w]+"; # A word. 
    my $ws = "$w\\s+";          # A word and A space

sub MakeHeadingName ($)
{
    my $id = "$LIB.MakeHeadingName";
    local ($ARG) = @ARG;

    $debug > 2  and  print "$id: -1- $ARG\n";

    s,&auml;,a,g;               # 228 Finnish a
    s,&Auml;,A,g;               # 228 Finnish A
    s,&ouml;,o,g;               # 246 Finnish o
    s,&Ouml;,O,g;               # 246 Finnish O
    s,&aring;,a,g;              # 229 Swedish a
    s,&Aring;,A,g;              # 229 Swedish A
    s,&oslash;,o,g;             # 248 Norwegian o
    s,&Oslash;,O,g;             # 248 Norwegian O
    s,&uuml;,u,g;               # German u diaeresis
    s,&Uuml;,U,g;               # German U diaeresis
    s,&szlig;,ss,g;             # German ss
    s,&Szlig;,SS,g;             # German SS

    #   Remove unknown HTML tags like: &copy; #255;

    s/[&][a-zA-Z]+;//g;
    s/#\d+;//g;

    #   Remove punctuation

    s/[.,:;?!\'\"\`]/ /g;

    $debug > 2  and  print "$id: -2- $ARG\n";

    #   Pick first 1-9 words for header name

    if (   /($ws$ws$ws$ws$ws$ws$ws$ws$w)/o
        or /($ws$ws$ws$ws$ws$ws$ws$w)/o
        or /($ws$ws$ws$ws$ws$ws$w)/o
        or /($ws$ws$ws$ws$ws$w)/o
        or /($ws$ws$ws$ws$w)/o
        or /($ws$ws$ws$w)/o
        or /($ws$ws$w)/o
        or /($ws$w)/o
        or /($w)/o
       )
    {
        $ARG = $1;
    }

    $debug > 2  and  print "$id: -3- $ARG\n";

    s/^\s+//;
    s/\s+$//;                   # strip trailing spaces
    s/\s/_/g;
    s/__/_/g;

    $debug > 2  and  print "$id: -4- $ARG\n";

    lc $ARG;
}}

# ****************************************************************************
#
#   DESCRIPTION
#
#       After you have checked that line is header with IsHeading()
#       the line is sent to here. It reformats the line and
#       constructs the TOC NAME reference from the first words of
#       the heading (see MakeHeadingName()).
#
#   SETS GLOBALS
#
#       @HEADING_ARRAY      'heading', 'heading' ...
#                           The headings as they appear in the text.
#                           This is used as index when reading
#                           HEADING_HASH in ordered manner.
#
#       %HEADING_HASH       'heading' -- 'NAME(html)'
#                           Original headings from text. This is ordered
#                           as the headings appear in the text.
#
#   USES STATIC VARIABLES (closures)
#
#       %staticNameHash     'NAME(html)' -- 1
#                           We must index the hash in this order to find
#                           out if a duplicate NAME clashes later in text.
#                           Remember, we only pick the first few words.
#
#       $staticCounter      Counts headings. This is used for NAME(html)
#                           reference name if NAME_UNIQ option has been
#                           turned on.
# # INPUT PARAMETERS # # $line string, header line # # $clear [optional] if sent, then clear all related values. # You should call with this parameter as a first invocation # to this function. The $line parameter is not used. # # RETURN VALUES # # none # # **************************************************************************** { my %staticNameHash; my $staticCounter; sub HeaderArrayUpdate ($; $) { my $id = "$LIB.HeaderArrayUpdate"; local $ARG = shift; my $clear = shift; $debug > 1 and warn "$id: INPUT line [$ARG] clear [$clear]\n"; if ($clear) { # Because this function "remembers" values, a NEW # file handling must first clear the hash. @HEADING_ARRAY = (); %HEADING_HASH = (); %staticNameHash = (); $staticCounter = 1; $debug > 2 and print "$id: ARRAYS CLEARED .\n"; return; } my $origHeading = $ARG; my $name = $ARG ; # the NAME html reference $debug > 2 and warn "$id: original: $ARG\n"; # When constructing names, the numbers may move, # So it is more logical to link to words only when making NAME ref. # # 11.0 Using lambda notation --> Using lambda notation s/^\s*[0-9][0-9.]*// if $FORGET_HEAD_NUMBERS; $debug > 2 and warn "$id: substitute A: $ARG\n"; # Kill characters that we do not want to see in NAME reference s/[-+,:!\"#%=?^{}()<>?!\\\/~*'|]//g; # dummy for font-lock ' $debug > 2 and warn "$id: substitute B: $ARG\n"; # Kill hyphens "Perl -- the extract language" # --> "Perl the extract language" s/\s+-+//g; s/-+\s+//g; $debug > 2 and warn "$id: substitute D: $ARG\n"; if ($NAME_UNIQ) # use numbers for AHREF name="" { $ARG = $staticCounter; } else { $ARG = MakeHeadingName $ARG; } # If MakeHeadingName() Did not get rid of all ä and other # special tokens, remove these characters now. s/[;&]//g; $debug > 2 and warn "$id: substitute E: $ARG\n"; # ........................................ check duplicate clash ... if (not defined $staticNameHash{$ARG}) # are 1-5 words unique? 
{ $debug and warn "$id: Added $ARG\n"; $staticNameHash{ $ARG } = $origHeading; # add new } else { print "$id: $staticNameHash{$ARG}"; # current value PrintHash "$id: HEADING_HASH", \%HEADING_HASH, \*STDERR; warn Here <<"............EOF"; $id: LINE NOW : $origHeading ALREADY : $staticNameHash{ $ARG } CONVERSION: [$name] --> [$ARG] Cannot pick 1-8 words to construct HTML fragment identifier, because there already is an entry with the same name. Please rename all heading so that they do not have the same first 1-5 words. Alternatively you have to turn on option --name-uniq which forces using numbered NAME fragment identifiers instead of more descriptive id strings from headings. ............EOF die; } # ............................................... update globals ... $debug and warn "$id: $origHeading -- $ARG\n"; push @HEADING_ARRAY, $origHeading; $HEADING_HASH{ $origHeading } = $ARG; $staticCounter++; $ARG; }} # close sub and static block # **************************************************************************** # # DESCRIPTION # # Prepare Heading arrays for HTML. This fucntion should be called # first before doing any heading hathering. # # INPUT PARAMETERS # # None # # RETURN VALUES # # None # # **************************************************************************** sub HeaderArrayClear () { my $id = "$LIB.HeaderArrayClear"; HeaderArrayUpdate undef, -clear; } # **************************************************************************** # # DESCRIPTION # # Start a heading. Only headings 1 and 2 are supported. 
# # INPUT PARAMETERS # # $header the full header text # $hname the NAME reference for this header # $level heading level 1..x # # RETURN VALUES # # $ ready html text # # **************************************************************************** sub MakeHeadingHtml (%) { my $id = "$LIB.PrintHeader"; my %arg = @ARG; my $header = $arg{-header}; my $hname = $arg{-headerName}; my $level = $arg{-headerLevel}; $debug and print "$id INPUT header [$header] hname [$hname]", , "level [$level]\n"; my ($ret, $button) = ("", ""); $PRINT_NAME_REFS and warn "NAME REFERENCE: $hname\n"; if (not $AS_IS and not $FRAME) { my $attr = qq( class="btn-toc"); # It doesn't matter how the FONT is reduced, it # won't make the [toc] button any smaller inside the tag. # -- too bad -- if ($OPT_HEADING_TOP_BUTTON) { my $toc = Language -toc; $button = qq() . MakeUrlRef( "#toc", "[$toc]", $attr) . "" ; } if (0) # disabled { $button = MakeUrlRef ( "#toc", qq() . "[toc]" . "" , $attr ); } } $header =~ s/^\s+//; $header = XlatTag2htmlSpecial $header; if ($level == 1) { my $hr = $AS_IS ? "" : $HTML_HASH{hr}; $ret = HereQuote << "EOF"; $HTML_HASH{p_end} $hr

$header $button

EOF } elsif ($level > 1) { $ret = << "EOF"; $HTML_HASH{p_end}

$header $button

EOF } $ret; } # }}} # {{{ Do the line, txt --> html # **************************************************************************** # # DESCRIPTION # # Return HTML table. # # INPUT PARAMETERS # # $text Text to put inside table # $styleT Style for table # $styleTD Style for TDs # # RETURN VALUES # # html # # **************************************************************************** sub HtmlTable ($$$) { my $id = "$LIB.HtmlTable"; my ($text, $stylet, $styletd) = @ARG; return qq(
$text
); } # **************************************************************************** # # DESCRIPTION # # Return HTML code that is fixed. The basic DoLine() parser # is old and work line-by-line basis when it would have been better to # to work with multiple lines. # # After all HTML has been generated, program calls this function # to give finishing touch to those glitches that remained in the # HTML. # # No, this is not optimal solution, but the practical one. # # INPUT PARAMETERS # # \@html Final HTML # # RETURN VALUES # # \@html Fixed HTML # # **************************************************************************** sub HtmlFixes ($) { my $id = "$LIB.HtmlFixes"; my ($arrRef) = @ARG; unless (defined $arrRef) { warn "$id: [ERROR] \$arrRef is not defined"; return; } local $ARG = join '', @$arrRef; if (1) # Enabled { my $tag = '\S+'; # $CSS_CODE_STYLE_NOTE; # Search
 tags and change style to "shade-note"

	s{
	    # $1
	    (
		# $2
		
		\s+    
		# $3
		\s+    
\s+
[ ]*[\r\n]+
	    )
	    # $4, $5, $6
	    (\s*$tag)(.+?)(
) \s+ } { my $orig = $1; my $classT = $2; my $classTD = $3; my $tagWord = $4; my $text = $5; my $end = $6; my $tagWord2 = XlatHtml2tag $tagWord; # Fix > ==> ">" my $tagcss = $tagWord2 =~ /$CSS_CODE_STYLE_NOTE/o; $debug > 7 and print "$id: #STYLE-CSS [$CSS_CODE_STYLE_NOTE]" , " word [$tagWord] tagWord2 [$tagWord2]" , " tagcss [$tagcss]"; my $pre = 0; my $table = $orig; my $found = 0; if ($tagcss) { $table =~ s/$classT\"/shade-note\"/g; $table =~ s/$classTD\"/shade-note-attrib\"/g; $table =~ s,<(?i:pre)>,$tagWord,; $end = ""; # remove $found = 1; } elsif ($tagWord2 =~ /t2html::(\S+)/) { # Command directives for table rendering # # #t2html::td:bgcolor="#FFFFFF":tableclass:dashed # #t2html::td:bgcolor="#FFFFFF":tableborder:1 # #t2html::td:class=color-beige my $directives = $1; $directives =~ s/_/ /g; $tagWord = ""; # Kill first line $pre = 1; # Put PRE back while ($directives =~ /([^:]+):([^:]+)/g) { my ($key, $val) = ($1, $2); # Fix for the HTML # $key = class=color-beige # => $key = class="color-beige" if ($val =~ /=/ and $val =~ /(.*)=([^\"']+)/) { $val = qq($1="$2"); } if ($key eq 'td') { $table =~ s/((?i:td[^>]+))class=.[^\"']+./$1$val/g; } elsif ($key eq 'table') { $table =~ s/((?i:table\s+))[^>]+/$1$val/g; } elsif ($key =~ /table(\S+)/) { $key = $1; $val = qq("$val") unless $val =~ /[\"']/; $table =~ s/((?i:table[^>]+))$key=.[^\"]+./$1$key=$val/g; } } } my ($para, $rest) = ("", ""); # This code is a bit hairy. # - If there a paragraph (\n\s*\n), then treat it as # individual TABLE. # - After this initial pragraph, the rest of # the text is returned back to the original

	    if ($found  and  $text =~ /\A(.+?\S)\n\s*\n(.+)/sgm)
	    {
		($para, $rest) = ($1, $2);

		$debug > 7  and  print "$id: PARAGRAPH [$para] [$rest]\n";

		$table = $orig;
		$text  = $rest;
		$pre   = 1;

		$para =  XlatWordMarkup(XlatTag2html $para);
		$para = qq($tagWord ) . $para;

		$para = HtmlTable $para, "shade-note", "shade-note-attrib";

		#  Fix HREF tags back to normal html.
		$para = XlatHtml2href $para;
	    }
	    else
	    {
		$tagcss  and $tagWord = "";
		$text = (XlatTag2html $tagWord . $text);

		$debug > 7  and  print "$id: PARAGRAPH-ELSE tagcss [$tagcss]"
				     , " found [$found] text [$text]\n";

		$found  and  $text = XlatWordMarkup $text;

		#  Fix HREF tags back to normal html.
		$text = XlatHtml2href $text;

		$debug > 7  and  print "$id: PARAGRAPH-ELSE (final) text [$text]\n";

		# Separate paragraphs
		# $text =~ s/^\s*$/    

/g; } $text .= "

\n" if $pre and $text !~ /
 7  and  print "$id: REPLACED [$ret]\n";

	    $ret;
	}esmgx;
    }

    #   There must be no gaps here
    #
    #       
    #
    #       code example
    #
    #       
# # # => #
    #       code example
    #       
s,<(pre|code)>[ \t]*[\r\n]+,<$1>\n,igm; s,(?:\s*[\n\r])( *),$1,igm; # Remove P before OL or UL - already fixed in DoLine(). # s,

(<[ou]l>),$1,igm; # Remove extra newline that is generated by

.

# already adds one empty line. # s,(
\s+),$1,gsmi; # Remove extra gap before table # s,

\s+((

),$1,gi; # Afer each heading(1), there must be paragraph marker #FIXME #TODO s,(\s+.*blockquote.*\s+)([^<]),$1

$2,gi; # Afer each heading(2), there must be paragraph marker s,(\s+)([^<]),$1

$2,gi; # Correct raw HTML entity translations. # &#35; => # s,&(#\d\d?\d?;),&$1,gi; # Remove dead code #

#

      s,

      \s+(<(?:p|[ou]l)[ >]),$1,mg; # Final clean up: remove trailing spaces s,[ \t]+$,,mg; # Restore array and put newlines back. my $str = $ARG; my @arr = map { $ARG .= "\n" } split '\n', $str; \@arr; } # **************************************************************************** # # DESCRIPTION # # Substitute user tags given at --refrence "TAG=value". The values # are stored in %REFERENCE_HASH # # INPUT PARAMETERS # # $ Plain text # # RETURN VALUES # # $ formatted html line # # **************************************************************************** sub DoLineUserTags ($) { my $id = "$LIB.DoLineUserTags"; local ($ARG) = @ARG; # ........................................ substitute user tags ... while (my($key, $value) = each %REFERENCE_HASH) { if (/$key/) { $debug and print "$id: $ARG -- KEY $key => VAL $value\n"; s,$key,$value,gi; $debug and print "$id: $ARG"; } } $debug and print "$id: RET [$ARG]\n"; $ARG; } # **************************************************************************** # # DESCRIPTION # # Return HTML code to start

       section
      #
      #   INPUT PARAMETERS
      #
      #       None.       The style is looked up in $CSS_CODE_STYLE
      #
      #   RETURN VALUES
      #
      #
      #
      # ****************************************************************************
      
      sub HtmlCodeSectionEnd ()
      {
          my $id = "$LIB.HtmlCodeSectionEnd";
      
          if ($CSS_CODE_STYLE ne -notset)
          {
      	#   This will format nicely in the generated HTML
      
      	my $html = << "EOF";
          
EOF $html; } else { "
\n"; } } # **************************************************************************** # # DESCRIPTION # # Return HTML code to start
 section
#
#   INPUT PARAMETERS
#
#       None.       The style is looked up in $CSS_CODE_STYLE
#
#   RETURN VALUES
#
#
#
# ****************************************************************************

sub HtmlCodeSectionStart ()
{
    my $id = "$LIB.HtmlCodeSectionStart";

    my $html;
    my %style =
    (
	  -d3           => ["shade-3d"      , "shade-3d-attrib"]
	, -shade        => ["shade-normal"  , "shade-normal-attrib" ]
	, -shade2       => ["shade-normal2" , "shade-normal2-attrib" ]
    );

    if($CSS_CODE_STYLE ne -notset
	and  my $arrRef = $style{$CSS_CODE_STYLE} )
    {
	my ($class, $attrib) = @{$arrRef};


	$html = << "EOF";

EOF
    }
    else
    {
	$html = qq(\n<pre>\n);
    }

    $debug > 6  and  print "$id: RET [$html]";

    $html;
}
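
# ****************************************************************************
#
#   DESCRIPTION
#
#       Illustrative sketch only; not part of the original program and
#       not called anywhere. It shows how HtmlCodeSectionStart() and
#       HtmlCodeSectionEnd() are meant to pair up around a run of code
#       lines. The name ExampleCodeSection is hypothetical. DoLine()
#       achieves the same effect incrementally with $staticPreMode.
#
#   INPUT PARAMETERS
#
#       @lines      Code lines, each ending in a newline
#
#   RETURN VALUES
#
#       $           html
#
# ****************************************************************************

sub ExampleCodeSection (@)
{
    my $id    = "$LIB.ExampleCodeSection";
    my @lines = @ARG;

    #   Open the section, emit the lines untouched, close the section.

    my $html = HtmlCodeSectionStart() . join('', @lines)
	     . HtmlCodeSectionEnd();

    $debug > 6  and  print "$id: RET [$html]";

    $html;
}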

# ************************************************************ &DoLine *******
#
#   DESCRIPTION
#
#       Add html tags on a per-line basis. This function sets some global
#       states to keep track of bullet mode etc.
#
#   USES FUNCTION STATIC VARIABLES
#
#       $staticBulletMode   When a bullet is opened, the flag is set to 1
#       $staticPreMode      Set to 1 while inside a PRE (code) section
#
#   INPUT PARAMETERS
#
#       $line
#
#   RETURN VALUES
#
#       $       formatted html line
#
# ****************************************************************************

{
    my $staticBulletMode = 0;
    my $staticPreMode    = 0;

    my $static7beg;
    my $static7end;

sub DoLine (%)
{
    # .................................................... arguments ...

    my $id = "$LIB.DoLine";

    my %arg     = @ARG;
    my $file    = $arg{-file};
    my $input   = $arg{-line};
    my $base    = $arg{-base};
    my $line    = $arg{-lineNumber};
    my $arrayRef= $arg{-lineArrayRef};

    unless (defined $arrayRef)
    {
	warn "$id: [ERROR] \$arrayRef is not defined";
	return;
    }

    not defined $input      and warn "$id: INPUT not defined?";
    not defined $line       and warn "$id: LINE not defined?";

    return "" if not defined $input;

    # ........................................................... $ARG ...

    local $ARG   = $input; chomp;
    my $origLine = $ARG;

    # ............................................... misc variables ...

    my
    (
	$s1
	, $s2
	, $hname
	, $tmp
	, $tmpLine
	, $beg
	, $end
    );

    my $spaces      = 0;
    my $bulletText  = "";
    my $i           = -1;
    my $br          = $HTML_HASH{br};

    # .................................... lines around current-line ...
    #       HEADER                  <-- search this
    #           
    #           text starts here

    my $prev2   = "";
    $prev2      = $$arrayRef[ $line -2] if $line > 1;

    my $prev    = "";
    $prev       = $$arrayRef[ $line -1] if $line > 0;

    my $next    = "";
    $next       = $$arrayRef[ $line +1] if $line +1 < @$arrayRef ;

    my $prevEmpty   = 0;
    $prevEmpty      = 1 if $prev    =~ /^\s*$/;

    my $nextEmpty   = 0;
    $nextEmpty      = 1 if $next    =~ /^\s*$/;

    # ............................................... flag variables ...

    my($AsIs, $hlevel, $isBullet);

    my $isCode      = 0;
    my $isText      = 0;
    my $isPcode     = 0;
    my $isBrCode    = 0;

    my $isPrevHdr   = 0;
    $isPrevHdr      = IsHeading $prev2   if $line > 1;

    my $isPureText  = 0;
    $tmp            = "    ";                   # 4 spaces
    $isPureText     = 1 if /^$tmp$tmp$tmp/o;    # {12}

    unless ($static7beg)
    {
	$static7beg = $COLUMN_HASH{ beg7quote };
	$static7end = $COLUMN_HASH{ end7quote };
    }

    # ................................................. command tags ...

    if  (/^( {1,11})\.([^ \t.].*)/)
    {
	# The "DOT" code at the beginning of a word. Notice that the dot
	# code is effective only at columns 1..11

	$debug > 6 and warn "BR $line <$ARG>\n";

	$isBrCode   = 1;
	$s1         = $1;
	$s2         = $2;
	$ARG = $s1 . $s2;    #       Remove the DOT control code
    }

    if (/^([ \t]+),([^ \t,].*)/)                  # The "P" tag
    {
	# Remove the command from the output.

	$isPcode    = 1;
	$s1         = $1;
	$s2         = $2;
	$ARG = $s1 . $s2;

	$debug > 6 and warn "P-code $line $ARG\n";
    }

    # .................................................. Strip lines ...
    # It is usual that there is an "End of file" marker flushed left.
    # Strip that tag away and do not interpret it as a heading. Allow
    # optional heading numbering at front.
    #
    #    1.1  End
    #    1.2.3 End of document

    if
    (
	/^([\d.]*[\d]\s+)?End\s+of\s+(doc(ument)?|file).*$
	 |
	 ^([\d.]*[\d]\s+)?End\s*$
	/xi
    )
    {
	#   This is the marker that ends the document or file. Do not
	#   print it.

	return "";
    }

    # ........................................ substitute user tags ...

    $ARG = DoLineUserTags $ARG;

    if(/#URL-BASE/)
    {
	$debug > 6 and warn ">> $ARG";

	s,#URL-BASE,$base,gi;
    }

    $ARG = XlatTag2html $ARG;

    # ......................................................... &url ...

    $ARG = XlatRef       $ARG;
    $ARG = XlatPicture   $ARG;
    $ARG = XlatUrlInline $ARG;
    $ARG = XlatUrl       $ARG;
    $ARG = XlatMailto    $ARG;

    # .................................................... url-as-is ...

    if(/(.*)#URL-AS-IS-\s*(\S+)((?:&gt;|>).*)/ or
	/(.*)#URL-AS-IS-\s*(\S+)(.*)/
      )
    {
	my $before = $1;
	my $url    = $2;
	my $after  = $3;

	#   Extract the last part after directories "dir/dir/file.doc"

	my $name   = $url;

	if ($url =~ m,.*/(.*),)
	{
	    $name = $1;
	}

	$debug > 6 and warn "URL-AS-IS>> $url";

	$url =  qq(<a href="$url">$name</a>);

	$ARG = $before . $url . $after;
    }

    # ......................................................... &rcs ...

    #   The bullet text must be examined only after the expansions
    #   in the line

    $isBullet   = IsBullet $ARG, \$bulletText;
    $bulletText = XlatTag2htmlSpecial $bulletText  if $isBullet;

    # ................................................... study line ...

    if (/^( +)[^ ]/)
    {
	$spaces = length $1;
    }

    if (/^ {8}[^ ]/o)
    {
	$isText = 1;
    }
    # elsif (/^$s1(!!)([^!\n\r]*)$/o)
    elsif (/^ {4}(!!)([^!\n\r]*)/o)
    {
	#   A special !! code means adding 
tag if (defined $2) { $ARG = qq(\n
\n) . qq(\t $2 $br \n) ; } else { $ARG = "\n
\n\t $br\n"; } } elsif ($hlevel = IsHeading $ARG) { $debug > 1 and warn "$id: IsHeading ok, $hlevel, $ARG\n"; $hname = HeaderArrayUpdate $ARG; $ARG = MakeHeadingHtml -header => $ARG , -headerName => $hname , -headerLevel => $hlevel ; return $ARG; } elsif ( /^ {12,}[^ ]/ and not $staticBulletMode and not $isBullet ) { $AsIs = 1; $isCode = 1; # Make it a little shorter by removing spaces # Otherwise the indent level is too deep $debug > 6 and print "$id: PRE before [$ARG]\n"; $ARG = substr $ARG, 6; $debug > 6 and print "$id: PRE after [$ARG]\n"; # $beg = $COLUMN_HASH{beg12}; # $end = $COLUMN_HASH{end12}; # $ARG = $beg . $ARG . $end; } elsif (/^ {7}\"(.*)\"/o) { # Remove quotes $ARG = $1; $debug > 1 and warn "pos7:$ARG\n"; $beg = $static7beg; $end = $static7end; $ARG = $beg . $ARG . $end . $br; $spaces = 8; # for

} # ...................................................... picture ... if (/IMG src=/i) { if ($line > 0 and $AsIs and $prevEmpty) { # if the Image reference #PIC is placed to the code column, # the

 tags are not good at all.

	    if ($staticPreMode)
	    {
		#   Don't leave pictures inside pre tags.

		my $html = HtmlCodeSectionEnd();

		$ARG = "$html\n\n$ARG";
		$staticPreMode = 0;
	    }
	}

	return "$ARG\n";
    }

    # .......................................................... PRE ...

    $ARG = XlatTag2htmlSpecial $ARG   unless  $AsIs;

    if ($line > 0  and  $AsIs  and  $prevEmpty)
    {
	unless ($staticPreMode)
	{
	    my $html = HtmlCodeSectionStart();
	    $ARG = $html . $ARG;

	    $staticPreMode = 1;

	    $debug > 6  and  print "$id: PRE-1 [$ARG]\n";
	}
    }

    if (not $AsIs and  $next !~ /^ {12,}[^ ]|^[\r\n]+$/)
    {
	#   Next non-empty line terminates PRE mode

	if ($staticPreMode)
	{
	    my $html = HtmlCodeSectionEnd();
	    $ARG = "$html$ARG";

	    $staticPreMode = 0;

	    $debug > 6  and  print "$id: PRE-0 [$ARG]\n";
	}
    }

    # Disabled (the leading 0 makes the test always false); not needed

    if (0  and  $staticPreMode  and $AsIs  and
	  $CSS_CODE_STYLE  ne -notset
	)
    {
	$ARG .= $br;
    }


#print "[$origLine]\n[$ARG]\n>> pre mode = $staticPreMode as = $AsIs\n\n";

    # ...................................................... bullets ...

    $debug > 1 and  warn "$id: line $line: "
		, " spaces $spaces "
		, " PrevEmpty $prevEmpty "
		, " NextEmpty $nextEmpty "
		, " isPrevHdr $isPrevHdr "
		, " hlevel $hlevel "
		, " IsBR $isBrCode "
		, " isPcode $isPcode "
		, " IsBullet $isBullet "
		, " StaticBulletMode $staticBulletMode\n"
		, "ARG[$ARG]\n"
		, "next[$next]\n"
		;

    if ($isBullet and $prevEmpty)
    {
	$s1 =   "
    "; $s1 = "
      " if $isBullet eq $BULLET_TYPE_NUMBERED; $ARG = $s1 . "\n\t
    1. " . $bulletText; $staticBulletMode = 1; $isBullet = 0; # we handled this. Marks as used. $debug > 1 and warn "______________BULLET ON [$isBullet] $ARG\n"; } if (($isBullet or $staticBulletMode) and $nextEmpty) { $s1 = "
"; $s1 = "" if $isBullet eq $BULLET_TYPE_NUMBERED; $ARG = "
  • $bulletText" if $isBullet; if (not $isPcode) { # if previous paragraph does not contain P code, # then terminate this bullet $staticBulletMode = 0; $ARG = "\t$ARG
  • \n$s1\n\n"; } else { $ARG = "\t$ARG\n

    \n"; # Continue in bullet mode } $debug > 1 and warn "______________BULLET OFF [$isBullet] $ARG\n"; $isBullet = 0; } if ($isBullet) { my $end = "\t\n" if $staticBulletMode > 1; $ARG = "$end

  • $bulletText"; $staticBulletMode++; $debug > 1 and warn "BULLET $ARG\n"; } # ...................................... determining line context ... # LOGIC: the

    and all that # # If this is column 8, suppose regular text. # See if this is begining or end of paragraph. if ($spaces == 1 or $spaces == 2) { $AsIs = $isCode = 1; } $debug > 6 and print "$id: %%P-before%% $ARG\n"; #print qq( # $spaces > 0 # # and not $isCode # # # if this the above line was header, we must not insert P tag, # # because it would double the line spacing # # BUT, if user has moved this line out of col 8, go ahead # # and (not $isPrevHdr or ($isPrevHdr and $spaces != 8 )) # # and not $hlevel # and not $isBullet # and not $staticBulletMode # # # If user has not prohibited using P code # # and not $isPcode # # # these tags do not need P tag, otw line doubles # # and not /

    /i
    #);
    
        if
        (
    	$spaces > 0
    
    	and not $isCode
    
    	# If the line above was a header, we must not insert a P tag,
    	# because it would double the line spacing.
    	# BUT, if the user has moved this line out of col 8, go ahead
    	#
    	# 2007-03-01 not used any more
    	# and (not $isPrevHdr or ($isPrevHdr and $spaces != 8 ))
    
    	and not $hlevel
    	and not $isBullet
    	and not $staticBulletMode
    
    	#   If user has not prohibited using P code
    
    	and not $isPcode
    
    	#   These tags do not need a P tag; otherwise the line spacing doubles
    
    	and not /
    /i
        )
        {
    	my $code;
    
    	$debug > 6 and
    	    print "$id: %%P-in%% prevEmpty [$prevEmpty] nextEmpty [$nextEmpty]\n";
    
    	if ($prevEmpty)
    	{
    	    if (exists $COLUMN_HASH{"beg" . $spaces})
    	    {
    		$code = $COLUMN_HASH{ "beg" . $spaces };
    		$ARG = "\n$code\n$ARG";
    	    }
    	    elsif ($spaces <= 12)
    	    {
    		$code = " class=" . qq("column) . $spaces . qq(");
    		$ARG = "\n<p$code>\n$ARG";
    	    }
    	}
    
    	if ($nextEmpty)
    	{
    	    if (exists $COLUMN_HASH{"end" . $spaces})
    	    {
    		$code = $COLUMN_HASH{ "end" . $spaces };
    		$ARG .= $code . "\n";
    	    }
    	    elsif ($spaces <= 12)
    	    {
    		# No 

    needed } } } $debug > 6 and print "$id: %%P-after%% $ARG\n"; # _WORD_ is strong # *WORD* is emphasised # The '_' must preceede whitespace and '>' which could be # html code ending character. # do not touch "code" text above 12 column amd IMAGES if (not $AsIs) { $ARG = XlatWordMarkup $ARG; # If already has /P then do nothing. if ($isBrCode and not m,

    ,i) { $ARG .= $br; } } # ...................................................... include ... if(/(.*)#INCLUDE-(\S+)(.*)/) { my $dir = dirname $file; my $before = $1; my $url = $2; my $after = $3; my $mode = ""; if ($url =~ /^raw:(.*)/) { $mode = -raw; $url = $1; } my $out = UrlInclude -dir => $dir, -url => $url, -mode => $mode; unless ($out) { warn "$id: Include error '$url' in [$file:$ARG]"; } $ARG = $before . $out . $after; } $debug > 6 and print "$id: RET [$ARG]\n"; "$ARG\n"; }} # }}} # {{{ Main # **************************************************************************** # # DESCRIPTION # # Handle htmlizing the file # # INPUT PARAMETERS # # \@content text # $filename Used in split mode only to generate multiple files. # $regexp Split Regexp. # $splitUseFileNames Use symbolic names instead of numeric filenames # when splitting. # $auto Flag or string. # If 1, write directly to .html files. no stdout # If String, then write to file. # $frame Is frame html requested # $cache boolean, start using URL cache. 
# # RETURN VALUES # # none # # **************************************************************************** sub HandleOneFile (%) { my $id = "$LIB.HandleOneFile"; my %arg = @ARG; my $txt = $arg{-array}; my $file = $arg{-file}; my $regexp = $arg{-regexp}; my $splitUseFileNames = $arg{-split}; my $auto = $arg{-auto}; my $frame = $arg{-frame}; my $linkCheck = $arg{-linkCheck}; my $linkCheckOneLine = $arg{-linkCheckOneLine}; my $title = $arg{-title}; my $author = $arg{-author}; my $doc = $arg{-doc}; my $email = $arg{-email}; my $metaDescription = $arg{-metadescription}; my $metaKeywords = $arg{-metakeywords}; unless (defined $txt) { warn "$id: [ERROR] \$txt is not defined"; return; } $debug and warn << "EOF"; $id: INPUT \%arg = @ARG; txt = $arg{-array}; file = $arg{-file}; regexp = $arg{-regexp}; splitUseFileNames = $arg{-split}; auto = $arg{-auto}; frame = $arg{-frame}; linkCheck = $arg{-linkCheck}; linkCheckOneLine = $arg{-linkCheckOneLine}; title = $arg{-title}; author = $arg{-author}; doc = $arg{-doc}; email = $arg{-email}; metaDescription = $arg{-metadescription}; metaKeywords = $arg{-metakeywords}; EOF $debug and printf "$id: File [$file] content length [%d]\n", scalar @$txt; $debug > 2 and print "$id: content <<<@$txt>>>\n"; # ........................................................ local ... my ($i, $line , @arr, $htmlArrRef); my $timeStart = time(); unless (defined @$txt[0]) { warn "$id: [$file] No input lines found"; # We got no input return; } # ..................................................... html2txt ... # - If text contain tag in the begining of file then automatically # convert the input into text if (defined @$txt[2] and IsHTML $txt) { # warn "$id: Conversion to text:\n"; # @$txt = split /\n/, Html2txt($txt); unless ($LINK_CHECK or $LINK_CHECK_ERR_TEXT_ONE_LINE) { warn "$id: [WARNING] $file looks like HTML page.\n"; die "$id: Did you meant to add option for link check? 
See --help" } } $txt = DeleteEmailHeaders $txt if $DELETE_EMAIL; # We can't remove TOC if link check mode is on, because then the line # numbers reported wouoldn't match the original if TOC were removed. @$txt = KillToc $txt unless $LINK_CHECK; # handle split marks if (defined $regexp) { @arr = SplitToFiles $regexp, $file, $splitUseFileNames, $txt; print join("\n", @arr), "\n" ; return; #todo: } # Should we ignore some lines according to regexp ? if (defined $DELETE_REGEXP and not $DELETE_REGEXP eq "") { @$txt = grep !/$DELETE_REGEXP/o, @$txt ; } @$txt = expand @$txt; # Text::Tabs if ($linkCheck) { LinkCheckMain -file => $file , -array => $txt , -oneline => $linkCheckOneLine ; return; } else { HeaderArrayClear(); for my $line (@$txt) { if (defined $line) { my $tmp = DoLine -line => $line , -file => $file , -base => $BASE_URL , -lineNumber => $i++ , -lineArrayRef => $txt ; push @arr, $tmp; } } } $htmlArrRef = PrintHtmlDoc -arrayRef => \@arr , -lines => scalar @$txt , -file => $file , -type => $OUTPUT_TYPE , -title => $title , -autor => $author , -doc => $doc , -email => $email , -metadescription => $metaDescription , -metakeywords => $metaKeywords ; $htmlArrRef = HtmlFixes $htmlArrRef; my $timeDiff = time() - $timeStart; if (length $auto) { my ($name, $path, $extension) = fileparse $file, '\.[^.]+$'; #font ' $debug and print "$id: fileparse [$name] [$path] [$extension]\n"; if ($auto =~ /../) # Suppose filename if more than 2 chars { $path = $auto; } my $htmlFile = $path . $name . ".html"; $verb and warn "$id: output automatic => $htmlFile\n"; if ($frame) { $htmlFile = FileFrameNameBody(); WriteFile $htmlFile, $htmlArrRef; # This is the file browser wants to read. 
Printed to stdout $htmlFile = FileFrameNameMain(); } else { $debug and print "$id: WRITE non-frame [$htmlFile]\n"; WriteFile $htmlFile, $htmlArrRef; } $htmlFile =~ s/$HOME_ABS_PATH/$HOME/ if defined $HOME_ABS_PATH; $PRINT and print "$name\n"; $PRINT_URL and print "file:///$htmlFile\n" } else { print @$htmlArrRef; } $time and warn "Lines: ", scalar @$txt, " $timeDiff secs\n"; } # **************************************************************************** # # DESCRIPTION # # Run the test page creation command # # INPUT PARAMETERS # # $cmd Additional option to perl command # $fileText Text file source # $fileHtml [optional] Generated Html destination # # RETURN VALUES # # None # # **************************************************************************** sub TestPageRun ($ $ ; $) { my $id = "$LIB.TestPageRun"; my ($cmd, $fileText, $fileHtml) = @ARG; not defined $fileHtml and $fileHtml = ""; print "\n Run cmd : $cmd\n"; my @ret = `$cmd`; if (grep /fail/i, @ret) { print "$id: Please run the command manually and " . "use absolute path names"; } else { print " Original text : $fileText\n" , " Generated html: $fileHtml\n" ; } print @ret if @ret; } # **************************************************************************** # # DESCRIPTION # # Print the test pages: html and txt and sample style sheet. 
# # INPUT PARAMETERS # # None # # RETURN VALUES # # None # # **************************************************************************** sub TestStyle () { return qq( /* An example CSS */ body { font-family: Georgia, "Times New Roman", times, serif; padding-left: 0px; margin-left: 30px; font-size: 12px; line-height: 140%; text-align: left; max-width: 700px; } div.toc { font-family: Verdana, Tahoma, Arial, sans-serif; margin-left: 40px; } div.toc h1 { font-family: Georgia, "Times New Roman", times, serif; margin-left: -40px; } h1, h2, h3, h4 { color: #6BA4DC; text-align: left; } h1 { font-size: 20px; margin-left: 0px; } h2 { font-size: 14px; margin: 0; margin-left: 35px; } hr { border: 0; width: 0%; } p { text-align: justify; margin-left: 3em; } pre { margin-left: 35px; } li { text-align: justify; } p.column8 { text-align: justify; } ul, ol { margin-left: 35px; } .word-ref { color: teal; } em.word { color: #809F69; } samp.word { color: #4C9CD4; font-family: "Courier New", Courier, monospace; font-size: 1em; } span.super { /* superscripts */ color: teal; vertical-align: super; font-family: Verdana, Arial, sans-serif; font-size: 0.8em; } span.word-small { color: #CC66FF; font-family: Verdana, Arial, sans-serif; } table { border: none; width: 100%; cellpadding: 10px; cellspacing: 0px; } table tr td pre { /* Make PRE tables "airy" */ margin-top: 1em; margin-bottom: 1em; } table.shade-normal { color: #777; } table.dashed { color: Navy; border-top: 1px #00e6e8 solid; border-left: 1px #00e6e8 solid; border-right: 1px #00c6c8 solid; border-bottom: 1px #00c6c8 solid; border-width: 94%; border-style: dashed; /* dotted */ } /* End of CSS */ ); } sub TestPage ($) { my $id = "$LIB.TestPage"; # ............................................. initial settings ... 
my $destdir = "."; # GetHomeDir(); # my $tmp = "$destdir/tmp"; # # $destdir = $tmp if -d $tmp; # # if (not $destdir) # { # $destdir = $TEMPDIR || $TEMP || "/tmp"; # } # # unless (-d $destdir) # { # die "[FATAL] Cannot find temporary directory to write test files to."; # } my $fileText1 = "$destdir/$PROGNAME-1.txt"; my $fileHtml1 = "$destdir/$PROGNAME-1.html"; my $fileText2 = "$destdir/$PROGNAME-2.txt"; my $fileHtml2 = "$destdir/$PROGNAME-2.html"; my $fileText3 = "$destdir/$PROGNAME-3.txt"; my $fileHtml3 = "$destdir/$PROGNAME-3.html"; my $fileText4 = "$destdir/$PROGNAME-4.txt"; my $fileHtml4 = "$destdir/$PROGNAME-4.html"; my $cssFile = "$destdir/$PROGNAME-4.css"; my $fileFrame = "$destdir/$PROGNAME-5.txt"; # ............................................. write test files ... my $cmd; my @test = grep ! /__END__/, ; unless (@test) { die "[FATAL] Couldn't read DATA. Report this problem"; } WriteFile $fileText1, \@test; WriteFile $fileText2, \@test; WriteFile $fileText3, \@test; WriteFile $fileText4, \@test; WriteFile $fileFrame, \@test; WriteFile $cssFile, TestStyle(); local $ARG = $PROGRAM_NAME; if (not m,[/\\],) { # There is no absolute dir that we could refer to ourself. # the -S forces perl to search the path, but what if the progrma # is not in the PATH yet? --> failure. print "$id: WARNING No absolute PROGRAM_NAME $PROGRAM_NAME", "$id: The automatic call may fail, if program is not in \$PATH;" ; $cmd = "perl -S $PROGRAM_NAME"; } else { $cmd = "perl $PROGRAM_NAME"; } # ..................................................... generate ... TestPageRun "$cmd --css-code-bg --css-code-note=\"(?:Notice|Note):\"" . " --css-file=\"$cssFile\"" . " --quiet --simple --out $fileText1" , $fileText1, $fileHtml1 ; TestPageRun "$cmd --as-is --css-code-bg --css-code-note=\"(?:Notice|Note):\"" . 
" --out $fileText2" , $fileText2, $fileHtml2 ; # TestPageRun # "$cmd --css-font-normal --out $fileText3" # , $fileText3, $fileHtml3 # ; # TestPageRun # "$cmd --css-font-readable --out $fileText4" # , $fileText4, $fileHtml4 # ; # my $base = $fileFrame; # TestPageRun # "$cmd --html-frame --print-url --out $fileFrame" # , $fileFrame # ; exit 0; } # **************************************************************************** # # DESCRIPTION # # Read Web page # # INPUT PARAMETERS # # page HTML page # # RETURN VALUES # # $content plain text # # **************************************************************************** { my $staticLibChecked = 0; my $staticLibStatus = 0; sub Html2Text (@) { my $id = "$LIB.Html2Text"; my (@page) = @ARG; $debug and print "$id: CONTENT =>[[[@page]]]"; unless ($staticLibChecked) { $staticLibChecked = 1; $staticLibStatus = LoadUrlSupport(); if (not $staticLibStatus and $verb) { warn "$id: Cannot Convert to HTML. Please get more Perl libraries."; } } my $content = join '', @page; my $formatter = new HTML::FormatText (leftmargin => 0, rightmargin => 76); # my $parser = HTML::Parser->new(); # $parser->parse(join '', @list); # $parser-eof(); # $verb and $HTML::Parse::WARN = 1; my $html = parse_html($content); $verb > 1 and warn "$id: Making conversion\n"; $content = $formatter->format($html); $html->delete(); # mandatory to free memory $debug and print "$id: RET =>[[[$content]]]"; $content; }} # **************************************************************************** # # DESCRIPTION # # Read Web page # # INPUT PARAMETERS # # url URL address # mode [optional] if option is [-text] convert page to text # # RETURN VALUES # # $content # # **************************************************************************** { my $staticLibChecked = 0; my $staticLibStatus = 0; sub UrlGet ($; $) { my $id = "$LIB.UrlGet"; my ($url, $opt) = @ARG; $debug and print "$id: OPT [$opt] Getting URL $url\n"; unless ($staticLibChecked) { $staticLibChecked = 1; 
$staticLibStatus = LoadUrlSupport(); if (not $staticLibStatus and $verb) { warn "$id: Cannot check remote URLs. Please get more Perl libraries."; } } unless ($staticLibStatus) { $verb and print "$id: No URL support: $url\n"; return; } my $ua = new LWP::UserAgent; my $request = new HTTP::Request( 'GET' => $url); my $obj = $ua->request($request); my $stat = $obj->is_success; unless ($stat) { warn "$id ** error: $url ", $obj->message, "\n"; return; } my $content = $obj->content(); my $ret = $content; # my $head = $obj->headers_as_string(); if ($opt) { $ret = Html2Text $content; if ($ret =~ /TABLE NOT SHOWN/) { $verb and print "$id: HTML to text conversion failed. Using original."; $ret = $content; } } $content; }} # **************************************************************************** # # DESCRIPTION # # Dtermine output directory. # # INPUT PARAMETERS # # File # # RETURN VALUES # # Sets globals ARG_PATH and ARG_DIR # # **************************************************************************** sub OutputDir ($) { my $id = "$LIB.OutputDir"; my ($file) = @ARG; $ARG_PATH = $file; $ARG_PATH = "stdin" if $file eq '-'; if ($ARG_PATH eq "stdin") { $ARG_PATH = "./stdout"; } elsif ($ARG_PATH !~ m,[/\\], or $OUTPUT_DIR) { $debug and print "$id: output dir [$OUTPUT_DIR]\n"; if (not defined $OUTPUT_DIR or $OUTPUT_DIR =~ /^\.$|^\s*$/) { $ARG_PATH = cwd(); } else { $ARG_PATH = $OUTPUT_DIR; } $debug and print "$id: arg_path 1 [$ARG_PATH]\n"; $ARG_PATH .= "/" if $ARG_PATH !~ m,/$,; $ARG_PATH .= basename $file; $debug and print "$id: arg_path 2 [$ARG_PATH]\n"; } ($ARG_FILE, $ARG_DIR) = fileparse $ARG_PATH; $debug and print "$id: RET arg_file [$ARG_FILE] arg_dir [$ARG_DIR]\n"; $ARG_FILE, $ARG_DIR; } # **************************************************************************** # # DESCRIPTION # # Get file # # INPUT PARAMETERS # # $file Can be URL # $dir Default directory # # RETURN VALUES # # none # # **************************************************************************** 
sub GetFile (%) { my $id = "$LIB.GetFile"; my %arg = @ARG; my $file = $arg{-file}; my $dir = $arg{-dir}; if (not $file and not $dir) { warn "$id: [ERROR] file and dir arguments are empty."; return; } my @content; $debug and print "$id: -file [$file] -dir [$dir]\n"; if ($file =~ m,://,) { my $content = UrlGet $file, -text; if ($content) { for my $line (split /\r?\n/, $content) { push @content, $line . "\n"; } } } else { if ($file !~ m,[\\/]|^[-~]$, and $dir) { $file = "$dir/$file"; } unless (-f $file) { warn "$id: [WARNING] does not look like a file [$file]"; return; } local *FILE; unless (open FILE, $file) { warn "$id: Cannot open [$file] $ERRNO" ; } else { @content = ; } close FILE or warn "$id: Cannot close [$file] $ERRNO"; } if ($debug > 3) { print "$id: file [$file] [$file] CONTENT-START [" , @content , "] CONTENT-END\n"; } @content; } # **************************************************************************** # # DESCRIPTION # # Initialize all global variables. # # INPUT PARAMETERS # # $verb default verbose setting # \@argvRef Original value of @ARGV # \@addArrRef [optional] Options to add to @ARGV # # RETURN VALUES # # @ARGV command line arguments that remain after processing # # **************************************************************************** sub InitArgs (%) { my $id = "$LIB.InitArgs()"; my %arg = @ARG; my $origOptVerb = $arg{-verb} || ''; my $argvRef = $arg{-argv} || []; my $addArrRef = $arg{-argvadd} || []; # Put all #T2HTML-OPTION directived first and # combine them with command line args, which should # override any user directives in file. my @argv = @$argvRef if defined $argvRef; unshift @argv, @$addArrRef if defined $addArrRef; @ARGV = @argv; $debug and PrintArray "$id: ARGV (before) ", \@ARGV; # We must undefine VERB, so that the detection will # work in command line parser. ! 
$origOptVerb and undef $verb; HandleCommandLineArgs(); $debug and PrintArray "$id: ARGV (after) ", \@ARGV; if (defined $OPT_EMAIL and $OPT_EMAIL ne '') { $OPT_EMAIL =~ s/[<>]//g; # Do this automatic fix CheckEmail $OPT_EMAIL; } @ARGV; } # **************************************************************************** # # DESCRIPTION # # Main entry point # # INPUT PARAMETERS # # none # # RETURN VALUES # # none # # **************************************************************************** sub Main () { # The --debug option is recognized in HandleCommandLineArgs() but # we want to know it immediately here my $cmdline = join ' ', @ARGV if @ARGV; if (defined $cmdline and $cmdline =~ /(^|\s)(?:-d|--debug)[\s=]*(\d+)*/) { PrintArray "Main() started - ARGV (orig) ", \@ARGV; $debug = defined $2 ? $2 : 1; } $debug and warn "main: ARGV before Initialize() call [@ARGV]\n"; Initialize(); my @origARGV = @ARGV; my $origOptVerb = 0; my $id = "$LIB.Main"; # Must be after Initialize(), defined $LIB. $debug and warn "$id: ARGV before InitArgs() call [@ARGV]\n"; @ARGV = InitArgs -verb => $origOptVerb , -argv => \@origARGV; $debug and warn "$id: ARGV after InitArgs() call [@ARGV]\n"; $origOptVerb = $verb; # ................................................... read file ... my $dir = cwd(); # One time at Emacs M-x shell buffer, these calls printed # directoried without the leading '/home'. Go figure why. # # perl -MCwd -e 'print cwd(),qq(\n);' # # A retry with 'cd' command to the same directory fixed the problem. ! -d $dir and die "$id: [PANIC] Perl cwd() returned invalid dir $dir"; unless (@ARGV) { warn "$id: No command line files, reading STDIN."; push @ARGV, "-"; } for my $url (@ARGV) { my @content = GetFile -file => $url, -dir => $dir; my ($outFile, $outDir) = OutputDir $url; # .............................................. auto detect ... 
# See if this file should be converted at all if ($OPT_AUTO_DETECT) { local $ARG; my $ok; for (@content) { /$OPT_AUTO_DETECT/o and $ok = 1, last; } unless ($ok) { $verb and print "$id: [AUTO-DETECT] skip $url\n"; next; } } # ....................................... ready to make html ... $verb and warn "$id: Handling URL [$url]\n"; # ............................................... directives ... # Read #T2HTML directives $debug > 3 and print "$id: content before\n<<<\n@content>>>\n"; my ($hashRef); ($hashRef, @content) = XlatDirectives @content; my %hash = %$hashRef; $debug > 3 and print "$id: content after\n<<<\n@content>>>\n"; # Create local function to access the hash structure. sub Hash($; $); local *Hash = sub ($; $) { my ($key, $first) = @ARG; if (exists $hash{$key}) { my $ref = $hash{$key}; my @values = $first ? @$ref[0] : @$ref; if ($debug > 2) { warn "$id.Hash: ($key, $first) => " , join('::', @values) , "\n"; } return shift @values if @values == 1; return @values; } return (); }; $debug > 1 and PrintHash "$LIB.hash before", \%hash; # Cancel all embedded options if user did not want them. %hash = () unless $OBEY_T2HTML_DIRECTIVES; my @options = Hash("option"); if (@options) { # Parse user embedded command line directives $debug and PrintArray "$id: #T2HTML-OPTION list ($url) " , \@options; InitArgs -verb => $origOptVerb , -argv => \@origARGV , -argvadd => \@options; } $debug > 1 and PrintHash "$LIB.hash after", \%hash; my $title = $TITLE || Hash("title", 1) || "No title"; my $doc = $DOC || Hash("doc", 1); my $author = $AUTHOR || Hash("author", 1); my $email = $OPT_EMAIL || Hash("email", 1); my $keywords = $META_KEYWORDS || Hash("metakeywords", 1); my $description = $META_DESC || Hash("metadescription", 1); my $auto = $OUTPUT_AUTOMATIC ? 
$outDir : ""; if (@content) { HandleOneFile -array => \@content , -title => $title , -doc => $doc , -author => $author , -email => $email , -file => $url , -regexp => $SPLIT_REGEXP , -split => $SPLIT_NAME_FILENAMES , -auto => $auto , -frame => $FRAME , -linkCheck => $LINK_CHECK , -linkCheckOneLine => $LINK_CHECK_ERR_TEXT_ONE_LINE , -metakeywords => $keywords , -metadescription => $description ; } } LinkCache -action => '-write'; } sub TestDriverLinkExtractor () { Initialize(); my $id = "$LIB.TestDriverLinkExtractor"; $debug = 1; for my $lib ("LWP::UserAgent", "HTML::LinkExtractor") { CheckModule "$lib" or die "$id: $lib [ERROR] $ERRNO"; } $MODULE_LINKEXTRACTOR_OK = 1; my $url = "http://www.tpu.fi/~jaalto"; my $ua = new LWP::UserAgent; my $req = new HTTP::Request(GET => $url); my $response = $ua->request($req); if ($response->is_success()) { my %hash = ReadLinksMain -file => $url , -array => [$response->content()] ; PrintHash "$id: $url ", \%hash, \*STDOUT; } else { warn "$ERRNO"; } } # TestDriverLinkExtractor; Main(); # }}} 0; __DATA__ t2html Test Page #T2HTML-TITLE Page title is embedded inside text file #t2HTML-EMAIL author@examle.com #T2HTML-AUTHOR John Doe #T2HTML-METAKEYWORDS test, html, example #T2HTML-METADESCRIPTION This is test page of program t2html Headings The tool provides for two heading levels. Combined with bullets and numbered lists, it ought to be enough for most purposes, unless you really like section 1.2.3.4.5 You can insert links to headings or other documents. The convention is interior links are made by joining the first four words of the heading with underscores, so they must be unique. A link to a heading below looks like this in the text document and generates the link shown. There also is syntax for automatically inserting a base URL (see the tool documentation). 
The following blue link is generated with markup code: # REF #Markup ;(Markup); #REF #Markup ;(Markup); Markup The markup here is mostly based on column position, meaning mostly no tags. The exceptions are special marks for bullets and for emphasis. See later sections for the effects of column position on the output HTML. .Text surrounded by = equals = comes out =another= =color= .Text surrounded by backquote/forward quote comes out `color' ` .Text surrounded by * asterisks * comes out *italic* *text* .Text surrounded by _ underscores _ comes out _bold_ .The long dash -- is signified with two consequent dashes (-) .The plus-minus is signified with (+) and (-) markers combined +-4 .Big character "C" in parentheses ( C ) make a copyright sign (C) .Registered trade mark sign (R) is big character "R" in parentheses ( R ) .Euro sign is small character "e" right after digit: 400e .Degree sign is number "0" in parentheses just after number: 5(0)C .Superscript is maerked with bracket immediately attached to text[see this] .Special HTML entities can embedded in a normal way, like: × < > ≤ ≥ ≠ √ − α β γ ƒ ÷ « » - – — ≈ ≡ ‹ › ∑ ∞ ™ Emacs minor mode If you use the advertised Emacs minor mode (tinytf.el) you can easily renumber headings as you revise the text. Test is also colorized as you edit content. The editing mode can automatically generate the table of contents and the HTML generator can use it to generate a two frame output with the TOC in the left frame as hotlinks to the sections and subsections. (Emacs mode in tinytf.el) Bullets, lists, and links This is ordinary text. o This is a bullet paragraph with a continuation mark (leading comma) in the last line. It will not work if the ,comma is on the same line as the bullet. This is a continued bullet paragraph. You use a leading comma in the last line of the previous block to make a continued item. This is ok except the paragraph fill code (for the previous paragraph) cannot deal with it. 
Maybe it is a hint not to do continued bullets, or a hint not to put the comma in until you are done formatting. o The next bullet. the sldjf sldjf sldkjf slkdjf sldkjf lsdkjf slkdjf sldkjf sldkjf lskdj flskdjf lskdjf lsdkjf. . This is a numbered list, made with a '.' in column 8 of its first line and text in column 12. You may not have blank lines between the items. . Clickable email . . Non-clickable email gork@ork.com. . Clickable link: http://this.com . Non-clickable link: -http://this.com. . Clickable file: file:/home/gork/x.txt. Line breaking Ordinary text with leading dot(.) forces line breaks in the HTML. .Here is a line with forced break. .Here is another line thatcontains dot-code at the beginning. Specials You can use superscripts[1], multiple[(2)] and almost any[(ab)] and imaginable[IV superscripts] Samples per column (heading level h1) These samples show the range of effects produced by writing text beginning in different columns. The column numbers referred to are columns in the source text, not (obviously) the output. The column numbering is counted starting from 0, _not_ _number_ _1_. Column 1, is undefined and nothing special. Column 2, is undefined and nothing special. Column 3, plain text, with color Column 4, Next heading level (h2) Column 5, plain text, with color Column 6, This i used for long quotations. The text uses Georgia font, which is designed for web, but which is equally good for laser printing font. Column 7, bold, italic "Column 7, start and end with double quote. Use for inner TOPICS" Column 8, standard text _strong_ *emphasized* Column 9, font weight bold, not italic. Column 10, quotation text, italic serif. This text has been made a little smaller and condensed than the rest of the text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. 
More quotation text. Column 11, another color, for questions, exercise texts etc. Note: It is possible to say something important at column 12, which is normally reserved for CODE. You must supply options --css-code-bg and --css-code-note=Note: Here is the code column 12: Note: Here is something important to tell you about this code This part of the text in first paragrah is rendered differently, because it started with magic word _Note:_ The rest of the pararagraphs are rendered as CODE. /* Column 12 code */ /* 10pt courier navy */ // col 12 and beyond stay as is to preserve code formatting. for (i = 0 ; i < 10 ; i++) { more(); whatever(); } Another level 2 heading (column 4) Here is more ordinary text. Table rendering examples These examples make sense only if the options *--css-code-bg* (use gray background for column 12) and *--css-code-note=Note:* have been turned on. If orfer to take full advantage of all the possibilities, you should introduce yourself to the HTML 4.01 specification and peek the CSS code in the generated HTML: the *tableclass* can take an attribute of the embedded default styles. Note: This is example 1 and '--css-code-note' options reads 'First word' in paragraph at column 12 and renders it differently. You can attache code right after this note, which must occupy only one paragraph --css-code-note=REGEXP Regexp matches 'First word' --css-code-bg Here is example 2 using table control code #t2html::tableborder:1 #t2html::tableborder:1 for (i = 0; i++; i < 10) { // Doing something in this loop } Here is example 3 using table control code #t2html::td:bgcolor=#FFEEFF:tableclass:solid #t2html::td:bgcolor=#FFEEFF:tableclass:solid for (i = 0; i++; i < 10) { // Doing something in this loop } Here is example 4 using table control code #t2html::td:bgcolor=#CCFFCC #t2html::td:bgcolor=#CCFFCC for (i = 0; i++; i < 10) { // Doing something in this loop } Here is example 5 using table control code. 
Due to bug in Opera 7-9.x, this exmaple may now show correctly. Please use Firefox to see the effect. #t2html::td:bgcolor=#FFFFFF:tableclass:dashed #t2html::td:bgcolor=#FFFFFF:tableclass:dashed for (i = 0; i++; i < 10) { // Doing something in this loop } Here is example 6 using multiple table control codes. Use underscore sccharacter to separate different table attributes from each other. The underscore will be vconverted into SPACE. The double quotes around the VALUE are not strictly required by HTML standard, but they are expected in XML. #t2html::td:bgcolor="#EAEAEA":table:border=1_border=0_cellpadding="10"_cellspacing="0" #t2html::td:bgcolor="#EAEAEA":table:border=1_border=0_cellpadding="10"_cellspacing="0" for (i = 0; i++; i < 10) { // Doing something in this loop } Here is example 7 using table control code #t2html::td:class=color-navy:table:cellpadding=0 which cancels default grey coloring. The cellpadding must be zeroed, around the text to make room. #t2html::td:class=color-white:table:cellpadding=0 for (i = 0; i++; i < 10) { // Doing something in this loop } Conversion program The perl program t2html turns the raw technical text format into HTML. Among other things it can produce HTML files with an index frame, a main frame, and a master that ties the two together. It has features too numerous to list to control the output. For details see the perldoc than is embeddedinside the program: perl -S t2html --help | more The frame aware html pages are generated by adding the *--html-frame* option. 
__END__ perl-text2html-master/doc/000077500000000000000000000000001371714776500160315ustar00rootroot00000000000000perl-text2html-master/doc/conversion/000077500000000000000000000000001371714776500202165ustar00rootroot00000000000000perl-text2html-master/doc/conversion/index.css000066400000000000000000000246751371714776500220550ustar00rootroot00000000000000/* * Copyright (C) 2003-2020 Jari Aalto * * This code is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License as * published by the Free Software Foundation; either version 2 of the * License, or (at your option) any later version * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * Visit * * See also * * http://www.w3.org/TR/REC-CSS2/cover.html#minitoc * http://www.pitt.edu/~nisg/cis/web/cgi/rgb.html * */ /* /////////////////////////////////////////////// Format words */ body { font-family: Verdana, Arial, sans-serif; /* font-family: Georgia, "Times New Roman", times, serif; */ padding-left: 0px; margin-left: 30px; font-size: 12px; line-height: 140%; text-align: left; max-width: 700px; } a:link { color: #4080DF; text-decoration: none; } a:hover { color: #000; } h1, h2, h3 { font-family: Verdana, Arial, sans-serif; color: #6BA4DC; } div.doc h1 { font-size: 180%; padding-bottom: 3px; border-bottom-width: 1px; border-bottom-color: #00c6c8; border-bottom-style: dashed; } div.doc h2 { font-size: medium; } em.word { /* #809F69 Forest green */ /* color: #80B06A; Darker Forest green */ color: #999; font-weight: lighter; } strong.word { } samp.word { color: #4C9CD4; font-family: "Courier New", Courier, monospace; font-size: 1em; /* font-weight: bolder; */ } span.super { /* superscripts */ color: teal; vertical-align: super; font-family: Verdana, Arial, sans-serif; font-size: 
0.8em; } span.word-ref { color: teal; } span.word-big { color: teal; font-size: 1.2em; } span.word-small { color: #CC66FF; font-family: Verdana, Arial, sans-serif; /* font-size: 0.95em; */ } /* /////////////////////////////////////////////// Format other */ /* 7th column starting with double quote */ span.quote7 { /* color: Green; */ /* font-style: italic; */ font-family: Verdana; font-weigh: bold; font-size: 1em; } /* This appears in FRAME version: xxx-toc.html */ div.toc { font-size: 0.9em; } div.toc a:link { color: #777 } div.toc a:hover { color: #000; } /* This appears in picture: the caption text beneath */ div.picture { font-size: 1em; font-style: italic; } /* This is the document info footer */ em.footer { font-size: 0.9em; } /* ////////////////////////////////////////////// Format columns */ p.column3 { color: Green; } p.column5 { color: #87C0FF; /* shaded casual blue */ } p.column6 { /* #809F69 is Forest green But web safe colors are: Lighter ForestGreen: 66CC00 ForestGreen: #999966 669900 339900 669966 color: #669900; font-family: "Goudy Old Style" */ margin-left: 3em; font-family: Georgia, "New Century Schoolbook", Palatino, Verdana, Arial, Helvetica; font-size: 0.9em; } /* This is so called 3rd heading */ p.column7 { font-family: Verdana, Arial, sans-serif; font-style: italic; font-weight: bold; } @media print { P.column7 { font-style: italic; font-weight: bold; }} p.column8 { text-align: justify; } p.column9 { font-weight: bold; } p.column10 { margin-left: 2em; padding-top: 0; } em.quote10 { /* #FF00FF Fuchsia; #0000FF Blue #87C0FF casual blue #87CAF0 #A0FFFF Very light blue #809F69 = Forest Green , see /usr/lib/X11/rgb.txt background-color: color: #80871F ; Orange, short of # font-family: "Gill Sans", sans-serif; line-height: 0.9em; font-style: italic; font-size: 1em; line-height: 0.9em; color: #008080; background-color: #F5F5F5; #809F69; forest green #F5F5F5; Pale grey #FFf098; pale green ##bfefff; #ffefff; LightBlue1 background-color: #ffefff; 
................. #FFFCE7 Orange very light #FFE7BF Orange dark #FFFFBF Orange limon */ /* # See a nice page at # http://www.cs.helsinki.fi/linux/ # http://www.cs.helsinki.fi/include/tktl.css # # 3-4 of these first fonts have almost identical look # Browser will pick the one that is supported */ font-family: Georgia, Verdana, Arial, "Trebuchet MS", helvetica, sans-serif; /* background-color: #eeeeff; */ font-size: 1em; } @media print { em.quote10 { font-style: italic; line-height: 0.9em; font-size: 0.8em; }} p.column11 { font-family: Georgia, Verdana, Helvetica, sans-serif; font-size: 0.9em; /* font-style: italic; */ letter-spacing: 0.7; /* font-variant: small-caps; */ color: #4682b4; } p.column11 samp.word { font-size: medium; font-family: "Courier New", courier, monospace, sans-serif; font-variant: normal; font-size: 1em; font-stretch: wider; color: #191970; } /* ////////////////////////////////////////////// Format tables */ table { border: none; width: 100%; cellpadding: 10px; cellspacing: 0px; } table tr td pre { /* Make PRE tables "airy" */ margin-top: 1em; margin-bottom: 1em; } table.basic { font-family: "Courier New", Courier, monospace; color: #777; } td.info { border-left: 8px solid #ddd; } table.attention { /* font-family: sans-serif; /* /* grey (dark to white): 0 > 3 > 6 > 9 > C > F; like #666666 */ /* border-top: 1px #CCCCCC solid; border-bottom: 1px #666666 solid; */ border-top-width: thin; border-top-style: solid; border-top-color: #D3D3D3; border-bottom-style: double; border-bottom-color: #D3D3D3; } table.dashed { /* font-family: sans-serif; /* /* background: #F7DE9C; */ color: Navy; border-top: 1px #00e6e8 solid; border-left: 1px #00e6e8 solid; border-right: 1px #00c6c8 solid; border-bottom: 1px #00c6c8 solid; border-width: 94%; border-style: dashed; /* dotted */ /* line-height: 105%; */ } table.solid { font-family: "Courier New", Courier, monospace; /* afont-size: 0.8em; */ color: Navy; /* font-family: sans-serif; /* /* background: #F7DE9C; */ 
border-top: 1px #CCCCCC solid; border-left: 1px #CCCCCC solid; /* 999999 */ border-right: 1px #666666 solid; border-bottom: 1px #666666 solid; /* dark grey */ /* line-height: 105%; */ } /* Make 3D styled layout by thickening the boton + right. */ table.shade-3d { font-family: "Courier New", Courier, monospace; font-size: 0.8em; color: #999999; /* Navy; */ /* font-family: sans-serif; /* /* background: #F7DE9C; */ /* border-top: 1px #999999 solid; */ /* border-left: 1px #999999 solid; */ border-right: 4px #666666 solid; border-bottom: 3px #666666 solid; /* line-height: 105%; */ } .shade-3d-attrib { /* F9EDCC Light Orange FAEFD2 Even lighter Orange #FFFFCC Light yellow, lime */ background: #FFFFCC; } table.shade-normal { font-family: "Courier New", Courier, monospace; font-size: 0.95em; background: #E3F2F9; color: #5F5F5F; /* color: Navy; */ } .shade-normal-attrib { /* grey: EAEAEA, F0F0F0 FFFFCC lime: F7F7DE CCFFCC pinkish: E6F1FD D8E9FB C6DEFA FFEEFF (light ... darker) slightly darker than F1F1F1: #EFEFEF; background: #F9F9F9; */ } table.shade-normal2 { font-family: "Courier New", Courier, monospace; } .shade-normal2-attrib { background: #E0E0F0; } table.shade-note { border-bottom-style: dashed; border-bottom: #666666; border-width: thin; } .shade-note-attrib { /* darker is #E0E0F0; */ /* background: #E5ECF3; */ /* background: #E5ECF3; */ /* #afeeee; dark green */ background: #EDF3FB; /* font-family: "Times New Roman", Times, Georgia, "New Century Schoolbook", Palatino, Verdana, Helvetica, serif; */ font-size: 0.90em; font-style: normal; padding-left: 1em; padding-right: 1em; padding-top: 1em; padding-bottom: 1em; max-width: 700px; } .shade-note-attrib samp.word { color: #4C9CD4; font-family: "Courier New", Courier, monospace; font-size: 0.9em; } /* ..................................... colors ................. 
*/ .color-white { color: #000; /* background: #FFFFFF; */ } .color-fg-navy { color: navy; } .color-fg-blue { color: blue; } .color-fg-teal { color: teal; } /* Nice combination: teal-dark, beige2 and beige-dark */ .color-teal-dark { color: #96EFF2; } .color-beige { color: Navy; background: #F7F7DE; } .color-beige2 { color: Navy; background: #FAFACA; } .color-beige3 { color: Navy; background: #F5F5E9; } .color-beige-dark { color: Navy; background: #CFEFBD; } .color-pink-dark { background: #E6F1FD; } .color-pink-medium { background: #D8E9FB; } .color-pink { /* grey: EAEAEA, F0F0F0 FFFFCC lime: F7F7DE CCFFCC pinkish: E6F1FD D8E9FB C6DEFA FFEEFF (light ... darker) */ background: #C6DEFA; } .color-pink-light { background: #FFEEFF; } .color-blue-light { background: #F0F0FF; } .color-blue-medium { background: #4A88BE; } span.bookref { font-weight: lighter; } div.footnote { /* border-top-width: 1px; border-top-style: solid; border-top-color: teal; */ font-size: 1em; font-family: Georgia, "Times New Roman", Roman, serif; color: #333333; } /* End of file */ perl-text2html-master/doc/conversion/index.html000066400000000000000000000501231371714776500222140ustar00rootroot00000000000000 Conversion for text files

    1.0 Document id

    1.1 General

    Copyright © 1996-2019 Jari Aalto

    License: This material may be distributed only subject to the terms and conditions set forth in GNU General Public License v2 or later; or, at your option, distributed under the terms of GNU Free Documentation License version 1.2 or later (GNU FDL).

    1.2 T2html program features

    Writing text documents is different from writing messages to Usenet or to your fellow workers. There already exist several tools to convert email messages into HTML, like MHonArc, Email hyper archiver, but for regular text documents, such as memos, FAQs, help pages and other papers, there wasn't any suitable HTML converter a couple of years back. The author wanted a simple HTML tool which would read pure plain text documents, like guides, tips pages, documentation, bookmark pages etc. and convert them into HTML. Here you will find the specification of how to format your text documents for the t2html.pl Perl script, a text to HTML converter.

    A few arguments for why plain text is the best source document format:

    • It is readable by all, without any extra software
    • Deliverable by email, as is.
    • Most easily kept in version control
    • Most easily patched (when someone sends a diff -u ...)
    • Most easily handed to someone else when the author no longer maintains it. (If you have specialized tools, people need to learn them in order to maintain your FAQ.)

    1.2.1 Overview of features:

    • Requires Perl 5.004 or newer
    • 500K text document takes 70 seconds to convert to HTML.
    • TF to Perl POD conversion may be in a future plan.
    • Better linking of multiple files planned
    • Configuration file for individual file options planned.

    1.2.2 HTML conversion

    • Minimal mark up: rendering is based on indentation level. A written text document looks like a "Natural Document", and is suitable for reading as such.
    • Text layout with indentation rules is called Technical Format (TF) and a document must be formatted according to it before it is suitable for HTML generation.
    • Rules are simple: place headings to the left and text at column 8.
    • Program generates META tags for search engines.
    • Colored html page: <EM> <STRONG> <PRE> ...
    • Hyperlinks and email addresses are automatically detected. No mark up is needed.
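
    The indentation rule above can be illustrated with a small shell sketch. It is not how t2html.pl is implemented; it only classifies lines by the 0-based column of their first character, using the column meanings shown on the t2html test page (column 0 heading, column 4 second level heading, column 8 text, column 12 and beyond code). The function name tf_classify is hypothetical, and TAB characters are not handled.

```shell
# tf_classify < file
# Print "KIND<TAB>text" for every non-blank line. KIND is derived from
# the 0-based column of the first non-space character (assumed mapping:
# 0 = h1, 4 = h2, 8 = text, 12+ = code, anything else = "colN").
tf_classify() {
    awk '
        /^ *$/ { next }                         # skip blank lines
        {
            rest = $0
            sub(/^ +/, "", rest)                # strip leading spaces
            col = length($0) - length(rest)     # number of leading spaces
            if      (col == 0)  kind = "h1"
            else if (col == 4)  kind = "h2"
            else if (col == 8)  kind = "text"
            else if (col >= 12) kind = "code"
            else                kind = "col" col
            printf "%s\t%s\n", kind, rest
        }
    '
}
```

    For example, tf_classify < index.txt lists each line of a TF document together with the role its column suggests, which can help spot text that was typed at the wrong column.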

    1.2.3 HTML 4.01

    • Make a single html (1 file) or Framed version (3 files)
    • Sample CSS2 (Cascading Style Sheet) included in HTML code for document rendering. User can import his own CSS2.

    1.2.4 Link check for the text file

    • You need the LWP module in order to use this feature. (Comes with latest Perl)
    • Program has switches to run Link check on your text file to find out any broken or moved links. Currently you have to manually fix the links, but an Emacs mode to do this automatically is planned. The output from Link check is standard grep style: *FILE:NBR:Error-Description*
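
    Because the Link check output is grep style, it can be post-processed with ordinary text tools. The following is a minimal sketch that counts findings per file; it assumes only the FILE:NBR:Error-Description shape described above. The function name count_link_errors and the sample lines are hypothetical and are not real t2html output.

```shell
# count_link_errors
# Read grep-style lines (FILE:NBR:Error-Description) from stdin and
# print "FILE COUNT" for each file, sorted by file name.
count_link_errors() {
    awk -F: '
        NF >= 3 { count[$1]++ }     # ignore lines that are not FILE:NBR:...
        END     { for (f in count) printf "%s %d\n", f, count[f] }
    ' | sort
}

# Example with made-up input lines:
#   printf 'a.txt:10:404 Not found\nb.txt:3:timeout\n' | count_link_errors
```

    A description that itself contains colons (for instance a URL) still counts once, since only the first field is used as the key.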

    1.2.5 Splitting the text file to pieces

    • You can split a very large document into pieces, e.g. according to top level headings, and convert each piece to HTML. This is also handy if you're planning to print slides for a class: split on headings into individual files, raise the font size, and print each file separately.

    1.3 How to convert text files into HTML?

    The TF specification can be found in the Manual. The command used to generate this page was:

          t2html.pl                                                     \
          --author           "Jari Aalto"                               \
          --title            "Conversion for text files"                \
          --html-body         LANG=en                                   \
          --out                                                         \
          index.txt    

    1.4 Writing a text document

    You need nothing but a text editor that displays the current column number, or one that can be configured to advance your TAB by 4 spaces. That's it. An Emacs minor mode (See package tinytf.el) can make writing documents easy. The mode helps with formatting paragraphs, filling bullets, numbering headings and keeping the TOC up to date.

    1.5 Ripping program documentation

    1.5.1 Documentation tools in programming languages

    Perl is an exception among programming languages, because it includes an internal documentation syntax called POD (Plain Old Documentation), with which you can embed documentation right into the program source. Deriving the documentation from Perl programs is a straightforward job. Another well known language (invented long after Perl) is Java, which calls the embedded documentation javadoc. For all others, there is a need to write separate documentation.

    1.5.2 Other programming languages

    But it is possible to embed documentation inside any programming language: directly into the code. A small Perl utility can be used to extract the documentation provided it was written in TF format. Documentation is put at the beginning of the file and updated there. Program ripdoc.pl extracts the documentation which follows TF guidelines. The idea is that you can generate HTML documents from the embedded 'TF pod'. The conversion goes like this:

          ripdoc.pl code.sh | t2html.pl > code.html
          ripdoc.pl code.el | t2html.pl > code.html
          ripdoc.pl code.cc | t2html.pl > code.html    

    This is suitable for awk, shell, sh, ksh, C++, Java, Lisp, Python, Tcl etc. programming languages. The only criterion is that the language has a single comment-start marker and that the documentation has been written using it. Languages that only have paired comment-start and comment-end markers, like C with /* and */, are not suitable for ripdoc.pl.
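
    The single comment-starter requirement can be illustrated with a rough sketch of the extraction idea. This is not ripdoc.pl and does not reproduce its rules; it only prints the first contiguous block of comment lines at the top of a file with the comment starter removed. The function name rip_leading_comments is hypothetical.

```shell
# rip_leading_comments PREFIX < file
# Print the first contiguous run of lines that begin with PREFIX,
# with PREFIX (and one following space, if present) removed.
# A "#!" interpreter line on line 1 is skipped.
rip_leading_comments() {
    awk -v p="$1" '
        NR == 1 && /^#!/ { next }           # skip the interpreter line
        index($0, p) == 1 {                 # line begins with the comment starter
            line = substr($0, length(p) + 1)
            sub(/^ /, "", line)
            print line
            seen = 1
            next
        }
        seen { exit }                       # first non-comment line ends the block
    '
}

# Example:
#   rip_leading_comments "#"  < code.sh
#   rip_leading_comments "//" < code.cc
```

    A language with only paired markers such as /* and */ has no per-line prefix to test for, which is why this line-by-line approach does not apply to it.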


    2.0 Other converters

    2.1 Postscript

    2.2 Texinfo

    • See page http://www.fido.de/kama/texinfo/texinfo-en.html where you can find the C program html2texinfo
    • Perl program html2texi.pl http://www.cs.washington.edu/homes/mernst/software/#html2texi html2texi converts HTML documentation trees into Texinfo format. Texinfo format can be easily converted to Info format (for browsing in Emacs or the stand alone Info browser), to a printed manual, or to HTML. Thus, html2texi.pl permits conversion of HTML files to Info format, and secondarily enables producing printed versions of Web page hierarchies. Unlike HTML, Info format is searchable. Since Info is integrated into Emacs, one can read documentation without starting a separate Web browser. Additionally, Info browsers (including Emacs) contain convenient features missing from Web browsers, such as easy index lookup and mouse-free browsing.

    2.3 Other text to HTML tools

    2.4 Other Utilities

    • DocBook - SGML online book
    • Texi2html Perl script.
    • HTML tidy remove extra markup.
    • FTL Latex like Perl formatting
    • Hyperlatex "Hyperlatex is a package that allows you to prepare documents in HTML, and, at the same time, to produce a neatly printed document from your input. Unlike some other systems that you may have seen, Hyperlatex is not a general LaTeX-to-HTML converter. In my eyes, conversion is not a solution to HTML authoring. A well written HTML document must differ from a printed copy in a number of rather subtle ways. I doubt that these differences can be recognized mechanically, and I believe that converted LaTeX can never be as readable as a document written in HTML. The basic idea of Hyperlatex is to make it possible to write a document that will look like a flawless LaTeX document when printed and like a handwritten HTML document when viewed with an HTML browser."
    • html2texi "html2texi converts HTML documentation trees into Texinfo format. Texinfo format can be easily converted to Info format (for browsing in Emacs or the stand alone Info browser), to a printed manual, or to HTML. Thus, html2texi.pl permits conversion of HTML files to Info format, and secondarily enables producing printed versions of Web page hierarchies. Unlike HTML, Info format is searchable. Since Info is integrated into Emacs, one can read documentation without starting a separate Web browser. Additionally, Info browsers (including Emacs) contain convenient features missing from Web browsers, such as easy index lookup and mouse-free browsing."
    • RTF in PC
    • catdoc Viewing MS WORD files. Catdoc is simple, one C source file, compiles in any system (DOS; Unix). Feed MS word file to it and it gives 7bit text out of it.
    • word2x Viewing MS WORD files.
    • MSWordView "MSWordView is a program that can understand the microsofts word 8 binary file format (office97), it currently converts word into html, which can then be read with a browser."
    • Laola Viewing MS WORD files. "Laola(perl) does a respectable job of taking MSWord files to text ...LAOLA is giving access to the raw document streams of any program using "structured storage" technology to save its documents. ELSER is dealing especially with these streams as they are present in Word 6 and Word 7 documents."

    2.5 General Document Maintenance tools


    Html date: 2020-08-19 09:15
    perl-text2html-master/doc/conversion/index.txt000066400000000000000000000352571371714776500221020ustar00rootroot00000000000000Table of contents 1.0 Document id 1.1 General 1.2 Description 1.2.1 Overview of features: 1.2.2 HTML conversion 1.2.3 HTML 4.01 1.2.4 Link check for the text file 1.2.5 Splitting the text file to pieces 1.3 Curious what the document looks like? 1.4 Writing a text document 1.5 Emacs and minor mode 1.6 Ripping program documentation 1.6.1 Documentation tools in programming languages 1.6.2 Other programming languages 1.7 Download the code 2.0 Other converters 2.1 Postscript 2.2 Texinfo 2.3 Other text to HTML tools 2.4 General Document Maintenance tools 1.0 Document id 1.1 General #T2HTML-TITLE Conversion for text files #T2HTML-OPTION --css-file=index.css #T2HTML-OPTION --css-code-bg #T2HTML-OPTION --css-code-note=Note: #T2HTML-OPTION --simple #T2HTML-COMMENT t2html --auto-detect --out Copyright (C) 1996-2020 Jari Aalto License: This material may be distributed only subject to the terms and conditions set forth in GNU General Public License v2 or later; or, at your option, distributed under the terms of GNU Free Documentation License version 1.2 or later (GNU FDL). 1.2 T2html program features Writing text documents is different from writing messages to Usenet or to your fellow workers. There already exists several tools to convert email messages into HTML, like *MHonArc*, Email hyper archiver, but for regular text documents, like for memos, FAQs, help pages and for other papers, there wasn't any suitable HTML converter couple of years back. The author wanted a simple HTML tool which would read _pure_ _plain_ _text_ documents, like guides, tips pages, documentation, book mark pages etc. and convert them into HTML. Here you will find the specification how to format your text documents for *t2html.pl* perl script text to HTML converter. 
            A few arguments for why plain text is the best source document
            format:

            o   It is readable by all, without any extra software
            o   Deliverable by email, as is.
            o   Most easily kept in version control
            o   Most easily patched (when someone sends a diff -u ...)
            o   Most easily handed to someone else when the author no longer
                maintains it. (If you have specialized tools, people need to
                learn them in order to maintain your FAQ.)

           1.2.1 Overview of features:

            o   Requires Perl 5.004 or newer
            o   500K text document takes 70 seconds to convert to HTML.
            o   TF to Perl POD conversion may be in a future plan.
            o   Better linking of multiple files planned
            o   Configuration file for individual file options planned.

           1.2.2 HTML conversion

            o   Minimal mark up: rendering is based on indentation level.
                A written text document looks like a "Natural Document", and
                is suitable for reading as such.
            o   Text layout with indentation rules is called Technical Format
                (TF) and a document must be formatted according to it before
                it is suitable for HTML generation.
            o   Rules are simple: place headings to the left and text at
                column 8.
            o   Program generates *META* tags for search engines.
            o   Colored html page: <EM> <STRONG> <PRE> ...
            o   Hyperlinks and email addresses are automatically detected.
                No mark up is needed.
    
           1.2.3 HTML 4.01
    
            o   Make a single html (1 file) or *Framed* version (3 files)
            o   Sample CSS2 (Cascading Style Sheet) included in HTML code for
                document rendering. User can import his own CSS2.
    
           1.2.4 Link check for the text file
    
            o   You need the LWP module in order to use this feature. (Comes with
                latest Perl)
            o   Program has switches to run Link check on your text file
                to find out any broken or moved links. Currently you
                have to manually fix the links, but an Emacs mode to do this
                automatically is planned. The output from Link check is standard
                grep style:  *FILE:NBR:Error-Description*
    
           1.2.5 Splitting the text file to pieces
    
            o   You can split a very large document into pieces, e.g. according
                to top level headings, and convert each piece to HTML. This is
                also handy if you're planning to print slides for a class:
                split on headings into individual files, raise the font size,
                and print each file separately.
    
        1.3 How to convert text files into HTML?
    
            The TF specification can be found from the #URL<../manual>
            The command used to generate this page was:
    
                t2html.pl                                                     \
                --author           "Jari Aalto"                               \
                --title            "Conversion for text files"                \
                --html-body         LANG=en                                   \
                --Out                                                         \
                index.txt
    
        1.4 Writing a text document
    
            You need nothing but a text editor that displays the current
            column number or that can be configured to advance your
            TAB by 4 spaces. That's it.
            An Emacs minor mode (See package
            #URL) can
            make writing documents easy. The mode helps with formatting
            paragraphs, filling bullets, numbering headings and keeping the
            TOC up to date.
    
        1.5 Ripping program documentation
    
           1.5.1 Documentation tools in programming languages
    
            *Perl* is an exception among programming languages, because it
            includes an internal documentation syntax called _POD_ (Plain Old
            Documentation), with which you can embed documentation right into
            the program source. Deriving the documentation from Perl programs
            is a straightforward job. Another well known language
            (invented long after Perl) is Java, which calls the embedded
            documentation *javadoc*. For all others, there is a need to
            write separate documentation.
    
           1.5.2 Other programming languages
    
            But it is possible to embed documentation inside any
            programming language: directly into the code. A small Perl
            utility can be used to extract the documentation provided it
            was written in TF format. Documentation is put at the
            beginning of the file and updated there. Program `ripdoc.pl'
            extracts the documentation which follows TF guidelines. The
            idea is that you can generate HTML documents from the embedded
            'TF pod'. The conversion goes like this:
    
                ripdoc.pl code.sh | t2html.pl > code.html
                ripdoc.pl code.el | t2html.pl > code.html
                ripdoc.pl code.cc | t2html.pl > code.html
    
            Suitable for awk, shell, sh, ksh, C++, Java, Lisp, Python,
            Tcl etc. programming languages. The only criterion is that the
            language supports a *one-comment-starter* and that the
            documentation has been written by using it. Languages that have
            *comment-start* and *comment-end*, like C that has /* and */,
            are not suitable for ripdoc.pl.
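    
            The idea behind such an extraction can be sketched as follows.
            This is not the ripdoc.pl implementation, only an illustration
            under the assumption that the documentation is the first run of
            lines that begin with the comment starter. The function name
            extract_leading_doc and the sample source are hypothetical.

```shell
#!/bin/sh
# extract_leading_doc STARTER < source-file
# Print the first run of lines that begin with STARTER, with the
# starter and one following space removed. Lines that begin with "#!"
# are skipped. The first other line after the run ends the output.
extract_leading_doc() {
    awk -v s="$1" '
        index($0, "#!") == 1 { next }       # skip an interpreter line
        index($0, s) == 1 {
            line = substr($0, length(s) + 1)
            sub(/^ /, "", line)             # drop one space after starter
            print line
            started = 1
            next
        }
        started { exit }                    # first other line ends block
    '
}

# Hypothetical shell source with TF text in its leading comment.
sample_source() {
    printf '%s\n' \
        '#!/bin/sh' \
        '#   1.0 Description' \
        '#' \
        '#       Sample program.' \
        '' \
        'echo hello' \
        '# not documentation'
}

sample_source | extract_leading_doc '#'
```

            A language with a different one-line comment starter only
            changes the argument, for example ';;' for Lisp or '//' for C++.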
    
    2.0 Other converters
    
        2.1 Postscript
    
            o   *html2ps* converter by Jan Karrman at
                http://www.tdb.uu.se/~jan/html2ps.html
            o   html to ps converter
                http://www.tdb.uu.se/~jan/html2ps.html
            o   html to ps converter by Charlie's Perl at
                http://www.antipope.org/charlie/webbook/essays/toolkit.html
    
        2.2 Texinfo
    
            o   See page http://www.fido.de/kama/texinfo/texinfo-en.html
                where you can find the C program *html2texinfo*
            o   Perl program *html2texi.pl*
                http://www.cs.washington.edu/homes/mernst/software/#html2texi
                html2texi converts HTML documentation trees into Texinfo
                format.  Texinfo format can be easily converted to Info format
                (for browsing in Emacs or the stand alone Info browser), to a
                printed manual, or to HTML. Thus, html2texi.pl permits
                conversion of HTML files to Info format, and secondarily
                enables producing printed versions of Web page
                hierarchies. Unlike HTML, Info format is searchable. Since Info
                is integrated into Emacs, one can read documentation without
                starting a separate Web browser. Additionally, Info browsers
                (including Emacs) contain convenient features missing from Web
                browsers, such as easy index lookup and mouse-free browsing.
    
        2.3 Other text to HTML tools
    
            o   *asciidoc* Python program to convert text files.
                http://sourceforge.net/projects/asciidoc
            o   *t2php* Implementation in PHP language of the
                technical format. Visit
                http://rule-project.org/text/en/sw/t2php.txt
            o   *Wiki*, a simple text rule mark up.
                http://c2.com/cgi/wiki?TextFormattingRules
            o   *Zope* Structured text, which seems to rely on indentation
                level as well. The tool has been written in Python language.
                See http://www.zope.org/Documentation/Articles/STX and
                http://www.zope.org/Members/millejoh/structuredText
            o   *htmlpp* by iMATIX is at http://www.imatix.com/. This
                is like a C preprocessor where you have complex
                and powerful text-markup commands. The base file
                ,for html generation is not easily text-readable.
    
                See also http://www.imatix.com/html/gslgen/index.htm GSLgen is
                a general-purpose file generator. It generates source code,
                data, or other files from an XML file and a schema file. The
                XML file defines a particular set of data. The schema file
                tells GSLgen what to do with that data.
    
            o   *No-TagsMarkup* by Scott S. Lawton. Another interesting
                plain-text style, similar to TF, is at
                http://www.prefab.com/ssl/notagsmarkup.html . Compared to TF,
                this style needs more markup and lacks some of the advanced
                features like Frame/colour/CSS2 support.
            o   *setext* by Ian Feldman, a simple text markup, is available at
                
            o   *text2html.pl*, a Perl script by Seth Golub, is at
                http://www.cs.wustl.edu/~seth/txt2html/. This is a very good tool
                if you want to convert mail message into html quickly. Use it for
                ad hoc things.
            o   *faq2text*, A C-code (Unix) based text to HTML converter at
                http://www.fadden.com/dl-misc/#faq2html
            o   *faq2html* ftp://ftp.eyrie.org/pub/software/web/faq2html
    
        2.4 Other Utilities
    
            o    #URL
            o    #URL
                 Perl script.
            o    #URL
                 remove extra markup.
            o    #URL
                 Latex like Perl formatting
            o    #URL
                 "Hyperlatex is a package that allows you to prepare documents
                 in HTML, and, at the same time, to produce a neatly printed
                 document from your input. Unlike some other systems that you
                 may have seen, Hyperlatex is not a general LaTeX-to-HTML
                 converter. In my eyes, conversion is not a solution to HTML
                 authoring. A well written HTML document must differ from a
                 printed copy in a number of rather subtle ways. I doubt that
                 these differences can be recognized mechanically, and I
                 believe that converted LaTeX can never be as readable as a
                 document written in HTML.  The basic idea of Hyperlatex is to
                 make it possible to write a document that will look like a
                 flawless LaTeX document when printed and like a handwritten
                 HTML document when viewed with an HTML browser."
    
            o    #URL
                 "html2texi converts HTML documentation trees into Texinfo format.
                 Texinfo format can be easily converted to Info format (for browsing
                 in Emacs or the stand alone Info browser), to a printed manual, or
                 to HTML. Thus, html2texi.pl permits conversion of HTML files to
                 Info format, and secondarily enables producing printed versions of
                 Web page hierarchies. Unlike HTML, Info format is searchable. Since
                 Info is integrated into Emacs, one can read documentation without
                 starting a separate Web browser. Additionally, Info browsers
                 (including Emacs) contain convenient features missing from Web
                 browsers, such as easy index lookup and mouse-free browsing."
            o    #URL
            o    #URL
                 Viewing MS WORD files.
                 Catdoc is simple, one C source file, compiles in any system (DOS;
                 Unix). Feed MS word file to it and it gives 7bit text out of it.
            o    #URL
                 Viewing MS WORD files.
            o    #URL
                 "MSWordView is a program that can understand the microsofts word
                 8 binary file format (office97), it currently converts word into
                 html, which can then be read with a browser."
            o    #URL
                 Viewing MS WORD files.
                 "Laola(perl) does a respectable job of taking MSWord
                 files to text ...LAOLA is giving access to the raw
                 document streams of any program using "structured
                 storage" technology to save its documents. ELSER is
                 dealing especially with these streams as they are present
                 in Word 6 and Word 7 documents."
    
        2.5 General Document Maintenance tools
    
            o   The FAQ maintainer toolset page is at
                http://www.qucis.queensu.ca/FAQs/FAQaid/ It contains all the
                known tools to make your FAQ maintenance/posting/updating
                easier on any platform.
    
    End
    perl-text2html-master/doc/examples/README.txt
    
    Description
    
            This directory contains examples of how the pages look
            when t2html.pl is invoked with various options. These can be
            generated with:
    
                t2html.pl --test-page
    
    	An example:
    
    	    t2html.pl --html-frame --auto-detect --out t2html-3.txt
    
    End of file
    
    perl-text2html-master/doc/examples/t2html-1.html
    
    Page title is embedded inside text file

    Copyright © 1996-2016 Jari Aalto

    License: This material may be distributed only subject to the terms and conditions set forth in GNU General Public License v2 or later; or, at your option, distributed under the terms of GNU Free Documentation License version 1.2 or later (GNU FDL).

    This is a demonstration text of Perl Text To HTML converter.

    Headings

    The tool provides for two heading levels. Combined with bullets and numbered lists, it ought to be enough for most purposes, unless you really like section 1.2.3.4.5

    You can insert links to headings or other documents. The convention is interior links are made by joining the first four words of the heading with underscores, so they must be unique. A link to a heading below looks like this in the text document and generates the link shown. There also is syntax for automatically inserting a base URL (see the tool documentation).

    The following blue link is generated with markup code: # REF #Markup ;(Markup);

    (Markup)

    Markup

    The markup here is mostly based on column position, meaning mostly no tags. The exceptions are special marks for bullets and for emphasis. See later sections for the effects of column position on the output HTML.

    Text surrounded by = equals = comes out another color
    Text surrounded by backquote/forward quote comes out color `
    Text surrounded by * asterisks * comes out italic text
    Text surrounded by _ underscores _ comes out bold
    The long dash – is signified with two consecutive dashes (-)
    The plus-minus is signified with (+) and (-) markers combined ±4
    Big character "C" in parentheses ( C ) makes a copyright sign (C)
    Registered trade mark sign ® is big character "R" in parentheses ( R )
    Euro sign is small character "e" right after digit: 400 €
    Degree sign is number "0" in parentheses just after number: 5°C
    Superscript is marked with bracket immediately attached to textsee this
    Special HTML entities can be embedded in a normal way, like: × < > ≤ ≥ ≠ √ − α β γ ƒ ÷ « » - – — ≈ ≡ ‹ › ∑ ∞ ™

    Emacs minor mode

    If you use the advertised Emacs minor mode (tinytf.el) you can easily renumber headings as you revise the text. Text is also colorized as you edit content.

    The editing mode can automatically generate the table of contents and the HTML generator can use it to generate a two frame output with the TOC in the left frame as hotlinks to the sections and subsections. Visit http://freecode.com/projects/emacs-tiny-tools

    Bullets, lists, and links

    This is ordinary text.

    • This is a bullet paragraph with a continuation mark (leading comma) in the last line. It will not work if the comma is on the same line as the bullet.

      This is a continued bullet paragraph. You use a leading comma in the last line of the previous block to make a continued item. This is ok except the paragraph fill code (for the previous paragraph) cannot deal with it. Maybe it is a hint not to do continued bullets, or a hint not to put the comma in until you are done formatting.

    • The next bullet. the sldjf sldjf sldkjf slkdjf sldkjf lsdkjf slkdjf sldkjf sldkjf lskdj flskdjf lskdjf lsdkjf.
    1. This is a numbered list, made with a '.' in column 8 of its first line and text in column 12. You may not have blank lines between the items.
    2. Clickable email gork@ork.com.
    3. Non-clickable email gork@ork.com.
    4. Clickable link: http://this.com
    5. Non-clickable link: http://this.com.
    6. Clickable file: file:/home/gork/x.txt.

    Line breaking

    Ordinary text with leading dot(.) forces line breaks in the HTML. Here is a line with forced break.
    Here is another line that contains dot-code at the beginning.

    Specials

    You can use superscripts1, multiple(2) and almost any(ab) and imaginableIV superscripts


    Samples per column (heading level h1)

    These samples show the range of effects produced by writing text beginning in different columns. The column numbers referred to are columns in the source text, not (obviously) the output. The column numbering is counted starting from 0, not number 1. Column 1, is undefined and nothing special. Column 2, is undefined and nothing special.

    Column 3, plain text, with color

    Column 4, Next heading level (h2)

    Column 5, plain text, with color

    Column 6, This is used for long quotations. The text uses Georgia font, which is designed for web, but which is equally good as a laser printing font.

    Column 7, bold, italic

    Column 7, start and end with double quote. Use for inner TOPICS

    Column 8, standard text strong emphasized

    Column 9, font weight bold, not italic.

    Column 10, quotation text, italic serif. This text has been made a little smaller and more condensed than the rest of the text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text.

    Column 11, another color, for questions, exercise texts etc.

          Note: It is possible to say something important at
          column 12, which is normally reserved for CODE.
          You must supply options --css-code-bg and
          --css-code-note=Note:

    Here is the code column 12:

          Note: Here is something important to tell you about this code
          This part of the text in first paragraph is rendered differently,
          because it started with magic word _Note:_ The rest of the
          paragraphs are rendered as CODE.
    
          /* Column 12 code */
          /* 10pt courier navy */
          // col 12 and beyond stay as is to preserve code formatting.
    
          for( i=0 ; i < 10 ; i++ )
          {
              more();
              whatever();
          }

    Another level 2 heading (column 4)

    Here is more ordinary text.


    Table rendering examples

    These examples make sense only if the options --css-code-bg (use gray background for column 12) and --css-code-note=Note: have been turned on. In order to take full advantage of all the possibilities, you should familiarize yourself with the HTML 4.01 specification and peek at the CSS code in the generated HTML: the tableclass can take an attribute of the embedded default styles.

          Note: This is example 1 and the `--css-code-note' option
          reads 'First word' in paragraph at column 12 and
          renders it differently. You can attach code right after
          this note, which must occupy only one paragraph
    
          --css-code-note=REGEXP      Regexp matches 'First word'
          --css-code-bg

    Here is example 2 using table control code

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 3 using table control code

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 4 using table control code

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 5 using table control code. Due to a bug in Opera 7-9.x, this example may not show correctly. Please use Firefox to see the effect.

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 6 using multiple table control codes. Use the underscore character to separate different table attributes from each other. The underscore will be converted into SPACE. The double quotes around the VALUE are not strictly required by the HTML standard, but they are expected in XML.

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 7 using table control code, which cancels the default grey coloring. The cellpadding must be zeroed to make room around the text.

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Conversion program

    The perl program t2html turns the raw technical text format into HTML. Among other things it can produce HTML files with an index frame, a main frame, and a master that ties the two together. It has features too numerous to list to control the output. For details see the perldoc that is embedded inside the program:

          perl -S t2html --help | more

    The frame aware html pages are generated by adding the --html-frame option.


    Copyright © 2019 by Jari Aalto. This material may be distributed subject to the terms and conditions set forth in the Creative commons Attribution-ShareAlike License. See http://creativecommons.org/
    Document author: Jari Aalto
    Last updated: 2019-05-05 19:03
    perl-text2html-master/doc/examples/t2html-1.txt
    
    t2html Test Page
    
            #T2HTML-TITLE Page title is embedded inside text file
            #t2HTML-EMAIL author@examle.com
            #T2HTML-AUTHOR John Doe
            #T2HTML-METAKEYWORDS test, html, example
            #T2HTML-METADESCRIPTION This is test page of program t2html
    
            Copyright (C) 1996-2020 Jari Aalto
    
            License: This material may be distributed only subject to the
            terms and conditions set forth in GNU General Public License
            v2 or later; or, at your option, distributed under the terms
            of GNU Free Documentation License version 1.2 or later (GNU
            FDL).
    
            This is a demonstration text of Perl Text To HTML converter.
    
        Headings
    
            The tool provides for two heading levels. Combined with
            bullets and numbered lists, it ought to be enough for most
            purposes, unless you really like section 1.2.3.4.5
    
            You can insert links to headings or other documents. The
            convention is interior links are made by joining the first
            four words of the heading with underscores, so they must be
            unique. A link to a heading below looks like this in the text
            document and generates the link shown. There also is syntax
            for automatically inserting a base URL (see the tool
            documentation).
    
            The following blue link is generated with markup code:
            # REF #Markup ;(Markup);
    
            #REF #Markup ;(Markup);
    
        Markup
    
            The markup here is mostly based on column position, meaning
            mostly no tags. The exceptions are special marks for bullets
            and for emphasis. See later sections for the effects of
            column position on the output HTML.
    
            .Text surrounded by = equals = comes out =another= =color=
            .Text surrounded by backquote/forward quote comes out `color' `
            .Text surrounded by * asterisks * comes out *italic* *text*
            .Text surrounded by _ underscores _ comes out _bold_
            .The long dash -- is signified with two consecutive dashes (-)
            .The plus-minus is signified with (+) and (-) markers combined +-4
            .Big character "C" in parentheses ( C ) makes a copyright sign (C)
            .Registered trade mark sign (R) is big character "R" in parentheses ( R )
            .Euro sign is small character "e" right after digit: 400e
            .Degree sign is number "0" in parentheses just after number: 5(0)C
            .Superscript is marked with bracket immediately attached to text[see this]
            .Special HTML entities can be embedded in a normal way, like: × < > ≤ ≥ ≠ √ − α β γ ƒ ÷ « » - – — ≈ ≡ ‹ › ∑ ∞ ™
    
        Emacs minor mode
    
            If you use the advertised Emacs minor mode (tinytf.el) you can
            easily renumber headings as you revise the text. Text is also
            colorized as you edit content.
    
            The editing mode can automatically generate the table of
            contents and the HTML generator can use it to generate a two
            frame output with the TOC in the left frame as hotlinks to the
            sections and subsections. Visit
            http://freecode.com/projects/emacs-tiny-tools
    
        Bullets, lists, and links
    
            This is ordinary text.
    
            o   This is a bullet paragraph with a continuation mark
                (leading comma) in the last line. It will not work if the
                ,comma is on the same line as the bullet.
    
                This is a continued bullet paragraph. You use a leading
                comma in the last line of the previous block to make a
                continued item. This is ok except the paragraph fill code
                (for the previous paragraph) cannot deal with it. Maybe it
                is a hint not to do continued bullets, or a hint not to
                put the comma in until you are done formatting.
    
            o   The next bullet. the sldjf sldjf sldkjf slkdjf sldkjf
                lsdkjf slkdjf sldkjf sldkjf lskdj flskdjf lskdjf lsdkjf.
            .   This is a numbered list, made with a '.' in column 8 of its
                first line and text in column 12. You may not have blank
                lines between the items.
            .   Clickable email <gork@ork.com>.
            .   Non-clickable email gork@ork.com.
            .   Clickable link: http://this.com
            .   Non-clickable link: -http://this.com.
            .   Clickable file: file:/home/gork/x.txt.
    
        Line breaking
    
            Ordinary text with leading dot(.) forces line breaks in the
            HTML.
            .Here is a line with forced break.
            .Here is another line that contains dot-code at the beginning.
    
        Specials
    
            You can use superscripts[1], multiple[(2)] and almost
            any[(ab)] and imaginable[IV superscripts]
    
    Samples per column (heading level h1)
    
            These samples show the range of effects produced by writing
            text beginning in different columns. The column numbers
            referred to are columns in the source text, not (obviously)
            the output. The column numbering is counted starting from 0,
            _not_ _number_ _1_.
     Column 1, is undefined and nothing special.
      Column 2, is undefined and nothing special.
    
       Column 3, plain text, with color
    
        Column 4, Next heading level (h2)
    
         Column 5, plain text, with color
    
          Column 6, This is used for long quotations. The text uses
          Georgia font, which is designed for web, but which is equally
          good as a laser printing font.
    
           Column 7, bold, italic
    
           "Column 7, start and end with double quote. Use for inner TOPICS"
    
            Column 8, standard text _strong_ *emphasized*
    
             Column 9, font weight bold, not italic.
    
              Column 10, quotation text, italic serif. This text has been
              made a little smaller and more condensed than the rest of
              the text. More quotation text. More quotation text. More
              quotation text. More quotation text. More quotation text.
              More quotation text. More quotation text. More quotation
              text. More quotation text. More quotation text. More
              quotation text. More quotation text.
    
               Column 11, another color, for questions, exercise texts etc.
    
                Note: It is possible to say something important at
                column 12, which is normally reserved for CODE.
                You must supply options --css-code-bg and
                --css-code-note=Note:
    
            Here is the code column 12:
    
                Note: Here is something important to tell you about this code
                This part of the text in first paragraph is rendered differently,
                because it started with magic word _Note:_ The rest of the
                paragraphs are rendered as CODE.
    
                /* Column 12 code */
                /* 10pt courier navy */
                // col 12 and beyond stay as is to preserve code formatting.
    
                for( i=0 ; i < 10 ; i++ )
                {
                    more();
                    whatever();
                }
    
        Another level 2 heading (column 4)
    
            Here is more ordinary text.
    
    Table rendering examples
    
            These examples make sense only if the options *--css-code-bg*
            (use gray background for column 12) and
            *--css-code-note=Note:* have been turned on. In order to take
            full advantage of all the possibilities, you should
            familiarize yourself with the HTML 4.01 specification and peek
            at the CSS code in the generated HTML: the *tableclass* can
            take an attribute of the embedded default styles.
    
                Note: This is example 1 and the `--css-code-note' option
                reads 'First word' in paragraph at column 12 and
                renders it differently. You can attach code right after
                this note, which must occupy only one paragraph
    
                --css-code-note=REGEXP      Regexp matches 'First word'
                --css-code-bg
    
            Here is example 2 using table control code
            #t2html::tableborder:1
    
                #t2html::tableborder:1
                for ( i = 0; i < 10; i++ )
                {
                    //  Doing something in this loop
                }
    
            Here is example 3 using table control code
            #t2html::td:bgcolor=#FFEEFF:tableclass:solid
    
                #t2html::td:bgcolor=#FFEEFF:tableclass:solid
                for ( i = 0; i < 10; i++ )
                {
                    //  Doing something in this loop
                }
    
            Here is example 4 using table control code
            #t2html::td:bgcolor=#CCFFCC
    
                #t2html::td:bgcolor=#CCFFCC
                for ( i = 0; i < 10; i++ )
                {
                    //  Doing something in this loop
                }
    
            Here is example 5 using table control code. Due to a bug in
            Opera 7-9.x, this example may not show correctly. Please use
            Firefox to see the effect.
            #t2html::td:bgcolor=#FFFFFF:tableclass:dashed
    
                #t2html::td:bgcolor=#FFFFFF:tableclass:dashed
                for ( i = 0; i < 10; i++ )
                {
                    //  Doing something in this loop
                }
    
            Here is example 6 using multiple table control codes. Use the
            underscore character to separate different table attributes
            from each other. The underscore will be converted into SPACE.
            The double quotes around the VALUE are not strictly required
            by the HTML standard, but they are expected in XML.
            #t2html::td:bgcolor="#EAEAEA":table:border=1_border=0_cellpadding="10"_cellspacing="0"
    
                #t2html::td:bgcolor="#EAEAEA":table:border=1_border=0_cellpadding="10"_cellspacing="0"
                for ( i = 0; i < 10; i++ )
                {
                    //  Doing something in this loop
                }
    
            Here is example 7 using table control code
            #t2html::td:class=color-navy:table:cellpadding=0 which cancels
            default grey coloring. The cellpadding must be zeroed to make
            room around the text.
    
                #t2html::td:class=color-white:table:cellpadding=0
                for ( i = 0; i < 10; i++ )
                {
                    //  Doing something in this loop
                }
    
        Conversion program
    
            The perl program t2html turns the raw technical text format
            into HTML. Among other things it can produce HTML files with
            an index frame, a main frame, and a master that ties the two
            together. It has features too numerous to list to control the
            output. For details see the perldoc that is embedded inside
            the program:
    
                perl -S t2html --help | more
    
            The frame aware html pages are generated by adding the
            *--html-frame* option.
    
    perl-text2html-master/doc/examples/t2html-2.html
    
    Page title is embedded inside text file

    Copyright © 1996-2016 Jari Aalto

    License: This material may be distributed only subject to the terms and conditions set forth in GNU General Public License v2 or later; or, at your option, distributed under the terms of GNU Free Documentation License version 1.2 or later (GNU FDL).

    This is a demonstration text of Perl Text To HTML converter.

    Headings

    The tool provides for two heading levels. Combined with bullets and numbered lists, it ought to be enough for most purposes, unless you really like section 1.2.3.4.5

    You can insert links to headings or other documents. The convention is interior links are made by joining the first four words of the heading with underscores, so they must be unique. A link to a heading below looks like this in the text document and generates the link shown. There also is syntax for automatically inserting a base URL (see the tool documentation).

    The following blue link is generated with markup code: # REF #Markup ;(Markup);

    (Markup)

    Markup

    The markup here is mostly based on column position, meaning mostly no tags. The exceptions are special marks for bullets and for emphasis. See later sections for the effects of column position on the output HTML.

    Text surrounded by = equals = comes out another color
    Text surrounded by backquote/forward quote comes out color `
    Text surrounded by * asterisks * comes out italic text
    Text surrounded by _ underscores _ comes out bold
    The long dash – is signified with two consecutive dashes (-)
    The plus-minus is signified with (+) and (-) markers combined ±4
    Big character "C" in parentheses ( C ) makes a copyright sign (C)
    Registered trade mark sign ® is big character "R" in parentheses ( R )
    Euro sign is small character "e" right after digit: 400 €
    Degree sign is number "0" in parentheses just after number: 5°C
    Superscript is marked with bracket immediately attached to textsee this
    Special HTML entities can be embedded in a normal way, like: × < > ≤ ≥ ≠ √ − α β γ ƒ ÷ « » - – — ≈ ≡ ‹ › ∑ ∞ ™

    Emacs minor mode

    If you use the advertised Emacs minor mode (tinytf.el) you can easily renumber headings as you revise the text. Text is also colorized as you edit content.

    The editing mode can automatically generate the table of contents and the HTML generator can use it to generate a two frame output with the TOC in the left frame as hotlinks to the sections and subsections. Visit http://freecode.com/projects/emacs-tiny-tools

    Bullets, lists, and links

    This is ordinary text.

    • This is a bullet paragraph with a continuation mark (leading comma) in the last line. It will not work if the comma is on the same line as the bullet.

      This is a continued bullet paragraph. You use a leading comma in the last line of the previous block to make a continued item. This is ok except the paragraph fill code (for the previous paragraph) cannot deal with it. Maybe it is a hint not to do continued bullets, or a hint not to put the comma in until you are done formatting.

    • The next bullet. the sldjf sldjf sldkjf slkdjf sldkjf lsdkjf slkdjf sldkjf sldkjf lskdj flskdjf lskdjf lsdkjf.
    1. This is a numbered list, made with a '.' in column 8 of its first line and text in column 12. You may not have blank lines between the items.
    2. Clickable email gork@ork.com.
    3. Non-clickable email gork@ork.com.
    4. Clickable link: http://this.com
    5. Non-clickable link: http://this.com.
    6. Clickable file: file:/home/gork/x.txt.
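
    In the plain-text source of this page the non-clickable link is written with a leading hyphen (-http://this.com). A minimal Python sketch of that rule follows; the function name link_urls and the exact URL pattern are assumptions made for illustration, and the email rule is left out because its source markup is not shown here.

```python
import re

# Hypothetical URL pattern: optional leading hyphen, then an http, https or
# file URL that does not end in sentence punctuation.
URL = re.compile(r"(-?)((?:https?|file):[^\s<>]+[^\s<>.,;])")


def link_urls(text):
    """Wrap URLs in anchors unless they are prefixed with a hyphen."""
    def replace(match):
        hyphen, url = match.group(1), match.group(2)
        if hyphen:
            # Leading hyphen: drop the marker and leave the URL as plain text.
            return url
        return f'<a href="{url}">{url}</a>'
    return URL.sub(replace, text)
```

    The trailing period of a sentence stays outside the anchor because the pattern refuses to end a URL with punctuation.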

    Line breaking

    Ordinary text with leading dot(.) forces line breaks in the HTML. Here is a line with forced break.
    Here is another line that contains dot-code at the beginning.
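
    The dot-code rule can be sketched in a few lines of Python. The sketch assumes that a dot immediately followed by text marks a forced break placed at the end of that line, while a dot followed by a space remains a numbered-list marker; render_breaks is a hypothetical name, not a t2html function.

```python
def render_breaks(lines):
    """Join paragraph lines, adding <br> after lines that start with a dot-code."""
    out = []
    for line in lines:
        stripped = line.strip()
        # ".Text" is a dot-code; ".   Text" (dot plus space) is a list item.
        if len(stripped) > 1 and stripped[0] == "." and not stripped[1].isspace():
            out.append(stripped[1:] + "<br>")
        else:
            out.append(stripped)
    return " ".join(out)
```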

    Specials

    You can use superscripts1, multiple(2) and almost any(ab) and imaginableIV superscripts


    Samples per column (heading level h1)

    These samples show the range of effects produced by writing text beginning in different columns. The column numbers referred to are columns in the source text, not (obviously) the output. The column numbering is counted starting from 0, not number 1. Column 1, is undefined and nothing special. Column 2, is undefined and nothing special.

    Column 3, plain text, with color

    Column 4, Next heading level (h2)

    Column 5, plain text, with color

    Column 6, This is used for long quotations. The text uses the Georgia font, which is designed for the web but is equally good for laser printing.

    Column 7, bold, italic

    Column 7, start and end with double quote. Use for inner TOPICS

    Column 8, standard text strong emphasized

    Column 9, font weight bold, not italic.

    Column 10, quotation text, italic serif. This text has been made a little smaller and more condensed than the rest of the text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text.

    Column 11, another color, for questions, exercise texts etc.

          Note: It is possible to say something important at
          column 12, which is normally reserved for CODE.
          You must supply options --css-code-bg and
          --css-code-note=Note:

    Here is the code column 12:

          Note: Here is something important to tell you about this code
          This part of the text in the first paragraph is rendered differently,
          because it started with the magic word _Note:_ The rest of the
          paragraphs are rendered as CODE.
    
          /* Column 12 code */
          /* 10pt courier navy */
          // col 12 and beyond stay as is to preserve code formatting.
    
          for( i=0 ; i < 10 ; i++ )
          {
              more();
              whatever();
          }
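
    The column samples above amount to a lookup from the number of leading spaces to a rendering role. The following Python sketch restates that table; classify_column and the role labels are hypothetical names chosen for the example, not identifiers from t2html.

```python
# Roles taken from the column samples above; columns 1 and 2 are undefined.
ROLES = {
    0: "heading level 1",
    3: "plain text, with color",
    4: "heading level 2",
    5: "plain text, with color",
    6: "long quotation",
    7: "bold italic",
    8: "standard text",
    9: "bold",
    10: "quotation, italic serif",
    11: "another color",
}


def classify_column(line):
    """Return the role of a source line based on its starting column (from 0)."""
    text = line.strip()
    if not text:
        return "blank"
    col = len(line) - len(line.lstrip(" "))
    if col >= 12:
        return "code"  # column 12 and beyond stays as is
    if col == 7 and len(text) > 1 and text.startswith('"') and text.endswith('"'):
        return "inner topic"  # column 7 text wrapped in double quotes
    return ROLES.get(col, "undefined")
```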

    Another level 2 heading (column 4)

    Here is more ordinary text.


    Table rendering examples

    These examples make sense only if the options --css-code-bg (use gray background for column 12) and --css-code-note=Note: have been turned on. In order to take full advantage of all the possibilities, you should familiarize yourself with the HTML 4.01 specification and peek at the CSS code in the generated HTML: the tableclass can take an attribute of the embedded default styles.

          Note: This is example 1 and the `--css-code-note' option
          reads 'First word' in a paragraph at column 12 and
          renders it differently. You can attach code right after
          this note, which must occupy only one paragraph
    
          --css-code-note=REGEXP      Regexp matches 'First word'
          --css-code-bg

    Here is example 2 using table control code

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 3 using table control code

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 4 using table control code

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 5 using table control code. Due to a bug in Opera 7-9.x, this example may not show correctly. Please use Firefox to see the effect.

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 6 using multiple table control codes. Use the underscore character to separate different table attributes from each other. The underscore will be converted into SPACE. The double quotes around the VALUE are not strictly required by the HTML standard, but they are expected in XML.

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 7 using a table control code that cancels the default grey coloring. The cellpadding must be zeroed to make room around the text.

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Conversion program

    The perl program t2html turns the raw technical text format into HTML. Among other things it can produce HTML files with an index frame, a main frame, and a master that ties the two together. It has features too numerous to list to control the output. For details see the perldoc that is embedded inside the program:

    
    
    

    The frame aware html pages are generated by adding the --html-frame option.


    Copyright © 2019 by Jari Aalto. This material may be distributed subject to the terms and conditions set forth in the Creative Commons Attribution-ShareAlike License. See http://creativecommons.org/
    Document author: Jari Aalto
    Last updated: 2019-05-05 19:03
    perl-text2html-master/doc/examples/t2html-2.txt

t2html Test Page

        #T2HTML-TITLE Page title is embedded inside text file
        #t2HTML-EMAIL author@examle.com
        #T2HTML-AUTHOR John Doe
        #T2HTML-METAKEYWORDS test, html, example
        #T2HTML-METADESCRIPTION This is test page of program t2html

        Copyright (C) 1996-2020 Jari Aalto

        License: This material may be distributed only subject to the
        terms and conditions set forth in GNU General Public License
        v2 or later; or, at your option, distributed under the terms
        of GNU Free Documentation License version 1.2 or later (GNU
        FDL).

        This is a demonstration text of Perl Text To HTML converter.

Headings

        The tool provides for two heading levels. Combined with
        bullets and numbered lists, it ought to be enough for most
        purposes, unless you really like section 1.2.3.4.5

        You can insert links to headings or other documents. The
        convention is interior links are made by joining the first
        four words of the heading with underscores, so they must be
        unique. A link to a heading below looks like this in the text
        document and generates the link shown. There also is syntax
        for automatically inserting a base URL (see the tool
        documentation).

        The following blue link is generated with markup code:
        # REF #Markup ;(Markup);

        #REF #Markup ;(Markup);

    Markup

        The markup here is mostly based on column position, meaning
        mostly no tags. The exceptions are special marks for bullets
        and for emphasis. See later sections for the effects of
        column position on the output HTML.

        .Text surrounded by = equals = comes out =another= =color=
        .Text surrounded by backquote/forward quote comes out `color' `
        .Text surrounded by * asterisks * comes out *italic* *text*
        .Text surrounded by _ underscores _ comes out _bold_
        .The long dash -- is signified with two consecutive dashes (-)
        .The plus-minus is signified with (+) and (-) markers combined +-4
        .Big character "C" in parentheses ( C ) makes a copyright sign (C)
        .Registered trade mark sign (R) is big character "R" in parentheses ( R )
        .Euro sign is small character "e" right after digit: 400e
        .Degree sign is number "0" in parentheses just after number: 5(0)C
        .Superscript is marked with a bracket immediately attached to text[see this]
        .Special HTML entities can be embedded in a normal way, like: × < > ≤ ≥ ≠ √ − α β γ ƒ ÷ « » - – — ≈ ≡ ‹ › ∑ ∞ ™

    Emacs minor mode

        If you use the advertised Emacs minor mode (tinytf.el) you can
        easily renumber headings as you revise the text. Text is also
        colorized as you edit content.

        The editing mode can automatically generate the table of
        contents and the HTML generator can use it to generate a two
        frame output with the TOC in the left frame as hotlinks to the
        sections and subsections. Visit
        http://freecode.com/projects/emacs-tiny-tools

    Bullets, lists, and links

        This is ordinary text.

        o   This is a bullet paragraph with a continuation mark
            (leading comma) in the last line. It will not work if the
            ,comma is on the same line as the bullet.

            This is a continued bullet paragraph. You use a leading
            comma in the last line of the previous block to make a
            continued item. This is ok except the paragraph fill code
            (for the previous paragraph) cannot deal with it. Maybe
            it is a hint not to do continued bullets, or a hint not
            to put the comma in until you are done formatting.

        o   The next bullet. the sldjf sldjf sldkjf slkdjf sldkjf
            lsdkjf slkdjf sldkjf sldkjf lskdj flskdjf lskdjf lsdkjf.

        .   This is a numbered list, made with a '.' in column 8 of
            its first line and text in column 12. You may not have
            blank lines between the items.
        .   Clickable email .
        .   Non-clickable email gork@ork.com.
        .   Clickable link: http://this.com
        .   Non-clickable link: -http://this.com.
        .   Clickable file: file:/home/gork/x.txt.

    Line breaking

        Ordinary text with leading dot(.) forces line breaks in the HTML.
        .Here is a line with forced break.
        .Here is another line that contains dot-code at the beginning.

    Specials

        You can use superscripts[1], multiple[(2)] and almost
        any[(ab)] and imaginable[IV superscripts]

Samples per column (heading level h1)

        These samples show the range of effects produced by writing
        text beginning in different columns. The column numbers
        referred to are columns in the source text, not (obviously)
        the output. The column numbering is counted starting from 0,
        _not_ _number_ _1_.

 Column 1, is undefined and nothing special.

  Column 2, is undefined and nothing special.

   Column 3, plain text, with color

    Column 4, Next heading level (h2)

     Column 5, plain text, with color

      Column 6, This is used for long quotations. The text uses the
      Georgia font, which is designed for the web but is equally good
      for laser printing.

       Column 7, bold, italic

       "Column 7, start and end with double quote. Use for inner TOPICS"

        Column 8, standard text _strong_ *emphasized*

         Column 9, font weight bold, not italic.

          Column 10, quotation text, italic serif. This text has been
          made a little smaller and more condensed than the rest of
          the text. More quotation text. More quotation text. More
          quotation text. More quotation text. More quotation text.
          More quotation text. More quotation text. More quotation
          text. More quotation text. More quotation text. More
          quotation text. More quotation text.

           Column 11, another color, for questions, exercise texts etc.

            Note: It is possible to say something important at
            column 12, which is normally reserved for CODE.
            You must supply options --css-code-bg and
            --css-code-note=Note:

        Here is the code column 12:

            Note: Here is something important to tell you about this code
            This part of the text in the first paragraph is rendered differently,
            because it started with the magic word _Note:_ The rest of the
            paragraphs are rendered as CODE.

            /* Column 12 code */
            /* 10pt courier navy */
            // col 12 and beyond stay as is to preserve code formatting.

            for( i=0 ; i < 10 ; i++ )
            {
                more();
                whatever();
            }

    Another level 2 heading (column 4)

        Here is more ordinary text.

Table rendering examples

        These examples make sense only if the options *--css-code-bg*
        (use gray background for column 12) and
        *--css-code-note=Note:* have been turned on. In order to take
        full advantage of all the possibilities, you should
        familiarize yourself with the HTML 4.01 specification and peek
        at the CSS code in the generated HTML: the *tableclass* can
        take an attribute of the embedded default styles.

            Note: This is example 1 and the `--css-code-note' option
            reads 'First word' in a paragraph at column 12 and
            renders it differently. You can attach code right after
            this note, which must occupy only one paragraph

            --css-code-note=REGEXP      Regexp matches 'First word'
            --css-code-bg

        Here is example 2 using table control code
        #t2html::tableborder:1

            #t2html::tableborder:1

            for ( i = 0; i < 10; i++ )
            {
                //  Doing something in this loop
            }

        Here is example 3 using table control code
        #t2html::td:bgcolor=#FFEEFF:tableclass:solid

            #t2html::td:bgcolor=#FFEEFF:tableclass:solid

            for ( i = 0; i < 10; i++ )
            {
                //  Doing something in this loop
            }

        Here is example 4 using table control code
        #t2html::td:bgcolor=#CCFFCC

            #t2html::td:bgcolor=#CCFFCC

            for ( i = 0; i < 10; i++ )
            {
                //  Doing something in this loop
            }

        Here is example 5 using table control code. Due to a bug in
        Opera 7-9.x, this example may not show correctly. Please use
        Firefox to see the effect.
        #t2html::td:bgcolor=#FFFFFF:tableclass:dashed

            #t2html::td:bgcolor=#FFFFFF:tableclass:dashed

            for ( i = 0; i < 10; i++ )
            {
                //  Doing something in this loop
            }

        Here is example 6 using multiple table control codes. Use the
        underscore character to separate different table attributes
        from each other. The underscore will be converted into SPACE.
        The double quotes around the VALUE are not strictly required
        by the HTML standard, but they are expected in XML.
        #t2html::td:bgcolor="#EAEAEA":table:border=1_border=0_cellpadding="10"_cellspacing="0"

            #t2html::td:bgcolor="#EAEAEA":table:border=1_border=0_cellpadding="10"_cellspacing="0"

            for ( i = 0; i < 10; i++ )
            {
                //  Doing something in this loop
            }

        Here is example 7 using table control code
        #t2html::td:class=color-navy:table:cellpadding=0 which cancels
        the default grey coloring. The cellpadding must be zeroed to
        make room around the text.

            #t2html::td:class=color-white:table:cellpadding=0

            for ( i = 0; i < 10; i++ )
            {
                //  Doing something in this loop
            }

    Conversion program

        The perl program t2html turns the raw technical text format
        into HTML. Among other things it can produce HTML files with
        an index frame, a main frame, and a master that ties the two
        together. It has features too numerous to list to control the
        output. For details see the perldoc that is embedded inside
        the program:

            perl -S t2html --help | more

        The frame aware html pages are generated by adding the
        *--html-frame* option.

    perl-text2html-master/doc/examples/t2html-3-body.html

    Page title is embedded inside text file

    t2html Test Page

    Copyright © 1996-2016 Jari Aalto

    License: This material may be distributed only subject to the terms and conditions set forth in GNU General Public License v2 or later; or, at your option, distributed under the terms of GNU Free Documentation License version 1.2 or later (GNU FDL).

    This is a demonstration text of Perl Text To HTML converter.

    Headings

    The tool provides for two heading levels. Combined with bullets and numbered lists, it ought to be enough for most purposes, unless you really like section 1.2.3.4.5

    You can insert links to headings or other documents. The convention is interior links are made by joining the first four words of the heading with underscores, so they must be unique. A link to a heading below looks like this in the text document and generates the link shown. There also is syntax for automatically inserting a base URL (see the tool documentation).

    The following blue link is generated with markup code: # REF #Markup ;(Markup);

    (Markup)
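
    The naming convention described above (the first four words of a heading joined with underscores) can be sketched in Python as follows. heading_anchor and duplicate_anchors are hypothetical helper names, and the sketch assumes the words are kept verbatim, which may differ from how t2html treats punctuation.

```python
from collections import Counter


def heading_anchor(heading, max_words=4):
    """Join the first max_words words of a heading with underscores."""
    return "_".join(heading.split()[:max_words])


def duplicate_anchors(headings):
    """Return anchors shared by more than one heading (they must be unique)."""
    counts = Counter(heading_anchor(h) for h in headings)
    return sorted(name for name, n in counts.items() if n > 1)
```

    Two headings that share their first four words produce the same anchor, which is why the text asks for unique headings.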

    Markup

    The markup here is mostly based on column position, meaning mostly no tags. The exceptions are special marks for bullets and for emphasis. See later sections for the effects of column position on the output HTML.

    Text surrounded by = equals = comes out another color
    Text surrounded by backquote/forward quote comes out color `
    Text surrounded by * asterisks * comes out italic text
    Text surrounded by _ underscores _ comes out bold
    The long dash – is signified with two consecutive dashes (-)
    The plus-minus is signified with (+) and (-) markers combined ±4
    Big character "C" in parentheses ( C ) makes a copyright sign (C)
    Registered trade mark sign ® is big character "R" in parentheses ( R )
    Euro sign is small character "e" right after digit: 400 €
    Degree sign is number "0" in parentheses just after number: 5°C
    Superscript is marked with a bracket immediately attached to textsee this
    Special HTML entities can be embedded in a normal way, like: × < > ≤ ≥ ≠ √ − α β γ ƒ ÷ « » - – — ≈ ≡ ‹ › ∑ ∞ ™

    Emacs minor mode

    If you use the advertised Emacs minor mode (tinytf.el) you can easily renumber headings as you revise the text. Text is also colorized as you edit content.

    The editing mode can automatically generate the table of contents and the HTML generator can use it to generate a two frame output with the TOC in the left frame as hotlinks to the sections and subsections. Visit http://freecode.com/projects/emacs-tiny-tools

    Bullets, lists, and links

    This is ordinary text.

    • This is a bullet paragraph with a continuation mark (leading comma) in the last line. It will not work if the comma is on the same line as the bullet.

      This is a continued bullet paragraph. You use a leading comma in the last line of the previous block to make a continued item. This is ok except the paragraph fill code (for the previous paragraph) cannot deal with it. Maybe it is a hint not to do continued bullets, or a hint not to put the comma in until you are done formatting.

    • The next bullet. the sldjf sldjf sldkjf slkdjf sldkjf lsdkjf slkdjf sldkjf sldkjf lskdj flskdjf lskdjf lsdkjf.
    1. This is a numbered list, made with a '.' in column 8 of its first line and text in column 12. You may not have blank lines between the items.
    2. Clickable email gork@ork.com.
    3. Non-clickable email gork@ork.com.
    4. Clickable link: http://this.com
    5. Non-clickable link: http://this.com.
    6. Clickable file: file:/home/gork/x.txt.

    Line breaking

    Ordinary text with leading dot(.) forces line breaks in the HTML. Here is a line with forced break.
    Here is another line that contains dot-code at the beginning.

    Specials

    You can use superscripts1, multiple(2) and almost any(ab) and imaginableIV superscripts


    Samples per column (heading level h1)

    These samples show the range of effects produced by writing text beginning in different columns. The column numbers referred to are columns in the source text, not (obviously) the output. The column numbering is counted starting from 0, not number 1. Column 1, is undefined and nothing special. Column 2, is undefined and nothing special.

    Column 3, plain text, with color

    Column 4, Next heading level (h2)

    Column 5, plain text, with color

    Column 6, This is used for long quotations. The text uses the Georgia font, which is designed for the web but is equally good for laser printing.

    Column 7, bold, italic

    Column 7, start and end with double quote. Use for inner TOPICS

    Column 8, standard text strong emphasized

    Column 9, font weight bold, not italic.

    Column 10, quotation text, italic serif. This text has been made a little smaller and more condensed than the rest of the text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text.

    Column 11, another color, for questions, exercise texts etc.

          Note: It is possible to say something important at
          column 12, which is normally reserved for CODE.
          You must supply options --css-code-bg and
          --css-code-note=Note:

    Here is the code column 12:

          Note: Here is something important to tell you about this code
          This part of the text in the first paragraph is rendered differently,
          because it started with the magic word _Note:_ The rest of the
          paragraphs are rendered as CODE.
    
          /* Column 12 code */
          /* 10pt courier navy */
          // col 12 and beyond stay as is to preserve code formatting.
    
          for( i=0 ; i < 10 ; i++ )
          {
              more();
              whatever();
          }

    Another level 2 heading (column 4)

    Here is more ordinary text.


    Table rendering examples

    These examples make sense only if the options --css-code-bg (use gray background for column 12) and --css-code-note=Note: have been turned on. In order to take full advantage of all the possibilities, you should familiarize yourself with the HTML 4.01 specification and peek at the CSS code in the generated HTML: the tableclass can take an attribute of the embedded default styles.

          Note: This is example 1 and the `--css-code-note' option
          reads 'First word' in a paragraph at column 12 and
          renders it differently. You can attach code right after
          this note, which must occupy only one paragraph
    
          --css-code-note=REGEXP      Regexp matches 'First word'
          --css-code-bg

    Here is example 2 using table control code #t2html::tableborder:1

          #t2html::tableborder:1
    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 3 using table control code #t2html::td:bgcolor=#FFEEFF:tableclass:solid

          #t2html::td:bgcolor=#FFEEFF:tableclass:solid
    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 4 using table control code #t2html::td:bgcolor=#CCFFCC

          #t2html::td:bgcolor=#CCFFCC
    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 5 using table control code. Due to a bug in Opera 7-9.x, this example may not show correctly. Please use Firefox to see the effect. #t2html::td:bgcolor=#FFFFFF:tableclass:dashed

          #t2html::td:bgcolor=#FFFFFF:tableclass:dashed
    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 6 using multiple table control codes. Use the underscore character to separate different table attributes from each other. The underscore will be converted into SPACE. The double quotes around the VALUE are not strictly required by the HTML standard, but they are expected in XML. #t2html::td:bgcolor="#EAEAEA":table:border=1_border=0_cellpadding="10"_cellspacing="0"

          #t2html::td:bgcolor="#EAEAEA":table:border=1_border=0_cellpadding="10"_cellspacing="0"
    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }
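
    The control codes shown in examples 2 to 6 share one shape: the prefix #t2html:: followed by colon-separated NAME:VALUE pairs, with underscores inside a value standing for spaces. A minimal Python sketch of reading such a line is given below; parse_control_code is a hypothetical name and the pairing rule is inferred from the examples on this page, not taken from t2html's source.

```python
PREFIX = "#t2html::"


def parse_control_code(line):
    """Split a '#t2html::' line into a {name: value} dict, or return None."""
    line = line.strip()
    if not line.startswith(PREFIX):
        return None
    fields = line[len(PREFIX):].split(":")
    if len(fields) % 2 != 0:
        raise ValueError("expected NAME:VALUE pairs: " + line)
    # Underscores separate attributes inside a value and become spaces.
    return {
        name: value.replace("_", " ")
        for name, value in zip(fields[0::2], fields[1::2])
    }
```

    For the example 6 line this yields a td entry and a table entry whose value reads border=1 border=0 cellpadding="10" cellspacing="0".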

    Here is example 7 using table control code #t2html::td:class=color-navy:table:cellpadding=0 which cancels the default grey coloring. The cellpadding must be zeroed to make room around the text.

          #t2html::td:class=color-white:table:cellpadding=0
    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Conversion program

    The perl program t2html turns the raw technical text format into HTML. Among other things it can produce HTML files with an index frame, a main frame, and a master that ties the two together. It has features too numerous to list to control the output. For details see the perldoc that is embedded inside the program:

          perl -S t2html --help | more

    The frame aware html pages are generated by adding the --html-frame option.


    Copyright © 2019 by Jari Aalto. This material may be distributed subject to the terms and conditions set forth in the Creative Commons Attribution-ShareAlike License. See http://creativecommons.org/ This file has been automatically generated from a plain text file with t2html.
    Document author: Jari Aalto
    Last updated: 2019-05-05 19:04
    perl-text2html-master/doc/examples/t2html-3-toc.html

    Navigation

    perl-text2html-master/doc/examples/t2html-3.html

    perl-text2html-master/doc/examples/t2html-3.txt
perl-text2html-master/doc/examples/t2html-4.css000066400000000000000000000037201371714776500217360ustar00rootroot00000000000000 /* An example CSS */ body { font-family: Georgia, "Times New Roman", times, serif; padding-left: 0px; margin-left: 30px; font-size: 12px; line-height: 140%; text-align: left; max-width: 700px; } div.toc { font-family: Verdana, Tahoma, Arial, sans-serif; margin-left: 40px; } div.toc h1 { font-family: Georgia, "Times New Roman", times, serif; margin-left: -40px; } h1, h2, h3, h4 { color: #6BA4DC; text-align: left; } h1 { font-size: 20px; margin-left: 0px; } h2 { font-size: 14px; margin: 0; margin-left: 35px; } hr { border: 0; width: 0%; } p { text-align: justify; margin-left: 3em; } pre { margin-left: 35px; } li { text-align: justify; } p.column8 { text-align: justify; } ul, ol { margin-left: 35px; } .word-ref { color: teal; } em.word { color: #809F69; } samp.word { color: #4C9CD4; font-family: "Courier New", Courier, monospace; font-size: 1em; } span.super { /* superscripts */ color: teal; vertical-align: super; font-family: Verdana, Arial, sans-serif; font-size: 0.8em; } span.word-small { color: #CC66FF; font-family: Verdana, Arial, sans-serif; } table { border: none; width: 100%; cellpadding: 10px; cellspacing: 0px; } table tr td pre { /* Make PRE tables "airy" */ margin-top: 1em; margin-bottom: 1em; } table.shade-normal { color: #777; } table.dashed { color: Navy; border-top: 1px #00e6e8 solid; border-left: 1px #00e6e8 solid; border-right: 1px #00c6c8 solid; border-bottom: 1px #00c6c8 solid; border-width: 94%; border-style: dashed; /* dotted */ } /* End of CSS */ perl-text2html-master/doc/examples/t2html-4.html000066400000000000000000000622001371714776500221100ustar00rootroot00000000000000 Page title is embedded inside text file

    Copyright © 1996-2016 Jari Aalto

    License: This material may be distributed only subject to the terms and conditions set forth in GNU General Public License v2 or later; or, at your option, distributed under the terms of GNU Free Documentation License version 1.2 or later (GNU FDL).

    This is a demonstration text of Perl Text To HTML converter.

    Headings

    The tool provides for two heading levels. Combined with bullets and numbered lists, it ought to be enough for most purposes, unless you really like section 1.2.3.4.5

    You can insert links to headings or other documents. The convention is interior links are made by joining the first four words of the heading with underscores, so they must be unique. A link to a heading below looks like this in the text document and generates the link shown. There also is syntax for automatically inserting a base URL (see the tool documentation).
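
    The link-naming convention described above can be sketched in a few lines. This is a minimal Python illustration, not the converter's own Perl code; the function name heading_anchor is hypothetical, and the way the real tool treats punctuation inside a heading is not covered here.

```python
def heading_anchor(heading: str, max_words: int = 4) -> str:
    """Join the first `max_words` words of a heading with underscores.

    Assumption: words are split on whitespace only; the real converter
    may normalize punctuation differently.
    """
    words = heading.split()
    return "_".join(words[:max_words])


print(heading_anchor("Markup"))
print(heading_anchor("Emacs minor mode"))
print(heading_anchor("Another level 2 heading (column 4)"))
```

    Because only the first four words are used, two headings that share those words would collide, which is why the headings must be unique.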

    The following blue link is generated with markup code: # REF #Markup ;(Markup);

    (Markup)

    Markup

    The markup here is mostly based on column position, meaning mostly no tags. The exceptions are special marks for bullets and for emphasis. See later sections for the effects of column position on the output HTML.

    Text surrounded by = equals = comes out another color
    Text surrounded by backquote/forward quote comes out color `
    Text surrounded by * asterisks * comes out italic text
    Text surrounded by _ underscores _ comes out bold
    The long dash – is signified with two consecutive dashes (-)
    The plus-minus is signified with (+) and (-) markers combined ±4
    Big character "C" in parentheses ( C ) makes a copyright sign (C)
    Registered trade mark sign ® is big character "R" in parentheses ( R )
    Euro sign is small character "e" right after digit: 400 €
    Degree sign is number "0" in parentheses just after number: 5°C
    Superscript is marked with brackets immediately attached to text[see this]
    Special HTML entities can be embedded in a normal way, like: × < > ≤ ≥ ≠ √ − α β γ ƒ ÷ « » - – — ≈ ≡ ‹ › ∑ ∞ ™
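
    A few of the character rules listed above can be approximated with regular expressions. The sketch below is a Python illustration under stated assumptions: the function name apply_symbol_markup and the exact patterns are hypothetical, and only rules whose rendered result is visible in the list above are included.

```python
import re


def apply_symbol_markup(text: str) -> str:
    """Apply a subset of the symbol rules (illustrative patterns only)."""
    text = re.sub(r"\(R\)", "®", text)            # (R)   -> registered sign
    text = re.sub(r"(\d)\(0\)", r"\1°", text)      # 5(0)C -> 5°C
    text = re.sub(r"(\d)e\b", r"\1 €", text)       # 400e  -> 400 €
    text = re.sub(r"\+-(?=\d)", "±", text)         # +-4   -> ±4
    text = re.sub(r"(?<!-)--(?!-)", "–", text)     # --    -> long dash
    return text


print(apply_symbol_markup("5(0)C, 400e, +-4, a -- b"))
```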

    Emacs minor mode

    If you use the advertised Emacs minor mode (tinytf.el) you can easily renumber headings as you revise the text. Text is also colorized as you edit content.

    The editing mode can automatically generate the table of contents and the HTML generator can use it to generate a two frame output with the TOC in the left frame as hotlinks to the sections and subsections. Visit http://freecode.com/projects/emacs-tiny-tools

    Bullets, lists, and links

    This is ordinary text.

    • This is a bullet paragraph with a continuation mark (leading comma) in the last line. It will not work if the comma is on the same line as the bullet.

      This is a continued bullet paragraph. You use a leading comma in the last line of the previous block to make a continued item. This is ok except the paragraph fill code (for the previous paragraph) cannot deal with it. Maybe it is a hint not to do continued bullets, or a hint not to put the comma in until you are done formatting.

    • The next bullet. the sldjf sldjf sldkjf slkdjf sldkjf lsdkjf slkdjf sldkjf sldkjf lskdj flskdjf lskdjf lsdkjf.
    1. This is a numbered list, made with a '.' in column 8 of its first line and text in column 12. You may not have blank lines between the items.
    2. Clickable email gork@ork.com.
    3. Non-clickable email gork@ork.com.
    4. Clickable link: http://this.com
    5. Non-clickable link: http://this.com.
    6. Clickable file: file:/home/gork/x.txt.

    Line breaking

    Ordinary text with leading dot(.) forces line breaks in the HTML. Here is a line with forced break.
    Here is another line that contains dot-code at the beginning.

    Specials

    You can use superscripts[1], multiple[(2)] and almost any[(ab)] and imaginable[IV superscripts]


    Samples per column (heading level h1)

    These samples show the range of effects produced by writing text beginning in different columns. The column numbers referred to are columns in the source text, not (obviously) the output. The column numbering starts from 0, not from 1. Column 1 is undefined and nothing special. Column 2 is undefined and nothing special.

    Column 3, plain text, with color

    Column 4, Next heading level (h2)

    Column 5, plain text, with color

    Column 6, This is used for long quotations. The text uses the Georgia font, which is designed for the web but is equally good for laser printing.

    Column 7, bold, italic

    Column 7, start and end with double quote. Use for inner TOPICS

    Column 8, standard text strong emphasized

    Column 9, font weight bold, not italic.

    Column 10, quotation text, italic serif. This text has been made a little smaller and more condensed than the rest of the text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text.

    Column 11, another color, for questions, exercise texts etc.

          Note: It is possible to say something important at
          column 12, which is normally reserved for CODE.
          You must supply options --css-code-bg and
          --css-code-note=Note:

    Here is the code column 12:

          Note: Here is something important to tell you about this code
          This part of the text in the first paragraph is rendered differently,
          because it started with magic word _Note:_ The rest of the
          paragraphs are rendered as CODE.
    
          /* Column 12 code */
          /* 10pt courier navy */
          // col 12 and beyond stay as is to preserve code formatting.
    
          for( i=0 ; i < 10 ; i++ )
          {
              more();
              whatever();
          }
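
    The column rules shown in the samples above amount to a lookup on the number of leading spaces. The following Python sketch illustrates that idea only; the names COLUMN_STYLES and classify_line are hypothetical, the style labels are copied from the samples, and the double-quote variant of column 7 is left out.

```python
# Style labels taken from the column samples; columns are counted from 0.
COLUMN_STYLES = {
    0: "heading level 1",
    3: "plain text, with color",
    4: "heading level 2",
    5: "plain text, with color",
    6: "long quotation",
    7: "bold, italic",
    8: "standard text",
    9: "bold, not italic",
    10: "quotation text, italic serif",
    11: "another color (questions, exercises)",
}


def classify_line(line: str) -> str:
    """Classify a source line by the column where its text starts."""
    stripped = line.lstrip(" ")
    if not stripped:
        return "blank"
    column = len(line) - len(stripped)
    if column >= 12:
        # Column 12 and beyond stay as is to preserve code formatting.
        return "code"
    return COLUMN_STYLES.get(column, "undefined")


print(classify_line("Headings"))
print(classify_line("    Another level 2 heading"))
print(classify_line(" " * 12 + "more();"))
```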

    Another level 2 heading (column 4)

    Here is more ordinary text.


    Table rendering examples

    These examples make sense only if the options --css-code-bg (use gray background for column 12) and --css-code-note=Note: have been turned on. In order to take full advantage of all the possibilities, you should familiarize yourself with the HTML 4.01 specification and peek at the CSS code in the generated HTML: the tableclass can take an attribute of the embedded default styles.

          Note: This is example 1 and the `--css-code-note' option
          reads 'First word' in paragraph at column 12 and
          renders it differently. You can attach code right after
          this note, which must occupy only one paragraph.
    
          --css-code-note=REGEXP      Regexp matches 'First word'
          --css-code-bg

    Here is example 2 using table control code

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 3 using table control code

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 4 using table control code

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 5 using table control code. Due to a bug in Opera 7-9.x, this example may not show correctly. Please use Firefox to see the effect.

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 6 using multiple table control codes. Use the underscore character to separate different table attributes from each other. The underscore will be converted into SPACE. The double quotes around the VALUE are not strictly required by the HTML standard, but they are expected in XML.

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }
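
    The control code used for example 6 in the text source of this page is #t2html::td:bgcolor="#EAEAEA":table:border=1_border=0_cellpadding="10"_cellspacing="0". The Python sketch below shows one way such a line could be split into name and value pairs, with underscores turned into spaces. It is an illustration based only on the examples on this page; the function name parse_table_directive is hypothetical and the converter's own parsing may differ.

```python
PREFIX = "#t2html::"


def parse_table_directive(line: str) -> dict:
    """Split a '#t2html::name:value:name:value' line into a dict.

    Assumptions: names and values alternate, separated by ':', and
    underscores inside every value stand for spaces.
    """
    body = line.strip()
    if not body.startswith(PREFIX):
        raise ValueError("not a #t2html:: directive")
    fields = body[len(PREFIX):].split(":")
    if len(fields) % 2:
        raise ValueError("expected name:value pairs")
    names, values = fields[0::2], fields[1::2]
    return {name: value.replace("_", " ") for name, value in zip(names, values)}


example = ('#t2html::td:bgcolor="#EAEAEA":table:'
           'border=1_border=0_cellpadding="10"_cellspacing="0"')
print(parse_table_directive(example))
```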

    Here is example 7 using a table control code which cancels the default grey coloring. The cellpadding around the text must be zeroed to make room.

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Conversion program

    The Perl program t2html turns the raw technical text format into HTML. Among other things it can produce HTML files with an index frame, a main frame, and a master that ties the two together. It has features too numerous to list to control the output. For details see the perldoc that is embedded inside the program:

    
    
    

    The frame-aware HTML pages are generated by adding the --html-frame option.


    Copyright © 2019 by Jari Aalto. This material may be distributed subject to the terms and conditions set forth in the Creative Commons Attribution-ShareAlike License. See http://creativecommons.org/
    Document author: Jari Aalto
    Last updated: 2019-05-05 19:03
    perl-text2html-master/doc/examples/t2html-4.txt000066400000000000000000000246441371714776500217750ustar00rootroot00000000000000t2html Test Page #T2HTML-TITLE Page title is embedded inside text file #t2HTML-EMAIL author@examle.com #T2HTML-AUTHOR John Doe #T2HTML-METAKEYWORDS test, html, example #T2HTML-METADESCRIPTION This is test page of program t2html Copyright (C) 1996-2020 Jari Aalto License: This material may be distributed only subject to the terms and conditions set forth in GNU General Public License v2 or later; or, at your option, distributed under the terms of GNU Free Documentation License version 1.2 or later (GNU FDL). This is a demonstration text of Perl Text To HTML converter. Headings The tool provides for two heading levels. Combined with bullets and numbered lists, it ought to be enough for most purposes, unless you really like section 1.2.3.4.5 You can insert links to headings or other documents. The convention is interior links are made by joining the first four words of the heading with underscores, so they must be unique. A link to a heading below looks like this in the text document and generates the link shown. There also is syntax for automatically inserting a base URL (see the tool documentation). The following blue link is generated with markup code: # REF #Markup ;(Markup); #REF #Markup ;(Markup); Markup The markup here is mostly based on column position, meaning mostly no tags. The exceptions are special marks for bullets and for emphasis. See later sections for the effects of column position on the output HTML. 
.Text surrounded by = equals = comes out =another= =color= .Text surrounded by backquote/forward quote comes out `color' ` .Text surrounded by * asterisks * comes out *italic* *text* .Text surrounded by _ underscores _ comes out _bold_ .The long dash -- is signified with two consequent dashes (-) .The plus-minus is signified with (+) and (-) markers combined +-4 .Big character "C" in parentheses ( C ) make a copyright sign (C) .Registered trade mark sign (R) is big character "R" in parentheses ( R ) .Euro sign is small character "e" right after digit: 400e .Degree sign is number "0" in parentheses just after number: 5(0)C .Superscript is maerked with bracket immediately attached to text[see this] .Special HTML entities can embedded in a normal way, like: × < > ≤ ≥ ≠ √ − α β γ ƒ ÷ « » - – — ≈ ≡ ‹ › ∑ ∞ ™ Emacs minor mode If you use the advertised Emacs minor mode (tinytf.el) you can easily renumber headings as you revise the text. Test is also colorized as you edit content. The editing mode can automatically generate the table of contents and the HTML generator can use it to generate a two frame output with the TOC in the left frame as hotlinks to the sections and subsections. Visit http://freecode.com/projects/emacs-tiny-tools Bullets, lists, and links This is ordinary text. o This is a bullet paragraph with a continuation mark (leading comma) in the last line. It will not work if the ,comma is on the same line as the bullet. This is a continued bullet paragraph. You use a leading comma in the last line of the previous block to make a continued item. This is ok except the paragraph fill code (for the previous paragraph) cannot deal with it. Maybe it is a hint not to do continued bullets, or a hint not to put the comma in until you are done formatting. o The next bullet. the sldjf sldjf sldkjf slkdjf sldkjf lsdkjf slkdjf sldkjf sldkjf lskdj flskdjf lskdjf lsdkjf. . This is a numbered list, made with a '.' in column 8 of its first line and text in column 12. 
You may not have blank lines between the items. . Clickable email . . Non-clickable email gork@ork.com. . Clickable link: http://this.com . Non-clickable link: -http://this.com. . Clickable file: file:/home/gork/x.txt. Line breaking Ordinary text with leading dot(.) forces line breaks in the HTML. .Here is a line with forced break. .Here is another line thatcontains dot-code at the beginning. Specials You can use superscripts[1], multiple[(2)] and almost any[(ab)] and imaginable[IV superscripts] Samples per column (heading level h1) These samples show the range of effects produced by writing text beginning in different columns. The column numbers referred to are columns in the source text, not (obviously) the output. The column numbering is counted starting from 0, _not_ _number_ _1_. Column 1, is undefined and nothing special. Column 2, is undefined and nothing special. Column 3, plain text, with color Column 4, Next heading level (h2) Column 5, plain text, with color Column 6, This i used for long quotations. The text uses Georgia font, which is designed for web, but which is equally good for laser printing font. Column 7, bold, italic "Column 7, start and end with double quote. Use for inner TOPICS" Column 8, standard text _strong_ *emphasized* Column 9, font weight bold, not italic. Column 10, quotation text, italic serif. This text has been made a little smaller and condensed than the rest of the text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. Column 11, another color, for questions, exercise texts etc. Note: It is possible to say something important at column 12, which is normally reserved for CODE. 
You must supply options --css-code-bg and --css-code-note=Note: Here is the code column 12: Note: Here is something important to tell you about this code This part of the text in first paragrah is rendered differently, because it started with magic word _Note:_ The rest of the pararagraphs are rendered as CODE. /* Column 12 code */ /* 10pt courier navy */ // col 12 and beyond stay as is to preserve code formatting. for( i=0 ; i < 10 ; i++ ) { more(); whatever(); } Another level 2 heading (column 4) Here is more ordinary text. Table rendering examples These examples make sense only if the options *--css-code-bg* (use gray background for column 12) and *--css-code-note=Note:* have been turned on. If orfer to take full advantage of all the possibilities, you should introduce yourself to the HTML 4.01 specification and peek the CSS code in the generated HTML: the *tableclass* can take an attribute of the embedded default styles. Note: This is example 1 and `--css-code-note' options reads 'First word' in paragraph at column 12 and renders it differently. You can attache code right after this note, which must occupy only one paragraph --css-code-note=REGEXP Regexp matches 'First word' --css-code-bg Here is example 2 using table control code #t2html::tableborder:1 #t2html::tableborder:1 for ( i = 0; i++; i < 10 ) { // Doing something in this loop } Here is example 3 using table control code #t2html::td:bgcolor=#FFEEFF:tableclass:solid #t2html::td:bgcolor=#FFEEFF:tableclass:solid for ( i = 0; i++; i < 10 ) { // Doing something in this loop } Here is example 4 using table control code #t2html::td:bgcolor=#CCFFCC #t2html::td:bgcolor=#CCFFCC for ( i = 0; i++; i < 10 ) { // Doing something in this loop } Here is example 5 using table control code. Due to bug in Opera 7-9.x, this exmaple may now show correctly. Please use Firefox to see the effect. 
#t2html::td:bgcolor=#FFFFFF:tableclass:dashed #t2html::td:bgcolor=#FFFFFF:tableclass:dashed for ( i = 0; i++; i < 10 ) { // Doing something in this loop } Here is example 6 using multiple table control codes. Use underscore sccharacter to separate different table attributes from each other. The underscore will be vconverted into SPACE. The double quotes around the VALUE are not strictly required by HTML standard, but they are expected in XML. #t2html::td:bgcolor="#EAEAEA":table:border=1_border=0_cellpadding="10"_cellspacing="0" #t2html::td:bgcolor="#EAEAEA":table:border=1_border=0_cellpadding="10"_cellspacing="0" for ( i = 0; i++; i < 10 ) { // Doing something in this loop } Here is example 7 using table control code #t2html::td:class=color-navy:table:cellpadding=0 which cancels default grey coloring. The cellpadding must be zeroed, around the text to make room. #t2html::td:class=color-white:table:cellpadding=0 for ( i = 0; i++; i < 10 ) { // Doing something in this loop } Conversion program The perl program t2html turns the raw technical text format into HTML. Among other things it can produce HTML files with an index frame, a main frame, and a master that ties the two together. It has features too numerous to list to control the output. For details see the perldoc than is embeddedinside the program: perl -S t2html --help | more The frame aware html pages are generated by adding the *--html-frame* option. perl-text2html-master/doc/examples/t2html-5.html000066400000000000000000000622001371714776500221110ustar00rootroot00000000000000 Page title is embedded inside text file

    Copyright © 1996-2016 Jari Aalto

    License: This material may be distributed only subject to the terms and conditions set forth in GNU General Public License v2 or later; or, at your option, distributed under the terms of GNU Free Documentation License version 1.2 or later (GNU FDL).

    This is a demonstration text of Perl Text To HTML converter.

    Headings

    The tool provides for two heading levels. Combined with bullets and numbered lists, it ought to be enough for most purposes, unless you really like section 1.2.3.4.5

    You can insert links to headings or other documents. The convention is interior links are made by joining the first four words of the heading with underscores, so they must be unique. A link to a heading below looks like this in the text document and generates the link shown. There also is syntax for automatically inserting a base URL (see the tool documentation).

    The following blue link is generated with markup code: # REF #Markup ;(Markup);

    (Markup)

    Markup

    The markup here is mostly based on column position, meaning mostly no tags. The exceptions are special marks for bullets and for emphasis. See later sections for the effects of column position on the output HTML.

    Text surrounded by = equals = comes out another color
    Text surrounded by backquote/forward quote comes out color `
    Text surrounded by * asterisks * comes out italic text
    Text surrounded by _ underscores _ comes out bold
    The long dash – is signified with two consecutive dashes (-)
    The plus-minus is signified with (+) and (-) markers combined ±4
    Big character "C" in parentheses ( C ) makes a copyright sign (C)
    Registered trade mark sign ® is big character "R" in parentheses ( R )
    Euro sign is small character "e" right after digit: 400 €
    Degree sign is number "0" in parentheses just after number: 5°C
    Superscript is marked with brackets immediately attached to text[see this]
    Special HTML entities can be embedded in a normal way, like: × < > ≤ ≥ ≠ √ − α β γ ƒ ÷ « » - – — ≈ ≡ ‹ › ∑ ∞ ™

    Emacs minor mode

    If you use the advertised Emacs minor mode (tinytf.el) you can easily renumber headings as you revise the text. Text is also colorized as you edit content.

    The editing mode can automatically generate the table of contents and the HTML generator can use it to generate a two frame output with the TOC in the left frame as hotlinks to the sections and subsections. Visit http://freecode.com/projects/emacs-tiny-tools

    Bullets, lists, and links

    This is ordinary text.

    • This is a bullet paragraph with a continuation mark (leading comma) in the last line. It will not work if the comma is on the same line as the bullet.

      This is a continued bullet paragraph. You use a leading comma in the last line of the previous block to make a continued item. This is ok except the paragraph fill code (for the previous paragraph) cannot deal with it. Maybe it is a hint not to do continued bullets, or a hint not to put the comma in until you are done formatting.

    • The next bullet. the sldjf sldjf sldkjf slkdjf sldkjf lsdkjf slkdjf sldkjf sldkjf lskdj flskdjf lskdjf lsdkjf.
    1. This is a numbered list, made with a '.' in column 8 of its first line and text in column 12. You may not have blank lines between the items.
    2. Clickable email gork@ork.com.
    3. Non-clickable email gork@ork.com.
    4. Clickable link: http://this.com
    5. Non-clickable link: http://this.com.
    6. Clickable file: file:/home/gork/x.txt.

    Line breaking

    Ordinary text with leading dot(.) forces line breaks in the HTML. Here is a line with forced break.
    Here is another line that contains dot-code at the beginning.

    Specials

    You can use superscripts[1], multiple[(2)] and almost any[(ab)] and imaginable[IV superscripts]


    Samples per column (heading level h1)

    These samples show the range of effects produced by writing text beginning in different columns. The column numbers referred to are columns in the source text, not (obviously) the output. The column numbering starts from 0, not from 1. Column 1 is undefined and nothing special. Column 2 is undefined and nothing special.

    Column 3, plain text, with color

    Column 4, Next heading level (h2)

    Column 5, plain text, with color

    Column 6, This is used for long quotations. The text uses the Georgia font, which is designed for the web but is equally good for laser printing.

    Column 7, bold, italic

    Column 7, start and end with double quote. Use for inner TOPICS

    Column 8, standard text strong emphasized

    Column 9, font weight bold, not italic.

    Column 10, quotation text, italic serif. This text has been made a little smaller and more condensed than the rest of the text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text.

    Column 11, another color, for questions, exercise texts etc.

          Note: It is possible to say something important at
          column 12, which is normally reserved for CODE.
          You must supply options --css-code-bg and
          --css-code-note=Note:

    Here is the code column 12:

          Note: Here is something important to tell you about this code
          This part of the text in the first paragraph is rendered differently,
          because it started with magic word _Note:_ The rest of the
          paragraphs are rendered as CODE.
    
          /* Column 12 code */
          /* 10pt courier navy */
          // col 12 and beyond stay as is to preserve code formatting.
    
          for( i=0 ; i < 10 ; i++ )
          {
              more();
              whatever();
          }

    Another level 2 heading (column 4)

    Here is more ordinary text.


    Table rendering examples

    These examples make sense only if the options --css-code-bg (use gray background for column 12) and --css-code-note=Note: have been turned on. In order to take full advantage of all the possibilities, you should familiarize yourself with the HTML 4.01 specification and peek at the CSS code in the generated HTML: the tableclass can take an attribute of the embedded default styles.

          Note: This is example 1 and the `--css-code-note' option
          reads 'First word' in paragraph at column 12 and
          renders it differently. You can attach code right after
          this note, which must occupy only one paragraph.
    
          --css-code-note=REGEXP      Regexp matches 'First word'
          --css-code-bg

    Here is example 2 using table control code

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 3 using table control code

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 4 using table control code

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 5 using table control code. Due to a bug in Opera 7-9.x, this example may not show correctly. Please use Firefox to see the effect.

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 6 using multiple table control codes. Use the underscore character to separate different table attributes from each other. The underscore will be converted into SPACE. The double quotes around the VALUE are not strictly required by the HTML standard, but they are expected in XML.

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Here is example 7 using a table control code which cancels the default grey coloring. The cellpadding around the text must be zeroed to make room.

    
          for ( i = 0; i < 10; i++ )
          {
              //  Doing something in this loop
          }

    Conversion program

    The Perl program t2html turns the raw technical text format into HTML. Among other things it can produce HTML files with an index frame, a main frame, and a master that ties the two together. It has features too numerous to list to control the output. For details see the perldoc that is embedded inside the program:

    
    
    

    The frame-aware HTML pages are generated by adding the --html-frame option.


    Copyright © 2019 by Jari Aalto. This material may be distributed subject to the terms and conditions set forth in the Creative Commons Attribution-ShareAlike License. See http://creativecommons.org/
    Document author: Jari Aalto
    Last updated: 2019-05-05 19:03
    perl-text2html-master/doc/examples/t2html-5.txt000066400000000000000000000246441371714776500217760ustar00rootroot00000000000000t2html Test Page #T2HTML-TITLE Page title is embedded inside text file #t2HTML-EMAIL author@examle.com #T2HTML-AUTHOR John Doe #T2HTML-METAKEYWORDS test, html, example #T2HTML-METADESCRIPTION This is test page of program t2html Copyright (C) 1996-2020 Jari Aalto License: This material may be distributed only subject to the terms and conditions set forth in GNU General Public License v2 or later; or, at your option, distributed under the terms of GNU Free Documentation License version 1.2 or later (GNU FDL). This is a demonstration text of Perl Text To HTML converter. Headings The tool provides for two heading levels. Combined with bullets and numbered lists, it ought to be enough for most purposes, unless you really like section 1.2.3.4.5 You can insert links to headings or other documents. The convention is interior links are made by joining the first four words of the heading with underscores, so they must be unique. A link to a heading below looks like this in the text document and generates the link shown. There also is syntax for automatically inserting a base URL (see the tool documentation). The following blue link is generated with markup code: # REF #Markup ;(Markup); #REF #Markup ;(Markup); Markup The markup here is mostly based on column position, meaning mostly no tags. The exceptions are special marks for bullets and for emphasis. See later sections for the effects of column position on the output HTML. 
.Text surrounded by = equals = comes out =another= =color= .Text surrounded by backquote/forward quote comes out `color' ` .Text surrounded by * asterisks * comes out *italic* *text* .Text surrounded by _ underscores _ comes out _bold_ .The long dash -- is signified with two consequent dashes (-) .The plus-minus is signified with (+) and (-) markers combined +-4 .Big character "C" in parentheses ( C ) make a copyright sign (C) .Registered trade mark sign (R) is big character "R" in parentheses ( R ) .Euro sign is small character "e" right after digit: 400e .Degree sign is number "0" in parentheses just after number: 5(0)C .Superscript is maerked with bracket immediately attached to text[see this] .Special HTML entities can embedded in a normal way, like: × < > ≤ ≥ ≠ √ − α β γ ƒ ÷ « » - – — ≈ ≡ ‹ › ∑ ∞ ™ Emacs minor mode If you use the advertised Emacs minor mode (tinytf.el) you can easily renumber headings as you revise the text. Test is also colorized as you edit content. The editing mode can automatically generate the table of contents and the HTML generator can use it to generate a two frame output with the TOC in the left frame as hotlinks to the sections and subsections. Visit http://freecode.com/projects/emacs-tiny-tools Bullets, lists, and links This is ordinary text. o This is a bullet paragraph with a continuation mark (leading comma) in the last line. It will not work if the ,comma is on the same line as the bullet. This is a continued bullet paragraph. You use a leading comma in the last line of the previous block to make a continued item. This is ok except the paragraph fill code (for the previous paragraph) cannot deal with it. Maybe it is a hint not to do continued bullets, or a hint not to put the comma in until you are done formatting. o The next bullet. the sldjf sldjf sldkjf slkdjf sldkjf lsdkjf slkdjf sldkjf sldkjf lskdj flskdjf lskdjf lsdkjf. . This is a numbered list, made with a '.' in column 8 of its first line and text in column 12. 
You may not have blank lines between the items. . Clickable email . . Non-clickable email gork@ork.com. . Clickable link: http://this.com . Non-clickable link: -http://this.com. . Clickable file: file:/home/gork/x.txt. Line breaking Ordinary text with leading dot(.) forces line breaks in the HTML. .Here is a line with forced break. .Here is another line thatcontains dot-code at the beginning. Specials You can use superscripts[1], multiple[(2)] and almost any[(ab)] and imaginable[IV superscripts] Samples per column (heading level h1) These samples show the range of effects produced by writing text beginning in different columns. The column numbers referred to are columns in the source text, not (obviously) the output. The column numbering is counted starting from 0, _not_ _number_ _1_. Column 1, is undefined and nothing special. Column 2, is undefined and nothing special. Column 3, plain text, with color Column 4, Next heading level (h2) Column 5, plain text, with color Column 6, This i used for long quotations. The text uses Georgia font, which is designed for web, but which is equally good for laser printing font. Column 7, bold, italic "Column 7, start and end with double quote. Use for inner TOPICS" Column 8, standard text _strong_ *emphasized* Column 9, font weight bold, not italic. Column 10, quotation text, italic serif. This text has been made a little smaller and condensed than the rest of the text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. More quotation text. Column 11, another color, for questions, exercise texts etc. Note: It is possible to say something important at column 12, which is normally reserved for CODE. 
            You must supply options --css-code-bg and --css-code-note=Note:

        Here is the code column 12:

            Note: Here is something important to tell you about this code.
            This part of the text in first paragraph is rendered
            differently, because it started with magic word _Note:_
            The rest of the paragraphs are rendered as CODE.

            /* Column 12 code */
            /* 10pt courier navy */
            // col 12 and beyond stay as is to preserve code formatting.
            for( i=0 ; i < 10 ; i++ )
            {
                more();
                whatever();
            }

    Another level 2 heading (column 4)

        Here is more ordinary text.

Table rendering examples

        These examples make sense only if the options *--css-code-bg* (use
        gray background for column 12) and *--css-code-note=Note:* have
        been turned on. In order to take full advantage of all the
        possibilities, you should familiarize yourself with the HTML 4.01
        specification and peek at the CSS code in the generated HTML: the
        *tableclass* can take an attribute of the embedded default styles.

            Note: This is example 1 and the `--css-code-note' option reads
            'First word' in paragraph at column 12 and renders it
            differently. You can attach code right after this note,
            which must occupy only one paragraph

            --css-code-note=REGEXP  Regexp matches 'First word'
            --css-code-bg

        Here is example 2 using table control code #t2html::tableborder:1

            #t2html::tableborder:1

            for ( i = 0; i < 10; i++ )
            {
                // Doing something in this loop
            }

        Here is example 3 using table control code
        #t2html::td:bgcolor=#FFEEFF:tableclass:solid

            #t2html::td:bgcolor=#FFEEFF:tableclass:solid

            for ( i = 0; i < 10; i++ )
            {
                // Doing something in this loop
            }

        Here is example 4 using table control code
        #t2html::td:bgcolor=#CCFFCC

            #t2html::td:bgcolor=#CCFFCC

            for ( i = 0; i < 10; i++ )
            {
                // Doing something in this loop
            }

        Here is example 5 using table control code. Due to a bug in Opera
        7-9.x, this example may not show correctly. Please use Firefox to
        see the effect.
        #t2html::td:bgcolor=#FFFFFF:tableclass:dashed

            #t2html::td:bgcolor=#FFFFFF:tableclass:dashed

            for ( i = 0; i < 10; i++ )
            {
                // Doing something in this loop
            }

        Here is example 6 using multiple table control codes. Use the
        underscore character to separate different table attributes from
        each other. The underscore will be converted into SPACE. The
        double quotes around the VALUE are not strictly required by the
        HTML standard, but they are expected in XML.
        #t2html::td:bgcolor="#EAEAEA":table:border=1_border=0_cellpadding="10"_cellspacing="0"

            #t2html::td:bgcolor="#EAEAEA":table:border=1_border=0_cellpadding="10"_cellspacing="0"

            for ( i = 0; i < 10; i++ )
            {
                // Doing something in this loop
            }

        Here is example 7 using table control code
        #t2html::td:class=color-navy:table:cellpadding=0 which cancels
        default grey coloring. The cellpadding around the text must be
        zeroed to make room.

            #t2html::td:class=color-white:table:cellpadding=0

            for ( i = 0; i < 10; i++ )
            {
                // Doing something in this loop
            }

Conversion program

        The perl program t2html turns the raw technical text format into
        HTML. Among other things it can produce HTML files with an index
        frame, a main frame, and a master that ties the two together. It
        has features too numerous to list to control the output. For
        details see the perldoc that is embedded inside the program:

            perl -S t2html --help | more

        The frame aware html pages are generated by adding the
        *--html-frame* option.

perl-text2html-master/doc/index.html000066400000000000000000000212501371714776500200260ustar00rootroot00000000000000 Perl Text to HTML Conversion Project


    Perl Text to HTML Project

    Links and download

    Note: the most up-to-date version is in version control. See the development page for instructions on how to get the latest source code.

    Other pages

    Description

    The t2html is a spicy little tool that can help turn simple text files into nice-looking documents. In fact, many of the pages on this site have been built using it. Please see the conversion page link above to learn more about the utility. After you have installed the program, invoke it with the --test-page option to see some examples. Follow the links below to see the original page and how it was rendered according to the passed options. Note: you can write any CSS file to change the layout of the HTML. The examples below only show some of the defaults that the program uses. One demonstration of custom CSS can be seen at Procmail Documentation Project.

    
        Run cmd       : t2html --css-font-normal t2html-1.txt > t2html-1.html
        Original text : t2html-1.txt
        Generated html: t2html-1.html
    
        Run cmd       : t2html --css-font-readable t2html-2.txt > t2html-2.html
        Original text : t2html-2.txt
        Generated html: t2html-2.html
    
        Run cmd       : t2html --html-frame --css-font-normal t2html-3.txt
        Original text : t2html-3.txt
        Generated html: t2html-3.html
    
    
    The --css-code-* options affect the CODE section layout by placing the text in a TABLE and then rendering the table background with colors.
    
        Run cmd       : t2html.pl --css-code-bg --css-code-note="(?:Notice|Note):" t2html-4.txt > t2html-4.html
        Original text : t2html-4.txt
        Generated html: t2html-4.html
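
    The value passed to --css-code-note in the command above is a regular expression. As a rough illustration only, the sketch below uses Python's `re` module (not t2html's own Perl code) to show which paragraphs the pattern `(?:Notice|Note):` would pick out. The helper name `is_note_paragraph` is hypothetical, and matching at the start of the paragraph is an assumption based on the test page's description that the option reads the 'First word' of a paragraph at column 12.

```python
import re

# Pattern copied from the example command line above.
NOTE_PATTERN = re.compile(r"(?:Notice|Note):")


def is_note_paragraph(paragraph: str) -> bool:
    """Return True if the paragraph's first word matches the note pattern.

    Assumption: the pattern is tried only at the first non-blank
    character of the paragraph, not anywhere inside it.
    """
    return NOTE_PATTERN.match(paragraph.lstrip()) is not None


if __name__ == "__main__":
    samples = [
        "            Note: Here is something important about this code",
        "            Notice: this paragraph also qualifies",
        "            for ( i = 0; i < 10; i++ )",
    ]
    for text in samples:
        print(is_note_paragraph(text), "|", text.strip())
```

    A paragraph that merely mentions "Note:" later in its text would not qualify under this assumption, because only the leading word is checked.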
    
    

    Contact

    There is no mailing list for the project. See the Development page for how to contact the maintainer and submit feature requests and bug reports.


    All files in this project are licensed under the GNU GPL. This project, as well as many other projects, is hosted by Savannah.
    W3 validated.
    perl-text2html-master/doc/license/000077500000000000000000000000001371714776500174535ustar00rootroot00000000000000perl-text2html-master/doc/license/COPYING.GNU-FDL000066400000000000000000000476631371714776500215610ustar00rootroot00000000000000 GNU Free Documentation License Version 1.2, November 2002 Copyright (C) 2000,2001,2002 Free Software Foundation, Inc. 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. 0. PREAMBLE The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others. This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software. We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference. 1. APPLICABILITY AND DEFINITIONS This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. 
Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law. A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language. A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them. The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none. The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words. 
A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not "Transparent" is called "Opaque". Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only. The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text. 
A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowledgements", "Dedications", "Endorsements", or "History".) To "Preserve the Title" of such a section when you modify the Document means that it remains a section "Entitled XYZ" according to this definition. The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License. 2. VERBATIM COPYING You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3. You may also lend copies, under the same conditions stated above, and you may publicly display copies. 3. COPYING IN QUANTITY If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. 
Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects. If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages. If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public. It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document. 4. MODIFICATIONS You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. 
In addition, you must do these things in the Modified Version: A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission. B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement. C. State on the Title page the name of the publisher of the Modified Version, as the publisher. D. Preserve all the copyright notices of the Document. E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices. F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below. G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice. H. Include an unaltered copy of this License. I. Preserve the section Entitled "History", Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence. J. 
Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission. K. For any section Entitled "Acknowledgements" or "Dedications", Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein. L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles. M. Delete any section Entitled "Endorsements". Such a section may not be included in the Modified Version. N. Do not retitle any existing section to be Entitled "Endorsements" or to conflict in title with any Invariant Section. O. Preserve any Warranty Disclaimers. If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles. You may add a section Entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties--for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard. You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. 
Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one. The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version. 5. COMBINING DOCUMENTS You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers. The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work. In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled "History"; likewise combine any sections Entitled "Acknowledgements", and any sections Entitled "Dedications". You must delete all sections Entitled "Endorsements". 6. 
COLLECTIONS OF DOCUMENTS You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects. You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document. 7. AGGREGATION WITH INDEPENDENT WORKS A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document. If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate. 8. TRANSLATION Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. 
You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail. If a section in the Document is Entitled "Acknowledgements", "Dedications", or "History", the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title. 9. TERMINATION You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 10. FUTURE REVISIONS OF THIS LICENSE The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/. Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation. 
ADDENDUM: How to use this License for your documents To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page: Copyright (c) YEAR YOUR NAME. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License". If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the "with...Texts." line with this: with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST. If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation. If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software. perl-text2html-master/doc/license/COPYING.GNU-GPL000066400000000000000000000431031371714776500215570ustar00rootroot00000000000000 GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. 
This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. 
We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. 
You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. 
But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. 
For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. 
Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. 
Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. 
Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. 
Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. , 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. 
If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. perl-text2html-master/doc/license/LICENSE.txt000066400000000000000000000045661371714776500213110ustar00rootroot00000000000000Licensing information Copyright (C) 1996-2020 Jari Aalto This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with program; see file COPYING.GNU-GPL. If not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. Visit for more information This program runs solely with Free Software. It does not rely on any component of non-Free Software. - - Exception: The POD section of file t2html.pl is is DUAL LICENCED and may be distributed under the terms of GNU General Public License (GNU GPL) --see above--; *or*, at your option, distributed under the terms of GNU Free Documentation License (GNU FDL). The end user can continue to distribute the documentation in this dual licence form *or* select the other license (GNU GPL, GNU FDL) and remove the unwanted one. In case of removal, a notice "License removed by " must be mentioned somewhere in the distributed files; that notice is allowed to be outside of the original files. 
Copyright (C) 1996-2020 Jari Aalto Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License" in file COPYING.GNU-FDL. Visit for more information. End perl-text2html-master/doc/manual/000077500000000000000000000000001371714776500173065ustar00rootroot00000000000000perl-text2html-master/doc/manual/index.html000066400000000000000000001721101371714776500213050ustar00rootroot00000000000000

    NAME

    t2html - Simple text to HTML converter. Relies on text indentation rules.

    SYNOPSIS

        t2html [options] file.txt > file.html

    DESCRIPTION

    Convert pure text files into nice looking, possibly framed, HTML pages. An example of conversion:

      1. Plain text source code
      http://pm-doc.git.sourceforge.net/git/gitweb.cgi?p=pm-doc/pm-doc;a=blob_plain;f=doc/index.txt;hb=HEAD
    
      2. Result of conversion with custom --css-file option:
      http://pm-doc.sourceforge.net/pm-tips.html
      http://pm-doc.sourceforge.net/pm-tips.css
    
      3. An Emacs mode tinytf.el for writing the text files (optional)
      https://github.com/jaalto/project--emacs-tiny-tools

    Requirements for the input ASCII files

    The file must be written in Technical Format, whose layout is described in this manual. The idea is simple and there are only two heading levels: one at column 0 and the other at column 4 (half the tab width). Standard text starts at column 8 (the position reached by pressing the Tab key once).

    The idea of Technical Format is that each column represents a different rendering layout in the generated HTML. No special markup is needed in the text file, so you can use the text version as the master copy of a FAQ etc. Bullets, numbered lists, word emphasis, quotations etc. can be expressed in a natural way.
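
    The column rules can be illustrated with a minimal Perl sketch. The helper name classify_line and the returned labels are hypothetical and are not taken from t2html.pl; the sketch only assumes the columns named in this manual (0 and 4 for headings, 8 for standard text) and expands tabs to 8-column stops.

```perl
use strict;
use warnings;
use Text::Tabs qw(expand);   # core module; expands tabs to 8-column stops

# Hypothetical helper (not from t2html.pl): name the role of a line
# from the column where its text starts.
sub classify_line {
    my ($line) = @_;
    my ($text) = expand($line);
    return 'blank' if $text =~ /^\s*$/;

    my ($indent) = $text =~ /^( *)/;
    my $col = length $indent;

    return 'heading level 1' if $col == 0;
    return 'heading level 2' if $col == 4;
    return 'standard text'   if $col == 8;
    return "column $col";    # other columns have their own rendering
}

for my $line ("Heading", "    Subheading", "\tBody text") {
    printf "%-20s => %s\n", $line, classify_line($line);
}
```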

    HTML description

    The generated HTML includes an embedded Cascading Style Sheet 2 (CSS2) and a small piece of JavaScript code. The CSS2 is used to colorize the page layout and to define suitable printing font sizes. The generated HTML also tries to be compatible with XHTML. See page http://www.w3.org/TR/xhtml1/#guidelines where the backward compatibility recommendations are outlined:

        Legal HTML          XHTML requires
        <P>                 <p> ..</p>
        <BR>                <br></br>
        <HR>                <hr></hr>

    XHTML does not support fragment identifiers like #foo with the name attribute, but uses id instead. For backward compatibility both attributes are defined:

        < ..name="tag">     Is now <.. name="tag" id="tag">

    NOTE: This program was never designed to produce XHTML, so strict XHTML validity should not be expected.

    Motivation

    The easiest format for writing large documents, like FAQs, is text. A text file offers WYSIWYG editing and it can be turned easily into HTML format. Text files are easily maintained and there are no requirements for special text editors. Any text editor like Notepad, vi or Emacs can be used to maintain the documents.

    Text files are also the only sensible format if documents are kept under version control like RCS, CVS, SVN, Arch, Perforce or ClearCase. They can be easily compared with diff, and patches to them can be easily sent and received.

    To help maintain large documents, there is also an Emacs minor mode, a package called tinytf.el, which offers text fontification with colors, indentation control, bullet filling, heading renumbering, word markup, syntax highlighting etc. See https://github.com/jaalto/project--emacs-tiny-tools

    OPTIONS

    --as-is

    Any extra HTML formatting or text manipulation is suppressed. Text is preserved as it appears in the file. Use this option if you plan to deliver or print the text as seen.

        o  If the file contains "Table of Contents", it is not removed
        o  The Table of Contents block is not created (it usually would be)

    --author -a STR

    Author of document e.g. --author "John Doe"

    --disclaimer-file FILE

    The text that appears at the footer is read from this file. If not given the default copyright text is added. Options --quiet and --simple suppress disclaimers.

    --document FILE

    Name of the document or filename. You could list all alternative URLs to the document with this option.

    --email -e EMAIL

    The contact address of the author of the document. Must be a plain email address with no "<" and ">" characters included. E.g. --email foo@example.com

        --email "<me@here.com>"     WRONG
        --email "me@here.com"       right
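
    A wrapper script could check the address before calling the program. The following is a minimal sketch; the function name is_plain_email is hypothetical and the pattern is a deliberately loose assumption, not the validation that t2html itself performs.

```perl
use strict;
use warnings;

# Hypothetical check: accept only a bare address without "<" or ">".
sub is_plain_email {
    my ($email) = @_;
    return 0 if $email =~ /[<>]/;                        # angle brackets are not allowed
    return $email =~ /\A[^\s\@]+\@[^\s\@]+\z/ ? 1 : 0;   # loose "local@domain" shape
}

for my $email ('<me@here.com>', 'me@here.com') {
    printf "%-16s %s\n", $email, is_plain_email($email) ? 'right' : 'WRONG';
}
```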

    --simple -s

    Print minimum footer only: contact, email and date. Use --quiet to completely discard footer.

    --t2html-tags

    Allow processing of embedded #T2HTML-<tag> directives inside the file. See the full explanation in topic EMBEDDED DIRECTIVES INSIDE TEXT. You do not need to supply this option; it is "on" by default.

    To disregard embedded directives in the text file, supply the "no" form of the option: --not2html-tags.

    --title STR -t STR

    The title text that appears in top frame of browser.

    --url URL

    Location of the HTML file. When --document gave the name, this gives the location. This information is printed at the Footer.

    Html: Navigation urls

    --base URL

    URL location of the HTML file on the destination site where it will be made available. This option is needed only if the document is hosted on an FTP server (rare, but possible). An FTP server based document cannot use Table of Contents links (fragment #tag identifiers) unless the HTML tag BASE is also defined.

    The argument can be a full URL to the document:

        --base ftp://ftp.example.com/file.html
        --base ftp://ftp.example.com/

    --button-heading-top

    Add additional [toc] navigation button to the end of each heading. This may be useful in long non-framed HTML files.

    --button-top URL

    Buttons are placed at the top of the document in the order [previous][top][next], and the --button-* options define the URLs.

    If URL is the string none, no button is inserted. This may be handy if the buttons are defined by a separate program. An example using Perl:

        #!/usr/bin/perl
    
        my $top   = "index.html";             # set defaults
        my $prev  = "none";
        my $next  = "none";
    
        # ... somewhere $prev or $next may get set, or then not
    
        qx(t2html --button-top "$top" --button-prev "$prev" --button-next "$next" ...);
    
        # End of sample program

    --button-prev URL

    URL to go to previous document or string none.

    --button-next URL

    URL to go to next document or string none.

    --reference tag=value

    You can add any custom references (tags) inside the text and have them expanded to any value. This option can be given multiple times, and every occurrence of TAG is replaced with VALUE. E.g. when given the following options:

        --reference "#HOME-URL=http://www.example.com/dir"
        --reference "#ARCHIVE-URL=http://www.example.com/dir/dir2"

    When the tags are referenced in the text, the generated HTML contains their expanded values. An example text:

            The homepage is #HOME-URL/page.html and the mirror page is at
            #ARCHIVE-URL/page.html where you can find the latest version.
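
    The effect of the expansion can be sketched in Perl as follows. The function names parse_reference and expand_references are hypothetical; the sketch assumes a plain textual replacement and the default separator "=", and it is not the code used by t2html.pl.

```perl
use strict;
use warnings;

# Split one "TAG=VALUE" specification at the first separator.
sub parse_reference {
    my ($spec, $separator) = @_;
    $separator = '=' unless defined $separator;
    my ($tag, $value) = split /\Q$separator\E/, $spec, 2;
    return ($tag, $value);
}

# Replace every occurrence of each TAG with its VALUE.
sub expand_references {
    my ($text, %value_of) = @_;
    # Longest tags first, so a tag that is a prefix of another does not win.
    for my $tag (sort { length($b) <=> length($a) } keys %value_of) {
        $text =~ s/\Q$tag\E/$value_of{$tag}/g;
    }
    return $text;
}

my %value_of;
for my $spec ('#HOME-URL=http://www.example.com/dir',
              '#ARCHIVE-URL=http://www.example.com/dir/dir2') {
    my ($tag, $value) = parse_reference($spec);
    $value_of{$tag} = $value;
}

print expand_references("The homepage is #HOME-URL/page.html\n", %value_of);
```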

    -R, --reference-separator STRING

    See above. String that is used to split the TAG and VALUE. Default is equal sign "=".

    -T, --toc-url-print

    Display URLs (constructed from headings) that build up the Table of Contents (NAME AHREF tags) in a document. The list is written to stderr, so that it can be separated:

        % t2html --toc-url-print tmp.txt > file.html 2> toc-list.txt

    Where would you need this? If you want to know the fragment identifiers for your file, you need the list of names.

      http://www.example.com/myfile.html#fragment-identifier

    Html: Controlling CSS generation (HTML tables)

    --css-code-bg

    This option affects how the code section (column 12) is rendered. Normally the section is surrounded with <pre>..</pre> tags, but with this option something fancier is used. The code is wrapped inside a <table>...</table> and the background color is set to a shade of gray.

    --css-code-note "REGEXP"

    Option --css-code-bg is required to activate this option. A special word defined using regexp (default is 'Note:') will mark code sections specially. The first word is matched against the supplied Perl regexp.

    The supplied regexp must not, repeat, must not, include any capturing group operators. This simply means that grouping parentheses like (one|two|three) are not allowed. You must use the Perl non-capturing ones like (?:one|two|three). Please refer to the Perl manual page [perlre] if this short introduction did not give enough rope.

    With this option, instead of rendering column 12 text with <pre>..</pre>, the text appears just like regular text, but with a twist. The background color of the text is changed to a darker grey to make it stand out visually from the rest of the text.

    An example will clarify. Suppose that you passed options --css-code-bg and --css-code-note='(?:Notice|Note):', which instruct the program to treat the first paragraph at column 12 differently. Like this:
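
    Before passing a regexp to --css-code-note, it can be checked for capturing groups. The sketch below is only an illustration: the function names are hypothetical, and matching the whole first word against the regexp is an assumption about how the program applies it.

```perl
use strict;
use warnings;

# Count capturing groups. The trailing empty alternative guarantees a
# successful match, after which $#+ holds the number of groups.
sub count_capture_groups {
    my ($regexp) = @_;
    return unless '' =~ /(?:$regexp)|/;
    return $#+;
}

# Assumption: the first word of the paragraph must match the regexp.
sub is_note_paragraph {
    my ($paragraph, $regexp) = @_;
    my ($first_word) = $paragraph =~ /^\s*(\S+)/;
    return 0 unless defined $first_word;
    return $first_word =~ /\A(?:$regexp)\z/ ? 1 : 0;
}

my $regexp = '(?:Notice|Note):';
printf "capturing groups:  %d\n", count_capture_groups($regexp);
printf "special paragraph: %d\n",
    is_note_paragraph("    Notice: Here is the special section", $regexp);
```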

        This is the regular text that appears somewhere at column 8.
        It may contain several lines of text in this paragraph.
    
            Notice: Here is the special section, at column 12,
            and the first word in this paragraph is 'Notice:'.
            Only that makes this paragraph at column 12 special.
    
        Now, we have some code to show to the user:
    
            for ( i = 0; i < 10; i++ )
            {
                //  Doing something in this loop
            }

    One note: text written with the initial special word, like Notice:, must all fit in one paragraph. Any other paragraphs that follow are rendered as code sections. Like here:

        This is the regular text that appears somewhere
        It may contain several lines of text in this paragraph
    
            Notice: Here is the special section, at column 12,
            and the first word in this paragraph is 'Notice:'
            which makes it special
    
            However, this paragraph IS NOT rendered specially
            any more. Only the first paragraph above.
    
            for ( i = 0; i < 10; i++ )
            {
                //  Doing something in this loop
            }

    As if this were not enough, there are some special table control directives that let you control the <table>..</table> which is put around the code section at column 12. Here are a few examples:

        Here is example 1
    
            #t2html::td:bgcolor=#F7F7DE
    
            for ( i = 0; i < 10; i++ )
            {
                //  Doing something in this loop
            }
    
        Here is example 2
    
            #t2html::td:bgcolor=#F7F7DE:tableborder:1
    
            for ( i = 0; i < 10; i++ )
            {
                //  Doing something in this loop
            }
    
        Here is example 3
    
            #t2html::td:bgcolor="#FFFFFF":tableclass:dashed
    
            for ( i = 0; i < 10; i++ )
            {
                //  Doing something in this loop
            }
    
        Here is example 4
    
            #t2html::td:bgcolor="#FFFFFF":table:border=1_width=94%_border=0_cellpadding="10"_cellspacing="0"
    
            for ( i = 0; i < 10; i++ )
            {
                //  Doing something in this loop
            }

    Looks cryptic? That cannot be helped. To completely understand what these directives do, you need to understand what attributes can be added to the <table> and <td> elements. Refer to the HTML specification for available attributes. Here is a brief summary of what you can do:

    The start command is:

        #t2html::
                |
                After this comes attribute pairs in form key:value
                and multiple ones as key1:value1:key2:value2 ...

    The key:value pairs can be:

        td:ATTRIBUTES
           |
           This is converted into <td attributes>
    
        table:ATTRIBUTES
              |
              This is converted into <table attributes>

    There can be no spaces in the ATTRIBUTES, because the first word must be one contiguous word. An underscore can be used in place of a space:

        table:border=1_width=94%
              |
              Interpreted as <table border="1" width="94%">

    It is also possible to change the default CLASS style with the word tableclass. For the CLASS to be useful, its CSS definitions must be either in the default configuration or supplied from an external file. See option --script-file.

        tableclass:name
                   |
                   Interpreted as <table class="name">
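
    The directive syntax described above can be sketched in Perl. The names parse_directive and format_attributes are hypothetical, and the sketch only follows the rules stated in this manual (key:value pairs separated by colons, underscores standing in for spaces); the real parser in t2html.pl may behave differently in corner cases.

```perl
use strict;
use warnings;

# Split "#t2html::key1:value1:key2:value2" into a list of key/value pairs.
sub parse_directive {
    my ($line) = @_;
    my ($body) = $line =~ /^\s*#t2html::(\S+)/ or return;
    my @fields = split /:/, $body;
    push @fields, '' if @fields % 2;     # tolerate a key without a value
    return @fields;
}

# Turn "border=1_width=94%" into 'border="1" width="94%"'.
sub format_attributes {
    my ($value) = @_;
    my @attributes;
    for my $item (split /_/, $value) {
        my ($name, $val) = split /=/, $item, 2;
        $val = '' unless defined $val;
        $val =~ s/\A"(.*)"\z/$1/;        # drop quotes that are already present
        push @attributes, qq($name="$val");
    }
    return join ' ', @attributes;
}

my %pair = parse_directive('#t2html::td:bgcolor="#FFFFFF":table:border=1_width=94%');
printf "<table %s>\n", format_attributes($pair{table});
printf "<td %s>\n",    format_attributes($pair{td});
```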

    For example, there are a couple of default styles that can be used:

        1) Here is CLASS "dashed" example
    
            #t2html::tableclass:dashed
    
                for ( i = 0; i < 10; i++ )
                {
                    //  Doing something in this loop
                }
    
        2) Here is CLASS "solid" example:
    
            #t2html::tableclass:solid
    
                for ( i = 0; i < 10; i++ )
                {
                    //  Doing something in this loop
                }

    You can change any individual value of the default table definition which is:

        <table  class="shade-note">

    To change e.g. only value cellpadding, you would say:

         #t2html::table:tablecellpadding:2

    If you are unsure what all of these were about, simply run program with --test-page and look at the source and generated HTML files. That should offer more rope to experiment with.

    --css-file FILE

    Include <LINK ...> which refers to external CSS style definition source. This option is ignored if --script-file option has been given, because that option imports whole content inside HEAD tag. This option can appear multiple times and the external CSS files are added in listed order.

    --css-font-type CSS-DEFINITION

    Set the BODY element's font definition to CSS-DEFINITION. The default value used is the regular typeset used in newspapers and books:

        --css-font-type='font-family: "Times New Roman", serif;'

    --css-font-size CSS-DEFINITION

    Set the body element's font size to CSS-DEFINITION. The default font size is expressed in points:

        --css-font-size="font-size: 12pt;"

    Html: Controlling the body of document

    --delete REGEXP

    Delete lines matching perl REGEXP. This is useful if you use some document tool that uses navigation tags in the text file that you do not want to show up in generated HTML.

    --delete-email-headers

    Delete email headers at the beginning of file, until first empty line that starts the body. If you keep your document ready for Usenet news posting, they may contain headers and body:

        From: ...
        Newsgroups: ...
        X-Sender-Info:
        Summary:
    
        BODY-OF-TEXT
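
    The header removal can be sketched in Perl as below. The function name strip_email_headers is hypothetical, and the test for a leading "Name: value" line is an assumption added so that files without headers pass through unchanged.

```perl
use strict;
use warnings;

# Drop everything up to and including the first empty line, but only
# when the file starts with something that looks like a header line.
sub strip_email_headers {
    my (@lines) = @_;
    return @lines unless @lines && $lines[0] =~ /^[A-Za-z][\w-]*:/;

    while (@lines) {
        my $line = shift @lines;
        last if $line =~ /^\s*$/;    # the empty line ends the header block
    }
    return @lines;
}

my @document = ("From: ...\n", "Newsgroups: ...\n", "Summary:\n", "\n", "BODY-OF-TEXT\n");
print strip_email_headers(@document);
```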

    --nodelete-default

    Use this option to suppress default text deletion (which is on).

    The Emacs folding.el package and vi can be used with any text or programming language to place sections of text between the tags {{{ and }}}. You can open or close such folds. This makes it quite easy to keep big documents in order and manageable. For Emacs support, see ftp://ftp.csd.uu.se/pub/users/andersl/beta/

    The default value deletes these markers and the special comments #_comment, which make it possible to include your own notes that are left out of the generated output.

      {{{ Security section
    
      #_comment Make sure you revise this section to
      #_comment the next release
    
      Security is an important issue in everyday administration...
      More text ...
    
      }}}
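
    The default deletion can be pictured with the following Perl sketch. The function name delete_default is hypothetical, and the sketch assumes that a whole line is dropped when it starts with a fold marker or with #_comment; the actual default regexp in t2html.pl is not shown in this manual.

```perl
use strict;
use warnings;

# Assumption: remove whole lines that start with {{{, }}} or #_comment.
sub delete_default {
    my (@lines) = @_;
    return grep { !/^\s*(?:\{\{\{|\}\}\}|#_comment\b)/ } @lines;
}

my @source = (
    "{{{ Security section\n",
    "#_comment Make sure you revise this section to\n",
    "#_comment the next release\n",
    "Security is an important issue in everyday administration...\n",
    "}}}\n",
);
print delete_default(@source);
```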

    --html-body STR

    Additional attributes to add to the HTML tag <BODY>. You could e.g. define the language of the text with --html-body LANG=en, which would generate the HTML tag <BODY LANG="en">. See section "SEE ALSO" for ISO 639.

    --html-column-beg="SPEC HTML-SPEC"

    The default interpretation of columns 1-3 and 5-12 can be changed with the beg and end switches. Columns 0 and 4 cannot be changed because they are reserved for headings. Here are some samples:

        --html-column-beg="7quote <em class='quote7'>"
        --html-column-end="7quote </em>"
    
        --html-column-beg="10    <pre class='column10'>"
        --html-column-end="10    </pre>"
    
        --html-column-beg="quote <span class='word'>"
        --html-column-end="quote </span>"

    Note: You can only give specifications up to column 12. If text is beyond column 12, it is interpreted as if it were at column 12.

    In addition to a column number, the SPEC can also be one of the following strings:

        Spec    equivalent word markup
        ------------------------------
        quote   `'
        bold    _
        emp     *
        small   +
        big     =
        ref     []   like: [Michael] referred to [rfc822]
    
        Other available Specs
        ------------------------------
        7quote      When column 7 starts with double quote.
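
    How a "SPEC HTML-SPEC" argument splits into its two parts can be sketched in Perl. The names parse_column_spec and is_valid_spec are hypothetical; the accepted values are taken from the rules above (columns 1-12 except the heading columns 0 and 4, plus the listed words), and the real option handling may differ.

```perl
use strict;
use warnings;

# The word specs listed in the table above.
my %NAMED_SPEC = map { $_ => 1 } qw(quote bold emp small big ref 7quote);

# Split "SPEC HTML-SPEC" at the first run of whitespace.
sub parse_column_spec {
    my ($argument) = @_;
    my ($spec, $html) = $argument =~ /\A\s*(\S+)\s+(.+?)\s*\z/ or return;
    return ($spec, $html);
}

# Columns 0 and 4 are reserved for headings; 12 is the highest column.
sub is_valid_spec {
    my ($spec) = @_;
    return 1 if $NAMED_SPEC{$spec};
    return 0 unless $spec =~ /\A\d+\z/;
    return 0 if $spec == 0 || $spec == 4;
    return $spec <= 12 ? 1 : 0;
}

my ($spec, $html) = parse_column_spec("7quote <em class='quote7'>");
printf "spec=%s valid=%d html=%s\n", $spec, is_valid_spec($spec), $html;
```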

    For style sheet values for each color, refer to class attribute and use --script-file option to import definitions. Usually /usr/lib/X11/rgb.txt lists possible color values and the HTML standard at http://www.w3.org/ defines following standard named colors:

        Black       #000000  Maroon  #800000
        Green       #008000  Navy    #000080
        Silver      #C0C0C0  Red     #FF0000
        Lime        #00FF00  Blue    #0000FF
        Gray        #808080  Purple  #800080
        Olive       #808000  Teal    #008080
        White       #FFFFFF  Fuchsia #FF00FF
        Yellow      #FFFF00  Aqua    #00FFFF

    --html-column-end="COL HTML-SPEC"

    See --html-column-beg

    --html-font SIZE

    Define FONT SIZE. It might be useful to set bigger font size for presentations.

    -F, --html-frame [FRAME-PARAMS]

    If given, then three separate HTML files are generated. The left frame will contain TOC and right frame contains rest of the text. The FRAME-PARAMS can be any valid parameters for HTML tag FRAMESET. The default is cols="25%,75%".

    Using this implies --out option automatically, because three files cannot be printed to stdout.

        file.html
    
        --> file.html       The Frame file, point browser here
            file-toc.html   Left frame (navigation)
            file-body.html  Right frame (content)
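
    The naming of the three files can be reproduced with a short Perl sketch, e.g. in a build script that needs to know which files to upload. The function name frame_file_names is hypothetical; only the -toc and -body suffixes come from the listing above.

```perl
use strict;
use warnings;

# Derive the frame, navigation and content file names from the output name.
sub frame_file_names {
    my ($file) = @_;
    (my $base = $file) =~ s/\.html?$//i;    # strip a trailing .html or .htm
    return ($file, "$base-toc.html", "$base-body.html");
}

print join("\n", frame_file_names('file.html')), "\n";
```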

    --language ID

    Use language ID, a two character ISO identifier like "en" for English during the generation of HTML. This only affects the text that is shown to end-user, like text "Table Of contents". The default setting is "en". See section "SEE ALSO" for standards ISO 639 and ISO 3166 for proper codes.

    The selected language changes the program's internal arrays in two ways: 1) Instead of the default "Table of contents" heading, the national language equivalent will be used. 2) The text "Pic" below embedded, sequentially numbered pictures will use the national equivalent.

    If your language is not supported, please send the phrase for "Table of contents" and the word "Pic" in your language to the maintainer.

    --script-file FILE

    Include JavaScript code from FILE. The code must be a complete <script...></script> element. It is put inside the <head> of each HTML file.

    The --script-file is a general way to import anything into the HEAD element. Eg. If you want to keep separate style definitions for all, you could only import a pointer to a style sheet. See 14.3.2 Specifying external style sheets in HTML 4.0 standard.

    --meta-keywords STR

    Meta keywords. Used by search engines. Separate keywords with commas, like "AA, BB, CC". Refer to HTML 4.01 specification and topic "7.4.4 Meta data" and see http://www.htmlhelp.com/reference/wilbur/. An example:

        --meta-keywords "AA,BB,CC"
    --meta-description STR

    Meta description. Include description string, max 1000 characters. This is used by search engines. Refer to HTML 4.01 specification and topic "7.4.4 Meta data"

    --name-uniq

    First 1-4 words from the heading are used for the HTML name tags. However, it is possible that two same headings start with exactly the same 1-4 words. In those cases you have to turn on this option. It will use counter 00 - 999 instead of words from headings to construct HTML name references.

    Please use this option only in emergencies, because referring to jump block name via

        http://example.com/doc.html#header_name

    is more convenient than using obscure reference

        http://example.com/doc.html#11

    In addition, each time you add a new heading the number changes, whereas the symbolic name picked from the heading stays as long as you do not change the heading. Think about the welfare of your netizens who bookmark your pages. Try to give your headings distinct subjects and you will not need this option.
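
    The naming rule above can be sketched in a few lines. This is only an illustration in Python, not the program's own Perl code: the function names are hypothetical, and it assumes that exactly the first four words are joined with underscores and nothing else is changed. It can be used to spot headings that would collide before deciding whether --name-uniq is needed.

```python
from collections import Counter


def heading_anchor(heading, max_words=4):
    """Join the first max_words words of a heading with underscores."""
    return "_".join(heading.split()[:max_words])


def duplicate_anchors(headings):
    """Return the anchor names that more than one heading would produce."""
    counts = Counter(heading_anchor(h) for h in headings)
    return sorted(name for name, n in counts.items() if n > 1)


headings = [
    "How to install the program on Unix",
    "How to install the program on Windows",
    "Conclusion",
]
# Both "How to install ..." headings share their first four words.
print(duplicate_anchors(headings))
```

    If the printed list is empty, the symbolic names are unique and the option is not needed.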

    Document maintenance and batch job commands

    -A, --auto-detect

    Convert file only if tag #T2HTML- is found from file. This option is handy if you run a batch command to convert all files to HTML, but only if they look like HTML base files:

        find . -name "*.txt" -type f \
             -exec t2html --auto-detect --verbose --out {} \;

    The command searches all *.txt files under current directory and feeds them to conversion program. The --auto-detect only converts files which include #T2HTML- directives. Other text files are not converted.

    --link-check

    Check all http and ftp links. This option is supposed to be run standalone. Option --quiet has special meaning when used with link check.

    With this option you can regularly validate your document and remove dead links or update moved links. Problematic links are outputted to stderr. This link check feature is available only if you have the LWP web library installed. Program will check if you have it at runtime.

    Links that are big, e.g. which match tar.gz .zip ... or that run programs (links with ? character) are ignored, because the GET request used in checking would return the whole content of the link and that would be too expensive.

    A suggestion: When you put binary links to your documents, add them with space:

        http://example.com/dir/dir/ filename.tar.gz

    Then the program checks the http address of the directory. Users may not be able to get the file with one click, but the checker can validate at least the directory. If you are not the owner of the link, it is also possible that the file has moved or a new version name has appeared.
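
    The skip rule can be pictured as a small predicate. The sketch below is a hedged illustration in Python, not the program's implementation: the name should_check is hypothetical and the suffix list contains only the two suffixes that the text names explicitly.

```python
# Only the suffixes mentioned in the text; the real list is longer.
SKIPPED_SUFFIXES = (".tar.gz", ".zip")


def should_check(url):
    """Return True if a link is cheap enough to fetch during a link check."""
    if "?" in url:
        # Links with "?" run programs on the server.
        return False
    return not url.lower().endswith(SKIPPED_SUFFIXES)


# Writing the file name after a space leaves a checkable directory URL.
for url in ("http://example.com/dir/dir/",
            "http://example.com/dir/dir/filename.tar.gz"):
    print(url, should_check(url))
```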

    --link-check-single

    Print condensed output in grep -n like manner FILE:LINE:MESSAGE

    This option concatenates the url response text to single line, so that you can view the messages in one line. You can use programming tools (like Emacs M-x compile) that can parse standard grep syntax to jump to locations in your document to correct the links later.
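
    Any tool that understands the FILE:LINE:MESSAGE layout can consume this output. As a minimal sketch in Python (the function name and the sample message are made up for illustration, since the exact message text depends on the server response), one report line can be taken apart like this:

```python
def parse_report_line(line):
    """Split one FILE:LINE:MESSAGE record into its three parts."""
    # Split only on the first two colons; the message may contain URLs.
    filename, lineno, message = line.rstrip("\n").split(":", 2)
    return filename, int(lineno), message


sample = "file.txt:12:http://example.com/missing 404 Not Found"
print(parse_report_line(sample))
```

    The sketch assumes that the file name itself contains no colon.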

    -o, --out

    Write generated HTML to a file whose name is derived from the input filename.

        --out --print /dir/file            --> /dir/file.html
        --out --print /dir/file.txt        --> /dir/file.html
        --out --print /dir/file.this.txt   --> /dir/file.this.html
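
    The three cases above follow one simple rule, sketched here in Python for illustration only. The function name is hypothetical and the sketch assumes that only a trailing .txt is removed before .html is appended, which is all the examples show.

```python
def html_output_name(path):
    """Derive the output name: drop a trailing .txt, then append .html."""
    if path.endswith(".txt"):
        path = path[: -len(".txt")]
    return path + ".html"


for name in ("/dir/file", "/dir/file.txt", "/dir/file.this.txt"):
    print(name, "-->", html_output_name(name))
```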

    --link-cache FILE

    When links are checked periodically, it is wasteful to recheck every link that has already succeeded. To save link checking time, the "ok" links can be cached into a separate file. The next time you check the links, the cache is opened and only links that are not in the cache are checked. This should dramatically speed up long checks. Consider this example, where every text file is checked recursively.

        $ t2html --link-check-single \
          --quiet --link-cache ~/tmp/link.cache \
          `find . -name "*.txt" -type f`
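
    The caching idea can be sketched as follows. This is a simplified illustration in Python, not the program's code: the cache is assumed to be a plain file with one URL per line, and is_alive stands for whatever function performs the actual network check.

```python
import os


def load_cache(path):
    """Read previously verified URLs, one per line."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}


def check_links(urls, cache_path, is_alive):
    """Check only URLs missing from the cache; return the failing ones."""
    cached = load_cache(cache_path)
    bad = []
    with open(cache_path, "a", encoding="utf-8") as fh:
        for url in urls:
            if url in cached:
                continue            # verified on an earlier run
            if is_alive(url):
                fh.write(url + "\n")
                cached.add(url)
            else:
                bad.append(url)
    return bad
```

    Because only good links are stored, a failing link is retried on every run, and deleting the cache file forces a complete check.
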
    -O, --out-dir DIR

    Like --out, but chop the directory part and write output files to DIR. The following would generate the HTML file to current directory:

        --out-dir .

    If you have an automated tool that fills in the directory, you can use the word none to ignore this option. The following is a no-op; it will not generate output to directory "none":

        --out-dir none
    -p, --print

    Print filename to stdout after HTML processing. Normally program prints no file names, only the generated HTML.

        % t2html --out --print page.txt
    
        --> page.html
    -P, --print-url

    Print filename in URL format. This is useful if you want to check the layout immediately with your browser.

        % t2html --out --print-url page.txt | xargs lynx
    
        --> file: /users/foo/txt/page.html
    --split REGEXP

    Split document into smaller pieces when REGEXP matches. Split commands are standalone: the program splits the file and quits. No HTML conversion is done for the file.

    If REGEXP is found from the line, it is a start point of a split. E.g. to split according to toplevel headings, which have no numbering, you would use:

        --split '^[A-Z]'

    A sequential number, 3 digits, is added to the name of each generated partial:

        filename.txt-NNN

    The split feature is handy if you want to generate slides from each heading: First split the document, then convert each part to HTML and finally print each part (page) separately to printer.
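
    The splitting rule can be sketched like this. The code is an illustration in Python with hypothetical function names. It assumes that text before the first match forms the first part and that numbering starts from 001; the manual states only that the number has three digits.

```python
import re


def split_document(lines, regexp):
    """Start a new part at every line where regexp matches."""
    pattern = re.compile(regexp)
    parts = []
    for line in lines:
        if pattern.search(line) or not parts:
            parts.append([])
        parts[-1].append(line)
    return parts


def part_names(filename, count):
    """Name the partials filename-NNN with a 3-digit sequence number."""
    return [f"{filename}-{n:03d}" for n in range(1, count + 1)]


lines = ["Intro", "", "        text", "Usage", "", "        more text"]
parts = split_document(lines, r"^[A-Z]")
print(part_names("filename.txt", len(parts)))
```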

    -S1, --split1

    This is shorthand of --split command. Define regexp to split on toplevel heading.

    -S2, --split2

    This is shorthand of --split command. Define regexp to split on second level heading.

    -SN, --split-named-files

    Additional directive for split commands. If you split e.g. by headings using --split1, it would be more informative to generate filenames according to the first few words of the heading name. Suppose the headings where the splits occur were:

        Program guidelines
        Conclusion

    Then the generated partial filenames would be as follows.

        FILENAME-program_guidelines
        FILENAME-conclusion
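
    A sketch of how such names could be formed, in Python for illustration only. The function name is hypothetical, and the limit of three words is an assumption, because the manual says only "first few words".

```python
def named_part(filename, heading, max_words=3):
    """Build FILENAME-words_of_heading from the start of a heading."""
    words = heading.lower().split()[:max_words]
    return filename + "-" + "_".join(words)


for heading in ("Program guidelines", "Conclusion"):
    print(named_part("FILENAME", heading))
```
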
    -X, --xhtml

    Render using strict XHTML. This means using <hr/>, <br/> and paragraphs use <p>..</p>.

    Note: this option is experimental. See BUGS

    Miscellaneous options

    --debug LEVEL

    Turn on debug with positive LEVEL number. Zero means no debug.

    --help -h

    Print help screen. Terminates program.

    --help-css

    Print default CSS used. Terminates program. You can copy and modify this output and instruct to use your own with --css-file=FILE. You can also embed the option to files with #T2HTML-OPTION directive.

    --help-html

    Print help in HTML format. Terminates program.

    --help-man

    Print help page in Unix manual page format. You want to feed this output to nroff -man in order to read it. Terminates program.

    --test-page

    Print the test page: HTML and example text file that demonstrates the capabilities.

    --time

    Print to stderr the time spent handling the file.

    -v, --verbose [LEVEL]

    Print verbose messages.

    -q, --quiet

    Print no footer at all. This option has different meaning if --link-check option is turned on: print only erroneous links.

    -V, --version

    Print program version information.

    FORMAT DESCRIPTION

    Program converts text files to HTML. The basic idea is to rely on indentation level, and the layout used is called 'Technical format' (TF) where only minimal conventions are used to mark italic, bold etc. text. The basic principles are demonstrated below. Notice the column position ruler at the top:

     --//-- description start
    
     123456789 123456789 123456789 123456789 123456789 column numbers
    
     Heading 1 starts with a big letter at leftmost column 1
    
      The column positions 1,2,3 are currently undefined and may not
      format correctly. Do not place text at columns 1,2 or 3.
    
         Heading level 2 starts at half-tab column 4 with a big letter
    
          Normal but colored text at columns 5
    
           Normal but colored text at columns 6
    
            Heading 3 can be considered at position TAB minus 1, column 7.
    
            "Special <em> text at column 7 starts with double quote"
    
             Standard text starts at column 8, you can *emphatize* text or
             make it _strong_ and write =SmallText= or +BigText+ show
             variable name `ThisIsAlsoVariable'. You can `_*nest*_' `the'
             markup. more txt in this paragraph txt txt txt txt txt txt
             txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt
             txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt
             txt txt
    
              Strong text at column 9
    
               Column 10 is reserved for quotations
               Column 10 is reserved for quotations
               Column 10 is reserved for quotations
               Column 10 is reserved for quotations
    
                Strong text at column 11
    
                 Column 12 and further is reserved for code examples
                 Column 12 and further is reserved for code examples
                 All text here are surrounded by <pre> HTML codes
                 This CODE column is affected by the --css-code* options.
    
         Heading 2 at column 4 again
    
            If you want something like Heading level 3, use column 7 (bold)
    
             Column 8. Standard tab position. txt txt txt txt txt txt txt
             txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt
             txt txt txt txt txt txt txt txt txt txt txt txt txt txt
             [1998-09-10 Mr. Foo said]:
    
               cited text at column 10. cited text cited text cited text
               cited text cited text cited text cited text cited text
               cited text cited text cited text cited text cited text
               cited text
    
    
             *   Bullet at column 8. Notice 3 spaces after (*), so
                 text starts at half-tab forward at column 12.
             *   Bullet. txt txt txt txt txt txt txt txt txt txt txt txt
             *   Bullet. txt txt txt txt txt txt txt txt txt txt txt txt
                 ,txt txt txt txt
    
                 Notice that previous paragraph ends to P-comma
                 code, it tells this paragraph to continue in
                 bullet mode, otherwise this text at column 12
                 would be interpreted as code section surrounded
                 by <pre> HTML codes.
    
    
             .   This is ordered list.
             .   This is ordered list.
             .   This is ordered list.
    
    
             .This line starts with dot and is displayed in line by itself.
             .This line starts with dot and is displayed in line by itself.
    
             !! This adds an <hr> HTML code, text in line is marked with
             !! <strong> <em>
    
             Make this email address clickable <account@tt.com> Do not
             make this email address clickable bar@example.com, because it
             is only an example and not a real address. Notice that the
             last one was not surrounded by <>. Common login names like
             foo, bar, quux, or internet site 'example' are ignored
             automatically.
    
             Also do not make < this@example.com> because there is extra
             white space. This may be more convenient way to disable email
             addresses temporarily.
    
     Heading1 again at column 0
    
         Subheading at column 4
    
             And regular text, column 8 txt txt txt txt txt txt txt txt txt
             txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt
             txt txt txt txt txt txt txt txt txt txt txt
    
     --//-- description end

    That is it; the whole layout is described above. The rules of text formatting are described more formally below.

    USED HEADINGS

    • There are only two heading levels in this style. Heading columns are 0 and 4 and the heading must start with big letter ("Heading") or number ("1.0 Heading")

    • At column 4, if the text starts with small letter, that line is interpreted as <strong>

    • A HTML <hr> mark is added just before printing heading at level 1.

    • The headings are gathered, the TOC is built and inserted to the beginning of HTML page. The HTML <name> references used in TOC are the first 4 sequential words from the headings. Make sure your headings are uniquely named, otherwise there will be same NAME references in the generated HTML. Spaces are converted into underscore when joining the words. If you can not write unique headings by four words, then you must use --name-uniq switch

    TEXT PLACEMENT RULES

    General

    The basic rules for positioning text in certain columns:

    • Text at column 1 is undefined if it does not start with big letter or number to indicate Heading level 1.

    • Text between columns 2 and 3 is marked with <em>

    • Column 4 is reserved for heading level 2

    • Text between columns 5-7 is marked with <strong>

    • Text at column 7 is <em> if the first character is double quote.

    • Column 10 is reserved for <em> text. If you want to quote someone or to add reference text, place the text in this column.

    • Text at columns 9 and 11 are marked with <strong>

    Column 8 for text and special codes

    • Column 8 is reserved for normal text

    • At the start of text, at column 8, there can be DOT-code or COMMA-code.

    Column 12 is special

    • Column 12 is treated specially: block is started with <pre> and lines are marked as <samp></samp>. When the last text at column 12 is found, the block is closed with </pre>. An example:

          txt txt txt         ;evenly placed block, fine, do it like this
          txt txt
      
          txt txt txt txt     ;Can not terminate the /pre, because last
          txt txt txt txt     ;column is not at 12
              txt txt txt txt
      
          txt txt txt txt
          txt txt txt txt
              txt txt txt txt
          ;; Finalizing comment, now the text is evenly placed
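
    The column rules can be summarized as a lookup on the number of leading spaces. The sketch below is an illustration in Python, not the program's algorithm. It assumes 0-based columns as in "USED HEADINGS", ignores TABs and the special codes at column 8, and returns a label instead of generating HTML.

```python
def classify(line):
    """Map the indentation of a line to the rendering described above."""
    text = line.lstrip(" ")
    col = len(line) - len(text)
    if not text:
        return "empty"
    starts_heading = text[0].isupper() or text[0].isdigit()
    if col == 0:
        return "heading1" if starts_heading else "undefined"
    if col == 1:
        return "undefined"
    if col in (2, 3):
        return "em"
    if col == 4:
        return "heading2" if starts_heading else "strong"
    if col == 7 and text[0] == '"':
        return "em"                 # quoted text at column 7
    if col in (5, 6, 7, 9, 11):
        return "strong"
    if col == 8:
        return "text"
    if col == 10:
        return "em"                 # quotation column
    return "pre"                    # column 12 and further


for sample in ("Heading", "    Subheading", " " * 8 + "body", " " * 12 + "code"):
    print(repr(sample), classify(sample))
```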

    Additional tokens for use at column 8

    • If there is .(dot) at the beginning of a line and immediately non-whitespace, then <br> code is added to the end of line.

          .This line will have a <BR> HTML tag at the end.
          While these two lines are joined together
          by the browser, depending on the frame width.

    • If there is ,(comma) then the <p> code is not inserted if the previous line is empty. If you use both .(dot) and ,(comma), they must be in order dot-comma. The ,(comma) works differently if it is used in a bullet.

      A <p> is always added if there is separation of paragraphs, but when you are writing a bullet, there is a problem, because a bullet exists only as long as text is kept together

          *   This is a bullet and it has all text kept together
              even if there is another line in the bullet.

      But to write bullets that span multiple paragraphs, you must instruct that those are to be kept together and that the text in the next paragraph is not <samp> even though it is placed at column 12

          *   This is a bullet and it has all text kept together
              ,even if there is another line in the bullet.
      
              This is new paragraph to the previous bullet and this is
              not a text sample. See continued COMMA-code above.
      
          *   This is new bullet
      
              // and this is code sample after bullet
              if ( $flag ) { ..do something.. }

    Special text markings

    italic, bold, code, small, big tokens
        _this_      is interpreted as <strong class='word'>this</strong>
        *this*      is interpreted as <em class='word'>this</em>
        `this'      is interpreted as <samp class='word'>this</samp>

    Extra modifiers that can be mixed with the above. Usually if you want bigger font, CAPITALIZE THE WORDS.

        =this=      is interpreted as <span class="word-small">this</span>
        +this+      is interpreted as <span class="word-big">this</span>
        [this]      is interpreted as <span class="word-ref">this</span>
    superscripting
        word[this]  is interpreted as superscript. You can use like
                    this[1], multiple[(2)] and almost any[(ab)] and
                    imaginable[IV superscripts] as long as the left
                    bracket is attached to the word.
    subscripting
        12[[10]]    is representation of value 12 in base 10.
                    This is interpreted as subscript. You can use like
                    this[[1]], multiple[[(2)]] and almost any[[(ab)]] and
                    imaginable[[IV subscripts]] as long as *two* left
                    brackets are attached to the word.
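
    The word markings can be pictured as a few regular expression substitutions. The sketch below is a rough illustration in Python and not the program's actual expressions. It follows the =small= and +big+ mapping of the table above, handles only single words without nesting, and leaves out the quote, reference, superscript and subscript forms.

```python
import re

# (pattern, replacement) pairs; the span rules run first so that the
# "=" inside later class='word' attributes is never matched.
RULES = [
    (r"=(\S+?)=", r'<span class="word-small">\1</span>'),
    (r"\+(\S+?)\+", r'<span class="word-big">\1</span>'),
    (r"_(\S+?)_", r"<strong class='word'>\1</strong>"),
    (r"\*(\S+?)\*", r"<em class='word'>\1</em>"),
]


def mark_words(text):
    """Apply the word markup rules to one line of text."""
    for pattern, replacement in RULES:
        # The opening marker must start a word, the closing one must end it.
        text = re.sub(r"(?<!\S)" + pattern + r"(?!\w)", replacement, text)
    return text


print(mark_words("make it _strong_ or *emphasized*"))
```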
    embedding standard HTML tokens

    Standard special HTML entities can be added inside text in the normal way, using either symbolic names or the hash code. Here are examples:

        &times; &lt; &gt; &le; &ge; &ne; &radic; &minus;
        &alpha; &beta; &gamma; &divide;
        &laquo; &raquo; &lsaquo; &rsaquo; - &ndash; &mdash;
        &asymp; &equiv; &sum; &fnof; &infin;
        &deg; &plusmn;
        &trade; &copy; &reg;
        &euro; &pound; &yen;
    embedding PURE HTML into text

    This feature is highly experimental. It is possible to embed pure HTML inside text on occasions where e.g. some special formatting is needed. The idea is simple: you write HTML as usual but double every '<' and '>' character, like:

        <<p>>

    The other rule is that all PURE HTML must be kept together. There must be no line breaks between pure HTML lines. This is incorrect:

        <<table>>
    
            <<tr>>one
            <<tr>>two
    
        <</table>>

    The pure HTML must be written without extra newlines:

        <<table>>
            <<tr>>one
            <<tr>>two
        <</table>>

    This "doubling" affects normal text writing rules as well. If you write documents, where you describe Unix styled HERE-documents, you MUST NOT put the tokens next to each other:

            bash$ cat<<EOF              # DON'T, this will confuse parser.
            one
            EOF

    You must write the above code example using spaces to prevent "<<" from being interpreted as PURE HTML:

            bash$ cat << EOF            # RIGHT, add spaces
            one
            EOF
    drawing a short separator

    A !! (two exclamation marks) at text column (position 8) causes an immediate <hr> code to be added. Any text after !! on the same line is written with <strong> <em> and inserted just after the <hr> code; therefore the word formatting commands have no effect on this line.

    Http and email marking control

    • All http and ftp references as well as <foo@example.com> email addresses are marked clickable. Email must have surrounding <> characters to be recognized.

    • If url is preceded with hyphen, it will not be clickable. If a string foo, bar, quux, test, site is found from url, then it is not counted as clickable.

          <me@here.com>                   clickable
          http://example.com              clickable
      
          < me@here.com>                  not clickable; contains space
          <5dko56$1@news02.deltanet.com>  Message-Id, not clickable
      
          -http://example.com             hyphen, not clickable
          http://$EXAMPLE                 variable. not clickable
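
    The table can be approximated with a small predicate. This is a hedged sketch in Python, not the program's logic. The name is_clickable is hypothetical, and treating every token that contains "$" as not clickable is an assumption that happens to cover both the variable and the Message-Id rows.

```python
import re

IGNORED_WORDS = ("foo", "bar", "quux", "test", "site")
EMAIL = re.compile(r"<[^\s<>@]+@[^\s<>@]+>")   # needs <> and no spaces
URL = re.compile(r"(?:http|ftp)://\S+")


def is_clickable(token):
    """Decide whether a URL or <email> token would become a link."""
    if "$" in token:
        return False                # variable or Message-Id (assumption)
    if any(word in token for word in IGNORED_WORDS):
        return False
    # fullmatch rejects a leading hyphen and a space after "<".
    return bool(EMAIL.fullmatch(token) or URL.fullmatch(token))


for token in ("<me@here.com>", "< me@here.com>", "-http://example.com"):
    print(token, is_clickable(token))
```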

    Lists and bullets

    • A bullet list is constructed if there is "o" or "*" at column 8 and 3 spaces after it, so that text starts at column 12. Keep bulleted lines together, with no empty lines between the bullets.

          *   This is a bullet
          *   This is a bullet

      Another example:

          o   This is a bullet
          o   This is a bullet

    • The ordered list is started with ".", a dot, and written like a bullet where text starts at column 12.

          .   This is an ordered list
          .   This is an ordered list

    Line breaks

    • All line breaks are visible in your document, do not use more than one line break to separate paragraphs.

    • Very important is that there is only one line break after headings.

    EMBEDDED DIRECTIVES INSIDE TEXT

    Command line options

    You can cancel obeying all embedded directives by supplying option --not2html-tags.

    You can include these lines anywhere in the document and their content is included in HTML output. Each directive line must fit in one line and it cannot be broken to separate lines.

        #T2HTML-TITLE            <as passed option --title>
        #T2HTML-EMAIL            <as passed option --email>
        #T2HTML-AUTHOR           <as passed option --author>
        #T2HTML-DOC              <as passed option --doc>
        #T2HTML-METAKEYWORDS     <as passed option --meta-keywords>
        #T2HTML-METADESCRIPTION  <as passed option --meta-description>

    You can pass command line options embedded in the file. Like if you wanted the CODE section (column 12) to be coloured with shade of gray, you could add:

        #T2HTML-OPTION  --css-code-bg

    Or you could request turning on particular options. Notice that each line is exactly one argument as you would pass it on the command line. Imagine double quotes around the lines that are arguments to the associated options.

        #T2HTML-OPTION  --as-is
        #T2HTML-OPTION  --quiet
        #T2HTML-OPTION  --language
        #T2HTML-OPTION  en
        #T2HTML-OPTION  --css-font-type
        #T2HTML-OPTION  Trebuchet MS
        #T2HTML-OPTION --css-code-bg
        #T2HTML-OPTION --css-code-note
        #T2HTML-OPTION (?:Note|Notice|Warning):

    You can also embed your own comments to the text. These are stripped away:

        #T2HTML-COMMENT  Your comment here
        #T2HTML-COMMENT  Your other comment here
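
    How such directive lines could be collected is sketched below. The code is an illustration in Python with hypothetical names, not the program's parser. It assumes that every #T2HTML-OPTION line contributes exactly one command line argument and that comment lines are dropped.

```python
import re

DIRECTIVE = re.compile(r"^#T2HTML-([A-Z]+)\s*(.*)$")


def read_directives(lines):
    """Separate embedded directives from the ordinary document lines."""
    options, fields, body = [], {}, []
    for line in lines:
        match = DIRECTIVE.match(line.strip())
        if not match:
            body.append(line)
            continue
        name, value = match.group(1), match.group(2).strip()
        if name == "OPTION":
            options.append(value)   # one line, one argument
        elif name != "COMMENT":
            fields[name] = value    # TITLE, EMAIL, AUTHOR, ...
    return options, fields, body


document = [
    "#T2HTML-TITLE My page",
    "#T2HTML-OPTION  --css-font-type",
    "#T2HTML-OPTION  Trebuchet MS",
    "#T2HTML-COMMENT  Your comment here",
    "Heading",
]
print(read_directives(document))
```

    Note how "Trebuchet MS" stays a single argument although it contains a space.
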
    Embedding files

    #INCLUDE- command

    This is used to include the content of a URL at the current position. The URL can be a filename reference, where every $VAR is substituted from the environment variables. The tilde(~) expansion is not supported. The included filename is a path location in the form supported by the operating system.

    A prefix raw: disables any normal formatting. The file content is included as is.

    The URL can also be an HTTP reference to a remote location, whose content is included at that point. In case of remote content, or when the filename ends with extension .html or .htm, the content is stripped in order to make the inclusion possible. In the picture below, only the lines within the BODY, marked with !!, are included:

        <html>
          <head>
            ...
          </head>
          <body>
            this text                 !!
            and more of this          !!
          </body>
        </html>

    Examples:

        #INCLUDE-$HOME/lib/html/picture1.html
        #INCLUDE-http://www.example.com/code.html
        #INCLUDE-raw:example/code.html
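
    The handling of the directive can be sketched as below. This is an illustration in Python, not the program's code, and the function names and returned keys are hypothetical. os.path.expandvars is used for $VAR substitution; like the directive itself, it does not expand the tilde.

```python
import os
import re


def parse_include(line):
    """Split an #INCLUDE- directive into target, raw flag and remote flag."""
    target = line.strip()[len("#INCLUDE-"):]
    raw = target.startswith("raw:")
    if raw:
        target = target[len("raw:"):]
    remote = target.startswith("http://")
    if not remote:
        target = os.path.expandvars(target)   # substitutes $VAR only
    return {"target": target, "raw": raw, "remote": remote}


def body_only(html):
    """Keep only the text between <body> and </body>, if present."""
    match = re.search(r"<body[^>]*>(.*?)</body>", html, re.I | re.S)
    return match.group(1) if match else html


print(parse_include("#INCLUDE-raw:example/code.html"))
```
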
    Embedding pictures

    #PIC command is used to include pictures into the text

        #PIC picture.png#Caption Text#Picture HTML attributes#align#
              (1)        (2)          (3)                     (4)
    
        1.  The NAME or URL address of the picture. Like image/this.png
    
        2.  The Text that appears below picture
    
        3.  Additional attributes that are attached inside <img> tag.
            For <img width="200" height="200">, the line would
            read:
    
            #PIC some.png#Caption Text#width=200 height=200##
    
        4.  The position of image: "left" (default), "center", "right"

    Note: The Caption Text will also become the ALT text of the image which is used in case the browser is not capable of showing pictures. You can suppress the ALT text with option --no-picture-alt.
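
    The four fields of the directive are separated by the # character, so they can be picked apart with a plain split. The sketch below is an illustration in Python with a hypothetical function name; it fills in "left" when the alignment field is empty, following the default stated above.

```python
def parse_pic(line):
    """Split a #PIC directive into name, caption, attributes and align."""
    fields = line.strip()[len("#PIC"):].strip().split("#")
    fields += [""] * (4 - len(fields))        # pad missing fields
    name, caption, attributes, align = fields[:4]
    return {
        "name": name,
        "caption": caption,
        "attributes": attributes,
        "align": align or "left",
    }


print(parse_pic("#PIC some.png#Caption Text#width=200 height=200##"))
```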

    Fragment identifiers for named tags

    #REF command is used for referring to HTML <name> tag inside current document. The whole command must be placed on one single line and cannot be broken to multiple lines. An example:

        #REF #how_to_profile;(Note: profiling);
              (1)            (2)
    
        1.  The NAME HTML tag reference in current document, a single word.
            This can also be a full URL link.
            You can get NAME list by enabling --toc-url-print option.
    
        2.  The clickable text is delimited by ; characters.
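
    A sketch of the #REF syntax as a regular expression, written in Python for illustration. The generated anchor markup is an assumption made for this example and is not taken from the program's output.

```python
import re

# (1) a single word without spaces, (2) text delimited by ";" characters
REF = re.compile(r"#REF\s+(\S+?);(.*?);")


def expand_ref(line):
    """Replace a #REF command with a link to the named tag."""
    return REF.sub(r'<a href="\1">\2</a>', line)


print(expand_ref("#REF #how_to_profile;(Note: profiling);"))
```
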
    Referring to external documents.

    #URL tag can be used to embed URLs inline, so that the full link is not visible. Only the shown text is used to jump to the URL. This directive cannot be broken into separate lines.

         #URL<FULL-URL><embedded inline text>
             |          |
             |          Displayed, clickable, text
             Must be kept together

    An example:

         See search engine #URL<http://www.google.com><Google>
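
    The tag syntax maps naturally onto one substitution. The sketch below is an illustration in Python, not the program's expression, and the generated anchor markup is an assumption. Because the substitution is global, several #URL tags on one line are all replaced.

```python
import re

URL_TAG = re.compile(r"#URL<([^<>]+)><([^<>]+)>")


def expand_url_tags(line):
    """Turn every #URL<link><text> on a line into a clickable text."""
    return URL_TAG.sub(r'<a href="\1">\2</a>', line)


print(expand_url_tags("See search engine #URL<http://www.google.com><Google>"))
```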

    TABLE OF CONTENT HEADING

    If there is a heading 1 which is named exactly "Table of Contents", then all text up to the next heading is discarded from the generated HTML file. This is done because the program generates its own TOC. It is assumed that you use some text formatting program to generate the TOC for you in the .txt file and that you do not maintain it manually. For example, the Emacs package tinytf.el can be used.

    TROUBLESHOOTING

    Generated HTML document did not look what I intended

    Did you use an editor that inserted TABs, each of which is a single ASCII code (\t) instead of 8 spaces? Check your editor's settings and prefer writing in all-space format.

    The most common mistake is that there are extra newlines in the document. Keep one empty line between headings and text, keep one empty line between paragraphs, keep one empty line between body text and bullet. Make it your mantra: one one one ...

    Next, you may have put text at the wrong column position. Remember that the regular text is at column 8.

    If generated HTML suddenly starts using only one font, e.g. <pre>, then you have forgotten to close the block. Make it read even, like this:

        Code block
            Code block
            Code block
        ;;  Add empty comment here to "close" the code example at column 12

    Headings start with a big letter or number, like in "Heading", not "heading". Double check the spelling.

    EXAMPLES

    To print the test page and demonstrate possibilities:

        t2html --test-page

    To make simple HTML page without any meta information:

        t2html --title "Html Page Title" --author "Mr. Foo" \
               --simple --out --print file.txt

    If you have periodic post in email format, use --delete-email-headers to ignore the header text:

        t2html --out --print --delete-email-headers page.txt

    To make a page quickly:

        t2html --out --print page.txt

    To convert a page from a text document, including meta tags, buttons, colors and frames. Pay attention to the --html-body switch, which defines the document language.

        t2html                                              \
        --print                                             \
        --out                                               \
        --author    "Mr. foo"                               \
        --email     "foo@example.com"                       \
        --title     "This is manual page of page BAR"       \
        --html-body LANG=en                                 \
        --button-prev  previous.html                        \
        --button-top   index.html                           \
        --button-next  next.html                            \
        --document  http://example.com/dir/this-page.html   \
        --url       manual.html                             \
        --css-code-bg                                       \
        --css-code-note '(?:Note|Notice|Warning):'          \
        --html-frame                                        \
        --disclaimer-file   $HOME/txt/my-html-footer.txt    \
        --meta-keywords    "language-en,manual,program"     \
        --meta-description "Bar program to do this that and more of those" \
        manual.txt

    To check links and print status of all links in par with the http error message (most verbose):

        t2html --link-check file.txt | tee link-error.log

    To print only problematic links:

        t2html --link-check --quiet file.txt | tee link-error.log

    To print terse output in grep -n like manner: line number, link and error code:

        t2html --link-check-single --quiet file.txt | tee link-error.log

    To check links from multiple pages and cache good links to separate file, use --link-cache option. The next link check will run much faster because cached valid links will not be fetched again. At regular intervals delete the link cache file to force complete check.

        t2html --link-check-single \
               --link-cache $HOME/tmp/link.cache \
               --quiet file.txt

    To split large document into pieces, and convert each piece to HTML:

        t2html --split1 --split-name file.txt | t2html --simple --out

    ENVIRONMENT

    EMAIL

    If environment variable EMAIL is defined, it is used in footer for contact address. Option --email overrides environment setting.

    LANG

    The default language setting for switch --language. Make sure the first two characters contain the language definition, like in: LANG=en.iso88591

    SEE ALSO

    asciidoc(1) html2ps(1) htmlpp(1) markdown(1)

    Jan Kärrman <jan@tdb.uu.se> has written Perl html2ps which was 2004-11-11 available at http://www.tdb.uu.se/~jan/html2ps.html

    HTML validator is at http://validator.w3.org/

    iMATIX created htmlpp which is available from http://www.imatix.com and seen 2014-03-05 at http://legacy.imatix.com/html/htmlpp

    Emacs minor mode to help writing documents based on TF layout is available. See package tinytf.el in project https://github.com/jaalto/project--emacs-tiny-tools

    Standards

    RFC 1766 contains list of language codes at http://www.rfc.net/

    Latest HTML/XHTML and CSS specifications are at http://www.w3c.org/

    ISO standards

    639 Code for the representation of the names of languages http://www.oasis-open.org/cover/iso639a.html

    3166 Standard Country Codes http://www.niso.org/3166.html and http://www.netstrider.com/tutorials/HTMLRef/standards/

    BUGS

    The implementation was originally designed to work linewise, so it is unfortunately impossible to add or modify any existing feature to look for items that span more than one line.

    As the option --xhtml was added much later, it may not produce completely syntactically valid markup.

    SCRIPT CATEGORIES

    CPAN/Administrative html

    PREREQUISITES

    No additional Perl CPAN modules needed for text to HTML conversion.

    COREQUISITES

    If link check feature is used to to validate URL links, then following modules are needed from Perl CPAN use LWP::UserAgent HTML::FormatText and HTML::Parse

    If the module HTML::LinkExtractor is available, it is used instead of the included link extracting algorithm.

    AVAILABILITY

    Homepage is at https://github.com/jaalto/project--perl-text2html

    AUTHOR

    Copyright (C) 1996-2020 <jari.aalto@cante.net>

    This program is free software; you can redistribute and/or modify program under the terms of GNU General Public license either version 2 of the License, or (at your option) any later version.

    This documentation may be distributed subject to the terms and conditions set forth in GNU General Public License v2 or later; or, at your option, distributed under the terms of GNU Free Documentation License version 1.2 or later (GNU FDL).

    perl-text2html-master/doc/manual/index.txt

    NAME

    t2html - Simple text to HTML converter. Relies on text indentation rules.

    SYNOPSIS

        t2html [options] file.txt > file.html

    DESCRIPTION

    Convert pure text files into nice looking, possibly framed, HTML pages. An example of conversion:

    1. Plain text source code http://pm-doc.git.sourceforge.net/git/gitweb.cgi?p=pm-doc/pm-doc;a=blob_plain;f=doc/index.txt;hb=HEAD
    2. Result of conversion with custom --css-file option: http://pm-doc.sourceforge.net/pm-tips.html http://pm-doc.sourceforge.net/pm-tips.css
    3. An Emacs mode tinytf.el for writing the text files (optional) https://github.com/jaalto/project--emacs-tiny-tools

    Requirements for the input ascii files

    The file must be written in Technical Format, whose layout is described in this manual. Basically the idea is simple and there are only two heading levels: one at column 0 and the other at column 4 (halfway between the tab width). Standard text starts at column 8 (the position after pressing the tab key).

    The idea of technical format is that each column represents a different rendering layout in the generated HTML. There is no special markup needed in the text file, so you can use the text version as a master copy of a FAQ etc. Bullets, numbered lists, word emphasis and quotation etc. can be expressed in a natural way.

    HTML description

    The generated HTML includes embedded Cascading Style Sheet 2 (CSS2) and a small piece of Java code. The CSS2 is used to colorize the page layout and to define suitable printing font sizes. The generated HTML also takes an approach to support XHTML. See page http://www.w3.org/TR/xhtml1/#guidelines where the backward compatibility recommendations are outlined:

    Legal HTML XHTML requires

    ..






    XHTML does not support fragment identifiers #foo, with the "name" element, but uses "id" instead. For backward compatibility both elements are defined:

        < ..name="tag"> Is now <.. name="tag" id="tag">

    NOTE: This program was never designed to be used for XHTML and the strict XHTML validity is not to be expected.

    Motivation

    The easiest format to write large documents, like FAQs, is text. A text file offers WysiWyg editing and it can be turned easily into HTML format. Text files are easily maintained and there are no requirements for special text editors. Any text editor like notepad, vi, Emacs can be used to maintain the documents. Text files are also the only sensible format if documents are kept under version control like RCS, CVS, SVN, Arch, Perforce, ClearCase. They can be easily compared with diff and patches can be easily received and sent to them.

    To help maintain large documents, there is also available an *Emacs* minor mode, package called *tinytf.el*, which offers text fontification with colors, indentation control, bullet filling, heading renumbering, word markup, syntax highlighting etc. See https://github.com/jaalto/project--emacs-tiny-tools

    OPTIONS

    Html: Header and Footer options

    --as-is

    Any extra HTML formatting or text manipulation is suppressed. Text is preserved as it appears in file. Use this option if you plan to deliver or print the text as seen.

    o If file contains "Table of Contents" it is not removed
    o Table of Content block is not created (it usually would)

    --author -a STR

    Author of document e.g. --author "John Doe"

    --disclaimer-file FILE

    The text that appears at the footer is read from this file. If not given the default copyright text is added. Options "--quiet" and "--simple" suppress disclaimers.

    --document FILE

    Name of the document or filename. You could list all alternative URLs to the document with this option.

    --email -e EMAIL

    The contact address of the author of the document.
    Must be pure email address with no "<" and ">" characters included. Eg. --email foo@example.com

        --email "<me@here.com>"    WRONG
        --email "me@here.com"      right

    --simple -s

    Print minimum footer only: contact, email and date. Use "--quiet" to completely discard footer.

    --t2html-tags

    Allow processing embedded #T2HTML- directives inside file. See full explanation by reading topic "EMBEDDED DIRECTIVES INSIDE TEXT". You do not need to supply this option; it is "on" by default.

    To disregard embedded directives in text file, supply "no" option: --not2html-tags.

    --title STR -t STR

    The title text that appears in top frame of browser.

    --url URL

    Location of the HTML file. When --document gave the name, this gives the location. This information is printed at the Footer.

    Html: Navigation urls

    --base URL

    URL location of the HTML file in the destination site where it will be made available. This option is needed only if the document is hosted on an FTP server (rare, but possible). An FTP server based document cannot use Table Of Contents links (fragment *#tag* identifiers) unless HTML tag BASE is also defined.

    The argument can be full URL to the document:

        --base ftp://ftp.example.com/file.html
        --base ftp://ftp.example.com/

    --button-heading-top

    Add additional [toc] navigation button to the end of each heading. This may be useful in long non-framed HTML files.

    --button-top URL

    Buttons are placed at the top of document in order: [previous][top][next] and *--button-** options define the URLs.

    If URL is string *none* then no button is inserted. This may be handy if the buttons are defined by a separate program. An example using Perl:

        #!/usr/bin/perl

        my $top = "index.html"; # set defaults
        my $prev = "none";
        my $next = "none";

        # ... somewhere $prev or $next may get set, or then not

        qx(t2html --button-top "$top" --button-prev "$prev" --button-next "$next" ...);

        # End of sample program

    --button-prev URL

    URL to go to previous document or string *none*.
    --button-next URL

    URL to go to next document or string *none*.

    --reference tag=value

    You can add any custom references (tags) inside text and get them expanded to any value. This option can be given multiple times and every occurrence of TAG is replaced with VALUE. E.g. when given following options:

        --reference "#HOME-URL=http://www.example.com/dir"
        --reference "#ARCHIVE-URL=http://www.example.com/dir/dir2"

    When referenced in text, the generated HTML includes the tags expanded to values. An example text:

        The homepage is #HOME-URL/page.html and the mirror page is at #ARCHIVE-URL/page.html where you can find the latest version.

    -R, --reference-separator STRING

    See above. String that is used to split the TAG and VALUE. Default is equal sign "=".

    -T, --toc-url-print

    Display URLs (constructed from headings) that build up the Table of Contents (NAME AHREF tags) in a document. The list is outputted to stderr, so that it can be separated:

        % t2html --toc-url-print tmp.txt > file.html 2> toc-list.txt

    Where would you need this? If you want to know the fragment identifiers for your file, you need the list of names.

        http://www.example.com/myfile.html#fragment-identifier

    Html: Controlling CSS generation (HTML tables)

    --css-code-bg

    This option affects how the code section (column 12) is rendered. Normally the section is surrounded with a
    ..
    codes, but with this options, something more fancier is used. The code is wrapped inside a ...
    and the background color is set to a shade of gray. --css-code-note "REGEXP" Option --css-code-bg is required to activate this option. A special word defined using regexp (default is 'Note:') will mark code sections specially. The "first word" is matched against the supplied Perl regexp. The supplied regexp must not, repeat, must not, include any matching group operators. This simply means, that grouping parenthesis like "(one|two|three)" are not allowed. You must use the Perl non-grouping ones like "(?:one|two|three)". Please refer to perl manual page [perlre] if this short introduction did not give enough rope. With this options, instead of rendering column 12 text with
    ..
    , the text appears just like regular text, but with a twist. The background color of the text has been changed to darker grey to visually stand out form the text. An example will clarify. Suppose that you passed options --css-code-bg and --css-code-note='(?:Notice|Note):', which instructed to treat the first paragraphs at column 12 differently. Like this: This is the regular text that appears somewhere at column 8. It may contain several lines of text in this paragraph. Notice: Here is the special section, at column 12, and the first word in this paragraph is 'Notice:'. Only that makes this paragraph at column 12 special. Now, we have some code to show to the user: for ( i = 0; i++; i < 10 ) { // Doing something in this loop } One note, text written with initial special word, like "Notice:", must all fit in one full pragraph. Any other paragraphs that follow, are rendered as code sections. Like here: This is the regular text that appears somewhere It may contain several lines of text in this paragraph Notice: Here is the special section, at column 12, and the first word in this paragraph is 'Notice:' which makes it special Hoewver, this paragraph IS NOT rendered specially any more. Only the first paragraph above. for ( i = 0; i++; i < 10 ) { // Doing something in this loop } As if this were not enough, there are some special table control directives that let you control the ..
    which is put around the code section at column 12. Here are few examples: Here is example 1 #t2html::td:bgcolor=#F7F7DE for ( i = 0; i++; i < 10 ) { // Doing something in this loop } Here is example 2 #t2html::td:bgcolor=#F7F7DE:tableborder:1 for ( i = 0; i++; i < 10 ) { // Doing something in this loop } Here is example 3 #t2html::td:bgcolor="#FFFFFF":tableclass:dashed for ( i = 0; i++; i < 10 ) { // Doing something in this loop } Here is example 4 #t2html::td:bgcolor="#FFFFFF":table:border=1_width=94%_border=0_cellpadding="10"_cellspacing="0" for ( i = 0; i++; i < 10 ) { // Doing something in this loop } Looks cryptic? Cannot help that and in order for you to completely understand what these directives do, you need to undertand what elements can be added to the and
    tokens. Refer to HTML specification for available attributes. Here is briefing what you can do: The start command is: #t2html:: | After this comes attribute pairs in form key:value and multiple ones as key1:value1:key2:value2 ... The "key:value" pairs can be: td:ATTRIBUTES | This is converted into table:ATTRIBUTES | This is converted into There can be no spaces in the ATTRIBUTES, because the "First-word" must be one contiguous word. An underscore can be used in place of space: table:border=1_width=94% | Interpreted as
    It is also possible to change the default CLASS style with word "tableclass". In order the CLASS to be useful, its CSS definitions must be either in the default configuration or supplied from a external file. See option --script-file. tableclass:name | Interpreted as
    For example, there are couple of default styles that can be used: 1) Here is CLASS "dashed" example #t2html::tableclass:dashed for ( i = 0; i++; i < 10 ) { // Doing something in this loop } 2) Here is CLASS "solid" example: #t2html::tableclass:solid for ( i = 0; i++; i < 10 ) { // Doing something in this loop } You can change any individual value of the default table definition which is:
    To change e.g. only value cellpadding, you would say: #t2html::table:tablecellpadding:2 If you are unsure what all of these were about, simply run program with --test-page and look at the source and generated HTML files. That should offer more rope to experiment with. --css-file FILE Include which refers to external CSS style definition source. This option is ignored if --script-file option has been given, because that option imports whole content inside HEAD tag. This option can appear multiple times and the external CSS files are added in listed order. --css-font-type CSS-DEFINITION Set the BODY element's font definition to CSS-DEFINITION. The default value used is the regular typeset used in newspapers and books: --css-font-type='font-family: "Times New Roman", serif;' --css-font-size CSS-DEFINITION Set the body element's font size to CSS-DEFINITION. The default font size is expressed in points: --css-font-size="font-size: 12pt;" Html: Controlling the body of document --delete REGEXP Delete lines matching perl REGEXP. This is useful if you use some document tool that uses navigation tags in the text file that you do not want to show up in generated HTML. --delete-email-headers Delete email headers at the beginning of file, until first empty line that starts the body. If you keep your document ready for Usenet news posting, they may contain headers and body: From: ... Newsgroups: ... X-Sender-Info: Summary: BODY-OF-TEXT --nodelete-default Use this option to suppress default text deletion (which is on). Emacs "folding.el" package and vi can be used with any text or programming language to place sections of text between tags {{{ and }}}. You can open or close such folds. This allows keeping big documents in order and manageable quite easily. For Emacs support, see. 
    ftp://ftp.csd.uu.se/pub/users/andersl/beta/

    The default value deletes these markers and special comments "#_comment" which make it possible to include your own notes which are not included in the generated output.

        {{{ Security section

        #_comment Make sure you revise this section to
        #_comment the next release

        The security is an important issue in everyday administration...
        More text ...

        }}}

    --html-body STR

    Additional attributes to add to HTML tag <BODY>. You could e.g. define language of the text with --html-body LANG=en which would generate HTML tag <BODY LANG=en> See section "SEE ALSO" for ISO 639.

    --html-column-beg="SPEC HTML-SPEC"

    The default interpretation of columns 1,2,3 5,6,7,8,9,10,11,12 can be changed with *beg* and *end* switches. Columns 0,4 can't be changed because they are reserved for headings. Here are some samples:

        --html-column-beg="7quote "
        --html-column-end="7quote "
        --html-column-beg="10
     class='column10'"
                --html-column-end="10    
    " --html-column-beg="quote " --html-column-end="quote " Note: You can only give specifications up till column 12. If text is beyound column 12, it is interpreted like it were at column 12. In addition to column number, the *SPEC* can also be one of the following strings Spec equivalent word markup ------------------------------ quote `' bold _ emp * small + big = ref [] like: [Michael] referred to [rfc822] Other available Specs ------------------------------ 7quote When column 7 starts with double quote. For style sheet values for each color, refer to *class* attribute and use --script-file option to import definitions. Usually /usr/lib/X11/rgb.txt lists possible color values and the HTML standard at http://www.w3.org/ defines following standard named colors: Black #000000 Maroon #800000 Green #008000 Navy #000080 Silver #C0C0C0 Red #FF0000 Lime #00FF00 Blue #0000FF Gray #808080 Purple #800080 Olive #808000 Teal #008080 White #FFFFFF Fuchsia #FF00FF Yellow #FFFF00 Aqua #00FFFF --html-column-end="COL HTML-SPEC" See --html-column-beg --html-font SIZE Define FONT SIZE. It might be useful to set bigger font size for presentations. -F, --html-frame [FRAME-PARAMS] If given, then three separate HTML files are generated. The left frame will contain TOC and right frame contains rest of the text. The *FRAME-PARAMS* can be any valid parameters for HTML tag FRAMESET. The default is "cols="25%,75%"". Using this implies --out option automatically, because three files cannot be printed to stdout. file.html --> file.html The Frame file, point browser here file-toc.html Left frame (navigation) file-body.html Right frame (content) --language ID Use language ID, a two character ISO identifier like "en" for English during the generation of HTML. This only affects the text that is shown to end-user, like text "Table Of contents". The default setting is "en". See section "SEE ALSO" for standards ISO 639 and ISO 3166 for proper codes. 
    The selected language changes program's internal arrays in two ways: 1) Instead of default "Table of contents" heading the national language equivalent will be used 2) The text "Pic" below embedded sequentially numbered pictures will use national equivalent.

    If your language is not supported, please send the phrase for "Table of contents" and word "Pic" in your language to the maintainer.

    --script-file FILE

    Include java code that must be complete from FILE. The code is put inside <head> of each HTML.

    The --script-file is a general way to import anything into the HEAD element. Eg. If you want to keep separate style definitions for all, you could only import a pointer to a style sheet. See *14.3.2 Specifying external style sheets* in HTML 4.0 standard.

    --meta-keywords STR

    Meta keywords. Used by search engines. Separate keywords like "AA, BB, CC" with commas. Refer to HTML 4.01 specification and topic "7.4.4 Meta data" and see http://www.htmlhelp.com/reference/wilbur/ and

        --meta-keywords "AA,BB,CC"

    --meta-description STR

    Meta description. Include description string, max 1000 characters. This is used by search engines. Refer to HTML 4.01 specification and topic "7.4.4 Meta data"

    --name-uniq

    First 1-4 words from the heading are used for the HTML *name* tags. However, it is possible that two headings start with exactly the same 1-4 words. In those cases you have to turn on this option. It will use counter 00 - 999 instead of words from headings to construct HTML *name* references.

    Please use this option only in emergencies, because referring to jump block *name* via

        httpI://example.com/doc.html#header_name

    is more convenient than using obscure reference

        httpI://example.com/doc.html#11

    In addition, each time you add a new heading the number changes, whereas the symbolic name picked from heading stays as long as you do not change the heading. Think about welfare of your netizens who bookmark your pages.
Try to make headings to not have same subjects and you do not need this option. Document maintenance and batch job commands -A, --auto-detect Convert file only if tag "#T2HTML-" is found from file. This option is handy if you run a batch command to convert all files to HTML, but only if they look like HTML base files: find . -name "*.txt" -type f \ -exec t2html --auto-detect --verbose --out {} \; The command searches all *.txt files under current directory and feeds them to conversion program. The --auto-detect only converts files which include "#T2HTML-" directives. Other text files are not converted. --link-check -l Check all http and ftp links. *This option is supposed to be run standalone* Option --quiet has special meaning when used with link check. With this option you can regularly validate your document and remove dead links or update moved links. Problematic links are outputted to *stderr*. This link check feature is available only if you have the LWP web library installed. Program will check if you have it at runtime. Links that are big, e.g. which match *tar.gz .zip ...* or that run programs (links with ? character) are ignored because the GET request used in checking would return whole content of the link and it would. be too expensive. A suggestion: When you put binary links to your documents, add them with space: http://example.com/dir/dir/ filename.tar.gz Then the program *does* check the http addresses. Users may not be able to get the file at one click, checker can validate at least the directory. If you are not the owner of the link, it is also possible that the file has moved of new version name has appeared. -L, --link-check-single Print condensed output in *grep -n* like manner *FILE:LINE:MESSAGE* This option concatenates the url response text to single line, so that you can view the messages in one line. 
You can use programming tools (like Emacs M-x compile) that can parse standard grep syntax to jump to locations in your document to correct the links later. -o, --out write generated HTML to file that is derived from the input filename. --out --print /dir/file --> /dir/file.html --out --print /dir/file.txt --> /dir/file.html --out --print /dir/file.this.txt --> /dir/file.this.html --link-cache CACHE_FILE When links are checked periodically, it would be quite a rigorous to check every link every time that has already succeeded. In order to save link checking time, the "ok" links can be cached into separate file. Next time you check the links, the cache is opened and only links found that were not in the cache are checked. This should dramatically improve long searches. Consider this example, where every text file is checked recursively. $ t2html --link-check-single \ --quiet --link-cache ~tmp/link.cache \ `find . -name "*.txt" -type f` -O, --out-dir DIR Like --out, but chop the directory part and write output files to DIR. The following would generate the HTML file to current directory: --out-dir . If you have automated tool that fills in the directory, you can use word none to ignore this option. The following is a no-op, it will not generate output to directory "none": --out-dir none -p, --print Print filename to stdout after HTML processing. Normally program prints no file names, only the generated HTML. % t2html --out --print page.txt --> page.html -P, --print-url Print filename in URL format. This is useful if you want to check the layout immediately with your browser. % t2html --out --print-url page.txt | xargs lynx --> file: /users/foo/txt/page.html --split REGEXP Split document into smaller pieces when REGEXP matches. *Split commands are standalone*, meaning, that it starts and quits. No HTML conversion for the file is engaged. If REGEXP is found from the line, it is a start point of a split. E.g. 
to split according to toplevel headings, which have no numbering, you would use: --split '^[A-Z]' A sequential numbers, 3 digits, are added to the generated partials: filename.txt-NNN The split feature is handy if you want to generate slides from each heading: First split the document, then convert each part to HTML and finally print each part (page) separately to printer. -S1, --split1 This is shorthand of --split command. Define regexp to split on toplevel heading. -S2, --split2 This is shorthand of --split command. Define regexp to split on second level heading. -SN, --split-named-files Additional directive for split commands. If you split e.g. by headings using --split1, it would be more informative to generate filenames according to first few words from the heading name. Suppose the heading names where split occur were: Program guidelines Conclusion Then the generated partial filenames would be as follows. FILENAME-program_guidelines FILENAME-conclusion -X, --xhtml Render using strict XHTML. This means using
    ,
    and paragraphs use

    ..

    . "Note: this option is experimental. See BUGS" Miscellaneous options --debug LEVEL Turn on debug with positive LEVEL number. Zero means no debug. --help -h Print help screen. Terminates program. --help-css Print default CSS used. Terminates program. You can copy and modify this output and instruct to use your own with --css-file=FILE. You can also embed the option to files with "#T2HTML-OPTION" directive. --help-html Print help in HTML format. Terminates program. --help-man Print help page in Unix manual page format. You want to feed this output to nroff -man in order to read it. Terminates program. --test-page Print the test page: HTML and example text file that demonstrates the capabilities. --time Print to stderr time spent used for handling the file. -v, --verbose [LEVEL] Print verbose messages. -q, --quiet Print no footer at all. This option has different meaning if *--link-check* option is turned on: print only erroneous links. V, --version Print program version information. FORMAT DESCRIPTION Program converts text files to HTML. The basic idea is to rely on indentation level, and the layout used is called 'Technical format' (TF) where only minimal conventions are used to mark italic, bold etc. text. The Basic principles can be demonstrated below. Notice the column poisiton ruler at the top: --//-- description start 123456789 123456789 123456789 123456789 123456789 column numbers Heading 1 starts with a big letter at leftmost column 1 The column positions 1,2,3 are currently undefined and may not format correctly. Do not place text at columns 1,2 or 3. Heading level 2 starts at half-tab column 4 with a big letter Normal but colored text at columns 5 Normal but colored text at columns 6 Heading 3 can be considered at position TAB minus 1, column 7. "Special text at column 7 starts with double quote" Standard text starts at column 8, you can *emphatize* text or make it _strong_ and write =SmallText= or +BigText+ show variable name `ThisIsAlsoVariable'. 
You can `_*nest*_' `the' markup. more txt in this paragraph txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt Strong text at column 9 Column 10 is reserved for quotations Column 10 is reserved for quotations Column 10 is reserved for quotations Column 10 is reserved for quotations Strong text at column 11 Column 12 and further is reserved for code examples Column 12 and further is reserved for code examples All text here are surrounded by
     HTML codes
                   This CODE column in affected by the --css-code* options.
    
             Heading 2 at column 4 again
    
                If you want something like Heading level 3, use column 7 (bold)
    
                 Column 8. Standard tab position. txt txt txt txt txt txt txt
                 txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt
                 txt txt txt txt txt txt txt txt txt txt txt txt txt txt
                 [1998-09-10 Mr. Foo said]:
    
                   cited text at column 10. cited text cited text cited text
                   cited text cited text cited text cited text cited text
                   cited text cited text cited text cited text cited text
                   cited text
    
    
                 *   Bullet at column 8. Notice 3 spaces after (*), so
                     text starts at half-tab forward at column 12.
                 *   Bullet. txt txt txt txt txt txt txt txt txt txt txt txt
                 *   Bullet. txt txt txt txt txt txt txt txt txt txt txt txt
                     ,txt txt txt txt
    
                     Notice that previous paragraph ends to P-comma
                     code, it tells this paragraph to continue in
                     bullet mode, otherwise this text at column 12
                     would be interpreted as code section surrounded
                     by 
     HTML codes.
    
    
                 .   This is ordered list.
                 .   This is ordered list.
                 .   This is ordered list.
    
    
                 .This line starts with dot and is displayed in line by itself.
                 .This line starts with dot and is displayed in line by itself.
    
                 !! This adds an 
    HTML code, text in line is marked with !! Make this email address clickable Do not make this email address clickable bar@example.com, because it is only an example and not a real address. Notice that the last one was not surrounded by <>. Common login names like foo, bar, quux, or internet site 'example' are ignored automatically. Also do not make < this@example.com> because there is extra white space. This may be more convenient way to disable email addresses temporarily. Heading1 again at column 0 Subheading at column 4 And regular text, column 8 txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt --//-- description end That is it, there is the whole layout described. More formally the rules of text formatting are secribed below. USED HEADINGS * There are only *two* heading levels in this style. Heading columns are 0 and 4 and the heading must start with big letter ("Heading") or number ("1.0 Heading") * At column 4, if the text starts with small letter, that line is interpreted as * A HTML
    mark is added just before printing heading at level 1. * The headings are gathered, the TOC is built and inserted to the beginning of HTML page. The HTML references used in TOC are the first 4 sequential words from the headings. Make sure your headings are uniquely named, otherwise there will be same NAME references in the generated HTML. Spaces are converted into underscore when joining the words. If you can not write unique headings by four words, then you must use --name-uniq switch TEXT PLACEMENT RULES General The basic rules for positioning text in certain columns: * Text at column 1 is undefined if it does not start with big letter or number to indicate Heading level 1. * Text between columns 2 and 3 is marked with * Column 4 is reserved for heading level 2 * Text between columns 5-7 is marked with * Text at column 7 is if the first character is double quote. * Column 10 is reserved for text. If you want to quote someone or to add reference text, place the text in this column. * Text at columns 9 and 11 are marked with Column 8 for text and special codes * Column 8 is reserved for normal text * At the start of text, at column 8, there can be DOT-code or COMMA-code. Column 12 is special * Column 12 is treated specially: block is started with
     and
            lines are marked as . When the last text at *column* 12
            is found, the block is closed with 
    . An example: txt txt txt ;evenly placed block, fine, do it like this txt txt txt txt txt txt ;Can not terminate the /pre, because last txt txt txt txt ;column is not at 12 txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt txt ;; Finalizing comment, now the text is evenly placed Additional tokens for use at column 8 * If there is "."(dot) at the beginning of a line and immediately non-whitespace, then
    code is added to the end of line. .This line will have a
    HTML tag at the end. While these two line are joined together by the browser, depending on the frame width. * If there is ","(comma) then the

    code is not inserted if the previous line is empty. If you use both "."(dot) and ","(comma), they must be in order dot-comma. The ","(comma) works differently if it is used in bullet A

    is always added if there is separation of paragraphs, but when you are writing a bullet, there is a problem, because a bullet exists only as long as text is kept together * This is a bullet and it has all text kept together even if there is another line in the bullet. But to write bullets that spread over multiple paragraphs, you must instruct that those are to be kept together and the text in next paragraph is not while it is placed at column 12 * This is a bullet and it has all text kept together ,even if there is another line in the bullet. This is new paragraph to the previous bullet and this is not a text sample. See continued COMMA-code above. * This is new bullet // and this is code sample after bullet if ( $flag ) { ..do something.. } Special text markings italic, bold, code, small, big tokens _this_ is interpreted as this *this* is interpreted as this `this' is interpreted as this ` Extra modifiers that can be mixed with the above. Usually if you want bigger font, CAPITALIZE THE WORDS. =this= is interpreted as this +this+ is interpreted as this [this] is interpreted as this superscripting word[this] is interpreted as superscript. You can use like this[1], multiple[(2)] and almost any[(ab)] and imaginable[IV superscripts] as long as the left bracket is attached to the word. subscripting 12[[10]] is representation of value 12 in base 10. This is interpreted as subscript. You can use like this[[1]], multiple[[(2)]] and almost any[[(ab)]] and imaginable[[IV superscripts]] as long as *two* left brackets are attached to the word. embedding standard HTML tokens Standard special HTML entities can be added inside text in a normal way, either using symbolic names or the hash code. Here are examples: × < > ≤ ≥ ≠ √ − α β γ ÷ « » ‹ › - – — ≈ ≡ ∑ ƒ ∞ ° ± ™ © ® € £ ¥ embedding PURE HTML into text This feature is highly experimental. It is possible to embed pure HTML inside text in occasions, where e.g. some special formatting is needed. 
The idea is simple: you write HTML as usual but double every '<' and '>' characters, like: <

    > The other rule is that all PURE HTML must be kept together. There must be no line breaks between pure HTML lines. This is incorrect: <

    > <>one <>two <
    > The pure HTML must be written without extra newlines: <> <>one <>two <
    > This "doubling" affects normal text writing rules as well. If you write documents, where you describe Unix styled HERE-documents, you MUST NOT put the tokens next to each other: bash$ cat< code. any text after !! in the same line is written with and inserted just after
        code, therefore the word formatting commands have no effect in this
        line.

      Http and email marking control
        *   All http and ftp references as well as email addresses are marked
            clickable. Email must have surrounding <> characters to be
            recognized.

        *   If url is preceded with hyphen, it will not be clickable. If a
            string foo, bar, quux, test, site is found from url, then it is
            not counted as clickable.

                                                clickable
                http://example.com              clickable
                < me@here.com>                  not clickable; contains space
                <5dko56$1@news02.deltanet.com>  Message-Id, not clickable
                -http://example.com             hyphen, not clickable
                http://$EXAMPLE                 variable. not clickable

      Lists and bullets
        *   A bullet list is constructed if there is "o" or "*" at column 8
            and 3 spaces after it, so that text starts at column 12. Bulleted
            lines are advised to be kept together; no spaces between bullet
            blocks.

                *   This is a bullet
                *   This is a bullet

            Another example:

                o   This is a bullet
                o   This is a bullet

            List example:

                .   This is an ordered list
                .   This is an ordered list

        *   The ordered list is started with ".", a dot, and written like
            bullet where text starts at column 12.

      Line breaks
        *   All line breaks are visible in your document, do not use more than
            one line break to separate paragraphs.

        *   Very important is that there is only *one* line break after
            headings.

    EMBEDDED DIRECTIVES INSIDE TEXT
      Command line options
        You can cancel obeying all embedded directives by supplying option
        --not2html-tags.

        You can include these lines anywhere in the document and their content
        is included in HTML output. Each directive line must fit in one line
        and it cannot be broken to separate lines.

            #T2HTML-TITLE
            #T2HTML-EMAIL
            #T2HTML-AUTHOR
            #T2HTML-DOC
            #T2HTML-METAKEYWORDS
            #T2HTML-METADESCRIPTION

        You can pass command line options embedded in the file. Like if you
        wanted the CODE section (column 12) to be coloured with shade of gray,
        you could add:

            #T2HTML-OPTION  --css-code-bg

        Or you could request turning on particular options.
        Notice that each line is exactly as you have passed the argument in
        command line. Imagine surrounding double quotes around lines that are
        arguments to the associated options.

            #T2HTML-OPTION  --as-is
            #T2HTML-OPTION  --quiet
            #T2HTML-OPTION  --language
            #T2HTML-OPTION  en
            #T2HTML-OPTION  --css-font-type
            #T2HTML-OPTION  Trebuchet MS
            #T2HTML-OPTION  --css-code-bg
            #T2HTML-OPTION  --css-code-note
            #T2HTML-OPTION  (?:Note|Notice|Warning):

        You can also embed your own comments to the text. These are stripped
        away:

            #T2HTML-COMMENT  Your comment here
            #T2HTML-COMMENT  Your other comment here

      Embedding files
        #INCLUDE- command

        This is used to include the content into current position. The URL
        can be a filename reference, where every $VAR is substituted from the
        environment variables. The tilde(~) expansion is not supported. The
        included filename is operating system supported path location.

        A prefix "raw:" disables any normal formatting. The file content is
        included as is.

        The URL can also be a HTTP reference to a remote location, whose
        content is included at the point. In case of remote content or when
        filename ends with extension ".html", the content is stripped in
        order to make the inclusion of the content possible. In picture
        below, only the lines within the BODY, marked with !!, are included:

            ...
            this text               !!
            and more of this        !!

        Examples:

            #INCLUDE-$HOME/lib/html/picture1.html
            #INCLUDE-http://www.example.com/code.html
            #INCLUDE-raw:example/code.html

      Embedding pictures
        #PIC command is used to include pictures into the text

            #PIC picture.png#Caption Text#Picture HTML attributes#align#
                 (1)         (2)          (3)                     (4)

        1.  The NAME or URL address of the picture. Like image/this.png

        2.  The Text that appears below picture

        3.  Additional attributes that are attached inside tag. For , the
            line would read:

                #PIC some.png#Caption Text#width=200 length=200##

        4.
            The position of image: "left" (default), "center", "right"

        Note: The "Caption Text" will also become the ALT text of the image
        which is used in case the browser is not capable of showing pictures.
        You can suppress the ALT text with option --no-picture-alt.

      Fragment identifiers for named tags
        #REF command is used for referring to HTML tag inside current
        document. The whole command must be placed on one single line and
        cannot be broken to multiple lines. An example:

            #REF #how_to_profile;(Note: profiling);
                 (1)             (2)

        1.  The NAME HTML tag reference in current document, a single word.
            This can also be a full URL link. You can get NAME list by
            enabling --toc-url-print option.

        2.  The clickable text is delimited by ; characters.

      Referring to external documents.
        "#URL" tag can be used to embed URLs inline, so that the full link is
        not visible. Only the shown text is used to jump to URL. This
        directive cannot be broken to separate lines,

            #URL
                |          |
                |          Displayed, clickable, text
                Must be kept together

        An example:

            See search engine #URL

    TABLE OF CONTENT HEADING
        If there is heading 1, which is named exactly "Table of Contents",
        then all text up to next heading is discarded from the generated HTML
        file. This is done because program generates its own TOC. It is
        supposed that you use some text formatting program to generate the
        toc for you in .txt file and you do not maintain it manually. For
        example Emacs package *tinytf.el* can be used.

    TROUBLESHOOTING
      Generated HTML document did not look what I intended
        Did you use an editor that inserted TABs, each a single ASCII code
        (\t) instead of 8 spaces? Check your editor's settings and prefer
        writing in all-space format.

        The most common mistake is that there are extra newlines in the
        document. Keep *one* empty line between headings and text, keep *one*
        empty line between paragraphs, keep *one* empty line between body
        text and bullet. Make it your mantra: *one* *one* *one* ...

        Next, you may have put text at wrong column position. Remember that
        the regular text is at column 8.

        If generated HTML suddenly starts using only one font, then you have
        forgotten to close the block. Make it read even, like this:
    
            Code block
                Code block
                Code block
            ;;  Add empty comment here to "close" the code example at column 12
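
        The TAB and empty-line advice in this section can be checked
        mechanically before conversion. The following is a minimal sketch
        and not part of t2html; the function names find_tabs and
        find_extra_blank_lines are hypothetical helpers, and whitespace-only
        lines are treated as empty by assumption.

            ```shell
            # Hypothetical helper: list lines (with line numbers) that
            # contain a TAB character.
            find_tabs() {
                grep -n "$(printf '\t')" "$1"
            }

            # Hypothetical helper: print the line number of every empty
            # (or whitespace-only) line that directly follows another one.
            find_extra_blank_lines() {
                awk 'NF == 0 { if (blank) print NR; blank = 1; next } { blank = 0 }' "$1"
            }
            ```

        Call them as "find_tabs file.txt" and "find_extra_blank_lines
        file.txt"; no output means neither problem was found.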
    
        Headings start with a big letter or number, like in "Heading", not
        "heading". Double check the spelling.
    
    EXAMPLES
        To print the test page and demonstrate possibilities:
    
            t2html --test-page
    
        To make simple HTML page without any meta information:
    
            t2html --title "Html Page Title" --author "Mr. Foo" \
                   --simple --out --print file.txt
    
        If you have periodic post in email format, use --delete-email-headers to
        ignore the header text:
    
            t2html --out --print --delete-email-headers page.txt
    
        To make a page quickly:
    
            t2html --out --print page.txt
    
        To convert page from a text document, including meta tags, buttons,
        colors and frames. Pay attention to switch *--html-body* which defines
        document language.
    
            t2html                                              \
            --print                                             \
            --out                                               \
            --author    "Mr. foo"                               \
            --email     "foo@example.com"                       \
            --title     "This is manual page of page BAR"       \
            --html-body LANG=en                                 \
            --button-prev  previous.html                        \
            --button-top   index.html                           \
            --button-next  next.html                            \
            --document  http://example.com/dir/this-page.html   \
            --url       manual.html                             \
            --css-code-bg                                       \
            --css-code-note '(?:Note|Notice|Warning):'          \
            --html-frame                                        \
            --disclaimer-file   $HOME/txt/my-html-footer.txt    \
            --meta-keywords    "language-en,manual,program"     \
            --meta-description "Bar program to do this that and more of those" \
            manual.txt
    
        To check links and print status of all links along with the http
        error message (most verbose):
    
            t2html --link-check file.txt | tee link-error.log
    
        To print only problematic links:
    
            t2html --link-check --quiet file.txt | tee link-error.log
    
        To print terse output in egrep -n like manner: line number, link and
        error code:
    
            t2html --link-check-single --quiet file.txt | tee link-error.log
    
        To check links from multiple pages and cache good links to separate
        file, use --link-cache option. The next link check will run much faster
        because cached valid links will not be fetched again. At regular
        intervals delete the link cache file to force complete check.
    
            t2html --link-check-single \
                   --link-cache $HOME/tmp/link.cache \
                   --quiet file.txt
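
        The periodic deletion of the cache file can be scripted. The
        following is a minimal sketch and not part of t2html; the function
        name expire_link_cache and the age limit in days are assumptions
        made for illustration.

            ```shell
            # Hypothetical helper: delete the link cache file when it is
            # older than the given number of days.
            # $1 = cache file, $2 = age limit in days
            expire_link_cache() {
                [ -f "$1" ] || return 0
                find "$1" -type f -mtime +"$2" -exec rm -f {} \;
            }
            ```

        For example, "expire_link_cache $HOME/tmp/link.cache 30" run before
        the link check would force a complete check about once a month.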
    
        To split large document into pieces, and convert each piece to HTML:
    
            t2html --split1 --split-name file.txt | t2html --simple --out
    
    ENVIRONMENT
        EMAIL
            If environment variable *EMAIL* is defined, it is used in footer for
            contact address. Option --email overrides environment setting.
    
        LANG
            The default language setting for switch "--language". Make sure
            the first two characters contain the language definition, like
            in: LANG=en.iso88591
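
            To see which two characters such a value yields, the following
            minimal sketch may help. It is not taken from t2html; the
            function name lang_code is a hypothetical helper that only
            illustrates the "first two characters" rule.

                ```shell
                # Hypothetical helper: print the first two characters of
                # a LANG-style value such as en.iso88591.
                lang_code() {
                    printf '%s\n' "$1" | cut -c1-2
                }
                ```

            Running lang_code "$LANG" shows the code that a setting like
            LANG=en.iso88591 would contribute.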
    
    SEE ALSO
        asciidoc(1) html2ps(1) htmlpp(1) markdown(1)
    
      Related programs
        Jan Kärrman has written Perl html2ps which was available 2004-11-11
        at http://www.tdb.uu.se/~jan/html2ps.html
    
        HTML validator is at http://validator.w3.org/
    
        iMATIX created htmlpp which is available from http://www.imatix.com and
        seen 2014-03-05 at http://legacy.imatix.com/html/htmlpp
    
        Emacs minor mode to help writing documents based on TF layout is
        available. See package tinytf.el in project
        https://github.com/jaalto/project--emacs-tiny-tools
    
      Standards
        RFC 1766 contains list of language codes at http://www.rfc.net/
    
        Latest HTML/XHTML and CSS specifications are at http://www.w3c.org/
    
      ISO standards
        639 Code for the representation of the names of languages
        http://www.oasis-open.org/cover/iso639a.html
    
        3166 Standard Country Codes http://www.niso.org/3166.html and
        http://www.netstrider.com/tutorials/HTMLRef/standards/
    
    BUGS
        The implementation was originally designed to work linewise, so it is
        unfortunately impossible to add or modify any existing feature to look
        for items that span more than one line.
    
        As the option --xhtml was added much later, it may not produce
        completely syntactically valid markup.
    
    SCRIPT CATEGORIES
        CPAN/Administrative html
    
    PREREQUISITES
        No additional Perl CPAN modules needed for text to HTML conversion.
    
    COREQUISITES
        If the link check feature is used to validate URL links, then the
        following modules are needed from Perl CPAN: "LWP::UserAgent",
        "HTML::FormatText" and "HTML::Parse"

        If module "HTML::LinkExtractor" is available, it is used instead of
        the included link extracting algorithm.
    
    AVAILABILITY
        Homepage is at https://github.com/jaalto/project--perl-text2html
    
    AUTHOR
        Copyright (C) 1996-2020 
    
        This program is free software; you can redistribute and/or modify
        program under the terms of GNU General Public license either version 2
        of the License, or (at your option) any later version.
    
        This documentation may be distributed subject to the terms and
        conditions set forth in GNU General Public License v2 or later; or, at
        your option, distributed under the terms of GNU Free Documentation
        License version 1.2 or later (GNU FDL).