`mkhtmlindex`, a Hypertext File Index Generator

15 October 2000

The current version is 3.0 (a release version)

contents
1 What's New?
2 What is it?
3 Why?
4 How does it work? How would I use it?
5 Where can I get Documentation?
6 How do I get it, and how much does it cost?

1 What's New?

1.1 15 October 2000

big change under the hood, not many user-noticeable changes, though.

removed the SString/QString crap, replaced with ANSI C++ string class.
cleaned up and simplified some file creation logic.
got rid of the silly spinner --- it actually slowed down execution so you could watch the spinner. how dumb of me.

and that's about it.

1.2 14 April 1999

maintenance release

fixed a subtle bug in the htmlfile parser that caused segfaults when reading files with http-equiv meta tags.
updated (actually, re-created) msvc project files.
cleaned out a lot of trash from the distro, and fixed some makefiles.
compiles cleanly and works fine on Linux (RH5.2, GCC) and Win32 (NT4.0, MSVC6).

1.3 5 February 1999

Some not-so-major changes, and one big one.

updated the advertisement url (it pointed to a dead site)
converted the HTML class to use a flex parser. i can't really tell if there was much of a speed improvement (since what it did before was very simplistic), but the code is simpler and the parsing much more accurate.
added a short manpage.
changed the internals to use a linked list rather than an asinine array of pointers, which completely removed the rather stupid compile-time ``max number of files'' limit. the maximum number of files you can index is now limited by virtual memory and your patience for letting the program run.
still haven't updated much documentation... basically because the undocumented features are going to change rather radically.
a flex parser for template files is in the planning stages, which means that the template syntax will become much more powerful.
removed the rather useless and incomplete qtfrontend, since i don't use it anyway. (mkhtmlindex is much more powerful when you put it into an update script...) maybe a new one will crop up in the future, but i'm not planning it.

1.4 26 July 1998

The re-write is complete. Doubtless version 2.0 still has bugs (and probably needs more woodshedding before it earns the name 2.0), but my available time for development has come to an end, and this is good enough.

...

1.5 14 July 1998

The source for versions 0.x through 1.3 was so convoluted and un-architected that i've undertaken a complete rewrite, mostly from scratch. this version is larger, but runs faster and is more reliable.

...

1.6 18 June 1998

I'm releasing version 1.0 of my program to the world. I haven't considered a license, but you may assume something like GPL. Enjoy.

2 What is it?

mkhtmlindex is a program to simplify my web publishing. It automagically generates a hypertext file containing descriptive links to each hypertext file in a directory.

It does some nifty things with template files, and is scriptable. see the explanation section below for a better explanation (oddly enough...).

Right now, mkhtmlindex is a command-line program only. I'm developing it in C++ on a Linux system, but am trying to make it as portable as possible. So far it has compiled cleanly on any system using GNU's C compiler (GCC), including the Cygnus cygwin32 setup, as well as with Visual C++ 4.0/5.0/6.0.

3 Why?

I have a couple of websites which are just large collections of files in a directory, with a file that indexes them with links and such. Maintaining this file gets to be a real hassle when the number of files exceeds about thirty or so... and so I undertook to write a program to automate the process. I'm sure something like it exists on the web, but I couldn't find one that did exactly what I wanted.

I've tailored the program's behavior mostly to suit the needs of my humor page: sorting files by date (newest on top), separating them by category or date (eventually to be ``category and date''), printing the link with a pretty format (the meta info), et cetera. This has proven useful also for the University of Kentucky IEEE hardware contest page, on which the use of this program (hopefully) would make me (the maintainer) obsolete.

I envision that this program should be mildly useful to anyone who maintains a page that is just a list of links to other files.

4 How does it work? How would I use it?

The point of the program is to generate a hypertext file that contains a list of links to other hypertext files, an index of sorts. When you start the program, it scans the current directory (or, eventually, an arbitrary directory you specify) for files with the *.html or *.htm extensions. It creates for itself a list of all of these files, and reads the html title and several meta strings from each one. Then it creates a list of hypertext links (using the page title as the link text), and then exits. (``that doesn't sound so bad...'')

The program searches each *.html or *.htm file for the <title> tag --- if this tag is in the file, then the program will add a link to it in the page, using the page's title (from the title tag) as the link text. If that <title> tag is empty, the link text will be the filename.
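
The scan-and-link step can be sketched in a few lines. This is a rough illustration in Python (the program itself is C++), not the actual implementation. The names `link_text` and `build_index` are made up for the example, the title is pulled out with a simple regular expression instead of a real parser, and the sketch assumes that a file with no <title> tag at all falls back to its filename just like one with an empty tag.

```python
import re
from pathlib import Path

# Matches the contents of the first <title> element, in any letter case.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def link_text(html_source, filename):
    """Return the page title, or the filename when the title is missing or empty."""
    match = TITLE_RE.search(html_source)
    title = " ".join(match.group(1).split()) if match else ""
    return title or filename


def build_index(directory, skip="index.html"):
    """Build an unnumbered list of links to the *.html and *.htm files in a directory."""
    items = []
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() not in (".html", ".htm") or path.name == skip:
            continue
        text = link_text(path.read_text(errors="replace"), path.name)
        items.append(f'<li><a href="{path.name}">{text}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Calling build_index(".") returns the whole list as one string. The sketch skips index.html so that an old output file does not end up linking to itself.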

4.1 meta Titles

Alternatively, you can use <meta> information in each source file as the link text (and eventually even as sorting keys). Words enclosed in vertical bars become link text, words in asterisks become bold (strong), and words enclosed in underscores become italicized (emphasis). To wit, the line:

<meta name="description" content="isn't |this| _really *cool*_?!?">

in a file called foo.html will result in this list item:

<li>isn't <a href="foo.html">this</a> <em>really <strong>cool</strong></em>?!?</li>

which looks like this:

isn't this really cool?!?
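
The marker rules are simple enough to sketch with three substitutions. This is only a rough Python illustration, assuming the markers are always properly paired; `render_meta_title` and `list_item` are hypothetical names, not functions from the source, and the real parser may well handle odd input differently.

```python
import re


def render_meta_title(content, href):
    """Expand *strong*, _emphasis_ and |link| markers into HTML."""
    text = re.sub(r"\*(.+?)\*", r"<strong>\1</strong>", content)
    text = re.sub(r"_(.+?)_", r"<em>\1</em>", text)
    # The link goes in last so that underscores or asterisks in href are left alone.
    return re.sub(
        r"\|(.+?)\|",
        lambda match: f'<a href="{href}">{match.group(1)}</a>',
        text,
    )


def list_item(content, href):
    """Wrap the expanded description in a list item."""
    return f"<li>{render_meta_title(content, href)}</li>"
```

Feeding it the description from the example line, with foo.html as the href, gives a list item with the link, emphasis and strong tags nested in the same way.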

4.2 list styles

You aren't restricted to using unnumbered lists (<ul></ul>); there are options for choosing several different styles:

	Unnumbered	<ul><li> [link text] </li></ul>
	Numbered	<ol><li> [link text] </li></ol>
	paragraph-based	<div><p> [link text] </p></div>
	``new!''	<ul><li><em>---NEW!---</em> [link text]</li></ul>
	custom	(as of 2.0 this works only in templates)

The custom liststyles enable you to do spiffy stuff... (more later)
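
One way to picture the built-in styles is as four strings each: the tags that open and close the list, and the tags that open and close an item. The Python sketch below is only an illustration of that idea. `ListStyle`, `BUILTIN_STYLES` and `render_list` are invented names, and the style keys are assumed to match the keywords used in template tags.

```python
from typing import NamedTuple


class ListStyle(NamedTuple):
    list_open: str
    list_close: str
    item_open: str
    item_close: str


# Tag sets taken from the table of list styles above.
BUILTIN_STYLES = {
    "unnumbered": ListStyle("<ul>", "</ul>", "<li>", "</li>"),
    "numbered": ListStyle("<ol>", "</ol>", "<li>", "</li>"),
    "paragraph": ListStyle("<div>", "</div>", "<p>", "</p>"),
    "new": ListStyle("<ul>", "</ul>", "<li><em>---NEW!---</em> ", "</li>"),
}


def render_list(link_texts, style="unnumbered"):
    """Wrap already-formatted link texts in the tags of the chosen style."""
    chosen = BUILTIN_STYLES[style]
    lines = [chosen.list_open]
    lines += [chosen.item_open + text + chosen.item_close for text in link_texts]
    lines.append(chosen.list_close)
    return "\n".join(lines)
```

Seen this way, a custom style is just a fifth set of four strings supplied by the user.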

4.3 other meta info

It is also possible to use <meta> information in each source file as filters and sorting keys. As of release 1.1, mkhtmlindex can sort by ``category'' when using a template file. Release 2.0 supports some extensive sorting and filtering capability with template files (see below).

4.4 other page characteristics

You can specify the page title (the text that appears as the heading and in the generated file's <title> tags). The program also supports that marvellous invention, Cascading Style Sheets. You can use an internal stylesheet (configurable at compile time), or specify an external stylesheet's URL. Or if you really hate the look of the internal page, you can use a template file of your own design.

4.5 template files

With mkhtmlindex's file templates you can generate much more specialized html files. Basically, the template file is any html file containing some special, custom tags. For example:

<html> 
<head> 
        <title>This is my template</title> 
</head> 
<body>

Here is a list of some files: 
<!insertlist>

Here is another list, numbered: 
<!insertlist numbered>

Here is yet another list, using meta info as the titles: 
<!insertlist meta>

Here is yet another list, using meta info as the titles, unnumbered,
and containing only files which contain a tag <meta name="category"
content="foo">: 

<!insertlist unnumbered meta category="foo">

Here is a list containing only files whose meta date is newer than
27-1-1998: 
<!insertlist unnumbered meta 
             newerthan="27-1-1998" 
             liststyle="newstyle.sty"> 
The custom liststyle in the above tag is supplied in the file "newstyle.sty".

<hr noshade> Here is an ad for this program: 
<!insertcredits>
</body> 
</html>

The above example showcases plenty of features. Read on for descriptions.

As you can see, the template file is just your ordinary, run-of-the-mill HTML file. The interesting part is the specialized comment tags (the old-style, single-tag comment). Currently, you can do two things, which correspond to two template tags --- inserting a list, and inserting the credits (an ad or banner for the program). The tag is replaced by the credits or the list, or whatever, in the output file.
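
The replacement step can be sketched as a single regular-expression substitution. This Python fragment is a hedged illustration only: `expand_template` and its two callback parameters are hypothetical, and the pattern assumes that no option value contains a `>` character.

```python
import re

# An old-style comment tag: <!insertlist ...> or <!insertcredits ...>, any case.
TAG_RE = re.compile(r"<!(insertlist|insertcredits)\b([^>]*)>", re.IGNORECASE)


def expand_template(template, render_list, render_credits):
    """Replace each template tag with the text produced by the matching callback."""

    def replace(match):
        name = match.group(1).lower()
        options = match.group(2).strip()
        if name == "insertlist":
            return render_list(options)
        return render_credits(options)

    return TAG_RE.sub(replace, template)
```

Everything outside the two tags passes through untouched, which is why any ordinary HTML file can serve as a template. Each callback receives the raw option text of its tag and returns the HTML to put in its place.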

4.5.1 credits tag syntax

The credits tag is the simpler of the two. There are only two options for the credits: show the date of file creation, and show the url of the program. This batch of credits appears in its own <div> and <p>.

Here's the syntax:

<!insertcredits [showdate] [showurl]>

The text in square brackets ([]) is optional. Note that the tag is case-insensitive, and the options can come in any order, with (just about) any amount of whitespace in between (but try to stay under 256 characters per tag --- otherwise, the program may barf, segfault, or something else nasty).

4.5.2 list tag syntax

This one has plenty of options, but the basic form works very well. To insert a vanilla list of all the files in the current directory (at this point in the file), use:

<!insertlist>

The options can come in any order, and are case insensitive. (quoted arguments, however, are taken literally.) In essence, you want to describe the setup as follows:

<!insertlist [liststyle] [use meta titles?] [sort key] [sort direction] [filter key]>
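
Since the options can appear in any order, a reader of the tag has to split them into bare keywords and key/value pairs before interpreting anything. The Python sketch below shows one way that might look. It is not the code in TemplateFile.cpp; `parse_options` is a hypothetical helper, and it assumes every value is enclosed in double quotes.

```python
import re

# Either key="value" / key!="value", or a bare keyword such as "numbered".
OPTION_RE = re.compile(r'(\w+)(!?=)"([^"]*)"|(\w+)')


def parse_options(options):
    """Split the text inside an insertlist tag into keywords and key/value pairs."""
    keywords, pairs = [], []
    for key, operator, value, keyword in OPTION_RE.findall(options):
        if keyword:
            keywords.append(keyword.lower())  # keywords are case-insensitive
        else:
            pairs.append((key.lower(), operator, value))  # value is kept literally
    return keywords, pairs
```

The keywords come back lowercased, while quoted values keep their original case, matching the rule that quoted arguments are taken literally.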

List styles

The accepted/understood values for liststyles are:

unnumbered
numbered
paragraph
new
liststyle="filename"

The file supplying the custom liststyle follows this format:

first line: tag that opens the list
second line: tag that closes the list
third line: tag that opens a list item
fourth line: tag that closes a list item
lines five through whatever are completely ignored. (the routine that reads the file calls getline() four times, then closes the file. if you leave any of the first four lines blank, it will simply read in a null string, making that tag empty.)

The tags can be any string of characters. It will be inserted verbatim into the output file. For example, say you want the list to be in one centered paragraph, with a little image called bullet.gif as the bullet. Here's a file that would do the trick:

<p align=center>
</p>
<img src="bullet.gif" alt="o" height=10 width=10> &nbsp; 
<br>

Note that there's an extra space at the end of the third line. This will be included verbatim. Just try it out, and you'll get the hang of it.
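
The four-line format is easy to mimic. Here is a small Python sketch of the reading step, written to behave the way the description above says (four reads, everything else ignored, missing lines turned into empty tags). `read_liststyle` is a made-up name, and the real routine is C++ and may differ in details such as character encoding.

```python
def read_liststyle(path):
    """Read the four tag lines of a custom liststyle file; the rest is ignored."""
    with open(path, encoding="utf-8") as handle:
        # readline() returns "" at end of file, so missing lines become empty tags.
        # Only the newline is stripped; trailing spaces are kept verbatim.
        tags = [handle.readline().rstrip("\n") for _ in range(4)]
    list_open, list_close, item_open, item_close = tags
    return list_open, list_close, item_open, item_close
```

Because only the newline is removed, the trailing space after &nbsp; in the example file survives into the output.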

use meta titles?

Simply type meta to use meta titles (if they exist in the files). This option is off by default, so just leave it out to turn it off.

sort key

This selects the attribute by which to sort the list. Accepted values:

sortbydate
(more will be added in the future, with a syntax along the lines of sort="attribute")

The default is to sort by filename.

sort direction

Defaults to forward. Accepted values are sortreverse and sortforward.
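
Put together, the sort key and the sort direction amount to choosing one key function and one flag. A minimal Python sketch, under the assumption that ``forward'' means ascending order (the text does not say which way forward runs for dates); `Entry` and `sort_entries` are names invented for this example.

```python
from datetime import date
from typing import NamedTuple


class Entry(NamedTuple):
    filename: str
    meta_date: date


def sort_entries(entries, keywords):
    """Sort by filename (the default) or by date, forward or reverse."""
    if "sortbydate" in keywords:
        key = lambda entry: entry.meta_date
    else:
        key = lambda entry: entry.filename
    return sorted(entries, key=key, reverse="sortreverse" in keywords)
```

Only one key is ever applied, so entries that share a date are not ordered by filename among themselves.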

filter key

Basic syntax:

attribute="string" --- only files whose attribute matches string are included
attribute!="string" --- files whose attribute matches string are excluded

Accepted values:

category="string" --- all files having category string
category!="string" --- all files not having category string
newerthan="datestring" --- all files newer than datestring
olderthan="datestring" --- all files older than datestring

(datestring is of the format dd-mm-yy or dd/mm/yy.)
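
To make the filter rules concrete, here is a hedged Python sketch of a date parser and a single-filter test. Several things in it are guesses rather than documented behavior: two-digit years are taken to mean 19yy, four-digit years (as in the template example) are accepted as they are, and ``newer'' is read as strictly later. `parse_datestring` and `passes_filter` are hypothetical names, and each file's meta info is represented as a plain dictionary.

```python
import re
from datetime import date


def parse_datestring(text):
    """Parse dd-mm-yy or dd/mm/yy; a four-digit year is accepted too."""
    day, month, year = (int(part) for part in re.split(r"[-/]", text.strip()))
    if year < 100:
        year += 1900  # assumption: two-digit years mean 19yy
    return date(year, month, day)


def passes_filter(meta, key, operator, value):
    """Apply one filter to one file's meta info (a dict with a 'date' entry)."""
    if key == "newerthan":
        return meta["date"] > parse_datestring(value)
    if key == "olderthan":
        return meta["date"] < parse_datestring(value)
    matches = meta.get(key) == value
    return matches if operator == "=" else not matches
```

A file is kept when passes_filter returns True for the one filter given in the tag.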

4.6 sorting

to be written...

In version 2.0, sorting really only works from template files...

Unfortunately, there can be only one sorting algorithm in effect at a time. In other words, you can choose to sort either by filename or by date, but you can't have all files with the same date sorted by filename. This is due to some limitations with the internal architecture of the engine... and should change at some point in the future.

4.7 filtering

filtering only works from template files...

like sorting, the current filtering scheme is limited to one filter, such as newerthan or category. i would like to be able to filter out all files in a certain category that are newer than a certain date, but this will require some relatively large changes in the code and i'm too lazy to do it just yet. this will change in the future.

currently available filter options (as of 2.1.1):

`category=`text	include only files whose category is text
`category!=`text	exclude files whose category is text
`author=`text	include only files whose author is text
`author!=`text	exclude files whose author is text
`newerthan=`datestring	include only files newer than datestring
`olderthan=`datestring	include only files older than datestring

these came straight from TemplateFile.cpp; eventually i'll change this to read the template files with a flex parser, and the template tag syntax will be much richer...

to be completed...

4.8 custom liststyles

to be written...

In version 2.0, custom list styles really only work from template files...

4.9 help -- the arguments, etc

here's the usage message from version 2.0:

$ mkhtmlindex --help 
Usage: mkhtmlindex [ options ]
   -h --help      print this message 
   -v --verbose   verbose mode (writes messages to stderr) 
   -q --quiet     suppress everything (implies --overwrite) 
   -V --version   print version information 
   -f --overwrite force overwrite of output file 
   -              write output to stdout 
   -o <filename>  use <filename> as the output file 
   -u             write unnumbered lists (default) 
   -n             write numbered lists 
   -p             write lists in a paragraph style 
   -d --date      show the date of file generation 
   -m --usemeta   use meta description string for the title 
                  fall back is html title, then filename. 
   -T --title <string> 
                  use <string> as the output file's header and title 
   -s --stylesheet [<url>] 
                  use a Cascading Style Sheet to format the output file. 
                  if <url> is specified, it is included as an externally 
                  <link>ed sheet. It may be either a local file or a 
                  fully-qualified URL. 
   -t --template <filename> 
                  use <filename> as a template for the output file 
   -i[=<filename>] -i --ignoreold[=<filename>] 
                  do not include a list item for "index.html" or file 
                  specified by the optional <filename> (this is the default) 
   -I --no-ignore do NOT ignore old output files.

           http://www.asofyet.org/muppet/software/mkhtmlindex.html

There are indeed ``undocumented features,'' but since they're mostly buggy, they shall stay undocumented. If you're really curious, read the source code.
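
Since the program is most useful inside an update script, here is a rough Python sketch of one. It uses only options listed in the usage message above (-q, -T, -t and -o) and assumes each option's argument is passed as a separate word. The helper names and the example filenames are invented, and mkhtmlindex is assumed to be on the PATH.

```python
import subprocess


def build_command(output, template=None, title=None, quiet=True):
    """Assemble an mkhtmlindex command line from options in the usage message."""
    command = ["mkhtmlindex"]
    if quiet:
        command.append("-q")  # per the usage message, implies --overwrite
    if title is not None:
        command += ["-T", title]
    if template is not None:
        command += ["-t", template]
    command += ["-o", output]
    return command


def update_index(directory, **options):
    """Run mkhtmlindex inside the directory, since it scans the current directory."""
    return subprocess.run(build_command(**options), cwd=directory, check=True)
```

A call such as update_index("humor", output="index.html", template="index.tmpl") would then regenerate that directory's index from its template.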

5 Where can I get Documentation?

You're reading it.

The usage message (mkhtmlindex --help) and the source code are also rather helpful.

6 How do I get it, and how much does it cost?

It's free. I don't hold much stock in paying for software (it's ones and zeroes, for cryin' out loud!), and i'd rather not be obligated to take care of it on account of making (very little) money from it, anyway. You can get the current version of mkhtmlindex from my homepage (i.e., http://www.asofyet.org/muppet/software/mkhtmlindex/)... I don't have all that much disk space on the server, so the source distribution is the de facto standard. If you use a platform that I use, there might be a binary for you.

Otherwise, just mail me (scott arrington) at scott at asofyet dot org.
