Friday, July 1, 2011

Documentation ideas

The ideal for documentation creation

I find that the type of documentation I want to write isn't well supported by the tools I've found. In particular, I typically like to write documentation at three levels. First, there should be a high-level overview, defining the overall purpose and ideas behind a module of code. This should be followed by a detailed description of every element in a source file. Finally, I'd like to provide a line-by-line commentary on each function, explaining the particulars of its implementation. This documentation should, as necessary, include equations, diagrams, images, flow charts, videos, etc. Because code changes frequently, a tool should be able to automatically refresh all code snippets and references to names within the code.

I see variants of this approach in use for several types of documentation. I typically write code for other programmers, rather than end users. Therefore, my "users" will be fellow programmers who want to make use of a module I've created. For these users, a high-level explanation of a given header file, followed by a description of each element of the header, provides all the information they need to use the module. For fellow developers working on the module itself, I'd like to present the same high-level overview, this time focusing on the algorithms used to implement the elements declared in the header. This information naturally belongs with the source file the header accompanies. Next, a per-element detailed description of the source file might also be accompanied by a line-by-line analysis of some of the subtle portions of the code. (On a side note, I'd like to use this to develop an updated version of the PIC24 book I co-authored.)

My dream implementation would be seamless: a fully-featured word-processing program in which I can type code or in-line documentation, including snippets of code in explanatory sections as necessary. All documentation would be transparently encoded in the raw source file. No such tool exists, to the best of my knowledge.

Existing tools

There's nothing new under the sun, including this idea. Its best-known formulation, Literate Programming by none other than Donald Knuth, "regards a program as a communication to human beings rather than as a set of instructions to a computer. Your program is also viewed as a hypertext document, rather like the World Wide Web." (from an associated site). While WEB (Knuth's original tool) mixes Pascal code with TeX documentation, a more modern version (CWEB) applies the same process to C. The literate programming site provides additional information on these ideas; several other notable implementations exist (FunnelWeb, Noweb). The practical result (here's a sample of some code) is that a program is written in CWEB syntax (mixed C and TeX), then transformed into either C or TeX, making it painful (IMHO) either to write documentation or to develop code!
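To make the "transformed into C" half of that process concrete, here is a deliberately simplified sketch of a tangle step (the job ctangle does for CWEB and notangle does for noweb), written in Python. It assumes a small subset of noweb's syntax: a line "<<name>>=" starts a code chunk, a line that is just "@" or starts with "@ " starts a documentation chunk, and a code line containing only "<<name>>" (possibly indented) refers to another chunk. The functions parse_chunks and tangle are hypothetical names for illustration; real tools also handle escapes, inline references, cross-referencing, and the weave step that produces TeX.

```python
import re

# Simplified noweb-style syntax (an assumed subset, not the full language):
#   <<name>>=       on its own line starts (or continues) a code chunk
#   @  or  @ text   at the start of a line starts a documentation chunk
#   <<name>>        alone on an (optionally indented) code line is a reference
DEF_RE = re.compile(r"^<<(.+)>>=\s*$")
REF_RE = re.compile(r"^(\s*)<<(.+)>>\s*$")


def parse_chunks(literate_text):
    """Collect code chunks by name; repeated definitions are concatenated."""
    chunks = {}
    current = None  # None while inside documentation
    for line in literate_text.splitlines():
        match = DEF_RE.match(line)
        if match:
            current = match.group(1)
            chunks.setdefault(current, [])
        elif line == "@" or line.startswith("@ "):
            current = None
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name, indent=""):
    """Expand chunk references recursively, keeping each reference's indent."""
    out = []
    for line in chunks[name]:
        match = REF_RE.match(line)
        if match:
            out.extend(tangle(chunks, match.group(2), indent + match.group(1)))
        else:
            out.append(indent + line if line else line)
    return out


doc = """This program prints a greeting.
<<hello.c>>=
#include <stdio.h>
int main(void) {
    <<print greeting>>
    return 0;
}
@ The greeting itself lives in its own chunk.
<<print greeting>>=
puts("Hello, world!");
"""
print("\n".join(tangle(parse_chunks(doc), "hello.c")))
```

Running it prints the five lines of hello.c, with the greeting chunk spliced in at the indentation of its reference. Note that the documentation text never reaches the C output; a matching weave step would be needed to produce the TeX side.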

An opposite approach is to embed documentation in the source code, which simplifies the build process but still requires a translation step to produce documentation. Doxygen (along with similar tools such as JavaDoc), my favorite documentation tool, which I've used for several years, excels at extracting documentation from code and producing a polished, nicely cross-referenced result: the middle level (describing each element) of my documentation hierarchy. However, it has several major flaws, IMHO:
  1. There's no way to directly edit the resulting documentation. I often find a typo or other small correction when browsing through the documentation, which then requires that I dig up the corresponding source, edit it, recompile the docs, and check. This discourages quick fixes.
  2. Writing high-level documentation is painful; the edit-then-compile cycle reminds me of all the evils of LaTeX without any of the conveniences of a LaTeX workflow, such as word wrapping, TeXworks' source-to-output synchronization, or quick compilation.
  3. There's no way to write line-by-line commentary for a detailed look at an algorithm.
  4. Including non-textual media is painful.
  5. Trying to fix syntax errors in the source code documentation tags is painful.
Recently, Python adopted Sphinx and reStructuredText to produce its documentation, with very impressive results. However, this seems a step back from Doxygen, since there's no automatic linking to source code, while it still suffers from all of Doxygen's liabilities. The same is true of other alternatives I've found, such as antiweb.

Proposed solution

So, I'd like to create yet another documentation tool, in the (most likely vain) hope it will have some impact. My ideas:

  1. I'd like to be able to open some source code in a modern, fully-featured word processor, add documentation (images, diagrams, etc.), then save the result (including any changes I made to the code) back to both the source file and its accompanying documentation file.
  2. The program should support documenting only selected portions of the code; for example, I'd typically omit a copyright notice appearing at the top of every file. It should allow adding comments to arbitrary snippets of code, rather than just at the API level (Doxygen's strength), and placing multiple copies of these snippets in arbitrary order within the documentation.
  3. All snippets should be automatically refreshable, reflecting any changes made to the source code. They should follow source code changes such as moving code around, renaming identifiers, etc.
After pondering how I can implement this in as simple a fashion as possible, I've converged on the following design:
  1. Label the start of a snippet with a tag marked by rarely-used delimiters, such as &|tag|&.
  2. Auto-generate these tags when the documentation file is edited and then saved.
  3. Auto-refresh all snippets when the documentation file is opened, by matching each tagged snippet in the source code with the identically tagged snippet in the documentation file.
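The tagging design above can be sketched in a few lines. The actual work is in VBA inside Word; this Python version only illustrates the logic, and its details are assumptions: the tag presumably sits inside a source comment (e.g. "// &|init|&"), a snippet runs from its tag to the next tag or the end of the file, and the Word document is modeled as a plain list of prose strings and SnippetRef objects. The names split_snippets, SnippetRef, new_tag, refresh, and the "snippet1, snippet2, ..." naming scheme are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed tag syntax: &|name|& anywhere on a line (e.g. inside a comment)
# marks the start of a snippet, which runs until the next tag or end of file.
TAG_RE = re.compile(r"&\|(\w+)\|&")


def split_snippets(source):
    """Divide source text into {tag: snippet text}.

    Lines before the first tag (e.g. a copyright notice) belong to no snippet,
    and the tag line itself is not part of the snippet it starts.
    """
    snippets = {}
    current = None
    for line in source.splitlines(keepends=True):
        match = TAG_RE.search(line)
        if match:
            current = match.group(1)
            snippets[current] = ""
        elif current is not None:
            snippets[current] += line
    return snippets


@dataclass
class SnippetRef:
    """A (possibly stale) copy of a tagged source snippet inside the docs."""
    tag: str
    text: str


def new_tag(existing, prefix="snippet"):
    """Step 2: choose an unused tag name such as snippet1, snippet2, ..."""
    n = 1
    while f"{prefix}{n}" in existing:
        n += 1
    return f"{prefix}{n}"


def refresh(doc_items, source):
    """Step 3: update every SnippetRef from the current source text.

    Returns (refreshed items, tags no longer found in the source). Prose items
    pass through unchanged; a snippet whose tag is missing keeps its stale copy.
    """
    current = split_snippets(source)
    refreshed, missing = [], []
    for item in doc_items:
        if isinstance(item, SnippetRef):
            if item.tag in current:
                item = SnippetRef(item.tag, current[item.tag])
            else:
                missing.append(item.tag)
        refreshed.append(item)
    return refreshed, missing


# Example: the documentation holds outdated copies of two snippets.
source = """// Copyright notice: never documented
// &|init|&
int count = 0;
// &|bump|&
void bump(void) { count += 2; }
"""
doc = [
    "The counter starts at zero:",
    SnippetRef("init", "int count;\n"),
    "Each call advances it:",
    SnippetRef("bump", "void bump(void) { count++; }\n"),
]
doc, missing = refresh(doc, source)
# doc[1].text is now "int count = 0;\n" and missing == []
```

Because the documentation locates each snippet by its tag rather than by line number, a snippet follows its code when that code moves within the file, and anything before the first tag (such as a copyright notice) is simply never documented. A tag that has disappeared from the source is reported rather than silently dropped, leaving the stale copy in place for the author to fix.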
I've chosen Microsoft Word as the word processor and begun writing code in VBA (Visual Basic for Applications), Word's macro language. I've found little good documentation on the language: the built-in help is poor, MSDN is lacking in many areas, and even searching the web produces mediocre results. I may purchase a book to help. I haven't found a good unit-testing framework for Word; a framework I found for Excel seems fairly tied to that platform.

So far, I've written code that divides a source file into named snippets; not bad progress, but there's much more to do. I should probably next:
  1. Write unit tests, which I should have done first.
  2. Create a good, high-level document to describe all this in more detail.
