Documentation and Dissemination of Scientific Results: An HTML approach

Michael Purucker, Raytheon STX and Geodynamics Branch, GSFC

I am using HTML as an alternative model for the dissemination and documentation of scientific results. HTML serves to connect my scientific papers via hypermedia to the underlying computer programs, data sets, images and supporting material that are critical for other investigators who are trying to reproduce or extend my work.

Reproducibility is a practical question in this context. As scientists, we are frequently called on to adapt or reproduce the work of others. One common problem is the loss of the link between the figures and the programs that produced them. HTML provides an easy way of making that link permanent.

A manuscript recently published in Geophysical Research Letters serves as an example of this approach. This paper, Conjugate gradient analysis: A new tool for studying satellite magnetic data sets, was originally written in LaTeX. It was then converted to HTML with a publicly available Perl script called LaTeX2HTML. The script also converts the equations and Greek characters to inlined GIF images.

Hypertext links were then added to the Methods section of the manuscript to connect it to the underlying computer programs that implement the algorithms. The figure captions were then supplemented by links to the programs that produced the figures and to PostScript versions of the figures themselves.
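The caption linking described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used for the paper: the function name caption_with_links and the file names programs/fig1.f and figures/fig1.ps are hypothetical.

```python
from html import escape


def caption_with_links(caption, program_href, postscript_href):
    """Return an HTML paragraph holding the caption text followed by links
    to the program that produced the figure and to its PostScript version."""
    return (
        f"<p>{escape(caption)} "
        f'[<a href="{escape(program_href)}">program</a>] '
        f'[<a href="{escape(postscript_href)}">PostScript</a>]</p>'
    )


# Hypothetical file names, chosen only for illustration.
print(caption_with_links("Figure 1. Example caption.",
                         "programs/fig1.f",
                         "figures/fig1.ps"))
```

Relative links such as these keep working when the whole directory tree is copied to another medium, which matters if the assemblage is to be distributed as a unit.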

Once the manuscript was accepted and all changes had been made, I transferred this hypertext assemblage from the hard drive of my PC, where it still resides, to a CD-ROM using our CD-ROM writer. This CD-ROM, or copies of it, can then be distributed to other researchers wishing to extend or reproduce my work. In this way I have produced a self-documenting work.
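Before such an assemblage is copied to another medium, it may be worth confirming that every relative link still points at a file that exists. The following is a minimal sketch of such a check, assuming the links are ordinary href and src attributes; the names LinkCollector and missing_local_links are hypothetical and are not part of the original workflow.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collect the values of all href and src attributes in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def missing_local_links(html_path):
    """Return the local links in html_path whose target files do not exist."""
    html_path = Path(html_path)
    collector = LinkCollector()
    collector.feed(html_path.read_text(encoding="utf-8", errors="replace"))
    missing = []
    for link in collector.links:
        parts = urlparse(link)
        # Skip external URLs (http:, mailto:, ...) and pure #fragment links.
        if parts.scheme or parts.netloc or not parts.path:
            continue
        target = html_path.parent / unquote(parts.path)
        if not target.exists():
            missing.append(link)
    return missing
```

Calling missing_local_links on the top-level HTML file returns a list of broken relative links; an empty list suggests that the page's figures and programs will travel with it.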

Other workers are also investigating the possibilities of self-documenting works, especially Jon Claerbout of Stanford University. Claerbout has used the phrase 'Reproducible Research' to describe his work.