LaTeX typesetting as a programming language

LaTeX is a system for typesetting documents that replaces the notion of What-You-See-Is-What-You-Get with standard ideas from conventional programming languages. Mark Harman demonstrates the power of this inheritance.

Anyone who has used a What-You-See-Is-What-You-Get (WYSIWYG) editor, word processor or DTP package will probably have had two frustrations:

WYSIWYG always seems to be a lie – What you see is rather similar to what you get, or what you see is almost always what you get, or what you see would be what you got had your printer had the right fonts, but what you see is seldom exactly what you get.

WYSIWYG is restrictive – Suppose you can’t get to see what you want? What you see is what you get has the additional, and tacit, implication that what you cannot manage to see you also cannot get. How often have you wanted something to look slightly different (or perhaps very different) but your editor would not allow it?

LaTeX (pronounced Lateck) is a typesetting system which does not follow the WYSIWYG approach. Instead, it is inspired by programming languages. It inherits all the advantages of programming languages and some of their disadvantages. Instead of composing a document ‘at the screen’ (the WYSIWYG approach), a LaTeX document is a program which tells the LaTeX system how to create the document. The program is compiled using a LaTeX compiler to produce a document which can be printed or viewed.

This may sound a little odd to someone familiar with the WYSIWYG approach, but anyone who enjoys (or appreciates) the power and flexibility of a high-level programming language will soon find that LaTeX is simply a better way of designing documents.

In this article, I will explain a little of the LaTeX language, enough to allow you to download a free LaTeX system and to write some normal documents. There will not be time to cover all of the features of LaTeX (this would take a whole book), but I hope to leave you with a strong feeling for ways in which writing a document could, using LaTeX, be a similar activity to writing a program.

A simple LaTeX source file

Figure 1 contains a simple LaTeX source file. The first line is a predefined LaTeX command. All commands start with a backslash character. The command in the first line establishes the global properties of the document to be typeset. The document style ‘article’ is the style used for a short article. Other styles include book, report, thesis and so on. Each document style changes global parameters which describe the layout of your document. For example, in the book style, running headers are produced giving the chapter title and author on alternate pages.

As in all good programming languages, all of this is, of course, entirely configurable. However, as in most programming languages, the more flexibility you want, the more you will find you need to know of the underlying language. Fortunately the default settings for all of the LaTeX environments give very pleasing results, so it is possible to get a long way without having to know too much of the underlying language. What you get is likely to be what you want, and if it is not, then at least you will be able to change it.

The text of the document itself is contained within the commands \begin{document} ... \end{document}. \begin{} and \end{} are commands which open and close an environment. All documents (and fragments of documents) are typeset within an environment. We can also nest environments, as we shall see later.

Space characters are unimportant to LaTeX; one space is as good as one hundred. New lines can also be inserted anywhere, but two or more new lines in a row (that is, one or more blank lines) denote the point at which one paragraph finishes and another starts. When you print out the final document, LaTeX will left and right justify the text (inserting hyphens where pure justification would lead to unattractive output).
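The short source file below is a minimal sketch of these spacing rules. It is my own illustration rather than one of the numbered figures, and it uses the LaTeX2e form \documentclass where the figures use the older \documentstyle.

```latex
\documentclass{article}
\begin{document}
These     words   are
separated by irregular spacing
and single line breaks, yet they are typeset as one paragraph.

The blank line above starts a second paragraph.



Several blank lines in a row still start only one new paragraph.
\end{document}
```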

A large document usually consists of a number of sections, which may contain subsections. A new section is introduced into LaTeX using the \section command, and a new subsection with the \subsection command. The LaTeX source code in Figure 2 describes a document with two sections, the titles of which will be ‘introduction’ and ‘rationale’. Notice that, in the source code we do not need to give the sections a section number. LaTeX will do this for us when it compiles the document. Thus, ‘introduction’ will be section number 1 and ‘rationale’ will be section number 2. If we were to swap the order in which the two sections occur (by cutting and pasting the source code), then ‘rationale’ will become section 1, while ‘introduction’ will become section 2 – automatically.

An important question now arises: How would I cross-reference from one section to another? For example, suppose I want to refer to the section ‘introduction’ in the section ‘rationale’. The way that this is achieved illustrates the first advantage we gain from the LaTeX way of doing things.

Symbolic references

Because a LaTeX source file is a program, you can use symbolic names to refer to parts of the document. This makes cross-referencing a pleasure, as the cross-reference is a logical entity, referring to some named part of the document. If this named part of the document should be moved, then all we have to do is recompile.

To introduce a symbolic reference, we use the \label{} command, and to refer to it we use the \ref{} command. Figure 3 illustrates this. \label{intro} introduces a symbolic name, intro, whose value depends upon the context in which the \label command appears. In this case, since the \label command is used in the first section of the document, the value assigned to ‘intro’ will be 1. The \ref{} command simply produces the value of the label. Now if I move the introduction to a new position, for instance, after the ‘rationale’ section, the value of intro will change to 2, and the cross reference in ‘rationale’ will thus point to the new location of the section ‘introduction’.

This style of writing forces us to think of the document at a logical level rather than at the physical level. It would be foolish to write ‘as we saw earlier in section \ref{intro}’ for example, because we may move the label ‘intro’ to a point after the reference. Instead of thinking of our document as a monolith of text occurring in a particular order, we think of it at a higher level of abstraction, as a collection of sections which we are free to move around. We can even reuse sections from one document in another, and provided our symbolic names are unique, we will find that all the cross references work out correctly.
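As a sketch of what moving a section looks like in practice, here is the source of Figure 3 with its two sections swapped (written with the LaTeX2e \documentclass, which is my substitution for the \documentstyle used in the figures). Nothing else has been edited, yet \ref{intro} now yields 2 instead of 1. LaTeX resolves labels from an auxiliary file written during the previous run, so the source normally needs to be compiled twice before the new number appears.

```latex
\documentclass{article}
\begin{document}
\section{Rationale}
% The reference now points forwards, to a section that comes later.
A brief introduction to this document can be found in section~\ref{intro}.

\section{Introduction}
\label{intro}
This is a fairly short document and this is its introduction.
\end{document}
```

The tie (~) before \ref is an optional refinement: it stops a line break from separating the word ‘section’ from its number.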

Some more environments

LaTeX has lots of useful predefined environments. Suppose we want to produce a sequence of points using bullets. We can do this with the ‘itemize’ environment. The source code in Figure 4 will produce a document which lists the three principal states of matter, one per line and each preceded by a bullet mark. In many respects the LaTeX way of designing a document is similar to the HTML way of doing things. For example, the itemize environment is rather like the unordered list in HTML.

Sometimes we want to put items into an ordered, numbered list. This is achieved with the enumerate environment. Figure 5 shows a nested sequence of enumerated items, describing the four eras of geological time and the periods within them. LaTeX uses different numbering systems for each level of nesting (Arabic numerals for level one, alphabetic characters for level two, roman numerals for level three). This, as with everything else, can be changed if we so wish.
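One way of changing the numbering is to redefine the commands that print the counters belonging to each level. The sketch below is an illustration of my own and not one of the numbered figures. It assumes the standard article class, in which the first two levels use the counters enumi and enumii, and it chooses capital Roman numerals for level one and Arabic numerals for level two purely as an example.

```latex
\documentclass{article}
% Level one: I., II., III. instead of 1., 2., 3.
\renewcommand{\theenumi}{\Roman{enumi}}
\renewcommand{\labelenumi}{\theenumi.}
% Level two: 1), 2), 3) instead of (a), (b), (c)
\renewcommand{\theenumii}{\arabic{enumii}}
\renewcommand{\labelenumii}{\theenumii)}
\begin{document}
\begin{enumerate}
\item Mesozoic
  \begin{enumerate}
  \item Cretaceous
  \item Jurassic
  \item Triassic
  \end{enumerate}
\item Paleozoic
\end{enumerate}
\end{document}
```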

To emphasise a portion of text, it is enclosed in the ‘em’ (emphasise) environment, so we simply write ‘\begin{em} help! \end{em} she cried.’ to emphasise the word ‘help’ (and the exclamation mark which follows it).

Procedures

In a conventional programming language, the ability to define procedures gives the programmer considerable flexibility. In LaTeX too, we can define procedures for laying out text. The simplest form of procedure is a parameterless one. It allows us to name some portion of source code and then call it up. Suppose I am writing a document in which I want to refer to an item of fruit, but I haven’t yet decided whether it is to be an apple, orange or pear. I could introduce a procedure called ‘fruit’, and put an arbitrary fruit name in its body. When I’ve finally decided which fruit I want to refer to, then I will only have to change the body of the procedure; all the points at which the procedure is called will then automatically take account of the change to its body.

In LaTeX, a procedure is called a command, and a new one is created with the command \newcommand. Commands are often called macros because LaTeX expands calls to these when it meets them in the body of the document. Figure 6 illustrates the use of a simple parameterless macro. When compiled, the LaTeX source in this figure produces the text ‘The first apple to appear will be the first apple I shall eat.’

To be entirely fair, this could be achieved, perhaps more simply, with a WYSIWYG wordprocessor, simply by performing a search and replace. (Of course, this would not have worked had the sentence been ‘the first \fruit to appear is the apple of my eye’!) This is, however, only a simple example of what we can do with LaTeX macros. They really come into their own when we provide them with parameters.

Parameters

Suppose I am writing a document about array handling. I might want to describe an algorithm for finding the largest element of an array. To make the document more generic and to save retyping large sections of it, I could produce two versions, each specific to a particular programming language, for example, Basic and C. Using commands, I can avoid using the particular syntax of arrays, or at least I can capture the syntactic differences in a single command, making it far easier to adapt my document to different programming languages.

Figure 7 illustrates this. In the command definition for \lookup, the [2] tells the LaTeX compiler that the command takes two parameters; the first is referred to as #1 and the second as #2. In a call to a command, the parameters are supplied one after the other in curly brackets. So the call \lookup{S}{2} will produce the text ‘S(2)’. This is the Basic version of the \lookup command. If we replace it with the version in Figure 8, then we obtain the same document, but with array references in square brackets. This is the C version of the document. Notice that the difference between the two LaTeX source documents is precisely two characters, namely, the two characters which make up the difference between array referencing in Basic and C.

As with programming language procedures, it is possible to call one procedure from the body of another and to use the result of a procedure call as the actual parameter to another. So, for example, we can write \lookup{A}{\lookup{B}{1}} which produces either the text ‘A(B(1))’ or ‘A[B[1]]’ depending upon whether we are using the Basic or the C version of the \lookup command.
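A nested call can be tried out with a source file as small as the following sketch. It uses the Basic form of \lookup, written here without the padding spaces that appear in the body of the definition in Figure 7, and the LaTeX2e \documentclass where the figures use \documentstyle.

```latex
\documentclass{article}
\newcommand{\lookup}[2]{#1(#2)}   % Basic-style array reference
\begin{document}
% The inner call is passed, unexpanded, as the second parameter of the outer call.
The element \lookup{A}{\lookup{B}{1}} is found by using
\lookup{B}{1} as an index into the array A.
\end{document}
```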

Variables

LaTeX has its own variables, on which we can perform simple arithmetic (more advanced forms of arithmetic are possible, but addition is usually all that is required for typesetting). I will look at two simple examples of the way in which we might use variables, both of which will be familiar to programmers; the counter variable and the flag variable.

Suppose we want to include a sequence of numbered points in a document. We can use a counter variable to number each point, and write a few simple commands to control the numbering. Figure 9 illustrates this. The counter is declared using the command \newcounter. It is set to a specific value using the command \setcounter. The command \point is used to print out the current point number and to step the counter (to add one to its value). The command \the<name>, for some counter <name>, causes the value of the variable to be printed. This command can be used with any counter, not just those that the user has introduced, so for example \thesection prints out the current value of the section counter. In Figure 9 we use the \point command to print out three points. A nice feature of this approach is that we can vary the order in which the points occur and the numbering will change accordingly.
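The sketch below extends the idea of Figure 9 a little, to show the simple arithmetic mentioned above. It is my own illustration: \addtocounter adds a given amount to a counter, while \arabic and \roman print a counter in a chosen numbering style.

```latex
\documentclass{article}
\newcounter{pointnumber}
\setcounter{pointnumber}{1}
% Print the current point number, then add one to the counter.
\newcommand{\point}{Point \thepointnumber\stepcounter{pointnumber}}
\begin{document}
\point: the first point.

\addtocounter{pointnumber}{2}% skip two numbers
\point: a later point.

% The same counter can be printed in different numbering styles.
The next point would be number \arabic{pointnumber},
or \roman{pointnumber} in lower-case roman numerals.
\end{document}
```

Here the first call prints ‘Point 1’ and the second ‘Point 4’, since the counter is stepped to 2 by the first call and then advanced by a further 2.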

Now let’s see how we can use variables as flags to choose what text is produced in a document. As we shall see, the combination of flags and macros allows us to write very generic documents, which can be instantiated simply by choosing a suitable value for the flag. Consider again the problem of writing a document about arrays, where we wanted two forms of the command \lookup, one for Basic and one for C. It would be better if we could use a flag in our LaTeX source to indicate whether the language was to be C or Basic. All that we would then have to do is to give the flag the correct value before compiling the document. Figure 10 illustrates this.

The first thing to do is to include the option ‘ifthen’ in the documentstyle declaration. This allows us to use the command \ifthenelse later on. Next we declare a counter variable, ‘language’, which is set to 1 if the language is to be Basic and to 0 if it is to be C. The % symbol is used by LaTeX for comments; any text which appears after a % symbol (and before the end of the line) is ignored by the LaTeX compiler. Next we set the counter to 1, using the command \setcounter{language}{1}, so the text we shall produce will, in this case, be specialised for Basic. This specialisation is achieved using the modified version of the \lookup command. The new version of \lookup uses the built-in command \ifthenelse to test the value of the ‘language’ variable. The format of an \ifthenelse command is \ifthenelse{<test>}{<then_branch>}{<else_branch>}. It behaves just like an if statement in a conventional programming language. If <test> evaluates to true, the text in the <then_branch> is produced; if false, the text in the <else_branch> is produced.
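The ifthen package also provides true/false flags of its own, which some readers may find more natural than a counter holding 0 or 1. The sketch below is an alternative to the counter approach described above, not the method used in the figures; the flag name ‘basic’ is hypothetical. In LaTeX2e the package is loaded with \usepackage{ifthen} instead of as an option to \documentstyle.

```latex
\documentclass{article}
\usepackage{ifthen}
\newboolean{basic}          % hypothetical flag name
\setboolean{basic}{true}    % change to false for the C version
% Choose the bracket style according to the flag.
\newcommand{\lookup}[2]{\ifthenelse{\boolean{basic}}{#1(#2)}{#1[#2]}}
\begin{document}
The first element of the array is \lookup{A}{0}.
\end{document}
```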

Using this flag we might write lots of commands, each of which produces the text for a particular kind of statement, the language depending upon the value of the flag counter variable. In this way we could write a generic document about programming and simply set the flag appropriately to produce the specialised version of the document we want.

Figure 10 shows how we might do this. We define commands which produce Basic or C syntax for array lookup (using the \lookup macro, as described above), array updating and, more elaborately, a command which produces the appropriate syntax for a ‘for’ loop. The last of these requires some further explanation.

The difference between a for loop in C and in Basic is largely syntactic, and we can use the flexibility of LaTeX to escape from these syntactic details. The command \forloop uses the flag counter ‘language’ to decide whether to lay out the four elements of the loop in Basic or C style. This allows us to write some text about array initialisation and loops, without having to decide which language the target document will refer to.

Notice that in the C version of the for loop syntax the curly brackets that enclose the statements of the body of the loop are written as \{ ... \}, rather than as { ... }. This is because the curly bracket symbols already have a meaning to LaTeX, so to get it to print curly brackets we prefix them by a backslash.
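Curly brackets are not the only characters with a special meaning. The following sketch, an illustration of my own, shows the same backslash prefix applied to some of the other characters that LaTeX reserves for itself.

```latex
\documentclass{article}
\begin{document}
% Each of these characters must be escaped to be printed literally.
A block in C is written \{ \ldots \}, a LaTeX comment starts with \%,
and the first parameter of a command is written \#1.
Three other reserved characters are \$, \& and \_.
\end{document}
```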

In Figure 10, we set the counter language to 1, so the output produced will be for Basic. From the source code in Figure 10, LaTeX will produce the output in Figure 11. If we want to produce a document which says the same thing about C arrays, we simply have to change the line \setcounter{language}{1} to \setcounter{language}{0}. It’s that easy.

Mathematics

LaTeX is often (and rightly) praised for the way in which it allows for the typesetting of complex mathematics. Many modern mathematics, computing and other science and engineering texts are typeset using LaTeX.

Mathematical text can either be laid out ‘in-line’, in which case it appears in the sentence in which it is typed, or in ‘display mode’, in which case it appears centred on a line of its own – ‘displayed’ as it were. All the standard mathematical symbols and forms of text are provided for using commands. As LaTeX has been around for so long and has been used, developed and enhanced by so many mathematicians around the world, it is extremely unlikely that there exists any form of mathematical output that has not been catered for by someone. A quick trawl through your bookshelf will probably reveal acknowledgements to LaTeX in several computing and mathematics textbooks, as it is often used to prepare technical books, allowing the author(s) to provide camera-ready copy for their publishers.
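The two modes can be seen side by side in the sketch below, which is my own example and not one of the numbered figures. In-line mathematics is enclosed in dollar signs, and displayed mathematics between \[ and \].

```latex
\documentclass{article}
\begin{document}
The roots of the quadratic $ax^2 + bx + c = 0$ are given by
\[
  x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\]
provided that $a \neq 0$.
\end{document}
```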

There is also a thriving LaTeX user community which ensures that all of this valuable information is collected, maintained and updated. All LaTeX developments are entirely backwards compatible, so there’s no need to worry that your documents will somehow ‘go out of date’.

Reuse

I estimate that it takes between two days and one week to become productive using LaTeX. Many readers may consider this unacceptable when compared to the lead-in time for WYSIWYG editors. Certainly, if you only have to prepare documents such as letters and memos then LaTeX is probably not worth considering. However, if you are concerned with the production of a large amount of text and are prepared to invest in a system that could ultimately save you months of work, then LaTeX may be the answer.

One of the most intangible, yet most attractive advantages experienced by LaTeX users comes from the way in which, like a good programming language, LaTeX supports reuse. Very quickly you will find yourself building up a set of your own personal macros, which will allow you to tailor your documents to your own taste. Reusing parts of a document in another is achieved effortlessly and seamlessly. The seamlessness derives from two aspects of the LaTeX approach. The symbolic naming of parts of a document allows cross-references to be updated automatically as the document is edited. The concept of an environment means that the same piece of source text may look different when included in different contexts. Of course, this directly contradicts the WYSIWYG principle, but this is the essential strength of LaTeX. Many computing journals [but not EXE, still battling with WYSIWYG – Ed], conferences and publishers provide their own LaTeX ‘style files’, which, when included in a LaTeX source file, automatically lay out the document in the form required for publication.
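A common way of sharing personal macros between documents is to keep them in a file of their own and read it in with \input. The sketch below assumes a hypothetical file name, mymacros.tex. So that the example is self-contained, it uses the LaTeX2e filecontents environment to write that file out on the first run; in practice the file would simply sit alongside your documents.

```latex
% Write the shared macro file (hypothetical name) if it does not already exist.
\begin{filecontents}{mymacros.tex}
\newcommand{\lookup}[2]{#1(#2)}
\newcommand{\fruit}{apple}
\end{filecontents}

\documentclass{article}
\input{mymacros}   % read in the shared definitions
\begin{document}
The first \fruit{} is recorded in \lookup{A}{0}.
\end{document}
```

Any other document that contains the line \input{mymacros} gets the same definitions, so a change made in the one file carries through to every document that uses it.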

Where to go next

If you are interested in trying out LaTeX for yourself, an MS-Windows version can be obtained (for free) from http://www.eece.ksu.edu/~khc/tex.html. LaTeX comes as standard on most UNIX platforms, and with most Linux distributions, so if you’re using one of these try typing ‘man latex’. There is an FTP site containing many useful LaTeX tools, macros and related documents at ftp.tex.ac.uk.

There are two indispensable books on the subject of LaTeX document writing. Both are highly readable and informative. LaTeX: A Document Preparation System by Leslie Lamport (ISBN 0-201-15790-X) describes the basic system and is a very good book to get started with. It contains enough information to let you write most normal documents straight away. The LaTeX Companion by Michel Goossens, Frank Mittelbach and Alexander Samarin (ISBN 0-201-54199-8) is more detailed, and covers all the new features added to LaTeX by the LaTeX2e project. This book is useful if you want to write a great many documents using LaTeX and to customise the language to your own tastes. It explains how to achieve all sorts of exotic effects, such as laying text out in a heart shape (perhaps useful for certain documents written just before the 14th of February). Both these books are published by Addison-Wesley.

Logical is better

The LaTeX document preparation system has evolved and improved over the years. It is extremely robust and provides features for writing documents to publishable standards containing text and mathematics. A LaTeX document is described using a programming language, which gives the LaTeX user all the power and flexibility of a conventional programming language. The style of writing forces the user to view documents at the level of their logical organisation, rather than their physical appearance. This is initially a little frustrating, but ultimately it has many advantages such as supporting reuse and creating generic documents which may have several physical instantiations.

Mark Harman is director of research and acting head at the School of Informatics and Multimedia Technology in the University of North London (http://www.unl.ac.uk/~mark/welcome.html). He can be contacted via email at m.harman@unl.ac.uk or by post to Mark Harman, Project Project, School of Informatics and Multimedia Technology, University of North London, Holloway Road, London N7 8DB.

Figure 1 – A Simple LaTeX document.

\documentstyle{article}

\begin{document}

hello world

\end{document}

Figure 2 – Sections

\documentstyle{article}

\begin{document}

\section{Introduction}

This is a fairly short document and this is its introduction.

\section{Rationale}

The document is so short because it is simply an example.

\end{document}

Figure 3 – Symbolic references

\documentstyle{article}

\begin{document}

\section{Introduction}

\label{intro}

This is a fairly short document and this is its introduction.

\section{Rationale}

A brief introduction to this document can be found in section \ref{intro}.

\end{document}

Figure 4 – The Itemize environment.

\begin{itemize}

\item Solid

\item Liquid

\item Gas

\end{itemize}

Figure 5 – The enumerate environment.

\begin{enumerate}

\item Cenozoic

\begin{enumerate}

\item Quaternary

\item Tertiary

\end{enumerate}

\item Mesozoic

\begin{enumerate}

\item Cretaceous

\item Jurassic

\item Triassic

\end{enumerate}

\item Paleozoic

\begin{enumerate}

\item Permian

\item Carboniferous

\item Devonian

\item Silurian

\item Ordovician

\item Cambrian

\end{enumerate}

\item Precambrian

\end{enumerate}

Figure 6 – Parameterless commands.

\documentstyle{article}

\newcommand{\fruit} { apple }

\begin{document}

The first \fruit to appear will be the first \fruit I shall eat.

\end{document}

Figure 7 – Parameters: Basic version.

\documentstyle{article}

\newcommand{\lookup} [2] { #1(#2) }

\begin{document}

To find the biggest element of the array A, store the first element, \lookup{A}{0}, in the variable b. Next enter a loop, controlled by the variable i, starting at 1 and proceeding to the end of the array. At each point in the loop, compare element i, \lookup{A}{i}, with the value in b. If \lookup{A}{i} is larger than b, then assign \lookup{A}{i} to b.

\end{document}

Figure 8 – Parameters: C version.

\documentstyle{article}

\newcommand{\lookup} [2] { #1[#2] }

\begin{document}

To find the biggest element of the array A, store the first element, \lookup{A}{0}, in the variable b. Next enter a loop, controlled by the variable i, starting at 1 and proceeding to the end of the array. At each point in the loop, compare element i, \lookup{A}{i}, with the value in b. If \lookup{A}{i} is larger than b, then assign \lookup{A}{i} to b.

\end{document}

Figure 9 – Counter variables.

\newcounter{pointnumber}

\setcounter{pointnumber}{1}

\newcommand{\point} { Point \thepointnumber \stepcounter{pointnumber} }

\point

Some text associated with one of the points

\point

Some text associated with another point

\point

Yet another point

Figure 10 – Flag variables.

\documentstyle[ifthen]{article}

\newcounter{language} % set to 1 for Basic and 0 for C

\setcounter{language}{1}

\newcommand{\lookup}[2]
{
\ifthenelse{\value{language} = 1} {#1(#2)} {#1[#2]}
}

\newcommand{\update}[3]
{
\ifthenelse{\value{language} = 1} {LET #1(#2) = #3} {#1[#2] = #3}
}

% The forloop command takes four parameters
% 1. The lower bound of the loop - an integer or integral expression.
% 2. The upper bound of the loop - an integer or integral expression.
% 3. The loop control variable - an integral variable.
% 4. The body of the loop - a sequence of statements.
% The flag counter language is used to determine the language in which
% the syntax of the loop is written.
% The blank lines inside each branch start new paragraphs, so that each
% part of the loop is typeset on a line of its own.
% The C loop tests with <= so that, like the Basic loop, it includes the
% upper bound; the < sign is set in maths mode so that it prints correctly.

\newcommand{\forloop}[4]
{
\ifthenelse{\value{language} = 1}
{
FOR #3 = #1 TO #2

#4

NEXT #3
}
{
for(#3=#1; #3 $<$= #2; #3++)

\{

#4

\}
}
}

\begin{document}

To store the value 10 in element number 3 in the array A, we write \update{A}{3}{10}.

To initialise the elements 0 through to 10 in the array A with the initial value 0, we can use a for loop, starting at 0 and going up to 10. This would be written like this

\forloop{0}{10}{i}{\update{A}{i}{0}}

\end{document}

Figure 11 – The result of compiling the LaTeX source in Figure 10.

To store the value 10 in element number 3 in the array A, we write LET A(3) = 10.

To initialise the elements 0 through to 10 in the array A with the initial value 0, we can use a for loop, starting at 0 and going up to 10. This would be written like this

FOR i = 0 TO 10

LET A(i) = 0

NEXT i

(P)1997, Centaur Communications Ltd. EXE Magazine is a publication of Centaur Communications Ltd. No part of this work may be published, in whole or in part, by any means including electronic, without the express permission of Centaur Communications and the copyright holder where this is a different party.

EXE Magazine, St Giles House, 50 Poland Street, London W1V 4AX, email editorial@dotexe.demon.co.uk