Hour 15: Getting an Overview of a File: Getting an Outline of Your Document

When you are writing a large document, you might at some point want to get an overview, or outline, of your document. You can obtain such an overview by asking Emacs to show you only the heading lines of your document. Emacs must, of course, be told what a heading line looks like. Fortunately, this is quite simple, and many major modes contain definitions for it (for example, sgml mode, cc mode, and AUCTeX; see Hours 17, "Editing LaTeX and HTML Files," and 18, "Editing C, C++, and Java Files," for a description of these).

In Figure 15.2, you can see an example of a buffer where only an outline is shown. This buffer contains an HTML document. In HTML, heading lines are of the form <h1>...</h1>, where the digit 1 can be replaced with any number from 1 to 6.

It is important to understand that this HTML document contains hundreds of lines of text, with text below each headline, but the text is hidden, to give an overview of the document.

Accessing the Outline Functions

A major mode and a minor mode exist that offer the outline functionality. Only one major mode can be active at a time, so the outline major mode is of interest only in special buffers where you do not need another major mode. When you edit text, you will most likely have a favorite major mode for this type of text, and you will therefore use the minor mode. All the outline functions in the major mode are bound to the key prefix C-c, whereas they are bound to C-c @ in the minor mode.

The horrible minor mode prefix is in fact the only reason for an outline major mode. Fortunately, the key prefix for the outline functions is customizable. Personally, I think that the outline functions are so useful that I want them bound to more accessible keys. To bind them all to a different prefix, set the variable outline-minor-mode-prefix to a keybinding. An example might be to bind them to C-o. This prefix is easy to remember, and, because all the outline functions are bound to keys that also use the Control key, it is also easy to type.

Unfortunately, C-o is already bound to the function open-line. That function can, however, be moved to M-o instead. You can do this by inserting the following code into your .emacs file:
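The listing that belongs here is not reproduced in this copy of the text. A minimal sketch of such a setup, assuming the prefix variable is set before the outline library is loaded, could look like this:

```elisp
;; Sketch only, not the original listing.
;; Set the prefix before outline.el is loaded; older Emacs versions read
;; the variable only at load time.
(setq outline-minor-mode-prefix (kbd "C-o"))

;; C-o now belongs to the outline functions in outline minor mode, so
;; give open-line a new home on M-o.
(global-set-key (kbd "M-o") 'open-line)
```

Before adopting this, it may be worth pressing C-h k M-o to check whether M-o already does something in your Emacs.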

To start either the major mode or the minor mode, press M-x and type outline-mode (the major mode) or outline-minor-mode (the minor mode).
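If you find yourself starting the minor mode by hand in every text buffer, a hook can do it for you. The following is a sketch that assumes you want the minor mode in all buffers using text mode; the choice of text-mode-hook is only an illustration:

```elisp
;; Sketch: switch on outline minor mode whenever text mode starts.
(add-hook 'text-mode-hook
          (lambda ()
            ;; A positive argument switches the mode on rather than toggling it.
            (outline-minor-mode 1)))
```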

Hiding/Showing Text

Several functions exist for hiding and showing parts of the text. All the functions for showing text are located under the menu entry Show, whereas the functions for hiding text are located under the menu entry Hide.

A structured document can conceptually be seen as a tree. On a tree, a branch can have several subbranches, each of which can also have several subbranches. Likewise a document can have several headings, each of which can have several subheadings and so on. On a branch of a tree other branches and leaves can be located. Likewise, in a document, a chapter can have several sections in it and several lines of ordinary text. (The lines of text are named body-lines when talking about them in the context of the outline functions.) The parallel between a tree and a document can be seen in Figure 15.3.

The hiding and showing functions come in two classes: those that work on the whole document, and those that work only on the subtree in which point is located. All available functions are shown in Table 15.1. The functions that hide text never show any text, although some of the descriptions might suggest otherwise. Likewise, the functions for showing text never hide any text. The table shows the key prefix for the minor mode; please remember that the key prefix for the major mode is C-c.

Table 15.1 Available Outline Commands

*Name and Binding*	*Description*
	Hide--Whole Document
sublevels (C-c @ C-q)	This collapses the whole tree, showing only the topmost heading lines. Given a numeric prefix, this command shows that many levels of heading lines. M-3 C-c @ C-q thus shows the topmost heading lines and two more levels of heading lines.
body (C-c @ C-t)	This hides all the body lines of the document, showing only the heading lines. This is a very convenient way to get an overview of the whole document.
other (C-c @ C-o)	This hides everything except the body lines in which point is located and the branches above it. The body lines of the branches above the line with point are also hidden. Note that this does not seem to work in GNU Emacs 20.
	Hide--Subtree
entry (C-c @ C-c)	This hides the body lines of the given headline, but not its subbranches.
leaves (C-c @ C-l)	This hides all the body lines of the subtree rooted in the given headline, but not the branches. This is equivalent to the body command described previously, but only for the given subtree.
subtree (C-c @ C-d)	This hides all body lines and branches in the subtree located at the given headline.
	Show--Whole Document
all (C-c @ C-a)	This shows the whole tree. That is, it unfolds anything hidden by any of the outline functions.
	Show--Subtree
entry (C-c @ C-e)	This shows the body lines of the given headline, but neither its branches nor any of their content.
children (C-c @ C-i)	This shows the immediate branches of this headline. Section heading lines will be shown for a chapter, but subsections, subsubsections, and body lines are not shown.
branches (C-c @ C-k)	This shows all branches in the subtree for the given headline.
subtree (C-c @ C-s)	This shows the whole subtree for the given headline. That is, all heading lines and body lines below the given headline.

Don't get frustrated by all these functions. You will seldom use more than a few of them.

Jumping to Heading Lines

Besides offering the capability to hide some text, the outline mode also offers functions for jumping to the heading lines. All the jumping functions are available under the menu entry Headings. Table 15.2 shows all the commands for jumping between heading lines.

Table 15.2 Commands for Jumping Between Headers

*Name and Binding*	*Description*
Up (C-c @ C-u)	This command jumps up one level of heading lines. If you are located in a section, for example, this brings you to the headline of the chapter in which this section is located.
Next (C-c @ C-n)	This brings you to the next headline (regardless of its depth, only that it is a headline).
Previous (C-c @ C-p)	This brings you to the previous headline (regardless of depth).
Next same level (C-c @ C-f)	This brings you to the next headline of the same level. If you are located in the body text of a section, for example, this brings you to the next section, skipping over any subsections of the current section.
Previous same level (C-c @ C-b)	This brings you to the previous headline of the same level (see previous item).

XEmacs Glyphs

In XEmacs, small figures (called glyphs) are added next to the heading lines in the outline major mode, as can be seen in Figure 15.4.

Clicking a down glyph shows the children of the headline (that is, the branches, but not the body text or the branches' body text). Clicking it once more then shows the body text of the headline, but not the body lines of the branches. Likewise, clicking an up glyph hides the body lines on the first click and the branches on the second.

Configuring Outline

I hope you have now learned everything that is worth knowing about using the outline mode. Many major modes support outline mode; that is, they contain information for outline mode about the text they are supposed to be used with. You can, for example, use outline mode on an HTML document if you use the sgml major mode.

You might, however, find yourself in situations where there either is no major mode for the text you are editing or the major mode that exists doesn't support outline mode. In these situations, if you want to use outline mode, you need to tell it yourself what a headline looks like and how to figure out the level of a given header.

Two different configuration options exist for telling Emacs about the outline structure. These are the variable outline-regexp, which is used to describe which lines are heading lines (and sometimes even the level of the given headline), and the variable outline-level, which must be set to a Lisp function that returns the level of a given header.

Defining Lisp functions is beyond the scope of this book; fortunately the default value for outline-level is very useful and does not need to be redefined in most cases. What it does is let the length of the text matched with outline-regexp determine the level on which the headline exists.

In most situations it is therefore enough to set the variable outline-regexp to a regular expression that matches more text the deeper a headline is located (see Hour 9, "Regular Expressions"). Three examples will be provided here to give you an idea of how to write such a regular expression. All examples are for file formats for which an outline setup already exists. This is merely to give you good examples.

Example 1

LaTeX has chapters, sections, subsections, and so on, written as \chapter, \section, \subsection, \subsubsection, and so on. As stated previously, the regular expression must match more text the more deeply a header is located. This gives us a problem, as the words chapter and section are equally long. The trick here is simply to write a regular expression that matches any character after section, as in Figure 15.5.

Note that the trick in Figure 15.5 is that the dot matches any character. The match for \section is thus one character longer than the match for \chapter, which makes \section more deeply nested than \chapter.
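Figure 15.5 itself is not reproduced in this copy of the text. The following sketch shows what an expression of this kind might look like and how the length of the match turns into a level. The names my-latex-outline-regexp and my-match-length are made up for this illustration, and the expression is a guess at the idea rather than the figure's exact content:

```elisp
;; Hypothetical expression in the spirit of Figure 15.5. In a Lisp
;; string, a literal backslash in the regexp is written as four
;; backslashes, and the alternation operator \| as \\|.
(defvar my-latex-outline-regexp
  "\\\\chapter\\|\\\\section.\\|\\\\subsection\\|\\\\subsubsection")

(defun my-match-length (regexp line)
  "Return the length of REGEXP's match at the start of LINE, or nil."
  (when (string-match (concat "\\`\\(?:" regexp "\\)") line)
    (- (match-end 0) (match-beginning 0))))

;; Evaluate these in the *scratch* buffer to see the resulting levels:
(my-match-length my-latex-outline-regexp "\\chapter{Intro}")    ; => 8
(my-match-length my-latex-outline-regexp "\\section{Setup}")    ; => 9
(my-match-length my-latex-outline-regexp "\\subsection{Keys}")  ; => 11
(my-match-length my-latex-outline-regexp "Ordinary body text")  ; => nil
```

Only the order of the numbers matters: a longer match means a more deeply nested headline.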

Example 2

In HTML, there are six header lines: <h1>, <h2>, <h3>, <h4>, <h5>, and <h6>, with <h1> as the topmost header line.

In this situation all header lines have the same length, but fortunately you can use the trick from Figure 15.5, which gives us the following regular expression:
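The expression that belongs here is not reproduced in this copy of the text. A sketch of how the dot trick might be applied to HTML follows; the command name my-html-dot-outline is made up for this illustration, setq-local assumes a reasonably recent Emacs, and the expression assumes that each heading tag starts at the beginning of a line:

```elisp
(require 'outline)

(defun my-html-dot-outline ()
  "Use a dot-padded heading expression in the current buffer (sketch)."
  (interactive)
  ;; Each deeper heading gets one more dot, so the match grows from
  ;; 3 characters for <h1 to 8 characters for <h6.
  (setq-local outline-regexp
              "<h1\\|<h2.\\|<h3..\\|<h4...\\|<h5....\\|<h6.....")
  ;; Use the default level function, which relies on the match length.
  (setq-local outline-level #'outline-level)
  (outline-minor-mode 1))
```

Run M-x my-html-dot-outline in a buffer containing HTML to try it out.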

The best--or at least the most beautiful--solution would be to identify lines with <h.> as header lines and then use the function in outline-level to extract the number from the match to get the level. The expression with the dots does work, however, unless there are too few characters on the line to match the dots.
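For readers who are curious about what such a level function could look like, here is a minimal sketch. The names my-html-outline-level and my-html-digit-outline are made up for this illustration, setq-local assumes a reasonably recent Emacs, and the expression assumes that each heading tag starts at the beginning of a line:

```elisp
(require 'outline)

(defun my-html-outline-level ()
  "Return the level, 1 to 6, of the heading tag that was just matched."
  ;; Outline mode calls this function with the match data describing
  ;; the text matched by outline-regexp. The last matched character is
  ;; the digit, and subtracting ?0 turns it into a number.
  (- (char-before (match-end 0)) ?0))

(defun my-html-digit-outline ()
  "Read the heading level from the digit in the tag (sketch)."
  (interactive)
  (setq-local outline-regexp "<[hH][1-6]")
  (setq-local outline-level #'my-html-outline-level)
  (outline-minor-mode 1))
```

Because the level comes from the digit and not from the length of the match, no padding dots are needed, and short heading lines are not a problem.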

Example 3

The final example is from the C programming language. In C, there is no easy way to describe a function definition header line using a regular expression, so here you are in a bit of trouble. Fortunately, C mode indents the code for you, so function definitions are indented at the lowest level, statements within for loops are indented with more spaces than the for loop header, and so on. Thus you can use the indentation to describe the level of a header. (It's not a header anymore, because every line is regarded as a header line, but with the outline functions it is now possible to hide the body of a for loop.)

A regular expression that could have been used in C mode is shown in Figure 15.6.

Figure 15.6
This regular expression is used to describe outline properties in C.

C mode does not actually use this regular expression; it uses a different one that is more effective.
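The expression from Figure 15.6 is not reproduced in this copy of the text. As a rough sketch of the indentation idea, and not the expression from the figure or the one C mode uses, the following setup treats every nonblank line as a header whose level grows with its leading whitespace. The command name my-c-indent-outline is made up for this illustration, and setq-local assumes a reasonably recent Emacs:

```elisp
(require 'outline)

(defun my-c-indent-outline ()
  "Outline the current buffer by indentation depth (sketch)."
  (interactive)
  ;; Match the leading whitespace plus the first nonblank character.
  ;; An unindented line gives a match of length 1, and a line indented
  ;; by four spaces gives a match of length 5.
  (setq-local outline-regexp "[ \t]*[^ \t\n]")
  ;; Use the default level function, which relies on the match length.
  (setq-local outline-level #'outline-level)
  (outline-minor-mode 1))
```

One limitation of this sketch is that a tab counts as a single character, so it only gives sensible levels when the code is indented consistently with spaces.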

You have now seen three different examples of regular expressions for outline mode. You can find more examples by using igrep to search for outline-regexp in the Lisp files installed as part of your Emacs installation. (If you do not know where they are, search for files ending in .el on your hard disk.)