The interaction of all the different regexp definitions, overlay properties and auto-overlay classes provided by the auto-overlay package can be a little daunting. This section will go through an example of how the auto-overlay regexps could be defined to create overlays for a subset of LaTeX, which is complex enough to demonstrate most of the features.
LaTeX is a markup language, so a LaTeX document combines markup commands with normal text. Commands start with ‘\’, and end at the first non-word-constituent character. We want to highlight all LaTeX commands in blue. Two commands that will particularly interest us are ‘\begin’ and ‘\end’, which begin and end a LaTeX environment. The environment name is enclosed in braces: ‘\begin{environment-name}’, and we want it to be highlighted in pink. LaTeX provides many environments, used to create lists, tables, titles, etc. We will take the example of an ‘equation’ environment, used to typeset mathematical equations. Thus equations are enclosed by ‘\begin{equation}’ and ‘\end{equation}’, and we would like to highlight these equations in yellow. Another example we will use is the ‘$’ delimiter. Pairs of ‘$’s delimit mathematical expressions that appear in the middle of a paragraph of normal text (whereas ‘equation’ environments appear on their own, slightly separated from surrounding text). Again, we want to highlight these mathematical expressions, this time in green. The final piece of LaTeX markup we will need to consider is the ‘%’ character, which creates a comment that lasts till the end of the line (i.e. text after the ‘%’ is ignored by the LaTeX processor up to the end of the line).
LaTeX commands are a good example of when to use word regular expressions (see Overview). The appropriate regexp definition is loaded by:
    (auto-overlay-load-definition
     'latex
     '(word ("\\\\[[:alpha:]]*?\\([^[:alpha:]]\\|$\\)"
             (face . (background-color . "blue")))))
We have called the regexp set ‘latex’. The face property is a standard Emacs overlay property that sets font properties within the overlay. See Overlay Properties. "\\\\" is the string defining the regexp that matches a single ‘\’. (Note that the ‘\’ character has a special meaning in regular expressions, so to include a literal one it must be escaped: ‘\\’. However, ‘\’ also has a special meaning in lisp strings, so both ‘\’ characters must be escaped there too, giving \\\\.) [[:alpha:]]*? matches a sequence of zero or more letter characters. The ? ensures that it matches the shortest sequence of letters consistent with matching the regexp, since we want the region to end at the first non-letter character, matched by [^[:alpha:]]. The \| defines an alternative, to allow the LaTeX command to be terminated either by a non-letter character or by the end of the line ($). See Regular Expressions, for more details on Emacs regular expressions.
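To see what this regexp matches, it can be tried outside auto-overlays with the standard string-match function. The following is only a sketch: my-latex-command-match is a hypothetical helper introduced for illustration, not part of the auto-overlay package.

```elisp
;; Hypothetical helper (not part of auto-overlays): return the text matched
;; by the word regexp in STRING, or nil if there is no match.
(defun my-latex-command-match (string)
  (let ((regexp "\\\\[[:alpha:]]*?\\([^[:alpha:]]\\|$\\)"))
    (when (string-match regexp string)
      (match-string 0 string))))

;; The match includes the leading backslash and the terminating character.
(my-latex-command-match "see \\alpha+1")  ; => "\\alpha+"
;; At the end of the line, the $ alternative terminates the command instead.
(my-latex-command-match "\\beta")         ; => "\\beta"
```

Note that the results are printed as Lisp strings, so each ‘\’ in the matched text appears doubled.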
However, there's a small problem. We only want the blue background to cover the characters making up a LaTeX command. But as we've defined things so far, it will cover all the text matched by the regexp, which includes the leading ‘\’ and the trailing non-letter character. To rectify this, we need to group the part of the regexp that matches the command (i.e. by surrounding it with ‘\(’ and ‘\)’), and put the regexp inside a cons cell containing the regexp in its car and a number indicating which subgroup to use in its cdr:
    (auto-overlay-load-definition
     'latex
     '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
             (face . (background-color . "blue")))))
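The effect of selecting a subgroup can be checked with the standard match-string function. This sketch assumes the command letters are wrapped in a group of their own, so that they form subgroup 1; my-latex-command-name is a hypothetical helper, not something auto-overlays provides.

```elisp
;; Hypothetical helper: return only the text of subgroup 1 (the letters of
;; the command), leaving out the backslash and the terminating character.
(defun my-latex-command-name (string)
  (let ((regexp "\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)"))
    (when (string-match regexp string)
      (match-string 1 string))))

(my-latex-command-name "see \\alpha+1")  ; => "alpha"
```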
The ‘$’ delimiter is an obvious example of when to use a self regexp (see Overview). We can update our example to include this (note that ‘$’ also has a special meaning in regular expressions, so it must be escaped with ‘\’ which itself must be escaped in lisp strings):
    (auto-overlay-load-definition
     'latex
     '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
             (face . (background-color . "blue")))))
    (auto-overlay-load-definition
     'latex
     '(self ("\\$" (face . (background-color . "green")))))
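The difference between the escaped and unescaped ‘$’ is easy to check with the standard string-match-p function. In this sketch, my-dollar-positions is a hypothetical helper used only for illustration.

```elisp
;; Hypothetical helper: return a cons of the positions at which the escaped
;; regexp and the bare "$" regexp first match in STRING.
(defun my-dollar-positions (string)
  (cons (string-match-p "\\$" string)   ; literal dollar character
        (string-match-p "$" string)))   ; end-of-line anchor

;; The escaped form finds the dollar sign at index 6; the bare form only
;; matches the empty string at the end of the text (index 8).
(my-dollar-positions "cost: $5")  ; => (6 . 8)
```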
This won't quite work though. LaTeX maths commands also start with a ‘\’ character, which will match the word regexp. For the sake of example we want the entire equation highlighted in green, without highlighting any LaTeX maths commands it contains in blue. Since the word overlay will be within the self overlay, the blue highlighting will take precedence. We can change this by giving the self overlay a higher priority (any priority is higher than a non-existent one; we use 3 here for later convenience). For efficiency reasons, it's a good idea to put higher priority regexp definitions before lower priority ones, so we get:
    (auto-overlay-load-definition
     'latex
     '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
    (auto-overlay-load-definition
     'latex
     '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
             (face . (background-color . "blue")))))
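The priority property is a standard Emacs overlay property, so its effect can be sketched without auto-overlays by creating two overlays by hand. Here my-face-at-alpha is a hypothetical helper, and the buffer positions are chosen for this particular sample text.

```elisp
;; Hypothetical helper: build an outer "maths" overlay and an inner "command"
;; overlay by hand, then report which face wins inside both of them.
;; MATH-PRIORITY is the priority given to the outer overlay (nil for none).
(defun my-face-at-alpha (math-priority)
  (with-temp-buffer
    (insert "$\\alpha$")
    (let ((math (make-overlay 1 9))      ; the whole inline expression
          (command (make-overlay 3 8)))  ; just the letters "alpha"
      (overlay-put math 'face '(background-color . "green"))
      (overlay-put math 'priority math-priority)
      (overlay-put command 'face '(background-color . "blue"))
      (get-char-property 4 'face))))

(my-face-at-alpha 3)  ; => (background-color . "green")
```

Calling the helper with nil instead of 3 should report the blue face, since the inner overlay then prevails as described above.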
The ‘\begin{equation}’ and ‘\end{equation}’ commands also enclose maths regions, which we would like to highlight in yellow. Since the opening and closing delimiters are different in this case, we must use nested overlays (see Overview). Our example now looks like:
    (auto-overlay-load-definition
     'latex
     '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
    (auto-overlay-load-definition
     'latex
     '(nested
       ("\\\\begin{equation}" :edge start (priority . 1)
        (face . (background-color . "yellow")))
       ("\\\\end{equation}" :edge end (priority . 1)
        (face . (background-color . "yellow")))))
    (auto-overlay-load-definition
     'latex
     '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
             (face . (background-color . "blue")))))
Notice how we've used separate start and end regexps to define the auto-overlay. Once again, we have had to escape the ‘\’ characters, and increase the priority of the new regexp definition to avoid any LaTeX commands within the maths region being highlighted in blue.
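As explained earlier, a literal ‘\’ needs four backslashes in a lisp string. The following sketch, using the hypothetical helper my-matches-begin-p, shows why two are not enough: in an Emacs regexp, ‘\b’ means a word boundary rather than a backslash followed by ‘b’.

```elisp
;; Hypothetical helper: return the position at which REGEXP matches the
;; literal text \begin{equation}, or nil if it does not match.
(defun my-matches-begin-p (regexp)
  (string-match-p regexp "\\begin{equation}"))

;; Four backslashes in the lisp string give a regexp matching a literal "\".
(my-matches-begin-p "\\\\begin{equation}")  ; => 0
;; Two backslashes give the regexp \begin{equation}, i.e. a word boundary
;; followed by "egin{equation}", which does not match.
(my-matches-begin-p "\\begin{equation}")    ; => nil
```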
LaTeX comments start with ‘%’ and last till the end of the line: a perfect demonstration of a line regexp. Here's a first attempt:
    (auto-overlay-load-definition
     'latex
     '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
    (auto-overlay-load-definition
     'latex
     '(nested
       ("\\\\begin{equation}" :edge start (priority . 1)
        (face . (background-color . "yellow")))
       ("\\\\end{equation}" :edge end (priority . 1)
        (face . (background-color . "yellow")))))
    (auto-overlay-load-definition
     'latex
     '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
             (face . (background-color . "blue")))))
    (auto-overlay-load-definition
     'latex
     `(line ("%"
             (face . (background-color
                      . ,(face-attribute 'default :background))))))
We use the standard Emacs face-attribute function to retrieve the default background colour. Because the definition is backquoted and the call is preceded by a comma, the call is evaluated before the regexp definition is loaded. (This will of course go wrong if the default background colour is subsequently changed, but it's sufficient for this example.)
Let's think about this a bit. We probably don't want anything within a comment to be highlighted at all, even if it matches one of the other regexps. In fact, creating overlays for ‘\begin’ and ‘\end’ commands which are within a comment could cause havoc! If they don't occur in pairs within the commented region, they will erroneously pair up with ones outside the comment. We need comments to take precedence over everything else, and we need them to block other regexp matches, so we boost the overlay's priority and set the exclusive property:
    (auto-overlay-load-definition
     'latex
     `(line ("%"
             (priority . 4)
             (exclusive . t)
             (face . (background-color
                      . ,(face-attribute 'default :background))))))
    (auto-overlay-load-definition
     'latex
     '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
    (auto-overlay-load-definition
     'latex
     '(nested
       ("\\\\begin{equation}" :edge start (priority . 1)
        (face . (background-color . "yellow")))
       ("\\\\end{equation}" :edge end (priority . 1)
        (face . (background-color . "yellow")))))
    (auto-overlay-load-definition
     'latex
     '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
             (face . (background-color . "blue")))))
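The comment definition is backquoted rather than quoted so that the colour can be computed when the form is read. A minimal sketch of how the backquote and comma behave, with a hypothetical function my-comment-definition and a plain string standing in for the face-attribute call:

```elisp
;; Hypothetical helper: build the comment definition with COLOUR substituted
;; at the point marked by the comma; everything else is taken literally.
(defun my-comment-definition (colour)
  `(line ("%"
          (priority . 4)
          (exclusive . t)
          (face . (background-color . ,colour)))))

;; The result is an ordinary list with the colour already filled in, equal to
;; '(line ("%" (priority . 4) (exclusive . t)
;;         (face . (background-color . "white"))))
(my-comment-definition "white")
```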
We're well on our way to creating a useful setup, at least for the LaTeX commands we're considering in this example. There is one last type of overlay to create, but it is the most complicated. We want environment names to be highlighted in pink, i.e. the region between ‘\begin{’ and ‘}’. A first attempt at this might result in:
    (auto-overlay-load-definition
     'latex
     `(line ("%"
             (priority . 4)
             (exclusive . t)
             (face . (background-color
                      . ,(face-attribute 'default :background))))))
    (auto-overlay-load-definition
     'latex
     '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
    (auto-overlay-load-definition
     'latex
     '(nested
       ("\\\\begin{" :edge start (priority . 2)
        (face . (background-color . "pink")))
       ("}" :edge end (priority . 2)
        (face . (background-color . "pink")))))
    (auto-overlay-load-definition
     'latex
     '(nested
       ("\\\\begin{equation}" :edge start (priority . 1)
        (face . (background-color . "yellow")))
       ("\\\\end{equation}" :edge end (priority . 1)
        (face . (background-color . "yellow")))))
    (auto-overlay-load-definition
     'latex
     '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
             (face . (background-color . "blue")))))
However, we'll hit a problem with this. The ‘}’ character also closes the ‘\end{’ command. Since we haven't told auto-overlays about ‘\end{’, every ‘}’ that should close an ‘\end{’ command will instead be interpreted as the end of a ‘\begin{’ command, probably resulting in lots of unmatched ‘}’ characters, creating pink splodges everywhere! Clearly, since we also want environment names between ‘\end{’ and ‘}’ to be pink, we need something more along the lines of:
    (auto-overlay-load-definition
     'latex
     `(line ("%"
             (priority . 4)
             (exclusive . t)
             (face . (background-color
                      . ,(face-attribute 'default :background))))))
    (auto-overlay-load-definition
     'latex
     '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
    (auto-overlay-load-definition
     'latex
     '(nested
       ("\\\\begin{" :edge start (priority . 2)
        (face . (background-color . "pink")))
       ("\\\\end{" :edge start (priority . 2)
        (face . (background-color . "pink")))
       ("}" :edge end (priority . 2)
        (face . (background-color . "pink")))))
    (auto-overlay-load-definition
     'latex
     '(nested
       ("\\\\begin{equation}" :edge start (priority . 1)
        (face . (background-color . "yellow")))
       ("\\\\end{equation}" :edge end (priority . 1)
        (face . (background-color . "yellow")))))
    (auto-overlay-load-definition
     'latex
     '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
             (face . (background-color . "blue")))))
We still haven't solved the problem though. The ‘}’ character doesn't only close ‘\begin{’ and ‘\end{’ commands in LaTeX. All arguments to LaTeX commands are surrounded by ‘{’ and ‘}’. We could add all the commands that take arguments, but we don't really want to bother about any other commands (at least in this example). All we want to do is prevent auto-overlays from incorrectly pairing the ‘}’ characters used for other commands. Instead, we can just add ‘{’ to the list:
    (auto-overlay-load-definition
     'latex
     `(line ("%"
             (priority . 4)
             (exclusive . t)
             (face . (background-color
                      . ,(face-attribute 'default :background))))))
    (auto-overlay-load-definition
     'latex
     '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
    (auto-overlay-load-definition
     'latex
     '(nested
       ("{" :edge start (priority . 2))
       ("\\\\begin{" :edge start (priority . 2)
        (face . (background-color . "pink")))
       ("\\\\end{" :edge start (priority . 2)
        (face . (background-color . "pink")))
       ("}" :edge end (priority . 2))))
    (auto-overlay-load-definition
     'latex
     '(nested
       ("\\\\begin{equation}" :edge start (priority . 1)
        (face . (background-color . "yellow")))
       ("\\\\end{equation}" :edge end (priority . 1)
        (face . (background-color . "yellow")))))
    (auto-overlay-load-definition
     'latex
     '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
             (face . (background-color . "blue")))))
Notice how the { and } regexps do not define a background colour (or indeed any other properties), so that any overlays they create will have no effect other than making sure all ‘{’ and ‘}’ characters are correctly paired.
We've made one mistake though: by putting the { regexp at the beginning of the list, it will take priority over any other regexp in the list that could match the same text. And since { will match whenever \begin{ or \end{ matches, environments will never be highlighted! The { regexp must come after the \begin{ and \end{ regexps, to ensure it is only used if neither of them matches (it doesn't matter whether it appears before or after the } regexp, since the latter will never match the same text):
    (auto-overlay-load-definition
     'latex
     `(line ("%"
             (priority . 4)
             (exclusive . t)
             (face . (background-color
                      . ,(face-attribute 'default :background))))))
    (auto-overlay-load-definition
     'latex
     '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
    (auto-overlay-load-definition
     'latex
     '(nested
       ("\\\\begin{" :edge start (priority . 2)
        (face . (background-color . "pink")))
       ("\\\\end{" :edge start (priority . 2)
        (face . (background-color . "pink")))
       ("{" :edge start (priority . 2))
       ("}" :edge end (priority . 2))))
    (auto-overlay-load-definition
     'latex
     '(nested
       ("\\\\begin{equation}" :edge start (priority . 1)
        (face . (background-color . "yellow")))
       ("\\\\end{equation}" :edge end (priority . 1)
        (face . (background-color . "yellow")))))
    (auto-overlay-load-definition
     'latex
     '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
             (face . (background-color . "blue")))))
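The ordering rule can be illustrated with a deliberately simplified model: take the first regexp in a list that matches a piece of text. This is only a sketch of the "earlier entries win" idea, not how auto-overlays actually scans the buffer, and my-first-matching-regexp is a hypothetical helper.

```elisp
(require 'seq)

;; Hypothetical helper: return the first element of REGEXPS that matches
;; somewhere in TEXT, mimicking the rule that earlier regexps take priority.
(defun my-first-matching-regexp (regexps text)
  (seq-find (lambda (regexp) (string-match-p regexp text)) regexps))

;; With "{" first, it shadows the more specific regexps.
(my-first-matching-regexp '("{" "\\\\begin{" "\\\\end{")
                          "\\begin{equation}")  ; => "{"
;; With "{" last, it is only used when neither of the others matches.
(my-first-matching-regexp '("\\\\begin{" "\\\\end{" "{")
                          "\\begin{equation}")  ; => "\\\\begin{"
(my-first-matching-regexp '("\\\\begin{" "\\\\end{" "{")
                          "\\textbf{x}")        ; => "{"
```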
There is one last issue. A literal ‘{’ or ‘}’ character can be included in a LaTeX document by escaping it with ‘\’: ‘\{’ and ‘\}’. In this situation, the characters do not match anything and should not be treated as delimiters. We can modify the { and } regexps to exclude these cases:
    (auto-overlay-load-definition
     'latex
     `(line ("%"
             (priority . 4)
             (exclusive . t)
             (face . (background-color
                      . ,(face-attribute 'default :background))))))
    (auto-overlay-load-definition
     'latex
     '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
    (auto-overlay-load-definition
     'latex
     '(nested
       ("\\\\begin{" :edge start (priority . 2)
        (face . (background-color . "pink")))
       ("\\\\end{" :edge start (priority . 2)
        (face . (background-color . "pink")))
       ("\\([^\\]\\|^\\){" :edge start (priority . 2))
       ("\\([^\\]\\|^\\)}" :edge end (priority . 2))))
    (auto-overlay-load-definition
     'latex
     '(nested
       ("\\\\begin{equation}" :edge start (priority . 1)
        (face . (background-color . "yellow")))
       ("\\\\end{equation}" :edge end (priority . 1)
        (face . (background-color . "yellow")))))
    (auto-overlay-load-definition
     'latex
     '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
             (face . (background-color . "blue")))))
The new, complicated-looking regexps will only match ‘{’ and ‘}’ characters if they are not preceded by a ‘\’ character (see Regular Expressions). Note that the alternative [^\]\|^ matches either any character other than ‘\’, or the start of a line. The start-of-line case is required because matches to auto-overlay regexps are not allowed to span more than one line. If ‘{’ or ‘}’ appears at the beginning of a line, there will be no character in front (the newline character doesn't count, since it isn't on the same line), so the [^\] will not match.
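The three cases can be checked directly with the standard string-match-p function. This is a sketch only; my-open-brace-position is a hypothetical helper, and the positions it returns are where the whole match begins, including any preceding character.

```elisp
;; Hypothetical helper: return the position at which the new { regexp matches
;; in STRING, or nil if the only braces present are escaped.
(defun my-open-brace-position (string)
  (string-match-p "\\([^\\]\\|^\\){" string))

(my-open-brace-position "\\textbf{x}")  ; => 6, the match is "f{"
(my-open-brace-position "{x}")          ; => 0, the start-of-line case
(my-open-brace-position "\\{x")         ; => nil, the brace is escaped
```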
However, when it does match, the } regexp will now match an additional character before the }, causing the overlay to end one character early. (The { regexp will also match one additional character before the {, but since the beginning of the overlay starts from the end of the start delimiter, this poses less of a problem.) We need to group the part of the regexp that should define the delimiter, i.e. the }, by surrounding it with \( and \), and put the regexp in the car of a cons cell whose cdr specifies the new subgroup (i.e. the 2nd subgroup, since the regexp already included a group for other reasons; we could alternatively replace the original group by a shy-group, since we don't actually need to capture match data for that group). Our final version looks like this:
    (auto-overlay-load-definition
     'latex
     `(line ("%"
             (priority . 4)
             (exclusive . t)
             (face . (background-color
                      . ,(face-attribute 'default :background))))))
    (auto-overlay-load-definition
     'latex
     '(self ("\\$" (priority . 3) (face . (background-color . "green")))))
    (auto-overlay-load-definition
     'latex
     '(nested
       ("\\\\begin{" :edge start (priority . 2)
        (face . (background-color . "pink")))
       ("\\\\end{" :edge start (priority . 2)
        (face . (background-color . "pink")))
       ("\\([^\\]\\|^\\){" :edge start (priority . 2))
       (("\\([^\\]\\|^\\)\\(}\\)" . 2) :edge end (priority . 2))))
    (auto-overlay-load-definition
     'latex
     '(nested
       ("\\\\begin{equation}" :edge start (priority . 1)
        (face . (background-color . "yellow")))
       ("\\\\end{equation}" :edge end (priority . 1)
        (face . (background-color . "yellow")))))
    (auto-overlay-load-definition
     'latex
     '(word (("\\\\\\([[:alpha:]]*?\\)\\([^[:alpha:]]\\|$\\)" . 1)
             (face . (background-color . "blue")))))
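The one-character difference between the whole match and the second subgroup can be seen with the standard match-beginning function. A sketch, using the hypothetical helper my-close-brace-starts:

```elisp
;; Hypothetical helper: return a list of where the whole match begins and
;; where subgroup 2 (the brace itself) begins, or nil if there is no match.
(defun my-close-brace-starts (string)
  (let ((regexp "\\([^\\]\\|^\\)\\(}\\)"))
    (when (string-match regexp string)
      (list (match-beginning 0) (match-beginning 2)))))

;; The whole match starts at the "n" (index 8), one character before the
;; brace (index 9) that should actually delimit the overlay.
(my-close-brace-starts "{equation}")  ; => (8 9)
```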
With these regexp definitions, LaTeX commands will automatically be highlighted in blue, equation environments in yellow, inline maths commands in green, and environment names in pink. LaTeX markup within comments will be ignored. And ‘{’ and ‘}’ characters from other commands will be correctly taken into account. All this is done in “real-time”; it doesn't wait until Emacs is idle to update the overlays. Not bad for a bundle of regexps!
Of course, this could all be done more easily using Emacs' built-in syntax highlighting features, but the highlighting was only an example to show the location of the overlays. The main point is that the overlays are automatically created and kept up to date, and can be given any properties you like and used for whatever purpose is required by your Elisp package.
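As a sketch of using overlays for something other than highlighting, an Elisp package could look up a property of its own at a buffer position with the standard get-char-property function. The property name my-latex-context and both functions below are hypothetical, and the overlay is created by hand here to stand in for one that auto-overlays would create from a definition carrying that property.

```elisp
;; Hypothetical helper: return the value of the (made-up) `my-latex-context'
;; property at POS in the current buffer, or nil if no overlay supplies one.
(defun my-latex-context-at (pos)
  (get-char-property pos 'my-latex-context))

;; Demonstration with a hand-made overlay covering "$x$" in "a $x$ b".
(defun my-latex-context-demo ()
  (with-temp-buffer
    (insert "a $x$ b")
    (let ((overlay (make-overlay 3 6)))
      (overlay-put overlay 'my-latex-context 'inline-math)
      (list (my-latex-context-at 4)      ; inside the overlay
            (my-latex-context-at 1)))))  ; outside it

(my-latex-context-demo)  ; => (inline-math nil)
```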