Emacs data structure packages

16 August 2017 These packages provide basic (and not so basic) data structures. They are all relatively stable, though bug-fixes and new features are added occasionally. (Latest update: August 2017).

In recent versions of Emacs (>=24.1), you can install all the non-obsolete packages from within Emacs itself, via GNU ELPA. Use M-x list-packages and take it from there. This is the preferred installation method. (Occasionally, the ELPA version might lag slightly behind the latest version available here.)

  • Git repository: http://www.dr-qubit.org/git/predictive.git
  • heap.el (version 0.5)
  • queue.el (version 0.2)
  • avl-tree.el (already included in Emacs >=24.1)
  • tNFA.el (version 0.1.1; requires queue.el)
  • tstree.el (obsolete! Use trie.el instead)
  • trie.el (version 0.4; requires everything except dict-tree.el)
  • dict-tree.el (version 0.14; requires everything else)

Documentation

The functions these packages provide are well documented using Emacs' built-in documentation features. Brief descriptions of the data structures follow:

Heaps

A heap is a form of efficient self-sorting tree. In particular, the root node is guaranteed to be the highest-ranked entry in the tree. (The comparison function used for ranking the data can, of course, be freely defined). They are often used as priority queues, for scheduling tasks in order of importance, and for implementing efficient sorting algorithms (such as heap-sort).

Queues

A queue can be used both as a first-in last-out and as a first-in first-out stack, i.e. elements can be added to and removed from the front or back of the queue. (This library is an updated re-implementation of the old Elib queue library.)

Tagged Non-deterministic Finite state Automata

Features of modern regexp implementations, including Emacs', mean they can recognise much more than regular languages. This comes with a big downside: matching certain pathological regexps is very time-consuming. (In fact, it's NP-complete.)

A tagged, non-deterministic finite state automata (NFA) is an abstract computing machine that recognises regular languages. In layman's terms, they are used to decide whether a string matches a regular expression.1 1Regexps are not regular expressions! The "tagged" part lets the NFA do group-capture: it returns information about which parts of a string matched which subgroup of the regular expression.

Why re-implement regular expression matching when Emacs comes with extensive built-in support for regexps? Primarily, because some algorithms require access to the NFA states produced part way through the regular expression matching process. Secondarily, because Emacs regexps only work on strings, whereas regular expressions can equally well be used to match other Lisp sequence types.

  • tNFA.el (version 0.1.1; requires queue.el)

Tries2 2The ternary search tree package has been obsoleted by the trie package. The former is no longer supported.

A trie stores data associated with "strings" (not necessarily the string data type; any ordered sequence of elements can be used). It stores them in such a way that both storage size and data lookup are reasonably space- and time- efficient, respectively. But, more importantly, advanced string queries are also very efficient, such as finding all strings with a given prefix, finding approximate matches, finding all strings matching a regular expression, returning results in alphabetical or any other order, returning only the first few results, etc. etc.

  • trie.el (version 0.4; requires everything except dict-tree.el)

Dictionary trees

The dictionary tree data structures are a hybrid between tries and hash tables. Data is stored in a trie, but results that take particularly long to retrieve are cached in hash tables, which are automatically synchronised with the trie. The dictionary package provides persistent storage of the data structures in files, and many other convenience features.

Manual download and installation

To install, simply copy the files to a directory in your Emacs load-path.

If you want to see if there are any bleeding-edge changes, the very latest "development" versions of the data structure libraries are hosted in the same git repository as the Predictive Completion package. Note that the git repository URL is a git repository, not a web-site. You cannot view it in a web browser. To grab the latest development version, clone the repository using something like:

git clone http://www.dr-qubit.org/git/predictive.git

Note that the avl-tree.el library is an enhanced version of the library of the same name that comes with Emacs, and is a drop-in replacement for that original version. If you don't want to overwrite that version, make sure you save this replacement version to a location that appears earlier in your Emacs load-path than the directory in which the original version is located (usually something like /usr/share/emacs/lisp/emacs-lisp/ on *nix systems.)

You can view the current load path using C-h v load-path from within Emacs, and check that the new version shadows the original one by looking at the output of M-x list-load-path-shadows.

Recent versions of Emacs (>=24.1) already include this updated version of avl-tree.el.

Leave a comment

All comments are moderated. Clicking submit will open your email client and let you send your comment by email. By submitting your comment you agree to license the content under a Creative Commons Attribution-ShareAlike 4.0 International License.




Creative Commons License