This isn’t about the pixel pushing kind of design, but the engineering
one. Given a problematic matter, what choices do you make to create a
tool that enables its user to effectively interact another object?
More importantly, how do you deal with choices that are hard to
rectify afterwards? While this is going to be a rant, the subject is
one of my more popular Emacs packages, Shackle. I thought the 1.0.0
release of it with a new debugging facility to make troubleshooting
easier is just the right moment to ponder a bit about those choices I
made and why I regret some of them.
You may wonder “Wait, what is wrong with Shackle? It has over a
hundred stars of GitHub, a few thousand downloads on MELPA, dozens of
people using it in their init files and a handful of people
recommending it to others.”. While all of this is true, it’s not all
roses. I occasionally get issues from users that don’t understand it
at all and I can’t really blame them. There is a fundamental mismatch
going on here because all this package does is hijacking the
display-buffer-alist variable to invent a similar, but not quite
as powerful mechanism on top of it. It’s an inherently leaky
abstraction which makes for less than ideal debugging: If it ever
breaks down, you’ll have to understand both the abstraction and the
underlying code it’s built upon.
This project started off with me not understanding how to use this
variable at all. In hindsight, this should have been the first
warning signal: If you can’t fully understand the problem, don’t
expect to solve it in a satisfactory manner. There are a few glaring
problems with display-buffer-alist:
- The docstring for it is hard to parse. If a newbie asks how to
customize the display of a certain buffer and is directed to that
variable, I couldn’t blame them for just giving up on this
- It isn’t clear how to display a buffer in a certain way. I’ve found
only one example in the elisp manual so far and it’s more about
display-buffer than display-buffer-alist.
- Conditions may be buffer names and functions, but not major modes.
This is rather annoying as it means you’ll have to write a function
to check the major mode yourself. While this is far from fool-proof
(the code setting up the buffer may enable the desired major mode
only after displaying it), it works in many cases.
- If your customization of display-buffer-alist contains a call to
a function that errors out, the display of that buffer will fail.
This is particularly annoying if you have a catch-all rule there
that prevents the source debugger window from appearing, something I
mostly ran into while developing Shackle. While you can use M-:
(setq display-buffer-alist nil), it’s relatively annoying to do
- The default behavior is rather inscrutable and mostly, but not only
determined by display-buffer-fallback-action. Worse, some
packages rely on the default behavior just to fail with
customizations to display-buffer-alist.
Now, does Shackle do better? Well, it does in some ways while being
worse in others:
- Conditions are interpreted as buffer names (if a string) or modes
(if a symbol) or a list of either. While this is convenient, the
original design had the issue of making it impossible to match by
regex or use a custom function, so I added a :regex modifier to
the action (which is just wrong because it changes all of them to
match by regex) and interpret a list starting with :custom as a
function which isn’t nice either. Judging by GitHub’s search
there’s about three users of this functionality, with the most
prolific one being doom.
- Shackle tries being easier to understand with regards to actions by
abolishing the alist approach and instead going for a flat plist.
There is no hierarchy whatsoever which turned out to be a mistake,
people didn’t understand that there were keywords with
mutually-exclusive behavior, keywords that modified other keywords
and keywords that work universally. I’ve had feature requests where
I was asked to allow to combine keywords more flexibly, to explain
how the whole thing works and most surprisingly, to provide a
grammar of the implemented language. The latter found its way into
the README and is more confusing than helpful IMO. If you want to
understand the behavior, you’re best off with heading to the source.
I consider this to be the ultimate proof of failing at its design.
- It’s way harder to shoot yourself in the foot, in case you do you
can always bail out with M-x shackle-mode and revert to vanilla
- The mere act of enabling Shackle will subtly change the default
behavior of displaying buffers. The reason for this is
shackle--display-buffer-popup-window which tries to do something
sensible, but will never behave like the original.
- I’ve added a feature that doesn’t display a window differently, but
rather modifies the window parameter. Admittedly it makes things
more convenient because you’d otherwise need a second package to
achieve the same effect, but it’s the main reason for display of
buffers intended to not be selected to have weird side effects.
- Debugging Shackle not working as expected is rather tricky. In the
best case you’ll need to look at the source code of a package to
check whether it’s using display-buffer or a function using it
internally (like pop-to-buffer, pop-to-buffer-same-window,
switch-to-buffer-other-window, etc.). In the worst case you’ll
need to debug the part of the package displaying such windows or
Shackle itself while it tries matching conditions and applying
actions. I’ve added a tracing mode to make the former easier, but
the inherent leaky abstraction remains.
- While Shackle stayed mostly the same, Emacs gained new capabilities
for display-buffer-alist. There isn’t nearly as much reason for
using Shackle now, other than laziness. Other people reached the
same conclusion that it’s worth investing some of your time in
The bottom line is that I’m not happy with Shackle’s design, but am
wise enough to keep it as is and not do any more invasive changes.
My happiness (or the lack of) isn’t worth risking the happiness of its
The nice thing about Scheme is that not only it comes in more than
seven standards, but with tons of implementations as well. I’ve
heard from various people that R7RS-small is good enough to write
portable code in it that will work on more than one implementation, so
I decided to try implementing MAL in it. This has been mostly
successful as I’ve started out with 11 implementations, narrowed it
down to 7 and found a new one for future experimentation. During this
time I’ve reported 14 bugs for two implementations, Cyclone and
Foment. My impressions of the implementations can be summarized as
- The recommended implementation if you’re looking for a fully
standards-compliant one. It comes with its own comprehensive test
suite that has been borrowed by others. I’ve had zero issues with
it, although the overall speed could be better, earning it the last
place in the toy benchmarks (where it’s tied with Foment). This is
somewhat weird considering that it’s advertised as small yet
embeddable (look at Lua for an example of a fast one in that
domain); Picrin would be another candidate for that purpose.
- It proudly wears the GNU banner (despite not being the Scheme
officially endorsed by GNU), but is otherwise completely unknown.
This is a shame because unlike the other JVM languages I’ve worked
with it comes with a fast compiler, has a small boot time and didn’t
pose any issues for me, other than having to learn how this
classpath thing works in Java. Speed is decent as it comes third
place in the toy benchmarks, interop doesn’t look painful either.
Its author did a few more interesting things with it, such as
implementing other languages like Emacs Lisp (which forms the basis
of JEmacs). I might just as well invest more time into it and use
it to write things where Clojure would otherwise be a bad idea.
- My favorite implementation for writing all kinds of small,
standalone programs. I occasionally stumble upon issues, but none
for this run. It’s not completely obvious how to create R7RS
programs using libraries from anywhere else than the current
directory, this will be hopefully addressed in the next major
release, among with a more R7RS-like reorganization of the
namespaces and improvements to the package system. Speedwise it’s
the best, despite the r7rs egg being known for incurring a
huge speed penalty in numerical benchmarks.
- The domain says it all. I haven’t expected it to support R7RS
considering how long it has been around, but it does! Thanks to
this it contains many contrib libraries and performs fairly well for
an interpreter-only system. If I hadn’t discovered CHICKEN, this
might have been my preferred Scheme to go for.
- The other contender in the embeddable domain. Unfortunately I ran
into several problems, the most problematic ones being that it’s
currently not buildable from master and that there are no
official plans to support loading user-written libraries. For this
reason I wouldn’t recommend using it over Chibi.
- A fork of Gauche with more features, some changes (build system,
command-line switches) and a bit of performance improvements.
- Judging from the design documents this is basically CHICKEN, but
with a different approach to GC and threading that allows it to do
native threads. It’s a relatively young project and therefore bound
to have countless bugs, yet it’s second place in the speed
- Someone’s personal project for learning Scheme. It’s about the same
speed as Chibi and not quite finished yet.
- The Scheme officially endorsed by GNU. It had a long time in
making, but its R7RS support is minimal and only covers reader
- I’m somewhat surprised there is unofficial support for R7RS, given
that the language itself has been derived from R6RS. Not tested
- Research project that supports R5RS, R6RS and R7RS to varying
degrees. Due to R7RS incompatibilities, a somewhat outdated
documentation and a ridiculous toolchain, I’ve decided to test it at
some later point to see whether it becomes the fastest
implementation for MAL.
- I’ve learned about this one from a blog post by Fare, the
maintainer of ASDF, a build system for CL packages. It turns out
that not only he likes Racket, no, he also switched to this Scheme
implementation because it’s made by a close friend who not only
ported Racket’s module system to Gambit, but also added R7RS
support to it. I expect it to become usable soon and to make for
another contender to the fastest Scheme implementation for MAL.
Hello again and sorry for the long downtime! My current Emacs project
is an EPUB reader, something that requires parsing XML and
validating/extracting data from it. The former can be done just fine
in Emacs Lisp with the help of libxml2 (or alternatively,
xml.el), for the latter there is no good solution. Typically
people go for one of the following approaches:
- Don’t parse at all and just use regular expressions on the raw XML.
This works somewhat okayish if your input is predictable and doesn’t
- Parse and walk manually through the parse tree with car, cdr
and assoc. Rather tedious and requires writing your own tree
traversal functions for anything less than static XML.
- Invent your own library and use a selector DSL for DOM traversal.
I’ve seen a few of those, like xml+.el, enlive.el and
xml-query.el, however they support relatively little features in
their selectors, use their own language instead of a
well-established one (such as CSS selectors or XPath) and are
usually not available from a package archive for easy installation.
As I’m a big fan of APIs like Python’s lxml with the cssselect
module and used the esxml package before, I decided to implement CSS
selectors for it. The general strategy was to take parse a CSS
selector into a suitable form, do tree traversal by interpreting the
parse tree and return the nodes satisfying the selector. Surprisingly
enough, the hardest part of this were the parsing bits, so I’ll go
into a bit more of detail on how you’d do it properly without any
The approach taken in esxml-query.el is recursive-descent parsing,
as seen in software like GCC. Generally speaking, a language can be
described by a set of rules where the left side refers to its name and
the right side explains what it expands to. Expansions are sequences
of other rules or constants (which naturally cannot be expanded) and
may contain syntactic sugar, such as the Kleene star (as seen in
regular expressions). Given an input string described by the grammar,
a parser breaks it down according to its rules until it has found a
valid combination. The easiest way to turn a grammar into code is by
expressing it with a function for each rule, with each function being
free to call others. Success and failure can be expressed by
returning a piece of the parse tree, a special sentinel value (I’ve
chosen to return nil if the rule wasn’t completely matched) or
throwing an error, thereby terminating the computation. If all
recursively called rule functions returned a bit of the parse tree,
the top-level call returns the complete parse tree and the parsing
attempt has been successful.
Traditionally there is an extra step before parsing the string, as
it’s a bit tedious to express the terminating rules as a sequence of
characters, the string is typically preprocessed by a so-called lexer
into a list of tagged tokens. This is relatively simple to do in
Emacs Lisp by treating the string like a buffer, finding a token that
matches the current position, adding it to the list of found tokens
and advancing the position until the input has been exhausted. There
is one non-trivial problem though, depending on the token definitions
it can happen that there are two different kinds of tokens for a given
position in the input string. A simple solution here is picking the
longer match, this is why the tokenization in
esxml--tokenize-css-selector finds all possible matches and picks
the longest one.
The syntactical sugar used for the official CSS grammars consists of
alternation (|), grouping ([...]), optionals (?) and
greedy repetition (* and +). Given the basic token operations
(peek) (return first token in the stream) and (next) (pop
first token in the stream), it’s straight-forward to translate them to
working code by using conditionals and loops. For example, the rule
whitespace: SPACE* is consumed by calling (next) while
(pop) returns a whitespace. To make things easier, I’ve also
introduced an (accept TYPE) helper that uses (peek) to check
whether the following token matches TYPE and either consumes it
and returns the value or returns nil without consuming. With it
the preceding example can be shortened to (while (accept 'space)).
Similarly, alternation is expressed with cond and grouping with a
while where the body checks whether the grouped content could be
This parsing strategy allows for highly flexible error reporting going
beyond “Invalid selector” errors I’ve seen previously in a browser
console as you immediately know at which place the parser fails and
are free to insert code dealing with the error as you see fit. Be
warned though that you must understand the grammar well enough to
transform it into a more suitable form, yet equivalent form if you run
into rules that are hard or even impossible to express as code.
Debugging isn’t too bad either, you can observe the junctions taken by
your code and quickly spot at which it goes wrong.
I’m looking forward to venture into parser combinators and PEGs next
as they follow the same approach, but involve less code to achieve
<paluche> wasamasa: how many times have you done MAL?
<wasamasa> paluche: this is the fourth time
<wasamasa> so, three times so far
<paluche> you're a MAL-coholic?
I don’t even need to answer this one, do I? PR #4 is for
Smalltalk, one of those languages I always wanted to try out because
of their influence. What convinced me to finally do it was that I had
to learn a bit of Objective-C for work and this made the second
obviously Smalltalk-inspired language I’ve encountered, other than
Ruby. Overall, it was a pretty nice experience, despite not using an
image-based and IDE-centric workflow with GNU Smalltalk. As usual,
here’s my list of random notes:
- Smalltalk has relatively minimalistic syntax
- My biggest stumbling points with it were sequencing syntax
(semicolon and period) and precedence rules
- It’s kind of telling that the standard all implementations go by
omitted how the syntax for defining a method looks like, this makes
it more difficult than it should to share self-contained code
examples on the web
- Documentation or rather, finding the right method is a big problem,
the canonical solution would be to use the browser, however the
search function there errors out
- I’ve therefore resorted to searching the locally installed sources
and internets for things not specific to the implementation
- Apply (aka valueWithArguments) is supported, variadic args aren’t
or rather, there is no syntax for them
- There is no cond/case construct, instead you’re supposed to either
return from every conditional, do a dictionary lookup, use
polymorphism or nest boolean tests
- There is no continue/break construct, but it can be emulated easily
with the non-standard valueWithExit method
- It’s awesome just how much of the language is implemented in itself
- It’s interesting that there are only five keywords and no special
forms, only methods that are implemented in terms of VM primitives
- The class hierarchy makes more sense than in Java, generally this is
the cleanest implementation of OO I came across (though Self dares
challenging this by replacing metaclasses with a prototyping
mechanism as seen in JS)
- The debugging experience is less than stellar, I have to try out
Pharo for the real deal
- Blocks are considerably cleaner than in Ruby where you have three
options for subtly different purposes with subtly different
- Block syntax however is imperfect, you cannot return from them
(attempting to do so gives you a silly error message), instead the
last expression is used as return value
- This gets stranger if you consider that (non-local) returns are
exclusive to methods; in other words blocks aren’t intended to be as
powerful as method bodies
- There is no stack overflow in step 5, however if you push far
enough, you can reach an unrecoverable OOM condition
- The class library is far less forgiving than in Ruby, for instance
slicing/access errors out instead of returning an empty array or
- String syntax is a bit weird with regards to quoting as strings are
delimited with ‘, no backslash escapes exist and escaping of ‘
is done by doubling it
- If there’s a language that invented monkey-patching, this is the
one; I’ve successfully made use of this capability to fix a runtime
bug that only happens in the CI environment
My first programming experience was with VB6. I considered trying
TurboPascal, but couldn’t figure out how to use it from cmd.exe on my
measly Windows 98 SE machine. Many things have been said about the
quality of the BASIC family of languages and the subpar programs
created with VB6, but it’s easy to forget how convenient they make it
for the beginner to create something workable without losing their
head in irrelevant details in the windowing toolkit. The workflow
looks mostly like this: Create and arrange widgets, edit their
properties, double-click each widget with custom behavior and write
code for the respective event handler. Needless to say that the few
original programs I did write weren’t of the useful kind. Only much
later after a brief encounter with C++ for game programming (which
made me quit programming for a few years) I discovered scripting
languages, learned to love Python and eventually spun off into the
polyglossia required by modern software development jobs.
windows93.net reminded me of the good ol’ times, leaving me wondering
just how hard it would be to recreate the irretrievably lost programs
from back then. My excursions into GUI programming have taught me
that GNOME offers two separate projects that connect their
and gjs. Unfortunately seed is no longer in the Arch
repositories, so I picked gjs, just to find out that there is no
official documentation, aside from an auto-generated set of API docs
provided by a user.
The good news is that with the help of #gtk+ on Freenode, I
managed porting all applications in less than a day. The bad news
is that I doubt I’ll use gjs again as its situation reminds me of
LLVM: Lots of potential, intriguing claims (in their case,
introspection giving you language bindings for free) but with outdated
documentation if any (you’re better off with reading the source and
experimenting a lot) and tooling being half-assed. Rewriting teapub
might be possible, but for what gain? It’s no surprise Qt is eating
their lunch despite being more complex and requiring you to use C++.
Had I been given this environment ten years ago, I’d probably have
gone for browsers instead. So, in a way it’s not surprising that
environment appears to breed the new generation of programmers.