I tried out some new social networks lately. Mastodon I quite like
(it’s like what I’ve wanted Twitter to be), Discord, not so sure.
So, if you’ve wondered about my reduced presence on IRC, that’s why.
Writing an IRC bot is one of the classic programming exercises that
can be done in pretty much every programming language offering you
some way to open TCP sockets and manipulate strings. I started writing
one in Emacs Lisp a long time ago, although not from scratch (rather by
leveraging an existing IRC client), and wondered whether there is
anything to learn from doing the equivalent with a “modern” IM
platform like Discord. Could it still be done from scratch? What
else is different about it?
First I had to find a meaningful thing for the bot to do. I chose
Eliza, the classic demonstration of a chatterbot that managed
to fool people into having prolonged conversations with it. The
version I’m familiar with is M-x doctor, which is part of Emacs.
So, first of all, I wrote some code to interface with that command in
a REPL-style fashion. A companion shell script boots up Emacs in
batch mode for interfacing with the doctor from the shell. Much like
the original mode, you terminate your input by pressing RET twice.
This is an intentional design decision to allow for multi-line input
as seen on Discord (but absent from IRC, where you could get away with
making it single-line input).
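
The "terminate your input by pressing RET twice" convention can be sketched as follows. This is only an illustration in Python, not the Emacs Lisp code behind the actual bot; the function name `read_messages` is made up for this example.

```python
import io


def read_messages(stream):
    """Yield multi-line messages; an empty line ends the current message."""
    lines = []
    for raw in stream:
        line = raw.rstrip("\n")
        if line:
            lines.append(line)
        elif lines:
            # Blank line after some input: the message is complete
            yield "\n".join(lines)
            lines = []
    if lines:
        # Input ended without a trailing blank line
        yield "\n".join(lines)


# Two messages, the first one spanning two lines
demo_input = io.StringIO("I feel sad\nand tired\n\nWhy do you ask?\n\n")
for message in read_messages(demo_input):
    print(repr(message))
```

Treating the blank line as the terminator is what allows a single message to span several lines.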
I briefly entertained the thought of writing the rest of the bot from
scratch in Emacs Lisp, but abandoned it after learning that I’d need
to use websockets with zlib compression to subscribe and respond to
incoming messages. While there is an existing library for websocket
support, I’d rather not figure out its nitty-gritty details, let
alone work around its lack of zlib compression. It doesn’t help that
Discord’s official API docs are inconclusive and fail to answer
questions such as how you can set your current status (and more
importantly, why it fails to update). So, an officially
recommended Discord library it is.
The choice of which one it’s going to be depended on whether the
programming language it’s implemented in allowed me to communicate
with my shell script. I tried out discord.js first, battled a
fair bit with Node.js, but gave up eventually. There doesn’t seem to
be a way to spawn a child process and read from / write to its
stdout / stdin pipes as you see fit. Instead you can only add a
callback for the process output and good luck if you want to figure
out what piece of output corresponds to the input you wrote earlier.
This is why I went for discordrb instead, wrote some glue code for
subprocess communication and started figuring out their API to react
to incoming messages.
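
The kind of glue code meant here looks roughly like the sketch below: write a request to the child's stdin, then read from its stdout until the reply is complete. It is written in Python rather than Ruby, and the child process is a made-up stand-in (not the real Emacs script) that assumes a blank line terminates both requests and replies.

```python
import subprocess
import sys

# Hypothetical stand-in for the Emacs shell script: it collects lines until
# a blank line arrives, then prints one reply line followed by a blank line.
CHILD = r'''
import sys
lines = []
for raw in sys.stdin:
    line = raw.rstrip("\n")
    if line:
        lines.append(line)
    elif lines:
        print("You said: " + " / ".join(lines))
        print()
        sys.stdout.flush()
        lines = []
'''


class LineProcess:
    """Synchronous request/response over a child's stdin and stdout pipes."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            text=True, bufsize=1)

    def ask(self, message):
        # A blank line marks the end of the request
        self.proc.stdin.write(message + "\n\n")
        self.proc.stdin.flush()
        reply = []
        for raw in self.proc.stdout:
            line = raw.rstrip("\n")
            if not line:
                break  # blank line marks the end of the reply
            reply.append(line)
        return "\n".join(reply)

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()


doctor = LineProcess([sys.executable, "-u", "-c", CHILD])
print(doctor.ask("I feel sad\nand tired"))
doctor.close()
```

Because each request is answered before the next one is written, it is always clear which output belongs to which input.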
There are a few lessons to be learned from their API:
- Allow adding as many event handlers as you want for specific events,
with convenience options for narrowing down to commonly needed
situations (like messages starting with a prefix)
- Inside these event handlers, provide an object containing all
context you’d need, including the information who to respond to
- Keep the bot alive when an event handler errors out
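
To make these three points concrete, here is a minimal sketch of such a design in Python. It is not discordrb's API; `Bot`, `MessageEvent`, `start_with` and `respond` are names invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MessageEvent:
    """Everything a handler needs, including where to send the reply."""
    author: str
    channel: str
    content: str
    replies: list = field(default_factory=list)

    def respond(self, text):
        # A real bot would send this over the network; here it is recorded
        self.replies.append((self.channel, text))


class Bot:
    def __init__(self):
        self.handlers = []
        self.errors = []

    def message(self, start_with=""):
        """Register a handler, optionally only for messages with a prefix."""
        def register(func):
            self.handlers.append((start_with, func))
            return func
        return register

    def dispatch(self, event):
        for prefix, handler in self.handlers:
            if not event.content.startswith(prefix):
                continue
            try:
                handler(event)
            except Exception as exc:
                # A failing handler must not take the whole bot down
                self.errors.append(exc)


bot = Bot()


@bot.message(start_with="!eliza ")
def talk(event):
    event.respond(f"{event.author}, tell me more.")


@bot.message()
def broken(event):
    raise RuntimeError("handler bug")


event = MessageEvent("alice", "#general", "!eliza hello")
bot.dispatch(event)
print(event.replies, len(bot.errors))
```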
Now, to test the bot I needed to unleash it on a server. It turns out
that unlike on IRC, bot accounts are handled specially. You must:
- Register an application and obtain an ID for authorization purposes
- Enable a bot account and obtain an authorization token
- Generate a URL for inviting the bot to a server
- Share that URL with someone wielding the necessary permissions
- Hope for the best and wait for them to invite the bot
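
For the URL step, the following Python sketch assembles an invite link. The endpoint and the `client_id`, `scope` and `permissions` query parameters follow Discord's OAuth2 bot authorization flow as far as I know it; the client ID is a placeholder and the chosen permission bits are an assumption about what a simple chat bot needs.

```python
from urllib.parse import urlencode

# Assumed endpoint of Discord's OAuth2 authorization page
AUTHORIZE_URL = "https://discord.com/oauth2/authorize"

# Permission bit flags (assumed minimum for a bot that reads and replies)
VIEW_CHANNEL = 1 << 10
SEND_MESSAGES = 1 << 11


def invite_url(client_id, permissions):
    """Build the link a server admin opens to invite the bot."""
    query = urlencode({
        "client_id": client_id,
        "scope": "bot",
        "permissions": permissions,
    })
    return f"{AUTHORIZE_URL}?{query}"


# Placeholder application ID, not a real one
print(invite_url("123456789012345678", VIEW_CHANNEL | SEND_MESSAGES))
```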
This meant that I had to create my own test server to check whether my
code worked at all. For this reason I haven’t been able to get it
running on the server I was invited to. If you want to, you can try
it on your own server; the sources are on GitHub as always.
Solving the cryptopals crypto challenges was easily the most fun
I’ve had programming. If you happen to work on public-facing code
that relies on cryptography, by all means do these challenges. There
is no crazy math involved and the only prerequisite is that
you’re familiar with any programming language. My solutions can be
found on GitHub and include notes on each exercise, some of which
spoil the puzzle bits.
I’ve learned the following while completing the original set of 48
exercises:
- You should under no circumstances use ECB as cipher mode
- Padding is a crucial thing to get right, both when attacking
cryptographic systems and when implementing them
- An attacker can bitflip your ciphertexts into anything they want and
the only thing you can do about it is to check whether they’ve been
tampered with before decrypting (like with a MAC or signature)
- Do not ever reuse a nonce or you’ll weaken your crypto drastically
- It’s much easier to exploit a sidechannel than attacking the
- Overly detailed error messages can form a sidechannel
- Do not seed an RNG with the current time; use your system’s CSPRNG
- Do not use MT19937 for cryptographic purposes; given enough
observed output, its next values can be predicted
- Do not reuse your key as IV
- Don’t invent your own MAC scheme, it may be susceptible to length
extension attacks
- Even something like a timing leak can form an exploitable
sidechannel that circumvents the cryptographic system
- Diffie-Hellman is susceptible to MITM attacks
- Make sure to verify the parameters in asymmetric protocols for
values that make the shared secret predictable and abort when
- Don’t do textbook RSA, padding is crucial
- Do not use low exponents with RSA
- Do not use PKCS#1 v1.5 padding with RSA
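
Two of these lessons, bitflipping and nonce reuse, can be demonstrated with a few lines of Python. The cipher below is a deliberately simplistic toy (a SHA-256 based keystream XORed with the plaintext) that I made up for illustration only; it stands in for any stream cipher or CTR-mode construction and must not be used for anything real.

```python
import hashlib
import secrets


def keystream(key, nonce, length):
    """Toy keystream: SHA-256 over key, nonce and a block counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key, nonce, plaintext):
    return xor(plaintext, keystream(key, nonce, len(plaintext)))


decrypt = encrypt  # XORing with the same keystream undoes the encryption

key = secrets.token_bytes(32)
nonce = secrets.token_bytes(16)
plaintext = b"role=user;amount=100"
ciphertext = encrypt(key, nonce, plaintext)

# Bitflipping: the attacker knows (or guesses) that bytes 5..8 hold "user"
# and turns them into "root" without ever learning the key.
delta = xor(b"user", b"root")
forged = ciphertext[:5] + xor(ciphertext[5:9], delta) + ciphertext[9:]
print(decrypt(key, nonce, forged))

# Nonce reuse: encrypting a second message under the same key and nonce
# leaks the XOR of both plaintexts to anyone who sees the two ciphertexts.
other = b"role=root;amount=999"
leak = xor(ciphertext, encrypt(key, nonce, other))
print(leak == xor(plaintext, other))
```

Neither attack touches the key, which is why an integrity check before decryption and unique nonces matter so much.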
One more thing that doesn’t fit into a short sentence. You’ve most
certainly heard the advice “Don’t implement your own crypto”. This
advice isn’t the whole truth because it doesn’t explain what exactly
“your own crypto” means. Cryptography in software consists of
primitives that are put together to achieve something useful, such as
a hash function and a secret key to form an HMAC. These primitives
may be considered safe in isolation; however, that doesn’t mean their
combination will be equally safe. These combinations are called
cryptographic systems and the security of one relies upon making sure
none of the invariants are violated. Therefore, creating your own
cryptosystem out of stock crypto primitives also counts as “your own
crypto” and is rightfully considered dangerous. Your best bet is to
use a vetted library that has been designed so that it’s hard to use
incorrectly, such as libsodium.
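
As a small illustration of the difference, the Python sketch below contrasts a home-grown "secret prefix" MAC with the HMAC construction from the standard library. The function names are made up; the point is only that both are built from the same hash primitive, yet the first combination is the one known to be open to length extension with hashes like SHA-256.

```python
import hashlib
import hmac
import secrets


def naive_mac(key, message):
    """Home-grown combination: hash(key || message). Do not use this."""
    return hashlib.sha256(key + message).digest()


def sign(key, message):
    """Vetted combination of the same primitive: HMAC-SHA256."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key, message, tag):
    # compare_digest avoids leaking the mismatch position through timing
    return hmac.compare_digest(sign(key, message), tag)


key = secrets.token_bytes(32)
message = b"amount=100"
tag = sign(key, message)

print(verify(key, message, tag))
print(verify(key, b"amount=999", tag))
```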
While making a MIDI REPL I ran into the problem of providing people
interested in trying it out with something self-contained so that they
wouldn’t have to recreate my dev setup. The solution for this is
making a JAR file containing the .class files of your project and
all of its dependencies. There are tools for this purpose such as
Ant and Maven, but I couldn’t figure out how to make
them work for me, so I decided to take a closer look at what happens
behind the scenes and created a simple Makefile.
A JAR file is just a ZIP archive following certain rules. It must
contain a manifest (stored in the archive as META-INF/MANIFEST.MF,
which the jar tool can generate from a file such as manifest.txt) and
class files inside directories mirroring the package structure. The
only difference
between a regular JAR and an Uberjar is that the latter will also
include the class files of its dependencies. Tools for creating them
will have to extract the class files from all JAR files involved and
combine them into a directory tree before creating a new JAR file
containing all required class files. Things can get ugly if your
dependencies share the same package prefix (such as org.foo and
org.bar) or if a dependency exists in multiple versions (such as
org.foo depending on npm.leftpad-0.0.1 and org.bar
depending on npm.leftpad-0.0.2); I don’t even attempt to deal
with either case.
The manifest is a text file following a fixed format. The only thing
you can get wrong here is the entry point, which must be the name of a
class containing a static main method. It’s required so that
java -jar my.jar knows where to look; however, the entry point can
be overridden by running java -cp my.jar <classname> instead. This
is useful for debugging and allows you to add other dependency JARs to
the classpath you haven’t put into your Uberjar yet. Just change the
argument to -cp to be a colon-separated list of JARs
(semicolon-separated on Windows).
The dependencies are unzipped into a temporary directory. The jar
tool supports changing the working directory so that you can switch to
that directory and add the extracted directories without any prefix.
That’s everything necessary to create a runnable JAR!
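
The whole procedure can be sketched in Python with the zipfile module. This is not the Makefile described above, just an illustration of the same idea; `build_uberjar` is a made-up name, entries under META-INF/ of the dependencies are simply skipped, and on duplicate file names the first one wins, so conflicts are not handled here either.

```python
import os
import zipfile

MANIFEST_NAME = "META-INF/MANIFEST.MF"


def build_uberjar(output_path, main_class, class_dir, dependency_jars):
    """Merge project classes and dependency JARs into one runnable JAR."""
    manifest = f"Manifest-Version: 1.0\nMain-Class: {main_class}\n"
    seen = {MANIFEST_NAME}
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as jar:
        # The manifest goes first and names the entry point
        jar.writestr(MANIFEST_NAME, manifest)
        # Project classes keep their package directories as entry names
        for root, _dirs, files in os.walk(class_dir):
            for name in sorted(files):
                path = os.path.join(root, name)
                entry = os.path.relpath(path, class_dir).replace(os.sep, "/")
                if entry not in seen:
                    jar.write(path, entry)
                    seen.add(entry)
        # Copy the contents of every dependency JAR into the new archive
        for dep in dependency_jars:
            with zipfile.ZipFile(dep) as dep_jar:
                for info in dep_jar.infolist():
                    name = info.filename
                    if info.is_dir() or name.startswith("META-INF/"):
                        continue  # simplification: drop dependency metadata
                    if name in seen:
                        continue  # simplification: first occurrence wins
                    jar.writestr(name, dep_jar.read(name))
                    seen.add(name)
    return sorted(seen)
```

A call such as `build_uberjar("my.jar", "org.app.Main", "classes", ["lib/dep.jar"])` (all names hypothetical) would produce an archive that `java -jar my.jar` can start, provided the class files themselves are valid.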
This isn’t about the pixel pushing kind of design, but the engineering
one. Given a problematic matter, what choices do you make to create a
tool that enables its user to effectively interact with another object?
More importantly, how do you deal with choices that are hard to
rectify afterwards? While this is going to be a rant, the subject is
one of my more popular Emacs packages, Shackle. I thought the 1.0.0
release of it, with a new debugging facility to make troubleshooting
easier, was just the right moment to ponder a bit about the choices I
made and why I regret some of them.
You may wonder “Wait, what is wrong with Shackle? It has over a
hundred stars on GitHub, a few thousand downloads on MELPA, dozens of
people using it in their init files and a handful of people
recommending it to others.” While all of this is true, it’s not all
roses. I occasionally get issues from users that don’t understand it
at all and I can’t really blame them. There is a fundamental mismatch
going on here because all this package does is hijack the
display-buffer-alist variable to invent a similar, but not quite
as powerful mechanism on top of it. It’s an inherently leaky
abstraction which makes for less than ideal debugging: If it ever
breaks down, you’ll have to understand both the abstraction and the
underlying code it’s built upon.
This project started off with me not understanding how to use this
variable at all. In hindsight, this should have been the first
warning signal: If you can’t fully understand the problem, don’t
expect to solve it in a satisfactory manner. There are a few glaring
problems with display-buffer-alist:
- The docstring for it is hard to parse. If a newbie asks how to
customize the display of a certain buffer and is directed to that
variable, I couldn’t blame them for just giving up on this
- It isn’t clear how to display a buffer in a certain way. I’ve found
only one example in the elisp manual so far and it’s more about
display-buffer than display-buffer-alist.
- Conditions may be buffer names and functions, but not major modes.
This is rather annoying as it means you’ll have to write a function
to check the major mode yourself. While this is far from fool-proof
(the code setting up the buffer may enable the desired major mode
only after displaying it), it works in many cases.
- If your customization of display-buffer-alist contains a call to
a function that errors out, the display of that buffer will fail.
This is particularly annoying if you have a catch-all rule there
that prevents the source debugger window from appearing, something I
mostly ran into while developing Shackle. While you can use M-:
(setq display-buffer-alist nil), it’s relatively annoying to do
- The default behavior is rather inscrutable and mostly, but not only,
determined by display-buffer-fallback-action. Worse, some
packages rely on the default behavior, only to fail once
display-buffer-alist has been customized.
Now, does Shackle do better? Well, it does in some ways while being
worse in others:
- Conditions are interpreted as buffer names (if a string) or modes
(if a symbol) or a list of either. While this is convenient, the
original design made it impossible to match by regex or use a custom
function. I therefore added a :regex modifier to the action (which is
just wrong because it changes all of them to match by regex) and
interpret a list starting with :custom as a function, which isn’t
nice either. Judging by GitHub’s search there are about three users
of this functionality, with the most prolific one being doom.
- Shackle tries to be easier to understand with regard to actions by
abolishing the alist approach and instead going for a flat plist.
There is no hierarchy whatsoever, which turned out to be a mistake:
people didn’t understand that there were keywords with
mutually exclusive behavior, keywords that modified other keywords
and keywords that work universally. I’ve had feature requests where
I was asked to allow combining keywords more flexibly, to explain
how the whole thing works and, most surprisingly, to provide a
grammar of the implemented language. The latter found its way into
the README and is more confusing than helpful IMO. If you want to
understand the behavior, you’re best off heading to the source.
I consider this to be the ultimate proof of the design’s failure.
- It’s way harder to shoot yourself in the foot; in case you do, you
can always bail out with M-x shackle-mode and revert to vanilla
- The mere act of enabling Shackle will subtly change the default
behavior of displaying buffers. The reason for this is
shackle--display-buffer-popup-window which tries to do something
sensible, but will never behave like the original.
- I’ve added a feature that doesn’t display a window differently, but
rather modifies the window parameter. Admittedly it makes things
more convenient because you’d otherwise need a second package to
achieve the same effect, but it’s the main reason why displaying
buffers that aren’t meant to be selected has weird side effects.
- Debugging Shackle not working as expected is rather tricky. In the
best case you’ll need to look at the source code of a package to
check whether it’s using display-buffer or a function using it
internally (like pop-to-buffer, pop-to-buffer-same-window,
switch-to-buffer-other-window, etc.). In the worst case you’ll
need to debug the part of the package displaying such windows or
Shackle itself while it tries matching conditions and applying
actions. I’ve added a tracing mode to make the former easier, but
the inherent leaky abstraction remains.
- While Shackle stayed mostly the same, Emacs gained new capabilities
for display-buffer-alist. There isn’t nearly as much reason for
using Shackle now, other than laziness. Other people reached the
same conclusion that it’s worth investing some of your time in
The bottom line is that I’m not happy with Shackle’s design, but am
wise enough to keep it as is and not make any more invasive changes.
My happiness (or the lack thereof) isn’t worth risking the happiness
of its users.
The nice thing about Scheme is that it comes not only in more than
seven standards, but in tons of implementations as well. I’ve
heard from various people that R7RS-small is good enough to write
portable code that works on more than one implementation, so
I decided to try implementing MAL in it. This has been mostly
successful: I started out with 11 implementations, narrowed them
down to 7 and found a new one for future experimentation. During this
time I’ve reported 14 bugs for two implementations, Cyclone and
Foment. My impressions of the implementations can be summarized as
follows:
- The recommended implementation if you’re looking for a fully
standards-compliant one. It comes with its own comprehensive test
suite that has been borrowed by others. I’ve had zero issues with
it, although the overall speed could be better, earning it the last
place in the toy benchmarks (where it’s tied with Foment). This is
somewhat weird considering that it’s advertised as small yet
embeddable (look at Lua for an example of a fast one in that
domain); Picrin would be another candidate for that purpose.
- It proudly wears the GNU banner (despite not being the Scheme
officially endorsed by GNU), but is otherwise completely unknown.
This is a shame because unlike the other JVM languages I’ve worked
with it comes with a fast compiler, has a small boot time and didn’t
pose any issues for me, other than having to learn how this
classpath thing works in Java. Speed is decent as it comes in third
place in the toy benchmarks, and interop doesn’t look painful either.
Its author did a few more interesting things with it, such as
implementing other languages like Emacs Lisp (which forms the basis
of JEmacs). I might just as well invest more time into it and use
it to write things where Clojure would otherwise be a bad idea.
- My favorite implementation for writing all kinds of small,
standalone programs. I occasionally stumble upon issues, but none
for this run. It’s not completely obvious how to create R7RS
programs using libraries from anywhere other than the current
directory; this will hopefully be addressed in the next major
release, along with a more R7RS-like reorganization of the
namespaces and improvements to the package system. Speedwise it’s
the best, despite the r7rs egg being known for incurring a
huge speed penalty in numerical benchmarks.
- The domain says it all. I hadn’t expected it to support R7RS
considering how long it has been around, but it does! Thanks to
this it contains many contrib libraries and performs fairly well for
an interpreter-only system. If I hadn’t discovered CHICKEN, this
might have been my preferred Scheme to go for.
- The other contender in the embeddable domain. Unfortunately I ran
into several problems, the most problematic ones being that it’s
currently not buildable from master and that there are no
official plans to support loading user-written libraries. For this
reason I wouldn’t recommend using it over Chibi.
- A fork of Gauche with more features, some changes (build system,
command-line switches) and a bit of performance improvements.
- Judging from the design documents this is basically CHICKEN, but
with a different approach to GC and threading that allows it to do
native threads. It’s a relatively young project and therefore bound
to have countless bugs, yet it’s second place in the speed
- Someone’s personal project for learning Scheme. It’s about the same
speed as Chibi and not quite finished yet.
- The Scheme officially endorsed by GNU. It has been a long time in the
making, but its R7RS support is minimal and only covers reader
- I’m somewhat surprised there is unofficial support for R7RS, given
that the language itself has been derived from R6RS. Not tested
- Research project that supports R5RS, R6RS and R7RS to varying
degrees. Due to R7RS incompatibilities, somewhat outdated
documentation and a ridiculous toolchain, I’ve decided to test it at
some later point to see whether it becomes the fastest
implementation for MAL.
- I’ve learned about this one from a blog post by Fare, the
maintainer of ASDF, a build system for CL packages. It turns out
that he not only likes Racket, but also switched to this Scheme
implementation because it’s made by a close friend who not only
ported Racket’s module system to Gambit, but also added R7RS
support to it. I expect it to become usable soon and to make for
another contender for the fastest Scheme implementation for MAL.