Fixing My #1 Annoyance With Clojure

03/08/2018

Clojure is quite something. It is immutable by default, encourages functional-style programming and comes with many useful libraries written by a nice community. There are ugly sides to it as well, like the terrible error reporting, dealing with Java things and waiting for your tooling to boot. However, my greatest annoyance with it is that my preferred debugging workflow, printing out the problematic thing, is far more painful than it should be[1]. Suppose you have the following piece of code and want to check the result of the last form in the let.

(defn frobnicate [x]
  (let [y ...
        z ...]
    (filter foo? (map bar (baz x)))))

This won’t do, as prn returns nil and the function would therefore return nil as well:

(defn frobnicate [x]
  (let [y ...
        z ...]
    (prn "XXX" (filter foo? (map bar (baz x))))))

This works, but what if the expression has a side effect or is computationally expensive?

(defn frobnicate [x]
  (let [y ...
        z ...]
    (prn "XXX" (filter foo? (map bar (baz x))))
    (filter foo? (map bar (baz x)))))

This is quite good, but annoying to type out and undo after it’s no longer needed:

(defn frobnicate [x]
  (let [y ...
        z ...
        xxx (filter foo? (map bar (baz x)))]
    (prn "XXX" xxx)
    xxx))

I’d love to have this:

(defn frobnicate [x]
  (let [y ...
        z ...]
    (dbg (filter foo? (map bar (baz x))))))

The idea is borrowed from Smalltalk, more specifically SuperCollider, where you can just call .debug on an object to see its value printed. Unlike a regular print function, this one returns the object and can therefore be inserted into a call chain just fine. Bonus: It accepts an optional argument for printing a prefix so that you can identify the debug output easily. Translating this to Clojure is as easy as it gets[2]:

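;; clojure.pprint is not guaranteed to be loaded yet, so require it first
(require 'clojure.pprint)

(defn dbg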
  ([thing]
   (dbg "XXX" thing))
  ([prefix thing]
   (println prefix)
   (print (with-out-str (clojure.pprint/pprint thing)))
   thing))

Now, this is handy. You can surround any expression with (dbg ...) and, if you wish, add a prefix as the first argument. Removing the debug print is as easy as raising the enclosed S-expression[3]. But how do you get this into a Clojure session? Once you eval this in a namespace, the function belongs to that namespace; to use it in another one you’d need to either require it from that namespace or define it again in the other one. Another issue is that you’d need to edit the sources of a Clojure project to make use of this helper. Surely you can do better than that?

Studying the Leiningen documentation and the official Clojure docs on namespaces, I learned about two things:

  • Leiningen supports the :injections key which evaluates a vector of forms in the project context[4]
  • The lowest-level function to manipulate a different namespace than the current one is intern[5]

Put both together and you get the following snippet for your ~/.lein/profiles.clj:

{:user {:injections [(defn dbg ...)
                     (intern 'clojure.core 'dbg dbg)]}}

This is close to perfect. It only works for projects using Leiningen, obviously, and displays a warning as the code is run twice, but it works nicely in any namespace!

[1]I’ve got to admit, this is quite petty. If I managed to learn how to read ugly Java backtraces and studied the wonders of the Java class path, how could printf-debugging annoy me to this extent? I believe the conventional wisdom that it’s hardly worth automating tasks with little time savings misses the point. Fixing long-standing annoyances, however, is a worthy goal. Better be happy than bitter about your setup.
[2]The eagle-eyed reader will notice that you could get by with a mere (clojure.pprint/pprint thing). The reason for the above is that I’ve encountered rather discomforting behavior in a codebase where pretty-printed output ended up interleaved with logging output. The easy workaround is making the pretty-printing atomic by collecting it into a string, then printing it out.
[3]If you use Paredit or Smartparens, it’s as easy as hitting M-r with point on the form you want to replace the outside one with.
[4]This doesn’t say anything about how often it’s actually evaluated, so better put an idempotent expression there.
[5]intern in Emacs Lisp merely converts a string to a symbol. In CL it does the same, but allows specifying what package the symbol should belong to. In Clojure it takes a namespace, a name and a value…

Highlight Text Manually In LaTeX Slides

29/04/2018

Sometimes I’ve had a text snippet where a splash of color would explain things far more easily than creating a visualization or using a laser pointer. Imagine something like a backtrace, with different parts highlighted in different colors. This can be done pretty easily, even when using LaTeX indirectly, like when compiling it from Org in Emacs.

The following trick relies on using the minted package for highlighting. It supports an option for embedding LaTeX escapes into your code snippets. The documentation shows off a mathematical formula in a comment; however, we can do far more, like using the \textcolor command from the xcolor package to insert colored text. Have a silly example:

\setminted{escapeinside=||}
\definecolor{green}{HTML}{218A21}

...

\begin{minted}[]{text}
|\textcolor{red}{RR}|
|\textcolor{green}{GG}|
|\textcolor{blue}{BB}|
\end{minted}

In an Org file you’d have to do a bit less typing (assuming you customized Org to always use minted):

#+LATEX_HEADER: \setminted{escapeinside=||}
#+LATEX: \definecolor{green}{HTML}{218A21}

...

#+BEGIN_SRC text
|\textcolor{red}{RR}|
|\textcolor{green}{GG}|
|\textcolor{blue}{BB}|
#+END_SRC

Bridging the Ancient and the Modern

23/03/2018

I tried out some new social networks lately. Mastodon I quite like (it’s like what I’ve wanted Twitter to be), Discord, not so sure. So, if you’ve wondered about my reduced presence on IRC, that’s why.

Writing an IRC bot is one of the classic programming exercises that can be done in pretty much every programming language offering you some way to open TCP sockets and manipulate strings. I started doing one in Emacs Lisp a long time ago, although not from scratch (rather by leveraging an existing IRC client), and wondered whether there is anything to learn from doing the equivalent with a “modern” IM platform like Discord. Could it still be done from scratch? What else is different about it?

First I had to find a meaningful thing for the bot to do. I chose Eliza, the classic demonstration of a chatter bot that managed to fool people into having prolonged conversations with it. The version I’m familiar with is M-x doctor, which is part of Emacs. So, first of all, I wrote some code to interface with that command in a REPL-style fashion. A companion shell script boots up Emacs in batch mode for interfacing with the doctor from the shell. Much like in the original mode, you terminate your input by pressing RET twice. This is an intentional design decision to allow for multi-line input as seen on Discord (but absent from IRC, where you could get away with making it single-line input).

I briefly entertained the thought of writing the rest of the bot from scratch in Emacs Lisp, but abandoned it after learning that I’d need to use websockets with zlib compression to subscribe and respond to incoming messages. While there is an existing library for websocket support, I’d rather not figure out its nitty-gritty details, let alone work around its lack of zlib compression. It doesn’t help that Discord’s official API docs are inconclusive and fail to answer questions such as how you can set your current status (and more importantly, why it fails to get updated). So, an officially recommended Discord library it is.

Which one to pick depended on whether the programming language it’s implemented in allowed me to communicate with my shell script. I tried out discord.js first, battled a fair bit with Node.js, but gave up eventually. There doesn’t seem to be a way to spawn a child process and read from / write to its stdout / stdin pipes as you see fit. Instead you can only add a callback for the process output, and good luck if you want to figure out what piece of output corresponds to the input you wrote earlier. This is why I went for discordrb instead, wrote some glue code for subprocess communication and started figuring out its API to react to incoming messages.
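To illustrate what that kind of glue code has to do, here is a minimal sketch in Python (not the Ruby code the bot actually uses). It assumes the protocol described above: input is terminated by an empty line, and, as an assumption of this sketch, the reply is terminated the same way. CHILD_SOURCE is a made-up stand-in for the doctor script and LineProcess is a hypothetical helper name.

```python
import subprocess
import sys

# Stand-in child process (an assumption for this sketch): it upper-cases each
# blank-line-terminated message and answers with a blank-line-terminated reply.
CHILD_SOURCE = r"""
import sys
lines = []
while True:
    line = sys.stdin.readline()
    if not line:
        break
    if line == "\n":
        sys.stdout.write(" ".join(lines).upper() + "\n\n")
        sys.stdout.flush()
        lines = []
    else:
        lines.append(line.rstrip("\n"))
"""


class LineProcess:
    """Talks to a child process over its stdin/stdout pipes, one message at a time."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line buffered on our side
        )

    def ask(self, message):
        # An empty line marks the end of the (possibly multi-line) input.
        self.proc.stdin.write(message.rstrip("\n") + "\n\n")
        self.proc.stdin.flush()
        reply = []
        while True:
            line = self.proc.stdout.readline()
            if line in ("", "\n"):  # EOF or end-of-reply marker
                break
            reply.append(line.rstrip("\n"))
        return "\n".join(reply)

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()
```

Because each ask blocks until the terminating empty line arrives, every reply is tied to the input that caused it, which is exactly what a bare output callback doesn’t give you.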

There are a few lessons to be learned from their API:

  • Allow adding as many event handlers as you want for specific events, with convenience options for narrowing down to commonly needed situations (like messages starting with a prefix)
  • Inside these event handlers, provide an object containing all context you’d need, including the information who to respond to
  • Keep the bot alive when an event handler errors out
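
The three lessons above can be sketched in a few lines of Python. This is not discordrb’s API; Bot, MessageEvent, on_message and the handlers are names made up for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MessageEvent:
    """Carries all the context a handler needs, including where to respond."""
    author: str
    channel: str
    content: str
    replies: list = field(default_factory=list)

    def respond(self, text):
        # A real bot would send this to the channel; the sketch records it.
        self.replies.append((self.channel, text))


class Bot:
    def __init__(self):
        self._handlers = []
        self.errors = []

    def on_message(self, prefix=""):
        """Register any number of handlers, optionally limited to a prefix."""
        def register(handler):
            self._handlers.append((prefix, handler))
            return handler
        return register

    def dispatch(self, event):
        for prefix, handler in self._handlers:
            if not event.content.startswith(prefix):
                continue
            try:
                handler(event)
            except Exception as error:
                # A failing handler is recorded instead of killing the bot.
                self.errors.append((handler.__name__, error))


bot = Bot()


@bot.on_message(prefix="!doctor ")
def consult(event):
    complaint = event.content[len("!doctor "):]
    event.respond(event.author + ": why do you say " + complaint + "?")


@bot.on_message()
def broken(event):
    raise RuntimeError("handler bug")
```

Dispatching a message that starts with !doctor runs both handlers: consult responds, the exception raised by broken ends up in bot.errors, and the bot keeps going.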

Now, to test the bot I needed to unleash it on a server. It turns out that, unlike on IRC, bot accounts are handled specially. You must:

  • Register an application and obtain an ID for authorization purposes
  • Enable a bot account and obtain an authorization token
  • Generate a URL for inviting the bot to a server
  • Share that URL with someone wielding the necessary permissions
  • Hope for the best and wait for them to invite the bot

This meant that I had to create my own test server to check whether my code worked at all. For the same reason I haven’t been able to get it running on the server I was invited to. If you want to, you can try it on your own server; the sources are on GitHub as always.


Cryptopals

03/03/2018

Solving the cryptopals crypto challenges was easily the most fun I’ve had programming. If you happen to work on public-facing code that relies on cryptography, by all means do these challenges. There is no crazy math involved[1] and the only prerequisite is that you’re familiar with any programming language. My solutions can be found on GitHub and include notes on each exercise, some of which spoil the puzzle bits.

I’ve learned the following while completing the original set of 48 exercises:

  • You should under no circumstances use ECB as cipher mode
  • Padding is a crucial thing to get right, both when attacking cryptographic systems and when implementing them
  • An attacker can bitflip your ciphertexts into anything they want and the only thing you can do about it is checking whether they’ve been tampered with before decrypting (like with a MAC or signature)
  • Do not ever reuse a nonce or you’ll weaken your crypto drastically
  • It’s much easier to exploit a sidechannel than attacking the cryptographic primitive
  • Overly detailed error messages can form a sidechannel
  • Do not seed a RNG with the current time, use your system’s CSPRNG instead
  • Do not use MT19937 for cryptographic purposes, given enough observation its next values can be predicted
  • Do not reuse your key as IV
  • Don’t invent your own MAC scheme, it may be susceptible to length extension attacks
  • Even something like a timing leak can form an exploitable sidechannel that circumvents the cryptographic system
  • Diffie-Hellman is susceptible to MITM attacks
  • Make sure to verify the parameters in asymmetric protocols for values that make the shared secret predictable and abort when encountering one
  • Don’t do textbook RSA, padding is crucial
  • Do not use low exponents with RSA
  • Do not use PKCS#1 v1.5 padding with RSA
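
To make the two RNG items in the list above concrete, here is a small sketch in Python, whose random module happens to be MT19937. It is not taken from my solutions; weak_token, recover_seed and strong_token are hypothetical names, and the one hour search window is an arbitrary assumption.

```python
import random
import secrets


def weak_token(now):
    """What not to do: seed MT19937 with the current Unix time."""
    rng = random.Random(int(now))
    return rng.getrandbits(64)


def recover_seed(token, around, window=3600):
    """Brute-force the seed by trying every second close to a guessed time."""
    start = int(around) - window
    for candidate in range(start, int(around) + window + 1):
        if random.Random(candidate).getrandbits(64) == token:
            return candidate
    return None


def strong_token():
    """Ask the operating system's CSPRNG instead."""
    return secrets.token_bytes(16)
```

An attacker who knows roughly when a token was issued only has a few thousand seeds to try. Once the seed is found, every later output of that generator is known as well.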

One more thing that doesn’t fit into a short sentence. You’ve most certainly heard the advice “Don’t implement your own crypto”. This advice isn’t the whole truth because it doesn’t explain what exactly “your own crypto” means. Cryptography in software consists of primitives that are put together to achieve something useful, such as a block cipher and a MAC to form authenticated encryption. These primitives may be considered safe in isolation; however, that doesn’t mean their combination will be equally safe. These combinations are called cryptographic systems and the security of one relies upon making sure none of its invariants are violated. Therefore, creating your own cryptosystem out of stock crypto primitives also counts as “your own crypto” and is rightfully considered dangerous. Your best bet is to use a vetted library that has been designed so that it’s hard to use incorrectly, such as libsodium.
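
Here is a sketch of how a combination can fail even when its parts are fine, using only the Python standard library. The keystream below is a toy built from SHA-256 purely for illustration (so it is itself “your own crypto” and not meant for real use), and all function names are made up. Encryption alone lets an attacker flip bits at will; adding an HMAC over nonce and ciphertext and checking it before decrypting catches the tampering.

```python
import hashlib
import hmac
import secrets


def keystream(key, nonce, length):
    """Toy keystream: SHA-256 over key, nonce and a counter (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(enc_key, nonce, plaintext):
    return xor(plaintext, keystream(enc_key, nonce, len(plaintext)))


decrypt = encrypt  # XORing with the same keystream undoes the encryption


def seal(enc_key, mac_key, plaintext):
    """Encrypt, then authenticate nonce and ciphertext with an HMAC."""
    nonce = secrets.token_bytes(16)  # fresh nonce for every message
    ciphertext = encrypt(enc_key, nonce, plaintext)
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce, ciphertext, tag


def unseal(enc_key, mac_key, nonce, ciphertext, tag):
    """Verify the tag in constant time before touching the ciphertext."""
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("message was tampered with")
    return decrypt(enc_key, nonce, ciphertext)


def flip(ciphertext, offset, old, new):
    """Attacker's move: turn `old` into `new` at `offset` without knowing the key."""
    end = offset + len(old)
    patched = xor(ciphertext[offset:end], xor(old, new))
    return ciphertext[:offset] + patched + ciphertext[end:]
```

Flipping user into root in the ciphertext of role=user; yields role=root; after decryption if only decrypt is used, whereas unseal refuses the forged message.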

[1]The challenges like to emphasize that it’s only 9th grader math. This is almost correct, you’ll want to look up basic statistics (which I’ve had in 12th grade) and modular arithmetic (which I’ve had at college).

Hand-crafted Uberjars

26/02/2018

While making a MIDI REPL I ran into the problem of providing something self-contained to people interested in trying it out, so that they wouldn’t have to recreate my dev setup. The solution for this is making a JAR file containing the .class files of your project and all of its dependencies. There are tools for this purpose such as Ant and Maven, but I couldn’t figure out how to make them work for me, so I decided to take a closer look at what happens behind the scenes and created a simple Makefile.

A JAR file is just a ZIP archive following certain rules. To be runnable, it must contain a manifest at META-INF/MANIFEST.MF (the jar tool can generate it from a plain text file such as manifest.txt) and class files inside directories mirroring the package structure. The only difference between a regular JAR and an Uberjar is that the latter will also include the class files of its dependencies. Tools for creating them will have to extract the class files from all JAR files involved and combine them into a directory tree before creating a new JAR file containing all required class files. Things can get ugly if your dependencies share the same package prefix (such as org.foo and org.bar) or if a dependency exists in multiple versions (such as org.foo depending on npm.leftpad-0.0.1 and org.bar depending on npm.leftpad-0.0.2)[1]. I don’t even attempt to deal with these.

The manifest is a text file following a fixed format. The only thing you can get wrong here is the entry point, which must be the name of a class containing a static main method. It’s required so that java -jar my.jar knows where to look; however, the entry point can be overridden by running java -cp my.jar <classname> instead. This is useful for debugging and allows you to add other dependency JARs to the classpath that you haven’t put into your Uberjar yet. Just change the argument to -cp to be a colon-separated list of JARs (semicolon-separated on Windows).

The dependencies are unzipped into a temporary directory. The jar tool supports changing the working directory (with its -C option) so that you can switch to that directory and add the extracted directories without any prefix. That’s everything necessary to create a runnable JAR!
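
For comparison, the same merging step can be sketched with Python’s zipfile module instead of make and the jar tool. This is an illustration rather than my Makefile: build_uberjar is a hypothetical helper, the manifest is stored as META-INF/MANIFEST.MF inside the archive, the first copy of a duplicated file wins, signature files of dependencies are dropped, and long manifest values are not wrapped.

```python
import zipfile

# Signature files would no longer match the merged archive, so they are skipped.
SIGNATURE_SUFFIXES = (".SF", ".DSA", ".RSA")


def build_uberjar(output_path, main_class, jar_paths):
    """Merge the given JARs into one runnable JAR and return the copied entry names."""
    manifest = "Manifest-Version: 1.0\nMain-Class: " + main_class + "\n"
    seen = {"META-INF/MANIFEST.MF"}
    written = []
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as out:
        out.writestr("META-INF/MANIFEST.MF", manifest)
        for jar_path in jar_paths:
            with zipfile.ZipFile(jar_path) as jar:
                for info in jar.infolist():
                    name = info.filename
                    # Skip directories, duplicates and the dependency's own manifest.
                    if info.is_dir() or name in seen:
                        continue
                    if name.startswith("META-INF/") and name.upper().endswith(SIGNATURE_SUFFIXES):
                        continue
                    out.writestr(name, jar.read(name))
                    seen.add(name)
                    written.append(name)
    return written
```

Listing the project’s own JAR first makes its classes take precedence over identically named files in the dependencies, which is about as much conflict handling as this sketch attempts.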

[1]The way to deal with them is using a custom class loader, as demonstrated by yet another product in the problem space.