Emacs Ninja, a blog by Vasilij Schneidermann (feed updated 2022-02-16T23:28:35Z)
<p><strong>Note</strong>: The <tt class="docutils literal">\037</tt> sequence appearing in the code snippets is one
character, escaped for readability.</p>
<p>It’s been eight years since I started using Emacs and Emacs Lisp, and I
still keep running into dusty corners. Traditionally, Lisp dialects
use the semicolon for line comments, with block and s-expression
comments being optional features.</p>
<table border="1" class="docutils">
<colgroup>
<col width="28%" />
<col width="20%" />
<col width="21%" />
<col width="31%" />
</colgroup>
<thead valign="bottom">
<tr><th class="head">Dialect</th>
<th class="head">Line comment</th>
<th class="head">Block comment</th>
<th class="head">S-expression comment</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>Clojure, Hy</td>
<td><tt class="docutils literal">;</tt></td>
<td>n/a</td>
<td><tt class="docutils literal">#_</tt></td>
</tr>
<tr><td>Common Lisp<a class="footnote-reference" href="#forbidden-emacs-lisp-knowledge-block-comments_footnote-1" id="forbidden-emacs-lisp-knowledge-block-comments_footnote-reference-1">[1]</a></td>
<td><tt class="docutils literal">;</tt></td>
<td><tt class="docutils literal"><span class="pre">#|...|#</span></tt></td>
<td><tt class="docutils literal"><span class="pre">#+(or)</span></tt></td>
</tr>
<tr><td>Emacs Lisp, Lush</td>
<td><tt class="docutils literal">;</tt></td>
<td>n/a</td>
<td>n/a</td>
</tr>
<tr><td>ISLisp, LFE, uLisp</td>
<td><tt class="docutils literal">;</tt></td>
<td><tt class="docutils literal"><span class="pre">#|...|#</span></tt></td>
<td>n/a</td>
</tr>
<tr><td>NewLisp</td>
<td><tt class="docutils literal">;</tt>, <tt class="docutils literal">#</tt></td>
<td>n/a</td>
<td>n/a</td>
</tr>
<tr><td>Picolisp<a class="footnote-reference" href="#forbidden-emacs-lisp-knowledge-block-comments_footnote-2" id="forbidden-emacs-lisp-knowledge-block-comments_footnote-reference-2">[2]</a></td>
<td><tt class="docutils literal">#</tt></td>
<td><tt class="docutils literal"><span class="pre">#{...}#</span></tt></td>
<td>n/a</td>
</tr>
<tr><td>Racket, Scheme<a class="footnote-reference" href="#forbidden-emacs-lisp-knowledge-block-comments_footnote-3" id="forbidden-emacs-lisp-knowledge-block-comments_footnote-reference-3">[3]</a></td>
<td><tt class="docutils literal">;</tt></td>
<td><tt class="docutils literal"><span class="pre">#|...|#</span></tt></td>
<td><tt class="docutils literal">#;</tt></td>
</tr>
<tr><td>TXR Lisp</td>
<td><tt class="docutils literal">;</tt></td>
<td>n/a</td>
<td><tt class="docutils literal">#;</tt></td>
</tr>
<tr><td>WAT<a class="footnote-reference" href="#forbidden-emacs-lisp-knowledge-block-comments_footnote-4" id="forbidden-emacs-lisp-knowledge-block-comments_footnote-reference-4">[4]</a></td>
<td><tt class="docutils literal">;;</tt></td>
<td><tt class="docutils literal"><span class="pre">(;...;)</span></tt></td>
<td>n/a</td>
</tr>
</tbody>
</table>
<p>Emacs Lisp is special though. Here’s an unusual section from <a class="reference external" href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Comments.html">the
Emacs Lisp reference on comments</a>:</p>
<blockquote>
The <tt class="docutils literal">#@COUNT</tt> construct, which skips the next COUNT characters,
is useful for program-generated comments containing binary data.
The Emacs Lisp byte compiler uses this in its output files (see
“Byte Compilation”). It isn’t meant for source files, however.</blockquote>
<p>At first sight, this seems useless. The feature is meant to be used
in <tt class="docutils literal">.elc</tt> rather than <tt class="docutils literal">.el</tt> files, and in a file produced by the
byte compiler its only use is to emit docstrings:</p>
<pre class="code elisp literal-block">
<span class="c1">;;; This file uses dynamic docstrings, first added in Emacs 19.29.</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="err">#</span><span class="nv">@11</span> <span class="nv">docstring\037</span>
<span class="p">(</span><span class="nb">defalias</span> <span class="ss">'my-test</span> <span class="err">#</span><span class="p">[</span><span class="o">...</span><span class="p">])</span>
</pre>
<p>This is similar to a block comment, except there is no comment
terminator, so the characters to be commented out need to be
counted. You’d think that the following would work, but it
fails with an “End of file during parsing” error:</p>
<pre class="code elisp literal-block">
<span class="p">(</span><span class="k">defvar</span> <span class="nv">my-variable</span> <span class="err">#</span><span class="nv">@8</span> <span class="p">(</span><span class="nf">/</span> <span class="mi">1</span> <span class="mi">0</span><span class="p">)</span> <span class="mi">123</span><span class="p">)</span>
</pre>
<p>It took me <a class="reference external" href="https://git.savannah.gnu.org/cgit/emacs.git/tree/src/lread.c?id=a602e86bc1c10f44dbe9d2680bece2f552a54707#n378">a dive into the reader</a> to find out why:</p>
<pre class="code c literal-block">
<span class="cp">#define FROM_FILE_P(readcharfun) \
(EQ (readcharfun, Qget_file_char) \
|| EQ (readcharfun, Qget_emacs_mule_file_char))
</span><span class="w">
</span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w">
</span><span class="nf">skip_dyn_bytes</span><span class="w"> </span><span class="p">(</span><span class="n">Lisp_Object</span><span class="w"> </span><span class="n">readcharfun</span><span class="p">,</span><span class="w"> </span><span class="kt">ptrdiff_t</span><span class="w"> </span><span class="n">n</span><span class="p">)</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">FROM_FILE_P</span><span class="w"> </span><span class="p">(</span><span class="n">readcharfun</span><span class="p">))</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="n">block_input</span><span class="w"> </span><span class="p">();</span><span class="w"> </span><span class="cm">/* FIXME: Not sure if it's needed. */</span><span class="w">
</span><span class="n">fseek</span><span class="w"> </span><span class="p">(</span><span class="n">infile</span><span class="o">-></span><span class="n">stream</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">infile</span><span class="o">-></span><span class="n">lookahead</span><span class="p">,</span><span class="w"> </span><span class="n">SEEK_CUR</span><span class="p">);</span><span class="w">
</span><span class="n">unblock_input</span><span class="w"> </span><span class="p">();</span><span class="w">
</span><span class="n">infile</span><span class="o">-></span><span class="n">lookahead</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="k">else</span><span class="w">
</span><span class="p">{</span><span class="w"> </span><span class="cm">/* We're not reading directly from a file. In that case, it's difficult
to reliably count bytes, since these are usually meant for the file's
encoding, whereas we're now typically in the internal encoding.
But luckily, skip_dyn_bytes is used to skip over a single
dynamic-docstring (or dynamic byte-code) which is always quoted such
that \037 is the final char. */</span><span class="w">
</span><span class="kt">int</span><span class="w"> </span><span class="n">c</span><span class="p">;</span><span class="w">
</span><span class="k">do</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">READCHAR</span><span class="p">;</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">'\037'</span><span class="p">);</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span>
</pre>
<p>Due to these encoding difficulties, the <tt class="docutils literal">#@COUNT</tt> construct is always used
with a terminating <tt class="docutils literal">\037</tt> (ASCII unit separator) character. While the
<tt class="docutils literal">FROM_FILE_P</tt> macro seems to apply when the reader is used
with <tt class="docutils literal"><span class="pre">get-file-char</span></tt> or <tt class="docutils literal"><span class="pre">get-emacs-mule-file-char</span></tt> (which
<tt class="docutils literal">load</tt> uses internally), I never managed to trigger that code path.
The reader therefore seems to always ignore the count and skip to the next
<tt class="docutils literal">\037</tt>, essentially turning <tt class="docutils literal">#@COUNT</tt> into a block comment facility.</p>
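<p>The non-file branch of <tt class="docutils literal">skip_dyn_bytes</tt> can be modelled in a few lines of Python. This is a simplified sketch of my own and not Emacs code. <tt class="docutils literal">skip_hash_at</tt> and <tt class="docutils literal">EndOfFileDuringParsing</tt> are made-up names, and the <tt class="docutils literal">#@00</tt> special case is left out. The sketch parses the count, ignores it and scans for <tt class="docutils literal">\037</tt>. This also explains why the <tt class="docutils literal">defvar</tt> example above runs into the end of input:</p>

```python
# Toy model of the reader's handling of "#@COUNT" when it is NOT reading
# directly from a file (the else-branch of skip_dyn_bytes). Simplified:
# the "#@00" special case and the overflow check are not modelled.

US = "\x1f"  # ASCII unit separator, written as \037 in the snippets


class EndOfFileDuringParsing(Exception):
    """Stand-in for Emacs' "End of file during parsing" error."""


def skip_hash_at(src: str, pos: int) -> int:
    """Skip the #@COUNT construct starting at src[pos].

    Returns the index just past the terminating unit separator.
    """
    if not src.startswith("#@", pos):
        raise ValueError("no #@ at the given position")
    i = pos + 2
    count = 0
    while i < len(src) and src[i] in "0123456789":
        count = count * 10 + int(src[i])
        i += 1
    # The count has been read, but this code path never uses it:
    # everything up to and including the next \037 is discarded.
    end = src.find(US, i)
    if end == -1:
        raise EndOfFileDuringParsing(f"no terminator after #@{count}")
    return end + 1
```

<p>With a terminator, the construct behaves like a block comment no matter what the count says. Without one, the scan runs off the end of the input.</p>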
<p>Given this information, one could obfuscate Emacs Lisp code to hide
something unusual going on:</p>
<pre class="code elisp literal-block">
<span class="p">(</span><span class="nf">message</span> <span class="s">"Fire the %s!!!"</span> <span class="err">#</span><span class="nv">@11</span> <span class="s">"rockets"</span><span class="p">)</span><span class="nv">\037</span>
<span class="p">(</span><span class="nf">reverse</span> <span class="s">"sekun"</span><span class="p">))</span>
</pre>
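<p>Conversely, code hidden like this can be surfaced by cutting out every <tt class="docutils literal">#@COUNT</tt> span before reading the source. The following Python sketch uses a hypothetical helper, <tt class="docutils literal">reveal</tt>, and works purely on text. It ignores the count just like the reader does. However, it doesn't know about strings, <tt class="docutils literal">;</tt> comments or the <tt class="docutils literal">#@00</tt> EOF case, so treat it as a rough auditing aid:</p>

```python
import re

# "#@" plus a decimal count, then everything up to and including the
# next unit separator (\037). Non-greedy with DOTALL, so a span may cross
# newlines but stops at the first terminator.
HASH_AT_SPAN = re.compile(r"#@[0-9]+.*?\x1f", re.DOTALL)


def reveal(src: str) -> str:
    """Return src with all #@COUNT ... \\037 spans removed."""
    return HASH_AT_SPAN.sub("", src)


obfuscated = '(message "Fire the %s!!!" #@12 "rockets")\x1f\n(reverse "sekun"))'
print(reveal(obfuscated))
```

<p>Apart from whitespace, what remains is the form Emacs actually evaluates: <tt class="docutils literal">(message "Fire the %s!!!" (reverse "sekun"))</tt>.</p>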
<p>A more legitimate use case is <a class="reference external" href="https://stackoverflow.com/a/6259330/8729149">a multi-line shebang</a>:</p>
<pre class="code elisp literal-block">
<span class="err">#</span><span class="nv">!/bin/sh</span>
<span class="err">#</span><span class="nv">@0</span> <span class="nv">-*-</span> <span class="nv">emacs-lisp</span> <span class="nv">-*-</span>
<span class="nv">exec</span> <span class="nv">emacs</span> <span class="nv">-Q</span> <span class="nv">--script</span> <span class="s">"$0"</span> <span class="nv">--</span> <span class="s">"$@"</span>
<span class="nv">exit</span>
<span class="err">#</span><span class="nv">\037</span>
<span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nf">equal</span> <span class="p">(</span><span class="nf">car</span> <span class="nv">argv</span><span class="p">)</span> <span class="s">"--"</span><span class="p">)</span>
<span class="p">(</span><span class="nb">pop</span> <span class="nv">argv</span><span class="p">))</span>
<span class="p">(</span><span class="k">while</span> <span class="nv">argv</span>
<span class="p">(</span><span class="nf">message</span> <span class="s">"Argument: %S"</span> <span class="p">(</span><span class="nb">pop</span> <span class="nv">argv</span><span class="p">)))</span>
</pre>
<p>In case you want to experiment with this and use correct
counts, here’s a quick and dirty command:</p>
<pre class="code elisp literal-block">
<span class="p">(</span><span class="nb">defun</span> <span class="nv">cursed-elisp-block-comment</span> <span class="p">(</span><span class="nv">beg</span> <span class="nv">end</span><span class="p">)</span>
<span class="p">(</span><span class="k">interactive</span> <span class="s">"r"</span><span class="p">)</span>
<span class="p">(</span><span class="k">save-excursion</span>
<span class="p">(</span><span class="k">save-restriction</span>
<span class="p">(</span><span class="nf">narrow-to-region</span> <span class="nv">beg</span> <span class="nv">end</span><span class="p">)</span>
<span class="p">(</span><span class="nf">goto-char</span> <span class="p">(</span><span class="nf">point-min</span><span class="p">))</span>
<span class="c1">;; account for space and terminator</span>
<span class="p">(</span><span class="nf">insert</span> <span class="p">(</span><span class="nf">format</span> <span class="s">"#@%d "</span> <span class="p">(</span><span class="nf">+</span> <span class="p">(</span><span class="nf">-</span> <span class="nv">end</span> <span class="nv">beg</span><span class="p">)</span> <span class="mi">2</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">goto-char</span> <span class="p">(</span><span class="nf">point-max</span><span class="p">))</span>
<span class="p">(</span><span class="nf">insert</span> <span class="s">"\037"</span><span class="p">))))</span>
</pre>
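<p>One caveat: <tt class="docutils literal">(- end beg)</tt> counts characters, while the file-reading branch of <tt class="docutils literal">skip_dyn_bytes</tt> seeks by bytes. The two disagree as soon as the region contains non-ASCII text. Below is a Python sketch of a byte-based count. It assumes the file is saved as UTF-8, and <tt class="docutils literal">hash_at_comment</tt> is a made-up name:</p>

```python
def hash_at_comment(text: str, encoding: str = "utf-8") -> str:
    """Wrap text in a #@COUNT ... \\037 block comment.

    Like the Emacs command, COUNT covers the space after the number,
    the text itself and the terminator; unlike it, the text is measured
    in encoded bytes instead of characters.
    """
    count = len(text.encode(encoding)) + 2  # space + text + \037
    return f"#@{count} {text}\x1f"


print(repr(hash_at_comment("docstring")))  # '#@11 docstring\x1f'
print(repr(hash_at_comment("Grüße")))      # '#@9 Grüße\x1f'
```

<p>The first result matches the byte compiler output shown earlier. For <tt class="docutils literal">Grüße</tt>, the character-based command would insert 7 instead of 9.</p>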
<p>There’s <a class="reference external" href="https://git.savannah.gnu.org/cgit/emacs.git/tree/src/lread.c?id=a602e86bc1c10f44dbe9d2680bece2f552a54707#n3297">one more undocumented feature, though</a>: <tt class="docutils literal">#@00</tt> is
special-cased as an EOF comment that skips everything up to the end of the input:</p>
<pre class="code c literal-block">
<span class="cm">/* Read a decimal integer. */</span><span class="w">
</span><span class="k">while</span><span class="w"> </span><span class="p">((</span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">READCHAR</span><span class="p">)</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">0</span><span class="w">
</span><span class="o">&&</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="sc">'0'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="sc">'9'</span><span class="p">)</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">((</span><span class="n">STRING_BYTES_BOUND</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">extra</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="mi">10</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="n">nskip</span><span class="p">)</span><span class="w">
</span><span class="n">string_overflow</span><span class="w"> </span><span class="p">();</span><span class="w">
</span><span class="n">digits</span><span class="o">++</span><span class="p">;</span><span class="w">
</span><span class="n">nskip</span><span class="w"> </span><span class="o">*=</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span><span class="w">
</span><span class="n">nskip</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="sc">'0'</span><span class="p">;</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">digits</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">nskip</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w">
</span><span class="p">{</span><span class="w"> </span><span class="cm">/* We've just seen #@00, which means "skip to end". */</span><span class="w">
</span><span class="n">skip_dyn_eof</span><span class="w"> </span><span class="p">(</span><span class="n">readcharfun</span><span class="p">);</span><span class="w">
</span><span class="k">return</span><span class="w"> </span><span class="n">Qnil</span><span class="p">;</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span>
</pre>
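<p>Here is the same loop transliterated to Python. The overflow check is omitted, and <tt class="docutils literal">read_hash_at_count</tt> is my own name. It shows that the EOF case fires as soon as the second digit has been read while the value is still zero. So <tt class="docutils literal">#@000</tt> and <tt class="docutils literal">#@001</tt> behave like <tt class="docutils literal">#@00</tt>, while a single <tt class="docutils literal">#@0</tt>, as in the shebang example, is an ordinary zero count:</p>

```python
from typing import Optional


def read_hash_at_count(after_hash_at: str) -> Optional[int]:
    """Mimic the digit loop for the characters following "#@".

    Returns None for the "skip to end" case, otherwise the parsed count.
    """
    nskip = 0
    digits = 0
    for c in after_hash_at:
        if c not in "0123456789":
            break
        digits += 1
        nskip = nskip * 10 + (ord(c) - ord("0"))
        if digits == 2 and nskip == 0:
            return None  # "#@00": skip to end of input
    return nskip
```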
<p>The EOF comment version can be used to create polyglots. An Emacs Lisp
script could end with <tt class="docutils literal">#@00</tt> and then be concatenated with a file in a format
that tolerates leading garbage. The ZIP format is known for this permissive
behavior, since its central directory sits at the end of the file. This allows you
to embed several resources into one file:</p>
<pre class="code shell-session literal-block">
<span class="gp">[wasa@box ~]$ </span>cat polyglot.el
<span class="gp-VirtualEnv">(message "This could be a whole wordle game")</span>
<span class="go" /><span class="gp-VirtualEnv">(message "I've attached some dictionaries for you though")</span><span class="gp">#</span>@00
<span class="gp">[wasa@box ~]$ </span>cat polyglot.el wordle.zip > wordle.el
<span class="gp">[wasa@box ~]$ </span>file wordle.el
<span class="go">wordle.el: data
</span><span class="gp">[wasa@box ~]$ </span>emacs --script wordle.el
<span class="go">This could be a whole wordle game
I've attached some dictionaries for you though
</span><span class="gp">[wasa@box ~]$ </span>unzip wordle.el
<span class="go">Archive: wordle.el
warning [wordle.el]: 109 extra bytes at beginning or within zipfile
(attempting to process anyway)
inflating: wordle.de
inflating: wordle.uk</span>
</pre>
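<p>The same polyglot can be assembled from Python, which also makes it easy to check that the archive survives the Lisp prefix. This is a sketch: the dictionary contents are placeholders. It relies on Python’s <tt class="docutils literal">zipfile</tt> module tolerating data in front of the archive, much like <tt class="docutils literal">unzip</tt> does above:</p>

```python
import io
import zipfile

# The Lisp part of polyglot.el; the reader stops at "#@00", so the ZIP
# bytes appended afterwards are never parsed as Lisp.
ELISP_PREFIX = (
    b'(message "This could be a whole wordle game")\n'
    b"(message \"I've attached some dictionaries for you though\")#@00\n"
)


def make_polyglot(files: dict) -> bytes:
    """Return ELISP_PREFIX followed by a ZIP archive of files (name -> bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return ELISP_PREFIX + buf.getvalue()


def read_member(polyglot: bytes, name: str) -> bytes:
    """Extract one member; zipfile skips over the leading Lisp code."""
    with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
        return zf.read(name)


blob = make_polyglot({"wordle.de": b"apfel\n", "wordle.uk": b"apple\n"})
print(len(ELISP_PREFIX), read_member(blob, "wordle.uk"))  # 109 b'apple\n'
```

<p>The prefix length of 109 bytes is the same number <tt class="docutils literal">unzip</tt> reports as extra bytes at the beginning.</p>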
<p>This could be combined with the multi-line shebang trick to create a
<a class="reference external" href="https://en.wikipedia.org/wiki/Self-extracting_archive">self-extracting archive format</a>. Or maybe an installer? Or just a
script that can access its own resources? Let me know if you have any
interesting ideas.</p>
<table class="docutils footnote" frame="void" id="forbidden-emacs-lisp-knowledge-block-comments_footnote-1" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#forbidden-emacs-lisp-knowledge-block-comments_footnote-reference-1">[1]</a></td><td>Strictly speaking, <tt class="docutils literal"><span class="pre">#+(or)</span></tt> isn’t a comment, but a
conditional reader construct with an always false feature test.
While one may shorten it to <tt class="docutils literal">#+nil</tt> or <tt class="docutils literal"><span class="pre">#-t</span></tt>, that would be
incorrect because both may be registered features.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="forbidden-emacs-lisp-knowledge-block-comments_footnote-2" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#forbidden-emacs-lisp-knowledge-block-comments_footnote-reference-2">[2]</a></td><td>Here’s a notable exception using the number sign instead. The
semicolon is a function for property access.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="forbidden-emacs-lisp-knowledge-block-comments_footnote-3" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#forbidden-emacs-lisp-knowledge-block-comments_footnote-reference-3">[3]</a></td><td><tt class="docutils literal"><span class="pre">#|...|#</span></tt> and <tt class="docutils literal">#;</tt> are available as of R6RS and R7RS. R5RS
implementations may support them as non-standard extension.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="forbidden-emacs-lisp-knowledge-block-comments_footnote-4" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#forbidden-emacs-lisp-knowledge-block-comments_footnote-reference-4">[4]</a></td><td>Semicolons must be doubled or part of a block comment. This
feels like an unfortunate design choice for implementors.</td></tr>
</tbody>
</table>
Forbidden Emacs Lisp Knowledge: Block Comments, published 2022-02-12T23:26:23+01:00 by Vasilij Schneidermann
<p>I’ve discovered a trivial stored XSS vulnerability in Checkmk 1.6.0p18
during an on-site penetration test and disclosed it responsibly to
tribe29 GmbH. The vendor promptly confirmed the issue, fixed it and
published an advisory. I’ve applied for a CVE, but didn’t get around to
explaining the vulnerability in detail, so I’m publishing this
blog post to complete the process.</p>
<div class="section" id="summary">
<h2>Summary</h2>
<ul class="simple">
<li>CVSS Score: 5.4 (Medium)</li>
<li>CVSS: <tt class="docutils literal">CVSS:3.0/AV:N/AC:L/PR:L/UI:R/S:C/C:L/I:L/A:N</tt></li>
<li>Affected versions: 1.6.0p18 and earlier</li>
<li>Fixed versions: 1.6.0p19, 2.0.0i1</li>
<li>Vendor advisory: <a class="reference external" href="https://checkmk.com/werk/11501">Werk #11501</a></li>
<li>Affected component: <a class="reference external" href="https://github.com/tribe29/checkmk/blob/v1.6.0p18/cmk/gui/htmllib.py#L120-L166">cmk/gui/htmllib.py:Escaper</a></li>
<li>Bug fix: <a class="reference external" href="https://github.com/tribe29/checkmk/commit/87ceb966b1ae46947b696232af84a4f9f0ab74e1">87ceb966</a>, <a class="reference external" href="https://github.com/tribe29/checkmk/commit/e7fd8e4c90be490e4293ec91804d00ec01af5ca6">e7fd8e4c</a></li>
</ul>
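<p>The score can be re-derived from the vector with the CVSS v3.0 base equations. The following Python sketch transcribes the base-metric weights and formulas from the v3.0 specification. Temporal and environmental metrics are ignored, and the function names are mine:</p>

```python
import math

# CVSS v3.0 base metric weights.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Privileges Required is weighted higher when the scope changes.
PR = {"U": {"N": 0.85, "L": 0.62, "H": 0.27},
      "C": {"N": 0.85, "L": 0.68, "H": 0.5}}


def roundup(x: float) -> float:
    """v3.0 Roundup: smallest number with one decimal place >= x."""
    return math.ceil(x * 10) / 10


def base_score(vector: str) -> float:
    prefix, *metrics = vector.split("/")
    if prefix != "CVSS:3.0":
        raise ValueError("expected a CVSS:3.0 vector")
    m = dict(item.split(":") for item in metrics)
    scope = m["S"]  # "U" (unchanged) or "C" (changed)
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if scope == "C":
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * PR[scope][m["PR"]] * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    if scope == "C":
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.0/AV:N/AC:L/PR:L/UI:R/S:C/C:L/I:L/A:N"))  # 5.4
```

<p>The changed scope is what raises the weight of the low privilege requirement to 0.68 and applies the 1.08 factor.</p>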
</div>
<div class="section" id="impact">
<h2>Impact</h2>
<p>The vulnerability requires an authenticated attacker with permission
to configure and share a custom view. Given these prerequisites, they
can inject arbitrary JavaScript into the view title by inserting an
HTML link with a JavaScript URL. If the attacker manages to trick a
user into clicking that link, the JavaScript URL is executed within
the user’s browser context.</p>
<p>A Content Security Policy (CSP) is in place, but it does not prevent inline
JavaScript code in event handlers, links or script tags. An attacker
could therefore obtain confidential user data or perform UI
redressing.</p>
<p>The vulnerable code is also present in versions before 1.6.0p18,
such as 1.6.0 and older. Since it is unclear in which version the
vulnerability was introduced, updating to 1.6.0p19/2.0.0i1 or newer
is recommended.</p>
</div>
<div class="section" id="detailed-description">
<h2>Detailed description</h2>
<p>The Checkmk GUI code uses a WordPress-style approach to handle HTML:
user input is encoded using HTML entities, then selectively decoded
with a regular expression looking for simple tags. As a special case,
the <tt class="docutils literal">&lt;a&gt;</tt> tag gets its <tt class="docutils literal">href</tt> attribute unescaped as well to
enable hyperlinks. The attribute is not checked for its URL scheme,
thereby allowing URLs such as <tt class="docutils literal">javascript:alert(1)</tt>.</p>
<pre class="code python literal-block">
<span class="k">class</span> <span class="nc">Escaper</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="nb">super</span><span class="p">(</span><span class="n">Escaper</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_unescaper_text</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span>
<span class="sa">r</span><span class="s1">'&lt;(/?)(h1|h2|b|tt|i|u|br(?: /)?|nobr(?: /)?|pre|a|sup|p|li|ul|ol)&gt;'</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_quote</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s2">"(?:&quot;|&#x27;)"</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_a_href</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s1">'&lt;a href=((?:&quot;|&#x27;).*?(?:&quot;|&#x27;))&gt;'</span><span class="p">)</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">escape_text</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span>
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">text</span><span class="p">,</span> <span class="n">HTML</span><span class="p">):</span>
<span class="k">return</span> <span class="s2">"</span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">text</span> <span class="c1"># This is HTML code which must not be escaped</span>
<span class="n">text</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">escape_attribute</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
<span class="n">text</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_unescaper_text</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="sa">r</span><span class="s1">'<\1\2>'</span><span class="p">,</span> <span class="n">text</span><span class="p">)</span>
<span class="k">for</span> <span class="n">a_href</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">_a_href</span><span class="o">.</span><span class="n">finditer</span><span class="p">(</span><span class="n">text</span><span class="p">):</span>
<span class="n">text</span> <span class="o">=</span> <span class="n">text</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="n">a_href</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span>
<span class="s2">"<a href=</span><span class="si">%s</span><span class="s2">>"</span> <span class="o">%</span> <span class="bp">self</span><span class="o">.</span><span class="n">_quote</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="s2">"</span><span class="se">\"</span><span class="s2">"</span><span class="p">,</span> <span class="n">a_href</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)))</span>
<span class="k">return</span> <span class="n">text</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s2">"&amp;nbsp;"</span><span class="p">,</span> <span class="s2">"&nbsp;"</span><span class="p">)</span>
</pre>
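<p>To see why this is exploitable, here is a stripped-down Python 3 reproduction of the same escape-then-unescape idea. It is a sketch and not Checkmk’s code: <tt class="docutils literal">html.escape</tt> stands in for <tt class="docutils literal">escape_attribute</tt>, and the tag list is shortened. A link with a <tt class="docutils literal">javascript:</tt> URL comes out as live markup, whereas a script tag stays escaped:</p>

```python
import html
import re

# Simplified stand-ins for the Escaper patterns above; only a handful of
# tags are kept, and html.escape replaces Checkmk's escape_attribute.
UNESCAPE_TAGS = re.compile(r"&lt;(/?)(b|i|tt|a)&gt;")
QUOTE = re.compile(r"(?:&quot;|&#x27;)")
A_HREF = re.compile(r"&lt;a href=((?:&quot;|&#x27;).*?(?:&quot;|&#x27;))&gt;")


def escape_text(text: str) -> str:
    # 1. Encode everything, including quotes, as HTML entities.
    text = html.escape(text, quote=True)
    # 2. Turn a few simple tags back into real markup.
    text = UNESCAPE_TAGS.sub(r"<\1\2>", text)
    # 3. Re-enable links without ever looking at the URL scheme.
    for m in A_HREF.finditer(text):
        text = text.replace(m.group(0), "<a href=%s>" % QUOTE.sub('"', m.group(1)))
    return text


print(escape_text('<a href="javascript:alert(1)">Click me</a>'))
# <a href="javascript:alert(1)">Click me</a>
```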
<p>The above code is used for HTML generation. To exploit it, I started
looking for an HTML form and found that when editing a custom view, no
user input validation is performed on the view title. The view name,
by contrast, is restricted by a regular expression.</p>
<pre class="code python literal-block">
<span class="k">def</span> <span class="nf">page_edit_visual</span><span class="p">(</span><span class="n">what</span><span class="p">,</span>
<span class="n">all_visuals</span><span class="p">,</span>
<span class="n">custom_field_handler</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
<span class="n">create_handler</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
<span class="n">load_handler</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
<span class="n">info_handler</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
<span class="n">sub_pages</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="n">html</span><span class="o">.</span><span class="n">header</span><span class="p">(</span><span class="n">title</span><span class="p">)</span>
<span class="n">html</span><span class="o">.</span><span class="n">begin_context_buttons</span><span class="p">()</span>
<span class="n">back_url</span> <span class="o">=</span> <span class="n">html</span><span class="o">.</span><span class="n">get_url_input</span><span class="p">(</span><span class="s2">"back"</span><span class="p">,</span> <span class="s2">"edit_</span><span class="si">%s</span><span class="s2">.py"</span> <span class="o">%</span> <span class="n">what</span><span class="p">)</span>
<span class="n">html</span><span class="o">.</span><span class="n">context_button</span><span class="p">(</span><span class="n">_</span><span class="p">(</span><span class="s2">"Back"</span><span class="p">),</span> <span class="n">back_url</span><span class="p">,</span> <span class="s2">"back"</span><span class="p">)</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="n">vs_general</span> <span class="o">=</span> <span class="n">Dictionary</span><span class="p">(</span>
<span class="n">title</span><span class="o">=</span><span class="n">_</span><span class="p">(</span><span class="s2">"General Properties"</span><span class="p">),</span>
<span class="n">render</span><span class="o">=</span><span class="s1">'form'</span><span class="p">,</span>
<span class="n">optional_keys</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
<span class="n">elements</span><span class="o">=</span><span class="p">[</span>
<span class="n">single_infos_spec</span><span class="p">(</span><span class="n">single_infos</span><span class="p">),</span>
<span class="p">(</span><span class="s1">'name'</span><span class="p">,</span>
<span class="n">TextAscii</span><span class="p">(</span>
<span class="n">title</span><span class="o">=</span><span class="n">_</span><span class="p">(</span><span class="s1">'Unique ID'</span><span class="p">),</span>
<span class="n">help</span><span class="o">=</span><span class="n">_</span><span class="p">(</span><span class="s2">"The ID will be used in URLs that point to a view, e.g. "</span>
<span class="s2">"<tt>view.py?view_name=<b>myview</b></tt>. It will also be used "</span>
<span class="s2">"internally for identifying a view. You can create several views "</span>
<span class="s2">"with the same title but only one per view name. If you create a "</span>
<span class="s2">"view that has the same view name as a builtin view, then your "</span>
<span class="s2">"view will override that (shadowing it)."</span><span class="p">),</span>
<span class="n">regex</span><span class="o">=</span><span class="s1">'^[a-zA-Z0-9_]+$'</span><span class="p">,</span>
<span class="n">regex_error</span><span class="o">=</span><span class="n">_</span><span class="p">(</span>
<span class="s1">'The name of the view may only contain letters, digits and underscores.'</span><span class="p">),</span>
<span class="n">size</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span>
<span class="n">allow_empty</span><span class="o">=</span><span class="kc">False</span><span class="p">)),</span>
<span class="p">(</span><span class="s1">'title'</span><span class="p">,</span> <span class="n">TextUnicode</span><span class="p">(</span><span class="n">title</span><span class="o">=</span><span class="n">_</span><span class="p">(</span><span class="s1">'Title'</span><span class="p">)</span> <span class="o">+</span> <span class="s1">'<sup>*</sup>'</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span> <span class="n">allow_empty</span><span class="o">=</span><span class="kc">False</span><span class="p">)),</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="p">],</span>
<span class="p">)</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
</pre>
</div>
<div class="section" id="fix">
<h2>Fix</h2>
<p>Checkmk 1.6.0p19 and 2.0.0i1 parse the URL and validate its scheme
against an allowlist before unescaping. JavaScript URLs are therefore
left escaped and not made clickable:</p>
<pre class="code python literal-block">
<span class="k">def</span> <span class="nf">escape_text</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span>
    <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">text</span><span class="p">,</span> <span class="n">HTML</span><span class="p">):</span>
        <span class="k">return</span> <span class="s2">"</span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">text</span> <span class="c1"># This is HTML code which must not be escaped</span>
    <span class="n">text</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">escape_attribute</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
    <span class="n">text</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_unescaper_text</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="sa">r</span><span class="s1">'<\1\2>'</span><span class="p">,</span> <span class="n">text</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">a_href</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">_a_href</span><span class="o">.</span><span class="n">finditer</span><span class="p">(</span><span class="n">text</span><span class="p">):</span>
        <span class="n">href</span> <span class="o">=</span> <span class="n">a_href</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
        <span class="n">parsed</span> <span class="o">=</span> <span class="n">urlparse</span><span class="o">.</span><span class="n">urlparse</span><span class="p">(</span><span class="n">href</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">parsed</span><span class="o">.</span><span class="n">scheme</span> <span class="o">!=</span> <span class="s2">""</span> <span class="ow">and</span> <span class="n">parsed</span><span class="o">.</span><span class="n">scheme</span> <span class="ow">not</span> <span class="ow">in</span> <span class="p">[</span><span class="s2">"http"</span><span class="p">,</span> <span class="s2">"https"</span><span class="p">]:</span>
            <span class="k">continue</span> <span class="c1"># Do not unescape links containing disallowed URLs</span>
        <span class="n">target</span> <span class="o">=</span> <span class="n">a_href</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">target</span><span class="p">:</span>
            <span class="n">unescaped_tag</span> <span class="o">=</span> <span class="s2">"<a href=</span><span class="se">\"</span><span class="si">%s</span><span class="se">\"</span><span class="s2"> target=</span><span class="se">\"</span><span class="si">%s</span><span class="se">\"</span><span class="s2">>"</span> <span class="o">%</span> <span class="p">(</span><span class="n">href</span><span class="p">,</span> <span class="n">target</span><span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">unescaped_tag</span> <span class="o">=</span> <span class="s2">"<a href=</span><span class="se">\"</span><span class="si">%s</span><span class="se">\"</span><span class="s2">>"</span> <span class="o">%</span> <span class="n">href</span>
        <span class="n">text</span> <span class="o">=</span> <span class="n">text</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="n">a_href</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="n">unescaped_tag</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">text</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s2">"&amp;nbsp;"</span><span class="p">,</span> <span class="s2">"&nbsp;"</span><span class="p">)</span>
</pre>
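<p>The core of the fix is the scheme test. For illustration only, here is the same allowlist check as a minimal Python 3 sketch using <tt class="docutils literal">urllib.parse.urlparse</tt>. The patched code itself uses the Python 2 <tt class="docutils literal">urlparse</tt> module. The helper name <tt class="docutils literal">is_allowed_href</tt> is hypothetical and not part of Checkmk.</p>

```python
from urllib.parse import urlparse

# Assumption: same allowlist as the patched escape_text(), where an empty
# scheme (a relative link such as "view.py?...") is also accepted.
ALLOWED_SCHEMES = {"", "http", "https"}


def is_allowed_href(href: str) -> bool:
    """Return True if an escaped link may be turned back into a clickable one."""
    # urlparse() lower-cases the scheme, so "JavaScript:" is rejected as well.
    return urlparse(href).scheme in ALLOWED_SCHEMES


for href in ["https://checkmk.com/", "view.py?view_name=myview",
             "javascript:alert(1)", "JavaScript:alert(1)"]:
    print(href, is_allowed_href(href))
```

<p>Relative links pass because their scheme is empty. This matches the <tt class="docutils literal">parsed.scheme != ""</tt> condition in the patch.</p>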
</div>
<div class="section" id="timeline">
<h2>Timeline</h2>
<ul class="simple">
<li>2020-10-11: Initial contact with vendor</li>
<li>2020-10-12 - 2020-10-14: Further clarification with vendor</li>
<li>2020-10-20: Vendor advisory <a class="reference external" href="https://checkmk.com/werk/11501">Werk #11501</a> has been released</li>
<li>2020-10-26: Vendor notified me about a patch for Checkmk 1.6.0p19</li>
<li>2020-11-17: Applied for CVE</li>
<li>2020-11-18: Received CVE-2020-28919</li>
<li>2021-12-19: Released blog post</li>
<li>2022-01-15: NVD published</li>
</ul>
</div>
tag:https://emacsninja.com,2022-02-16:/posts/cve-2020-28919-stored-xss-in-checkmk-160p18.html2022-02-17T00:28:35+01:00CVE-2020-28919: Stored XSS in Checkmk 1.6.0p182022-02-17T00:28:35+01:00Vasilij Schneidermann
<p>Warning: Rant ahead. Feel free to skip the nstore backend section.</p>
<div class="section" id="motivation">
<h2>Motivation</h2>
<p>I’ve spent the past year looking into the fungi kingdom and the deeper
I look, the weirder it gets. One barrier to entry is identifying
mushrooms, with two different schools of thought:</p>
<ul class="simple">
<li>Carefully observing their features and using a <a class="reference external" href="https://en.wikipedia.org/wiki/Single-access_key">dichotomous key</a>
system to narrow down to a manageable set of matches. I found
<a class="reference external" href="http://www.mushroomexpert.com/major_groups.html">Michael Kuo’s website</a> useful for this.</li>
<li>Taking a few photos and letting a neural network analyze them.</li>
</ul>
<p>I’m not a fan of the latter approach for various reasons. You’re at
the mercy of the training set quality, <a class="reference external" href="https://openai.com/blog/multimodal-neurons/">it’s easy to subvert them</a>
and they’re essentially undebuggable. I also found that Wikipedia has
basic identification data on mushrooms, so I thought it would be a
fun exercise to build my own web application for quickly narrowing
down interesting Wikipedia articles to read. You can find the code
over at <a class="reference external" href="https://depp.brause.cc/brause.cc/wald/">https://depp.brause.cc/brause.cc/wald/</a>, with the web
application itself hosted on <a class="reference external" href="https://wald.brause.cc/">https://wald.brause.cc/</a>.</p>
</div>
<div class="section" id="data-munging">
<h2>Data munging</h2>
<p>Wikipedia’s mushroom articles use so-called mycomorphboxes to hold the
species’ characteristics. Using the Wikipedia API, one can query the latest
revision of every page containing a mycomorphbox template and fetch
its contents in the form of JSON and Wiki markup.</p>
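<p>A request of that kind might look like the following sketch. It assumes the English Wikipedia endpoint and the template name <tt class="docutils literal">Template:Mycomorphbox</tt>. The parameters (<tt class="docutils literal">generator=embeddedin</tt>, <tt class="docutils literal">prop=revisions</tt>) are standard MediaWiki Action API ones. The batch size and continuation handling are illustrative choices, not taken from the project.</p>

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def mycomorphbox_query(continuation=None):
    """Build one request URL for pages embedding the template plus the
    wikitext of their latest revision."""
    params = {
        "action": "query",
        "format": "json",
        # Enumerate the pages transcluding the template ...
        "generator": "embeddedin",
        "geititle": "Template:Mycomorphbox",
        "geinamespace": 0,  # articles only
        "geilimit": 50,  # conservative batch size, since page content is large
        # ... and attach the content of each page's latest revision.
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
    }
    # Merge in the "continue" object of the previous JSON response, if any,
    # to fetch the next batch.
    params.update(continuation or {})
    return API_ENDPOINT + "?" + urlencode(params)


print(mycomorphbox_query())
```

<p>While more pages remain, each JSON response carries a <tt class="docutils literal">continue</tt> object. Passing it back into the function requests the next batch.</p>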
<p>While I like writing scrapers, I dislike that the programs tend to be
messy and require an online connection for every test run. I took the
opportunity to try out the ETL pattern, that is, writing separate programs
that perform the extraction (downloading data from the service without
tripping API limits), transformation (massaging the data
into a form that’s easier to process) and loading (putting the data
into a database). I quite like it. Each part has its own unique
challenges and by sticking to a separate program I can fully focus on
it. Instead of fetching, transforming and loading up the data every
time, I focus on fetching it correctly to disk, then transform the
dump to a more useful form, then figure out how to load it into the
database. If more patterns of that kind emerge, I can see myself
writing utility libraries for them.</p>
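<p>Here is a minimal sketch of that split, with each stage as a separate function (in the project they are separate programs). The fetch function, the <tt class="docutils literal">box</tt> field and the <tt class="docutils literal">traits</tt> table are hypothetical placeholders. The point is that only the extraction stage ever talks to the network. Every later stage can be re-run from the dump on disk.</p>

```python
import json
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def extract(fetch, titles, dump_dir):
    """Extract: download each page once and keep the raw response on disk."""
    for title in titles:
        target = dump_dir / (title + ".json")
        if not target.exists():  # later runs never touch the network again
            target.write_text(json.dumps(fetch(title)))


def transform(dump_dir):
    """Transform: flatten the raw dumps into (species, attribute, value) rows."""
    rows = []
    for path in sorted(dump_dir.glob("*.json")):
        page = json.loads(path.read_text())
        for attribute, value in page["box"].items():
            rows.append((page["title"], attribute, value))
    return rows


def load(rows, db_path):
    """Load: insert the rows into an SQLite database."""
    with closing(sqlite3.connect(db_path)) as db:
        with db:  # commits the inserts on success
            db.execute("CREATE TABLE IF NOT EXISTS traits (species, attribute, value)")
            db.executemany("INSERT INTO traits VALUES (?, ?, ?)", rows)


def fake_fetch(title):
    """Hypothetical stand-in for the real API call."""
    return {"title": title, "box": {"capShape": "convex"}}


with tempfile.TemporaryDirectory() as tmp:
    dump_dir = Path(tmp)
    extract(fake_fetch, ["Amanita muscaria"], dump_dir)
    load(transform(dump_dir), dump_dir / "db.sqlite3")
    with closing(sqlite3.connect(dump_dir / "db.sqlite3")) as db:
        stored = db.execute("SELECT * FROM traits").fetchall()
print(stored)
```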
</div>
<div class="section" id="data-stores">
<h2>Data stores</h2>
<p>There were two obvious choices for storing the data:</p>
<ul class="simple">
<li>Keeping it as JSON and just writing ugly code traversing the parse
tree.</li>
<li>Using SQLite because it’s a fast and reliable solution. That is,
once you’ve come up with a suitable schema fitting the problem at
hand.</li>
</ul>
<p>I wanted to try out something different this time, though - something
other than JSON or a relational database. Perhaps something in the
NoSQL realm that’s both pleasant to use and comes with a query
language. Or maybe some dumb key-value store to speed up loading and
dumping the data. I ended up going with a tuple store, but I’m still
considering giving graph and document databases a try. Here are some
benchmark figures for querying all available filters and filtering
species with a complicated query:</p>
<pre class="code shell-session literal-block">
<span class="gp">[wasa@box ~]$ </span><span class="nb">time</span> <span class="nv">DB</span><span class="o">=</span>json ./benchmark mushrooms.json >/dev/null
<span class="go">Filters: 14898.5027832031μs
Query stats: 1808.65561523438μs
DB=json ./benchmark mushrooms.json > /dev/null 1.37s user 0.09s system 98% cpu 1.480 total
</span><span class="gp">[wasa@box ~]$ </span><span class="nb">time</span> <span class="nv">DB</span><span class="o">=</span>sqlite ./benchmark db.sqlite3 >/dev/null
<span class="go">Filters: 214.554809570313μs
Query stats: 3953.87497558594μs
DB=sqlite ./benchmark db.sqlite3 > /dev/null 0.24s user 0.01s system 96% cpu 0.253 total
</span><span class="gp">[wasa@box ~]$ </span><span class="nb">time</span> <span class="nv">DB</span><span class="o">=</span>nstore ./benchmark db.lmdb >/dev/null
<span class="go">Filters: 355414.137402344μs
Query stats: 407887.70847168μs
DB=nstore ./benchmark db.lmdb > /dev/null 8.15s user 0.05s system 99% cpu 8.250 total</span>
</pre>
<p>Bonus: Rather than hardcoding a storage solution, it should be
possible to choose one at runtime. This would hopefully not
complicate things too much and would encourage cleaner design. For this I
came up with a simple API revolving around establishing/closing a
database connection, performing a transaction on that connection and
querying filters/species on a transaction.</p>
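<p>One way such an API could look, sketched in Python with a toy JSON backend. The names (<tt class="docutils literal">connect</tt>, <tt class="docutils literal">transaction</tt>, <tt class="docutils literal">filters</tt>, <tt class="docutils literal">query</tt>, <tt class="docutils literal">BACKENDS</tt>) are hypothetical. Only the idea of picking the backend through a <tt class="docutils literal">DB</tt> variable is taken from the benchmark invocations above.</p>

```python
import os
from contextlib import contextmanager


class JsonBackend:
    """Toy backend over already-parsed JSON: name -> {attribute: value}."""

    def __init__(self, species):
        self.species = species

    def close(self):
        pass  # nothing to release for in-memory data

    @contextmanager
    def transaction(self):
        yield self  # read-only data: the transaction is the object itself

    def filters(self):
        """Every attribute together with the set of values it takes."""
        result = {}
        for traits in self.species.values():
            for attribute, value in traits.items():
                result.setdefault(attribute, set()).add(value)
        return result

    def query(self, wanted):
        """Sorted names of species matching all attribute=value pairs."""
        return sorted(name for name, traits in self.species.items()
                      if all(traits.get(k) == v for k, v in wanted.items()))


# Hypothetical registry; SQLite or nstore classes with the same methods
# would be registered here as well.
BACKENDS = {"json": JsonBackend}


def connect(source, kind=None):
    """Choose the backend at runtime, defaulting to the DB environment variable.
    source is whatever the backend needs (parsed JSON here)."""
    return BACKENDS[kind or os.environ.get("DB", "json")](source)


SAMPLE = {
    "Amanita muscaria": {"capShape": "convex", "howEdible": "poisonous"},
    "Cantharellus cibarius": {"capShape": "depressed", "howEdible": "choice"},
}
db = connect(SAMPLE, "json")
with db.transaction() as tx:
    print(sorted(tx.filters()["capShape"]))
    print(tx.query({"howEdible": "choice"}))
db.close()
```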
<div class="section" id="json-backend">
<h3>JSON backend</h3>
<p>This was rather braindead code. It’s far from pretty, but does the job
surprisingly well. Queries are acceptably fast, so it makes for a nice
fallback. Initial loading is a bit slow though; a key-value
store like LMDB would help here. Maybe it’s time for a binary Scheme
serialization solution along the lines of Python’s pickle format, but
without <a class="reference external" href="https://blog.trailofbits.com/2021/03/15/never-a-dill-moment-exploiting-machine-learning-pickle-files/">the arbitrary code execution parts</a>…</p>
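<p>For comparison, Python’s own documentation suggests restricting which globals an unpickler may resolve. The sketch below refuses all of them, so only plain data (dicts, lists, strings, numbers) can be loaded. Treat it as an illustration of the idea rather than a vetted security boundary. The class name <tt class="docutils literal">DataOnlyUnpickler</tt> is made up.</p>

```python
import datetime
import io
import pickle


class DataOnlyUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global name."""

    def find_class(self, module, name):
        # Assumption: with no globals, no callable can be reached, so only
        # built-in data types (dict, list, str, int, ...) can be rebuilt.
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def safe_loads(blob):
    return DataOnlyUnpickler(io.BytesIO(blob)).load()


data = {"Amanita muscaria": ["convex", "white"], "count": 1}
restored = safe_loads(pickle.dumps(data))

try:
    # Any class instance, here a date, needs a global lookup and is refused.
    safe_loads(pickle.dumps(datetime.date(2021, 3, 17)))
    rejected = None
except pickle.UnpicklingError as error:
    rejected = str(error)
print(restored, rejected)
```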
</div>
<div class="section" id="sqlite-backend">
<h3>SQLite backend</h3>
<p>It took considerable time to get the schema right. I ended up asking
another SQL expert for help with this and they taught me about
entity-attribute-value (EAV) tables. Another oddity was that the database only performed properly
after running ANALYZE once. The code itself is relatively short, but
makes use of lots of string concatenation to generate the search
query.</p>
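<p>To illustrate the EAV idea and a concatenated search query, here is a small sketch using Python’s <tt class="docutils literal">sqlite3</tt> module. The schema (<tt class="docutils literal">species</tt>, <tt class="docutils literal">traits</tt>) and the attribute names are assumptions for illustration, not the project’s actual tables.</p>

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Entity-attribute-value layout: one row per (species, attribute, value), so
# adding a new characteristic never needs a schema change.
db.executescript("""
    CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE traits (species_id INTEGER REFERENCES species(id),
                         attribute TEXT, value TEXT);
    CREATE INDEX traits_lookup ON traits (attribute, value, species_id);
""")
SAMPLE = {
    "Amanita muscaria": {"capShape": "convex", "howEdible": "poisonous"},
    "Cantharellus cibarius": {"capShape": "depressed", "howEdible": "choice"},
}
for name, traits in SAMPLE.items():
    species_id = db.execute("INSERT INTO species (name) VALUES (?)", (name,)).lastrowid
    db.executemany("INSERT INTO traits VALUES (?, ?, ?)",
                   [(species_id, attr, value) for attr, value in traits.items()])
db.commit()
db.execute("ANALYZE")  # gather the statistics the query planner relies on


def search(db, wanted):
    """Names of species matching every attribute=value pair in wanted."""
    if not wanted:
        return [name for (name,) in db.execute("SELECT name FROM species ORDER BY name")]
    # One sub-select per criterion, glued together by string concatenation;
    # the values themselves are still passed as bound parameters.
    subqueries = " INTERSECT ".join(
        ["SELECT species_id FROM traits WHERE attribute = ? AND value = ?"] * len(wanted))
    params = [item for pair in wanted.items() for item in pair]
    sql = "SELECT name FROM species WHERE id IN (" + subqueries + ") ORDER BY name"
    return [name for (name,) in db.execute(sql, params)]


print(search(db, {"capShape": "convex", "howEdible": "poisonous"}))
```

<p>Each additional criterion adds another <tt class="docutils literal">INTERSECT</tt> branch. Keeping the values as bound parameters avoids SQL injection despite the string building.</p>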
</div>
<div class="section" id="nstore-backend">
<h3>nstore backend</h3>
<p>Retrospectively, this was quite the rabbit hole. I ignored the warning
signs, persisted and eventually got something working. But at what
cost?</p>
<p>My original plan was to use a graph database like Neo4j. I’ve seen it
used for <a class="reference external" href="https://neo4j.com/use-cases/social-network/">analysis of social graphs</a>, <a class="reference external" href="https://github.com/BloodHoundAD/BloodHound">Active Directory networks</a>
and <a class="reference external" href="https://joern.io/">source code</a>. It’s powerful, though clunky and oversized for my
purposes. If I can avoid it, I’d rather not run a separate Java
process and tune its garbage collection settings to play well with
everything else running on my VPS. On top of that I’d need to write a
database adaptor, be it for <a class="reference external" href="https://neo4j.com/docs/http-api/4.2/">their HTTP API</a> or <a class="reference external" href="https://neo4j-client.net/">the Bolt protocol</a>.
If you’re aware of a comparable in-process solution, I’d be all ears.
It doesn’t even need to do graphs (the data set doesn’t have any
connections), a JSON store with a powerful query language would be
sufficient.</p>
<p>I asked the <tt class="docutils literal">#scheme</tt> channel on Freenode about the topic of graph
databases and was told that tuple stores have equivalent power, while
being considerably easier to implement. <a class="reference external" href="https://srfi.schemers.org/srfi-168/srfi-168.html">SRFI-168</a> describes a
so-called nstore and comes with a sample in-memory implementation
depending on <a class="reference external" href="https://srfi.schemers.org/srfi-167/srfi-167.html">SRFI-167</a> and a few others. Getting it running seemed
like an ideal weekend exercise. Or so I thought. I’ve run into the
following obstacles and weeks turned into months of drudgery:</p>
<ul class="simple">
<li>The specifications themselves are of subpar quality. It seems little
proofreading was done. There are minor errors in the language and
several unclear parts and outright design mistakes that render parts
of the library unusable. Unfortunately I noticed this long after the
SRFI had been finalized. While the process allows for errata, it
took some back and forth to get the most egregious faults in
<a class="reference external" href="https://srfi.schemers.org/srfi-167/srfi-167.html">SRFI-167</a> fixed. Some faults remain in <a class="reference external" href="https://srfi.schemers.org/srfi-168/srfi-168.html">SRFI-168</a> and the sample
implementation is incompatible with <a class="reference external" href="https://srfi.schemers.org/srfi-167/srfi-167.html">SRFI-167</a> due to an API change.</li>
<li>There is no such thing as a query language. You get basic pattern
matching and <a class="reference external" href="https://srfi.schemers.org/srfi-158/srfi-158.html">SRFI-158</a> generators. Everything else, like grouping
results or sorting them, you must do yourself. For this reason the
nstore implementation is a bit more verbose than the JSON one.
<a class="reference external" href="https://x32.be/map-reduce.png">Relevant webcomic</a>.</li>
<li>The sample implementation itself depends on several other SRFIs,
most of which I had to port first. Granted, I only did this because
I wanted to contribute them properly to <a class="reference external" href="http://eggs.call-cc.org/5/">the CHICKEN coop</a>, but it
was still bothersome. I hacked on <a class="reference external" href="https://srfi.schemers.org/srfi-125/srfi-125.html">SRFI-125</a>, <a class="reference external" href="https://srfi.schemers.org/srfi-126/srfi-126.html">SRFI-126</a>, <a class="reference external" href="https://srfi.schemers.org/srfi-145/srfi-145.html">SRFI-145</a>,
<a class="reference external" href="https://srfi.schemers.org/srfi-146/srfi-146.html">SRFI-146</a>, <a class="reference external" href="https://srfi.schemers.org/srfi-158/srfi-158.html">SRFI-158</a>, <a class="reference external" href="https://srfi.schemers.org/srfi-167/srfi-167.html">SRFI-167</a>, <a class="reference external" href="https://srfi.schemers.org/srfi-168/srfi-168.html">SRFI-168</a> plus alternative versions
of <a class="reference external" href="https://srfi.schemers.org/srfi-125/srfi-125.html">SRFI-125</a> (using a portable hash tables implementation instead of
the stock one) and <a class="reference external" href="https://srfi.schemers.org/srfi-167/srfi-167.html">SRFI-167</a> (using LMDB for its backend).</li>
<li>Some of the SRFIs were particularly difficult to port. SRFI-125
turned out to work neither with stock hash tables (they’re
incompatible with R6RS-style APIs) nor with the R6RS-style hash table
implementation provided by <a class="reference external" href="https://srfi.schemers.org/srfi-126/srfi-126.html">SRFI-126</a> (the stock implementation fails
with custom comparators and the portable <a class="reference external" href="https://srfi.schemers.org/srfi-69/srfi-69.html">SRFI-69</a> implementation
runs into an infinite loop when executing the test suite). <a class="reference external" href="https://srfi.schemers.org/srfi-167/srfi-167.html">SRFI-167</a>
requires a custom backend for on-disk storage. I initially messed
around with <a class="reference external" href="https://github.com/pmwkaa/sophia">Sophia</a> for this (it turned out to be unusable) and
eventually settled on an LMDB-backed implementation. The <a class="reference external" href="https://srfi.schemers.org/srfi-167/srfi-167.html">SRFI-167</a>
and <a class="reference external" href="https://srfi.schemers.org/srfi-168/srfi-168.html">SRFI-168</a> eggs deviate from the official APIs and have therefore
not been published. For this reason only <a class="reference external" href="https://srfi.schemers.org/srfi-145/srfi-145.html">SRFI-145</a>, <a class="reference external" href="https://srfi.schemers.org/srfi-146/srfi-146.html">SRFI-146</a> and
<a class="reference external" href="https://srfi.schemers.org/srfi-158/srfi-158.html">SRFI-158</a> have been added to the coop.</li>
<li>During the time I worked on the project, some of the links pointing
towards documentation, implementations and example code broke and
pointed nowhere. When I communicated with the author, I got the
impression they had become dissatisfied with the project and wanted
to start over on a clean slate. Links have been replaced, but some
code has been permanently lost. Most recently they admitted they
don’t have any working implementation of <a class="reference external" href="https://srfi.schemers.org/srfi-167/srfi-167.html">SRFI-167</a> and <a class="reference external" href="https://srfi.schemers.org/srfi-168/srfi-168.html">SRFI-168</a> at
hand. I consider this a deeply troubling sign for the health of the
project and therefore discourage anyone from relying on it.</li>
<li>Once I actually got everything running with LMDB for the backing
store, I was surprised to see awful overall performance. Even with
JSON a query takes only a few milliseconds, whereas here it’s two
orders of magnitude more. I did some light profiling and identified
hot paths in both <a class="reference external" href="https://srfi.schemers.org/srfi-128/srfi-128.html">SRFI-128</a> and <a class="reference external" href="https://srfi.schemers.org/srfi-167/srfi-167.html">SRFI-167</a>. For this reason the web
application is currently using the SQLite backend.</li>
<li>The APIs themselves are kind of clumsy. I worked around this with my
data storage abstraction, but it’s still something to look out for.
If you compare it to <a class="reference external" href="http://clojure-doc.org/articles/ecosystem/java_jdbc/home.html">clojure.jdbc</a> or <a class="reference external" href="http://wiki.call-cc.org/eggref/5/sql-de-lite">the sql-de-lite egg</a>,
there are a few obvious usability improvements to be made.</li>
<li>Eventually, after criticism from other people, <a class="reference external" href="https://srfi-email.schemers.org/srfi-167/msg/16229013/">withdrawing the entire SRFI was
considered</a>. It hasn’t been withdrawn so far <a class="reference external" href="https://srfi-email.schemers.org/srfi-167/msg/16229089/">as
the process requires a replacement SRFI</a>. I believe this to be a
mistake.</li>
<li>The SRFI process in general has accelerated in the last few years
due to R7RS-large making heavy use of it for its dockets. There is
the occasional SRFI among them that is too ambitious in scope and
bound to become outdated. I believe this to be an abuse of the
standardization process; instead, there should be experimentation on
a decoupled platform such as <a class="reference external" href="http://snow-fort.org/">Snow</a> or <a class="reference external" href="https://akkuscm.org/">Akku</a>. Once a project has
been approved by the community and implemented by several Scheme
systems, it can be considered for standardization. The <a class="reference external" href="https://github.com/pre-srfi">pre-srfi
repository</a> lists a few upcoming projects of that kind, such as
HTTP servers/clients, a P2P network proposal, a web UI library and
Emacs-style text manipulation. I’m doubtful they will be anywhere near as
successful as existing non-portable Scheme libraries.</li>
</ul>
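<p>The lack of a query language is easiest to see in a toy version. The following is a hypothetical in-memory sketch in Python, not the SRFI-168 API. A real nstore keeps several ordered indices on top of an SRFI-167 key-value store instead of scanning every tuple. The sketch does show pattern matching with variables, and grouping done by hand.</p>

```python
from collections import defaultdict


class Var:
    """Named pattern variable, loosely modelled on SRFI-168's nstore-var."""

    def __init__(self, name):
        self.name = name


def match(pattern, item):
    """Bindings for one tuple, or None if it does not fit the pattern."""
    if len(pattern) != len(item):
        return None
    bindings = {}
    for want, have in zip(pattern, item):
        if isinstance(want, Var):
            # A variable used twice must bind to the same value both times.
            if bindings.setdefault(want.name, have) != have:
                return None
        elif want != have:
            return None
    return bindings


def select(store, pattern):
    """Lazily yield bindings, similar to the generators an nstore returns."""
    for item in store:
        bindings = match(pattern, item)
        if bindings is not None:
            yield bindings


store = {
    ("Amanita muscaria", "capShape", "convex"),
    ("Amanita muscaria", "howEdible", "poisonous"),
    ("Cantharellus cibarius", "capShape", "depressed"),
    ("Cantharellus cibarius", "howEdible", "choice"),
}

# No GROUP BY or ORDER BY: grouping and sorting are done by hand.
by_shape = defaultdict(list)
for b in select(store, (Var("species"), "capShape", Var("shape"))):
    by_shape[b["shape"]].append(b["species"])
grouped = {shape: sorted(names) for shape, names in sorted(by_shape.items())}
print(grouped)
```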
<p>Needless to say, I became increasingly frustrated over time. To
the <a class="reference external" href="https://srfi.schemers.org/srfi-168/srfi-168.html">SRFI-168</a> author’s credit, they’ve always been civil, recognized
the design mistakes and are working on <a class="reference external" href="https://github.com/scheme-live/live/blob/okvslite-and-co/live/okvslite/README.md#status">a less ambitious replacement
library</a>. While I do regret the time that went into this adventure, I
have learned a few lessons:</p>
<ul class="simple">
<li>LMDB and key-value stores in general are great. They’re easy to
comprehend, have fast load times and can be a quick and dirty
solution when dealing with relational models is complete overkill.
I’m not sure whether ordered key-value stores are worth it though.</li>
<li>While it’s true that tuple stores are roughly equivalent in power to
graph databases, <a class="reference external" href="https://neo4j.com/blog/rdf-triple-store-vs-labeled-property-graph-difference/">graph databases still have the edge</a>. Mind you
though, this piece has been written by a Neo4j person, so it’s most
likely biased. Still, I’m inclined to believe their claims.</li>
<li>Portable code is cool, but it cannot compete with highly tuned
solutions. Do not expect a sample implementation of a database to
rival SQLite and friends.</li>
</ul>
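<p>What “ordered” buys you is that keys are kept sorted. Every key sharing a prefix therefore sits in one contiguous range that can be scanned cheaply. Below is a toy Python sketch using <tt class="docutils literal">bisect</tt>; the class and its methods are hypothetical. LMDB achieves the same on disk with a B+ tree.</p>

```python
import bisect


class OrderedKV:
    """Toy in-memory ordered key-value store with byte-string keys."""

    def __init__(self):
        self._keys = []  # kept sorted at all times
        self._values = {}

    def set(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range(self, start, end=None):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = len(self._keys) if end is None else bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]

    def prefix(self, prefix):
        """All pairs whose key starts with prefix, as one contiguous range."""
        stripped = prefix.rstrip(b"\xff")
        if not stripped:
            return self.range(prefix)  # nothing sorts after it: scan to the end
        # Smallest key greater than every key that starts with prefix.
        end = stripped[:-1] + bytes([stripped[-1] + 1])
        return self.range(prefix, end)


kv = OrderedKV()
# Tuple-like keys: species and attribute separated by a zero byte.
kv.set(b"Cantharellus cibarius\x00capShape", b"depressed")
kv.set(b"Amanita muscaria\x00howEdible", b"poisonous")
kv.set(b"Amanita muscaria\x00capShape", b"convex")
amanita = [key for key, _ in kv.prefix(b"Amanita muscaria\x00")]
print(amanita)
```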
</div>
</div>
<div class="section" id="web-frontend">
<h2>Web frontend</h2>
<p>I expected this part to be way harder, but it only took me two days of
hacking without any sort of web framework backing it. I do miss some
of the conveniences I learned from writing Clojure web applications
though:</p>
<ul class="simple">
<li>I had to write my own database abstraction instead of using
<a class="reference external" href="http://clojure-doc.org/articles/ecosystem/java_jdbc/home.html">clojure.jdbc</a> and a connection string. On top of that there’s ugly
code to detect which database to use and perform a dynamic import.</li>
<li><a class="reference external" href="https://github.com/stuartsierra/component">Stuart Sierra’s component library</a> gives you easy dependency
injection. For example, you can access configuration and database
connections from an HTTP handler directly instead of having to use
global or dynamically bound variables.</li>
<li>A <a class="reference external" href="https://github.com/ring-clojure/ring/wiki">ring</a>-style API with a request/response alist and middleware
manipulating them would improve discoverability considerably. It’s
no deal breaker though.</li>
</ul>
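<p>For readers unfamiliar with Ring: handlers are plain functions from a request map to a response map, and middleware are functions that wrap handlers. Below is a minimal Python rendering of the idea, with hypothetical names loosely following Ring’s <tt class="docutils literal"><span class="pre">wrap-*</span></tt> convention. <tt class="docutils literal">wrap_db</tt> also shows how a dependency such as a database connection can reach a handler without global variables.</p>

```python
def species_count(request):
    """Handler: a plain function from a request dict to a response dict."""
    body = "%d species" % len(request["db"])
    return {"status": 200, "headers": {"Content-Type": "text/plain"}, "body": body}


def wrap_db(handler, db):
    """Middleware: hand the database to the handler, no global variable needed."""
    def wrapped(request):
        return handler({**request, "db": db})
    return wrapped


def wrap_content_length(handler):
    """Middleware: post-process the response on its way out."""
    def wrapped(request):
        response = handler(request)
        length = str(len(response["body"].encode("utf-8")))
        return {**response, "headers": {**response["headers"], "Content-Length": length}}
    return wrapped


app = wrap_content_length(wrap_db(species_count, db=["Amanita muscaria", "Cantharellus cibarius"]))
response = app({"method": "GET", "path": "/species"})
print(response)
```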
</div>
<div class="section" id="further-thoughts">
<h2>Further thoughts</h2>
<p>I’d have expected this project to suck any remaining enthusiasm for
writing web applications out of me, but it didn’t. While I’m not sure
whether I’ll stick to Scheme for them, I could see myself doing
another one soonish. I think I’ll abstain from investing more time
into databases though and hack on something else for the time being.</p>
</div>
tag:https://emacsninja.com,2021-03-17:/posts/on-fungi-and-data-stores.html2021-03-17T13:37:18+01:00On Fungi and Data Stores2021-03-17T13:37:18+01:00Vasilij Schneidermann
<p>My relationship with games is complicated. I never had the chance to
get good at them and few of those I’ve played have been any good. Despite that,
I had the urge both to complete them and to discover how they work
internally. As nearly all commercially developed games happen to be
proprietary, I focused on viewing and extracting their asset files, an
art not unlike reverse engineering of executable files.</p>
<p>Fast-forward many years and I still occasionally play games. At least
I have proper tools at hand now and the knowledge to make sense of
binary formats. Another plus is that people have come to discover the
benefits of the open source spirit to collaborate and share their
knowledge online. Recently I’ve taken a closer look at <a class="reference external" href="https://store.steampowered.com/app/415420/Nyan_Cat_Lost_In_Space/">a certain meme
game</a> in my Steam library. Many of its assets (music, sound effects,
fonts and a single texture) are stored as regular files on disk.
However, there’s a 79M asset file, presumably holding the missing
textures for the game sprites and backgrounds. This blog post will
explore its custom format and inner workings in enough detail to write
your own extraction program.</p>
<div class="section" id="reconnaissance">
<h2>Reconnaissance</h2>
<p>For starters, I opened the file in <a class="reference external" href="https://github.com/radareorg/radare2">my favorite hex editor</a>
and browsed through it, looking for obvious patterns such as
human-readable strings, repetitive byte sequences and anything not
looking like random noise. I’ve found the following:</p>
<ul class="simple">
<li>A very short header that doesn’t contain any human-readable file
signatures.</li>
<li>Several file paths, each terminated with a null byte.</li>
<li>Several 16-byte entries, with columns lining up almost perfectly.</li>
<li>Several concatenated files, identified by file signatures for the
<a class="reference external" href="https://en.wikipedia.org/wiki/WebP">WebP</a>, <a class="reference external" href="https://en.wikipedia.org/wiki/Portable_Network_Graphics">PNG</a> and <a class="reference external" href="https://en.wikipedia.org/wiki/XML">XML</a> formats.</li>
</ul>
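<p>Such a scan can also be automated. Below is a Python sketch that relies on the usual magic numbers: WebP files are RIFF containers with a <tt class="docutils literal">WEBP</tt> form type, and PNG files start with the 8-byte PNG signature. It also assumes the XML files begin with an <tt class="docutils literal">&lt;?xml</tt> declaration. The sample bytes are synthetic.</p>

```python
import re

# Magic numbers of the embedded formats (WebP is a RIFF container).
SIGNATURES = {
    "webp": re.compile(rb"RIFF.{4}WEBP", re.DOTALL),
    "png": re.compile(re.escape(b"\x89PNG\r\n\x1a\n")),
    "xml": re.compile(re.escape(b"<?xml")),
}


def find_signatures(blob):
    """Return (offset, format) pairs for every known signature, in order."""
    hits = [(m.start(), kind)
            for kind, pattern in SIGNATURES.items()
            for m in pattern.finditer(blob)]
    return sorted(hits)


sample = (b"\x00" * 12 + b"RIFF" + (4).to_bytes(4, "little") + b"WEBP"
          + b"<?xml version='1.0'?><a/>" + b"\x89PNG\r\n\x1a\n")
print(find_signatures(sample))
```

<p>Matches inside compressed image data are possible. Treat hits as candidates to check by hand, not as proof of a file boundary.</p>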
<p>Here are some screenshots, with the relevant patterns highlighted:</p>
<p>Header and paths section:</p>
<img alt="/img/nyancat-header-paths.png" src="/img/nyancat-header-paths.png" />
<p>Mysterious 16-byte entries, with many even-numbered columns being
zeroes<a class="footnote-reference" href="#lost-in-space_footnote-1" id="lost-in-space_footnote-reference-1">[1]</a>:</p>
<img alt="/img/nyancat-index-patterns.png" src="/img/nyancat-index-patterns.png" />
<p>WebP file header in files section:</p>
<img alt="/img/nyancat-files-webp.png" src="/img/nyancat-files-webp.png" />
<p>XML file header in files section:</p>
<img alt="/img/nyancat-files-xml.png" src="/img/nyancat-files-xml.png" />
<p>PNG file header in files section:</p>
<img alt="/img/nyancat-files-png.png" src="/img/nyancat-files-png.png" />
<p>Given the information so far, several hypotheses can be established:</p>
<ul class="simple">
<li>The number of paths is the same as the number of embedded files and
every path corresponds to an embedded file.</li>
<li>The file contains information about how long each embedded file is.</li>
<li>The mystery section (which I’ll call the index from now on) contains
that information in each of its 16-byte entries.</li>
<li>Each of these entries corresponds to a path and embedded file.</li>
<li>The association between path, entry and embedded file is ordered,
for example the first path corresponds to the first entry and first
embedded file.</li>
</ul>
</div>
<div class="section" id="verification">
<h2>Verification</h2>
<p>Each hypothesis can be checked with basic arithmetic. The most
fundamental assumptions the format relies upon are the number of
paths, index entries and embedded files being the same, and the length
of each embedded file being stored somewhere else in the file,
presumably the index section. I decided to start with the latter, for
which I picked the first embedded file, a WebP image<a class="footnote-reference" href="#lost-in-space_footnote-2" id="lost-in-space_footnote-reference-2">[2]</a>. Its length
can be determined by looking at bytes 4 to 7, decoding them as
an unsigned little-endian 32-bit integer and adding 8 to include the
length of the preceding header. The obtained length can be verified by
seeking to the beginning of the file in the hex editor, then seeking
by the length<a class="footnote-reference" href="#lost-in-space_footnote-3" id="lost-in-space_footnote-reference-3">[3]</a> and checking whether that position corresponds to
the start of the next file. Likewise, the length of a PNG file can be
obtained by looking for the <tt class="docutils literal">IEND</tt> sequence followed by a 32-bit
checksum and for XML files by looking for the closing tag.</p>
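<p>The WebP and PNG length rules, sketched in Python with the <tt class="docutils literal">struct</tt> module. For PNG, the sketch walks the chunks rather than searching for the bytes <tt class="docutils literal">IEND</tt>. Each chunk is a big-endian length, a 4-byte type, the data and a CRC. Walking them avoids false matches of that byte sequence inside compressed image data. The sample files are synthetic.</p>

```python
import struct


def webp_length(blob, offset):
    """RIFF size field (bytes 4 to 7, little-endian) plus the 8 bytes before it."""
    (riff_size,) = struct.unpack_from("<I", blob, offset + 4)
    return riff_size + 8


def png_length(blob, offset):
    """Walk the chunks until IEND; raises struct.error if the data ends first."""
    pos = offset + 8  # skip the 8-byte PNG signature
    while True:
        length, kind = struct.unpack_from(">I4s", blob, pos)
        pos += 4 + 4 + length + 4  # length field, type, data, CRC
        if kind == b"IEND":
            return pos - offset


# Synthetic files: a tiny RIFF/WebP shell and a PNG with zeroed chunk data/CRCs.
webp = b"RIFF" + struct.pack("<I", 10) + b"WEBP" + b"abcdef"
png = (b"\x89PNG\r\n\x1a\n"
       + struct.pack(">I", 13) + b"IHDR" + bytes(13) + bytes(4)
       + struct.pack(">I", 0) + b"IEND" + bytes(4))
blob = webp + png
print(webp_length(blob, 0), png_length(blob, len(webp)))  # 18 45
```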
<p>The first file is 2620176 bytes long and is immediately followed by an
XML file describing it. This length corresponds to either <tt class="docutils literal">0027fb10</tt> or
<tt class="docutils literal">10fb2700</tt> when encoded to hex, depending on whether it’s big- or
little-endian. And indeed, the latter value shows up in the last 4
bytes of the first 16-byte entry. I then verified that this property
holds by extracting the file length from
the second 16-byte entry and applying it to the second embedded file.</p>
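<p>The same conversion in Python, along with reading the length from the last field of a 16-byte entry. Only the last four bytes of the placeholder entry come from the text; the rest are zeroes.</p>

```python
import struct

# 2620176 is the length of the first embedded WebP file.
print(struct.pack(">I", 2620176).hex())  # 0027fb10 (big-endian)
print(struct.pack("<I", 2620176).hex())  # 10fb2700 (little-endian)


def entry_file_length(entry):
    """The file length is the last of four little-endian 32-bit fields."""
    *_unknown, length = struct.unpack("<4I", entry)
    return length


# Placeholder: the first 12 bytes are zeroes, not the values from the file.
entry = bytes(12) + bytes.fromhex("10fb2700")
print(entry_file_length(entry))  # 2620176
```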
<p>This left verifying the number of embedded files by counting the
number of paths and entries in their respective sections. I’ve found
335 of them in each, represented as <tt class="docutils literal">4f010000</tt> using the previously
encountered little-endian hex notation. That number corresponds to
bytes 4 to 7 in the header, leaving two 4-byte numbers around it. I
haven’t been able to deduce the meaning of the preceding one, but the
succeeding one is <tt class="docutils literal">a6210000</tt>, which corresponds to 8614, the total length
of the paths section immediately following the file header, thereby giving me
all information necessary to extract the assets.</p>
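<p>Parsing the header and the paths section could look like the following sketch. It assumes the paths are UTF-8; the sample header and its two paths are made up.</p>

```python
import struct

HEADER = struct.Struct("<3I")  # three unsigned little-endian 32-bit integers


def parse_header(blob):
    """Return (unknown, path count, paths) from the start of the asset file."""
    unknown, count, paths_length = HEADER.unpack_from(blob, 0)
    section = blob[HEADER.size:HEADER.size + paths_length]
    # Paths are null-terminated; keep exactly `count` of them.
    paths = [p.decode("utf-8") for p in section.split(b"\x00")[:count]]
    return unknown, count, paths


# The two values read from the real header, in little-endian hex:
assert struct.unpack("<I", bytes.fromhex("4f010000"))[0] == 335
assert struct.unpack("<I", bytes.fromhex("a6210000"))[0] == 8614

# Synthetic header with two made-up paths (the real file has 335).
paths = b"a.webp\x00b.xml\x00"
sample = HEADER.pack(0, 2, len(paths)) + paths
print(parse_header(sample))
```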
</div>
<div class="section" id="extraction">
<h2>Extraction</h2>
<p>The file format deduced so far:</p>
<pre class="code literal-block">
# header
#   4-byte integer (unknown)
#   4-byte integer (number of filenames)
#   4-byte integer (length of filenames section)
# paths
#   null terminated string (path)
#   repeat count times
# index
#   4-byte integer (unknown)
#   4-byte integer (unknown)
#   4-byte integer (unknown)
#   4-byte integer (file length)
#   repeat count times
# data
#   file length bytes
#   repeat count times
</pre>
<p>Expressed in pseudo code:</p>
<pre class="code python literal-block">
<span class="n">read_integer</span><span class="p">()</span>  <span class="c1"># unknown</span>
<span class="n">filenames_count</span> <span class="o">=</span> <span class="n">read_integer</span><span class="p">()</span>
<span class="n">filenames_length</span> <span class="o">=</span> <span class="n">read_integer</span><span class="p">()</span>
<span class="n">filenames</span> <span class="o">=</span> <span class="n">read_bytes</span><span class="p">(</span><span class="n">filenames_length</span><span class="p">)</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">"</span><span class="se">\x00</span><span class="s2">"</span><span class="p">)</span>
<span class="n">index</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">filenames_count</span><span class="p">):</span>
    <span class="n">read_integer</span><span class="p">()</span>  <span class="c1"># unknown</span>
    <span class="n">read_integer</span><span class="p">()</span>  <span class="c1"># unknown</span>
    <span class="n">read_integer</span><span class="p">()</span>  <span class="c1"># unknown</span>
    <span class="n">file_length</span> <span class="o">=</span> <span class="n">read_integer</span><span class="p">()</span>
    <span class="n">index</span><span class="o">.</span><span class="n">append</span><span class="p">([</span><span class="n">filenames</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">file_length</span><span class="p">])</span>
<span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">index</span><span class="p">:</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">read_bytes</span><span class="p">(</span><span class="n">entry</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
    <span class="n">write_bytes</span><span class="p">(</span><span class="n">entry</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">data</span><span class="p">)</span>
</pre>
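<p>A runnable version of the pseudo code might look like the following sketch. It works on an in-memory <tt class="docutils literal">bytes</tt> object, assumes unsigned little-endian integers and UTF-8 encoded paths, and skips the unknown fields just as the pseudo code does. The function names are hypothetical. Because the paths come from an untrusted file, <tt class="docutils literal">write_files</tt> refuses any path that would land outside the output directory.</p>

```python
import io
import os
import struct
from typing import BinaryIO, Dict


def read_u32(stream: BinaryIO) -> int:
    """Read one 4-byte unsigned little-endian integer."""
    raw = stream.read(4)
    if len(raw) != 4:
        raise EOFError("truncated integer")
    return struct.unpack("<I", raw)[0]


def extract(blob: bytes) -> Dict[str, bytes]:
    """Parse the deduced asset format, returning {path: file contents}."""
    stream = io.BytesIO(blob)
    read_u32(stream)  # unknown header field
    count = read_u32(stream)
    paths_length = read_u32(stream)
    # Null-terminated paths: splitting leaves an empty trailing element.
    paths = stream.read(paths_length).split(b"\x00")[:count]
    lengths = []
    for _ in range(count):
        read_u32(stream)  # unknown
        read_u32(stream)  # unknown
        read_u32(stream)  # unknown
        lengths.append(read_u32(stream))
    files = {}
    for path, length in zip(paths, lengths):
        data = stream.read(length)
        if len(data) != length:
            raise EOFError(f"truncated data for {path!r}")
        files[path.decode("utf-8")] = data
    return files


def write_files(files: Dict[str, bytes], out_dir: str) -> None:
    """Write extracted files below out_dir, refusing paths that escape it."""
    root = os.path.abspath(out_dir)
    for path, data in files.items():
        target = os.path.abspath(os.path.join(root, path))
        if os.path.commonpath([root, target]) != root:
            raise ValueError(f"refusing to write outside {root}: {path}")
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as handle:
            handle.write(data)
```

<p>Usage would be along the lines of <tt class="docutils literal">write_files(extract(blob), "out")</tt>, where <tt class="docutils literal">blob</tt> holds the contents of the asset file.</p>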
<p>A reward you’ve earned:</p>
<img alt="/img/nyancat-texture-yodacat.png" src="/img/nyancat-texture-yodacat.png" />
</div>
<div class="section" id="further-thoughts">
<h2>Further thoughts</h2>
<p>Performing the analysis and writing the extraction program took me a
few hours. It could have been a lot trickier, especially if my goal
had been game modding. That would require extracting the files,
modifying them, then repacking them into the asset file without the
game noticing a change. To do this safely, it’s necessary to perform a
deeper analysis of the unknown fields, for example by checking whether
they match other metadata of each embedded file, or by reverse
engineering the game itself.</p>
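<p>Going the other way, a repacker has to reproduce the same layout. The sketch below assumes, without any verification, that the unknown header and index fields can simply be copied over from the original archive. As noted above, a safe mod would require understanding them first. The <tt class="docutils literal">pack</tt> function and its tuple layout are hypothetical:</p>

```python
import struct
from typing import List, Tuple

# (path, the three unknown index integers, file contents)
Entry = Tuple[str, Tuple[int, int, int], bytes]


def pack(entries: List[Entry], header_unknown: int = 0) -> bytes:
    """Build an archive in the deduced layout: header, paths, index, data."""
    paths = b"".join(path.encode("utf-8") + b"\x00" for path, _, _ in entries)
    # header: unknown field, number of files, length of the paths section
    out = bytearray(struct.pack("<III", header_unknown, len(entries), len(paths)))
    out += paths
    for _, unknown_fields, data in entries:
        # unknown fields are written back unchanged, followed by the file length
        out += struct.pack("<IIII", *unknown_fields, len(data))
    for _, _, data in entries:
        out += data
    return bytes(out)
```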
<p>Another common problem is that data doesn’t always form clear
patterns, for example if it’s encrypted, compressed or random-looking
for other reasons. Sometimes formats are optimized towards programmer
convenience and may store data necessary to verify the asset file
inside the game instead. This would again not pose a challenge to a
reverse engineer, but would still complicate automatic extraction.</p>
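<p>One common way to spot such data is to measure the Shannon entropy per byte. This is a general heuristic and not part of the analysis above. Compressed or encrypted data tends towards the maximum of 8 bits per byte, while text, headers and path tables score noticeably lower. A minimal sketch follows; the function names and the 7.5 threshold are arbitrary choices:</p>

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte, between 0.0 and 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # iterating bytes yields ints 0..255
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(data: bytes, threshold: float = 7.5) -> bool:
    """Heuristic flag for compressed/encrypted-looking blocks.

    The threshold is arbitrary; short inputs can never reach high
    entropy, so feed it reasonably large chunks.
    """
    return byte_entropy(data) >= threshold
```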
<p>Sometimes teamwork is necessary. Chances are that tools have been
developed for popular games and may only need minor adjustments to get
working again. One resource I’ve found immensely helpful to gain a
better understanding of common patterns is <a class="reference external" href="http://wiki.xentax.com/index.php/DGTEFF">The Definitive Guide To
Exploring File Formats</a>.</p>
<table class="docutils footnote" frame="void" id="lost-in-space_footnote-1" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#lost-in-space_footnote-reference-1">[1]</a></td><td><tt class="docutils literal">radare2</tt> can shift the file contents around in visual mode
by using the <tt class="docutils literal">h</tt> and <tt class="docutils literal">l</tt> movement keys. This is useful to
force the entries to align into the expected columns.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="lost-in-space_footnote-2" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#lost-in-space_footnote-reference-2">[2]</a></td><td>The first path suggests a PNG file, but the first embedded file
used the WebP format. This threw me off for a while; my working
theory is that the artist mislabeled WebP files as PNGs and the
game engine they used detected the actual format from the contents
without a hitch. Good for them!</td></tr>
</tbody>
</table>
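<p>Telling the two formats apart by content is straightforward because both start with fixed signatures. PNG files begin with the 8 bytes <tt class="docutils literal">89 50 4E 47 0D 0A 1A 0A</tt>. WebP files begin with <tt class="docutils literal">RIFF</tt>, a 4-byte size, then <tt class="docutils literal">WEBP</tt>. Here is a sketch of such sniffing; it is not the game engine’s actual code, which isn’t known:</p>

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_image_format(data: bytes) -> str:
    """Guess an image format from its leading magic bytes, ignoring the filename."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    # WebP is a RIFF container: 'RIFF', 4-byte little-endian size, 'WEBP'
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return "unknown"
```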
<table class="docutils footnote" frame="void" id="lost-in-space_footnote-3" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#lost-in-space_footnote-reference-3">[3]</a></td><td><tt class="docutils literal">radare2</tt> offers the <tt class="docutils literal">s+</tt> command for this purpose.</td></tr>
</tbody>
</table>
</div>
tag:https://emacsninja.com,2021-01-13:/posts/lost-in-space.html2021-01-13T20:14:01+01:00Lost In Space2021-01-13T20:14:01+01:00Vasilij Schneidermann
<p><strong>Update</strong>: Added <a class="reference external" href="http://rocky.github.io/elisp-bytecode.pdf">a helpful link</a> explaining more opcodes.</p>
<p><strong>Note</strong>: This is an expanded version of <a class="reference external" href="https://old.reddit.com/r/emacs/comments/j2a0jg/how_does_defadvice_work/g75lltq/">this Reddit post</a>.</p>
<p>Advice is one of those Emacs Lisp features that you don’t see often in
other programming languages. It enables you to extend almost any
function you’d like by executing code before/after/instead of it and
messing with arguments/return values. But how does it work? And
which of the two implementations of it should be used?</p>
<div class="section" id="on-advice-el">
<h2>On advice.el</h2>
<p>Somewhat surprisingly, <tt class="docutils literal">advice.el</tt> consists of more than 3000 lines,
but more than half of them are comments. It doesn’t quite reach the
commentary level of literate programming, but it documents its internals
and includes a small tutorial demonstrating how it works. There are many
bells and whistles, but to keep things simple I’ll focus on the part
of the tutorial that changes a function to manipulate its argument
before execution of the function body. The function definition itself can be
obtained programmatically using <tt class="docutils literal"><span class="pre">symbol-function</span></tt>:</p>
<pre class="code elisp literal-block">
<span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span>
<span class="s">"Add 1 to X."</span>
<span class="p">(</span><span class="nf">1+</span> <span class="nv">x</span><span class="p">))</span>
<span class="p">(</span><span class="nf">symbol-function</span> <span class="ss">'foo</span><span class="p">)</span>
<span class="c1">;; => (lambda (x) "Add 1 to X." (1+ x))</span>
</pre>
<p>The example advice <tt class="docutils literal"><span class="pre">fg-add2</span></tt> adds one to <tt class="docutils literal">x</tt> again before the
actual code is run:</p>
<pre class="code elisp literal-block">
<span class="p">(</span><span class="nb">defadvice</span> <span class="nv">foo</span> <span class="p">(</span><span class="nv">before</span> <span class="nv">fg-add2</span> <span class="nv">first</span><span class="p">)</span>
<span class="s">"Add 2 to X."</span>
<span class="p">(</span><span class="k">setq</span> <span class="nv">x</span> <span class="p">(</span><span class="nf">1+</span> <span class="nv">x</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">symbol-function</span> <span class="ss">'foo</span><span class="p">)</span>
<span class="c1">;; #[128 "<bytecode>"</span>
<span class="c1">;; [apply ad-Advice-foo (lambda (x) "Add 1 to X." (1+ x)) nil]</span>
<span class="c1">;; 5 nil]</span>
</pre>
<p>Yikes. How does one make sense of the byte-code?</p>
</div>
<div class="section" id="interlude-byte-code-disassembly">
<h2>Interlude: Byte-code disassembly</h2>
<p>Emacs Lisp has two interpreters: a tree walker (which takes an s-expression as
input, walks along it and evaluates its branches) and a byte-code
interpreter (which takes byte-code and interprets it using a stack-based VM).
<tt class="docutils literal">bytecomp.el</tt> and <tt class="docutils literal"><span class="pre">byte-opt.el</span></tt> transform s-expressions into
optimized byte-code. I can recommend studying these to understand how
a simple compiler works. The result is code expressed in a
stack-oriented fashion using up to 256 fundamental operations<a class="footnote-reference" href="#a-piece-of-advice_footnote-1" id="a-piece-of-advice_footnote-reference-1">[1]</a>.
You can inspect it with the <tt class="docutils literal">disassemble</tt> function, which accepts
both function symbols and function definitions:</p>
<pre class="code elisp literal-block">
<span class="p">(</span><span class="nv">disassemble</span> <span class="p">(</span><span class="nb">lambda</span> <span class="p">()</span> <span class="mi">1</span><span class="p">))</span>
<span class="c1">;; byte code:</span>
<span class="c1">;; args: nil</span>
<span class="c1">;; 0 constant 1</span>
<span class="c1">;; 1 return</span>
</pre>
<p>What happens here is that the constant 1 is pushed to the stack, then
the top of stack is returned. Arguments are treated in a similar
manner:</p>
<pre class="code elisp literal-block">
<span class="p">(</span><span class="nv">disassemble</span> <span class="p">(</span><span class="nb">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">))</span>
<span class="c1">;; byte code:</span>
<span class="c1">;; args: (x)</span>
<span class="c1">;; 0 varref x</span>
<span class="c1">;; 1 return</span>
</pre>
<p>Instead of putting a constant on the stack, the value of <tt class="docutils literal">x</tt> is looked
up and pushed to the stack. Finally, a simple function call looks as
follows:</p>
<pre class="code elisp literal-block">
<span class="p">(</span><span class="nv">disassemble</span> <span class="p">(</span><span class="nb">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nf">message</span> <span class="s">"%S: %S"</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
<span class="c1">;; byte code:</span>
<span class="c1">;; args: (a b)</span>
<span class="c1">;; 0 constant message</span>
<span class="c1">;; 1 constant "%S: %S"</span>
<span class="c1">;; 2 varref a</span>
<span class="c1">;; 3 varref b</span>
<span class="c1">;; 4 call 3</span>
<span class="c1">;; 5 return</span>
</pre>
<p>Four values are pushed on the stack in the order they appear in the call, then a
function is called with three arguments. The four stack values are
replaced with its result, which is then returned. We’re now almost ready to tackle
the actually interesting disassembly; any other
unknown opcodes can be looked up in <a class="reference external" href="http://rocky.github.io/elisp-bytecode.pdf">this unofficial manual</a>.</p>
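<p>To make the stack discipline concrete, here is a toy Python interpreter for just the four opcodes seen so far (<tt class="docutils literal">constant</tt>, <tt class="docutils literal">varref</tt>, <tt class="docutils literal">call</tt>, <tt class="docutils literal">return</tt>). It is a hypothetical model: the encoding of instructions as tuples is made up, Python callables stand in for Lisp functions, and the real interpreter in <tt class="docutils literal">bytecode.c</tt> handles far more:</p>

```python
def run_toy(code, env):
    """Run a list of (opcode, operand) tuples; env maps variable names to values."""
    stack = []
    for op, arg in code:
        if op == "constant":
            stack.append(arg)  # push a literal value
        elif op == "varref":
            stack.append(env[arg])  # look the variable up, push its value
        elif op == "call":
            # `arg` arguments sit on top of the stack, the function right below them
            args = stack[len(stack) - arg:]
            del stack[len(stack) - arg:]
            func = stack.pop()
            stack.append(func(*args))  # function and arguments are replaced by the result
        elif op == "return":
            return stack.pop()  # hand back the top of stack
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("reached the end of the code without a return")


def message_stub(fmt, *args):
    """Crude stand-in for Emacs's `message`, formatting %S like Python's %r."""
    return fmt.replace("%S", "%r") % args


# Mirrors the disassembly of (lambda (a b) (message "%S: %S" a b))
program = [
    ("constant", message_stub),
    ("constant", "%S: %S"),
    ("varref", "a"),
    ("varref", "b"),
    ("call", 3),
    ("return", None),
]
print(run_toy(program, {"a": 1, "b": 2}))  # prints: 1: 2
```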
<p>You may wonder though: why bother? Why not just use <a class="reference external" href="https://github.com/rocky/elisp-decompile">a decompiler</a>?
Or even avoid dealing with byte-compiled code in the first place…
It turns out there are a few reasons in its favor:</p>
<ul class="simple">
<li>Ideally you’d always have access to source code, but this is not always
an option. For example, it’s not unheard of for an Emacs
installation to ship only byte-compiled files (hello Debian).
Likewise, defining advice as above will byte-compile the function.
Byte-compilation is done as a performance enhancement, and
backtraces from byte-compiled functions will contain byte-code.</li>
<li>The byte-code decompiler we have is clunky and incomplete. It
sometimes fails to make sense of byte-code, meaning you cannot rely
on it. Another thing to consider is that byte-code doesn’t have to
originate from the official byte-code compiler; there are other
projects generating byte-code that the decompiler may not target.
If someone wanted to thwart analysis of (presumably malicious)
code, hand-written byte-code would be an option.</li>
<li>Sometimes byte-code is studied to understand the performance of an
Emacs Lisp function. Its cost is easier to reason about than that of
regular code, and it makes <a class="reference external" href="https://nullprogram.com/blog/2017/01/30/">the effects of lexical binding</a> visible.</li>
<li>It’s educational to wade through <tt class="docutils literal">bytecode.c</tt> and other Emacs
internals. While understanding Emacs byte-code itself doesn’t pay off
much, the same lessons apply to other stack-oriented VMs,
such as the JVM. Learning this makes reversing proprietary programs
targeting the JVM (such as Android apps) much easier and enables
advanced techniques such as binary patching<a class="footnote-reference" href="#a-piece-of-advice_footnote-2" id="a-piece-of-advice_footnote-reference-2">[2]</a>.</li>
</ul>
</div>
<div class="section" id="on-advice-el-continued">
<h2>On advice.el (continued)</h2>
<p>We’re ready to unravel what <tt class="docutils literal">foo</tt> does:</p>
<pre class="code elisp literal-block">
<span class="p">(</span><span class="nv">disassemble</span> <span class="ss">'foo</span><span class="p">)</span>
<span class="c1">;; byte code for foo:</span>
<span class="c1">;; args: (x)</span>
<span class="c1">;; 0 constant apply</span>
<span class="c1">;; 1 constant ad-Advice-foo</span>
<span class="c1">;; 2 constant (lambda (x) "Add 1 to X." (1+ x))</span>
<span class="c1">;; 3 stack-ref 3</span>
<span class="c1">;; 4 call 3</span>
<span class="c1">;; 5 return</span>
</pre>
<p><tt class="docutils literal">apply</tt>, <tt class="docutils literal"><span class="pre">ad-Advice-foo</span></tt> and a lambda are placed on the stack.
Then a copy of stack element 3 (counting from the top, zero-indexed) is pushed. We
already know that elements 0, 1 and 2 are the three constants; element
3, however, is the argument list passed to the function. As it turns
out, when lexical binding is enabled, arguments live on the stack and the <tt class="docutils literal"><span class="pre">stack-ref</span></tt> opcode is used
instead of <tt class="docutils literal">varref</tt>. Therefore the byte-code presented is
equivalent to <tt class="docutils literal">(lambda (&rest arg) (apply <span class="pre">'ad-Advice-foo</span> (lambda (x)
"Add 1 to X." (1+ <span class="pre">x)))</span> arg)</tt>. You can verify this by disassembling that
lambda and comparing the output with the previous disassembly.</p>
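<p>The trampoline can be replayed in a toy Python model. This is hypothetical: tuples stand in for instructions and Python functions for the Lisp ones. In it, the function’s own argument list starts out on the stack, which is why <tt class="docutils literal"><span class="pre">stack-ref</span> 3</tt> finds it below the three pushed constants. <tt class="docutils literal">lisp_apply</tt> only roughly models Lisp’s <tt class="docutils literal">apply</tt>:</p>

```python
def run_with_stack_ref(code, args):
    """Toy stack VM where the function's arguments start out on the stack.

    `stack-ref N` pushes a copy of the element N slots below the top
    (0 being the top itself).
    """
    stack = list(args)
    for op, arg in code:
        if op == "constant":
            stack.append(arg)
        elif op == "stack-ref":
            stack.append(stack[-1 - arg])
        elif op == "call":
            call_args = stack[len(stack) - arg:]
            del stack[len(stack) - arg:]
            func = stack.pop()
            stack.append(func(*call_args))
        elif op == "return":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("reached the end of the code without a return")


def lisp_apply(func, *args):
    """Rough model of Lisp `apply`: the last argument is a list to spread."""
    return func(*args[:-1], *args[-1])


def original_foo(x):
    """Add 1 to X."""
    return x + 1


def ad_advice_foo(orig, x):
    """Mirrors ad-Advice-foo: (setq x (1+ x)), then call the original."""
    x = x + 1
    return orig(x)


# Mirrors the disassembly of the advised foo
foo_code = [
    ("constant", lisp_apply),
    ("constant", ad_advice_foo),
    ("constant", original_foo),
    ("stack-ref", 3),  # the &rest argument list, below the three constants
    ("call", 3),       # (apply ad-Advice-foo original-lambda arg-list)
    ("return", None),
]
# foo takes &rest arguments, so its stack starts out holding one list
print(run_with_stack_ref(foo_code, [[1]]))  # prints: 3
```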
<p>What does <tt class="docutils literal"><span class="pre">ad-Advice-foo</span></tt> do though?</p>
<pre class="code elisp literal-block">
<span class="p">(</span><span class="nv">disassemble</span> <span class="ss">'ad-Advice-foo</span><span class="p">)</span>
<span class="c1">;; byte code for ad-Advice-foo:</span>
<span class="c1">;; args: (ad--addoit-function x)</span>
<span class="c1">;; 0 constant nil</span>
<span class="c1">;; 1 varbind ad-return-value</span>
<span class="c1">;; 2 varref x</span>
<span class="c1">;; 3 add1</span>
<span class="c1">;; 4 varset x</span>
<span class="c1">;; 5 varref ad--addoit-function</span>
<span class="c1">;; 6 varref x</span>
<span class="c1">;; 7 call 1</span>
<span class="c1">;; 8 dup</span>
<span class="c1">;; 9 varset ad-return-value</span>
<span class="c1">;; 10 unbind 1</span>
<span class="c1">;; 11 return</span>
</pre>
<p>This is a bit more to unravel. <tt class="docutils literal">varbind</tt> dynamically binds a
variable to the popped value, <tt class="docutils literal">unbind</tt> undoes such bindings, <tt class="docutils literal">varset</tt> is equivalent to
<tt class="docutils literal">set</tt> and <tt class="docutils literal">dup</tt> pushes a copy of the top of stack (much like
<tt class="docutils literal"><span class="pre">stack-ref</span> 0</tt> would). The sequence of <tt class="docutils literal">constant nil</tt> and
<tt class="docutils literal">varbind <span class="pre">ad-return-value</span></tt> is the same as <tt class="docutils literal">(let <span class="pre">((ad-return-value</span>
nil)) <span class="pre">...)</span></tt>. <tt class="docutils literal">x</tt> is retrieved, incremented by 1 and <tt class="docutils literal">x</tt> is set to
the result, therefore <tt class="docutils literal">(setq x (1+ x))</tt>. Then
<tt class="docutils literal"><span class="pre">ad--addoit-function</span></tt> is called with <tt class="docutils literal">x</tt> as argument. Its result
is duplicated and <tt class="docutils literal"><span class="pre">ad-return-value</span></tt> is set to one copy, leaving the other
on the stack as the return value. Finally, <tt class="docutils literal">unbind 1</tt> undoes the one binding
made earlier, that of <tt class="docutils literal"><span class="pre">ad-return-value</span></tt>. Therefore
the byte-code is equivalent to <tt class="docutils literal">(let <span class="pre">(ad-return-value)</span> (setq x (1+
x)) (setq <span class="pre">ad-return-value</span> (funcall <span class="pre">ad--addoit-function</span> <span class="pre">x)))</span></tt>. Let’s
see how <tt class="docutils literal">nadvice.el</tt> fares.</p>
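<p>Before turning to <tt class="docutils literal">nadvice.el</tt>, the binding dance of <tt class="docutils literal"><span class="pre">ad-Advice-foo</span></tt> can be replayed in a toy Python model. Emacs implements dynamic binding as shallow binding with a save stack (the specpdl), and the sketch imitates that idea. <tt class="docutils literal">varbind</tt> saves a variable’s old value before setting the new one, and <tt class="docutils literal">unbind N</tt> restores the N most recent saved bindings. All names here are hypothetical, and this is not a transcription of <tt class="docutils literal">bytecode.c</tt>:</p>

```python
UNBOUND = object()  # marker for "variable had no value before varbind"


def run_dynamic(code, variables):
    """Toy VM with dynamically scoped variables kept in the `variables` dict."""
    stack, bindings = [], []
    for op, arg in code:
        if op == "constant":
            stack.append(arg)
        elif op == "varref":
            stack.append(variables[arg])
        elif op == "varset":
            variables[arg] = stack.pop()
        elif op == "varbind":
            # save the old value on the binding stack, then bind the popped value
            bindings.append((arg, variables.get(arg, UNBOUND)))
            variables[arg] = stack.pop()
        elif op == "unbind":
            for _ in range(arg):
                name, old = bindings.pop()
                if old is UNBOUND:
                    del variables[name]
                else:
                    variables[name] = old
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "add1":
            stack.append(stack.pop() + 1)
        elif op == "call":
            call_args = stack[len(stack) - arg:]
            del stack[len(stack) - arg:]
            func = stack.pop()
            stack.append(func(*call_args))
        elif op == "return":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("reached the end of the code without a return")


# Mirrors the disassembly of ad-Advice-foo
ad_advice_foo_code = [
    ("constant", None), ("varbind", "ad-return-value"),
    ("varref", "x"), ("add1", None), ("varset", "x"),
    ("varref", "ad--addoit-function"), ("varref", "x"), ("call", 1),
    ("dup", None), ("varset", "ad-return-value"),
    ("unbind", 1), ("return", None),
]

variables = {"x": 1, "ad--addoit-function": lambda x: x + 1}
print(run_dynamic(ad_advice_foo_code, variables))  # prints: 3
print("ad-return-value" in variables)  # prints: False (binding undone)
```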
</div>
<div class="section" id="on-nadvice-el">
<h2>On nadvice.el</h2>
<p>It’s tiny compared to <tt class="docutils literal">advice.el</tt>, at only 391 lines of code. To
nobody’s surprise it’s lacking bells and whistles such as changing
argument values directly or not activating advice immediately.
Therefore some adjustments are required to create the equivalent
advice with it:</p>
<pre class="code elisp literal-block">
<span class="p">(</span><span class="nb">defun</span> <span class="nv">foo-advice</span> <span class="p">(</span><span class="nv">args</span><span class="p">)</span>
<span class="p">(</span><span class="nf">mapcar</span> <span class="ss">'1+</span> <span class="nv">args</span><span class="p">))</span>
<span class="p">(</span><span class="nv">advice-add</span> <span class="ss">'foo</span> <span class="nb">:filter-args</span> <span class="ss">'foo-advice</span><span class="p">)</span>
<span class="p">(</span><span class="nf">symbol-function</span> <span class="ss">'foo</span><span class="p">)</span>
<span class="c1">;; #[128 "<bytecode>" [apply foo-advice (lambda (x) "Add 1 to X." (1+ x)) nil] 5 nil]</span>
<span class="p">(</span><span class="nv">disassemble</span> <span class="ss">'foo</span><span class="p">)</span>
<span class="c1">;; byte code for foo:</span>
<span class="c1">;; args: (x)</span>
<span class="c1">;; 0 constant apply</span>
<span class="c1">;; 1 constant (lambda (x) "Add 1 to X." (1+ x))</span>
<span class="c1">;; 2 constant foo-advice</span>
<span class="c1">;; 3 stack-ref 3</span>
<span class="c1">;; 4 call 1</span>
<span class="c1">;; 5 call 2</span>
<span class="c1">;; 6 return</span>
</pre>
<p>We have our three constants and <tt class="docutils literal">x</tt> on the stack. First, a
function is called with one argument: <tt class="docutils literal"><span class="pre">foo-advice</span></tt>
with <tt class="docutils literal">x</tt> (which represents the argument list). Then a function is
called with two arguments: <tt class="docutils literal">apply</tt> with the lambda and the
result of the previous function call. In other words, <tt class="docutils literal">(lambda
(&rest x) (apply (lambda (x) "Add 1 to X." (1+ x)) <span class="pre">(foo-advice</span> <span class="pre">x)))</span></tt>.
It was a bit less convenient to write, but far easier to understand.</p>
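<p>In Python terms, a <tt class="docutils literal"><span class="pre">:filter-args</span></tt> advice wraps the original function so that the advice receives the whole argument list and returns a new one, which is then spread into the original. The sketch below models that combinator with made-up names. The real <tt class="docutils literal"><span class="pre">advice-add</span></tt> produces the byte-code object disassembled above rather than going through a helper like this:</p>

```python
from typing import Callable, List


def filter_args(original: Callable, advice: Callable[[List], List]) -> Callable:
    """Model of :filter-args: (lambda (&rest args) (apply original (advice args)))."""
    def advised(*args):
        return original(*advice(list(args)))
    return advised


def foo(x):
    """Add 1 to X."""
    return x + 1


def foo_advice(args):
    """Counterpart of (mapcar '1+ args)."""
    return [arg + 1 for arg in args]


foo = filter_args(foo, foo_advice)
print(foo(1))  # prints: 3
```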
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p><tt class="docutils literal">nadvice.el</tt> is surprisingly elegant, striking a good balance
between its feature set and technical simplicity. Unless
you maintain a package that must keep compatibility with Emacs 24.3 or
earlier, I don’t see a good reason to go for <tt class="docutils literal">advice.el</tt>.</p>
<table class="docutils footnote" frame="void" id="a-piece-of-advice_footnote-1" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#a-piece-of-advice_footnote-reference-1">[1]</a></td><td>Opcodes for short. Each opcode fits into a single byte, which can represent up to 256 values, hence
the “byte-code” name.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="a-piece-of-advice_footnote-2" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#a-piece-of-advice_footnote-reference-2">[2]</a></td><td>Simple protections rely on checking a conditional and executing
good/bad code. This tends to compile down to a conditional
jump. Switch out the jump opcode for the opposite one and
it will execute bad/good code instead…</td></tr>
</tbody>
</table>
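<p>This patch can be sketched on a toy instruction list using the mnemonics <tt class="docutils literal"><span class="pre">goto-if-nil</span></tt> and <tt class="docutils literal"><span class="pre">goto-if-not-nil</span></tt>. The license check is entirely hypothetical, jump targets are list indices, and Python’s <tt class="docutils literal">None</tt> plays <tt class="docutils literal">nil</tt>. Real Emacs byte-code encodes jump targets as offsets into the function’s byte string, so an actual patch would overwrite the opcode byte there instead:</p>

```python
OPPOSITE_JUMP = {
    "goto-if-nil": "goto-if-not-nil",
    "goto-if-not-nil": "goto-if-nil",
}


def flip_conditional_jump(code, index):
    """Return a copy of code with the conditional jump at index inverted."""
    op, target = code[index]
    if op not in OPPOSITE_JUMP:
        raise ValueError(f"instruction {index} is not a conditional jump: {op}")
    patched = list(code)
    patched[index] = (OPPOSITE_JUMP[op], target)
    return patched


def run_with_jumps(code, variables):
    """Toy interpreter; jump targets are list indices and None plays nil."""
    stack, pc = [], 0
    while True:
        op, arg = code[pc]
        pc += 1
        if op == "constant":
            stack.append(arg)
        elif op == "varref":
            stack.append(variables[arg])
        elif op == "goto-if-nil":
            if stack.pop() is None:
                pc = arg
        elif op == "goto-if-not-nil":
            if stack.pop() is not None:
                pc = arg
        elif op == "return":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")


# Hypothetical check: (if license-ok "good" "bad")
check = [
    ("varref", "license-ok"),
    ("goto-if-nil", 4),
    ("constant", "good"),
    ("return", None),
    ("constant", "bad"),
    ("return", None),
]
patched = flip_conditional_jump(check, 1)
print(run_with_jumps(check, {"license-ok": None}))    # prints: bad
print(run_with_jumps(patched, {"license-ok": None}))  # prints: good
```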
</div>
tag:https://emacsninja.com,2020-11-14:/posts/a-piece-of-advice.html2020-11-14T20:48:00+01:00A Piece of Advice2020-11-14T20:48:00+01:00