Farewell, GitHub

16/05/2020

I’m sorry for the dramatic title. Note that I won’t delete my GitHub account or stop using it; there are plenty of projects hosted there that I contribute to. This blog post is about my experience with hosting my personal projects on GitHub and why I stopped doing that.

What’s wrong with GitHub?

It depends on who you ask. There’s a lot going for GitHub:

  • Pretty Git repo viewer with integrated issue tracker, wiki and more.
  • Many projects chose it as their home.
  • Lots of convenience features.
  • Highly reliable hosting.
  • Social network effects.

On the other hand, there are a few reasons not to use it:

  • Don’t put all your eggs into one basket.
  • Slow and unreliable at times.
  • Owned by Microsoft now.
  • Proprietary SaaS.

All of these are good and important points, but they’re unrelated to my move to self-hosting. Over time I’ve come to dislike the workflow GitHub helped popularize:

  • Sign up if you haven’t already
  • Fork repository in browser
  • Clone forked repository
  • Create new branch
  • Perform changes on that branch
  • Push branch
  • Click on “Create pull request” button
  • Describe changes and overall motivation

Some projects required an email-driven workflow, for example by virtue of not being hosted on GitHub and only offering the committer’s email address as contact option:

  • Clone repository
  • Perform changes
  • Format patch
  • Write an email with the patch attached, describing changes and overall motivation

If you’re feeling fancy, you can even set up Git to handle emails for you and combine the last two steps into one. I haven’t done that yet; https://git-send-email.io/ provides a simple tutorial for it explaining the finer details.
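As a rough sketch of the email-driven steps above, the following Python helper only builds the Git command lines rather than running them. The repository URL and email address are placeholders, and the function name is made up for illustration. It assumes your changes are already committed as the last few commits on the current branch.

```python
import shlex


def email_patch_commands(repo_url, maintainer, commits=1):
    """Build the git commands for an email-driven contribution.

    The last step comes in two variants: writing patch files to attach
    by hand, or letting git format and send the mail in one go.
    """
    if commits < 1:
        raise ValueError("need at least one commit to send")
    # "HEAD~N" as a single revision means "everything after HEAD~N".
    since = f"HEAD~{commits}"
    return {
        "clone": ["git", "clone", repo_url],
        # Manual variant: one .patch file per commit, to attach to an email.
        "format": ["git", "format-patch", since],
        # Combined variant: formats and mails the same commits.
        "send": ["git", "send-email", f"--to={maintainer}", since],
    }


if __name__ == "__main__":
    # Placeholder URL and address, not a real project.
    cmds = email_patch_commands(
        "https://example.org/project.git", "maintainer@example.org"
    )
    for step, argv in cmds.items():
        print(f"{step}: {shlex.join(argv)}")
```

The "send" variant only works once git send-email has been configured with your mail server, which is what the tutorial linked above covers.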

I’ve come to enjoy this process considerably more, mostly because it doesn’t waste my time on needless Git operations[1]. Another nice side effect is that one takes more time composing the email, which results in a higher-quality conversation with the other project. Similarly, there are other workflows where public discussion on GitHub is not an option, for example when reporting security issues to a project. In this case it’s common for the project to provide an email address and GPG key for secure communication.

On issue trackers

GitHub’s issue tracker is clean and tidy. It may not be as well suited for large projects, user support or discussing matters beyond bug reports, but that didn’t stop people from using it for all these things. Sometimes they’ll go as far as asking other projects to use GitHub just for that one feature.

I have a completely different problem with it though. Looking back at the timeline between an issue being opened and closed, issues tend to follow one of two patterns: they are either resolved quickly (anywhere between a few hours and a week) or stay open for a long time, sometimes for years. Another such pattern is that development activity for my projects tends to start with an initial burst of up to a month, followed by silence and occasional bugfixes. As soon as a project reaches good enough or even finished status, chances are that I won’t do any further development on it. Seeing a repository with many open issues makes me unhappy, especially if there’s nothing I can do about it in the short term. I’ve seen other projects automate their issue tracker grooming by closing issues without recent activity, but that feels dishonest to me, like sweeping the problem under the rug.

For this reason I’ve decided to go with a different approach, following the strategy I’ve come up with for bugs involving attachments that may not be shared publicly for legal reasons[2]: send me an email and make sure to include the attachment. Whenever I receive an email, I make sure to reply to it; this goes back and forth until a conclusion has been reached (or the issue reporter stops bothering). This has worked surprisingly well so far and integrates seamlessly into my inbox zero workflow.

A stupid Git viewer

There is no shortage of self-hostable Git viewers, but most of them are a tad too social for my needs, with Sourcehut being the closest match. Another issue for me is security: if I can avoid it, I’d rather not run software with a history of security issues. After lots of pondering I decided to build a tool for administering Git repositories and generating a static website, satisfying the following requirements:

  • Convert an existing repository to a self-hosted one
  • Generate tarballs for all tags[3]
  • Provide a raw view of files
  • Provide a file listing
  • Render READMEs

This excludes a lot of convenience features like browsing the Git history, other branches, syntax highlighting[4], search and so on. To perform these you need to clone the repo and perform the operations locally. This is what I end up doing anyway when searching a repository and doing more serious work. Another bonus of this strategy is having a full copy of the project at hand, which means no more need for a fork button.

My main inspiration design-wise is Evan Hanson’s Git site. The Linux Git User Manual helped me figure out how to set up Git to pull via HTTPS and push via SSH. Serving a Git repo via HTTPS requires enabling the example post-update hook; a post-receive hook is used to regenerate the files and tarballs. Only a handful of Git commands were necessary for the remaining operations:

  • git init: Initialize (bare) repository
  • git cat-file: Obtain contents of file
  • git update-server-info: Update files for serving via HTTPS
  • git ls-tree: Display (full) file tree with metadata
  • git archive: Create release tarballs
  • git tag: Display tags for release tarballs
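To give an idea of how these commands could fit together, here is a minimal Python sketch. It is not the actual tool: the function names, the output directory layout and the tarball naming are assumptions made for illustration. It builds the command lines for one tag and parses the listing that `git ls-tree -r -l` prints, without running Git itself.

```python
from typing import List, NamedTuple, Optional


class TreeEntry(NamedTuple):
    mode: str             # e.g. "100644" for a regular file
    kind: str             # "blob", "tree" or "commit" (submodule)
    object_id: str
    size: Optional[int]   # None where git prints "-" (non-blob entries)
    path: str


def parse_ls_tree(output: str) -> List[TreeEntry]:
    """Parse the output of `git ls-tree -r -l <rev>`.

    Each line reads "<mode> <type> <object> <size>", a TAB, then the path.
    Git quotes paths with unusual characters unless -z is given; this
    sketch does not handle that case.
    """
    entries = []
    for line in output.splitlines():
        if not line:
            continue
        meta, path = line.split("\t", 1)
        mode, kind, object_id, size = meta.split()
        entries.append(
            TreeEntry(mode, kind, object_id,
                      None if size == "-" else int(size), path)
        )
    return entries


def site_commands(project: str, tag: str, out_dir: str) -> List[List[str]]:
    """Commands a post-receive hook might run for one tag (assumed layout)."""
    archive_name = f"{project}-{tag}"
    return [
        # Refresh the auxiliary files needed for serving over plain HTTP(S).
        ["git", "update-server-info"],
        # Full recursive file listing with modes, object IDs and sizes.
        ["git", "ls-tree", "-r", "-l", tag],
        # Release tarball with all files under a "<project>-<tag>/" prefix.
        ["git", "archive", "--format=tar.gz", f"--prefix={archive_name}/",
         "-o", f"{out_dir}/{archive_name}.tar.gz", tag],
    ]
```

The contents of a single file for the raw view could then be fetched with `git cat-file blob <tag>:<path>`, using the paths returned by `parse_ls_tree`.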

This leaves some advanced nginx config. Normally serving a statically generated site is simple, but in this case I want index pages to be served as HTML (using text/html as mimetype) and raw files as plain text (using text/plain as mimetype). Not all raw files though: images, videos and more should still be served with the appropriate mimetype. This took some head-scratching and experimentation, but I eventually figured it all out and can fully recommend nginx beyond its traditional role as a ridiculously fast HTTP server with great reverse proxying support.
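The rule itself is easier to state in code than in prose. The following Python sketch is not the nginx config, only an illustration of the decision it has to make. The function name, the `is_index` flag, the example paths and the exact set of media types passed through are assumptions.

```python
import mimetypes

# Assumption: only these top-level media types keep their real mimetype.
PASSTHROUGH_TYPES = ("image/", "video/", "audio/")


def content_type(path: str, is_index: bool = False) -> str:
    """Pick the Content-Type header for a file of the generated site."""
    if is_index:
        # Generated index pages are real web pages.
        return "text/html"
    guessed, _encoding = mimetypes.guess_type(path)
    if guessed is not None and guessed.startswith(PASSTHROUGH_TYPES):
        # Media files are served as what they are.
        return guessed
    # Everything else, including HTML inside a repo, is shown as text.
    return "text/plain"


if __name__ == "__main__":
    # Hypothetical paths, chosen only to exercise each branch.
    for path, index in [("depp/index.html", True),
                        ("depp/raw/README.md", False),
                        ("depp/raw/logo.png", False)]:
        print(path, "->", content_type(path, is_index=index))
```

Serving HTML files from a repository as text/plain also means a browser won’t render them, so they show up as source like any other raw file.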

You can find my repositories over at https://depp.brause.cc, with the main repository powering this at https://depp.brause.cc/depp and some auxiliary tools like git-export (convert existing repository to self-hosted one) and git-depp (perform maintenance operations in local repo on remote server, like regenerating static files).

What now?

I’ve successfully migrated all repositories I own and continue maintaining from GitHub. Some of them are pulled in by the MELPA repository and CHICKEN’s egg system. I’ve put up a basic FAQ for contributors and am curious to see whether they’ll bother contributing or whether there will instead be forks of the archived repositories on GitHub.

[1] I already have a clone of the repository, why do I need to clone a fork? After I’m done, why do I need to delete the fork again? If I don’t delete it, why do I have to pull updates using an upstream branch? Bonus: Try explaining all that to someone new to Git.
[2] I happen to have written an EPUB mode for Emacs. Most issues can be fixed by carefully studying backtraces, but some require the EPUB file that triggered the error. I’d rather not get DMCA notices if I can avoid it, which is why I ask people to share it in private.
[3] CHICKEN Scheme mandates that eggs provide either a file list or a tarball for each release. The latter option is far easier to get right.
[4] There is no reason why this couldn’t be implemented in a more generic way. For example, there’s a service to serve raw GitHub HTML files of a repo with the text/html mimetype. Something similar could be done for syntax highlighting and a more comfortable code view in general, for the rare case where it’s useful.