December 17, 2018

hackergotchi for Norbert Preining

Norbert Preining

Git and Autotools – a hate relation?

For a project I contribute to, I started to rewrite the build system using autotools (autoconf/automake), since this is, as far as I know, one of the most popular build systems. The actual conversion turned out not to be too complicated. But it also turned out that in the modern world of git(hub), the autotools seem to have lost their place.

Source of all the problems is the dreaded

WARNING: 'aclocal-1.16' is missing on your system.
         You should only need it if you modified 'acinclude.m4' or
         'configure.ac' or m4 files included by 'configure.ac'.

which, when you search for it, appears all over projects hosted on some git instance. The reason is rather simple: git does not preserve timestamps, and when timestamps are skewed, the autotools try to rebuild each and every thing from the most basic files upward.

The consequences of this PITA are that:

  • Every developer needs to install the whole autotools stack, and always has to run autoreconf -i before anything else
  • github-generated tarballs may have arbitrary timestamps, so release management via git tags is broken, and users who want to install the software package may see the same errors
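A common local workaround for the timestamp problem is to normalize the timestamps after checkout, touching files in dependency order so that make considers every generated file up to date. A sketch, assuming the usual autotools file names (adjust per project); note that it silently masks any real modification made before the touch:

```shell
#!/bin/sh
# Touch files in dependency order: hand-written inputs first, generated
# files last, each with a strictly newer timestamp, so make sees
# nothing to rebuild.
touch_in_order() {
    for f in "$@"; do
        if [ -e "$f" ]; then
            touch "$f"
            sleep 1   # coarse but portable way to get increasing mtimes
        fi
    done
}

touch_in_order configure.ac Makefile.am   # hand-written inputs
touch_in_order aclocal.m4                 # aclocal output
touch_in_order configure Makefile.in      # autoconf/automake outputs
```

With that, a plain ./configure && make works without the whole autotools stack installed.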

The best answer I have found to this problem is to use the tarballs generated by make dist and replace the github-generated ones with these home-generated ones. Not an optimal solution, though, as it is no longer purely git based.

Another option that seems to work is mentioned here: enable maintainer mode by adding AM_MAINTAINER_MODE to configure.ac. This is discouraged in the automake manual. But then, the automake manual’s only reference to a VCS is CVS – which is, well, slightly outdated nowadays.
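For reference, enabling it is a one-line addition to configure.ac; a minimal sketch (project name and options are hypothetical):

```
AC_INIT([myproject], [1.0])
AM_INIT_AUTOMAKE([foreign])
dnl Rebuild rules for configure/Makefile.in stay disabled unless the
dnl user explicitly runs: ./configure --enable-maintainer-mode
AM_MAINTAINER_MODE([disable])
AC_CONFIG_FILES([Makefile])
AC_OUTPUT
```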

If anyone has a working solution that fixes the above two problems I would be grateful to hear about it.

So where to go from here in the modern world? The answer (for me) is meson and ninja: no intermediate generation of files, no generated files that are necessary for distribution (configure, Makefile.in), no bunch of maintainer-mode and distribution-mode extra complicated targets, readable source files, … For the named project I also created a meson/ninja build system, and it turned out to be much easier to handle.
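For comparison, a complete meson build description for a small C project fits in a few lines (names here are hypothetical):

```meson
project('myproject', 'c', version: '1.0')
executable('myprog',
  ['main.c', 'util.c'],
  install: true)
```

ninja then rebuilds based on dependency tracking of the checked-in sources only, so git’s lack of timestamp preservation never triggers spurious regeneration of build-system files.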

The other path I could take (and actually implemented) is dropping automake and only using autoconf to build the Makefile from a pre-made Makefile.in, without the automake layer. In our case, where automake’s extra functionality is not needed, this is probably the best approach when one wants to stay with the autotools.

Bottom line for me: give up on automake if you want to use git – either meson/ninja or autoconf only.

17 December, 2018 12:07AM by Norbert Preining

December 16, 2018

Iustin Pop

Why on earth are 512e HDDs used?

I guess backwards compatibility is just another form of entropy. So this is all expected, but still…

Case in point, 512e hard drives. Wikipedia has a nice article on this, so I’ll skip the intro, and just go directly to my main point.

It’s 2018 – the end of 2018, more precisely. Linux has long supported 4K sectors; Windows has since Win10 (OK, so recent); but my sampling of existing HDDs:

  • All of Western Digital’s hard drives under its own brand (the WD rainbow series) use 512e (or 512n, for ≤4TB or so); not even the (recently retired) “Gold” series, top of the top, provides 4Kn.
  • HGST has a large variety for the same size/basic model, but in Switzerland at least, all the 4Kn variants seem to be rarer than unicorns, whereas the 512e ones are available in stock all over the place.

I tried to find HDDs ≥8TB with 4Kn (and ideally ISE), but no luck; mostly “zero available in stock, no availability at supplier, no orders”. Sure, Switzerland is a small market, but at the same time, the exact same size/model with 512e is either available or already on order. At one shop I’ve seen a 512e model with an order backlog of around 200 HDDs.

I guess customer demand is what drives the actual stocking of models. So why on earth are people still buying hard drives using the 512e format? Is it because whatever virtualisation backend they use doesn’t yet support 4Kn (at all, or not well)? I’ve seen some mentions of this, but they seemed to be from about 2 years ago.

Or is it maybe because most HDDs are still used as boot drives? Very unlikely, I think, especially at the high end, where the main/only advantage is humongous size (for the price).

I also haven’t seen any valid performance comparisons. I suppose that since Linux knows the physical sector size, it can always do I/O in properly sized and aligned chunks, but still, a crutch is a crutch. And 4Kn allows going above the 2TB limit of old-style MBR partition tables, if that matters.
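To make the 512e penalty concrete, here is a small back-of-the-envelope sketch (generic 512e geometry, not measurements from any specific drive) of how much physical-sector I/O the drive must do internally for a given logical write:

```python
PHYS = 4096  # physical sector size of a 512e drive
LOG = 512    # logical sector size it advertises

def internal_io_bytes(offset: int, length: int, phys: int = PHYS) -> int:
    """Bytes of physical-sector I/O needed for a write of `length`
    bytes at byte `offset`: the span of whole physical sectors touched
    (a partial sector forces a read-modify-write of the full 4K)."""
    first = offset // phys
    last = (offset + length - 1) // phys
    return (last - first + 1) * phys

# A 4K-aligned 4K write maps 1:1 onto one physical sector:
assert internal_io_bytes(0, 4096) == 4096
# A single 512-byte logical write still costs a full 4K internally:
assert internal_io_bytes(512, 512) == 4096
# A 4K write misaligned by one logical sector touches two physical
# sectors, doubling the internal I/O:
assert internal_io_bytes(512, 4096) == 8192
```

This is exactly why partition alignment matters on 512e drives, and why an OS that knows the physical sector size can avoid the penalty entirely.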

Side-note: to my surprise, the (somewhat old now) SSDs I have in my machine report as “512n”, although everybody knows their minimal block size is actually much, much larger - on the order of 128KB-512KB. I/O size is not the same as erase-block size, but still, one would think they would be more truthful and report their proper physical block size.

Anyway, rant over. Let’s not increase the entropy even more…

16 December, 2018 11:45PM

hackergotchi for Gregor Herrmann

Gregor Herrmann

RC bugs 2018/49-50

as mentioned in my last blog post, I attended the Bug Squashing Party in bern two weeks ago. I think altogether we were quite productive, as seen in the list of usertagged bugs. – here's my personal list of release-critical bugs that I touched at the BSP or afterwards:

  • #860523 – python-dogtail: "python-dogtail: /bin/sniff fails to start on a merged-/usr system"
    apply patch found in the BTS, QA upload
  • #873997 – src:openjpeg2: "FTBFS with Java 9 due to -source/-target only"
    apply patch from the BTS, upload to DELAYED/5
  • #879624 – xorg: "xorg: After upgrade to buster: system doesnt start x-server anymore but stop reacting"
    lower severity
  • #884658 – dkms: "dkms: Should really depends on dpkg-dev for dpkg-architecture"
    add a comment to the bug log
  • #886120 – ctpp2: "makes ctpp2 randomly FTBFS, syntax errors, hides problems"
    apply patch from the BTS, upload to DELAYED/5
  • #886836 – libgtkmm-2.4-dev,libgtkmm-2.4-doc: "libgtkmm-2.4-dev,libgtkmm-2.4-doc: both ship usr/share/doc/libgtkmm-2.4-dev/examples/*"
    propose a patch
  • #887602 – dia: "dia: Detect freetype via pkg-config"
    add patch from upstream merge request, upload to DELAYED/5
  • #890595 – phpmyadmin: "phpmyadmin: warnings when running under php 7.2, apparently fixed by new upstream series 4.7.x"
    prepare a debdiff
  • #892121 – src:libxml-saxon-xslt2-perl: "libxml-saxon-xslt2-perl FTBFS with libsaxonhe-java"
    upload with patch from racke (pkg-perl)
  • #892444 – src:ttfautohint: "ttfautohint: Please use 'pkg-config' to find FreeType 2"
    add patch from Hilko Bengen in BTS, upload to DELAYED/5, then fixed in a maintainer upload
  • #898752 – src:node-webpack: "node-webpack: FTBFS and autopkgtest failure on 32-bit"
    apply proposal from BTS, upload to DELAYED/5
  • #900395 – xserver-xorg-input-all: "xserver-xorg-input-all: keyboard no longer working after dist-upgrade"
    downgrade and tag moreinfo
  • #906187 – facter-dev: "facter-dev: missing Depends: libfacter3.11.0 (= ${binary:Version})"
    add missing Depends, upload to DELAYED/5
  • #906977 – src:parley: "parley: FTBFS in buster/sid ('QItemSelection' does not name a type)"
    add patch from upstream git, upload to DELAYED/5
  • #907975 – libf2fs-format-dev: "libf2fs-format-dev: missing Breaks+Replaces: libf2fs-dev (<< 1.11)"
    add missing Breaks+Replaces, upload to DELAYED/5
  • #908147 – cups-browsed: "restarting cups-browsed deleted print jobs"
    try to reproduce
  • #911324 – carton: "carton: Carton fails due to missing Menlo::CLI::Compat dependency"
    package missing new dependencies and depend on them (pkg-perl)
  • #912046 – ebtables: "ebtables provides the same executables as iptables packages without using alternatives"
    add a comment to the bug log
  • #912380 – man-db: "endless looping, breaks dpkg"
    add a comment to the bug log
  • #912685 – src:net-snmp: "debian/rules is not binNMU safe"
    add a comment to the bug log
  • #913179 – libprelude: "libprelude: FTBFS with glibc 2.28; cherrypicked patches attached"
    take patch from BTS, upload to DELAYED/10
  • #915297 – src:libtest-bdd-cucumber-perl: "libtest-bdd-cucumber-perl: FTBFS: README.pod: No such file or directory"
    remove override from debian/rules (pkg-perl)
  • #915806 – src:jabref: "jabref FTBFS"
    add build-dep and jar to classpath (pkg-java)

16 December, 2018 06:32PM

December 15, 2018

Craig Small

WordPress 5.0.1

While I missed the WordPress 5.0 release, it was only a few more days before there was a security release out.

So WordPress 5.0.1 will be available in Debian soon. This is both a security update to 5.0.1 and a huge feature update from the 4.9.x versions to the 5.0 series.

The WordPress website, in their 5.0 announcement, describes all the changes better, but one of the main things is the new editor (which I’m using as I write this).  It’s certainly cleaner, or perhaps more sparse. I’m not sure if I like it yet.

The security fixes (there are 7) are the usual things you expect from a WordPress security update: the usual XSS and permission-problem type of stuff.

I have also removed, in the 5.0.1 Debian package, the build dependency on libphp-phpmailer. The issue with that package is that there won’t be any more security updates for the version in Debian. WordPress has an embedded version of it which *I hope* they maintain. There is an issue about the phpmailer in WordPress, so hopefully it gets fixed soon.

15 December, 2018 11:43PM by Craig

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

linl 0.0.3: Micro release

Our linl package for writing LaTeX letters with (R)markdown had a fairly minor release today, following up on the previous release well over a year ago. This version contains just one change, which Mark van der Loo provided a few months ago with a clean PR. As another user was just bitten by the same issue when using an included letterhead – which was fixed but unreleased – we decided it was time for a release. So there it is.

linl makes it easy to write letters in markdown, with some extra bells and whistles thanks to some cleverness chiefly by Aaron.

Here is a screenshot of the vignette showing the simple input for some moderately fancy output:

The NEWS entry follows:

Changes in linl version 0.0.3 (2018-12-15)

  • Correct LaTeX double loading of package color with different options (Mark van der Loo in #18 fixing #17).

Courtesy of CRANberries, there is a comparison to the previous release. More information is on the linl page. For questions or comments use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

15 December, 2018 11:12PM

hackergotchi for Gunnar Wolf

Gunnar Wolf

Tecnicatura Universitaria en Software Libre: First bunch of graduates!

December starts for our family in Argentina, and on our second day here, I was invited to the Facultad de Ingeniería y Ciencias Hídricas (FICH) of the Universidad Nacional del Litoral (UNL). FICH-UNL has a short (3-year), distance-learning career called Tecnicatura Universitaria en Software Libre (TUSL).

This career opened in 2015, and today we had the graduation exams for three of its students — it's no small feat for a recently created career to start graduating its first bunch! And we had one for each of TUSL's "exit tracks" (administration, development and education).

The topics presented by the students were:

  1. An introductory manual for performing migrations and installations of free software-based systems
  2. Design and implementation of a steganography tool for end-users
  3. A Lego system implementation for AppBuilder

I will try to link to the works soon; the TUSL staff is quite well aligned with freedom, transparency and responsibility.

15 December, 2018 09:32PM by gwolf

Ingo Juergensmann

Adding NVMe to Server

My server runs on a RAID10 of 4x 2 TB WD disks. Basically those disks are fast enough to cope with the disk load of the virtual machines (VMs). But since many users moved away from Facebook and Google, my Friendica installation on Nerdica.Net has a growing user count, putting a large disk I/O load with many small reads & writes on the disks and slowing down the general disk I/O for all the VMs and the server itself. On mdraid-sync-Sunday this month the server needed two full days to sync its RAID10.

So the idea was to move the high disk I/O load from the rotational disks to something different. For that reason I bought a Samsung 970 Pro 512 GB NVMe disk and a matching PCIe 3.0 adapter card to be put into my server in the colocation. On Thursday the Samsung was installed by the rrbone staff in the colocation. I moved the PostgreSQL and MySQL databases from the RAID10 to the NVMe disk and restarted the services.

Here are some results from Munin monitoring: 

Disk Utilization

Here you can see how the disk utilization dropped after the NVMe installation. The red bar marks the average utilization of the RAID10 disks before, and the green bar the same RAID10 after the databases were moved to the NVMe disk. There's roughly 20% less utilization now, which is good.

Disk Latency

Here you can see the same coloured bars for the disk latency. As you can see, the latency dropped by 1/3.

CPU I/O wait

The most significant graph is maybe the CPU graph, where you could previously see a large portion of iowait on the CPUs. This is no longer the case: there is apparently no significant iowait anymore, thanks to the low latency and high IOPS of SSD/NVMe disks.

Overall I cannot confirm that adding the NVMe disk results in significantly faster page loads for Friendica or Mastodon, maybe because other measures like Redis/Memcached or pgbouncer already helped a lot before the NVMe disk, but it helps a lot with general disk I/O load and improves disk speeds inside the VMs, for example for my regular backups.

Ah, one thing to report: in a quick test, pgbench reported >2200 tps on the NVMe disk now. That at least is a real speed improvement, maybe by a factor of 10 or so.


15 December, 2018 05:45PM by ij

Petter Reinholdtsen

Learn to program with Minetest on Debian

A fun way to learn how to program Python is to follow the instructions in the book "Learn to program with Minecraft", which introduces programming in Python to people who like to play with Minecraft. The book uses a Python library to talk over a TCP/IP socket to an API accepting build instructions and providing information about the current players in a Minecraft world. The TCP/IP API was first created for the Minecraft implementation for the Raspberry Pi, and has since been ported to some server versions of Minecraft. The book contains recipes for those using Windows, MacOSX and Raspbian. But a little known fact is that you can follow the same recipes using the free software construction game Minetest.

There is a Minetest module implementing the same API, making it possible to use the Python programs coded to talk to Minecraft with Minetest too. I uploaded this module to Debian two weeks ago, and as soon as it clears the FTP masters NEW queue, learning to program Python with Minetest on Debian will be a simple 'apt install' away. The Debian package is maintained as part of the Debian Games team, and the packaging rules are currently located under 'unfinished' on Salsa.

You will most likely need to install several of the Minetest modules in Debian for the examples included with the library to work well, as there are several blocks used by the example scripts that are provided via modules in Minetest. Without the required blocks, a simple stone block is used instead. My initial testing with an analog clock did not get gold arms as instructed in the Python library, but stone arms instead.
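For a flavour of what the book's recipes look like against this API (the coordinates and tower size here are made up; actually building anything requires a running Minetest server with the API module, or a Minecraft Pi server, listening on the default port 4711):

```python
def tower_blocks(x, y, z, height):
    """Coordinates for a simple vertical tower, one block per level."""
    return [(x, y + dy, z) for dy in range(height)]

if __name__ == "__main__":
    # The mcpi package provides the client side of the TCP/IP API.
    from mcpi.minecraft import Minecraft
    from mcpi import block

    mc = Minecraft.create()       # connects to localhost:4711
    mc.postToChat("Building a tower")
    pos = mc.player.getTilePos()  # current player position as a Vec3
    for bx, by, bz in tower_blocks(pos.x + 2, pos.y, pos.z, 5):
        mc.setBlock(bx, by, bz, block.STONE.id)
```

The same script talks to Minetest or Minecraft unchanged, which is the whole point of the compatible API.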

I tried to find a way to add the API to the desktop version of Minecraft, but was unable to find any working recipes. The recipes I found only work with a standalone Minecraft server setup. Are there any options for the normal desktop version?

As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

15 December, 2018 02:30PM

hackergotchi for Norbert Preining

Norbert Preining

TeX Live/Debian updates 20181214

Another month has passed, and the (hopefully) last upload for this year brings updates to the binaries, to the usual big set of macro and font packages, and some interesting and hopefully useful changes.

The version for the binary packages is 2018.20181214.49410-1 and is based on svn revision 49410. That means we get:

  • support some primitives from pdftex in xetex
  • dviout updates
  • kpathsea change for brace expansion
  • various memory fixes, return value fixes etc

The version of the macro/font packages is 2018.20181214-1 and contains the usual set of updated and new packages, see below for the complete list. There is one interesting functional change: we have replaced the . in the various TEXMF* variables with TEXMFDOTDIR, which in turn is defined as . (the current directory). By itself this brings no functional change, but it allows users to redefine TEXMFDOTDIR to, say, .// for certain projects, which makes kpathsea automatically find all files in the current directory and below.
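For example, assuming kpathsea picks the variable up from the environment as it does for other TEXMF* variables, a per-project override could look like this:

```shell
# Let kpathsea search the current directory and everything below it,
# instead of only the current directory:
export TEXMFDOTDIR=.//
# latex main.tex   # would now also find inputs in subdirectories
```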

Now for the full list of updates and new packages. Enjoy!

New packages

cweb-old, econ-bst, eqexpl, icon-appr, makecookbook, modeles-factures-belges-assocs, pdftex-quiet, pst-venn, tablvar, tikzlings, xindex.

Updated packages

a2ping, abnt, abntex2, acmart, acrotex, addlines, adigraph, amsmath, animate, auto-pst-pdf-lua, awesomebox, babel, babel-german, beamer, beamertheme-focus, beebe, bib2gls, biblatex-abnt, biblatex-archaeology, biblatex-ext, biblatex-gb7714-2015, biblatex-publist, bibleref, bidi, censor, changelog, colortbl, convbkmk, covington, cryptocode, cweb, dashundergaps, datatool, datetime2-russian, datetime2-samin, diffcoeff, dynkin-diagrams, ebgaramond, enumitem, fancyvrb, glossaries-extra, grabbox, graphicxsp, hagenberg-thesis, hyperref, hyperxmp, ifluatex, japanese-otf-uptex, japanese-otf-uptex-nonfree, jigsaw, jlreq, jnuexam, jsclasses, ketcindy, knowledge, kpathsea, l3build, l3experimental, l3kernel, latex, latex2man, libertinus-otf, lipsum, luatexko, lwarp, mcf2graph, memoir, modeles-factures-belges-assocs, oberdiek, pdfx, platex, platex-tools, plautopatch, polexpr, pst-eucl, pst-fractal, pst-func, pst-math, pst-moire, pstricks, pstricks-add, ptex2pdf, quran, rec-thy, register, reledmac, rutitlepage, sourceserifpro, srdp-mathematik, suftesi, svg, tcolorbox, tex4ht, texdate, thesis-ekf, thesis-qom, tikz-cd, tikzducks, tikzmarmots, tlshell, todonotes, tools, toptesi, ucsmonograph, univie-ling, uplatex, widows-and-orphans, witharrows, xepersian, xetexref, xindex, xstring, xurl, zhlineskip.

15 December, 2018 05:58AM by Norbert Preining

December 14, 2018

hackergotchi for Rapha&#235;l Hertzog

Raphaël Hertzog

Freexian’s report about Debian Long Term Support, November 2018

Like each month, here comes a report about the work of paid contributors to Debian LTS.

Individual reports

In November, about 209 work hours have been dispatched among 14 paid contributors. Their reports are available:

Evolution of the situation

In November we had a few more hours available to dispatch than we had contributors willing and able to do the work, and thus we are actively looking for new contributors. Please contact Holger if you are interested in becoming a paid LTS contributor.

The number of sponsored hours stayed the same at 212 hours per month but we actually lost two sponsors and gained a new one (silver level).

The security tracker currently lists 30 packages with a known CVE and the dla-needed.txt file has 32 packages needing an update.

Thanks to our sponsors

New sponsors are in bold.


14 December, 2018 10:50PM by Raphaël Hertzog

hackergotchi for Eddy Petri&#537;or

Eddy Petrișor

rust for cortex-m7 baremetal

Update 14 December 2018: After the release of stable 1.31.0 (aka the 2018 Edition), it is no longer necessary to switch to the nightly channel to get access to the thumbv7em-none-eabi / Cortex-M4 and Cortex-M7 components. Examples and commands updated accordingly.
For more details on embedded development using Rust, the official Rust embedded docs site is the place to go, in particular, you can start with The embedded Rust book.
This is a reminder for myself: if you want to install Rust for a baremetal Cortex-M7 target, this seems to be a tier 3 platform:

Highlighting the relevant part:

Target                   std  rustc  cargo  notes
msp430-none-elf           *                 16-bit MSP430 microcontrollers
sparc64-unknown-netbsd                      NetBSD/sparc64
thumbv6m-none-eabi        *                 Bare Cortex-M0, M0+, M1
thumbv7em-none-eabi       *                 Bare Cortex-M4, M7
thumbv7em-none-eabihf     *                 Bare Cortex-M4F, M7F, FPU, hardfloat
thumbv7m-none-eabi        *                 Bare Cortex-M3
x86_64-unknown-openbsd                      64-bit OpenBSD

In order to enable the relevant support, use stable >= 1.31.0 and add the relevant target:
eddy@feodora:~/usr/src/rust-uc$ rustup show
Default host: x86_64-unknown-linux-gnu

installed toolchains

nightly-x86_64-unknown-linux-gnu (default)

active toolchain

nightly-x86_64-unknown-linux-gnu (default)
rustc 1.28.0-nightly (cb20f68d0 2018-05-21)
eddy@feodora:~/usr/src/rust$ rustup show
Default host: x86_64-unknown-linux-gnu

stable-x86_64-unknown-linux-gnu (default)
rustc 1.31.0 (abe02cefd 2018-12-04)

Previously, when nightly was still required, one would switch to it like this:

eddy@feodora:~/usr/src/rust-uc$ rustup default nightly-x86_64-unknown-linux-gnu
info: using existing install for 'nightly-x86_64-unknown-linux-gnu'
info: default toolchain set to 'nightly-x86_64-unknown-linux-gnu'

  nightly-x86_64-unknown-linux-gnu unchanged - rustc 1.28.0-nightly (cb20f68d0 2018-05-21)
Add the needed target:
eddy@feodora:~/usr/src/rust$ rustup target add thumbv7em-none-eabi
info: downloading component 'rust-std' for 'thumbv7em-none-eabi'
info: installing component 'rust-std' for 'thumbv7em-none-eabi'
eddy@feodora:~/usr/src/rust$ rustup show
Default host: x86_64-unknown-linux-gnu

installed targets for active toolchain

thumbv7em-none-eabi
x86_64-unknown-linux-gnu
active toolchain

stable-x86_64-unknown-linux-gnu (default)
rustc 1.31.0 (abe02cefd 2018-12-04)
Then compile with --target thumbv7em-none-eabi.
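To avoid typing --target on every invocation, the target can also be set as the default in the crate's .cargo/config (a standard cargo configuration key):

```toml
[build]
target = "thumbv7em-none-eabi"
```

After that, a plain cargo build produces thumbv7em-none-eabi binaries.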

14 December, 2018 09:20PM by eddyp

hackergotchi for Ben Hutchings

Ben Hutchings

Debian LTS work, November 2018

I was assigned 20 hours of work by Freexian's Debian LTS initiative and worked all those hours.

I prepared and released another stable update for Linux 3.16 (3.16.61), but have not yet included this in a Debian upload.

I updated the firmware-nonfree package to fix security issues in wifi firmware and to provide additional firmware that may be requested by drivers in Linux 4.9. I issued DLA 1573-1 to describe this update.

I worked on documenting a bug in the installer that delays installation of updates to the kernel. The installer can only be updated as part of a point release, and these are not done after the LTS team takes on responsibility for a release. Laura Arjona drafted an addition to the Errata section of the official jessie installer page and I reviewed this. I also updated the LTS/Installing wiki page.

I also participated in various discussions on the debian-lts mailing list.

14 December, 2018 07:18PM

December 13, 2018

Molly de Blanc

The OSD and user freedom

Some background reading

The relationship between open source and free software is fraught with people arguing about meanings and value. In spite of all the things we’ve built up around open source and free software, they reduce down to both being about software freedom.

Open source is about software freedom. It has been the case since “open source” was created.

In 1986 the Four Freedoms of Free Software (4Fs) were written. In 1998 Netscape set its source code free. Later that year a group of people got together and Christine Peterson suggested that, to avoid ambiguity, there was a “need for a better name” than free software. She suggested open source after open source intelligence. The name stuck and 20 years later we argue about whether software freedom matters to open source, because too many global users of the term have forgotten (or never knew) that some people just wanted another way to say software that ensures the 4Fs.

Once there was a term, the term needed a formal definition: how do we describe what open source is? That’s where the Open Source Definition (OSD) comes in.

The OSD is a set of ten points that describe what an open source license looks like. The OSD came from the Debian Free Software Guidelines. The DFSG themselves were created to “determine if a work is free” and ought to be considered a way of describing the 4Fs.

Back to the present

I believe that the OSD is about user freedom. This is an abstraction from “open source is about free software.” As I alluded to earlier, this is an intuition I have, a thing I believe, and an argument I’m having a very hard time trying to make.

I think of free software as software that exhibits or embodies software freedom — it’s software created using licenses that ensure the 4Fs are protected. This is all a tool, a useful tool, for protecting user freedom.

The line that connects the OSD and user freedom is not a short one: the OSD defines open source -> open source is about software freedom -> software freedom is a tool to protect user freedom. I think this is, however, a very valuable reduction we can make. The OSD is another tool in our tool box when we’re trying to protect the freedom of users of computers and computing technology.

Why does this matter (now)?

I would argue that this has always mattered, and that we’ve done a bad job of talking about it. I want to talk about it now because it’s become increasingly clear that people simply never understood (or even heard of) the connection between user freedom and open source.

I’ve been meaning to write about this for a while, and I think it’s important context for everything else I say and write about in relation to the philosophy behind free and open source software (FOSS).

FOSS is a tool. It’s not a tool about developmental models or corporate enablement — though some people and projects have benefited from the kinds of development made possible through sharing source code, and some companies have created very financially successful models based on it as well. In both historical and contemporary contexts, software freedom is at the heart of open source. It’s not about corporate benefit, it’s not about money, and it’s not even really about development. Methods of development are tools being used to protect software freedom, which in turn is a tool to protect user freedom. User freedom, and what we get from that, is what’s valuable.

Side note

At some future point, I’ll address why user freedom matters, but in the mean time, here are some talks I gave (with Karen Sandler) on the topic.

13 December, 2018 07:50PM by mollydb

hackergotchi for Joachim Breitner

Joachim Breitner

Thoughts on bootstrapping GHC

I am returning from the reproducible builds summit 2018 in Paris. The latest hottest thing within the reproducible-builds project seems to be bootstrapping: how can we build a whole operating system from just and only source code, using very little, or even no, binary seeds or auto-generated files? This is actually a concern that is somewhat orthogonal to reproducibility: bootstrappable builds help me trust programs that I built, while reproducible builds help me trust programs that others built.

And while they are making good progress bootstrapping a full system from just a C compiler written in Scheme and a Scheme interpreter written in C that can build each other (Janneke’s mes project), and there are plans to build that on top of stage0, which starts with 280 bytes of binary, the situation looks pretty bad when it comes to Haskell.

Unreachable GHC

The problem is that contemporary Haskell has only one viable implementation, GHC. And GHC, written in contemporary Haskell, needs GHC to be built. So essentially everybody out there either just downloads a binary distribution of GHC, or they build GHC from source using a possibly older (but not much older) version of GHC that they already have. Even distributions like Debian do nothing different: when they build the GHC package, the builders use, well, the GHC package.

There are other Haskell implementations out there. But if they are mature and actively developed, then they are implemented in Haskell themselves, often even using advanced features that only GHC provides. And even those are insufficient to build GHC itself, let alone the old and abandoned Haskell implementations.

In all these cases, at some point an untrusted binary is used. This is very unsatisfying. What can we do? I don’t have the answers, but please allow me to outline some venues of attack.

Retracing history

Obviously, even GHC does not exist since the beginning of time, and the first versions surely were built using something else than GHC. The oldest version of GHC for which we can find a release on the GHC web page is version 0.29 from July 1996. But the installation instructions write:

GHC 0.26 doesn't build with HBC. (It could, but we haven't put in the effort to maintain it.)

GHC 0.26 is best built with itself, GHC 0.26. We heartily recommend it. GHC 0.26 can certainly be built with GHC 0.23 or 0.24, and with some earlier versions, with some effort.

GHC has never been built with compilers other than GHC and HBC.

So it seems that besides GHC, only HBC was ever used to compile GHC. HBC is a Haskell compiler of which only the sources of one random version can still be found. Parts of it are written in C, so I looked into this: compile HBC, use it to compile GHC-0.29, and then, step by step, build every (major) version of GHC until today.

The problem is that it is non-trivial to build software from the 90s using today's compilers. I briefly looked at the HBC code base and had to change some files from using varargs.h to stdarg.h, and this is surely just one of many similar stumbling blocks when trying to build those tools. Oh, and the hbc sources even state:

# To get everything done: make universe
# It is impossible to make from scratch.
# You must have a running lmlc, to
# recompile it (of course).

So I learned that actually most of it is written in LML, and the LML compiler is written in LML. So this is a dead end. (Thanks to Lennart for clearing up a misunderstanding on my side here.)

Going back, but doing it differently

Another approach is to go back in time to some old version of GHC, but maybe not all the way to the beginning, and then try to use another, officially unsupported, Haskell compiler to build GHC. This is what rekado tried to do in 2017: he used the most contemporary implementation of Haskell in C, the Hugs interpreter. Using this, he compiled nhc98 (yet another abandoned Haskell implementation), with the hope of building GHC with nhc98. He made impressive progress back then, but ran into a problem where the runtime crashed. Maybe someone is interested in picking up from there?

Removing, simplifying, extending, in the present.

Both approaches so far focus on building an old version of GHC. This adds complexity: other tools (the shell, make, yacc etc.) may behave differently now in a way that causes hard-to-debug problems. So maybe it is more fun and more rewarding to focus on today’s GHC? (At this point I am starting to hypothesize.)

I said before that no other existing Haskell implementation can compile today’s GHC code base, because of features like mutually recursive modules, the foreign function interface etc. And also other existing Haskell implementations often come with a different, smaller set of standard libraries, but GHC assumes base, so we would have to build that as well...

But we don’t need to build it all. Surely there is much code in base that is not used by GHC. Likewise, there is much code in GHC that we do not need in order to build GHC itself. By removing all of that, we reduce the amount of Haskell code that we need to feed to the other implementation.

The remaining code might use some features that are not supported by our bootstrapping implementation. Mutually recursive modules could be manually merged. GADTs that are only used for additional type safety could be replaced by normal ones, which might make some pattern matches incomplete. Syntactic sugar can be desugared. By simplifying the code base in that way, one might be able to produce a fork of GHC that is within reach of the likes of Hugs or nhc98.

And if there are features that are hard to remove, maybe we can extend the bootstrapping compiler or interpreter to support them? For example, it was mostly trivial to extend Hugs with support for the # symbol in names – and we can be pragmatic and just allow it always, since we don’t need a standards conforming implementation, but merely one that works on the GHC code base. But how much would we have to implement? Probably this will be more fun in Haskell than in C, so maybe extending nhc98 would be more viable?

Help from beyond Haskell?

Or maybe it is time to create a new Haskell compiler from scratch, written in something other than Haskell? Maybe some other language that is reasonably pleasant to write a compiler in (Ocaml? Scala?), but that has the bootstrappability story already sorted out somehow.

But in the end, all variants come down to the same problem: Writing a Haskell compiler for full, contemporary Haskell as used by GHC is hard and really a lot of work – if it were not, there would at least be implementations in Haskell out there. And as long as nobody comes along and does that work, I fear that we will continue to be unable to build our nice Haskell ecosystem from scratch. Which I find somewhat dissatisfying.

13 December, 2018 01:02PM by Joachim Breitner

hackergotchi for Junichi Uekawa

Junichi Uekawa

Already December.

Already December. Nice. I tried using tramp for a while but I am back to mosh. tramp is not usable when the ssh connection is not reliable.

13 December, 2018 11:39AM by Junichi Uekawa

hackergotchi for Keith Packard

Keith Packard


Newt: A Tiny Embeddable Python Subset

I've been helping teach robotics programming to students in grades 5 and 6 for a number of years. The class uses Lego models for the mechanical bits, and a variety of development environments, including Robolab and Lego Logo on both Apple ][ and older Macintosh systems. Those environments are quite good, but when the Apple ][ equipment died, I decided to try exposing the students to an Arduino environment so that they could get another view of programming languages.

The Arduino environment has produced mixed results. The general nature of a full C++ compiler and the standard Arduino libraries means that building even simple robots requires considerable typing, including a lot of punctuation and upper case letters. Further, the edit/compile/test process is quite long, making fixing errors slow. On the positive side, many of the students have gone on to use Arduinos in science research projects for middle and upper school (grades 7-12).

In other environments, I've seen Python used as an effective teaching language; the direct interactive nature invites exploration and provides rapid feedback for the students. It seems like a pretty good language to consider for early education -- "real" enough to be useful in other projects, but simpler than C++/Arduino has been. However, I haven't found a version of Python that seems suitable for the smaller microcontrollers I'm comfortable building hardware with.

How Much Python Do We Need?

Python is a pretty large language in embedded terms, but there's actually very little I want to try and present to the students in our short class (about 6 hours of language introduction and another 30 hours or so of project work). In particular, all we're using on the Arduino are:

  • Numeric values
  • Loops and function calls
  • Digital and analog I/O

Remembering my childhood Z-80 machine with its BASIC interpreter, I decided to think along those lines in terms of capabilities. I think I can afford more than 8kB of memory for the implementation, and I really do want to have "real" functions, including lexical scoping and recursion.

I'd love to make this work on our existing Arduino Duemilanove compatible boards. Those have only 32kB of flash and 2kB of RAM, so that might be a stretch...

What to Include

Exploring Python, I think there's a reasonable subset that can be built here. Included in that are:

  • Lists, numbers and string types
  • Global functions
  • For/While/If control structures.

What to Exclude

It's hard to describe all that hasn't been included, but here's some major items:

  • Objects, Dictionaries, Sets
  • Comprehensions
  • Generators (with the exception of range)
  • All numeric types aside from single-precision float


Newt is implemented in C, using flex and bison. It includes the incremental mark/sweep compacting GC system I developed for my small scheme interpreter last year. That provides a relatively simple-to-use and efficient memory system.

The Newt “Compiler”

Instead of directly executing a token stream as my old BASIC interpreter did, Newt compiles to a byte-coded virtual machine. Of course, we have no memory, so we don't generate a parse tree and perform optimizations on that. Instead, code is generated directly in the grammar productions.

The Newt “Virtual Machine”

With the source compiled to byte codes, execution is pretty simple -- read a byte code, execute some actions related to it. To keep things simple, the virtual machine has a single accumulator register and a stack of other values.
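
For illustration, such an accumulator-plus-stack machine fits in a few lines of Python (the opcode names here are invented, not Newt's actual byte codes):

```python
# Toy accumulator/stack byte-code interpreter, loosely modelled on the
# design described above. Opcode names are hypothetical.

def run(program):
    acc = None        # the single accumulator register
    stack = []        # the stack of other values
    for op, *args in program:
        if op == "CONST":     # load a constant into the accumulator
            acc = args[0]
        elif op == "PUSH":    # save the accumulator on the stack
            stack.append(acc)
        elif op == "ADD":     # acc = popped value + acc
            acc = stack.pop() + acc
        elif op == "MUL":
            acc = stack.pop() * acc
        else:
            raise ValueError(f"unknown opcode {op}")
    return acc

# (2 + 3) * 4
prog = [("CONST", 2), ("PUSH",), ("CONST", 3), ("ADD",),
        ("PUSH",), ("CONST", 4), ("MUL",)]
print(run(prog))  # → 20
```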

Global and local variables are stored in 'frames', with each frame implemented as a linked list of atom/value pairs. This isn't terribly efficient in space or time, but it was quick to implement the required Python semantics for things like 'global'.

Lists and tuples are simple arrays in memory, just like CPython. I use the same sizing heuristic for lists that Python does; no sense inventing something new for that. Strings are C strings.

When calling a non-builtin function, a new frame is constructed that includes all of the formal names. Those get assigned values from the provided actuals and then the instructions in the function are executed. As new locals are discovered, the frame is extended to include them.
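
A rough Python model of those frames (purely illustrative; the real implementation is C):

```python
# Illustrative model of Newt-style frames: each frame is a linked list
# of (name, value) pairs, chained to an enclosing (global) frame.

class Frame:
    def __init__(self, parent=None):
        self.entries = []        # list of [name, value] pairs
        self.parent = parent     # enclosing frame, if any

    def assign(self, name, value):
        for pair in self.entries:
            if pair[0] == name:
                pair[1] = value
                return
        # new local discovered: extend the frame
        self.entries.append([name, value])

    def lookup(self, name):
        frame = self
        while frame is not None:     # walk outward to the global frame
            for n, v in frame.entries:
                if n == name:
                    return v
            frame = frame.parent
        raise NameError(name)

globals_frame = Frame()
globals_frame.assign("x", 1)
locals_frame = Frame(parent=globals_frame)
locals_frame.assign("y", 2)
print(locals_frame.lookup("x"), locals_frame.lookup("y"))  # → 1 2
```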


Any new language implementation really wants to have a test suite to ensure that the desired semantics are implemented correctly. One huge advantage for Newt is that we can cross-check the test suite by running it with Python.
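
A minimal sketch of such cross-checking, assuming a newt executable is available (the name and location are assumptions):

```python
# Cross-check sketch: run the same test script under two interpreters
# and compare stdout. "newt" is an assumed executable name; point it
# at wherever the newt binary actually lives.
import subprocess
import sys

def run_with(cmd):
    """Run a command and capture its stdout."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout

def cross_check(script, newt="newt", python=sys.executable):
    """True if newt and Python agree on the script's output."""
    return run_with([newt, script]) == run_with([python, script])
```

Running every test case through both interpreters and diffing the output gives the semantics check almost for free.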

Current Status

I think Newt is largely functionally complete at this point; I just finished adding the limited for statement capabilities this evening. I'm sure there are a lot of bugs to work out, and I expect to discover additional missing functionality as we go along.

I'm doing all of my development and testing on my regular x86 laptop, so I don't know how big the system will end up on the target yet.

I've written 4836 lines of code for the implementation and another 65 lines of Python for simple test cases. When compiled -Os for x86_64, the system is about 36kB of text and another few bytes of initialized data.


The source code is available from my server, and also at github. It is licensed under the GPLv2 (or later version).

13 December, 2018 07:55AM

December 12, 2018

hackergotchi for Jonathan Dowland

Jonathan Dowland

Game Engine Black Book: DOOM

Fabien's proof copies


*proud smug face*


Today is Doom's 25th anniversary. To mark the occasion, Fabien Sanglard has written and released a book, Game Engine Black Book: DOOM.

It's a sequel of-sorts to "Game Engine Black Book: Wolfenstein 3D", which was originally published in August 2017 and has now been fully revised for a second edition.

I had the pleasure of proof-reading an earlier version of the Doom book and it's a real treasure. It goes into great depth as to the designs, features and limitations of PC hardware of the era, from the 386 that Wolfenstein 3D targeted to the 486 for Doom, as well as the peripherals available such as sound cards. It covers NeXT computers in similar depth. These were very important because Id Software made the decision to move all their development onto NeXT machines instead of developing directly on PC. This decision had some profound implications on the design of Doom as well as the speed at which they were able to produce it. I knew very little about the NeXTs and I really enjoyed the story of their development.

Detailed descriptions of those two types of personal computer set the scene at the start of the book, before Doom itself is described. The point of this book is to focus on the engine and it is explored sub-system by sub-system. It's fair to say that this is the most detailed description of Doom's engine that exists anywhere outside of its own source code. Despite being very familiar with Doom's engine, having worked on quite a few bits of it, I still learned plenty of new things. Fabien made special modifications to a private copy of Chocolate Doom in order to expose how various phases of the renderer worked. The whole book is full of full colour screenshots and illustrations.

The main section of the book closes with some detailed descriptions of the architectures of various home games console systems of the time to which Doom was ported, as well as describing the fate of that particular version of Doom: some were impressive technical achievements, some were car-crashes.

I'm really looking forward to buying a hard copy of the final book. I would recommend this to anyone who has fond memories of that era, or is interested to know more about the low level voodoo that was required to squeeze every ounce of performance possible out of the machines of the time.

Edit: Fabien has now added a "pay what you want" option for the ebook. If the existing retailer prices were putting you off, now you can pay him for his effort at a level you feel is reasonable. The PDF is also guaranteed not to be mangled by Google Books or anyone else.

12 December, 2018 04:50PM

Petter Reinholdtsen

Non-blocking bittorrent plugin for vlc

A few hours ago, a new and improved version (2.4) of the VLC bittorrent plugin was uploaded to Debian. This new version includes a complete rewrite of the bittorrent related code, which seems to make the plugin non-blocking. This means you can actually exit VLC even when the plugin seems unable to get the bittorrent streaming started. The new version also includes support for filtering the playlist by file extension using command line options, if you want to avoid processing audio, video or images. The package is currently in Debian unstable, but should be available in Debian testing in two days. To test it, simply install it like this:

apt install vlc-plugin-bittorrent

After it is installed, you can try to use it to play a file downloaded live via bittorrent like this:


As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

12 December, 2018 06:20AM

hackergotchi for Matthew Palmer

Matthew Palmer

Falsehoods Programmers Believe About Pagination

The world needs it, so I may as well write it.

  • The number of items on a page is fixed for all time.
  • The number of items on a page is fixed for one user.
  • The number of items on a page is fixed for one result set.
  • The pages are only browsed in one direction.
  • No item will be added to the result set during retrieval.
  • No item will be removed from the result set during retrieval.
  • Item sort order is stable.
  • Only one page of results will be retrieved at one time.
  • Pages will be retrieved in order.
  • Pages will be retrieved in a timely manner.
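
One common mitigation for several of these falsehoods is keyset ("cursor") pagination, where each page is anchored to the last-seen sort key rather than an offset; a minimal in-memory sketch:

```python
# Keyset pagination sketch: pages are anchored to the last-seen sort
# key instead of an offset, so inserts/deletes elsewhere in the result
# set don't shift page boundaries mid-retrieval.

def page_after(items, cursor, page_size):
    """Return (page, next_cursor); items must be sorted by a stable key."""
    if cursor is None:
        page = items[:page_size]
    else:
        page = [x for x in items if x > cursor][:page_size]
    next_cursor = page[-1] if page else None
    return page, next_cursor

items = [1, 3, 5, 7, 9, 11]
page1, cur = page_after(items, None, 3)   # → [1, 3, 5]
# an item inserted before fetching page 2 does not re-serve 5:
items = sorted(items + [2])
page2, cur = page_after(items, cur, 3)    # → [7, 9, 11]
```

It still assumes a stable sort key, of course, which is itself one of the falsehoods above.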

12 December, 2018 12:00AM by Matt Palmer

December 11, 2018

hackergotchi for Louis-Philippe Véronneau

Louis-Philippe Véronneau

Montreal Bug Squashing Party - Jan 19th & 20th 2019

We are organising a BSP in Montréal in January! Unlike the one we organised for the Stretch release, this one will be over a whole weekend so hopefully folks from other provinces in Canada and from the USA can come.

So yeah, come and squash bugs with us! Montreal in January can be cold, but it's usually snowy and beautiful too.

A picture of Montréal during the winter

As always, the Debian Project is willing to reimburse 100 USD (or equivalent) of expenses to attend Bug Squashing Parties. If you can find a cheap flight or want to car pool with other people that are interested, going to Montréal for a weekend doesn't sound that bad, eh?

When: January 19th and 20th 2019

Where: Montréal, Eastern Bloc

Why: to squash bugs!

11 December, 2018 11:15PM by Louis-Philippe Véronneau

Reproducible builds folks

Reproducible Builds: Weekly report #189

Here’s what happened in the Reproducible Builds effort between Sunday December 2 and Saturday December 8 2018:

Packages reviewed and fixed, and bugs filed

Test framework development

There were a number of updates to our Jenkins-based testing framework that powers this week, including:

  • Chris Lamb:
    • Re-add support for calculating a PureOS package set. (MR: 115)
    • Support arbitrary package filters when generating deb822 output. (MR: 22)
    • Add missing DBDJSON_PATH import. (MR: 21)
    • Correct Tails’ build manifest URL. (MR: 20)
  • Holger Levsen:
    • Ignore disk full false-positives building the GNU C Library. []
    • Various node maintenance. (eg. [], [], etc.)
    • Exclusively use the database to track blacklisted packages in Arch Linux. []

This week’s edition was written by Bernhard M. Wiedemann, Chris Lamb, Muz & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

11 December, 2018 04:01PM

hackergotchi for Bits from Debian

Bits from Debian

Debian Cloud Sprint 2018

The Debian Cloud team held a sprint for the third time, hosted by Amazon at its Seattle offices from October 8th to October 10th, 2018.

We discussed the status of images on various platforms, especially in light of moving to FAI as the only method for building images on all the cloud platforms. The next topic was building and testing workflows, including the use of Debian machines for building, testing, storing, and publishing built images. This was partially caused by the move of all repositories to Salsa, which allows for better management of code changes, especially reviewing new code.

Recently we have made progress supporting cloud usage cases; grub and kernel optimised for cloud images help with reducing boot time and required memory footprint. There is also growing interest in non-x86 images, and FAI can now build such images.

Discussion of support for LTS images, which started at the sprint, has now moved to the debian-cloud mailing list. We also discussed providing many image variants, which requires a more advanced and automated workflow, especially regarding testing. Further discussion touched upon providing newer kernels and software like cloud-init from backports. As interest in using secure boot is increasing, we might cooperate with other teams and use their work on UEFI to provide images with a signed boot loader and kernel.

Another topic of discussion was the management of accounts used by Debian to build and publish Debian images. SPI will create and manage such accounts for Debian, including user accounts (synchronised with Debian accounts). Buster images should be published using those new accounts. Our Cloud Team delegation proposal (prepared by Luca Filippozzi) was accepted by the Debian Project Leader. Sprint minutes are available, including a summary and a list of action items for individual members.

Group photo of the participants in the Cloud Team Sprint

11 December, 2018 11:30AM by Tomasz Rybak

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

RQuantLib 0.4.7: Now with corrected Windows library

A new version 0.4.7 of RQuantLib reached CRAN and Debian, following up on the recent 0.4.6 release post which contained a dual call for help: RQuantLib was (and is!) still in need of a macOS library build, and had also experienced issues on Windows.

Since then we set up a new (open) mailing list for RQuantLib and, I am happy to report, sorted that Windows issue out! In short, with the older g++ 4.9.3 imposed for R via Rtools, we must add an explicit C++11 flag at configuration time. Special thanks to Josh Ulrich for tireless and excellent help with testing these configurations, and to everybody else on the list!

QuantLib is a very comprehensive free/open-source library for quantitative finance, and RQuantLib connects it to the R environment and language.

This release re-enables most examples and tests that were disabled when Windows performance was shaky (due to, as we now know, a misconfiguration of ours for the Windows binary library used). With the exception of the AffineSwaption example when running on Windows i386, everything is back!

The complete set of changes is listed below:

Changes in RQuantLib version 0.4.7 (2018-12-10)

  • Changes in RQuantLib tests:

    • Thanks to the updated #rwinlib/quantlib Windows library provided by Josh, all tests that previously exhibited issues have been re-enabled (Dirk in #126).
  • Changes in RQuantLib documentation:

    • The CallableBonds example now sets an evaluation date (#124).

    • Thanks to the updated #rwinlib/quantlib Windows library provided by Josh, examples that were set to dontrun are re-activated (Dirk in #126). AffineSwaption remains the sole holdout.

  • Changes in RQuantLib build system:

    • The src/ file was updated to reflect the new layout used by the upstream build.

    • The -DBOOST_NO_AUTO_PTR compilation flag is now set.

As stated above, we are still looking for macOS help though. Please get in touch on-list if you can help build a library for Simon’s recipes repo.

Courtesy of CRANberries, there is also a diffstat report for this release. As always, more detailed information is on the RQuantLib page. Questions, comments etc should go to the new rquantlib-devel mailing list. Issue tickets can be filed at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

11 December, 2018 10:47AM

hackergotchi for Julien Danjou

Julien Danjou

Podcast.__init__: Gnocchi, a Time Series Database for your Metrics


A few weeks ago, Tobias Macey contacted me as he wanted to talk about Gnocchi, the time series database I've been working on for the last few years.

It was a great opportunity to talk about the project, so I jumped on it! We talk about how Gnocchi came to life, how we built its architecture, the challenges we met, what kind of trade-off we made, etc.

You can listen to this episode here.

11 December, 2018 09:50AM by Julien Danjou

hackergotchi for Masayuki Hatta

Masayuki Hatta

Good ciphers in OpenJDK 10

Until recently, I didn't know that the list of supported cipher suites in OpenJDK differs widely between JDK versions. I used getSupportedCipherSuites() on OpenJDK 10 to get the following list, and checked the strength of encryption.

My criteria is:

  1. At least 128bit.
  2. No NULL ciphers.
  3. No anonymous auth ciphers.

Then I got the following. The weak ones are those marked anon or null in the Mode column, plus the 56-bit DES suite.

Name Encryption Mode
TLS_RSA_WITH_AES_256_GCM_SHA384 256bit
TLS_RSA_WITH_AES_128_GCM_SHA256 128bit
TLS_RSA_WITH_AES_256_CBC_SHA256 256bit
TLS_RSA_WITH_AES_128_CBC_SHA256 128bit
TLS_DH_anon_WITH_AES_256_GCM_SHA384 256bit anon
TLS_DH_anon_WITH_AES_128_GCM_SHA256 128bit anon
TLS_DH_anon_WITH_AES_256_CBC_SHA256 256bit anon
TLS_ECDH_anon_WITH_AES_256_CBC_SHA 256bit anon
TLS_DH_anon_WITH_AES_256_CBC_SHA 256bit anon
TLS_DH_anon_WITH_AES_128_CBC_SHA256 128bit anon
TLS_ECDH_anon_WITH_AES_128_CBC_SHA 128bit anon
TLS_DH_anon_WITH_AES_128_CBC_SHA 128bit anon
SSL_DH_anon_WITH_DES_CBC_SHA 56bit anon
TLS_RSA_WITH_NULL_SHA256 0bit null
TLS_ECDH_anon_WITH_NULL_SHA 0bit null
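
The three criteria can be applied mechanically to the suite names; a rough sketch (the name-to-key-size mapping is my assumption, derived from the table above):

```python
import re

def key_bits(suite):
    """Assumed symmetric key size, inferred from the suite name."""
    m = re.search(r"_AES_(\d+)_", suite)
    if m:
        return int(m.group(1))
    if "_DES_" in suite:
        return 56
    return 0  # NULL or unrecognised ciphers: treat as 0-bit

def acceptable(suite):
    return (key_bits(suite) >= 128        # 1. at least 128 bit
            and "_NULL_" not in suite     # 2. no NULL ciphers
            and "_anon_" not in suite)    # 3. no anonymous auth

suites = ["TLS_RSA_WITH_AES_256_GCM_SHA384",
          "TLS_DH_anon_WITH_AES_128_CBC_SHA",
          "SSL_DH_anon_WITH_DES_CBC_SHA",
          "TLS_RSA_WITH_NULL_SHA256"]
print([s for s in suites if acceptable(s)])
# → ['TLS_RSA_WITH_AES_256_GCM_SHA384']
```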

11 December, 2018 08:36AM by Masayuki Hatta


hackergotchi for Louis-Philippe Véronneau

Louis-Philippe Véronneau

Razer Deathadder Elite Review

After more than 10 years of use and abuse, my old Microsoft IntelliMouse died a few months ago. The right click had been troublesome for a while, but it became so broken I couldn't reliably drag and drop anymore.

It's the first mouse I kill and I don't know if I have to feel proud or troubled by that fact. I guess I'm getting old enough that saying I've used the same mouse for 10 years straight sounds reasonable?

I considered getting a new IntelliMouse, as Microsoft is reviving the brand, but at the price the 3.0 model was selling in August (~70 CAD), better options were available.

Picture of the mouse

After shopping online for a while, I ended up buying the Razer Deathadder Elite. Despite the very gamer oriented branding, I decided to get this one for its size and its build quality. I have very large hands and although I'm more of a "Tip Grip" type of person, I occasionally enjoy a "Palm Grip".

I have been using the mouse for around 3 months now and the only thing I really dislike is its default DPI and RGB settings. To me the DPI buttons were basically useless since anything beyond the lowest level was set too high.

The mouse also has two separate RGB zones for the scroll wheel and the Razer logo and I couldn't care less. As they are annoyingly set to a rainbow-colored shuffle by default, I turned them off.

Although Razer's program to modify mouse settings like DPI levels and RGB colors doesn't support Linux, the mouse is supported by OpenRazer. Settings are stored in the mouse directly, so you can setup OpenRazer in a throwaway VM, get the mouse the way you want and never think about that ever again.

Let's hope this one lasts another 10 years!

11 December, 2018 05:00AM by Louis-Philippe Véronneau

December 09, 2018

hackergotchi for Benjamin Mako Hill

Benjamin Mako Hill

Awards and citations at computing conferences

I’ve heard a surprising “fact” repeated in the CHI and CSCW communities that receiving a best paper award at a conference is uncorrelated with future citations. Although it’s surprising and counterintuitive, it’s a nice thing to think about when you don’t get an award and it’s a nice thing to say to others when you do. I’ve thought it and said it myself.

It also seems to be untrue. When I tried to check the “fact” recently, I found a body of evidence that suggests that computing papers that receive best paper awards are, in fact, cited more often than papers that do not.

The source of the original “fact” seems to be a CHI 2009 study by Christoph Bartneck and Jun Hu titled “Scientometric Analysis of the CHI Proceedings.” Among many other things, the paper presents a null result for a test of a difference in the distribution of citations across best papers awardees, nominees, and a random sample of non-nominees.

Although the award analysis is only a small part of Bartneck and Hu’s paper, at least two papers have subsequently brought more attention, more data, and more sophisticated analyses to the question. In 2015, the question was asked by Jacques Wainer, Michael Eckmann, and Anderson Rocha in their paper “Peer-Selected ‘Best Papers’—Are They Really That ‘Good’?”

Wainer et al. build two datasets: one of papers from 12 computer science conferences with citation data from Scopus, and another of papers from 17 different conferences with citation data from Google Scholar. Because of parametric concerns, Wainer et al. used a non-parametric rank-based technique to compare awardees to non-awardees. Wainer et al. summarize their results as follows:

The probability that a best paper will receive more citations than a non best paper is 0.72 (95% CI = 0.66, 0.77) for the Scopus data, and 0.78 (95% CI = 0.74, 0.81) for the Scholar data. There are no significant changes in the probabilities for different years. Also, 51% of the best papers are among the top 10% most cited papers in each conference/year, and 64% of them are among the top 20% most cited.

The question was also recently explored in a different way by Danielle H. Lee in her paper on “Predictive power of conference‐related factors on citation rates of conference papers” published in June 2018.

Lee looked at 43,000 papers from 81 conferences and built a regression model to predict citations. Taking into account a number of controls not considered in previous analyses, Lee finds that the marginal effect of receiving a best paper award on citations is positive, well-estimated, and large.

Why did Bartneck and Hu come to such different conclusions than later work?

Distribution of citations (received by 2009) of CHI papers published between 2004-2007 that were nominated for a best paper award (n=64), received one (n=12), or were part of a random sample of papers that did not (n=76).

My first thought was that perhaps CHI is different than the rest of computing. However, when I looked at the data from Bartneck and Hu’s 2009 study—conveniently included as a figure in their original study—you can see that they did find a higher mean among the award recipients compared to both nominees and non-nominees. The entire distribution of citations among award winners appears to be pushed upwards. Although Bartneck and Hu found an effect, they did not find a statistically significant effect.

Given the more recent work by Wainer et al. and Lee, I’d be willing to venture that the original null finding was a function of the fact that citations is a very noisy measure—especially over a 2-5 year post-publication period—and that the Bartneck and Hu dataset was small, with only 12 awardees out of 152 papers total. This might have caused problems because the statistical test the authors used was an omnibus test for differences in a three-group sample that was imbalanced heavily toward the two groups (nominees and non-nominees) in which there appears to be little difference. My bet is that the paper’s conclusion on awards is simply an example of how a null effect is not evidence of a non-effect—especially in an underpowered dataset.

Of course, none of this means that award winning papers are better. Despite Wainer et al.’s claim that they are showing that award winning papers are “good,” none of the analyses presented can disentangle the signalling value of an award from differences in underlying paper quality. The packed rooms one routinely finds at best paper sessions at conferences suggest that at least some additional citations received by award winners might be caused by extra exposure caused by the awards themselves. In the future, perhaps people can say something along these lines instead of repeating the “fact” of the non-relationship.

09 December, 2018 08:20PM by Benjamin Mako Hill

Iustin Pop

Looking for a better backup solution


After my last post, didn’t feel like writing for a while. But now I found a good subject: backups. Ah, backups…

I’ve run my current home-grown backup solution for a long time. Git history says at least since mid-2007 (so 11½ years), but the scripts didn’t start in Git, so 12 years is a fair assessment.

It’s a simple solution, based on incremental filesystem dumps, and back to level 0 periodically. I did use my backups to recover files (around once per year, I think), so it works, but it’s clunky. The biggest deficiencies are:

  • I don’t have enough space to backup everything I want to backup, if I want long-term history (since the full dumps every N units of time are costly).
  • Since the dump utility I use is limited to 9 levels, it also creates a limitation on how often I can make backups, which leads to too coarse backup granularity (and large at-risk intervals).
  • Since the dump is incremental, one needs to restore the correct archives in the correct order to get to the file… urgh!
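
To illustrate the last point, here is a simplified model of which archives an incremental-dump restore needs, and in what order (a sketch of the general dump/restore scheme, not of any particular tool):

```python
# Simplified model: each dump is (level, timestamp). To restore the
# latest state you need the newest level-0 dump, then the newest dump
# of each successively higher level taken after the previous link.

def restore_chain(dumps):
    chain = [max((d for d in dumps if d[0] == 0), key=lambda d: d[1])]
    while True:
        later = [d for d in dumps
                 if d[0] > chain[-1][0] and d[1] > chain[-1][1]]
        if not later:
            return chain
        # next link: lowest level among later dumps, newest at that level
        lvl = min(d[0] for d in later)
        chain.append(max((d for d in later if d[0] == lvl),
                         key=lambda d: d[1]))

dumps = [(0, 1), (1, 2), (2, 3), (1, 4), (2, 5)]
print(restore_chain(dumps))  # → [(0, 1), (1, 4), (2, 5)]
```

Note how the level-1 dump at time 4 supersedes the earlier level-1 and level-2 dumps, which is exactly the bookkeeping one has to get right by hand.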

Clearly I’m using technology from the stone-age, so this week I took a look at what’s available to replace my home-grown stuff.

But let’s make it clear first: I’m not interested in cloud-based backups as main solution. They might suit as an N+M (where M > 2) option, but not as primary/only solution. Plus, where’s the fun in delegating the technical stuff to others?

Various options


The first thing I looked at, because it was on the back of my mind for a while, was rsnapshot. Its simplicity is very appealing, as well as its nice file-based deduplication, but a quick look at the current situation is not very encouraging:

  • it seems half-orphaned; not a very dire situation, but despite much discussion on that bug, it didn’t get clear closure; activity is low, the last official release was in 2015 and there have been only a few commits since then;
  • low activity wouldn’t be a problem, but there are quite a few bugs filed that point to potential data loss, for example issue 141: “At certain conditions rsnapshot removes old backups without make new ones”;

Looking especially at the mentioned issue 141 made me realise that the use of relative (e.g. hourly.N, etc.) timestamps is what leads to fragility in the script. Ideally the actual directories would be absolute-timestamp-based (e.g. 2018-12-09T15:45:44), and there would be just helpful symlinks (hourly.0) to these. Sure, there is the “sync_first” mode which seems safer, but it still doesn’t guarantee the correct transition since the various rotate calls are independent of each other and of the sync action itself.
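
That absolute-timestamps-plus-symlinks idea is easy to sketch (the directory layout and naming here are hypothetical):

```python
# Sketch: every snapshot lives in a directory named by its creation
# time; the hourly.N names are mere symlinks, recomputed after each
# run, so rotation can never delete a snapshot by accident.
import os
from datetime import datetime

def make_snapshot(root):
    """Create a new timestamped snapshot directory."""
    name = datetime.now().strftime("%Y-%m-%dT%H:%M:%S.%f")
    path = os.path.join(root, name)
    os.makedirs(path)        # the actual dump/rsync would populate this
    return path

def relink(root, prefix="hourly"):
    """Point prefix.0 at the newest snapshot, prefix.1 at the next, etc."""
    snaps = sorted((d for d in os.listdir(root)
                    if not os.path.islink(os.path.join(root, d))),
                   reverse=True)                 # newest first
    for i, snap in enumerate(snaps):
        link = os.path.join(root, f"{prefix}.{i}")
        if os.path.lexists(link):
            os.remove(link)
        os.symlink(snap, link)
```

Pruning then becomes deleting timestamped directories; a crash between steps can at worst leave a stale symlink, never silently drop a backup.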

Speaking of the rotate calls, the whole cron story (“create a cron entry for each cycle, and make sure to run the greater period ones before the lower periods”) points to more issues regarding the architecture of the rotation.

The conclusion was that at best, this would be a small improvement on my current solution. And since rsnapshot itself is a 4K LOC Perl script, I’m unlikely to contribute significantly to it; also, my desired changes would change the use of it significantly.

So, if this doesn’t work, what about other solutions?

borg backup

A tool very highly spoken of in the DIY/self-hosting backups is borgbackup. A quick look at it shows many advantages over rsnapshot:

  • space efficient storage, due to chunk-based (variable chunk? not entirely clear what’s the criteria for chunk length) deduplication, even across source filesystems/source machine/etc.
  • data encryption, yay!
  • customisable compression

It also can do off-site backups, of course, also requiring SSH access; and if the tool is also installed remotely, it’s much more efficient.

Something not clearly spoken about in the readme is the “correct” (IMHO) handling of repository maintenance: since archives are time-based and not relative, you declare pruning much more logically, along the lines of “keep only N backups older than T”. And it’s pruning, not rotation, which is very good.
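
A retention policy phrased against absolute timestamps might look like the following simplified sketch, loosely in the spirit of borg prune's --keep-daily/--keep-weekly flags (not borg's actual algorithm):

```python
# Simplified retention: for each rule, keep the newest snapshot per
# calendar bucket (day, ISO week), up to a count; everything not kept
# by any rule is pruned.
from datetime import datetime

def prune(timestamps, keep_daily=7, keep_weekly=4):
    """Return the snapshots that should be deleted."""
    rules = [(keep_daily, lambda t: t.date()),              # one per day
             (keep_weekly, lambda t: t.isocalendar()[:2])]  # one per ISO week
    keep = set()
    for count, bucket in rules:
        seen = []
        for t in sorted(timestamps, reverse=True):  # newest first
            b = bucket(t)
            if b not in seen:
                if len(seen) >= count:
                    break
                seen.append(b)
                keep.add(t)
    return sorted(t for t in timestamps if t not in keep)

snaps = [datetime(2018, 12, d, 12, 0) for d in range(1, 10)]
# keep-daily retains Dec 3-9; keep-weekly additionally retains Dec 2
# (the newest snapshot of ISO week 48), so only Dec 1 gets pruned.
print(prune(snaps))
```

The crucial property is that this is a pure function of the timestamps: there is no state machine of hourly/daily/weekly rotations that can get out of sync.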

Add on top the better handling of multiple filesystems/areas to be backed up, all in a single repository, and at first glance everything looks good. But a deeper look makes me worried about a few things.

Reliability: On one hand, the archives are mountable. Which seems fancy. But it also means that without the tool working, and the metadata in good shape, you can’t access the data. A quick look at the design shows significant complexity, which means likely bugs, in the whole archive/database/chunk handling. If this were the only way to get space-efficient storage, all would be good, but if you’re willing to give up encryption (at least for local backups this can be an acceptable trade-off), then rsnapshot plus a tool like duperemove, which can do block-based deduplication (yes, it will kill performance on HDDs), seems a much simpler way to get the same result, and without the potential problems of a repository that consists of opaque blobs.

Of course, having significant internal state, there are tools to support this, like borg check and borg recreate, but the existence of these tools in itself confirms to me that there’s an inherent risk in such a design. A rsnapshot directory can be deleted, but it’s hard to get it corrupted.

Speaking of mounting archives, it also means that getting at your files as they were a few hours ago is not as trivial as in rsnapshot’s case, which is simply cp /snapshots/hourly.3/desired/path/file ., without mounting anything, without needing to come up with the right permissions to allow unprivileged users to do it, etc.

Security: The promise of isolating clients from bad servers and vice versa is good indeed. But it also has a few issues, of which the most important for my use case is the following: in order to allow clients to only push new archives, but not delete/break old ones (i.e. append-only mode), one can set a per-connection append-only mode (via SSH keys’ forced command args): you just need to set --append-only for that client. The documentation gives a nice example, but it ends with:

As data is only appended, and nothing removed, commands like prune or delete won’t free disk space, they merely tag data as deleted in a new transaction.

Be aware that as soon as you write to the repo in non-append-only mode (e.g. prune, delete or create archives from an admin machine), it will remove the deleted objects permanently (including the ones that were already marked as deleted, but not removed, in append-only mode).

So basically, the append-only mode is not “reject other actions” (and ideally alert on this), but rather “postpone modifications until later”, which makes it IMHO useless.

Conclusion: borg backup is useful if you want a relatively hands-off solution that works well, but it has corner cases that kind of nullify its space-savings advantage, depending on your trade-offs. So, not for me.

What would my ideal solution be?

After thinking on it, these are the important trade-offs:

  1. File or block/chunk-based deduplication? Given native (file-system-level) block-based deduplication, a “native” (in the backup tool) seems preferred for local backups; for remote backups, of course it’s different, but then deduplication with encryption is its own story
  2. File storage: native (1:1) or bundled (needing an extraction step); I personally would take native again, just to ensure I can get access to the files without needing the tool or its internal state to be working
  3. Per-file-system or global repository: ideally global, so that different file-systems don’t require separate handling/integration.

This leans more towards a rsnapshot-like solution… And then there are additional bonus points (in random order):

  • facilitating secure periodic snapshots to offline media
  • facilitating secure remote backups on dumb storage (not over SSH!) so that cloud-based backups can be used if desired
  • native support for redundancy in terms of Reed-Solomon error correction so that small blocks being lost don’t risk losing an entire file
  • ideally good customisation for the retention policy
  • ideally good exclusion rules (i.e. needing to manually add /home/*/.mozilla/cache is not “good”)

That’s a nice list, and from my search, I don’t think there is anything like that out there.

Which makes me worried that I’ll start another project I won’t have time to properly maintain…

Next steps

Well, at least the next step is to get bigger hard drives for my current backup solution ☺ I’m impressed by the ~64K hours (7+ years) Power_On_Hours of my current HDDs, and it makes me feel good about choosing the right hardware way back then, but I can now buy hard drives 5× bigger or more, which will allow more retention and more experiments. I was hoping I could retire my HDDs completely and switch to SSDs only, but that’s still too expensive, and nothing can beat the density and price of 10TB+ HDDs…

Comments and suggestions are very welcome! In the meantime, I’m shopping for hardware :-P

09 December, 2018 04:39PM

Petter Reinholdtsen

Why is your site not using Content Security Policy / CSP?

Yesterday, I had the pleasure of watching on Frikanalen the OWASP talk by Scott Helme titled "What We’ve Learned From Billions of Security Reports". I had not heard of the Content Security Policy standard nor its ability to "call home" when a browser detects a policy breach (I do not follow web page design development much these days), and found the talk very illuminating.

The mechanism allows a web site owner to use HTTP headers to tell visitors' web browsers which sources (internal and external) are allowed to be used on the web site. Thus it becomes possible to enforce an "only local content" policy despite web designers' urge to fetch programs from random sites on the Internet, like the one enabling the attack reported by Scott Helme earlier this year.

Using CSP seems like an obvious thing for a site admin to implement to take some control over the information leaks that occur when external sources are used to render web pages; it is a mystery to me why more sites are not using CSP. It is being standardized under the W3C these days, and is supported by most web browsers.

I managed to find a Django middleware for implementing CSP and was happy to discover it was already in Debian. I plan to use it to add CSP support to the Frikanalen web site soon.
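
For illustration, here is roughly what such a middleware boils down to; this is a hand-rolled sketch (the policy string and report endpoint are made up for the example), not the actual Debian-packaged middleware:

```python
class CSPMiddleware:
    # Minimal Django-style middleware adding a Content-Security-Policy
    # header to every response. The policy below ("only local content",
    # plus a hypothetical report endpoint) is an example, not a
    # recommendation.
    POLICY = "default-src 'self'; report-uri /csp-report/"

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        response["Content-Security-Policy"] = self.POLICY
        return response
```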

As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

09 December, 2018 02:00PM

Andreas Metzler

Please test GnuTLS 3.6 in experimental

GnuTLS 3.6.x has been marked stable with release 3.6.5. Binary packages are available in experimental. - Please test! FWIW I have rebuilt all reverse build-dependencies without finding GnuTLS-triggered build errors.

GnuTLS 3.6.x is ABI compatible with earlier versions. However the addition of TLS 1.3 support might require source changes. Due to differences in the TLS 1.3 handshake GNUTLS_E_AGAIN can be returned for blocking sockets.

09 December, 2018 01:16PM by Andreas Metzler

December 08, 2018

Vishal Gupta

Using LSTMs to join words (Portmanteaus): Part 2

Part 1 - Training a Name-Generating LSTM

Part 2 : Joining Names using our LSTM

Now that we have an LSTM to generate names, we can use it to bridge two words. Since the model can predict the likelihood of each of 27 characters succeeding a given sequence of letters, we can use it to find the bridge between two words. We define a bridge as:

  • Let m = Length of the left word, L
  • Let n = Length of the right word, R
  • Then Bridge(L,R) = (i,j), i <= m & j <= n
  • Where i is the end index of the left word in the portmanteau and j is the start index of the right word in the portmanteau.
  • Hence… Join(L,R) = L[:i] + R[j:]
  • For all combinations of (i,j) compute a score by summing the probabilities obtained from the char LSTM.
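
The join itself is just string surgery; the LSTM is only needed for scoring the (i,j) candidates. As a quick illustration (the indices here are hand-picked, not LSTM-chosen):

```python
def join_at(left, right, i, j):
    # Join(L, R) for a chosen bridge (i, j): the first i characters of
    # the left word followed by the right word from index j onward.
    return left[:i] + right[j:]

# "britain" + "exit" bridged at i=2, j=0 gives the familiar result:
assert join_at("britain", "exit", 2, 0) == "brexit"
```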


import pandas as pd
import numpy as np
import random

Loading our model

from keras.models import model_from_json

json_file = open('model_keras.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
model = model_from_json(loaded_model_json)
print("Loaded model from disk")

Alternatively, you can also download the weight files

wget -nv
wget -nv

Helper functions

  • sample_preds : returns temperature-scaled probabilities of the 27 characters (A-Z + \n) following a given sequence
  • ohmygauss : returns a decreasing Gaussian sequence (the right half of a bell curve)

from scipy.stats import norm
def sample_preds(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    return preds

def ohmygauss(length, sigma=1.8):
    rv = norm(loc=0, scale=sigma)
    x = np.arange(length)
    return rv.pdf(x)

Getting bridge scores

  1. Iterate over all sequences of 3 (SEQLEN) characters in the left word. (MINLEFT -> n)
    1. Iterate over all sequences of 3 (COMPARE) characters in the right word. (0 -> MINRIGHT)
      1. Get the probability that the given character of the right word follows the current sequence
      2. Repeat for COMPARE sequences. For example, to bridge britain and exit at _br+exit:
         Score : prob(e|"_br")*w1 + prob(x|"bre")*w2 + prob(i|"rex")*w3
      3. Multiply the score by Gaussian factors to prioritise bridges towards the beginning of the right word

LEFT_BIAS = [0.06, 0.05, 0.04]

def proc(left, right, verbose=False):
    best_matches = {}
    best_i = None
    best_i_score = -1
    for i in range(0, len(left) - MINLEFT + 1):
        # Searching all sequences of size COMPARE in the right word
        # to find best match
        best_j = None
        best_j_score = -1
        best_matches[i] = {}
        right_bound = len(right) - MINRIGHT + 1
        gaus_factors = ohmygauss(right_bound)
        for j in range(0, right_bound):
            right_chars = right[j:j + COMPARE]
            s = 0
            for x in range(COMPARE):
                # Character on right which is being sampled
                c_index = char_indices[right_chars[x]]
                if verbose:
                    print ("Sampling " + left[i + x:i + SEQLEN] +
                           right[j:j + x] + "-->" + right_chars[x])

                # Generating sequence and getting probability
                Xoh = one_hot(left[i + x:i + SEQLEN] + right[j:j + x],char_indices)
                preds = model.predict(Xoh, verbose=0)[0]
                pred_probs = sample_preds(preds, 0.7)

                # Getting corresponding character in left word
                left_char = np.zeros((1, len(char_indices)))
                try:
                    left_char[0, char_indices[left[i + SEQLEN + x]]] = 1
                except IndexError:
                    pass  # ran past the end of the left word: no left-char bias
                # Adding some bias to left_char and adding it to predicted probs
                biased_probs = LEFT_BIAS[x] * left_char + \
                    (1 - LEFT_BIAS[x]) * pred_probs

                # Adding probability of bridging at c_index to s
                s += biased_probs[0, c_index]

            # Prioritizing words that start with the first few letters of the right word
            s = s * gaus_factors[j]

            if verbose:
                print (i, j, s,)
            best_matches[i][j] = s
            if s > best_j_score:
                best_j = j
                best_j_score = s
#         best_matches[i] = {'index': best_j, 'score': best_j_score}
        if best_j_score > best_i_score and i < len(left) - MINLEFT:
            best_i_score = best_j_score
            best_i = i

    return best_matches, best_i

Picking the best portmanteaus

  • Maximize smoothness of the bridge (derived from proc using the LSTM model)
  • Minimize length of portmanteau
  • Maximize fraction of each word in portmanteau

def join(left, right, verbose=False,dict_result=False,n=3):
    left = '\n' + left
    right = right + '\n'
    matches, i = proc(left, right, verbose)
    probs = {}
    for i_temp in matches:
        for j_temp in matches[i_temp]:
            word = (left[:i_temp + SEQLEN] + right[j_temp:]).replace('\n', '').title()
            num_letters = len(word)
            if verbose :
                print (word, num_letters,(1 / float(num_letters)) * 0.5)
            probs[word] = probs.get(word,0)+round(matches[i_temp][j_temp],4) + (1 / float(num_letters) * PHONEME_WT)
            probs[word] *= (min((i_temp+1)/min(len(left),8),1.0) + min((len(right) - j_temp - 1)/min(len(right),8),1.0))
    if dict_result:
        return probs
    ser = pd.Series(probs).sort_values()[::-1][:n]
    ports = ser.index.tolist()
    port_vals = [i+'('+str(round(ser[i],3))+')' for i in ports]
    print (left,'+',right,' = ',port_vals)

Generating common portmanteaus

word_pairs =  [('britain','exit'),('biology', 'electronic'), ('affluence', 'influenza'), ('brad', 'angelina'),
               ('brother', 'romance'), ('breakfast', 'lunch'), ('chill', 'relax'), ('emotion', 'icon'),('feminist', 'nazi')]

for p in word_pairs:
    join(p[0], p[1])  # loop body lost in extraction; presumably a call like this
britain + exit
=  ['Britainexit(0.71)', 'Brexit(0.705)', 'Briexit(0.69)']

biology + electronic
=  ['Biolectronic(1.23)', 'Biolonic(0.821)', 'Bionic(0.677)']

affluence + influenza
=  ['Affluenza(2.722)', 'Affluenfluenza(1.261)', 'Affluencenza(1.093)']

brad + angelina
=  ['Brangelina(1.626)', 'Braangelina(0.637)', 'Bradangelina(0.635)']

brother + romance
=  ['Brotheromance(1.493)', 'Bromance(0.963)', 'Brothermance(0.625)']

breakfast + lunch
=  ['Breaunch(0.657)', 'Breakfasunch(0.59)', 'Breakfalunch(0.588)']

chill + relax
=  ['Chillax(1.224)', 'Chilax(1.048)', 'Chillelax(0.699)']

emotion + icon
=  ['Emoticon(1.331)', 'Emotion(0.69)', 'Emicon(0.667)']

feminist + nazi
=  ['Feminazi(1.418)', 'Femazi(0.738)', 'Feministazi(0.678)']

Generating Pokemon names!

pokemon_pairs = [('char','lizard'), ('venus', 'dinosaur'), ('blast', 'tortoise'), ('pikapika', 'chu')]        
for p in pokemon_pairs:
    join(p[0], p[1])  # loop body lost in extraction; presumably a call like this
char + lizard
=  ['Chard(0.928)', 'Charizard(0.764)', 'Charlizard(0.698)']

venus + dinosaur
=  ['Venusaur(1.051)', 'Venosaur(0.945)', 'Venusdinosaur(0.661)']

blast + tortoise
=  ['Blastortoise(1.46)', 'Blastoise(1.121)', 'Blasttortoise(0.627)']

pikapika + chu
=  ['Pikachu(0.728)', 'Pikapikachu(0.714)', 'Pichu(0.711)']

Try it yourself!

Wanna give it a spin? Run it on Colab or download the notebook and run it locally.

08 December, 2018 11:22PM by Vishal Gupta

Jelmer Vernooij

Lintian Brush

With Debian packages now widely being maintained in Git repositories, there has been an uptick in the number of bulk changes made to Debian packages. Several maintainers are running commands over many packages (e.g. all packages owned by a specific team) to fix common issues in packages.

Examples of changes being made include:

  • Updating the Vcs-Git and Vcs-Browser URLs after migrating from alioth to salsa
  • Stripping trailing whitespace in various control files
  • Updating e.g. homepage URLs to use https rather than http

Most of these can be fixed with simple sed or perl one-liners.

Some of these scripts are publicly available, for example:


Lintian-Brush is both a simple wrapper around a set of these kinds of scripts and a repository for these scripts, with the goal of making it easy for any Debian maintainer to run them.

The lintian-brush command-line tool is a simple wrapper that runs a set of "fixer scripts", and for each:

  • Reverts the changes made by the script if it failed with an error
  • Commits the changes to the VCS with an appropriate commit message
  • Adds a changelog entry (if desired)

The tool also provides some basic infrastructure for testing that these scripts do what they should, and e.g. don't have unintended side-effects.

The idea is that it should be safe, quick and unobtrusive to run lintian-brush, and get it to opportunistically fix lintian issues and to leave the source tree alone when it can't.


For example, running lintian-brush on the package talloc fixes two minor lintian issues:

% debcheckout talloc
declared git repository at
git clone talloc ...
Cloning into 'talloc'...
remote: Enumerating objects: 2702, done.
remote: Counting objects: 100% (2702/2702), done.
remote: Compressing objects: 100% (996/996), done.
remote: Total 2702 (delta 1627), reused 2601 (delta 1550)
Receiving objects: 100% (2702/2702), 1.70 MiB | 565.00 KiB/s, done.
Resolving deltas: 100% (1627/1627), done.
% cd talloc
talloc% lintian-brush
Lintian tags fixed: {'insecure-copyright-format-uri', 'public-upstream-key-not-minimal'}
% git log
commit 0ea35f4bb76f6bca3132a9506189ef7531e5c680 (HEAD -> master)
Author: Jelmer Vernooij <>
Date:   Tue Dec 4 16:42:35 2018 +0000

    Re-export upstream signing key without extra signatures.

    Fixes lintian: public-upstream-key-not-minimal
    See for more details.

 debian/changelog                |   1 +
 debian/upstream/signing-key.asc | 102 +++++++++++++++---------------------------------------------------------------------------------------
 2 files changed, 16 insertions(+), 87 deletions(-)

commit feebce3147df561aa51a385c53d8759b4520c67f
Author: Jelmer Vernooij <>
Date:   Tue Dec 4 16:42:28 2018 +0000

    Use secure copyright file specification URI.

    Fixes lintian: insecure-copyright-format-uri
    See for more details.

 debian/changelog | 3 +++
 debian/copyright | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

Script Interface

A fixer script is run in the root directory of a package, where it can make changes it deems necessary, and write a summary of what it's done for the changelog (and commit message) to standard out.

If a fixer can not provide any improvements, it can simply leave the working tree untouched - lintian-brush will not create any commits for it or update the changelog. If it exits with a non-zero exit code, then it is assumed that it failed to run and it will be listed as such and its changes reset rather than committed.
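
A fixer that follows this contract can be very small. Here is a hypothetical example (not one of lintian-brush's shipped fixers) that switches the copyright-format URI to https:

```python
#!/usr/bin/python3
import os

SECURE = "https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/"
INSECURE = "http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/"

def fix_copyright_uri(path="debian/copyright"):
    # Contract: change nothing and report nothing when there is nothing
    # to do; otherwise apply the fix and return the one-line summary
    # used for the changelog entry and commit message.
    if not os.path.exists(path):
        return None
    with open(path) as f:
        text = f.read()
    fixed = text.replace(INSECURE, SECURE)
    if fixed == text:
        return None
    with open(path, "w") as f:
        f.write(fixed)
    return "Use secure copyright file specification URI."

if __name__ == "__main__":
    summary = fix_copyright_uri()
    if summary:
        print(summary)
```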

In addition, tests can be added for fixers by providing various before and after source package trees, to verify that a fixer script makes the expected changes.

For more details, see the documentation on writing new fixers.


lintian-brush is currently available in unstable and testing. See man lintian-brush(1) for an explanation of the command-line options.

Fixer scripts are included that can fix (some of the instances of) 34 lintian tags.

Feedback would be great if you try lintian-brush - please file bugs in the BTS, or propose pull requests with new fixers on salsa.

08 December, 2018 11:04PM by Jelmer Vernooij

Vishal Gupta

Using LSTMs to join words (Portmanteaus): Part 1

Often, I find myself making portmanteaus from verbs, names, adjectives and pretty much any word I think too much about. Sometimes to shrink phrases, and sometimes to name a product or app; occasionally, to ship couples. And as someone who loves tinkering with AI, I wondered if it was possible to write an algorithm to do it… and here we are. This first part is about training a model that can generate artificial names with a character-level LSTM.

If you’re new to LSTMs, RNNs, or sequential models, here are a few resources that can help you learn and get started:

Part 1 : Training a Name-Generating LSTM

  • First, we need to train an LSTM on a large dataset of names, so it can generate artificial names by predicting the nth character given (n-1) characters of a name.
  • In the image on the left, we have a character-level RNN which accepts and predicts 1 of 4 characters ('h','e','l' and 'o').
  • Hence it has 4-dimensional input and output layers, and a hidden layer of 3 units (neurons).
  • The output layer contains confidences the RNN assigns for the next character (vocabulary is "h,e,l,o")
  • We want the green numbers to be high and red numbers to be low (in the output layer).

Image Credits : Andrej Karpathy


Importing pandas, numpy, random and sys

import pandas as pd
import numpy as np
import random
import sys

Downloading the Dataset

Baby Names from Social Security Card Applications - National Level Data


Loading and pre-processing data.

SEQLEN = 3 # No. of chars our LSTM uses to predict the next character
STEP = 1 # No. of letters to skip between two samples

Hence, the name PETER is used to generate the following samples:

X1 X2 X3  Y
-  -  P   E
-  P  E   T
P  E  T   E
E  T  E   R
T  E  R   -

We need to do this for all names in our dataset.
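
Concretely, the windowing for a single name looks like this (in the actual code below, names are padded with '\n' rather than the '-' placeholders of the table above):

```python
SEQLEN, STEP = 3, 1  # same constants as above

def to_samples(name):
    # Slide a SEQLEN-wide window over the padded name, pairing each
    # window (X) with the character that follows it (y).
    text = "\n" + name.lower() + "\n"
    return [(text[i:i + SEQLEN], text[i + SEQLEN])
            for i in range(0, len(text) - SEQLEN, STEP)]

samples = to_samples("peter")
```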

  • Load names from NationalNames.csv
  • Eliminate names with 4 or fewer characters, or a frequency of 3 or less
  • Join the names, separated by \n

def get_data():
    df = pd.read_csv('data/Names/NationalNames.csv')
    names = list(df[(df['Count'] > 3) & (df['Name'].str.len() > 4)]['Name'].unique())
    text = '\n' + '\n\n'.join(names).lower() + '\n'
    chars = sorted(list(set(text)))

    print ("Loaded",len(names),"names with",len(chars),"characters.")
    # Loaded 87659 names with 27 characters.
    return text,chars
  • Split text into sequences of 3 characters (X) and add the next character to next_chars (y)

def get_seq(args):
    text = args[0]
    sequences = []
    next_chars = []
    for i in range(0, len(text) - SEQLEN, STEP):
        sequences.append(text[i: i + SEQLEN])
        next_chars.append(text[i + SEQLEN])
    print('No. of sequences:', len(sequences))
    print('No. of chars:', len(next_chars))
    return sequences,next_chars,args[1]

  • One-Hot Encoding characters in sequences and next_chars

# This function encodes a given word into a numpy array by 1-hot encoding the characters
def one_hot(word,char_indices):
    x_pred = np.zeros((1, SEQLEN, 27))

    for t, char in enumerate(word):
        x_pred[0, t, char_indices[char]] = 1.
    return x_pred
# Encoding all sequences
def get_vectors(args):
    sequences,next_chars,chars = args
    char_indices = dict((c, i) for i, c in enumerate(chars))
    indices_char = dict((i, c) for i, c in enumerate(chars))
    X = np.zeros((len(sequences), SEQLEN, len(chars)), dtype=np.bool)
    y = np.zeros((len(sequences), len(chars)), dtype=np.bool)
    for i, sentence in enumerate(sequences):
        X[i] = one_hot(sentence,char_indices)
        y[i, char_indices[next_chars[i]]] = 1
    print ("Shape of X (sequences):",X.shape)
    print ("Shape of y (next_chars):",y.shape)
    # Shape of X (sequences): (764939, 3, 27)
    # Shape of y (next_chars): (764939, 27)
    return X,y,char_indices, indices_char

Creating the LSTM Model

  • We’re creating a simple LSTM model that takes in a sequence of size SEQLEN, each element of len(chars) numbers (1 or 0)
  • The output of the LSTM goes into a Dense layer that predicts the next character with a softmaxed one-hot encoding
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import LSTM
from keras.optimizers import RMSprop

def get_model(num_chars):
    model = Sequential()
    model.add(LSTM(16, input_shape=(SEQLEN, num_chars)))
    model.add(Dense(num_chars))
    model.add(Activation('softmax'))

    optimizer = RMSprop(lr=0.01)
    model.compile(loss='categorical_crossentropy', optimizer=optimizer)
    return model

Sampling with our model

  • Picking the element with the greatest probability will always return the same character for a given sequence
  • I’d like to induce some variance by sampling from a probability array instead.

To explain this better, here’s an excerpt from Andrej Karpathy’s blog about CharRNNs :

Temperature. We can also play with the temperature of the Softmax during sampling. Decreasing the temperature from 1 to some lower number (e.g. 0.5) makes the RNN more confident, but also more conservative in its samples. Conversely, higher temperatures will give more diversity but at cost of more mistakes (e.g. spelling mistakes, etc).
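
The temperature trick is just a log-space rescale followed by re-normalisation; a small standalone demonstration makes the effect easy to see:

```python
import numpy as np

def with_temperature(preds, temperature):
    # Rescale in log space and re-normalise: T < 1 sharpens the
    # distribution (more confident), T > 1 flattens it (more diverse).
    logp = np.log(np.asarray(preds, dtype='float64')) / temperature
    expp = np.exp(logp)
    return expp / expp.sum()

p = [0.5, 0.3, 0.2]
sharp = with_temperature(p, 0.5)  # most likely char gets even likelier
flat = with_temperature(p, 2.0)   # probabilities move towards uniform
```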

def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

Training the LSTM on our dataset of Names

  • Finally, we use the functions defined above to fetch data and train the model
  • I trained for 30 epochs since the loss almost seemed to stagnate after that
  • This depends on the dataset that’s being used and complexity of the sequences.
X, y, char_indices, indices_char = get_vectors(get_seq(get_data()))
model = get_model(len(char_indices.keys()))
model.fit(X, y, epochs=30)  # batch size lost in extraction; trained for 30 epochs as noted above

# Saving the model
model_json = model.to_json()
with open("model_keras.json", "w") as json_file:
    json_file.write(model_json)

Testing our model

def gen_name(seed):
    generated = seed
    for i in range(10):
        x_pred = np.zeros((1, SEQLEN, 27))
        for t, char in enumerate(seed):
            x_pred[0, t, char_indices[char]] = 1.
        preds = model.predict(x_pred, verbose=0)[0]
        next_char = indices_char[sample(preds,0.5)]
        if next_char == '\n':break
        generated += next_char
        seed = seed[1:] + next_char

    return generated

  • Generating names from 3-letter seeds
for i in ['mar','ram','seb']:
    print ('Seed: "'+i+'"\tNames :',[gen_name(i) for _ in range(5)])
Seed: "mar"	Names : ['marisa', 'maria', 'marthi', 'marvamarra', 'maria']
Seed: "ram"	Names : ['ramir', 'ramundro', 'ramariis', 'raminyodonami', 'ramariegena']
Seed: "seb"	Names : ['sebeexenn', 'sebrinx', 'seby', 'sebrey', 'seberle']  

Great! We have a model that can generate fake names. Now all you need is a fake address and an empty passport. Jk.

In the next blog, I’ll explain how you can use this model to join two words by finding the best bridge.

08 December, 2018 06:55PM by Vishal Gupta

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

It was twenty years ago …

… this week that I made a first cameo in the debian/changelog for the Debian R package:

r-base (0.63.1-1) unstable; urgency=low

  • New upstream release
  • Linked html directory to /usr/doc/r-base/doc/html (Dirk Eddelbuettel)

– Douglas Bates Fri, 4 Dec 1998 14:22:19 -0600

For the next few years I assisted Doug here and there, and then formally took over in late 2001.

It’s been a really good and rewarding experience, and I hope to be able to help with this for a few more years to come.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

08 December, 2018 05:07PM

hackergotchi for Louis-Philippe Véronneau

Louis-Philippe Véronneau

I'm going to FOSDEM 2019 (and to a DebConf video sprint)

The DebConf video team tries to organise one or two sprints a year outside of DebConf to fix problems and improve our recording and streaming setup. It helps us a lot, since hacking on camera-related stuff without actual cameras can sometimes be problematic.

This year, our first sprint will take place a week before FOSDEM 2019. If you have some spare time and are familiar with Ansible or simply want to learn more about our setup, feel free to drop by!

A clip from Final Space, s/adventures/beers/g

I'm of course planning to stay a few days after the sprint and go to FOSDEM. It will be my first time there and I can't wait to see how large it is! I'm also pretty psyched, as it'll be my first time in Bruxelles too. Let's just say I've heard good things about Bruxelles' craft beer scene and intend to dedicate a few afternoons/nights to documenting the largest number of brews I'll be able to stomach.

Here are a few places I really want to go to, but if you have advice, hit me up! I'm mainly interested in places where they brew their own beer.

  • Nanobrasserie de l'Ermitage1
  • Brussels Beer Project
  • Brasserie No Science
  • En Stoemelings
  • Beerstorming
  • Brasserie de la Senne
  • Brasserie Cantillon
  • Moeder Lambic
  • Délirium Café or Little Delirium Café

I might also have a go at Brew Dog, as I like their beers a lot and their bar in Bruxelles seems pretty nice.

Two cheers for bunches of Free Software and beers!

  1. L'Ermitage has been brewing very nice collabs with some of the finest craft breweries in Quebec (like Dunham!). A must!!! 

08 December, 2018 05:00AM by Louis-Philippe Véronneau

December 07, 2018

hackergotchi for Jonathan Dowland

Jonathan Dowland

I'm moving to the Red Hat OpenJDK team

I'm very excited to announce that I've moved roles within Red Hat: I am now part of the OpenJDK team!

I've been interested in the theory and practice behind compilers, programming language design and the interaction of the two for a long time¹. Before my undergrad I was fascinated by the work of Wouter van Oortmerssen, who built lots of weird and wonderful experimental languages and systems². During my undergrad, dissatisfied with the available choice of topics for my third year, I petitioned the Computer Science Department at Durham University to revive an older module "Programming Language Design & Compiling". I'm eternally grateful to Dr. Paul Callaghan for being prepared to teach it to us³.

I've spent my time within Red Hat so far in "Cloud Enablement". Our mission was to figure out and develop the tools, techniques and best practices for preparing containerized versions of the Middleware product portfolio to run on OpenShift, Red Hat's enterprise container management platform. The team was always meant to be temporary, the end game being the product teams themselves taking responsibility for building the OpenShift containers for their products, which is where we are today. And so, myself and the other team members are dismantling the temporary infrastructure and moving on to other roles.

Within Cloud Enablement, one of my responsibilities was the creation of the Java Applications for OpenShift container image, which is effectively OpenJDK and integration scripts for OpenShift. I am going to continue maintaining and developing this image (or images) within the OpenJDK team.

Longer term I'm looking forward to getting my teeth into some of the technical work within OpenJDK: such as the JVM, architecture ports, garbage collectors or the JIT or AOT compilers within the runtime.

Earlier this year, I put together a private "bucket list" of things I wanted to achieve in the near-ish future. I recently stumbled across it, having not thought about it for a while, and I was pleasantly surprised to see I'd put on "compilers/lang design" as something to revisit. With my move to OpenJDK I can now consider that ticked off.

  1. I'm straying into this area a little bit with my PhD work (graph rewriting, term rewriting, etc.)
  2. one of which, WadC, I took over maintenance of, ten years ago
  3. Paul has recently had some of his writing published in the book Functional Programming: A PragPub Anthology

07 December, 2018 11:23AM

hackergotchi for Olivier Berger

Olivier Berger

Demo of displaying labtainers labs in a Web browser through Guacamole

Here’s a first report on trying to add Guacamole to Labtainers in order to allow running Labtainers in a headless way, without an X display, in containers, and accessing the GUI in a Web browser, through the use of VNC and Guacamole.

We’ve recorded a demo of it:

Labtainer + Guacamole demo on Vimeo.

This aims at allowing use cases similar to the execution of labs on the cloud (“virtual labs”), and uses the same remote display connection mechanism (Guacamole) I’ve already blogged about in the case of CloVER.

To recap, we’ve been working together with Thomas Gonçalves to test the feasibility of running Labtainers in a “headless” way, in containers, accessing the display through a Web browser. This corresponds to what I’ve described in

Under the hood, it reuses the approach of using a “master” container running the core of labtainer scripts (see my previous post), which is highly insecure, but helps us test it.

The architecture we’ve tested is as follows :

  • the “labtainer.master” container runs the Python scripts, and starts a VNC server (TigerVNC) which in turn runs bits of an Xfce session. The labs will be started in the X DISPLAY of that session (displaying the gnome terminals and such).
  • the “guacamole.labtainer” container runs Guacamole (using Jetty for the internal Web server), which includes guacd, the proxy that sits between a Web canvas and the VNC server of the other container. Guacamole has been setup to use a minimal set of resources, like no DB, etc.

When you connect your browser to a URL on the guacamole.labtainer container’s IP, you display the labs \o/

The current use case would typically be running these and the labs’ containers on an IaaS VM (which acts as a sandbox for running docker containers in a secure manner), without a full X desktop on that VM, and accessing the VM’s guacamole via HTTPS from outside the Cloud.

Guess what: even though we had only tested that on Linux, it worked with almost no modification on Docker Desktop for Windows too.

You can find the Dockerfiles and scripts at : (and above), and the 2 container images have been put on the DockerHub by Thomas.

We’d welcome some feedback on whether you find this interesting.

07 December, 2018 08:08AM by Olivier Berger

December 06, 2018

hackergotchi for Markus Koschany

Markus Koschany

My Free Software Activities in November 2018

Welcome! Here is my monthly report that covers what I have been doing for Debian. If you’re interested in Java, Games and LTS topics, this might be interesting for you.

Debian Games

  • This month I packaged a new upstream Git snapshot of performous, a karaoke game, because this seemed to be the quickest route to fix a build failure and RC bug (#914061) with Debian’s latest Boost version. We had to overcome some portability issues later (#914667, #914688) and now the only blocker for a migration to testing is GCC-8 itself.
  • I uploaded a new revision of widelands to fix a FTBFS with ICU 63.1 (#913513). The patch was provided by László Böszörményi.
  • I updated the packaging of the following games without making bigger changes, just the normal „grooming“: box2d, brainparty, dangen, flatzebra, jester and etw.
  • The latest upstream release 7.1.3 of renpy, a framework for developing visual-novel type games, is available now.
  • Last but not least I backported teeworlds version 0.7.0, a fun action packed 2D shooter, and its special build system bam to Stretch because the current version 0.6.0 is unable to connect to 0.7.0 servers. Now players should be able to choose between their favorite Teeworld versions.

Debian Java


  • I sponsored another update of android-platform-system-core for Kai-Chung Yan. From now on that should be no longer necessary because he is a Debian Developer now. Congratulations!
  • I packaged a new upstream release of https-everywhere, a very useful Firefox/Chromium addon.

Debian LTS

This was my thirty-third month as a paid contributor and I have been paid to work 30 hours on Debian LTS, a project started by Raphaël Hertzog. In that time I did the following:

  • From 19.11.2018 until 25.11.2018  I was in charge of our LTS frontdesk. I investigated and triaged CVE in jasper, gnome-keyring, keepalived, otrs2, gnuplot, gnuplot5, ncurses, sysstat, php5, uw-imap, eclipse and apktool.
  • DLA-1568-1. Issued a security update for curl fixing 5 CVE.
  • DLA-1583-1. Issued a security update for jasper fixing 5 CVE.
  • DLA-1592-1. Issued a security update for otrs2 fixing 2 CVE.
  • DLA-1593-1. Issued a security update for phpbb3 fixing 1 CVE.
  • DLA-1598-1. Issued a security update for ghostscript fixing 4 CVE.
  • DLA-1600-1. Issued a security update for libarchive fixing 12 CVE.
  • DLA-1603-1. Issued a security update for suricata fixing 4 CVE.
  • I reviewed the openssl update which was later released as DLA 1586-1.
  • I also reviewed and sponsored squid3, icecast2 and keepalived for Abhijith PA.


Extended Long Term Support (ELTS) is a project led by Freexian to further extend the lifetime of Debian releases. It is not an official Debian project but all Debian users benefit from it without cost. The current ELTS release is Debian 7 „Wheezy“. This was my sixth month and I have been paid to work 15  hours on ELTS.

  • I was in charge of our ELTS frontdesk from 19.11.2018 until 25.11.2018 and I triaged CVE in git, sysstat, suricata, libarchive and jasper.
  • ELA-62-1.  Issued a security update for libarchive fixing 3 CVE.
  • ELA-64-1.  Issued a security update for suricata fixing 4 CVE.
  • ELA-65-1.  Issued a security update for jasper fixing 9 CVE.
  • Since upstream development of jasper has slowed down and many bugs remain without a response, I wrote the patches for CVE-2018-18873, CVE-2018-19539 and CVE-2018-19542 myself. I will look into the remaining issues in December.

Thanks for reading and see you next time.

06 December, 2018 11:21PM by Apo

December 05, 2018

Reproducible builds folks

Reproducible Builds: Weekly report #188

Here’s what happened in the Reproducible Builds effort between Sunday November 25 and Saturday December 1 2018:

Patches filed

Test framework development

There were a number of updates to our Jenkins-based testing framework that powers this week, including:

  • Chris Lamb prepared a merge request to generate and serve diffoscope JSON output in addition to the existing HTML and text formats (example output). This required Holger Levsen to increase the partition holding /var/lib/jenkins/userContent/reproducible from 255G to 400G. Thanks to Profitbricks for sponsoring this virtual hardware for more than 6 years now.

  • Holger Levsen and Jelle van der Waa started to integrate new Arch Linux build nodes.

  • In addition, Holger Levsen installed the needrestart package everywhere [], updated an interface to always use the short hostname [], explained what some nodes were doing [], and performed the usual node maintenance ([], [], [], etc.).

  • Jelle van der Waa also fixed a number of issues in the Arch Linux integration including showing the language in the first build [] and setting LANG/LC_ALL in the first build [].

This week’s edition was written by Bernhard M. Wiedemann, Chris Lamb, Holger Levsen, Jelle van der Waa & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

05 December, 2018 07:20AM

hackergotchi for Benjamin Mako Hill

Benjamin Mako Hill

Banana Peels

Photo comic of seeing a banana peel in the road while on a bike.

Although it’s been decades since I last played, I still get flashbacks to Super Mario Kart and pangs of irrational fear every time I see a banana peel in the road.

05 December, 2018 04:25AM by Benjamin Mako Hill

hackergotchi for Gunnar Wolf

Gunnar Wolf

New release of the Raspberry Pi 3 *unofficial Debian preview* image

Back in June, Michael Stapelberg asked for somebody interested in adopting the unofficial Debian image for the Raspberry Pi 3 family. It didn't take me long to raise my hand.
What did take me long is to actually do it. I have adopted the Raspberry3 image spec repository, with the recipes to build the image using Lars' great vmdb2, as well as the raspi3-firmware non-free Debian package.
After delaying this for too long, first in order to understand it better, and second because of the workload I had this last semester, I think we are ready to announce...

There is a new, updated preview image!

You can look at the instructions at the Debian Wiki page on RaspberryPi3. Or you can just jump to the downloads at my people.debian.org space: the xzipped image (388MB, unzips to 1.5GB, and resizes to the capacity of your boot SD at first boot), the verification sha256sum, and the PGP-signed verification sha256sum.
There are still many things that can be improved, for sure. The main issues for me are:

  • No wireless support. Due to a bug in Linux kernel 4.18, wlan0 support is broken. It is reported, and we expect it to be fixed in the next kernel upload.
  • Hardcoded root password. This will be tackled later on — part of the issue is that I cannot ensure how this computer will be booted. I have some ideas to tackle this, though...

Other than that, what we have is a very minimal Debian system, ready for installing software!
At some point in the future, I plan to add build profiles for some common configurations. But let’s go one step at a time.
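The downloads above come with sha256sum files for verification. If you prefer to check programmatically rather than with the sha256sum tool, a minimal sketch (the helper name is mine, not part of the image tooling):

```python
import hashlib

# Compute the SHA-256 of a downloaded image in chunks, so even a
# multi-gigabyte file doesn't need to fit in memory. Compare the result
# against the published sha256sum by hand.
def sha256_of(path, chunk_size=1 << 20):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

This matches what `sha256sum image.img.xz` prints on the command line.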

05 December, 2018 01:35AM by gwolf

December 04, 2018

hackergotchi for Jonathan Dowland

Jonathan Dowland

iPod refresh

Recently I filled up the storage in my iPod and so planned to upgrade it. This is a process I've been through several times in the past. My routine used to be to buy the largest capacity SD card that existed at the time (usually twice the capacity of the current one) and spend around £90. Luckily, SD capacity has been growing faster than my music collection. You can buy 400G SD cards today, but I only bought a 200G one, and I only spent around £38.

As I wrote last time, I don't use iTunes: I can move music on and off it from any computer, and I choose music to listen to using a simple file manager. One drawback of this approach is I tend to listen to the same artists over and over, and large swathes of my collection lie forgotten about. The impression I get is that music managers like iTunes have various schemes to help you keep in touch with the rest of your collection, via playlists: "recently added", "stuff you listened to this time last year", or whatever.

As a first step in this direction, I decided it would be useful to build up playlists of recently modified (or added) files. I thought it would be easiest to hook this into my backup solution. In case it's of interest to anyone else, I thought I'd share my solution. The scheme I describe there is used to run a shell script to perform the syncing, which now looks (mostly) like this:

date="$(/bin/date +%Y-%m-%d)"

# filter rsync output down to newly transferred audio files and append
# them to a dated playlist ($plsd is defined elsewhere in the script)
make_playlists() {
    grep -v deleting \
        | grep -v '/\._' \
        | grep -E '(m4a|mp3|ogg|wav|flac)$' \
        | tee -a "$plsd/$date.m3u8"
}

# set the attached blinkstick LED to a colour indicating "work in progress"
# systemd sets it to either red or green once the job is complete
blinkstick --index 1 --limit 10 --set-color 33c280

# sync changes from my iPod onto my NAS; feed the output of files changed
# into "make_playlists"
rsync -va --delete --no-owner --no-group --no-perms \
    --exclude=/.Spotlight-V100 --exclude=/.Trash-1000 \
    --exclude=/.Trashes --exclude=/lost+found /media/ipod/ /music/ \
    | make_playlists

# sync all generated playlists back onto the iPod
rsync -va --no-owner --no-group --no-perms \
    /home/jon/pls/ /media/ipod/playlists/
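A backup-independent way to build such "recently added" playlists would be to scan the music tree by modification time instead of parsing rsync output. A hedged sketch in Python, not part of my actual script (the function name and cutoff are my own choices):

```python
import os
import time

AUDIO_EXTS = (".m4a", ".mp3", ".ogg", ".wav", ".flac")

def recent_tracks(root, days=30):
    """Return audio files under root modified within the last `days` days."""
    cutoff = time.time() - days * 86400
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            # skip AppleDouble resource forks, as the grep above does
            if name.startswith("._"):
                continue
            if not name.lower().endswith(AUDIO_EXTS):
                continue
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) >= cutoff:
                hits.append(path)
    return sorted(hits)
```

The sorted list can then be written out line by line as an m3u8 playlist.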

Time will tell whether this will help.

04 December, 2018 07:46PM

hackergotchi for Daniel Lange

Daniel Lange

Google GMail continues to own the email market, Microsoft is catching up

Back in 2009 I wrote about Google's GMail emerging as the dominant platform for email. It had 46% of all accounts I sampled from American bloggers for the Ph.D. thesis of a friend. Blogging was big back then :-).

Now I wondered how things have changed over the last decade while I was working on another email related job. Having access to a list of 2.3 million email addresses from a rather similar (US-centric) demographic, let's do some math:

Google's GMail has 39% in that (much larger, but still non-scientific and skewed) sample. This is down from 46% in 2009. Microsoft, with its various email domains from Hotmail to has massively caught up from 10% to 35%. This is definitely also due to now focussing more on the strong Microsoft Office brands e.g. for Office 365 and Yahoo, the #2 player back in 2009, is at 18%, still up from the 12% back then.

So Google plus Microsoft command nearly ¾ of all email addresses in that US-centric sample. Adding Yahoo into the equation leaves the accounts covered at >92%. Wow.

Email has essentially centralized onto three infrastructure providers and with this the neutrality advantage of open standards will probably erode. Interoperability is something two or three players can make or break for 90% of the user base within a single meeting in Sunnyvale.

Google is already trying their luck with "confidential email" which carry expiry dates and revokable reading rights for the recipient. So ... not really email anymore. More like Snapchat. Microsoft has been famous for their winmail.dat attachments and other negligence of email best practices. Yahoo is probably busy trying to develop a sustainable business model and trying to find cash that Marissa didn't spend so hopefully less risk of trying out misguided "innovations" in the email space from them.

All other players are less than 1% of the email domains in the sample. AOL used to have 3.1% and now they are at 0.6%, which is in the same (tiny) ball park as the combined Apple offerings at 0.4%.

There is virtually no use of the new TLDs for (real, user)1 email. Just a few hundred .info and .name addresses. And very few that consider themselves .sexy or .guru and want to say so via their email TLD.

Domain owner   2009    2018
GMail          46.1%   38.6%
Yahoo          11.6%   18.3%
Microsoft       9.9%   35.4%
AOL             3.1%    0.6%
Apple           1.0%    0.4%
Comcast         2.3%    0.2%
SBCGlobal       0.9%    0.09%

  1. There is extensive use of cheap TLDs for "throw-away" spam operations
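The tallying behind numbers like these is straightforward; a hedged sketch (the provider map here is an illustrative subset, not the list used for the table above):

```python
from collections import Counter

# Map each address's domain to a provider and compute market shares.
# Domains not in the map are lumped into "other".
PROVIDERS = {
    "gmail.com": "GMail",
    "googlemail.com": "GMail",
    "hotmail.com": "Microsoft",
    "outlook.com": "Microsoft",
    "live.com": "Microsoft",
    "yahoo.com": "Yahoo",
    "aol.com": "AOL",
}

def provider_shares(addresses):
    counts = Counter(
        PROVIDERS.get(address.rsplit("@", 1)[-1].lower(), "other")
        for address in addresses
    )
    total = len(addresses)
    return {provider: count / total for provider, count in counts.items()}
```

Run over a couple of million addresses, fractions like the 39% / 35% / 18% split above fall straight out of the resulting dict.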

04 December, 2018 06:41PM by Daniel Lange

Russ Allbery

Review: The Winter Long

Review: The Winter Long, by Seanan McGuire

Series: October Daye #8
Publisher: DAW
Copyright: 2014
ISBN: 1-101-60175-2
Format: Kindle
Pages: 368

This is the eighth book in the October Daye series and leans heavily on the alliances, friendship, world-building, and series backstory. This is not the sort of series that can be meaningfully started in the middle. And, for the same reason, it's also rather hard to review without spoilers, although I'll give it a shot.

Toby has had reason to fear Simon Torquill for the entire series. Everything that's happened to her was set off by him turning her into a fish and destroying her life. She's already had to deal with his partner (in Late Eclipses), so it's not a total surprise that he would show up again. But Toby certainly didn't expect him to show up at her house, or to sound weirdly unlike an enemy, or to reference a geas and an employer. She had never understood his motives, but there may be more to them than simple evil.

I have essentially struck out trying to recommend this series to other people. I think everyone else who's started it has bounced off of it for various reasons: unimpressed by Toby's ability to figure things out, feeling the bits borrowed from the mystery genre are badly done, not liking Irish folklore transplanted to the San Francisco Bay Area, or just finding it too dark. I certainly can't argue with people's personal preferences, but I want to, since this remains my favorite urban fantasy series and I want to talk about it with more people. Thankfully, the friends who started reading it independent of my recommendation all love it too. (Perhaps I'm cursing it somehow?)

Regardless, this is more of exactly what I like about this series, which was never the private detective bits (that have now been discarded entirely) and was always the maneuverings and dominance games of faerie politics, the comfort and solid foundation of Toby's chosen family, Toby's full-throttle-forward approach to forcing her way through problems, and the lovely layered world-building. There is so much going on in McGuire's faerie realm, so many hidden secrets, old grudges, lost history, and complex family relationships. I can see some of the shape of problems that the series will eventually resolve, but I still have no guesses as to how McGuire will resolve them.

The Winter Long takes another deep look at some of Toby's oldest relationships, including revisiting some events from Rosemary and Rue (the first book of the series) in a new light. It also keeps, and further deepens, my favorite relationships in this series: Tybalt, Mags and the Library (introduced in the previous book), and of course the Luidaeg, who is my favorite character in the entire series and the one I root for the most.

I've been trying to pinpoint what I like so much about this series, particularly given the number of people who disagree, and I think it's that Toby gets along with, and respects, a wide variety of difficult people, and brings to every interaction a consistent set of internal ethics and priorities. McGuire sets this against a backdrop of court politics, ancient rivalries and agreements, and hidden races with contempt for humans; Toby's role in that world is to stubbornly do the right thing based mostly on gut feeling and personal loyalty. It's not particularly complex ethics; most of the challenges she faces are eventually resolved by finding the right person to kick (or, more frequently now, use her slowly-growing power against) and the right place to kick them.

That simplicity is what I like. This is my comfort reading. Toby looks at tricky court intrigues, bull-headedly does the right thing, and manages to make that work out, which for me (particularly in this political climate) is escapism in the best sense. She has generally good judgment in her friends, those friends stand by her, and the good guys win. Sometimes that's just what I want in a series, particularly when it comes with an impressive range of mythological creations, an interesting and slowly-developing power set, enjoyable character banter, and a ton of world-building mysteries that I want to know more about.

Long story short, this is more of Toby and friends in much the same vein as the last few books in the series. It adds new depth to some past events, moves Toby higher into the upper echelons of faerie politics, and contains many of my favorite characters. Oh, and, for once, Toby isn't sick or injured or drugged for most of the story, which I found a welcome relief.

If you've read this far into the series, I think you'll love it. I certainly did.

Followed by A Red-Rose Chain.

Rating: 8 out of 10

04 December, 2018 03:24AM

hackergotchi for Colin Watson

Colin Watson

Deploying Swift

Sometimes I want to deploy Swift, the OpenStack object storage system.

Well, no, that’s not true. I basically never actually want to deploy Swift as such. What I generally want to do is to debug some bit of production service deployment machinery that relies on Swift for getting build artifacts into the right place, or maybe the parts of the Launchpad librarian (our blob storage service) that use Swift. I could find an existing private or public cloud that offers the right API and test with that, but sometimes I need to test with particular versions, and in any case I have a terribly slow internet connection and shuffling large build artifacts back and forward over the relevant bit of wet string makes it painfully slow to test things.

For a while I’ve had an Ubuntu 12.04 VM lying around with an Icehouse-based Swift deployment that I put together by hand. It works, but I didn’t keep good notes and have no real idea how to reproduce it, not that I really want to keep limping along with manually-constructed VMs for this kind of thing anyway; and I don’t want to be dependent on obsolete releases forever. For the sorts of things I’m doing I need to make sure that authentication works broadly the same way as it does in a real production deployment, so I want to have Keystone too. At the same time, I definitely don’t want to do anything close to a full OpenStack deployment of my own: it’s much too big a sledgehammer for this particular nut, and I don’t really have the hardware for it.

Here’s my solution to this, which is compact enough that I can run it on my laptop, and while it isn’t completely automatic it’s close enough that I can spin it up for a test and discard it when I’m finished (so I haven’t worried very much about producing something that runs efficiently). It relies on Juju and LXD. I’ve only tested it on Ubuntu 18.04, using Queens; for anything else you’re on your own. In general, I probably can’t help you if you run into trouble with the directions here: this is provided “as is”, without warranty of any kind, and all that kind of thing.

First, install Juju and LXD if necessary, following the instructions provided by those projects, and also install the python-openstackclient package as you’ll need it later. You’ll want to set Juju up to use LXD, and you should probably make sure that the shells you’re working in don’t have http_proxy set as it’s quite likely to confuse things unless you’ve arranged for your proxy to be able to cope with your local LXD containers. Then add a model:

juju add-model swift

At this point there’s a bit of complexity that you normally don’t have to worry about with Juju. The swift-storage charm wants to mount something to use for storage, which with the LXD provider in practice ends up being some kind of loopback mount. Unfortunately, being able to perform loopback mounts exposes too much kernel attack surface, so LXD doesn’t allow unprivileged containers to do it. (Ideally the swift-storage charm would just let you use directory storage instead.) To make the containers we’re about to create privileged enough for this to work, run:

lxc profile set juju-swift security.privileged true
lxc profile device add juju-swift loop-control unix-char \
    major=10 minor=237 path=/dev/loop-control
for i in $(seq 0 255); do
    lxc profile device add juju-swift loop$i unix-block \
        major=7 minor=$i path=/dev/loop$i
done

Now we can start deploying things! Save this to a file, e.g. swift.bundle:

series: bionic
description: "Swift in a box"
applications:
  mysql:
    charm: "cs:mysql-62"
    channel: candidate
    num_units: 1
    options:
      dataset-size: 512M
  keystone:
    charm: "cs:keystone"
    num_units: 1
  swift-storage:
    charm: "cs:swift-storage"
    num_units: 1
    options:
      block-device: "/etc/swift/storage.img|5G"
  swift-proxy:
    charm: "cs:swift-proxy"
    num_units: 1
    options:
      zone-assignment: auto
      replicas: 1
relations:
  - ["keystone:shared-db", "mysql:shared-db"]
  - ["swift-proxy:swift-storage", "swift-storage:swift-storage"]
  - ["swift-proxy:identity-service", "keystone:identity-service"]

And run:

juju deploy swift.bundle

This will take a while. You can run juju status to see how it’s going in general terms, or juju debug-log for detailed logs from the individual containers as they’re putting themselves together. When it’s all done, it should look something like this:

Model  Controller  Cloud/Region     Version  SLA
swift  lxd         localhost        2.3.1    unsupported

App            Version  Status  Scale  Charm          Store       Rev  OS      Notes
keystone       13.0.1   active      1  keystone       jujucharms  290  ubuntu
mysql          5.7.24   active      1  mysql          jujucharms   62  ubuntu
swift-proxy    2.17.0   active      1  swift-proxy    jujucharms   75  ubuntu
swift-storage  2.17.0   active      1  swift-storage  jujucharms  250  ubuntu

Unit              Workload  Agent  Machine  Public address  Ports     Message
keystone/0*       active    idle   0    5000/tcp  Unit is ready
mysql/0*          active    idle   1     3306/tcp  Ready
swift-proxy/0*    active    idle   2     8080/tcp  Unit is ready
swift-storage/0*  active    idle   3              Unit is ready

Machine  State    DNS           Inst id        Series  AZ  Message
0        started  juju-d3e703-0  bionic      Running
1        started   juju-d3e703-1  bionic      Running
2        started   juju-d3e703-2  bionic      Running
3        started  juju-d3e703-3  bionic      Running

At this point you have what should be a working installation, but with only administrative privileges set up. Normally you want to create at least one normal user. To do this, start by creating a configuration file granting administrator privileges (this one comes verbatim from the openstack-base bundle):

_OS_PARAMS=$(env | awk 'BEGIN {FS="="} /^OS_/ {print $1;}' | paste -sd ' ')
for param in $_OS_PARAMS; do
    if [ "$param" = "OS_AUTH_PROTOCOL" ]; then continue; fi
    if [ "$param" = "OS_CACERT" ]; then continue; fi
    unset $param
done
unset _OS_PARAMS

_keystone_unit=$(juju status keystone --format yaml | \
    awk '/units:$/ {getline; gsub(/:$/, ""); print $1}')
_keystone_ip=$(juju run --unit ${_keystone_unit} 'unit-get private-address')
_password=$(juju run --unit ${_keystone_unit} 'leader-get admin_passwd')

export OS_AUTH_URL=${OS_AUTH_PROTOCOL:-http}://${_keystone_ip}:5000/v3
export OS_USERNAME=admin
export OS_PASSWORD=${_password}
export OS_USER_DOMAIN_NAME=admin_domain
export OS_PROJECT_DOMAIN_NAME=admin_domain
export OS_PROJECT_NAME=admin
export OS_REGION_NAME=RegionOne
# Swift needs this:
# Gnocchi needs this
export OS_AUTH_TYPE=password

Source this into a shell: for instance, if you saved this to ~/.swiftrc.juju-admin, then run:

. ~/.swiftrc.juju-admin

You should now be able to run openstack endpoint list and see a table for the various services exposed by your deployment. Then you can create a dummy project and a user with enough privileges to use Swift:

openstack domain create SwiftDomain
openstack project create --domain SwiftDomain --description Swift \
    SwiftProject
openstack user create --domain SwiftDomain --project-domain SwiftDomain \
    --project SwiftProject --password "$PASSWORD" "$USERNAME"
openstack role add --project SwiftProject --user-domain SwiftDomain \
    --user "$USERNAME" Member

(This is intended for testing rather than for doing anything particularly sensitive. If you cared about keeping the password secret then you’d use the --password-prompt option to openstack user create instead of supplying the password on the command line.)

Now create a configuration file granting privileges for the user you just created. I felt like automating this to at least some degree:

touch ~/.swiftrc.juju
chmod 600 ~/.swiftrc.juju
sed '/^_password=/d;
     s/\( OS_PROJECT_DOMAIN_NAME=\).*/\1SwiftDomain/;
     s/\( OS_PROJECT_NAME=\).*/\1SwiftProject/;
     s/\( OS_USER_DOMAIN_NAME=\).*/\1SwiftDomain/;
     s/\( OS_USERNAME=\).*/\1'"$USERNAME"'/;
     s/\( OS_PASSWORD=\).*/\1'"$PASSWORD"'/' \
     <~/.swiftrc.juju-admin >~/.swiftrc.juju

Source this into a shell. For example:

. ~/.swiftrc.juju

You should now find that swift list works. Success! Now you can swift upload files, or just start testing whatever it was that you were actually trying to test in the first place.

This is not a setup I expect to leave running for a long time, so to tear it down again:

juju destroy-model swift

This will probably get stuck trying to remove the swift-storage unit, since nothing deals with detaching the loop device. If that happens, find the relevant device in losetup -a from another window and use losetup -d to detach it; juju destroy-model should then be able to proceed.

Credit to the Juju and LXD teams and to the maintainers of the various charms used here, as well as of course to the OpenStack folks: their work made it very much easier to put this together.

04 December, 2018 01:37AM by Colin Watson

December 03, 2018

hackergotchi for Sean Whitton

Sean Whitton

Debian Policy call for participation -- December 2018

Here are some of the bugs against the Debian Policy Manual. Please consider getting involved.

Consensus has been reached and help is needed to write a patch

#853779 Clarify requirements about update-rc.d and invoke-rc.d usage in mai…

#874019 Note that the ’-e’ argument to x-terminal-emulator works like ’--’

#874206 allow a trailing comma in package relationship fields

#902612 Packages should not touch users’ home directories

#905453 Policy does not include a section on NEWS.Debian files

#906286 repository-format sub-policy

#907051 Say much more about vendoring of libraries

Wording proposed, awaiting review from anyone and/or seconds by DDs

#786470 [copyright-format] Add an optional “License-Grant” field

#845255 Include best practices for packaging database applications

#850156 Please firmly deprecate vendor-specific series files

#897217 Vcs-Hg should support -b too

Merged for the next release (no action needed)

#188731 Also strip .comment and .note sections

#845715 Please document that packages are not allowed to write outside thei…

#912581 Slightly relax the requirement to include verbatim copyright inform…

03 December, 2018 07:20PM

hackergotchi for Gunnar Wolf

Gunnar Wolf

Chairing «Topics on Internet Censorship and Surveillance»

I have been honored to be invited as a co-chair (together with Vasilis Ververis and Mario Isaakidis) for a Special Track called «Topics on Internet Censorship and Surveillance» (TICS) at The Eighteenth International Conference on Networks, which will be held in Valencia, Spain, 2019.03.24–2019.03.28, and organized under IARIA's name and umbrella.

I am reproducing here the Call for Papers. Please do note that if you are interested in participating, the relevant dates are those publicized for the Special Track (submission by 2019.01.29; notification by 2019.02.18; registration and camera-ready by 2019.02.27), not those on ICN's site.

Over the past years there has been a greater demand for online censorship and surveillance, as an understandable reaction against hate speech, copyright violations, and other cases related to citizen compliance with civil laws and regulations by national authorities. Unfortunately, this is often accompanied by a tendency of extensively censoring online content and massively spying on citizens actions. Numerous whistleblower revelations, leaks from classified documents, and a vast amount of information released by activists, researchers and journalists, reveal evidence of government-sponsored infrastructure that either goes beyond the requirements and scope of the law, or operates without any effective regulations in place. In addition, this infrastructure often supports the interests of big private corporations, such as the companies that enforce online copyright control.

TICS is a special track on the area of Internet censorship, surveillance and other adversarial burdens to technology that endanger the safety (physical security and privacy) of its users.

Proposals for TICS 2019 should be situated within the field of Internet censorship, network measurements, information controls, surveillance and content moderation. Ideally topics should connect to the following, but are not limited to:

  • Technical, social, political, and economical implications of Internet censorship and surveillance
  • Detection and analysis of network blocking and surveillance infrastructure (hardware or software)
  • Research on legal frameworks, regulations and policies that imply blocking or limitation of the availability of network services and online content
  • Online censorship circumvention and anti-surveillance practices
  • Network measurements methodologies to detect and categorize network interference
  • Research on the implications of automated or centralized user content regulation (such as for hate speech, copyright, or disinformation)

Please help me share this invitation with possible interested people!
Oh — And to make this more interesting and enticing for you, ICN will take place at the same city and just one week before the Internet Freedom Festival, the Global Unconference of the Internet Freedom Communities ☺

03 December, 2018 06:07PM by gwolf

hackergotchi for Julien Danjou

Julien Danjou

A multi-value syntax tree filtering in Python

A multi-value syntax tree filtering in Python

A while ago, we've seen how to write a simple filtering syntax tree with Python. The idea was to provide a small abstract syntax tree with an easy to write data structure that would be able to filter a value. Filtering meaning that once evaluated, our AST would return either True or False based on the passed value.

With that, we were able to write small rules like Filter({"eq": 3})(4) that would return False since, well, 4 is not equal to 3.

In this new post, I propose we enhance our filtering ability to support multiple values. The idea is to be able to write something like this:

>>> f = Filter(
...     {"and": [
...         {"eq": ("foo", 3)},
...         {"gt": ("bar", 4)},
...     ]}
... )
>>> f(foo=3, bar=5)
True
>>> f(foo=4, bar=5)
False

The biggest change here is that the binary operators (eq, gt, le, etc.) now support getting two values, and not only one, and that we can pass multiple values to our filter by using keyword arguments.

How should we implement that? Well, we can keep the same data structure we built previously. However, this time we're gonna do the following change:

  • The left value of the binary operator will be a string used as the key to access the keyword arguments passed to Filter.__call__.
  • The right value of the binary operator will be kept as it is (like before).

We therefore need to change our Filter.build_evaluator to accommodate this as follows:

def build_evaluator(self, tree):
    try:
        operator, nodes = list(tree.items())[0]
    except Exception:
        raise InvalidQuery("Unable to parse tree %s" % tree)
    try:
        op = self.multiple_operators[operator]
    except KeyError:
        try:
            op = self.binary_operators[operator]
        except KeyError:
            raise InvalidQuery("Unknown operator %s" % operator)
        assert len(nodes) == 2  # binary operators take 2 values
        def _op(values):
            return op(values[nodes[0]], nodes[1])
        return _op
    # Iterate over every item in the list of the value linked
    # to the logical operator, and compile it down to its own
    # evaluator.
    elements = [self.build_evaluator(node) for node in nodes]
    return lambda values: op((e(values) for e in elements))
    # Iterate over every item in the list of the value linked
    # to the logical operator, and compile it down to its own
    # evaluator.
    elements = [self.build_evaluator(node) for node in nodes]
    return lambda values: op((e(values) for e in elements))

The algorithm is pretty much the same, the tree being browsed recursively.

First, the operator and its arguments (nodes) are extracted.

Then, if the operator takes multiple arguments (such as and and or operators), each node is recursively evaluated and a function is returned evaluating those nodes.
If the operator is a binary operator (such as eq, lt, etc.), it checks that the passed argument list length is 2. Then, it returns a function that will apply the operator (e.g., operator.eq) to values[nodes[0]] and nodes[1]: the former accesses the arguments (values) passed to the filter's __call__ function while the latter is directly the passed argument.

The full class looks like this:

import operator

class InvalidQuery(Exception):
    pass

class Filter(object):
    binary_operators = {
        u"=": operator.eq,
        u"==": operator.eq,
        u"eq": operator.eq,

        u"<": operator.lt,
        u"lt": operator.lt,

        u">": operator.gt,
        u"gt": operator.gt,

        u"<=": operator.le,
        u"≤": operator.le,
        u"le": operator.le,

        u">=": operator.ge,
        u"≥": operator.ge,
        u"ge": operator.ge,
    }

    multiple_operators = {
        u"or": any,
        u"∨": any,
        u"and": all,
        u"∧": all,
    }

    def __init__(self, tree):
        self._eval = self.build_evaluator(tree)

    def __call__(self, **kwargs):
        return self._eval(kwargs)

    def build_evaluator(self, tree):
        try:
            operator, nodes = list(tree.items())[0]
        except Exception:
            raise InvalidQuery("Unable to parse tree %s" % tree)
        try:
            op = self.multiple_operators[operator]
        except KeyError:
            try:
                op = self.binary_operators[operator]
            except KeyError:
                raise InvalidQuery("Unknown operator %s" % operator)
            assert len(nodes) == 2  # binary operators take 2 values
            def _op(values):
                return op(values[nodes[0]], nodes[1])
            return _op
        # Iterate over every item in the list of the value linked
        # to the logical operator, and compile it down to its own
        # evaluator.
        elements = [self.build_evaluator(node) for node in nodes]
        return lambda values: op((e(values) for e in elements))

We can check that it works by building some filters:

x = Filter({"eq": ("foo", 1)})
assert not x(foo=1, bar=1)

x = Filter({"eq": ("foo", "bar")})
assert not x(foo=1, bar=1)

x = Filter({"or": (
    {"eq": ("foo", "bar")},
    {"eq": ("bar", 1)},
)})
assert x(foo=1, bar=1)

Supporting multiple values is handy as it allows passing complete dictionaries to the filter, rather than just one value. That enables users to filter more complex objects.

Sub-dictionary support

It's also possible to support deeper data structures, like a dictionary of dictionaries. By replacing values[nodes[0]] by self._resolve_name(values, nodes[0]) with a _resolve_name method like this one, the filter is able to traverse dictionaries:


ATTR_SEPARATOR = "."

def _resolve_name(self, values, name):
    try:
        for subname in name.split(self.ATTR_SEPARATOR):
            values = values[subname]
        return values
    except KeyError:
        raise InvalidQuery("Unknown attribute %s" % name)

It then works like that:

x = Filter({"eq": ("baz.sub", 23)})
assert x(foo=1, bar=1, baz={"sub": 23})

x = Filter({"eq": ("baz.sub", 23)})
assert not x(foo=1, bar=1, baz={"sub": 3})

By using the syntax key.subkey.subsubkey the filter is able to access items inside dictionaries in more complex data structures.

That basic filter engine can evolve quite easily into something powerful, as you can add new operators or new ways to access and manipulate the passed data structure.
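To illustrate what adding a new operator looks like, here is a trimmed-down, self-contained sketch of the Filter class with a hypothetical contains operator wired in (it maps to operator.contains from the standard library, which tests `b in a`; the class body is abridged from the full one above):

```python
import operator

class InvalidQuery(Exception):
    pass

class Filter(object):
    # Abridged operator tables; "contains" is the new addition.
    binary_operators = {
        "eq": operator.eq,
        "gt": operator.gt,
        "contains": operator.contains,
    }
    multiple_operators = {"or": any, "and": all}

    def __init__(self, tree):
        self._eval = self.build_evaluator(tree)

    def __call__(self, **kwargs):
        return self._eval(kwargs)

    def build_evaluator(self, tree):
        try:
            op_name, nodes = list(tree.items())[0]
        except Exception:
            raise InvalidQuery("Unable to parse tree %s" % tree)
        try:
            op = self.multiple_operators[op_name]
        except KeyError:
            try:
                op = self.binary_operators[op_name]
            except KeyError:
                raise InvalidQuery("Unknown operator %s" % op_name)
            assert len(nodes) == 2  # binary operators take 2 values
            def _op(values):
                return op(values[nodes[0]], nodes[1])
            return _op
        elements = [self.build_evaluator(node) for node in nodes]
        return lambda values: op(e(values) for e in elements)

f = Filter({"contains": ("tags", "admin")})
print(f(tags=["admin", "staff"]))  # -> True
print(f(tags=["guest"]))           # -> False
```

Nothing else has to change: the dispatch in build_evaluator already handles any binary operator registered in the table.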

If you have other ideas on nifty features that could be added, feel free to add a comment below!

03 December, 2018 01:29PM by Julien Danjou

hackergotchi for Joachim Breitner

Joachim Breitner

Sliding Right into Information Theory

It's hardly news any more, but it seems I have not blogged about my involvement last year with an interesting cryptanalysis project, which resulted in the publication Sliding right into disaster: Left-to-right sliding windows leak by Daniel J. Bernstein, me, Daniel Genkin, Leon Groot Bruinderink, Nadia Heninger, Tanja Lange, Christine van Vredendaal and Yuval Yarom, which was published at CHES 2017 and on ePrint (ePrint is the cryptographer’s version of arXiv).

This project nicely touched upon many fields of computer science: First we need systems expertise to mount a side-channel attack that uses cache timing differences to observe which line of a square-and-multiply algorithm the target process is executing. Then we need algorithm analysis to learn from these observations partial information about the bits of the private key. This part includes nice PL-y concepts like rewrite rules (see Section 3.2). Once we know enough about the secret keys, we can use fancy cryptography to recover the whole secret key (Section 3.4). And finally, some theoretical questions arise, such as: “How much information do we need for the attack to succeed?” and “Do we obtain this much information?”, and we need some nice math and information theory to answer these.

Initially, I focused on the PL-related concepts. We programming language people are yak-shavers, and in particular “rewrite rules” just demands the creation of a DSL to express them, and an interpreter to execute them, doesn’t it? But it turned out that these rules are actually not necessary, as the key recovery can use the side-channel observation directly, as we found out later (see Section 4 of the paper). But now I was already hooked, and turned towards the theoretical questions mentioned above.

Shannon vs. Rényi

It felt good to shake the dust of some of the probability theory that I learned for my maths degree, and I also learned some new stuff. For example, it was intuitively clear that whether the attack succeeds depends on the amount of information obtained by the side channel attack, and based on prior work, the expectation was that if we know more than half the bits, then the attack would succeed. Note that for this purpose, two known “half bits” are as good as knowing one full bit; for example knowing that the secret key is either 01 or 11 (one bit known for sure) is just as good as knowing that the key is either 00 or 11.

Clearly, this is related to entropy somehow -- but how? Trying to prove that the attack works if the entropy rate of the leak is >0.5 just did not work, against all intuition. But when we started with a formula that describes when the attack succeeds, and then simplified it, we found a condition that looked suspiciously like what we wanted, namely H > 0.5, only that H was not the conventional entropy (also known as the Shannon entropy, H = −∑p ⋅ log p), but rather something else: H = −log ∑p², which turned out to be called the collision entropy or Rényi entropy of order 2.
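To get a numeric feel for the difference between the two notions, here is a small illustrative sketch (the function names are mine, not from the paper). For a uniform distribution the two agree; for a skewed one, the collision entropy is strictly smaller than the Shannon entropy:

```python
import math

def shannon_entropy(p):
    # H = -sum(p_i * log2(p_i))
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def collision_entropy(p):
    # H2 (Renyi entropy of order 2) = -log2(sum(p_i^2))
    return -math.log2(sum(pi * pi for pi in p))

uniform = [0.25] * 4
skewed = [0.7, 0.1, 0.1, 0.1]

print(shannon_entropy(uniform), collision_entropy(uniform))  # both exactly 2.0 bits
print(shannon_entropy(skewed), collision_entropy(skewed))    # H2 < H
```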

This resulted in Theorem 3 in the paper, and neatly answers the question of when the Heninger and Shacham key recovery algorithm, extended to partial information, can be expected to succeed in a much more general setting than just this particular side-channel attack.

Markov chains and an information theoretical spin-off

The other theoretical question is now: why does this particular side-channel attack succeed, i.e. why is the entropy rate H > 0.5? As so often, Markov chains are an immensely powerful tool to answer that question. After some transformations, I managed to model the state of the square-and-multiply algorithm, together with the side-channel leak, as a Markov chain with a hidden state. Now I just had to calculate its Rényi entropy rate, right? I wrote some Haskell code to do this transformation, and also came up with an ad-hoc, intuitive way of calculating the rate. So when it was time to write up the paper, I was searching for a reference that describes the algorithm that I was using…

Only I could find none! I contacted researchers who have published on Markov chains and entropies, but they just referred me in circles, until one of them, Maciej Skórski responded. Our conversation, highly condensed, went like this: “Nice idea, but it can’t be right, it would solve problem X” – “Hmm, but it feels so right. Here is a proof sketch.” – “Oh, indeed, cool. I can even generalize this! Let’s write a paper”. Which we did! Analytic Formulas for Renyi Entropy of Hidden Markov Models (preprint only, it is still under submission).

More details

Because I joined the sliding-right project late, not all my contributions made it into the actual paper, and therefore I published an “inofficial appendix” separately on ePrint. It contains

  1. an alternative way to find the definitively knowable bits of the secret exponent, which is complete and can (in rare corner cases) find more bits than the rewrite rules in Section 3.1
  2. an algorithm to calculate the collision entropy H, including how to model a side-channel attack like this one as a Markov chain, and how to calculate the entropy of such a Markov chain, and
  3. the proof of Theorem 3.

I also published the Haskell code that I wrote for this project, including the Markov chain collision entropy stuff. It is not written with public consumption in mind, but feel free to ask if you have questions about this.

Note that all errors, typos and irrelevancies in that document and the code are purely mine and not of any of the other authors of the sliding-right paper. I’d like to thank my coauthors for the opportunity to join this project.

03 December, 2018 09:56AM by Joachim Breitner

hackergotchi for Daniel Pocock

Daniel Pocock

Smart home: where to start?

My home automation plans have been progressing and I'd like to share some observations I've made about planning a project like this, especially for those with larger houses.

With so many products and technologies, it can be hard to know where to start. Some things have become straightforward, for example, Domoticz can soon be installed from a package on some distributions. Yet this simply leaves people contemplating what to do next.

The quickstart

For a small home, like an apartment, you can simply buy something like the Zigate, a single motion and temperature sensor, a couple of smart bulbs and expand from there.

For a large home, you can also get your feet wet with exactly the same approach in a single room. Once you are familiar with the products, use a more structured approach to plan a complete solution for every other space.

The Debian wiki has started gathering some notes on things that work easily on GNU/Linux systems like Debian as well as Fedora and others.


What is your first goal? For example, are you excited about having smart lights or are you more concerned with improving your heating system efficiency with zoned logic?

Trying to do everything at once may be overwhelming. Make each of these things into a separate sub-project or milestone.

Technology choices

There are many technology choices:

  • Zigbee, Z-Wave or another protocol? I'm starting out with a preference for Zigbee but may try some Z-Wave devices along the way.
  • E27 or B22 (Bayonet) light bulbs? People in the UK and former colonies may have B22 light sockets and lamps. For new deployments, you may want to standardize on E27. Amongst other things, E27 is used by all the Ikea lamp stands and if you want to be able to move your expensive new smart bulbs between different holders in your house at will, you may want to standardize on E27 for all of them and avoid buying any Bayonet / B22 products in future.
  • Wired or wireless? Whenever you take up floorboards, it is a good idea to add some new wiring. For example, CAT6 can carry both power and data for a diverse range of devices.
  • Battery or mains power? In an apartment with two rooms and less than five devices, batteries may be fine but in a house, you may end up with more than a hundred sensors, radiator valves, buttons, and switches and you may find yourself changing a battery in one of them every week. If you have lodgers or tenants and you are not there to change the batteries then this may cause further complications. Some of the sensors have a socket for an optional power supply, battery eliminators may also be an option.

Making an inventory

Creating a spreadsheet table is extremely useful.

This helps estimate the correct quantity of sensors, bulbs, radiator valves and switches and it also helps to budget. Simply print it out, leave it under the Christmas tree and hope Santa will do the rest for you.

Looking at my own house, these are the things I counted in a first pass:

Don't forget to include all those unusual spaces like walk-in pantries, a large cupboard under the stairs, cellar, en-suite or enclosed porch. Each deserves a row in the table.

Sensors help make good decisions

Whatever the aim of the project, sensors are likely to help obtain useful data about the space and this can help to choose and use other products more effectively.

Therefore, it is often a good idea to choose and deploy sensors through the home before choosing other products like radiator valves and smart bulbs.

The smartest place to put those smart sensors

When placing motion sensors, it is important to avoid putting them too close to doorways where they might detect motion in adjacent rooms or hallways. It is also a good idea to avoid putting the sensor too close to any light bulb: if the bulb attracts an insect, it will trigger the motion sensor repeatedly. Temperature sensors shouldn't be too close to heaters or potential draughts around doorways and windows.

There are a range of all-in-one sensors available, some have up to six features in one device smaller than an apple. In some rooms this is a convenient solution but in other rooms, it may be desirable to have separate motion and temperature sensors in different locations.

Consider the dining and sitting rooms in my own house, illustrated in the floorplan below. The sitting room is also a potential 6th bedroom or guest room with sofa bed, the downstairs shower room conveniently located across the hall. The dining room is joined to the sitting room by a sliding double door. When the sliding door is open, a 360 degree motion sensor in the ceiling of the sitting room may detect motion in the dining room and vice-versa. It appears that 180 degree motion sensors located at the points "1" and "2" in the floorplan may be a better solution.

These rooms have wall mounted radiators and fireplaces. To avoid any of these potential heat sources the temperature sensors should probably be in the middle of the room.

This photo shows the proposed location for the 180 degree motion sensor "2" on the wall above the double door:


To summarize, buy a Zigate and a small number of products to start experimenting with. Make an inventory of all the products potentially needed for your home. Try to mark sensor locations on a floorplan, thinking about the type of sensor (or multiple sensors) you need for each space.

03 December, 2018 08:44AM by Daniel.Pocock

Russ Allbery

Review: Linked

Review: Linked, by Albert-László Barabási

Publisher: Plume
Copyright: 2002, 2003
Printing: May 2003
ISBN: 0-452-28439-2
Format: Trade paperback
Pages: 241

Barabási at the time of this writing was a professor of physics at Notre Dame University (he's now the director of Northeastern University's Center of Complex Networks). Linked is a popularization of his research into scale-free networks, their relationship to power-law distributions (such as the distribution of wealth), and a proposed model explaining why so many interconnected systems in nature and human society appear to form scale-free networks. Based on some quick Wikipedia research, it's worth mentioning that the ubiquity of scale-free networks has been questioned and may not be as strong as Barabási claims here, not that you would know about that controversy from this book.

I've had this book sitting in my to-read pile for (checks records) ten years, so I only vaguely remember why I bought it originally, but I think it was recommended as a more scientific look at the phenomena popularized by Malcolm Gladwell in The Tipping Point. It isn't that, exactly; Barabási is much less interested in how ideas spread than he is in network structure and its implications for robustness and propagation through the network. (Contagion, as in virus outbreaks, is the obvious example of the latter.)

There are basically two parts to this book: a history of Barabási's research into scale-free networks and the development of the Barabási-Albert model for scale-free network generation, and then Barabási's attempt to find scale-free networks in everything under the sun and make grandiose claims about the implications of that structure for human understanding. One of these parts is better than the other.

The basic definition of a scale-free network is a network where the degree of the nodes (the number of edges coming into or out of the node) follows a power-law distribution. It's a bit hard to describe a power-law distribution without the math, but the intuitive idea is that the distribution will contain a few "winners" who will have orders of magnitude more connections than the average node, to the point that their connections may dominate the graph. This is very unlike a normal distribution (the familiar bell-shaped curve), where most nodes will cluster around a typical number of connections and the number of nodes with a given count of connections will drop off rapidly in either direction from that peak. A typical example of a power-law distribution outside of networks is personal wealth: rather than clustering around some typical values the way natural measurements like physical height do, a few people (Bill Gates, Warren Buffett) have orders of magnitude more wealth than the average person and a noticeable fraction of all wealth in society.

I am moderately dubious of Barabási's assertion here that most prior analysis of networks before his scale-free work focused on random networks (ones where new nodes are connected at an existing node chosen at random), since this is manifestly not the case in computer science (my personal field). However, scale-free networks are a real phenomenon that have some very interesting properties, and Barabási and Albert's proposal of how they might form (add nodes one at a time, and prefer to attach a new node to the existing node with the most connections) is a simple and compelling model of how they can form. Barabási also discusses a later variation, which Wikipedia names the Bianconi-Barabási model, which adds a fitness function for more complex preferential attachment.
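The preferential-attachment process described here is easy to sketch in code. The following is an illustrative toy (my own simplification with one edge per new node, not Barabási and Albert's published implementation); attachment probability is proportional to a node's current degree:

```python
import random
from collections import Counter

def barabasi_albert(n, seed=42):
    # Grow a graph one node at a time; each new node attaches to one
    # existing node chosen with probability proportional to its degree.
    # 'targets' holds every node once per incident edge, so a uniform
    # pick from it is exactly a degree-proportional pick.
    rng = random.Random(seed)
    edges = [(0, 1)]
    targets = [0, 1]
    for new in range(2, n):
        t = rng.choice(targets)
        edges.append((new, t))
        targets += [new, t]
    return edges

edges = barabasi_albert(1000)
deg = Counter()
for a, b in edges:
    deg[a] += 1
    deg[b] += 1
# The degree distribution is heavily skewed: a few hubs end up with
# far more connections than the average of roughly 2.
print(max(deg.values()), sum(deg.values()) / len(deg))
```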

Linked covers the history of the idea from Barabási's perspective, as well as a few of its fascinating properties. One is that scale-free networks may not have a tipping point in the Gladwell sense. Depending on the details, there may not be a lower limit of nodes that have to adopt some new property for it to spread through the network. Another is robustness: scale-free networks are startlingly robust against removal of random nodes from the network, requiring removal of large percentages of the nodes before the network fragments, but are quite vulnerable to a more targeted attack that focuses on removing the hubs (the nodes with substantially more connections than average). Scale-free networks also naturally give rise to "six degrees of separation" effects between any two nodes, since the concentration of connections at hubs leads to short paths.

These parts of Linked were fairly interesting, if sometimes clunky. Unfortunately, Barabási doesn't have enough material to talk about mathematical properties and concrete implications at book length, and instead wanders off into an exercise in finding scale-free networks everywhere (cell metabolism, social networks, epidemics, terrorism), and leaping from that assertion (which Wikipedia, at least, labels as not necessarily backed up by later analysis) to some rather overblown claims. I think my favorite was the confident assertion that by 2020 we will be receiving custom-tailored medicine designed specifically for the biological networks of our unique cells, which, one, clearly isn't going to happen, and two, has a strained and dubious connection to scale-free network theory to say the least. There's more in that vein. (That said, the unexpected mathematical connection between the state transition of a Bose-Einstein condensate and scale-free network collapse given sufficiently strong attachment preference and permission to move connections was at least entertaining.)

The general introduction to scale-free networks was interesting and worth reading, but I think the core ideas of this book could have been compressed into a more concise article (and probably have, somewhere on the Internet). The rest of it was mostly boring, punctuated by the occasional eye-roll. I appreciate Barabási's enthusiasm for his topic — it reminds me of professors I worked with at Stanford and their enthusiasm for their pet theoretical concept — but this may be one reason to have the popularization written by someone else. Not really recommended as a book, but if you really want a (somewhat dated) introduction to scale-free networks, you could do worse.

Rating: 6 out of 10

03 December, 2018 04:22AM

December 02, 2018

Mike Gabriel

My Work on Debian LTS/ELTS (November 2018)

In November 2018, I have worked on the Debian LTS project for nine hours as a paid contributor. Of the originally planned twelve hours (four of them carried over from October) I gave two hours back to the pool of available work hours and carry one hour over to December.

For November, I also signed up for four hours of ELTS work, but had to realize that at the end of the month, I hadn't even set up a test environment for Debian wheezy ELTS, so I gave these four hours back to the "pool". I have started getting an overview of the ELTS workflow now and will start fixing packages in December.

So, here is my list of work accomplished for Debian LTS in November 2018:

  • Regression upload of poppler (DLA 1562-2 [1]), updating the fix for CVE-2018-16646
  • Research on Saltstack salt regarding CVE-2018-15750 and CVE-2018-15751. Unfortunately, there was no reference in the upstream Git repository to the commit(s) that actually fixed those issues. Finally, it turned out that the REST netapi code that is affected by the named CVEs was added between upstream release 2014.1.13 and 2014.7(.0). As Debian jessie ships salt's upstream release 2014.1.13, I concluded that salt in jessie is not affected by the named CVEs.
  • Last week I joined Markus Koschany in triaging a plenitude of libav issues that have/had status "undetermined" for Debian jessie. I was able to triage 21 issues, of which 15 have applicable patches. Three issues have patches that don't apply cleanly and need manual work. One issue only applies to ffmpeg, not to libav. For another issue, there seems to be no patch available (yet). And yet another issue seemed already somehow fixed in libav (although with error code AVERROR_PATCHWELCOME).

Thanks to all LTS/ELTS sponsors for making these projects possible.



02 December, 2018 09:59PM by sunweaver

Thorsten Alteholz

My Debian Activities in November 2018

FTP master

This month I accepted 486 packages, which is twice as many as last month. On the other hand I was a bit reluctant and rejected only 38 uploads. The overall number of packages that got accepted this month was 556.

Debian LTS

This was my fifty-third month that I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian.

This month my all in all workload has been 30h. During that time I did LTS uploads or prepared security uploads of:

  • [DLA 1574-1] imagemagick security update for one CVE
  • [DLA 1586-1] openssl security update for two CVEs
  • [DLA 1587-1] pixman security update for one CVE
  • [DLA 1594-1] xml-security-c security update for one (temporary) CVE
  • [DLA 1595-1] gnuplot5 security update for three CVEs
  • [DLA 1597-1] gnuplot security update for three CVEs
  • [DLA 1602-1] nsis security update for two CVEs

Thanks to Markus Koschany for testing my openssl package. It is really having a calming effect when a different pair of eyes has a quick look and does not start to scream.

I also started to work on the new CVEs of wireshark.

My debdiff of tiff was used by Moritz to double-check his and Lazlo's work, and finally resulted in DSA 4349-1. Though not every debdiff will result in its own DSA, they are still useful for the security team. So always think of Stretch when you do a DLA.

Last but not least I did some days of frontdesk duties.

Debian ELTS

This month was the sixth ELTS month.

During my allocated time I uploaded:

  • ELA-58-1 for tiff3
  • ELA-59-1 for openssl
  • ELA-60-1 for pixman

I also started to work on the new CVEs of wireshark.

As like in LTS, I also did some days of frontdesk duties.

Other stuff

I improved packaging of …

  • libctl by finally moving to guile-2.2. Though guile-2.0 might not disappear completely in Buster, this is my first step to make it happen
  • mdns-scan
  • libjwt

I uploaded new upstream versions of …

Again I sponsored some packages for Nicolas Mora. This time it was some dependencies for his new project taliesin, a lightweight audio media server with a REST API interface and a React JS client application. I am already eager to give it a try :-).

As it is again this time of the year, I would also like to draw some attention to the Debian Med Advent Calendar. Like the past years, the Debian Med team starts a bug squashing event from the December 1st to 24th. Every bug that is closed will be registered in the calendar. So instead of taking something from the calendar, this special one will be filled and at Christmas hopefully every Debian Med related bug is closed. Don’t hesitate, start to squash :-).

02 December, 2018 07:07PM by alteholz

Sylvain Beucler

New Android SDK/NDK Rebuilds

As described in a previous post, Google is still click-wrapping all Android developer binaries with a non-free EULA.

I recompiled SDK 9.0.0, NDK r18b and SDK Tools 26.1.1 from the free sources to get rid of it:

with one-command, Docker-based builds:

This triggered an interesting thread about the current state of free dev tools to target the Android platform.

Hans-Christoph Steiner also called for joining efforts towards a repository hosted using the F-Droid architecture:

What do you think?

02 December, 2018 02:59PM

Sven Hoexter

nginx and lua to evaluate CDN behaviour

I guess in the past everyone used CGIs to achieve something similar; it just seemed like a nice detour to use the nginx Lua module instead. Don't expect to read something magic. I'm currently looking into different CDN providers: how they behave regarding the cache-control header, which additional headers they send by default, and what changes when you activate certain features. So I set up two locations inside the nginx configuration, using a content_by_lua_block {} for testing purposes.

location /header {
  default_type 'text/plain';
  content_by_lua_block {
   local myheads=ngx.req.get_headers()
   for key in pairs(myheads) do
    local outp="Header '" .. key .. "': " .. myheads[key]
    ngx.say(outp)
   end
  }
}

location /cc {
  default_type 'text/plain';
  content_by_lua_block {
   local cc=ngx.req.get_headers()["cc"]
   if cc ~= nil then
    ngx.header["cache-control"]=cc
    ngx.say(cc)
   else
    ngx.say("moep - no cc header found")
   end
  }
}

The first one is rather boring, it just returns the request headers my origin server received, like this:

$ curl -is
HTTP/2 200 
date: Sun, 02 Dec 2018 13:20:14 GMT
content-type: text/plain
set-cookie: __cfduid=d503ed2d3148923514e3fe86b4e26f5bf1543756814; expires=Mon, 02-Dec-19 13:20:14 GMT; path=/;; HttpOnly; Secure
strict-transport-security: max-age=2592000
expect-ct: max-age=604800, report-uri=""
server: cloudflare
cf-ray: 482e16f7ae1bc2f1-FRA

Header 'x-forwarded-for':
Header 'cf-ipcountry': DE
Header 'connection': Keep-Alive
Header 'accept': */*
Header 'accept-encoding': gzip
Header 'host':
Header 'x-forwarded-proto': https
Header 'cf-visitor': {"scheme":"https"}
Header 'cf-ray': 482e16f7ae1bc2f1-FRA
Header 'cf-connecting-ip':
Header 'user-agent': curl/7.62.0

The second one is more interesting, it copies the content of the "cc" HTTP request header to the "cache-control" response header to allow you convenient evaluation of the handling of different cache-control header settings.

$ curl -H'cc: no-store,no-cache' -is
HTTP/2 200 
date: Sun, 02 Dec 2018 13:27:46 GMT
content-type: image/jpeg
set-cookie: __cfduid=d971badd257b7c2be831a31d13ccec77f1543757265; expires=Mon, 02-Dec-19 13:27:45 GMT; path=/;; HttpOnly; Secure
cache-control: no-store,no-cache
cf-cache-status: MISS
strict-transport-security: max-age=2592000
expect-ct: max-age=604800, report-uri=""
server: cloudflare
cf-ray: 482e22001f35c26f-FRA


$ curl -H'cc: public' -is
HTTP/2 200 
date: Sun, 02 Dec 2018 13:28:18 GMT
content-type: image/jpeg
set-cookie: __cfduid=d48a4b571af6374c759c430c91c3223d71543757298; expires=Mon, 02-Dec-19 13:28:18 GMT; path=/;; HttpOnly; Secure
cache-control: public, max-age=14400
cf-cache-status: MISS
expires: Sun, 02 Dec 2018 17:28:18 GMT
strict-transport-security: max-age=2592000
expect-ct: max-age=604800, report-uri=""
server: cloudflare
cf-ray: 482e22c8886627aa-FRA


$ curl -H'cc: no-cache,no-store' -is
HTTP/2 200 
date: Sun, 02 Dec 2018 13:30:33 GMT
content-type: image/jpeg
set-cookie: __cfduid=dbc4758b7bb98d556173a89aa2a8c2d3a1543757433; expires=Mon, 02-Dec-19 13:30:33 GMT; path=/;; HttpOnly; Secure
cache-control: public, max-age=14400
cf-cache-status: HIT
expires: Sun, 02 Dec 2018 17:30:33 GMT
strict-transport-security: max-age=2592000
expect-ct: max-age=604800, report-uri=""
server: cloudflare
cf-ray: 482e26185d36c29c-FRA


As you can see, this endpoint is currently fronted by Cloudflare using a default configuration. If you burned one request path below "/cc/" and it's now cached for a long time, you can just use a different random path to continue your tests, without any need to flush the CDN caches.

02 December, 2018 01:40PM

December 01, 2018

hackergotchi for Junichi Uekawa

Junichi Uekawa

Playing with FUSE after several years and realizing things haven't changed that much.

Playing with FUSE after several years and realizing things haven't changed that much.

01 December, 2018 11:37PM by Junichi Uekawa

Julian Andres Klode

Migrating web servers

As of today, I migrated various services from shared hosting on to a VPS hosted by hetzner. This includes my weechat client, this blog, and the following other websites:

  • redirector


Uberspace runs CentOS 6. This was causing more and more issues for me, as I was trying to run up-to-date weechat binaries. In the final stages, I ran weechat and tmux inside a debian proot. It certainly beat compiling half a system with linuxbrew.

The web performance was suboptimal. Webpages are served with Pound and Apache, TLS connection overhead was just huge, there was only HTTP/1.1, and no keep-alive.

Security-wise, things were interesting: everything ran as my user, obviously, whether scripts, weechat, or mail delivery helpers. Ugh. There was also only a single certificate, shared by all domains, even completely distinct ones.

Enter Hetzner VPS

I launched a VPS at Hetzner and configured it with Ubuntu 18.04, the latest Ubuntu LTS. It is a CX21, so it has 2 vcores, 4 GB RAM, 40 GB SSD storage, and 20 TB of traffic. For 5.83€/mo, you can’t complain.

I went on to build a repository of ansible roles (see the repo) that configures the system with a few key characteristics:

  • http is served by nginx
  • certificates are per logical domain - each domain has a canonical name and a set of aliases; and the certificate is generated for them all
  • HTTPS is configured according to Mozilla’s modern profile, meaning TLSv1.2-only, and a very restricted list of ciphers. I can revisit that if it’s causing problems, but I’ve not seen huge issues.
  • Log files are anonymized to 24 bits for IPv4 addresses and 32 bits for IPv6 addresses, which should allow me to identify an ISP, but not an individual user.
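
Anonymization along those lines can be done as a simple log post-processing step; a hedged sketch for the IPv4 case (not the actual mechanism from the roles; an IPv6 rule would analogously keep only the first 32 bits, i.e. the first two groups):

```shell
# Zero the last octet so only the /24 (24 bits) of the client survives.
line='203.0.113.42 - - [01/Dec/2018:13:28:18 +0000] "GET / HTTP/2.0" 200'
anon=$(printf '%s\n' "$line" | sed -E 's/^([0-9]+\.[0-9]+\.[0-9]+)\.[0-9]+/\1.0/')
echo "$anon"
# prints: 203.0.113.0 - - [01/Dec/2018:13:28:18 +0000] "GET / HTTP/2.0" 200
```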

I don’t think the roles are particularly reusable for others, but it’s nice to have a central repository containing all the configuration for the server.

Go server to serve comments

When I started self-hosting the blog and added commenting via mastodon, it was via a third-party PHP script. This has been replaced by a Go program (GitHub repo). The new Go program scales a lot better than a PHP script, and provides better security properties due to AppArmor and systemd-based sandboxing; it even uses systemd’s DynamicUser.

Special care has been taken to set timeouts for talking to upstream servers, so the program cannot hang with open connections and will respond eventually.

The Go binary is connected to nginx via a UNIX domain socket that serves FastCGI. The service is activated via systemd socket activation, allowing it to be owned by www-data, while the binary runs as a dynamic user. Nginx’s native fastcgi caching mechanism is enabled so the Go process is only contacted every 10 minutes at the most (for a given post). Nice!
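
Such a socket-activated FastCGI service can be sketched as a systemd unit pair (two separate unit files, shown together here; the names and paths are illustrative, not the ones from the actual repo):

```ini
# comments.socket — systemd owns the UNIX socket; nginx (www-data)
# may connect to it while the service itself runs as a dynamic user.
[Socket]
ListenStream=/run/comments/fcgi.sock
SocketUser=www-data
SocketMode=0660

[Install]
WantedBy=sockets.target

# comments.service — started on the first connection to the socket.
[Service]
ExecStart=/usr/local/bin/comments-fcgi
DynamicUser=yes
```

The socket file is owned by www-data so nginx can reach it, while DynamicUser gives the Go process a throwaway UID with no persistent state.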


Performance is a lot better than the old shared server. Pages load in up to half the time of the old one. Scalability also seems better: I tried various benchmarks, and achieved consistently higher concurrency ratings. A simple curl via https now takes 100ms instead of 200ms.

Performance is still suboptimal from the west coast of the US or other places far away from Germany, but got a lot better than before: Measuring from Oregon using webpagetest, it took 1.5s for a page to fully render vs ~3.4s before. A CDN would surely be faster, but would lose the end-to-end encryption.

Upcoming mail server

The next step is to enable email. Setting up postfix with dovecot turns out to be quite easy: install them, tweak a few settings, set up SPF, DKIM, DMARC, and a PTR record, and off you go.
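
For reference, the DNS side of such a setup looks roughly like the zone-file fragment below; example.org, the DKIM selector, and the policy values are placeholders, and the PTR record is set at the VPS provider rather than in this zone:

```
example.org.                  IN TXT "v=spf1 mx -all"
2018._domainkey.example.org.  IN TXT "v=DKIM1; k=rsa; p=<base64 public key>"
_dmarc.example.org.           IN TXT "v=DMARC1; p=quarantine; rua=mailto:postmaster@example.org"
```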

I mostly expect to read my email by tagging it on the server using notmuch somehow, and then syncing it to my laptop using muchsync. The IMAP access should allow some notifications or reading on the phone.

Spam filtering will be handled with rspamd. It seems to be the hot new thing on the market, is integrated with postfix as a milter, and handles a lot of stuff, such as:

  • greylisting
  • IP scoring
  • DKIM verification and signing
  • ARC verification
  • SPF verification
  • DNS lists
  • Rate limiting

It also has fancy stuff like neural networks. Woohoo!
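
Wiring rspamd into postfix as a milter takes only a few lines of main.cf; a hedged sketch (11332 is rspamd's usual proxy-worker milter port, adjust to your setup):

```
# /etc/postfix/main.cf (fragment)
smtpd_milters = inet:localhost:11332
non_smtpd_milters = inet:localhost:11332
milter_default_action = accept
```

With milter_default_action = accept, mail still flows if rspamd is down, at the cost of briefly unfiltered delivery.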

As another bonus point: It’s trivial to confine with AppArmor, which I really love. Postfix and Dovecot are a mess to confine with their hundreds of different binaries.

I found it via Uberspace, who plan on using it for their next uberspace7 generation. It is also used by some large installations.

I plan to migrate mail from uberspace in the upcoming weeks, and will post more details about it.

01 December, 2018 10:40PM

November 30, 2018

Paul Wise

FLOSS Activities November 2018





  • myrepos: respond to some tickets
  • Debian: respond to porterbox schroot query, remove obsolete role accounts, restart misbehaving webserver, redirect openmainframe mail to debian-s390, respond to query about consequences of closing accounts
  • Debian wiki: unblacklist networks, redirect/answer user support query, answer question about page names, whitelist email addresses
  • Debian packages site: update mirror config
  • Debian derivatives census: merge and deploy changes from Outreachy applicants and others


The purple-discord upload was sponsored by my employer. All other work was done on a volunteer basis.

30 November, 2018 09:03PM

hackergotchi for Chris Lamb

Chris Lamb

Free software activities in November 2018

Here is my monthly update covering what I have been doing in the free software world during November 2018 (previous month):

Reproducible builds

Whilst anyone can inspect the source code of free software for malicious flaws, almost all software is distributed pre-compiled to end users.

The motivation behind the Reproducible Builds effort is to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised.

This month I:


Debian LTS

This month I have worked 18 hours on Debian Long Term Support (LTS) and 12 hours on its sister Extended LTS project.

  • Investigated and triaged libsdl2-image, lighttpd, nginx, pdns, poppler, rustc & xml-security-c, amongst many others.

  • "Frontdesk" duties, responding to user queries, etc.

  • Issued DLA 1572-1 for nginx to fix a denial of service (DoS) vulnerability — as there was no validation for the size of a 64-bit atom in an .mp4 file this led to CPU exhaustion when the size was zero.

  • Issued DLA 1576-1 correcting an SSH passphrase disclosure in ansible's User module that leaked data in the global process list.

  • Issued DLA 1584-1 for ruby-i18n to fix a remote denial-of-service vulnerability.

  • Issued DLA 1585-1 to prevent an XSS vulnerability in ruby-rack where a malicious request could forge the HTTP scheme being returned to the underlying application.

  • Issued DLA 1591-1 to fix two vulnerabilities in libphp-phpmailer where arbitrary local files could be disclosed via relative-path HTML transformations, as well as an object injection attack.

  • Uploaded libsdl2-image (2.0.3+dfsg1-3) and sdl-image1.2 (1.2.12-10) to the unstable distribution to fix buffer overflows on corrupt or maliciously-crafted XCF files. (#912617 & #912618)

  • Uploaded ruby-i18n (0.7.0-3) to unstable [...] and prepared a stable proposed update for a potential 0.7.0-2+deb9u1 in stretch (#914187).

  • Uploaded ruby-rack (1.6.4-6) to unstable [...] and (2.0.5-2) to experimental [...]. I also prepared a proposed update for a 1.6.4-4+deb9u1 in the stable distribution (#914184).


  • python-django (2:2.1.3-1) — New upstream bugfix release.

  • redis:

    • 5.0.1-1 — New upstream release; ensure that Debian-supplied Lua libraries are available during scripting (#913185); refer to /run directly in .service files, etc.
    • 5.0.1-2 — Ensure that lack of IPv6 support does not prevent startup on Debian, where we bind to the ::1 interface by default. (#900284 & #914354)
    • 5.0.2-1 — New upstream release.
  • redisearch (1.2.1-1) — Upload the last AGPLv3 (ie. non-Commons Clause) package from my GoodFORM project.

  • hiredis (0.14.0-3) — Adopt and tidy package (#911732).

  • python-redis (3.0.1-1) — New upstream release.

  • adminer (4.7.0-1) — New upstream release & ensure all documentation is under /usr/share/doc.

I also sponsored uploads of elpy (1.26.0-1) & muttrc-mode-el (1.2+git20180915.aa1601a-1).

Debian bugs filed

  • molly-guard: Breaks conversion with usrmerge. (#914716)

  • git-buildpackage: Please add gbp-dch --stable flag. (#914186)

  • git-buildpackage: gbp pq -Pq suffixes are not actually optional. (#914281)

  • python-redis: Autopkgtests fail. (#914800)

  • git-buildpackage: Correct "saving" typo. (#914280)

  • python-astropy: Please drop unnecessary dh_strip_nondeterminism override. (#914612)

  • shared-mime-info: Don't assume every *.key file is an Apple Keynote file. (#913550, with patch)

FTP Team

As a Debian FTP assistant this month I ACCEPTed 37 packages: android-platform-system-core, arm-trusted-firmware, boost-defaults, dtl, elogind, fonts-ibm-plex, gnome-remote-desktop, gnome-shell-extension-desktop-icons, google-i18n-address, haskell-haskell-gi-base, haskell-rio, lepton-eda, libatteanx-serializer-rdfa-perl, librdf-trine-serializer-rdfa-perl, librdf-trinex-compatibility-attean-perl, libre-engine-re2-perl, libtest-regexp-pattern-perl, linux, lua-lxc, lxc-templates, ndctl, openssh, osmo-bsc, osmo-sgsn, othman, pg-rational, qtdatavis3d-everywhere-src, ruby-grape-path-helpers, ruby-grape-route-helpers, ruby-graphiql-rails, ruby-js-regex, ruby-regexp-parser, shellia, simple-revision-control, theme-d, ulfius & vim-julia.

30 November, 2018 08:23PM