Tuesday, August 23, 2016

What do you do when you need more systemd instances?

You wanna make it there? You need to go there!
Today, we had a very interesting problem: we needed systemd to run additional instances of itself to manage custom daemons. Here is how that works.

You would need to enable "lingering" for the corresponding users:
loginctl enable-linger <user> # only once via root

# alternative: touch /var/lib/systemd/linger/<user>
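To see whether lingering is already enabled, you can look for the marker file mentioned above. The following is only a minimal sketch: the helper name linger_enabled and its optional directory argument are made up for illustration and are not part of systemd.

```shell
#!/bin/sh
# Hypothetical helper: succeed if lingering is enabled for a user, judged
# by the presence of the marker file /var/lib/systemd/linger/<user>.
# The optional second argument (an assumption for illustration) points the
# check at a different directory.
linger_enabled() {
    user=$1
    dir=${2:-/var/lib/systemd/linger}
    [ -e "$dir/$user" ]
}
```

Usage, for example: linger_enabled "$USER" && echo "lingering is on"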
After this, you can happily put your service file here:
mkdir -p ~/.config/systemd/user/
vim ~/.config/systemd/user/test.service
[Unit]
Description=test

[Service]
Type=simple
ExecStart=/bin/sleep 10000
Restart=on-abort

[Install]
WantedBy=default.target
Then, enabling/starting/stopping your brand-new service (and everything else you would expect from proper task management) works like a charm, even after reboots.
systemctl --user enable test.service
systemctl --user start test.service
systemctl --user status test.service
(Please note the --user argument; run this under a non-root user.)
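If you would rather script the unit creation than open vim, something along these lines should do. It is a sketch only: the function name write_test_unit and its optional directory argument are assumptions for illustration; the unit content is the one shown above.

```shell
#!/bin/sh
# Hypothetical helper: write the test.service unit from above into a user
# unit directory without an editor.
# The optional argument (an assumption for illustration) overrides the
# target directory, which defaults to ~/.config/systemd/user.
write_test_unit() {
    unit_dir=${1:-$HOME/.config/systemd/user}
    mkdir -p "$unit_dir" || return 1
    # Quoted heredoc delimiter: the content is written literally.
    cat > "$unit_dir/test.service" <<'EOF'
[Unit]
Description=test

[Service]
Type=simple
ExecStart=/bin/sleep 10000
Restart=on-abort

[Install]
WantedBy=default.target
EOF
}
```

Call write_test_unit as the non-root user, then continue with the systemctl --user commands above.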

Well, life could be so easy if everything ran on Ubuntu 16.04 (or on a recent distribution, for that matter). In fact, not all our production servers do. A large portion of them run openSUSE 12.3, which needs special handling.

Once we brought our test setup described above to said servers, we noticed that
systemctl --user
fails with
Failed to issue method call: Process /bin/false exited with status 1
Not very helpful indeed, so we dug deeper. The core issue here is that there is simply no user instance of systemd running. This is what /lib/systemd/system/user@.service is good for. On my Ubuntu 16.04, user@1000.service is enabled and running, thus maintaining a second systemd instance just for my login user in addition to root's system systemd - commonly known as pid 1. If you stop user@1000.service, you'll notice that systemctl --user also fails.
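The instance name of that unit is simply the numeric user ID. As a small sketch (the helper name user_manager_unit is made up for illustration), the unit name for any login can be derived like this:

```shell
#!/bin/sh
# Hypothetical helper: print the user manager unit name for a login name,
# e.g. user@1000.service for the user with UID 1000.
user_manager_unit() {
    uid=$(id -u "$1") || return 1
    printf 'user@%s.service\n' "$uid"
}
```

With that, systemctl status "$(user_manager_unit "$USER")" shows whether your own user instance is running.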

In short, on openSUSE 12.3, the mechanism to start user instances of systemd is simply broken. Starting the user@.service results in a failure.

How to fix it for openSUSE 12.3?

The prospect of updating all production servers made the following solutions unacceptable (due to the possibility of errors or failures while updating, and the necessary reboots; basically headaches on steroids):
  • update systemd (will it reboot at all?)
  • change the PAM config (will we authenticate again?)
  • even deeper changes in Linux (no please)
So, we decided to stick with what we have and what is known to work properly.

How does it work anyway?

A closer look at the user@.service unit shows what it actually does. Let's do the same in a shell:
user:~$ systemd --user
It works and systemctl works as well! Hooray. :)
So, it should be a no-brainer for root, right?
root:~# su - user -c '/usr/lib/systemd/systemd --user'
Failed to create root cgroup hierarchy: Permission denied
Failed to allocate manager object: Permission denied
Uh, that's odd. Maybe that's the reason why systemd cannot create another instance of itself. But what's the difference here? When does it work and when doesn't it?

Use SSH!

In the first attempt, we were connected to the server via ssh. There, the authentication and session creation process runs to completion and is thoroughly tested. So, we decided to use ssh when writing the following our-user@.service unit file:
[Unit]
Description=Alternative User Manager for %I
After=sshd.service

[Service]
ExecStart=/usr/bin/ssh %I@localhost /usr/lib/systemd/systemd --user
Restart=on-failure

[Install]
WantedBy=default.target
Enabling this made the whole systemd user instance magic work again on openSUSE 12.3.
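For completeness, here is a sketch of how such a template might be installed and enabled for one user. The install path /etc/systemd/system and the helper name print_enable_commands are assumptions, not taken from our setup. It also assumes root can reach <user>@localhost over ssh without a password prompt (e.g. key-based login), since a service has no terminal to type one into. The helper only prints the commands, it does not run them.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the commands that would install and
# enable our-user@.service for one user. Nothing is executed.
# Assumption: the unit file is in the current directory and gets copied to
# /etc/systemd/system.
print_enable_commands() {
    user=$1
    printf 'cp our-user@.service /etc/systemd/system/\n'
    printf 'systemctl daemon-reload\n'
    printf 'systemctl enable our-user@%s.service\n' "$user"
    printf 'systemctl start our-user@%s.service\n' "$user"
}
```

Review the output of print_enable_commands <user> and run the lines as root once they look right for your system.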

Cheers,
Sven

Further reading about system vs. user instances of systemd:
https://www.freedesktop.org/software/systemd/man/systemd.html#--system
https://www.freedesktop.org/software/systemd/man/systemctl.html#--user
