Automate it all!..
If you have read any of the previous entries in this series, you should have a pretty good idea of how I approach the creation and maintenance of the Linux audio lab in the CS department at Yale. Essentially, it boils down to this: automate what you can, do manually what you must. Why not automate everything? Surely, you can just create an incredibly well-structured script to do everything for you, right?
The me of a few months ago might have still believed that.
That old me would read up on bash scripting and “best practices” and begin a rewrite of the post-install script he had been using for over six years. Despite the time investment, the new script would end up causing more problems than it solved.
The me of today knows better.
… or Not
The problem is that tackling any new addition or methodology for lab imaging and support requires a time sink up front to determine the best path forward. The solution we arrive at is inevitably a compromise. For me, the best solution is invariably the one that costs me the least time overall. I’ll sink time up-front if it saves me from repetitive tasks down the road. Or I’ll spend an hour here and there doing something manually if testing and deploying some new solution would require more hours than seems practical at the moment. This makes sense, no? After all, I’m an academic. The lab lives to serve student learning and research, and I serve the lab only so far as I must in order to keep it running optimally.
So, back to that script.
Rewrite at Your Own Risk
It turns out that many bash-scripting “best practices” are simply overkill for what I am doing: package installs, mostly from repositories, plus some local file installs, configuration tweaks, and so on. Adding a bunch of bash-specific logic to account for every eventuality quickly becomes both impractical and unsustainable. In a high-stakes corporate setting? Sure. You don’t want to erase the boss’s email. But in an audio/music lab, where the most important thing is reliable low-latency performance, the edge cases are pretty scant and somewhat unimportant. (Usually, what bites you is something breaking on an update.)
That said, the script I have been using for years (cobbled together with our dedicated ITS specialist) had no error handling at all. (Technically, that isn’t true: I captured terminal output via | tee to a file. But an error didn’t stop the script from continuing; it just recorded that something had happened.) That was okay when the script was pretty sparse. But it became a problem for a number of reasons as the script grew longer and more complex over the years. First, things can get really messed up if you write something wrong in your script, especially when it comes to file-system navigation. Packages get written to the wrong directories, installations fail because local files are not where they are expected, and so on. Second, without error reporting you will not be aware of that screw-up, or of any failed package installs, unless you manually test everything after the script runs. (Or check your output log, if you keep one. Hint: keep one.
$command | tee -a installoutput.log)
If you are installing only two packages, skipping error handling is probably fine, because you can easily enough open those apps, test them, and move on. In fact, depending on your error-reporting strategy, you may find that when an error stops the script, you cannot simply fix the problem and re-run it; you may have to finish things manually (or cut and paste the script from where it broke, which means copying in all your local variables, etc.). This actually happened to me. pkcon (the best-practice tool on a KDE Neon system) throws an error when you try to install a package that is already installed. This seems like a bad idea because, in reality, there is no error; the package is simply already installed. Apparently pkcon doesn’t check for that, and it doubles down by erroring out and suggesting the package might already exist. Thanks, pkcon.
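A minimal error-reporting skeleton along these lines might look like the sketch below. This is not our actual script: the log name and the dpkg pre-check for pkcon’s already-installed quirk are my assumptions (the dpkg check presumes an apt/dpkg base, as on KDE Neon).

```shell
#!/usr/bin/env bash
set -u                      # treat unset variables as errors

LOGFILE="install.log"       # hypothetical log name

log() { printf '%s %s\n' "$(date +%H:%M:%S)" "$*" | tee -a "$LOGFILE"; }

run() {
  # Run a command, log the result, and report failure
  # without killing the whole script.
  log "RUN: $*"
  if "$@" >>"$LOGFILE" 2>&1; then
    log "OK: $*"
  else
    local rc=$?
    log "FAIL(rc=$rc): $*"
    return "$rc"
  fi
}

# pkcon errors out when a package is already installed, so ask dpkg first:
install_pkg() {
  if dpkg -s "$1" >/dev/null 2>&1; then
    log "SKIP (already installed): $1"
  else
    run pkcon install -y "$1"
  fi
}

run true            # logged as OK
run false || true   # logged as FAIL; script keeps going
```

The point is the pattern, not the plumbing: every command goes through one wrapper that logs success or failure, so a post-run grep for FAIL tells you exactly what needs manual attention.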
Anyway, despite (eventually) getting the script completely updated, adding new sources, removing packages that no longer exist or are installed by default, etc., there were still surprising errors that borked an installation here and there. One was simply a GitHub repo timing out with a weird error about token overage or some such nonsense. Manually rerunning that line of code worked with no problems, but it borked the install in progress and required manual intervention.
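For transient failures like that GitHub timeout, a small retry wrapper is often all the “script magic” you need. A sketch, where the attempt count, delay, and example URL are all placeholders:

```shell
# retry N DELAY CMD...: run CMD up to N times, sleeping DELAY seconds
# between attempts; returns 0 on first success, 1 if every attempt fails.
retry() {
  local tries=$1 delay=$2 n=1
  shift 2
  while [ "$n" -le "$tries" ]; do
    "$@" && return 0
    echo "attempt $n/$tries failed: $*" >&2
    n=$((n + 1))
    [ "$n" -le "$tries" ] && sleep "$delay"
  done
  return 1
}

# e.g. (hypothetical URL):
# retry 3 10 wget -q https://github.com/example/project/releases/download/v1.0/pkg.deb
```

A once-flaky download then only borks the run if it fails several times in a row, which is a much better signal that something is actually wrong.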
What to do with third party packages and installers?
In most instances, when updating the script, I stopped including local packages in an assets folder and switched to downloading them “fresh” with wget from GitHub or wherever the source lives. This way we always get the latest package version. It has only bitten me twice: once because of the above-mentioned error (which I think was traffic-related), and again when trying to automate a download from SourceForge, which doesn’t want to allow that sort of thing (because how can they throw ads into your eyes if they let a script just download files??)
Only one application, Ardour, still requires a manual installation step. I decided to download the binary (which I pay for as a subscriber), make it publicly available on my home NAS, and manually pull that file down after the script runs, because it’s too large to host on GitHub and any other hosting solution requires $$. Why manually? Because the way Synology wants to make the file available is the most secure for my home network. I could poke a hole in the firewall, I guess, and leave myself open to haxx0rz. I could also put it on this site somewhere, but the pull speed is atrocious. (Thanks, BlueHost.) I could also build it locally, but I’m already doing that for several big applications and the installation time is approaching thirty minutes. File this under “Fix For Next Time.”
What to do with passwords and account creation?
Another thing I have to do manually is set passwords for some local accounts. AFAIK, there is no way to include this in a script without the passwords living in plain text, which is a big no-no. Is there a solution? Apparently not, after 10 minutes of Googling. Again, file under “FFNT.”
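One middle ground, sketched below, is to prompt for each password while the script runs and pipe the result straight into chpasswd, so nothing ever lives in the script file. It still needs a human at the keyboard, so it is only half a fix; the account name in the usage line is an example.

```shell
# set_password USER: prompt twice (input hidden), then hand the pair to
# chpasswd so the password never appears in the script or shell history.
set_password() {
  local user=$1 pw pw2
  read -rs -p "Password for $user: " pw; echo
  read -rs -p "Confirm: " pw2; echo
  if [ "$pw" != "$pw2" ]; then
    echo "Passwords do not match." >&2
    return 1
  fi
  printf '%s:%s\n' "$user" "$pw" | chpasswd
}

# Run as root for each local account, e.g.:
# set_password audiolab
```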
Problems remain despite my best efforts
Bugs, bugs, bugs. Yes, believe it or not, bugs and inconsistencies exist in OSS. One issue (not a bug, properly speaking) came up while installing packages from KXStudio, the application suite that pulls in JACK2 and a number of other pro-audio libraries. During that process, one file,
/etc/security/limits.d/audio.conf, is installed. That file makes it possible for JACK to run “realtime.” Except this time, the file was written as
audio.conf.disabled. A check is made during installation to see whether writing that file would break the system as currently configured. Running a generic kernel is apparently one of those things, so the file is written, but with ‘disabled’ appended. This is, of course, documented nowhere I could logically find, but sure. Worse, starting JACK manually with either Cadence or QJackCtl appears to work; everything seems fine and there are no worries. Only starting Ardour and trying to create a session reveals that JACK (which is already started??) can’t start, and the session can’t be created. Wild. The fix is easy: just rename the file to
audio.conf after installing your low-latency kernel. But finding that out is a drag. Also, installing the low-latency kernel first to avoid this step means a manual reboot, or writing a script with enough logic to run at startup, perform that reboot, and pick up at the appropriate place. Not going to happen. Why? Too much work for what might be a bug, or might change the next time I have to do this. I have removed so many workarounds in my day that I’d just rather not.
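The rename itself is easy to script once you know it is needed. A hedged sketch, wrapped in a function so it is a harmless no-op when the file is already enabled (the default path is the one from the post):

```shell
# Re-enable the realtime limits file if KXStudio shipped it disabled.
enable_rt_limits() {
  local conf=${1:-/etc/security/limits.d/audio.conf}
  if [ -f "$conf.disabled" ] && [ ! -f "$conf" ]; then
    mv "$conf.disabled" "$conf"
    echo "Re-enabled $conf; re-login (or reboot) for the limits to apply."
  fi
}

# After installing the low-latency kernel, run as root:
# enable_rt_limits
```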
The other bug I encountered is currently (1/21/23) open and involves a problem with systemd and the creation of standard user folders like Documents and Music. Running
$xdg-user-dirs-update fixes the problem for the user who runs it, but that means everyone who logs in for the first time has to run that command. (Or, again, I write something to execute the command on everyone’s first login: extraneous cruft that becomes an unnecessary inefficiency once the bug is fixed.) This means our LDAP accounts will all hit this problem. Not a killer, but annoying. My current solution is to simply copy those folders into
/etc/skel and hope that fixes it. But while I’m in there, why not add some custom wallpaper? Oh, and how about that “readme” that goes on the desktop with helpful tips and tricks? Gotta get those in place and get the configs copied too! Oh, and… huh, will you look at the time? Classes started a week ago.
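The /etc/skel seeding can be sketched like this; the folder names assume an English locale, and the commented asset path is hypothetical.

```shell
# Pre-seed the standard XDG folders in /etc/skel so first logins
# (including LDAP users) get them even while the bug is open.
seed_skel() {
  local skel=${1:-/etc/skel} d
  for d in Desktop Documents Downloads Music Pictures Videos; do
    mkdir -p "$skel/$d"
  done
}

# Run as root before anyone logs in:
# seed_skel
# cp assets/README.desktop /etc/skel/Desktop/   # hypothetical asset path
```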
Change is blowing in the wind
Packages change, updates break things, blah blah blah. One reason I continue to avoid building “the perfect script” is that every time I have to re-image the labs, something new and unexpected pops up that requires rewriting the procedure anyway. That happens every time. No exceptions. There is no amount of script magic that prevents it. And that’s okay. I learn something new every time I do this, and figuring things out is kind of fun. I have a working script now, and an up-and-running lab ready for work. That’s what matters.
If you are interested in getting notifications when I drop a new article or news item, hit the subscribe button, will ya? I promise nothing more often than twice a month! Oh, and I don’t sell your data or anything. Apple/Google/Meta/Microsoft already have it. 😛