Orchestrating lecture recordings during COVID

Building an automated cloud-based system to record university lectures

January 16, 2022 post

automation, sysadmin, cloud, infrastructure-as-code, devops

During COVID, all lectures at my university moved to Microsoft Teams. Most professors didn't bother recording them, so a lot of students would just sit there with OBS running to capture their screen. Works, but it's kind of a waste of time when you're doing it for multiple lectures every week.

A friend had the idea to automate the whole thing: spin up a cloud VM, have it join the Teams meeting, record everything, then tear down the VM. He asked if I wanted to help build it. Sounded like a good way to learn some DevOps stuff I'd been reading about but never actually used, so I said yes.

The system we ended up with would automatically fetch lecture schedules, provision VMs on Hetzner Cloud, join Teams meetings through browser automation, record the lectures, and clean everything up. All without touching anything manually.

How it works

The core of the operation was roboEdu.sh, a bash script that acted as the conductor. It would query the University of Bologna's public API to get the lecture schedule, then spawn subprocesses to handle each class. About 10 minutes before a lecture was due to start, the script would kick off the infrastructure provisioning. We chose Hetzner Cloud for the backend because it was affordable and had a straightforward API. Terraform would spin up a fresh VM instance, and then Ansible would take over to configure it, installing Chromium, ffmpeg, Node.js, and moving our scripts into place.
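The timing and provisioning steps above can be sketched roughly as follows. This is a minimal illustration, not the actual roboEdu.sh code: the function names, the `LEAD_SECONDS` variable, and the `inventory.ini` and `setup.yml` file names are all hypothetical, and it assumes the lecture start time is already available as Unix epoch seconds.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: function, variable, and file names are
# hypothetical and not taken from roboEdu.sh.

LEAD_SECONDS=$((10 * 60))  # begin provisioning roughly 10 minutes early

# Print how many seconds to wait before provisioning should begin.
# Both arguments are Unix epoch seconds; the result is never negative.
seconds_until_provision() {
    local start_epoch="$1"
    local now_epoch="$2"
    local wait_seconds=$(( start_epoch - LEAD_SECONDS - now_epoch ))
    if [ "$wait_seconds" -lt 0 ]; then
        wait_seconds=0
    fi
    echo "$wait_seconds"
}

# Handle a single lecture: wait, create the VM, then configure it.
handle_lecture() {
    local start_epoch="$1"
    sleep "$(seconds_until_provision "$start_epoch" "$(date +%s)")"
    terraform apply -auto-approve                # create the VM
    ansible-playbook -i inventory.ini setup.yml  # assumed inventory and playbook names
}

# One background subprocess per lecture, for example:
#   handle_lecture "$start_epoch" &
```

Clamping the wait to zero means a lecture that is already about to start gets provisioned right away instead of being skipped.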

Getting into the actual Teams meeting was the trickiest part. We used Puppeteer to drive a headless Chromium instance. It had to handle the login process using credentials stored in a secrets file, navigate the join flow, and deal with all those random UI popups and error messages Teams likes to throw at you. We even set it up to take screenshots every minute so we could debug what went wrong when a recording failed.

Once inside the meeting, ffmpeg did the heavy lifting, capturing the audio and video streams. We encoded everything in H.265 to keep the file sizes manageable, though that meant watching them later in a player with H.265 support, such as VLC or mpv. The system ran periodic health checks to ensure the recording was still active. After the lecture ended, there was a buffer period, and then the recording was downloaded to our local machine. The script would then immediately destroy the VM to stop the billing clock.
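One simple way to implement a health check like this is to verify that the output file keeps growing between checks. The sketch below assumes that approach, which may not be how the real system did it. All function names and paths are hypothetical, and the check would run wherever the recording file lives (on the VM, for instance over SSH).

```shell
#!/usr/bin/env bash
# Illustrative sketch only: names, paths, and the growth-based check
# are assumptions, not the project's actual implementation.

# Print the size of a file in bytes, or 0 if it does not exist yet.
file_size() {
    if [ -f "$1" ]; then
        wc -c < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# Succeed if the recording is larger than it was at the previous check.
recording_is_growing() {
    local file="$1"
    local previous_size="$2"
    [ "$(file_size "$file")" -gt "$previous_size" ]
}

# After the lecture: copy the recording home, then destroy the VM.
finish_lecture() {
    local remote="$1"       # e.g. user@vm-address (hypothetical)
    local remote_file="$2"
    local local_dir="$3"
    scp "$remote:$remote_file" "$local_dir/" && terraform destroy -auto-approve
}
```

In this sketch the VM is only destroyed if the download succeeds. That trades a slightly longer billing period on failure for not losing the recording.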

Getting it running

Setting it up required a bit of tooling, such as jq, Terraform, and Ansible. Configuration was pretty minimal, just two files under secrets/: a YAML file with the university credentials and a plain-text file with the cloud API key.

Authentication (secrets/unibo_login.yml):

username: "nome.cognome@studio.unibo.it"
password: "la_mia_password"

Cloud API Key (secrets/hcloud_key):

your_hetzner_cloud_token_here

We usually ran it via a cron job to make it truly "set and forget":

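#!/bin/bash
# Make sure the log directory exists.
mkdir -p /var/log/roboEdu/
# Run the orchestrator, appending stdout and stderr to a per-day log file.
/path/to/roboEdu.sh <course_name> <year> >> /var/log/roboEdu/<course_name>-<year>-$(date '+%y%m%d').log 2>&1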

Telegram integration

Later on, another student contributed a Telegram integration using Telethon, which allowed the system to push recordings directly to a chat. It was a nice addition, though setting it up was a bit more involved than the rest of the system.

You had to create an app on the Telegram developer portal to get an API ID and hash, then configure a materie.txt file to map course IDs to readable tags. After generating a session file by verifying your phone number with a Python script, you could find the chat ID of your target group and pass it to the main script with the -T flag. It turned the system into a full pipeline from live lecture to a file on your phone.
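To give an idea of what the course-to-tag mapping does, here is a small lookup sketch. The file layout is an assumption: it treats materie.txt as one "course_id tag" pair per line, separated by whitespace, which may not match the real format. The `course_tag` function name is hypothetical too.

```shell
#!/usr/bin/env bash
# Illustrative sketch only. ASSUMED file format: one "<course_id> <tag>"
# pair per line, whitespace-separated. The real materie.txt may differ.

# Print the tag for a course ID; return non-zero if the ID is not listed.
course_tag() {
    local course_id="$1"
    local mapping_file="$2"
    # Appending "" forces a string comparison, so "00678" does not match "678".
    awk -v id="$course_id" '
        ($1 "") == (id "") { print $2; found = 1; exit }
        END { if (!found) exit 1 }
    ' "$mapping_file"
}

# Example (hypothetical file contents and ID):
#   course_tag 12345 materie.txt
```

Returning a non-zero status for unknown IDs lets a caller fall back to the raw course ID instead of posting an empty tag.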

Impact

The system recorded lectures during the pandemic for personal study purposes. More importantly for me, it was my first real DevOps project and I learned way more than I expected about infrastructure orchestration and browser automation.

The code is available on GitHub if you're curious about the implementation details.