Orchestrating lecture recordings during COVID

Building an automated cloud-based system to record university lectures

During COVID, all lectures at my university moved to Microsoft Teams. Most professors didn't record them, so students who missed a class had to rely on screen-capture recordings made by whoever happened to attend, which was not exactly practical.

A friend had the idea to automate the recording: rent a VPS, have it join the Teams meeting, record everything, then tear down the VM. He asked if I wanted to help build it, and I was eager to learn, so I said yes.

The system we ended up with would automatically fetch lecture schedules, provision VMs on Hetzner Cloud, join Teams meetings through browser automation, record the lectures, and clean everything up.

How it works

The core of the operation was roboEdu.sh, a bash script that acted as the conductor. It would query the University of Bologna's public API to get the lecture schedule, then spawn subprocesses to handle each class. About 10 minutes before a lecture was due to start, the script would kick off the infrastructure provisioning. We chose Hetzner because it was cheap and worked well. Terraform would spin up a new VM instance, and then Ansible would take over to configure it, installing Chromium, ffmpeg, Node.js, and moving our scripts into place.
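
As a rough sketch of the timing logic (not the actual roboEdu.sh code), the wait before provisioning comes down to a small piece of arithmetic. The function name `seconds_until_provision` and its arguments are hypothetical, and the 600-second default simply mirrors the 10-minute lead time mentioned above.

```shell
#!/bin/bash
# Hypothetical helper: how long to wait before provisioning a VM.
# Arguments: lecture start (epoch seconds), current time (epoch seconds),
# and an optional lead time in seconds (default 600 = 10 minutes).
seconds_until_provision() {
    local start="$1" now="$2" lead="${3:-600}"
    local wait=$(( start - lead - now ))
    # If we are already inside the lead window, provision right away.
    if [ "$wait" -lt 0 ]; then
        wait=0
    fi
    echo "$wait"
}

# Example with seconds since midnight: lecture at 09:00 (32400),
# current time 08:30 (30600), so provisioning starts in 20 minutes.
seconds_until_provision 32400 30600   # prints 1200
```

A per-lecture subprocess could sleep for that many seconds and then start the Terraform and Ansible steps.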

Getting into the actual Teams meeting was the trickiest part. We used Puppeteer to drive a headless Chromium instance. It had to handle the login process using credentials stored in a secrets file, navigate the join flow, and deal with all those random UI popups and error messages Teams likes to throw at you. We even set it up to take screenshots every minute so we could debug what went wrong when a recording failed.

Once inside the meeting, ffmpeg captured the audio and video streams. We encoded everything in H.265 to keep the file sizes manageable (we were on a tight budget), which meant we needed a player with H.265 support, such as VLC or mpv, to watch the recordings later. The system ran periodic health checks to ensure the recording was still active. After the lecture ended, there was a short buffer period before the recording was downloaded to our local machine. The script then immediately destroyed the VM to stop the billing clock.
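
One simple way to implement such a health check is to verify that the output file keeps growing. The sketch below assumes that approach; the original checks may have worked differently, and the function names `file_size` and `recording_is_growing` are made up for illustration.

```shell
#!/bin/bash
# Hypothetical health check: a recording counts as alive if its output
# file has grown since the previous check.

# Print the size of a file in bytes (0 if it does not exist yet).
file_size() {
    if [ -f "$1" ]; then
        wc -c < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# Succeed if the file is now larger than the previously observed size.
recording_is_growing() {
    local file="$1" previous_size="$2"
    [ "$(file_size "$file")" -gt "$previous_size" ]
}
```

A monitoring loop could store the last observed size, sleep for a minute, and call `recording_is_growing` again, raising an alert or restarting the capture when it fails.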

Getting it running

Setting it up required a bit of tooling: jq, Terraform, and Ansible. Configuration was minimal: two secrets files, one for university credentials and one for the cloud API key.

secrets/unibo_login.yml for university credentials:

username: "nome.cognome@studio.unibo.it"
password: "la_mia_password"

secrets/hcloud_key for Hetzner Cloud API token:

your_hetzner_cloud_token_here
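
Before a first run it can help to confirm that the tools and secrets files are in place. The following is a minimal preflight sketch, not part of the original project; `missing_tools` and `missing_files` are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical preflight check: print every required command that is not on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Hypothetical preflight check: print every required file that is missing or empty.
missing_files() {
    local file
    for file in "$@"; do
        [ -s "$file" ] || echo "$file"
    done
}
```

For example, `missing_tools jq terraform ansible-playbook` and `missing_files secrets/unibo_login.yml secrets/hcloud_key` print nothing when everything is ready.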

We usually ran it via a cron job:

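#!/bin/bash
# Wrapper invoked by cron; replace the placeholders with your course name and year.
COURSE="<course_name>"
YEAR="<year>"
mkdir -p /var/log/roboEdu/
/path/to/roboEdu.sh "$COURSE" "$YEAR" >> "/var/log/roboEdu/${COURSE}-${YEAR}-$(date '+%y%m%d').log" 2>&1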

Telegram integration

Later on, another student contributed a Telegram integration using Telethon, which allowed the system to push recordings directly to a chat.

You had to create an app on the Telegram developer portal to get an API ID and hash, then configure a materie.txt file to map course IDs to readable tags. After generating a session file by verifying your phone number with a Python script, you could find the chat ID of your target group and pass it to the main script with a -T flag.
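
To illustrate the course-to-tag mapping, here is a small lookup sketch. It assumes each line of the mapping file has the form "<course_id> <tag>" separated by whitespace; the real materie.txt format may differ, and `course_tag` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical lookup: find the tag for a course ID in a mapping file.
# ASSUMPTION: each line looks like "<course_id> <tag>"; the real
# materie.txt format may be different.
course_tag() {
    local wanted="$1" file="$2" id tag
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
    while read -r id tag || [ -n "$id" ]; do
        if [ "$id" = "$wanted" ]; then
            echo "$tag"
            return 0
        fi
    done < "$file"
    return 1   # no mapping found
}
```

The non-zero exit status on a miss lets the caller fall back to the raw course ID when no tag is configured.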

Impact

The system recorded lectures during the pandemic for personal study purposes. More importantly for me, it was my first real DevOps project and I learned way more than I expected about infrastructure orchestration and browser automation.

The code is available on GitHub if you're curious about the implementation details.