How Sendler works


Alex Loiko, Co-Founder & Cloud Architect at Sendler

Written on: 2024-06-10

Have you ever wondered how Sendler produces a video like this?

In the linked video, I provide a 90-second explanation, and this blog post serves as a more detailed guide.

How does Sendler record your videos?

We don't "let the AI generate them" (yet), and we don't have a mobile phone farm like this one:

mobile-farm

Instead, like many other SaaS startups, we rent a fleet of servers from Amazon Web Services. These servers look like this:

aws-server

Chrome browsers

We run Chrome browsers on these AWS servers and tell the browser to scroll websites up and down.

There's a challenge: AWS servers don't have graphical displays, so typical screen recording software won't work, since it requires a physical screen. Moreover, web browsers are graphical programs and won't start without a display.

Xvfb

Fortunately, there's a niche Linux tool called Xvfb (X Virtual Framebuffer) that allows graphical applications to run without a real screen. Combined with FFmpeg, it enables us to record the browser's display.

If you're using Linux or Windows with the Windows Subsystem for Linux, you can follow along and test the commands Sendler uses. On a Debian-based distribution such as Ubuntu, install Xvfb and start a virtual display with:

sudo apt install xvfb -y
Xvfb :99 -screen 0 1280x720x24 &

This creates a virtual display, numbered 99, with a screen resolution of 1280x720 and a color depth of 24 bits. You can then start Chrome in this virtual display:

DISPLAY=:99 google-chrome https://sendler.ai

To record a video of this display, use:

ffmpeg -video_size 1280x720 -f x11grab -i :99.0 chrome_rec.mp4

This command records the 1280x720 region of the virtual screen and saves it as chrome_rec.mp4. FFmpeg keeps recording until you press q. See the video above for a demo.
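For unattended use, the three commands above can be chained into one script. The following is a minimal sketch, not Sendler's actual code: the function names `grab_args` and `record_page`, the fixed `sleep` delays, and the 24-bit color depth are assumptions made for illustration. It uses FFmpeg's `-t` option to stop after a fixed number of seconds and the x11grab option `-draw_mouse 0` to keep the X pointer out of the capture.

```shell
#!/bin/sh
# Sketch: record a page for a fixed number of seconds, then clean up.
# grab_args and record_page are hypothetical helper names.

# Print the FFmpeg arguments for grabbing a display for a fixed duration.
grab_args() {
    display=$1 size=$2 seconds=$3 out=$4
    # -y overwrites an existing output file.
    # -draw_mouse 0 leaves the X pointer out of the capture.
    # -t (placed after -i) stops the recording after $seconds seconds.
    echo "-y -f x11grab -draw_mouse 0 -video_size $size -i $display -t $seconds $out"
}

# Start Xvfb and Chrome, record the display, then stop both processes.
record_page() {
    url=$1 out=$2 seconds=$3
    display=:99
    size=1280x720

    Xvfb "$display" -screen 0 "${size}x24" &
    xvfb_pid=$!
    sleep 1    # crude wait for the X server to come up

    DISPLAY=$display google-chrome "$url" &
    chrome_pid=$!
    sleep 3    # crude wait for the page to load

    # Word splitting of the argument string is intentional here, so the
    # output file name must not contain spaces.
    ffmpeg $(grab_args "$display" "$size" "$seconds" "$out")

    kill "$chrome_pid" "$xvfb_pid"
}

# Usage (requires Xvfb, Chrome, and FFmpeg to be installed):
# record_page https://sendler.ai chrome_rec.mp4 10
```

A production version would likely replace the fixed delays with checks that the display and the page are actually ready.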

Fun fact: initially, we struggled with an on-screen mouse cursor. Our first solution was to move it off-screen, only to discover there are at least two cursors: one invisible, which Chrome uses for hover events, and the visible system cursor.

Scrolling

We use browser automation tools to simulate natural-looking scrolling. Our algorithm has evolved through multiple versions, now incorporating various randomizations and improvements for more realistic scrolling.
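To give a flavor of what randomized scrolling can mean, here is a small sketch that generates a scroll plan. It is an illustration only and does not reproduce Sendler's algorithm: the function name `scroll_plan`, the step and pause ranges, and the 10% chance of a longer pause are all made-up assumptions.

```shell
# Print a randomized scroll plan: one "pixels pause_ms" pair per line.
# The step sizes add up to exactly the requested distance.
scroll_plan() {
    total=$1       # distance to scroll, in pixels
    seed=${2:-}    # optional seed, for reproducible plans
    awk -v total="$total" -v seed="$seed" 'BEGIN {
        if (seed == "") srand(); else srand(seed)
        left = total
        while (left > 0) {
            step = 40 + int(rand() * 120)     # 40..159 px per step
            if (step > left) step = left      # do not overshoot the target
            pause = 15 + int(rand() * 60)     # 15..74 ms between steps
            if (rand() < 0.1) pause += 400    # occasional longer pause
            print step, pause
            left -= step
        }
    }'
}

# Usage: print a plan for scrolling 2000 pixels.
# scroll_plan 2000
```

Each pair could then be handed to a browser automation tool, for example by calling `window.scrollBy(0, pixels)` in the page and waiting `pause_ms` milliseconds before the next step.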

AWS servers

Rendering web pages is compute-intensive. We found that the default server types weren't sufficient, especially when running multiple recordings simultaneously. We had to optimize our program and upgrade to more expensive, compute-optimized servers.

Handling graphical effects, like those generated by WebGL, adds another layer of complexity. Configuring AWS servers to render these effects faithfully while remaining cost-effective has been a significant challenge, though fortunately, these tasks are not frequent yet.

FFmpeg

FFmpeg is not just for recording screencasts; it's a versatile tool for a variety of video processing tasks. We use it for merging recorded website videos with user uploads, cropping, generating GIFs, creating circular videos, rescaling, encoding, and decoding.
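As one example of these tasks, a recording can be converted to a GIF with a generated color palette, which usually looks better than FFmpeg's default GIF output. This is a common FFmpeg recipe rather than Sendler's exact command; the helper name `gif_filter` and the frame rate and width values are assumptions.

```shell
# Build a filter chain that turns a video into a palette-optimized GIF.
gif_filter() {
    fps=$1 width=$2
    # fps lowers the frame rate; scale resizes to the given width and keeps
    # the aspect ratio (-1). split duplicates the stream: one copy feeds
    # palettegen, which computes a color palette, and paletteuse then maps
    # the other copy onto that palette.
    echo "fps=$fps,scale=$width:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse"
}

# Usage (requires FFmpeg and an existing chrome_rec.mp4):
# ffmpeg -i chrome_rec.mp4 -vf "$(gif_filter 10 480)" chrome_rec.gif
```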

example-gif

FFmpeg's filter graphs - a set of inputs, outputs, and processing steps - can seem daunting at first but become manageable with practice. Here's a complex example that demonstrates its power, processing input.MOV with a transparency mask through various steps to create output.MOV. The example generates the round distorted video effect above:

ffmpeg -i input.MOV -loop 1 -i snapmask.png -filter_complex "[0:v]split[a][b];[1:v]alphaextract,scale=1080:1080[mask];[a]scale=1080:1080[ascaled];[ascaled][mask]alphamerge[masked];[b]crop=946.56:532:70.72:278,boxblur=10:5,scale=1920:1080[background];[background][masked]overlay=420:0" -c:a copy output.MOV
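The filter graph is easier to read when it is assembled one step at a time. The sketch below rebuilds the same graph string; the function name `filter_graph` is hypothetical, while the filters and values are taken from the command above.

```shell
# Assemble the filter graph from the command above, one step per line.
# 1. split:        copy the input video into two streams, a and b.
# 2. alphaextract: turn the mask's alpha channel into a grayscale image
#                  and scale it to 1080x1080.
# 3. scale:        squeeze stream a to 1080x1080, ignoring its aspect ratio.
# 4. alphamerge:   use the mask as the alpha channel of the scaled video.
# 5. background:   crop stream b, blur it, and scale it to 1920x1080.
# 6. overlay:      place the masked video at x=420, which centers a
#                  1080-pixel-wide video on a 1920-pixel-wide background.
filter_graph() {
    printf '%s;' \
        '[0:v]split[a][b]' \
        '[1:v]alphaextract,scale=1080:1080[mask]' \
        '[a]scale=1080:1080[ascaled]' \
        '[ascaled][mask]alphamerge[masked]' \
        '[b]crop=946.56:532:70.72:278,boxblur=10:5,scale=1920:1080[background]'
    printf '%s\n' '[background][masked]overlay=420:0'
}

# Usage (requires FFmpeg, input.MOV, and snapmask.png):
# ffmpeg -i input.MOV -loop 1 -i snapmask.png \
#     -filter_complex "$(filter_graph)" -c:a copy output.MOV
```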

Without Chrome, Xvfb, X11, and FFmpeg, Sendler's video recording backend would look very different.

This was the pilot devblog post, marking the beginning of what we hope will become a series of monthly updates from the Sendler development team.