The Challenge

We have a saying here on the Technical Operations team at Insight TV: “Work Smarter, not Harder.” This means adopting a CI/CD approach to everything we do and looking at how technology can optimise our operations and processes to support business initiatives.

As the business grew (and continues to grow), there was increasing demand and pressure on the Media Ops team to deliver our content in various formats and flavours, both to affiliates for content distribution deals and to our own digital platform. It quickly became clear we had to improve our on-premises infrastructure here in Amsterdam and combine it with cloud services to reduce transfers over the public internet and keep control of the entire lifecycle of our assets (a typical TX Master asset is close to 200 GB in size).

Adding flexibility, capacity and throughput to our infrastructure is one thing, but managing these new resources and optimising our return on investment is another. We quickly realised that running multiple instances of Bash and Python scripts, along with a multitude of watchfolders, was neither a sustainable nor a scalable solution.

The Solution

Enter Apache Airflow. Initially created at Airbnb in 2014, open-sourced in 2015 and later accepted into the Apache Software Foundation’s Incubator Program, it is now an Apache Top-Level Project.


Whilst hunting for an orchestration tool to help us tie all of our microservices and disparate resources together into a ‘single pane of glass’, I stumbled upon Apache Airflow.

The advantages were immediately obvious:

  • Python-based, using standard frameworks.
  • Familiar technology stack, i.e. Flask, PostgreSQL and Celery running on Ubuntu.
  • Easy to scale vertically and horizontally.
  • Existing hooks to cloud services.
  • Operator-friendly UI.
  • Easy to chain tasks and set dependencies.
  • Open source.
  • Actively developed, with a huge user base including some heavyweight tech companies.
  • Ability to pass metadata between tasks using XComs.

Using the Configuration-as-Code principle, workflows are authored as Directed Acyclic Graphs (DAGs) that define tasks, their running order and their dependencies, with each task implemented by an Operator. These DAGs are then picked up by the Scheduler and executed by a Worker or a cluster of Workers.
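As a rough illustration (not our production code), a DAG file is plain Python: tasks are declared and then chained with bit-shift operators. The task names and the Airflow 2.x import paths below are assumptions made for the sketch.

    # Minimal DAG sketch: three placeholder tasks chained in order.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="example_media_pipeline",   # illustrative name
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,            # run on demand rather than on a timetable
        catchup=False,
    ) as dag:
        fetch = BashOperator(task_id="fetch_source", bash_command="echo fetch")
        transcode = BashOperator(task_id="transcode", bash_command="echo transcode")
        deliver = BashOperator(task_id="deliver", bash_command="echo deliver")

        # Dependencies: fetch must finish before transcode, transcode before deliver.
        fetch >> transcode >> deliver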

Suitability for Media

Although Airflow is popular with data scientists for moving data and executing ETL pipelines, there was little to no information on its suitability for executing the long-running, resource-intensive tasks typical of media-related workflows.

These are some of the top features we found that made it suitable for our needs:

  • Easily extensible – utilise the full flexibility of Python (and its associated modules) within a task.
  • Manage hardware resources – resource pools and queues allowed us to manage (and load balance) resource-intensive tasks, e.g. CPU resources for transcode tasks or internet connectivity for inbound/outbound file transfers (see the sketch after this list).
  • Run Bash commands using the BashOperator.
  • Easy integration with media-specific resources (e.g. transcode, file QC, HDR/SDR up/down/cross-conversion, rewrapping etc.) without requiring a 3rd-party vendor to develop plug-ins.
  • Self-healing and decision-making tasks – failed tasks automatically retry, and a task can make decisions and drive the direction of downstream tasks based on the results or output of upstream tasks.
  • Use Variables to define fixed parameters for a DAG. Each Variable is stored as JSON, making it easily readable and editable by operators, e.g. setting a maximum nit value for an HDR upconversion is as simple as updating the appropriate parameter in the Web UI.
  • Consolidation of, and easy access to, resource logs through the UI.
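As a sketch of the resource-pool idea, a pool created in the Web UI (Admin > Pools) with a fixed number of slots caps how many heavy tasks run concurrently, while a Celery queue routes the task to workers with the right hardware. The pool, queue and task names below are placeholders, not our production configuration.

    from airflow.operators.bash import BashOperator

    transcode = BashOperator(
        task_id="transcode_uhd_master",
        bash_command="echo 'transcode here'",   # placeholder command
        pool="transcode_pool",      # limits how many transcodes run at once across all DAGs
        queue="transcode_workers",  # routes the task to Celery workers with the CPU for the job
    )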

Example Workflow

watch_for_file

A basic task that polls a designated folder on our NAS. We use a regex to ensure only the right files trigger the workflow, e.g. based on file-naming convention or extension. The task also pushes the triggered filename to an XCom for easy access by downstream tasks and moves the file to a working directory.
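A hedged sketch of how such a task can be built with a PythonSensor (Airflow 2.x import path). The paths, the naming-convention regex and the XCom key are assumptions, and the fragment is meant to sit inside a with DAG(...) block.

    import os
    import re
    import shutil

    from airflow.sensors.python import PythonSensor

    WATCH_DIR = "/mnt/nas/incoming"   # illustrative paths
    WORK_DIR = "/mnt/nas/work"
    PATTERN = re.compile(r".+_TXMASTER\.mxf$", re.IGNORECASE)   # hypothetical naming convention


    def _poke_for_file(ti):
        """Return True once a matching file has been found, moved and recorded."""
        for name in sorted(os.listdir(WATCH_DIR)):
            if PATTERN.match(name):
                shutil.move(os.path.join(WATCH_DIR, name), os.path.join(WORK_DIR, name))
                ti.xcom_push(key="triggered_file", value=name)   # downstream tasks read this
                return True
        return False   # nothing yet; the sensor will poke again later


    watch_for_file = PythonSensor(
        task_id="watch_for_file",
        python_callable=_poke_for_file,
        poke_interval=60,     # check the folder once a minute
        mode="reschedule",    # free the worker slot between pokes
    )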

trigger_next

Creates a new DAG run ready to pick up the next file.
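In Airflow terms this is a single TriggerDagRunOperator pointing back at the pipeline's own dag_id; the id below matches the illustrative DAG sketched earlier.

    from airflow.operators.trigger_dagrun import TriggerDagRunOperator

    # Re-arm the pipeline: start a fresh run of this DAG so the next file can be picked up.
    trigger_next = TriggerDagRunOperator(
        task_id="trigger_next",
        trigger_dag_id="example_media_pipeline",   # this DAG's own dag_id
    )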

get_transfer_characteristics

Call the MediaInfo CLI to look up the media file’s transfer characteristics and store the value in an XCom.
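One way to implement this is a templated BashOperator, whose last line of stdout is pushed to XCom automatically. The paths are placeholders, and the exact MediaInfo field name should be verified against your MediaInfo version.

    from airflow.operators.bash import BashOperator

    # Print only the video stream's transfer characteristics (e.g. "HLG" or "BT.709").
    get_transfer_characteristics = BashOperator(
        task_id="get_transfer_characteristics",
        bash_command=(
            "mediainfo --Inform='Video;%transfer_characteristics%' "
            "'/mnt/nas/work/{{ ti.xcom_pull(task_ids=\"watch_for_file\", key=\"triggered_file\") }}'"
        ),
    )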

hdr_sdr_detect

Determine if the returned Transfer Characteristic is HDR HLG or BT.709 (SDR) and push the relevant transcode profiles downstream.
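A sketch of the decision step: a plain PythonOperator whose return value (the chosen profile name) is pushed to XCom for the transcode task to pick up. The profile names are hypothetical.

    from airflow.operators.python import PythonOperator


    def _select_profile(ti):
        transfer = ti.xcom_pull(task_ids="get_transfer_characteristics") or ""
        # "HLG" in the transfer characteristics means an HDR source; anything else
        # (e.g. BT.709) is treated as SDR.
        return "uhd_hlg_profile" if "HLG" in transfer else "uhd_sdr_profile"


    hdr_sdr_detect = PythonOperator(
        task_id="hdr_sdr_detect",
        python_callable=_select_profile,   # return value becomes this task's XCom
    )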

submit_elemental_job 

Submit a transcode job to our AWS Elemental Server.

get_aws_job_id

Look up the Elemental job ID from the returned response.

poll_aws_job_id

Poll the Elemental Server to look up the status of the previously submitted job. The task completes once the transcode job completes, or fails if the transcode job fails.
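A heavily hedged sketch of the polling step: a reschedule-mode PythonSensor pulls the job ID pushed by get_aws_job_id and checks a status endpoint until the job finishes. The host, endpoint path and status strings are placeholders rather than the actual Elemental Server REST API.

    import requests

    from airflow.exceptions import AirflowFailException
    from airflow.sensors.python import PythonSensor

    ELEMENTAL_HOST = "http://elemental.example.local"   # placeholder host


    def _poll_job(ti):
        job_id = ti.xcom_pull(task_ids="get_aws_job_id")
        resp = requests.get(f"{ELEMENTAL_HOST}/api/jobs/{job_id}/status", timeout=30)
        resp.raise_for_status()
        status = resp.text.lower()
        if "error" in status:
            # Fail the task (and the DAG run) outright if the transcode job errored out.
            raise AirflowFailException(f"Transcode job {job_id} failed")
        return "complete" in status   # True ends the sensor successfully


    poll_aws_job_id = PythonSensor(
        task_id="poll_aws_job_id",
        python_callable=_poll_job,
        poke_interval=120,    # transcodes are long-running: poll every two minutes
        mode="reschedule",    # release the worker slot between polls
    )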

convert_to_xavcmxf

Call Nablet MediaEngine to encode our final XAVC-I Class 300 UHD master file.

remove_tmp_yuvfile

Delete the temporary v210 raw file generated by the Elemental.

move_to_done

Move the source file from the working directory to the done directory.
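Both housekeeping steps (remove_tmp_yuvfile and move_to_done) map naturally onto BashOperators; the directories and the temporary file name below are illustrative.

    from airflow.operators.bash import BashOperator

    # Jinja snippet re-used in both commands: the filename recorded by watch_for_file.
    SRC_FILE = "{{ ti.xcom_pull(task_ids='watch_for_file', key='triggered_file') }}"

    # Delete the temporary v210 intermediate produced by the Elemental transcode.
    remove_tmp_yuvfile = BashOperator(
        task_id="remove_tmp_yuvfile",
        bash_command=f"rm -f '/mnt/nas/tmp/{SRC_FILE}.yuv'",
    )

    # Archive the source file once the pipeline has finished with it.
    move_to_done = BashOperator(
        task_id="move_to_done",
        bash_command=f"mv '/mnt/nas/work/{SRC_FILE}' /mnt/nas/done/",
    )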

Lessons Learnt

  • Ensure your Airflow server has sufficient resources (CPU and RAM) to run all required services.
  • Use sensors in reschedule mode to prevent DAG concurrency contention.
  • Ensure all of your workers use the exact same versions of all components and modules including Airflow.
  • Consolidate your DAG variables into a single JSON Variable to eliminate multiple DB calls (see the sketch after this list).
  • Ensure your test and staging environments match production exactly; otherwise you will get inconsistent results, and DAGs can sometimes be challenging to troubleshoot.
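A quick sketch of the single-Variable lesson (the key and field names are illustrative): every Variable.get() is a query against the metadata database, so grouping all of a DAG's parameters under one JSON key turns several round trips into one.

    from airflow.models import Variable

    # Before: one metadata-DB query per parameter.
    # max_nits   = Variable.get("max_nits")
    # aspera_url = Variable.get("aspera_url")
    # work_dir   = Variable.get("work_dir")

    # After: a single query returning one JSON document maintained in the Web UI.
    dag_conf = Variable.get("uhd_pipeline_config", deserialize_json=True)

    max_nits = dag_conf["max_nits"]       # e.g. the HDR upconversion ceiling
    aspera_url = dag_conf["aspera_url"]
    work_dir = dag_conf["work_dir"]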

Current Integration Points

Airflow now provides us with a way to manage and integrate the following services and resources together:

File Storage/Transfer

  • Dell EMC Isilon
  • Synology NAS
  • AWS S3
  • Dropbox
  • Aspera

Transcode/File Processing

  • AWS Elemental Server
  • Nablet Media Engine
  • Technicolor ITM
  • Baton QC
  • MediaInfo
  • FFmpeg
  • Forensic Watermarking

Push Notifications

  • MS Teams

Conclusion

This is just the tip of the iceberg for us as we continue to build new workflow pipelines, handing off labour-intensive, repetitive tasks to machines so that we can be more creative and innovative. Stay tuned for more technical posts in the future!

Gavin Ho

Gavin is Insight TV's Technical Director. He manages the company's technical direction and makes sure its approaches are in line with the latest industry developments.