Dynamic build matrix in GitHub Actions

Kirill Müller


I had wanted to try out the new fromJSON() function, which allows dynamic build matrices in GitHub Actions, for quite some time. Today was the day.

GitHub Actions allows automating build and deployment processes (CI/CD), tightly integrated with GitHub. A build matrix is a way to define very similar workflows that differ only by configuration parameters. Usually, a build matrix is defined directly in the .yaml files together with the workflows. This blog post shows how to define these build matrices dynamically, so that the “source of truth” for the matrix definition is outside the .yaml file.

Photo by Markus Spiske

The configuration for a workflow is a YAML file that has a context and expression syntax with very few basic functions. Two very powerful functions are toJSON() and fromJSON():

  • toJSON() can capture pieces of the workflow configuration as JSON and pass it to your workflow code
  • fromJSON() allows injecting arbitrary configuration pieces created from JSON code
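As a minimal sketch of the two functions (my own illustrative example, not from the original post):

```yaml
jobs:
  demo:
    runs-on: ubuntu-latest
    steps:
      # toJSON() serializes a context value to a JSON string.
      - run: echo '${{ toJSON(github.event_name) }}'
      # fromJSON() parses a JSON string; here it yields the boolean true.
      - if: ${{ fromJSON('true') }}
        run: echo "this step runs"
```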

The basic setup comprises two jobs: one that creates the matrix definition as JSON and stores it as an output, and a dependent job that injects this output into its matrix definition via fromJSON(). A third job is defined to test whether outputs are passed correctly between jobs.

The original blog post contains a somewhat terse description. This post is a walkthrough of how I converted a static build matrix to a dynamic one in the DBItest project.

Original matrix

In DBItest, we test compatibility of new or updated tests with backend packages. Each backend is run in a build matrix, which is defined as follows:

jobs:
  backend:
    strategy:
      fail-fast: false
      matrix:
        package:
          - duckdb
          - RSQLite
          - RMariaDB
          - RPostgres
          - RKazam

The relevant backends are defined in the Makefile; we want to get the list from there so that we have a single source of truth.

This is a very simple build matrix, ideally suited for first experiments. The techniques shown here are applicable to build matrices of any complexity and size.

Derive and verify JSON

Our goal is to create the package: section from the above matrix in JSON format. To derive the JSON format, I use the sed stream editor, my beloved hammer that I use whenever I see a text transformation task in the shell:

echo '{ "package" : ['
## { "package" : [
sed -n "/^REVDEP *:= */ { s///; p }" revdep-dev/Makefile | sed 's/ /, /g' | xargs -n 1 echo | sed -r 's/^([^,]*)(,?)$/"\1"\2/'
## "RMariaDB",
## "RSQLite",
## "RPostgres",
## "RKazam",
## "duckdb"
echo "]}"
## ]}

This is not pretty, but it is still valid JSON when put together. We can prettify it with jq .; later we will use jq -c . to condense it to a single line.

(
  echo '{ "package" : ['
  sed -n "/^REVDEP *:= */ { s///; p }" revdep-dev/Makefile | sed 's/ /, /g' | xargs -n 1 echo | sed -r 's/^([^,]*)(,?)$/"\1"\2/'
  echo "]}"
) | jq .
{
  "package": [
    "RMariaDB",
    "RSQLite",
    "RPostgres",
    "RKazam",
    "duckdb"
  ]
}
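For reference, the sed pipeline can be unpacked step by step. The sketch below substitutes a here-variable for revdep-dev/Makefile, so the Makefile content is a stand-in; it also shows the jq -c . form we will use later:

```shell
# Stand-in for the REVDEP line in revdep-dev/Makefile.
makefile='REVDEP := RMariaDB RSQLite RPostgres RKazam duckdb'

json=$(
  echo '{ "package" : ['
  # 1. Keep only the REVDEP line and strip the "REVDEP := " prefix.
  # 2. Turn spaces into ", ", then print one item per line.
  # 3. Wrap each item in double quotes, keeping a trailing comma if present.
  echo "$makefile" |
    sed -n '/^REVDEP *:= */ { s///; p }' |
    sed 's/ /, /g' |
    xargs -n 1 echo |
    sed -E 's/^([^,]*)(,?)$/"\1"\2/'
  echo ']}'
)

# jq -c . condenses the result to the single line needed for a job output.
echo "$json" | jq -c .
```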

We verify the YAML version by piping to json2yaml, which can be installed with npm install json2yaml:

---
  package:
    - "RMariaDB"
    - "RSQLite"
    - "RPostgres"
    - "RKazam"
    - "duckdb"

These tools are preinstalled on the GitHub-hosted runners. This avoids time-consuming installation procedures in this first job, which needs to finish before the main jobs can even start.1

Define job

Once we have derived the JSON, we’re ready to define a job that creates the matrix. This must be done in the same workflow file where the matrix is used, ideally before the main job. The job runs on ubuntu-latest and must also clone the repository to read the Makefile. In the bash snippet, the $matrix variable contains the JSON. It is shown and pretty-printed before it is provided as output via echo ::set-output ....

jobs:
  matrix:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - uses: actions/checkout@v2

      - id: set-matrix
        run: |
          matrix=$( (
            echo '{ "package" : ['
            sed -n "/^REVDEP *:= */ { s///; p }" revdep-dev/Makefile | sed 's/ /, /g' | xargs -n 1 echo | sed -r 's/^([^,]*)(,?)$/"\1"\2/'
            echo "]}"
          ) | jq -c .)
          echo "$matrix"
          echo "$matrix" | jq .
          echo "::set-output name=matrix::$matrix"

  backend:
    # Original workflow
    # ...

Verify output

Before plugging the generated JSON into our build job, we add another job to verify that the generated JSON is transported correctly across job boundaries. needs: matrix declares that this job must wait until the matrix job succeeds. The matrix job’s output is queried via ${{ needs.matrix.outputs.matrix }}; the single quotes ensure that bash processes it correctly. We install and use json2yaml to double-check what the YAML snippet looks like.

jobs:
  matrix:
    # job defined above

  check-matrix:
    runs-on: ubuntu-latest
    needs: matrix
    steps:
      - name: Install json2yaml
        run: |
          sudo npm install -g json2yaml

      - name: Check matrix definition
        run: |
          matrix='${{ needs.matrix.outputs.matrix }}'
          echo "$matrix"
          echo "$matrix" | jq .
          echo "$matrix" | json2yaml

  backend:
    # Original workflow
    # ...

Use output

Finally, we’re ready to use the generated JSON as a build matrix. The workflow now uses matrix: ${{fromJson(needs.matrix.outputs.matrix)}} instead of the hard-coded matrix:

jobs:
  matrix:
    # see above

  check-matrix:
    # see above

  backend:
    needs: matrix

    strategy:
      fail-fast: false
      matrix: ${{fromJson(needs.matrix.outputs.matrix)}}

    # rest unchanged

This gives a workflow as shown in the image below. Click on the image to view the workflow on GitHub.

Final workflow with dynamic build matrix

Next steps

For R packages, I see two use cases where dynamic matrices can be useful:

  • Testing whether package checks pass when one suggested package is not installed. Ideally, we remove suggested packages one by one and run the checks in parallel.
  • Testing reverse dependencies. For some packages we may hit the limit of 256 jobs per workflow run. Allocating downstream packages to workers, minimizing the number of packages to be installed on each worker, sounds like an interesting optimization problem.
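The first idea can be sketched with the same JSON-generation trick. The package names below are placeholders, and the matrix key exclude is a hypothetical name:

```shell
# Hypothetical list of suggested packages to drop one at a time.
suggests='knitr rmarkdown testthat'

# Build {"exclude": [...]} so each matrix job can run the checks with one
# suggested dependency removed.
matrix=$(echo "$suggests" | tr ' ' '\n' | jq -R . | jq -cs '{ exclude: . }')
echo "$matrix"
```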

What are your use cases for dynamic build matrices? Drop us a line at mail@cynkra.com!

Caveats

Even with this simple build matrix, it took more time than I had hoped to get the bits and pieces right. Quoting is hard. Setting up the check-matrix job really saves time; I wish I had done it from the start.

Both fromJson() and fromJSON() appear to work. The internal functions from the expression syntax seem to be case-insensitive throughout.

For older versions, jq needs to be called as jq . to act as pretty-printer. For newer versions this can be omitted.

Today I also learned that workflows can be temporarily disabled. This is useful in situations where you experiment with a workflow and want to avoid running other workflows for every test.


  1. You can use any tool or ecosystem you are familiar with to come up with the JSON definition. To avoid long installation times, use a specific image for your step via uses: docker://... or implement a container action, which is also possible in the same repository.





Setting up a load-balanced Jitsi Meet instance

Patrick Schratz


Jitsi Meet is a self-hosted Free and Open-Source Software (FOSS) video conferencing solution. During the recent COVID-19 pandemic the project became quite popular and many companies decided to host their own Jitsi instance.

Photo by Chuttersnap

There are many different ways to install and run Jitsi on a machine. A popular choice in the DevOps space is to use Docker via docker-compose, which was the method used in our scenario.

While we at cynkra had been running our own Jitsi instance quite happily for some months, a slightly challenging task came up: hosting a virtual meeting for approximately 100 participants.

The Challenge

cynkra actively supports the local Zurich R User Group. For one of their recent meetings, about 100 people RSVP’ed.

When reading up on the load capabilities of a single Jitsi instance, one finds that the stock setup gets into trouble starting at around 35 participants and breaks down at around 70. The limiting factor is said to be the “videobridge”. One solution is to add a second videobridge to the Jitsi instance: Jitsi then distributes the load and should be able to host more than 100 people in a meeting.

The best approach is to deploy the second videobridge on a new machine to avoid running into CPU limits on the main machine. While there is a guide in the Jitsi Wiki and a video about it, many people struggle (1, 2) to get this set up successfully.

Hence, we thought it would be valuable to take another, hopefully simple and understandable, stab at explaining this task to the community.

Load-balancing Jitsi Meet

In the following we will denote the main machine which Jitsi runs on as MAIN. The second machine, which will only host a standalone videobridge, will be named BRIDGE.

  1. The first step is to create a working installation on MAIN, following the official docker guide from the Jitsi developers. There is no need to use Docker. An installation on the host system will also work.

    At this point, we assume that you have already installed Jitsi with SSL support at your domain, referred to as <DOMAIN> below.

  2. To be able to connect to the XMPP server (managed by prosody) on MAIN from BRIDGE (details in point 4 below), port 5222 needs to be exposed publicly. This requires adding

    ports:
      - "5222:5222"
    

    to the prosody section in docker-compose.yml and ensuring that the port is opened in the firewall (ufw allow 5222).

  3. On BRIDGE, start with the same .env and docker-compose.yml as MAIN.

    In docker-compose.yml, remove all services besides jvb. The videobridge will later connect to all services on MAIN.

    Make sure that JVB_AUTH_USER and JVB_AUTH_PASSWORD in .env are the same as on MAIN, otherwise the authentication will fail.

  4. On BRIDGE in .env change XMPP_SERVER=xmpp.<DOMAIN> to XMPP_SERVER=<DOMAIN>.

  5. Run docker-compose up and observe what happens. The videobridge should successfully connect to <DOMAIN>. On MAIN, in docker logs jitsi_jicofo_1, an entry should appear denoting that a new videobridge was successfully connected.

    It looks like

    Jicofo 2020-10-23 19:01:52.173 INFO: [29] org.jitsi.jicofo.bridge.BridgeSelector.log() Added new videobridge: Bridge[jid=jvbbrewery@internal-muc.<DOMAIN>/d789de303e9b, relayId=null, region=null, stress=0.00]
    

    If you have another videobridge running on MAIN, you should see that the identifier of the new videobridge (here d789de303e9b) is different to your main videobridge identifier. On BRIDGE the logs should show something like

    INFO: Joined MUC: jvbbrewery@internal-muc.<DOMAIN>
    INFO: Performed a successful health check in PT0S. Sticky failure: false
    

To test that the external videobridge is active, one can disable the main videobridge (docker stop jitsi_jvb_1) and try to enable the camera in a new meeting.

Troubleshooting and Tips

  • If you see something like SASLError using SCRAM-SHA-1: not-authorized this indicates that the JVB_AUTH_PASSWORD and/or JVB_AUTH_USER on BRIDGE are incorrect.

  • If you change something in .env on MAIN, you need to delete all config folders before running docker-compose up again. Otherwise, the changes won’t be picked up, even when force-recreating the containers.

  • Do not run gen-passwords.sh multiple times, as this changes JVB_AUTH_PASSWORD and BRIDGE will not be able to connect anymore.

  • Unrelated to the content above: if you want to create a user manually for your instance, the following command might be helpful:

    docker exec jitsi_prosody_1 prosodyctl --config /config/prosody.cfg.lua register <USER> <DOMAIN> "<PASSWORD>"
    




DevOps Expert (f/m/d, 60-100%)

cynkra team


We are on the lookout for a DevOps System Engineer. You are familiar with Linux, Docker, Git, CI/CD and Ansible. We offer interesting projects around the R ecosystem and a lot of freedom. Some on-site time will be required, but a lot of remote work is possible as well. This position is especially attractive if you are interested in data science and want to learn more about R.

Photo by James Pond

Your role

  • Responsibility for enhancing and maintaining our hosted RStudio service products and our internal infrastructure

  • Communicate with IT departments of our clients

  • Set up and maintain infrastructure as code

What we look for

  • Very good knowledge of Linux, Docker, Git, CI/CD and Ansible.

  • 60%-100% commitment

  • Ability and desire to learn and improve on the job

  • Very good command of written and spoken German

  • Good working knowledge of written and spoken English

  • An interest in R and data science in general is a plus.

What we offer

  • An open source friendly environment that encourages community contribution

  • Interesting projects around consulting and open source software development

  • Offices in Zurich: at Stauffacher and near ETH Hönggerberg

  • Flexible working hours with the possibility to work from home

How to apply

Please submit your application via mail@cynkra.com. Get in touch with us if you have further questions.

Who we are

cynkra is a Zurich-based data consulting company with a strong focus on R. We use R and the tidyverse in the vast majority of our projects. We are an RStudio Full Certified Partner.

We support businesses and organizations by helping them pick the right tools, and by implementing solutions, providing training, and reviewing code. We are enthusiastic about open source software and contribute to a large number of R packages. Learn more at www.cynkra.com.








