Artifacts Management in Container-Native Workflows (Part 1)


Part 2 of the series is also ready!

Argo empowers users to define and run container-native workflows on Kubernetes. In Argo, each step of the workflow is a container (or more than one if you use Docker-in-Docker), and steps can run sequentially or in parallel. The Argo workflow engine traverses the graph of a workflow (which can be seen as a variation of a directed acyclic graph, or DAG), and once a step completes successfully, the next one is started.

Here is an example of a container-native workflow in Argo:

Workflow in which each step is a container

This workflow contains both sequential steps (S1, S2, S3.*, S4) and parallel steps (S3.1, S3.2, S3.3), and each step may run the same or a different container image. The result of each individual container step determines the outcome of the whole workflow.
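To make the traversal concrete, here is a minimal Python sketch of the idea (not Argo's actual engine): a step starts once all of its predecessors have completed successfully, and steps whose dependencies are already satisfied run in parallel. The step names and dependencies match the example above; everything else is hypothetical.

```python
# Minimal sketch of DAG traversal; NOT Argo's actual workflow engine.
from concurrent.futures import ThreadPoolExecutor

# Hypothetical workflow from the figure: S1 -> S2 -> (S3.1, S3.2, S3.3) -> S4
deps = {
    "S1": [],
    "S2": ["S1"],
    "S3.1": ["S2"], "S3.2": ["S2"], "S3.3": ["S2"],
    "S4": ["S3.1", "S3.2", "S3.3"],
}

def run_step(name):
    # In Argo this would launch a container; here we just print.
    print(f"running {name}")
    return True  # stand-in for a zero exit code

def run_workflow(deps):
    done = set()
    with ThreadPoolExecutor() as pool:
        while len(done) < len(deps):
            # Steps whose predecessors have all completed are ready to run.
            ready = [s for s in deps
                     if s not in done and all(d in done for d in deps[s])]
            # Independent steps are submitted together, i.e. run in parallel.
            results = list(pool.map(run_step, ready))
            if not all(results):
                raise RuntimeError("a step failed; workflow stops")
            done.update(ready)

run_workflow(deps)
```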

With container-native workflows, you can achieve amazing things like building, testing, and deploying your apps with a single click of a button, and do so consistently because each step is containerized. This is all great, except that containers are stateless in the sense that, other than the exit code of a container step, no context is saved after a container finishes running. To make a container-native workflow more useful, it would be highly desirable to achieve the following:

  1. Provide a way to archive the artifacts (binaries, log files, etc.) generated from each step.
  2. Allow later steps to use the artifacts generated by earlier steps.
  3. Allow reuse of the artifacts in subsequent workflow executions.

First try…using persistent volumes

One can use persistent volumes to run stateful containerized applications, and this is indeed what we first looked into. Kubernetes provides a rich set of solutions for managing volumes, so if we could mount the correct volume for each container step, we could easily store and share the artifacts produced in a workflow. When running Argo on AWS, persistent volumes map to EBS volumes.

Using an AWS EBS volume to save artifacts

In this approach, from time T1 to time T4, an EBS volume is mounted on the Kubernetes node under the host path /data so that containers running on that host can save artifacts into the container path /src.
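For illustration, here is a hedged sketch of what such a step pod could look like, using the Kubernetes Python client. The pod name and image are hypothetical and the actual pod spec may differ; the point is simply that the host path /data (backed by the EBS volume) is exposed inside the container at /src.

```python
# Hedged sketch only: a step pod that exposes the EBS-backed host path /data
# inside the container at /src. Names and images are hypothetical.
from kubernetes import client

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="step-s2"),       # hypothetical name
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="build",
                image="example/build-image:latest",      # hypothetical image
                volume_mounts=[client.V1VolumeMount(name="artifacts",
                                                    mount_path="/src")],
            )
        ],
        volumes=[
            # Assumes the EBS volume is attached to the node and mounted
            # at /data on the host.
            client.V1Volume(
                name="artifacts",
                host_path=client.V1HostPathVolumeSource(path="/data"),
            )
        ],
    ),
)
```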

However, we quickly realized this solution would not fly: it does not work for parallel steps. The parallel steps of a workflow can be scheduled on multiple nodes in a Kubernetes cluster, and since an EBS volume can only be mounted on one host at any given time, not all of these parallel steps will have access to the artifacts generated by the previous step(s). It may be possible to force all the parallel steps onto a single node in the cluster, but that limits scalability and performance, and it may not even be possible if the node cannot accommodate all the parallel steps because of resource constraints. In our example, steps S3.1, S3.2, and S3.3 can be scheduled to run on different nodes at the same time, but the EBS volume (vol-1234567) cannot be mounted on more than one host.

The limitations of using an EBS volume for storing artifacts led us to choose among the following options (not an exhaustive list, but the ones we considered):

  1. Do not support running parallel steps (and always run steps sequentially if artifacts are required)
  2. Use something like a network filesystem (such as EFS) that supports multi-host mounting
  3. Use an object storage (such as S3) that allows concurrent access

Option 1 is clearly not acceptable. Option 2 (NFS) can work, since it allows concurrent access. Option 3 (S3) also works, as multiple nodes in a cluster can concurrently access an S3 bucket. In addition, option 3 is more cost-effective than option 2: S3 object storage is about 10x cheaper than EFS (NFS) on AWS.

S3 object storage also benefits from ubiquity: most cloud providers offer an S3-compatible object store, and one can use a solution like Minio for on-premises deployments. Given this, we picked S3 as the backing store for artifacts in Argo.

Storing artifacts using Object Storage (S3)

Here is the flow of artifacts in Argo using S3 object storage: we download the required artifacts from S3 before the container starts and upload the resulting artifacts to S3 after the container finishes.

Using S3 to archive artifacts
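To make this flow concrete, here is a hedged sketch of the download-before/upload-after logic using boto3. The bucket name, key scheme, and helper function names are hypothetical and not Argo's actual implementation; pointing the client at an S3-compatible store such as Minio only requires an endpoint_url.

```python
# Hedged sketch of the per-step artifact flow: pull required artifacts from S3
# before the user's container runs, push resulting artifacts back afterwards.
# Bucket, key scheme, and function names are hypothetical.
import boto3

s3 = boto3.client("s3")  # for Minio, add e.g. endpoint_url="http://minio:9000"

ARTIFACT_BUCKET = "argo-artifacts"   # hypothetical bucket
ARTIFACT_DIR = "/src"                # shared artifact directory inside the pod

def download_inputs(workflow_id, names):
    """Run before the user container starts (the 'init' phase)."""
    for name in names:
        s3.download_file(ARTIFACT_BUCKET,
                         f"{workflow_id}/{name}",
                         f"{ARTIFACT_DIR}/{name}")

def upload_outputs(workflow_id, names):
    """Run after the user container finishes (the 'post' phase)."""
    for name in names:
        s3.upload_file(f"{ARTIFACT_DIR}/{name}",
                       ARTIFACT_BUCKET,
                       f"{workflow_id}/{name}")
```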

The idea is that instead of running only the user-defined container of the step, we place an init container and a post container in the same pod as the user-defined container. The init container downloads all the required artifacts from S3, and the post container uploads all the resulting artifacts to S3. Since these three containers are within the same Kubernetes pod, they are guaranteed to run on a single host. We then define an emptyDir volume for the pod so that all three containers can share the artifact directory. Because the three containers run in sequence (init -> build -> post) within a pod, we are able to run as many pods in parallel as needed.
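Below is a hedged sketch of that pod layout using the Kubernetes Python client. Note that plain Kubernetes starts a pod's regular containers concurrently; sequencing the post container after the build container is something Argo handles itself (details in Part 2), so the spec below only illustrates the shared emptyDir arrangement. All names and images are hypothetical.

```python
# Hedged sketch: init, build, and post containers sharing an emptyDir at /src.
# The ordering of the "post" container is handled by Argo, not by this spec.
from kubernetes import client

shared = client.V1VolumeMount(name="artifacts", mount_path="/src")

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="step-s3-1"),          # hypothetical
    spec=client.V1PodSpec(
        restart_policy="Never",
        init_containers=[
            client.V1Container(name="init",
                               image="example/artifact-helper",  # hypothetical
                               args=["download"],
                               volume_mounts=[shared]),
        ],
        containers=[
            client.V1Container(name="build",
                               image="example/build-image",      # hypothetical
                               volume_mounts=[shared]),
            client.V1Container(name="post",
                               image="example/artifact-helper",  # hypothetical
                               args=["upload"],
                               volume_mounts=[shared]),
        ],
        volumes=[client.V1Volume(
            name="artifacts",
            empty_dir=client.V1EmptyDirVolumeSource())],
    ),
)
```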

Summary

This is an overview of the challenges we encountered and how we arrived at the artifacts management solution in Argo. In the next part of this blog, we will talk in more depth about:

  1. How exactly we arrange the containers in the pod (so that users do not have to modify their existing images to handle artifacts).
  2. How we handle the live and completed logs of a container step, which are essentially another type of artifact.

In Part 2, I will cover the implementation details of the solution in depth.

Tianhe Zhang is a member of the technical staff at Applatix, a startup committed to helping users realize the value of containers and Kubernetes in their day-to-day work.

