Introducing Argo Rollouts v1.0

Jesse Suen
Published in Argo Project
6 min read · May 27, 2021

Progressive Delivery done right!

The Argo maintainers are proud to introduce Argo Rollouts v1.0! Read on to understand the motivations and design for this project and our journey to v1.0. Please visit our releases page for instructions on installing the release.

By late 2019, Intuit was in the midst of transitioning hundreds of services from traditional VMs onto Kubernetes. One of the biggest gaps developers faced with Kubernetes was the loss of sophisticated deployment strategies such as blue-green and canary that were possible using Spinnaker, our previous deployment tool. Because many legacy services required updating all pods at once to ensure API version consistency for repeated requests, standard rolling updates simply would not work for them. Some developers were used to staging preview stacks, testing the new version, then flipping the switch once testing was successful. Others were spoiled by Kayenta, an integrated canary analysis tool that worked seamlessly with Spinnaker.

We soon realized that a more advanced deployment controller specific to Kubernetes would be necessary to provide the same level of advanced update strategies that were available pre-Kubernetes. And that is how Argo Rollouts was born: to fill the immediate gap of providing blue-green and canary deployments on Kubernetes.
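
To make the blue-green case concrete, here is a minimal sketch of a Rollout that uses the blueGreen strategy. The names (demo-app, demo-app-active, demo-app-preview) and the image are hypothetical, and both Services are assumed to already exist. With autoPromotionEnabled set to false, the new version stays behind the preview Service until it is promoted, which roughly mirrors the "test, then flip the switch" workflow described above.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: demo-app                          # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-app
  template:                               # same pod template a Deployment would carry
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: demo-app
        image: example.com/demo-app:1.0.0 # placeholder image
  strategy:
    blueGreen:
      activeService: demo-app-active      # Service receiving production traffic (assumed to exist)
      previewService: demo-app-preview    # Service pointing at the new version for testing
      autoPromotionEnabled: false         # wait for a manual promotion before switching traffic
```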

While these new deployment strategies made deploys much safer, automation was still lacking. Promoting services to production and analyzing metrics were still manual steps. This brought us to the next phase of Rollouts: progressive delivery.

Progressive delivery is the practice of incrementally exposing new versions of your application to an initially small subset of users, then to gradually larger subsets, in order to mitigate the risk of negative impact (e.g. bugs).

This is ideally coupled with analysis of key business metrics (e.g. the four golden signals) so that promotions and rollbacks are fully automated and machine-driven. For more on progressive delivery and our approach, watch our KubeCon talk.
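
As an illustration of metric-driven analysis, the following is a minimal sketch of an AnalysisTemplate backed by Prometheus. The template name, the Prometheus address, and the http_requests_total metric with its service and code labels are all assumptions made for this example; substitute whatever your own monitoring stack exposes. The measurement counts as successful while at least 95% of requests are non-5xx, and the analysis fails once more than three measurements have failed.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate                      # hypothetical name
spec:
  args:
  - name: service-name                    # supplied by the Rollout that runs this analysis
  metrics:
  - name: success-rate
    interval: 1m                          # take a measurement every minute
    successCondition: result[0] >= 0.95   # Prometheus returns a vector; check its first element
    failureLimit: 3                       # tolerate up to three failed measurements
    provider:
      prometheus:
        address: http://prometheus.example.com:9090   # assumed Prometheus endpoint
        query: |
          sum(rate(http_requests_total{service="{{args.service-name}}",code!~"5.."}[5m]))
          /
          sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```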

As we developed the progressive delivery features of Rollouts, we knew extensibility and flexibility would be crucial to its success. With thousands of services at Intuit, and even more from the community, there would be no one-size-fits-all solution in terms of metrics or traffic shaping patterns. So we designed Argo Rollouts to let developers choose their own analysis metrics, customize their canary steps, or even choose their own ingress or service mesh provider. This flexibility has paid for itself in full, allowing us and the community to contribute key integration points such as the Prometheus, DataDog, NewRelic, Wavefront, Kayenta, Job, and custom Web metric providers.

With ingress and service meshes, Rollouts currently supports various traffic providers including Linkerd (SMI), Istio, AWS LoadBalancer, Ambassador, and NGINX.
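
To show how these pieces can fit together, here is a sketch of a canary Rollout that combines custom steps, an analysis step, and NGINX traffic routing. All names, the image, and the step weights and durations are hypothetical. The example assumes that the two Services and the Ingress already exist, and that an AnalysisTemplate named success-rate accepting a service-name argument is defined in the same namespace.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: demo-app                          # hypothetical name
spec:
  replicas: 5
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: demo-app
        image: example.com/demo-app:1.0.0 # placeholder image
        ports:
        - containerPort: 8080
  strategy:
    canary:
      canaryService: demo-app-canary      # Service selecting canary pods (assumed to exist)
      stableService: demo-app-stable      # Service selecting stable pods (assumed to exist)
      trafficRouting:
        nginx:
          stableIngress: demo-app-ingress # existing Ingress that routes to the stable Service
      steps:
      - setWeight: 10                     # send 10% of traffic to the canary
      - pause: {duration: 5m}             # let metrics accumulate
      - analysis:                         # run the analysis before widening exposure
          templates:
          - templateName: success-rate    # assumed AnalysisTemplate
          args:
          - name: service-name
            value: demo-app-canary
      - setWeight: 50
      - pause: {}                         # pause indefinitely until manually promoted
```

Swapping the trafficRouting block is how a different provider would be selected; the steps themselves stay the same.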

Fast-forward two years and two KubeCon sessions, and the Argo maintainers are proud to announce Argo Rollouts v1.0! While Rollouts has already been in production use for a few years by many of our users, powering critical flagship services including Intuit’s own QuickBooks and TurboTax, the fit and finish of the project has now reached a new level of maturity. We’re happy to guide you through some of the features in the v1.0 release.

Rollouts Dashboard

One of the harder parts of performing complex deployments such as canary and progressive delivery is observability: monitoring and understanding what is going on during the update. To make things more understandable for developers, a new rollout dashboard is available in the kubectl argo rollouts plugin. To start it, just run the dashboard command and visit the UI at http://localhost:3100/:

$ kubectl argo rollouts dashboard

At a glance, the dashboard presents the various rollouts in the namespace and the status of each update, and lets you perform administrative actions such as restarting and promoting.

Diving into a specific rollout reveals even more detail, such as previous revisions and canary steps, along with further actions to roll back, abort, update images, and so on.

While this view is great if you have local access to the namespace, it’s even possible to run and expose this dashboard via a Service/Ingress (manifests arriving shortly). Now that this dashboard is available, we’re thinking of ways to make this view also available in the Argo CD UI via extensions.

Richer Kubernetes Events and Prometheus Stats

Continuing the theme of increased usability, the Kubernetes events that a Rollout emits have been revamped with more details about what is happening during an update. New events are broadcast, and events now include revision and step information, along with more details about why the event occurred.

Additionally, all events will be emitted as Prometheus metrics. This makes it possible to visualize rollout events in an aggregated view in your Prometheus dashboards.

Enriching the events of a Rollout is just a starting point, since the framework now paves the way for the upcoming Rollout notification support. In the next release, it will be possible to receive notifications (Slack, PagerDuty, etc…) when a rollout pauses, aborts, completes, finishes a canary step, and on many other events.

Workload Referencing

We often describe Argo Rollouts as a drop-in replacement for a Deployment, which it is. Although it is easy to start a brand new service using a Rollout object, one challenge when migrating to Rollouts has been the need to duplicate the entire pod template spec of the Deployment into the Rollout spec. Starting with v1.0, there is an alternative way to migrate, called workload referencing.

The idea is that instead of copying and pasting the spec.template of the Deployment to the Rollout during the migration, you can simply reference the existing Deployment, like so:
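
A minimal sketch of such a Rollout follows, assuming an existing Deployment named rollout-ref-deployment in the same namespace whose pods carry the label app: rollout-ref-deployment. The names, replica count, and canary steps are illustrative. Note that the Rollout has no spec.template of its own; the workloadRef field points at the Deployment instead.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-ref-deployment          # illustrative name
spec:
  replicas: 5
  selector:
    matchLabels:
      app: rollout-ref-deployment       # must match the labels in the Deployment's pod template
  workloadRef:                          # reference the existing Deployment instead of copying spec.template
    apiVersion: apps/v1
    kind: Deployment
    name: rollout-ref-deployment        # assumed existing Deployment
  strategy:                             # only the update strategy lives in the Rollout
    canary:
      steps:
      - setWeight: 20
      - pause: {duration: 10s}
```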

This keeps a single definition/source of truth for the pod with the Deployment. But it also allows you to separate the spec of the update strategy from the pod template. Once your Rollout is scaled up and running, you can then scale down the Deployment to zero (since the pods will be managed by the Rollout).

If you are a kustomize user, you may be frustrated because until recently, the tool did not apply patches as expected to custom resources. Workload referencing helps solve this problem by allowing you to continue managing your pod spec patches against your existing Deployment objects normally while the Rollout only contains details about the update strategy.

Other Enhancements

Other notable features and fixes include (please check the release notes for a complete list):

  • Ambassador Edge Stack traffic shaping support
  • More options for canarying with Istio, using DestinationRules and cross-namespace VirtualServices
  • Kubectl plugin enhancements (linting, ability to wait for Rollout status and conditions)
  • Improvements to reduce chances of 500 routing errors
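
As an illustration of the DestinationRule option mentioned in the list above, here is a sketch of a canary Rollout that shifts traffic between Istio subsets. The resource names, route name, subset names, and image are hypothetical, and the VirtualService and DestinationRule are assumed to already exist with matching route and subset definitions.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: demo-app                          # hypothetical name
spec:
  replicas: 5
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: demo-app
        image: example.com/demo-app:1.0.0 # placeholder image
  strategy:
    canary:
      trafficRouting:
        istio:
          virtualService:
            name: demo-app-vsvc           # existing VirtualService (assumed)
            routes:
            - primary                     # route within the VirtualService whose weights are adjusted
          destinationRule:
            name: demo-app-destrule       # existing DestinationRule (assumed)
            canarySubsetName: canary      # subset that should receive canary traffic
            stableSubsetName: stable      # subset that should receive stable traffic
      steps:
      - setWeight: 10
      - pause: {}                         # wait for manual promotion
```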

Special Thanks

A large number of people and companies contributed time and resources to this release, and it wouldn’t have been possible without the support of the fantastic and rapidly growing Argo Rollouts community! We would especially like to call out the following companies for their support: Bilibili, Bucketplace, CodeFresh, DataDog, Datawire, Dynatrace, Intuit, NewRelic, Onfido, Paypal, Quizlet, Salesforce, Shipt, Skillz, and Spotify.

Future Direction

We hope you enjoyed reading about the highlights of the v1.0 release! Future releases will support many more ingress controllers and metric providers, and will leverage even more advanced traffic shaping techniques such as header-based routing and traffic shadowing. For more on what’s next, check out our roadmap. Each day we’re blown away by the continued momentum around the project and we’re excited to see what comes next. Cheers to safer application deliveries!

Co-founder and CTO at Akuity | Argo Project Leader with 6-year tenure | ex-Intuit | ex-Applatix | Running the Enterprise Company for Argo — https://akuity.io