Securing GitOps Deployments in AWS EKS

April 28, 2020

As every engineer knows, software advancements often come with novel risks, regardless of improvements in security. Every new technology that aims to ease the ever-increasing demands of IT operations teams also brings new security challenges along with it.

For example, the invention of the Secure Shell (SSH) cryptographic network protocol was a canonical breakthrough in the operations field, finally enabling easy remote management of computers. For years, we have taken the security of the SSH protocol for granted, but SSH should still be used with some caution [1].

The container explosion of the 2010s put a new set of exciting tools into the hands of IT professionals. More and more IT organizations all over the world use Docker, or some equivalent, to manage their software life cycles and their continuous integration and continuous delivery (CI/CD) pipelines. Kubernetes has become a de facto standard for building resilient infrastructures and lowering cloud computing costs. VGS is no exception: we use Docker and Amazon Elastic Kubernetes Service (EKS) extensively to run Kubernetes clusters, tailor our operational experience, and create a seamless CI/CD pipeline.

When managing Kubernetes cluster deployments and continuous delivery on AWS EKS, there are several security challenges that operations teams need to overcome. The focus of this blog post is how we at VGS approach access authorization in a Kubernetes cluster. A typical day in our systems may include thousands of container launches in various clusters, and most of these launches are automated with GitOps: Git is the source of truth, and pushes and pull requests drive the deployment pipeline. This is an efficient and secure way to deploy to clusters.

Even though GitOps conveniently allows for rapid changes in live infrastructure and cluster deployments through an effortless CI/CD pipeline, there is an inherent risk involved in executing GitOps: the possibility of deploying an unwanted image to a live cluster. For this reason, among many others, securing GitOps pushes and pull requests becomes a priority for the security of the entire pipeline.

In many cases, an unwanted image in a cluster does not mean a specifically crafted rootkit. It may be a container built with a vulnerable version [2] of a framework that went undetected during development, as developers rarely read CVE feeds while monitoring their applications.

It may also be a local test image with stub data which, because of a small typo, can easily replace a production dataset after a mistaken commit.

Another possibility is that somebody accidentally deploys the newest version of your favorite database to a cluster through Git before the application layer has been changed to handle a backward-incompatible change in it.

Working with Kubernetes clusters securely would be impossible if the ecosystem did not address the risk of unwanted images.

This problem was recognized even before Kubernetes was born: Google’s Borg, a Kubernetes predecessor, includes a Binary Authorization [3] system to address just that. Conceptually unchanged, this system has migrated into the open-source world as the Grafeas [4] project.

Today, Grafeas powers the Google Cloud Container Analysis service, which provides a binary authorization toolkit for images running in Google Cloud, and Kritis [5], an open-source add-on installable on any modern Kubernetes cluster.

OK, but what about fellow Kubernetes users on AWS EKS? While there is no software as a service (SaaS) offering for Grafeas on EKS, it is easy to run yourself as a standalone deployment backed by either PostgreSQL or an embedded database, as the project provides Helm charts [6] to do so.

To install Kritis using the stock Helm chart [7], you need to ensure that the EKS control plane can reach port 443 on the worker nodes, because Kritis works as an admission webhook and is deployed on worker nodes. This might not be possible with some firewall configurations.
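
To make the webhook mechanics more concrete, here is a minimal Python sketch of the answer an admission webhook sends back to the API server. It is not Kritis code: `pod_images`, `review_pod` and the `is_authorized` callback are hypothetical names, and the sketch assumes the standard Kubernetes AdmissionReview envelope with a Pod as the reviewed object.

```python
from typing import Callable, Dict, List


def pod_images(pod: Dict) -> List[str]:
    """Collect image references from a Pod's init and regular containers."""
    spec = pod.get("spec", {})
    containers = spec.get("initContainers", []) + spec.get("containers", [])
    return [c["image"] for c in containers]


def review_pod(admission_review: Dict, is_authorized: Callable[[str], bool]) -> Dict:
    """Build an AdmissionReview response for a Pod admission request.

    `is_authorized` is a hypothetical callback standing in for the
    metadata lookup a real webhook would perform.
    """
    request = admission_review["request"]
    rejected = [img for img in pod_images(request["object"]) if not is_authorized(img)]

    # The response must echo the request uid and state the decision.
    response = {"uid": request["uid"], "allowed": not rejected}
    if rejected:
        response["status"] = {"message": "unauthorized images: " + ", ".join(rejected)}

    return {
        # Echo the API version of the request; v1 is assumed as a fallback.
        "apiVersion": admission_review.get("apiVersion", "admission.k8s.io/v1"),
        "kind": "AdmissionReview",
        "response": response,
    }
```

The API server calls the webhook over HTTPS for each matching request, which is why the control plane needs network access to the port the webhook listens on.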

Kritis uses image metadata, served by Grafeas, to decide whether an image is eligible to run in a cluster, and is configured with policies expressed as Kubernetes-native objects:

apiVersion: kritis.grafeas.io/v1beta1
kind: ImageSecurityPolicy
metadata:
  name: my-isp
  namespace: default
spec:
  attestationAuthorityName: kritis-authority
  privateKeySecretName: kritis-authority-key
  imageAllowlist:
    - gcr.io/my/image
  packageVulnerabilityRequirements:
    maximumSeverity: HIGH # LOW|MEDIUM|HIGH|CRITICAL|BLOCK_ALL|ALLOW_ALL
    maximumFixUnavailableSeverity: ALLOW_ALL # LOW|MEDIUM|HIGH|CRITICAL|BLOCK_ALL|ALLOW_ALL
    allowlistCVEs:
      - providers/goog-vulnz/notes/CVE-2017-1000082
      - providers/goog-vulnz/notes/CVE-2017-1000081
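
The way a policy like the one above could be evaluated can be sketched in Python. This is an illustration under stated assumptions, not the Kritis implementation: the functions `exceeds` and `violations`, the shape of the `findings` dictionaries, and the exact treatment of ALLOW_ALL and BLOCK_ALL are all assumptions made for this example.

```python
from typing import Dict, List

# Severity ordering assumed for this sketch; ALLOW_ALL and BLOCK_ALL are
# handled as special thresholds rather than as severities.
SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


def exceeds(severity: str, maximum: str) -> bool:
    """Return True if a finding of `severity` violates the `maximum` threshold."""
    if maximum == "ALLOW_ALL":
        return False
    if maximum == "BLOCK_ALL":
        return True
    return SEVERITY_RANK[severity] > SEVERITY_RANK[maximum]


def violations(image: str, findings: List[Dict], policy: Dict) -> List[str]:
    """List the vulnerability notes that make `image` fail `policy`.

    Each finding is a hypothetical dict with the keys `note`, `severity`
    and `fix_available`; `policy` mirrors the `spec` section of the YAML.
    """
    if image in policy.get("imageAllowlist", []):
        return []  # allowlisted images skip the vulnerability checks

    reqs = policy["packageVulnerabilityRequirements"]
    allowed_cves = set(reqs.get("allowlistCVEs", []))
    failed = []
    for finding in findings:
        if finding["note"] in allowed_cves:
            continue  # explicitly tolerated CVE
        # Findings without an available fix use their own threshold.
        if finding["fix_available"]:
            maximum = reqs["maximumSeverity"]
        else:
            maximum = reqs["maximumFixUnavailableSeverity"]
        if exceeds(finding["severity"], maximum):
            failed.append(finding["note"])
    return failed
```

With the policy shown above, a fixable CRITICAL finding would block an image in this sketch, while an unfixable one would pass because `maximumFixUnavailableSeverity` is `ALLOW_ALL`.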

Before setting up such a Kubernetes security policy, it is necessary to automate filling Grafeas storage with metadata, such as vulnerability findings or signatures. One may use tools like Shopify’s voucher [8] project for this process, or create custom scripts, as the Grafeas API is easily scriptable and available as an OpenAPI specification [9].

One significant caveat that comes with Binary Authorization, not necessarily specific to EKS, is that Kritis and Grafeas work exclusively with digest-style [10] image references, like:

quay.io/verygoodsecurity/software@sha256:5ff9720a4b....
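
A quick way to tell the two reference styles apart can be sketched in Python. The function name `is_digest_reference` is hypothetical, and the check assumes SHA-256 digests written as 64 lowercase hexadecimal characters.

```python
import re

# A digest reference ends in "@sha256:" followed by 64 hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_reference(image: str) -> bool:
    """Return True if `image` pins content by digest rather than by tag."""
    return DIGEST_RE.search(image) is not None
```

A tag such as `1.0.0` can be moved to point at different content, while a digest always identifies the same image bytes.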

An open issue [11] in the Kritis source repository requests a workaround for this, because most modern continuous delivery systems do not support that reference style. FluxCD, the one we use at VGS, does not [12] support it either: it requires the quay.io/verygoodsecurity/software:1.0.0 style to work.

As a workaround for this, we have implemented a reverse proxy service for Kritis, which runs as a sidecar in the Kritis deployment and provides tag resolution functionality, changing cluster deployment definitions in flight so that tags are replaced with the corresponding RepoDigest.
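
The core idea of the rewrite can be sketched in a few lines of Python. This is not the proxy's actual code: `pin_images` and the `resolve` callback are hypothetical, and a real resolver would query the registry instead of receiving the mapping from the caller.

```python
import copy
from typing import Callable, Dict


def pin_images(pod_spec: Dict, resolve: Callable[[str], str]) -> Dict:
    """Return a copy of `pod_spec` with tag references replaced by digests.

    `resolve` is a hypothetical callback that maps "repo:tag" to
    "repo@sha256:..."; references that already carry a digest are kept.
    """
    pinned = copy.deepcopy(pod_spec)  # leave the caller's object untouched
    for key in ("initContainers", "containers"):
        for container in pinned.get(key, []):
            if "@sha256:" not in container["image"]:
                container["image"] = resolve(container["image"])
    return pinned
```

Because the rewritten definition only ever contains digests, the admission check downstream sees the same reference style regardless of what the continuous delivery tool submitted.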

Here is an example sidecar deployment (do not forget to reconfigure Kritis to listen on port 8443 before running it):

- name: {{ .Values.imgproxy.image.name }}
  tty: true
  image: "{{ .Values.imgproxy.repository }}{{ .Values.imgproxy.image.image }}:{{ .Values.imgproxy.image.tag }}"
  args: ["--tls-cert-file=/var/sidecar/tls.crt",
         "--tls-key-file=/var/sidecar/tls.key",
         "--client-cert-file=/var/sidecar-client/client-cert.crt",
         "--client-ca-cert-file=/var/sidecar-client/client-ca.crt",
         "--client-key-file=/var/sidecar-client/client-key.key",
         "--quay-organization=verygoodsecurity",
         "--quay-token-file=/var/token/quay",
         "--upstream-uri=localhost:8443",
         "--port=443"]
  volumeMounts:
    - mountPath: /var/sidecar-client
      name: client-tls
    - mountPath: /var/sidecar
      name: reverse-tls
    - mountPath: /var/token
      name: repository-token
    # NOTE: this path repeats the reverse-tls mount above; Kubernetes requires
    # unique mount paths per container, so adjust it to match your layout.
    - mountPath: /var/sidecar
      name: ca-cert
  ports:
    - name: https
      containerPort: 443
      protocol: TCP

Because a RepoDigest is effectively an immutable cryptographic hash of the image content, tampering with tags in a cluster deployment won’t help an attacker bypass the assurances of Binary Authorization: Kritis allows only an authorized set of RepoDigests to be deployed.

Clone https://github.com/verygood-ops/kritis-reverse-proxy and run docker build in the checkout to build your Kritis image tag resolver proxy locally.

Feel free to submit pull requests that add more functionality, such as support for resolving tags into RepoDigests against registries other than Quay.

Resources:

  1. http://joeyh.name/blog/entry/ssh_port_forwarding/
  2. https://www.cisecurity.org/advisory/a-vulnerability-in-apache-struts-could-allow-for-remote-code-execution_2018-093/
  3. https://cloud.google.com/security/binary-authorization-for-borg
  4. https://grafeas.io/
  5. https://github.com/grafeas/kritis
  6. https://github.com/grafeas/grafeas/tree/master/grafeas-charts
  7. https://github.com/grafeas/kritis/tree/master/kritis-charts
  8. https://github.com/Shopify/voucher
  9. https://petstore.swagger.io/?url=https://raw.githubusercontent.com/grafeas/grafeas/master/proto/v1beta1/swagger/grafeas.swagger.json
  10. https://success.docker.com/article/images-tagging-vs-digests
  11. https://github.com/grafeas/kritis/issues/351
  12. https://github.com/fluxcd/flux/issues/885
Maksym Kulish

Infrastructure Engineer at VGS
