Preface
Recently, while working with TAP in an EKS environment, I started to get really annoyed with ECR.
While ECR is a perfectly capable OCI registry, it has a serious drawback when used with TAP: you must create a repository in advance before you can push an image to it.
As far as I know, ECR is the only registry with such a requirement, and since TAP pushes one or two different images per workload via the supply chain (one for the image itself and one for the deliverable if using the Registry Ops model), the burden of creating ECR repos was becoming a real pain.
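For a sense of the manual toil involved, this is roughly what you would have to run for every single workload before the supply chain could push anything (the repository names here match the example used later in this post):

  aws ecr create-repository --repository-name tap/workloads/example-app-default --region eu-west-2
  aws ecr create-repository --repository-name tap/workloads/example-app-default-bundle --region eu-west-2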
Finding A Solution
When looking at how to improve the situation, many options were available: I could deploy Harbor in the environment and use that instead, keep suffering with the current approach, or find a solution using tools I know and love from the CNCF landscape.
In the end I decided to go with the third approach and settled on using two of my favorite tools, Crossplane and Kyverno, to get the job done.
Solution Prerequisites
I won't go into detail on how to install Crossplane or Kyverno in this post, as there are already great resources out there for those tasks. Instead, this post assumes the following are already configured in your environment:
- TAP is installed
- Crossplane is installed
- Crossplane AWS provider is installed and configured with appropriate credentials to manage ECR repos (a sample ProviderConfig is sketched after this list)
- Kyverno is installed
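For reference, a minimal ProviderConfig for the classic Crossplane AWS provider (the provider serving the ecr.aws.crossplane.io API group used below) might look like this sketch. The secret details are assumptions that will vary by environment; the name aws-provider is what the examples later in this post reference:

  apiVersion: aws.crossplane.io/v1beta1
  kind: ProviderConfig
  metadata:
    name: aws-provider
  spec:
    credentials:
      source: Secret
      secretRef:
        namespace: crossplane-system  # assumes the usual Crossplane install namespace
        name: aws-creds               # hypothetical secret holding an AWS credentials file
        key: creds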
Once we have these prerequisites, we are ready to build out the solution.
The Solution
The first thing we want to do is build a Crossplane Composite Resource Definition (XRD) and a corresponding Composition that will create the ECR repos for us:
Composite Resource Definition:
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xworkloadecrrepos.tap.vrabbi.cloud
spec:
  group: tap.vrabbi.cloud
  names:
    kind: XWorkloadECRRepo
    plural: xworkloadecrrepos
  claimNames:
    kind: WorkloadECRRepo
    plural: workloadecrrepos
  versions:
  - name: v1alpha1
    served: true
    referenceable: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              parameters:
                type: object
                properties:
                  workloadName:
                    type: string
                  repoPrefix:
                    type: string
                  region:
                    type: string
                  providerName:
                    type: string
                required:
                - region
                - repoPrefix
                - workloadName
                - providerName
            required:
            - parameters
Composition:
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: workloadecrrepo
  labels:
    crossplane.io/xrd: xworkloadecrrepos.tap.vrabbi.cloud
    provider: aws
spec:
  writeConnectionSecretsToNamespace: crossplane-system
  compositeTypeRef:
    apiVersion: tap.vrabbi.cloud/v1alpha1
    kind: XWorkloadECRRepo
  resources:
  - name: imagerepo
    base:
      apiVersion: ecr.aws.crossplane.io/v1beta1
      kind: Repository
      spec:
        forProvider:
          forceDelete: true
    patches:
    - type: CombineFromComposite
      combine:
        variables:
        - fromFieldPath: "spec.parameters.repoPrefix"
        - fromFieldPath: "spec.parameters.workloadName"
        - fromFieldPath: "spec.claimRef.namespace"
        strategy: string
        string:
          fmt: "%s/%s-%s"
      toFieldPath: "metadata.annotations[crossplane.io/external-name]"
      policy:
        fromFieldPath: Required
    - type: FromCompositeFieldPath
      fromFieldPath: "spec.parameters.region"
      toFieldPath: "spec.forProvider.region"
    - type: FromCompositeFieldPath
      fromFieldPath: "spec.parameters.region"
      toFieldPath: "metadata.labels[region]"
    - type: FromCompositeFieldPath
      fromFieldPath: "spec.parameters.providerName"
      toFieldPath: "spec.providerConfigRef.name"
  - name: bundlerepo
    base:
      apiVersion: ecr.aws.crossplane.io/v1beta1
      kind: Repository
      spec:
        forProvider:
          forceDelete: true
    patches:
    - type: CombineFromComposite
      combine:
        variables:
        - fromFieldPath: "spec.parameters.repoPrefix"
        - fromFieldPath: "spec.parameters.workloadName"
        - fromFieldPath: "spec.claimRef.namespace"
        strategy: string
        string:
          fmt: "%s/%s-%s-bundle"
      toFieldPath: "metadata.annotations[crossplane.io/external-name]"
    - type: FromCompositeFieldPath
      fromFieldPath: "spec.parameters.region"
      toFieldPath: "spec.forProvider.region"
    - type: FromCompositeFieldPath
      fromFieldPath: "spec.parameters.region"
      toFieldPath: "metadata.labels[region]"
    - type: FromCompositeFieldPath
      fromFieldPath: "spec.parameters.providerName"
      toFieldPath: "spec.providerConfigRef.name"
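With both of those applied to the cluster (the file names below are just placeholders), we can verify that the XRD is established and the Composition is registered:

  kubectl apply -f xrd.yaml -f composition.yaml
  kubectl get xrd xworkloadecrrepos.tap.vrabbi.cloud
  kubectl get composition workloadecrrepo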
Now that we have those defined, we essentially have a new namespaced custom resource we can use called WorkloadECRRepo. An example instance looks like this:
apiVersion: tap.vrabbi.cloud/v1alpha1
kind: WorkloadECRRepo
metadata:
  name: example-app-repos
  namespace: default
spec:
  parameters:
    workloadName: example-app
    region: eu-west-2
    repoPrefix: xxxxxxx.dkr.ecr.eu-west-2.amazonaws.com/tap/workloads
    providerName: aws-provider
This in turn creates two ECR repos for us:
- xxxxxxx.dkr.ecr.eu-west-2.amazonaws.com/tap/workloads/example-app-default
- xxxxxxx.dkr.ecr.eu-west-2.amazonaws.com/tap/workloads/example-app-default-bundle
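Once the claim is applied, Crossplane's standard status columns give us a quick way to confirm that both repos were reconciled (exact output shapes may vary between provider versions):

  kubectl get workloadecrrepo example-app-repos -n default  # claim shows SYNCED/READY columns
  kubectl get repositories.ecr.aws.crossplane.io            # the two managed Repository resources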
Now that by itself makes life much easier, but adding Kyverno into the mix makes it easier still!
One of the great features in Kyverno is what's called a generate policy. A generate policy lets you declare that when a specific resource is created or updated, a corresponding set of resources should be created alongside it.
A simple and helpful example: when a namespace is created, we can automatically create a default network policy, an image pull secret, and maybe a CA cert secret in that namespace to help our developers get started.
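To make that namespace example concrete, a minimal sketch of such a policy (closely following the default-deny NetworkPolicy pattern from the Kyverno documentation) could look like this:

  apiVersion: kyverno.io/v1
  kind: ClusterPolicy
  metadata:
    name: add-default-networkpolicy
  spec:
    rules:
    - name: default-deny
      match:
        any:
        - resources:
            kinds:
            - Namespace
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny
        namespace: "{{request.object.metadata.name}}"
        synchronize: true
        data:
          spec:
            # deny all ingress and egress traffic in the new namespace by default
            podSelector: {}
            policyTypes:
            - Ingress
            - Egress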
The idea I had was to create a Kyverno generate policy so that when a Workload is created, an instance of the custom resource described above is created automatically.
This lets us automate the ECR repo creation and completely hide the complexity from our end users.
The first step is to grant Kyverno the RBAC rights to create the resources it will be generating for us (the labels on this ClusterRole allow it to be aggregated into Kyverno's own permissions):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/instance: kyverno
    app.kubernetes.io/name: kyverno
  name: kyverno:tap-helpers
rules:
- apiGroups:
  - tap.vrabbi.cloud
  resources:
  - workloadecrrepos
  verbs:
  - create
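A quick way to sanity-check that the permission took effect is kubectl auth can-i; note that the service account name here is an assumption, as it varies by Kyverno version and install method:

  kubectl auth can-i create workloadecrrepos.tap.vrabbi.cloud \
    --as=system:serviceaccount:kyverno:kyverno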
Next we need to define the ClusterPolicy that will generate the WorkloadECRRepo resource:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: create-workload-ecr-repos
spec:
  background: false
  rules:
  - name: create-workload-ecr-repos
    match:
      any:
      - resources:
          kinds:
          - Workload
    generate:
      kind: WorkloadECRRepo
      apiVersion: tap.vrabbi.cloud/v1alpha1
      name: "{{request.object.metadata.name}}-ecr-repos"
      namespace: "{{request.namespace}}"
      synchronize: false
      data:
        metadata:
          ownerReferences:
          - apiVersion: carto.run/v1alpha1
            kind: Workload
            name: "{{request.object.metadata.name}}"
            uid: "{{request.object.metadata.uid}}"
        spec:
          parameters:
            workloadName: "{{request.object.metadata.name}}"
            region: eu-west-2
            repoPrefix: xxxxxxx.dkr.ecr.eu-west-2.amazonaws.com/tap/workloads
            providerName: aws-provider
As you can see, the policy's rule matches when the source resource is of type Workload, and we stamp out the generated object using parameters from the incoming Workload resource, such as its name and namespace.
Another key thing to notice is the ownerReferences section I have added. This is a nice trick one should strongly consider using in Kyverno generate policies, as it ties the lifecycle of the generated resource to that of the source resource whose creation triggered it. By doing so, when the Workload resource is deleted, so too will the ECR repos be deleted.
This lets us clean up after ourselves and not leave garbage lying around in the system.
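To see the whole chain in action, creating a Workload (whether via kubectl or the tanzu CLI) should now leave a trail of resources we can inspect, and deleting it should garbage-collect them all:

  kubectl get workload example-app -n default
  kubectl get workloadecrrepo example-app-ecr-repos -n default  # generated by Kyverno
  kubectl get repositories.ecr.aws.crossplane.io                # created by Crossplane
  kubectl delete workload example-app -n default                # owner references cascade the cleanup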
With all of this in place, we can hide the complexity and limitations of ECR from our end users and get the same UX with ECR that TAP provides with other registries.
Summary
While this project was interesting in and of itself, I think it speaks to the bigger picture of one of TAP's key benefits.
Because TAP is built entirely on Kubernetes and is managed via CRDs and controllers in a very Kubernetes-native manner, the possible integrations with the CNCF landscape are endless, and that truly is a game changer in the PaaS world. As great as platforms like Heroku or Cloud Foundry may be, they don't have a community as large or as active as the CNCF landscape and the wider Kubernetes ecosystem, which can easily be leveraged to extend and enhance them.
I really love seeing how, as in this scenario, different tools can come together to form a cohesive solution, and how, through innovation and collaboration, we can overcome limitations and inconveniences in ecosystem tooling for the benefit of our end users.