Recently I was working on a deployment of TAP for a customer on top of a few TKGi clusters.
While TAP works on any conformant Kubernetes cluster, as I have said many times, Kubernetes is not 100% cloud agnostic, and every distribution can have some weird quirks.
When deploying to TKGi, a few of these quirks came up, and in this post we will discuss them and how to solve them.
Docker Issue
One important thing to validate is that you have migrated from Docker to containerd on your TKGi clusters before installing TAP.
In most cases this happens automatically when moving to a recent version of TKGi, but if for some reason you are still on Docker, be warned that the installation will not succeed. TAP does not work with the Docker container runtime in Kubernetes and requires containerd in order to function properly.
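If you are not sure which runtime a cluster is on, a quick generic check (nothing TKGi specific here) is to look at the CONTAINER-RUNTIME column that kubectl reports for each node:

kubectl get nodes -o wide

Any node still reporting a docker:// runtime version needs to be migrated before you proceed.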
Contour Issue
When we deployed TAP onto the clusters, we hit an issue where the Envoy pods of Contour simply would not start.
After investigating this together with GSS, we found that the Contour package provided in TAP does not work out of the box on clusters whose nodes are configured for IPv4-only networking, with IPv6 disabled at the node level.
This stems from a change in the Tanzu packaging of Contour: as of TAP 1.3, Contour defaults to IPv6 with IPv4 compatibility.
When debugging the issue, we found the following in the Envoy pod logs:
[2023-03-16 12:05:57.334][1][info][upstream] [source/common/upstream/cds_api_helper.cc:35] cds: add 10 cluster(s), remove 2 cluster(s)
[2023-03-16 12:05:57.334][1][info][upstream] [source/common/upstream/cds_api_helper.cc:72] cds: added/updated 0 cluster(s), skipped 10 unmodified cluster(s)
[2023-03-16 12:15:09.584][1][warning][config] [source/common/config/grpc_subscription_impl.cc:126] gRPC config for type.googleapis.com/envoy.config.listener.v3.Listener rejected: Error adding/updating listener(s) ingress_http: malformed IP address: ::
ingress_https: malformed IP address: ::
stats-health: malformed IP address: ::
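For reference, we pulled these logs from the Envoy daemonset. Assuming the default TAP layout, where the Contour package lands in the tanzu-system-ingress namespace, something like the following should work:

kubectl logs -n tanzu-system-ingress ds/envoy -c envoy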
To solve this, we need to add a simple overlay that changes the flags passed to the Contour deployment and switches it to use IPv4 instead of IPv6.
The first step is to create a secret with the overlay, like the one below:
apiVersion: v1
kind: Secret
metadata:
  name: ipv4-overlay
  namespace: tap-install
stringData:
  ipv4-overlay.yaml: |
    #@ load("@ytt:overlay", "overlay")
    #@overlay/match by=overlay.subset({"metadata":{"name":"contour"}, "kind": "Deployment"})
    ---
    spec:
      template:
        spec:
          containers:
          #@overlay/match by="name"
          - name: contour
            #@overlay/replace
            args:
            - serve
            - --incluster
            - '--xds-address=0.0.0.0'
            - --xds-port=8001
            - '--stats-address=0.0.0.0'
            - '--http-address=0.0.0.0'
            - '--envoy-service-http-address=0.0.0.0'
            - '--envoy-service-https-address=0.0.0.0'
            - '--health-address=0.0.0.0'
            - --contour-cafile=/certs/ca.crt
            - --contour-cert-file=/certs/tls.crt
            - --contour-key-file=/certs/tls.key
            - --config-path=/config/contour.yaml
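Assuming you saved the manifest above to a file called ipv4-overlay-secret.yaml (the file name is just an example), creating the secret is a plain kubectl apply:

kubectl apply -f ipv4-overlay-secret.yaml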
Next we need to update our TAP values file to instruct TAP to apply this overlay to the Contour package.
This can easily be done by adding a snippet like the one below to the package_overlays section of our TAP values:
package_overlays:
  - name: contour
    secrets:
      - name: ipv4-overlay
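For the change to roll out, the TAP package installation needs to pick up the new values. A sketch with the tanzu CLI, assuming your installation is named tap in the tap-install namespace and your values live in tap-values.yaml:

tanzu package installed update tap -n tap-install --values-file tap-values.yaml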
This solves the issue: once the overlay is applied to the cluster, the Envoy pods will enter a running state and Contour will deploy as expected.
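You can watch this happen (again assuming the default tanzu-system-ingress namespace):

kubectl get pods -n tanzu-system-ingress -w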
Source Testing Issue
On TKGi, when using NCP as the CNI, there are some quirks one can encounter. One of them is related to the fact that NCP syncs labels from pods into NSX tags.
The issue seems to be with labels that have a value of "true", such as:
apps.tanzu.vmware.com/auto-configure-actuators: "true"
apps.tanzu.vmware.com/has-tests: "true"
With these labels, NCP seems unable to create the tag, and as a result it never finishes the networking configuration of the pod, leaving the pods stuck in an initialization phase indefinitely.
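For context, these labels are not something you add by hand; they are stamped onto the pods from the workload definition, for example when opting in to the testing supply chain. A hypothetical workload that would carry them:

tanzu apps workload create my-app \
  --git-repo https://github.com/example/my-app \
  --git-branch main \
  --type web \
  --label apps.tanzu.vmware.com/has-tests=true

(The workload name and repository here are made up for illustration.)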
This issue is odd, but it can easily be fixed by adding a simple overlay that removes these labels from the pods before they are created.
The first step is to create a secret as follows:
apiVersion: v1
kind: Secret
metadata:
  name: testing-template-labels-overlay
  namespace: tap-install
type: Opaque
stringData:
  testing-template-labels-overlay.yaml: |
    #@ load("@ytt:overlay","overlay")

    #@ def testing_template_matcher():
    apiVersion: carto.run/v1alpha1
    kind: ClusterSourceTemplate
    metadata:
      name: testing-pipeline
    #@ end

    #@overlay/match by=overlay.subset(testing_template_matcher())
    ---
    spec:
      ytt: |
        #@ load("@ytt:data", "data")
        #@ load("@ytt:overlay", "overlay")

        #@ def merge_labels(fixed_values):
        #@   labels = {}
        #@   if hasattr(data.values.workload.metadata, "labels"):
        #@     labels.update(data.values.workload.metadata.labels)
        #@   end
        #@   labels.update(fixed_values)
        #@   return labels
        #@ end

        #@ def bad_labels():
        #@ if/end hasattr(data.values.workload.metadata.labels, "apps.tanzu.vmware.com/has-tests"):
        #@overlay/remove
        apps.tanzu.vmware.com/has-tests: "true"
        #@ if/end hasattr(data.values.workload.metadata.labels, "apps.tanzu.vmware.com/auto-configure-actuators"):
        #@overlay/remove missing_ok=True
        apps.tanzu.vmware.com/auto-configure-actuators: "true"
        #@ end

        #@ def merged_tekton_params():
        #@   params = []
        #@   if hasattr(data.values, "params") and hasattr(data.values.params, "testing_pipeline_params"):
        #@     for param in data.values.params["testing_pipeline_params"]:
        #@       params.append({ "name": param, "value": data.values.params["testing_pipeline_params"][param] })
        #@     end
        #@   end
        #@   params.append({ "name": "source-url", "value": data.values.source.url })
        #@   params.append({ "name": "source-revision", "value": data.values.source.revision })
        #@   return params
        #@ end

        ---
        apiVersion: carto.run/v1alpha1
        kind: Runnable
        metadata:
          name: #@ data.values.workload.metadata.name
          labels: #@ overlay.apply(merge_labels({ "app.kubernetes.io/component": "test" }), bad_labels())
        spec:
          #@ if/end hasattr(data.values.workload.spec, "serviceAccountName"):
          serviceAccountName: #@ data.values.workload.spec.serviceAccountName

          runTemplateRef:
            name: tekton-source-pipelinerun
            kind: ClusterRunTemplate

          selector:
            resource:
              apiVersion: tekton.dev/v1beta1
              kind: Pipeline
            #@ not hasattr(data.values, "testing_pipeline_matching_labels") or fail("testing_pipeline_matching_labels param is required")
            matchingLabels: #@ data.values.params["testing_pipeline_matching_labels"] or fail("testing_pipeline_matching_labels param cannot be empty")

          inputs:
            tekton-params: #@ merged_tekton_params()
Once this secret is applied, we simply need to use the package_overlays section in the TAP values file to instruct TAP to apply this change to the OOTB Templates package:
package_overlays:
  - name: ootb-templates
    secrets:
      - name: testing-template-labels-overlay
Once this is applied and TAP reconciles, your testing pods will work as expected.
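If you want to double check, you can confirm that the stamped Runnable no longer carries the problematic labels. Assuming a workload named my-app in a developer namespace called dev (both hypothetical):

kubectl get runnable my-app -n dev -o jsonpath='{.metadata.labels}'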
Prisma Scanner Issue
For this customer we are using the Prisma scanner, and there too we encountered the label issue, as well as another issue regarding security context configuration.
These issues are easily fixable in two steps.
The label issue is fixed via the OOTB Templates package just like before, while the Prisma package, which is not part of the TAP installation itself, is fixed in a separate step.
To fix the label issue, we can create the following overlay secret:
apiVersion: v1
kind: Secret
metadata:
  name: scan-stamping-labels-overlay
  namespace: tap-install
type: Opaque
stringData:
  scan-stamping-labels-overlay.yaml: |
    #@ load("@ytt:overlay","overlay")

    #@ def scan_template_matcher():
    apiVersion: carto.run/v1alpha1
    kind: ClusterSourceTemplate
    metadata:
      name: source-scanner-template
    #@ end

    #@overlay/match by=overlay.subset(scan_template_matcher())
    ---
    spec:
      ytt: |
        #@ load("@ytt:data", "data")
        #@ load("@ytt:overlay", "overlay")

        #@ def merge_labels(fixed_values):
        #@   labels = {}
        #@   if hasattr(data.values.workload.metadata, "labels"):
        #@     labels.update(data.values.workload.metadata.labels)
        #@   end
        #@   labels.update(fixed_values)
        #@   return labels
        #@ end

        #@ def bad_labels():
        #@ if/end hasattr(data.values.workload.metadata.labels, "apps.tanzu.vmware.com/has-tests"):
        #@overlay/remove
        apps.tanzu.vmware.com/has-tests: "true"
        #@ if/end hasattr(data.values.workload.metadata.labels, "apps.tanzu.vmware.com/auto-configure-actuators"):
        #@overlay/remove missing_ok=True
        apps.tanzu.vmware.com/auto-configure-actuators: "true"
        #@ end

        ---
        apiVersion: scanning.apps.tanzu.vmware.com/v1beta1
        kind: SourceScan
        metadata:
          name: #@ data.values.workload.metadata.name
          labels: #@ overlay.apply(merge_labels({ "app.kubernetes.io/component": "source-scan" }), bad_labels())
        spec:
          blob:
            url: #@ data.values.source.url
            revision: #@ data.values.source.revision
          scanTemplate: #@ data.values.params.scanning_source_template
          #@ if data.values.params.scanning_source_policy != None and len(data.values.params.scanning_source_policy) > 0:
          scanPolicy: #@ data.values.params.scanning_source_policy
          #@ end
We can now use the package_overlays section in the TAP values file to apply these changes:
package_overlays:
  - name: ootb-templates
    secrets:
      - name: scan-stamping-labels-overlay
As mentioned above, we also need to update the Prisma package installation.
In this case, because Prisma is not installed as part of the TAP installation itself, we need to apply the overlay ourselves to the package installation.
First we create the overlay secret:
apiVersion: v1
kind: Secret
metadata:
  name: prisma-sec-context-overlay
  namespace: tap-install
type: Opaque
stringData:
  prisma-sec-context-overlay.yaml: |
    #@ load("@ytt:overlay","overlay")

    #@ def st_matcher():
    apiVersion: scanning.apps.tanzu.vmware.com/v1beta1
    kind: ScanTemplate
    #@ end

    #@overlay/match by=overlay.subset(st_matcher()), expects="1+"
    ---
    spec:
      template:
        #@overlay/match missing_ok=True
        #@overlay/remove
        securityContext:
          runAsNonRoot: true
We can now apply this overlay to our Prisma package installation with the following command:
kubectl annotate pkgi -n tap-install prisma ext.packaging.carvel.dev/ytt-paths-from-secret-name.0=prisma-sec-context-overlay
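kapp-controller will pick the overlay up on its next reconciliation of the package. If you do not want to wait for the sync interval, one option (assuming you have the Carvel kctrl CLI available) is to kick the reconciliation manually:

kctrl package installed kick -i prisma -n tap-install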
Summary
While there were indeed some issues encountered with TAP on TKGi, overall, with just a few overlays, we got everything working end to end pretty easily. The learning curve for YTT overlays and Carvel packaging is indeed steep, but once you get the hang of it, it is an extremely powerful and amazing toolset to have at your fingertips.