-
Integrating Trivy scanner in TAP
Introduction
One of the really great features in TAP is the pluggable architecture of the scanning tools.
By default, TAP integrates with Grype as the source code and image scanner; it also currently has beta support for Snyk and Carbon Black (both currently limited to image scanning).
While these are the solutions provided by VMware, the pluggable architecture of the scanning components allows us to easily plug in our own scanner of choice. This could be an open source scanner like Trivy or a proprietary tool like Aqua or Prisma.
In this blog post, we will discuss how one could build such a custom integration, using the very common scanner Trivy.
TLDR
I have built the integration for Trivy for both source code and image scanning, and it is published, including the source code, on GitHub at the following URL: https://github.com/vrabbi-tap/tap-scanner-integrations
Instructions on installing the packages can be found in that repo.
Overview
The goal is to create a scantemplate that will provide the exact same UX and features as the provided grype one offers but simply change which scanner we want to use.
The default scantemplate we will use as our baseline is called private-image-scan-template which is automatically created in your developer namespace defined in the TAP values file.
The scan template has the following flow in which each step runs in its own dedicated container:
- Scan the image
- Configure access to the metadata store
- input the scan results to the metadata store
- check compliance against the defined scan policy
- aggregate results and create final output
In order to integrate our own scanner, the only image we need to build ourselves is the image used for the scanning process itself. We will also make some changes to the command line flags passed to the other containers, but we can use the provided images as is without any issues.
Data is passed between the containers via a shared volume which is mounted into all containers at the path /workspace.
The general process we need to follow is:
- build an image that accepts an image URI as an input via an environment variable
- the image should run the scan against the inputted image
- the image must output the scan results in CycloneDX or SPDX SBOM format
- the image must output a summary YAML with the CVE count in the image split by severity
- the SBOM and summary yaml should be saved to files in the shared mounted volume so that they can be used in the following steps of the scanning process.
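To make the contract above a bit more concrete, here is a minimal sketch of a check you could run against the shared volume after a scan step. The helper name check_scan_outputs is hypothetical, and the file names scan.json and out.yaml are simply the ones used later in this post, so treat them as assumptions if your template differs.

```shell
#!/bin/sh
# Hypothetical helper (not part of TAP): verifies that a scan step left
# both expected files, non-empty, in the shared volume directory.
check_scan_outputs() {
  dir=$1
  status=0
  for f in scan.json out.yaml; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      status=1
    fi
  done
  return $status
}
```

Running `check_scan_outputs /workspace` inside a debug container would return a non-zero exit code if either file is missing or empty.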
Pre Requisites
- The first thing we need is a TAP environment with the testing and scanning supply chain installed.
We will utilize the out of the box scan templates defined for grype later on as a baseline to build from.
- A machine with docker installed to build our image
Let's get this working
Creating the scanning script
We will be basing the logic of our script on the script which is used in the official grype scan template which can be retrieved by running the following commands:
IMAGE=`kubectl get scantemplate -n default private-image-scan-template -o json | jq -r '.spec.template.initContainers[] | select(.name == "scan-plugin") | .image'`
docker pull $IMAGE
id=$(docker create $IMAGE)
docker cp $id:/image/scan-image.sh ./grype-script.sh
docker rm -v $id
We will start just like the grype script by accepting the scan directory, scan file path and whether or not to pull the image as variables.
#!/bin/bash
set -eu
SCAN_DIR=$1
SCAN_FILE=$2
PULL_IMAGE=""
if [[ $# -gt 2 ]]
then
  PULL_IMAGE=$3
fi
The next step is to change directories to the shared volume, where we have write permissions
pushd $SCAN_DIR &> /dev/null
The next step is very important, we need to specify how to reference the image in the scan command in 2 different cases. The first case which is designated by the variable PULL_IMAGE being an empty string is when the source image is a publicly accessible image, not requiring credentials to pull it. The second case is where credentials are needed (this is the default assumption in TAP). In this second case, we need to pull down the image as a tarball and then tell our scanner to scan the local tarball instead of pulling from the registry.
There are many ways to pull the image, but we will use the same tool VMware uses in the grype image, called krane, as it will also be beneficial in a later step.
if [[ -z $PULL_IMAGE ]]
then
  ARGS=$IMAGE
else
  krane pull $IMAGE myimage
  ARGS="--input myimage"
fi
The next step is to run the scan itself and output the SBOM, with the vulnerability data embedded in it, in a supported format (in the case of Trivy this will be CycloneDX JSON) and put this in a file.
trivy image $ARGS --format cyclonedx --security-checks vuln > $SCAN_FILE
While this does give us a valid CycloneDX SBOM as an output, TAP requires 2 specific fields be set correctly in order for the metadata store to be able to index the data correctly which trivy does not do out of the box. The needed fields are ".metadata.component.name" which should be the image repo URI without a tag or a digest at the end, and ".metadata.component.version" which should be the sha256 value of the image.
In order to solve this, we will extract that data from the SBOM if it is there, and otherwise we will parse the inputted image URI, and finally we will add these fields to the outputted BOM file.
# Take the repo and digest from Trivy's RepoDigest property when present ("[]?" tolerates a missing properties array)
NAME=`cat $SCAN_FILE | jq -r '.metadata.component.properties[]? | select(.name == "aquasecurity:trivy:RepoDigest") | .value | split("@") | .[0]'`
DIGEST=`cat $SCAN_FILE | jq -r '.metadata.component.properties[]? | select(.name == "aquasecurity:trivy:RepoDigest") | .value | split("@") | .[1]'`
# Otherwise fall back to parsing the image reference itself
if [[ -z $NAME ]]; then
  NAME=`echo $IMAGE | awk -F "@" '{print $1}'`
fi
if [[ -z $DIGEST ]]; then
  DIGEST=`echo $IMAGE | awk -F "@" '{print $2}'`
fi
# As a last resort ask krane; the local tarball only exists when the image was pulled
if [[ -z $DIGEST ]]; then
  if [[ -n $PULL_IMAGE ]]; then
    DIGEST=`krane digest --tarball myimage`
  else
    DIGEST=`krane digest $IMAGE`
  fi
fi
# Write both fields back into the SBOM
cat $SCAN_FILE | jq --arg name "$NAME" --arg digest "$DIGEST" '.metadata.component.name=$name | .metadata.component.version=$digest' > $SCAN_FILE.tmp && mv $SCAN_FILE.tmp $SCAN_FILE
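Since the metadata store expects the name without a tag or digest, it can help to see the image reference parsing on its own. The following is only a sketch using plain shell parameter expansion; the function names image_repo and image_digest are hypothetical and are not part of the script in this post. It assumes a tag, if present, lives in the last path segment, so a registry port such as registry.local:5000 is left alone.

```shell
#!/bin/sh
# Repository part of an image reference: drops any @digest and any :tag,
# while keeping a registry port such as registry.local:5000.
image_repo() {
  ref=${1%%@*}              # strip the digest, if any
  last=${ref##*/}           # final path segment, where a tag may live
  case $last in
    *:*) ref=${ref%:*} ;;   # strip ":tag" from the final segment
  esac
  printf '%s\n' "$ref"
}

# Digest part of an image reference, or an empty line when none is present.
image_digest() {
  case $1 in
    *@*) printf '%s\n' "${1#*@}" ;;
    *)   printf '\n' ;;
  esac
}
```

For example, `image_repo "$IMAGE"` and `image_digest "$IMAGE"` could stand in for the two awk fallbacks when the image reference also carries a tag.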
Now we need to create the summary report, with the number of CVEs at each of the different criticality levels. TAP has 5 defined levels: critical, high, medium, low, unknown. While this may seem easy, the issue is that in the SBOM we may receive multiple different ratings for a single vulnerability coming from different sources. In this example, I have decided to go with the highest criticality level found for each CVE.
critical=0
high=0
medium=0
low=0
unknown=0
for row in $(cat $SCAN_FILE | jq -r '.vulnerabilities[] | @base64'); do
  VULN=`echo ${row} | base64 --decode`
  if [[ `echo $VULN | jq '.ratings[] | select(.severity == "critical")'` != "" ]]; then
    critical=$((critical+1))
  elif [[ `echo $VULN | jq '.ratings[] | select(.severity == "high")'` != "" ]]; then
    high=$((high+1))
  elif [[ `echo $VULN | jq '.ratings[] | select(.severity == "medium")'` != "" ]]; then
    medium=$((medium+1))
  elif [[ `echo $VULN | jq '.ratings[] | select(.severity == "low")'` != "" ]]; then
    low=$((low+1))
  elif [[ `echo $VULN | jq '.ratings[] | select(.severity == "info")'` != "" ]]; then
    low=$((low+1))
  else
    unknown=$((unknown+1))
  fi
done
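The "highest rating wins" rule is easy to get wrong, so here is a small standalone sketch of just that decision, without jq. The function name highest_severity is hypothetical; folding "info" into "low" and treating anything unrecognised as "unknown" mirrors the choices made in the loop above.

```shell
#!/bin/sh
# Prints the highest severity among the arguments, using the five TAP levels.
# "info" is folded into "low"; anything unrecognised counts as "unknown".
highest_severity() {
  best=unknown
  best_rank=0
  for s in "$@"; do
    case $s in
      critical) rank=4 ;;
      high)     rank=3 ;;
      medium)   rank=2 ;;
      low|info) rank=1; s=low ;;
      *)        rank=0; s=unknown ;;
    esac
    if [ "$rank" -gt "$best_rank" ]; then
      best=$s
      best_rank=$rank
    fi
  done
  printf '%s\n' "$best"
}
```

A CVE rated "low" by one source and "critical" by another would be counted once, as critical: `highest_severity low critical` prints `critical`.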
Now that we have the counters for each level we can create the summary YAML file:
trivyVersion=`trivy --version --format json | jq -r .Version`
cat << EOF > $SCAN_DIR/out.yaml
scan:
  cveCount:
    critical: $critical
    high: $high
    medium: $medium
    low: $low
    unknown: $unknown
  scanner:
    name: Trivy
    vendor: Aqua
    version: $trivyVersion
  reports:
  - /workspace/scan.json
EOF
And the final step is to print the content of our 2 files, first the SBOM itself and then the summary YAML:
cat $SCAN_FILE
cat $SCAN_DIR/out.yaml
The script can now be saved on your machine. In my case, I called it scan-image.sh, the same name VMware uses for the script in the Grype scanner.
Creating the Dockerfile
Now that we have our script all configured and ready to be used, we need to build a container image with the script and all needed tools within it.
While most of the dependencies are very common and can be downloaded as precompiled binaries, krane is not available as a precompiled binary and building it from source would be too much of a pain. To solve this we will simply copy that file from the grype scanner image we referenced earlier, as part of the Dockerfile and build process.
I have decided to use ubuntu as my source image, and know that the dependencies we need beyond krane are jq, wget, curl, trivy, and of course our script we built before.
chmod 755 scan-image.sh
IMAGE=`kubectl get scantemplate -n default private-image-scan-template -o json | jq -r '.spec.template.initContainers[] | select(.name == "scan-plugin") | .image'`
cat <<EOF > Dockerfile
FROM ubuntu
RUN apt-get update && apt-get install -y wget curl && rm -rf /var/lib/apt/lists/*
RUN wget "http://stedolan.github.io/jq/download/linux64/jq" && chmod 755 jq && mv jq /usr/local/bin/jq && curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin && mkdir /workspace
COPY --from=$IMAGE /usr/local/bin/krane /usr/local/bin/krane
COPY scan-image.sh /usr/local/bin/
USER 65534:65533
EOF
Now that we have our Dockerfile created, we can build the image and tag it with the repo URL we want this saved to, for example:
docker build . -t harbor.vrabbi.cloud/tap/trivy-scanner:1.0.0
Now we can push the image to our registry:
docker push harbor.vrabbi.cloud/tap/trivy-scanner:1.0.0
Creating the scan template CR
The final preparation step is to create our custom scan template CR YAML.
The first step is to output the out of the box scan template to a file we can edit:
kubectl get scantemplate -n default private-image-scan-template -o yaml > private-scan-template.yaml
We now need to clean up the yaml a bit, and remove the following fields:
- .metadata.annotations
- .metadata.creationTimestamp
- .metadata.generation
- .metadata.labels
- .metadata.namespace
- .metadata.resourceVersion
- .metadata.uid
The only field under metadata we should have left is the metadata.name field.
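If you would rather not delete these fields by hand, the cleanup can be sketched with jq. This assumes you are happy to work with the JSON form of the resource (kubectl can apply JSON as well as YAML) and that jq is installed; the function name strip_cluster_metadata is hypothetical.

```shell
#!/bin/sh
# Removes the cluster-specific metadata fields listed above from a
# Kubernetes object read as JSON on stdin, leaving metadata.name in place.
strip_cluster_metadata() {
  jq 'del(.metadata.annotations,
          .metadata.creationTimestamp,
          .metadata.generation,
          .metadata.labels,
          .metadata.namespace,
          .metadata.resourceVersion,
          .metadata.uid)'
}
```

For example: `kubectl get scantemplate -n default private-image-scan-template -o json | strip_cluster_metadata > private-scan-template.json`. The rest of this post keeps working with the YAML file, so this is purely an alternative for the cleanup step.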
Now we need to make a few changes to the spec of the scan template itself.
The first change is to point the initContainer with the name "scan-plugin" to use the image we just created and pushed to our registry.
After that we need to change the arguments passed to the container from:
./image/scan-image.sh /workspace /workspace/scan.xml true
To:
scan-image.sh /workspace /workspace/scan.json true
That deals with our scan step. Now we need to make a change in the 3rd initContainer, which has the name "metadata-store-plugin". Here the change is also in the arguments passed to the container, where we need to change the format it expects as an input, as well as the SBOM file name. We need to change the args section from:
- args:
  - image
  - add
  - --cyclonedxtype
  - xml
  - --path
  - /workspace/scan.xml
To:
- args:
  - image
  - add
  - --cyclonedxtype
  - json
  - --path
  - /workspace/scan.json
And the final change we need to make is in the policy compliance step, which is the final initContainer and is named "compliance-plugin", where we also need to change the file name and the input format type in the args section from:
- args:
  - check
  - --policy
  - $(POLICY)
  - --scan-results
  - /workspace/scan.xml
  - --parser
  - xml
  - --format
  - yaml
  - --output
  - /workspace/compliance-plugin/out.yaml
To:
- args:
  - check
  - --policy
  - $(POLICY)
  - --scan-results
  - /workspace/scan.json
  - --parser
  - json
  - --format
  - yaml
  - --output
  - /workspace/compliance-plugin/out.yaml
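The XML to JSON argument changes above are mechanical, so they can be roughed out with sed. This is only a sketch: the function name switch_template_to_json is hypothetical, and it assumes that "- xml" list items appear in the template only as the values of --cyclonedxtype and --parser, as in the snippets above. The scanner image and the script path in the scan-plugin container still need to be changed by hand.

```shell
#!/bin/sh
# Prints a copy of the given scan template with the XML-specific arguments
# rewritten to their JSON equivalents. The input file is left untouched.
switch_template_to_json() {
  sed -e 's#/workspace/scan\.xml#/workspace/scan.json#g' \
      -e 's#^\([[:space:]]*- \)xml[[:space:]]*$#\1json#' \
      "$1"
}
```

For example, `switch_template_to_json private-scan-template.yaml > private-scan-template.new.yaml`, followed by a diff of the two files before applying anything.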
Replacing the initial Scan Template
Because the initial scan template is managed by a carvel package, making the change in place will actually be undone within a matter of minutes, the next time kapp controller reconciles our packages.
To work around this for testing, we can pause the reconciliation of the relevant package.
kubectl patch pkgi -n tap-install grype --patch '{"spec": {"paused": true}}' --type=merge
Now we can apply the updated scantemplate to the cluster:
kubectl apply -f private-scan-template.yaml
If you want to revert back to the initial scantemplate, you can simply run:
kubectl patch pkgi -n tap-install grype --patch '{"spec": {"paused": false}}' --type=merge
Same UX – Different Scanner
As TAP is all based on kubernetes and that is always the source of truth, one of the great things, is that TAP GUI picks the data up about what scanner we used to scan the image, and we actually get visibility into that in the supply chain plugin inside of TAP GUI:
And even with the custom scanner being used, we lose none of the capabilities and nice features we get with TAP like the visibility into CVEs from the supply chain plugin:
And the data is also integrated exactly the same in the new security analysis plugin which was added in TAP 1.3, which gives you a clear way to see the entire landscape of your workload in terms of security in a clear and concise way.
Summary
While there is a bit of work in building an integration like this, especially in terms of the parsing of the data and making sure you output the data in the correct way, the fact that TAP allows us to do this, and when we do, we still get the same great UX, is pretty amazing.
As CycloneDX is becoming a highly adopted standard, and scanners like Aqua Enterprise and Prisma Cloud already support this format as an output of their scanners, the ability to integrate them into TAP is almost identical to what we have done here, and the huge benefit is that you can truly integrate TAP with your existing tooling in a non disruptive yet very beneficial way.
-
Finding and Removing Stale CNS Volumes in vSphere
During a recent reorganization of our vSphere lab environment, I was made aware that we had a very weird situation.
We use vSAN as our main storage in the lab and we found that it was over 95 percent used, but only 50 percent could be attributed to VMs and system data.
After looking into this a bit, we found that the vast majority of the storage being used was actually orphaned FCDs (First Class Disks), the vSphere objects that map to PVs (Persistent Volumes) in Kubernetes when using the vSphere CSI.
This actually makes sense, as we spin up Kubernetes clusters for testing on a daily basis, and when you delete a cluster, if you don’t delete the CSI volumes in advance, they will simply stay around indefinitely.
When trying to assess how best to clean up the environment, it became very clear that this aspect of the vSphere CSI and the CNS Volume implementation in vSphere simply does not have a great flow for analyzing the situation, and there is no way to perform bulk operations.
When looking at the vSphere UI we can navigate to the relevant Datastore, and under the monitoring tab we will have a section called “Cloud Native Storage” with a “Container Volumes” page.
These are all the CNS volumes that are on this specific data store. If we click on the info card next to the volume name, we can get additional information about a volume that vSphere has collected.
The first tab includes basic vSphere related information including IDs, Datastore placement, Overall health, and any other data relating to the applied storage policies and the compliance of the volume with the storage policy.
The next tab is where we really get interesting information, which includes kubernetes data about the persistent volume that this volume represents.
This data includes the PV name, the namespace and name of the related PVC, all labels that are applied on that PVC, the pod or pods it is mounted to as well as one other key data point, which is the Kubernetes cluster that this PV was created in.
With this information, we should be able to build out a report of our persistent volumes with all the needed information, in order to assess what can and can't be deleted.
The first place I looked for such info was vRealize Operations, however CNS Volumes are not even collected by the vCenter adapter, and as such we need to find another solution.
The next place I looked to get this info in a clear way was RVTools. While RVTools has the tab which includes a list of what it believes to be Orphaned VMDKs, and they do include the CNS volumes, the needed kubernetes metadata is not available making it a no go as well.
With this being the case, I decided to check what could be done from a CLI based solution. For this specific use case, I decided to go with the golang based CLI tool govc, which is fast and really easy to use.
Setting up GOVC:
export GOVC_INSECURE=true
export GOVC_URL=lab-vcsa.vrabbi.cloud
export GOVC_USERNAME=scott
export GOVC_PASSWORD=SuperSecretPassword
The next step is to list the volumes on the vsanDatastore, which can be done using the volume.ls command:
govc volume.ls -ds=vsanDatastore
However, this command simply returned the name and ID of the volumes and not the metadata we need.
Luckily, govc can give us all of this data when using the “-json” flag:
govc volume.ls -ds=vsanDatastore -json
As we can see, we truly get a huge amount of data in the call, for every CNS volume on the datastore, making this a great starting point.
The next part is the need to parse this data and extract only the needed parts which for this use case includes:
- Cluster Name
- PV Name
- Namespace of the PVC
- Owner (The vSphere user the CSI driver used to create this volume)
- Size of the volume
- Volume ID
- Datastore URL the volume resides on
Once we find where this data is located within the JSON body we get as a response, we can use the common CLI tool for json manipulation “jq” for extracting the needed data:
govc volume.ls -ds=vsanDatastore -json | jq -r '.[]' | jq '.[] | {cluster: .Metadata.ContainerCluster.ClusterId, pvc: .Name, namespace: .Metadata.EntityMetadata[0].Namespace, owner: .Metadata.ContainerCluster.VSphereUser, sizeGB: (.BackingObjectDetails.CapacityInMb/1024), datastoreUrl: .DatastoreUrl, id: .VolumeId.Id}'
This command will have an output similar to the following:
As we can see, the output is exactly the data we want, however it would be nice if we had this in CSV format which would allow us to open it in excel for example, making the analysis much easier to do.
We can make a simple addition at the end of this command, again using jq, in order to convert this json data into a simple CSV.
The addition first needs to convert the multiple json documents outputted by the previous command into an array, which is done via the following:
jq -s '.'
The next step is to extract the keys and make them the column headers, place all values in the rows beneath them, and finally export this as a CSV:
jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv'
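To try the two jq steps above without a vCenter at hand, here is a small sketch that runs them against made-up records. The sample values and the helper name to_csv are hypothetical; the jq filters themselves are the ones from this post. Note that the header row comes out as the alphabetically sorted keys, which is why the column order in the CSV differs from the order of the keys in the JSON objects.

```shell
#!/bin/sh
# Sample records standing in for the trimmed govc output (values are made up).
sample='{"cluster":"cls-01","id":"vol-a","sizeGB":10}
{"cluster":"cls-02","id":"vol-b","sizeGB":2.5}'

# Slurps a stream of JSON objects into an array, then emits a header row of
# the sorted keys followed by one CSV row per object.
to_csv() {
  jq -s '.' | jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv'
}

# Only run the demo when jq is available.
if command -v jq >/dev/null 2>&1; then
  printf '%s\n' "$sample" | to_csv
fi
```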
With all of this in place, we should receive a CSV formatted list of all the CNS volumes on our vsanDatastore, along with the metadata we need in order to analyze it.
The last step is to simply output this into a file which can be done very easily with a redirection of the output stream to a file. In the end, the final command to get this data is:
govc volume.ls -ds=vsanDatastore -json | jq -r '.[]' | jq '.[] | {cluster: .Metadata.ContainerCluster.ClusterId, pvc: .Name, namespace: .Metadata.EntityMetadata[0].Namespace, owner: .Metadata.ContainerCluster.VSphereUser, sizeGB: (.BackingObjectDetails.CapacityInMb/1024), datastoreUrl: .DatastoreUrl, id: .VolumeId.Id}' | jq -s '.' | jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv' > vsanDatastore-cns-volumes.csv
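If you prefer a quick look from the terminal before opening the file in a spreadsheet, the CSV can be summarised with awk. This is a rough sketch with a hypothetical function name, size_per_cluster. It assumes the column order produced by the command above (keys sorted alphabetically, so cluster is column 1 and sizeGB is column 7) and that no field contains a comma.

```shell
#!/bin/sh
# Sums the sizeGB column (7th) per cluster (1st column) from the CSV on stdin.
# A plain comma split is used, so fields must not contain commas themselves.
size_per_cluster() {
  awk -F',' 'NR > 1 { gsub(/"/, "", $1); total[$1] += $7 }
             END { for (c in total) printf "%s,%.1f\n", c, total[c] }' | sort
}
```

For example, `size_per_cluster < vsanDatastore-cns-volumes.csv` prints one line per cluster with the total size in GB, which makes clusters that no longer exist stand out quickly.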
Once we have gone through the list of volumes, and have the list of CNS volumes we want to delete, we can simply create a text file containing the ID of those volumes which can be found in the 3rd column of the outputted CSV.
Once we have a file that looks like this:
We can simply run a command that loops through this list and via govc, will delete them using the govc volume.rm command:
xargs -a cns-vols-to-delete.txt -I{} -d'\n' govc volume.rm {}
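Because deleting volumes is not reversible, a more cautious variant of the loop may be worth having. The sketch below only prints what it would do unless told otherwise. The function name delete_cns_volumes and the DRY_RUN variable are hypothetical conventions for this example; govc volume.rm is the same command used above.

```shell
#!/bin/sh
# Deletes (or, by default, only prints) the CNS volumes listed one ID per line.
# Set DRY_RUN=0 in the environment to really delete.
delete_cns_volumes() {
  while IFS= read -r id || [ -n "$id" ]; do
    case $id in
      ''|'#'*) continue ;;   # skip blank lines and comment lines
    esac
    if [ "${DRY_RUN:-1}" = "0" ]; then
      # stdin is redirected so the command cannot consume the ID list
      govc volume.rm "$id" < /dev/null
    else
      echo "would run: govc volume.rm $id"
    fi
  done < "$1"
}
```

Running `delete_cns_volumes cns-vols-to-delete.txt` first, reviewing the output, and then repeating it with `DRY_RUN=0` gives one last chance to catch a wrong ID.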
Summary
While the UI is not very useful for these tasks, and neither are the standard monitoring or reporting tools for vSphere, it is good, that this information is accessible via automation tools like govc, which allows us to solve issues like this, in the interim at least, until a more streamlined approach is available via the vSphere UI or other official VMware tooling.
Another hope I do have is that, down the road, Kubernetes distributions like TKG, OCP, Rancher, EKS-A and others will have a mechanism to delete all of the remaining PVs at cluster deletion time, which would eliminate this issue.
I hope this was helpful, as for me in the lab it was able to save over 6TB of storage which is huge!
-
Tanzu CLI on Windows made easy
As the Tanzu ecosystem grows, the reliance on Tanzu CLI grows as well. While the CLI itself works very well cross platform, the main issue we consistently encounter is the pain of installing the Tanzu CLI on a Windows machine.
The process documented by VMware in the TKG and TAP documentation is a long list of manual steps, many of which require administrative permissions, and is simply a really bad user experience.
In this blog post we will see a small POC I put together that can help solve this issue.
While this POC is based on the TAP version of the Tanzu CLI, it could easily be tweaked to work with the TKG version as well.
What options do we have
When trying to think about simplifying the installation process, my initial thought was to build a WinGet package, which is then easily discoverable with the new WinGet ecosystem, or maybe to build a Chocolatey package.
While those 2 options are valid in many environments, some enterprises simply don’t allow these package managers, and also still today, the vast majority of machines don’t have them installed.
As I wanted to build a solution that would make the process better for as many people as possible, I had to find another option.
Another simple option would have been to just write a simple PowerShell script to do the installation. While this is an option and may be the right choice in some cases, I wanted to have a better experience and actually manage the Tanzu CLI as a package, which can have full lifecycle management from installation, through upgrades and finally uninstalling the CLI and of course cleaning up when uninstalling.
This all led me to the realization that the best option for what I was looking for was to build an MSI installer for the Tanzu CLI.
How to build an MSI
The first thing I needed to figure out was how to even build an MSI.
After some research I found 2 projects that seemed very interesting:
- The WiX Toolset
- PowerShell Pro Tools
When evaluating the 2 options, it was very clear that PowerShell Pro Tools was the right choice for a POC project, as it is much easier to get started with. In fact, PowerShell Pro Tools uses the WiX mechanism under the hood and simply adds an abstraction layer above it in PowerShell to make our lives much easier.
While the PowerShell Pro Tools module is great and can suffice for many needs, I did end up having some issues with it, which led me to also utilize the WiX toolset itself to patch the generated config for my needs.
How to build the initial config
The first step is to install the PowerShell Pro Tools module:
install-module PowerShellProTools
Once we have the module installed, we can start building our project.
You will need to download the Tanzu CLI zip file first as the MSI will need to contain these files.
Next we will need to generate a GUID which we need to save somewhere we will remember. This GUID is used as an ID of our MSI program which can be used later on when a new version comes out, so that the MSI can perform an upgrade.
$upgradeCode = ([guid]::NewGuid().ToString())
write-host "Tanzu CLI MSI Upgrade Code: $upgradeCode"
Once we have the Tanzu CLI zip file on our machine we can begin defining some variables in PowerShell:
# This is the directory where auto generated files and eventually our MSI will be placed
$outputDirName = "output"
# This is the TAP version we are building this package for
$tapVersion = "1.2.1"
# This is the name of the zip file we have downloaded already
$tanzuZipFileName = "tanzu-framework-windows-amd64.zip"
# This is your company's name
$company = "TeraSky Israel"
# This is where you put the upgrade code from above
$upgradeCode = 'a8473e56-43ec-4665-9132-2ff94ac32b33'
# The name of the product which in our case is TAP
$productName = "TAP"
# The current directory where all files will be located
$DIR=(pwd).path
Now that we have the needed variables set, we need to create a few additional files:
- A script which performs the installation of the CLI and its plugins
- A script which uninstalls the CLI and its plugins
- An icon to be used for the program
- A file with VMware’s EULA for the Tanzu CLI
All of these files, as well as the final script that puts this whole solution together, can be found in my GitHub repo.
Once we have those files locally in our current directory we can begin building our MSI itself.
A key feature that we are using in this MSI is the ability to run scripts at specific hooks in the installation process. By default the MSI will simply copy files to a location we request, but we have a set of commands we need to run after the Tanzu CLI zip is copied over to the machine in order to install it as well as its plugins.
In order to define these actions we have a simple PowerShell command provided via the PowerShell Pro Tools module:
$installAction = New-InstallerCustomAction -FileId 'InstallPlugins' -CheckReturnValue -RunOnInstall -arguments '-NoProfile -WindowStyle Normal -InputFormat None -ExecutionPolicy Bypass'
$uninstallAction = New-InstallerCustomAction -FileId 'UninstallPlugins' -RunOnUninstall -arguments '-NoProfile -WindowStyle Normal -InputFormat None -ExecutionPolicy Bypass'
Now that we have these actions defined in variables, we can run the command to build our MSI:
New-Installer -Productname $productName -Manufacturer $company -platform x64 -UpgradeCode $upgradeCode -Content {
    New-InstallerDirectory -PredefinedDirectory "ProgramFiles64Folder" -Content {
        New-InstallerDirectory -DirectoryName "tanzu" -Content {
            New-InstallerFile -Source .\$tanzuZipFileName -Id 'bundle'
            New-InstallerFile -Source .\install-plugins.ps1 -Id 'InstallPlugins'
            New-InstallerFile -Source .\uninstall-tanzu-cli.ps1 -Id 'UninstallPlugins'
            New-InstallerFile -Source .\EULA.rtf -Id 'EULA'
        }
    }
} -OutputDirectory (Join-Path $PSScriptRoot "$outputDirName") -RequiresElevation -version $tapVersion -CustomAction $installAction,$uninstallAction -AddRemoveProgramsIcon $DIR\tanzu-icon.ico
This command will generate a few configuration files as well as the initial MSI file, however we are going to delete the MSI file, tweak the configuration files and then regenerate the MSI.
The first change we need to make is to the way our scripts at installation and uninstall phases are run. As some of the changes we are making require administrative permissions, we need to make sure that the scripts run with elevated permissions.
This can be achieved via the following command:
((Get-Content -path $outputDirName\TAP.$tapVersion.x64.wxs -Raw) -replace '<CustomAction Id=','<CustomAction Impersonate="no" Id=') | Set-Content -Path $outputDirName\TAP.$tapVersion.x64.wxs
Basically, we need to add 'Impersonate="no"' to the custom action field in our XML config file.
The next change we need to make is to integrate a UI form which will have our EULA in it which should come up as part of the installation:
((Get-Content -path $outputDirName\TAP.$tapVersion.x64.wxs -Raw) -replace '</Product>','<UIRef Id="WixUI_Minimal" /><UIRef Id="WixUI_ErrorProgressText" /></Product>') | Set-Content -Path $outputDirName\TAP.$tapVersion.x64.wxs
Once these tweaks have been made, we can now delete the initial MSI and build the new one based on our modified configuration file.
As mentioned above, the PowerShell Pro Tools module utilizes the WiX toolset under the hood, which means we don’t need to install anything else in order to build our MSI. The tools are already present, we just need to find them:
$modulePath = (Get-Module -ListAvailable PowerShellProTools)[0].path
$modulePath = $modulePath.substring(0,$modulePath.LastIndexOf("\"))
$binPath = "$modulePath\Wix\bin"
With the variables set with the path to the needed tools, we can now rebuild our MSI using 2 simple commands:
& $binPath\candle.exe ".\TAP.$tapVersion.x64.wxs"
& $binPath\light.exe -dWixUILicenseRtf="$DIR\EULA.rtf" -ext WixUIExtension ".\TAP.$tapVersion.x64.wixobj" -o "TAP.$tapVersion.x64.msi"
Now that we have rebuilt our MSI, let's see what it looks like:
As we can see, we have a few configuration files, along with our MSI installer.
If we try to install the MSI via the UI:
Once we accept the EULA, we now get the UAC prompt as the installation requires admin permissions:
Once we accept the UAC prompt, the Tanzu CLI installation will begin, and about a minute later we should receive:
While the UI installation is nice, because this is an MSI, we can also perform the installation via automation using the msiexec.exe CLI tool for example:
# Must be run from an elevated command prompt
msiexec /i TAP.1.2.1.x64.msi ACCEPT=YES /qr+
Summary
This was a pretty fun POC, and I think it proves that the experience of installing Tanzu CLI can be much better and more streamlined.
Hopefully, this is beneficial for those out there looking to simplify installation of Tanzu CLI or other tools that have similar experiences currently in terms of installation experience.
There are many things that would be needed to make something like this official, and production grade, but as a POC of about 1.5 hours overall, I think the results are pretty cool!
-
Auto Generation of Certs for TAP Workloads
When we set up a TAP environment, one of the key aspects we must take into account is how we will be exposing our workloads outside of the cluster.
By default, TAP workloads are deployed as Knative services and are exposed via plain HTTP which is not a very secure or production ready solution.
Another option we get very easily with a few additional lines in our configuration file for TAP, is the ability to provide a secret that has a wildcard certificate in it that will be used for all of our workloads in the cluster.
While using a wildcard certificate for a platform has become a common practice as we have seen with TAS, OCP and others, wouldn’t it be great if we could have actual workload specific certificates auto generated for us by the platform and not need to use a wildcard?
In this post we will cover a way to achieve this in which our CA of choice is an Active Directory CA server.
Why not just use a wildcard
While the setup and configuration of a wildcard certificate is extremely easy it definitely has some drawbacks we must consider.
The most TAP centric issue we can encounter with wildcard certificates is that they simply don’t work at scale within TAP without changing how the ingress URLs are generated.
By default, TAP uses the following naming conventions for ingress URLs:
<WORKLOAD NAME>.<NAMESPACE>.<DOMAIN>
While that may seem appealing, this does not work well with wildcards, as there is no way to create a wildcard certificate that supports "x" number of subdomains: a wildcard can only cover a single segment (the first one) of a domain name.
This means that in order to use the default naming convention, we need not just a wildcard for "*.<DOMAIN>" but also Subject Alternative Names with the format "*.<NAMESPACE>.<DOMAIN>" for all of the namespaces in which we will want to deploy workloads.
Knowing all of the needed namespaces is not feasible upfront which means we need a solution that is more dynamic.
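To make the single-segment rule concrete, here is a small sketch in plain shell that checks whether a hostname is covered by a certificate name. The function name cert_name_covers and the example domains are hypothetical; the rule it implements is the one described above, where "*" stands for exactly one left-most DNS label.

```shell
#!/bin/sh
# Returns 0 when host is covered by the certificate name in pattern.
# A leading "*." matches exactly one DNS label, never several.
cert_name_covers() {
  pattern=$1
  host=$2
  case $pattern in
    '*.'*)
      suffix=${pattern#'*'}          # e.g. ".example.com"
      case $host in
        *"$suffix")
          label=${host%"$suffix"}    # what the "*" would have to match
          case $label in
            ''|*.*) return 1 ;;      # empty, or more than one label
            *)      return 0 ;;
          esac ;;
        *) return 1 ;;
      esac ;;
    *) [ "$pattern" = "$host" ] ;;   # no wildcard: exact match only
  esac
}
```

With this, `cert_name_covers '*.example.com' 'app.dev.example.com'` fails because "app.dev" is two labels, while `cert_name_covers '*.dev.example.com' 'app.dev.example.com'` succeeds, which is exactly why a per-namespace SAN would be needed with the default naming convention.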
Solution #1 – Change the Domain Template
This is probably the easiest solution. We simply add a few more lines to our TAP configuration to change the convention used to generate ingress URLs from:
<WORKLOAD NAME>.<NAMESPACE>.<DOMAIN>
To:
<WORKLOAD NAME>-<NAMESPACE>.<DOMAIN>
By doing this, we put everything that is dependent on the workload into a single segment of the FQDN, and then a wildcard certificate will work.
To do this, the additional lines we would need to add to our TAP values file for any cluster created with the full, run, or iterate profiles would be:
cnrs:
  domain_template: '{{.Name}}-{{.Namespace}}.{{.Domain}}'
While the above solution is an easy way to solve the issue of the wildcard certificate mentioned above, there are 2 other issues with wildcards we need to take into consideration.
- From a security perspective, the biggest concern with wildcard certificates is that when one server or sub-domain covered by the wildcard is compromised, all sub-domains may be compromised. In other words, the upfront simplicity of the wildcard can create significant problems should things go wrong.
- From a maintenance perspective we also have the need to remember on a typically yearly basis to replace the wildcard certificate in our clusters. If we were to forget to change the certificate in the cluster in time, ALL of our workloads would have TLS issues at the same time which could cause severe impact on your business.
As we can see, while using a wildcard may suffice for some use cases and can be an easy way to get started, there has to be a better way.
Generating Certificates At Runtime With Cert-Manager
One of the components included in TAP is Cert-Manager.
Cert-Manager is an industry standard Kubernetes operator that can manage the entire lifecycle of certificates in the context of a Kubernetes environment.
When we work with public domains, we can use the integration with, for example, LetsEncrypt or really any ACME server, and generate our certificates in an easy and automated manner.
One of the nice things with using Cert-Manager is that Knative which is the default deployment mechanism for our workloads in TAP, has an OOTB integration with Cert-Manager, in which it can auto generate the certificates when a new Knative Service is deployed!
This allows us to not need to worry about certificate creation, and let the platform deal with it automatically!
The way that this works is that you create a ClusterIssuer CR in your cluster, which is a custom resource that is provided by Cert-Manager, that is utilized for issuing the certificates we request of it.
The Issue Of On-Prem Environments
While the idea of auto generating trusted certificates sounds great, it has some challenges when working in a typical On-Prem environment.
Typically we see that Microsoft’s Active Directory CA solution is the most commonly used CA in this type of environment, and unfortunately, Cert-Manager does not have an integration with this CA.
While we could build such an integration (It has been done in the past), this would require a lot of work and maintenance that simply is not an ideal situation or even possible for many organizations to undertake.
We could also decide to use the self signed issuer type in Cert-Manager, and simply allow Cert-Manager to create self signed certificates for each of our workloads.
While self signed certs may work for demo environments or even development environments, they really are not a solution for a production grade platform because everyone will receive certificate warnings any time they try to access the application.
The Solution – Using an Intermediate CA
When dealing with certificates, we have the concept of an intermediate or subordinate CA.
An intermediate CA, is a certificate that has been signed by the root CA, and has been given the "permissions" to issue certificates on behalf of the root CA.
Once we have the intermediate CA's full chain and its private key, both in PEM format, we can use the Cert-Manager CA ClusterIssuer type and have Cert-Manager generate certificates that are signed by the dedicated intermediate CA we have provided it.
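Before handing these two PEM inputs to Cert-Manager, it can be worth a quick structural check. The Python sketch below only counts PEM blocks; it does not verify signatures, expiry, or that the key belongs to the certificate. The "at least two certificates" rule is my own assumption of what a full chain looks like (intermediate plus root), and the sample strings are placeholders rather than real certificates.

```python
import re

CERT_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL
)
KEY_BLOCK = re.compile(
    r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----.+?"
    r"-----END (?:RSA |EC )?PRIVATE KEY-----",
    re.DOTALL,
)


def summarize_pem(chain_text: str, key_text: str) -> dict:
    """Count PEM blocks; this checks structure only, not validity."""
    return {
        "certificates_in_chain": len(CERT_BLOCK.findall(chain_text)),
        "private_keys": len(KEY_BLOCK.findall(key_text)),
    }


def looks_usable(summary: dict) -> bool:
    # Assumption: a full chain holds the intermediate plus at least its issuer,
    # and the key file holds exactly one private key.
    return summary["certificates_in_chain"] >= 2 and summary["private_keys"] == 1


# Placeholder PEM text, only to show the shape of the result.
fake_cert = "-----BEGIN CERTIFICATE-----\nMIIB...\n-----END CERTIFICATE-----\n"
fake_key = "-----BEGIN PRIVATE KEY-----\nMIIE...\n-----END PRIVATE KEY-----\n"
summary = summarize_pem(fake_cert * 2, fake_key)
print(summary, looks_usable(summary))
```

In practice you would pass the contents of your own chain and key files instead of the placeholder strings.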
How to set this up
The first step is to install TAP as we always would without this solution. We also do not need to change the default naming template for our services, as certificates can be generated for the default naming convention as well!
Once we have deployed TAP we are going to next configure Cert-Manager and create the ClusterIssuer we will be using.
In order to do this you will need to have the certificate chain (cert.cer) and the private key (cert.key), both in PEM format saved in files on your working machine, and then we can create a secret from these files:
kubectl create secret generic tap-intermediate-ca -n cert-manager \
  --from-file=tls.crt=cert.cer \
  --from-file=tls.key=cert.key
Now that we have the secret created with our intermediate CA data, we can create the ClusterIssuer:
cat << EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: ca-issuer
spec:
  ca:
    secretName: tap-intermediate-ca
EOF
Now that we have everything setup, and ready to be configured in TAP, we will create one final secret, that contains a YTT overlay that will configure the Knative system to use the newly created ClusterIssuer and auto generate TLS certificates for our workloads:
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: cnrs-tls-overlay
  namespace: tap-install
type: Opaque
stringData:
  tls-overlay.yaml: |
    #@ load("@ytt:overlay", "overlay")
    #@ load("@ytt:data", "data")
    ---
    #@overlay/match by=overlay.subset({"metadata":{"name":"config-certmanager"}})
    ---
    data:
      #@overlay/remove missing_ok=True
      _example:
      #@overlay/match missing_ok=True
      issuerRef: |
        kind: ClusterIssuer
        name: ca-issuer
    #@overlay/match by=overlay.subset({"metadata":{"name":"config-network"}})
    ---
    data:
      #@overlay/remove missing_ok=True
      _example:
      #@overlay/match missing_ok=True
      autoTLS: "Enabled"
      #@overlay/match missing_ok=True
      httpProtocol: "Redirected"
      #@overlay/match missing_ok=True
      default-tls-secret: "kube-system/wildcard"
      #@overlay/match missing_ok=True
      domainTemplate: "{{.Name}}.{{.Namespace}}.{{.Domain}}"
    #@ def kapp_config():
    apiVersion: kapp.k14s.io/v1alpha1
    kind: Config
    #@ end
    #@overlay/match by=overlay.subset(kapp_config())
    ---
    rebaseRules:
    #@overlay/append
    - path: [data]
      type: copy
      sources: [new, existing]
      resourceMatchers:
      - kindNamespaceNameMatcher: {kind: ConfigMap, namespace: knative-serving, name: config-certmanager}
      - kindNamespaceNameMatcher: {kind: ConfigMap, namespace: knative-serving, name: config-network}
EOF
And now we can simply tell TAP to apply this overlay via a simple addition to our TAP values file:
package_overlays:
- name: cnrs
  secrets:
  - name: cnrs-tls-overlay
The final step is to simply apply the changes to TAP using the Tanzu CLI:
tanzu package installed update tap -n tap-install -f <YOUR TAP VALUES FILE>
Summary
While the setup currently takes a bit of extra work, I strongly believe that this is a more secure and more flexible solution.
This mechanism can work for any CA and is a very simple way to allow for more complex or unique naming conventions for your Knative service ingress URLs in a secure and managed way.
Another key benefit is that an intermediate CA is typically made valid for 5 years, whereas a standard certificate is valid for only 1 year, if not less. Because Cert-Manager is the one managing the certificates, it also manages their lifecycle and will auto renew and rotate them before they expire, freeing you from replacing certificates on a very frequent basis.
The one thing we have not covered, but which is a very good idea if you go down this path, is to utilize External DNS, which can auto create and manage DNS records for all of your workloads' URLs, freeing you from the need to create wildcard DNS records as well!
Using industry standards like Cert-Manager and External DNS to enhance the TAP experience makes for a truly great setup that offers a secure, flexible and easy to maintain platform!
-
Tanzu App Accelerator – Deep Dive
One of the key features in Tanzu Application Platform is a tool called “Application Accelerator”.
In this blog post, we will try to cover what the tool is, why we need it, what makes up an accelerator, and try to give some examples of how these features can be used to build your own custom accelerators.
What is App Accelerator
App Accelerator helps you bootstrap developing your applications and deploying them in a discoverable and repeatable way.
You can think of App Accelerator as a way to publish a software catalog to your end users. It lets them fill out a simple and customizable form, which then generates a templated set of files they can start iterating on for their new project.
Enterprise Architects author and publish accelerator projects that provide developers and operators in their organization ready-made, enterprise-conformant code and configurations.
App Accelerator is based on the Backstage Software Template plugin, and enriches it with additional capabilities and features to make it extremely easy to use.
An accelerator is built up of 3 key components:
- A Git Repository based on which projects will be generated
- A file in that repository called accelerator.yaml with the definition of how to template the project
- A Kubernetes Custom Resource pointing at the base Git Repository
Let's examine these different components.
Base Repository
The base Git repository is a standard Git repository where we can define whatever we want. This could contain, for example, a sample application's source code, Kubernetes manifests, Backstage configuration files, or really anything we want.
One of the great parts of App Accelerator is that we can literally template anything we want. You could for example build an accelerator that generates a base structure for a Carvel package, or perhaps a TKG cluster config file. Truly, the sky is the limit!
This Git Repository can be either public or private based on your needs. It also supports sub-paths within a repo making the structure of how you lay out your accelerators very flexible.
The only special thing in this repository is the next component we will discuss which is the accelerator.yaml file which we must have in the repository.
Accelerator Yaml
As mentioned above, the only unique file we have in our base repository is the accelerator.yaml file.
By including an accelerator.yaml file in your Accelerator repository, you can declare input options that users fill in using a form in the UI. Those option values control processing by the template engine before it returns the zipped output files.
An accelerator.yaml file has 4 key parts: accelerator, options, imports, and engine.
Accelerator Section
In this part of the yaml file, we define metadata about our accelerator. This can include things such as, the name of the accelerator, tags to assign to the accelerator to ease searching for the right accelerator via the UI or API, a description for the accelerator, and an icon to use in the catalog for the accelerator.
Options Section
The options section is where we build out the inputs to our accelerator, that will be used to template our project.
Options are defined as an array of objects, with a few different keys we can define for each option that affect how it is exposed to the end user:
Name (Required)
This field which is the only required field, defines the name of the option which can then be used in our templating steps later on.
The name field must be provided in a camelCase format and cannot include special characters.
Label
The label field allows us to set a User Friendly string to be used as the label for this option in the App Accelerator form.
By default, if we don’t set a label, the name field will be used as the field's label.
Default Value
This field allows us to define a default value for our option. This can be very helpful when a sane default value is available, but we still want the user to be able to customize the field if needed.
Input Type
This field allows us to specify the type of field we want to expose for this option. There are many different input types available such as:
- text (default) – a single line text box
- textarea – a multi-line string input field
- checkbox – allows an easy and elegant checkbox based list for either single or multi option inputs
- select – a dropdown list
- radio – a radio button list for single option inputs
Data Type
This field allows us to define the type of data we expect to receive in this input.
The valid types for this are, string, number, and boolean.
We can also accept an array value by simply using square brackets around the type we want, for example dataType: [string] for an input that expects an array of strings.
Required
This field accepts a boolean value, which denotes whether a field is required to be filled in by the user or not.
This can be extremely helpful when building out complex and rich accelerators, where we can decide as the architects, what a user needs to fill out for us to have enough information to template the new project for them.
Display
This field accepts a boolean value, which denotes whether this field should be displayed to the end user or not.
By using this value along with the default value field, we can hide specific inputs, but make the value usable by the templating engine.
Choices
This field can be used when the input type is of the types select, radio or checkbox.
This field is an array of objects with 2 keys, text and value. The text key is what will be shown to the user and the value key is what will be passed to the underlying engine.
Depends On
This field is an object with 2 keys, name and value.
This allows us to specify a dependency on a different option for whether this option will be visible or not.
A very common use case for this is a boolean checkbox field called, for example, “Advanced Options”. When that value is set to true, we can display the additional options we support, while keeping the initial form more palatable and easy to approach for the general use cases.
Validation Regex
This field allows us to define a regex in SpEL format which will be used to validate the input provided by the end user.
Using this field is a very good practice, to limit user input mistakes, and to ensure that the generated project contains a valid format.
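To tie the option fields together, here is a small Python sketch that models option definitions as dictionaries and applies the rules described in this section. It is an illustrative model, not the accelerator engine: the key spellings inputType, dependsOn and validationRegex, the defaults of text and string, and the sample options are assumptions on my part, and Python's regex dialect stands in for whatever the platform uses.

```python
import re

INPUT_TYPES = {"text", "textarea", "checkbox", "select", "radio"}
DATA_TYPES = {"string", "number", "boolean"}
CAMEL_CASE = re.compile(r"^[a-z][a-zA-Z0-9]*$")


def check_option(option: dict) -> list:
    """Return a list of problems found in one option definition (empty if none)."""
    problems = []
    name = option.get("name")
    if not name:
        problems.append("name is required")
    elif not CAMEL_CASE.match(name):
        problems.append(f"name {name!r} is not camelCase")
    input_type = option.get("inputType", "text")
    if input_type not in INPUT_TYPES:
        problems.append(f"unknown inputType {input_type!r}")
    data_type = option.get("dataType", "string")
    # An array type is written as a one-element list, e.g. ["string"].
    if isinstance(data_type, list) and len(data_type) == 1:
        scalar = data_type[0]
    else:
        scalar = data_type
    if not isinstance(scalar, str) or scalar not in DATA_TYPES:
        problems.append(f"unknown dataType {data_type!r}")
    if "choices" in option and input_type not in {"select", "radio", "checkbox"}:
        problems.append("choices only apply to select, radio or checkbox inputs")
    return problems


def is_visible(option: dict, answers: dict) -> bool:
    """Apply dependsOn: show the option only if the named option has the given value."""
    depends_on = option.get("dependsOn")
    if depends_on is None:
        return True
    return answers.get(depends_on["name"]) == depends_on["value"]


def answer_is_valid(option: dict, answer) -> bool:
    """Apply validationRegex (if any) to the whole answer."""
    regex = option.get("validationRegex")
    return regex is None or re.fullmatch(regex, str(answer)) is not None


# Hypothetical options, mirroring what would be written in YAML.
options = [
    {"name": "advancedOptions", "inputType": "checkbox", "dataType": "boolean"},
    {"name": "javaVersion", "inputType": "select",
     "choices": [{"text": "Java 11", "value": "11"}, {"text": "Java 17", "value": "17"}],
     "dependsOn": {"name": "advancedOptions", "value": True}},
    {"name": "Bad-Name", "inputType": "slider"},
]
for opt in options:
    print(opt["name"], check_option(opt), is_visible(opt, {"advancedOptions": False}))
```

The same three questions (is the definition well formed, should the field be shown, is the answer acceptable) are a useful checklist when reviewing a real options section by hand.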
Options which always exist
While nearly everything is up to the architect, there is currently one option which will always be included even when not specified: “artifactId”. This option is displayed in the request form with the label “Name” and the description “Provide a name for your new project”.
Imports Section
The next section in the accelerator yaml file is an optional field that was added in TAP 1.2.
The imports section allows us to import one or more fragments (more on fragments later on in this post) into our accelerator, allowing for reuse of generalized functions across multiple accelerators.
This section receives an array of objects referencing the app accelerator fragments we want to use in this accelerator.
Each entry in this section must include a key “name” which will include the name of the fragment we want to import and can optionally include an additional key named “exposes”.
The exposes key accepts an array of objects that allows us to define which options from within the imported fragment we want to expose in the current accelerator. We can define specific dependent options here as well via the dependsOn key, and in case of name clashes between different fragments, we can rename an option within the context of our accelerator via the “as” key.
Engine Section
The Engine section of our accelerator yaml file, is where we define the actual templating steps that will utilize the inputs the user has provided (defined in the options section and those imported via fragments) to generate our project for the requesting user.
When the accelerator engine executes the accelerator, it produces a ZIP file containing a set of files. The purpose of the engine section is to describe precisely how the contents of that ZIP file are created.
When you run an accelerator, the contents of the accelerator produce the result. It is made up of subsets of the files taken from the accelerator’s root directory and its subdirectories. You can copy the files as is, or transform them in a number of ways before adding them to the result.
As such, the YAML notation in the engine section defines a transformation that takes as input a set of files in the root directory of the accelerator, and produces as output another set of files, which are put into the ZIP file.
Every transform has a type. Different types of transform have different behaviors and different YAML properties that control precisely what they do.
A few key terms and concepts must be understood before discussing the different transformation types we have at our disposal.
SpEL Expressions
SpEL, or in full the Spring Expression Language, is used within our accelerator yaml file to express logic and conditionals within the yaml structure itself.
SpEL is a powerful expression language that supports querying and manipulating an object graph at runtime. The language syntax is similar to Unified EL but offers additional features, most notably method invocation and basic string templating functionality.
While similar to OGNL and MVEL, SpEL was created and is supported within all of the Spring portfolio, making it a logical choice for VMware to implement within the product.
The use of SpEL allows us to for example make options and transformation steps conditional based on specific criteria, it allows us to perform regex validations and much more.
One key thing to note, which can be a bit difficult until you get used to it, is that SpEL notation includes quotes and other special characters that we need to escape correctly in our accelerator yaml file so that it is always a valid yaml document. If we don’t do this, our accelerator will fail to execute due to parsing issues.
While learning SpEL notation can take a bit of time, the power and flexibility we gain through it make the learning curve a worthwhile endeavor.
Conditional Transformations
As mentioned above, just like with options and imports, we can add conditional logic to a transformation step within the accelerator engine specification.
This allows us to, for example, include a pom.xml file if the user selected a value in the options called useMaven, and to not include it otherwise.
Conditional transformations are a key functionality in most modular and custom accelerators, allowing us to write more generic accelerators that can produce different subsets of projects based on inputs, rather than needing to create a whole new accelerator for every permutation we may want to expose to our end users.
Now let's take a look at the transformation types we have at our disposal.
As mentioned, there are multiple (14!) transformation types.
Combo
This is a meta transform type. VMware refers to the combo transform as the “swiss army knife” of transforms.
This transformation type, allows for a simpler and more natural structure of our other transformations nested beneath it, which get “combined” together into a single combo transformation.
A combo transform allows us to combine the behaviors of Include, Exclude, Merge, Chain, UniquePath, and Let transformations in a way that feels natural.
Include
When performing transformations, we often only want to run a specific transform on a set of files, and sometimes, based on different inputs and transformation logic, we may want the zip file of the generated project to only include a subset of the files in the base repository.
The Include transform retains files based on their path, letting in only those files whose path matches at least one of the configured patterns (configured via a simple SpEL expression). The contents of files, and any of their other characteristics, are unaffected.
You will almost never use an include transform on its own, but combined together with other transformation steps that manipulate the selected files, this becomes one of the key steps which almost all accelerators will end up having.
Exclude
The Exclude transform retains files based on their path, letting everything in except those files whose path matches at least one of the configured patterns (configured via a simple SpEL expression). The contents of files, and any of their other characteristics, are unaffected.
This is very similar to the include transformation, as it is almost never used on its own but is very useful and commonly used, when combined together with additional transformation steps.
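The retain and drop behavior of these two transforms can be modeled in a few lines of Python. This is only a sketch of the semantics described above: Python's fnmatch globbing stands in for the engine's own path pattern syntax (the two are not identical), and the file names are hypothetical.

```python
from fnmatch import fnmatch


def include(paths, patterns):
    """Keep only the paths that match at least one pattern."""
    return [p for p in paths if any(fnmatch(p, pat) for pat in patterns)]


def exclude(paths, patterns):
    """Keep every path except those that match at least one pattern."""
    return [p for p in paths if not any(fnmatch(p, pat) for pat in patterns)]


files = ["README.md", "pom.xml", "src/main/java/App.java", "config/workload.yaml"]
print(include(files, ["*.java", "pom.xml"]))
print(exclude(files, ["*.md"]))
```

In both cases only the set of paths changes; file contents pass through untouched, which is why these transforms are normally paired with a step that actually modifies the selected files.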
Merge
The merge transformation type, is a very useful “meta” transformation mechanism.
A Merge takes the outputs of several transforms executed independently on the same input source set and combines or merges them together into a single source set.
Basically this allows us in the engine to specify many independent transformations, and have the end results of the transformations be merged together to provide the final output of the project.
A very useful and common use of the merge transformation is to use merges with overlapping contents to apply a transformation to a subset of files and then replace these changed files within a bigger context.
While it is great that we can use overlapping paths within a merge transformation, it is important to note that when this occurs, the default behavior is to use the last instance of the conflicting file path as the final result.
This means that if we have 2 steps manipulating a README.md file within our merge transformation block, by default the README.md outputted by the second transformation step will be the one we receive in the final project's files.
The Merge Conflict resolution options however are configurable and we will discuss that further on in this document.
Chain
The chain transformation type, is a very useful “meta” transformation mechanism.
While a merge runs each step within it independently and then merges the outputs, in a chain transformation, steps are run sequentially and the output from previous steps is fed into the following steps.
A common example of the usefulness of chain transformations is with the ReplaceText transform. Used by itself, the ReplaceText transform replaces text strings in all the accelerator input files. What if you wanted to apply this replacement to only a subset of the files? You can use an Include transformation step to select only a subset of files of interest and chain that subset into a ReplaceText transformation step.
Using chains is almost inevitable when building a production ready accelerator, and the freedom it gives us is huge!
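The difference between merge and chain is easiest to see side by side. The Python sketch below models a set of files as a dictionary of path to content and each transform as a function over that dictionary. It is a toy model of the behavior described in this post (including the last-one-wins default for merges), not the engine's implementation, and the helper names and file contents are invented for the example.

```python
def only(suffix):
    """Transform that keeps files whose path ends with suffix (a stand-in for Include)."""
    return lambda files: {p: c for p, c in files.items() if p.endswith(suffix)}


def replace_text(old, new):
    """Transform that replaces old with new in every file it receives."""
    return lambda files: {p: c.replace(old, new) for p, c in files.items()}


def identity(files):
    """Transform that passes every file through unchanged."""
    return dict(files)


def chain(*steps):
    """Feed the output of each step into the next one."""
    def run(files):
        for step in steps:
            files = step(files)
        return files
    return run


def merge(*branches):
    """Run every branch on the same input; on a path clash the later branch wins."""
    def run(files):
        result = {}
        for branch in branches:
            result.update(branch(files))
        return result
    return run


source = {"README.md": "Hello demo", "pom.xml": "<name>demo</name>"}
engine = merge(identity, chain(only(".md"), replace_text("demo", "my-app")))
print(engine(source))
```

Here the chain narrows the input to the README and rewrites it, while the merge puts that rewritten file back on top of the untouched copy of everything else, so pom.xml keeps its original text.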
Let
The let transformation type is a mechanism that wraps another transformation step and provides it with additional variables that are available only within the scope of that transformation.
We can define variables via a simple array of objects, where each object includes a “name” key with the name of our local variable, and an “expression” key, where we can provide a SpEL expression to be used as the value of that local variable.
This transformation, is especially helpful when using transformations such as YTT or InvokeFragment, in which we are executing an external mechanism, which often times may have many options we simply want to hard code in our specific accelerator.
This is a common case when using fragments which we will cover in depth later on in this blog post.
Invoke Fragment
As we discussed above, and will dive deeper into later, we can import “sub-accelerators” as sorts of reusable functions into our accelerators via the fragments mechanism.
While fragments are imported much like functions or libraries in a programming language, the InvokeFragment transformation is the mechanism by which we actually run them.
At the point of invocation, all currently defined variables are made visible to the invoked fragment. Therefore, if it was imported in the most straightforward manner, a fragment defining an option myOption is defining an option named myOption at the accelerator level, and the value provided by the user is visible at the time of invocation.
To override a value, or if an imported option has been exposed under a different name, or not at all, you can use a let construct when using InvokeFragment. This behaves like the Let transform: for the duration of the fragment invocation, the variables defined by let have their newly defined values. Outside the scope of the invocation, the regular model applies.
InvokeFragment also allows us to cope with situations where the base file structure expected by a fragment is different than that which is present in the accelerator itself. This is done via the anchor field, which gives us a “chroot” like behavior for the files of our accelerator in the context of the fragment's execution.
Replace Text
This is one of the most important and highly used transformation types available.
This allows us a “sed” like transformation type, which can replace a specific string with another string, or can also do more complex tasks such as replace text found based upon a regex, with the contents of a file.
We also have a set of helper functions that we can use with this transformation, that make certain tasks much easier.
The additional helper functions we have include a variable called “files” which has a function defined on it “contentsOf” which allows us to replace a string with the contents of a file.
We also have helper functions for converting the case of our strings. We can convert between camel, kebab, pascal and snake casing by using a function named in the format <from>2<to>, for example camel2kebab or snake2pascal.
As this transformation manipulates the text data of our files, we must use this with caution as unlike other transformations, this is pretty much a “free for all” and we must test our accelerators adequately to make sure we don’t mess up formatting.
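To show what the case conversion helpers are expected to produce, here are Python equivalents of the two examples named above. These are my own reimplementations for illustration; the platform's helpers may treat edge cases such as acronyms or digits differently.

```python
import re


def camel2kebab(value: str) -> str:
    """myNewProject -> my-new-project"""
    # Insert a dash before every upper-case letter except at the start.
    return re.sub(r"(?<!^)(?=[A-Z])", "-", value).lower()


def snake2pascal(value: str) -> str:
    """my_new_project -> MyNewProject"""
    return "".join(word.capitalize() for word in value.split("_") if word)


print(camel2kebab("myNewProject"))
print(snake2pascal("my_new_project"))
```

A typical use is taking the project name the user typed once and deriving the variants needed for a package name, a class name and a Kubernetes resource name.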
Rewrite Path
The RewritePath transformation type allows you to change the name and path of files without affecting their content.
This transform has 3 main fields which we can configure.
The first field is “regex”. This is where we define the regex in SpEL format for finding the source files whose paths we want to rewrite.
The second field is “rewriteTo”. This is where we define a SpEL expression that describes how to rewrite the path.
The third and final field is “matchOrFail”. This field accepts a boolean value and defaults to false. It defines what happens if the regex doesn’t match a file. Setting this to true can help prevent misconfigurations if you expect all files coming in to this transformation to match the regex.
This is commonly used for changing the paths of source code files based on, for example, an application name which the end user inputs. Another use is moving a project specific README.md file, which you may have in a sub-folder, to the root of the generated project. This is helpful if you want one README.md in the accelerator repo that is dedicated to the accelerator itself and a different one for the generated project.
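The three fields map naturally onto a small function. The Python sketch below models the behavior described above; it assumes the regex must match the entire path, and it uses Python's group references in place of the SpEL expression the real rewriteTo field takes. The paths are hypothetical.

```python
import re


def rewrite_path(paths, regex, rewrite_to, match_or_fail=False):
    """Rename each path that fully matches regex; leave other paths unchanged.

    rewrite_to uses Python group references (for example the first group),
    standing in for the SpEL expression the real transform takes.
    """
    pattern = re.compile(regex)
    result = []
    for path in paths:
        match = pattern.fullmatch(path)
        if match:
            result.append(match.expand(rewrite_to))
        elif match_or_fail:
            raise ValueError(f"{path!r} does not match {regex!r}")
        else:
            result.append(path)
    return result


paths = ["project/README.md", "src/main/java/demo/App.java"]
# Move everything under project/ up to the root of the generated project.
print(rewrite_path(paths, r"project/(.*)", r"\g<1>"))
```

Calling the same function with match_or_fail=True on these paths raises an error for the Java file, which mirrors how the flag surfaces an unexpected input early instead of silently passing it through.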
Open Rewrite Recipe
While using the power of both the rewrite path and replace text transformations we can really manipulate any file content and file structure, when dealing with Java specifically we have a better option.
The OpenRewriteRecipe transformation type allows you to apply any Open Rewrite Recipe to a set of files and gather the results.
Open Rewrite is an open source project which enables large-scale distributed source code refactoring for framework migrations, vulnerability patches, and API migrations with an early focus on the Java language.
While the Open Rewrite project has many different categories of recipes, App Accelerator currently supports only the Java related recipes.
The Open Rewrite version used is 7.24.0 and Java code is interpreted based on the Java 11 grammar, so if you are using a different Java version, be warned that there could be some unexpected behaviors. The official Open Rewrite docs should be consulted to see all of the available recipes, and how to use them.
While only the Java subset of recipes is currently supported, this already includes 25 different recipes which can be extremely helpful when constructing an accelerator for a Java based application.
The scope of this transformation is much more limited than that of the rewrite path and replace text transformations. However, because it does not do text based manipulation but rather understands the data it is dealing with, we get a lot of advantages such as type safety. It also correctly deals with imports in our Java code, and it understands fully vs. non-fully qualified names and treats them accordingly, without us needing to write replace text functions for every permutation in which, for example, a package name could be written.
The scope of Open Rewrite as a project is growing and I hope we will see more integration with it and its different recipes, as well as similar tools for other languages and frameworks over time.
YTT
This is by far one of my favorite and, in my opinion, most useful transformation types we get access to in App Accelerator.
YTT (Yaml Templating Tool) is an amazing project which is part of the Open Source Carvel toolset, developed and maintained by VMware.
This transform by default will take all files passed to it as well as expose all variables in the accelerator as data values, and execute YTT as an external process.
While we don’t need to pass any parameters to a YTT transformation, we do have access to an optional parameter “extraArgs”, which receives an array of strings that can be plain strings or SpEL expressions and will be passed along to the invocation of the YTT process.
Similar to the advantages of using OpenRewriteRecipe for Java files, the YTT transformation gives us amazing capabilities for transforming and manipulating yaml files in a data structure aware manner.
As everything we do today seems to include some yaml, especially in the Kubernetes ecosystem, having a tool like YTT at our disposal inside App Accelerator opens up huge capabilities.
Use Encoding
Whenever we are talking about manipulation of textual data, for example in the ReplaceText transformation, the engine needs to be aware of the encoding to use for the file.
Because App Accelerator aims to be simple to consume while still offering customization capabilities, a default encoding of UTF-8 is assumed.
If any files must be handled differently, use the UseEncoding transform to annotate them with an explicit encoding.
While the vast majority of files we deal with are UTF-8 encoded, it is great to see that even niche use cases like this are covered in the platform, allowing us to be comfortable that we are covered with whatever we need to do.
Unique Path
This is a transformation type that can be used to ensure there are no path conflicts between files transformed. You can often use this at the end of a Chain transformation.
This transformation type receives a single parameter named “strategy”. The value of the strategy key can be any of the conflict resolution strategies supported in the platform which we will cover in the next transformation type.
If you are using the Combo meta transformation type, an implicit UniquePath transformation is embedded after the merge transformation.
Conflict Resolution
As discussed multiple times above, if you’re using Merge (or Combo’s merge syntax) or RewritePath transformations, a transformation can produce several files at the same path.
If and when this happens, the engine must take an action: Should it keep the last file? Report an error? Concatenate the files together? etc.
The ConflictResolution transformation type can be used to resolve these conflicts.
There are 6 different resolution strategies which can be defined via this transformation:
- Fail – when used, on the first occurrence of a path conflict, the accelerator will stop being processed
- UseFirst – when used, the first file produced at this path is the one that will be retained
- UseLast – when used, the last file produced at this path is the one that will be retained
- Append – when used, all files at the conflicting path are concatenated together in the order in which they were defined, from first to last
- FavorOwn – When using fragments, this will prefer the file coming from the current executing fragment if possible, with a fall back to the accelerator version otherwise.
- FavorForeign – When using fragments, this will prefer the file coming from the accelerator if possible, with a fall back to the fragment version otherwise.
We also have some defaulting behavior for both Combo transformations and for Chain transformations.
When using a Combo transformation, the default behavior is set to UseLast as the conflict resolution strategy. To change this implicit behavior, you can add at the end of the combo a field “onConflict” with the value of the conflict resolution strategy you would like to use.
When using a Chain transformation, the default behavior is implicitly set to Fail. To override this behavior, you can explicitly set a UniquePath transformation at the end of the chain transformation and specify the resolution strategy you would like to use.
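To make the file-level strategies concrete, here is a minimal Python sketch that models them. It is only an illustration of the behavior described above, not the accelerator engine's code: the function name, the exception class and the list of (path, content) pairs are all hypothetical, and FavorOwn and FavorForeign are left out because they depend on fragment context.

```python
class PathConflictError(Exception):
    """Raised by the Fail strategy on the first duplicate path."""


def resolve_conflicts(files, strategy="UseLast"):
    """Collapse (path, content) pairs so that every path appears only once.

    `files` is an ordered list of (path, content) tuples, in the order the
    transformations produced them. Returns a dict of path -> content.
    """
    resolved = {}
    for path, content in files:
        if path not in resolved:
            # First file seen at this path, no conflict yet.
            resolved[path] = content
        elif strategy == "Fail":
            raise PathConflictError(f"more than one file produced at {path}")
        elif strategy == "UseFirst":
            continue  # keep the file that is already there
        elif strategy == "UseLast":
            resolved[path] = content  # the newer file wins
        elif strategy == "Append":
            resolved[path] += content  # concatenate, first to last
        else:
            raise ValueError(f"unsupported strategy: {strategy}")
    return resolved


# Two merge branches both produce a README.md.
files = [
    ("README.md", "from the first branch\n"),
    ("pom.xml", "<project/>\n"),
    ("README.md", "from the second branch\n"),
]

for name in ("UseFirst", "UseLast", "Append"):
    print(name, repr(resolve_conflicts(files, name)["README.md"]))
```

In this model, the Combo default corresponds to calling the function with "UseLast" and the Chain default corresponds to "Fail".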
As you can see, the options for configuring an accelerator are nearly endless. Now that we understand the ideas and capabilities around our accelerators repository, let's take a look at how we connect that repository to the platform.
Accelerator CRD
As is the case with almost everything in TAP, we configure our application accelerators using CRDs.
The cluster on which we will add the custom resources will be the cluster on which TAP GUI and Application Accelerator are installed. In a multi cluster setup this will be the view profile based cluster and in a single cluster setup it will be on that single cluster.
The main CRD we will cover is the accelerator resource, which is a namespaced resource that must be created for each accelerator we want to make available in the platform.
This resource must be created in the namespace for which the App Accelerator UI deployment is configured, which by default is accelerator-system.
Because this is a standard kubernetes CRD, we can manage our Accelerator CRs via any mechanism we would like such as imperative kubectl commands, CICD tooling or a GitOps mechanism like Kapp Controller, ArgoCD or FluxCD.
The spec of the accelerator CRD is extremely simple and basic with just the right amount of options in my opinion.
Like all kubernetes resources, we have a standard metadata section where we can specify labels, annotations, the name of the resource and the namespace in which we want to deploy the resource.
Next within the spec itself we have 3 main types of fields:
Visualization Fields
The visualization fields are similar to the options one could define in the accelerator.yaml file under the accelerator section.
This includes the following optional fields:
- tags – An array of strings that will be applied as filtering strings to the accelerator within the platform
- iconUrl – A URL for an image to represent the Accelerator in a UI
- displayName – A short descriptive name used for an Accelerator as the label
- description – A longer description of an Accelerator
Setting these in the CRD or in the accelerator.yaml file within the repo is purely a matter of personal preference. I personally prefer to define them in the repo itself and keep my CR manifests as simple as possible.
Git Source Fields
The Git section is the section in which most of our configuration is done in a typical situation.
This is the section in which we define the connection between the accelerator CR and the Git repository where our accelerator.yaml and templated repo resides.
There are many different options for configuring the git source, most of which map directly to fields in the Flux Source Controller GitRepository CRD, which is used in the backend to pull and monitor our git source for updates.
The fields we can configure are:
- git.url – The repository URL, can be a HTTP/S or SSH address.
- git.interval – The interval at which to check for repository updates (defaults to 10 minutes)
- git.subPath – When an accelerator is not at the root of the repository, subPath can be used to specify the root directory of the accelerator
- git.secretRef – If the source repository requires authentication, this should be set to the secret name containing the git credentials. For HTTPS repositories, the secret must contain user name and password fields. For SSH repositories, the secret must contain identity, identity.pub, and known_hosts fields.
- git.ref – an object whose keys determine which Git reference is used to pull the correct version of the accelerator repository:
  - git.ref.branch – the Git branch to check out (defaults to master)
  - git.ref.tag – the Git tag to check out. This field takes precedence over the branch value.
  - git.ref.semver – the Git tag semver expression used to find the latest relevant tag. This field takes precedence over the tag value.
  - git.ref.commit – the Git commit SHA to check out. This field has the highest priority.
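The precedence between the git.ref fields is easy to mix up, so here is a small Python sketch of the rules as listed above. The function name and the dictionary shape are hypothetical and only for illustration. It models the described precedence and is not how the Flux Source Controller is actually implemented.

```python
def effective_git_ref(ref=None):
    """Return (kind, value) for the reference that wins.

    `ref` mirrors the git.ref object: a dict that may contain any of the
    keys "branch", "tag", "semver" and "commit".
    """
    ref = ref or {}
    # Highest priority first: commit > semver > tag > branch.
    for kind in ("commit", "semver", "tag", "branch"):
        if ref.get(kind):
            return kind, ref[kind]
    # Nothing set: fall back to the default branch mentioned above.
    return "branch", "master"


print(effective_git_ref({"branch": "main", "tag": "v1.0.0"}))
print(effective_git_ref({"tag": "v1.0.0", "semver": ">=1.0.0 <2.0.0"}))
print(effective_git_ref())
```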
While, as we will see in a moment, we can use accelerators without a Git repository, using Git as the source of our accelerators is highly recommended and much easier to maintain long term.
OCI Bundle Source Fields
The final section of configuration options in the accelerator CRD is nested under the “source” key.
Like with most other parts of the platform, we can use an OCI bundle which is generated via the imgpkg tool from the carvel toolset, to store our manifests inside a container registry instead of within a Git repository.
While this is supported for all use cases, the main use case for it, and the only one I would personally consider using it for, is quick iteration on an accelerator when doing the initial scaffolding.
The quick iteration workflow is performed in a similar manner to the VS Code local iteration workflow. The Tanzu CLI accepts a flag “--local-path”, which we point at a path on our local filesystem, and a flag “--source-image”, where we define the image URI to which the auto generated OCI bundle should be pushed. This process will automatically generate the OCI bundle, push it to our registry (you must be logged in to the registry on your machine via docker login), and configure the accelerator CR to point to that image URI.
The fields available for configuration under the source key are:
- source.image – This is a reference to an image in a remote registry.
- source.imagePullSecrets – ImagePullSecrets contains the names of the Kubernetes Secrets containing registry login information to resolve image metadata.
- source.interval – the interval at which to check for repository updates
- source.serviceAccountName – the name of the Kubernetes ServiceAccount used to authenticate the image pull if the service account has attached pull secrets.
This can make local iteration easier, especially in organizations with stricter configurations on the Git server, which can delay the time it takes to be able to test out the accelerator.
As you can see, while simple and concise, the accelerator CRD offers us just the right level of configuration knobs, to define from where, and how to access the source code of the accelerator.
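To tie the fields together, here is a rough Python sketch that assembles an Accelerator manifest from the fields covered above and prints it as JSON, which kubectl accepts just like yaml. Treat it as a sketch under assumptions: the apiVersion value is my assumption and should be verified against your cluster (for example with kubectl api-resources), the helper name, accelerator name and repository URL are hypothetical, and I am assuming an accelerator uses either the git section or the source section, not both.

```python
import json

# Assumption: verify the API group and version on your own cluster.
API_VERSION = "accelerator.apps.tanzu.vmware.com/v1alpha1"


def accelerator_manifest(name, git_url=None, branch=None, image=None,
                         namespace="accelerator-system",
                         display_name=None, tags=None):
    """Build an Accelerator CR as a plain dict."""
    if bool(git_url) == bool(image):
        raise ValueError("set exactly one of git_url or image")

    spec = {}
    # Optional visualization fields.
    if display_name:
        spec["displayName"] = display_name
    if tags:
        spec["tags"] = list(tags)

    if git_url:
        # Git source fields.
        spec["git"] = {"url": git_url}
        if branch:
            spec["git"]["ref"] = {"branch": branch}
    else:
        # OCI bundle source fields.
        spec["source"] = {"image": image}

    return {
        "apiVersion": API_VERSION,
        "kind": "Accelerator",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


manifest = accelerator_manifest(
    "demo-accelerator",                                   # hypothetical name
    git_url="https://git.example.com/demo-accelerator.git",  # hypothetical URL
    branch="main",
    tags=["java", "demo"],
)
print(json.dumps(manifest, indent=2))
```

The printed document can be saved to a file and applied with kubectl, or managed through whichever GitOps tooling you prefer.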
Fragments
After tossing around the term “fragment” many times throughout this post so far, let's now actually look at what a fragment is, why we would use them, and how we can use them.
Why do we need fragments
When we build out more and more accelerators, one of the things we can notice pretty quickly, is that there is often a lot of repetition and copy paste work involved between our accelerators.
More often than not, our accelerators will contain at least a few shared aspects, whether that be options or transformations, and it can become difficult to manage a fleet of accelerators at scale without being able to pull out shared logic and reuse it easily within all of the relevant accelerators.
This is why App Accelerator introduced a composition feature in the tool that allows re-use of parts of an accelerator, named Fragments.
What makes up a fragment
A great aspect of how this feature was implemented, is that a fragment is designed to look and behave nearly the exact same way as an accelerator does. It is made up of a set of files in a git repository, and contains an accelerator.yaml, with options, imports and engine sections exactly the same as within an accelerator.
The difference is that they are exposed to the platform via a different type of CRD, namely a fragment.
How is a fragment added to the platform
The fragment CRD is a stripped down version of the accelerator CRD, with the same git key under the spec section allowing us to point at our source code in a git repository in whichever manner we would like. It also contains a displayName field which provides a short and descriptive name for the fragment.
All other fields are not present in this CR, as you don’t invoke a fragment directly, nor is it visible in the catalog of App Accelerator. It also doesn’t support being based on an OCI bundle and requires the source to be a git repository.
Example use case for fragments
While the examples are endless, some common examples for fragments could include:
- Backstage configuration files – this could be an easy way to manage templates for different backstage configuration settings for components, domains, systems, apis etc. This then can be imported into any accelerator to allow configuration options for managing the backstage configuration yaml files.
- Language version selection – In many organizations, we may want to standardize on a set of specific versions of a language or framework we support. While we could give a dropdown, for example in a Java based accelerator, to select the Java version to use, and the same goes for any other language or framework, wouldn’t it make a lot more sense to pull this config out of our accelerators and simply import this fragment into all of the accelerators based on that language? This would allow us to easily update the list of versions we support in one central place and not need to update dozens of accelerators any time we want to add or remove support for a version.
- Live update Tiltfile – we can extract the Tiltfile to a separate fragment and simply import it into our accelerators that support live update functionality. This allows us to update the base Tiltfile in a central location and simply reference it within all of our relevant accelerators.
- TAP Workload Yaml – the number of permutations of how one could configure a workload CR are huge with the OOTB supply chains, let alone with custom supply chains. Pulling out the workload yaml templating into a separate fragment allows us to make changes and expose new options for configuration in a centralized way and gain the benefits automatically across all of our accelerators.
The options are truly endless, but I hope that the above examples give you an idea of how fragments could be used to help simplify and accelerate the process of building and maintaining a fleet of accelerators.
Example Custom App Accelerator
I have put together a few custom app accelerators that implement a lot of the different transformation options, input types and configuration capabilities, as well as some heavy usage of fragments to emphasize the benefits they can bring.
The example accelerators and fragments are available in a public GitHub repository and will be updated over time with more and more examples.
If you have any questions, suggestions or things you would like to see in that repo, feel free to open up issues or PRs!
Now that we have covered what an accelerator is, how we build one, how we add one to our platform, and also how we can use fragments to make them composable and enable reuse of shared functionality, let's now take a look at App Accelerator from a consumption perspective.
How to invoke an accelerator
An accelerator can be invoked via 3 main interfaces where 2 of them are UI based wizards and the other is a CLI based mechanism.
Let's take a look at the features, pros and cons of each of these different interfaces, and try to understand in which use cases each of these tools makes sense.
TAP GUI
The main User interface for App Accelerator is provided via a custom Backstage plugin provided as part of TAP GUI.
This interface allows end users to view the catalog of available accelerators, filter based on tags or search for them via a search bar.
It also provides a button which can open up in a new tab the source repository that backs the specified accelerator, and also a button to choose an accelerator to use, which will open up the accelerators input form in a simple graphical UI.
If we take for example the OOTB accelerator for the sample “Tanzu Java Web App” the input form is very simple and basic:
Once we have filled in the options available in the accelerator, we can simply click next and will receive the following page:
In order to generate the final project we can simply click on the “Generate Accelerator” button which will run the accelerator engine for our new project and that will provide us with the following screen:
We can see in the right hand side, the logs of the accelerator generation task, and on the left we are provided with 2 options.
We can click the download button which will open a window to select where we want the zip file with our templated project to be saved on our machine and we are ready to start iterating on the code locally!
We can also click on the “Explore zip file” button which will open a file browser modal where we can view the generated accelerator files
Just to show what a more complex accelerator form may look like here is the form for one of my custom highly customizable accelerators for a Java web app that integrates with App SSO:
Overall, the experience using the TAP GUI interface is pretty great and simple!
While this plugin works great, and provides a very good UX, one thing that is becoming ever so clear these days in all sorts of polls and research, is that developers want to live within their IDE and want to stay within the boundary of their IDE as much as possible. This is where the next interface comes into play.
VS Code Extension (Beta)
In TAP 1.2 amongst all of the amazing new capabilities, we also got a new VS Code Extension which is for App Accelerator.
This plugin enables us to explore the available accelerators in our platform, and generate a new project from one of them.
The configuration of the plugin is extremely easy and we only need to provide the URL of our App Accelerator server’s Ingress:
Once we have configured the accelerator URL in the VS Code extensions settings, we can go to the “Tanzu App Accelerator” extension in the list on the left and view the list of accelerators.
Once we click on a specific accelerator we want to use, a form is opened on the right to generate the project:
As we can see, we have a very similar form UX as we get in the TAP GUI interface, however this is provided directly in the IDE!!!
The other great feature of this plugin is that when we click on the “Generate Project” button, we are prompted with a popup to select where we want the generated project to be saved, and once we set that, the accelerator will run, and a new window of VS Code will be opened with the workspace set to the newly generated project!
While there are some rough edges currently in the extension, the overall approach and UX is pretty awesome!
Being able to consume accelerators directly from the IDE is an amazing feature, and I really think that this is a perfect example of how VMware are really focusing in TAP on meeting the different personas where they are, and providing the integrations into each persona's natural ecosystem of tooling in a truly awesome way!
While UIs are great, and we often really like the experience they provide, some people or certain use cases, simply require a CLI or API based approach, and that is where the next interface comes into play.
Tanzu CLI – Accelerator Plugin
The Tanzu CLI has a plugin that is fully dedicated to App Accelerator.
Through this plugin, we can create, apply, update and delete accelerators and fragments.
We can use the CLI to view which accelerators we have:
We can then get the input options and details for a specific accelerator:
And now we can generate a project from the accelerator:
And that generated in my current directory a zip file with the newly generated project!
Summary
I am a big fan of the Application Accelerator functionality in TAP, especially with the new features that have been added in TAP 1.2!
I hope that this blog post has been helpful in organizing the subject of Application Accelerators for you, and has given you a bit more of an understanding of the features available within the tool, and of how extensible and customizable we can make our platform by creating custom accelerators that are well adapted and suited to your organization.
I am excited also to see where this project goes moving forwards in terms of new features and UX enhancements!
It is great to see how VMware have built a truly awesome technology, based on upstream open source plugins in backstage, and added that additional enterprise grade wrapper and feature enhancements to create a very cohesive and good experience for both the Enterprise architects or SREs that will define the accelerators, as well as for the developers who will be consuming the accelerators.
-
TAP And Helm – A Story Of YTT Magic
In this post I want to cover a really interesting scenario of customizing Tanzu Application Platform (TAP) I have been working on, and how YTT magician John Ryan helped me reach an elegant and really cool solution to an edge use case that I was really not sure how to handle.
General Use Case
The use case I was trying to solve was making TAP easier to start with and more brownfield friendly for customers that already are heavily invested in kubernetes and that manage their micro services via Helm Charts.
The goal was to have a supply chain that, instead of deploying a Knative Service, would deploy a Helm chart.
I wanted to reuse as much of the Out Of The Box (OOTB) supply chains as possible and really only change what was needed for my specific use case.
General Approach
The technical approach I wanted to use was to integrate with the FluxCD Helm Controller, which provides a CRD called HelmRelease, which defines as one would expect a Helm release.
The Helm release object itself points to a repository and a chart within it that it wants to deploy, some additional values around how to deal with upgrades and things of that sort, and most importantly for our use case, a field in which I can supply the values I want to use when deploying the Helm chart.
Expected UX
The way I imagined the UX to work, is that the platform administrator would install the Helm controller on the cluster, and then either the platform administrator or the Developer depending on the roles and responsibilities within the organization, would create a HelmRepository CR for the relevant helm repository in which the charts we want to deploy are located.
Once that was done, from a TAP workload perspective, the user would define as parameters on the Workload yaml, which chart they want to deploy, which version, from which repository, and also would define any values they would like to use when deploying the helm chart.
What was the first hurdle
One of the key differences when deploying a Helm chart versus, for example, a Knative Service is where the image reference goes. In a Knative Service, the path at which we need to update the image reference is well defined, because the API of the Knative Service itself is well defined. Every Helm chart, however, could expect the image to be placed in a different field altogether, as the user is simply defining a values file for a templating engine and is not actually working with a kubernetes native object.
My first thought was to simply hard code this to image.repository, as that is the default in many Helm charts. However, this doesn’t always work, and I decided that I wanted a more generic solution that could cover more scenarios.
How I planned to solve this hurdle
The idea I had in mind for solving this was to set the default to image.repository, as mentioned above, but to add another optional parameter a user could supply on the workload with a dot separated path (basically a JSON path) to the field in which we should update the image reference with the image URI generated by the supply chain.
The Main Challenge
After coming up with the general idea, I started working on the Cartographer ClusterConfigTemplate resource that in the end will generate the HelmRelease yaml manifest.
All was going well until I started to deal with how to configure the image path as a variable for the helm chart.
The challenge was basically how to generate a yaml overlay in YTT from the JSON path variable I was provided via the workload parameter.
To make this easier to understand, let's look at an example.
The manifest I am trying to manipulate looks something like this:
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: #@ data.values.workload.metadata.name
  namespace: #@ data.values.workload.metadata.namespace
spec:
  interval: 5m
  chart:
    spec:
      chart: #@ data.values.params.chart_name
      version: #@ data.values.params.chart_version
      sourceRef:
        kind: HelmRepository
        name: #@ data.values.params.chart_name
        namespace: #@ data.values.workload.metadata.namespace
      interval: 1m
  upgrade:
    remediation:
      remediateLastFailure: true
  values: #@ chart_values()
And the data values received by the template look something like:
workload:
  metadata:
    name: test-iterate
    namespace: default
params:
  image_key_path: example.awesome.custom_image.field
  chart_name: tomcat
  chart_version: 1.0.0
  chart_values:
    hello: 1
    testVar: test
    example:
      test: true
image: demo.image.url/example/test:1.0.0
The need is to take the value of data.values.image (demo.image.url/example/test:1.0.0) and overlay it onto chart_values, which is a yaml fragment, at the path specified in the image_key_path parameter.
Basically, the end result of the chart_values() function in the HelmRelease manifest I am templating would be in our case:
hello: 1
testVar: test
example:
  test: true
  awesome:
    custom_image:
      field: demo.image.url/example/test:1.0.0
After trying on my own and not being able to think of a solution, I reached out to John Ryan from the Carvel team to see if he had any ideas on how to solve this issue.
Luckily, John who always is happy to help and has an amazing way of thinking and reasoning about tech in general and YTT in particular, came to the rescue and we started discussing the use case, and trying to really break down the scenario and figure out how to solve it one piece at a time.
The first part we needed to solve was how to go from a json path base notation into the corresponding yaml fragment we could then use.
Basically, how to go from:
example.awesome.custom_image.field
To:
example:
  awesome:
    custom_image:
      field: demo.image.url/example/test:1.0.0
But as we were discussing this and trying to come up with the mechanism, John brought up a very valid point which needed to be addressed as well. In yaml, a key can also contain a dot. This makes simply splitting on every dot a less than optimal solution.
After back and forth discussions, we came up with the idea that if we were able to split on every dot that didn’t have a “\” before it, which is a standard escaping character, and then remove the “\” character from the key itself we would have a solid solution to this issue.
Very quickly John came back to me with a set of YTT functions he wrote that solve this part of the issue.
#@ def as_dict(parts, leaf):
#@   if len(parts) == 0:
#@     return leaf
#@   end
#@   return {parts[0]: as_dict(parts[1:], leaf)}
#@ end

#@ def split_dot_path(path):
#@   # "replace any dot that does NOT have a leading slash with ':::'
#@   path = regexp.replace("([^\\\])\.", path, "$1:::")
#@   path = path.replace("\.", ".")   # consume escaping of '.'
#@   return path.split(":::")
#@ end
Basically, we split the string at every dot that is not preceded by a backslash, which is checked via a regular expression, and then we pass the resulting list into a function that converts it into a nested dict, which in the end gives us the needed output.
With these 2 functions defined, we can run something like:
#@ path = data.values.params.image_key_path
#@ image = data.values.image
#@ image_path = as_dict(split_dot_path(path), image)
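Because YTT functions are written in Starlark, which is a dialect of Python, the same logic can be tried out in plain Python when experimenting with different paths. The following is a rough port under that assumption, using Python's re.sub in place of YTT's regexp.replace. The sample paths and image URI are only illustrative.

```python
import re


def as_dict(parts, leaf):
    """Turn ["a", "b", "c"] plus a leaf value into {"a": {"b": {"c": leaf}}}."""
    if len(parts) == 0:
        return leaf
    return {parts[0]: as_dict(parts[1:], leaf)}


def split_dot_path(path):
    """Split on every dot that is not escaped with a backslash."""
    # Mark each dot that is NOT preceded by a backslash with ':::'.
    path = re.sub(r"([^\\])\.", r"\1:::", path)
    # Consume the escaping of the dots that remain.
    path = path.replace("\\.", ".")
    return path.split(":::")


image = "demo.image.url/example/test:1.0.0"

# Plain path: every dot is a separator.
print(as_dict(split_dot_path("example.awesome.custom_image.field"), image))

# Escaped dots stay inside the key name.
print(split_dot_path(r"registry\.example\.com.image"))
```

The second call shows why the escaping matters: the key registry.example.com survives as a single path segment.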
Now that is pretty slick and is extremely useful!!!
But then came challenge number 2. Now we needed to tackle the task of how to overlay this newly generated yaml for the image on top of the other chart values the user has supplied.
This would have been pretty simple if our chart values we needed to overlay were static and were specified in the YTT template itself as we could simply add something like:
#! original config: want to preserve all this...
---
chart_values:
  hello: 1
  testVar: test
  example:
    test: true

#@overlay/match by=overlay.subset({"chart_values":{}})
---
#@overlay/match-child-defaults missing_ok=True
chart_values: #@ image_path
This would work perfectly, except that the chart values we need to overlay are themselves located in a data value and are not static.
So my thought was to simply use the programmatic way of using the overlay mechanism in YTT which would basically look something like:
chart_values: #@ overlay.apply(data.values.params.chart_values, image_path)
In the example above we are basically saying to overlay what exists within the params.chart_values data value with the yaml we have defined in the variable image_path from above.
While I was hopeful this would work, sadly it did not.
The reason it didn’t work was that YTT is a very safe tool, and it does its best to make sure you don’t make mistakes when manipulating yaml.
One of the mechanisms it has in place to implement this safety is that when overlaying on a yaml, by default, if you define a key that doesn’t exist in the base yaml, it won’t allow you to overlay it. The reasoning behind this is that in many cases you may have made a typo: you really meant to update spec.replicas, for example, but in the overlay you wrote spec.replica by mistake, without the “s”.
Many other tools would simply comply and would add a new field called replica which in the end would not give you the solution and end result you were looking for.
While this is a great default, that can save a lot of hassle battling with unfortunate typos, we also need a way around this which would allow an overlay to define a new key as well, if and when needed, but it should be explicit that we know that the key may not exist and that we are ok with that.
The way this is typically defined in YTT is by adding an annotation, as in the static example from above:
#@overlay/match-child-defaults missing_ok=True
This annotation which we place above the overlaying step itself, basically says that we are aware that a key may be missing and that we are ok with that and want YTT to proceed with the overlay we have provided whether a key exists or not.
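To illustrate the difference between the strict default and missing_ok, here is a simplified Python model of the behavior just described. This is not how YTT implements overlays (the real mechanism also handles arrays, matchers and more). The function name apply_overlay and its missing_ok flag are hypothetical names I am using purely for illustration.

```python
import copy


def apply_overlay(base, patch, missing_ok=False):
    """Recursively apply `patch` on top of `base` and return a new dict.

    With missing_ok=False, a key in the patch that does not exist in the
    base is treated as a probable typo and rejected.
    """
    result = copy.deepcopy(base)
    for key, value in patch.items():
        if key not in result:
            if not missing_ok:
                raise KeyError(f"key {key!r} does not exist in the base yaml")
            result[key] = copy.deepcopy(value)
        elif isinstance(result[key], dict) and isinstance(value, dict):
            # Both sides are maps: merge their children.
            result[key] = apply_overlay(result[key], value, missing_ok)
        else:
            # Leaf value: the overlay replaces the base.
            result[key] = copy.deepcopy(value)
    return result


chart_values = {"hello": 1, "testVar": "test", "example": {"test": True}}
image_path = {
    "example": {
        "awesome": {"custom_image": {"field": "demo.image.url/example/test:1.0.0"}}
    }
}

# Strict mode rejects the new "awesome" key, just like the typo protection.
try:
    apply_overlay(chart_values, image_path)
except KeyError as err:
    print("rejected:", err)

# With missing_ok the new key is added next to the existing ones.
print(apply_overlay(chart_values, image_path, missing_ok=True))
```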
While this solution is great when using the annotation based approach for overlaying, there is no comparable solution offered for the overlay.apply() programmatic way of doing an overlay which is what we are using in this case.
In the end after more testing and a lot of trial and error I worked out a solution to this issue. We can define a new function that returns our image_path value under a key that would be in our case chart_values, but would add that annotation of missing_ok=True above it, and then use that function inside our overlay.apply() call.
conf: #@ overlay.apply(data.values.params, image_path())
While this got us now very close to a solution, we still have one issue, which is that our new yaml that is exposed by this overlay.apply() function is not:
conf:
  hello: 1
  testVar: test
  example:
    test: true
    awesome:
      custom_image:
        field: demo.image.url/example/test:1.0.0
But rather it is:
conf:
  chart_values:
    hello: 1
    testVar: test
    example:
      test: true
      awesome:
        custom_image:
          field: demo.image.url/example/test:1.0.0
To me, this meant we probably just needed to change the overlay.apply() call to something like:
conf: #@ overlay.apply(data.values.params, image_path()).chart_values
But I was wrong. The value returned by the overlay.apply() call is not a struct, in which every key is an attribute of the parent object and can be referenced via dot notation. Instead, we are returned a YTT custom object type called a yamlfragment, which does not support dot notation because its keys are not attributes of the parent object.
While I did come up with a working solution for this, which I found by searching through the history of the Carvel Slack channel, when speaking with John he gave me an even better approach that I did not know was possible.
Basically, while we can’t use dot notation to reference a key in a yamlfragment, we can use the [] notation to reference it.
This meant that I could use something like:
conf: #@ overlay.apply(data.values.params, image_path())["chart_values"]
While this is already a good solution, I wanted to make it even more resilient and support scenarios where the user didn’t define any additional chart_values, or did not define an image_key_path because they are fine with our default path of image.repository.
The Full Solution
In the end, after adding in the defaulting logic and some checks to make sure I can support as many different permutations of inputs provided by the end user as possible, I had a fully working, and pretty cool, piece of YTT magic in my hands.
The final YTT template section that covers this solution in the end turned out to be:
#@ def as_dict(parts, leaf):
#@   if len(parts) == 0:
#@     return leaf
#@   end
#@   return {parts[0]: as_dict(parts[1:], leaf)}
#@ end

#@ def split_dot_path(path):
#@   # "replace any dot that does NOT have a leading slash with ':::'
#@   path = regexp.replace("([^\\\])\.", path, "$1:::")
#@   path = path.replace("\.", ".")   # consume escaping of '.'
#@   return path.split(":::")
#@ end

#@ def image_path():
#@   if hasattr(data.values.params, "image_key_path"):
#@     return data.values.params.image_key_path
#@   else:
#@     return "image.repository"
#@   end
#@ end

#@ chart_config = as_dict(split_dot_path(image_path()), data.values.images.image.image)
#@ config_source = data.values.params

#@ def chart_overrides():
#@   return data.values.params
#@ end

#@ def image_override():
#@overlay/match-child-defaults missing_ok=True
chart_values: #@ chart_config
#@ end

#@ def chart_values():
#@   if hasattr(data.values.params, "chart_values"):
#@     return overlay.apply(chart_overrides(), image_override())["chart_values"]
#@   else:
#@     return image_override().chart_values
#@   end
#@ end

#@ def helm_release():
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: #@ data.values.workload.metadata.name
  namespace: #@ data.values.workload.metadata.namespace
  labels: #@ merge_labels({ "app.kubernetes.io/component": "helm-release", "app.tanzu.vmware.com/release-type": "helm" })
spec:
  interval: 5m
  chart:
    spec:
      chart: #@ data.values.params.chart_name
      #@ if/end hasattr(data.values.params, "chart_version"):
      version: #@ data.values.params.chart_version
      sourceRef:
        kind: #@ data.values.params.chart_repo.kind
        name: #@ data.values.params.chart_repo.name
        #@ if/end hasattr(data.values.params.chart_repo, "namespace"):
        namespace: #@ data.values.params.chart_repo.namespace
      interval: 1m
  upgrade:
    remediation:
      remediateLastFailure: true
  values: #@ chart_values()
#@ end
Summary
As you can see, this was a challenging but really interesting use case. While it was not trivial to solve, YTT has all of the capabilities needed to handle these types of situations if we simply approach the tool with the right mindset.
YTT is truly amazing in my opinion because while it has a very easy to understand and easy to write mechanism for day to day simple tasks, it also has the full power of a programming language to extend the capabilities and possible use cases without needing to bring in other tooling.
The other part of YTT and Carvel in general that makes me love it so much, is the Carvel team itself.
I have worked with many open source projects, from the Kubernetes ecosystem and beyond, but have never encountered a team as eager to help and support the end users as the Carvel team.
With great technology, and great people building and leading the Carvel toolset, I really think that the Carvel tools are a must for those in the Kubernetes ecosystem who want to really step up their game and start exploring the true cutting edge of what is possible.
I really want to thank John Ryan again for his help in this scenario as well as many other times in the past!
For those interested in trying out this supply chain, with more details on how to use it, you can check out the dedicated Git repository where I have uploaded the supply chain and the Cartographer template that implement this solution.
-
Easy Customization of TAP 1.2
One of the greatest features in TAP 1.2 in my opinion is something that many have overlooked, including myself in my What’s new in TAP 1.2 blog post.
A big challenge with platforms such as TAP is how to go about customizing the platform when the defaults or configuration knobs provided by the vendor simply are not enough.
This applies to many different use cases and scenarios. In this blog post we will cover one key example that I think can be very valuable in many scenarios, which is Auto Rendering of Tech Docs.
Back in May 2022 with TAP 1.1, I wrote a blog post on how I configured this which can be viewed here for more details.
While that approach mentioned in the original post is still possible, in TAP 1.2 we have a much better solution.
TAP is configured utilizing the Carvel toolset and is bundled into packages that we deploy to our cluster.
Carvel packages are a really awesome solution for managing applications on Kubernetes and you can see some of my blog posts in the Carvel topic on this site, that go deeper into the mechanism and why it is so useful.
TAP has also chosen to utilize a method referred to by many now as a Meta-Package which is basically a single package that installs many other packages.
This is how you can install TAP via a single YAML manifest and a single command, but still get over 30 different applications and components installed and managed for you that build up the platform.
While the idea of a package is that the author will build it to accept all the inputs they believe should be configurable by the end user, sometimes the end user may disagree with this decision.
In Carvel packages, because we have access to YTT under the hood, an escape hatch mechanism was made which allows you to save a YTT Overlay in a secret, and then annotate the PackageInstall CR in your cluster with the secret name. By doing this, the YTT Overlays you save will be taken into account when rendering the YAML manifests during the package's reconciliation.
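As a minimal sketch of that escape hatch, the annotation on a PackageInstall looks roughly like the following. The annotation key is the one kapp-controller reads (it also appears in the meta package template later in this post); the service account name and version constraint are assumptions for illustration only.

```yaml
apiVersion: packaging.carvel.dev/v1alpha1
kind: PackageInstall
metadata:
  name: tap-gui
  namespace: tap-install
  annotations:
    # The numeric suffix allows several overlay secrets on one PackageInstall.
    ext.packaging.carvel.dev/ytt-paths-from-secret-name.0: tap-gui-techdocs-overlay
spec:
  serviceAccountName: tap-install-sa   # assumed service account name
  packageRef:
    refName: tap-gui.tanzu.vmware.com
    versionSelection:
      constraints: 1.2.0               # illustrative version constraint
```

The referenced secret needs to live in the same namespace as the PackageInstall.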
Now this has always existed, so why am I talking about this in the context of something amazing in TAP 1.2?
Well, the reason is that when you use a Meta-Package that installs many other packages, as TAP does, overlaying things becomes very difficult.
In the pre-1.2 days of TAP, you had to either pause the reconciliation of the Meta-Package, as I showed in the previous blog post on Tech Docs, or write an overlay for the Meta-Package that would in turn overlay the package installation of the sub-package. That is a pretty messy overlay to deal with and not a great UX. It would also typically happen as a Day 2 step, as most installations of TAP happen via the Tanzu CLI as documented in the Official Documentation.
So with that background, let's dive into the new feature in 1.2!
Package Overlays!!!
In TAP 1.2, a new field was added as a value option in the TAP Meta Package which is “package_overlays”. This field accepts an array of objects, where we can define the secret name and the package we want the secret/s applied to.
Let’s take a look at the Techdocs scenario as an example.
The first step is to create a secret containing my overlay, which adds the Docker socket container and volume mounts to the TAP GUI deployment.
cat << EOF > overlay.yaml
#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind": "Deployment", "metadata":{"name":"server","namespace":"tap-gui"}})
---
spec:
  template:
    spec:
      containers:
      #@overlay/match by=overlay.subset({"name":"backstage"})
      - name: backstage
        #@overlay/match missing_ok=True
        env:
        - name: DOCKER_HOST
          value: tcp://localhost:2375
        volumeMounts:
        - mountPath: /tmp
          name: tmp
        - mountPath: /output
          name: output
      #@overlay/append
      - command:
        - dockerd
        - --host
        - tcp://127.0.0.1:2375
        image: harbor.vrabbi.cloud/tap/docker:dind-rootless
        imagePullPolicy: IfNotPresent
        name: dind-daemon
        resources: {}
        securityContext:
          privileged: true
          runAsUser: 0
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /tmp
          name: tmp
        - mountPath: /output
          name: output
      #@overlay/match missing_ok=True
      volumes:
      - emptyDir: {}
        name: tmp
      - emptyDir: {}
        name: output
EOF
This overlay is basically the same as the patch file created in the previous post, just using some YTT annotations to tell YTT how to overlay this config onto the manifests in the package.
Now let's create the secret with this file as its content. The secret must be created in the same namespace in which the TAP Meta Package is installed, which is tap-install if you followed the official documentation.
kubectl create secret generic -n tap-install tap-gui-techdocs-overlay --from-file=overlay.yaml
This will create the secret that we need for overlaying the TAP GUI package.
Now the final step before installing or updating our TAP installation, is to add a few lines to our TAP values file which would look in this example like:
package_overlays:
- name: tap-gui
  secrets:
  - name: tap-gui-techdocs-overlay
Once we have that in our values file, we can proceed as usual with deploying or updating TAP, and this overlay will be applied to the relevant package automatically.
So how does it work?
In TAP, the Carvel packages are all saved as imgpkg bundles in a Container/OCI registry.
Let's find the imgpkg bundle URL from the relevant package CR after we have imported the TAP Package repo into our cluster:
IMAGE_URL=`kubectl get pkg -n tap-install tap.tanzu.vmware.com.1.2.0 \
  -o jsonpath="{.spec.template.spec.fetch[0].imgpkgBundle.image}"`
This command gives us the URL of the package in our own OCI registry if we relocated the packages, as is suggested, or in the official registry if you are installing TAP in a POC mode.
In any event, we can now pull down the package to our machine to see the manifests it includes by using the imgpkg CLI tool:
imgpkg pull -b $IMAGE_URL -o ./tap-12-meta-package
Once the command finishes, we will get a folder generated with all of the manifests that were in the meta package.
Let's take a look at the file that is relevant for us:
cat ./tap-12-meta-package/config/package-overlays.yaml
And that will return the following YAML:
#@ load("@ytt:overlay", "overlay")
#@ load("@ytt:data", "data")
#@ def build_package_overlay(package):
kind: PackageInstall
metadata:
  name: #@ package
#@ end
#@ for package in data.values.package_overlays:
#@overlay/match by=overlay.subset(build_package_overlay(package.name))
---
metadata:
  #@overlay/match missing_ok=True
  annotations:
    #@ i=0
    #@ for secret in package.secrets:
    #@overlay/match missing_ok=True
    #@yaml/text-templated-strings
    ext.packaging.carvel.dev/ytt-paths-from-secret-name.(@= str(i) @): #@ secret.name
    #@ i=i+1
    #@ end
#@ end
Let's break this up into a few pieces to better understand what's going on.
1. High level Annotations
#@ load("@ytt:overlay", "overlay")
#@ load("@ytt:data", "data")
In these 2 lines, we are loading the overlay and data libraries of YTT into our file's execution.
2. Defining a template to use for matching resources in the package
#@ def build_package_overlay(package):
kind: PackageInstall
metadata:
  name: #@ package
#@ end
As can be seen, we are creating the base structure of a PackageInstall CR, which contains the kind and name of the object.
We define this as a function with an input called package, which we use to fill in the PackageInstall's name when calling the function.
3. Iterating over the package overlays supplied in the values file
#@ for package in data.values.package_overlays:
This is a simple for loop which we close in the final line of the file via the “#@ end” annotation.
4. Defining the object the overlay should apply to
#@overlay/match by=overlay.subset(build_package_overlay(package.name))
---
In this overlay annotation we are telling YTT to apply the overlay to objects that match the output of the build_package_overlay function we defined above.
This is a very elegant way to say: I want to update the one object, out of many, that meets specific criteria. The function is not strictly necessary, and we could inline its definition into the overlay annotation, but doing it this way makes the overlay much easier to understand and to update if needed.
5. Defining the overlay patch itself
metadata:
  #@overlay/match missing_ok=True
  annotations:
In these lines we define where the overlay should be applied: we want to add an annotation to a PackageInstall CR.
The overlay annotation we see here is a great feature of YTT. YTT is a very safe tool: it is cautious, and as much as possible it avoids breaking things for you or letting you break things by mistake.
The way YTT works with overlays is that if a field, for example metadata.annotations, doesn't exist at all, we can't simply add metadata.annotations.bla to our manifests.
The reason is that YTT wants to be certain that the added field is not simply, let's say, a typo, such as writing metdata instead of metadata. Such a typo would produce YAML that follows what you typed but not what you intended.
The missing_ok=True annotation is the way to say: I know that this field is defined correctly, so whether it exists or not, please proceed and add the underlying values.
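A small sketch of the difference, using a made-up ConfigMap as the overlay target (the resource kind and the annotation are hypothetical, and the two variants are alternatives rather than one file). Assuming the target has no metadata.annotations at all, I would expect YTT to reject the first overlay because the key cannot be matched, and to accept the second:

```yaml
#@ load("@ytt:overlay", "overlay")

#! Variant 1: expected to fail when the target has no metadata.annotations,
#! because by default an overlay map key must match an existing key.
#@overlay/match by=overlay.subset({"kind": "ConfigMap"})
---
metadata:
  annotations:
    example.com/owner: platform-team

#! Variant 2: missing_ok=True marks the key as intentional,
#! so it is created when it does not exist yet.
#@overlay/match by=overlay.subset({"kind": "ConfigMap"})
---
metadata:
  #@overlay/match missing_ok=True
  annotations:
    example.com/owner: platform-team
```

If annotations already exists on the target but the individual key does not, the key itself needs the same annotation, which is why the meta package template repeats missing_ok=True on each generated annotation.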
6. Adding the annotation/s to the CR
    #@ i=0
    #@ for secret in package.secrets:
    #@overlay/match missing_ok=True
    #@yaml/text-templated-strings
    ext.packaging.carvel.dev/ytt-paths-from-secret-name.(@= str(i) @): #@ secret.name
    #@ i=i+1
    #@ end
#@ end
This section takes advantage of YTT exposing Starlark, a Pythonic language, to us via YAML annotations. It sets an index and then iterates over each secret I have said should be used as an overlay for this package, constructing the Kubernetes annotation with the index as a suffix in order to support multiple overlays for a single package.
Finally, it increases the index by one on each iteration, and then closes out the 2 for loops we are in.
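To illustrate the indexing, here is a hedged sketch with two overlay secrets for one package. The second secret name is hypothetical; the annotation keys follow the pattern built by the loop above.

```yaml
# Input in the TAP values file (the second secret is illustrative)
package_overlays:
- name: tap-gui
  secrets:
  - name: tap-gui-techdocs-overlay
  - name: tap-gui-extra-overlay
---
# Annotations the loop should add to the tap-gui PackageInstall
metadata:
  annotations:
    ext.packaging.carvel.dev/ytt-paths-from-secret-name.0: tap-gui-techdocs-overlay
    ext.packaging.carvel.dev/ytt-paths-from-secret-name.1: tap-gui-extra-overlay
```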
Summary
It is pretty amazing the impact 21 simple lines of YTT can have on the UX of a project, but it really does make the product so much more customizable in an elegant, and simple way.
I really look forward to trying out this mechanism more and more over time as different use cases come up which need some tweaking of the platform.
While most people won't necessarily need this mechanism, it is great that it exists in the event that you do need it, whether that is now or in the future as your implementation of TAP evolves.
-
What’s New In TAP 1.2
Tanzu Application Platform just released version 1.2 and it is a huge release!
The release not only has many bug fixes and security patches but also includes a whole set of new functionality that really brings TAP's capabilities to the next level!
Support for Air Gapped Installation (Beta)
One of the key things added in TAP 1.2 is the initial support for Air Gapped installations.
This is a critical feature, as it will allow highly regulated environments such as insurance companies, governmental agencies, and companies in the security field to use the amazing capabilities of TAP within their environments, which is not an easy task when dealing with Kubernetes platforms.
While Air Gapped support is currently in Beta, it means we are heading in the right direction and can hopefully see a fully supported air gapped topology in an upcoming release in the not too distant future.
Support for new workload types (Beta)
From the beginning, TAP was very well suited for web applications, similar to those the TAS (Cloud Foundry) platform supported.
As the use of TAP grows, new types of applications need to be supported as well. Thanks to the pluggability and customization capabilities of TAP, the 1.1 release added an additional type of workload: the function workload.
Function workloads, similar to what one would typically deploy to a system like Lambda, were a logical next step. In TAP 1.2, not only were enhancements made to both the web and function workload types, we also get 2 new types of workloads:
TCP Workloads
The tcp workload type allows you to deploy traditional network applications on Tanzu Application Platform. Using an application workload specification, you can build and deploy application source code to a manually-scaled Kubernetes deployment which exposes an in-cluster Service endpoint. If required, you can use environment-specific LoadBalancer Services or Ingress resources to expose these applications outside the cluster.
The tcp workload is a good match for traditional applications, including HTTP applications, that are implemented as follows:
- Store state locally
- Run background tasks outside of requests
- Provide multiple network ports or non-HTTP protocols
- Are not a good match for the web workload type
Applications using the tcp workload type have the following features:
- Do not natively autoscale, but can be used with the Kubernetes Horizontal Pod Autoscaler
- By default are exposed only within the cluster using a ClusterIP Service
- Use health checks if defined by a convention
- Use a rolling update pattern by default
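As a rough sketch, a tcp workload could look like the following. The workload name, Git repository and port numbers are hypothetical, and the ports parameter shape is my assumption of how extra Service ports are declared, so check it against the official documentation for your TAP version.

```yaml
apiVersion: carto.run/v1alpha1
kind: Workload
metadata:
  name: my-smtp-gateway                     # hypothetical workload name
  labels:
    apps.tanzu.vmware.com/workload-type: tcp
    app.kubernetes.io/part-of: my-smtp-gateway
spec:
  source:
    git:
      url: https://github.com/example/my-smtp-gateway   # hypothetical repository
      ref:
        branch: main
  params:
  # Assumed parameter describing the ports of the in-cluster Service
  - name: ports
    value:
    - name: smtp
      port: 25
      containerPort: 2025
```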
Queue Workloads
The queue workload type allows you to deploy applications that run continuously without network input on Tanzu Application Platform. Using an application workload specification, you can build and deploy application source code to a manually-scaled Kubernetes deployment with no network exposure.
The queue workload is a good match for applications that manage their own work by reading from a queue or a background scheduled time source, and don’t expose any network interfaces.
Applications using the queue workload type have the following features:
- Do not natively autoscale, but can be used with the Kubernetes Horizontal Pod Autoscaler
- Do not expose any network services
- Use health checks if defined by a convention
- Use a rolling update pattern by default
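A queue workload is declared the same way as any other workload, just with a different type label. A minimal sketch, with a hypothetical name and repository:

```yaml
apiVersion: carto.run/v1alpha1
kind: Workload
metadata:
  name: my-queue-consumer                   # hypothetical workload name
  labels:
    apps.tanzu.vmware.com/workload-type: queue
    app.kubernetes.io/part-of: my-queue-consumer
spec:
  source:
    git:
      url: https://github.com/example/my-queue-consumer   # hypothetical repository
      ref:
        branch: main
```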
Application Accelerator Enhancements
Another key area that got enhanced greatly in TAP 1.2 is the Application Accelerator.
The key new features for App Accelerator are:
- Fragments
- Sub Path Support
- VS Code integration
Fragments
While Application Accelerators have always been great since TAP 1.0, one of the key things we always heard from customers was that they end up duplicating many of the options of their accelerators for all the custom accelerators they build.
While this can work at small scale, VMware recognized that at larger scale this can be very tedious and can be hard to maintain.
To address this, VMware introduced in TAP 1.2 Accelerator Fragments.
Accelerator fragments are reusable accelerator components that can provide options, files or transforms. They may be imported to accelerators using an import entry and the transforms from the fragment may be referenced in an InvokeFragment transform in the accelerator that is declaring the import.
This basically means we can keep a set of accelerator fragments in a shared repository and use them within our larger accelerators in a really simple and DRY manner!
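A minimal sketch of what this looks like in an accelerator.yaml file, assuming a fragment named java-version has already been registered on the platform (the fragment name and display name are illustrative):

```yaml
accelerator:
  displayName: My Spring Accelerator   # hypothetical accelerator
  imports:
  - name: java-version                 # pulls in the fragment's options
engine:
  merge:
  - include: ["**"]
  # Run the transforms defined by the imported fragment
  - type: InvokeFragment
    reference: java-version
```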
Sub Path Support
Another key piece of functionality that can make integrating Application Accelerator into your workflow easier is the support for Git Sub Paths.
This is accomplished via the git.subPath field in the accelerator CRD. Its value is the folder inside the Git repository to treat as the root of the accelerator or fragment, and it defaults to the root of the repository.
This makes it possible to consolidate accelerators into a single mono-repo for example and not require a git repo per accelerator which could get cumbersome to manage over time.
While this may seem like a small change, in practice it is changes like this that give an overall smoother onboarding to the platform and make it easier to integrate into existing processes within customer environments.
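For example, an accelerator stored in a folder of a shared mono-repo could be registered roughly like this. The repository URL, folder name, accelerator name and namespace are hypothetical:

```yaml
apiVersion: accelerator.apps.tanzu.vmware.com/v1alpha1
kind: Accelerator
metadata:
  name: spring-api
  namespace: accelerator-system        # assumed installation namespace
spec:
  git:
    url: https://github.com/example/accelerators   # hypothetical mono-repo
    subPath: spring-api                # folder treated as the accelerator root
    ref:
      branch: main
```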
VS Code Integration
Another key aspect of this release is enhancing the experience of developers inside of their IDE.
While TAP GUI is an amazing platform that I believe gives huge benefit to customers of all shapes and sizes, integrating into the IDE is a must for a DevX platform today. VMware understand this and are working to make this story even more compelling!
In TAP 1.2 one of the enhancements to the VS Code integration is the addition of an Application Accelerator extension.
The Application Accelerator Visual Studio Code extension lets you explore and generate projects from the defined accelerators in Tanzu Application Platform using VS Code.
This allows developers to jumpstart a new project directly from their IDE using the accelerators the platform team has supplied, again shortening the time it takes to get from an idea to actual code in a really unique and simple manner!
Application SSO
One of the key challenges people encounter when building out distributed micro service based architectures is the whole area of managing an authentication mechanism which should preferably be an SSO system.
Application Single Sign-On for VMware Tanzu® (AppSSO) provides APIs for curating and consuming a “Single Sign-On as a service” offering on Tanzu Application Platform.
With AppSSO, Service Operators can configure and deploy authorization servers. Application Operators can then configure their Workloads with these authorization servers to provide Single Sign-On to their end-users.
AppSSO allows integrating authentication and authorization decisions early in the software development and release life cycle. It provides a seamless transition for workloads from development to production when including Single Sign-On solutions in your software.
It’s easy to get started with AppSSO, deploy an authorization server with static test users, and eventually progress to multiple authorization servers of production-grade scale with token key rotation, multiple upstream identity providers, and client restrictions.
Having this addition in TAP is pretty awesome and can make the lives of developers and operations teams so much easier while keeping security at a very high standard as is required.
Support For Kaniko
TAP is a very pluggable and customizable system built with very good defaults, but also with the ability to override these defaults when needed.
Since TAP 1.0 we have been able to go from source code to a URL very easily! Much of this is possible due to the use of Tanzu Build Service, which is the default image building tool in TAP.
TBS utilizes the open source kpack project, which is a Kubernetes operator that allows us to build container images using Cloud Native Buildpacks.
When using TBS, no Dockerfile is needed and we gain a huge number of added features such as smart caching, auto rebasing of images, SBOM generation, etc.
While this is great, some use cases simply are not possible with Buildpacks, and in other situations, even if they could be done with Buildpacks, companies may prefer for one reason or another to utilize Dockerfile based builds instead.
In TAP 1.1 a new feature was added which allowed us to supply a pre built image to TAP, and it would utilize that image for the remaining steps of the supply chain instead of building the image itself within the supply chain.
Now in TAP 1.2 we have a huge improvement to the flexibility and set of options we are provided OOTB, and TAP now can officially support Dockerfile based builds as part of the supply chain itself using the Open-Source project Kaniko.
This means that whether we use the default and highly recommended TBS approach, or decide to use Dockerfiles via Kaniko, we will get the same UX and the same outcome for our developers which is indeed a pretty awesome thing to see in action!
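As a hedged sketch, opting a workload into a Dockerfile based build is done with a workload parameter. The workload name and repository are hypothetical, and the dockerfile parameter name should be verified against the documentation for your TAP version:

```yaml
apiVersion: carto.run/v1alpha1
kind: Workload
metadata:
  name: my-dockerfile-app                   # hypothetical workload name
  labels:
    apps.tanzu.vmware.com/workload-type: web
spec:
  source:
    git:
      url: https://github.com/example/my-dockerfile-app   # hypothetical repository
      ref:
        branch: main
  params:
  # Path to the Dockerfile, relative to the root of the source
  - name: dockerfile
    value: ./Dockerfile
```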
TAP GUI Enhancements
TAP GUI is the central portal of the platform and as such, enhancements are almost always to be expected in this realm and TAP 1.2 doesn’t let us down in this aspect either!
TAP GUI enhancements include:
- Better support for Multi Cluster Topologies
- PR Based flow visibility
- Security Scan Data visibility
- Customization capabilities
- Pod Log viewer
Multi Cluster Enhancements
One of the enhancements made in TAP GUI is related to the multi cluster topology of TAP, which has been available since TAP 1.1!
With the enhanced visibility in TAP 1.2, we can now see the supply chain steps across all clusters correlated in a simple and intuitive UI. This means we can track our workloads across all of our clusters, from build time until the actual deployment of the app in production, from a single pane of glass!
PR Based Flow Visibility
Another key enhancement in TAP 1.2 is that for the multi cluster scenarios, we now get OOTB support for a PR based promotion cycle. This means that TAP will create a PR for us with the updated manifests when a change is made to our app manifests, for whatever reason that may be.
Not only does TAP support a PR based flow though, it is also integrated into TAP GUI!
This means that when visualizing the workload, if a PR based flow is configured, you will be able to see that there is a PR that needs to be merged in order for the workload to proceed in the promotion flow, and will have a simple button to click that will direct you to the relevant PR, where you can review and then merge the changes to have the workload proceed in its lifecycle steps.
Security Scan Data Visibility
One of the key focuses in TAP is around the whole idea of image and platform security.
A key part of this is the metadata store, which stores the results of the image and source code scans that run as part of our supply chains.
By having a central place where our vulnerability data is aggregated, we can easily use this data to our advantage and understand the security status of our images.
TAP 1.2 adds the ability to view the vulnerability results within TAP GUI, allowing for a shift left approach. Developers can easily see where they have security issues and get a quick and simple iterative feedback loop when building an app, making sure it is secure and up to the company's standards without the toil of traditional ticket based processes.
Having this visibility is a huge step in the direction of really shifting as much as we can left and making our platforms and applications even more secure and protected without adding complexity and slowing down the development process.
Customizing the UI
While TAP GUI is nice looking, in most organizations we need the ability to customize and “brand” our UIs to be in line with our company's logos and general branding considerations.
TAP GUI as of TAP 1.2 now supports customizing the portal's icon, title, header, org name, help menu links, authentication page title and more!
While this may seem like a small addition, it is a really nice ability that just makes the experience of using TAP GUI that much nicer in an organization.
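As a rough sketch of what this looks like in the TAP values file, the branding settings sit under the TAP GUI app config. The key names below are my recollection of the customize block and the values are placeholders, so treat both as assumptions and confirm them in the official documentation:

```yaml
tap_gui:
  app_config:
    customize:
      # Assumed keys: portal title and a base64 encoded logo image
      custom_name: "My Company Developer Portal"
      custom_logo: "BASE64-ENCODED-IMAGE"
```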
Pod Log Visibility
One of the great plugins VMware have built for TAP GUI that has been around since the beginning is the Application Live View plugin. This plugin allows us to view real-time metrics of our containers using the Spring Boot Actuator mechanism.
While this has always been great for debugging and understanding the status of our apps, one key thing was missing from TAP GUI in order to really support a debugging workflow.
TAP 1.2 now adds the ability to view pod logs for our workloads! This allows us to get much better visibility and to be able to truly understand what is going on in our applications without needing to constantly jump back and forth from the UI to our terminal.
VS Code Extension Enhancements
The VS Code extension for TAP is really great, and as is the theme with TAP 1.2, it also had a huge revamp and got some great attention and upgrades!
The key enhancements that were added in TAP 1.2 include:
- Workload Panel
- Apply and Delete Workload commands
- Live Hover integration for Spring Tools
- Support Multiple Projects in a workspace
Workload Panel
A very nice enhancement in the VS Code Extension from a UX perspective, is the Workload Panel widget that has been added.
The current state of the workloads is visible on the Tanzu Workloads panel in the bottom left corner of the VS Code window. The panel shows the current status of each workload, namespace, and cluster. It also shows whether Live Update and Debug are running, stopped, or disabled per workload.
Having this simple view of the status of your workloads is a really nice addition to the extension making the feedback loop to the developer simpler and more streamlined.
Applying and Deleting Workloads
The extension since 1.0 supported inner loop workflows by allowing us to start an app via the Tilt integration with the live updates capability.
While this is great, it also limited the scope of the plugin to Java based apps only as that is the only live update integration currently in TAP.
In TAP 1.2 the extension enables you to apply workloads on your Tanzu Application Platform-enabled Kubernetes cluster no matter what the language is. While this doesn't mean Live Update is supported, we still get basic functionality and integration in our IDE for workloads written in other languages, which lays the groundwork for the better polyglot integration I'm sure we will see added to TAP in the next few releases.
The ability to apply and delete workloads via the IDE plugin, while a simple and small addition, is truly welcome, as it again limits the number of times I need to switch over to my terminal, which means greater productivity and an overall better DevX for our end users.
Live Hover
TAP has a great integration with the Spring ecosystem, and amongst the integrations it has, we have a very deep integration with the Spring Boot Actuators mechanism which we can see being fully utilized in Application Live View within TAP GUI.
Another way to integrate with this data, which is part of the Open Source Spring community, is through the Spring Boot Tools extension for VS Code which includes amongst other things, a Live hover functionality.
The Spring Tools 4 can connect to running Spring processes to visualize internal information of those running Spring processes inline with your source code. This allows you to see, for example, which beans have been created at runtime, how they are wired, and more.
The Spring Tools 4 shows hints by highlighting sections of source code with a light green background. Hovering over the highlights with the mouse pointer, data from the running app is displayed in a popup.
All of this data is collected via the Actuator endpoints which can be configured simply in TAP via the conventions controller.
In TAP 1.2 experimental support for this functionality has been added which again just makes the developers lives that much better!
As can be seen in the following example, the contextual info we receive can be amazing to have when developing and iterating on our workloads.
I am very excited to see this functionality mature over time and hopefully through frameworks like Steeltoe, grow into Polyglot solutions that can support over time more and more use cases and languages for apps deployed via TAP.
Multi-Project Workspaces
The TAP Dev Tools extension for VS Code previously only supported a single project per workspace. Now as of TAP 1.2, when working with multiple projects in a single workspace, you can configure the Tanzu Dev Tools Extension settings on a per-project basis by using the dropdown selector in the Settings page.
This is a great UX enhancement that will make the usability of the Extension much better for developers.
IntelliJ Extension
Tanzu Developer Tools for IntelliJ is VMware Tanzu’s official IDE extension for IntelliJ IDEA to help you develop with the Tanzu Application Platform (TAP). The Tanzu Dev Tools extension enables you to rapidly iterate on your workloads on supported Kubernetes clusters with Tanzu Application Platform installed.
This extension, just like the VS Code extension, enables live update capabilities and the ability to debug our workloads deployed on TAP enabled clusters directly from our IDE!
Currently, in this first release of the extension, only Java apps are supported and the extension only works on macOS, but I'm sure we will see additional functionality and support for additional platforms and languages in upcoming releases of TAP!
It is great to see that VMware are extending the IDE support and meeting the developers where they love to be which is in the IDE of their choice!
Support for ECR Container Registry
Those who work with Kubernetes and have dealt with controllers that need access to a container registry understand the complexities that inevitably come along with ECR (Amazon's Elastic Container Registry).
While ECR is a great registry, the authentication mechanism is very difficult to integrate with.
The typical approach of using an access token for authentication to a registry via an image pull secret in Kubernetes works against ECR, but only until the token expires.
The access tokens have a very short lifetime, and an image pull secret has no built-in way to auto renew them.
This makes integration with ECR complex to say the least.
To overcome this issue, VMware have added support for using ECR with IAM role bound service accounts.
Tanzu Application Platform supports using ECR for both Tanzu Build Service images and the images created as part of a workload in a supply chain.
While you can use the typical “secret” configuration to store credentials for ECR, the token that is used to authenticate to ECR expires every 12 hours. For this reason, it is suggested to use an AWS IAM role bound to a Kubernetes service account to allow the Tanzu Application Platform services to authenticate to ECR.
While I personally am a big fan of Harbor, and think it provides great benefit even when running in AWS, making it superior in my personal opinion to ECR, having the ability to integrate with such a popular registry, despite the complexity, is a great sign. It shows that VMware are truly working hard to meet their customers where they are and give them a true choice of tooling, rather than locking them into a specific set of tooling that VMware prescribes.
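On EKS, binding an IAM role to a Kubernetes service account is done with an annotation on the service account. A minimal sketch, where the namespace, account ID and role name are placeholders and the role is assumed to already have the required ECR permissions and trust policy:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: my-dev-namespace          # hypothetical developer namespace
  annotations:
    # IAM role assumed by pods using this service account (placeholder ARN)
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/tap-workload-ecr
```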
Maven Artifacts as a source to a supply chain
TAP provides a full solution for CI and CD, and with the built-in testing capabilities of Tekton and build capabilities of TBS, we can go directly from source code to a URL. Even so, replacing an existing CI system such as Jenkins can be a difficult and sometimes undesirable task.
VMware understand this, and want to allow people to utilize TAP and integrate it in their existing workflows, at whichever point makes the most sense for the customer.
A key move in TAP 1.2 that shows this is the addition of a new capability to build a workload from a Maven artifact built outside of the supply chain.
This approach aids integration with existing CI systems, such as Jenkins, and can pull artifacts from existing Maven repositories, including JFrog Artifactory.
With this mechanism, you can have a build process in Jenkins, for example, that builds your jar/war files and pushes them to a Maven repository, and have a TAP workload watch the repository for new builds using version selectors in the workload YAML.
Currently, this supports only Java artifacts and has limited authentication mechanism support, but over time this will become a more feature rich integration, and will hopefully support additional language artifacts and authentication mechanisms.
It is great to see the flexibility provided to end users via TAP, and how it can integrate into existing pipelines, while at the same time not giving up on the UX and single pane of glass ideals that the platform offers.
Application Live View Enhancements
Application Live View is a great feature that has existed since TAP 1.0 and gives us a really nice UI (within TAP GUI) to see our Spring Boot applications’ Actuator data and, in some cases such as log levels, actually make live changes directly from the UI!
While this has always been great, it was limited to Spring based applications.
In TAP 1.2 we now have App Live View support for Steeltoe based applications.
Having this capability extended into a polyglot solution shows the huge effort VMware are putting into making the platform meet the needs of its customers, no matter what language or frameworks they decide to use.
It is great to see the platform extending in these directions, and I’m sure we will see more and more integrations with languages and frameworks in this space in the future.
Snyk Integration
Another integration that has been added in TAP 1.2 is the first ISV integration with TAP!
This integration allows us to utilize Snyk for image scanning within our supply chain instead of the default scanner which is Grype.
The Snyk integration is currently in Beta, and only supports image scanning and not source code scanning, but it definitely shows huge promise, and I’m looking forward to seeing this integration and others get added to the TAP ecosystem over future releases!
The integration brings along with it a lot of backend changes that make the scanning tool more easily pluggable, by adding support not only for CycloneDX formats, which Grype for example can provide, but also for SPDX formats, as provided by Snyk.
Currently the world is undecided on which of the 2 standards (CycloneDX and SPDX) will become the de facto standard, and it is great to see that VMware are building their solution to support both standards, making integrations much easier, no matter which direction things go in this regard in the broader ecosystem.
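To make the difference between the two formats a bit more concrete, here is a minimal sketch of how a tool could tell them apart. This is not how TAP's scanning components are implemented; it only assumes the JSON serializations of both standards, where a CycloneDX document carries a top-level bomFormat field and an SPDX document carries a top-level spdxVersion field. The function name detect_sbom_format is my own invention for illustration.

```python
import json


def detect_sbom_format(document: str) -> str:
    """Return 'cyclonedx', 'spdx' or 'unknown' for a JSON SBOM document."""
    try:
        data = json.loads(document)
    except json.JSONDecodeError:
        return "unknown"
    if not isinstance(data, dict):
        return "unknown"
    # CycloneDX JSON documents carry a top-level "bomFormat": "CycloneDX".
    if data.get("bomFormat") == "CycloneDX":
        return "cyclonedx"
    # SPDX JSON documents carry a top-level "spdxVersion" such as "SPDX-2.2".
    if str(data.get("spdxVersion", "")).startswith("SPDX-"):
        return "spdx"
    return "unknown"


if __name__ == "__main__":
    print(detect_sbom_format('{"bomFormat": "CycloneDX", "specVersion": "1.4"}'))
    print(detect_sbom_format('{"spdxVersion": "SPDX-2.2", "SPDXID": "SPDXRef-DOCUMENT"}'))
```

Both standards also have non-JSON serializations (XML, tag-value), which this sketch deliberately ignores.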
Community Standardization
One of the things I believe we will see repeat itself again and again down the road is that VMware, especially in the security area, may release things ahead of the majority of the community.
A good example of this is the integration we have had with Cosign since TAP 1.0, which not only signs our images but also provides a validating webhook that makes sure only images we have signed can run in our clusters.
This is a topic that many people are talking about, and slowly it is being integrated, but at the time TAP 1.0 was being released, VMware were one of the first to actually implement a solution for this into their platform.
Over the past year or so, the community, including VMware, has been working hard on developing the correct tooling and practices around the idea of image signature verification, and the tool that seems to be the appropriate candidate to be a community standard is the Policy Controller, which is a validating webhook developed and maintained as part of project Sigstore.
As such, in TAP 1.2 VMware are deprecating the initial proprietary implementation of an image policy webhook and are integrating the community standard Policy Controller mechanism.
This is part of a general approach which I find great in TAP in particular, and in the wider Tanzu ecosystem as a whole: VMware are trying as much as possible to stay aligned with the upstream community, rather than simply building a proprietary solution and only taking from upstream. As an example, VMware have put large amounts of engineering time into cosigned to help push the tool forwards. By doing this, VMware, its customers, and the broader ecosystem all win.
I am very excited to see these community standards grow and mature over time, and see how VMware integrates them into the Tanzu ecosystem, making it a truly Open-Source based Enterprise solution.
General Usability Improvements
In addition to the key features mentioned above, TAP 1.2 also includes some general usability improvements that simply make the lives of our developers and platform operators better.
Tanzu Apps CLI Plugin Enhancements
The apps command in the Tanzu CLI is a key access mechanism to the platform for our developers. In TAP 1.2, we have some new features that make it a lot easier to use, and in many cases remove the need to fall back to YAML files and applying them with kubectl.
- Added a --sub-path flag to allow specifying the sub path in the repo or image to use as the base of the application code
- Added a --service-account flag to allow specifying the service account to use for the workload
- Added support for ignoring local files during inner loop development via a .tanzuignore file
- Added additional info when running “tanzu apps workload get”, including source information and supply chain step information
-
Static IPs for TCE and TKGm on vSphere
Preface
Tanzu Kubernetes Grid Multi Cloud (TKGm) and Tanzu Community Edition (TCE) are both great distributions of Kubernetes.
Both of them are based on the same underlying technology and framework, so this post applies equally to both.
Problem Statement
One of the challenges we have encountered many times with TKGm and TCE is that they require us to use DHCP for node networks.
While this may not seem like a big issue, it actually is in many cases if not dealt with correctly.
The main issue is that Kubernetes components have many certificates and all communication is secured. However, in ClusterAPI, which is the provisioning mechanism used in Tanzu clusters, the certificates are generated for the IPs the nodes receive when they are first deployed.
Basically this means that the IP address MUST remain the same for the entire lifecycle of the machine.
In an ideal situation this should not be an issue, but as we know the world is not ideal.
We have power outages, host crashes, scheduled downtime etc. that can all cause our clusters to go down.
When that happens, the nodes, depending on how you set up the environment, may or may not receive the same IP when they are powered back on.
While this can be solved by ClusterAPI pretty easily for worker nodes using the Machine Health Check functionality, as unresponsive worker nodes can simply be recreated, for control plane nodes it’s not as easy.
If you have a single control plane node (Don’t do this!!!) then unless you can manually change the IP back to the original IP you are out of luck. If it’s a 3 node control plane and 1 node gets the wrong IP, all is fine and a new node will replace it, but if 2 or 3 nodes have their original IP changed, you will be in a bad place as well.
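The reasoning behind these cases comes down to etcd needing a majority (quorum) of its members to keep working. Here is a minimal sketch of that arithmetic, under the simplifying assumption (mine, not something ClusterAPI states) that a control plane node whose IP changed counts as a lost etcd member. The function names are made up for illustration.

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with the given member count."""
    return members // 2 + 1


def control_plane_survives(members: int, nodes_with_changed_ip: int) -> bool:
    """True if enough members keep their original IP to hold quorum."""
    healthy = members - nodes_with_changed_ip
    return healthy >= quorum(members)


if __name__ == "__main__":
    # (control plane size, nodes that came back with a different IP)
    for members, changed in [(1, 1), (3, 1), (3, 2), (3, 3)]:
        print(members, changed, control_plane_survives(members, changed))
```

With 3 members the quorum is 2, which is why losing 1 node is recoverable while losing 2 or 3 is not.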
Current Official Solution
The TKG documentation mentions that for control plane nodes you should create DHCP reservations, which will make sure that the node always receives the same IP, but this is hard to maintain. It requires that every time a new control plane node is created, you manually edit the DHCP server to add a reservation for the node’s IP and MAC address.
New nodes get created for many reasons, and managing this at scale is not a fun task.
A Better (Yet Unsupported) Solution
The true solution for this would be to use static IP allocation for nodes.
Currently, TKGm and TCE do not support static IP address management, but adding in the functionality is actually really easy!
The reason DHCP is used is because nodes are spun up and spun down constantly in a ClusterAPI (CAPI) environment and as such, we need a way to have VMs assigned an IP automatically when CAPI creates the machine.
While DHCP is one way to achieve this, using an IP Address Management (IPAM) solution is another option that can serve the same purpose but also at the same time solve the issue of having dynamic IP addresses, as the IPAM can provide us static IPs for our nodes.
TKGm and TCE support some of the most common CAPI providers, but there are a whole plethora of providers out there including one called CAPM3 (Cluster API Provider Metal3) which is a bare metal based CAPI provider.
CAPM3 includes not only the infrastructure provider; the project has also created a separate controller called the CAPM3 IPAM Controller, which implements a simple to use, in-cluster, CRD based IPAM that integrates well with CAPI based solutions.
This controller adds 3 key CRDs to our cluster:
- IP Pool – a way to define a pool of IPs that nodes will be assigned an IP from
- IP Claim – Similar to a PVC but for an IP
- IP Address – The actual IP object which is assigned to a VM via a claim
Basically, the flow is that for each Machine object we need an IP Claim to be created, which will in turn allocate an IP and generate an IP Address CR, and then we need to plumb that back up into our machines.
In CAPM3, the infrastructure provider handles this itself, but for CAPV we need a solution to automate it. One has already been created by the team at Spectro Cloud and is called “cluster-api-provider-vsphere-static-ip”. This controller basically acts as the bridge between the IPAM provider of CAPM3 and the CAPV objects we deploy via Tanzu.
ClusterAPI vSphere by default has the network address allocation type set to DHCP, but if we change it to static via the vSphere Machine Template CRD, machine provisioning will wait until the vSphere Machine CR is updated with a static IP address and then provision the node with that static IP configuration.
This means that the final flow we are looking for is:
- The cluster is created
- The first Control Plane node CR is created, but it is set to get a static IP and is in a pending state
- The Spectro Cloud controller sees this and creates an IP Claim CR for us
- The CAPM3 IPAM will create an IP Address CR for us from the configured pool of IPs
- The Spectro Cloud controller sees the newly created IP Address CR and populates the data of that CR onto our vSphere Machine CR
- CAPV sees that all the details it needs are now in the object’s spec and provisions the VM
- The rest of the deployment process continues as normal
This process basically is the same for each and every VM in the cluster.
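To illustrate the relationship between a pool, a claim and an address, here is a toy in-memory model. It is only a sketch and is not the Metal3 IPAM implementation: the ToyIPPool class, its lowest-free-address allocation order and the example addresses are all assumptions of mine. The one property it is meant to show is that a claim keeps the same address for as long as it exists.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class ToyIPPool:
    """Hypothetical stand-in for an IP Pool covering first_ip..last_ip."""
    first_ip: str
    last_ip: str
    # Maps a claim name to the address allocated for it.
    allocations: dict = field(default_factory=dict)

    def claim(self, claim_name: str) -> str:
        # An existing claim always gets the same address back.
        if claim_name in self.allocations:
            return self.allocations[claim_name]
        in_use = set(self.allocations.values())
        start = int(ipaddress.ip_address(self.first_ip))
        end = int(ipaddress.ip_address(self.last_ip))
        # Hand out the lowest free address in the range.
        for value in range(start, end + 1):
            candidate = str(ipaddress.ip_address(value))
            if candidate not in in_use:
                self.allocations[claim_name] = candidate
                return candidate
        raise RuntimeError("IP pool exhausted")

    def release(self, claim_name: str) -> None:
        # Deleting a claim frees its address for future machines.
        self.allocations.pop(claim_name, None)


if __name__ == "__main__":
    pool = ToyIPPool("10.0.0.10", "10.0.0.12")
    print(pool.claim("control-plane-0"))
    print(pool.claim("worker-0"))
    print(pool.claim("control-plane-0"))  # same address as the first call
```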
The key thing we still haven’t mentioned, though, is how you specify which IP Pool a cluster’s node IPs should be allocated from.
This is accomplished via labels that we apply to the vSphere Machine Template as well as to the IP Pool; when they match, they form a pair.
It is also important to note that an IP Pool is scoped to a single cluster. This means that you will have a 1:1 ratio between IP Pools and Kubernetes workload clusters.
While this may sound complex, the implementation is very simple and the UX is pretty smooth once the solution is set up.
How Do We Implement This
The preparation phase has 4 key steps:
- Deploy a Management Cluster (not covered in this post)
- Deploy the CAPM3 IPAM Controller
- Deploy the Spectro Cloud CAPV Static IP Controller
- Add YTT overlays to our Tanzu config files
Once we have completed these 4 steps, we can start creating clusters with static IPs!
Deploying the IPAM controller
The easiest way to do this is via a simple kubectl apply command on our management cluster:
kubectl create ns capm3-system
kubectl apply -f https://github.com/metal3-io/ip-address-manager/releases/download/v1.1.3/ipam-components.yaml
Basically we are creating a namespace and then simply installing the controller and all its resources from the official release artifact.
Deploying the CAPV Static IP Controller
Currently the Spectro Cloud team don’t release any artifacts, and you need to build the code and manifests yourself using the different make targets in the repo.
To make your lives easier, I have already run this and pasted the generated manifest in a gist for consumption. It points to the container I built for this, which is hosted on ghcr.
kubectl apply -f https://gist.githubusercontent.com/vrabbi/b20af526c091cced11495f578a5a3fc5/raw/128d922f9497272b952580d6e2e357020669a5db/capv-ipam-controller.yaml
This is all there is to install on our cluster!
Creating the needed overlays
This step is needed in order for us to integrate our Tanzu Cluster creation with the newly installed components.
The first overlay is pretty simple. It configures DHCP to be false on all vSphere Machine Templates, and then also sets the labels needed to match them with an IP Pool.
cat << EOF > ~/.config/tanzu/tkg/providers/infrastructure-vsphere/ytt/vsphere-static-ip-overlay.yaml
#@ load("@ytt:overlay", "overlay")
#@ load("@ytt:data", "data")

#@ if data.values.USE_STATIC_IPS:
#@overlay/match by=overlay.subset({"kind": "VSphereMachineTemplate"}), expects="1+"
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  #@overlay/match missing_ok=True
  labels:
    #@overlay/match missing_ok=True
    cluster.x-k8s.io/ip-pool-name: #@ data.values.CLUSTER_NAME
    #@overlay/match missing_ok=True
    cluster.x-k8s.io/network-name: #@ data.values.VSPHERE_NETWORK.split("/")[-1]
spec:
  template:
    spec:
      network:
        devices:
        #@overlay/match by=overlay.index(0)
        - dhcp4: false
#@ end
EOF
The second file we need to create is a file that defines the additional data values we want to have available to us for configuring the IPAM integration in our cluster configuration file.
cat << EOF > ~/.config/tanzu/tkg/providers/infrastructure-vsphere/ytt/vsphere-static-ip-default-values.yaml
#@data/values
#@overlay/match-child-defaults missing_ok=True
---
USE_STATIC_IPS: false
FIRST_IP:
LAST_IP:
SUBNET_PREFIX: 24
DEFAULT_GATEWAY:
DNS_SERVER: 8.8.8.8
EOF
As can be seen, we have a total of 6 new values that we can use in order to configure this solution on a per cluster basis:
- USE_STATIC_IPS – This is a Boolean set by default to false in order to not change TCE/TKGm default behavior. You must set this to true in the Cluster Config file to have static IP management enabled for the cluster.
- FIRST_IP – As mentioned, an IP Pool is needed per cluster, so this will be configured in that IP Pool as the first IP in the range of IPs that the IPAM solution will manage for our cluster.
- LAST_IP – This is the last IP the IP Pool will manage, closing the IP address range that starts at the value of the FIRST_IP variable.
- SUBNET_PREFIX – This is the subnet prefix of the network your nodes are being provisioned to. By default I have set it to 24, which is the equivalent of 255.255.255.0 or a Class C network.
- DEFAULT_GATEWAY – This is the default gateway of the node network.
- DNS_SERVER – This is the DNS server you want to be defined on the cluster’s nodes.
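Before putting these values into a cluster config file, it can be worth sanity checking them. Below is a small sketch using Python's standard ipaddress module that checks the range sits inside the node network, works out the netmask for the prefix and counts how many node IPs the range provides. The describe_pool helper and the example addresses are hypothetical; nothing in TKGm or TCE runs this check for you.

```python
import ipaddress


def describe_pool(first_ip: str, last_ip: str, subnet_prefix: int, default_gateway: str) -> dict:
    """Validate FIRST_IP/LAST_IP against the node network and summarize the pool."""
    # Derive the node network from the gateway address and the prefix.
    network = ipaddress.ip_interface(f"{default_gateway}/{subnet_prefix}").network
    gateway = ipaddress.ip_address(default_gateway)
    first = ipaddress.ip_address(first_ip)
    last = ipaddress.ip_address(last_ip)
    if first not in network or last not in network:
        raise ValueError(f"range {first}-{last} is not inside {network}")
    if last < first:
        raise ValueError("LAST_IP must not be lower than FIRST_IP")
    if first <= gateway <= last:
        raise ValueError("the default gateway must not be part of the pool")
    return {
        "network": str(network),
        "netmask": str(network.netmask),
        "node_ips": int(last) - int(first) + 1,
    }


if __name__ == "__main__":
    # Example values only; replace them with the ones for your own network.
    print(describe_pool("192.168.1.50", "192.168.1.59", 24, "192.168.1.1"))
```

Keep in mind that rolling upgrades temporarily create extra nodes, so the range should be larger than the steady state node count of the cluster.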
The final file we need to add is a file that will create the IP Pool for us at cluster creation time:
cat << EOF > ~/.config/tanzu/tkg/providers/infrastructure-vsphere/ytt/vsphere-static-ip-ippool-addition.yaml
#@ load("@ytt:data", "data")

#@ if data.values.USE_STATIC_IPS:
---
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: #@ data.values.CLUSTER_NAME
  namespace: #@ data.values.NAMESPACE
  labels:
    cluster.x-k8s.io/network-name: #@ data.values.VSPHERE_NETWORK.split("/")[-1]
spec:
  clusterName: #@ data.values.CLUSTER_NAME
  pools:
  - start: #@ data.values.FIRST_IP
    end: #@ data.values.LAST_IP
    prefix: #@ data.values.SUBNET_PREFIX
    gateway: #@ data.values.DEFAULT_GATEWAY
  prefix: #@ data.values.SUBNET_PREFIX
  gateway: #@ data.values.DEFAULT_GATEWAY
  namePrefix: #@ "ip-{}".format(data.values.CLUSTER_NAME)
  dnsServers:
  - #@ data.values.DNS_SERVER
#@ end
EOF
Once these 3 files are in place, we can start to create our clusters with static IPs!
Creating a Cluster
As mentioned, we have added a few variables that you will need to set in your cluster’s configuration file before deploying the cluster if you want to use static IPs.
Once you have added those values as described above, you can simply create the cluster via the Tanzu CLI:
tanzu cluster create -f
If you want to see what objects are created you can easily do so via kubectl as everything is a kubernetes resource.
If you have the kubectl plugin called lineage or the kubectl plugin called tree installed, the visibility into the solution becomes really awesome:
Kubectl Lineage Example:
Kubectl Tree Example:
The Future
Currently, work is being done in upstream ClusterAPI to add an official set of IPAM APIs that will allow providers to build a solution like this directly into their providers in a streamlined manner.
CAPV is already looking into this very strongly, and once this functionality is released in core ClusterAPI, I am sure that shortly afterwards we will see similar solutions being implemented by official providers in a standardized way, which will make the adoption of such features into a supported solution within Tanzu much more feasible.
In the meantime, this is a great solution to help with specific use cases where you really don’t have the ability to manage the clusters via DHCP and want the simplicity of having static IPs.
Summary
While at first the solution looks complex, the setup takes less than 5 minutes and the added value in my opinion is huge!
I use this in my home lab and it works great, but make sure to validate it thoroughly and repeatedly before implementing this in any sort of production environment.
As mentioned, there is no support today for this solution. If static IPs matter to you and you want this functionality in TCE/TKG, make sure to make your voices heard: raise issues, comment on issues etc. in the TCE or Tanzu Framework GitHub repos, and also talk to your VMware account team and explain to them why you want this type of solution.
-
Multi Cluster TAP – Why Do We Need It?
One of the great features in Tanzu Application Platform is that it supports a multi cluster deployment.
Many people may ask why this matters, and whether it is really needed.
To answer these questions, let’s take a quick look back at TAS, a great VMware maintained PaaS solution, and how it handled or didn’t handle the multi cluster story.
Tanzu Application Service (Cloud Foundry)
In TAS, the entire solution is deployed on a per-environment basis. This means that the entire stack, otherwise referred to as a foundation, must be deployed in every environment in which I want to run a TAS based application.
If we take a simple use case where a company has 3 different environments (Dev, QA and Prod), this would mean that I would deploy the entire TAS stack 3 times.
There would also be no connection from a platform level between the 3 different environments.
This means that when I move from Dev to QA and finally to Prod, in each environment the platform would compile my code, build the container / blob, create the deployment manifests and finally deploy my application.
This also means that whenever I make a change in one environment I must, manually or via some custom automation, propagate the same change to each of my additional environments.
While writing these automations is not a terribly complex task, it is a level of overhead we must maintain over time.
Another key issue with this approach is that my artifact is being built multiple times and not just once. While Buildpacks and the overall CF platform do offer the benefit of reproducibility, making the outputs of the multiple builds essentially the same, it is still a repeated task that can make things like attestation, artifact tracing and overall confidence in the idea of artifact promotion much more difficult to achieve.
Another key aspect that makes the multiple foundations a challenge is the fact that it takes up A LOT of resources!
For example, when running on vSphere, the minimal footprint as documented in the TAS documentation would require:
82 vCPUs, 120GB RAM and 2TB of storage.
Now if we multiply that by our 3 environments (and this is all for a single AZ deployment, which is not suggested for production), we are talking about a minimum of 246 vCPUs, 360GB RAM and 6TB of storage!
That can become very expensive over time.
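If you want to play with the numbers for your own landscape, the arithmetic is easy to sketch. The per-foundation figures below are the ones quoted above; the Footprint class and its names are simply my own illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    vcpus: int
    ram_gb: int
    storage_tb: int

    def times(self, environments: int) -> "Footprint":
        """Total footprint when the full stack is deployed once per environment."""
        return Footprint(
            vcpus=self.vcpus * environments,
            ram_gb=self.ram_gb * environments,
            storage_tb=self.storage_tb * environments,
        )


# Minimal single AZ TAS foundation on vSphere, as quoted above.
MINIMAL_TAS_FOUNDATION = Footprint(vcpus=82, ram_gb=120, storage_tb=2)

if __name__ == "__main__":
    # Dev, QA and Prod each get their own foundation.
    print(MINIMAL_TAS_FOUNDATION.times(3))
```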
Tanzu Application Platform
In TAP, the story is very different.
One of the key reasons that TAP can support a Multi Cluster topology is that unlike TAS, TAP is built to be customized, manipulated, and tweaked to your needs.
Because the steps in our software supply chain on TAP are configurable, we can have each cluster that has TAP on it perform different tasks and steps.
TAP is also based on an OSS project called Cartographer. Cartographer, as the underlying tool for choreographing the supply chain, has a clear separation between CI and CD.
While for CI we have what is referred to as a Cluster Supply Chain, in the CD world we have a Cluster Delivery. Both resources are similar in their structure, but each one is purpose built for its specific domain.
When it comes to promotion between environments, we are given 2 OOTB solutions that offer a great mechanism for promoting artifacts from one environment to X other environments.
The first option is to use GitOps. TAP will build our image for us and also generate our Kubernetes manifests, amongst other tasks it performs. Once the generated manifests are rendered, not only can TAP apply them to the cluster, it can also push the manifests up to a Git repository. This allows for traceability, and also helps us promote artifacts, as the artifacts are now saved outside of the cluster, in a central location.
The other OOTB option is to use what I refer to as OCIOps, which other places may refer to as RegOps. This is where the manifests are stored not in Git but rather in an OCI bundle in a container registry. While the technology used is different, the underlying objective and outcome are the same. In both cases our configuration is pushed to a central location, from which it will be pulled and run in additional clusters.
So Why Multi Cluster
The reasons for wanting to go down a multi cluster approach can be split into multiple general sections:
- Resource Utilization
- Security
- Artifact traceability and attestation
- Scalability
- Multi Cloud
- Industry Standards
- Visibility
Let’s break these down and understand how Multi Cluster Topologies can help.
Resource Utilization
As mentioned before, it is inevitable that we will need multiple environments. While that is true, the need to deploy the entire platform in every environment can get very costly in terms of cash, resources and also time. The time aspect is key and is often overlooked. The need to rebuild an app at every promotion can be a time-consuming task, and when you deal with large scales, this can quickly build up and become a true bottleneck.
The fact that you don’t need to run Tanzu Build Service, Tekton, image scanning mechanisms and more in every cluster can save huge resources, and the fact that you don’t need to run Knative and Contour in your build cluster, for example, also saves huge amounts of resources.
While the numbers may not seem huge at the beginning, when we start to scale out our landscape, the numbers simply keep growing and that’s where this type of a multi cluster topology can really shine!
Security
Another key aspect that Multi Cluster Topologies can help with is in the realm of security.
Many organizations have regulations and security requirements that can limit, for example, access to the internet from our clusters. Beyond this, we want our environments overall to be as locked down as possible, and to only open up our clusters to the bare minimum needed, even within our company’s network.
By separating out the build aspects from our runtime aspects into separate clusters, we can, for example, have our build cluster open either to the internet, for pulling down dependencies from the sources we need, or to our internal artifact registries. At the same time, we can lock down our runtime clusters to only have access to, for example, our container registry for pulling images and our Git server for pulling down manifests, without the need to have the runtime clusters open to our build time infrastructure.
This offers huge flexibility: on one hand we do not impede developer agility and speed, while at the same time we keep our environment safe.
Artifact traceability and attestation
Another key aspect that Multi Cluster Topologies can help with is related to security, but has many other implications and benefits as well.
The idea of artifact traceability and attestation is not new, but it is getting a lot of attention these days, especially since we have seen multiple large scale attacks happen in the industry that could have been prevented if such mechanisms were in place.
The general idea of being able to trace the provenance of an artifact throughout its lifecycle is a key aspect in making sure that our software supply chain is secure and that nothing malicious has altered our software along the way.
This is a topic that is being heavily invested in within TAP, and it is already seeing its first steps of integration through the built-in integration with Cosign, which will automatically sign our images when they are built in the build cluster. The next step is a built-in configuration we have in the runtime clusters that can validate and enforce that only images signed by a specific build cluster, for example, can be run in the cluster. This allows us to be certain that only images whose provenance we know can be run in our clusters.
Another key aspect is that because TAP uses Cloud Native Buildpacks by default to build images, we get an SBOM attached to each of the images we build, which is key information when trying to attest to the origin of an image and to what is in that image.
While much more work is needed in this area, by having the signing and SBOM generation capabilities built in from the beginning into the platform, we are already in a pretty good place.
This is a huge improvement over the traditional PaaS approach of rebuilding an artifact in every environment, as the attack surface in that case is much wider, and we don’t actually have the traceability back to the original source, which is our development environment. In TAP this is all possible with little to no configuration after deploying the platform, which is pretty amazing!
Scalability
Another key aspect that multi cluster can help with is scalability of our platform.
Typically we see customers starting with relatively small Kubernetes clusters, but as time passes, these clusters keep growing. We have had many instances with customers where, for many reasons including scalability issues, clusters needed to be split.
This is easily dealt with in TAP, as we can simply add another CD target, which would be our new cluster, without affecting our running environment.
Multi Cloud
TAP is truly a multi cloud PaaS.
While tools like TAS could be run on different clouds, there were complexities involved in getting this to work and to manage it at scale.
A key factor that differentiates the 2 tools in this regard is where the platform begins, and what it needs to integrate with.
TAS included within it the container runtime and container orchestrator as well as the platform itself. This meant that the platform was tightly coupled to the underlying container orchestration tool. This also meant that the integration point between TAS and our cloud or clouds of choice was specific to each cloud provider. TAS had to be deployed with a specific cloud provider interface (CPI) that knew how to interact with the specific cloud provider to perform tasks such as managing VMs, LBs, Security Groups etc.
This means that with every cloud provider, the configuration was different, and that you could only deploy TAS on a cloud, that had a TAS Cloud Provider implementation.
In TAP, the platform does not include a container orchestrator. Rather, it simply relies on you installing the platform on a conformant Kubernetes cluster, which could be of any form, shape or size.
Because the integration layer for TAP is simply Kubernetes, you can run TAP anywhere that has Kubernetes. This can be vSphere, GCP, AWS or Azure, which are all supported in TAS as well, or it could even be bare metal servers in your datacenter, or any other cloud provider’s Kubernetes offering, as long as that cluster is a conformant Kubernetes cluster.
Decoupling the runtime from the platform is a huge benefit, as it means the TAP development team can concentrate on building out the platform itself and not need to worry about the underlying cloud provider the platform is running on.
This also helps us as users be assured that we could migrate to a different cloud, or add another cloud into our environment and onboard it into TAP very easily, without any special configurations or changes!
Industry Standards
When we look at the broader ecosystem today, the idea of a multi cluster supply chain, with purpose built clusters that artifacts are promoted between using GitOps principles, is becoming a de facto standard.
TAP as a platform is not only built from industry standard tools such as Kubernetes, Knative, Contour, Cert Manager, Buildpacks, Tekton, FluxCD etc., but it also enables us to utilize industry standard practices such as GitOps in a really easy to consume manner.
This can be extremely helpful when onboarding new developers or platform engineers into our environments, as the concepts and methodologies used in TAP are industry standards, which means that we have a much better chance of finding people who are familiar with these approaches than with the proprietary and tool specific approaches we see in other PaaS platforms.
Visibility
Another key aspect that TAP in a multi cluster topology offers us is in the realm of visibility.
When we have multiple environments, visibility across our landscape becomes a challenging topic to tackle.
TAP, which uses the CNCF Backstage project for its GUI, has a huge advantage because Backstage is built in a way that easily allows connecting multiple clusters into the same GUI, giving true single pane of glass visibility across our entire landscape.
TAP GUI includes within it built in support to visualize our workloads across clusters, side by side, allowing us to see how a particular service is performing and where it is running across all of our clusters.
Summary
The idea of a multi cluster PaaS is truly exciting in my opinion. I think that having this capability brings the opportunity to think of how we build out our environments in ways we haven’t thought of as possible till today.
I’m truly excited to see how the Multi Cluster TAP functionality will grow and evolve over time.
For those interested in how I set up my Multi Cluster TAP environments in my vSphere environments, you can check out my GitHub repo, which has a simple Bash script that can deploy 5 TKGm clusters and fully configure TAP on them in a simple and automated way.