Preface
Having self-service capabilities for cluster provisioning is a great thing for many organizations, but deploying a cluster is not enough.
We need the ability to have certain software installed in every cluster before our end users begin working with it.
This could be infrastructure services like an Ingress Controller, External DNS or Cert Manager. It could also be monitoring tools like Prometheus or Grafana, logging tools like FluentBit or Loki, and often security tooling like Aqua or Prisma.
Due to the complexity of this setup, many organizations decide to have some sort of pipeline that provisions clusters, installs software imperatively, and then sends a notification to the requesting user with the details for connecting to the cluster.
In this post, we will see how this can be achieved declaratively, without custom automation, using Tanzu Mission Control (TMC).
TMC Continuous Delivery – Initial Release
Back in July 2022, TMC added a new feature called "Continuous Delivery".
This feature is in essence a wrapper implementation of FluxCD, which is one of the two most common GitOps controllers, alongside ArgoCD.
In the initial release of this feature, TMC supported the ability to define raw YAML manifests or Kustomize configurations to be deployed to a cluster from a Git repository.
The configuration of what to deploy was done at the cluster level. We also needed to first enable the service on a cluster manually, as it is not deployed by default when a cluster is created or attached to TMC.
While this was a great feature, it simply did not solve many issues we still had, one of which is the subject of this post.
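For context, under the hood this maps to standard Flux resources. As a rough sketch (the names, namespace and URL below are illustrative, not what TMC actually generates, and TMC creates and manages the real objects for us), a cluster-level configuration boils down to a GitRepository source plus a Kustomization that syncs a path from it:
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: platform-config       # illustrative name
  namespace: flux-system      # illustrative; TMC manages its own namespaces
spec:
  interval: 5m
  url: https://github.com/example-org/platform-config
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: infra-services
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: platform-config
  path: ./infra
  prune: true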
TMC Continuous Delivery – January 31st 2023
On January 31st 2023, VMware released a major update to TMC which included two key features that are relevant to this post:
- Added support for installing a Helm chart from a Git repository
- Added support for continuous delivery at the cluster group level
Let's take a look at what each of these features includes.
Helm Chart Support
FluxCD mainly supports two deployment mechanisms: Kustomize and Helm. Kustomize is a very common tool, and the integration in FluxCD allows us to deploy either raw YAML definitions or Kustomize configurations to our clusters. While Kustomize is a great solution in some cases, most software we consume from vendors or OSS projects today is packaged as Helm charts.
FluxCD supports the deployment of Helm charts that are sourced from Git repositories, and in this new release of TMC, we now have the ability to configure Helm Releases as well, just like we would Kustomizations.
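For reference, the Flux object behind this feature is a HelmRelease whose chart is pulled from a GitRepository source. A minimal sketch, with illustrative names and paths (again, TMC generates and manages the real objects on our behalf):
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: my-app                # illustrative name
  namespace: flux-system      # illustrative namespace
spec:
  interval: 5m
  chart:
    spec:
      chart: ./charts/my-app  # path to the chart inside the Git repository
      sourceRef:
        kind: GitRepository
        name: platform-config # the Git source defined earlier
  values:
    replicaCount: 2           # inline values, as an example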
This opens up additional options for what we can easily integrate into deployment and management flows using TMC.
While adding support for Helm is great, just like Kustomize in the past, it requires us to enable the service on a cluster-by-cluster basis, and the configuration is also at the cluster level only.
Cluster Group Continuous Delivery
In my mind, the main feature from the January 31st release that will give customers huge value is this one.
In TMC we have always had the concept of a Cluster Group, which is a grouping of clusters under a single logical entity that we can apply policies to, such as security policies, RBAC and much more.
In this release, we now have the ability to enable Continuous Delivery at the Cluster Group level. This means that any cluster that gets added to a specific cluster group will automatically have the CD service enabled.
We also have the ability to define Kustomization configurations at the Cluster Group level, which will be automatically applied to all clusters in the relevant Cluster Group!
This feature opens up some really awesome capabilities.
Installing Carvel Packages on all clusters
Beyond providing FluxCD, TMC also supports my favorite package management solution, Carvel.
Not only does TMC install and provide the APIs and UI for installing Carvel packages, it also configures the "Tanzu Standard" package repository, which includes common tooling needed for Kubernetes clusters such as Contour, Cert Manager, Harbor, Prometheus, Grafana, FluentBit, External DNS and more.
The great thing is that these packages are supported by VMware!
The difficulty is that, just like with the CD service, the package management capabilities are cluster scoped.
That said, in the end Carvel package management is all just Kubernetes resources.
Recently I have begun playing around with deploying Carvel package installations declaratively, instead of doing it imperatively via the Tanzu CLI or the TMC GUI. Doing so actually offers more advanced capabilities and fine-grained control, as we can utilize advanced fields in the resource specs that we can't configure via the Tanzu CLI or the TMC GUI.
What gets created when we install a package via TMC
When we create a package installation, four or five resources are created for us:
- Service Account – A service account that is used to perform the installation.
- Cluster Role – A cluster role with cluster-admin level permissions, used to allow the installation of the package's resources.
- Cluster Role Binding – A Cluster Role Binding which binds our new service account to the new Cluster Role from above.
- Package Install – The actual package install resource.
- (optional) Secret – A secret containing the values file we want to use for the installation.
How do we do this declaratively
The first step is to create the relevant manifests. In this example we will configure the contour package from the Tanzu Standard repository. While not needed, in the example below I have added annotations to the resources to match what the Tanzu CLI does, and named the resources in the same manner. This just seems like a beneficial thing to do in my mind, but it is not a requirement.
- First is the Service Account YAML:
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    tkg.tanzu.vmware.com/tanzu-package: contour-tkg-system
  name: contour-tkg-system-sa
  namespace: tkg-system
- Next is the Cluster Role YAML:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    tkg.tanzu.vmware.com/tanzu-package: contour-tkg-system
  name: contour-tkg-system-cluster-role
rules:
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'
- Next is the Cluster Role Binding:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    tkg.tanzu.vmware.com/tanzu-package: contour-tkg-system
  name: contour-tkg-system-cluster-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: contour-tkg-system-cluster-role
subjects:
- kind: ServiceAccount
  name: contour-tkg-system-sa
  namespace: tkg-system
- Next is the Secret with the data values for contour on vSphere with NSX ALB:
apiVersion: v1
kind: Secret
metadata:
  name: contour-tkg-system-values
  namespace: tkg-system
type: Opaque
stringData:
  contour-data-values.yaml: |
    infrastructure_provider: vsphere
    contour:
      configFileContents: {}
      useProxyProtocol: false
      replicas: 2
      pspNames: "vmware-system-restricted"
      logLevel: info
    envoy:
      service:
        type: LoadBalancer
        annotations: {}
        nodePorts:
          http: null
          https: null
        externalTrafficPolicy: Cluster
        disableWait: false
      hostPorts:
        enable: false
      hostNetwork: false
      terminationGracePeriodSeconds: 300
      logLevel: info
      pspNames: null
    certificates:
      duration: 8760h
      renewBefore: 360h
- Finally, the Package Install itself:
apiVersion: packaging.carvel.dev/v1alpha1
kind: PackageInstall
metadata:
  annotations:
    tkg.tanzu.vmware.com/tanzu-package-ClusterRole: contour-tkg-system-cluster-role
    tkg.tanzu.vmware.com/tanzu-package-ClusterRoleBinding: contour-tkg-system-cluster-rolebinding
    tkg.tanzu.vmware.com/tanzu-package-Secret: contour-tkg-system-values
    tkg.tanzu.vmware.com/tanzu-package-ServiceAccount: contour-tkg-system-sa
  name: contour
  namespace: tkg-system
spec:
  packageRef:
    refName: contour.tanzu.vmware.com
    versionSelection:
      constraints: 1.20.2+vmware.1-tkg.1
      prereleases: {}
  serviceAccountName: contour-tkg-system-sa
  values:
  - secretRef:
      name: contour-tkg-system-values
But why Cluster Admin???????
Indeed, all Tanzu package installations performed via TMC or the Tanzu CLI use cluster-admin permissions. This is because they don't know what permissions any specific package may or may not need in order to be installed. While that may be acceptable for some, I wanted to apply least-privilege concepts, and that is where I found a great tool called audit2rbac.
What is audit2rbac
audit2rbac is an amazing tool put together by Jordan Liggitt. The tool can auto-generate RBAC resource YAMLs for a specific user or service account based on a Kubernetes audit log. Luckily, TKG, which is the distribution I am using, enables us to have audit logging enabled on our clusters. Once you have an audit log file, you simply pass the file to audit2rbac, tell the CLI tool which user or service account you are interested in, and it will create the YAML definitions for the relevant RBAC needs.
Using audit2rbac for the Contour Package
After installing the contour package in one of my clusters, I copied the audit.log file from the Control Plane node in my cluster to my local machine, where I have audit2rbac installed.
The audit log in TKG is configured to be placed at the path /var/log/kubernetes/audit.log on the Control Plane nodes.
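As an illustration, one way to copy it off the node (this assumes you have SSH access to the control plane node with the key provided at cluster creation; the capv user applies to TKG on vSphere):
# Replace <CONTROL-PLANE-IP> with the address of a control plane node
scp capv@<CONTROL-PLANE-IP>:/var/log/kubernetes/audit.log ./audit.log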
The next step was to simply run the following command:
audit2rbac -f audit.log --serviceaccount tkg-system:contour-tkg-system-sa \
--generate-annotations tkg.tanzu.vmware.com/tanzu-package=contour-tkg-system \
--generate-labels="" \
--generate-name contour-tkg-system > contour-pkgi-rbac.yaml
If we take a look at the file we just created, we can see that it contains four resources:
- Cluster Role
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    tkg.tanzu.vmware.com/tanzu-package: contour-tkg-system
  name: contour-tkg-system
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  - nodes
  - pods
  - serviceaccounts
  - services
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resourceNames:
  - tanzu-system-ingress
  resources:
  - namespaces
  verbs:
  - create
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - create
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - cert-manager.io
  resources:
  - certificates
  - issuers
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - clusterrolebindings
  - clusterroles
  verbs:
  - create
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - rolebindings
  - roles
  verbs:
  - get
  - list
  - watch
- Cluster Role Binding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    tkg.tanzu.vmware.com/tanzu-package: contour-tkg-system
  name: contour-tkg-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: contour-tkg-system
subjects:
- kind: ServiceAccount
  name: contour-tkg-system-sa
  namespace: tkg-system
- Role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  annotations:
    tkg.tanzu.vmware.com/tanzu-package: contour-tkg-system
  name: contour-tkg-system
  namespace: tanzu-system-ingress
rules:
- apiGroups:
  - ""
  resourceNames:
  - tanzu-system-ingress
  resources:
  - namespaces
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - ""
  resources:
  - serviceaccounts
  - services
  verbs:
  - create
  - get
  - patch
  - update
- apiGroups:
  - apps
  resourceNames:
  - envoy
  resources:
  - daemonsets
  verbs:
  - create
  - get
  - patch
  - update
- apiGroups:
  - apps
  resourceNames:
  - contour
  resources:
  - deployments
  verbs:
  - create
  - get
  - patch
  - update
- apiGroups:
  - cert-manager.io
  resources:
  - certificates
  - issuers
  verbs:
  - create
  - get
  - patch
  - update
- apiGroups:
  - rbac.authorization.k8s.io
  resourceNames:
  - contour-rolebinding
  resources:
  - rolebindings
  verbs:
  - create
  - get
  - patch
  - update
- apiGroups:
  - rbac.authorization.k8s.io
  resourceNames:
  - contour
  resources:
  - roles
  verbs:
  - create
  - get
  - patch
  - update
- Role Binding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  annotations:
    tkg.tanzu.vmware.com/tanzu-package: contour-tkg-system
  name: contour-tkg-system
  namespace: tanzu-system-ingress
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: contour-tkg-system
subjects:
- kind: ServiceAccount
  name: contour-tkg-system-sa
  namespace: tkg-system
The issue
While this output is much better than granting cluster-admin permissions, this configuration will require you to pre-create the tanzu-system-ingress namespace, and it will also require some additional permissions.
The reason additional permissions are needed in many cases is that in Kubernetes, you can't create a cluster role or role that grants higher permissions than you yourself have. If the package includes RBAC resources, you must add the permissions they grant into your package service account's RBAC, otherwise Kapp Controller will fail to reconcile the package installation.
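As a hypothetical illustration (the rule below is made up for the example): if the package's manifests contain a Role granting update on endpoints, the installer service account must already hold that permission before the API server will let it create the Role:
# Hypothetical rule found inside a Role shipped by the package:
- apiGroups:
  - ""
  resources:
  - endpoints
  verbs:
  - update
# The same rule must also appear in the installer service account's
# ClusterRole/Role, or the API server rejects creating the package's Role
# as privilege escalation. (Granting the "escalate" verb on roles and
# clusterroles would bypass this check, but defeats the least-privilege goal.)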
There is also a bug in the audit2rbac tool where it can sometimes generate incorrect, overzealous configurations that simply can't work. The main change one must make in the manifests is best shown with an example. Take this snippet:
- apiGroups:
  - rbac.authorization.k8s.io
  resourceNames:
  - contour-rolebinding
  resources:
  - rolebindings
  verbs:
  - create
  - get
  - patch
  - update
The issue is that the "create" verb does not work in Kubernetes RBAC in conjunction with the resourceNames field, since the object's name is not known at authorization time for create requests. The solution with the highest level of security is:
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - rolebindings
  verbs:
  - create
- apiGroups:
  - rbac.authorization.k8s.io
  resourceNames:
  - contour-rolebinding
  resources:
  - rolebindings
  verbs:
  - get
  - patch
  - update
  - delete
Here we create two rules: one for creating the resource, which can't be scoped down to a specific resource name, and a separate rule with the rest of the verbs, where we do scope the permissions to the specific resource names.
The reason we need to pre-create the namespace is that, as this is a least-privilege approach, audit2rbac has split the RBAC into both cluster-wide and namespace-scoped permissions, and as the installation targets the tanzu-system-ingress namespace, that is where the Role and Role Binding are targeted.
Solving the missing permissions issue
The easiest way to solve this requires some scripting and a few CLI tools. The CLI tools I use for this process are:
- kubectl
- kubectl split-yaml plugin
- imgpkg
- ytt
- jq
The general flow is:
- set a few environment variables
export PKG_NAME=contour.tanzu.vmware.com
export PKG_VERSION=1.22.3+vmware.1-tkg.1
export PKG_NS=tkg-system
- pull down the imgpkg bundle of this package
export BUNDLE_URI=$(kubectl get pkg $PKG_NAME.$PKG_VERSION -n $PKG_NS -ojson | \
jq -r .spec.template.spec.fetch[0].imgpkgBundle.image)
imgpkg pull -b $BUNDLE_URI -o ./contour-package-bundle
cd contour-package-bundle
- create a file with the values you would supply when applying a package
cat <<EOF > my-custom-values.yaml
infrastructure_provider: vsphere
contour:
  configFileContents: {}
  useProxyProtocol: false
  replicas: 2
  pspNames: "vmware-system-restricted"
  logLevel: info
envoy:
  service:
    type: LoadBalancer
    annotations: {}
    nodePorts:
      http: null
      https: null
    externalTrafficPolicy: Cluster
    disableWait: false
  hostPorts:
    enable: false
  hostNetwork: false
  terminationGracePeriodSeconds: 300
  logLevel: info
  pspNames: null
certificates:
  duration: 8760h
  renewBefore: 360h
EOF
- template out the package and retrieve all RBAC relevant resources
ytt --data-values-file my-custom-values.yaml -f config/ | kubectl split-yaml -f -
cd split-yaml
if [ -d "rbac.authorization.k8s.io_v1--ClusterRole" ]
then
  mkdir ../additional-cluster-roles/
  mv rbac.authorization.k8s.io_v1--ClusterRole/* ../additional-cluster-roles/
fi
if [ -d "rbac.authorization.k8s.io_v1--Role" ]
then
  mkdir ../additional-roles/
  mv rbac.authorization.k8s.io_v1--Role/* ../additional-roles/
fi
cd ..
rm -rf split-yaml
- now you can either write a custom script or manually take the rules from the Roles and ClusterRoles in the new folders and merge them into the baseline received from the audit2rbac tool.
- create a ytt overlay to remove the resourceNames field when the verb is create
cat <<EOF > remove-resource-name-config-when-create.yaml
#@ load("@ytt:overlay","overlay")

#@ def bad_rule():
verbs:
- create
#@ end

#@overlay/match by=overlay.all
---
rules:
#@overlay/match by=overlay.subset(bad_rule()), expects="1+"
-
  #@overlay/remove
  resourceNames: []
EOF
- run the overlay against all RBAC files (see the sketch below) and then add the updated files to your Git repo.
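A minimal sketch of that step, assuming the audit2rbac output from earlier was saved as contour-pkgi-rbac.yaml (the file names here are my own choices):
# ytt applies the overlay to every YAML document passed in
ytt -f contour-pkgi-rbac.yaml \
    -f remove-resource-name-config-when-create.yaml \
    > contour-pkgi-rbac-clean.yaml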
Solving the Role vs Cluster Role issue
Solution 1 – Pre-Create the Namespace
The easiest solution is to pre-create the namespace. While this requires a step before installation, you could also simply add the namespace manifest to the Git repository synced by the FluxCD Kustomization.
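All that's needed is a simple namespace manifest, committed alongside the rest of the configuration:
apiVersion: v1
kind: Namespace
metadata:
  name: tanzu-system-ingress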
Solution 2 – Move everything to a Cluster Role
You could also simply get rid of the separation between Cluster Role and Role permissions and just merge the two sets of permissions.
While this means fewer resources to maintain, and aligns with the resources created by the Tanzu CLI and the TMC GUI, it veers away from the least-privilege model and is a suboptimal choice.
Nonetheless, it is still a much better choice from a security perspective than the default Cluster Admin approach!
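To sketch what that merge looks like, the rules from the namespace-scoped Role above simply become additional rules in the ClusterRole, at the cost of now applying cluster-wide (only one migrated rule is shown here for brevity):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    tkg.tanzu.vmware.com/tanzu-package: contour-tkg-system
  name: contour-tkg-system
rules:
# ...all of the original ClusterRole rules from above, plus the former
# Role rules, now granted in every namespace instead of only
# tanzu-system-ingress, e.g.:
- apiGroups:
  - ""
  resources:
  - serviceaccounts
  - services
  verbs:
  - create
  - get
  - patch
  - update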
What would a final manifest look like
In the end, after configuring RBAC correctly and everything else, you can see an example of a working configuration in the following Git repo.
Summary
While there is definitely a learning curve with these things, the ability to configure Flux at the Cluster Group level is huge. I think that over time we will see the Carvel packaging ecosystem evolve to more easily support fine-grained RBAC, as we have tried to do here.