Home – vRabbi's Blog

Advanced autoscaling of Cluster API based clusters

ClusterAPI is becoming the standard approach for deploying Kubernetes clusters no matter which infrastructure you want to run on. That could be a public cloud provider like AWS, Azure, GCP, Oracle or Akamai, a virtualization platform like vSphere, Proxmox or OpenStack, or bare metal servers using a management layer like Metal3, Tinkerbell or Canonical MAAS.

ClusterAPI is not only used directly by many consumers. It is also a key foundation for the majority of today's commercial multi-cloud Kubernetes distributions, such as Tanzu Kubernetes Grid (VMware by Broadcom), EKS Anywhere (AWS), Rancher (SUSE), Anthos (Google), NKP (Nutanix), Palette (Spectro Cloud), and many more.

ClusterAPI lets us define and manage our Kubernetes clusters declaratively using CRDs defined by the project and by the different providers. This makes it an extensible yet standardized platform that simplifies managing Kubernetes clusters across a multitude of different targets. Currently there are 32 official infrastructure providers for ClusterAPI, 9 bootstrap providers, and 10 control plane providers, allowing you to deploy clusters exactly as you require in your own environment.

Workload autoscaling is a very common and beneficial feature used in most Kubernetes environments. It is provided either via built-in mechanisms like the HPA or via additional tooling such as VPA or my personal favorite, KEDA. KEDA performs event-driven autoscaling of our workloads and is becoming a de facto standard in this space.

Another type of autoscaling which is critical when it comes to Kubernetes is Cluster Autoscaling. While workload autoscaling is a very mature space with great tools and practices, the Cluster Autoscaling world is much more challenging.

Workload autoscaling is agnostic to the underlying infrastructure and the specific Kubernetes distribution being used. Cluster autoscaling, in contrast, is tightly coupled to both, which makes it a much more fragmented space with challenges that differ based on your specific setup.

Let's take AWS vs. Proxmox as an example. On AWS we have cloud concepts such as Auto Scaling Groups (ASGs), which can be used by autoscaling solutions such as Cluster Autoscaler, an official sub-project of Kubernetes SIG Autoscaling. While this works great on AWS, Proxmox has no similar concept, making cluster autoscaling a much bigger challenge.

This is where ClusterAPI can offer a huge benefit. Cluster Autoscaler has a provider-based architecture, and while most of its providers are specific cloud providers, one of them is actually ClusterAPI!

This means that if our clusters are deployed and managed using ClusterAPI, we can add autoscaling capabilities to them without any infrastructure provider dependencies. This works because the ClusterAPI provider simply treats ClusterAPI resources (specifically MachineDeployments and MachinePools) as the “cloud provider”. It then delegates the actual creation of machines, and the interaction with the specific infrastructure provider, to ClusterAPI. The result is a seamless, cloud-agnostic interface for Cluster Autoscaler.
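Cluster Autoscaler's clusterapi provider only picks up MachineDeployments (or MachinePools) that carry min/max size annotations. A minimal sketch of opting a MachineDeployment in is shown below. It assumes Cluster Autoscaler is already running with the clusterapi provider pointed at your management cluster. The MachineDeployment name and the 1 to 5 range are hypothetical values.

```shell
# Cluster Autoscaler's clusterapi provider only manages node groups
# (MachineDeployments/MachinePools) that carry both of these annotations.
# The min/max values here are examples only.
cat <<EOF > md-autoscaler-patch.yaml
metadata:
  annotations:
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5"
EOF

# Apply it to a MachineDeployment on the management cluster
# ("my-cluster-md-0" and "default" are hypothetical):
# kubectl patch machinedeployment my-cluster-md-0 -n default \
#   --type merge --patch-file md-autoscaler-patch.yaml
```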

This approach has proven very successful and has benefited many environments. However, as we all know, the Kubernetes landscape never rests. Newer, shinier and more advanced technologies keep emerging, and they often change the direction the industry takes in a specific area.

Cluster autoscaling is one of these areas, with emerging technologies trying to solve challenges seen with the traditional Cluster Autoscaler. The key project gaining traction here is Karpenter.

Karpenter was originally developed by AWS and is now an official Kubernetes sub-project under SIG Autoscaling. It brings new and exciting advancements to the cluster autoscaling space that can help lower operational overhead, optimize resource usage, and reduce costs.

Karpenter is extremely interesting and is gaining huge traction on AWS. As a newer project, however, it does not support nearly as many cloud and infrastructure providers. Currently only AWS and Azure have production-ready Karpenter providers, making it a great solution in either of those clouds but irrelevant anywhere else.

The ClusterAPI community has been looking at this area for the past few months, and we now have an alpha implementation of a Karpenter provider for ClusterAPI. Just like the Cluster Autoscaler provider, it offers a cloud-agnostic interface for autoscaling any Kubernetes cluster deployed and managed via ClusterAPI!

This provider is under heavy development and in very early stages. Even so, it is a huge step in the right direction. It is the start of an exciting path towards making Karpenter accessible to the masses, and one more argument for the major benefits of ClusterAPI-based cluster management!

The Karpenter provider for ClusterAPI is currently being migrated to the Kubernetes SIGs GitHub organization, under the umbrella of SIG Cluster Lifecycle, which is the SIG in charge of ClusterAPI.

The repo is currently located here, but the final URL should be this, and the status of the migration can be tracked here.

I am extremely excited to see where we as a community take this provider in the near future. As a maintainer of the repo, I can say we welcome all comments, issues, feedback and of course PRs to help us build the best provider and solution for the wider community!

Another interesting space I have been investigating and evangelizing throughout the community (most recently at OSS Summit Europe last week in Vienna) is using KEDA for cluster autoscaling. This is already possible with ClusterAPI-based clusters, because KEDA can autoscale any Kubernetes resource that implements the /scale subresource, which ClusterAPI MachineDeployments do!
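The sketch below shows what this can look like: a KEDA ScaledObject whose scaleTargetRef points at a MachineDeployment instead of a Deployment. It assumes KEDA is installed on the management cluster, where the MachineDeployment objects live, since a ScaledObject must be in the same namespace as its target. All names are hypothetical. The cron trigger is only an illustration of scaling ahead of demand; any KEDA scaler could drive it. If the MachineDeployment comes from a ClusterClass topology, you probably should not also pin its replicas in the Cluster topology, or the two controllers may fight over the count.

```shell
# Hypothetical target: MachineDeployment "my-cluster-md-0" in namespace "default".
cat <<EOF > md-keda-scaledobject.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-cluster-md-0-scaler
  namespace: default
spec:
  scaleTargetRef:
    # Any resource exposing the /scale subresource can be targeted.
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: MachineDeployment
    name: my-cluster-md-0
  # Outside the cron window KEDA falls back to minReplicaCount.
  minReplicaCount: 2
  maxReplicaCount: 6
  triggers:
  # Add worker nodes before business hours instead of waiting for pending pods.
  - type: cron
    metadata:
      timezone: Etc/UTC
      start: "0 8 * * 1-5"
      end: "0 18 * * 1-5"
      desiredReplicas: "6"
EOF

# Then apply it on the management cluster:
# kubectl apply -f md-keda-scaledobject.yaml
```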

Using KEDA as a cluster autoscaler is still very much uncharted territory with some clear rough edges, but the potential is huge. The door it opens to predictive, rather than purely reactive, autoscaling is extremely interesting to me. I hope to continue working in this area to create more solutions and options for the community!

If you are interested in the KEDA + ClusterAPI possibilities, you can check out the slides from my recent talk at OSS Summit, or reach out directly!

September 22, 2024

ClusterAPI, Karpenter, KEDA, tkg

Integrating Active Directory CA (ADCS) with TAP

Recently a colleague of mine wrote two great blog posts (blog-1, blog-2) on configuring TAP to issue certificates signed by ADCS. The solution he documented uses HashiCorp Vault as an intermediate CA between cert-manager and ADCS.

This approach is scalable and simple, and very well suited for production setups.

HashiCorp Vault was used because ADCS is not supported by the built-in cert-manager ClusterIssuer and Issuer resources. Vault, on the other hand, is supported OOTB, making it a great way to integrate these systems.

A while back I remembered seeing that Nokia had built an external cert-manager issuer for ADCS, which seemed like an interesting direction to investigate. However, they stopped maintaining the provider, and until recently this seemed like a lost cause.

When taking another look at this recently, I found an active fork of this provider. It has since been updated and seems well maintained, making it an interesting project to explore.

In this post I will show how we can integrate the ADCS cert-manager issuer directly into TAP.

The first step is to install the provider from its Helm chart, using a custom values file with some specific configuration. Let's create that values file:

```
cat <<EOF > adcs-values.yaml
crd:
  install: true
controllerManager:
  manager:
    image:
      repository: ghcr.io/vrabbi/adcs-issuer
      tag: 2.0.8
    resources:
      limits:
        cpu: 100m
        memory: 500Mi
      requests:
        cpu: 100m
        memory: 100Mi
  rbac:
    enabled: true
    serviceAccountName: adcs-issuer
    certManagerNamespace: cert-manager
    certManagerServiceAccountName: cert-manager
  replicas: 1
  environment:
    KUBERNETES_CLUSTER_DOMAIN: cluster.local
    ENABLE_WEBHOOKS: "false"
    ENABLE_DEBUG: "false"
  arguments:
    enable-leader-election: "true"
    cluster-resource-namespace: "adcs-issuer"
    zap-log-level: 5
    disable-approved-check: "false"
  securityContext:
    runAsUser: 1000
  enabledWebHooks: false
  enabledCaCerts: false
  caCertsSecretName: ca-certificates
metricsService:
  enabled: true
  ports:
  - name: https
    port: 8443
    targetPort: https
  type: ClusterIP
webhookService:
  ports:
  - port: 443
    targetPort: 9443
  type: ClusterIP
EOF
```

Now we can add the Helm repository and install the chart:

```
helm repo add djkormo-adcs-issuer https://djkormo.github.io/adcs-issuer/
helm install adcs-issuer djkormo-adcs-issuer/adcs-issuer \
 --version 2.0.8 \
 --namespace adcs-issuer \
 --values adcs-values.yaml \
 --create-namespace
```

With the ADCS provider installed, we now need to create a ClusterAdcsIssuer. This is the ADCS provider's equivalent of the built-in ClusterIssuer.

To do this, we first need to create a secret containing the NTLM credentials for ADCS:

```
cat <<EOF | kubectl apply -n adcs-issuer -f -
apiVersion: v1
stringData:
  password: MySecretPassword
  username: MyAwesomeUsername
kind: Secret
metadata:
  name: adcs-issuer-credentials
  namespace: adcs-issuer
type: Opaque
EOF
```

Now we can create our ClusterAdcsIssuer. To do this we need three pieces of information:

1. The Base64-encoded CA certificate of the ADCS server
2. The FQDN of the ADCS Web Enrollment endpoint
3. The certificate template we want to use to generate the certificates
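The caBundle field expects the CA certificate as a single-line base64 string. A small helper for producing it is sketched below. It assumes you have exported the ADCS root CA as a PEM file; the file name in the usage comment is hypothetical.

```shell
# Print a file as single-line base64, suitable for the caBundle field.
# Piping through tr keeps this portable between GNU and BSD/macOS base64,
# which differ in how they wrap long output lines.
encode_ca_bundle() {
  base64 < "$1" | tr -d '\n'
}

# Usage (hypothetical file name for the exported ADCS root CA):
# encode_ca_bundle adcs-root-ca.pem
```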

With that data we can create our issuer:

```
cat <<EOF | kubectl apply -f -
apiVersion: adcs.certmanager.csf.nokia.com/v1
kind: ClusterAdcsIssuer
metadata:
  name: adcs
spec:
  caBundle: <BASE64_ENCODED_CA_DATA>
  credentialsRef:
    name: adcs-issuer-credentials
  statusCheckInterval: 1m
  retryInterval: 1m
  url: https://<ADCS_WEB_ENROLLMENT_FQDN>/certsrv/
  templateName: "<TEMPLATE NAME>"
EOF
```

Now that the ADCS provider is configured, we can test certificate generation with the following example:

```
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: adcs-cert-tap-test
spec:
  dnsNames:
  - test.vrabbi.cloud
  issuerRef:
    group: adcs.certmanager.csf.nokia.com
    kind: ClusterAdcsIssuer
    name: adcs
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  subject:
    organizations:
    - vRabbi
  secretName: adcs-cert-tap-test
EOF
```

If the setup works as expected, you should have a secret called adcs-cert-tap-test with three keys: tls.crt, tls.key and ca.crt.

```
kubectl get secret adcs-cert-tap-test -o yaml
```
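Listing the keys only shows that a secret exists. To confirm the certificate was actually signed by ADCS, you can decode tls.crt and print its issuer. The helper name below is hypothetical, and it assumes kubectl and openssl are available locally.

```shell
# Decode the certificate stored in a TLS secret and print its issuer, subject
# and expiry date. The namespace defaults to "default", where the example
# Certificate above lands unless your kubectl context sets another namespace.
show_secret_cert() {
  local secret="$1" namespace="${2:-default}"
  kubectl get secret "$secret" -n "$namespace" -o jsonpath='{.data.tls\.crt}' \
    | base64 --decode \
    | openssl x509 -noout -issuer -subject -enddate
}

# Usage: wait for the Certificate to become Ready, then check that the issuer
# is your ADCS CA.
# kubectl wait --for=condition=Ready certificate/adcs-cert-tap-test --timeout=5m
# show_secret_cert adcs-cert-tap-test
```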

With that working, we now need to integrate this into TAP. This requires a few overlays, and which ones you need depends on the profile you are installing.

Below I describe the configuration for a full profile cluster. For any other profile, simply exclude the overlays for packages that are not installed.

As the example above shows, when creating a certificate with an external cert-manager issuer we need to specify the issuerRef a bit differently. It must include the API group of the custom issuer, and the kind must be the custom issuer's kind rather than the standard OOTB ClusterIssuer kind that TAP uses by default.

To make these changes, we first need to understand which parts of TAP need to be modified. This splits into two key areas:

1. Knative – All web workloads get auto-generated TLS certificates, so we need to configure Knative to use our custom issuer.
2. Other packages with exposed HTTPProxy resources – Some packages (TAP GUI, Metadata Store, API Portal, API Auto Registration) expose services outside the cluster and define Certificate and HTTPProxy resources.

For Knative, the docs show that the issuer to use is defined in a ConfigMap called config-certmanager in the knative-serving namespace.

To see how TAP handles this, I used the following process:

1. Find the imgpkg bundle URI of the cnrs package:

```
kubectl get package -n tap-install cnrs.tanzu.vmware.com.2.4.1 -o json | jq -r '.spec.template.spec.fetch[0].imgpkgBundle.image'
```

2. Pull down the bundle to our local machine:

```
imgpkg pull -b harbor.vrabbi.cloud/tap/tap-packages@sha256:57569892d8371ed52c5ebd6177930d3de490e1a1095b39873dc9c12d717cd16a -o cnrs
```

3. Explore the package files. The relevant files are at the following path:

packages/serving/bundle/config/overlays/

The overlay TAP uses to configure this ConfigMap is called overlay-knative-config-certmanager.yaml, and its contents are:

```
#@ load("@ytt:overlay", "overlay")
#@ load("@ytt:yaml", "yaml")
#@ load("@ytt:assert", "assert")
#@ load("@ytt:data", "data")
#@ load("values.star", "issuer_ref")

#@ if data.values.ingress_issuer:
#@ if data.values.domain_name or data.values.domain_config:
#@overlay/match by=overlay.subset({"metadata":{"name":"config-certmanager","namespace":"knative-serving"}})
---
data:
  #@overlay/match missing_ok=True
  issuerRef: #@ yaml.encode(issuer_ref())
#@ else:
#@ assert.fail("cannot set an ingress_issuer without configuring a custom domain")
#@ end
#@ end
```

4. Create a custom overlay that similarly updates the issuerRef, but with our specific values.
5. Apply the overlay via the TAP values.
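The discovery steps above (resolving the bundle, pulling it and listing the overlays) can be wrapped in a small helper. This is a hypothetical convenience function, not part of TAP. It assumes kubectl, jq and imgpkg are installed and that you have access to the tap-install namespace.

```shell
# Hypothetical helper: resolve the cnrs package bundle, pull it locally and
# list the overlays TAP applies to Knative Serving.
# The default version matches the example above; pass your own as $1.
inspect_cnrs_overlays() {
  local version="${1:-2.4.1}" image
  image="$(kubectl get package -n tap-install "cnrs.tanzu.vmware.com.${version}" -o json \
    | jq -r '.spec.template.spec.fetch[0].imgpkgBundle.image')"
  imgpkg pull -b "$image" -o cnrs
  ls cnrs/packages/serving/bundle/config/overlays/
}

# Usage:
# inspect_cnrs_overlays 2.4.1
```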

With that understood, we can write our own overlay. It needs to be stored in a secret in the tap-install namespace so it can be applied to the cnrs package.

The final overlay for Knative is as follows:

```
#@ load("@ytt:overlay", "overlay")

#@overlay/match by=overlay.subset({"metadata":{"name":"config-certmanager","namespace":"knative-serving"}})
---
data:
  #@overlay/match missing_ok=True
  issuerRef: |
    kind: ClusterAdcsIssuer
    name: adcs
    group: adcs.certmanager.csf.nokia.com
```

We can now create a secret with this content:

```
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: adcs-knative-overlay
  namespace: tap-install
stringData:
  zzz-adcs-package-overlay.yml: |
    #@ load("@ytt:overlay", "overlay")

    #@overlay/match by=overlay.subset({"metadata":{"name":"config-certmanager","namespace":"knative-serving"}})
    ---
    data:
      #@overlay/match missing_ok=True
      issuerRef: |
        kind: ClusterAdcsIssuer
        name: adcs
        group: adcs.certmanager.csf.nokia.com
EOF
```

We can now move on to the other packages. The exploration process is the same, but here we notice that TAP stamps out Certificate resources directly as part of each package. This works great for us, because a single overlay can be applied to all of these packages!

The final overlay for the other packages is:

```
#@ load("@ytt:overlay", "overlay")

#@overlay/match by=overlay.subset({"kind": "Certificate"}), expects="1+"
---
spec:
  #@overlay/match missing_ok=True
  issuerRef:
    #@overlay/match missing_ok=True
    name: adcs
    #@overlay/match missing_ok=True
    kind: ClusterAdcsIssuer
    #@overlay/match missing_ok=True
    group: adcs.certmanager.csf.nokia.com
```
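Before putting this overlay into a secret, you can render it locally against a sample Certificate with ytt to confirm the issuerRef is rewritten as intended. The helper below is a sketch. It assumes ytt is installed and the overlay above was saved locally as zzz-adcs-package-overlay.yml. The sample Certificate, including its tap-ingress-selfsigned issuer name, is only illustrative input.

```shell
# Hypothetical local check: render a sample Certificate through the overlay.
# The output should show issuerRef with name adcs, kind ClusterAdcsIssuer and
# group adcs.certmanager.csf.nokia.com.
preview_certificate_overlay() {
  local overlay="${1:-zzz-adcs-package-overlay.yml}"
  cat <<EOF > sample-certificate.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: sample
spec:
  secretName: sample-tls
  dnsNames:
  - sample.example.com
  issuerRef:
    kind: ClusterIssuer
    name: tap-ingress-selfsigned
EOF
  ytt -f sample-certificate.yaml -f "$overlay"
}

# Usage:
# preview_certificate_overlay zzz-adcs-package-overlay.yml
```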

Just like with the Knative overlay, we can create a secret containing this overlay:

```
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: adcs-certificate-overlay
  namespace: tap-install
stringData:
  zzz-adcs-package-overlay.yml: |
    #@ load("@ytt:overlay", "overlay")

    #@overlay/match by=overlay.subset({"kind": "Certificate"}), expects="1+"
    ---
    spec:
      #@overlay/match missing_ok=True
      issuerRef:
        #@overlay/match missing_ok=True
        name: adcs
        #@overlay/match missing_ok=True
        kind: ClusterAdcsIssuer
        #@overlay/match missing_ok=True
        group: adcs.certmanager.csf.nokia.com
EOF
```

With those secrets created, the final step is to tell TAP to apply these overlays to the relevant package installations. This is done with the following stanza in your TAP values file:

```
package_overlays:
- name: tap-gui
  secrets:
  - name: adcs-certificate-overlay
- name: api-auto-registration
  secrets:
  - name: adcs-certificate-overlay
- name: api-portal
  secrets:
  - name: adcs-certificate-overlay
- name: metadata-store
  secrets:
  - name: adcs-certificate-overlay
- name: cnrs
  secrets:
  - name: adcs-knative-overlay
```

With this in place, we can apply the new values using the tanzu CLI as usual, and we should see all certificates being generated by our custom ADCS issuer!
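After the update, a quick way to confirm the overlays took effect is to list the issuer kind each cert-manager Certificate references, and to read Knative's config-certmanager ConfigMap. The helper below is a sketch. The TAP version argument and the tap-values.yaml file name are assumptions about your setup. The fully qualified certificates.cert-manager.io resource name avoids ambiguity with Knative's own Certificate CRD.

```shell
# Hypothetical helper: apply updated TAP values, then verify the issuer wiring.
apply_and_verify_adcs() {
  local tap_version="$1" values_file="${2:-tap-values.yaml}"
  tanzu package installed update tap \
    -p tap.tanzu.vmware.com -v "$tap_version" \
    --values-file "$values_file" -n tap-install

  # Certificates from the overlaid packages should show ClusterAdcsIssuer.
  kubectl get certificates.cert-manager.io -A \
    -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,ISSUER_KIND:.spec.issuerRef.kind

  # Knative should now reference the ADCS issuer.
  kubectl get configmap config-certmanager -n knative-serving \
    -o jsonpath='{.data.issuerRef}'
}

# Usage (replace with your installed TAP version):
# apply_and_verify_adcs <TAP_VERSION> tap-values.yaml
```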

February 13, 2024

carvel, security, tap

Importing Users and Groups in Tanzu Developer Portal (TAP-GUI)
TDP is an amazing solution built on top of Backstage. In my mind, the declarative way in which it is configured and entities are added is a huge benefit.

With that said, one of the pain points this brings is user management.

Users and Groups are entity types in Backstage, just like Components and Systems. As such, the default way of adding them is to create YAML manifests defining them and then register those manifests in Backstage. While this works, in large organizations having to define users and groups in multiple places is a big burden. It inevitably leads to misconfigurations and inconsistency across systems.

It is important to clearly define what we are talking about in this post. Since its initial release, TDP has supported configuring login providers, which can be nearly any OIDC-compliant IdP as well as a few other key IdPs. This allows us, for example, to support logins from GitHub, Azure AD, Okta, GitLab etc.

That part is solved, except for LDAP, which is not supported OOTB. Syncing users and groups from an external identity provider into Backstage as User and Group entities, however, has been a challenge for many customers.

Recently I discovered that, while undocumented, TAP has supported automatically ingesting users from any LDAP server and from Azure AD (now Entra ID) since its very early releases. In this blog post we will see how to configure this.

Before we dive into the configuration, we need to understand a few Backstage terms that are relevant to this process:
1. Catalog Processor – The catalog uses processors to perform ingestion tasks, such as reading raw entity data from a remote source, parsing it, transforming it, and validating it. These processors are configured under the catalog.processors configuration key.
2. Catalog Provider – Similar to a processor, but the newer and recommended approach. Entity providers sit at the very edge of the catalog. They are the original sources of entities and form the roots of the processing tree. The dynamic location store API and the static locations you can specify in your app-config are two examples of built-in providers in the catalog.
3. Catalog Location – The catalog holds a number of registered locations, added either by site admins or by individual Backstage users. Their purpose is to reference data that the catalog should keep itself up to date with. Each location has a type and a target, both of which are strings. Locations are used heavily by processors, because they define the data passed to the ingestion process in which the processor runs.

Since the early releases of TDP, there has been a hidden treasure in the system: the Azure AD and LDAP catalog processors. With these we can have our users and groups continuously synced from our external IdP into Backstage.

Let's see how we can configure this for LDAP:

LDAP Integration

To get started with the LDAP configuration, we need some details about our LDAP server, which in my case is an instance of Active Directory. The details we will need are:
1. The protocol to use (LDAP or LDAPS)
2. The FQDN of our LDAP server
3. The bind user's DN and password
4. The base DN for users and for groups
Once we have that information, we can add the needed configuration to our TAP values.

Under the tap_gui.app_config section we most likely already have a catalog key where we have defined some locations. For our use case, we will add another key under catalog called “processors”, where we will configure our LDAP processor:
```
catalog:
  processors:
    ldapOrg:
      providers:
      - target: ldap://FQDN_OF_YOUR_LDAP_SERVER
        bind:
          dn: "DN_OF_YOUR_BIND_USER"
          secret: "PASSWORD_OF_YOUR_BIND_USER"
        users:
          dn: "BASE_DN_FOR_USER_SEARCHING"
          options:
            scope: sub
            filter: "(objectClass=person)"
          map:
            description: l
            name: sAMAccountName
            rdn: sAMAccountName
        groups:
          dn: "BASE_DN_FOR_GROUP_SEARCHING"
          options:
            scope: sub
            filter: "(objectClass=group)"
          map:
            rdn: sAMAccountName
            name: sAMAccountName
            description: l
```
An example configuration could look like:
```
catalog:
  processors:
    ldapOrg:
      providers:
      - target: ldaps://demo-ad.vrabbi.demo
        bind:
          dn: "CN=Scott Rosenberg,OU=Users,DC=vrabbi,DC=demo"
          secret: "MyS3cR3tP@ssw0rd"
        users:
          dn: "DC=vrabbi,DC=demo"
          options:
            scope: sub
            filter: "(objectClass=person)"
          map:
            description: l
            name: sAMAccountName
            rdn: sAMAccountName
        groups:
          dn: "DC=vrabbi,DC=demo"
          options:
            scope: sub
            filter: "(objectClass=group)"
          map:
            rdn: sAMAccountName
            name: sAMAccountName
```
With that configured, we have told Backstage how to process users and groups from that specific LDAP environment. We still need to tell Backstage to pull in the data in the first place, which, as mentioned above, is done via catalog locations.
Just like with a standard Git location for a base catalog, we will configure another entry under tap_gui.app_config.catalog.locations, as shown below:
```
locations:
  - type: ldap-org
    target: ldap://YOUR_LDAP_SERVER_FQDN
    rules:
      - allow: [User, Group]
```
As we can see, we create a location of type ldap-org and provide the same target as in the processor configuration above. The processor uses this target to find its matching provider configuration.
The rules section restricts this location to registering only specific entity kinds, which in the case of LDAP should be User and Group.

Once this is configured, we can simply update our TAP installation with the new values. Within a few minutes, our LDAP users and groups will be available in our TDP instance!
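If users or groups do not show up, it helps to run the same bind and search outside of TDP. This shows exactly which entries the processor would receive. The helper name below is hypothetical. It assumes the OpenLDAP client tools are installed, and -W prompts for the bind password rather than putting it on the command line.

```shell
# Hypothetical helper: preview the entries an ldapOrg processor would ingest,
# using the same target, bind DN, base DN and filter as in the TAP values.
ldap_preview() {
  local target="$1" bind_dn="$2" base_dn="$3" filter="$4"
  ldapsearch -x -H "$target" -D "$bind_dn" -W \
    -b "$base_dn" -s sub "$filter" sAMAccountName l
}

# Usage with the example values from this post:
# ldap_preview ldaps://demo-ad.vrabbi.demo \
#   "CN=Scott Rosenberg,OU=Users,DC=vrabbi,DC=demo" \
#   "DC=vrabbi,DC=demo" "(objectClass=person)"
```

On Active Directory, (objectClass=person) also matches computer accounts, because the computer class derives from user and person. If the preview shows them, a filter such as (&(objectCategory=person)(objectClass=user)) is a common way to limit the results to user accounts.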
Azure AD Integration
For Azure AD the principles are the same as above. We just need to do a bit of work on the Azure AD side first, and then we can configure our instance.

In Azure AD we need to create an App Registration. This app must have the following Microsoft Graph permissions, configured as application permissions (they can't be delegated):
- User.Read.All
- GroupMember.Read.All
In many cases an administrator will need to grant consent for these permissions. Once that is configured, we need to generate a client secret for our app registration and collect our client ID and tenant ID.
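If you prefer scripting this over clicking through the portal, a sketch using the Azure CLI could look like the following. The app name is hypothetical. The app role IDs are looked up from the Microsoft Graph service principal rather than hard-coded. The script assumes you are logged in (az login) with rights to grant admin consent.

```shell
# Hypothetical helper: create the App Registration, add the Microsoft Graph
# application permissions, grant admin consent, and print the values the
# microsoftGraphOrg processor needs. The client secret is printed to the terminal.
create_tdp_graph_app() {
  local name="${1:-tdp-catalog-sync}"
  local graph="00000003-0000-0000-c000-000000000000"  # Microsoft Graph appId
  local app_id perm role_id

  app_id="$(az ad app create --display-name "$name" --query appId -o tsv)"
  # Admin consent requires a service principal for the app.
  az ad sp create --id "$app_id" >/dev/null

  for perm in User.Read.All GroupMember.Read.All; do
    # Look up the application (Role) permission ID by its value.
    role_id="$(az ad sp show --id "$graph" \
      --query "appRoles[?value=='$perm'].id | [0]" -o tsv)"
    az ad app permission add --id "$app_id" --api "$graph" \
      --api-permissions "${role_id}=Role"
  done
  az ad app permission admin-consent --id "$app_id"

  echo "tenantId:     $(az account show --query tenantId -o tsv)"
  echo "clientId:     $app_id"
  echo "clientSecret: $(az ad app credential reset --id "$app_id" --query password -o tsv)"
}
```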

With all of that data we can configure our Azure AD processor:
```
catalog:
  processors:
    microsoftGraphOrg:
      providers:
      - target: "https://graph.microsoft.com/v1.0"
        authority: "https://login.microsoftonline.com"
        tenantId: "YOUR_AZURE_AD_TENANT_ID"
        clientId: "YOUR_AZURE_AD_APP_REGISTRATION_CLIENT_ID"
        clientSecret: "YOUR_AZURE_AD_APP_REGISTRATION_CLIENT_SECRET"
        userFilter: "accountEnabled eq true and userType eq 'member'"
        groupFilter: "mailEnabled eq true"
        userSelect:
        - id
        - displayName
        - description
        groupSelect:
        - id
        - displayName
        - description
```
And we can then configure our location as well to point at the graph API:
```
locations:
  - type: microsoft-graph-org
    target: https://graph.microsoft.com/v1.0
    rules:
      - allow: [Group, User]
```
With all of this configured we can update our TAP installation with the new values and within a few minutes we should see all of our users and groups synced into our TDP instance!
Summary

This capability is extremely powerful. When using Azure AD, it can and should be combined with the built-in Azure AD authentication integration, providing seamless end-to-end user management.

For LDAP, I would suggest deploying a simple solution like Dex, which provides an OIDC interface on top of LDAP, and configuring Backstage to use Dex as the IdP for logins. You can also take a look at the community ldap-auth plugin for Backstage, which could be packaged as a TDP wrapper and integrated into your environment using the TDP configurator. Currently, however, I believe Dex provides a better experience, with less overhead and much simpler configuration, making it the better solution for this use case.

Hopefully this will help you better customize your TDP instance, and make the adoption smoother for you and your users!
November 22, 2023

Backstage, tap
Tanzu Developer Portal Configurator – Deep Dive
In TAP 1.6 we got our first glimpse of the Tanzu Developer Portal (TDP) configurator.

While the idea was promising, the functionality was extremely limited, and in my mind it was closer to a proof of concept than a true value add.

This has completely changed in TAP 1.7. We now have an amazing tool that opens up huge potential for all TAP customers.

Before diving deep into the technical details, let's discuss what the configurator tool is and why we need it.

Quick History

Backstage is an amazing CNCF project and the base of the Tanzu Developer Portal (TDP), previously known as TAP GUI.

We have had a great portal since the initial GA of TAP, enhanced with every release. However, it has not been able to take full advantage of the power of Backstage.

Backstage is a highly extensible project built on a plugin-based model. In the open source Backstage world, more than 150 plugins have been published, enabling integrations with many common tools found in customer environments.

The Tanzu Developer Portal Configurator tool enables you to add plugins to Tanzu Developer Portal, turning it into a customized portal!

On one hand, TDP has until now been very locked down. On the other, integrating plugins in OSS Backstage is very tedious and not for the faint-hearted. The TDP Configurator tool is a great step towards making the integration of both third-party plugins and custom in-house plugins a much simpler and more maintainable task.

How does it work

The configurator tool actually runs as a standard TAP workload, because we want to use Tanzu Build Service to build our custom portal container image.

When building the custom portal image, we pass it the source bundle containing the configurator code itself. We also pass a base64-encoded string, which is simply a list of the “wrapper” plugins we want added to our portal.

The TDP Configurator takes the list of the plugins that you want to add into your portal. With that list, TDP Configurator generates a developer portal customized to your specifications.
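A sketch of that input is shown below: a YAML file listing one VMware wrapper with separate app and backend plugin lists, which is then base64-encoded. Several names here are assumptions to verify against the documentation for your TAP version: the file layout, the @vmware-tanzu package names, the REPLACE_WITH_VERSION placeholders, and the TPB_CONFIG_STRING variable name used for the configurator workload's build environment.

```shell
# Assumed configurator input format; package names and versions are placeholders.
cat <<EOF > tdp-config.yaml
app:
  plugins:
    - name: "@vmware-tanzu/tdp-plugin-techinsights"
      version: "REPLACE_WITH_VERSION"
backend:
  plugins:
    - name: "@vmware-tanzu/tdp-plugin-techinsights-backend"
      version: "REPLACE_WITH_VERSION"
EOF

# Single-line base64 (portable between GNU and BSD base64), to be passed to
# the configurator workload as a build environment variable.
TPB_CONFIG_STRING="$(base64 < tdp-config.yaml | tr -d '\n')"
echo "$TPB_CONFIG_STRING"
```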

The end result of the configurator is an OCI image. It can be referenced when deploying TAP and configured the same way as the OOTB TDP image, giving you a portal with your own custom plugin setup to meet your organization's needs.

What is a wrapper plugin

One of the challenges with OSS Backstage and plugin integration is that for every plugin, you need to manually edit the Backstage code base itself to include it.

This process is tedious and error-prone. The TDP configurator helps here by introducing the concepts of surfaces and plugin wrappers.

A surface is a discrete capability that a plug-in provides. This can include:
- The ability to show up on the sidebar
- The ability to be accessed at a URL, such as https://YOUR_PORTAL_URL/plugin
- The ability to show up as a Catalog Overview tab
Basically, a wrapper gives us a defined specification for how a plugin should be presented within the portal itself.

A wrapper is a method of exposing a plugin’s surfaces to the TDP Configurator so that the plugin can be integrated into the portal.

A wrapper imports a reference to the underlying plugin and defines the surfaces that the plugin should expose.

VMware provided wrapper plugins

In TAP 1.7, VMware released an initial set of validated community plugin wrappers and published them to the public npm registry for all to use.

The current list of the 9 VMware validated plugins is:
1. GitHub Actions – provides a simple way to see the GitHub Actions runs for your component and view the logs and status of each job
2. Grafana – exposes Grafana alerts and links to dashboards related to a component directly on the component's overview page
3. Home – lets users customize their home page with any of the provided widgets, personalizing their Backstage experience even more!
4. Jira – ingests Jira data about a component and visualizes it on the component's page within Backstage
5. Prometheus – this plugin allows us to add metric graphs for our components to visualize key data for developers right within TDP
6. Snyk – shows users the vulnerability results from Snyk scans against the source repository of a component
7. SonarQube – visualizes SonarQube scan results for our components
8. Stack Overflow – exposes a home page widget showing the top Stack Overflow results on specific topics, with links out to the results
9. Tech Insights – performs checks and visualizes the health of our components with regard to those checks directly in the portal
While VMware has supplied these 9 plugins, the true power of the configurator is that you can also build your own. With basic TypeScript and React knowledge, it is extremely easy to do!

If you do not have any TypeScript or React knowledge, I strongly recommend learning the basics first and then coming back to try out the configurator.

Building Custom Wrapper Plugins

While I won't go into the full details here, as the official documentation does a pretty good job of walking you through the steps, I will mention a few key things that I have learned along the way while creating over a dozen wrapper plugins over the past few days.
1. The documentation describes 2 approaches for building the TDP image. The second option, which uses a custom supply chain, is in my mind the much better approach. I actually have my own custom supply chain, which extends the one in the docs with a few more steps, but either approach works well. If you want to check my supply chain out, it is on GitHub in the following repo.
2. Community plugins are sometimes not well maintained, and can be a bit buggy. If you hit that when wrapping a plugin, the best option is to fork the plugin's source repository, make any changes needed, and publish your own version of the plugin for initial testing. Once you have the fix implemented, open a PR and contribute your fixes back to the community! While TDP is a commercial product, because it is based on OSS Backstage, we get the great opportunity of being a part of the OSS community. I strongly recommend getting involved with the greater Backstage community, as you will learn a lot, and contributing back is always a great thing!
3. Be careful about the order in which you specify the plugins in the TDP configurator file. While the order does not matter for compilation and getting a working portal, it directly determines the order of the tabs on a component and the order of the items in the portal's sidebar, so think clearly about what makes the most sense.
4. Many plugins expose a method which allows a tab to appear conditionally in the UI based on the existence of a specific annotation in the component's catalog-info.yaml. Think carefully about whether you want a tab to always be shown, even if no data will be available, or only when the user has explicitly added the annotation. Basically this is the age-old question of opt-in versus opt-out. I personally prefer the opt-in method, which is why I created my own wrapper of the GitHub Actions plugin instead of using VMware's validated wrapper: VMware configured that wrapper so the plugin's tab is always shown, and I preferred it differently.
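As an illustration of the opt-in approach, the community GitHub Actions plugin decides whether its tab is available by checking for the github.com/project-slug annotation on the component. The sketch below writes a minimal catalog-info.yaml that opts a component in. The component name, owner and repository slug are hypothetical placeholders, and whether your own wrapper gates the tab on this annotation depends on how you configured it.

```shell
# Write a minimal Backstage catalog entry that opts this component in to the
# GitHub Actions tab. Name, owner and slug are hypothetical placeholders.
cat > catalog-info.yaml <<'EOF'
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: my-app
  annotations:
    # The GitHub Actions plugin looks for this annotation (owner/repo format)
    github.com/project-slug: my-org/my-app
spec:
  type: service
  lifecycle: production
  owner: team-a
EOF
```

Leaving the annotation out of a component's catalog-info.yaml is then enough to keep an opt-in tab hidden for that component.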
What Plugins Can Be Wrapped

Nearly any plugin can be wrapped into a TDP plugin. Below you can find a list of the plugins I have built wrappers for over the past few days:
1. Github Insights – Source Plugin Repo – Published Wrapper Plugin
  This plugin gives us an overview tab on our components with the details of our projects from GitHub, including releases, contributors, the readme, languages used etc.
2. Github Pull Requests – Source Plugin Repo – Published Wrapper Plugin
  This plugin provides a tab on our components where we can see all of the PRs on our component's source repository, filter by PR state, view basic details about each PR, and use the hyperlinks to quickly open the PR directly on GitHub.
3. Github Actions – Source Plugin Repo – Published Wrapper Plugin
  This is the same plugin VMware provides for seeing a component's GitHub Actions runs along with their logs and details, but it is configured to show the tab on a component page only when the required annotations are provided.
4. Todo Frontend – Source Plugin Repo – Published Wrapper Plugin
  This is the frontend part of the TODO plugin, which shows us all of the TODO comments in the source code of our components with hyperlinks directly to the relevant lines in our SCM solution.
5. Todo Backend – Source Plugin Repo – Published Wrapper Plugin
  This is the backend element of the TODO plugin, which performs the logic behind the scenes that the frontend plugin visualizes.
6. Harbor Frontend – Source Plugin Repo – Published Wrapper Plugin
  This is the frontend element of the Harbor plugin. When the relevant annotation is added to a component, it shows a tab listing the image tags related to that component, with details about each image such as size, last pull and push times, number of vulnerabilities, severity, and how many of the vulnerabilities are fixable, as well as a link to the specific artifact in Harbor for more information.
7. Harbor Backend – Source Plugin Repo – Published Wrapper Plugin
  This is the backend element of the Harbor plugin, which queries the Harbor instance to pull in the relevant data about our images.
8. FluxCD – Source Plugin Repo – Published Wrapper Plugin
  This plugin exposes the FluxCD CRs related to a workload on a dedicated tab for the component, and, if given permission, also allows triggering reconciliation of source objects as well as pausing and resuming reconciliation of Flux CRs.
9. Tekton – Source Plugin Repo – Published Wrapper Plugin
  This plugin visualizes all Tekton PipelineRuns related to a component and allows for simple filtering and visualization of the relevant steps.
10. ChatGPT Frontend – Source Plugin Repo – Published Wrapper Plugin
  This is the frontend element of the ChatGPT Playground plugin, which gives you a ChatGPT interface directly within your portal as a dedicated sidebar item. While by default it works against OpenAI, I forked the original plugin and exposed the ability to set the URL to be used, allowing it to work with any OpenAI compatible API, such as FastChat or Azure OpenAI.
11. ChatGPT Backend – Source Plugin Repo – Published Wrapper Plugin
  This is the backend element of the ChatGPT plugin which performs the needed logic and communication with the OpenAI APIs and returns the results back to us.
12. Developer Toolbox – Source Plugin Repo – Published Wrapper Plugin
  This plugin exposes over 20 different developer focused tools in a single page within your portal to allow for simple day to day tasks, such as generating QR codes, encoding and decoding, file diffing, format conversion etc.
13. K8sGPT – Source Plugin Repo – Published Wrapper Plugin
  This plugin exposes the findings of the K8sGPT operator on our components, allowing us to see where issues exist in our application from a Kubernetes perspective, as well as advice on how to fix them.
These are the wrappers I have built, as I found them useful for my use cases. While you can use these packages, which have been published to the public npm registry, I strongly recommend trying to wrap a plugin yourself as well.

I had not touched TypeScript or React in a while, which made the start of my journey a bit challenging, but after doing 2-3 wrappers I got a hold of the concepts and was able to build a wrapper plugin within 10-15 minutes, which is pretty awesome!

Next Steps

While I have built a bunch of plugins and integrated them into my own TAP environments, I did everything discussed above manually. I strongly recommend building an automated process for updating the plugins within the wrappers and building new versions of the portal image. Even if you don't have a full-fledged CI setup that also tests the new image and validates that the plugins work, automating the build process in and of itself will bring you huge efficiency gains and lower the toil of managing custom plugins over time!

Summary

As I hope you can tell, the capabilities this opens up are endless, and the productivity boost you can now gain from TDP by adding in any tools you need is quite amazing, all without the complexity of managing OSS Backstage.

If the plugin wrappers above seem interesting and you want to check out the code, you can find all of the wrappers I have built in the following git repo.
November 7, 2023

tap
VMware Explore 2023 Las Vegas – Recap
TLDR

This year's VMware Explore in Las Vegas was an amazing conference, filled with great content and a really good overall vibe.

The announcements made during the conference were extremely interesting, and it is great to see the deep investment VMware is making in the AI world together with NVIDIA, with the whole Private AI Foundation announcement.

VMware also announced changes in the Aria and Tanzu spaces. While we again need to get used to new names and rebranded products, I believe this is actually a much more cohesive and powerful separation of the products and focus areas between the 2 suites, which will ultimately be better for VMware customers.

Tanzu News

As part of the conference this year, VMware announced the new model of the Tanzu Portfolio, which is now focused on both Cloud Native Applications and platforms, as well as on Multi Cloud Management.

Within the new and revamped Tanzu, we will now find 2 main offerings:
1. Tanzu Application Platform (TAP)
2. Tanzu Intelligence Services (TIS)
Tanzu Application Platform

The new TAP should not be confused with what was known as TAP until this conference, as that is only a part of it. The new TAP is actually the combination of the previous TAP together with TMC, TSM and the new and truly amazing Tanzu Application Engine (more details in a dedicated post)!

Tanzu Intelligence Services

TIS includes different tools, including (Wavefront, ah no Tanzu Observability, ah no Aria Operations for Applications, ah no Tanzu Insights) and CloudHealth, as well as new and exciting features to be added in the near future, which will enable continuous, highly optimized intelligence services for constantly optimizing our environments.

Aria Updates

While last year the buzz around Aria was multi cloud, the vast majority of the multi cloud features like Hub and Graph have moved to Tanzu, and Aria is refocusing on the core elements it started from, the initial vRealize Suite of tools (vRA, vROPS, vRLI and vRNI), with a focus mostly on private and hybrid clouds.

Broadcom

At the conference, the benefits this acquisition will bring seemed to be made pretty clear. Only time will tell, but I believe the opportunities it can bring are huge for all VMware and Broadcom customers. Hopefully, with the right execution and the right amount of time for adjustments, the market will realize how great this move was for all parties.

Summary

The conference included some amazing content, and it's truly great to see the innovation coming out of VMware!
August 26, 2023

Uncategorized
TAP 1.6 – Tanzu CLI Improvements
Across the Tanzu portfolio as a whole, there has been a major effort for a long time to build a truly cohesive and user friendly experience for operators and developers.

Since the early days of TKG, and since the beginning of TAP, the Tanzu CLI has been a key element of how users interact with the platform.

While TKG, TAP and, more recently, TMC all used the Tanzu CLI as the base CLI for their products, each product had its own releases of the CLI, which were often incompatible with one another. Distribution was not simple either: the CLI had to be downloaded from Tanzu Network, and no automated installation or package manager support existed.

VMware understood this challenge and did a complete rearchitecture of the CLI in terms of package and plugin management as well as distribution, and now in TAP 1.6 we are seeing the fruit of this long awaited new Tanzu CLI!

From TAP 1.6 on, the Tanzu CLI is a separately released product / tool, available not only on Tanzu Network but also as a GitHub release artifact, as well as in all the main package managers such as brew, apt, yum, and choco.

The Tanzu CLI now also offers an easy way to download and upgrade the product specific plugins via the new concept of plugin groups.

To show how great this new UX is, let's take the example of a Windows user who has the choco package manager set up on their machine and needs to use the Tanzu CLI for TKG 2.2, TMC and TAP 1.6:
```
choco install tanzu-cli
tanzu plugin install --group vmware-tkg/default:v2.2.0
tanzu plugin install --group vmware-tmc/tmc-user
tanzu plugin install --group vmware-tap/default:v1.6.1
```
This experience is truly a game changer, and makes enablement and onboarding users to TAP in particular, and to all of the Tanzu products, so much easier, with a much more polished UX.

The new CLI model also makes air-gapped support easy, with the ability to relocate the plugins to an air-gapped OCI registry via a simple download command followed by a simple upload command:
```
tanzu plugin download-bundle --to-tar /tmp/plugin_bundle_complete.tar.gz
tanzu plugin upload-bundle --tar /tmp/plugin_bundle_complete.tar.gz --to-repo registry.example.com/tanzu-cli/plugin
```
Once the images are relocated, all an end user needs to do to get the plugins is run one more command before installing the plugins as usual, which sets the URI for the internally hosted plugins:
```
tanzu plugin source update default --uri registry.example.com/tanzu-cli/plugin/plugin-inventory:latest
```
As you can see, this is a huge improvement over previous releases, and will make the overall getting started experience so much smoother!

Summary

It is really great to see the whole Tanzu picture coming together, and having the unified CLI experience, will hopefully be just the first step in the direction of truly making the better together story a reality!
July 30, 2023

tap, tkg, tmc
TAP 1.6 – New Bitnami Services
In TAP 1.5, VMware added an integration with Crossplane for dynamic provisioning of backing services.

With TAP 1.5, we also got a few OOTB Bitnami backed offerings that utilize this Crossplane integration. Now in TAP 1.6, this integration has been expanded to also support MongoDB and Kafka.

These 2 new services integrate easily with Spring Boot applications using Spring Cloud Bindings, as well as with many other languages and frameworks that implement the Service Binding Specification for Kubernetes.

While many organizations will end up building their own offerings for backing services that are more tailor made for their exact needs and requirements, having more and more OOTB offerings is an amazing thing.

These OOTB offerings allow for a quick and easy way to get started using service bindings, and can be expanded on, and customized to your needs, as you mature with your adoption of the platform.

Just as with the previous version of the Bitnami services in TAP 1.5, you can not only use the open source Bitnami based charts, but also easily integrate with charts provided via VMware Application Catalog, the commercial offering based on the Bitnami catalog, which adds some really great security, manageability, and traceability benefits.

Overall, this is an addition to the product which may go unnoticed by many, but it matters: just as in TAP 1.5, where I could simply run:
```
tanzu services class-claim create my-psql \
    --class postgresql-unmanaged \
    --parameter storageGB=10
```
and get a PostgreSQL cluster up and running and ready to be bound to my application, I can now do the same for MongoDB or Kafka by simply changing the name of the class:
```
tanzu services class-claim create my-mongo \
    --class mongodb-unmanaged \
    --parameter storageGB=10

tanzu services class-claim create my-kafka \
    --class kafka-unmanaged \
    --parameter storageGB=10
```
The simplicity of the UX, with the huge power and flexibility it provides, is truly amazing!

I’m really happy to see this offering is still growing with every release, and can’t wait to see what will be coming in future releases as well to keep on enhancing this amazing feature of TAP!
July 30, 2023

tap
TAP 1.6 – App Live View Improvements
One of the great features in TAP Developer Portal (previously TAP GUI) since its initial release has been the App Live View plugin, which helps visualize actuator data from Spring based Java applications as well as Steeltoe based .NET Core applications.

In TAP 1.6, Application Live View (ALV) has been enhanced with 2 key new features
1. Per User RBAC for secure access to sensitive operations
2. Spring native support
Let's dig a bit into each of these enhancements and see what is in store for us in this release!
Per User RBAC for sensitive operations

In TAP 1.5, the ALV components were enhanced with an additional layer of security around the exposure of actuator data.

While actuator data can be extremely helpful when debugging or simply monitoring an application, depending on which actuator endpoints you have configured to expose, it can also be a security risk to have such powerful access to manipulate a live application through Tanzu Developer Portal (TDP).

ALV now splits the exposed endpoints into regular actuator endpoints and “sensitive” endpoints. This separation enabled what became the default in TAP 1.5: only non-sensitive endpoints are visualized in TDP.

While this is great from a security perspective, by default we lose the ability to run actions that may be extremely helpful, such as downloading a heap dump or thread dump, or changing the log levels on a live running pod.

With TAP 1.6, we can not only toggle this setting, as in TAP 1.5, to allow or disable sensitive endpoints, but can actually allow this feature for a subset of users while denying it for the rest of our user base.

This is done via standard Kubernetes RBAC resources, using a custom resource and verb pairing:
```
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: alv-execute-sensitive-op-role
rules:
- apiGroups: ['appliveview.apps.tanzu.vmware.com']
  resources:
  - sensitiveoperationsaccesses
  verbs: ['execute']
```
With that ClusterRole applied to your cluster, you can start creating RoleBindings that grant this role to specific users:
```
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scott-alv-admin
  namespace: dev-team-1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: alv-execute-sensitive-op-role
subjects:
- kind: User
  name: "scott@vrabbi.cloud"
  apiGroup: rbac.authorization.k8s.io
```
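To sanity check a binding like this, kubectl auth can-i can evaluate the custom verb and resource for a specific user via impersonation. This is a sketch: it assumes your own kubeconfig user is allowed to impersonate other users, and the second user name is a hypothetical example of someone without the binding. kubectl may print warnings because execute and sensitiveoperationsaccesses are not standard Kubernetes API types, but the access check itself is still evaluated by the API server.

```shell
# Ask the API server whether scott may "execute" the ALV sensitive operations
# resource in dev-team-1. Requires permission to impersonate users (--as).
kubectl auth can-i execute \
  sensitiveoperationsaccesses.appliveview.apps.tanzu.vmware.com \
  --namespace dev-team-1 \
  --as scott@vrabbi.cloud

# Same check for a hypothetical user who has no RoleBinding to the ClusterRole
kubectl auth can-i execute \
  sensitiveoperationsaccesses.appliveview.apps.tanzu.vmware.com \
  --namespace dev-team-1 \
  --as someone-else@vrabbi.cloud
```

With the RoleBinding above in place, the first check should answer yes and the second no; kubectl auth can-i also signals the answer through its exit code, which makes it easy to use in scripts.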
While this is a great feature, to gain full usage of it, and really see the benefits, you must have a very specific setup.

By default in TAP, the UI accesses the Kubernetes clusters using a shared service account token, which is placed in your TAP values file. If your environment is configured this way, the setting above does not help very much, as all users logged into TDP will use the same credentials to access the cluster.

TAP does, however, provide the ability to use per-user authorization to the Kubernetes clusters, though this requires using the same OIDC provider for your cluster and for TDP itself. This method is documented for GKE and EKS, but it should be possible to get it working with other Kubernetes distributions as well, as long as the same OIDC client setup is used for both the cluster and TDP.

Spring Native Support

The other new feature in ALV in TAP 1.6 is support for native compiled Spring based applications.

Spring native apps are becoming a really popular, paradigm shifting approach that organizations are starting to adopt. Not all apps can be compiled to native today, due either to dependencies that are not yet compatible or to internal logic in your own code that doesn't support it. However, the ability to compile your app ahead of time into native machine code with GraalVM, instead of running it on a full JVM, can significantly reduce memory and CPU usage and drastically shorten startup times. That makes Java so much more practical and powerful for microservices and other cloud native architectures, where size, performance, and startup speed truly are the name of the game.

While native apps are now supported in TAP 1.6 within ALV, it is important to note that not all actuator endpoints are available yet. I’m sure that with future releases, more and more endpoints will be integrated into ALV, making the experience even better than it is right now!
July 30, 2023

security, tap
TAP 1.6 – GitOps RI With HashiCorp Vault

In TAP 1.5, a new installation model was introduced based on GitOps, powered under the hood by the Carvel toolset.

With TAP 1.6, beyond general bug fixes and nice changes to the overall UX of the GitOps installation method, a really key feature that has been added is integration with HashiCorp Vault.

The GitOps installation model requires a secret management solution, as some of our TAP values are very sensitive and can't simply be pushed to git.

In TAP 1.5, we had 2 options. The first was Mozilla SOPS, the easiest method, in which we encrypt fields within a YAML file using a key pair and then push the encrypted files to git. We then provide the private key to the GitOps tooling in our cluster, which is responsible for decrypting the content and applying the needed configuration.
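As a rough sketch of the SOPS flow just described, assuming age keys (one of the key types SOPS supports) and hypothetical file names, the encryption side looks roughly like this. The private key in key.txt is what gets handed to the GitOps tooling in the cluster, and it should never be committed.

```shell
# Generate an age key pair; key.txt holds the private key (keep it out of git)
age-keygen -o key.txt

# Derive the public key from key.txt; sops reads recipients from this variable
export SOPS_AGE_RECIPIENTS="$(age-keygen -y key.txt)"

# Encrypt the sensitive values (hypothetical file names); sops infers YAML
# from the extension and encrypts the values while leaving the keys readable
sops --encrypt tap-sensitive-values.yaml > tap-sensitive-values.sops.yaml
```

Only the .sops.yaml output is meant to be pushed to git; the plaintext file and key.txt stay local.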

The other option in TAP 1.5 was to use External Secrets Operator (ESO), which is included in TAP, configured to use AWS Secrets Manager for storing our sensitive values. In this scenario, the GitOps tooling pulls the sensitive data from AWS Secrets Manager using ESO, and then deploys what is needed to our cluster.

What’s new in TAP 1.6 is support for my favorite, and probably the most commonly used, secrets manager in the Kubernetes ecosystem today: HashiCorp Vault.

This is enabled, just like the AWS Secrets Manager solution, via TAP’s ESO integration.

While this may seem like a small feature, it truly is a game changer and opens up huge opportunities for customers that are on prem or multi cloud, where a cloud agnostic solution like Vault is much more viable than a cloud specific offering.

The new integration includes a set of easy preparation scripts for creating the needed roles and policies within Vault, as well as on the cluster itself, to enable the integration.
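To give a feel for what such preparation involves, here is a hand-written sketch of the kind of Vault objects an ESO based integration typically needs: a KV secret holding the sensitive values, a read-only policy, and a Kubernetes auth role bound to an in-cluster service account. This is not the content of VMware's scripts; the mount path, secret path, key name, policy and role names, service account, namespace and API server URL are all hypothetical placeholders.

```shell
# Assumes VAULT_ADDR and VAULT_TOKEN are already set for an admin session.

# KV v2 engine at secret/ (skip this if it is already mounted)
vault secrets enable -path=secret kv-v2

# Store the whole sensitive values file as one key; "@file" reads the value from disk
vault kv put secret/tap/sensitive-values \
  sensitive_tap_values_yaml=@tap-sensitive-values.yaml

# Read-only policy; KV v2 serves secret data under the secret/data/ prefix
vault policy write tap-sensitive-read - <<'EOF'
path "secret/data/tap/*" {
  capabilities = ["read"]
}
EOF

# Let an in-cluster service account log in with its Kubernetes token.
# Depending on where Vault runs, the config may also need a CA cert and reviewer JWT.
vault auth enable kubernetes
vault write auth/kubernetes/config \
  kubernetes_host="https://k8s-api.example.com:6443"
vault write auth/kubernetes/role/tap-sync \
  bound_service_account_names=tap-vault-reader \
  bound_service_account_namespaces=tanzu-sync \
  policies=tap-sensitive-read \
  ttl=1h
```

ESO can then be pointed at that role to fetch the secret, which is the same pattern the AWS Secrets Manager option follows with a different backend.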

While setting up the GitOps installation can take a bit more time than the manual installation method, and adds a level of complexity, the day 2 management benefits it provides far outweigh the added upfront complexity, which, to be honest, is also not too difficult to understand and perform.

Summary

This is another small enhancement in the way ESO is being integrated into TAP, and I’m truly looking forward to seeing more and more secret management capabilities and integrations in future releases!

July 30, 2023

gitops, security, tap
TAP 1.6 – TAP GUI Supply Chain Plugin Updates
The Supply Chain Visibility plugin in Tanzu Developer Portal (TDP), formerly known as TAP GUI, is a key element of the TAP solution.

This plugin provides visibility into our workload's “path to production” as it traverses the Cartographer supply chain.

In TAP 1.6, there are a few key enhancements that have been made, which enhance the usability, and overall UX of using this plugin.

The main new features I want to call out in this post are:
1. New Log Viewer
2. Deliverable Status
3. SBOM downloading
In previous versions of TAP we had a very simple log viewer interface for the different steps within a supply chain that logs were relevant for, such as testing pipelines and image builds.

While the previous log viewer worked it had some drawbacks that made it quite difficult to use in advanced scenarios.

In TAP 1.5, for example, if you had a testing pipeline with multiple tasks, only the logs from the first task were shown. Another issue was that the kpack image build logs from all containers were streamed together in one view, which sometimes made understanding which step failed a difficult task. There were also no logs for the config writer step, which was unfortunate.

In TAP 1.6, the log viewer was completely revamped: not only were the issues above solved, but we also got a side by side view of a Tekton task's log and the script that task runs, making debugging much easier and more streamlined.

Source Tester Log Viewer:

Image Build Logs:

Config Writer Logs:

As can be seen, the new log viewer is a huge improvement and it makes the overall UX much smoother than before!

The next major update for the plugin is actually in the overview page of the plugin.

Previously, you could see the workload status and basic details as well as which clusters the deliverable was applied to in the overview table, but the deliverable status itself was not shown.

Now in TAP 1.6, we have a new field that shows us the deliverable status! While this may seem like a small feature, being able to get a bird's eye view of the overall CI and CD status of all my workloads at a glance is awesome!

The final major feature I want to highlight for the supply chain plugin in TDP is a great new feature in the security realm.

This new feature allows you to download the Software Bill of Materials (SBOM) directly from the Supply Chain plugin. With a single click, you can obtain the SBOM in SPDX or CycloneDX format, as stored in the metadata store by the source code and image scanning steps of a supply chain!

This feature is great, and the ability to download in all of the major industry standard formats, allows for integrating easily with the wider security ecosystem, no matter which formats they work with.

Summary

As can be seen, this release of the plugin includes some great updates, with enhanced usability at the forefront!

I’m truly excited to see where this plugin goes in the future, as the product keeps maturing and growing, and as backstage enables more and more opportunities and extension points for plugins like this!
July 30, 2023

tap
