Home

  • TAP 1.6 – AppSSO Improvements

    TAP 1.6 – AppSSO Improvements

    Managing SSO for applications is a complex task. Since TAP 1.4, we have had a great feature called AppSSO, which aims to make SSO integration much easier for developers and operators.

    In TAP 1.6, a major effort was put into AppSSO to make it even more streamlined and simple to use, while also exposing new advanced features that can be very handy when needed.

    In TAP 1.6 the main changes in AppSSO include:

    1. Cluster Workload Registration Class
    2. Cluster Unsafe Test Login
    3. Custom Claim Mappings
    4. Configurable token expiry settings

    Let's go over what each of these improvements entails and what use cases they can help with.

    Cluster Workload Registration Class

    This integration is extremely exciting to see. With this new API, we can easily integrate AppSSO with Crossplane and the Services Toolkit components to allow for dynamic service provisioning of OIDC clients from an auth server!

    It exposes an AuthServer as a ready-to-claim AppSSO service offering, making it really easy for developers to create new clients for an auth server on demand!

    When you create a ClusterWorkloadRegistrationClass (CWRC) resource, it creates a Crossplane Composition and a Services Toolkit ClusterInstanceClass behind the scenes.

    The composition itself templates out a new CRD which is also part of this new offering, called a WorkloadRegistration.

    While in the past we had the ClientRegistration API, the new API is more portable and can easily be promoted between environments without changes to the spec, supporting a GitOps approach, which is always preferred in the Kubernetes world.

    Because the Services Toolkit lets us limit which users can claim specific classes, and each CWRC creates a new class tied to a specific auth server, we can easily manage RBAC and permission boundaries for the different IDPs we use in multi-tenant scenarios, while still providing a simple and clean self-service UX for our developers, which is awesome!
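    As a sketch of what such a permission boundary could look like, the following RBAC (all names here are hypothetical) grants a developer group the Services Toolkit "claim" verb on just one class:

    ```yaml
    # Sketch: only members of the hypothetical "dev-team" group may claim
    # the "dev-sample" class, via the Services Toolkit custom "claim" verb.
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: claim-dev-sso-class
    rules:
    - apiGroups: ["services.apps.tanzu.vmware.com"]
      resources: ["clusterinstanceclasses"]
      resourceNames: ["dev-sample"]
      verbs: ["claim"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: dev-team-claim-dev-sso
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: claim-dev-sso-class
    subjects:
    - kind: Group
      name: dev-team
      apiGroup: rbac.authorization.k8s.io
    ```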

    While more complex and advanced features can be added, let's take a look at a simple example of what this could look like:

    CWRC Resource

    apiVersion: sso.apps.tanzu.vmware.com/v1alpha1
    kind: ClusterWorkloadRegistrationClass
    metadata:
      name: dev-sample
    spec:
      base: 
        spec:
          authServerSelector:
            matchLabels:
              env: dev
    

    Creating a new client

    apiVersion: services.apps.tanzu.vmware.com/v1alpha1
    kind: ClassClaim
    metadata:
      name: dev-sample-01
    spec:
      classRef:
        name: dev-sample
      parameters:
        workloadRef:
          name: my-workload
        redirectPaths:
          - /redirect/uri
    

    As you can see, the new API is extremely simple to use and understand, and can open up some amazing opportunities for offering a truly self service SSO offering to your end users.

    While this is great for production, and more advanced setups, in dev environments or POCs, we often simply want something that just works. The above APIs are simple, but still require configuring the auth server and understanding more low level details than we want to deal with in dev environments. This is where the next new feature comes in.

    Cluster Unsafe Test Login

    This new API is the recommended way to get started with AppSSO in non production environments.

    The new CRD ClusterUnsafeTestLogin (CUTL) is so simple, it does not even have a spec!

    The only configurable element is the name.

    apiVersion: sso.apps.tanzu.vmware.com/v1alpha1
    kind: ClusterUnsafeTestLogin
    metadata:
      name: demo
    

    When you apply the above resource to a cluster, a few things are created for you. First, an auth server with an internal unsafe IDP is deployed. Next, a token signing key is generated. Finally, a ClusterWorkloadRegistrationClass (CWRC) is created, which in turn creates a ClusterInstanceClass and a Composition as mentioned above.

    The auth server in this setup is configured with an HTTP-only issuer URI and allows all CORS origins. While this is definitely not a production setup, it makes integrating with your apps much easier and removes all of the auxiliary issues at development time, making it much simpler to get started and to verify that the SSO elements work as expected. You can then switch to a more production-ready setup, with a real backing IDP, a more secure CORS configuration, and of course TLS-only communication.

    The auth server deployed as part of a CUTL includes a single user with the username “user” and the password “password”.

    Once you apply a CUTL resource, you can then create a class claim just like with a CWRC as seen above.

    apiVersion: services.apps.tanzu.vmware.com/v1alpha1
    kind: ClassClaim
    metadata:
      name: dev-sample-01
    spec:
      classRef:
        name: demo
      parameters:
        workloadRef:
          name: my-workload
        redirectPaths:
          - /redirect/uri
    

    This new API truly makes getting started with integrating SSO into your apps a breeze, and lowers the barrier of entry to AppSSO significantly!

    While these past 2 features focused on simplifying the API and consumption of AppSSO, the next 2 improvements are for more advanced use cases, but are truly amazing to see, and can actually be game changers in certain environments.

    Let's take a look at the one I am most excited about, which is custom claim mappings support!

    Custom Claim Mappings

    With this new capability, service operators can control which claims appear in an Auth Server issued ID Token, and how to obtain this value from an upstream identity provider.

    Recently, when working on a project where I needed to test 4 different OIDC providers, I learned that of the 4 claims this specific app needed, only one was named the same across the different OIDC providers. While all the data was there, every provider named its claims differently (group / groups / group_names / Groups), (full_name / name / fn / Full_Name), and so on.

    While this may seem like a simple issue to solve, it actually is not easy, as within an application I need to look at a specific field in an ID token to extract the data that tells me who this user is and what permissions they should have.

    This issue gets even more complex when working with an upstream IDP which is not OIDC based, such as LDAP or SAML.

    With this new feature, we can now at the Auth Server level, perform mappings between upstream fields, and the claims on the Auth Server generated tokens!

    For example, if I want to standardize on the name “groups” for my groups claim, yet I am using Workspace ONE, which provides this data as “group_names”, I can simply set the following in my Auth Server:

    spec:
      identityProviders:
      - name: wso-idp
        openid:
          idToken:
            claims:
              - fromUpstream: "group_names"
                toClaim: "groups"
    

    This simple yet extremely powerful feature is a huge boost to the AppSSO tooling which opens up awesome opportunities for integrations.

    While the above works for external IDPs such as OIDC, LDAP and SAML, the configuration for internal unsafe IDPs in an auth server is a bit different, but also easy to configure.

    spec:
      identityProviders:
        - name: test-users
          internalUnsafe:
            users:
              - username: joe
                password: "password"
                roles:
                  - "dev"
                claims:
                  given_name: "joe"
                  family_name: "rosenberg"
                  middle_initial: "A"
                  email: "joe@vrabbi.cloud"
                  alt_address: "123 Awesome Street"
              - username: wendy
                password: "password"
                roles:
                  - "operator"
                claims:
                  alt_address: "456 Cool Street"
                  middle_initial: "T"
    

    As you can see here, we simply configure per user, the custom roles and claims they should have in their ID tokens. With this ability, we can easily simulate real world test cases, even when using an internal unsafe IDP!

    There is one more new feature in AppSSO in TAP 1.6 which, while it may sound less interesting and exciting, makes the offering more secure and more tunable to your security team's requirements.

    Token Expiry Settings

    In TAP 1.6, we now have the ability to configure the token expiry settings at a per auth server level for access, id, and refresh tokens.

    The default expirations set by AppSSO are 12 hours for access and ID tokens, and 30 days for refresh tokens.

    In order to configure these settings, you just need to add the following to your auth server manifest:

    spec:
      token:
        accessToken:
          expiry: "5m"
        idToken:
          expiry: "5m"
        refreshToken:
          expiry: "8h"
    

    The time durations can be provided in seconds (s), minutes (m), or hours (h). This means, for example, that the default 30 days for refresh tokens is actually set as “720h”.
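    For reference, writing the defaults mentioned above out explicitly would look like this (a sketch based on the stated default values, not output copied from a cluster):

    ```yaml
    spec:
      token:
        accessToken:
          expiry: "12h"   # default: 12 hours
        idToken:
          expiry: "12h"   # default: 12 hours
        refreshToken:
          expiry: "720h"  # default: 30 days = 720 hours
    ```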

    Summary

    As can be seen, a lot of effort was put into AppSSO in this release, making the integration tasks easier and more streamlined, and also improving on the capability set, and adding new and improved functionality to help all personas involved.

    I’m really excited about these new features and can’t wait to see them being used in the wild!

  • TAP 1.6 – Metadata Store Improvements

    TAP 1.6 – Metadata Store Improvements

    The Metadata Store has been a key element in a secure supply chain within TAP since the GA of TAP, and it provides a central location where all CVE data and SBOMs are stored for our source code and images.

    In TAP 1.6, we get a great new set of functionality in the Metadata Store, allowing us to now have vulnerability reports stored per build, and not just a single report per workload image.

    This new feature can be extremely beneficial, as it allows us to perform queries and figure out in which version a specific vulnerability was first introduced, as well as to understand which CVEs TBS, for example, was able to solve for us via new dependencies, without any change to our source code!

    The aggregated report is not gone, we simply now also have per build reports, giving us the best of all worlds!

    While this may seem like a small feature, it truly is huge, and is a major milestone on the path to full end-to-end traceability and attestation for our supply chains, as well as providing a clear and simple API for gaining truly important insights into the development flows within our organization.

    This new functionality, combined with the new data being included in the Metadata Store DB via AMR, can open up endless opportunities for data-driven decision making and reporting for TAP, which simply has not been possible until today.

    Summary

    While this new functionality is API/CLI accessible only today, I truly hope to see this integrated into a UI flow in a future release, where we could do diffs between reports of an image and gain clear visibility of the changes between specific image builds in a clear and concise manner!

  • TAP 1.6 – Local Source Proxy

    TAP 1.6 – Local Source Proxy

    One of the best new features, if not the best new feature in TAP 1.6, is the introduction of a new component called Local Source Proxy (LSP).

    One of the main challenges we have seen with rolling out TAP is that, while TAP aims to provide an abstraction above Kubernetes, making the infrastructure invisible to the developer, the amount of infra-level config needed on the developer's machine just to get started was a nightmare.

    From installing the Tanzu CLI from Tanzu Network, to getting the right access to a Kubernetes cluster and configuring the correct kubeconfig, and then, above all of that, the user needed Docker on their machine, as well as credentials for the organization's image registry of choice in order to do iterative development, as the OCI registry is used as a conduit for passing the source code from the developer's IDE to the remote cluster.

    In TAP 1.6, a huge amount of this has been solved. I have another post in which I go into detail on the new Tanzu CLI; in this post we will be talking about LSP, which solves the Docker issue.

    Now in TAP 1.6, my developers no longer need access to the image registry from their machines, and don’t need Docker either!

    LSP is a new component which has been integrated into the entire inner loop tooling. With LSP, when a developer begins inner loop development using the Tanzu apps CLI or the IDE plugins, instead of the source code being uploaded to the image registry from their machine, a port forward is opened between the LSP service on the cluster and their machine. LSP serves as an inline proxy for the apps plugin to push the code to, which in turn pushes it to the image registry. Because this happens from within the cluster, only a single registry credential needs to be configured, for LSP itself, and no developer ever needs access to it!

    This new method also works with ECR, which is great, as the flow with ECR was previously even worse, due to the fact that Amazon does not allow repo creation on push. That meant that in nearly all cases, developers needed to pre-create repositories in ECR for every new microservice, making their lives much more entangled in the infrastructure than they ideally should be.

    The new LSP model works really well, and after testing it for a few weeks, I can say that it truly is a game changer for the developer experience, especially when onboarding developers to TAP, as it's one less thing they need to deal with.

    I was recently working with a customer on TAP where they have Harbor configured with OIDC-based authentication, which, due to how Harbor works, requires their developers to log in to the Harbor UI every few hours and then log in via the CLI again, just in order to do inner loop development. The new feature provided by LSP will greatly simplify the DevEx, and make the context switching and overall cognitive load much more reasonable for the average developer.

    Summary

    It is really great to see the improvements in this area, and I believe these types of changes are what will truly help make the adoption of TAP a much smoother process, making developers truly happy, as well as giving operators the control and governance they need, without standing in the way of the developers.

  • TAP 1.6 – TBS Improvements

    TAP 1.6 – TBS Improvements

    Tanzu Build Service (TBS) is a key component of TAP, allowing for building images directly from source code without needing to write and maintain Dockerfiles.

    TBS itself is built upon the open-source project kpack, which until recently was hosted under the Pivotal GitHub org, and was recently donated to the Cloud Native Buildpacks project; it is now located at https://github.com/buildpacks-community/kpack.

    In TAP 1.6, TBS has a few changes that should be noted.

    The first major change is the removal of the Ubuntu Bionic stack, making the Jammy stack the officially supported option, now that Ubuntu Bionic is EOL from Canonical. As Jammy was made the default in TAP 1.5, this should not cause any issues, but if you are using the Bionic stack, be warned!

    The next major change is the way you install the full dependencies package. In TAP, the full TBS dependencies, which are required in air-gapped environments and strongly recommended in production environments even when internet access is available, are packaged in a separate Carvel package repository and installed as another package install alongside TAP.

    Prior to TAP 1.6, the versioning of the package repository and the package itself was based on the version of the TBS package within TAP, and not on the TAP version itself. From TAP 1.6 on, the versioning is the same as the TAP version, making the process easier to manage.

    Another key difference in the installation is that previously you did not need to pass any values into the TBS full dependencies package install, but from TAP 1.6 on, you need to pass values to the package, which should be the same values file that you provide to TAP itself.

    With all that out of the way, we can now talk about the feature I'm truly excited about, which is the introduction of the new Buildpack and ClusterBuildpack CRDs.

    Previously in kpack, we had 3 main system-level CRDs:

    1. ClusterStack – the pairing of a build and a run image which are used to build your images
    2. ClusterStore – a cluster scoped resource that references multiple buildpackages.
    3. ClusterBuilder / Builder – resources to define and create Cloud Native Buildpacks builders, all within the kpack API; basically a pairing between a store and a stack.

    While this model worked, the ClusterStore caused a bunch of issues, and had some limitations, pain points, and management overhead.

    As per the Buildpack CRD RFC:

    The ClusterStore being the single location for all buildpack images used in multiple builders makes managing the available buildpacks within kpack cumbersome. Adding new buildpacks requires modifying an existing resource and removing buildpacks requires carefully selecting the buildpack image to remove from the list of buildpackage images. The kp cli was built to handle this complexity but, the kp cli should not be a prerequisite to managing buildpacks within kpack.
    The ClusterStore being a monolithic specification of all available buildpacks leads to performance issues within kpack. If the list of available buildpacks within a ClusterStore is lengthy the reconciliation of a single ClusterStore can take considerable time. If a single buildpack within the ClusterStore is unavailable reconciliation will fail which will cause the entire contents of the ClusterStore to be unavailable.

    With that background we can now see why a new solution was needed.

    To solve these issues, kpack has introduced a new set of CRDs, Buildpack and ClusterBuildpack, both providing the same functionality, just one at the namespace scope and one at the cluster scope.

    A Buildpack CR is simply a CR pointing at the OCI image of the relevant buildpack. These Buildpack CRs are now referenceable within a builder directly, without the need to work with stores.
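    As a minimal sketch of registering a buildpack this way (the image reference here is hypothetical, not a real buildpackage location):

    ```yaml
    # Register a buildpack directly from its OCI image, no ClusterStore needed.
    apiVersion: kpack.io/v1alpha2
    kind: ClusterBuildpack
    metadata:
      name: apt-buildpack
    spec:
      image: registry.example.com/buildpacks/apt:latest  # hypothetical buildpackage image
    ```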

    As of TAP 1.6, the new TBS configuration uses this new set of APIs instead of the previously used ClusterStore APIs.

    While this may seem like a backend detail that should not really matter to most, and that may indeed be the case, it allows for fewer errors and a better-performing platform, and it also allows for easier integration of third-party or custom homegrown buildpacks.

    A good example of this is when I had a use case to add a third-party buildpack which enables installing apt packages into images.

    When I did this in previous TAP versions, I needed to package my own ClusterStore, and then also update my builder accordingly. Now with TAP 1.6, I can simply add the new Buildpack resource and reference it in the builder, with no need to rebuild and package a ClusterStore, which is a tedious process.

    This model opens up a much easier path to extending TBS with custom buildpacks which is a great option to have in your toolbelt for the day that you need it.

    Summary

    While these changes may not be game-changing things you will interact with daily, the subtle yet continuous improvements in components such as TBS are great to see, and together they form what is, in my opinion, the most mature and full-featured developer platform offering on the market!

  • TAP 1.6 – Crossplane Updates

    TAP 1.6 – Crossplane Updates

    Crossplane has been updated to version 1.12.1 in TAP 1.6, and this brings along some really amazing features!

    Beyond the bump of Crossplane, which we will discuss at length below, a few more fixes and additions were made to the TAP packaging of Crossplane to improve the UX.

    These updates include:

    1. Support for installing providers in environments with custom CA certificates
    2. The ability to configure whether to orphan or delete all Crossplane resources and XRDs when deleting the package installation
    3. Support for working with an externally installed Crossplane implementation, via Helm or other means
    4. Improved package configuration, making the package installation wait for the Crossplane providers to be ready and healthy before completing

    With all that behind us, let's get to the really exciting features that have been unlocked with the update to Crossplane 1.12.

    While the updates in Crossplane 1.12 are huge, I want to focus here on the 2 key features which can greatly improve the integration within TAP!

    Observe Only Resources

    While I am a huge fan of Crossplane, in the world of IaC, Terraform is probably the most common tool used.

    When evaluating Crossplane and Terraform, each has its pros and cons, but one of the things we saw as a huge challenge for Crossplane was that it did not have a mechanism similar to a data source in Terraform.

    Crossplane only knew about objects it managed. This made integrations, in public clouds in particular, a very difficult task.

    If I want to create an RDS instance in an existing VPC, I want to be able to pull the VPC details that I need from the cloud, along with the needed subnet details, and then use them in the resource I want to create.

    Prior to Crossplane 1.12, you needed to pass any such values to your resources manually, and there was no dynamic lookup mechanism available.

    This has now changed and we now have a great new feature called Observe Only Resources (OOR)!

    With OOR, Crossplane is able to observe and expose the full live state of an external resource, without performing any write or destructive operations.
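    As a sketch of what this could look like (the field name follows the Crossplane 1.12 management policies alpha as I understand it; the VPC kind and region are illustrative), an observe-only AWS VPC might be declared like this:

    ```yaml
    # Observe an existing VPC without ever creating, updating, or deleting it.
    # Assumes the alpha management policies feature is enabled in Crossplane 1.12.
    apiVersion: ec2.aws.upbound.io/v1beta1
    kind: VPC
    metadata:
      name: existing-vpc
    spec:
      managementPolicy: ObserveOnly  # read live state only, no writes
      forProvider:
        region: us-east-1
    ```

    The observed state then lands in the resource's status, where it can be referenced from a composition, much like a Terraform data source.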

    This can be read about in depth, including the design decisions and more in the following link.

    This opens up amazing capabilities and allows for more straightforward and production-ready compositions for real-world scenarios in the TAP world, for backing services in the different cloud providers!

    I am already working on some interesting use cases, and plan to share some of my new examples in the near future on github.

    The next major change in Crossplane 1.12 that can strongly benefit TAP is the introduction of provider families.

    Provider Families

    Crossplane is amazing, and offers great value, especially when integrating with public cloud services. However, until now, there were serious performance issues due to the large number of CRDs installed by Crossplane providers.

    When you, for example, installed the AWS provider in TAP 1.5, that would install 850+ CRDs into your cluster, which in the best of cases slowed down your cluster, and in some cases could cause your API server to crash if it was not sized correctly for such load!

    The Crossplane team understood this issue was serious, and began working on this from multiple directions.

    The first and best approach taken was to go to the upstream Kubernetes community and work on better scalability of the parts of the API server in charge of managing CRDs. While this work is progressing, and improvements are being made, the process is slow, and will take a long time to roll out to all environments.

    The next approach is what we have here now in Crossplane 1.12, which is the idea of Provider Families.

    With provider families, the idea is to break up the old monolithic providers into smaller, service-based providers; then, depending on which resources you need to manage, you install only the providers that you need.

    The AWS provider for example has been broken into 155 different providers.

    Let's take the example of the official AWS provider from Upbound, which previously installed 903 CRDs into your cluster!!! If, let's say, we need to manage RDS instances, VPCs, and IAM roles, we would now need the following providers:

    • provider-aws-iam – 22 CRDs
    • provider-aws-vpc – 1 CRD
    • provider-aws-rds – 21 CRDs

    This means that for this type of environment you would go from 903 down to 44 CRDs!!! This is a huge improvement, and it enables us to truly build our solutions as we need them, without putting unneeded stress on our clusters.
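    Installing one of these family providers works like any other Crossplane provider package; a sketch (the version tag is illustrative, check the registry for current releases):

    ```yaml
    # Install just the RDS member of the AWS provider family.
    apiVersion: pkg.crossplane.io/v1
    kind: Provider
    metadata:
      name: provider-aws-rds
    spec:
      # Family providers are published as individual packages on the Upbound registry
      package: xpkg.upbound.io/upbound/provider-aws-rds:v0.37.0  # illustrative tag
    ```

    Repeating this for provider-aws-iam and provider-aws-vpc gives you the 44-CRD setup described above.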

    Summary

    The new version of Crossplane truly unlocks a huge set of use cases for advanced service bindings, allowing for maximum control, maximum DevEx, and maximum performance, all at the same time!

  • TAP 1.6 – Namespace Provisioner Improvements

    TAP 1.6 – Namespace Provisioner Improvements

    What Is The TAP Namespace Provisioner

    Namespace Provisioner provides a secure, automated way for platform operators to provision namespaces with the resources and namespace-level privileges required for their workloads to function as intended. It enables operators to add additional customized namespace-scoped resources using GitOps to meet their organization’s requirements and provides continuous reconciliation using the kapp-controller to maintain the actual desired state of the namespace-scoped resources.

    Why Is This Needed

    Anyone that dealt with TAP before TAP 1.4 knows the experience of “preparing a developer namespace”. As TAP is a fully Kubernetes-based solution, almost all configuration is done via Kubernetes YAML manifests. TAP is such a powerful system, covering so many areas of the development lifecycle and the path to production, which is amazing; however, this also means that lots of resources, credentials, templates, etc. must be created in a namespace in order to provide the developer with the needed permissions to deploy their applications.

    The “wall of YAML” needed to prepare a namespace manually was not a great experience. It was often a true burden on the platform team, and caused difficulties and delays in onboarding new applications to the platform.

    What Is New In TAP 1.6

    TAP 1.6 offers some really nice simplifications for previously tedious options. Some of the key new features include:

    1. A simplified way to skip creation of certain OOTB resources NSP typically would install
    2. A simplified way to manage the default service account in NSP managed namespaces
    3. Support for lists and objects to be passed to NSP via annotations
    4. Simplified TAP values that no longer require providing the path value in the additional_sources stanza

    Let’s take a look at each of these improvements and see what they offer.

    Skipping OOTB Resources

    One of the nice elements of TAP is that it is a batteries included solution, but the batteries are swappable.

    When using the testing and scanning supply chain, the default scanner used is Grype.

    While Grype is a good solution, many companies want to use other scanners, either OSS like Trivy, or commercial like Prisma, Aqua, Carbon Black, Snyk, etc.

    NSP, when used in a cluster with the testing and scanning supply chain configured, by default installs the Grype package for each managed namespace in order to support a seamless experience.

    While this is great for those that use Grype, it was a pain for those that wanted to use a different scanner.

    In TAP 1.6, we now have a new option which allows us to disable grype installation either globally or at the per namespace level.

    To disable Grype at the global level, in your TAP values file you can now simply add:

    namespace_provisioner:
      default_parameters:
        skip_grype: true
    

    This will skip the grype installation for all namespaces. If however you want to do this at the per namespace level instead you can simply add the following label or annotation to your namespace:

    # via annotation
    kubectl annotate ns YOUR_NAMESPACE_NAME param.nsp.tap/skip_grype=true
    
    # via label
    kubectl label ns YOUR_NAMESPACE_NAME param.nsp.tap/skip_grype=true
    

    This can also be done via the gitops mechanism by adding the “skip_grype” parameter to the namespaces definition:

    #@data/values
    ---
    namespaces:
    - name: dev
      skip_grype: true
    

    As you can see, this is a much better experience than in previous versions, and allows for easy integration with other scanners when desired.

    It is also possible to disable the automatic adding of limit ranges per namespace via a similar mechanism, where the only difference is the parameter name: instead of skip_grype, it is skip_limit_range:

    namespace_provisioner:
      default_parameters:
        skip_limit_range: true
    

    Managing Service Account Secrets

    One of the resources that TAP manages for us via NSP in every developer namespace, is the default service account.

    This is the service account that by default will be used for applying and managing our workloads.

    When using TAP in a GitOps flow topology, we need to add a Git secret to this service account in order to support both pulling source code from private repositories and pushing the generated Kubernetes YAML to our Git repositories.

    In TAP 1.6, the process of managing this has become much more streamlined. Let's see how you would do this now in TAP 1.6:

    The first step is to create a secret with the needed values in the tap-install namespace:

    cat << EOF | kubectl apply -f -
    apiVersion: v1
    kind: Secret
    metadata:
      name: git-creds-for-workloads
      namespace: tap-install
    type: Opaque
    stringData:
      content.yaml: |
        git:
          host: GIT-SERVER-URL
          username: GIT-USERNAME
          password: GIT-PASSWORD-OR-TOKEN
    EOF
    

    Next we need to add a file to our NSP git repository with the following content:

    #@ load("@ytt:data", "data")
    #@ load("@ytt:base64", "base64")
    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: git-creds
      annotations:
        tekton.dev/git-0: #@ data.values.imported.git.host
    type: kubernetes.io/basic-auth
    data:
      username: #@ base64.encode(data.values.imported.git.username)
      password: #@ base64.encode(data.values.imported.git.password)
    

    Note that this file has no credentials or FQDNs in it; rather, we are using YTT to template this secret for us, and the secret created above in the first step fills in the needed values at runtime.

    Next we need to configure NSP to reference our git repository with the above defined YTT template, as well as our secret we created with the git authentication details:

    namespace_provisioner:
      additional_sources:
      - git:
          ref: origin/main
          subPath: ns-provisioner-samples/credentials
          url: https://github.com/vmware-tanzu/application-accelerator-samples.git
      import_data_values_secrets:
      - name: git-creds-for-workloads
        namespace: tap-install
        create_export: true
      default_parameters:
        supply_chain_service_account:
          secrets:
          - git-creds
    

    As can be seen above, we are actually using the sample from VMware with the same YAML definition in Git as provided above, but this can also be hosted in your own repo if you so desire.

    We can also set this at a per-namespace level by simply removing the “default_parameters” section from the NSP section in our TAP values, and providing which secrets to add to our workloads via annotations on our namespace:

    kubectl annotate ns YOUR_NS \
      param.nsp.tap/supply_chain_service_account.secrets='["git-creds"]'
    

    And if multiple secrets are needed, for example when also using the cosign integration within a supply chain, you can simply add them to the array:

    kubectl annotate ns YOUR_NS \
      param.nsp.tap/supply_chain_service_account.secrets='["git-creds","cosign-creds"]'
    

    While the above sets the service account's secrets, the same is possible for image pull secrets via the same mechanism, simply replacing the “secrets” key with “imagePullSecrets”, either via the annotation, via the TAP values for a global setting, or via the NSP GitOps model by adding the same configuration to your desired namespace's YAML file.
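When using the GitOps model, the same setting can live directly on the namespace manifest. Below is a minimal sketch; the namespace name and the “registry-creds” secret name are hypothetical placeholders:

```yaml
# Hypothetical namespace manifest: the annotation asks NSP to attach
# the "registry-creds" secret as an imagePullSecret on the supply chain
# service account it manages for this namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: dev-ns
  annotations:
    param.nsp.tap/supply_chain_service_account.imagePullSecrets: '["registry-creds"]'
```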

    Arrays And Objects Via Annotations

    Sometimes we need to pass a set of values related to one another into NSP for a specific namespace, in order to provide the right level of customization and automation for our environments.

    As seen above in the secrets management section, NSP now supports passing arrays as well as JSON objects as parameters via annotations; they are automatically parsed into the relevant types in NSP.

    If we want to pass in an array value, we simply provide the value of our array within single quotes and square brackets as an annotation; for an object, we use single quotes and curly brackets. Let's see a few examples and what they will end up looking like:

    kubectl annotate ns dev-ns \
      param.nsp.tap/basic.array='["sample-1","sample-2","sample-3"]'
    
    kubectl annotate ns dev-ns \
      param.nsp.tap/basic.object='{"s1":"v1","s2":"v2"}'
    
    kubectl annotate ns dev-ns \
      param.nsp.tap/advanced='[{"s1":"v1","s2":"v2"},{"s3":"v3"}]'
    

    With the above samples, we would receive the following in our desired namespace's ConfigMap:

    #@data/values
    ---
    namespaces:
    - name: dev-ns
      advanced:
      - s1: v1
        s2: v2
      - s3: v3
      basic:
        array:
        - sample-1
        - sample-2
        - sample-3
        object:
          s1: v1
          s2: v2
    

    While the above is just an example, the options this unlocks are truly exciting. Via a few annotations, you could enable auto-creation of AppSSO auth servers, backing services via class claims, Spring Cloud Gateway configurations, ACS configurations, a dedicated API portal per namespace, and much more.

    Simplified additional sources section of TAP values

    The final improvement I want to mention is that you no longer need to provide the path parameter when using the additional_sources section in your TAP values.

    Previously, for every additional Git source you provided, you needed to specify a path to which the files in that repo would be synced. In TAP 1.6 this can still be provided, but if it is not, a default value will be generated for you, making it one less thing to configure and mess up!
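For illustration, the entry below (matching the sample used earlier) shows the now-optional path key commented out; the path value shown is a hypothetical example of what you previously had to choose yourself:

```yaml
namespace_provisioner:
  additional_sources:
  - git:
      ref: origin/main
      subPath: ns-provisioner-samples/credentials
      url: https://github.com/vmware-tanzu/application-accelerator-samples.git
    # path: _ytt_lib/my-credentials   # required before TAP 1.6; a default is now generated if omitted
```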

    Summary

    As you can see, a lot of nice features have been added to NSP in this release. While none are game changers, the simplicity they provide is a huge benefit, lowering the barrier to entry for managing NSP configurations and allowing more customers to provide better value and more customized experiences to the platform's end users!

  • TAP 1.6 – IDE Plugin Improvements

    TAP 1.6 – IDE Plugin Improvements

    The TAP IDE plugins are a critical element of the platform, as they are the main interface to the platform for most developers, and meeting developers where they want to be, which is within their IDE, is a critical element of any good platform.

    As is the case with every release of TAP, the IDE plugins in TAP 1.6 have also been updated with some nice new features!

    There are 4 key improvements in this area in TAP 1.6:

    1. Local Source Proxy support for IntelliJ and VSCode
    2. Support for Spring Native applications for IntelliJ and VSCode
    3. Support for Gradle projects with Live Update and Remote Debugging for IntelliJ and VSCode
    4. App Accelerator plugin for IntelliJ is now GA

    Local Source Proxy (LSP)

    The Local Source Proxy is a great new feature in TAP 1.6, which I have written a dedicated post about for anyone interested in more details. The main goal is to simplify developer environment configuration by removing the need for developer machines to hold credentials for, and be configured to push source code to, the company's container registry when doing iterative development from the local IDE.

    With the support for LSP, developers no longer need to handle docker logins, or mapping of source images in their IDE settings or within their Tiltfiles!

    While this may seem like a small feature, the impact this has for end users is huge, and is overall a much more elegant and secure option than what was previously available!

    Spring Native Support

    Spring Native applications are truly awesome, and the performance benefits are huge!

    With that said, they have their challenges, and until TAP 1.6, Spring Native apps could not be iterated on with Live Update, due to the nature of the apps being precompiled to native executables rather than exploded JARs as is the case with standard Spring apps.

    With TAP 1.6, developers can now Live Update and debug Spring Native applications non-natively, and then deploy them to a cluster as a native image.

    This means that while developing it will be the same type of deployment using a non native compilation as with a standard spring app, but when promoting the code to our build clusters, the image that will be built is a Spring native app.

    This allows for the best of both worlds, in which we can benefit from live update and remote debugging in development, but gain performance and resource benefits when promoting our apps to higher environments!
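To sketch what the promotion side can look like, a Workload can request a native build via the Paketo BP_NATIVE_IMAGE build-time environment variable, while local iteration continues to use the standard JVM build. This is a sketch; the workload name and Git coordinates are placeholders:

```yaml
apiVersion: carto.run/v1alpha1
kind: Workload
metadata:
  name: my-spring-native-app
  labels:
    apps.tanzu.vmware.com/workload-type: web
spec:
  source:
    git:
      url: https://github.com/YOUR-ORG/YOUR-REPO
      ref:
        branch: main
  build:
    env:
    # Instructs the Java buildpack to compile a native executable
    # when the image is built on the build cluster.
    - name: BP_NATIVE_IMAGE
      value: "true"
```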

    Gradle Support

    While I personally am a fan of Maven and do not really like Gradle, it is a very common option for Java applications.

    Until TAP 1.6, only Maven-based projects were supported by the IDE plugins, but now we have official support, as well as updated sample accelerators, covering Gradle-based projects across all of the different aspects and features of the IDE plugins!

    This is extremely helpful for onboarding existing applications into the environment.

    While not yet officially supported, due to the new capabilities, I have even gotten this to work with Kotlin based applications using Gradle!

    IntelliJ App Accelerator Plugin

    With the release of TAP 1.6, the new and improved App Accelerator plugin for IntelliJ is now GA and has feature parity with the VSCode plugin.

    The main new capability added in this release is the ability to automatically create a Git repo when generating a new project from an accelerator!

    This is a great improvement, and makes the choice between VSCode and IntelliJ simply a matter of developer preference, and no longer a matter of which features of the TAP plugins they need.

    Summary

    The IDE plugins are a key element of the platform, and expanding the types of apps which can benefit from the plugins, as well as making the UX for developers simpler is a huge plus!

  • TAP 1.6 – CVE Triage Flow

    TAP 1.6 – CVE Triage Flow

    TAP has many features which help with securing our software supply chain.

    One of the key elements of security is obviously source code and image scanning, which TAP has had since GA. But as we all know, finding vulnerabilities is one thing; triaging them is an entire beast in and of itself, and this is actually where a lot of the true pain lies.

    As per the documentation:

    The new Triage feature of Tanzu Application Platform allows you to store vulnerability analysis information alongside the current data handled by SCST – Store. Using the Tanzu Insight CLI, users can now perform basic triaging functions against any detected vulnerabilities. The main objective is to reduce spreadsheet and tool toil by centralizing CVE scanning, identification, and triaging in one place.

    While the current flow is documented as a purely CLI-based flow using the insight plugin for the Tanzu CLI, it is also a fully API-driven flow, so hopefully we will see it integrated into a UI-based flow in a future release.

    Currently this is an experimental feature, and it will likely change over time as it gets more usage and feedback from customers, but the idea in and of itself is extremely powerful, and testing it out and providing feedback is a great way to influence the future of the product in this area!

    What does the flow look like

    The first step when beginning to triage a new vulnerability, is to find which CVE we are going to triage, and for which image and workload.

    Let's walk through a simple flow with the new triage command:

    tanzu insight triage update \
      --cveid $CVE_ID \
      --pkg-name $PKG_NAME \
      --pkg-version $PKG_VERSION \
      --img-digest $IMG_DIGEST \
      --artifact-group-uid $WORKLOAD_UID \
      --state in_triage
    

    As can be seen above, we are setting the specific CVE in question into a triage state.

    Next, once we have completed our triage and gathered the information, we can update the triage data in the metadata store by using the triage command again, this time with all the needed information.

    The information that should be included in triage data is categorized in the TAP triage flow into a few key pieces:

    1. State of the triage – this can be any of the following:
      • resolved = the vulnerability has been remediated.
      • resolved_with_pedigree = the vulnerability has been remediated and evidence of the changes is provided in the affected component's pedigree containing verifiable commit history and/or diff(s).
      • exploitable = the vulnerability may be directly or indirectly exploitable.
      • in_triage = the vulnerability is being investigated.
      • false_positive = the vulnerability is not specific to the component or service and was falsely identified or associated.
      • not_affected = the component or service is not affected by the vulnerability. --justification should be specified for all not_affected cases.
    2. Justification of the impact analysis – this can be any of the following:
      • code_not_present = the code has been removed or tree-shaked.
      • code_not_reachable = the vulnerable code is not invoked at runtime.
      • requires_configuration = exploitability requires a configurable option to be set/unset.
      • requires_dependency = exploitability requires a dependency that is not present.
      • requires_environment = exploitability requires a certain environment which is not present.
      • protected_by_compiler = exploitability requires a compiler flag to be set/unset.
      • protected_at_runtime = exploits are prevented at runtime.
      • protected_at_perimeter = attacks are blocked at physical, logical, or network perimeter.
      • protected_by_mitigating_control = preventative measures have been implemented that reduce the likelihood and/or impact of the vulnerability.
    3. Response from package maintainer or supplier – can be one or more of the following:
      • can_not_fix
      • will_not_fix
      • update
      • rollback
      • workaround_available
    4. A plain-text comment – this can be used to add any additional notes beyond the above options to explain the status and logic behind a decision.

    With those pieces of data an example of a CVE triage command could be:

    tanzu insight triage update \
      --cveid $CVE_ID \
      --pkg-name $PKG_NAME \
      --pkg-version $PKG_VERSION \
      --img-digest $IMG_DIGEST \
      --artifact-group-uid $WORKLOAD_UID \
      --state not_affected \
      --justification requires_environment \
      --response workaround_available \
      --comment "This is only exploitable on ARM based processors and we run all K8s on x86 processors"
    

    Beyond the basic flow above, we can also list all CVE triages and their status using the triage list command, as well as easily copy triage data between images using the triage copy command.

    Summary

    While this is only a CLI flow, and the UX is still a bit rough, the idea and promise of a solution like this is huge. It is great to see the direction the TAP team is taking: going beyond just alerting that a vulnerability exists, and assisting customers in building workflows and processes to start tackling the vulnerabilities found in an organized and central manner!

    I’m very excited to see this mature over time, and to see where this ends up evolving over the next few releases!

  • TAP 1.6 – AMR Observer

    TAP 1.6 – AMR Observer

    As part of the new version of the scanning mechanism in TAP which was released in Alpha in version 1.5, and has now been promoted to Beta in TAP 1.6, we now have a new component called the Artifact Metadata Repository Observer (AMR Observer).

    Overview

    This new component is part of the Artifact Metadata Repository (AMR), a service designed to be a central source of truth for artifact metadata across multiple clusters and environments.

    AMR Observer watches for changes in resources and emits cloud events when these changes occur. It enhances the capabilities of the AMR by providing real-time updates about the state of resources.

    The AMR Observer brings several key features to TAP 1.6:

    1. Real-time Monitoring: The AMR Observer watches for changes in specific resources and emits cloud events when these changes occur. This allows for real-time monitoring and tracking of resources.
    2. Customizable Configuration: Users can configure the AMR Observer to watch for changes in specific resources. This customization allows users to focus on the resources that are most relevant to their needs.
    3. Integration with AMR: As a component of the AMR, the AMR Observer contributes to the AMR’s goal of providing a single, consistent view of artifact metadata across multiple clusters and environments.

    The AMR Observer is deployed to the build and run clusters when enabled. It communicates with the Kubernetes API Server to obtain the cluster’s location ID and emits a cloud event to the AMR Cloud Event Handler. The AMR Observer watches for ImageVulnerabilityScans and workload ReplicaSets.

    Why Does This Matter

    As part of the redesign of the scanning mechanisms within TAP, one of the key goals was to simplify the extensibility of the platform to allow for a Bring Your Own Scanner (BYOS) model in a much more simplified and straightforward manner.

    While TAP has always allowed you to bring your own scanner, the UX for this was not great, to say the least, and required deep knowledge of, and tight coupling with, TAP itself, making the process not very approachable for most customers.

    When scanning an image or source code, 4 key steps must happen:

    1. Perform the scan
    2. Output an SBOM in CycloneDX or SPDX format
    3. Push the data to the central metadata store
    4. Validate the scan results against your desired security policy

    In the old version of the scanning architecture, when bringing your own scanner you needed to solve all of these steps in one resource, causing confusion and a great deal of overhead.

    In the new AMR model, you only need to deal with steps 1 and 2, while the platform will automatically handle the rest for you.

    AMR Observer is the new component in charge of the 3rd step. It watches for scan CRs in your cluster, and when they appear, it retrieves the outputted SBOM, which is now stored not only in the metadata store but also as an OCI artifact in your container registry of choice, then normalizes the data and sends it to the AMR metadata store via cloud events.

    This is only a small part of the AMR Observer, but it is a huge improvement over the previous situation: decoupling the scanning from platform-specific needs makes integrating new scanners much easier and provides many other security benefits as well.

    The other resources mentioned above which the AMR Observer watches are the ReplicaSets in your clusters, and in particular, those of your deployed workloads.

    When a change happens in a workload, for example when a new revision is deployed, the AMR Observer reports this data to AMR as well via cloud events. This gives us a single source of truth, where we can understand what is running where at any given time!

    While currently the data collected by AMR is relatively limited, the foundation is there for more and more relevant data to be added in future releases which I am very excited about.

    By having a central location that can aggregate data from CI, CD, and running applications, as well as the ability to correlate these pieces of data, we can achieve truly amazing insights. We can perform simple correlation between a source and the final running app in production, and understand historically what it went through to get there. This also means we can use this data to understand how a platform like TAP is helping us improve developer experience and velocity, using a mechanism like DORA metrics or other similar standards.

    Another key element of AMR is the GraphQL endpoint exposed in TAP 1.6, allowing us not only to get the data into a central location, but also to query that data based on our own unique needs, using the simple and user-friendly GraphQL playground included in TAP 1.6!

    Because this endpoint is now exposed to us, we can also use it to build out dashboards in external monitoring solutions such as Grafana.

    For example, below is a widget I have built in Grafana that shows, based on the location (the runtime cluster), which revisions of apps are running, along with additional information about them:

    While not all data is exposed here, with the foundation in place, I'm truly excited to see where this goes in future releases!

    Summary

    While the new scanning mechanism is not fully at feature parity with the original mechanism, the new features and design including the AMR Observer, are extremely promising in my mind, and once they mature a bit more they will truly provide a much better offering to TAP users than what we currently have!

    This new architecture and what it can bring to the table is truly exciting from my perspective, as it allows not only for a better experience within the platform itself, with more advanced functionality, but also for easy integration into third-party systems using the GraphQL endpoint!

  • TAP 1.6 – App Scanning 2.0 Improvements

    TAP 1.6 – App Scanning 2.0 Improvements

    The new scanning model “Supply Chain Security Tools – Scan 2.0” which was introduced back in TAP 1.5, now includes some great new improvements, and has been promoted from Alpha to Beta!

    The new model is much easier to extend and customize to your own organization's needs, and is built on a more scalable and secure architecture.

    In the previous model of the scanning feature in TAP, image scanning definitions needed to handle 4 main topics:

    1. Perform the scan
    2. Output an SBOM in CycloneDX or SPDX format
    3. Push the data to the central metadata store
    4. Validate the scan results against your desired security policy

    Now with this new model, the image scanning definition is only responsible for scanning the image and outputting an SBOM with the results in CycloneDX or SPDX format. From there, the platform handles the rest: it pushes the SBOM to an OCI registry, and the AMR Observer then pulls down this data and transfers it via CloudEvents to the AMR Persister, which saves it in the Metadata Store.

    With TAP 1.6, we now have the ability to easily integrate the new scanning mechanism in the OOTB testing and scanning supply chain, and we also get visibility into the results from the scans in the Tanzu Developer Portal (TDP) formerly known as TAP GUI.

    The new mechanism is based on a CRD called ImageVulnerabilityScan (IVS) which you define in your cluster, and sample IVS templates are provided in the docs for Grype, Trivy, Prisma, Snyk and Carbon Black.
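As a rough sketch of what an IVS can look like, loosely following the documented Grype sample (the placeholders in CAPS are values you supply, and the exact schema may change while the feature is in Beta):

```yaml
apiVersion: app-scanning.apps.tanzu.vmware.com/v1alpha1
kind: ImageVulnerabilityScan
metadata:
  name: grype-scan-sample
spec:
  # The image to scan, referenced by digest.
  image: IMAGE-TO-SCAN@IMAGE-DIGEST
  scanResults:
    # OCI registry location where the resulting SBOM is pushed.
    location: REGISTRY-URL/PROJECT/scan-results
  steps:
  # A single step that runs the scanner and writes a CycloneDX SBOM
  # into the results path the platform then picks up.
  - name: grype
    image: GRYPE-SCANNER-IMAGE
    command: ["grype"]
    args:
    - registry:$(params.image)
    - -o
    - cyclonedx
    - --file
    - $(params.scan-results-path)/scan.cdx.xml
```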

    Summary

    The new scanning framework is really looking great, and while the functionality is not yet at feature parity with the initial framework, it does provide a lot of benefits.

    The main gap currently is the lack of ScanPolicy support. Another key gap in the new model so far is that it only covers image scanning at this point, and does not cover source code scanning.

    Source code scanning has also been removed from the OOTB supply chains in this version, but can be re-integrated if you need that functionality.

    While the source code scanning in TAP was never great, hopefully VMware will add this functionality back in a new and more feature-rich manner by integrating with common SAST and DAST solutions, which would suit the needs of TAP workloads much better.