Auto Generation of Certs for TAP Workloads

When we set up a TAP environment, one of the key aspects we must take into account is how we will expose our workloads outside of the cluster.

By default, TAP workloads are deployed as Knative services and are exposed via plain HTTP, which is neither secure nor production ready.

Another option, which needs only a few additional lines in our TAP configuration file, is to provide a secret containing a wildcard certificate that will be used for all of the workloads in the cluster.

While using a wildcard certificate for a platform has become a common practice as we have seen with TAS, OCP and others, wouldn’t it be great if we could have actual workload specific certificates auto generated for us by the platform and not need to use a wildcard?

In this post we will cover a way to achieve this when our CA of choice is an Active Directory CA server.

Why not just use a wildcard

While the setup and configuration of a wildcard certificate is extremely easy, it definitely has some drawbacks we must consider.

The most TAP centric issue we can encounter with wildcard certificates is that they simply don’t work at scale within TAP without changing how the ingress URLs are generated.

By default, TAP uses the following naming conventions for ingress URLs:

<WORKLOAD NAME>.<NAMESPACE>.<DOMAIN>

While that may seem appealing, it does not work well with wildcards. A wildcard covers only a single segment of a domain name (the leftmost one), so there is no way to create one wildcard certificate that supports an arbitrary number of subdomain levels.

This means that in order to use the default naming convention, a wildcard for "*.<DOMAIN>" is not enough; we also need to add Subject Alternative Names of the form "*.<NAMESPACE>.<DOMAIN>" for every namespace in which we want to deploy workloads.

Knowing all of the needed namespaces upfront is not feasible, which means we need a more dynamic solution.
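The single-segment rule can be sketched in a few lines of shell. The helper below is a simplified illustration only: the function name wildcard_covers and the example.com host names are made up for this post, and real TLS clients apply additional checks.

```shell
# Simplified illustration: "*" in a certificate name stands in for exactly
# one DNS label (the leftmost one). Hypothetical helper, not part of TAP.
wildcard_covers() {
  pattern=$1
  host=$2
  case $pattern in
    '*.'?*) ;;                 # pattern must look like "*.<something>"
    *) return 1 ;;
  esac
  case $host in
    *.*) ;;                    # host needs at least two labels
    *) return 1 ;;
  esac
  suffix=${pattern#'*.'}       # "*.example.com" -> "example.com"
  first=${host%%.*}            # leftmost label of the host
  rest=${host#*.}              # everything after the first dot
  # Covered only if the host is one non-empty label followed by the suffix.
  [ -n "$first" ] && [ "$rest" = "$suffix" ]
}

wildcard_covers '*.example.com' 'app.example.com' && echo "covered: app.example.com"
wildcard_covers '*.example.com' 'app.dev.example.com' || echo "not covered: app.dev.example.com"
wildcard_covers '*.dev.example.com' 'app.dev.example.com' && echo "covered only by a per-namespace name"
```

With the default naming convention, every new namespace adds one more "*.<NAMESPACE>.<DOMAIN>" name that the certificate would have to carry.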

Solution #1 – Change the Domain Template

This is probably the easiest solution. We simply add a few lines to our TAP configuration that change the convention used to generate ingress URLs from:

<WORKLOAD NAME>.<NAMESPACE>.<DOMAIN>

To:

<WORKLOAD NAME>-<NAMESPACE>.<DOMAIN>

By doing this, we put everything that is dependent on the workload into a single segment of the FQDN, and then a wildcard certificate will work.

To do this, the additional lines we would need to add to our TAP values file for any cluster created with the full, run, or iterate profile would be:

cnrs:
  domain_template: '{{.Name}}-{{.Namespace}}.{{.Domain}}'

While the above solution is an easy way to solve the wildcard issue described above, there are two other issues with wildcards we need to take into consideration.

  1. From a security perspective, the biggest concern with wildcard certificates is that when one server or sub-domain covered by the wildcard is compromised, all sub-domains may be compromised. In other words, the upfront simplicity of the wildcard can create significant problems should things go wrong.
  2. From a maintenance perspective, we also need to remember, typically on a yearly basis, to replace the wildcard certificate in our clusters. If we forget to replace it in time, ALL of our workloads would have TLS issues at the same time, which could have a severe impact on the business.

As we can see, a wildcard may suffice for some use cases and can be an easy way to get started, but there has to be a better way…

Generating Certificates At Runtime With Cert-Manager

One of the components included in TAP is Cert-Manager.
Cert-Manager is an industry standard Kubernetes operator that can manage the entire lifecycle of certificates in the context of a Kubernetes environment.

When we work with public domains, we can use the integration with Let’s Encrypt, or really any ACME server, and generate our certificates in an easy and automated manner.

One of the nice things about using Cert-Manager is that Knative, which is the default deployment mechanism for our workloads in TAP, has an OOTB integration with Cert-Manager through which it can auto generate certificates when a new Knative Service is deployed!

This means we do not need to worry about certificate creation and can let the platform deal with it automatically!

The way this works is that you create a ClusterIssuer in your cluster, a custom resource provided by Cert-Manager that is used to issue the certificates we request from it.
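For the public-domain case, an ACME based ClusterIssuer could look roughly like the sketch below, which writes the manifest to a file so it can be reviewed before it is applied. The issuer name, e-mail address, account key secret name, and the contour ingress class are placeholders assumed for illustration; the server URL is the Let’s Encrypt staging endpoint.

```shell
# Sketch only: an ACME ClusterIssuer using the HTTP-01 challenge.
# All names and the e-mail address below are placeholders.
cat > acme-clusterissuer.yaml << 'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: platform-team@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
    - http01:
        ingress:
          class: contour
EOF

# Review the file, then apply it against the cluster:
# kubectl apply -f acme-clusterissuer.yaml
```

This only helps when the ACME server can reach the cluster, which is usually not the case on-prem.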

The Issue Of On-Prem Environments

While the idea of auto generating trusted certificates sounds great, it has some challenges when working in a typical On-Prem environment.

Typically, Microsoft’s Active Directory CA solution is the most commonly used CA in this type of environment, and unfortunately, Cert-Manager does not have an integration with this CA.

While we could build such an integration (it has been done in the past), this would require a lot of work and maintenance, which is not ideal, or even possible, for many organizations to undertake.

We could also decide to use the self-signed issuer type in Cert-Manager, and simply allow Cert-Manager to create self-signed certificates for each of our workloads.

While self-signed certs may work for demo environments or even development environments, they really are not a solution for a production grade platform, because everyone will receive certificate warnings any time they try to access the application.

The Solution – Using an Intermediate CA

When dealing with certificates, we have the concept of an intermediate or subordinate CA.

An intermediate CA is a certificate that has been signed by the root CA and has been given the "permissions" to issue certificates on behalf of the root CA.

Once we have the intermediate CA’s full chain in PEM format, as well as the private key for the intermediate CA, also in PEM format, we can use the Cert-Manager CA ClusterIssuer type and have Cert-Manager generate certificates that are signed by the dedicated intermediate CA we have provided.
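Before handing those two files to Cert-Manager, it can be worth sanity checking them. The helpers below are a minimal sketch that assumes the openssl CLI is installed and that the private key is not password protected; the function names are made up for this post, and the public key comparison is expected to work for typical RSA or EC keys.

```shell
# Succeeds if the first certificate in the PEM file is marked as a CA
# (basic constraints contain CA:TRUE).
is_ca_cert() {
  openssl x509 -in "$1" -noout -text 2>/dev/null | grep -q 'CA:TRUE'
}

# Succeeds if the private key ($2) belongs to the first certificate in
# the PEM file ($1), by comparing the public keys derived from each.
key_matches_cert() {
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null) || return 1
  key_pub=$(openssl pkey -in "$2" -pubout 2>/dev/null) || return 1
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}

# Example usage, with the chain in cert.cer and the key in cert.key:
# is_ca_cert cert.cer && key_matches_cert cert.cer cert.key \
#   && echo "intermediate CA files look consistent"
```

Both checks look only at the first certificate in the file, so the intermediate CA certificate should come first in the chain, followed by the root.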

How to set this up

The first step is to install TAP as we always would without this solution. We also do not need to change the default naming template for our services, as certificates can be generated for the default naming convention as well!

Once we have deployed TAP, we will configure Cert-Manager and create the ClusterIssuer we will be using.

In order to do this, you will need the certificate chain (cert.cer) and the private key (cert.key), both in PEM format, saved in files on your working machine. We can then create a secret from these files:

kubectl create secret generic tap-intermediate-ca -n cert-manager \
  --from-file=tls.crt=cert.cer --from-file=tls.key=cert.key

Now that we have the secret created with our intermediate CA data, we can create the ClusterIssuer:

cat << EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: ca-issuer
spec:
  ca:
    secretName: tap-intermediate-ca
EOF
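Before wiring the issuer into Knative, it may be worth confirming that it can actually sign something. The sketch below writes a throwaway Certificate manifest to a file; the certificate name, namespace, secret name, and DNS name are placeholders chosen for illustration, and only the ca-issuer reference comes from the step above.

```shell
# Sketch only: a disposable Certificate that asks ca-issuer for a cert.
cat > ca-issuer-smoke-test.yaml << 'EOF'
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ca-issuer-smoke-test
  namespace: default
spec:
  secretName: ca-issuer-smoke-test-tls
  dnsNames:
  - smoke-test.default.example.com
  issuerRef:
    kind: ClusterIssuer
    name: ca-issuer
EOF

# Then, against the cluster:
# kubectl get clusterissuer ca-issuer
# kubectl apply -f ca-issuer-smoke-test.yaml
# kubectl wait --for=condition=Ready certificate/ca-issuer-smoke-test -n default --timeout=120s
# kubectl delete -f ca-issuer-smoke-test.yaml
```

If the wait times out, describing the Certificate and the ClusterIssuer is usually the quickest way to see what went wrong with the secret.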

Now that we have everything set up and ready to be configured in TAP, we will create one final secret containing a YTT overlay that configures Knative to use the newly created ClusterIssuer and auto generate TLS certificates for our workloads:

cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: cnrs-tls-overlay
  namespace: tap-install
type: Opaque
stringData:
  tls-overlay.yaml: |
    #@ load("@ytt:overlay", "overlay")
    #@ load("@ytt:data", "data")
    
    ---
    #@overlay/match by=overlay.subset({"metadata":{"name":"config-certmanager"}})
    ---
    data:
      #@overlay/remove missing_ok=True
      _example:
      #@overlay/match missing_ok=True
      issuerRef: |
        kind: ClusterIssuer
        name: ca-issuer
    
    #@overlay/match by=overlay.subset({"metadata":{"name":"config-network"}})
    ---
    data:
      #@overlay/remove missing_ok=True
      _example:
      #@overlay/match missing_ok=True
      autoTLS: "Enabled"
      #@overlay/match missing_ok=True
      httpProtocol: "Redirected"
      #@overlay/match missing_ok=True
      default-tls-secret: "kube-system/wildcard"
      #@overlay/match missing_ok=True
      domainTemplate: "{{.Name}}.{{.Namespace}}.{{.Domain}}"
    
    #@ def kapp_config():
    apiVersion: kapp.k14s.io/v1alpha1
    kind: Config
    #@ end
    
    #@overlay/match by=overlay.subset(kapp_config())
    ---
    rebaseRules:
    #@overlay/append
    - path: [data]
      type: copy
      sources: [new, existing]
      resourceMatchers:
      - kindNamespaceNameMatcher: {kind: ConfigMap, namespace: knative-serving, name: config-certmanager}
      - kindNamespaceNameMatcher: {kind: ConfigMap, namespace: knative-serving, name: config-network}
EOF

Now we can tell TAP to apply this overlay via a small addition to our TAP values file:

package_overlays:
- name: cnrs
  secrets:
  - name: cnrs-tls-overlay

The final step is to apply the changes to TAP using the Tanzu CLI:

tanzu package installed update tap -n tap-install -f <YOUR TAP VALUES FILE>
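Once the update has reconciled and a workload is deployed, one way to check the result is to look at what the workload actually serves. The sketch below assumes the openssl CLI is available; the helper name count_pem_certs and the host name are made up for illustration.

```shell
# Counts the certificates in PEM text read from stdin, for example the
# output of `openssl s_client -showcerts`. Prints 0 if there are none.
count_pem_certs() {
  grep -c -- '-----BEGIN CERTIFICATE-----' || true
}

# Against a live workload (the host name is a placeholder):
# host=my-app.my-namespace.example.com
# openssl s_client -connect "$host:443" -servername "$host" -showcerts \
#   < /dev/null 2>/dev/null | count_pem_certs
#
# List the certificates Cert-Manager has issued across all namespaces:
# kubectl get certificates.cert-manager.io -A
```

A count of 1 would suggest that clients receive only the leaf certificate, while 2 or more suggests the intermediate CA is being served alongside it.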

Summary

While the setup currently takes a bit of extra work, I strongly believe that this is a more secure and more flexible solution than a wildcard certificate.

This mechanism can work with any CA and is a very simple way to allow for more complex or unique naming conventions for your Knative service ingress URLs in a secure and managed way.

Another key benefit is that an intermediate CA is typically made valid for 5 years, whereas a standard certificate is valid for only 1 year, if not less. Cert-Manager, because it is the one managing the certificates, also manages the lifecycle and will auto renew and rotate the certificates before they expire, freeing you from replacing certificates on a very frequent basis.

The one thing we have not covered, but which is a very good idea if you take this approach, is to use External DNS, which can auto create and manage DNS records for all of your workloads’ URLs, freeing you from the need to create wildcard DNS records as well!

Using industry standards like Cert-Manager and External DNS to enhance the TAP experience makes for a truly great setup that offers a secure, flexible, and easy to maintain platform!

2 Replies to “Auto Generation of Certs for TAP Workloads”

  1. Within the HTTPS traffic, will the whole cert chain be sent to the clients? I had issues with such internally signed certs when the (web) server did not provide all the certs and the client could not store and trust them.
