
Conversation

@seuf (Contributor) commented Nov 26, 2019

Additional Volumes

This pull request allows the operator to mount additional volumes in the postgresql StatefulSet.
A new key is available in the postgresql manifest: additionalVolumes.
For each volume listed, a volume is added to the podSpec and a volumeMount to each container of the pod.
Each item must contain a name, a mountPath and a volumeSource definition.
The volumeSource must be a Kubernetes volume declaration. This allows you to mount an existing persistentVolumeClaim, a configMap, or an emptyDir shared with the initContainer.

Example:

  additionalVolumes:
    - name: data
      mountPath: /var/lib/postgresql
      volumeSource:
        persistentVolumeClaim:
          claimName: pvc-postgresql-data
          readOnly: false
    - name: tmp
      mountPath: /tmp
      subPath: foo.txt
      volumeSource:
        configMap:
          name: my-config-map
    - name: empty
      mountPath: /opt/empty
      volumeSource:
        emptyDir: {}

Related issues:

@seuf seuf force-pushed the additionnal-volumes-mount branch from 6938533 to bf4455e Compare December 5, 2019 13:30
@erthalion (Contributor) commented Dec 6, 2019

Thanks for the PR, and sorry for the late reply. I like the idea, since I'm almost sure it can enable more interesting approaches. A few comments about the current implementation:

  • It's not clear to me why we mount all the extra volumes to all the containers of a pod. It's not like secrets, so most likely only one container will need a given volume. Maybe it makes sense to mount them only in the Spilo container.

  • Currently only adding is implemented; what do we need to do to synchronize those volumes? I guess we can add/remove volume mounts, and for PersistentVolumeClaims it probably also makes sense to enable resizing, as implemented for the main volume. What do you think?

  • I guess we also need at least some minimal verification that the provided mounts do not clash in any way (including with the default mount for pgdata).

  • This change seems unrelated:

@@ -1334,11 +1366,11 @@ func (c *Cluster) generateCloneEnvironment(description *acidv1.CloneDescription)
                        c.logger.Info(msg, description.S3WalPath)

                        envs := []v1.EnvVar{
-                               v1.EnvVar{
+                               {
                                        Name:  "CLONE_WAL_S3_BUCKET",
                                        Value: c.OpConfig.WALES3Bucket,
                                },
-                               v1.EnvVar{
+                               {
                                        Name:  "CLONE_WAL_BUCKET_SCOPE_SUFFIX",
                                        Value: getBucketScopeSuffix(description.UID),
                                },
  • It would be nice to see a few simple unit tests for this functionality.

  • Something is wrong with the dependencies; it doesn't build from the PR branch for me, which is not the case on master. Let's fix it and see how to make this feature complete :)
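The unit tests asked for above could start as a small self-contained sketch like the one below. It is only an illustration, not the actual PR code: the helper name is hypothetical, and simplified local structs stand in for the real k8s.io/api/core/v1 types.

```go
package main

import "fmt"

// Minimal stand-ins for the Kubernetes API types, so the sketch is
// self-contained (the real code uses k8s.io/api/core/v1).
type VolumeMount struct {
	Name      string
	MountPath string
}

type Container struct {
	Name         string
	VolumeMounts []VolumeMount
}

// mountAdditionalVolumes appends one mount per additional volume to
// every container, mirroring the behavior under discussion in this PR.
func mountAdditionalVolumes(containers []Container, vols []VolumeMount) {
	for i := range containers {
		containers[i].VolumeMounts = append(containers[i].VolumeMounts, vols...)
	}
}

func main() {
	// A table-driven unit test would assert along these lines.
	containers := []Container{{Name: "postgres"}, {Name: "telegraf"}}
	vols := []VolumeMount{{Name: "tls", MountPath: "/tls"}}
	mountAdditionalVolumes(containers, vols)
	for _, c := range containers {
		fmt.Println(c.Name, len(c.VolumeMounts)) // every container gets the mount
	}
}
```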

@FxKu FxKu mentioned this pull request Dec 9, 2019
@seuf (Contributor, Author) commented Dec 9, 2019

* It's not clear to me why we mount all the extra volumes to all the containers of a pod. It's not like secrets, so most likely only one container will need a given volume. Maybe it makes sense to mount them only in the Spilo container.

I used the same approach as for the secrets. A sidecar container can need the additional volume too. For example, I can use an initContainer that grabs a new certificate from my PKI and shares it via an emptyDir with the Spilo container and the telegraf sidecar (for TLS auth). That's typically why I'm doing this PR.

* Currently only adding is implemented; what do we need to do to synchronize those volumes? I guess we can add/remove volume mounts, and for PersistentVolumeClaims it probably also makes sense to enable resizing, as implemented for the main volume. What do you think?

Since the mounted volume is not managed by the operator itself, I don't think we should handle resizing, because we don't know if the additional volume is ext4 or xfs (i.e. resize2fs vs. xfs_growfs).

* I guess we also need at least some minimal verification that the provided mounts do not clash in any way (including with the default mount for pgdata).

Yes! You're right, we should forbid overriding /var/lib/postgresql/data as a mount path.

* This change seems unrelated:

Yes, I know. It's because I ran go fmt. I can revert this change.

* It would be nice to see a few simple unit tests for this functionality.

I'll try to add one.

* Something is wrong with the dependencies; it doesn't build from the PR branch for me, which is not the case on master. Let's fix it and see how to make this feature complete :)

Is that why the CI tests are failing?

@seuf seuf force-pushed the additionnal-volumes-mount branch 2 times, most recently from ef74187 to 03674b7 Compare December 9, 2019 16:43
@erthalion (Contributor)

  • It's not clear to me why we mount all the extra volumes to all the containers of a pod. It's not like secrets, so most likely only one container will need a given volume. Maybe it makes sense to mount them only in the Spilo container.

I used the same approach as for the secrets. A sidecar container can need the additional volume too. For example, I can use an initContainer that grabs a new certificate from my PKI and shares it via an emptyDir with the Spilo container and the telegraf sidecar (for TLS auth). That's typically why I'm doing this PR.

If I understand correctly from this part, if you define a volume for one sidecar, it will also be mounted to all other containers (including the Spilo container), which is probably undesired. Or am I missing something?

for i := range podSpec.Containers {
	mounts := podSpec.Containers[i].VolumeMounts
	for _, v := range additionalVolumes {
		mounts = append(mounts, v1.VolumeMount{
			Name:      v.Name,
			MountPath: v.MountPath,
			SubPath:   v.SubPath,
		})
	}
	podSpec.Containers[i].VolumeMounts = mounts
}
  • Currently only adding is implemented; what do we need to do to synchronize those volumes? I guess we can add/remove volume mounts, and for PersistentVolumeClaims it probably also makes sense to enable resizing, as implemented for the main volume. What do you think?

Since the mounted volume is not managed by the operator itself, I don't think we should handle resizing, because we don't know if the additional volume is ext4 or xfs (i.e. resize2fs vs. xfs_growfs).

Yes, this part is questionable, but I believe we still have to synchronize the volumes via add/remove at the very least, just to give a possibility to manage them via the manifest.

  • I guess we also need at least some minimal verification that the provided mounts do not clash in any way (including with the default mount for pgdata).

Yes! You're right, we should forbid overriding /var/lib/postgresql/data as a mount path.

Right, and we should also cross-check that the additional volumes don't clash among themselves.
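Such a verification could start as a small sketch like the following. This is a hypothetical helper, not the actual PR code: a simplified struct stands in for the manifest type, the reserved pgdata path is assumed to be /var/lib/postgresql/data, and duplicate mount paths are rejected outright even though Kubernetes allows them with distinct subPaths.

```go
package main

import "fmt"

// AdditionalVolume mirrors the fields discussed in this PR.
type AdditionalVolume struct {
	Name      string
	MountPath string
}

// reservedPaths are mount paths the operator already manages and that
// additional volumes must not override (pgdata being the obvious one).
var reservedPaths = map[string]bool{
	"/var/lib/postgresql/data": true,
}

// validateAdditionalVolumes rejects duplicate names, duplicate mount
// paths, and clashes with reserved operator-managed mount paths.
func validateAdditionalVolumes(vols []AdditionalVolume) error {
	names := map[string]bool{}
	paths := map[string]bool{}
	for _, v := range vols {
		if reservedPaths[v.MountPath] {
			return fmt.Errorf("volume %q uses reserved mount path %q", v.Name, v.MountPath)
		}
		if names[v.Name] {
			return fmt.Errorf("duplicate volume name %q", v.Name)
		}
		if paths[v.MountPath] {
			return fmt.Errorf("duplicate mount path %q", v.MountPath)
		}
		names[v.Name] = true
		paths[v.MountPath] = true
	}
	return nil
}

func main() {
	vols := []AdditionalVolume{
		{Name: "tls", MountPath: "/tls"},
		{Name: "empty", MountPath: "/opt/empty"},
	}
	fmt.Println(validateAdditionalVolumes(vols)) // <nil>
}
```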

  • Something is wrong with the dependencies; it doesn't build from the PR branch for me, which is not the case on master. Let's fix it and see how to make this feature complete :)

Is that why the CI tests are failing?

It's not clear to me yet, but it could be related. I would suggest first fixing the build (or at least checking whether it behaves the same in your environment), and then we can take a look at the CI.

@seuf (Contributor, Author) commented Dec 10, 2019

If I understand correctly from this part, if you define a volume for one sidecar, it will also be mounted to all other containers (including the Spilo container), which is probably undesired. Or am I missing something?

It is desired. In my use case I want to generate a certificate (.key and .pem) in an initContainer.
Then all the containers (including the Spilo one) mount an emptyDir at /tls shared with the initContainer containing the certificates.
I also specify in the postgresql manifest that I override the SSL config:

Example:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: demo-postgresql
  namespace: postgresql
  labels:
    environment: demo
spec:
  dockerImage: registry.opensource.zalan.do/acid/spilo-11:1.6-p1
  initContainers:
  - name: certs
    image: my-cert-image:latest
    imagePullPolicy: Always
    volumeMounts:
      - mountPath: /tls
        name: tls
  volume:
    size: 100Gi
    storageClass: pd-ssd
  additionalVolumes:
    - name: tls
      mountPath: /tls
      volumeSource:
        emptyDir: {}
  postgresql:
    version: "11"
    parameters:
      ssl_ca_file: "/tls/pki.ca.crt"
      ssl_key_file: "/tls/postgresql.key"
      ssl_cert_file: "/tls/postgresql.crt"

I can also add a telegraf sidecar. It will share the same volume, which is used to authenticate to my PostgreSQL over TLS.

@erthalion (Contributor)

If I understand correctly from this part, if you define a volume for one sidecar, it will also be mounted to all other containers (including the Spilo container), which is probably undesired. Or am I missing something?

It is desired. In my use case I want to generate a certificate (.key and .pem) in an initContainer.
Then all the containers (including the Spilo one) mount an emptyDir at /tls shared with the initContainer containing the certificates.

Interesting. Then it conflicts with the ideas in #625, since with this implementation one can't create, e.g., a separate volume for a single tablespace without it being mounted to all the sidecars as well. Is it possible to make this implementation more flexible to satisfy both use cases?

@seuf (Contributor, Author) commented Dec 10, 2019

Yes, I can add a sidecarMount boolean to the configuration!

@zimbatm (Contributor) commented Dec 10, 2019

@seuf: is your goal to handle custom TLS certs? Because I am implementing #690, which takes a slightly different approach and mounts on /tls.

@seuf (Contributor, Author) commented Dec 10, 2019

@seuf: is your goal to handle custom TLS certs? Because I am implementing #690, which takes a slightly different approach and mounts on /tls.

Yes. My initContainer also generates a Diffie-Hellman parameter .pem file and fetches the CA cert from my existing PKI.

@erthalion I've updated the PR with the sidecarMount boolean parameter.

@zimbatm (Contributor) commented Dec 10, 2019

@seuf what is your plan to manage certificate rotation?

@seuf (Contributor, Author) commented Dec 10, 2019

@seuf what is your plan to manage certificate rotation?

We have long-term certificates for stateful apps. So until the certificate expires, we can just bump the version of postgresql in the manifest and Patroni will do the rolling update with the new certificates :)

@erthalion (Contributor)

Yes, I can add a sidecarMount boolean to the configuration!

Thanks. But after thinking about this, I believe it would be less extensible. What about having an option, e.g. target or something similar, that can take either the value all (meaning mount to all sidecars) or the name of a particular sidecar to mount to? If missing, we mount by default only to the Spilo container. With this schema your implementation would not be much more complicated, would still suit your goals, and would be easily extensible by adding new values.

@erthalion (Contributor)

So, @seuf what do you think about this suggestion?

@seuf (Contributor, Author) commented Dec 18, 2019

So, @seuf what do you think about this suggestion?

The idea of naming the target container where the additional volume will be mounted is nice. But that means I need to handle a comma-separated list of containers in case I have multiple sidecars and want to mount an additional volume only in some of them. Also, the tests will be more complicated.

@erthalion (Contributor)

But that means I need to handle a comma-separated list of containers in case I have multiple sidecars and want to mount an additional volume only in some of them.

Nope, for that purpose we will use the special keyword all instead of a container name.

Also, the tests will be more complicated.

I'm not asking you to implement the target-container part; you can leave it empty. You can keep the current logic, just adjust the schema and the names a bit so this can be implemented in the future.

@seuf seuf force-pushed the additionnal-volumes-mount branch from 52ab39b to cb941af Compare December 20, 2019 12:48
@seuf (Contributor, Author) commented Dec 20, 2019

Hello @erthalion. I've updated the PR with a targetContainers option in the manifest. It takes an array of container names. If it contains all or is empty, the additional volumes will be mounted in all the containers.

The CI still fails with The command "hack/verify-codegen.sh" exited with 1. I've run the ./hack/update-codegen.sh script, but there are no changes to commit. Tests pass locally.
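The targetContainers semantics described here (the keyword all, or an empty list, means every container) can be sketched as follows. The function name is hypothetical and this is not the actual PR code, just an illustration of the agreed behavior:

```go
package main

import "fmt"

// shouldMount reports whether a volume with the given targetContainers
// list should be mounted into the container named name. An empty list
// or a list containing the special keyword "all" matches every container.
func shouldMount(targetContainers []string, name string) bool {
	if len(targetContainers) == 0 {
		return true
	}
	for _, t := range targetContainers {
		if t == "all" || t == name {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(shouldMount(nil, "postgres"))                  // true
	fmt.Println(shouldMount([]string{"all"}, "telegraf"))      // true
	fmt.Println(shouldMount([]string{"telegraf"}, "postgres")) // false
}
```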

@frittentheke (Contributor)

@seuf thanks for implementing this.

If I am not mistaken, this could also allow defining an emptyDir to hold the PostgreSQL unix socket, allowing a sidecar (e.g. a postgres exporter) to access the database without doing full TCP, SSL and authentication just to scrape some metrics.
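That socket-sharing idea could look roughly like the manifest fragment below. This is an untested sketch combining additionalVolumes with the operator's sidecars field; the sidecar name and image are made up:

```yaml
spec:
  additionalVolumes:
    - name: socket
      mountPath: /var/run/postgresql
      volumeSource:
        emptyDir: {}
  sidecars:
    - name: exporter                      # hypothetical sidecar name
      image: my-postgres-exporter:latest  # hypothetical image
```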

@FxKu FxKu modified the milestones: 1.4, 1.5 Feb 21, 2020
@frittentheke (Contributor)

@seuf thank you for this cool PR.

I pieced together some code and was about to create a PR myself just to allow sharing the PostgreSQL socket (/var/run/postgresql/) between the postgres container and, e.g., a monitoring sidecar. But your approach is much better, as it serves more than just this purpose.
Especially the generic approach of mounting secrets, PVCs or just an emptyDir is great and allows great customization without adding new fields to the CR.

Do you need any help / testing @seuf? Could this be merged for 1.5 @FxKu?

@FxKu (Member) commented Mar 27, 2020

@frittentheke @seuf yeah, we plan to merge it for the next version. It overlapped with #798, so we had to pick one PR first, and we went for TLS as it covered one specific use case. However, if the conflicts are resolved, I see a good chance it can be merged. @erthalion, how do you see it, as the reviewer?

@frittentheke frittentheke mentioned this pull request Apr 1, 2020
@seuf seuf force-pushed the additionnal-volumes-mount branch from 7bbb5e5 to 70eee0e Compare April 6, 2020 11:55
@seuf (Contributor, Author) commented Apr 6, 2020

I've rebased this branch onto master, but I can't run the tests successfully.
There is an error in teams_test.go :(

@FxKu (Member) commented Apr 7, 2020

We switched to Go 1.14 recently. I remember I had to add quotes in teams_test.go. The only other error I see is e2e again, but that's something we need to fix.

Comment on lines 516 to +517
volumes []v1.Volume,
additionalVolumes []acidv1.AdditionalVolume,
Member

Passing volumes[] twice is giving me some headache. The first one was introduced by @zimbatm for the TLS secrets. Maybe that could be mapped to additionalVolumes too, or just renamed to tlsVolumes. Anyway, this can be done in a separate PR.

Contributor

@FxKu -> would this mean that it's either TLS or additionalVolumes?
Our scenario would require both.

@FxKu (Member) commented Apr 9, 2020

No, I thought both could be mapped into one array. Hm, but maybe it's also better to keep them separated, to highlight the special role of the TLS secrets. At least it has its own field in the manifest.

@frittentheke (Contributor) commented Apr 9, 2020

@FxKu if there is new functionality available that fully covers a previously specially handled use case, there should be some sort of deprecation. This seems to be one of those cases where the "old" functionality is simply extended upon, and folks can simply move to the new, more feature-rich way.

But this could also be the case for the PR implementing broader sidecar support (#890). Why have multiple things doing the same thing, creating ever more logic to "merge" them, and having to maintain these increasingly complicated code paths?

@ReSearchITEng (Contributor) commented Apr 15, 2020

The two have been merged; see below.
It can be pushed directly: #918.
@FxKu and Christian, have a look.
If you want, we can also merge via @seuf's branch seuf#1 instead, if/once it is approved by @seuf.
It's the same code in both cases.

@FxKu (Member) commented Apr 7, 2020

👍

@erthalion (Contributor)

👍

@FxKu FxKu merged commit ea3eef4 into zalando:master Apr 15, 2020
@FxKu (Member) commented Apr 15, 2020

Thanks @seuf for your contribution. Also thanks @gertvdijk, @zimbatm, @frittentheke, @muff1nman
and @ReSearchITEng for your comments :)
