
Conversation

@seuf (Contributor) commented Nov 26, 2019

Additional Volumes

This pull request allows the operator to mount additional volumes in the postgresql StatefulSet.
A new key is available in the postgresql manifest: additionalVolumes.
For each volume listed, a volume is added to the podSpec and a volumeMount to each container of the pod.
Each item must contain a name, a mountPath and a volumeSource definition.
The volumeSource must be a Kubernetes volume declaration. This allows you to mount an existing persistentVolumeClaim, a configMap, or an emptyDir shared with the initContainer.

Example:

  additionalVolumes:
    - name: data
      mountPath: /var/lib/postgresql
      volumeSource:
        persistentVolumeClaim:
          claimName: pvc-postgresql-data
          readOnly: false
    - name: tmp
      mountPath: /tmp
      subPath: foo.txt
      volumeSource:
        configMap:
          name: my-config-map
    - name: empty
      mountPath: /opt/empty
      volumeSource:
        emptyDir: {}

Related issues:

@seuf seuf force-pushed the additionnal-volumes-mount branch from 6938533 to bf4455e Compare December 5, 2019 13:30
@erthalion (Contributor) commented Dec 6, 2019

Thanks for the PR, and sorry for the late reply. I like the idea, since I'm almost sure it can enable more interesting approaches. A few comments about the current implementation:

  • It's not clear to me why we mount all the extra volumes to all the containers of a pod. It's not like secrets, so most likely only one container will need a given volume. Maybe it makes sense to mount them only in the Spilo container.

  • Currently only adding is implemented; what do we need to do to synchronize those volumes? I guess we can add/remove volume mounts, and for PersistentVolumeClaims it probably also makes sense to enable resizing, as implemented for the main volume. What do you think?

  • I guess we also need at least some minimal verification that the provided mounts do not clash in any way (including with the default mount for pgdata).

  • This change seems unrelated:

@@ -1334,11 +1366,11 @@ func (c *Cluster) generateCloneEnvironment(description *acidv1.CloneDescription)
                        c.logger.Info(msg, description.S3WalPath)

                        envs := []v1.EnvVar{
-                               v1.EnvVar{
+                               {
                                        Name:  "CLONE_WAL_S3_BUCKET",
                                        Value: c.OpConfig.WALES3Bucket,
                                },
-                               v1.EnvVar{
+                               {
                                        Name:  "CLONE_WAL_BUCKET_SCOPE_SUFFIX",
                                        Value: getBucketScopeSuffix(description.UID),
                                },
  • It would be nice to see a few simple unit tests for this functionality.

  • Something is wrong with the dependencies; it doesn't build from the PR branch for me, which is not the case on master. Let's fix it and see how to make this feature complete :)
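The unit tests asked for above could start as a small self-contained sketch like the one below. It is only an illustration, not the actual PR code: the helper name is hypothetical, and simplified local structs stand in for the real k8s.io/api/core/v1 types.

```go
package main

import "fmt"

// Minimal stand-ins for the Kubernetes API types, so the sketch is
// self-contained (the real code uses k8s.io/api/core/v1).
type VolumeMount struct {
	Name      string
	MountPath string
}

type Container struct {
	Name         string
	VolumeMounts []VolumeMount
}

// mountAdditionalVolumes appends one mount per additional volume to
// every container, mirroring the behavior under discussion in this PR.
func mountAdditionalVolumes(containers []Container, vols []VolumeMount) {
	for i := range containers {
		containers[i].VolumeMounts = append(containers[i].VolumeMounts, vols...)
	}
}

func main() {
	// A table-driven unit test would assert along these lines.
	containers := []Container{{Name: "postgres"}, {Name: "telegraf"}}
	vols := []VolumeMount{{Name: "tls", MountPath: "/tls"}}
	mountAdditionalVolumes(containers, vols)
	for _, c := range containers {
		fmt.Println(c.Name, len(c.VolumeMounts)) // every container gets the mount
	}
}
```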

@FxKu FxKu mentioned this pull request Dec 9, 2019
@seuf (Contributor, Author) commented Dec 9, 2019

* It's not clear to me why we mount all the extra volumes to all the containers of a pod. It's not like secrets, so most likely only one container will need a given volume. Maybe it makes sense to mount them only in the Spilo container.

I used the same approach as for the secrets. A sidecar container can need the additional volume too. For example, I can use an initContainer that grabs a new certificate from my PKI and shares it via an emptyDir with the Spilo container and the telegraf sidecar (for TLS auth). That's typically why I'm doing this PR.

* Currently only adding is implemented; what do we need to do to synchronize those volumes? I guess we can add/remove volume mounts, and for PersistentVolumeClaims it probably also makes sense to enable resizing, as implemented for the main volume. What do you think?

Since the mounted volume is not managed by the operator itself, I don't think we should handle resizing, because we don't know if the additional volume is ext4 or xfs (i.e. resize2fs vs. xfs_growfs).

* I guess we also need at least some minimal verification that the provided mounts do not clash in any way (including with the default mount for pgdata).

Yes! You're right, we should forbid overriding /var/lib/postgresql/data as a mount path.

* This change seems unrelated:

Yes, I know. It's because I ran go fmt. I can revert this change.

* It would be nice to see a few simple unit tests for this functionality.

I'll try to add one.

* Something is wrong with the dependencies; it doesn't build from the PR branch for me, which is not the case on master. Let's fix it and see how to make this feature complete :)

Is that why the CI tests are failing?

@seuf seuf force-pushed the additionnal-volumes-mount branch 2 times, most recently from ef74187 to 03674b7 Compare December 9, 2019 16:43
@erthalion (Contributor)

  • It's not clear to me why we mount all the extra volumes to all the containers of a pod. It's not like secrets, so most likely only one container will need a given volume. Maybe it makes sense to mount them only in the Spilo container.

I used the same approach as for the secrets. A sidecar container can need the additional volume too. For example, I can use an initContainer that grabs a new certificate from my PKI and shares it via an emptyDir with the Spilo container and the telegraf sidecar (for TLS auth). That's typically why I'm doing this PR.

If I understand correctly from this part, if you define a volume for one sidecar, it will also be mounted to all other containers (including the Spilo container), which is probably undesired. Or am I missing something?

for i := range podSpec.Containers {
	mounts := podSpec.Containers[i].VolumeMounts
	for _, v := range additionalVolumes {
		mounts = append(mounts, v1.VolumeMount{
			Name:      v.Name,
			MountPath: v.MountPath,
			SubPath:   v.SubPath,
		})
	}
	podSpec.Containers[i].VolumeMounts = mounts
}
  • Currently only adding is implemented; what do we need to do to synchronize those volumes? I guess we can add/remove volume mounts, and for PersistentVolumeClaims it probably also makes sense to enable resizing, as implemented for the main volume. What do you think?

Since the mounted volume is not managed by the operator itself, I don't think we should handle resizing, because we don't know if the additional volume is ext4 or xfs (i.e. resize2fs vs. xfs_growfs).

Yes, this part is questionable, but I believe we still have to synchronize the volumes via add/remove at the very least, just to give a possibility to manage them via the manifest.

  • I guess we also need at least some minimal verification that the provided mounts do not clash in any way (including with the default mount for pgdata).

Yes! You're right, we should forbid overriding /var/lib/postgresql/data as a mount path.

Right, and we should also cross-check that the additional volumes don't clash among themselves.
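Such a verification could start as a small sketch like the following. This is a hypothetical helper, not the actual PR code: a simplified struct stands in for the manifest type, the reserved pgdata path is assumed to be /var/lib/postgresql/data, and duplicate mount paths are rejected outright even though Kubernetes allows them with distinct subPaths.

```go
package main

import "fmt"

// AdditionalVolume mirrors the fields discussed in this PR.
type AdditionalVolume struct {
	Name      string
	MountPath string
}

// reservedPaths are mount paths the operator already manages and that
// additional volumes must not override (pgdata being the obvious one).
var reservedPaths = map[string]bool{
	"/var/lib/postgresql/data": true,
}

// validateAdditionalVolumes rejects duplicate names, duplicate mount
// paths, and clashes with reserved operator-managed mount paths.
func validateAdditionalVolumes(vols []AdditionalVolume) error {
	names := map[string]bool{}
	paths := map[string]bool{}
	for _, v := range vols {
		if reservedPaths[v.MountPath] {
			return fmt.Errorf("volume %q uses reserved mount path %q", v.Name, v.MountPath)
		}
		if names[v.Name] {
			return fmt.Errorf("duplicate volume name %q", v.Name)
		}
		if paths[v.MountPath] {
			return fmt.Errorf("duplicate mount path %q", v.MountPath)
		}
		names[v.Name] = true
		paths[v.MountPath] = true
	}
	return nil
}

func main() {
	vols := []AdditionalVolume{
		{Name: "tls", MountPath: "/tls"},
		{Name: "empty", MountPath: "/opt/empty"},
	}
	fmt.Println(validateAdditionalVolumes(vols)) // <nil>
}
```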

  • Something is wrong with the dependencies; it doesn't build from the PR branch for me, which is not the case on master. Let's fix it and see how to make this feature complete :)

Is that why the CI tests are failing?

It's not clear to me yet, but it could be related. I would suggest first fixing the build (or at least checking whether it behaves the same in your environment), and then we can take a look at the CI.

@seuf (Contributor, Author) commented Dec 10, 2019

If I understand correctly from this part, if you define a volume for one sidecar, it will also be mounted to all other containers (including the Spilo container), which is probably undesired. Or am I missing something?

It is desired. In my use case I want to generate a certificate (.key and .pem) in an initContainer.
Then all the containers (including the Spilo one) mount an emptyDir at /tls shared with the initContainer containing the certificates.
I also specify in the postgresql manifest that I override the SSL config:

Example:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: demo-postgresql
  namespace: postgresql
  labels:
    environment: demo
spec:
  dockerImage: registry.opensource.zalan.do/acid/spilo-11:1.6-p1
  initContainers:
  - name: certs
    image: my-cert-image:latest
    imagePullPolicy: Always
    volumeMounts:
      - mountPath: /tls
        name: tls
  volume:
    size: 100Gi
    storageClass: pd-ssd
  additionalVolumes:
    - name: tls
      mountPath: /tls
      volumeSource:
        emptyDir: {}
  postgresql:
    version: "11"
    parameters:
      ssl_ca_file: "/tls/pki.ca.crt"
      ssl_key_file: "/tls/postgresql.key"
      ssl_cert_file: "/tls/postgresql.crt"

I can also add a telegraf sidecar. It will share the same volume, which is used to authenticate to my PostgreSQL over TLS.

@erthalion (Contributor)

If I understand correctly from this part, if you define a volume for one sidecar, it will also be mounted to all other containers (including the Spilo container), which is probably undesired. Or am I missing something?

It is desired. In my use case I want to generate a certificate (.key and .pem) in an initContainer.
Then all the containers (including the Spilo one) mount an emptyDir at /tls shared with the initContainer containing the certificates.

Interesting. Then it conflicts with the ideas in #625, since with this implementation one can't create, e.g., a separate volume for a single tablespace without it being mounted to all the sidecars as well. Is it possible to make this implementation more flexible to satisfy both use cases?

@seuf (Contributor, Author) commented Dec 10, 2019

Yes, I can add a sidecarMount boolean to the configuration!

@zimbatm (Contributor) commented Dec 10, 2019

@seuf: is your goal to handle custom TLS certs? Because I am implementing #690, which takes a slightly different approach and mounts on /tls.

@seuf (Contributor, Author) commented Dec 10, 2019

@seuf: is your goal to handle custom TLS certs? Because I am implementing #690, which takes a slightly different approach and mounts on /tls.

Yes. My initContainer also generates a Diffie-Hellman parameter .pem file and fetches the CA cert from my existing PKI.

@erthalion I've updated the PR with the sidecarMount boolean parameter.

@zimbatm (Contributor) commented Dec 10, 2019

@seuf what is your plan to manage certificate rotation?

@seuf (Contributor, Author) commented Dec 10, 2019

@seuf what is your plan to manage certificate rotation?

We have long-term certificates for stateful apps. So until the certificate expires, we can just bump the version of postgresql in the manifest and Patroni will do the rolling update with the new certificates :)

@erthalion (Contributor)

Yes, I can add a sidecarMount boolean to the configuration!

Thanks. But after thinking about this, I believe it would be less extensible. What about having an option, e.g. target or something similar, that can take either the value all (meaning mount to all sidecars) or the name of a particular sidecar to mount to? If missing, we mount by default only to the Spilo container. With this schema your implementation would not be much more complicated, would still suit your goals, and would be easily extensible by adding new values.

@erthalion (Contributor)

So, @seuf what do you think about this suggestion?

@seuf (Contributor, Author) commented Dec 18, 2019

So, @seuf what do you think about this suggestion?

The idea of naming the target container where the additional volume will be mounted is nice. But that means I need to handle a comma-separated list of containers in case I have multiple sidecars and want to mount an additional volume only in some of them. Also, the tests will be more complicated.

@erthalion (Contributor)

But that means I need to handle a comma-separated list of containers in case I have multiple sidecars and want to mount an additional volume only in some of them.

Nope, for that purpose we will use the special keyword all instead of a container name.

Also, the tests will be more complicated.

I'm not asking you to implement the target-container part; you can leave it empty. You can keep the current logic, just adjust the schema and the names a bit so this can be implemented in the future.

@seuf seuf force-pushed the additionnal-volumes-mount branch from 52ab39b to cb941af Compare December 20, 2019 12:48
@seuf (Contributor, Author) commented Dec 20, 2019

Hello @erthalion. I've updated the PR with a targetContainers option in the manifest. It takes an array of container names. If it contains all or is empty, the additional volumes will be mounted in all the containers.

The CI still fails with The command "hack/verify-codegen.sh" exited with 1. I've run the ./hack/update-codegen.sh script, but there are no changes to commit. Tests pass locally.
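The targetContainers semantics described here (the keyword all, or an empty list, means every container) can be sketched as follows. The function name is hypothetical and this is not the actual PR code, just an illustration of the agreed behavior:

```go
package main

import "fmt"

// shouldMount reports whether a volume with the given targetContainers
// list should be mounted into the container named name. An empty list
// or a list containing the special keyword "all" matches every container.
func shouldMount(targetContainers []string, name string) bool {
	if len(targetContainers) == 0 {
		return true
	}
	for _, t := range targetContainers {
		if t == "all" || t == name {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(shouldMount(nil, "postgres"))                  // true
	fmt.Println(shouldMount([]string{"all"}, "telegraf"))      // true
	fmt.Println(shouldMount([]string{"telegraf"}, "postgres")) // false
}
```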

@frittentheke (Contributor)

@seuf thanks for implementing this.

If I am not mistaken, this could also allow defining an emptyDir to hold the PostgreSQL unix socket, allowing a sidecar (e.g. a postgres exporter) to access the database without doing full TCP, SSL and authentication just to scrape some metrics.
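That socket-sharing idea could look roughly like the manifest fragment below. This is an untested sketch combining additionalVolumes with the operator's sidecars field; the sidecar name and image are made up:

```yaml
spec:
  additionalVolumes:
    - name: socket
      mountPath: /var/run/postgresql
      volumeSource:
        emptyDir: {}
  sidecars:
    - name: exporter                      # hypothetical sidecar name
      image: my-postgres-exporter:latest  # hypothetical image
```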

@FxKu FxKu modified the milestones: 1.4, 1.5 Feb 21, 2020
@frittentheke (Contributor)

@seuf thank you for this cool PR.

I pieced together some code and was about to create a PR myself just to allow sharing the PostgreSQL socket (/var/run/postgresql/) between the postgres container and, e.g., a monitoring sidecar. But your approach is much better, as it serves more than just this purpose.
Especially the generic approach of mounting secrets, PVCs or just an emptyDir is great and allows great customization without adding new fields to the CR.

Do you need any help / testing @seuf? Could this be merged for 1.5 @FxKu?

@FxKu (Member) commented Mar 27, 2020

@frittentheke @seuf yeah, we plan to merge it for the next version. It overlapped with #798, so we had to pick one PR first, and we went for TLS as it covered one specific use case. However, if the conflicts are resolved, I see a good chance it can be merged. @erthalion, how do you see it, as the reviewer?

@frittentheke frittentheke mentioned this pull request Apr 1, 2020
@seuf seuf force-pushed the additionnal-volumes-mount branch from 7bbb5e5 to 70eee0e Compare April 6, 2020 11:55
@seuf (Contributor, Author) commented Apr 6, 2020

I've rebased this branch onto master, but I can't run the tests successfully.
There is an error in teams_test.go :(

@FxKu (Member) commented Apr 7, 2020

We switched to Go 1.14 recently. I remember I had to add quotes in teams_test.go. The only other error I see is e2e again, but that's something we need to fix.

Comment on lines 516 to +517
volumes []v1.Volume,
additionalVolumes []acidv1.AdditionalVolume,
Member

Passing volumes[] twice is giving me some headache. The first one was introduced by @zimbatm for the TLS secrets. Maybe that could be mapped to additionalVolumes too, or just renamed to tlsVolumes. Anyway, this can be done in a separate PR.

Contributor

@FxKu -> would this mean that it's either TLS or additionalVolumes?
Our scenario would require both.

@FxKu (Member) commented Apr 9, 2020

No, I thought both could be mapped into one array. Hm, but maybe it's also better to keep them separated, to highlight the special role of the TLS secrets. At least it has its own field in the manifest.

@frittentheke (Contributor) commented Apr 9, 2020

@FxKu if there is new functionality available that fully covers a previously specially handled use case, there should be some sort of deprecation. This seems to be one of those cases where the "old" functionality is simply extended upon, and folks can simply move to the new, more feature-rich way.

But this could also be the case for the PR implementing broader sidecar support (#890). Why have multiple things doing the same thing, creating ever more logic to "merge" them, and having to maintain these increasingly complicated code paths?

@ReSearchITEng (Contributor) commented Apr 15, 2020

The two have been merged; see below.
It can be pushed directly: #918.
@FxKu and Christian, have a look.
If you want, we can also merge via @seuf's branch seuf#1 instead, if/once it is approved by @seuf.
It's the same code in both cases.

@FxKu (Member) commented Apr 7, 2020

👍

@erthalion (Contributor)

👍

@FxKu FxKu merged commit ea3eef4 into zalando:master Apr 15, 2020
@FxKu (Member) commented Apr 15, 2020

Thanks @seuf for your contribution. Also thanks @gertvdijk, @zimbatm, @frittentheke, @muff1nman
and @ReSearchITEng for your comments :)
