Commit d014713

[pip][design] PIP 289: Secure Pulsar Connector Configuration (apache#20903)
1 parent d6734b7 commit d014713

1 file changed

+242
-0
lines changed

pip/pip-289.md

Lines changed: 242 additions & 0 deletions
@@ -0,0 +1,242 @@
1+
# PIP 289: Secure Pulsar Connector Configuration
2+
# Background knowledge
3+
4+
Pulsar Sinks and Sources (a.k.a. Connectors) allow you to move data between a remote system and a Pulsar cluster. These remote systems often require authentication, which in turn requires secret management.
5+
6+
The current state of Pulsar Connector secret management is fragmented, is not documented in the "Pulsar IO" docs, and is not possible in certain cases. This PIP aims to address these issues through several changes.
7+
8+
The easiest way to show the current shortcomings is by example.
9+
10+
## Elasticsearch Example
11+
Here is the current way to deploy an Elasticsearch Sink without the use of plaintext secrets:
12+
13+
```shell
$ bin/pulsar-admin sinks create \
  --tenant public \
  --namespace default \
  --sink-type elastic_search \
  --name elasticsearch-test-sink \
  --sink-config '{"elasticSearchUrl":"http://localhost:9200","indexName": "my_index"}' \
  --secrets '{"username": {"MY-K8S-SECRET-USERNAME": "secret-name"},"password": {"MY-K8S-SECRET-PASSWORD": "password123"}}' \
  --inputs elasticsearch_test
```
23+
24+
When run targeting Kubernetes, the above works by mounting secrets `MY-K8S-SECRET-USERNAME` and `MY-K8S-SECRET-PASSWORD` into the sink pod container as [environment variables](https://github.com/apache/pulsar/blob/82237d3684fe506bcb6426b3b23f413422e6e4fb/pulsar-functions/secrets/src/main/java/org/apache/pulsar/functions/secretsproviderconfigurator/KubernetesSecretsProviderConfigurator.java#L85-L99):
25+
26+
```shell
username=secret-name
password=password123
```
30+
31+
Those environment variables are then [injected](https://github.com/apache/pulsar/blob/674655347da95305cf671f0696f113dcca88b44d/pulsar-io/common/src/main/java/org/apache/pulsar/io/common/IOConfigUtils.java#L67-L78) into the config when it is loaded at runtime based on [annotations](https://github.com/apache/pulsar/blob/b7eab9469177eda2c56e36bb9871aab48a17d4ec/pulsar-io/elastic-search/src/main/java/org/apache/pulsar/io/elasticsearch/ElasticSearchConfig.java#L99-L113) on the `ElasticSearchConfig`.
32+
33+
### Problem
34+
35+
The annotation approach, which is the only way to inject secrets into connectors, requires that all secret fields are annotated with `sensitive = true` and that all secret fields are at the top level of their configuration class. However, the Elasticsearch config contains an `ssl` field that has nested secrets. See:
36+
37+
```json
{
  "elasticSearchUrl": "http://localhost:9200",
  "indexName": "my_index",
  "username": "username",
  "password": "password",
  "ssl": {
    "enabled": true,
    "truststorePath": "/pulsar/security/truststore.jks",
    "truststorePassword": "truststorepass",
    "keystorePath": "/pulsar/security/keystore.jks",
    "keystorePassword": "keystorepass"
  }
}
```
52+
53+
Because `truststorePassword` and `keystorePassword` are not at the top level, we do not currently have a secure way (i.e. non-plaintext) to configure those settings.
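
To make the limitation concrete, here is a minimal, self-contained sketch of a loader that only inspects top-level fields. It is not the actual `IOConfigUtils` code: the `Sensitive` annotation, the two config classes, and `injectTopLevel` are hypothetical stand-ins, and the `secrets` function stands in for an environment variable lookup.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.util.function.Function;

final class TopLevelSecretInjectionSketch {
    /** Hypothetical stand-in for the `sensitive = true` flag on a connector config field. */
    @Retention(RetentionPolicy.RUNTIME)
    @interface Sensitive {}

    /** Hypothetical nested config, loosely modeled on the `ssl` block above. */
    static class SslConfig {
        @Sensitive String keystorePassword;
    }

    /** Hypothetical top-level connector config. */
    static class SinkConfig {
        @Sensitive String password;
        SslConfig ssl = new SslConfig();
    }

    /** Injects secrets into annotated fields declared directly on the config class. */
    static void injectTopLevel(Object config, Function<String, String> secrets) {
        for (Field field : config.getClass().getDeclaredFields()) {
            if (!field.isAnnotationPresent(Sensitive.class)) {
                continue; // `ssl` is skipped here, so its inner fields are never inspected
            }
            String secret = secrets.apply(field.getName());
            if (secret == null) {
                continue; // no secret available: keep whatever value is already set
            }
            try {
                field.setAccessible(true);
                field.set(config, secret);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }
}
```

With a lookup that knows both `password` and `keystorePassword`, this loader fills in `password` but leaves `ssl.keystorePassword` unset, because the nested object is never traversed.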
54+
55+
## RabbitMQ Example
56+
57+
Another relevant example shows how the Pulsar code base has not consistently implemented secret management for connectors. For the RabbitMQ Sink, the sensitive fields are [annotated correctly](https://github.com/apache/pulsar/blob/82237d3684fe506bcb6426b3b23f413422e6e4fb/pulsar-io/rabbitmq/src/main/java/org/apache/pulsar/io/rabbitmq/RabbitMQAbstractConfig.java#L61-L73), but the configuration is not loaded via the `IOConfigUtils#loadWithSecrets` method, which means the only way to load RabbitMQ secrets is as plaintext values in the config.
58+
59+
## Kafka Connect Adapter Example
60+
61+
The final relevant example is the Kafka Connect Adapter. This adapter allows you to run Kafka Connectors in Pulsar Connectors. Because of the recursive nature of these connectors, the configuration for the wrapped connector is stored in a map named [kafkaConnectorConfigProperties](https://github.com/apache/pulsar/blob/55523ac8f31fd6d54aacba326edef1f53028877e/pulsar-io/kafka-connect-adaptor/src/main/java/org/apache/pulsar/io/kafka/connect/PulsarKafkaConnectSinkConfig.java#L59-L62). Because this field is an arbitrary map, we cannot rely on the Pulsar `sensitive` annotation flag to determine whether to load the secret when building the config class.
62+
63+
# Motivation
64+
65+
Increase Pulsar Function security by giving users a way to configure Pulsar Connectors with non-plaintext secrets.
66+
67+
The recent [CVE-2023-37579](https://github.com/apache/pulsar/wiki/CVE%E2%80%902023%E2%80%9037579) resulted in the potential to leak connector configurations. Because we do not always provide a way to supply sensitive settings through the connector's secrets map, leaking the configuration meant leaking secrets.
68+
69+
# Goals
70+
71+
## In Scope
72+
73+
* Provide users with a secure way to configure official Pulsar Connectors as well as third party connectors.
74+
* Improve documentation to reflect the current state of secrets management in Pulsar Connectors.
75+
* Only sinks and sources will benefit from this change.
76+
* Only the `JavaInstanceRunnable` class will benefit from this change.
77+
78+
## Out of Scope
79+
80+
* This PIP will not prevent users from configuring secrets via insecure methods, such as plaintext configuration.
81+
* Functions are out of scope because they do not need arbitrary secret injection. Functions can already access secrets through the `Context#getSecret` method.
82+
* Python and Go Function Runtimes are out of scope because sinks and sources are not typically written in these languages.
83+
84+
# High Level Design
85+
86+
* Add a new secrets injection mechanism which allows for arbitrary secret injection into the connector configuration at runtime.
87+
* Update existing, official connectors to properly use the already available secret injection mechanism.
88+
* Fix the documentation for the existing secrets management methods.
89+
90+
# Detailed Design
91+
92+
## Design & Implementation Details
93+
94+
In order to add a new way to inject, or interpolate, secrets, we need to add a new method to the `SecretsProvider` interface, which can be implemented by users, but is not exposed to function/connector runtimes. This new method will be used to first determine if a secret should be interpolated for a given value, and if so, return the interpolated value. If the value is not a secret, or the secret does not exist, the method will return `null` and no interpolation will occur. The notable difference for this method is that it does not have a "path" to the secret. Therefore, the existing `secrets` map might not apply for certain use cases. In the environment variable scenario, this is a natural fit because the `value` can be interpreted as the name of the environment variable. For usage of the new configuration mechanism, see the [cli](#cli) section.
95+
96+
In the event of a value collision between the old way and this new way to inject secrets, the old way will take precedence.
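
As a rough illustration of how interpolation could reach nested values, the sketch below walks a configuration map recursively and replaces any string for which a resolver returns a non-null secret. The class and method names are hypothetical, and the `resolver` function is an assumed stand-in for `SecretsProvider#interpolateSecretForValue`; the actual runtime wiring may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class ConfigInterpolationSketch {
    /**
     * Returns a copy of the config in which every string value, at any nesting depth,
     * is replaced by the resolver's result when that result is non-null.
     */
    static Map<String, Object> interpolate(Map<String, Object> config,
                                           Function<String, String> resolver) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : config.entrySet()) {
            result.put(entry.getKey(), interpolateValue(entry.getValue(), resolver));
        }
        return result;
    }

    @SuppressWarnings("unchecked")
    private static Object interpolateValue(Object value, Function<String, String> resolver) {
        if (value instanceof String) {
            String secret = resolver.apply((String) value);
            // A null result means "not a secret reference" or "secret not found":
            // the original value is left untouched.
            return secret != null ? secret : value;
        }
        if (value instanceof Map) {
            // Recurse so that nested blocks such as `ssl` are covered too.
            return interpolate((Map<String, Object>) value, resolver);
        }
        return value; // booleans, numbers, etc. pass through unchanged
    }
}
```

Returning a copy rather than mutating the input keeps the original configuration, which still holds only references such as `${keystorePassword}`, free of materialized secrets.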
97+
98+
In order to add support for the existing `sensitive` annotation, I propose fixing all the connectors that have explicit secrets in their configurations.
99+
100+
Fixing the documentation will be a matter of updating the existing documentation to reflect the current state of the code.
101+
102+
## Public-facing Changes
103+
104+
### Public API
105+
106+
#### Add new method to SecretsProvider Interface
107+
108+
Add the following method to the `SecretsProvider` interface:
109+
110+
```java
interface SecretsProvider {
    /**
     * If the passed value is formatted as a reference to a secret, as defined by the implementation, return the
     * referenced secret. If the value is not formatted as a secret reference or the referenced secret does not exist,
     * return null.
     *
     * @param value a config value that may be formatted as a reference to a secret
     * @return the materialized secret. Otherwise, null.
     */
    default String interpolateSecretForValue(String value) {
        return null;
    }
}
```
125+
126+
There are only two official implementations of the `SecretsProvider` interface: the `ClearTextSecretsProvider` and the `EnvironmentBasedSecretsProvider`. Given that the `ClearTextSecretsProvider` is only plaintext, it will not override the new method. Here is the proposed implementation for the `EnvironmentBasedSecretsProvider`:
127+
128+
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EnvironmentBasedSecretsProvider implements SecretsProvider {
    /**
     * Pattern to match ${secretName} in the value.
     */
    private static final Pattern interpolationPattern = Pattern.compile("\\$\\{(.+?)}");

    // Existing members, including provideSecret, are omitted here.

    @Override
    public String interpolateSecretForValue(String value) {
        Matcher m = interpolationPattern.matcher(value);
        // matches() only succeeds when the entire value is a single ${...} reference.
        if (m.matches()) {
            String secretName = m.group(1);
            // If the secret doesn't exist, we return null and don't override the current value.
            return provideSecret(secretName, null);
        }
        return null;
    }
}
```
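
The behavior of the pattern can be checked in isolation. The sketch below reuses the same regular expression but swaps the environment lookup for an injected function, so it runs without any environment variables set; the class name and the `env` parameter are assumptions made for illustration only.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InterpolationPatternSketch {
    // Same expression as the proposed EnvironmentBasedSecretsProvider.
    private static final Pattern INTERPOLATION_PATTERN = Pattern.compile("\\$\\{(.+?)}");

    /** Returns the looked-up secret, or null when the value is not a whole-value reference. */
    static String interpolate(String value, Function<String, String> env) {
        Matcher m = INTERPOLATION_PATTERN.matcher(value);
        if (m.matches()) { // the whole string must be "${name}"
            return env.apply(m.group(1));
        }
        return null;
    }
}
```

Because `Matcher#matches` requires the whole input to match, a value such as `prefix-${password}` is not treated as a secret reference and is left as-is. Only values that consist entirely of one `${...}` reference are candidates for interpolation.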
147+
148+
### Binary protocol
149+
150+
No change.
151+
152+
### Configuration
153+
154+
There is no new configuration for this change. It is always enabled.
155+
156+
### CLI
157+
158+
* Here is the new way that users will map secrets into nested configs:
159+
160+
```bash
$ bin/pulsar-admin sinks create \
  --tenant public \
  --namespace default \
  --sink-type elastic_search \
  --name elasticsearch-test-sink \
  --sink-config '{
    "elasticSearchUrl": "http://localhost:9200",
    "indexName": "my_index",
    "username": "${username}",
    "password": "${password}",
    "ssl": {
      "enabled": true,
      "truststorePath": "/pulsar/security/truststore.jks",
      "truststorePassword": "${truststorepass}",
      "keystorePath": "/pulsar/security/keystore.jks",
      "keystorePassword": "${keystorePassword}"
    }
  }' \
  --secrets '{"username": {"MY-K8S-SECRET-USERNAME": "secret-name"},"password": {"MY-K8S-SECRET-PASSWORD": "password123"},"keystorePassword": {"MY-K8S-KEYSTORE-PASS": "xyz"},"truststorepass": {"MY-K8S-TRUSTSTORE-PASS": "abc"}}' \
  --inputs elasticsearch_test
```
181+
182+
### Metrics
183+
184+
No new metrics are added by this change.
185+
186+
# Monitoring
187+
188+
Not applicable.
189+
190+
# Security Considerations
191+
192+
The primary security consideration is whether there is any risk in giving users a way to interpolate environment variables into their connector. This change only affects the `EnvironmentBasedSecretsProvider`, which is only used by the Kubernetes Function runtime. As such, there are no environment variables to leak. Further, all connectors have access to their environment variables, so no additional risk is present.
193+
194+
# Backward & Forward Compatibility
195+
196+
## Revert
197+
198+
Reverting this change is as simple as downgrading the function worker and stopping then starting the function.
199+
200+
## Upgrade
201+
202+
To upgrade, upgrade the function worker, then stop and start the function. Users will also need to update their connector configuration to use the new syntax.
203+
204+
# Alternatives
205+
206+
While exploring this PIP, I considered several alternatives.
207+
208+
### Merge Secret Map into Config Map
209+
210+
Attempt to merge all secrets configured for the connector into the connector's configuration. See https://github.com/apache/pulsar/pull/20863 for an example of this approach.
211+
212+
The primary issue with this design is the fact that the secrets map configured for a connector is of type `Map<String, Object>` where the keys are meant to be top level fields in the connector configuration and the values are paths to the secrets. As such, we cannot use the secrets map to recursively inject secrets into the config, which is a requirement for some connectors.
213+
214+
### Directly Inject Secrets into Config Map Based on Value Prefix
215+
216+
We could consider interpreting configuration values that start with a well known prefix, like `env:`, as values that need to be read from the environment. The primary drawback to this solution is that there is not an easy way to configure the function at this point in the code, which means that it is always on.
217+
218+
This solution would look something like adding this code block
219+
220+
```java
// Replace environment variable pointers with their environment variable values
for (Map.Entry<String, Object> entry : config.entrySet()) {
    if (entry.getValue() instanceof String && ((String) entry.getValue()).toLowerCase().startsWith("env:")) {
        String envVariableName = ((String) entry.getValue()).substring("env:".length());
        String envVariableValue = System.getenv(envVariableName);
        entry.setValue(envVariableValue);
    }
}
```
230+
231+
to this method: https://github.com/apache/pulsar/blob/f7c0b3c49c9ad8c28d0b00aa30d727850eb8bc04/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/instance/JavaInstanceRunnable.java#L884-L929.
232+
233+
# General Notes
234+
235+
# Links
236+
237+
* Initial Issue exploring this feature: https://github.com/apache/pulsar/issues/20862
238+
* PR for new interpolation feature: https://github.com/apache/pulsar/pull/20901
239+
* PR for correcting `sensitive` annotation flag handling: https://github.com/apache/pulsar/pull/20902
240+
* Rejected PR for merging secrets map into config map: https://github.com/apache/pulsar/pull/20863
241+
* Mailing List discussion thread: https://lists.apache.org/thread/xdmhp6zpwto2dyrf1xwk7fhd2cr69xtn
242+
* Mailing List voting thread: https://lists.apache.org/thread/ww88z811bpnzpcdf8popvg4njn6d07jt
