PrivateEndpoint(
    endpoint_name: str,
    project: typing.Optional[str] = None,
    location: typing.Optional[str] = None,
    credentials: typing.Optional[google.auth.credentials.Credentials] = None,
)
Represents a Vertex AI PrivateEndpoint resource.
Classes
PrivateServiceConnectConfig
PrivateServiceConnectConfig(
    project_allowlist: typing.Optional[typing.Sequence[str]] = None,
)
Represents a Vertex AI PrivateServiceConnectConfig resource.
Properties
create_time
Time this resource was created.
dedicated_endpoint_dns
The dedicated endpoint DNS for this Endpoint.
This property is only available if the dedicated endpoint is enabled. Otherwise, it returns None.
dedicated_endpoint_enabled
Whether the dedicated endpoint is enabled for this Endpoint.
This property is True if the dedicated endpoint is enabled.
display_name
Display name of this resource.
encryption_spec
Customer-managed encryption key options for this Vertex AI resource.
If this is set, then all resources created by this Vertex AI resource will be encrypted with the provided encryption key.
explain_http_uri
HTTP path to send explain requests to, used when calling PrivateEndpoint.explain()
gca_resource
The underlying resource proto representation.
health_http_uri
HTTP path to send health check requests to, used when calling PrivateEndpoint.health_check()
labels
User-defined labels containing metadata about this resource.
Read more about labels at https://goo.gl/xmQnxf
name
Name of this resource.
network
The full name of the Google Compute Engine network to which this Endpoint should be peered.
Takes the format projects/{project}/global/networks/{network}, where {project} is a project number, as in 12345, and {network} is a network name.
Private services access must already be configured for the network. If left unspecified, the Endpoint is not peered with any network.
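
The network name format described above can be checked before it is passed to the SDK. The following is a minimal sketch, not part of the SDK: `parse_network_name` and `NetworkName` are hypothetical names introduced for illustration, and the pattern assumes the project part is digits only, as the description states.

```python
import re
from typing import NamedTuple

# Pattern for projects/{project}/global/networks/{network}.
# Assumption: {project} is a project number, so only digits are accepted.
_NETWORK_RE = re.compile(
    r"^projects/(?P<project>\d+)/global/networks/(?P<network>[^/]+)$"
)


class NetworkName(NamedTuple):
    """Hypothetical container for the two parts of a network name."""

    project_number: str
    network: str


def parse_network_name(full_name: str) -> NetworkName:
    """Split a full Compute Engine network name into its parts."""
    match = _NETWORK_RE.match(full_name)
    if match is None:
        raise ValueError(
            "Expected projects/{project}/global/networks/{network}, "
            f"got {full_name!r}"
        )
    return NetworkName(match["project"], match["network"])
```

Calling `parse_network_name("projects/12345/global/networks/my_vpc")` returns the project number and network name as separate fields; a name that uses a project ID instead of a number is rejected.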
predict_http_uri
HTTP path to send prediction requests to, used when calling PrivateEndpoint.predict()
preview
Return an Endpoint instance with preview features enabled.
private_service_connect_config
The Private Service Connect configuration for this Endpoint.
resource_name
Fully qualified resource name.
traffic_split
A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel.
If a DeployedModel's ID is not listed in this map, then it receives no traffic.
The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not to accept any traffic at the moment.
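
The rule for a valid traffic split can be expressed as a small check. This is an illustrative sketch only; `validate_traffic_split` is a hypothetical helper and does not claim to reproduce the validation the service performs.

```python
from typing import Dict


def validate_traffic_split(traffic_split: Dict[str, int]) -> None:
    """Raise ValueError unless the map is empty or its values sum to 100."""
    if not traffic_split:
        # An empty map means the Endpoint accepts no traffic at the moment.
        return
    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError(
            f"Traffic percentages must add up to 100, got {total}."
        )
```

A DeployedModel ID that is missing from the map simply receives no traffic, so the check only looks at the values that are present.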
update_time
Time this resource was last updated.
Methods
PrivateEndpoint
PrivateEndpoint(
    endpoint_name: str,
    project: typing.Optional[str] = None,
    location: typing.Optional[str] = None,
    credentials: typing.Optional[google.auth.credentials.Credentials] = None,
)
Retrieves a PrivateEndpoint resource.
Example usage:
my_private_endpoint = aiplatform.PrivateEndpoint(
    endpoint_name="projects/123/locations/us-central1/endpoints/1234567891234567890"
)
or (when project and location are initialized)
my_private_endpoint = aiplatform.PrivateEndpoint(
    endpoint_name="1234567891234567890"
)
| Parameters | |
|---|---|
| Name | Description | 
| endpoint_name | str: Required. A fully-qualified endpoint resource name or endpoint ID. Example: "projects/123/locations/us-central1/endpoints/my_endpoint_id" or "my_endpoint_id" when project and location are initialized or passed. |
| project | str: Optional. Project to retrieve endpoint from. If not set, project set in aiplatform.init will be used. |
| location | str: Optional. Location to retrieve endpoint from. If not set, location set in aiplatform.init will be used. |
| credentials | auth_credentials.Credentials: Optional. Custom credentials to use to retrieve this endpoint. Overrides credentials set in aiplatform.init. |
| Exceptions | |
|---|---|
| Type | Description | 
| ValueError | If the Endpoint being retrieved is not a PrivateEndpoint. | 
| ImportError | If there is an issue importing the urllib3 package. |
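
The two accepted forms of endpoint_name (a fully-qualified resource name or a bare endpoint ID) relate to each other as sketched below. `to_endpoint_resource_name` is a hypothetical helper written for illustration; it is not how the SDK resolves names internally.

```python
import re
from typing import Optional

# Matches projects/{project}/locations/{location}/endpoints/{endpoint}.
_FULL_NAME_RE = re.compile(r"^projects/[^/]+/locations/[^/]+/endpoints/[^/]+$")


def to_endpoint_resource_name(
    endpoint_name: str,
    project: Optional[str] = None,
    location: Optional[str] = None,
) -> str:
    """Return a fully-qualified name, expanding a bare endpoint ID if needed."""
    if _FULL_NAME_RE.match(endpoint_name):
        return endpoint_name
    if not project or not location:
        raise ValueError(
            "project and location are required when endpoint_name is a bare ID."
        )
    return f"projects/{project}/locations/{location}/endpoints/{endpoint_name}"
```

In the SDK, the project and location would normally come from aiplatform.init or from the constructor arguments; here they are passed explicitly to keep the sketch self-contained.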
create
create(
    display_name: str,
    project: typing.Optional[str] = None,
    location: typing.Optional[str] = None,
    network: typing.Optional[str] = None,
    description: typing.Optional[str] = None,
    labels: typing.Optional[typing.Dict[str, str]] = None,
    credentials: typing.Optional[google.auth.credentials.Credentials] = None,
    encryption_spec_key_name: typing.Optional[str] = None,
    sync=True,
    private_service_connect_config: typing.Union[
        google.cloud.aiplatform.models.PrivateEndpoint.PrivateServiceConnectConfig,
        None,
        google.cloud.aiplatform_v1.types.service_networking.PrivateServiceConnectConfig,
    ] = None,
    enable_request_response_logging=False,
    request_response_logging_sampling_rate: typing.Optional[float] = None,
    request_response_logging_bq_destination_table: typing.Optional[str] = None,
    inference_timeout: typing.Optional[int] = None,
) -> google.cloud.aiplatform.models.PrivateEndpoint
Creates a new PrivateEndpoint.
Example usage:
For PSA based private endpoint:
my_private_endpoint = aiplatform.PrivateEndpoint.create(
    display_name="my_endpoint_name",
    project="my_project_id",
    location="us-central1",
    network="projects/123456789123/global/networks/my_vpc"
)
or (when project and location are initialized)
my_private_endpoint = aiplatform.PrivateEndpoint.create(
    display_name="my_endpoint_name",
    network="projects/123456789123/global/networks/my_vpc"
)
For PSC based private endpoint:
my_private_endpoint = aiplatform.PrivateEndpoint.create(
    display_name="my_endpoint_name",
    project="my_project_id",
    location="us-central1",
    private_service_connect_config=aiplatform.compat.types.service_networking.PrivateServiceConnectConfig(
        enable_private_service_connect=True,
        project_allowlist=["test-project"]),
)
or (when project and location are initialized)
my_private_endpoint = aiplatform.PrivateEndpoint.create(
    display_name="my_endpoint_name",
    private_service_connect_config=aiplatform.compat.types.service_networking.PrivateServiceConnectConfig(
        enable_private_service_connect=True,
        project_allowlist=["test-project"]),
)
| Parameters | |
|---|---|
| Name | Description | 
| private_service_connect_config | typing.Union[google.cloud.aiplatform.models.PrivateEndpoint.PrivateServiceConnectConfig, NoneType, google.cloud.aiplatform_v1.types.service_networking.PrivateServiceConnectConfig]: The Private Service Connect configuration, for example an aiplatform.compat.types.service_networking.PrivateServiceConnectConfig. |
| request_response_logging_sampling_rate | float: Optional. The request response logging sampling rate. If not set, default is 0.0. |
| request_response_logging_bq_destination_table | str: Optional. The request response logging BigQuery destination. If not set, will create a table with name: |
| inference_timeout | int: Optional. The prediction timeout, in seconds, for online predictions using cloud-based endpoints. This applies to either PSC endpoints, when private_service_connect_config is set, or dedicated endpoints, when dedicated_endpoint_enabled is true. |
| display_name | str: Required. The user-defined name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters. |
| project | str: Optional. Project in which to create the endpoint. If not set, project set in aiplatform.init will be used. |
| location | str: Optional. Location in which to create the endpoint. If not set, location set in aiplatform.init will be used. |
| network | str: Optional. The full name of the Compute Engine network to which this Endpoint will be peered. E.g. "projects/123456789123/global/networks/my_vpc". Private services access must already be configured for the network. If left unspecified, the network set with aiplatform.init will be used. Cannot be set together with private_service_connect_config. |
| description | str: Optional. The description of the Endpoint. |
| labels | Dict[str, str]: Optional. The labels with user-defined metadata to organize your Endpoints. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels. |
| credentials | auth_credentials.Credentials: Optional. Custom credentials to use to create this endpoint. Overrides credentials set in aiplatform.init. |
| encryption_spec_key_name | str: Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: |
| sync | bool: Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future and any downstream object will be immediately returned and synced when the Future has completed. |
| enable_request_response_logging | bool: Optional. Whether to enable request and response logging for this endpoint. |
| Exceptions | |
|---|---|
| Type | Description | 
| ValueError | A network must be instantiated when creating a PrivateEndpoint. |
| Returns | |
|---|---|
| Type | Description | 
| endpoint (aiplatform.PrivateEndpoint) | Created endpoint. | 
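
The parameter descriptions above imply a choice between two connectivity modes: PSA (a peered network) and PSC (a Private Service Connect configuration), which cannot be combined. The sketch below illustrates that decision under stated assumptions. `choose_connectivity` and its `default_network` argument (standing in for the network set with aiplatform.init) are hypothetical and are not SDK functions.

```python
from typing import Any, Optional


def choose_connectivity(
    network: Optional[str] = None,
    private_service_connect_config: Optional[Any] = None,
    default_network: Optional[str] = None,
) -> str:
    """Return "PSA" or "PSC" for the given create() style arguments."""
    if network and private_service_connect_config:
        raise ValueError(
            "network cannot be set together with private_service_connect_config."
        )
    if private_service_connect_config:
        return "PSC"
    # Assumption: fall back to the network configured at init time.
    if network or default_network:
        return "PSA"
    raise ValueError(
        "A network must be instantiated when creating a PrivateEndpoint."
    )
```

A check like this can run before calling create() so that a conflicting configuration fails early, without a round trip to the service.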
delete
delete(force: bool = False, sync: bool = True) -> None
Deletes this Vertex AI PrivateEndpoint resource. If force is set to True, all models on this PrivateEndpoint will be undeployed prior to deletion.
| Parameters | |
|---|---|
| Name | Description | 
| force | bool: Optional. If force is set to True, all deployed models on this Endpoint will be undeployed first. Default is False. |
| sync | bool: Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future and any downstream object will be immediately returned and synced when the Future has completed. |
| Exceptions | |
|---|---|
| Type | Description | 
| FailedPrecondition | If models are deployed on this Endpoint and force = False. | 
deploy
deploy(
    model: google.cloud.aiplatform.models.Model,
    deployed_model_display_name: typing.Optional[str] = None,
    machine_type: typing.Optional[str] = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    accelerator_type: typing.Optional[str] = None,
    accelerator_count: typing.Optional[int] = None,
    tpu_topology: typing.Optional[str] = None,
    service_account: typing.Optional[str] = None,
    explanation_metadata: typing.Optional[
        google.cloud.aiplatform_v1.types.explanation_metadata.ExplanationMetadata
    ] = None,
    explanation_parameters: typing.Optional[
        google.cloud.aiplatform_v1.types.explanation.ExplanationParameters
    ] = None,
    metadata: typing.Optional[typing.Sequence[typing.Tuple[str, str]]] = (),
    sync=True,
    disable_container_logging: bool = False,
    traffic_percentage: typing.Optional[int] = 0,
    traffic_split: typing.Optional[typing.Dict[str, int]] = None,
    reservation_affinity_type: typing.Optional[str] = None,
    reservation_affinity_key: typing.Optional[str] = None,
    reservation_affinity_values: typing.Optional[typing.List[str]] = None,
    spot: bool = False,
    system_labels: typing.Optional[typing.Dict[str, str]] = None,
    required_replica_count: typing.Optional[int] = 0,
    autoscaling_target_cpu_utilization: typing.Optional[int] = None,
    autoscaling_target_accelerator_duty_cycle: typing.Optional[int] = None,
    autoscaling_target_request_count_per_minute: typing.Optional[int] = None,
    autoscaling_target_pubsub_num_undelivered_messages: typing.Optional[int] = None,
    autoscaling_pubsub_subscription_labels: typing.Optional[
        typing.Dict[str, str]
    ] = None,
) -> None
Deploys a Model to the PrivateEndpoint.
Example usage:
PSA based private endpoint:
my_private_endpoint.deploy(
    model=my_model
)
PSC based private endpoint
psc_endpoint.deploy(
    model=first_model,
)
psc_endpoint.deploy(
    model=second_model,
    traffic_percentage=50,
)
psc_endpoint.deploy(
    model=third_model,
    traffic_split={
        'first_model_id': 40,
        'second_model_id': 30,
        '0': 30  # "0" is the key for the model being deployed
    },
)
| Parameters | |
|---|---|
| Name | Description | 
| deployed_model_display_name | str: Optional. The display name of the DeployedModel. If not provided upon creation, the Model's display_name is used. |
| machine_type | str: Optional. The type of machine. If no machine type is specified, the model is deployed with automatic resources. |
| min_replica_count | int: Optional. The minimum number of machine replicas this deployed model will always be deployed on. If traffic against it increases, it may dynamically be deployed onto more replicas, and as traffic decreases, some of these extra replicas may be freed. |
| max_replica_count | int: Optional. The maximum number of replicas this deployed model may be deployed on when the traffic against it increases. If the requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale the model to that many replicas is guaranteed (barring service outages). If traffic against the deployed model increases beyond what its replicas at maximum may handle, a portion of the traffic will be dropped. If this value is not provided, the larger value of min_replica_count or 1 will be used. If the value provided is smaller than min_replica_count, it will automatically be increased to min_replica_count. |
| accelerator_type | str: Optional. Hardware accelerator type. Must also set accelerator_count if used. One of ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_TESLA_K80, NVIDIA_TESLA_P100, NVIDIA_TESLA_V100, NVIDIA_TESLA_P4, NVIDIA_TESLA_T4 |
| accelerator_count | int: Optional. The number of accelerators to attach to a worker replica. |
| tpu_topology | str: Optional. The TPU topology to use for the DeployedModel. Required for CloudTPU multihost deployments. |
| service_account | str: The service account that the DeployedModel's container runs as. Specify the email address of the service account. If this service account is not specified, the container runs as a service account that doesn't have access to the resource project. Users deploying the Model must have the |
| explanation_metadata | aiplatform.explain.ExplanationMetadata: Optional. Metadata describing the Model's input and output for explanation. |
| explanation_parameters | aiplatform.explain.ExplanationParameters: Optional. Parameters to configure explaining for Model's predictions. For more details, see |
| metadata | Sequence[Tuple[str, str]]: Optional. Strings which should be sent along with the request as metadata. |
| traffic_percentage | int: Optional. Desired traffic to newly deployed model. Defaults to 0 if there are pre-existing deployed models. Defaults to 100 if there are no pre-existing deployed models. Defaults to 100 for PSA based private endpoint. Negative values should not be provided. Traffic of previously deployed models at the endpoint will be scaled down to accommodate new deployed model's traffic. Should not be provided if traffic_split is provided. |
| traffic_split | Dict[str, int]: Optional. Only supported by PSC based private endpoint. A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel. If a DeployedModel's ID is not listed in this map, then it receives no traffic. The traffic percentage values must add up to 100, or map must be empty if the Endpoint is to not accept any traffic at the moment. Key for model being deployed is "0". Should not be provided if traffic_percentage is provided. |
| reservation_affinity_type | str: Optional. The type of reservation affinity. One of NO_RESERVATION, ANY_RESERVATION, SPECIFIC_RESERVATION, SPECIFIC_THEN_ANY_RESERVATION, SPECIFIC_THEN_NO_RESERVATION |
| reservation_affinity_key | str: Optional. Corresponds to the label key of a reservation resource. To target a SPECIFIC_RESERVATION by name, use |
| reservation_affinity_values | List[str]: Optional. Corresponds to the label values of a reservation resource. This must be the full resource name of the reservation. Format: 'projects/{project_id_or_number}/zones/{zone}/reservations/{reservation_name}' |
| spot | bool: Optional. Whether to schedule the deployment workload on spot VMs. |
| system_labels | Dict[str, str]: Optional. System labels to apply to Model Garden deployments. System labels are managed by Google for internal use only. |
| required_replica_count | int: Optional. Number of required available replicas for the deployment to succeed. This field is only needed when partial model deployment/mutation is desired, with a value greater than or equal to 1 and fewer than or equal to min_replica_count. If set, the model deploy/mutate operation will succeed once available_replica_count reaches required_replica_count, and the rest of the replicas will be retried. |
| model | aiplatform.Model: Required. Model to be deployed. |
| sync | bool: Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future and any downstream object will be immediately returned and synced when the Future has completed. |
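
The traffic_percentage description says that traffic of previously deployed models is scaled down to make room for the new model. One way such proportional scaling could work is sketched below. This is an illustration, not the SDK's algorithm: `allocate_traffic` is a hypothetical helper, the rounding strategy is an assumption, and so is the choice to give the new model all traffic when no existing model serves any. The key "0" for the model being deployed follows the traffic_split description.

```python
from typing import Dict


def allocate_traffic(
    existing: Dict[str, int],
    traffic_percentage: int,
    new_model_key: str = "0",
) -> Dict[str, int]:
    """Scale existing shares so the new model receives traffic_percentage."""
    if not 0 <= traffic_percentage <= 100:
        raise ValueError("traffic_percentage must be between 0 and 100.")
    old_total = sum(existing.values())
    if old_total == 0:
        # Assumption: with no traffic-serving models, the new one takes it all.
        return {**{key: 0 for key in existing}, new_model_key: 100}
    remaining = 100 - traffic_percentage
    # Scale each existing share proportionally, rounding down.
    scaled = {
        key: share * remaining // old_total for key, share in existing.items()
    }
    # Rounding down can leave a few points unassigned; hand them out one at a
    # time, starting with the models that had the largest original share.
    leftover = remaining - sum(scaled.values())
    for key in sorted(existing, key=existing.get, reverse=True)[:leftover]:
        scaled[key] += 1
    scaled[new_model_key] = traffic_percentage
    return scaled
```

The result always sums to 100, so it is a valid value for traffic_split.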
direct_predict
direct_predict(
    inputs: typing.List,
    parameters: typing.Optional[typing.Dict] = None,
    timeout: typing.Optional[float] = None,
) -> google.cloud.aiplatform.models.Prediction
Makes a direct (gRPC) prediction against this Endpoint for a pre-built image.
| Parameters | |
|---|---|
| Name | Description | 
| inputs | List: Required. The inputs to the prediction call. A DeployedModel may have an upper limit on the number of instances it supports per request, and when it is exceeded the prediction call errors in case of AutoML Models, or, in case of customer created Models, the behaviour is as documented by that Model. The schema of any single instance may be specified via Endpoint's DeployedModels' [Model's][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata's][google.cloud.aiplatform.v1beta1.Model.predict_schemata] |
| parameters | Dict: Optional. The parameters that govern the prediction. The schema of the parameters may be specified via Endpoint's DeployedModels' [Model's][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata's][google.cloud.aiplatform.v1beta1.Model.predict_schemata] |
| timeout | Optional[float]: Optional. The timeout for this request in seconds. |
| Returns | |
|---|---|
| Type | Description | 
| prediction (aiplatform.Prediction) | The resulting prediction. | 
direct_predict_async
direct_predict_async(
    inputs: typing.List,
    *,
    parameters: typing.Optional[typing.Dict] = None,
    timeout: typing.Optional[float] = None
) -> google.cloud.aiplatform.models.Prediction
Makes an asynchronous direct (gRPC) prediction against this Endpoint for a pre-built image.
Example usage:
```
response = await my_endpoint.direct_predict_async(inputs=[...])
my_predictions = response.predictions
```
| Parameters | |
|---|---|
| Name | Description | 
| inputs | List: Required. The inputs to the prediction call. A DeployedModel may have an upper limit on the number of instances it supports per request, and when it is exceeded the prediction call errors in case of AutoML Models, or, in case of customer created Models, the behaviour is as documented by that Model. The schema of any single instance may be specified via Endpoint's DeployedModels' [Model's][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata's][google.cloud.aiplatform.v1beta1.Model.predict_schemata] |
| parameters | Dict: Optional. The parameters that govern the prediction. The schema of the parameters may be specified via Endpoint's DeployedModels' [Model's][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata's][google.cloud.aiplatform.v1beta1.Model.predict_schemata] |
| timeout | Optional[float]: Optional. The timeout for this request in seconds. |
| Returns | |
|---|---|
| Type | Description | 
| prediction (aiplatform.Prediction) | The resulting prediction. | 
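
Because direct_predict_async is awaitable, several requests can be issued concurrently with asyncio. The sketch below shows the pattern against a stand-in object so that it runs without network access. `FakeEndpoint` and `predict_batches` are hypothetical and only mimic the keyword arguments shown in the signature above; they are not part of the SDK.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, List, Optional


class FakeEndpoint:
    """Stand-in for an endpoint object; it echoes inputs back as predictions."""

    async def direct_predict_async(
        self,
        inputs: List[Any],
        *,
        parameters: Optional[dict] = None,
        timeout: Optional[float] = None,
    ) -> SimpleNamespace:
        await asyncio.sleep(0)  # Yield control, as a real network call would.
        return SimpleNamespace(predictions=list(inputs))


async def predict_batches(
    endpoint: Any, batches: List[List[Any]], timeout: Optional[float] = None
) -> List[List[Any]]:
    """Send one request per batch concurrently and collect the predictions."""
    tasks = [
        endpoint.direct_predict_async(inputs=batch, timeout=timeout)
        for batch in batches
    ]
    responses = await asyncio.gather(*tasks)
    return [response.predictions for response in responses]
```

With a real endpoint object in place of `FakeEndpoint`, `asyncio.run(predict_batches(endpoint, batches))` would drive the coroutine from synchronous code; results come back in the same order as the batches.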
direct_raw_predict
direct_raw_predict(
    method_name: str, request: bytes, timeout: typing.Optional[float] = None
) -> google.cloud.aiplatform.models.Prediction
Makes a direct (gRPC) prediction request using arbitrary headers for a custom container.
Example usage:
```
my_endpoint = aiplatform.Endpoint(ENDPOINT_ID)
response = my_endpoint.direct_raw_predict(method_name="...", request=b'...')
```
| Parameters | |
|---|---|
| Name | Description | 
| method_name | str: Fully qualified name of the API method being invoked to perform prediction. |
| request | bytes: The body of the prediction request in bytes. |
| timeout | Optional[float]: Optional. The timeout for this request in seconds. |
| Returns | |
|---|---|
| Type | Description | 
| prediction (aiplatform.Prediction) | The resulting prediction. | 
direct_raw_predict_async
direct_raw_predict_async(
    method_name: str, request: bytes, timeout: typing.Optional[float] = None
) -> google.cloud.aiplatform.models.Prediction
Makes an asynchronous direct (gRPC) prediction request for a custom container.
Example usage:
```
my_endpoint = aiplatform.Endpoint(ENDPOINT_ID)
response = await my_endpoint.direct_raw_predict_async(method_name="...", request=b'...')
```
| Parameters | |
|---|---|
| Name | Description | 
| method_name | str: Fully qualified name of the API method being invoked to perform prediction. |
| request | bytes: The body of the prediction request in bytes. |
| timeout | Optional[float]: Optional. The timeout for this request in seconds. |
| Returns | |
|---|---|
| Type | Description | 
| prediction (aiplatform.Prediction) | The resulting prediction. | 
explain
explain()
Make a prediction with explanations against this Endpoint.
Example usage:
response = my_endpoint.explain(instances=[...])
my_explanations = response.explanations
| Parameters | |
|---|---|
| Name | Description | 
| instances | List: Required. The instances to use as input to the prediction call. A DeployedModel may have an upper limit on the number of instances it supports per request, and when it is exceeded the prediction call errors in case of AutoML Models, or, in case of customer created Models, the behaviour is as documented by that Model. The schema of any single instance may be specified via Endpoint's DeployedModels' [Model's][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata's][google.cloud.aiplatform.v1beta1.Model.predict_schemata] |
| parameters | Dict: The parameters that govern the prediction. The schema of the parameters may be specified via Endpoint's DeployedModels' [Model's][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata's][google.cloud.aiplatform.v1beta1.Model.predict_schemata] |
| deployed_model_id | str: Optional. If specified, this ExplainRequest will be served by the chosen DeployedModel, overriding this Endpoint's traffic split. |
| timeout | float: Optional. The timeout for this request in seconds. |
| Returns | |
|---|---|
| Type | Description | 
| prediction (aiplatform.Prediction) | Prediction with returned predictions, explanations, and Model ID. | 
explain_async
explain_async(
    instances: typing.List[typing.Dict],
    *,
    parameters: typing.Optional[typing.Dict] = None,
    deployed_model_id: typing.Optional[str] = None,
    timeout: typing.Optional[float] = None
) -> google.cloud.aiplatform.models.Prediction
Make a prediction with explanations against this Endpoint.
Example usage:
```
response = await my_endpoint.explain_async(instances=[...])
my_explanations = response.explanations
```
| Parameters | |
|---|---|
| Name | Description | 
| instances | List: Required. The instances to use as input to the prediction call. A DeployedModel may have an upper limit on the number of instances it supports per request, and when it is exceeded the prediction call errors in case of AutoML Models, or, in case of customer created Models, the behaviour is as documented by that Model. The schema of any single instance may be specified via Endpoint's DeployedModels' [Model's][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata's][google.cloud.aiplatform.v1beta1.Model.predict_schemata] |
| parameters | Dict: The parameters that govern the prediction. The schema of the parameters may be specified via Endpoint's DeployedModels' [Model's][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata's][google.cloud.aiplatform.v1beta1.Model.predict_schemata] |
| deployed_model_id | str: Optional. If specified, this ExplainRequest will be served by the chosen DeployedModel, overriding this Endpoint's traffic split. |
| timeout | float: Optional. The timeout for this request in seconds. |
| Returns | |
|---|---|
| Type | Description | 
| prediction (aiplatform.Prediction) | Prediction with returned predictions, explanations, and Model ID. | 
health_check
health_check() -> bool
Makes a request to this PrivateEndpoint's health check URI. Must be called from within the network that this PrivateEndpoint is in. This is only supported by PSA based private endpoints.
Example usage:
if my_private_endpoint.health_check():
    print("PrivateEndpoint is healthy!")
| Exceptions | |
|---|---|
| Type | Description | 
| RuntimeError | If a model has not been deployed, a request cannot be made. |
| RuntimeError | If the endpoint is a PSC based private endpoint. |
| Returns | |
|---|---|
| Type | Description | 
| bool | Whether calls can be made to this PrivateEndpoint. |
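
Since health_check() returns a bool, it can be polled until the endpoint becomes reachable, for example right after a deployment. The following is a minimal sketch under stated assumptions: `wait_until_healthy` is a hypothetical helper that accepts any zero-argument callable, so a bound method such as my_private_endpoint.health_check could be passed in. The retry count and delay are arbitrary defaults chosen for illustration.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `attempts` times; return True once it succeeds."""
    for attempt in range(attempts):
        if check():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

The `sleep` parameter exists only so the waiting can be replaced in tests. Note that the RuntimeError cases listed above are not caught here and would propagate to the caller.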
invoke
invoke(
    request_path: str,
    body: bytes,
    headers: typing.Dict[str, str],
    deployed_model_id: typing.Optional[str] = None,
    stream: bool = False,
    timeout: typing.Optional[float] = None,
    endpoint_override: typing.Optional[str] = None,
) -> typing.Iterator[bytes]
Makes a prediction request for arbitrary paths.
Example usage:
my_endpoint = aiplatform.PrivateEndpoint(ENDPOINT_ID)
response = my_endpoint.invoke(
    request_path="/v1/chat/completions",
    body = json.dumps(DATA).encode("utf-8"),
    headers = {'Content-Type':'application/json'},
    endpoint_override="10.128.0.3",
)
status_code = response.status_code
results = json.dumps(response.text)
for stream_response in my_endpoint.invoke(
    request_path="/v1/chat/completions",
    body = json.dumps(DATA).encode("utf-8"),
    headers = {'Content-Type':'application/json'},
    stream=True,
    endpoint_override="10.128.0.3",
):
    stream_response_text = stream_response.decode('utf-8')
| Parameters | |
|---|---|
| Name | Description | 
| request_path | str: The request URL to the model server. The request path must be a string that starts with a forward slash. Root can't be accessed. |
| body | bytes: The body of the prediction request in bytes. This must not exceed 1.5 MB per request. |
| headers | Dict[str, str]: The header of the request as a dictionary. There are no restrictions on the header. |
| deployed_model_id | str: Optional. If specified, this InvokeRequest will be served by the chosen DeployedModel, overriding this Endpoint's traffic split. |
| stream | bool: If set to True, streaming will be enabled. |
| timeout | float: Optional. The timeout for this request in seconds. |
| endpoint_override | Optional[str]: The Private Service Connect endpoint's IP address or DNS that points to the endpoint's service attachment. |
| Exceptions | |
|---|---|
| Type | Description | 
| ValueError | If an endpoint override is not provided for PSC based endpoint. |
| ValueError | If an endpoint override is invalid for PSC based endpoint. |
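
The constraints on request_path and body, and the handling of streamed byte chunks, can be sketched with the standard library alone. The helpers below are hypothetical and only prepare or post-process values for invoke(); they do not call it. Reading "1.5 MB" as 1.5 MiB is an assumption.

```python
import codecs
import json
from typing import Iterable, Iterator

# Assumption: the 1.5 MB limit is interpreted as 1.5 * 1024 * 1024 bytes.
MAX_BODY_BYTES = int(1.5 * 1024 * 1024)


def validate_request_path(request_path: str) -> str:
    """Require a leading slash and reject the root path."""
    if not request_path.startswith("/") or request_path == "/":
        raise ValueError("request_path must start with '/' and not be root.")
    return request_path


def build_invoke_body(payload: dict) -> bytes:
    """Serialize a JSON payload to bytes and enforce the size limit."""
    body = json.dumps(payload).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(f"Body is {len(body)} bytes, over the limit.")
    return body


def decode_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Decode streamed UTF-8 chunks, even if a character spans two chunks."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Raises UnicodeDecodeError if the stream ended in the middle of a character.
    decoder.decode(b"", final=True)
```

An incremental decoder is used instead of calling `.decode('utf-8')` on each chunk because a chunk boundary may fall inside a multi-byte character.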
list
list(
    filter: typing.Optional[str] = None,
    order_by: typing.Optional[str] = None,
    project: typing.Optional[str] = None,
    location: typing.Optional[str] = None,
    credentials: typing.Optional[google.auth.credentials.Credentials] = None,
) -> typing.List[google.cloud.aiplatform.models.PrivateEndpoint]
List all PrivateEndpoint resource instances.
Example usage:
my_private_endpoints = aiplatform.PrivateEndpoint.list()
or
my_private_endpoints = aiplatform.PrivateEndpoint.list(
    filter='labels.my_label="my_label_value" OR display_name!="old_endpoint"',
)
| Parameters | |
|---|---|
| Name | Description | 
| filter | str: Optional. An expression for filtering the results of the request. For field names both snake_case and camelCase are supported. |
| order_by | str: Optional. A comma-separated list of fields to order by, sorted in ascending order. Use "desc" after a field name for descending. Supported fields: |
| project | str: Optional. Project to retrieve list from. If not set, project set in aiplatform.init will be used. |
| location | str: Optional. Location to retrieve list from. If not set, location set in aiplatform.init will be used. |
| credentials | auth_credentials.Credentials: Optional. Custom credentials to use to retrieve list. Overrides credentials set in aiplatform.init. |
| Returns | |
|---|---|
| Type | Description | 
| List[models.PrivateEndpoint] | A list of PrivateEndpoint resource objects. | 
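
Filter strings like the one in the example above can be assembled from smaller clauses. This is a small sketch with hypothetical helper names (`label_equals`, `display_name_not`, `any_of`); it only builds the string and assumes that keys and values contain no double quotes.

```python
def label_equals(key: str, value: str) -> str:
    """Clause matching resources whose label `key` equals `value`."""
    return f'labels.{key}="{value}"'


def display_name_not(value: str) -> str:
    """Clause excluding resources with the given display name."""
    return f'display_name!="{value}"'


def any_of(*clauses: str) -> str:
    """Combine clauses so that a resource matching any of them is returned."""
    return " OR ".join(clauses)
```

The resulting string would be passed as the filter argument of list().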
list_models
list_models() -> (
    typing.List[google.cloud.aiplatform_v1.types.endpoint.DeployedModel]
)
Returns a list of the models deployed to this Endpoint.
| Returns | |
|---|---|
| Type | Description | 
| deployed_models (List[aiplatform.gapic.DeployedModel]) | A list of the models deployed in this Endpoint. | 
predict
predict(
    instances: typing.List,
    parameters: typing.Optional[typing.Dict] = None,
    endpoint_override: typing.Optional[str] = None,
) -> google.cloud.aiplatform.models.Prediction
Make a prediction against this PrivateEndpoint using an HTTP request.
For a PSA based private endpoint, this method must be called within the network the PrivateEndpoint is peered to. Otherwise, the predict() call will fail with error code 404. To check, use PrivateEndpoint.network.
For a PSC based private endpoint, the project that the caller's credentials come from must be allowlisted.
Example usage:
PSA based private endpoint:
response = my_private_endpoint.predict(instances=[...], parameters={...})
my_predictions = response.predictions
PSC based private endpoint:
After creating a PSC Endpoint pointing to the endpoint's ServiceAttachment, use the PSC Endpoint IP address or DNS as endpoint_override.
psc_endpoint_address = "10.0.1.23"
or
psc_endpoint_address = "test.my.prediction"
response = my_private_endpoint.predict(instances=[...],
    endpoint_override=psc_endpoint_address)
my_predictions = response.predictions
| Parameters | |
|---|---|
| Name | Description | 
| instances | List: Required. The instances to use as input to the prediction call. Instance types must be JSON serializable. A DeployedModel may have an upper limit on the number of instances it supports per request, and when it is exceeded the prediction call errors in case of AutoML Models, or, in case of customer created Models, the behaviour is as documented by that Model. The schema of any single instance may be specified via Endpoint's DeployedModels' [Model's][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata's][google.cloud.aiplatform.v1beta1.Model.predict_schemata] |
| parameters | Dict: The parameters that govern the prediction. The schema of the parameters may be specified via Endpoint's DeployedModels' [Model's][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata's][google.cloud.aiplatform.v1beta1.Model.predict_schemata] |
| endpoint_override | Optional[str]: The Private Service Connect endpoint's IP address or DNS that points to the endpoint's service attachment. |
| Exceptions | |
|---|---|
| Type | Description | 
| RuntimeError | If no model has been deployed, a request cannot be made to a PSA-based endpoint. |
| ValueError | If an endpoint override is not provided for a PSC-based endpoint. |
| ValueError | If an endpoint override is invalid for a PSC-based endpoint. |
| Returns | |
|---|---|
| Type | Description | 
| prediction (aiplatform.Prediction) | Prediction object with returned predictions and Model ID. | 
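Since instance types must be JSON serializable, it can help to check instances locally before calling predict(). The following is a minimal sketch using only the standard library; find_unserializable is a hypothetical helper name and is not part of the SDK.

```python
import json
from typing import Any, List


def find_unserializable(instances: List[Any]) -> List[int]:
    """Return the indices of instances that json.dumps cannot encode.

    Hypothetical helper for illustration; not part of the SDK.
    """
    bad = []
    for index, instance in enumerate(instances):
        try:
            json.dumps(instance)
        except (TypeError, ValueError):
            bad.append(index)
    return bad


instances = [
    {"feat_1": 1.0, "feat_2": [1, 2]},  # plain JSON types: fine
    {"feat_1": {1, 2}},                 # a set is not JSON serializable
    {"feat_1": b"raw"},                 # bytes are not JSON serializable
]
bad_indices = find_unserializable(instances)
```

Any index reported here points at an instance that would need converting (for example, a set to a list) before it is sent.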
predict_async
predict_async(
    instances: typing.List,
    *,
    parameters: typing.Optional[typing.Dict] = None,
    timeout: typing.Optional[float] = None
) -> google.cloud.aiplatform.models.Prediction
Make an asynchronous prediction against this Endpoint.
Example usage:
response = await my_endpoint.predict_async(instances=[...])
my_predictions = response.predictions
| Parameters | |
|---|---|
| Name | Description | 
| instances | List: Required. The instances that are the input to the prediction call. A DeployedModel may have an upper limit on the number of instances it supports per request. When that limit is exceeded, the prediction call returns an error for AutoML Models; for customer-created Models, the behaviour is as documented by that Model. The schema of any single instance may be specified via Endpoint's DeployedModels' [Model's][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata's][google.cloud.aiplatform.v1beta1.Model.predict_schemata] |
| parameters | Dict: Optional. The parameters that govern the prediction. The schema of the parameters may be specified via Endpoint's DeployedModels' [Model's][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata's][google.cloud.aiplatform.v1beta1.Model.predict_schemata] |
| timeout | float: Optional. The timeout for this request in seconds. |
| Returns | |
|---|---|
| Type | Description | 
| prediction (aiplatform.Prediction) | Prediction with returned predictions and Model ID. | 
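Because a DeployedModel may cap the number of instances per request, one possible pattern is to split a large input list into chunks and await the requests concurrently. The sketch below is illustrative only: predict_in_chunks and fake_predict_async are hypothetical names, and fake_predict_async stands in for my_endpoint.predict_async so the example runs without a deployed model.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def predict_in_chunks(
    predict_async: Callable[..., Awaitable[Any]],
    instances: List[Any],
    chunk_size: int,
) -> List[Any]:
    """Send instances in chunks of chunk_size and gather the responses in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [
        instances[start:start + chunk_size]
        for start in range(0, len(instances), chunk_size)
    ]
    # asyncio.gather returns results in the same order as the awaitables passed in.
    responses = await asyncio.gather(
        *(predict_async(instances=chunk) for chunk in chunks)
    )
    return list(responses)


async def fake_predict_async(instances, parameters=None, timeout=None):
    """Stand-in for an endpoint's predict_async; it simply echoes its input."""
    await asyncio.sleep(0)
    return instances


results = asyncio.run(predict_in_chunks(fake_predict_async, [0, 1, 2, 3, 4], 2))
```

With a real endpoint, my_endpoint.predict_async would be passed in place of the stand-in, and each response would be a Prediction object rather than an echoed list.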
raw_predict
raw_predict(
    body: bytes,
    headers: typing.Dict[str, str],
    endpoint_override: typing.Optional[str] = None,
) -> requests.models.Response
Make a prediction request using arbitrary headers.
This method must be called from within the network the PrivateEndpoint is peered to.
Otherwise, the call fails with error code 404. To check the network, use PrivateEndpoint.network.
Example usage:
import json
from google.cloud import aiplatform

my_endpoint = aiplatform.PrivateEndpoint(ENDPOINT_ID)
# PSA-based private endpoint
response = my_endpoint.raw_predict(
    body=b'{"instances":[{"feat_1":val_1, "feat_2":val_2}]}',
    headers={'Content-Type': 'application/json'},
)
# PSC-based private endpoint
response = my_endpoint.raw_predict(
    body=b'{"instances":[{"feat_1":val_1, "feat_2":val_2}]}',
    headers={'Content-Type': 'application/json'},
    endpoint_override="10.1.0.23",
)
status_code = response.status_code
results = json.loads(response.text)  # parse the JSON response body
| Parameters | |
|---|---|
| Name | Description | 
| body | bytes: The body of the prediction request in bytes. This must not exceed 1.5 MB per request. |
| headers | Dict[str, str]: The headers of the request as a dictionary. There are no restrictions on the headers. |
| endpoint_override | Optional[str]: The Private Service Connect endpoint's IP address or DNS name that points to the endpoint's service attachment. |
| Exceptions | |
|---|---|
| Type | Description | 
| ValueError | If an endpoint override is not provided for a PSC-based endpoint. |
| ValueError | If an endpoint override is invalid for a PSC-based endpoint. |
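The body passed to raw_predict is raw bytes with a size limit, so it may be convenient to build and size-check it in one place. The sketch below is an assumption-laden illustration: build_raw_predict_body is a hypothetical helper, and the limit is interpreted as 1.5 * 1024 * 1024 bytes, which the reference text does not confirm.

```python
import json
from typing import Any, List

# Assumption: the documented "1.5 MB" limit is read here as 1.5 MiB.
MAX_RAW_PREDICT_BYTES = int(1.5 * 1024 * 1024)


def build_raw_predict_body(
    instances: List[Any], max_bytes: int = MAX_RAW_PREDICT_BYTES
) -> bytes:
    """Serialize instances into a JSON request body and enforce a size limit.

    Hypothetical helper for illustration; not part of the SDK.
    """
    body = json.dumps({"instances": instances}).encode("utf-8")
    if len(body) > max_bytes:
        raise ValueError(
            f"Request body is {len(body)} bytes, above the {max_bytes}-byte limit"
        )
    return body


body = build_raw_predict_body([{"feat_1": 1.0, "feat_2": 2.0}])
headers = {"Content-Type": "application/json"}
```

The resulting body and headers can then be passed as the body and headers arguments of raw_predict.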
stream_direct_predict
stream_direct_predict(
    inputs_iterator: typing.Iterator[typing.List],
    parameters: typing.Optional[typing.Dict] = None,
    timeout: typing.Optional[float] = None,
) -> typing.Iterator[google.cloud.aiplatform.models.Prediction]
Makes a streaming direct (gRPC) prediction against this Endpoint for a pre-built image.
| Parameters | |
|---|---|
| Name | Description | 
| inputs_iterator | Iterator[List]: Required. An iterator of the inputs that are the input to the prediction call. A DeployedModel may have an upper limit on the number of instances it supports per request. When that limit is exceeded, the prediction call returns an error for AutoML Models; for customer-created Models, the behaviour is as documented by that Model. The schema of any single instance may be specified via Endpoint's DeployedModels' [Model's][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata's][google.cloud.aiplatform.v1beta1.Model.predict_schemata] |
| parameters | Dict: Optional. The parameters that govern the prediction. The schema of the parameters may be specified via Endpoint's DeployedModels' [Model's][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata's][google.cloud.aiplatform.v1beta1.Model.predict_schemata] |
| timeout | Optional[float]: Optional. The timeout for this request in seconds. |
| Yields | |
|---|---|
| Type | Description |
| predictions (Iterator[aiplatform.Prediction]) | The resulting streamed predictions. |
stream_direct_raw_predict
stream_direct_raw_predict(
    method_name: str,
    requests: typing.Iterator[bytes],
    timeout: typing.Optional[float] = None,
) -> typing.Iterator[google.cloud.aiplatform.models.Prediction]
Makes a direct (gRPC) streaming prediction request for a custom container.
Example usage:
my_endpoint = aiplatform.Endpoint(ENDPOINT_ID)
for stream_response in my_endpoint.stream_direct_raw_predict(
    method_name=METHOD_NAME,  # placeholder: fully qualified name of the method to invoke
    requests=iter([b'...']),
):
    print(stream_response)
| Parameters | |
|---|---|
| Name | Description | 
| method_name | str: Fully qualified name of the API method being invoked to perform prediction. |
| requests | Iterator[bytes]: The body of the prediction requests in bytes. |
| timeout | Optional[float]: Optional. The timeout for this request in seconds. |
| Yields | |
|---|---|
| Type | Description |
| predictions (Iterator[aiplatform.Prediction]) | The resulting streamed predictions. |
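The requests argument is an iterator of byte strings, so a generator is a natural way to produce request bodies lazily. The sketch below assumes the custom container accepts JSON-encoded payloads, which depends entirely on the container; iter_request_bytes is a hypothetical helper name.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_request_bytes(payloads: Iterable[Dict[str, Any]]) -> Iterator[bytes]:
    """Lazily encode each payload as UTF-8 JSON bytes.

    Assumption: the custom container expects JSON request bodies.
    """
    for payload in payloads:
        yield json.dumps(payload).encode("utf-8")


requests_iterator = iter_request_bytes(
    [{"instances": [[1, 2]]}, {"instances": [[3, 4]]}]
)
encoded = list(requests_iterator)
```

In real use the generator itself, not the materialized list, would be passed as requests so that bodies are encoded only as they are sent.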
stream_raw_predict
stream_raw_predict(
    body: bytes,
    headers: typing.Dict[str, str],
    endpoint_override: typing.Optional[str] = None,
) -> typing.Iterator[bytes]
Make a streaming prediction request using arbitrary headers.
Example usage:
my_endpoint = aiplatform.PrivateEndpoint(ENDPOINT_ID)
# Prepare the request body
request_body = json.dumps({...}).encode('utf-8')
# Define the headers
headers = {
    'Content-Type': 'application/json',
}
# Use stream_raw_predict to send the request and process the response
for stream_response in my_endpoint.stream_raw_predict(
    body=request_body,
    headers=headers,
    endpoint_override="10.128.0.26"  # Replace with your actual endpoint
):
    stream_response_text = stream_response.decode('utf-8')
| Parameters | |
|---|---|
| Name | Description | 
| body | bytes: The body of the prediction request in bytes. This must not exceed 10 MB per request. |
| headers | Dict[str, str]: The headers of the request as a dictionary. There are no restrictions on the headers. |
| endpoint_override | Optional[str]: The Private Service Connect endpoint's IP address or DNS name that points to the endpoint's service attachment. |
| Yields | |
|---|---|
| Type | Description |
| predictions (Iterator[bytes]) | The streaming prediction results as lines of bytes. |
| Exceptions | |
|---|---|
| Type | Description | 
| ValueError | If an endpoint override is not provided for a PSC-based endpoint. |
| ValueError | If an endpoint override is invalid for a PSC-based endpoint. |
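The stream is yielded as lines of bytes, so each line usually needs decoding before use. The sketch below assumes, without confirmation from the reference, that most lines carry JSON; lines that do not parse are passed through as text. iter_decoded_chunks and fake_stream are hypothetical names, and fake_stream stands in for the iterator returned by stream_raw_predict.

```python
import json
from typing import Any, Iterable, Iterator


def iter_decoded_chunks(stream: Iterable[bytes]) -> Iterator[Any]:
    """Decode each streamed line, parsing JSON where possible.

    Empty lines are skipped; lines that are not valid JSON are yielded as text.
    """
    for raw_line in stream:
        text = raw_line.decode("utf-8").strip()
        if not text:
            continue
        try:
            yield json.loads(text)
        except json.JSONDecodeError:
            yield text


# Stand-in for the bytes yielded by stream_raw_predict.
fake_stream = [b'{"token": "Hel"}', b"", b'{"token": "lo"}', b"not json"]
chunks = list(iter_decoded_chunks(fake_stream))
```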
to_dict
to_dict() -> typing.Dict[str, typing.Any]
Returns the resource proto as a dictionary.
undeploy
undeploy(
    deployed_model_id: str,
    sync=True,
    traffic_split: typing.Optional[typing.Dict[str, int]] = None,
) -> None
Undeploys a deployed model from the PrivateEndpoint.
Example usage:
PSA-based private endpoint:
my_private_endpoint.undeploy(
    deployed_model_id="1234567891232567891"
)
or
my_deployed_model_id = my_private_endpoint.list_models()[0].id
my_private_endpoint.undeploy(
    deployed_model_id=my_deployed_model_id
)
| Parameters | |
|---|---|
| Name | Description | 
| traffic_split | Dict[str, int]: Optional. Only supported by PSC-based private endpoints. A map of DeployedModel IDs to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel. Required if undeploying a model with non-zero traffic from an Endpoint with multiple deployed models. The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not to accept any traffic at the moment. If a DeployedModel's ID is not listed in this map, then it receives no traffic. |
| deployed_model_id | str: Required. The ID of the DeployedModel to be undeployed from the PrivateEndpoint. Use PrivateEndpoint.list_models() to get the deployed model ID. |
| sync | bool: Whether to execute this method synchronously. If False, this method is executed in a concurrent Future, and any downstream object is returned immediately and synced when the Future has completed. |
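When a model that still receives traffic is undeployed from a PSC-based endpoint with several deployed models, a new traffic_split that sums to 100 has to be supplied. One possible way to derive it is to scale the remaining shares proportionally. The sketch below is only an illustration of that arithmetic; rebalance_after_undeploy is a hypothetical helper, and proportional scaling is a design choice, not something the SDK prescribes.

```python
from typing import Dict


def rebalance_after_undeploy(
    traffic_split: Dict[str, int], deployed_model_id: str
) -> Dict[str, int]:
    """Drop one model and rescale the remaining integer shares to sum to 100.

    Returns an empty map when no remaining model carries traffic, which the
    reference describes as an endpoint that accepts no traffic.
    """
    remaining = {
        model_id: share
        for model_id, share in traffic_split.items()
        if model_id != deployed_model_id
    }
    total = sum(remaining.values())
    if total == 0:
        return {}

    # Integer division keeps shares whole; remainders decide who gets the
    # leftover percentage points (largest remainder first).
    parts = {model_id: divmod(share * 100, total) for model_id, share in remaining.items()}
    result = {model_id: quotient for model_id, (quotient, _) in parts.items()}
    leftover = 100 - sum(result.values())
    by_remainder = sorted(parts, key=lambda model_id: parts[model_id][1], reverse=True)
    for model_id in by_remainder[:leftover]:
        result[model_id] += 1
    return result


new_split = rebalance_after_undeploy({"123": 20, "234": 50, "345": 30}, "234")
```

The returned map could then be passed as traffic_split alongside deployed_model_id in the undeploy() call.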
undeploy_all
undeploy_all(sync: bool = True) -> google.cloud.aiplatform.models.PrivateEndpoint
Undeploys every model deployed to this PrivateEndpoint.
| Parameter | |
|---|---|
| Name | Description | 
| sync | bool: Whether to execute this method synchronously. If False, this method is executed in a concurrent Future, and any downstream object is returned immediately and synced when the Future has completed. |
update
update(
    display_name: typing.Optional[str] = None,
    description: typing.Optional[str] = None,
    labels: typing.Optional[typing.Dict[str, str]] = None,
    traffic_split: typing.Optional[typing.Dict[str, int]] = None,
    request_metadata: typing.Optional[typing.Sequence[typing.Tuple[str, str]]] = (),
    update_request_timeout: typing.Optional[float] = None,
) -> google.cloud.aiplatform.models.PrivateEndpoint
Updates a PrivateEndpoint.
Example usage:
PSC-based private endpoint:
my_endpoint = my_endpoint.update(
    display_name='my-updated-endpoint',
    description='my updated description',
    labels={'key': 'value'},
    traffic_split={
        '123456': 20,
        '234567': 80,
    },
)
| Parameters | |
|---|---|
| Name | Description | 
| display_name | str: Optional. The display name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters. |
| description | str: Optional. The description of the Endpoint. |
| labels | Dict[str, str]: Optional. The labels with user-defined metadata to organize your Endpoints. Label keys and values can be no longer than 64 characters (Unicode codepoints) and can only contain lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels. |
| traffic_split | Dict[str, int]: Optional. Only supported by PSC-based private endpoints. A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel. If a DeployedModel's ID is not listed in this map, then it receives no traffic. The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not to accept any traffic at the moment. |
| request_metadata | Sequence[Tuple[str, str]]: Optional. Strings which should be sent along with the request as metadata. |
| update_request_timeout | float: Optional. The timeout for the update request in seconds. |
| Exceptions | |
|---|---|
| Type | Description | 
| ValueError | If traffic_split is set for a PSA-based private endpoint. |
| Returns | |
|---|---|
| Type | Description | 
| Endpoint (aiplatform.PrivateEndpoint) | Updated endpoint resource. |
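The traffic_split rule above (percentages add up to 100, or the map is empty) can be checked locally before calling update(). The following is a minimal sketch of such a pre-check; check_traffic_split is a hypothetical helper, and the per-value range check is an added assumption rather than documented SDK behaviour.

```python
from typing import Dict


def check_traffic_split(traffic_split: Dict[str, int]) -> None:
    """Raise ValueError unless the map is empty or its values sum to 100.

    Hypothetical pre-check for illustration; not part of the SDK.
    """
    if not traffic_split:
        return  # an empty map means the endpoint accepts no traffic
    for model_id, percent in traffic_split.items():
        # Assumption: each share should be a whole number between 0 and 100.
        if isinstance(percent, bool) or not isinstance(percent, int):
            raise ValueError(f"Share for {model_id!r} must be an int")
        if not 0 <= percent <= 100:
            raise ValueError(f"Share for {model_id!r} must be between 0 and 100")
    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError(f"Traffic shares add up to {total}, expected 100")


check_traffic_split({"123456": 20, "234567": 80})  # passes silently
check_traffic_split({})                            # also valid
```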
wait
wait()
Helper method that blocks until all futures are complete.