Databricks Integration

The Databricks Integration is a standalone application that collects telemetry from the Databricks Data Intelligence Platform for use in troubleshooting and optimizing Databricks workloads.

The integration collects the following types of telemetry:

  • Apache Spark application metrics, such as Spark executor memory and CPU metrics, durations of Spark jobs, durations and I/O metrics of Spark stages and tasks, and Spark RDD memory and disk metrics (see the example query after this list).
  • Databricks Lakeflow job run metrics, such as durations, start and end times, and termination codes and types for job and task runs.
  • Databricks Lakeflow Declarative Pipeline update metrics, such as durations, start and end times, and completion status for updates and flows.
  • Databricks Lakeflow Declarative Pipeline event logs.
  • Databricks query metrics, including execution times and query I/O metrics.
  • Databricks cluster health metrics and logs, such as driver and worker memory and CPU metrics and driver and executor logs.
  • Databricks consumption and cost data that can be used to show DBU consumption and estimated Databricks costs.
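
As an example of how this telemetry might be used, the following NRQL query charts the number of active Spark executors on a cluster over time, which can be helpful for visualizing cluster autoscaling behavior. This is a minimal sketch: it reuses the SparkExecutorSample event type and databricksclustername attribute from the verification query in the Getting Started section below, and the TIMESERIES clause is standard NRQL.

SELECT uniqueCount(executorId) AS Executors FROM SparkExecutorSample WHERE databricksclustername = '[YOUR_CLUSTER_NAME]' TIMESERIES 5 minutes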

Usage Guide

To get up and running quickly, refer to the Getting Started section; for comprehensive usage details, review the additional sections linked below.

Getting Started

Follow the steps below to get started with the Databricks Integration quickly.

1. Install the integration

Follow the steps to deploy the integration to a Databricks cluster.

2. Verify the installation

Once the Databricks Integration has run for a few minutes, use the query builder in New Relic to run the following query, replacing [YOUR_CLUSTER_NAME] with the name of the Databricks cluster where the integration was installed. Note that if your cluster name contains a single quote ('), you must escape it with a backslash (\); see the example below the query.

SELECT uniqueCount(executorId) AS Executors FROM SparkExecutorSample WHERE databricksclustername = '[YOUR_CLUSTER_NAME]'

The result of the query should be a number greater than zero.
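
For example, for a hypothetical cluster named John's Cluster, the single quote in the name would be escaped as follows:

SELECT uniqueCount(executorId) AS Executors FROM SparkExecutorSample WHERE databricksclustername = 'John\'s Cluster'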

3. Import the example dashboards (optional)

To help you get started using the collected telemetry, this project provides example dashboards that can be imported into New Relic.

To use these dashboards, follow the instructions found in Import the Example Dashboards.

Support

New Relic has open-sourced this project. This project is provided AS-IS WITHOUT WARRANTY OR DEDICATED SUPPORT. Issues and contributions should be reported to the project here on GitHub.

We encourage you to bring your experiences and questions to the Explorers Hub where our community members collaborate on solutions and new ideas.

Privacy

At New Relic we take your privacy and the security of your information seriously, and are committed to protecting your information. We must emphasize the importance of not sharing personal data in public forums, and ask all users to scrub logs and diagnostic information for sensitive information, whether personal, proprietary, or otherwise.

We define “Personal Data” as any information relating to an identified or identifiable individual, including, for example, your name, phone number, post code or zip code, Device ID, IP address, and email address.

For more information, review New Relic’s General Data Privacy Notice.

Contribute

We encourage your contributions to improve this project! Keep in mind that when you submit your pull request, you'll need to sign the CLA via the click-through using CLA-Assistant. You only have to sign the CLA one time per project.

If you have any questions, or to execute our corporate CLA (which is required if your contribution is on behalf of a company), drop us an email at [email protected].

If you would like to contribute to this project, please review the standards outlined in Contribute to the Integration, as well as these guidelines.

A note about vulnerabilities

As noted in our security policy, New Relic is committed to the privacy and security of our customers and their data. We believe that providing coordinated disclosure by security researchers and engaging with the security community are important means to achieve our security goals.

If you believe you have found a security vulnerability in this project or any of New Relic's products or websites, we welcome and greatly appreciate you reporting it to New Relic through our bug bounty program.

License

The Databricks Integration project is licensed under the Apache 2.0 License.
