AI Driven Automation in Open-Source Metadata Platforms: Embedding an MCP Server

Hello PyCon Ireland! Thank you for joining our training session! You can find the contents below; please let me know if there's anything you need!

Note - This training was prepared using a MacBook

Contents

  1. Prerequisites
  2. OpenMetadata
  3. goose
  4. Integrating Python
  5. Scaling out with Collate
  6. Wrapping up and feedback

Prerequisites

Before getting started, please make sure you have the following three tools installed on your laptop:

  1. node - on a MacBook, you might have to run xcode-select --install before installing Node
  2. Docker Desktop 4.49.0 - there are great open-source alternatives to Docker, like Podman, but please do not use them for this workshop!
  3. goose Desktop 1.12.0 - Desktop, not goose CLI

This workshop is bring-your-own-agent, and you will need an API key for your agent; almost any AI agent will do!

OpenMetadata

Installing OpenMetadata

With the prerequisites installed, we will move on to installing OpenMetadata. OpenMetadata is an open-source metadata platform for data discovery, observability and governance! If you have any questions about OpenMetadata, please ask! We will be installing OpenMetadata along with its supporting components:

  • Airflow - Orchestrates the ingestion jobs that bring new metadata into OpenMetadata and keep it up to date as data systems change
  • Elasticsearch - Provides the search indexing used to retrieve OpenMetadata assets
  • PostgreSQL - Stores and maintains state for OpenMetadata assets

We'll bring all these services online with the following commands:

curl -sL -o docker-compose-postgres.yml https://github.com/open-metadata/OpenMetadata/releases/download/1.10.4-release/docker-compose-postgres.yml
docker compose -f docker-compose-postgres.yml up --detach
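
If you want to confirm the stack is healthy before moving on, here is a minimal readiness check; it is a sketch, and it assumes OpenMetadata's /api/v1/system/version endpoint is available on the default port:

# Poll OpenMetadata until its REST API responds; first startup can take a few minutes.
import time

import requests  # pip install requests

for _ in range(60):
    try:
        resp = requests.get("http://localhost:8585/api/v1/system/version", timeout=5)
        if resp.ok:
            print("OpenMetadata is up:", resp.json())
            break
    except requests.exceptions.ConnectionError:
        pass
    time.sleep(10)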

Once OpenMetadata is ready, load the sample data by running:

curl -fsSL https://raw.githubusercontent.com/open-metadata/openmetadata-demo/main/postgres/docker/postgres-script.sql | docker exec -i openmetadata_postgresql psql -U postgres -d postgres
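
To double-check that the sample schema loaded, you can list its tables; this sketch reuses the container name from the compose file above:

# List the tables the sample script created in the public schema.
import subprocess

subprocess.run(
    ["docker", "exec", "openmetadata_postgresql",
     "psql", "-U", "postgres", "-d", "postgres", "-c", r"\dt public.*"],
    check=True,
)
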
[Image: openmetadata-login.png - Welcome to OpenMetadata!]

Adding PostgreSQL to OpenMetadata

Adding a connector in OpenMetadata is easy. We've already loaded some sample data into the PostgreSQL database that OpenMetadata uses to manage asset state, so we will use that, but you can just as easily connect to cloud data services like Snowflake, Redshift, BigQuery, and Databricks.

  • Go to OpenMetadata
  • Log in
  • Go to Settings -> Services -> Databases and select Add New Service
  • Select Postgres, then Next
  • Enter the Service Name as postgres with the following Connection Details:
    • Username: openmetadata_user
    • Auth Configuration Type: Basic Auth
    • Password: openmetadata_password
    • Host and Port: postgresql:5432
    • Database: openmetadata_db
    • Enable Ingest All Databases
  • Select Next
  • No edits are needed on the filters page; scroll down and select Save
[Image: add-postgres.png - Adding a Postgres connector to OpenMetadata]

Adding the OpenMetadata MCP Server to goose

An OpenMetadata Personal Access Token (PAT) will be needed to add OpenMetadata to goose. From your user profile's access token settings in OpenMetadata, select Generate New Token

[Image: generate-token.png - An OpenMetadata PAT is needed to use it in goose]

Copy this token to paste into goose later.
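
If you would like to confirm the token works before configuring goose, here is a quick sanity check; it is a sketch that assumes OpenMetadata's /api/v1/tables REST endpoint:

# Verify the PAT by listing tables through the REST API.
import requests

TOKEN = "<PASTE_YOUR_OpenMetadata_TOKEN_HERE>"  # the token you just generated
resp = requests.get(
    "http://localhost:8585/api/v1/tables",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"limit": 1},
)
resp.raise_for_status()  # a 401 here means the token was not accepted
print("Token accepted; tables visible:", resp.json().get("paging", {}).get("total"))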

With OpenMetadata up and running, we can add its MCP server as a goose extension! Open goose, select Extensions, then +Add custom extension

Please create your OpenMetadata Extension with the following options:

  • Extension Name: openmetadata
  • Type: STDIO
  • Description:
  • Command: npx -y mcp-remote http://localhost:8585/mcp --auth-server-url=http://localhost:8585/mcp --client-id=openmetadata --verbose --clean --header Authorization:${AUTH_HEADER}
  • Timeout: 300
  • Environment Variables
    • Variable name: AUTH_HEADER
    • Value: Bearer <PASTE_YOUR_OpenMetadata_TOKEN_HERE>
  • Select +Add
  • Select Save Changes
[Image: openmetadata-extension.png - OpenMetadata MCP Server in goose]

goose 🎉

Now we'll recreate one of the use cases we just saw from the community!

In our sample data schema, you will see 7 tables. We will add a certification to this schema and have an AI agent push that change to every table.

  • In OpenMetadata
    • Go to the public databaseSchema
    • Select the Edit Certification button
    • Select Gold
    • Select ✅ to apply this certification to the schema
  • In goose
    • Go to the Use OpenMetadata goose Recipe
    • Scroll down to Launch in Goose Desktop, and paste your FQN postgres.postgres.public into the new goose session!
  • Back in OpenMetadata
    • Tables should now have the same Certification!

Feel free to experiment with OpenMetadata, OpenMetadata MCP, and goose!

Integrating Python via Jupyter MCP

For this lab, we are going to create a virtual environment so that everyone can work from the same Python setup.

python3 -m venv pycon
source pycon/bin/activate

From the pycon virtual environment, run:

pip install jupyterlab==4.4.1 jupyter-collaboration==4.0.2 jupyter-mcp-tools==0.1.3 ipykernel uv
pip uninstall -y pycrdt datalayer_pycrdt
pip install datalayer_pycrdt==0.12.17
jupyter lab --port 8888 --IdentityProvider.token pycon --ip 0.0.0.0

This will start a JupyterLab instance at http://localhost:8888/. If you are prompted for a password or token, enter pycon.
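
Before adding JupyterLab to goose, you can confirm the server and token are working; this sketch assumes Jupyter Server's standard /api/status endpoint:

# Confirm JupyterLab is reachable with the same token goose will use.
import requests

resp = requests.get("http://localhost:8888/api/status", params={"token": "pycon"})
resp.raise_for_status()
print("JupyterLab is up:", resp.json())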

Adding JupyterLab to goose

Just like OpenMetadata, we will add JupyterLab as an extension to goose with the following options:

  • Extension Name: jupyter
  • Type: STDIO
  • Description:
  • Command: uvx jupyter-mcp-server@latest
  • Timeout: 300
  • Environment Variables
    • Variable name: JUPYTER_URL
    • Value: "http://localhost:8888"
    • Variable name: JUPYTER_TOKEN
    • Value: pycon
    • Variable name: ALLOW_IMG_OUTPUT
    • Value: true
    • Make sure to select +Add for each Environment Variable
  • Select Save Changes
[Image: jupyter-extension.png - Extension details for Jupyter MCP Server]

We can now use the JupyterLab and OpenMetadata MCP Servers together in goose!

In goose, prompt

How many tables are in postgres.postgres.public?

then,

How many tables are in postgres.postgres.public, postgres.airflow_db.public, and postgres.openmetadata_db.public?

to combine these results with the Jupyter MCP server:

Create a new notebook pycon.ipynb and build a visualization with the table counts for each postgres database
[Image: goose-notebook.png - Combining MCP Servers from OpenMetadata and Jupyter!]
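
The exact notebook the agent writes will vary, but you can expect a cell along these lines; the counts below are placeholders, not real results:

# Hypothetical visualization cell; an agent would fill in real counts
# retrieved through the OpenMetadata MCP server.
import matplotlib.pyplot as plt

table_counts = {
    "postgres.postgres.public": 7,         # known from the sample data
    "postgres.airflow_db.public": 3,       # placeholder
    "postgres.openmetadata_db.public": 5,  # placeholder
}
plt.bar(table_counts.keys(), table_counts.values())
plt.ylabel("Table count")
plt.title("Tables per Postgres database")
plt.xticks(rotation=15)
plt.tight_layout()
plt.show()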

Scaling out with Collate

The OpenMetadata Sandbox is an OpenMetadata instance hosted and curated by Collate. We can use it for a better look at combining the OpenMetadata and Jupyter MCP servers. Log in to the sandbox, generate a Personal Access Token for yourself just like before, and add one more extension to goose.

  • Extension Name: collate
  • Type: STDIO
  • Description:
  • Command: npx -y mcp-remote https://sandbox.open-metadata.org/mcp --auth-server-url=https://sandbox.open-metadata.org/mcp --client-id=collate --verbose --clean --header Authorization:${COLLATE_AUTH_HEADER}
  • Timeout: 300
  • Environment Variables
    • Variable name: COLLATE_AUTH_HEADER
    • Value: Bearer <PASTE_YOUR_collate_TOKEN_HERE>
  • Select +Add, then Save Changes
[Image: collate-extension.png - Adding the OpenMetadata Sandbox to goose]

To help a model easily differentiate between this OpenMetadata instance and the one on your laptop, we have named this extension collate. Now you can try the following prompts:

what is the count of all assets in collate?

or:

How many assets have gold certifications, silver certifications, and bronze certifications?

and to combine it with the Jupyter MCP server:

Create a new notebook collate.ipynb and build a visualization with the asset counts by type in one cell and the assets counts by certification in another.
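
If you would rather pull those numbers yourself, here is a hedged sketch against the sandbox REST API; the /api/v1/search/query endpoint, the table_search_index index name, and the response shape are assumptions to verify against your OpenMetadata version:

# Count table assets in the sandbox via the search API (assumed endpoint).
import requests

TOKEN = "<PASTE_YOUR_collate_TOKEN_HERE>"
resp = requests.get(
    "https://sandbox.open-metadata.org/api/v1/search/query",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"q": "*", "index": "table_search_index", "size": 0},
)
resp.raise_for_status()
# An Elasticsearch-style response shape is assumed here.
print("Table assets in the sandbox:", resp.json()["hits"]["total"]["value"])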

Wrapping up and feedback

To shut down your OpenMetadata services, run the following command:

docker compose -f docker-compose-postgres.yml down

Or, you can add additional metadata connectors to your OpenMetadata instance! Popular connectors include Snowflake, BigQuery, Databricks, and Tableau!
