
shammur/MultimodalXplain


From Words to Waves: Analyzing Concept Formation in Speech and Text-Based Foundation Models


Accepted at Interspeech 2025 Main Track

This repository contains the official implementation and analysis code for our paper.

🛠️ Environment Setup

We provide three specialized conda environments for different analysis components:

Core Speech Analysis Environment

conda env create -f environment.yml
conda activate speechX

NeuroX Interpretability Environment

conda env create -f environment_neurox.yml
conda activate neurox_pip

Concept Clustering Environment

conda env create -f environment_conceptx.yml
conda activate conceptx
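
The three environments above can also be created idempotently, so that re-running setup does not fail on an environment that already exists. The following is a minimal sketch, not part of the repository: `env_in_list` and `ensure_env` are hypothetical helper names, and the sketch assumes `conda env list` prints one environment per line with the name in the first column.

```shell
# Sketch only: env_in_list and ensure_env are hypothetical helpers,
# not scripts shipped with this repository.

# Succeed if environment NAME appears as the first column of
# `conda env list`-style text read from stdin.
env_in_list() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Create environment NAME from FILE unless it already exists.
ensure_env() {
    name="$1"
    file="$2"
    if conda env list | env_in_list "$name"; then
        echo "environment $name already exists, skipping"
    else
        conda env create -f "$file"
    fi
}
```

For example, `ensure_env speechX environment.yml` covers the core speech environment, and the same call works for `neurox_pip` with `environment_neurox.yml` and `conceptx` with `environment_conceptx.yml`.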

📊 Running Experiments

Representation Extraction and Clustering

Speech Modality Analysis (LibriSpeech)

Extract speech representations from HuBERT and cluster them:

sh scripts/librispeech/speech/hubert/extract_and_cluster_speech.sh

Text Modality Analysis (LibriSpeech)

Extract text representations from BERT and cluster them:

sh scripts/librispeech/text/bert/extract_and_cluster_text.sh
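
When running both extraction scripts back to back, it can help to keep a log per script and stop at the first failure. The wrapper below is a rough sketch under assumptions: `run_logged` is a hypothetical helper and `logs` is an assumed default directory, neither of which comes from the repository. It does not activate a conda environment, so the appropriate one is assumed to be active already.

```shell
# Sketch only: run_logged is a hypothetical helper, and "logs" is an
# assumed default directory, not a path defined by this repository.

# Run SCRIPT with sh, write its stdout and stderr to
# LOG_DIR/<script name>.log, and return the script's exit status.
run_logged() {
    script="$1"
    log_dir="${2:-logs}"
    mkdir -p "$log_dir"
    log="$log_dir/$(basename "$script" .sh).log"
    if sh "$script" > "$log" 2>&1; then
        echo "ok: $script (log: $log)"
    else
        rc=$?
        echo "failed: $script (exit $rc, log: $log)" >&2
        return "$rc"
    fi
}
```

Calling `run_logged scripts/librispeech/speech/hubert/extract_and_cluster_speech.sh && run_logged scripts/librispeech/text/bert/extract_and_cluster_text.sh` would then run the text script only if the speech script succeeded.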

Fine-tuning Experiments

SST-2 Sentiment Analysis

Fine-tune and evaluate SpeechT5 on the Stanford Sentiment Treebank (SST-2). Both scripts are submitted as SLURM batch jobs with sbatch:

# Fine-tune SpeechT5 model
sbatch sst2_ft/scripts/train_speecht5.sh

# Evaluate fine-tuned model (update model path in script first)
sbatch sst2_ft/scripts/infer_speecht5.sh
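
If the model path in the evaluation script is already set to where training will write its output, the two jobs can be chained so that evaluation starts only after training finishes successfully. The sketch below relies on two standard `sbatch` options, `--parsable` and `--dependency=afterok:<jobid>`; `submit_chain` itself is a hypothetical helper, not something provided by the repository.

```shell
# Sketch only: submit_chain is a hypothetical helper. It assumes the
# evaluation script already points at the model path that training produces.

# Submit TRAIN_SCRIPT, then submit EVAL_SCRIPT with a dependency so it
# starts only if the training job completes with exit code 0.
submit_chain() {
    train_script="$1"
    eval_script="$2"
    # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
    train_id=$(sbatch --parsable "$train_script" | cut -d ';' -f 1)
    if [ -z "$train_id" ]; then
        echo "sbatch did not return a job ID for $train_script" >&2
        return 1
    fi
    sbatch --dependency="afterok:$train_id" "$eval_script"
}
```

A call such as `submit_chain sst2_ft/scripts/train_speecht5.sh sst2_ft/scripts/infer_speecht5.sh` queues both jobs at once. If training fails, the scheduler never starts the dependent evaluation job, and it may need to be cancelled by hand.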

📖 Citation

If you find this work useful for your research, please cite:

