
Audio-XAI

Audio-XAI studies the perceptual fragility of explanation methods (Grad-CAM, LRP) for deep-learning audio classifiers. The core question: can an adversary change the explanation a model produces (which input regions it highlights) while keeping its predictions identical, and can that perturbation remain inaudible?
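To make the question concrete, here is a minimal, self-contained sketch of its constraint structure. It uses a toy linear classifier and gradient × input attribution, not Grad-CAM or LRP on a real audio model. The L∞ budget `eps` is a crude stand-in for a perceptual inaudibility constraint. All function names are hypothetical, and none of this is the `perceptual_xai_attack` API.

```python
from typing import List, Optional


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def score(w: List[float], b: float, x: List[float]) -> float:
    """Toy linear class score f(x) = w.x + b; the predicted class is f(x) > 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def attribution(w: List[float], x: List[float]) -> List[float]:
    """Gradient x input. For a linear score the gradient is w, so a_i = w_i * x_i."""
    return [wi * xi for wi, xi in zip(w, x)]


def argmax(values: List[float]) -> int:
    return max(range(len(values)), key=values.__getitem__)


def flip_explanation(w: List[float], b: float, x: List[float],
                     target: int, eps: float) -> Optional[List[float]]:
    """Hypothetical helper: try to make `target` the most positively attributed feature.

    Returns the perturbed input only if the explanation moved to `target`,
    the predicted class is unchanged and every |delta_i| <= eps; else None.
    """
    top = argmax(attribution(w, x))
    # Surrogate objective L(x) = a_target - a_top; its gradient w.r.t. x is
    # +w[target] at `target` and -w[top] at `top` (zero elsewhere).
    grad = [0.0] * len(x)
    grad[target] += w[target]
    grad[top] -= w[top]
    # L is linear in x, so one signed step of size eps maximises it over the
    # L-infinity ball; a deep model would need iterated, projected steps (PGD).
    x_adv = [xi + eps * sign(g) for xi, g in zip(x, grad)]
    same_prediction = (score(w, b, x_adv) > 0) == (score(w, b, x) > 0)
    explanation_moved = argmax(attribution(w, x_adv)) == target
    return x_adv if same_prediction and explanation_moved else None
```

Take w = [2, 1, -1] and x = [1, 0.8, 0.5]. Feature 0 starts with the largest attribution. With eps = 0.6, the top attribution moves to feature 1 and the class stays positive. With eps = 0.2, the budget is too small to reorder the attributions. With bias b = -2, the same eps = 0.6 step flips the prediction. Both of those last cases return None. These two failure modes are the tension the project studies. On real audio, the L∞ ball would give way to perceptual constraints such as the psychoacoustic masking under `metrics/`.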

Models supported: AST, VGGish, Sonics (SpecTTTra)
XAI methods: Grad-CAM (CNN + Transformer variants), LRP
Perceptual metrics: PESQ, STOI, PEAQ, ViSQOL, CDPAM
Infrastructure: PyTorch Lightning, HuggingFace, SLURM/PLGrid A100 cluster
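For reference, Grad-CAM works in two steps. First, it weights each feature map A^k of a chosen layer by α_k, the spatial mean of the class-score gradient ∂y^c/∂A^k. Then it keeps the positive part of the weighted sum: ReLU(Σ_k α_k A^k). The plain-Python sketch below computes that map from activations and gradients you already have (for example, captured with PyTorch hooks). It is the textbook formula, not the code in `audio_xai/xai`, and `grad_cam` is a hypothetical name. For a spectrogram CNN, the H × W grid is frequency × time. Transformer variants, such as one for AST, typically reshape patch-token activations into such a grid first; this sketch does not show that step.

```python
from typing import List

Matrix = List[List[float]]


def grad_cam(activations: List[Matrix], gradients: List[Matrix]) -> Matrix:
    """Grad-CAM map from K feature maps and their class-score gradients (each H x W)."""
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weights alpha_k: global-average-pool each gradient map.
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[0.0] * w for _ in range(h)]
    for alpha, a in zip(alphas, activations):
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * a[i][j]
    # ReLU keeps only regions whose features push the class score up.
    return [[max(0.0, v) for v in row] for row in cam]
```

Consider two 2 × 2 channels. A1 = [[1, 0], [0, 1]] has gradients of +1 everywhere, and A2 = [[0, 2], [2, 0]] has gradients of -1 everywhere. The weights are then +1 and -1, and the map is [[1.0, 0.0], [0.0, 1.0]]: the ReLU clips A2's negative contribution. In practice, the map is usually upsampled to the input resolution before display.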

Docs

Setup: install, cluster environment, conda env
Data: dataset layout, CSV format, SonicsDataset
Training: train classifiers locally or on SLURM
Inference: batch prediction, metrics, checkpoints
Attack: the full perceptual XAI attack workflow
Config reference: every YAML key explained
SLURM guide: submitting and monitoring jobs on PLGrid
API Reference: auto-generated module docs

Quick start

# 1. Clone and set up the environment
git clone https://github.com/cncPomper/Audio-XAI && cd Audio-XAI
conda activate $SCRATCH/conda_envs/athena   # on the cluster; elsewhere: uv sync && source .venv/bin/activate

# 2. Train
python scripts/train_classifier.py --config config/train_ast.yaml --data-root /path/to/data

# 3. Predict
python scripts/predict.py --config config/predict_ast.yaml \
    --data-root /path/to/data --checkpoint runs/ast/version_0/checkpoints/epoch=4.ckpt

# 4. Attack
python scripts/attack.py --config config/predict_ast.yaml \
    --data-root /path/to/data --checkpoint runs/ast/version_0/checkpoints/epoch=4.ckpt \
    --full-audio --window-hop-seconds 5.0
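The flag names suggest that `--full-audio` processes the whole recording in sliding windows advanced by `--window-hop-seconds`. The window length and sample rate are not given on this page, so the sketch below makes three assumptions: a 10 s window, 16 kHz audio, and a final window that is padded rather than dropped. `window_bounds` is a hypothetical helper, not part of `scripts/attack.py`. Check the Attack page and the config reference for the actual behavior.

```python
import math
from typing import List, Tuple


def window_bounds(num_samples: int, sample_rate: int,
                  window_seconds: float, hop_seconds: float) -> List[Tuple[int, int]]:
    """(start, end) sample indices of sliding windows covering a clip.

    Assumption of this sketch: the last window may run past num_samples and
    the caller zero-pads it, so no audio at the end is skipped.
    """
    win = int(round(window_seconds * sample_rate))
    hop = int(round(hop_seconds * sample_rate))
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    if num_samples <= win:
        return [(0, win)]
    # Enough hops that the last window reaches (or passes) the end of the clip.
    n_windows = 1 + math.ceil((num_samples - win) / hop)
    return [(k * hop, k * hop + win) for k in range(n_windows)]
```

For a 30 s clip at 16 kHz, this yields five windows starting at 0, 5, 10, 15 and 20 s. A 32 s clip gets a sixth window starting at 25 s, which runs past the end and needs padding. With a 5 s hop and the assumed 10 s window, interior samples fall in two windows. Per-window results would then need merging, for example by averaging over the overlap.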

Repository layout

Audio-XAI/
├── config/             # YAML configs for train / predict / attack
├── scripts/            # Entry-point scripts (train, predict, attack, explain)
├── audio_xai/          # Main Python package
│   ├── models/         # ASTBinary, VGGishBinary, Wav2Vec2Binary + LightningModule
│   ├── attacks/        # perceptual_xai_attack, AttackConfig, AttackResult
│   ├── data/           # SonicsDataset, SonicsConfig
│   ├── xai/            # GradCAM (CNN + Transformer), LRP
│   └── metrics/        # psychoacoustic masking, PESQ/STOI wrappers
├── docs/               # This documentation
└── sbatch/             # SLURM job scripts (train / predict / attack)
    ├── train/
    ├── predict/
    └── attack/
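`audio_xai/xai/` lists LRP alongside Grad-CAM. As a reference point, here is the standard LRP-ε rule for a single dense layer in plain Python. It computes z_k = Σ_j a_j w_jk + b_k, then redistributes each output's relevance R_k to the inputs as R_j = Σ_k a_j w_jk R_k / (z_k + ε·sign(z_k)). This is the textbook rule, not necessarily the rule set `audio_xai.xai` implements. LRP for Transformers usually needs special handling of attention and normalization layers. `lrp_epsilon` is a hypothetical name.

```python
from typing import List


def lrp_epsilon(a: List[float], W: List[List[float]], b: List[float],
                relevance_out: List[float], eps: float = 1e-6) -> List[float]:
    """Propagate relevance through one dense layer with the LRP-epsilon rule.

    a: input activations (length J); W[j][k]: weight from input j to output k;
    b: biases (length K); relevance_out: relevance of the K outputs.
    """
    J, K = len(a), len(b)
    z = [sum(a[j] * W[j][k] for j in range(J)) + b[k] for k in range(K)]
    # Stabiliser: push each denominator away from zero, keeping its sign
    # (z_k == 0 is treated as positive here).
    denom = [zk + eps * (1.0 if zk >= 0 else -1.0) for zk in z]
    return [sum(a[j] * W[j][k] * relevance_out[k] / denom[k] for k in range(K))
            for j in range(J)]
```

Take a = [1, 2] feeding one output with weights 1, zero bias and relevance 3. With eps = 0, the rule returns [1.0, 2.0], so relevance is conserved. With eps = 1, it returns [0.75, 1.5]: a larger ε absorbs some relevance, trading exact conservation for less noisy explanations.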