Config reference¶
All scripts accept a --config YAML file. CLI flags always override YAML values.
How configs are loaded¶
# Pre-parsed before argparse so YAML values become defaults:
python scripts/train_classifier.py --config config/train_ast.yaml --lr 5e-5
# → lr=5e-5 (CLI wins), everything else from YAML
config/train_ast.yaml / train_vggish.yaml¶
seed: 42
clip_seconds: 5.0 # audio clip length in seconds
model:
name: ast # ast | vggish
vggish_ckpt: null # path to vggish_model.ckpt (required for vggish)
data:
root: null # set via --data-root
train_csv: null # CSV with filepath/target columns
val_csv: null
real_subdir: real_songs # directory-mode fallback
fake_subdir: fake_songs
val_frac: 0.1
max_per_class: null
training:
batch_size: 64
epochs: 10
lr: 1.0e-4
num_workers: 16
out_dir: runs
cluster:
num_nodes: 1
devices: 1
strategy: auto # ddp_find_unused_parameters_false for multi-GPU
no_auto_requeue: false
config/predict_ast.yaml / predict_vggish.yaml / predict_sonics.yaml¶
seed: 42
clip_seconds: 5.0
model:
name: ast
checkpoint: null # set via --checkpoint or AST_CKPT env var
model_id: null # HuggingFace repo ID (Sonics only)
predict:
split: test # train | valid | test
n_samples: null # null = full split; integer = balanced cap
batch_size: 16
sample_rate: 16000
log_dir: runs/predict
device: cuda
attack:
n_attack_samples: 100
micro_batch: 4 # --attack-micro-batch
n_steps: 100 # Adam optimisation steps
lr: 1.0e-3
lambda_aud: 0.5 # audibility loss weight
lambda_pred: 100.0 # prediction preservation weight
pred_margin: 0.1 # hinge margin
log_dir: runs/attack
device: cuda
AttackConfig fields¶
These map to the AttackConfig dataclass used internally by the attack engine:
| Field | Type | Default | Description |
|---|---|---|---|
n_steps |
int | 200 | Adam steps per sample |
lr |
float | 1e-3 | Adam learning rate |
lambda_audibility |
float | 1.0 | Weight on the inaudibility loss term |
lambda_pred |
float | 100.0 | Weight on the prediction-preservation loss |
pred_margin |
float | 1.0 | Hinge margin — lower activates penalty earlier |
linf_bound |
float | 0.01 | Hard L∞ clip applied to δ after each step |
sample_rate |
int | 16000 | Audio sample rate |
use_gradient_checkpointing |
bool | True | Recompute activations during backprop (saves VRAM) |
log_every |
int|None | 20 | Log loss every N steps; None disables |
Tuning guidance¶
| Goal | Recommendation |
|---|---|
| Stronger explanation flip | Decrease lambda_pred, increase n_steps |
| Keep predictions intact | Increase lambda_pred, increase pred_margin |
| More inaudible perturbation | Increase lambda_aud |
| Faster runs (less OOM risk) | Decrease attack-micro-batch, decrease n_steps |
| Reduce memory on AST | clip_seconds: 2.0 — drops tokens from 1214 → 230 |