Config reference

Every script accepts a `--config` flag that points to a YAML file. Flags given on the command line always override the corresponding YAML values.

How configs are loaded

# Pre-parsed before argparse so YAML values become defaults:
python scripts/train_classifier.py --config config/train_ast.yaml --lr 5e-5
# → lr=5e-5 (CLI wins), everything else from YAML
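
The two-stage parse described above can be sketched roughly as follows. This is a minimal illustration rather than the scripts' actual code: the names `parse_args`, `load_yaml`, and `load_config`, the three flags shown, and the choice to read defaults only from the `training:` section are all assumptions made for the example.

```python
import argparse


def load_yaml(path):
    """Read a YAML file into a dict (requires PyYAML)."""
    import yaml  # imported lazily so the parser logic can be tried without PyYAML

    with open(path) as fh:
        return yaml.safe_load(fh) or {}


def parse_args(argv=None, load_config=load_yaml):
    # Stage 1: look only for --config and ignore every other flag for now.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--config", default=None)
    known, _ = pre.parse_known_args(argv)

    # Stage 2: the full parser; its hard-coded defaults are the last resort.
    parser = argparse.ArgumentParser(parents=[pre])
    parser.add_argument("--lr", type=float, default=1e-4)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=64)

    if known.config is not None:
        cfg = load_config(known.config)
        # Assumption: only the `training:` section feeds these three flags.
        wanted = {"lr", "epochs", "batch_size"}
        training = cfg.get("training") or {}
        parser.set_defaults(**{k: v for k, v in training.items() if k in wanted})

    # Flags given explicitly on the command line replace the installed defaults.
    return parser.parse_args(argv)
```

Called with `["--config", "config/train_ast.yaml", "--lr", "5e-5"]`, this returns `lr` from the command line and `epochs` and `batch_size` from the YAML file.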

`config/train_ast.yaml` / `train_vggish.yaml`

seed: 42
clip_seconds: 5.0          # audio clip length in seconds

model:
  name: ast                # ast | vggish
  vggish_ckpt: null        # path to vggish_model.ckpt (required for vggish)

data:
  root: null               # set via --data-root
  train_csv: null          # CSV with filepath/target columns
  val_csv: null
  real_subdir: real_songs  # directory-mode fallback
  fake_subdir: fake_songs
  val_frac: 0.1
  max_per_class: null

training:
  batch_size: 64
  epochs: 10
  lr: 1.0e-4
  num_workers: 16
  out_dir: runs

cluster:
  num_nodes: 1
  devices: 1
  strategy: auto           # ddp_find_unused_parameters_false for multi-GPU
  no_auto_requeue: false
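
A few of the constraints noted in the comments above can be checked before training starts. The sketch below is hypothetical: `validate_train_config` is not a function from the project, and the directory-mode and `val_frac` checks encode assumptions about how the loader behaves.

```python
def validate_train_config(cfg):
    """Return a list of problems; an empty list means the config looks usable."""
    problems = []

    model = cfg.get("model") or {}
    name = model.get("name")
    if name not in ("ast", "vggish"):
        problems.append(f"model.name must be 'ast' or 'vggish', got {name!r}")
    if name == "vggish" and not model.get("vggish_ckpt"):
        problems.append("model.vggish_ckpt is required when model.name is 'vggish'")

    data = cfg.get("data") or {}
    if data.get("root") is None and data.get("train_csv") is None:
        # Assumption: without a CSV the loader falls back to directory mode,
        # which needs a root containing real_subdir and fake_subdir.
        problems.append("set data.root (or pass --data-root) or provide data.train_csv")

    val_frac = data.get("val_frac")
    if val_frac is not None and not 0.0 < val_frac < 1.0:
        # Assumption: val_frac is the fraction of data held out for validation.
        problems.append(f"data.val_frac should be between 0 and 1, got {val_frac}")

    return problems
```

Running it on the dict loaded from the YAML file and printing any returned messages gives an early failure instead of one deep inside the training loop.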

`config/predict_ast.yaml` / `predict_vggish.yaml` / `predict_sonics.yaml`

seed: 42
clip_seconds: 5.0

model:
  name: ast
  checkpoint: null         # set via --checkpoint or AST_CKPT env var
  model_id: null           # HuggingFace repo ID (Sonics only)

predict:
  split: test              # train | valid | test
  n_samples: null          # null = full split; integer = balanced cap
  batch_size: 16
  sample_rate: 16000
  log_dir: runs/predict
  device: cuda

attack:
  n_attack_samples: 100
  micro_batch: 4           # --attack-micro-batch
  n_steps: 100             # Adam optimisation steps
  lr: 1.0e-3
  lambda_aud: 0.5          # audibility loss weight
  lambda_pred: 100.0       # prediction preservation weight
  pred_margin: 0.1         # hinge margin
  log_dir: runs/attack
  device: cuda
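
Because `model.checkpoint` can come from the YAML file, the `--checkpoint` flag, or the `AST_CKPT` environment variable, a small helper can make the lookup explicit. This is a sketch only: `resolve_checkpoint` is a hypothetical name, and the order used here (CLI, then YAML, then environment) is an assumption. The reference above states only that CLI flags override YAML values.

```python
import os


def resolve_checkpoint(cli_value=None, yaml_value=None, env=None):
    """Return the checkpoint path from the first source that provides one."""
    env = os.environ if env is None else env
    # Assumed precedence: --checkpoint, then model.checkpoint, then AST_CKPT.
    for candidate in (cli_value, yaml_value, env.get("AST_CKPT")):
        if candidate:
            return candidate
    raise ValueError(
        "no checkpoint given: pass --checkpoint, set model.checkpoint, or export AST_CKPT"
    )
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.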

AttackConfig fields

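The `attack:` settings map to the `AttackConfig` dataclass used internally by the attack engine. The defaults below are the dataclass's own; the example YAML above sets different values for some fields (for example, `n_steps: 100` against a default of 200).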

Field	Type	Default	Description
`n_steps`	int	200	Adam steps per sample
`lr`	float	1e-3	Adam learning rate
`lambda_audibility`	float	1.0	Weight on the inaudibility loss term
`lambda_pred`	float	100.0	Weight on the prediction-preservation loss
`pred_margin`	float	1.0	Hinge margin — lower activates penalty earlier
`linf_bound`	float	0.01	Hard L∞ clip applied to δ after each step
`sample_rate`	int	16000	Audio sample rate
`use_gradient_checkpointing`	bool	True	Recompute activations during backprop (saves VRAM)
`log_every`	int\|None	20	Log loss every N steps; None disables
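
Written out as code, the table corresponds to a dataclass along these lines. The field names, types, and defaults are taken from the table; the helper `attack_config_from_yaml` and the renaming of the YAML key `lambda_aud` to `lambda_audibility` are assumptions for illustration, not confirmed project behaviour.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AttackConfig:
    n_steps: int = 200                       # Adam steps per sample
    lr: float = 1e-3                         # Adam learning rate
    lambda_audibility: float = 1.0           # weight on the inaudibility loss
    lambda_pred: float = 100.0               # weight on prediction preservation
    pred_margin: float = 1.0                 # hinge margin
    linf_bound: float = 0.01                 # hard L-infinity clip on delta
    sample_rate: int = 16000                 # audio sample rate in Hz
    use_gradient_checkpointing: bool = True  # trade compute for VRAM
    log_every: Optional[int] = 20            # None disables loss logging


def attack_config_from_yaml(section):
    """Build an AttackConfig from the `attack:` block of a predict config."""
    # Assumption: the YAML key `lambda_aud` feeds the `lambda_audibility` field.
    renamed = {"lambda_aud": "lambda_audibility"}
    valid = {f.name for f in fields(AttackConfig)}
    kwargs = {}
    for key, value in section.items():
        key = renamed.get(key, key)
        if key in valid:  # runner-level keys such as micro_batch are skipped
            kwargs[key] = value
    return AttackConfig(**kwargs)
```

Fields that the YAML block does not set, such as `linf_bound`, keep their dataclass defaults.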

Tuning guidance

Goal	Recommendation
Stronger explanation flip	Decrease `lambda_pred`, increase `n_steps`
Keep predictions intact	Increase `lambda_pred`, increase `pred_margin`
More inaudible perturbation	Increase `lambda_aud`
Faster runs (less OOM risk)	Decrease `micro_batch` (`--attack-micro-batch`), decrease `n_steps`
Reduce memory on AST	Set `clip_seconds: 2.0`, which cuts the token count from 1214 to 230
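
The token counts in the last row can be reproduced with a short calculation. The sketch assumes the standard AST patch layout (16×16 patches with stride 10 over a 128-bin mel spectrogram, plus 2 special tokens) and roughly 100 spectrogram frames per second. The 1214 figure then corresponds to a 1024-frame input and the 230 figure to about 200 frames; how the project converts `clip_seconds` to frames is not stated here, so treat the frame counts as assumptions.

```python
def ast_num_tokens(n_frames, n_mels=128, patch=16, stride=10, special_tokens=2):
    """Count transformer tokens for an n_frames x n_mels spectrogram."""
    freq_patches = (n_mels - patch) // stride + 1    # patches along the mel axis
    time_patches = (n_frames - patch) // stride + 1  # patches along the time axis
    return freq_patches * time_patches + special_tokens


baseline = ast_num_tokens(1024)  # 101 time patches x 12 freq patches + 2 = 1214
short = ast_num_tokens(200)      # 19 time patches x 12 freq patches + 2 = 230
```

Since self-attention cost grows with the square of the token count, the shorter clip cuts attention memory by far more than the roughly fivefold drop in tokens suggests.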
