Model Run Guide

This page describes the public models.yaml contract used by st-cnvbench --steps run.

Model running assumes that prep has already produced a standardized dataset bundle. See data_preparation.md for the dataset input contract.

Included Methods

The current public release includes 8 CNV inference methods:

CalicoST
CopyKAT
InferCNV
Clonalscope (Clonalscope_NoWGS, Clonalscope_WGS)
Numbat
Xclone
SCEVAN
STARCH

In the public config surface, Clonalscope is exposed as two wrappers so users can run the no-WGS and WGS-assisted modes separately.

Run Command

Run all enabled models on all datasets:

st-cnvbench --steps run \
  --data-config configs/examples/cscc_demo/data.yaml \
  --model-config configs/examples/cscc_demo/models.yaml

Run selected models or datasets:

st-cnvbench --steps run \
  --data-config configs/examples/cscc_demo/data.yaml \
  --model-config configs/templates/models.template.yaml \
  --models CopyKAT InferCNV STARCH \
  --prep-ids P6_vis_rep1

Override the execution backend from the command line:

st-cnvbench --steps run --exec-mode conda --data-config data.yaml --model-config models.yaml

Global Settings

project_settings defines shared paths and the default execution backend.

Field	Meaning
`root_dir`	Project root used to resolve relative paths.
`results_dir`	Output root for raw model results.
`refs`	Reference data directory.
`exts`	External tool source directory.
`default_exec_mode`	Default backend: `conda`, `docker`, or `apptainer`.

Example:

project_settings:
  root_dir: "."
  results_dir: "${root_dir}/outputs/raw_results"
  refs: "${root_dir}/refs"
  exts: "${root_dir}/external_tools"
  default_exec_mode: "conda"

Shared Model Fields

Each model section is keyed by the public model name in the wrapper registry.

Field	Meaning
`enabled`	Whether the wrapper is run by default.
`model_name`	Wrapper name expected by ST-CNVBench. Keep this aligned with the section name.
`env_name`	Conda environment used when `exec_mode` is `conda`.
`docker_image`	Docker image used when `exec_mode` is `docker`.
`apptainer_sif`	SIF image path used when `exec_mode` is `apptainer`.
`per_dataset`	Optional dataset-specific overrides.

Disable a model globally:

CopyKAT:
  enabled: false

Override selected parameters for one dataset:

STARCH:
  enabled: true
  n_clusters: 3
  per_dataset:
    P6_vis_rep1:
      n_clusters: 2

Disable one model for one dataset:

InferCNV:
  enabled: true
  per_dataset:
    P6_vis_rep1:
      enabled: false

Reference-Normal Mode

Reference-normal behavior is controlled by ref_norm in data.yaml, not by a separate model-level flag.

Use reference spots:

ref_norm: true
tumor_normal_mode: subset
raw:
  tumor_normal: /path/to/metadata_tumor_normal.tsv

Run without provided reference spots:

ref_norm: false
tumor_normal_mode: de_novo
raw:
  tumor_normal: null

Current wrapper behavior when raw.tumor_normal is not provided:

Status	Models
Supported	`CalicoST`, `CopyKAT`, `InferCNV`, `SCEVAN`, `Numbat`, `STARCH`, `Clonalscope_NoWGS`, `Clonalscope_WGS`
Not supported	`Xclone`

Key Parameters By Model

Model	Key fields
`CalicoST`	`calicost_dir`, `eagle_dir`, `region_vcf`, `phasing_panel`, `use_tumor_purity`, `n_clones`, `n_threads`, `UMItag`, `cellTAG`
`CopyKAT`	`genome`, `win_size`, `ks_cut`, `distance`, `n_cores`, `n_clones`
`InferCNV`	`gene_order_file`, `cutoff`, `n_threads`, `k_obs_groups`
`Clonalscope_NoWGS`	`gene_coords_file`, `aux_data_dir`, `mincell`
`Clonalscope_WGS`	`gene_coords_file`, `aux_data_dir`, `hmm_states`, `mincell`
`Numbat`	`pileup_script`, `eagle_path`, `genetic_map`, `snp_vcf_path`, `panel_dir`, `genome_version`, `n_threads`, `UMItag`, `cellTAG`, `n_clones`
`Xclone`	`snp_vcf`, `gene_region`, `eagle_path`, `genetic_map`, `panel_dir`, `n_threads`, `UMItag`, `cellTAG`, `minCOUNT`, `minMAF`, `n_clusters`
`SCEVAN`	`n_threads`
`STARCH`	`gene_mapping_file`, `n_clusters`, `beta_spot`, `platform`, `returnnormal`

Output Layout

Each enabled wrapper writes to:

<results_dir>/<dataset_id>/<model_name>/

The wrapper also writes <model_name>_run.log in the same directory. If execution fails, ST-CNVBench raises an error and reports the log path plus the last log lines.