Model Run Guide

This page describes the public models.yaml contract used by st-cnvbench --steps run.

The run step assumes that the prep step has already produced a standardized dataset bundle. See data_preparation.md for the dataset input contract.

Included Methods

The current public release includes 8 CNV inference methods: CalicoST, CopyKAT, InferCNV, Clonalscope, Numbat, Xclone, SCEVAN, and STARCH.

In models.yaml, Clonalscope is exposed as two wrappers, Clonalscope_NoWGS and Clonalscope_WGS, so that the no-WGS and WGS-assisted modes can be run separately.

Run Command

Run all enabled models on all datasets:

st-cnvbench --steps run \
  --data-config configs/examples/cscc_demo/data.yaml \
  --model-config configs/examples/cscc_demo/models.yaml

Run selected models or datasets:

st-cnvbench --steps run \
  --data-config configs/examples/cscc_demo/data.yaml \
  --model-config configs/templates/models.template.yaml \
  --models CopyKAT InferCNV STARCH \
  --prep-ids P6_vis_rep1

Override the execution backend from the command line:

st-cnvbench --steps run --exec-mode conda --data-config data.yaml --model-config models.yaml

Global Settings

project_settings defines shared paths and the default execution backend.

| Field | Meaning |
| --- | --- |
| root_dir | Project root used to resolve relative paths. |
| results_dir | Output root for raw model results. |
| refs | Reference data directory. |
| exts | External tool source directory. |
| default_exec_mode | Default backend: conda, docker, or apptainer. |

Example:

project_settings:
  root_dir: "."
  results_dir: "${root_dir}/outputs/raw_results"
  refs: "${root_dir}/refs"
  exts: "${root_dir}/external_tools"
  default_exec_mode: "conda"
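The `${root_dir}` placeholders in the example suggest that the other paths are expanded relative to root_dir. The exact expansion rules are not spelled out on this page, so the following is only a minimal sketch of one plausible reading, using Python's standard `string.Template`. The function name `expand_settings` is hypothetical and is not part of ST-CNVBench.

```python
from string import Template


def expand_settings(settings: dict) -> dict:
    """Expand ${root_dir} in every string value.

    Illustrative assumption only; this is not ST-CNVBench code.
    """
    root = settings["root_dir"]
    expanded = {}
    for key, value in settings.items():
        if isinstance(value, str):
            # safe_substitute leaves unknown placeholders untouched
            # instead of raising KeyError.
            expanded[key] = Template(value).safe_substitute(root_dir=root)
        else:
            expanded[key] = value
    return expanded


project_settings = {
    "root_dir": ".",
    "results_dir": "${root_dir}/outputs/raw_results",
    "refs": "${root_dir}/refs",
    "exts": "${root_dir}/external_tools",
    "default_exec_mode": "conda",
}

resolved = expand_settings(project_settings)
print(resolved["results_dir"])  # ./outputs/raw_results
```

Under this reading, changing root_dir alone would relocate results_dir, refs, and exts together.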

Shared Model Fields

Each model section is keyed by the public model name in the wrapper registry.

| Field | Meaning |
| --- | --- |
| enabled | Whether the wrapper is run by default. |
| model_name | Wrapper name expected by ST-CNVBench. Keep this aligned with the section name. |
| env_name | Conda environment used when exec_mode is conda. |
| docker_image | Docker image used when exec_mode is docker. |
| apptainer_sif | SIF image path used when exec_mode is apptainer. |
| per_dataset | Optional dataset-specific overrides. |
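The three backend fields (env_name, docker_image, apptainer_sif) are each tied to one exec_mode value, and --exec-mode on the command line overrides default_exec_mode. The sketch below illustrates that relationship. The function names and the example values (`copykat_env`, `example/copykat:latest`) are hypothetical, and the precedence rule is an assumption based on the description above rather than the actual ST-CNVBench implementation.

```python
from typing import Optional

# Which model field is consulted for each backend, per the field table.
RUNTIME_FIELD = {
    "conda": "env_name",
    "docker": "docker_image",
    "apptainer": "apptainer_sif",
}


def resolve_exec_mode(cli_exec_mode: Optional[str], default_exec_mode: str) -> str:
    """Assumed precedence: a CLI value wins over default_exec_mode."""
    mode = cli_exec_mode or default_exec_mode
    if mode not in RUNTIME_FIELD:
        raise ValueError(f"unknown exec mode: {mode!r}")
    return mode


def runtime_target(model_cfg: dict, mode: str) -> Optional[str]:
    """Return the environment or image configured for the chosen backend."""
    return model_cfg.get(RUNTIME_FIELD[mode])


# Hypothetical model section; the values are placeholders.
copykat = {
    "enabled": True,
    "model_name": "CopyKAT",
    "env_name": "copykat_env",
    "docker_image": "example/copykat:latest",
}

mode = resolve_exec_mode(None, "conda")
print(mode, runtime_target(copykat, mode))        # conda copykat_env
mode = resolve_exec_mode("docker", "conda")
print(mode, runtime_target(copykat, mode))        # docker example/copykat:latest
```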

Disable a model globally:

CopyKAT:
  enabled: false

Override selected parameters for one dataset:

STARCH:
  enabled: true
  n_clusters: 3
  per_dataset:
    P6_vis_rep1:
      n_clusters: 2

Disable one model for one dataset:

InferCNV:
  enabled: true
  per_dataset:
    P6_vis_rep1:
      enabled: false
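The per_dataset examples above can be read as a simple layered lookup: model-level values apply unless the dataset has its own entry. The following is a minimal sketch of that reading. `resolve_model_config` and the dataset name `other_dataset` are hypothetical, and a shallow merge is assumed; the real resolution logic may differ.

```python
def resolve_model_config(model_cfg: dict, prep_id: str) -> dict:
    """Return the settings for one dataset, with per_dataset values taking precedence."""
    # Start from the model-level settings, excluding the override table itself.
    resolved = {k: v for k, v in model_cfg.items() if k != "per_dataset"}
    # Apply the overrides for this dataset, if any (shallow merge, assumed).
    overrides = (model_cfg.get("per_dataset") or {}).get(prep_id) or {}
    resolved.update(overrides)
    return resolved


starch = {
    "enabled": True,
    "n_clusters": 3,
    "per_dataset": {"P6_vis_rep1": {"n_clusters": 2}},
}
infercnv = {
    "enabled": True,
    "per_dataset": {"P6_vis_rep1": {"enabled": False}},
}

print(resolve_model_config(starch, "P6_vis_rep1")["n_clusters"])    # 2
print(resolve_model_config(starch, "other_dataset")["n_clusters"])  # 3
print(resolve_model_config(infercnv, "P6_vis_rep1")["enabled"])     # False
```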

Reference-Normal Mode

Reference-normal behavior is controlled by ref_norm in data.yaml, not by a separate model-level flag.

Use reference spots:

ref_norm: true
tumor_normal_mode: subset
raw:
  tumor_normal: /path/to/metadata_tumor_normal.tsv

Run without provided reference spots:

ref_norm: false
tumor_normal_mode: de_novo
raw:
  tumor_normal: null

Current wrapper behavior when raw.tumor_normal is not provided:

| Status | Models |
| --- | --- |
| Supported | CalicoST, CopyKAT, InferCNV, SCEVAN, Numbat, STARCH, Clonalscope_NoWGS, Clonalscope_WGS |
| Not supported | Xclone |
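Because Xclone is the only wrapper listed as unsupported without raw.tumor_normal, it can be useful to filter the model list before launching a run. The helper below is a user-side sketch of such a pre-check. `runnable_models` is a hypothetical name, and how ST-CNVBench itself reacts to an unsupported combination is not specified here.

```python
from typing import List, Optional

# Wrappers that, per the support table, need raw.tumor_normal to be provided.
REQUIRES_TUMOR_NORMAL = {"Xclone"}


def runnable_models(models: List[str], tumor_normal_path: Optional[str]) -> List[str]:
    """Drop wrappers that cannot run when no reference spots are provided."""
    if tumor_normal_path:
        return list(models)
    return [m for m in models if m not in REQUIRES_TUMOR_NORMAL]


requested = ["CopyKAT", "Xclone", "STARCH"]
print(runnable_models(requested, None))
# ['CopyKAT', 'STARCH']
print(runnable_models(requested, "/path/to/metadata_tumor_normal.tsv"))
# ['CopyKAT', 'Xclone', 'STARCH']
```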

Key Parameters By Model

| Model | Key fields |
| --- | --- |
| CalicoST | calicost_dir, eagle_dir, region_vcf, phasing_panel, use_tumor_purity, n_clones, n_threads, UMItag, cellTAG |
| CopyKAT | genome, win_size, ks_cut, distance, n_cores, n_clones |
| InferCNV | gene_order_file, cutoff, n_threads, k_obs_groups |
| Clonalscope_NoWGS | gene_coords_file, aux_data_dir, mincell |
| Clonalscope_WGS | gene_coords_file, aux_data_dir, hmm_states, mincell |
| Numbat | pileup_script, eagle_path, genetic_map, snp_vcf_path, panel_dir, genome_version, n_threads, UMItag, cellTAG, n_clones |
| Xclone | snp_vcf, gene_region, eagle_path, genetic_map, panel_dir, n_threads, UMItag, cellTAG, minCOUNT, minMAF, n_clusters |
| SCEVAN | n_threads |
| STARCH | gene_mapping_file, n_clusters, beta_spot, platform, returnnormal |

Output Layout

Each enabled wrapper writes to:

<results_dir>/<dataset_id>/<model_name>/

The wrapper also writes <model_name>_run.log in the same directory. If execution fails, ST-CNVBench raises an error and reports the log path plus the last log lines.
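When inspecting results or debugging a failed run by hand, the log location follows directly from the layout above. The sketch below builds that path and reads the last lines of a log. Both helper names are hypothetical, and using `P6_vis_rep1` as the dataset_id is an assumption carried over from the earlier command examples.

```python
from collections import deque
from pathlib import Path
from typing import List


def run_log_path(results_dir: str, dataset_id: str, model_name: str) -> Path:
    """Build <results_dir>/<dataset_id>/<model_name>/<model_name>_run.log."""
    return Path(results_dir) / dataset_id / model_name / f"{model_name}_run.log"


def tail(path: Path, n: int = 20) -> List[str]:
    """Return the last n lines of a text file without loading it all into memory."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        # A bounded deque keeps only the most recent n lines while iterating.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


log = run_log_path("outputs/raw_results", "P6_vis_rep1", "CopyKAT")
print(log.as_posix())
# outputs/raw_results/P6_vis_rep1/CopyKAT/CopyKAT_run.log

if log.exists():
    print("\n".join(tail(log)))
```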