Skip to content

Evaluation Guide

This page describes the public eval.yaml contract, available evaluation tasks, and the ground-truth files required by each task.

Evaluation assumes that model outputs already exist under project_settings.results_dir. See model_run.md for model-running configuration.

Run Command

Run one evaluation task for selected models:

st-cnvbench --steps eval \
  --data-config configs/examples/cscc_demo/data.yaml \
  --eval-config configs/examples/cscc_demo/eval.yaml \
  --models CopyKAT \
  --eval-tasks cnv_profile

Run evaluation for selected datasets, models and tasks:

st-cnvbench --steps eval \
  --data-config data.yaml \
  --eval-config eval.yaml \
  --prep-ids sample_1 sample_2 \
  --models InferCNV CopyKAT STARCH \
  --eval-tasks cnv_profile tumor_normal

If --eval-tasks is omitted, the controller attempts all registered tasks. For public examples, specify only tasks whose required GT files are present.

Global Settings

project_settings defines where raw model outputs are read and where evaluation outputs are written.

Field Meaning
root_dir Project root used to resolve relative paths.
results_dir Raw model result root produced by --steps run.
eval_dir Evaluation output root.

global_params provides shared genomic settings.

Field Meaning
bin_size Genomic bin size used by CNV profile and clone-level mapping tasks.
genome_version Reference genome label, currently used by clonal mapping utilities.
gene_annot_path Gene annotation table used by expression-to-genomic-bin conversion.

Example:

project_settings:
  root_dir: "."
  results_dir: "${root_dir}/outputs/raw_results"
  eval_dir: "${root_dir}/outputs/evaluation"

global_params:
  bin_size: 100000
  genome_version: "hg38"
  gene_annot_path: "${root_dir}/refs/hg38_genome_info/hg38_genes_annot.txt"

Model Loader Settings

eval_list maps each model result directory to one or more evaluation loaders.

eval_list:
  InferCNV:
    eval_name: ["InferCNV_expr", "InferCNV_cnv"]
  CopyKAT:
    eval_name: ["CopyKAT"]

The key, for example InferCNV, must match the model result directory under:

<results_dir>/<dataset_id>/<model_name>/

The eval_name values select loader adapters for the output format produced by that model.

Model key Loader names
InferCNV InferCNV_expr, InferCNV_cnv
CopyKAT CopyKAT
SCEVAN SCEVAN_expr, SCEVAN_cnv
Clonalscope_WGS Clonalscope_WGS
Clonalscope_NoWGS Clonalscope_NoWGS
Numbat Numbat_expr, Numbat_cnv
CalicoST CalicoST
STARCH STARCH
Xclone Xclone_expr, Xclone_cnv

Use the loader that matches the model output type you want to evaluate. Expression-derived loaders usually need gene_annot_path to map genes to genomic bins.

Evaluation Tasks

Available task names for --eval-tasks:

Task Purpose Required GT or inputs
efficiency Runtime and memory summary. No biological GT. Requires conda-mode .perf files generated during model execution.
resolution CNV resolution comparison across model outputs. No GT. Requires model CNV outputs and global_params.gene_annot_path.
tumor_normal Tumor/normal prediction evaluation. raw.tumor_normal_gt; subset mode also requires raw.tumor_normal.
cnv_profile CNV profile concordance against sample-level CNV GT. raw.cnv_gt as a FACETS/VCF-like segment file.
subclone_detection_in_slice Spot-level subclone assignment and clone-profile matching within a slice. raw.subclone_gt; clone-profile metrics also need clone-level CNV profiles from raw.cnv_gt.
subclone_detection_organ Organ-level subclone assignment across slices or merged samples. raw.subclone_gt.
clonal_evolution Clone-level CNV tree and spatial phylogeography plots. Model clone labels/profiles and spatial coordinates; no separate GT tree is currently required.

GT Files In data.yaml

Evaluation GT paths are defined in the dataset config, not in eval.yaml.

raw.tumor_normal_gt

Used only by tumor_normal evaluation.

Expected format: tab-delimited spot labels with two or four columns. The first two columns are interpreted as barcode and label.

Barcode  tumor_normal
AAAC...  tumor
AAAG...  normal

Accepted labels are tumor and normal.

raw.tumor_normal

This is not GT. It is the reference-normal annotation used during model running and subset-mode evaluation.

In tumor_normal_mode: subset, evaluation removes the reference normal spots from the comparison set, so both raw.tumor_normal and raw.tumor_normal_gt are required.

raw.cnv_gt For cnv_profile

For cnv_profile, raw.cnv_gt should point to a FACETS/VCF-like segment file.

Required content:

Field Meaning
#CHROM Chromosome.
POS Segment start.
INFO Must include END; TCN_EM and LCN_EM are used when available.

The evaluator derives:

Derived field Source
GT_Score log2(TCN_EM / 2) with a small pseudocount for zero-copy events.
GT_Event loss, neutral, or gain from total copy number.
LOH_Status CN-LOH from TCN_EM == 2 and LCN_EM == 0.

raw.subclone_gt

Used by subclone detection tasks.

Expected format: tab-delimited spot-to-subclone labels with two columns.

Barcode  subclone
AAAC...  clone_1
AAAG...  clone_2

The evaluator renames these columns internally to Barcodes and Label_preds.

Clone-Level CNV GT For Subclone Tasks

subclone_detection_in_slice can also compare predicted clone CNV profiles against GT clone CNV profiles.

Current loader expectation: a directory containing files named:

<clone_id>_GT_Profile.txt

Each profile file should contain genomic bins and a CNV score column such as CN_Score or CN_Score_Continuous.

This is separate from the FACETS/VCF-like file used by cnv_profile. If a dataset needs both sample-level cnv_profile GT and clone-level subclone CNV GT, keep the paths clear in the dataset config used for that run.

raw.beads_mapping

Only needed for Slide-DNA-seq style subclone evaluation when model predictions are made on pseudo-barcodes and must be mapped back to original bead barcodes.

Expected columns:

pseudo_barcode,original_barcode

Spatial Inputs

Evaluation uses standardized spatial files from prep:

<output.root>/spatial/tissue_positions.csv
<output.root>/spatial/scalefactors_json.json
<output.root>/spatial/tissue_hires_image.png  # optional

Spatial coherence mode is selected from platform:

platform Spatial mode
ST KNN-based spatial coherence.
other values Visium-style distance-based spatial coherence.

Output Layout

Evaluation outputs are written under:

<eval_dir>/<dataset_id>/

Common task directories include:

computational_efficiency/
cnv_resolution/
tumor_normal/
cnv_profile/
subclone_detection/
GT/

Task outputs include formatted intermediate tables, metrics summaries, and plots. Missing required GT or missing model result directories are reported explicitly rather than silently substituted.