Evaluation Guide
This page describes the public eval.yaml contract, available evaluation tasks, and the ground-truth files required by each task.
Evaluation assumes that model outputs already exist under project_settings.results_dir. See model_run.md for model-running configuration.
Run Command
Run one evaluation task for selected models:
st-cnvbench --steps eval \
--data-config configs/examples/cscc_demo/data.yaml \
--eval-config configs/examples/cscc_demo/eval.yaml \
--models CopyKAT \
--eval-tasks cnv_profile
Run evaluation for selected datasets, models, and tasks:
st-cnvbench --steps eval \
--data-config data.yaml \
--eval-config eval.yaml \
--prep-ids sample_1 sample_2 \
--models InferCNV CopyKAT STARCH \
--eval-tasks cnv_profile tumor_normal
If --eval-tasks is omitted, the controller attempts all registered tasks. For public examples, specify only tasks whose required ground-truth (GT) files are present.
Global Settings
project_settings defines where raw model outputs are read and where evaluation outputs are written.
| Field | Meaning |
|---|---|
| root_dir | Project root used to resolve relative paths. |
| results_dir | Raw model result root produced by --steps run. |
| eval_dir | Evaluation output root. |
global_params provides shared genomic settings.
| Field | Meaning |
|---|---|
| bin_size | Genomic bin size used by CNV profile and clone-level mapping tasks. |
| genome_version | Reference genome label, currently used by clonal mapping utilities. |
| gene_annot_path | Gene annotation table used by expression-to-genomic-bin conversion. |
Example:
project_settings:
  root_dir: "."
  results_dir: "${root_dir}/outputs/raw_results"
  eval_dir: "${root_dir}/outputs/evaluation"
global_params:
  bin_size: 100000
  genome_version: "hg38"
  gene_annot_path: "${root_dir}/refs/hg38_genome_info/hg38_genes_annot.txt"
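The `${root_dir}` placeholders in the example can be illustrated with a minimal sketch. It assumes simple string substitution of `root_dir` into the other path fields; the tool's actual interpolation mechanism is not described on this page, and `resolve_paths` is a hypothetical helper, not part of st-cnvbench.

```python
from string import Template


def resolve_paths(settings):
    """Expand ${root_dir} in every string value (assumed interpolation rule)."""
    root = settings["root_dir"]
    return {
        key: Template(value).safe_substitute(root_dir=root)
        if isinstance(value, str)
        else value
        for key, value in settings.items()
    }


project_settings = {
    "root_dir": ".",
    "results_dir": "${root_dir}/outputs/raw_results",
    "eval_dir": "${root_dir}/outputs/evaluation",
}

resolved = resolve_paths(project_settings)
print(resolved["results_dir"])  # ./outputs/raw_results
print(resolved["eval_dir"])     # ./outputs/evaluation
```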
Model Loader Settings
eval_list maps each model result directory to one or more evaluation loaders.
eval_list:
  InferCNV:
    eval_name: ["InferCNV_expr", "InferCNV_cnv"]
  CopyKAT:
    eval_name: ["CopyKAT"]
The key, for example InferCNV, must match the model result directory under:
<results_dir>/<dataset_id>/<model_name>/
The eval_name values select loader adapters for the output format produced by that model.
| Model key | Loader names |
|---|---|
| InferCNV | InferCNV_expr, InferCNV_cnv |
| CopyKAT | CopyKAT |
| SCEVAN | SCEVAN_expr, SCEVAN_cnv |
| Clonalscope_WGS | Clonalscope_WGS |
| Clonalscope_NoWGS | Clonalscope_NoWGS |
| Numbat | Numbat_expr, Numbat_cnv |
| CalicoST | CalicoST |
| STARCH | STARCH |
| Xclone | Xclone_expr, Xclone_cnv |
Use the loader that matches the model output type you want to evaluate. Expression-derived loaders usually need gene_annot_path to map genes to genomic bins.
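Because each eval_list key must match a directory under `<results_dir>/<dataset_id>/`, it can help to check the layout before running evaluation. The following is a minimal sketch; `find_missing_model_dirs` is a hypothetical helper and the temporary directory only stands in for a real results root.

```python
import tempfile
from pathlib import Path


def find_missing_model_dirs(results_dir, dataset_id, eval_list):
    """Return eval_list keys with no directory under <results_dir>/<dataset_id>/."""
    base = Path(results_dir) / dataset_id
    return [name for name in eval_list if not (base / name).is_dir()]


eval_list = {
    "InferCNV": {"eval_name": ["InferCNV_expr", "InferCNV_cnv"]},
    "CopyKAT": {"eval_name": ["CopyKAT"]},
}

# Build a throwaway results tree that contains only CopyKAT output.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sample_1" / "CopyKAT").mkdir(parents=True)
    missing = find_missing_model_dirs(tmp, "sample_1", eval_list)

print(missing)  # ['InferCNV']
```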
Evaluation Tasks
Available task names for --eval-tasks:
| Task | Purpose | Required GT or inputs |
|---|---|---|
| efficiency | Runtime and memory summary. | No biological GT. Requires conda-mode .perf files generated during model execution. |
| resolution | CNV resolution comparison across model outputs. | No GT. Requires model CNV outputs and global_params.gene_annot_path. |
| tumor_normal | Tumor/normal prediction evaluation. | raw.tumor_normal_gt; subset mode also requires raw.tumor_normal. |
| cnv_profile | CNV profile concordance against sample-level CNV GT. | raw.cnv_gt as a FACETS/VCF-like segment file. |
| subclone_detection_in_slice | Spot-level subclone assignment and clone-profile matching within a slice. | raw.subclone_gt; clone-profile metrics also need clone-level CNV profiles from raw.cnv_gt. |
| subclone_detection_organ | Organ-level subclone assignment across slices or merged samples. | raw.subclone_gt. |
| clonal_evolution | Clone-level CNV tree and spatial phylogeography plots. | Model clone labels/profiles and spatial coordinates; no separate GT tree is currently required. |
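To decide which task names to pass to --eval-tasks for a given dataset, the GT requirements in the table can be encoded as a lookup. This is a sketch only: `REQUIRED_GT` and `runnable_tasks` are hypothetical names, the `raw` dict stands in for the `raw` section of data.yaml, and non-GT inputs such as .perf files or gene_annot_path are not checked.

```python
# GT keys under raw.* that each task needs, transcribed from the table above.
REQUIRED_GT = {
    "efficiency": [],
    "resolution": [],
    "tumor_normal": ["tumor_normal_gt"],
    "cnv_profile": ["cnv_gt"],
    "subclone_detection_in_slice": ["subclone_gt"],
    "subclone_detection_organ": ["subclone_gt"],
    "clonal_evolution": [],
}


def runnable_tasks(raw):
    """Return the tasks whose required GT keys are all set in the raw section."""
    return [
        task
        for task, keys in REQUIRED_GT.items()
        if all(raw.get(key) for key in keys)
    ]


# Hypothetical dataset that only provides a CNV segment file.
tasks = runnable_tasks({"cnv_gt": "path/to/segments"})
print(tasks)
# ['efficiency', 'resolution', 'cnv_profile', 'clonal_evolution']
```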
GT Files In data.yaml
Evaluation GT paths are defined in the dataset config, not in eval.yaml.
raw.tumor_normal_gt
Used only by tumor_normal evaluation.
Expected format: tab-delimited spot labels with two or four columns. The first two columns are interpreted as barcode and label.
Barcode tumor_normal
AAAC... tumor
AAAG... normal
Accepted labels are tumor and normal.
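A file in this format can be checked before evaluation with a short script. The sketch below assumes a header row, as in the example above, and ignores any columns beyond the first two; `read_tumor_normal_gt` is a hypothetical helper and the barcodes are made up for illustration.

```python
import csv
import io

ACCEPTED_LABELS = {"tumor", "normal"}


def read_tumor_normal_gt(handle):
    """Return {barcode: label} from a tab-delimited file with a header row."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header line (assumed present)
    labels = {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or short lines in this sketch
        barcode, label = row[0], row[1]  # extra columns are ignored
        if label not in ACCEPTED_LABELS:
            raise ValueError(f"unexpected label {label!r} for {barcode}")
        labels[barcode] = label
    return labels


# Hypothetical barcodes, for illustration only.
demo = io.StringIO("Barcode\ttumor_normal\nspot_a\ttumor\nspot_b\tnormal\n")
gt_labels = read_tumor_normal_gt(demo)
print(gt_labels)  # {'spot_a': 'tumor', 'spot_b': 'normal'}
```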
raw.tumor_normal
This is not GT. It is the reference-normal annotation used during model running and subset-mode evaluation.
In tumor_normal_mode: subset, evaluation removes the reference normal spots from the comparison set, so both raw.tumor_normal and raw.tumor_normal_gt are required.
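The subset-mode filtering can be sketched as a set difference on barcodes. This assumes that raw.tumor_normal reduces to a collection of reference-normal barcodes; its actual file layout is described in the model-running documentation, and `subset_comparison_set` is a hypothetical helper.

```python
def subset_comparison_set(gt_labels, reference_normal):
    """Drop reference-normal spots from the GT labels before scoring."""
    reference = set(reference_normal)
    return {
        barcode: label
        for barcode, label in gt_labels.items()
        if barcode not in reference
    }


# Hypothetical barcodes: spot_c was given to the models as a reference normal.
gt = {"spot_a": "tumor", "spot_b": "normal", "spot_c": "normal"}
compared = subset_comparison_set(gt, ["spot_c"])
print(sorted(compared))  # ['spot_a', 'spot_b']
```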
raw.cnv_gt For cnv_profile
For cnv_profile, raw.cnv_gt should point to a FACETS/VCF-like segment file.
Required content:
| Field | Meaning |
|---|---|
| #CHROM | Chromosome. |
| POS | Segment start. |
| INFO | Must include END; TCN_EM and LCN_EM are used when available. |
The evaluator derives:
| Derived field | Source |
|---|---|
| GT_Score | log2(TCN_EM / 2) with a small pseudocount for zero-copy events. |
| GT_Event | loss, neutral, or gain from total copy number. |
| LOH_Status | CN-LOH from TCN_EM == 2 and LCN_EM == 0. |
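The derivation can be sketched for a single INFO string. Several details here are assumptions rather than the evaluator's documented behavior: the pseudocount value, the way it replaces a zero copy number, the loss/gain thresholds around a total copy number of 2, and the use of `None` when a segment is not CN-LOH. `parse_info` and `derive_gt` are hypothetical helpers.

```python
import math

PSEUDOCOUNT = 0.01  # assumed value; the evaluator's pseudocount is not documented here


def parse_info(info):
    """Split a VCF-style INFO string ("KEY=VALUE;KEY=VALUE") into a dict."""
    return dict(item.split("=", 1) for item in info.split(";") if "=" in item)


def derive_gt(info):
    """Derive END, GT_Score, GT_Event and LOH_Status from one INFO string."""
    fields = parse_info(info)
    tcn = int(fields["TCN_EM"])
    lcn_raw = fields.get("LCN_EM", "")
    lcn = int(lcn_raw) if lcn_raw.isdigit() else None  # LCN_EM may be missing
    # Replace a zero copy number with the pseudocount so log2 stays finite.
    score = math.log2((tcn if tcn > 0 else PSEUDOCOUNT) / 2)
    event = "loss" if tcn < 2 else "gain" if tcn > 2 else "neutral"
    loh = "CN-LOH" if (tcn == 2 and lcn == 0) else None
    return {
        "END": int(fields["END"]),
        "GT_Score": score,
        "GT_Event": event,
        "LOH_Status": loh,
    }


gain = derive_gt("END=2000;TCN_EM=4;LCN_EM=1")
loh = derive_gt("END=5000;TCN_EM=2;LCN_EM=0")
print(gain["GT_Score"], gain["GT_Event"])  # 1.0 gain
print(loh["GT_Event"], loh["LOH_Status"])  # neutral CN-LOH
```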
raw.subclone_gt
Used by subclone detection tasks.
Expected format: tab-delimited spot-to-subclone labels with two columns.
Barcode subclone
AAAC... clone_1
AAAG... clone_2
The evaluator renames these columns internally to Barcodes and Label_preds.
Clone-Level CNV GT For Subclone Tasks
subclone_detection_in_slice can also compare predicted clone CNV profiles against GT clone CNV profiles.
Current loader expectation: a directory containing files named:
<clone_id>_GT_Profile.txt
Each profile file should contain genomic bins and a CNV score column such as CN_Score or CN_Score_Continuous.
This is separate from the FACETS/VCF-like file used by cnv_profile. If a dataset needs both sample-level cnv_profile GT and clone-level subclone CNV GT, make sure the dataset config used for each run points to the GT that run expects.
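The `<clone_id>_GT_Profile.txt` naming convention makes it straightforward to list which clones have a GT profile. A minimal sketch, using a temporary directory in place of a real GT directory; `find_clone_profiles` is a hypothetical helper, not the loader itself.

```python
import tempfile
from pathlib import Path

PROFILE_SUFFIX = "_GT_Profile.txt"


def find_clone_profiles(profile_dir):
    """Map clone_id -> path for every <clone_id>_GT_Profile.txt in a directory."""
    return {
        path.name[: -len(PROFILE_SUFFIX)]: path
        for path in sorted(Path(profile_dir).glob("*" + PROFILE_SUFFIX))
    }


# Throwaway directory with two profile files and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for clone_id in ("clone_1", "clone_2"):
        (Path(tmp) / f"{clone_id}{PROFILE_SUFFIX}").touch()
    (Path(tmp) / "notes.txt").touch()
    clone_ids = list(find_clone_profiles(tmp))

print(clone_ids)  # ['clone_1', 'clone_2']
```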
raw.beads_mapping
Only needed for Slide-DNA-seq style subclone evaluation when model predictions are made on pseudo-barcodes and must be mapped back to original bead barcodes.
Expected columns:
pseudo_barcode,original_barcode
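Mapping predictions back to bead barcodes can be sketched as follows. The sketch assumes a comma-separated file with a header row in which one pseudo-barcode may appear on several rows, one per original bead, and that each bead inherits its pseudo-barcode's label; the evaluator's actual mapping rule may differ. Both helpers and all barcodes are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_beads_mapping(handle):
    """Return {pseudo_barcode: [original_barcode, ...]} from a CSV with a header."""
    mapping = defaultdict(list)
    for row in csv.DictReader(handle):
        mapping[row["pseudo_barcode"]].append(row["original_barcode"])
    return mapping


def expand_predictions(preds, mapping):
    """Copy each pseudo-barcode label to every original bead it covers."""
    return {
        original: label
        for pseudo, label in preds.items()
        for original in mapping.get(pseudo, [])
    }


demo = io.StringIO(
    "pseudo_barcode,original_barcode\np1,bead_a\np1,bead_b\np2,bead_c\n"
)
mapping = load_beads_mapping(demo)
# p3 has no mapping rows, so its prediction is dropped in this sketch.
preds = {"p1": "clone_1", "p2": "clone_2", "p3": "clone_1"}
expanded = expand_predictions(preds, mapping)
print(expanded)
# {'bead_a': 'clone_1', 'bead_b': 'clone_1', 'bead_c': 'clone_2'}
```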
Spatial Inputs
Evaluation uses standardized spatial files from prep:
<output.root>/spatial/tissue_positions.csv
<output.root>/spatial/scalefactors_json.json
<output.root>/spatial/tissue_hires_image.png # optional
Spatial coherence mode is selected from platform:
| platform | Spatial mode |
|---|---|
| ST | KNN-based spatial coherence. |
| other values | Visium-style distance-based spatial coherence. |
Output Layout
Evaluation outputs are written under:
<eval_dir>/<dataset_id>/
Common task directories include:
computational_efficiency/
cnv_resolution/
tumor_normal/
cnv_profile/
subclone_detection/
GT/
Task outputs include formatted intermediate tables, metrics summaries, and plots. Missing required GT or missing model result directories are reported explicitly rather than silently substituted.