Use Your Own Dataset

This tutorial outlines the minimum path for running ST-CNVBench on your own dataset.

1. Decide The Raw Input Layout

The public prep step currently supports:

SpaceRanger
STpipeline

See Dataset Preparation for the exact input contract.

2. Create `data.yaml`

Start from:

configs/templates/data.template.yaml

Fill in:

dataset identity fields such as dataset_id, platform, genome, and species
raw.* input paths
output.root
GT paths only for the evaluation tasks you actually plan to run

3. Prepare Runtime Requirements

Before running models, make sure the required runtime and references are available.

base installation: Installation
external tools: External Tools And Runtime Notes
reference data: Reference Data

4. Run Step By Step

Start with prep only:

st-cnvbench --steps prep --data-config path/to/data.yaml

Then run selected methods:

st-cnvbench --steps run \
  --data-config path/to/data.yaml \
  --model-config path/to/models.yaml \
  --exec-mode conda \
  --models CopyKAT InferCNV

Then run only the evaluation tasks supported by your GT files:

st-cnvbench --steps eval \
  --data-config path/to/data.yaml \
  --eval-config path/to/eval.yaml \
  --models CopyKAT InferCNV \
  --eval-tasks cnv_profile

5. Keep The First Run Small

For a first pass, it is usually better to:

start with one dataset
enable one or two methods
run one evaluation task
verify outputs before scaling up

Try Next

For the packaged cSCC demo, go to Quickstart Demo And Expected Outputs
For the CNV profile task example, go to CNV Profile Task Example
For the tumor-normal task example, go to Tumor-Normal Classification Task Example
For the subclone task example, go to Subclone Identification Task Example
To configure or add a method, go to Use Your Own Model