Reference Data
ST-CNVBench uses two kinds of reference data.
Bundled Small References
These files are already included in git under refs/hg38_genome_info/:
hg38_genes_simple.txt
hg38_genes_annot.txt
cytoBand.txt.gz
hg38.list
Population Phasing Bundle
Large population phasing resources are required only for allele-aware wrappers:
CalicoST
Numbat
Xclone
Download the bundle from:
After download, extract the bundle under:
refs/
└── population_phasing/
Required Files
The extracted population_phasing/ directory must contain:
1000G_hg38/
genome1K.phase3.SNP_AF5e2.chr1toX.hg38.ensemble_style.sorted.vcf.gz
genome1K.phase3.SNP_AF5e2.chr1toX.hg38.ensemble_style.sorted.vcf.gz.tbi
Expected layout:
refs/
└── population_phasing/
    ├── 1000G_hg38/
    ├── genome1K.phase3.SNP_AF5e2.chr1toX.hg38.ensemble_style.sorted.vcf.gz
    └── genome1K.phase3.SNP_AF5e2.chr1toX.hg38.ensemble_style.sorted.vcf.gz.tbi
Do not rename the files or the directory.
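Before running an allele-aware wrapper, it may help to confirm that the extracted bundle matches the expected layout. The following is a minimal sketch, not part of ST-CNVBench: the helper name missing_phasing_entries is hypothetical, and it assumes the VCF and its index sit directly under population_phasing/ next to the 1000G_hg38/ directory, and that the script is run from the repository root.

```python
from pathlib import Path

# File name copied from the required-files list; do not alter it.
VCF = "genome1K.phase3.SNP_AF5e2.chr1toX.hg38.ensemble_style.sorted.vcf.gz"

# Entries assumed to live directly under refs/population_phasing/.
REQUIRED = ("1000G_hg38", VCF, VCF + ".tbi")


def missing_phasing_entries(refs_dir="refs"):
    """Return the required paths that are absent (hypothetical helper)."""
    base = Path(refs_dir) / "population_phasing"
    missing = []
    for name in REQUIRED:
        path = base / name
        # 1000G_hg38 must be a directory; the other two must be regular files.
        present = path.is_dir() if name == "1000G_hg38" else path.is_file()
        if not present:
            missing.append(path)
    return missing


if __name__ == "__main__":
    absent = missing_phasing_entries()
    if absent:
        print("Missing from the population phasing bundle:")
        for path in absent:
            print(f"  {path}")
    else:
        print("Population phasing bundle layout looks complete.")
```

The check only tests that the paths exist; it does not validate the VCF contents or the index.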
Practical Rule
If you are only running the shipped demo with CopyKAT, you do not need the population phasing bundle.