ChatGPT ARTICLE 30 June 2026

Inside GeneBench-Pro


Case studies

These 10 case studies showcase representative questions from GeneBench-Pro. Each case study includes the original prompt, datasets, and supporting materials. For an overview of the benchmark and key findings, see the announcement blog post.

Note: File previews show excerpts from the full datasets.

Case study 1

Somatic oncology: Structural variant-guided tumor therapy benefit-risk decision

Estimate whether a synthetic TXR1-directed inhibitor has positive clinical utility in tumors whose target activation is driven by a structural variant. TXR1, TXR1i, DLR1, and star-allele labels are synthetic benchmark labels.

_The target subgroup has to be recovered from long-read, expression, tumor-quality, and pharmacogenomic evidence before benefit and toxicity can be interpreted as a treatment decision._

Files provided to the model

patient_id  analysis_set  age   sex  site  calendar_period  ecog  tumor_burden  prior_lines  prior_resistance  lineage_class  therapy_class  assessed16  benefit16  tox_stop_8wk  time_zero_day
MTB0001     1             73.8  M    S1    P2               2     0.787         3            1                 A              TXR1i          0                      1             0
MTB0002     1             55.2  M    S3    P1               1     2.637         0            1                 A              TXR1i          1           0          0             0
MTB0003     1             68.8  F    S4    P2               0     0.891         2            1                 A              TXR1i          1           1          1             0
MTB0004     1             82.8  F    S2    P2               2     4.101         0            0                 B              TXR1i          1           0          0             0
MTB0005     1             65.5  F    S1    P3               1     7.0           1            1                 A              TXR1i          1           0          0             0

Registry covariates, therapy, week-16 assessment, benefit, and early toxicity.
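As a rough illustration of how benefit and toxicity combine into one decision quantity, the Python sketch below computes a crude net benefit on the five preview rows. It is a minimal sketch, not the benchmark's reference analysis. It skips the subgroup recovery, ignores the registry covariates, and uses a hypothetical `tox_weight` to trade toxicity against benefit. Patients without a week-16 assessment count as non-responders. The sketch also assumes the empty field in row MTB0001 is `benefit16`, which is consistent with `assessed16 = 0`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patient:
    patient_id: str
    assessed16: int            # 1 if a week-16 assessment exists
    benefit16: Optional[int]   # None when the patient was not assessed at week 16
    tox_stop_8wk: int          # 1 if treatment stopped for toxicity by week 8


def crude_net_benefit(patients: List[Patient], tox_weight: float = 1.0) -> float:
    """Hypothetical utility: P(benefit at week 16) - tox_weight * P(toxicity stop by week 8).

    Unassessed patients count as non-responders (an intention-to-treat-style choice),
    so early toxicity stops cannot inflate the benefit rate by dropping out.
    """
    if not patients:
        raise ValueError("no patients")
    n = len(patients)
    benefit = sum(1 for p in patients if p.assessed16 == 1 and p.benefit16 == 1)
    tox = sum(p.tox_stop_8wk for p in patients)
    return benefit / n - tox_weight * tox / n


# Week-16 and toxicity fields from the five preview rows (not the full cohort).
preview = [
    Patient("MTB0001", 0, None, 1),
    Patient("MTB0002", 1, 0, 0),
    Patient("MTB0003", 1, 1, 1),
    Patient("MTB0004", 1, 0, 0),
    Patient("MTB0005", 1, 0, 0),
]
print(round(crude_net_benefit(preview), 3))
```

Counting unassessed patients as non-responders is a deliberate choice. Restricting the benefit rate to assessed patients would silently drop exactly the patients who stopped early for toxicity.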

Case study 2

Functional genomics: CRISPR target validation: lncRNA transcript or genomic locus?

Decide whether an apparent lncRNA dependency is transcript-specific or driven by nearby-locus and neighbor-gene effects.

_Transcript-directed evidence has to survive controls for local DNA-locus perturbation, neighbor-gene repression, guide swaps, GC toxicity, and plate effects._

guide_id  nominal_target  chr   coord   strand  dist_lnc_tss_bp  dist_neighbor_tss_bp  guide_gc_frac
g001      LINC473         chr7  100014  +       14               30                    0.624
g002      LINC473         chr7  100035  -       43               67                    0.584
g003      LINC473         chr7  100051  +       116              56                    0.622
g004      LINC473         chr7  100066  -       59               66                    0.617
g005      LINC473         chr7  100088  +       74               77                    0.715

Guide coordinates, targets, distances, and GC features.
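One way to pre-screen guides before any modeling is to flag those whose design alone could mimic a transcript-specific effect. The sketch below is hypothetical. The 0.70 GC cutoff and the rule "at least as close to the neighbor TSS as to the lncRNA TSS" are illustrative assumptions, not thresholds taken from GeneBench-Pro. The sketch also does not handle guide swaps or plate effects.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Guide:
    guide_id: str
    dist_lnc_tss_bp: int
    dist_neighbor_tss_bp: int
    guide_gc_frac: float


def flag_guides(guides: List[Guide], max_gc: float = 0.70) -> Dict[str, List[str]]:
    """Return {guide_id: [reasons]} for guides that need extra controls (thresholds hypothetical)."""
    flags: Dict[str, List[str]] = {}
    for g in guides:
        reasons = []
        # A guide at least as close to the neighbor TSS cannot separate lncRNA loss
        # from neighbor-gene repression on distance alone.
        if g.dist_neighbor_tss_bp <= g.dist_lnc_tss_bp:
            reasons.append("neighbor_proximity")
        # The task lists GC toxicity as a confound; flag guides above the assumed cutoff.
        if g.guide_gc_frac > max_gc:
            reasons.append("high_gc")
        if reasons:
            flags[g.guide_id] = reasons
    return flags


preview = [
    Guide("g001", 14, 30, 0.624),
    Guide("g002", 43, 67, 0.584),
    Guide("g003", 116, 56, 0.622),
    Guide("g004", 59, 66, 0.617),
    Guide("g005", 74, 77, 0.715),
]
print(flag_guides(preview))
```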

Case study 3

Statistical genetics: Prioritizing protein drug targets in a linked genetic locus

Estimate direct disease effects for two nearby proteins using cis multivariable Mendelian randomization (cis-MVMR) while handling assay scale, allele orientation, winner's curse, linkage disequilibrium (LD), and residual local pleiotropy.

_The two proteins share a correlated locus. The analysis has to move from marginal associations to conditional, LD-aware disease effects on a common protein scale._

snp       pos_bp    effect_allele  other_allele  maf      beta                  se                     pval
rs200000  50000000  A              C             0.42215  0.006438668310706808  0.003267330091203412   0.04876727714241972
rs200001  50010126  A              C             0.05709  0.011008993337581301  0.006955239208750407   0.11345916603941006
rs200002  50020253  G              T             0.09021  0.009922014757116319  0.005633023027015518   0.07817048492026045
rs200003  50030379  G              T             0.48399  0.010569215614164573  0.0032291419740237445  0.0010638520681901973
rs200004  50040506  A              G             0.37703  0.007036551378238654  0.0033297592321269802  0.034580976884336506

Screening-stage protein association summaries for PROTA.
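The core cis-MVMR estimate can be written as a generalized least squares (GLS) fit of the SNP-outcome associations on the two SNP-protein associations. The fit is weighted by an LD-aware covariance Σ = diag(se_Y) R diag(se_Y), giving θ̂ = (BᵀΣ⁻¹B)⁻¹BᵀΣ⁻¹β_Y. The sketch below assumes the inputs are already on a common protein scale and free of winner's curse. It shows only this estimator, with fixed-effect standard errors, plus a simple allele-flip helper. Function names are illustrative, and residual pleiotropy is not modeled.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B with Gauss-Jordan elimination and partial pivoting."""
    n, k = len(a), len(b[0])
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-15:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:n + k] for row in m]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def cis_mvmr_gls(bx: Matrix, by: List[float], se_y: List[float],
                 ld: Matrix) -> Tuple[List[float], List[float]]:
    """GLS estimate theta = (B' S^-1 B)^-1 B' S^-1 beta_Y with S = diag(se_Y) R diag(se_Y).

    bx: n_snps x n_proteins SNP-protein effects (harmonized, common scale)
    by: SNP-outcome effects; se_y: their standard errors; ld: SNP correlation matrix R.
    Returns (theta, fixed-effect standard errors).
    """
    n = len(by)
    sigma = [[se_y[i] * ld[i][j] * se_y[j] for j in range(n)] for i in range(n)]
    sinv_b = solve(sigma, bx)                   # S^-1 B
    sinv_y = solve(sigma, [[v] for v in by])    # S^-1 beta_Y
    bt = transpose(bx)
    info = matmul(bt, sinv_b)                   # B' S^-1 B
    score = matmul(bt, sinv_y)                  # B' S^-1 beta_Y
    p = len(info)
    theta = [row[0] for row in solve(info, score)]
    identity = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    cov = solve(info, identity)                 # (B' S^-1 B)^-1
    return theta, [math.sqrt(cov[i][i]) for i in range(p)]


def harmonize(beta: float, effect: str, other: str, ref_effect: str, ref_other: str) -> float:
    """Re-express beta relative to ref_effect; palindromic SNPs (A/T, C/G) are rejected."""
    if {effect, other} in ({"A", "T"}, {"C", "G"}):
        raise ValueError("palindromic SNP: orientation is ambiguous without allele frequencies")
    if (effect, other) == (ref_effect, ref_other):
        return beta
    if (effect, other) == (ref_other, ref_effect):
        return -beta
    raise ValueError("alleles do not match the reference")
```

In practice, every SNP's protein and outcome effects would first be passed through `harmonize` against one reference orientation. Only then would the matrices go to `cis_mvmr_gls`.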

Case study 4

Clinical genomics / carrier screening: DRX1 carrier-screening residual risk under CNV and pseudogene calibration

Estimate ancestry-specific carrier frequencies, residual risk after a negative screen, partner carrier frequency, and affected-conceptus risk from carrier-screening assay data.

_The residual-risk estimate depends on pseudogene-aware carrier calls, founder-haplotype collapse, ancestry-specific assay calibration, and standardization from tested partners back to the full partner roster._

sample_id   collection  ancestry  family_history_tier
S_EUR_0001  screening   EUR       0
S_EUR_0002  screening   EUR       0
S_EUR_0003  screening   EUR       0
S_EUR_0004  screening   EUR       0
S_EUR_0005  screening   EUR       1

Screening-roster adults with ancestry and screening context.
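The quantities named in this case are linked by standard Bayesian carrier-screening arithmetic. The sketch below works under simplifying assumptions:

- an autosomal recessive condition;
- one ancestry-specific carrier frequency p and assay detection rate d;
- no false-positive calls;
- independent partners.

After a negative screen, the residual carrier risk is p(1 − d)/(1 − pd). The affected-conceptus risk is the product of both partners' carrier probabilities times 1/4. The numbers in the example are hypothetical, not GeneBench-Pro values. The sketch does not cover pseudogene-aware calling, founder-haplotype collapse, or standardization over the partner roster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def carrier_frequency_by_ancestry(calls: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Crude carrier frequency per ancestry from (ancestry, is_carrier) calls."""
    n: Dict[str, int] = defaultdict(int)
    k: Dict[str, int] = defaultdict(int)
    for ancestry, is_carrier in calls:
        n[ancestry] += 1
        k[ancestry] += int(is_carrier)
    return {a: k[a] / n[a] for a in n}


def residual_carrier_risk(carrier_freq: float, detection_rate: float) -> float:
    """P(carrier | negative screen) = p(1 - d) / (1 - p d), by Bayes' rule."""
    p, d = carrier_freq, detection_rate
    return p * (1.0 - d) / (1.0 - p * d)


def affected_conceptus_risk(risk_a: float, risk_b: float) -> float:
    """Autosomal recessive: both parents must be carriers, then 1/4 of conceptuses are affected."""
    return risk_a * risk_b / 4.0


# Hypothetical inputs: the screened person tests negative, the partner is untested.
p, d = 0.04, 0.90
r = residual_carrier_risk(p, d)
print(f"residual carrier risk: {r:.5f}")
print(f"affected-conceptus risk: {affected_conceptus_risk(r, p):.6f}")
```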

Case study 5

Single-cell genomics: Activated-monocyte eQTL after ambient RNA correction

Estimate a genotype effect on activated-monocyte expression after removing ambient RNA and technical contamination from single-cell RNA-seq data.

_Ambient RNA affects both target expression and the marker panel used to call activation state, so correction has to occur before the eQTL model._

cell_id   donor  total_umi  HBB  IFI6  ISG15  LST1  CXCL10
D01_C001  D01    1113       7    3     4      83    5
D01_C002  D01    1103       6    3     3      112   10
D01_C003  D01    1141       9    8     12     63    9
D01_C004  D01    1250       7    60    43     2     17
D01_C005  D01    1045       9    1     2      51    15

Per-cell UMI counts for marker genes, contamination markers, and the target gene.
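The ordering constraint above (correct first, then call activation and fit the eQTL) can be illustrated with a simplified subtraction in the spirit of SoupX. The sketch estimates each cell's ambient fraction from a gene assumed to be absent from monocytes (HBB here, chosen as an assumption). It then subtracts the expected ambient counts for every gene. The ambient-profile values are hypothetical. A per-cell estimate from one gene is noisy, whereas SoupX itself pools information across cells. The eQTL step is reduced to an ordinary least-squares slope, for example of donor-level corrected expression on genotype dosage.

```python
from typing import Dict, List


def ambient_fraction(counts: Dict[str, int], total_umi: int, soup: Dict[str, float],
                     marker: str = "HBB") -> float:
    """Estimate a cell's ambient fraction rho from a gene assumed absent from the cell type."""
    expected_if_all_ambient = total_umi * soup[marker]
    return min(1.0, counts[marker] / expected_if_all_ambient)


def correct_cell(counts: Dict[str, int], total_umi: int, soup: Dict[str, float],
                 marker: str = "HBB") -> Dict[str, float]:
    """Subtract rho * total_umi * soup[g] from each gene's count, clipping at zero."""
    rho = ambient_fraction(counts, total_umi, soup, marker)
    return {g: max(0.0, c - rho * total_umi * soup[g]) for g, c in counts.items()}


def ols_slope(x: List[float], y: List[float]) -> float:
    """Least-squares slope of y on x (e.g. donor-level expression on genotype dosage)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


# Hypothetical ambient ("soup") profile: fraction of ambient UMIs contributed by each gene.
soup = {"HBB": 0.05, "IFI6": 0.002, "ISG15": 0.002, "LST1": 0.01, "CXCL10": 0.001}
cell = {"HBB": 7, "IFI6": 3, "ISG15": 4, "LST1": 83, "CXCL10": 5}  # D01_C001 from the preview
print(correct_cell(cell, 1113, soup))
```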

Case study 6

Structural genetics: Nested structural variant: expression support and clinical association

Estimate whether a nested structural subhaplotype inside an anonymous inversion-like locus has a calibrated clinical association and credible expression support.

_A nested copy-dosage signal can be confounded by the broader inversion orientation, so dosage calibration, expression support, and clinical modeling have to remain distinct._

sample_id  case  age    age_band  sex  pc1       pc2       pc3       ancestry_group  clinic_stratum  recruitment_stream
Q00012     1     50.45  50_64     0    -1.01514  -0.21032  -0.08849  EUR             tertiary        clinic
Q00028     0     57.39  50_64     0    -1.25987  -0.12498  0.2344    EUR             regional        registry
Q00029     1     68.4   65_plus   0    0.91598   0.62177   0.01891   AFR             tertiary        clinic
Q00030     1     74.07  65_plus   1    0.21125   -0.59634  -0.08197  EAS             community       registry
Q00032     1     82.82  65_plus   0    -1.12034  -0.24372  0.14665   EUR             community       clinic

Clinical and covariate data for the full cohort.
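Keeping the nested signal distinct from the broader inversion can be illustrated with a Mantel-Haenszel odds ratio. Carriers and non-carriers of the nested subhaplotype are compared within each inversion-orientation stratum, and the results are then pooled. This swaps in a simple stratified estimator for illustration. A calibrated analysis would more likely use a regression with the age, sex, PC, and stratum covariates from the cohort table. The 2×2 counts below are hypothetical.

```python
from typing import List, NamedTuple


class Stratum(NamedTuple):
    """2x2 counts within one inversion-orientation stratum."""
    carrier_cases: int        # a
    carrier_controls: int     # b
    noncarrier_cases: int     # c
    noncarrier_controls: int  # d


def mantel_haenszel_or(strata: List[Stratum]) -> float:
    """Pooled OR = sum(a*d/n) / sum(b*c/n) over strata."""
    num = den = 0.0
    for s in strata:
        n = s.carrier_cases + s.carrier_controls + s.noncarrier_cases + s.noncarrier_controls
        if n == 0:
            continue
        num += s.carrier_cases * s.noncarrier_controls / n
        den += s.carrier_controls * s.noncarrier_cases / n
    if den == 0.0:
        raise ValueError("odds ratio is undefined (zero denominator)")
    return num / den


def crude_or(strata: List[Stratum]) -> float:
    """Odds ratio after collapsing strata, which ignores inversion orientation."""
    a = sum(s.carrier_cases for s in strata)
    b = sum(s.carrier_controls for s in strata)
    c = sum(s.noncarrier_cases for s in strata)
    d = sum(s.noncarrier_controls for s in strata)
    return a * d / (b * c)


# Hypothetical counts: within each orientation the subhaplotype has OR = 1, but carriers
# are concentrated on the higher-risk orientation, so collapsing strata inflates the OR.
strata = [Stratum(40, 20, 8, 4), Stratum(4, 8, 20, 40)]
print(round(mantel_haenszel_or(strata), 3), round(crude_or(strata), 3))
```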

Case study 7

Regulatory genomics: Measuring chromatin loop strength after structural-variant and mapping artifact masking

Quantify a focal case-control Hi-C loop-strength difference after removing low-mappability and structural-variant artifacts from the expected-contact background.

_The target loop is defined at 20 kb resolution, but the expected-contact model is distorted unless low-mappability contacts and a case-only SV stripe are masked first._

bin_id  chrom  start   end     gc_content           mappability         re_sites
0       chr8   400000  420000  0.46199033821572594  0.9787574214704273  5
1       chr8   420000  440000  0.5044124208534677   0.8901084943498397  5
2       chr8   440000  460000  0.43218451584938194  0.9056879289326712  3
3       chr8   460000  480000  0.4733197282681218   0.9376529840664789  3
4       chr8   480000  500000  0.4444956062150748   0.8682565517981877  4

Target-resolution bin annotations.
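A distance-decay expected model with bin masking can be sketched as follows. The 0.9 mappability cutoff and the toy contact matrix are assumptions made for illustration. The real task also involves a case-only SV stripe, 20 kb bins, and a case-control comparison. The sketch reduces these to a masked set of bins and a log2 ratio of observed/expected values.

```python
import math
from typing import Dict, Iterable, List, Set

Matrix = List[List[float]]


def masked_bins(mappability: List[float], sv_bins: Iterable[int],
                min_mappability: float = 0.9) -> Set[int]:
    """Bins excluded from the expected model: low mappability or inside an SV stripe."""
    low = {i for i, m in enumerate(mappability) if m < min_mappability}
    return low | set(sv_bins)


def expected_by_distance(contacts: Matrix, masked: Set[int]) -> Dict[int, float]:
    """Mean contact count per diagonal offset, using only pairs of unmasked bins."""
    n = len(contacts)
    expected: Dict[int, float] = {}
    for d in range(n):
        vals = [contacts[i][i + d] for i in range(n - d)
                if i not in masked and i + d not in masked]
        if vals:
            expected[d] = sum(vals) / len(vals)
    return expected


def loop_oe(contacts: Matrix, i: int, j: int, masked: Set[int]) -> float:
    """Observed/expected ratio for the loop anchored at bins i and j."""
    expected = expected_by_distance(contacts, masked)
    return contacts[i][j] / expected[abs(j - i)]


def log2_loop_difference(case_oe: float, control_oe: float) -> float:
    """Case-control loop-strength difference on the log2 O/E scale."""
    return math.log2(case_oe / control_oe)


# Mappability of the five preview bins; bins 1 and 4 fall below the assumed 0.9 cutoff.
preview_mappability = [0.9787574214704273, 0.8901084943498397, 0.9056879289326712,
                       0.9376529840664789, 0.8682565517981877]
print(sorted(masked_bins(preview_mappability, sv_bins=[])))

# Toy 6-bin matrix: distance decay, a loop between bins 0 and 2, and an SV stripe on bin 5.
decay = [10.0, 4.0, 2.0, 1.0, 1.0, 1.0]
toy = [[decay[abs(i - j)] for j in range(6)] for i in range(6)]
toy[0][2] = toy[2][0] = 6.0
for k in range(5):
    toy[k][5] = toy[5][k] = 40.0
print(loop_oe(toy, 0, 2, masked={5}), loop_oe(toy, 0, 2, masked=set()))
```

Without masking, the stripe inflates the expected value at distance 2, and the loop appears depleted (O/E 0.48 rather than 1.8).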

Case study 8

Statistical genetics: Multi-parent QTL mapping with founder reconstruction

Map a chromosome-1 quantitative-trait locus in an eight-founder recombinant population by reconstructing founder ancestry before testing the phenotype association.

_The visible marker data are biallelic, but the biological signal is founder ancestry. A defensible analysis therefore has to reconstruct founder state, check marker orientation, and separate the QTL from a batch-aligned nuisance peak._

marker_id  chr  pos_cM
m2_065     2    59.762431265596575
m2_103     2    94.52656615104739
m2_107     2    98.18761427503033
m2_079     2    72.20130244108847
m1_054     1    49.907510212292195

Marker identifiers, chromosomes, and genetic-map positions.
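Each biallelic marker splits the eight founders into two groups. Founder state can therefore be recovered by intersecting the compatible founders across a short window of markers. This assumes no recombination inside the window and consistently oriented alleles. The sketch below is a deliberately crude stand-in for the hidden Markov model usually used for founder reconstruction. The founder allele matrix is hypothetical. A one-way ANOVA on called founder groups replaces a full association model, which would also need a batch covariate to separate the nuisance peak.

```python
from typing import Dict, List, Optional, Sequence, Set


def compatible_founders(founder_alleles: Sequence[int], observed: int) -> Set[int]:
    """Founders whose allele at this marker matches the observed (0/1) call."""
    return {f for f, allele in enumerate(founder_alleles) if allele == observed}


def call_founder(window_founder_alleles: Sequence[Sequence[int]],
                 window_observed: Sequence[int]) -> Optional[int]:
    """Intersect compatible founders across a marker window; None if not uniquely resolved.

    Assumes no recombination inside the window and consistently oriented allele codes.
    """
    candidates = set(range(len(window_founder_alleles[0])))
    for founder_alleles, observed in zip(window_founder_alleles, window_observed):
        candidates &= compatible_founders(founder_alleles, observed)
    return next(iter(candidates)) if len(candidates) == 1 else None


def anova_f(groups: Dict[int, List[float]]) -> float:
    """One-way ANOVA F statistic for a phenotype grouped by called founder."""
    values = [v for g in groups.values() for v in g]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = {key: sum(g) / len(g) for key, g in groups.items()}
    ss_between = sum(len(g) * (means[key] - grand) ** 2 for key, g in groups.items())
    ss_within = sum((v - means[key]) ** 2 for key, g in groups.items() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical founder alleles (rows = 3 adjacent markers, columns = founders 0..7).
founders = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
print(call_founder(founders, [1, 0, 1]))   # three markers resolve founder 5
print(call_founder(founders[:2], [1, 0]))  # two markers leave {4, 5}: unresolved
```

A marker whose 0/1 coding is flipped relative to the founder table would send calls to the wrong founder. That is why marker orientation has to be checked before this step.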

Case study 9

Population genetics: Parent-specific ancestry and recent admixture timing

Infer parent-specific ancestry proportions and recent admixture timing from phased local-ancestry tracts after repairing reciprocal artifacts and a chromosome-specific label inversion.

_Ancestry fractions and pulse times both change if reciprocal tract artifacts, chromosome-local label inversion, or map denominators are handled incorrectly._

chrom  hap  start_morgan  end_morgan  anc  posterior  low_complexity_frac
chr1   h1   0.03          0.505       A    0.985      0.08
chr1   h1   0.505         0.535       B    0.62       0.92
chr1   h1   0.535         1.478849    A    0.985      0.08
chr1   h1   1.503727      1.852681    B    0.985      0.08
chr1   h1   1.852681      2.422373    A    0.985      0.08

Phased local-ancestry tracts with coordinates, ancestry labels, posterior values, and QC annotations.
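A minimal version of the clean-then-summarize order runs in three steps. First, drop low-confidence tracts. Second, merge the same-ancestry neighbors they separated. Third, compute the ancestry fraction over the covered map length and a single-pulse admixture time from mean tract length. The posterior and low-complexity cutoffs and the merge gap are hypothetical. The pulse formula T ≈ 1/((1 − m)·L̄) is the standard exponential tract-length approximation, which ignores tracts truncated at chromosome ends. The sketch also does not repair the chromosome-specific label inversion.

```python
from typing import List, NamedTuple


class Tract(NamedTuple):
    start: float               # Morgans
    end: float                 # Morgans
    anc: str
    posterior: float
    low_complexity_frac: float


def clean_tracts(tracts: List[Tract], min_posterior: float = 0.8,
                 max_low_complexity: float = 0.5, max_gap: float = 0.05) -> List[Tract]:
    """Drop low-confidence tracts, then merge same-ancestry neighbors separated by a small gap."""
    kept = [t for t in sorted(tracts)
            if t.posterior >= min_posterior and t.low_complexity_frac <= max_low_complexity]
    merged: List[Tract] = []
    for t in kept:
        prev = merged[-1] if merged else None
        if prev is not None and prev.anc == t.anc and t.start - prev.end <= max_gap:
            merged[-1] = prev._replace(end=max(prev.end, t.end))
        else:
            merged.append(t)
    return merged


def ancestry_fraction(tracts: List[Tract], anc: str) -> float:
    """Fraction of covered map length assigned to anc (denominator = covered length)."""
    total = sum(t.end - t.start for t in tracts)
    return sum(t.end - t.start for t in tracts if t.anc == anc) / total


def pulse_time(tracts: List[Tract], anc: str) -> float:
    """Single-pulse estimate T = 1 / ((1 - m) * mean tract length of anc), in generations."""
    m = ancestry_fraction(tracts, anc)
    lengths = [t.end - t.start for t in tracts if t.anc == anc]
    return 1.0 / ((1.0 - m) * (sum(lengths) / len(lengths)))


preview = [
    Tract(0.03, 0.505, "A", 0.985, 0.08),
    Tract(0.505, 0.535, "B", 0.62, 0.92),
    Tract(0.535, 1.478849, "A", 0.985, 0.08),
    Tract(1.503727, 1.852681, "B", 0.985, 0.08),
    Tract(1.852681, 2.422373, "A", 0.985, 0.08),
]
clean = clean_tracts(preview)
print(len(clean), round(ancestry_fraction(clean, "A"), 3))
```

On the preview rows, the short, low-posterior B tract is removed and its two A neighbors merge into one tract. Leaving the artifact in place would add a spurious junction and shorten the mean A tract, which inflates the pulse-time estimate.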

Case study 10

Population genetics: Estimating selection from noisy ancient-DNA time series

Infer which of two haploid loci is under stronger positive selection from ancient allele-frequency time series while accounting for allele orientation, directional error, drift, and changing population size.

_Noisy ancient trajectories are not directly comparable until both loci are placed on the same derived-allele scale and the provided sample-level sequencing-error values are modeled directly._

generation  alt_reads  total_reads  seq_error  sample_year
6           36         40           0.16       -4500
12          34         45           0.16       -4278
18          41         55           0.16       -4056
24          38         70           0.16       -3833
30          36         90           0.16       -3611

Read-count time series for locus A.
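A stripped-down version of the likelihood can be sketched under stated assumptions:

- deterministic haploid selection, where the odds of the derived allele multiply by 1 + s each generation;
- a symmetric per-read error rate taken from `seq_error`;
- binomial read sampling;
- a grid search over s and the starting frequency.

Drift, changing population size, and directional (asymmetric) error are deliberately left out. This illustrates the orientation and error-modeling steps rather than a full method. Whether `alt_reads` counts the derived allele is an input the caller must supply; the sketch only flips counts when told to.

```python
import math
from typing import List, Sequence, Tuple

Obs = Tuple[float, int, int, float]  # (generation, derived_reads, total_reads, seq_error)


def derived_frequency(p0: float, s: float, t: float) -> float:
    """Haploid selection without drift: the odds multiply by (1 + s) per generation."""
    logit = math.log(p0 / (1.0 - p0)) + t * math.log1p(s)
    return 1.0 / (1.0 + math.exp(-logit))


def log_likelihood(p0: float, s: float, data: Sequence[Obs]) -> float:
    """Binomial log-likelihood (up to a constant) with symmetric read error eps."""
    ll = 0.0
    for t, k, n, eps in data:
        p = derived_frequency(p0, s, t)
        q = p * (1.0 - eps) + (1.0 - p) * eps  # probability that a read shows the derived allele
        ll += k * math.log(q) + (n - k) * math.log(1.0 - q)
    return ll


def to_derived(rows: Sequence[Obs], alt_is_derived: bool) -> List[Obs]:
    """Put reads on the derived-allele scale; flip counts if the alt allele is ancestral."""
    return [(t, k if alt_is_derived else n - k, n, e) for t, k, n, e in rows]


def fit_selection(data: Sequence[Obs], p0_grid: Sequence[float],
                  s_grid: Sequence[float]) -> Tuple[float, float]:
    """Grid-search maximum-likelihood estimate, returned as (s, p0)."""
    _, best_p0, best_s = max((log_likelihood(a, b, data), a, b) for a in p0_grid for b in s_grid)
    return best_s, best_p0


s_grid = [i / 100 for i in range(-10, 11)]
p0_grid = [i / 20 for i in range(1, 20)]
# Locus A preview rows; whether alt is derived is an assumption the caller must check.
locus_a = [(6, 36, 40, 0.16), (12, 34, 45, 0.16), (18, 41, 55, 0.16),
           (24, 38, 70, 0.16), (30, 36, 90, 0.16)]
print(fit_selection(to_derived(locus_a, alt_is_derived=True), p0_grid, s_grid))
```

Fitting both loci on the same derived-allele scale is what makes their s estimates comparable. A locus left on the wrong orientation would show a sign-flipped selection coefficient.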

