RNA-seq Cell-Type Classification (PWDR) L1-520

Computational Biology · Single-cell / bulk RNA-seq transcript quantification with marker-gene-panel cell-type categorical readout · δ=5 · advanced · L_DAG = 8.3 · Stub (not mineable)

Unclaimed Principle — open for contribution

This Principle is declared in the catalog but has no reference solver, no pinned dataset, and is not registered on-chain. There is no reward pool. Submitting a cert against this Principle today will record the cert for reproducibility but pay zero PWM.

To claim it as a Bounty #7 contribution: open a PR adding (1) a reference solver, (2) ≥1 dataset pinned to IPFS, (3) updates to the L3 manifest with dataset CIDs. After verifier-agent triple-review, the founders' 3-of-5 multisig signs PWMRegistry.register() and the Principle becomes mineable.

Forward model E

Stage 1 (sibling to L1-413): reads $\to$ transcriptome alignment $\to$ counts $\to$ $\log_{2}$ CPM or scTransform residuals across 20k genes per cell. Stage 2: $\text{cell\_type} = \arg\max_{\text{class}} \, \text{score}(\text{expression vector}, \text{marker panel}_{\text{class}})$ per CellMarker / PanglaoDB.
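The final step of Stage 1, turning raw counts into log2 CPM, can be sketched for a single cell as follows. This is a minimal illustration, not the reference solver (none is registered for this Principle). The function name `log2_cpm` and the pseudocount of 1 are assumptions; the pseudocount is a common convention that keeps zero counts finite.

```python
import math


def log2_cpm(counts, pseudocount=1.0):
    """Return log2(CPM + pseudocount) for one cell's raw gene counts.

    `pseudocount` is an assumed convention, not specified by the source.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    total = sum(counts)
    if total == 0:
        # An empty cell has no library size to scale by; every gene maps
        # to log2(pseudocount).
        return [math.log2(pseudocount)] * len(counts)
    # Scale each count to counts per million, then log-transform.
    return [math.log2(c / total * 1e6 + pseudocount) for c in counts]


# Toy counts for four hypothetical genes in one cell.
cell_counts = [0, 5, 20, 75]
print(log2_cpm(cell_counts))
```

A gene with zero counts maps to 0 under the assumed pseudocount of 1, so dropout events stay distinguishable from low but nonzero expression.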

RNA-seq Cell-Type Classification (PWDR) wraps an RNA-seq alignment + transcript quantification core with canonical marker-gene-panel rules. Stage 1 (analytical, sibling to L1-413): align reads to the transcriptome (HISAT2, STAR, kallisto, Salmon); estimate per-transcript abundance via EM (RSEM, kallisto pseudoalignment) or Bayesian inference (Salmon variational); normalize to log2 counts per million (CPM) or apply sctransform Pearson residuals. Stage 2 (deterministic threshold): per-cell argmax over marker-panel scores, following the CellMarker / PanglaoDB / Tabula Sapiens taxonomies. Difficulty tier δ = 5. Mismatch parameters: dropout_rate, batch_effect, doublet_contamination, ambient_rna_contamination, marker_panel_coverage_uncertainty, taxonomy_disagreement.
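The Stage 2 argmax over marker-panel scores can be sketched as below. The text does not define the class score, so the scoring rule here (mean normalized expression over the panel genes measured in the cell) is an assumption. The `PANELS` dictionary is a toy panel of well-known marker genes written for illustration; it is not loaded from CellMarker, PanglaoDB, or Tabula Sapiens. The names `score_cell` and `classify_cell` are hypothetical.

```python
# Toy marker panels (illustrative only; not read from any database).
PANELS = {
    "T cell": ["CD3D", "CD3E"],
    "B cell": ["CD79A", "MS4A1"],
    "Monocyte": ["CD14", "LYZ"],
}


def score_cell(expression, panel_genes):
    """Assumed score: mean normalized expression over measured panel genes."""
    values = [expression[g] for g in panel_genes if g in expression]
    if not values:
        return float("-inf")  # no panel gene was measured in this cell
    return sum(values) / len(values)


def classify_cell(expression, panels):
    """Return (label, scores); label is the argmax over per-class scores."""
    scores = {label: score_cell(expression, genes) for label, genes in panels.items()}
    best = max(scores, key=scores.get)  # ties resolve to the first panel listed
    if scores[best] == float("-inf"):
        return "unassigned", scores
    return best, scores


# Normalized expression (e.g. log2 CPM) for one hypothetical cell.
cell = {"CD3D": 8.1, "CD3E": 7.4, "CD79A": 0.0, "MS4A1": 0.3, "CD14": 0.0, "LYZ": 1.2}
label, scores = classify_cell(cell, PANELS)
print(label, scores)
```

Returning "unassigned" when no panel gene is measured is a design choice for this sketch. It keeps the marker_panel_coverage_uncertainty case from silently defaulting to the first class.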

L-DAG

L.poly_a_capture -> L.reverse_transcription -> L.pcr_amplification -> L.sequencing -> L.transcriptome_alignment -> L.transcript_quantification -> L.normalization -> L.marker_panel_classifier -> int.cell
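The chain above can be read mechanically as a node list and an edge list. The short sketch below parses the arrow notation and checks that it forms a single linear path. The variable names are illustrative, and the sketch makes no claim about how the catalog computes the L_DAG figure.

```python
dag_text = (
    "L.poly_a_capture -> L.reverse_transcription -> L.pcr_amplification -> "
    "L.sequencing -> L.transcriptome_alignment -> L.transcript_quantification -> "
    "L.normalization -> L.marker_panel_classifier -> int.cell"
)

# Split on the arrow to recover the nodes in order.
nodes = [name.strip() for name in dag_text.split("->")]

# In a linear chain, each node feeds exactly the next one.
edges = list(zip(nodes, nodes[1:]))

# A simple path visits every node once and has one fewer edge than nodes.
is_linear_chain = len(set(nodes)) == len(nodes) and len(edges) == len(nodes) - 1

print(len(nodes), "nodes,", len(edges), "edges, linear chain:", is_linear_chain)
```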

Well-posedness W

Existence: true
Uniqueness: conditional
Stability: conditional
κ: 200

Existence is guaranteed within the Omega bounds. Uniqueness is conditional on adequate sequencing depth (typically >30k reads per cell for 3' chemistry) and adequate marker-panel coverage. Stability is conditional: dropout_rate dominates for low-expressing markers, batch_effect dominates across samples, and doublet_contamination dominates at high cell densities. Joint Hadamard well-posedness for the coupled RNA-seq + marker-panel-classifier forward model established by Trapnell 2014 (foundational scRNA-seq), Macosko 2015 (Drop-seq), Stuart-Butler 2019 (Seurat v3 integration), Tabula Sapiens Consortium 2022, Zhang 2019 (CellMarker), Franzén 2019 (PanglaoDB).
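The two uniqueness conditions can be checked per cell before classification. The sketch below is one way to express that gate. The >30k reads threshold comes from the text. The 0.5 minimum panel coverage is an assumed placeholder, since the text gives no number for "adequate" coverage. The function name and the gene lists are hypothetical.

```python
def check_preconditions(reads_in_cell, measured_genes, panel_genes,
                        min_reads=30_000, min_panel_coverage=0.5):
    """Report whether a cell meets the depth and panel-coverage conditions.

    min_reads follows the '>30k reads per cell' guidance in the text;
    min_panel_coverage is an assumed value, not taken from the source.
    """
    panel = set(panel_genes)
    covered = panel & set(measured_genes)
    coverage = len(covered) / len(panel) if panel else 0.0
    return {
        "depth_ok": reads_in_cell > min_reads,  # strictly greater than 30k
        "panel_coverage": coverage,
        "coverage_ok": coverage >= min_panel_coverage,
    }


# One hypothetical cell: 45k reads, two of four toy panel genes detected.
report = check_preconditions(
    reads_in_cell=45_000,
    measured_genes={"CD3D", "CD3E", "LYZ"},
    panel_genes=["CD3D", "CD3E", "CD2", "CD5"],
)
print(report)
```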

Solvability C

Solver class: linear-operator + statistical [Salmon EM transcript quantification + marker-panel argmax] | nonlinear [Seurat / Scanpy clustering then label-transfer] | linear-operator + deep neural [scVI variational, scANVI, scPhere]
Convergence rate q: 1
Complexity: O(N_reads * log(N_genes)) for alignment; O(N_cells * N_genes) for normalization + marker scoring; total alignment-dominated
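To see why the total is alignment-dominated, the two complexity terms can be compared at representative sizes. The sketch below ignores constant factors, so it compares growth terms rather than predicting runtimes. The example sizes (10k cells, 30k reads per cell, 20k genes) are assumptions chosen to match the figures quoted elsewhere on this page.

```python
import math


def operation_estimates(n_cells, reads_per_cell, n_genes):
    """Evaluate the two complexity terms, ignoring constant factors."""
    n_reads = n_cells * reads_per_cell
    alignment = n_reads * math.log2(n_genes)  # O(N_reads * log(N_genes))
    scoring = n_cells * n_genes               # O(N_cells * N_genes)
    return {
        "alignment": alignment,
        "normalization_and_scoring": scoring,
        "ratio": alignment / scoring,
    }


est = operation_estimates(n_cells=10_000, reads_per_cell=30_000, n_genes=20_000)
print(f"alignment term is about {est['ratio']:.1f}x the scoring term")
```

The ratio reduces to reads_per_cell * log2(N_genes) / N_genes, so it does not depend on the number of cells. Alignment dominates whenever per-cell depth is comparable to or larger than the gene count.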

Specs (0)

No L2 specs registered yet for this principle.