Scientific Review And Roadmap

Director Review: HTS_IL17_Psoriasis

Executive Assessment

This is a credible staff-scientist portfolio project because it connects screening data, disease biology, proteomics, protein model features, structure context, FAIR workflow design, and an app-facing interpretation layer. The strongest aspect is not the compact ranking itself; it is the architecture for turning heterogeneous discovery evidence into a transparent decision record.

The project should keep emphasizing one point: this is a reusable workflow pattern, and psoriasis / IL-17 is only the demo case.

What Is Strong

The project includes an HTS-like evidence layer rather than only omics.
The scoring model is transparent and inspectable.
The evidence-card module is appropriately grounded in workflow outputs.
The documentation explains complex topics for mixed computational and biology audiences.
The app makes the output reviewable by non-pipeline users.
The Nextflow workflow has local and AWS-oriented profiles.

Scientific Limitations To State Clearly

PubChem ROR gamma qHTS is pathway-proximal. It supports Th17 / IL-17 biology but is not a direct IL-17 peptide assay.
The current demo data are compact, candidate-level summaries shaped like public datasets; they are example tables, not retrieved public data.
IL17A ranking highly is biologically intuitive, but the demo assay evidence for IL17A is contextual rather than direct.
RORC ranking highly is assay-supported, but protein-level validation is weaker in the demo and should be flagged.
TNF is a broad inflammatory node; a high disease/proteomics signal does not make it a specific IL-17-pathway candidate.
Protein language model and structure scores are hypothesis-generating features, not validated predictors.

Actionable Biological Insights

IL17A is the strongest biology-positive control: it should rank high because disease expression, proteomics, and T-cell context align. The next validation should test whether candidate perturbations reduce IL-17-driven keratinocyte inflammatory outputs.
RORC is the strongest screening-positive control: it should rank high from qHTS/counterscreen evidence but remain caveated because it is upstream and protein evidence is weaker. The next validation should confirm ROR gamma pathway modulation and check specificity.
IL23R is a translationally useful intermediate candidate: it links upstream immune signaling to Th17 maintenance and should be evaluated for cell-type specificity.
STAT3 is mechanistically relevant but broad. Treat it as a pathway node requiring selectivity checks, not a clean target nomination.
TNF should serve as the demonstration case for counterscreens and specificity penalties: disease evidence can be strong while screen selectivity is poor.
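
The controls above imply a score that rewards aligned evidence and deducts for poor selectivity. The following is a minimal sketch of that idea, not the project's scoring model: the layer names, weights, penalty size, candidate names, and input values are all hypothetical and chosen only to show the mechanics.

```python
from dataclasses import dataclass

# Hypothetical evidence layers and weights, chosen only for illustration.
WEIGHTS = {"screen": 0.4, "disease": 0.3, "proteomics": 0.3}
# Hypothetical maximum deduction applied to a fully non-selective candidate.
SPECIFICITY_PENALTY = 0.5


@dataclass
class Candidate:
    name: str
    evidence: dict      # layer name -> score in [0, 1]
    selectivity: float  # 1.0 = selective in counterscreens, 0.0 = non-selective


def score(candidate: Candidate) -> dict:
    """Return the total plus each per-layer contribution so the record stays inspectable."""
    contributions = {
        layer: weight * candidate.evidence.get(layer, 0.0)
        for layer, weight in WEIGHTS.items()
    }
    penalty = SPECIFICITY_PENALTY * (1.0 - candidate.selectivity)
    total = max(0.0, sum(contributions.values()) - penalty)
    return {
        "name": candidate.name,
        "contributions": contributions,
        "penalty": penalty,
        "total": total,
    }


# Made-up inputs: strong disease and proteomics signal, poor counterscreen selectivity.
broad = Candidate(
    "broad_node", {"screen": 0.6, "disease": 0.9, "proteomics": 0.8}, selectivity=0.2
)
# Made-up inputs: weaker protein evidence, clean counterscreen profile.
selective = Candidate(
    "selective_node", {"screen": 0.9, "disease": 0.6, "proteomics": 0.3}, selectivity=0.9
)

ranked = sorted((score(c) for c in (broad, selective)), key=lambda r: r["total"], reverse=True)
for row in ranked:
    print(row["name"], round(row["total"], 3), "penalty:", round(row["penalty"], 3))
```

With these made-up inputs the broad candidate has the higher raw evidence sum but drops below the selective one once the penalty is applied, which is the behavior the TNF case is meant to illustrate. Returning the contributions and the penalty separately keeps the decision record auditable.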

Sequential Improvement Plan

v0.2 - Replace Example Tables With Public Retrieval

Implement PubChem PUG-REST retrieval for AID 2604 and AID 2546.
Add ChEMBL API retrieval for RORC/IL-17 pathway assay context.
Store raw downloads under a user-specified cache path, not in git.
Record checksums and retrieval dates in provenance.
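
The caching and provenance steps above could be sketched as follows. This assumes the PUG-REST assay data-table URL pattern (`/assay/aid/<AID>/CSV`), which should be checked against the current PubChem documentation; the function names, the `provenance.json` file name, and the record fields are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from urllib.request import urlretrieve

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def assay_csv_url(aid: int) -> str:
    """Build the assumed PUG-REST URL for an assay data table in CSV form."""
    return f"{PUG_REST_BASE}/assay/aid/{aid}/CSV"


def download_assay(aid: int, cache_dir: Path) -> Path:
    """Download one assay table into the user-specified cache (requires network)."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"aid_{aid}.csv"
    urlretrieve(assay_csv_url(aid), target)
    return target


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_provenance(cache_dir: Path, filename: str, source_url: str) -> dict:
    """Append a checksum and retrieval date for a cached file to provenance.json."""
    entry = {
        "file": filename,
        "source_url": source_url,
        "sha256": sha256_of(cache_dir / filename),
        "retrieved_utc": datetime.now(timezone.utc).isoformat(),
    }
    log_path = cache_dir / "provenance.json"  # hypothetical log location
    entries = json.loads(log_path.read_text()) if log_path.exists() else []
    entries.append(entry)
    log_path.write_text(json.dumps(entries, indent=2))
    return entry
```

A typical call would be `download_assay(2604, cache)` followed by `record_provenance(cache, "aid_2604.csv", assay_csv_url(2604))`, with `cache` pointing outside the git working tree.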

v0.3 - Real Disease Omics

Import processed GSE54456 expression/count data.
Add differential expression with a documented contrast.
Add pathway enrichment for IL-17 signaling, cytokine signaling, keratinocyte activation, and Th17 differentiation.
Add an independent validation dataset before expanding biological claims.
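
One way to keep the contrast documented is to make it an explicit object that travels with every result. The sketch below is illustrative only: the group labels, the pseudocount, and the mean-based log2 fold change are assumptions, and a real analysis would use an established differential expression method rather than this calculation.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Contrast:
    """A documented contrast: which group is compared against which."""
    factor: str
    numerator: str
    denominator: str

    def label(self) -> str:
        return f"{self.factor}: {self.numerator} vs {self.denominator}"


def log2_fold_change(values_by_group: dict, contrast: Contrast, pseudocount: float = 1.0) -> float:
    """Log2 ratio of group means, with a pseudocount to avoid division by zero."""
    numerator_mean = mean(values_by_group[contrast.numerator])
    denominator_mean = mean(values_by_group[contrast.denominator])
    return math.log2((numerator_mean + pseudocount) / (denominator_mean + pseudocount))


# Hypothetical group labels and made-up normalized values for a single gene.
contrast = Contrast(factor="tissue", numerator="lesional", denominator="control")
example_values = {"lesional": [7.0, 7.0], "control": [1.0, 1.0]}

result = {
    "contrast": contrast.label(),
    "log2fc": log2_fold_change(example_values, contrast),
}
print(result)
```

Storing the contrast label next to each effect size means the direction of a fold change is never ambiguous when the table is read later.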

v0.4 - Proteomics And Single-Cell Context

Import processed PXD021673 protein quantification tables.
Add RNA/protein concordance flags.
Import processed GSE162183 cell-type markers or pseudobulk summaries.
Add a warning when a candidate is broadly expressed rather than cell-type enriched.
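
The concordance flag and the broad-expression warning could take roughly this shape. The thresholds, flag names, and input structure are hypothetical placeholders, not values taken from the workflow.

```python
from typing import Optional


def concordance_flag(
    rna_log2fc: float, protein_log2fc: Optional[float], min_effect: float = 0.5
) -> str:
    """Classify agreement between RNA and protein changes for one candidate."""
    if protein_log2fc is None:
        return "protein_missing"
    rna_changed = abs(rna_log2fc) >= min_effect
    protein_changed = abs(protein_log2fc) >= min_effect
    if rna_changed and protein_changed:
        same_direction = (rna_log2fc > 0) == (protein_log2fc > 0)
        return "concordant" if same_direction else "discordant"
    if rna_changed:
        return "rna_only"
    if protein_changed:
        return "protein_only"
    return "both_unchanged"


def breadth_warning(
    fraction_expressing: dict, min_fraction: float = 0.25, max_cell_types: int = 2
) -> Optional[str]:
    """Warn when a candidate is detected in more cell types than the allowed maximum."""
    expressing = sorted(
        cell_type
        for cell_type, fraction in fraction_expressing.items()
        if fraction >= min_fraction
    )
    if len(expressing) > max_cell_types:
        return f"broadly expressed across {len(expressing)} cell types: {', '.join(expressing)}"
    return None


# Made-up inputs for illustration only.
print(concordance_flag(1.2, 0.1))
print(breadth_warning({"T cell": 0.6, "keratinocyte": 0.5, "fibroblast": 0.4}))
```

Returning `None` when no warning applies lets the evidence card omit the line entirely instead of printing an empty caveat.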

v0.5 - Protein Models And Structure

Add real UniProt sequence retrieval.
Add optional ESM-2 or ProtBERT embeddings with cached model metadata.
Add AlphaFold DB lookup for candidate proteins.
Keep ESMFold or heavier structure prediction optional and top-candidate-only.
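
Sequence retrieval and the AlphaFold DB lookup could be sketched as below. The two URL patterns reflect the public UniProt REST and AlphaFold DB endpoints as understood at the time of writing and should be verified before use; the function names and cache layout are hypothetical.

```python
from pathlib import Path
from typing import Tuple
from urllib.request import urlopen

# Assumed endpoint patterns; confirm against the current service documentation.
UNIPROT_FASTA = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"
ALPHAFOLD_API = "https://alphafold.ebi.ac.uk/api/prediction/{accession}"


def alphafold_lookup_url(accession: str) -> str:
    """Build the assumed AlphaFold DB metadata URL for a UniProt accession."""
    return ALPHAFOLD_API.format(accession=accession)


def fetch_fasta(accession: str, cache_dir: Path) -> str:
    """Return FASTA text from the cache, downloading only on a cache miss."""
    cached = cache_dir / f"{accession}.fasta"
    if cached.exists():
        return cached.read_text()
    with urlopen(UNIPROT_FASTA.format(accession=accession), timeout=30) as response:
        text = response.read().decode("utf-8")
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_text(text)
    return text


def parse_fasta(text: str) -> Tuple[str, str]:
    """Return (header, sequence) for the first record in a FASTA string."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")
    sequence_lines = []
    for line in lines[1:]:
        if line.startswith(">"):
            break  # only the first record is used
        sequence_lines.append(line)
    return lines[0][1:], "".join(sequence_lines)
```

Checking the cache first keeps repeated workflow runs offline-friendly and makes the retrieved sequence a stable input for any downstream embedding step.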

v0.6 - Grounded Evidence Summary Mode

Add an optional, reviewed, model-backed summarizer alongside the deterministic evidence-card templates.
Require all summary sentences to cite workflow table IDs or accessions.
Add a validation test that rejects evidence cards with missing citations.
Keep deterministic template mode as the default for reproducibility.
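
The citation requirement lends itself to a simple validation check. The sketch below assumes a hypothetical inline citation syntax, `[table:...]` or `[acc:...]`, and a naive sentence splitter; the project may adopt a different convention.

```python
import re

# Hypothetical citation syntax: [table:<id>] or [acc:<accession>].
CITATION = re.compile(r"\[(?:table|acc):[A-Za-z0-9_.\-]+\]")
# Naive sentence boundary: whitespace that follows ., !, or ?.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def uncited_sentences(card_text: str) -> list:
    """Return every sentence in an evidence card that lacks a citation."""
    sentences = [s.strip() for s in SENTENCE_END.split(card_text) if s.strip()]
    return [s for s in sentences if not CITATION.search(s)]


def validate_card(card_text: str) -> None:
    """Raise ValueError if any sentence in the card is missing a citation."""
    missing = uncited_sentences(card_text)
    if missing:
        raise ValueError(f"{len(missing)} sentence(s) without citation: {missing}")


# Illustrative card text; the wording is made up for this example.
good_card = (
    "Expression evidence is drawn from the disease omics table [acc:GSE54456]. "
    "Screening evidence is drawn from the assay summary [table:screen_summary]."
)
bad_card = "Expression evidence is drawn from the disease omics table [acc:GSE54456]. This looks promising."

validate_card(good_card)            # passes silently
print(uncited_sentences(bad_card))  # lists the sentence that has no citation
```

The same check can run on both template output and model-backed summaries, so the default deterministic mode and the optional mode are held to one standard.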

Staff Scientist Bar For The Next Version

The next version should answer this review question:

If I only had enough budget to validate two candidates, which two should I choose, what experiment should I run first, and what evidence would make me stop?

The current demo begins to answer that by nominating IL17A and RORC for different reasons: IL17A as disease-biology aligned, RORC as screen-supported and pathway-proximal. The next version should make that tradeoff quantitative and explicit.