Protein language models such as ESM-2 and the ProtTrans family (which includes ProtBERT and ProtT5) learn representations of protein sequences through self-supervised training on large protein sequence databases. Their embeddings can capture sequence similarity, family structure, and some biochemical context.
```mermaid
flowchart LR
    A["Protein sequence"] --> B["Protein language model"]
    B --> C["Embedding vector"]
    C --> D["Clusters and features"]
    D --> E["Candidate context"]
```
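The step from model output to a single embedding vector can be sketched as follows. This is a minimal illustration, assuming the model returns one vector per residue and that mean pooling is used to get a fixed-length sequence embedding; other pooling choices exist, and the source does not say which one applies here. The function name `mean_pool` and the toy numbers are hypothetical stand-ins, not real model output.

```python
from typing import List, Sequence


def mean_pool(residue_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-length sequence embedding."""
    if not residue_vectors:
        raise ValueError("need at least one residue vector")
    dim = len(residue_vectors[0])
    if any(len(v) != dim for v in residue_vectors):
        raise ValueError("all residue vectors must have the same dimension")
    n = len(residue_vectors)
    # Average each dimension across all residues.
    return [sum(v[i] for v in residue_vectors) / n for i in range(dim)]


# Toy stand-in for model output: 3 residues, 4 dimensions each (made-up numbers).
toy_residue_vectors = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 0.0],
    [2.0, 3.0, 2.0, 2.0],
]

sequence_embedding = mean_pool(toy_residue_vectors)
print(sequence_embedding)
```

Because the pooled vector has the same length regardless of sequence length, proteins and peptides of different sizes can be compared in the same feature space.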
Protein embeddings provide a reusable feature layer for candidate proteins or peptides. They can help cluster candidates, compare sequence families, or feed downstream ranking models.
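As a rough sketch of how embeddings might be used to group candidates, the example below compares precomputed vectors with cosine similarity and assigns each candidate to the first cluster whose representative is similar enough. The greedy strategy, the `0.9` threshold, the helper names, and the toy three-dimensional vectors are all assumptions chosen for illustration; real embeddings have hundreds or thousands of dimensions, and a production pipeline would likely use an established clustering library.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def greedy_cluster(
    embeddings: Dict[str, List[float]], threshold: float = 0.9
) -> List[List[str]]:
    """Assign each candidate to the first cluster whose representative
    (its first member) has cosine similarity >= threshold."""
    clusters: List[List[str]] = []
    for name, vector in embeddings.items():
        for cluster in clusters:
            representative = embeddings[cluster[0]]
            if cosine_similarity(vector, representative) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([name])
    return clusters


# Hypothetical candidates with made-up 3-dimensional embeddings.
toy_embeddings = {
    "pep_A": [1.0, 0.0, 0.0],
    "pep_B": [0.9, 0.1, 0.0],
    "pep_C": [0.0, 1.0, 0.0],
    "pep_D": [0.0, 0.9, 0.2],
}

clusters = greedy_cluster(toy_embeddings, threshold=0.9)
print(clusters)
```

The result of greedy clustering depends on input order and on the threshold, so cluster membership here is a convenience for organizing candidates rather than a statement about biological relatedness.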
This example uses protein model scores as contextual features only. They do not prove mechanism or efficacy.