More about the Standard Model

SMB-v1-Structure is our flagship biological world model. Unlike traditional LLMs that predict tokens, it predicts patient states in latent space, modeling how patients evolve over time and respond to interventions.

Read more about our modeling approach and benchmarks at our blog.

Key Differentiators

State Prediction

Predicts future patient states in latent space, not text tokens

Causal Learning

Learns cause-and-effect: (Pre-State + Intervention) → Post-State

Multimodal Fusion

Unifies genomics, imaging, EHR, and proteomics

Environment Activation

Activate the virtual environment before running the examples:

source standard_model/bin/activate

Usage

from transformers import AutoModel, AutoTokenizer
import torch

# Load model
model = AutoModel.from_pretrained("standardmodelbio/SMB-v1-1.7B-Structure")
tokenizer = AutoTokenizer.from_pretrained("standardmodelbio/SMB-v1-1.7B-Structure")

# Move to GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

Architecture

The Standard Model uses a Joint-Embedding Predictive Architecture (JEPA): it treats the patient as a dynamic “world” and treatments as interventions that change that world.

How It Works

Modality Ingestion

Raw signals — genomics, proteomics, imaging, EHR data — pass through modality-specific encoders. Each encoder is trained to extract meaningful representations from its data type.

Fusion Layer

A specialized projector maps these encodings into a universal latent space. This creates a “fused” patient state embedding that retains both high-level semantic context and low-level biological granularity.
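The fusion step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `FusionProjector` class, the per-modality linear projectors, the mean-based fusion, and the dimensions are hypothetical and do not describe the actual SMB-v1 projector.

```python
import torch
import torch.nn as nn


class FusionProjector(nn.Module):
    """Hypothetical fusion layer: one linear projector per modality, then a mean."""

    def __init__(self, modality_dims: dict, latent_dim: int = 64):
        super().__init__()
        # One projector per modality maps its encoder output into the shared latent space.
        self.projectors = nn.ModuleDict(
            {name: nn.Linear(dim, latent_dim) for name, dim in modality_dims.items()}
        )

    def forward(self, encodings: dict) -> torch.Tensor:
        # Project only the modalities present in this batch, then average them.
        projected = [self.projectors[name](x) for name, x in encodings.items()]
        return torch.stack(projected, dim=0).mean(dim=0)


torch.manual_seed(0)
fusion = FusionProjector({"genomics": 128, "imaging": 256, "ehr": 32})
batch = {
    "genomics": torch.randn(4, 128),  # stand-in encoder outputs, not real data
    "imaging": torch.randn(4, 256),
    "ehr": torch.randn(4, 32),
}
state = fusion(batch)  # fused patient state, shape [4, 64]
```

Averaging over whichever modalities are present lets the same layer handle patients with missing data types. A real projector would likely be more expressive than a single linear map per modality.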

State Prediction

Given the current patient state S(t) and an intervention A(t), the model predicts the future state S(t+1) in latent space — not as text, but as a dense embedding.
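As a rough sketch of what predicting in latent space means, the snippet below maps a state embedding and an intervention embedding to a next-state embedding. The `LatentStatePredictor` class, its MLP layout, the residual update, and all dimensions are assumptions for illustration, not the model's actual predictor.

```python
import torch
import torch.nn as nn


class LatentStatePredictor(nn.Module):
    """Hypothetical predictor: (S(t), A(t)) -> S(t+1), all in latent space."""

    def __init__(self, state_dim: int = 64, action_dim: int = 16, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Concatenate state and intervention embeddings and predict a residual update.
        delta = self.net(torch.cat([state, action], dim=-1))
        return state + delta


torch.manual_seed(0)
predictor = LatentStatePredictor()
s_t = torch.randn(2, 64)      # current patient states S(t)
a_t = torch.randn(2, 16)      # intervention embeddings A(t)
s_next = predictor(s_t, a_t)  # predicted S(t+1), a dense embedding of shape [2, 64]
```

The output has the same dimensionality as the input state, so predictions can be fed back in as the next input or compared directly with encoded future observations.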

Hybrid Optimization

The model combines supervised fine-tuning (anchoring to clinical outcomes) with JEPA objectives (learning dynamics), preventing training collapse.
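One way to picture the combined objective is as a weighted sum of a supervised outcome loss and a latent prediction loss. The `hybrid_loss` function, the choice of smooth L1 and cross-entropy, the stop-gradient on the target, and the `jepa_weight` parameter are all assumptions for this sketch. The source does not specify the actual loss terms or their weighting.

```python
import torch
import torch.nn.functional as F


def hybrid_loss(pred_next, target_next, outcome_logits, outcome_labels, jepa_weight=1.0):
    """Supervised outcome loss plus a JEPA-style latent prediction loss (illustrative)."""
    # Latent prediction term: match the predicted next state to the encoded future
    # state. The target is detached (stop-gradient), a common choice in JEPA setups.
    jepa = F.smooth_l1_loss(pred_next, target_next.detach())
    # Supervised term: anchor the representation to a clinical outcome label.
    supervised = F.cross_entropy(outcome_logits, outcome_labels)
    return supervised + jepa_weight * jepa


torch.manual_seed(0)
pred_next = torch.randn(4, 64, requires_grad=True)  # stand-in predicted S(t+1)
target_next = torch.randn(4, 64)                    # stand-in encoded future state
logits = torch.randn(4, 2, requires_grad=True)      # stand-in outcome head output
labels = torch.tensor([0, 1, 1, 0])
loss = hybrid_loss(pred_next, target_next, logits, labels)
loss.backward()
```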

Extracting Embeddings

Get patient state embeddings for downstream tasks:

from transformers import AutoModel, AutoTokenizer
import torch

model = AutoModel.from_pretrained("standardmodelbio/SMB-v1-1.7B-Structure")
tokenizer = AutoTokenizer.from_pretrained("standardmodelbio/SMB-v1-1.7B-Structure")

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

# Prepare input
inputs = tokenizer(
    "patient clinical data here",
    return_tensors="pt",
    padding=True,
    truncation=True
).to(device)

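# Extract embeddings
with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state  # [batch, seq_len, hidden_dim]

# Mean-pool over real tokens only, so padding does not dilute the embedding
mask = inputs["attention_mask"].unsqueeze(-1).to(embeddings.dtype)  # [batch, seq_len, 1]
patient_embedding = (embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)  # [batch, hidden_dim]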

print(f"Embedding shape: {patient_embedding.shape}")
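A simple downstream use of pooled embeddings is nearest-neighbour lookup across a cohort. The sketch below uses random stand-in tensors in place of real patient embeddings. The `most_similar` helper is hypothetical and not part of the model's API.

```python
import torch
import torch.nn.functional as F


def most_similar(query: torch.Tensor, cohort: torch.Tensor, k: int = 3):
    """Return indices and cosine similarities of the k cohort embeddings closest to the query."""
    sims = F.cosine_similarity(query.unsqueeze(0), cohort, dim=-1)  # [n_patients]
    top = torch.topk(sims, k=min(k, cohort.shape[0]))
    return top.indices.tolist(), top.values.tolist()


# Stand-in embeddings; in practice these would be pooled patient embeddings.
torch.manual_seed(0)
cohort = torch.randn(10, 32)
query = cohort[4] + 0.01 * torch.randn(32)  # a slightly perturbed copy of patient 4
indices, scores = most_similar(query, cohort)
```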

Use Cases

Treatment Simulation

Simulate how a tumor would evolve under Treatment A versus Treatment B by conditioning on different interventions.
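Conditioning on different interventions can be sketched as two forward passes through a one-step predictor from the same starting state. The `compare_treatments` helper and the additive `toy_predictor` are hypothetical stand-ins, used only so the example runs without the real model.

```python
import torch


def compare_treatments(predictor, state, treatment_a, treatment_b):
    """Predict post-states under two interventions and measure how far apart they land."""
    with torch.no_grad():
        post_a = predictor(state, treatment_a)
        post_b = predictor(state, treatment_b)
    divergence = (post_a - post_b).norm(dim=-1)  # one distance per patient
    return post_a, post_b, divergence


def toy_predictor(state, action):
    # Stand-in dynamics for illustration only: the action is added to the state.
    return state + action


state = torch.zeros(1, 8)
treatment_a = torch.ones(1, 8)
treatment_b = -torch.ones(1, 8)
post_a, post_b, divergence = compare_treatments(toy_predictor, state, treatment_a, treatment_b)
```

A large divergence suggests the two treatments lead to meaningfully different predicted states. Interpreting that difference clinically would need a task-specific head or probe on top of the embeddings.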

Trajectory Prediction

Predict disease progression over 3-, 6-, or 12-month windows.
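Longer horizons can be approximated by applying a one-step predictor repeatedly. This sketch assumes each step covers 3 months, which the source does not state. The `rollout` helper and `toy_predictor` are hypothetical.

```python
import torch


def rollout(predictor, state, actions):
    """Apply a one-step predictor repeatedly; returns the state after each step."""
    trajectory = []
    with torch.no_grad():
        for action in actions:
            state = predictor(state, action)
            trajectory.append(state)
    return torch.stack(trajectory, dim=0)  # [steps, batch, dim]


def toy_predictor(state, action):
    # Stand-in dynamics for illustration only: the action is added to the state.
    return state + action


start = torch.zeros(1, 8)
# Assumption: one step covers 3 months, so 4 steps span 12 months.
actions = [torch.full((1, 8), 0.5) for _ in range(4)]
trajectory = rollout(toy_predictor, start, actions)
state_3m, state_6m, state_12m = trajectory[0], trajectory[1], trajectory[3]
```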

Digital Twins

Create evolving patient representations that update as new data arrives.

Response Prediction

Model the probability of response to specific therapies.
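One common way to turn embeddings into response probabilities is a small logistic-regression probe trained on labeled examples. The sketch below uses synthetic embeddings and labels. The probe, its dimensions, and the training settings are assumptions, not a documented part of the model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-ins: 200 "patient embeddings" with linearly separable labels.
embeddings = torch.randn(200, 16)
true_direction = torch.randn(16)
labels = (embeddings @ true_direction > 0).float()  # 1 = responder, 0 = non-responder

# Linear probe trained with binary cross-entropy on the fixed embeddings.
probe = nn.Linear(16, 1)
optimizer = torch.optim.Adam(probe.parameters(), lr=0.05)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(probe(embeddings).squeeze(-1), labels)
    loss.backward()
    optimizer.step()

# Predicted probability of response for each patient.
with torch.no_grad():
    response_prob = torch.sigmoid(probe(embeddings).squeeze(-1))
accuracy = ((response_prob > 0.5).float() == labels).float().mean().item()
```

Because the embeddings stay fixed and only the probe is trained, this kind of head is cheap to fit. Real use would need held-out evaluation rather than training-set accuracy.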

Memory Optimization

SMB-v1-Structure requires about 16 GB of GPU memory at full precision (float32). Loading in half precision or with quantization reduces this:

Float16

# pip install accelerate (required for device_map="auto")
model = AutoModel.from_pretrained(
    "standardmodelbio/SMB-v1-1.7B-Structure",
    torch_dtype=torch.float16,
    device_map="auto"
)

Memory: ~8 GB

8-bit Quantization

# pip install accelerate bitsandbytes
from transformers import BitsAndBytesConfig

# Pass the 8-bit setting through BitsAndBytesConfig; the bare load_in_8bit
# keyword on from_pretrained is deprecated.
model = AutoModel.from_pretrained(
    "standardmodelbio/SMB-v1-1.7B-Structure",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto"
)

Memory: ~4 GB

4-bit Quantization

# pip install accelerate bitsandbytes
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)

model = AutoModel.from_pretrained(
    "standardmodelbio/SMB-v1-1.7B-Structure",
    quantization_config=quantization_config,
    device_map="auto"
)

Memory: ~2 GB

Hardware Requirements

Precision	GPU Memory	Recommended GPU
float32	16 GB	A100, A6000
float16	8 GB	RTX 4090, A10
8-bit	4 GB	RTX 3080, T4
4-bit	2 GB	RTX 3060
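The table can be turned into a small helper that picks a loading mode from the available GPU memory. The `pick_precision` and `detect_gpu_memory_gb` functions are hypothetical conveniences, and the thresholds are copied from the table rather than measured.

```python
# Approximate requirements copied from the table above (GB of GPU memory).
REQUIREMENTS_GB = [("float32", 16), ("float16", 8), ("8-bit", 4), ("4-bit", 2)]


def pick_precision(available_gb: float):
    """Return the highest precision whose listed requirement fits, or None."""
    for precision, required_gb in REQUIREMENTS_GB:
        if available_gb >= required_gb:
            return precision
    return None


def detect_gpu_memory_gb() -> float:
    """Total memory of GPU 0 in GiB, or 0.0 when torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


choice = pick_precision(detect_gpu_memory_gb())
print(f"Suggested precision: {choice}")
```

Reported device memory is usually slightly below the nominal card size, and activations need headroom beyond the weights, so treat the result as a starting point.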
