Model: standardmodelbio/SMB-v1-1.7B-Structure
Utility: smb-biopan-utils
Data Format: MEDS (Medical Event Data Standard)
Environment Activation
Create Dummy MEDS Data
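A minimal sketch of dummy MEDS-style rows, using only the standard library. The column names (`subject_id`, `time`, `code`, `table`, `value`), the `table` values, and the helper name `make_dummy_meds_rows` are assumptions chosen to mirror the fields this guide lists; the codes are synthetic placeholders. Check them against the schema your pipeline expects.

```python
from datetime import datetime

# Column names and "table" values are assumptions based on the fields
# described in this guide; adjust them to match your MEDS schema.
def make_dummy_meds_rows():
    """Return a few synthetic MEDS-style event rows for one patient."""
    return [
        {"subject_id": 1, "time": datetime(2023, 1, 5), "code": "ICD10:E11.9",
         "table": "condition", "value": None},
        {"subject_id": 1, "time": datetime(2023, 1, 5), "code": "LOINC:4548-4",
         "table": "lab", "value": 7.2},
        {"subject_id": 1, "time": datetime(2023, 2, 1), "code": "RxNorm:860975",
         "table": "medication", "value": None},
        {"subject_id": 1, "time": datetime(2023, 3, 10), "code": "CPT:99213",
         "table": "procedure", "value": None},
    ]

rows = make_dummy_meds_rows()
```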
MEDS (Medical Event Data Standard) represents patient data as timestamped clinical events. Each row contains a subject ID, timestamp, clinical code, table type, and optional value.
Load Model and Tokenizer
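One way to load the model and tokenizer is through the Hugging Face `transformers` auto classes. This is a sketch under assumptions: that the checkpoint loads with `AutoModel` and `AutoTokenizer` without extra arguments, and that the helper name `load_model_and_tokenizer` suits your code. The model card may call for a different class or options.

```python
MODEL_ID = "standardmodelbio/SMB-v1-1.7B-Structure"

def load_model_and_tokenizer(model_id=MODEL_ID, device="cpu"):
    """Load the tokenizer and model from the Hugging Face Hub."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Assumption: the checkpoint is compatible with AutoModel as-is.
    model = AutoModel.from_pretrained(model_id)
    model.to(device)
    model.eval()  # disable dropout so embeddings are deterministic
    return model, tokenizer
```

Call it as `model, tokenizer = load_model_and_tokenizer(device="cuda")` when a GPU is available.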
Process Patient Data
Use process_ehr_info from smb-biopan-utils to convert MEDS data into the structured text format expected by SMB-v1.
The resulting text groups events under the tags <conditions>, <procedures>, <medications>, and <labs>.
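The signature of `process_ehr_info` is not shown here, so the sketch below does not call it. Instead, a hypothetical helper, `render_patient_text`, illustrates the kind of transformation described: grouping rows by table type and wrapping each group in its tag. The row keys, the closing tags, and the `code=value` rendering are all assumptions, not the library's actual output format.

```python
from collections import defaultdict

# Hypothetical mapping from a row's "table" value to its output tag.
TAGS = {"condition": "conditions", "procedure": "procedures",
        "medication": "medications", "lab": "labs"}

def render_patient_text(rows):
    """Illustrative stand-in, NOT the smb-biopan-utils implementation."""
    grouped = defaultdict(list)
    # Sort chronologically so entries inside each section are in time order.
    for row in sorted(rows, key=lambda r: r["time"]):
        entry = row["code"] if row["value"] is None else f'{row["code"]}={row["value"]}'
        grouped[TAGS[row["table"]]].append(entry)
    # Emit sections in a fixed order, skipping empty ones.
    return "\n".join(
        f"<{tag}>{'; '.join(grouped[tag])}</{tag}>"
        for tag in TAGS.values() if grouped[tag]
    )

rows = [
    {"time": "2023-01-05", "code": "LOINC:4548-4", "table": "lab", "value": 7.2},
    {"time": "2023-01-05", "code": "ICD10:E11.9", "table": "condition", "value": None},
]
text = render_patient_text(rows)
```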
Extract Embeddings
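A sketch of the extraction step, assuming a `transformers`-style model and tokenizer and a list of patient texts already in the format the model expects. The helper name `extract_hidden_states` is hypothetical, and reusing the EOS token for padding is an assumption that only applies if the tokenizer defines no pad token.

```python
def extract_hidden_states(model, tokenizer, texts, device="cpu"):
    """Return (hidden_states, attention_mask) for a list of patient texts."""
    import torch  # lazy import; requires PyTorch

    # Assumption: fall back to EOS for padding if no pad token is defined.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    inputs = {name: tensor.to(device) for name, tensor in inputs.items()}
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs, output_hidden_states=True)
    # hidden_states[-1] is the final layer: [batch, sequence_length, hidden_dim]
    return outputs.hidden_states[-1], inputs["attention_mask"]
```

Returning the attention mask alongside the hidden states lets later pooling steps ignore padding positions.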
Pooling Strategies
The raw output has shape [batch, sequence_length, hidden_dim]. Pool over the sequence dimension to get a single vector per patient:
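- Last Token: use the last token's representation (common for causal LMs).
- Mean Pooling: average the token representations.
- Max Pooling: take the element-wise maximum over the token representations.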
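The three pooling strategies can be sketched with NumPy arrays standing in for model outputs. The function names are hypothetical, and the last-token version assumes right padding, so the final real token of each row sits at index `mask.sum() - 1`.

```python
import numpy as np

def last_token_pool(hidden, mask):
    # Index of the last non-padding token per row (assumes right padding).
    last = mask.sum(axis=1) - 1
    return hidden[np.arange(hidden.shape[0]), last]

def mean_pool(hidden, mask):
    # Zero out padding, then divide by the number of real tokens per row.
    m = mask[:, :, None].astype(hidden.dtype)
    return (hidden * m).sum(axis=1) / np.clip(m.sum(axis=1), 1, None)

def max_pool(hidden, mask):
    # Padding positions are set to -inf so they never win the max.
    masked = np.where(mask[:, :, None].astype(bool), hidden, -np.inf)
    return masked.max(axis=1)

# Toy input: batch of 2, sequence length 3, hidden size 2.
hidden = np.arange(12, dtype=float).reshape(2, 3, 2)
mask = np.array([[1, 1, 0], [1, 1, 1]])  # first row has one padding token
pooled = {
    "last": last_token_pool(hidden, mask),
    "mean": mean_pool(hidden, mask),
    "max": max_pool(hidden, mask),
}
```

Each function returns an array of shape [batch, hidden_dim]. With left-padded inputs, the last-token index would instead be the final position of every row.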
Batch Processing Multiple Patients
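A minimal batching sketch: split the patient texts into fixed-size chunks and embed each chunk in turn. Both helper names are hypothetical, and `embed_fn` stands for any callable that maps a list of texts to a list of vectors, for example a wrapper around the model's forward pass plus pooling.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def embed_in_batches(texts, embed_fn, batch_size=8):
    """Apply `embed_fn` to each batch and collect results in input order."""
    embeddings = []
    for batch in iter_batches(texts, batch_size):
        embeddings.extend(embed_fn(batch))
    return embeddings
```

A smaller `batch_size` lowers peak memory at the cost of more forward passes.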
Processing patients in batches is more efficient than running them one at a time.
Saving Embeddings
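One option for persisting embeddings is a compressed NumPy archive that stores subject IDs next to the vectors so rows stay aligned. The helper names, the array keys, and the `.npz` format are assumptions; other formats such as Parquet would work equally well.

```python
import os
import tempfile
import numpy as np

def save_embeddings(path, subject_ids, embeddings):
    """Save IDs and vectors together so row i of each array matches."""
    np.savez_compressed(
        path,
        subject_ids=np.asarray(subject_ids),
        embeddings=np.asarray(embeddings, dtype=np.float32),
    )

def load_embeddings(path):
    """Return (subject_ids, embeddings) from an archive written above."""
    with np.load(path) as data:
        return data["subject_ids"], data["embeddings"]

# Round trip in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "patient_embeddings.npz")
save_embeddings(path, [1, 2], [[0.1, 0.2], [0.3, 0.4]])
ids, vecs = load_embeddings(path)
```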
Save the embeddings to disk so downstream tasks can reuse them.
Memory Optimization
For large cohorts or limited GPU memory:
- Float16
- 8-bit Quantization
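Both options can be sketched with standard `transformers` loading arguments. This assumes the checkpoint loads with `AutoModel`; the 8-bit path additionally assumes the `bitsandbytes` and `accelerate` packages are installed and a CUDA GPU is available. The helper names are hypothetical.

```python
MODEL_ID = "standardmodelbio/SMB-v1-1.7B-Structure"

def load_half_precision(model_id=MODEL_ID):
    """Load weights in float16, roughly halving memory versus float32."""
    import torch
    from transformers import AutoModel
    # Newer transformers releases also accept `dtype=` for the same purpose.
    return AutoModel.from_pretrained(model_id, torch_dtype=torch.float16)

def load_8bit(model_id=MODEL_ID):
    """Load weights quantized to 8 bits (needs bitsandbytes and a CUDA GPU)."""
    from transformers import AutoModel, BitsAndBytesConfig
    config = BitsAndBytesConfig(load_in_8bit=True)
    return AutoModel.from_pretrained(
        model_id, quantization_config=config, device_map="auto"
    )
```

Reduced precision can shift embedding values slightly, so it is worth comparing a few vectors against a full-precision run before processing a whole cohort.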
