- an events table in MEDS (Medical Event Data Standard) format (one row per clinical event) used to create embeddings, and
- a labels table (one row per patient with task targets) used to record the clinical outcomes for the classifiers.
This example uses the MIMIC-IV Clinical Database Demo (a reduced, publicly available subset for teaching and development—not the full MIMIC-IV database) in MEDS format.
1. Events data (MEDS event stream)
The events file is a long table of timestamped clinical events (e.g. diagnoses, medications, labs). Although the MIMIC-IV Clinical Database Demo parquet file contains other columns, prep_mimic_demo_data.py reshapes it to keep only those listed below for this demo. The final demo parquet file contains ~916k rows for 100 subjects.
Events Data Preview (example rows):
| subject_id | time | code | table | value |
|---|---|---|---|---|
| 10000032 | 2022-01-15 08:00:00 | ICD10:I10 | condition | — |
| 10000032 | 2022-01-15 09:30:00 | LOINC:2093-3 | lab | 145.2 |
| 10001217 | 2022-02-01 14:00:00 | RxNorm:861004 | medication | — |
| Column | Description |
|---|---|
| subject_id | Patient identifier; links events table to the labels table by patient. |
| time | When the event occurred (used for ordering events and censoring them from the model). |
| code | Clinical code (e.g. ICD10:, RxNorm:, LOINC:); the primary categorical descriptor of the measurement (e.g., the performed laboratory test or recorded diagnosis). |
| table | Modality of the row’s event data: condition, medication, lab, or procedure (used by the model’s text serialization). |
| value | Optional numeric value (e.g. lab result); may be null. |
Note that the table column is derived from the code column in this demo, using prep_mimic_demo_data.py. More information about these and other columns in the MEDS format schemas can be found here. A collection of ETLs from common data formats, including OMOP, MIMIC-IV, and MEDS Unsorted, can be found here.
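As a rough illustration of how a table value could be derived from a code prefix, here is a minimal sketch. The mapping below is an assumption inferred from the preview rows above (ICD10 to condition, LOINC to lab, RxNorm to medication); the actual rules live in prep_mimic_demo_data.py and may differ, and `derive_table` is a hypothetical helper name.

```python
from typing import Optional

# Hypothetical prefix-to-modality mapping, inferred from the preview rows only.
# The real rules in prep_mimic_demo_data.py may differ (e.g. for procedures).
PREFIX_TO_TABLE = {
    "ICD10": "condition",
    "LOINC": "lab",
    "RxNorm": "medication",
}


def derive_table(code: str) -> Optional[str]:
    """Return the modality for a code such as 'ICD10:I10', or None if unknown."""
    prefix, sep, _ = code.partition(":")
    if not sep:
        return None  # no vocabulary prefix present
    return PREFIX_TO_TABLE.get(prefix)


# Rows copied from the events preview, without the table column.
events = [
    {"subject_id": 10000032, "time": "2022-01-15 08:00:00", "code": "ICD10:I10", "value": None},
    {"subject_id": 10000032, "time": "2022-01-15 09:30:00", "code": "LOINC:2093-3", "value": 145.2},
    {"subject_id": 10001217, "time": "2022-02-01 14:00:00", "code": "RxNorm:861004", "value": None},
]

# Attach the derived modality to each event.
for event in events:
    event["table"] = derive_table(event["code"])
```

Codes with an unrecognized or missing prefix map to None in this sketch, which makes unmapped vocabularies easy to spot before serialization.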
2. Labels data (one row per subject)
The labels file contains columns recording the clinical outcomes for the four demo tasks we will predict, per subject. This will be used for training the classifiers and evaluating test prediction metrics.
Outcome labels and prediction_time in this demo are artificially generated (see data/README.md for methodology); they are not directly derived from events data and are for demonstration purposes only.
In a real use case, outcome labels should be directly derived from an events dataset, and prediction_time should be set to a timestamp strictly before the event that defines the label.

| subject_id | prediction_time | readmission_risk | phenotype_class | overall_survival_months | event_observed |
|---|---|---|---|---|---|
| 10000032 | 2022-06-15 12:00:00 | 0 | 0 | 68.1 | 1 |
| 10001217 | 2022-06-15 12:00:00 | 0 | 2 | 45.8 | 1 |
| 10002428 | 2022-06-15 12:00:00 | 1 | 1 | 35.5 | 1 |
| Column | Description |
|---|---|
| subject_id | Must match the events table order; one row per patient. |
| prediction_time | As-of time for the prediction (inclusive endpoint of data used). The model uses this time as the cutoff when considering events to build embeddings. |
| readmission_risk | Binary (0/1) for Task A, a classifier to predict subject readmission to the hospital. |
| phenotype_class | Integer (e.g. 0–3) for Task B, a phenotype classifier to predict cancer staging progression. |
| overall_survival_months | Numeric; for Task C, a regression model to predict survival in months for subjects who died, and Task D, a Cox survival prediction. |
| event_observed | 1 if the outcome of death was observed; used in Task D. |
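To make the role of prediction_time concrete, the sketch below keeps only each subject's events at or before that subject's cutoff, matching the inclusive endpoint described in the table above. It is an illustration using plain Python dictionaries, not the code used by demo.py or smb_utils; `events_up_to_cutoff` is a hypothetical helper and the sample rows are made up for the example.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def events_up_to_cutoff(events, labels):
    """Group events by subject, keeping those with time <= prediction_time."""
    cutoffs = {
        row["subject_id"]: datetime.strptime(row["prediction_time"], FMT)
        for row in labels
    }
    kept = {subject_id: [] for subject_id in cutoffs}
    for event in events:
        cutoff = cutoffs.get(event["subject_id"])
        if cutoff is None:
            continue  # subject has no label row, so no cutoff is defined
        if datetime.strptime(event["time"], FMT) <= cutoff:
            kept[event["subject_id"]].append(event)
    # Order each subject's events chronologically.
    for subject_events in kept.values():
        subject_events.sort(key=lambda e: datetime.strptime(e["time"], FMT))
    return kept


# Illustrative rows only; codes and times are invented for this example.
labels = [{"subject_id": 10000032, "prediction_time": "2022-06-15 12:00:00"}]
events = [
    {"subject_id": 10000032, "time": "2022-06-15 12:00:00", "code": "LOINC:2093-3"},
    {"subject_id": 10000032, "time": "2022-01-15 08:00:00", "code": "ICD10:I10"},
    {"subject_id": 10000032, "time": "2022-07-01 09:00:00", "code": "RxNorm:861004"},
    {"subject_id": 99999999, "time": "2022-01-01 00:00:00", "code": "ICD10:I10"},
]

visible = events_up_to_cutoff(events, labels)
```

Here the event stamped exactly at the cutoff is kept, the later one is dropped, and the subject without a label row is ignored.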
Running the demo via AI coding agent
Copy and paste the prompt below into your coding assistant of choice (e.g., Claude Code, Codex, Gemini). Runtime from prompt through model download, inference, and classifier train/test to results should take about 5 minutes.
Install the model, download the dummy data, and run the demo.
Running the demo manually
Running demo.py loads both pre-configured data files from the quickstart repo (main branch) into memory, then:
- Events → Embeddings: For each patient, it serializes that patient’s events (using subject_id, time, code, table, value from the events data) into a token stream with smb_utils up to prediction_time from the labels file. The script then runs the Standard Model and takes the last-token hidden state as the patient embedding.
- Embeddings + Labels → Predictive Task Performance Metrics: After extracting embeddings, demo.py trains four separate task heads designed to predict A) binary hospital readmission, B) multiclass classification of cancer staging, C) continuous survival regression in months, and D) Cox survival. Each predictive output layer is trained on an 80/20 train/test split of the embeddings and labels. ROC-AUC, accuracy, mean absolute error, and C-index are provided for each of these tasks, respectively.
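For readers unfamiliar with the evaluation step, the following sketch shows one way an 80/20 split by subject and a C-index (Harrell's concordance index) could be computed with the standard library alone. It is not the implementation in demo.py, which may use different libraries, seeds, and tie handling; the function names and risk scores are hypothetical.

```python
import random
from itertools import combinations


def train_test_split_ids(subject_ids, test_fraction=0.2, seed=0):
    """Shuffle subject ids deterministically and return (train_ids, test_ids)."""
    ids = sorted(subject_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


def concordance_index(times, risks, observed):
    """Harrell's C: share of comparable pairs in which the subject with the
    shorter observed time has the higher predicted risk (risk ties count 0.5)."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified version
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not observed[first]:
            continue  # earlier time is censored, so the pair is not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# 100 hypothetical subject ids, mirroring the size of the demo cohort.
train_ids, test_ids = train_test_split_ids(range(100))

# Survival times from the labels preview; the risk scores are made up.
times = [68.1, 45.8, 35.5]
observed = [1, 1, 1]
risks = [0.1, 0.5, 0.9]  # higher risk assigned to shorter survival
c_index = concordance_index(times, risks, observed)
```

A C-index of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and values below 0.5 indicate inverted rankings.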
Next Steps
Use the model on your own data
Learn more about the Standard Model
Contact Us
Having trouble, or just want to talk about your project? Send an email to erik@standardmodel.bio
Attribution: Events are from the MIMIC-IV Clinical Database Demo converted to MEDS.
van de Water et al. (2025). MIMIC-IV demo data in the Medical Event Data Standard (MEDS) (version 0.0.1). PhysioNet. RRID:SCR_007345.
Licensed under the Open Database License (ODbL). Redistribution must be under ODbL (share-alike); do not apply technical measures that restrict reuse.
Predictive task labels are artificially generated.

