Quickstart (Recommended)
```bash
bash -c "$(curl -fsSL https://docs.standardmodel.bio/quickstart.sh)"
```
This script will:

- Create a Python 3.10 virtual environment named `standard_model`.
- Install PyTorch with CUDA support (if available).
- Install the HuggingFace libraries (`transformers`, `datasets`, `accelerate`).
- Install `smb-biopan-utils`.
- Download the SMB-v1-Structure model to your machine.
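If you prefer to review the script before executing it, you can download it to disk first (same URL as above):

```bash
# Fetch the quickstart script, inspect it, then run it
curl -fsSL https://docs.standardmodel.bio/quickstart.sh -o quickstart.sh
less quickstart.sh
bash quickstart.sh
```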
Manual Installation - Prerequisites
GPU support is strongly recommended. SMB-v1-Structure has 1.7B parameters and requires approximately 16 GB of GPU memory for full-precision inference; see Memory Optimization below for lower-memory options.
The following are required to run the Standard Model:

- Python 3.10+: required for running the models
- pip: Python package manager
- PyTorch: deep learning framework for model inference
- HuggingFace ecosystem (`transformers`, `datasets`, `accelerate`, `huggingface_hub`): model loading, data handling, and inference utilities
- pandas: data manipulation and analysis
- `smb-biopan-utils`: Standard Model Bio utilities package
- CUDA (recommended): GPU acceleration on NVIDIA GPUs
- Git: for cloning repositories
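Before proceeding, it can be worth confirming the basics from a shell; the last command applies only to machines with an NVIDIA GPU:

```bash
python3 --version   # should print 3.10 or newer
pip --version
git --version
nvidia-smi          # checks that the NVIDIA driver can see your GPU
```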
Manual Installation - Environment Setup
If you prefer to set up your environment manually:
Create Virtual Environment
```bash
python3 -m venv standard_model
source standard_model/bin/activate
```
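The activation line above is for Linux and macOS. On Windows, assuming the same environment name, the PowerShell equivalent is shown below (in cmd.exe, use `standard_model\Scripts\activate.bat` instead):

```powershell
# Windows (PowerShell)
standard_model\Scripts\Activate.ps1
```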
Install PyTorch
Pick the command that matches your CUDA version, or the CPU-only build:

CUDA 12.x:

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

CUDA 11.x:

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```

CPU Only:

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
```
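To confirm that the build you intended was installed, a one-liner suffices (`torch.version.cuda` prints `None` on CPU-only builds):

```bash
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```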
Install Dependencies
```bash
pip install transformers datasets accelerate huggingface_hub pandas
pip install git+https://github.com/standardmodelbio/smb-biopan-utils.git
```
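A quick import check confirms the dependencies resolved (the Python import name of `smb-biopan-utils` is not assumed here, so it is omitted):

```bash
python -c "import transformers, datasets, accelerate, huggingface_hub, pandas; print('dependencies OK')"
```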
Download SMB-v1-Structure
```python
from huggingface_hub import snapshot_download

snapshot_download("standardmodelbio/SMB-v1-1.7B-Structure")
```
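By default the snapshot lands in the HuggingFace cache (typically `~/.cache/huggingface`). If you would rather keep the weights in an explicit directory, `snapshot_download` accepts a `local_dir` argument; the path below is only an example:

```python
from huggingface_hub import snapshot_download

# Download into a chosen directory instead of the shared HF cache
snapshot_download(
    "standardmodelbio/SMB-v1-1.7B-Structure",
    local_dir="./models/SMB-v1-1.7B-Structure",  # example path, choose your own
)
```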
Verify Your Installation
After setup, activate your environment whenever you want to use the models:

```bash
source standard_model/bin/activate
```
Verify that everything is working correctly:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Check PyTorch and CUDA
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

# Load SMB-v1-Structure
model_id = "standardmodelbio/SMB-v1-1.7B-Structure"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",
)

print("SMB-v1-Structure loaded successfully!")
print(f"Model parameters: {sum(p.numel() for p in model.parameters()):,}")
```
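On a GPU machine it can also be useful to confirm where the weights were placed and how much memory they occupy; this sketch only assumes the `model` object from the script above:

```python
# Report device placement and (on CUDA) memory usage
print(f"First parameter device: {next(model.parameters()).device}")
if torch.cuda.is_available():
    print(f"GPU memory allocated: {torch.cuda.memory_allocated() / 1024**3:.1f} GB")
```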
Troubleshooting
- CUDA Not Detected: Ensure NVIDIA drivers are up to date, and run `nvidia-smi` to verify the GPU is accessible.
- Out of Memory: Reduce memory use via `torch.float16` or quantization (see Memory Optimization below).
- Model Access Denied: Some models may require authentication. Run `huggingface-cli login` with your token (see the example after this list).
- Slow Download: Model downloads can be large (several GB). Ensure a stable connection and sufficient disk space.
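For the authentication case, either an interactive login or an environment variable works; `hf_xxx` below is a placeholder for your own HuggingFace access token:

```bash
# Interactive login (stores the token locally)
huggingface-cli login

# Or set it for the current shell session
export HF_TOKEN=hf_xxx
```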
Memory Optimization
For large cohorts and/or limited GPU memory, use half-precision or quantization:
Float16:

```python
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
```

Memory: ~8 GB

8-bit Quantization (requires the `bitsandbytes` package):

```python
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    load_in_8bit=True,
    device_map="auto",
)
```

Memory: ~4 GB
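Recent `transformers` releases deprecate the bare `load_in_8bit` argument in favor of `quantization_config`; if you see a deprecation warning, this equivalent form should work (a sketch assuming a recent `transformers` with `bitsandbytes` installed):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "standardmodelbio/SMB-v1-1.7B-Structure"

# Equivalent 8-bit load via the newer quantization_config API
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
```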