Quickstart

To install everything in one step, run:
bash -c "$(curl -fsSL https://docs.standardmodel.bio/quickstart.sh)"
This script will:
  1. Create a Python 3.10 virtual environment named standard_model.
  2. Install PyTorch with CUDA support (if available).
  3. Install HuggingFace libraries (transformers, datasets, accelerate).
  4. Install smb-biopan-utils.
  5. Download the SMB-v1-Structure model to your machine.
After running the quickstart script, skip ahead to Verify Your Installation to confirm everything is working.

Manual Installation - Prerequisites

GPU support is strongly recommended. SMB-v1-Structure has 1.7B parameters and requires approximately 16GB GPU memory for inference.
The following are required to run the Standard Model:
  • Python 3.10+ — Required for running the models
  • pip — Python package manager
  • PyTorch — Deep learning framework for model inference
  • HuggingFace Ecosystem (transformers, datasets, accelerate, huggingface_hub) — Model loading, data handling, and inference utilities
  • pandas — Data manipulation and analysis
  • smb-biopan-utils — Standard Model Bio utilities package
  • CUDA (recommended) — For GPU acceleration with NVIDIA GPUs
  • Git — For cloning repositories
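
Before installing anything, you can sanity-check the prerequisites with a short script. This is a sketch: the `nvidia-smi` lookup is only a rough proxy for working NVIDIA drivers, and `meets_minimum` is a helper defined here, not part of any package.

```python
import shutil
import sys

def meets_minimum(version, minimum=(3, 10)):
    """True if the (major, minor) version is at least the required minimum."""
    return tuple(version[:2]) >= minimum

# Python 3.10+ is required to run the models.
print(f"Python {sys.version.split()[0]} OK: {meets_minimum(sys.version_info)}")

# nvidia-smi on PATH is a rough proxy for usable NVIDIA drivers.
print(f"NVIDIA driver visible: {shutil.which('nvidia-smi') is not None}")
```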

Manual Installation - Environment Setup

If you prefer to set up your environment manually:
1. Create Virtual Environment

python3 -m venv standard_model
source standard_model/bin/activate

2. Install PyTorch

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
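
The command above installs wheels built for CUDA 12.1. If your machine has no NVIDIA GPU, the CPU-only wheels skip the CUDA libraries entirely (index URLs per the official PyTorch install selector):

```shell
# CPU-only build of PyTorch; smaller download, no CUDA libraries.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
```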

3. Install Dependencies

pip install transformers datasets accelerate huggingface_hub pandas
pip install git+https://github.com/standardmodelbio/smb-biopan-utils.git

4. Download SMB-v1-Structure

In a Python session, run:

from huggingface_hub import snapshot_download

snapshot_download("standardmodelbio/SMB-v1-1.7B-Structure")

Verify Your Installation

After setup, activate the environment whenever you want to use the models:
source standard_model/bin/activate
Then run the following Python script to confirm everything works:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Check PyTorch and CUDA
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

# Load SMB-v1-Structure
model_id = "standardmodelbio/SMB-v1-1.7B-Structure"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto"
)

print("SMB-v1-Structure loaded successfully!")
print(f"Model parameters: {sum(p.numel() for p in model.parameters()):,}")

Troubleshooting

CUDA Not Detected

Ensure your NVIDIA drivers are up to date and run nvidia-smi to verify that the GPU is visible. Also confirm you installed a CUDA-enabled PyTorch wheel (the cu121 index URL above) rather than a CPU-only build.

Out of Memory

Load the model in half precision (torch.float16) as shown under Memory optimization below, or apply quantization for further savings.

Model Access Denied

Some models may require authentication. Run huggingface-cli login with your token.
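
For non-interactive environments such as CI or batch jobs, huggingface_hub also reads the token from the HF_TOKEN environment variable, so the login step can be replaced with an export (placeholder token shown):

```shell
# Non-interactive alternative to `huggingface-cli login`:
# huggingface_hub picks up the token from HF_TOKEN.
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx  # placeholder; substitute your own token
```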

Slow Download

Model downloads can be large (several GB). Ensure a stable connection and sufficient disk space before starting.

Memory optimization

For large cohorts or limited GPU memory, load the model in half precision (quantization can reduce memory further):
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto"
)
Memory usage in float16: ~8 GB (vs. ~16 GB in full precision).
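
These totals can be sanity-checked with a back-of-envelope estimate: the weights alone take parameters × bytes-per-parameter, and the remainder of the quoted figures is activations, KV cache, and CUDA overhead. A minimal sketch, assuming the stated 1.7B parameter count (`weight_gb` is a helper defined here for illustration):

```python
PARAMS = 1.7e9  # SMB-v1-Structure parameter count

def weight_gb(params, bytes_per_param):
    """Approximate memory for model weights in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

print(f"float32 weights: ~{weight_gb(PARAMS, 4):.1f} GB")    # roughly 6.8 GB
print(f"float16 weights: ~{weight_gb(PARAMS, 2):.1f} GB")    # roughly 3.4 GB
print(f"4-bit weights:   ~{weight_gb(PARAMS, 0.5):.2f} GB")  # roughly 0.85 GB
```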

Next Steps