
πŸ—„οΈ Persistent Storage Setup for HuggingFace Spaces

🎯 Problem Solved: Model Storage

This setup prevents re-downloading models from the LinguaCustodia Hub repository on every Space restart by using HuggingFace Spaces persistent storage as the model cache.

πŸ“‹ Step-by-Step Setup

1. Enable Persistent Storage in Your Space

  1. Go to your Space: https://huggingface.co/spaces/jeanbaptdzd/linguacustodia-financial-api
  2. Click "Settings" tab
  3. Scroll to "Storage" section
  4. Select a storage tier with enough room for the model cache (an 8B model in bf16 is roughly 16 GB)
  5. Click "Save"

2. Update Your Space Files

Replace your current app.py with the persistent storage version:

```shell
# Copy the persistent storage app
cp persistent_storage_app.py app.py
```

3. Key Changes Made

Environment Variable Setup:

```python
import os

# CRITICAL: set HF_HOME to the persistent storage mount *before* importing
# transformers/huggingface_hub, which read this variable at import time
os.environ["HF_HOME"] = "/data/.huggingface"
```
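Spaces without persistent storage enabled have no writable `/data` mount, so a defensive startup can pick the cache location before any `transformers` import. This is a sketch; the fallback path is an assumption, not part of the app:

```python
import os
import tempfile

def pick_hf_home(preferred: str = "/data/.huggingface") -> str:
    """Return a writable cache directory, falling back to an ephemeral
    temp dir when persistent storage (/data) is not mounted or writable."""
    parent = os.path.dirname(preferred)
    if os.path.isdir(parent) and os.access(parent, os.W_OK):
        os.makedirs(preferred, exist_ok=True)
        return preferred
    # Fallback: ephemeral cache (hypothetical path; lost on restart)
    fallback = os.path.join(tempfile.gettempdir(), "hf_cache")
    os.makedirs(fallback, exist_ok=True)
    return fallback

# Must run before importing transformers, which reads HF_HOME at import time
os.environ["HF_HOME"] = pick_hf_home()
```

With this in place the app still boots on a Space where storage was never enabled, just without warm starts.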

Pipeline with Cache Directory:

```python
pipe = pipeline(
    "text-generation",
    model=model_id,
    token=hf_token_lc,
    torch_dtype=torch_dtype,  # renamed to dtype= in newer transformers releases
    device_map="auto",
    trust_remote_code=True,
    # With HF_HOME set above, downloads already land in persistent storage;
    # model_kwargs forwards cache_dir to from_pretrained to make it explicit.
    model_kwargs={"cache_dir": os.environ["HF_HOME"]},
)
```

Storage Monitoring:

```python
def get_storage_info() -> Dict[str, Any]:
    """Get information about persistent storage usage."""
    # Returns storage status, cache size, writable status
```
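A minimal implementation of that helper might look like the following sketch (directory names follow the setup above; `Dict`/`Any` come from `typing`):

```python
import os
from typing import Any, Dict

def get_storage_info() -> Dict[str, Any]:
    """Get information about persistent storage usage."""
    hf_home = os.environ.get("HF_HOME", "/data/.huggingface")
    data_dir = "/data"

    def dir_size_mb(path: str) -> float:
        total = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                fp = os.path.join(root, name)
                if os.path.exists(fp):  # skip broken symlinks
                    total += os.path.getsize(fp)
        return round(total / (1024 * 1024), 1)

    return {
        "hf_home": hf_home,
        "data_dir_exists": os.path.isdir(data_dir),
        "data_dir_writable": os.access(data_dir, os.W_OK),
        "hf_cache_dir_exists": os.path.isdir(hf_home),
        "hf_cache_dir_writable": os.access(hf_home, os.W_OK),
        "cache_size_mb": dir_size_mb(hf_home) if os.path.isdir(hf_home) else 0.0,
    }
```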

πŸ”§ How It Works

First Load (Cold Start):

  1. Model downloads from LinguaCustodia repository
  2. Model files cached to /data/.huggingface/
  3. Takes ~2-3 minutes (same as before)

Subsequent Loads (Warm Start):

  1. Model loads from local cache (/data/.huggingface/)
  2. Much faster - typically 30-60 seconds
  3. No network download needed
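Whether the next start will be warm can be checked offline, because the Hub cache uses a predictable layout (`models--<org>--<name>/snapshots/` under `$HF_HOME/hub`). This helper is a sketch for diagnostics, not part of the app:

```python
import os

def is_model_cached(model_id: str, hf_home: str = "/data/.huggingface") -> bool:
    """True if at least one snapshot of the model exists in the local
    Hub cache (layout: <hf_home>/hub/models--<org>--<name>/snapshots/)."""
    folder = "models--" + model_id.replace("/", "--")
    snapshots = os.path.join(hf_home, "hub", folder, "snapshots")
    return os.path.isdir(snapshots) and bool(os.listdir(snapshots))
```

If this returns True, the pipeline call loads from disk instead of downloading.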

πŸ“Š Storage Information

The app now reports storage information via the /health endpoint:

```json
{
  "status": "healthy",
  "model_loaded": true,
  "storage_info": {
    "hf_home": "/data/.huggingface",
    "data_dir_exists": true,
    "data_dir_writable": true,
    "hf_cache_dir_exists": true,
    "hf_cache_dir_writable": true,
    "cache_size_mb": 1234.5
  }
}
```

πŸš€ Deployment Steps

1. Update Space Files

Upload these files to your Space:

  • app.py (use persistent_storage_app.py as the base)
  • requirements.txt (unchanged)
  • Dockerfile (unchanged)
  • README.md (unchanged)

2. Enable Storage

  • Go to Space Settings
  • Enable persistent storage (sized for the model cache; see the notes on cache size below)
  • Save settings

3. Deploy

  • Space will rebuild automatically
  • First load will be slow (downloading model)
  • Subsequent loads will be fast (using cache)

πŸ§ͺ Testing

Test Storage Setup:

```shell
# Check the health endpoint for storage info (Spaces serve the API on the
# *.hf.space direct URL, not the huggingface.co/spaces page)
curl https://jeanbaptdzd-linguacustodia-financial-api.hf.space/health
```

Test Model Loading Speed:

  1. First request: Will be slow (downloading model)
  2. Second request: Should be much faster (using cache)
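The cold/warm difference can be quantified with a small timing helper; this is a generic sketch, and the request callable you pass in is up to you:

```python
import time
from typing import Callable, Tuple

def time_call(fn: Callable[[], object]) -> float:
    """Return wall-clock seconds taken by one call to fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def cold_vs_warm(fn: Callable[[], object]) -> Tuple[float, float]:
    """Time the same request twice: first (cold) vs. second (warm)."""
    return time_call(fn), time_call(fn)
```

Against a real Space, fn would be something like `lambda: urllib.request.urlopen(base_url + "/health")`, where base_url is your Space's direct *.hf.space URL.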

πŸ’‘ Benefits

  • βœ… Faster startup after first load
  • βœ… Reduced bandwidth usage
  • βœ… Better reliability (no network dependency for model loading)
  • βœ… Cost savings (faster inference = less compute time)
  • βœ… Storage monitoring (see cache size and status)

🚨 Important Notes

  • Storage costs: billed per tier; check the current Spaces pricing page
  • Cache size: roughly 16GB for an 8B model in bf16 (parameter count × 2 bytes)
  • First load: Still takes 2-3 minutes (downloading)
  • Subsequent loads: 30-60 seconds (from cache)

πŸ”— Files to Update

  1. app.py - Use persistent_storage_app.py as base
  2. Space Settings - Enable persistent storage
  3. Test scripts - Update URLs if needed

🎯 Result: Models will be cached locally, dramatically reducing load times after the first deployment!