
πŸ—„οΈ Persistent Storage Setup for HuggingFace Spaces

🎯 Problem Solved: Model Storage

This setup prevents re-downloading models from the LinguaCustodia Hub repository on every Space restart by using HuggingFace Spaces persistent storage as the model cache.

πŸ“‹ Step-by-Step Setup

1. Enable Persistent Storage in Your Space

  1. Go to your Space: https://huggingface.co/spaces/jeanbaptdzd/linguacustodia-financial-api
  2. Click "Settings" tab
  3. Scroll to "Storage" section
  4. Select a storage tier with enough room for the model cache (an 8B model in bf16 is roughly 16 GB)
  5. Click "Save"

2. Update Your Space Files

Replace your current app.py with the persistent storage version:

```shell
# Copy the persistent storage app
cp persistent_storage_app.py app.py
```

3. Key Changes Made

Environment Variable Setup:

```python
import os

# CRITICAL: set HF_HOME to the persistent storage mount *before* importing
# transformers/huggingface_hub, which read this variable at import time
os.environ["HF_HOME"] = "/data/.huggingface"
```
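Spaces without persistent storage enabled have no writable `/data` mount, so a defensive startup can pick the cache location before any `transformers` import. This is a sketch; the fallback path is an assumption, not part of the app:

```python
import os
import tempfile

def pick_hf_home(preferred: str = "/data/.huggingface") -> str:
    """Return a writable cache directory, falling back to an ephemeral
    temp dir when persistent storage (/data) is not mounted or writable."""
    parent = os.path.dirname(preferred)
    if os.path.isdir(parent) and os.access(parent, os.W_OK):
        os.makedirs(preferred, exist_ok=True)
        return preferred
    # Fallback: ephemeral cache (hypothetical path; lost on restart)
    fallback = os.path.join(tempfile.gettempdir(), "hf_cache")
    os.makedirs(fallback, exist_ok=True)
    return fallback

# Must run before importing transformers, which reads HF_HOME at import time
os.environ["HF_HOME"] = pick_hf_home()
```

With this in place the app still boots on a Space where storage was never enabled, just without warm starts.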

Pipeline with Cache Directory:

```python
pipe = pipeline(
    "text-generation",
    model=model_id,
    token=hf_token_lc,
    torch_dtype=torch_dtype,  # renamed to dtype= in newer transformers releases
    device_map="auto",
    trust_remote_code=True,
    # With HF_HOME set above, downloads already land in persistent storage;
    # model_kwargs forwards cache_dir to from_pretrained to make it explicit.
    model_kwargs={"cache_dir": os.environ["HF_HOME"]},
)
```

Storage Monitoring:

```python
def get_storage_info() -> Dict[str, Any]:
    """Get information about persistent storage usage."""
    # Returns storage status, cache size, writable status
```
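A minimal implementation of that helper might look like the following sketch (directory names follow the setup above; `Dict`/`Any` come from `typing`):

```python
import os
from typing import Any, Dict

def get_storage_info() -> Dict[str, Any]:
    """Get information about persistent storage usage."""
    hf_home = os.environ.get("HF_HOME", "/data/.huggingface")
    data_dir = "/data"

    def dir_size_mb(path: str) -> float:
        total = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                fp = os.path.join(root, name)
                if os.path.exists(fp):  # skip broken symlinks
                    total += os.path.getsize(fp)
        return round(total / (1024 * 1024), 1)

    return {
        "hf_home": hf_home,
        "data_dir_exists": os.path.isdir(data_dir),
        "data_dir_writable": os.access(data_dir, os.W_OK),
        "hf_cache_dir_exists": os.path.isdir(hf_home),
        "hf_cache_dir_writable": os.access(hf_home, os.W_OK),
        "cache_size_mb": dir_size_mb(hf_home) if os.path.isdir(hf_home) else 0.0,
    }
```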

πŸ”§ How It Works

First Load (Cold Start):

  1. Model downloads from LinguaCustodia repository
  2. Model files cached to /data/.huggingface/
  3. Takes ~2-3 minutes (same as before)

Subsequent Loads (Warm Start):

  1. Model loads from local cache (/data/.huggingface/)
  2. Much faster - typically 30-60 seconds
  3. No network download needed
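Whether the next start will be warm can be checked offline, because the Hub cache uses a predictable layout (`models--<org>--<name>/snapshots/` under `$HF_HOME/hub`). This helper is a sketch for diagnostics, not part of the app:

```python
import os

def is_model_cached(model_id: str, hf_home: str = "/data/.huggingface") -> bool:
    """True if at least one snapshot of the model exists in the local
    Hub cache (layout: <hf_home>/hub/models--<org>--<name>/snapshots/)."""
    folder = "models--" + model_id.replace("/", "--")
    snapshots = os.path.join(hf_home, "hub", folder, "snapshots")
    return os.path.isdir(snapshots) and bool(os.listdir(snapshots))
```

If this returns True, the pipeline call loads from disk instead of downloading.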

πŸ“Š Storage Information

The app now reports storage information via the /health endpoint:

```json
{
  "status": "healthy",
  "model_loaded": true,
  "storage_info": {
    "hf_home": "/data/.huggingface",
    "data_dir_exists": true,
    "data_dir_writable": true,
    "hf_cache_dir_exists": true,
    "hf_cache_dir_writable": true,
    "cache_size_mb": 1234.5
  }
}
```

πŸš€ Deployment Steps

1. Update Space Files

Upload these files to your Space:

  • app.py (use persistent_storage_app.py as the base)
  • requirements.txt (unchanged)
  • Dockerfile (unchanged)
  • README.md (unchanged)

2. Enable Storage

  • Go to Space Settings
  • Enable persistent storage (sized for the model cache; see the notes on cache size below)
  • Save settings

3. Deploy

  • Space will rebuild automatically
  • First load will be slow (downloading model)
  • Subsequent loads will be fast (using cache)

πŸ§ͺ Testing

Test Storage Setup:

```shell
# Check the health endpoint for storage info (Spaces serve the API on the
# *.hf.space direct URL, not the huggingface.co/spaces page)
curl https://jeanbaptdzd-linguacustodia-financial-api.hf.space/health
```

Test Model Loading Speed:

  1. First request: Will be slow (downloading model)
  2. Second request: Should be much faster (using cache)
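The cold/warm difference can be quantified with a small timing helper; this is a generic sketch, and the request callable you pass in is up to you:

```python
import time
from typing import Callable, Tuple

def time_call(fn: Callable[[], object]) -> float:
    """Return wall-clock seconds taken by one call to fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def cold_vs_warm(fn: Callable[[], object]) -> Tuple[float, float]:
    """Time the same request twice: first (cold) vs. second (warm)."""
    return time_call(fn), time_call(fn)
```

Against a real Space, fn would be something like `lambda: urllib.request.urlopen(base_url + "/health")`, where base_url is your Space's direct *.hf.space URL.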

πŸ’‘ Benefits

  • βœ… Faster startup after first load
  • βœ… Reduced bandwidth usage
  • βœ… Better reliability (no network dependency for model loading)
  • βœ… Cost savings (faster inference = less compute time)
  • βœ… Storage monitoring (see cache size and status)

🚨 Important Notes

  • Storage costs: billed per tier; check the current Spaces pricing page
  • Cache size: roughly 16GB for an 8B model in bf16 (parameter count × 2 bytes)
  • First load: Still takes 2-3 minutes (downloading)
  • Subsequent loads: 30-60 seconds (from cache)

πŸ”— Files to Update

  1. app.py - Use persistent_storage_app.py as base
  2. Space Settings - Enable persistent storage
  3. Test scripts - Update URLs if needed

🎯 Result: Models will be cached locally, dramatically reducing load times after the first deployment!