# Persistent Storage Setup for HuggingFace Spaces
## Problem Solved: Model Storage

This setup uses HuggingFace Spaces persistent storage so models do not have to be re-downloaded from the LinguaCustodia repository on every restart.
## Step-by-Step Setup
### 1. Enable Persistent Storage in Your Space

- Go to your Space: https://huggingface.co/spaces/jeanbaptdzd/linguacustodia-financial-api
- Click the "Settings" tab
- Scroll to the "Storage" section
- Select a storage tier (recommended: 1GB or 5GB)
- Click "Save" (or request storage programmatically, as sketched below)
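If you prefer to script this step, huggingface_hub exposes a `request_space_storage` helper. Note that the API uses tier names ("small", "medium", "large"), which may not match the sizes quoted above, and the token below is a placeholder. A minimal sketch:

```python
from huggingface_hub import HfApi

api = HfApi(token="hf_...")  # placeholder; use a write token for the Space owner
api.request_space_storage(
    repo_id="jeanbaptdzd/linguacustodia-financial-api",
    storage="small",  # API tier names: "small", "medium", "large"
)
```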
### 2. Update Your Space Files

Replace your current app.py with the persistent storage version:

```bash
# Copy the persistent storage app
cp persistent_storage_app.py app.py
```
### 3. Key Changes Made

**Environment variable setup:**

```python
import os

# CRITICAL: Set HF_HOME to persistent storage directory
os.environ["HF_HOME"] = "/data/.huggingface"
```
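Note that HF_HOME is read when transformers and huggingface_hub are imported, so set it before those imports. The /data volume is also empty the first time storage is enabled, so creating the cache directory up front is a cheap safeguard; a minimal sketch assuming the same path:

```python
import os

# Must run before `import transformers` so the cache location is picked up.
os.environ["HF_HOME"] = "/data/.huggingface"
# /data starts out empty on the first boot with persistent storage enabled.
os.makedirs("/data/.huggingface", exist_ok=True)
```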
**Pipeline with cache directory:**

```python
from transformers import pipeline

# model_id, hf_token_lc, and torch_dtype come from the app's configuration
pipe = pipeline(
    "text-generation",
    model=model_id,
    token=hf_token_lc,
    dtype=torch_dtype,
    device_map="auto",
    trust_remote_code=True,
    # CRITICAL: Use persistent storage cache
    cache_dir=os.environ["HF_HOME"],
)
```
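Once loaded, the pipeline is used like any transformers text-generation pipeline; the prompt and generation length below are purely illustrative:

```python
# Example call; prompt and max_new_tokens are arbitrary.
outputs = pipe(
    "Summarize the key risks in this quarterly report:",
    max_new_tokens=128,
)
print(outputs[0]["generated_text"])
```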
**Storage monitoring:**

```python
def get_storage_info() -> Dict[str, Any]:
    """Get information about persistent storage usage."""
    # Returns storage status, cache size, writable status
```
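As a reference, here is one possible implementation of this helper, matching the fields returned by the /health endpoint shown below; the actual persistent_storage_app.py may differ in detail:

```python
import os
from typing import Any, Dict


def get_storage_info() -> Dict[str, Any]:
    """Report where the HF cache lives and how much disk it uses."""
    hf_home = os.environ.get("HF_HOME", "/data/.huggingface")
    data_dir = "/data"

    # Sum file sizes under the cache directory (reported in MB).
    cache_size_bytes = 0
    for root, _, files in os.walk(hf_home):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                cache_size_bytes += os.path.getsize(path)

    return {
        "hf_home": hf_home,
        "data_dir_exists": os.path.isdir(data_dir),
        "data_dir_writable": os.access(data_dir, os.W_OK),
        "hf_cache_dir_exists": os.path.isdir(hf_home),
        "hf_cache_dir_writable": os.access(hf_home, os.W_OK),
        "cache_size_mb": round(cache_size_bytes / (1024 * 1024), 1),
    }
```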
## How It Works

**First load (cold start):**
- Model downloads from the LinguaCustodia repository
- Model files are cached to `/data/.huggingface/`
- Takes ~2-3 minutes (same as before)

**Subsequent loads (warm start):**
- Model loads from the local cache (`/data/.huggingface/`)
- Much faster: typically 30-60 seconds
- No network download needed
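To confirm that a warm start will actually hit the local cache, huggingface_hub's `scan_cache_dir` can inspect it. Note that the hub cache lives under `$HF_HOME/hub`, and the call raises if that directory does not exist yet:

```python
from huggingface_hub import scan_cache_dir

# With HF_HOME=/data/.huggingface, downloaded repos live under its hub/ subfolder.
cache_info = scan_cache_dir("/data/.huggingface/hub")
print(f"Total cache size: {cache_info.size_on_disk / 1e9:.2f} GB")
for repo in cache_info.repos:
    print(f"{repo.repo_id}: {repo.size_on_disk / 1e9:.2f} GB")
```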
## Storage Information

The app now exposes storage information via the /health endpoint:
```json
{
  "status": "healthy",
  "model_loaded": true,
  "storage_info": {
    "hf_home": "/data/.huggingface",
    "data_dir_exists": true,
    "data_dir_writable": true,
    "hf_cache_dir_exists": true,
    "hf_cache_dir_writable": true,
    "cache_size_mb": 1234.5
  }
}
```
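A minimal sketch of how such an endpoint could be wired up, assuming the app is a FastAPI service and reusing the `pipe` and `get_storage_info` definitions shown earlier (the `model_loaded` tracking is an assumption, not the app's confirmed logic):

```python
from fastapi import FastAPI

app = FastAPI()
pipe = None  # set by the model-loading code shown earlier


@app.get("/health")
def health() -> dict:
    return {
        "status": "healthy",
        "model_loaded": pipe is not None,
        "storage_info": get_storage_info(),  # helper defined above
    }
```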
## Deployment Steps

### 1. Update Space Files

Upload these files to your Space (or push them with the API, as sketched after this list):
- app.py (use persistent_storage_app.py as base)
- requirements.txt (same as before)
- Dockerfile (same as before)
- README.md (same as before)
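Instead of uploading through the web UI, the same files can be pushed with huggingface_hub's `upload_folder`; the token is a placeholder and the local folder layout is assumed to match the list above:

```python
from huggingface_hub import HfApi

api = HfApi(token="hf_...")  # placeholder; write token with access to the Space
api.upload_folder(
    repo_id="jeanbaptdzd/linguacustodia-financial-api",
    repo_type="space",
    folder_path=".",  # local directory containing app.py, Dockerfile, etc.
    commit_message="Switch to persistent storage app",
)
```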
### 2. Enable Storage

- Go to Space Settings
- Enable persistent storage (1GB minimum)
- Save settings

### 3. Deploy

- The Space will rebuild automatically
- First load will be slow (downloading the model)
- Subsequent loads will be fast (using the cache)
## Testing

**Test storage setup:**

```bash
# Check the health endpoint for storage info (use the Space's direct URL,
# not the huggingface.co/spaces page)
curl https://jeanbaptdzd-linguacustodia-financial-api.hf.space/health
```
**Test model loading speed:**
- First request: will be slow (downloading the model)
- Second request: should be much faster (using the cache); see the timing sketch below
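A rough timing sketch, assuming the requests library and the Space's direct .hf.space URL (for a truer cold/warm comparison, time your actual inference endpoint instead of /health):

```python
import time

import requests

BASE_URL = "https://jeanbaptdzd-linguacustodia-financial-api.hf.space"

for attempt in (1, 2):
    start = time.monotonic()
    resp = requests.get(f"{BASE_URL}/health", timeout=600)
    elapsed = time.monotonic() - start
    info = resp.json().get("storage_info", {})
    print(f"Request {attempt}: {elapsed:.1f}s, cache size: {info.get('cache_size_mb', 0)} MB")
```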
## Benefits

- ✅ Faster startup after the first load
- ✅ Reduced bandwidth usage
- ✅ Better reliability (no network dependency for model loading)
- ✅ Cost savings (faster startup = less compute time)
- ✅ Storage monitoring (see cache size and status)
## Important Notes

- **Storage costs**: ~$0.10/GB/month
- **Cache size**: ~1-2GB for 8B models
- **First load**: still takes 2-3 minutes (downloading)
- **Subsequent loads**: 30-60 seconds (from cache)
## Files to Update

- app.py - use persistent_storage_app.py as the base
- Space Settings - enable persistent storage
- Test scripts - update URLs if needed
**Result:** Models are cached locally, dramatically reducing load times after the first deployment!