---
title: Slash
emoji: πŸŒ–
colorFrom: purple
colorTo: gray
sdk: streamlit
sdk_version: 1.25.0
pinned: false
license: mit
short_description: 'An AI-powered book summarizer'
---
# πŸ“š Book Summarizer AI
An intelligent web application that extracts text from PDF books and generates comprehensive summaries using state-of-the-art AI models.
## ✨ Features
- πŸ“š **PDF Text Extraction**: Advanced PDF processing with multiple extraction methods
- πŸ€– **AI-Powered Summarization**: Uses transformer models (BART, T5) for high-quality summaries
- 🌐 **Beautiful Web Interface**: Modern UI built with Streamlit
- ⚑ **FastAPI Backend**: Scalable and fast API for processing
- πŸ“ **Configurable Settings**: Adjust summary length, chunk size, and AI models
- πŸ“Š **Text Analysis**: Detailed statistics about book content
- πŸ’Ύ **Download Summaries**: Save summaries as text files
## πŸš€ Quick Start
### Option 1: Automated Setup (Recommended)
**Windows:**
```bash
# Double-click start.bat or run:
start.bat
```
**Unix/Linux/Mac:**
```bash
# Make script executable and run:
chmod +x start.sh
./start.sh
```
### Option 2: Manual Setup
1. **Install dependencies:**
```bash
pip install -r requirements.txt
```
2. **Download NLTK data:**
```bash
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
```
3. **Start the FastAPI backend:**
```bash
uvicorn api.main:app --reload --port 8000
```
4. **Start the Streamlit frontend:**
```bash
streamlit run app.py
```
5. **Open your browser:**
- Frontend: http://localhost:8501
- API Docs: http://localhost:8000/docs
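Before opening the frontend, you can verify that the backend is reachable. This is a small stdlib-only sketch that probes the `/health` endpoint listed under API Endpoints below; the URL assumes the default port 8000 from step 3.

```python
"""Quick check that the FastAPI backend is up before opening the frontend."""
from urllib.request import urlopen
from urllib.error import URLError


def backend_is_up(url: str = "http://localhost:8000/health", timeout: float = 2.0) -> bool:
    """Return True if the backend answers the health endpoint with HTTP 200."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False  # not running, wrong port, or blocked by a firewall


if __name__ == "__main__":
    print("backend up:", backend_is_up())
```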
## πŸ“– Usage
1. **Upload PDF**: Select a PDF book file (max 50MB)
2. **Configure Settings**: Choose AI model and summary parameters
3. **Generate Summary**: Click "Generate Summary" and wait for processing
4. **Download Result**: Save your AI-generated summary
## πŸ› οΈ Technology Stack
### Frontend
- **Streamlit**: Modern web interface
- **Custom CSS**: Beautiful styling and responsive design
### Backend
- **FastAPI**: High-performance API framework
- **Uvicorn**: ASGI server for FastAPI
### AI & ML
- **Hugging Face Transformers**: State-of-the-art NLP models
- **PyTorch**: Deep learning framework
- **BART/T5 Models**: Pre-trained summarization models
### PDF Processing
- **PyPDF2**: PDF text extraction
- **pdfplumber**: Advanced PDF processing
- **NLTK**: Natural language processing
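Using two PDF libraries enables the "multiple extraction methods" mentioned in Features: if one extractor fails or returns nothing, the next one is tried. This is a minimal sketch of that fallback pattern; the extractor lambdas in the comment are illustrative, not the project's actual functions.

```python
"""Sketch of multi-method PDF extraction: try each extractor in order,
falling back to the next when one fails or returns no usable text."""
from typing import Callable, Sequence


def extract_with_fallback(path: str, extractors: Sequence[Callable[[str], str]]) -> str:
    """Run extractors in order; return the first non-empty result."""
    for extract in extractors:
        try:
            text = extract(path)
        except Exception:
            continue  # a failed extractor just hands off to the next one
        if text and text.strip():
            return text
    return ""


# With the real libraries, the list might look like:
#   [lambda p: "\n".join(pg.extract_text() or "" for pg in pdfplumber.open(p).pages),
#    lambda p: "\n".join(pg.extract_text() for pg in PyPDF2.PdfReader(p).pages)]
```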
## πŸ“ Project Structure
```
book-summarizer/
β”œβ”€β”€ app.py                # Streamlit frontend
β”œβ”€β”€ start.py              # Automated startup script
β”œβ”€β”€ start.bat             # Windows startup script
β”œβ”€β”€ start.sh              # Unix/Linux/Mac startup script
β”œβ”€β”€ api/
β”‚   β”œβ”€β”€ __init__.py       # API package
β”‚   β”œβ”€β”€ main.py           # FastAPI backend
β”‚   β”œβ”€β”€ pdf_processor.py  # PDF text extraction
β”‚   β”œβ”€β”€ summarizer.py     # AI summarization logic
β”‚   └── utils.py          # Utility functions
β”œβ”€β”€ requirements.txt      # Python dependencies
└── README.md             # Project documentation
```
## βš™οΈ Configuration
### AI Models
- **facebook/bart-large-cnn**: Best quality, slower processing
- **t5-small**: Faster processing, good quality
- **facebook/bart-base**: Balanced performance
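Any of the listed models can be exercised directly with the Transformers `pipeline` API. The sketch below uses `t5-small` because it is the lightest of the three; the first run downloads the model weights, and the sample text is made up for illustration.

```python
"""Minimal summarization call using the Hugging Face transformers pipeline."""
from transformers import pipeline

# t5-small: fastest of the listed models; weights download on first use
summarizer = pipeline("summarization", model="t5-small")

text = (
    "The old lighthouse keeper climbed the spiral stairs every evening. "
    "He lit the lamp, checked the lens, and watched the ships pass safely. "
    "For forty years the light never failed, and the town slept soundly."
)
result = summarizer(text, max_length=50, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```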
### Summary Settings
- **Max Length**: 50-500 words (default: 150)
- **Min Length**: 10-200 words (default: 50)
- **Chunk Size**: 500-2000 characters (default: 1000)
- **Overlap**: 50-200 characters (default: 100)
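The chunk size and overlap settings describe a sliding-window split: each chunk starts a fixed number of characters before the previous one ended, so sentences that straddle a boundary appear in both chunks. A minimal sketch of that scheme (not the project's exact implementation):

```python
"""Sliding-window text chunking with a fixed character overlap."""


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into chunk_size-character windows, each starting
    `overlap` characters before the previous chunk ended."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample, chunk_size=1000, overlap=100)
print(len(chunks), [len(c) for c in chunks])  # 3 chunks: 1000, 1000, 700 chars
```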
## πŸ”§ API Endpoints
- `GET /` - API information
- `GET /health` - Health check
- `POST /upload-pdf` - Validate PDF file
- `POST /extract-text` - Extract text from PDF
- `POST /summarize` - Generate book summary
- `GET /models` - List available AI models
- `POST /change-model` - Switch AI model
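A client might call these endpoints as sketched below. The multipart field name (`file`) and response key (`summary`) are assumptions; check the live schema at http://localhost:8000/docs for the actual request and response shapes.

```python
"""Hedged client sketch for the summarization API."""
import requests

BASE_URL = "http://localhost:8000"


def endpoint(path: str, base: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def summarize_pdf(pdf_path: str) -> str:
    """Upload a PDF to /summarize and return the generated summary.
    Field names here are assumptions; verify against /docs."""
    with open(pdf_path, "rb") as f:
        resp = requests.post(endpoint("/summarize"), files={"file": f}, timeout=600)
    resp.raise_for_status()
    return resp.json()["summary"]


if __name__ == "__main__":
    print(summarize_pdf("book.pdf"))
```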
## πŸ“‹ Requirements
- **Python**: 3.8 or higher
- **Memory**: At least 4GB RAM (8GB recommended)
- **Storage**: 2GB free space for models
- **Internet**: Required for first-time model download
## πŸ› Troubleshooting
### Common Issues
1. **"Module not found" errors:**
```bash
pip install -r requirements.txt
```
2. **NLTK data missing:**
```bash
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
```
3. **API connection failed:**
- Ensure FastAPI is running on port 8000
- Check firewall settings
- Verify no other service is using the port
4. **Large PDF processing slow:**
- Reduce chunk size in advanced settings
- Use a faster model (t5-small)
- Ensure sufficient RAM
5. **Model download issues:**
- Check internet connection
- Clear Hugging Face cache: `rm -rf ~/.cache/huggingface`
### Performance Tips
- **GPU Acceleration**: Install CUDA for faster processing
- **Model Selection**: Use smaller models for faster results
- **Chunk Size**: Smaller chunks = faster processing but may lose context
- **Memory**: Close other applications to free up RAM
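The GPU tip boils down to a device check at startup: use CUDA when PyTorch sees a GPU, otherwise run on CPU. A small sketch that also tolerates a missing PyTorch install:

```python
"""Pick a compute device: CUDA if PyTorch sees a GPU, else CPU."""
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"  # torch not installed; CPU-only fallback

print("running on:", device)
# transformers pipelines accept a device index, e.g.:
#   pipeline("summarization", model="t5-small", device=0 if device == "cuda" else -1)
```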
## 🀝 Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## πŸ“„ License
This project is open source and available under the MIT License.
## πŸ™ Acknowledgments
- Hugging Face for transformer models
- Streamlit for the web framework
- FastAPI for the backend framework
- The open-source community for various libraries
## πŸ“ž Support
For issues, questions, or feature requests:
1. Check the troubleshooting section
2. Open an issue on GitHub
---
**Happy summarizing! πŸ“šβœ¨**
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference