---
title: Slash
emoji: 📚
colorFrom: purple
colorTo: gray
sdk: streamlit
sdk_version: 1.25.0
pinned: false
license: mit
short_description: An AI-powered book summarizer
---

# 📚 Book Summarizer AI

An intelligent web application that extracts text from PDF books and generates comprehensive summaries using state-of-the-art AI models.

## ✨ Features

- 📄 **PDF Text Extraction**: Advanced PDF processing with multiple extraction methods
- 🤖 **AI-Powered Summarization**: Uses transformer models (BART, T5) for high-quality summaries
- 🌐 **Beautiful Web Interface**: Modern UI built with Streamlit
- 🎛️ **Configurable Settings**: Adjust summary length, chunk size, and AI models
- ⚡ **FastAPI Backend**: Scalable and fast API for processing
- 📊 **Text Analysis**: Detailed statistics about book content
- 💾 **Download Summaries**: Save summaries as text files

## 🚀 Quick Start

### Option 1: Automated Setup (Recommended)

**Windows:**

```bash
# Double-click start.bat or run:
start.bat
```

**Unix/Linux/Mac:**

```bash
# Make script executable and run:
chmod +x start.sh
./start.sh
```

### Option 2: Manual Setup

1. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

2. **Download NLTK data:**

   ```bash
   python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
   ```

3. **Start the FastAPI backend:**

   ```bash
   uvicorn api.main:app --reload --port 8000
   ```

4. **Start the Streamlit frontend:**

   ```bash
   streamlit run app.py
   ```

5. **Open your browser:**
   - Frontend: http://localhost:8501
   - API Docs: http://localhost:8000/docs

## 📖 Usage

1. **Upload PDF**: Select a PDF book file (max 50MB)
2. **Configure Settings**: Choose AI model and summary parameters
3. **Generate Summary**: Click "Generate Summary" and wait for processing
4. **Download Result**: Save your AI-generated summary

## 🛠️ Technology Stack

### Frontend
- **Streamlit**: Modern web interface
- **Custom CSS**: Beautiful styling and responsive design

### Backend
- **FastAPI**: High-performance API framework
- **Uvicorn**: ASGI server for FastAPI

### AI & ML
- **Hugging Face Transformers**: State-of-the-art NLP models
- **PyTorch**: Deep learning framework
- **BART/T5 Models**: Pre-trained summarization models

### PDF Processing
- **PyPDF2**: PDF text extraction
- **pdfplumber**: Advanced PDF processing
- **NLTK**: Natural language processing
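
The "multiple extraction methods" mentioned above boil down to trying more than one PDF library. A minimal sketch of that idea, trying pdfplumber first and falling back to PyPDF2, is shown below; the app's own logic in `api/pdf_processor.py` may differ in detail.

```python
# Illustrative fallback extraction; the real code in api/pdf_processor.py may differ.
import pdfplumber
from PyPDF2 import PdfReader

def extract_text(path: str) -> str:
    """Extract text with pdfplumber, falling back to PyPDF2 on failure."""
    try:
        with pdfplumber.open(path) as pdf:
            pages = [page.extract_text() or "" for page in pdf.pages]
    except Exception:
        reader = PdfReader(path)
        pages = [page.extract_text() or "" for page in reader.pages]
    return "\n".join(pages)
```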

## 📁 Project Structure

```
book-summarizer/
├── app.py                 # Streamlit frontend
├── start.py               # Automated startup script
├── start.bat              # Windows startup script
├── start.sh               # Unix/Linux/Mac startup script
├── api/
│   ├── __init__.py        # API package
│   ├── main.py            # FastAPI backend
│   ├── pdf_processor.py   # PDF text extraction
│   ├── summarizer.py      # AI summarization logic
│   └── utils.py           # Utility functions
├── requirements.txt       # Python dependencies
└── README.md              # Project documentation
```

## ⚙️ Configuration

### AI Models
- **facebook/bart-large-cnn**: Best quality, slower processing
- **t5-small**: Faster processing, good quality
- **facebook/bart-base**: Balanced performance
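
If you want to try one of these models outside the app, a minimal sketch using the Hugging Face `pipeline` API looks like this (the model name and length limits are taken from the defaults in this README; this is not the app's own backend code):

```python
# Stand-alone summarization with transformers; not the app's backend code.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

text = "Paste a chapter or chunk of extracted book text here..."
result = summarizer(text, max_length=150, min_length=50, do_sample=False)
print(result[0]["summary_text"])
```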

### Summary Settings
- **Max Length**: 50-500 words (default: 150)
- **Min Length**: 10-200 words (default: 50)
- **Chunk Size**: 500-2000 characters (default: 1000)
- **Overlap**: 50-200 characters (default: 100)
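
Chunk size and overlap control how the extracted text is split before each piece is summarized. As a rough illustration of character-based chunking with overlap (the app's actual splitting logic in the `api/` package may differ):

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into overlapping character windows (illustrative only)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

A larger overlap preserves more context across chunk boundaries at the cost of extra processing.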

## 🔧 API Endpoints

- `GET /` - API information
- `GET /health` - Health check
- `POST /upload-pdf` - Validate PDF file
- `POST /extract-text` - Extract text from PDF
- `POST /summarize` - Generate book summary
- `GET /models` - List available AI models
- `POST /change-model` - Switch AI model
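
As an example of calling the backend directly, the sketch below posts a PDF to `/summarize` with `requests`. The request and response field names here are assumptions; check the interactive docs at http://localhost:8000/docs for the actual schema.

```python
# Hypothetical client call; field names are assumptions -- see /docs for the real schema.
import requests

with open("my_book.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8000/summarize",
        files={"file": ("my_book.pdf", f, "application/pdf")},
        timeout=600,  # summarizing a long book can take a while
    )
response.raise_for_status()
print(response.json())
```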

## 📋 Requirements

- **Python**: 3.8 or higher
- **Memory**: At least 4GB RAM (8GB recommended)
- **Storage**: 2GB free space for models
- **Internet**: Required for first-time model download

## 🔍 Troubleshooting

### Common Issues

1. **"Module not found" errors:**

   ```bash
   pip install -r requirements.txt
   ```

2. **NLTK data missing:**

   ```bash
   python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
   ```

3. **API connection failed:**
   - Ensure FastAPI is running on port 8000
   - Check firewall settings
   - Verify no other service is using the port
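
   A quick way to confirm the backend is reachable is to call the `GET /health` endpoint listed above, for example:

   ```python
   # Ping the backend's health endpoint; an exception means it is not reachable.
   import requests
   print(requests.get("http://localhost:8000/health", timeout=5).json())
   ```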

4. **Large PDF processing slow:**
   - Reduce chunk size in advanced settings
   - Use a faster model (t5-small)
   - Ensure sufficient RAM

5. **Model download issues:**
   - Check internet connection
   - Clear Hugging Face cache: `rm -rf ~/.cache/huggingface`

### Performance Tips

- **GPU Acceleration**: Install CUDA for faster processing
- **Model Selection**: Use smaller models for faster results
- **Chunk Size**: Smaller chunks = faster processing but may lose context
- **Memory**: Close other applications to free up RAM
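
To confirm whether a GPU is actually available and will be used, standard PyTorch/transformers calls like the following work (generic library usage, not code from this repository):

```python
# Check CUDA availability and build a GPU-backed pipeline when possible.
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # 0 = first GPU, -1 = CPU
summarizer = pipeline("summarization", model="t5-small", device=device)
print("Running on GPU" if device == 0 else "Running on CPU")
```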

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## 📄 License

This project is open source and available under the MIT License.

## 🙏 Acknowledgments

- Hugging Face for transformer models
- Streamlit for the web framework
- FastAPI for the backend framework
- The open-source community for various libraries

## 📞 Support

For issues, questions, or feature requests:

1. Check the troubleshooting section
2. Open an issue on GitHub

---

**Happy summarizing! 📚✨**

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference