---
title: Slash
emoji: 📚
colorFrom: purple
colorTo: gray
sdk: streamlit
sdk_version: 1.25.0
pinned: false
license: mit
short_description: An AI-powered book summarizer
---

# 📚 Book Summarizer AI

An intelligent web application that extracts text from PDF books and generates comprehensive summaries using state-of-the-art AI models.

## ✨ Features

- 📄 **PDF Text Extraction**: Advanced PDF processing with multiple extraction methods
- 🤖 **AI-Powered Summarization**: Uses transformer models (BART, T5) for high-quality summaries
- 🌐 **Beautiful Web Interface**: Modern UI built with Streamlit
- 🎛️ **Configurable Settings**: Adjust summary length, chunk size, and AI models
- ⚡ **FastAPI Backend**: Scalable and fast API for processing
- 📊 **Text Analysis**: Detailed statistics about book content
- 💾 **Download Summaries**: Save summaries as text files

## 🚀 Quick Start

### Option 1: Automated Setup (Recommended)

**Windows:**

```bash
# Double-click start.bat or run:
start.bat
```

**Unix/Linux/Mac:**

```bash
# Make script executable and run:
chmod +x start.sh
./start.sh
```

### Option 2: Manual Setup

1. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

2. **Download NLTK data:**

   ```bash
   python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
   ```

3. **Start the FastAPI backend:**

   ```bash
   uvicorn api.main:app --reload --port 8000
   ```

4. **Start the Streamlit frontend:**

   ```bash
   streamlit run app.py
   ```

5. **Open your browser:**
   - Frontend: http://localhost:8501
   - API Docs: http://localhost:8000/docs

## 📖 Usage

1. **Upload PDF**: Select a PDF book file (max 50MB)
2. **Configure Settings**: Choose AI model and summary parameters
3. **Generate Summary**: Click "Generate Summary" and wait for processing
4. **Download Result**: Save your AI-generated summary

## 🛠️ Technology Stack

### Frontend
- **Streamlit**: Modern web interface
- **Custom CSS**: Beautiful styling and responsive design

### Backend
- **FastAPI**: High-performance API framework
- **Uvicorn**: ASGI server for FastAPI

### AI & ML
- **Hugging Face Transformers**: State-of-the-art NLP models
- **PyTorch**: Deep learning framework
- **BART/T5 Models**: Pre-trained summarization models

### PDF Processing
- **PyPDF2**: PDF text extraction
- **pdfplumber**: Advanced PDF processing
- **NLTK**: Natural language processing
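
The "multiple extraction methods" mentioned above boil down to trying more than one PDF library. A minimal sketch of that idea, trying pdfplumber first and falling back to PyPDF2, is shown below; the app's own logic in `api/pdf_processor.py` may differ in detail.

```python
# Illustrative fallback extraction; the real code in api/pdf_processor.py may differ.
import pdfplumber
from PyPDF2 import PdfReader

def extract_text(path: str) -> str:
    """Extract text with pdfplumber, falling back to PyPDF2 on failure."""
    try:
        with pdfplumber.open(path) as pdf:
            pages = [page.extract_text() or "" for page in pdf.pages]
    except Exception:
        reader = PdfReader(path)
        pages = [page.extract_text() or "" for page in reader.pages]
    return "\n".join(pages)
```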

## 📁 Project Structure

```
book-summarizer/
├── app.py                 # Streamlit frontend
├── start.py               # Automated startup script
├── start.bat              # Windows startup script
├── start.sh               # Unix/Linux/Mac startup script
├── api/
│   ├── __init__.py        # API package
│   ├── main.py            # FastAPI backend
│   ├── pdf_processor.py   # PDF text extraction
│   ├── summarizer.py      # AI summarization logic
│   └── utils.py           # Utility functions
├── requirements.txt       # Python dependencies
└── README.md              # Project documentation
```

## ⚙️ Configuration

### AI Models
- **facebook/bart-large-cnn**: Best quality, slower processing
- **t5-small**: Faster processing, good quality
- **facebook/bart-base**: Balanced performance
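
If you want to try one of these models outside the app, a minimal sketch using the Hugging Face `pipeline` API looks like this (the model name and length limits are taken from the defaults in this README; this is not the app's own backend code):

```python
# Stand-alone summarization with transformers; not the app's backend code.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

text = "Paste a chapter or chunk of extracted book text here..."
result = summarizer(text, max_length=150, min_length=50, do_sample=False)
print(result[0]["summary_text"])
```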

### Summary Settings
- **Max Length**: 50-500 words (default: 150)
- **Min Length**: 10-200 words (default: 50)
- **Chunk Size**: 500-2000 characters (default: 1000)
- **Overlap**: 50-200 characters (default: 100)
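
Chunk size and overlap control how the extracted text is split before each piece is summarized. As a rough illustration of character-based chunking with overlap (the app's actual splitting logic in the `api/` package may differ):

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into overlapping character windows (illustrative only)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

A larger overlap preserves more context across chunk boundaries at the cost of extra processing.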

## 🔧 API Endpoints

- `GET /` - API information
- `GET /health` - Health check
- `POST /upload-pdf` - Validate PDF file
- `POST /extract-text` - Extract text from PDF
- `POST /summarize` - Generate book summary
- `GET /models` - List available AI models
- `POST /change-model` - Switch AI model
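
As an example of calling the backend directly, the sketch below posts a PDF to `/summarize` with `requests`. The request and response field names here are assumptions; check the interactive docs at http://localhost:8000/docs for the actual schema.

```python
# Hypothetical client call; field names are assumptions -- see /docs for the real schema.
import requests

with open("my_book.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8000/summarize",
        files={"file": ("my_book.pdf", f, "application/pdf")},
        timeout=600,  # summarizing a long book can take a while
    )
response.raise_for_status()
print(response.json())
```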

## 📋 Requirements

- **Python**: 3.8 or higher
- **Memory**: At least 4GB RAM (8GB recommended)
- **Storage**: 2GB free space for models
- **Internet**: Required for first-time model download

## 🔍 Troubleshooting

### Common Issues

1. **"Module not found" errors:**

   ```bash
   pip install -r requirements.txt
   ```

2. **NLTK data missing:**

   ```bash
   python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
   ```

3. **API connection failed:**
   - Ensure FastAPI is running on port 8000
   - Check firewall settings
   - Verify no other service is using the port
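
   A quick way to confirm the backend is reachable is to call the `GET /health` endpoint listed above, for example:

   ```python
   # Ping the backend's health endpoint; an exception means it is not reachable.
   import requests
   print(requests.get("http://localhost:8000/health", timeout=5).json())
   ```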

4. **Large PDF processing slow:**
   - Reduce chunk size in advanced settings
   - Use a faster model (t5-small)
   - Ensure sufficient RAM

5. **Model download issues:**
   - Check internet connection
   - Clear Hugging Face cache: `rm -rf ~/.cache/huggingface`

### Performance Tips

- **GPU Acceleration**: Install CUDA for faster processing
- **Model Selection**: Use smaller models for faster results
- **Chunk Size**: Smaller chunks = faster processing but may lose context
- **Memory**: Close other applications to free up RAM
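
To confirm whether a GPU is actually available and will be used, standard PyTorch/transformers calls like the following work (generic library usage, not code from this repository):

```python
# Check CUDA availability and build a GPU-backed pipeline when possible.
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # 0 = first GPU, -1 = CPU
summarizer = pipeline("summarization", model="t5-small", device=device)
print("Running on GPU" if device == 0 else "Running on CPU")
```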

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## 📄 License

This project is open source and available under the MIT License.

## 🙏 Acknowledgments

- Hugging Face for transformer models
- Streamlit for the web framework
- FastAPI for the backend framework
- The open-source community for various libraries

## 📞 Support

For issues, questions, or feature requests:

1. Check the troubleshooting section
2. Open an issue on GitHub

---

**Happy summarizing! 📚✨**

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference