# LSTM Seq2Seq Model for Translation
This repository contains the implementation of an LSTM-based Seq2Seq model for translation tasks. The model has been trained on a bilingual dataset and evaluated using BLEU and ChrF scores to measure translation quality.
## Model Architecture
The model is a Seq2Seq architecture that uses:
- Embedding Layer: To convert input tokens into dense vectors.
- LSTM Encoder: To encode the source language sequences into a hidden representation.
- LSTM Decoder: To generate the translated target language sequences from the hidden representation.
- Linear Layer: To map the decoder output to the target vocabulary space.
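The four components above can be sketched in PyTorch as follows. This is a minimal illustration, not the notebook's exact code: vocabulary sizes, embedding dimension, and hidden dimension are placeholder values.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal LSTM encoder-decoder; all dimensions are illustrative."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512, pad_idx=0):
        super().__init__()
        # Embedding layers: token ids -> dense vectors
        self.src_emb = nn.Embedding(src_vocab, emb_dim, padding_idx=pad_idx)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim, padding_idx=pad_idx)
        # LSTM encoder and decoder
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        # Linear layer: decoder hidden states -> target vocabulary logits
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode the source; the final (hidden, cell) state seeds the decoder.
        _, state = self.encoder(self.src_emb(src))
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)  # (batch, tgt_len, tgt_vocab)

model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
logits = model(torch.randint(0, 1000, (32, 15)),
               torch.randint(0, 1200, (32, 12)))
```

During training the decoder is typically fed the ground-truth target shifted by one position (teacher forcing); at inference it consumes its own previous prediction step by step.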
## Training Details
- Training Loss: Cross-entropy loss with padding tokens ignored.
- Optimizer: Adam optimizer with a learning rate of 0.001.
- Number of Epochs: 10 epochs.
- Batch Size: 32.
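The loss and optimizer setup above corresponds to the following sketch. The tiny `nn.Linear` here is a hypothetical stand-in for the full Seq2Seq model so the step stays self-contained; the loss, optimizer, and learning rate match the details listed.

```python
import torch
import torch.nn as nn

PAD_IDX = 0
VOCAB = 50  # illustrative target vocabulary size

# Cross-entropy loss with padding tokens ignored
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)
# Stand-in for the Seq2Seq model (hypothetical, for illustration only)
model = nn.Linear(8, VOCAB)
# Adam optimizer with a learning rate of 0.001
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

def train_step(features, targets):
    """One optimization step: flatten logits to (batch*len, vocab) for the loss."""
    optimizer.zero_grad()
    logits = model(features)                          # (batch, len, vocab)
    loss = criterion(logits.reshape(-1, VOCAB), targets.reshape(-1))
    loss.backward()
    optimizer.step()
    return loss.item()

loss = train_step(torch.randn(32, 6, 8), torch.randint(1, VOCAB, (32, 6)))
```

Flattening the time dimension before the loss is the standard pattern, since `CrossEntropyLoss` expects `(N, C)` logits against `(N,)` integer targets.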
## Evaluation Metrics
The model's performance was evaluated using:
- BLEU Score: A metric to measure the similarity between the generated and reference translations.
- ChrF Score: A character-based metric for evaluating translation quality.
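To make the character-based metric concrete, here is a simplified sentence-level ChrF in pure Python: the F-score (recall-weighted, beta = 2) over character n-gram precisions and recalls, averaged over orders 1..6. Real evaluations typically use a library implementation such as sacreBLEU; this sketch only illustrates the idea.

```python
from collections import Counter

def char_ngrams(text, n):
    """Counter of character n-grams, with spaces stripped."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified sentence-level ChrF (illustrative, not the library version)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        if sum(hyp.values()) and sum(ref.values()):
            precisions.append(overlap / sum(hyp.values()))
            recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-score with recall weighted beta^2 times as much as precision
    return (1 + beta**2) * p * r / (beta**2 * p + r)
```

BLEU works analogously but over word n-grams, with a brevity penalty in place of recall; its character-level granularity is why ChrF is often more forgiving of morphological variation.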
## Results
The training and validation loss, along with BLEU and ChrF scores, were plotted to analyze the model's performance:
- Training Loss: Decreased steadily over the epochs, indicating effective learning.
- Validation Loss: Showed minimal improvement, suggesting potential overfitting.
- BLEU Score: Improved gradually but remained relatively low, indicating that further tuning may be needed.
- ChrF Score: Showed a consistent increase, reflecting better character-level accuracy in translations.
## Files Included
- `LSTM_model.ipynb`: Jupyter notebook with the full implementation, including data loading, training, and evaluation.
- `bleu_scores.csv`: CSV file of BLEU scores for each epoch.
- `chrf_scores.csv`: CSV file of ChrF scores for each epoch.
- `loss_plot.png`: Plot of training and validation loss.
- `bleu_score_plot.png`: Plot of BLEU scores over epochs.
- `chrf_score_plot.png`: Plot of ChrF scores over epochs.
## Future Work
- Hyperparameter Tuning: Experiment with different hyperparameters to improve model performance.
- Data Augmentation: Use data augmentation techniques to improve the model's ability to generalize.
- Advanced Architectures: Consider using attention mechanisms or transformer models for better performance.
## License
This project is licensed under the MIT License. See the LICENSE file for more details.