# LSTM Seq2Seq Model for Translation
This repository contains the implementation of an LSTM-based Seq2Seq model for translation tasks. The model has been trained on a bilingual dataset and evaluated using BLEU and ChrF scores to measure translation quality.
## Model Architecture
The model is a Seq2Seq architecture that uses:
- Embedding Layer: To convert input tokens into dense vectors.
- LSTM Encoder: To encode the source language sequences into a hidden representation.
- LSTM Decoder: To generate the translated target language sequences from the hidden representation.
- Linear Layer: To map the decoder output to the target vocabulary space.
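The four components above can be sketched in PyTorch as follows. This is a minimal illustration, not the notebook's exact code: vocabulary sizes, embedding dimension, and hidden dimension are placeholder values.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal LSTM encoder-decoder; all dimensions are illustrative."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512, pad_idx=0):
        super().__init__()
        # Embedding layers: token ids -> dense vectors
        self.src_emb = nn.Embedding(src_vocab, emb_dim, padding_idx=pad_idx)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim, padding_idx=pad_idx)
        # LSTM encoder and decoder
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        # Linear layer: decoder hidden states -> target vocabulary logits
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode the source; the final (hidden, cell) state seeds the decoder.
        _, state = self.encoder(self.src_emb(src))
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)  # (batch, tgt_len, tgt_vocab)

model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
logits = model(torch.randint(0, 1000, (32, 15)),
               torch.randint(0, 1200, (32, 12)))
```

During training the decoder is typically fed the ground-truth target shifted by one position (teacher forcing); at inference it consumes its own previous prediction step by step.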
## Training Details
- Training Loss: Cross-entropy loss with padding tokens ignored.
- Optimizer: Adam optimizer with a learning rate of 0.001.
- Number of Epochs: 10 epochs.
- Batch Size: 32.
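The loss and optimizer setup above corresponds to the following sketch. The tiny `nn.Linear` here is a hypothetical stand-in for the full Seq2Seq model so the step stays self-contained; the loss, optimizer, and learning rate match the details listed.

```python
import torch
import torch.nn as nn

PAD_IDX = 0
VOCAB = 50  # illustrative target vocabulary size

# Cross-entropy loss with padding tokens ignored
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)
# Stand-in for the Seq2Seq model (hypothetical, for illustration only)
model = nn.Linear(8, VOCAB)
# Adam optimizer with a learning rate of 0.001
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

def train_step(features, targets):
    """One optimization step: flatten logits to (batch*len, vocab) for the loss."""
    optimizer.zero_grad()
    logits = model(features)                          # (batch, len, vocab)
    loss = criterion(logits.reshape(-1, VOCAB), targets.reshape(-1))
    loss.backward()
    optimizer.step()
    return loss.item()

loss = train_step(torch.randn(32, 6, 8), torch.randint(1, VOCAB, (32, 6)))
```

Flattening the time dimension before the loss is the standard pattern, since `CrossEntropyLoss` expects `(N, C)` logits against `(N,)` integer targets.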
## Evaluation Metrics
The model's performance was evaluated using:
- BLEU Score: A metric to measure the similarity between the generated and reference translations.
- ChrF Score: A character-based metric for evaluating translation quality.
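To make the character-based metric concrete, here is a simplified sentence-level ChrF in pure Python: the F-score (recall-weighted, beta = 2) over character n-gram precisions and recalls, averaged over orders 1..6. Real evaluations typically use a library implementation such as sacreBLEU; this sketch only illustrates the idea.

```python
from collections import Counter

def char_ngrams(text, n):
    """Counter of character n-grams, with spaces stripped."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified sentence-level ChrF (illustrative, not the library version)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        if sum(hyp.values()) and sum(ref.values()):
            precisions.append(overlap / sum(hyp.values()))
            recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-score with recall weighted beta^2 times as much as precision
    return (1 + beta**2) * p * r / (beta**2 * p + r)
```

BLEU works analogously but over word n-grams, with a brevity penalty in place of recall; its character-level granularity is why ChrF is often more forgiving of morphological variation.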
## Results
The training and validation loss, along with BLEU and ChrF scores, were plotted to analyze the model's performance:
- Training Loss: Decreased steadily over the epochs, indicating effective learning.
- Validation Loss: Showed minimal improvement, suggesting potential overfitting.
- BLEU Score: Improved gradually but remained relatively low, indicating that further tuning may be needed.
- ChrF Score: Showed a consistent increase, reflecting better character-level accuracy in translations.
## Files Included
- `LSTM_model.ipynb`: Jupyter notebook with the full implementation, including data loading, training, and evaluation.
- `bleu_scores.csv`: CSV file of BLEU scores for each epoch.
- `chrf_scores.csv`: CSV file of ChrF scores for each epoch.
- `loss_plot.png`: Plot of training and validation loss.
- `bleu_score_plot.png`: Plot of BLEU scores over epochs.
- `chrf_score_plot.png`: Plot of ChrF scores over epochs.
## Future Work
- Hyperparameter Tuning: Experiment with different hyperparameters to improve model performance.
- Data Augmentation: Use data augmentation techniques to improve the model's ability to generalize.
- Advanced Architectures: Consider using attention mechanisms or transformer models for better performance.
## License
This project is licensed under the MIT License. See the LICENSE file for more details.