Gujarati Question Answering Model (Finetuned)

This model is a fine-tuned version of Naman0807/gujarati-lm, trained on a Gujarati question-answering dataset (the Gujarati subset of L3Cube-Pune IndicSQuAD; see Training Data below). Given a context and a question, it generates an answer.

Model Description

  • Model Architecture: GPT-2 (Decoder-only)
  • Task: Generative Question Answering
  • Language: Gujarati
  • Base Model: Naman0807/gujarati-lm
  • Context Window: 256 tokens

Intended Use

This model is intended to answer questions in Gujarati based on a provided context. It uses a generative approach: the input is formatted as Context: <context>\nQuestion: <question>\nAnswer: and the model completes the prompt with the answer.

How to Use

You can use this model with the Hugging Face transformers text-generation pipeline.

from transformers import pipeline

# Load the text-generation pipeline with the fine-tuned model
generator = pipeline("text-generation", model="Naman0807/fine_tuned_gujarati_qa")  # Replace with your actual repo name if different

# Define context and question
context = "ગુજરાત ભારતનું એક રાજ્ય છે. તેનું પાટનગર ગાંધીનગર છે."  # "Gujarat is a state of India. Its capital is Gandhinagar."
question = "ગુજરાતનું પાટનગર કયું છે?"  # "What is the capital of Gujarat?"

# Format the prompt in the same Context/Question/Answer layout used during fine-tuning
prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"

# Generate an answer
output = generator(prompt, max_length=256, num_return_sequences=1, do_sample=True, temperature=0.7)
generated_text = output[0]["generated_text"]

# Extract the answer part (everything after the final "Answer:")
answer = generated_text.split("Answer:")[-1].strip()
print(answer)
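For short factual answers, deterministic decoding is often more reliable than sampling. A variant such as the following may work better (the parameter values here are illustrative, not tuned):

# Greedy decoding with a small generation budget for short answers
output = generator(prompt, max_new_tokens=32, do_sample=False)
answer = output[0]["generated_text"].split("Answer:")[-1].strip()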

Training Data

The model was fine-tuned on the Gujarati subset of the L3Cube-Pune IndicSQuAD dataset.

Each example was flattened into a single generative training string:

  • Format: Context: <context>\nQuestion: <question>\nAnswer: <answer>
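As a minimal preprocessing sketch, assuming SQuAD-style records with context, question, and answers fields (the field names are an assumption; adjust to the actual IndicSQuAD schema if it differs):

def to_training_text(example):
    # Assumes SQuAD-style records; field names here are an assumption
    answer = example["answers"]["text"][0]  # first gold answer span
    return f"Context: {example['context']}\nQuestion: {example['question']}\nAnswer: {answer}"

record = {
    "context": "ગુજરાત ભારતનું એક રાજ્ય છે. તેનું પાટનગર ગાંધીનગર છે.",
    "question": "ગુજરાતનું પાટનગર કયું છે?",
    "answers": {"text": ["ગાંધીનગર"]},
}
print(to_training_text(record))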

Training Procedure

Hyperparameters

  • Epochs: 3
  • Batch Size: 4
  • Learning Rate: 5e-5
  • Max Sequence Length: 256
  • Optimizer: AdamW
  • Precision: Mixed Precision (FP16)

Training Code Snippet

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./fine_tuned_gujarati_qa",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=5e-5,
    fp16=True,  # mixed-precision training
    # ... remaining arguments omitted
)
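For completeness, here is a minimal sketch of how such a run could be assembled end to end. The tokenization step and the use of DataCollatorForLanguageModeling are assumptions about the training setup, not a verbatim copy of the original script:

from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
)

tokenizer = AutoTokenizer.from_pretrained("Naman0807/gujarati-lm")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("Naman0807/gujarati-lm")

# Toy one-example dataset in the flattened format; a real run would map the
# full IndicSQuAD Gujarati split through the same preprocessing.
train_dataset = Dataset.from_dict({
    "text": ["Context: ગુજરાત ભારતનું એક રાજ્ય છે.\nQuestion: ગુજરાતનું પાટનગર કયું છે?\nAnswer: ગાંધીનગર"]
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

train_dataset = train_dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=training_args,  # the TrainingArguments defined above
    train_dataset=train_dataset,
    # mlm=False selects standard causal (next-token) language modeling
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()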

Limitations

  • Context Length: Inputs are limited to 256 tokens, so long contexts are truncated (see the length-check sketch below).
  • Hallucination: As a generative model, it may produce fluent but incorrect answers, especially when the answer is not present in the provided context.
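
A quick way to check whether a prompt fits the 256-token window before generating (a minimal sketch using the model's own tokenizer):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Naman0807/fine_tuned_gujarati_qa")  # same repo as in the usage example

context = "ગુજરાત ભારતનું એક રાજ્ય છે. તેનું પાટનગર ગાંધીનગર છે."
question = "ગુજરાતનું પાટનગર કયું છે?"
prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"

n_tokens = len(tokenizer(prompt)["input_ids"])
if n_tokens > 256:
    print(f"Prompt is {n_tokens} tokens; anything past 256 will be truncated.")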