Gemma 4-E2B LLM Fine-Tuning: Custom Dataset & Unsloth Local Tutorial

Clip title: Fine-Tune Gemma-4 on Your Own Dataset Locally: Step-by-Step Tutorial
Author / channel: Fahd Mirza
URL: https://www.youtube.com/watch?v=cHpB0PTRx5A

Summary

This video provides a practical, step-by-step tutorial on fine-tuning Google’s Gemma 4-E2B large language model locally on a custom dataset, using the unsloth library for efficiency. The tutorial centers on turning a general-purpose base model into a specialized expert for a niche domain. The presenter, Fahd Mirza, notes that while base models like Gemma 4-E2B offer broad knowledge, they often give generic or shallow answers to highly specific or deep questions, which is what motivates fine-tuning.

To address this, the video details the creation of a custom JSONL dataset containing approximately 100 detailed question-and-answer pairs about the ancient Gandhara civilization. This dataset covers various facets, including the Kushan Empire, Silk Road trade, Buddhist philosophy and art, ancient scripts, key rulers, and geographical significance. The JSONL format is structured in a ChatGPT-like conversational style, with a human query followed by a rich, detailed GPT-generated response. The core idea is to infuse the base model with deep, specialized knowledge that it initially lacks.
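The video does not show the raw file on screen, but a dataset in this conversational JSONL style can be sketched as follows. The `conversations` key, the `human`/`gpt` role names, the file name, and the sample Q&A text are illustrative assumptions modeled on common conversation-format datasets, not verbatim from the video:

```python
import json

# Hypothetical Q&A pairs in the spirit of the Gandhara dataset described above.
qa_pairs = [
    ("What role did the Kushan Empire play in Gandhara?",
     "Under the Kushans, Gandhara became a major center of Buddhist art and Silk Road trade..."),
    ("Which script was commonly used in ancient Gandhara?",
     "The Kharosthi script, written right to left, was widely used for inscriptions and documents..."),
]

# Write one JSON object per line (the JSONL format), in a ChatGPT-like
# conversation layout: a human query followed by a detailed GPT response.
with open("gandhara_dataset.jsonl", "w", encoding="utf-8") as f:
    for question, answer in qa_pairs:
        record = {
            "conversations": [
                {"from": "human", "value": question},
                {"from": "gpt", "value": answer},
            ]
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Reading it back: each line parses independently as one training example.
with open("gandhara_dataset.jsonl", encoding="utf-8") as f:
    examples = [json.loads(line) for line in f]
print(len(examples))  # 2
```

Because each line is an independent JSON object, the roughly 100 Q&A pairs described in the video can be appended, filtered, or streamed without parsing the whole file at once.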

The technical implementation involves setting up a Conda virtual environment on an Ubuntu server equipped with an NVIDIA H100 GPU, though the presenter emphasizes that far less VRAM (or even a CPU) suffices for this small model thanks to the efficiency of the unsloth library. The fine-tuning process uses LoRA (Low-Rank Adaptation) and 4-bit quantization, which together drastically reduce the memory footprint and training time. The SFTTrainer is configured with parameters such as per_device_train_batch_size, gradient_accumulation_steps, warmup_steps, and num_train_epochs. Remarkably, fine-tuning the Gemma 4-E2B model, which has 5.1 billion total parameters (but an “effective core” of 2.3 billion for inference compute), completed in under three minutes and consumed just over 8 GB of VRAM.
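The training setup described above can be sketched roughly as follows. This is a configuration sketch, not the presenter's actual script: it assumes unsloth's `FastLanguageModel` API together with TRL's `SFTTrainer`, and the model checkpoint path, LoRA rank, target modules, and all hyperparameter values are placeholder assumptions (only the parameter names mentioned in the video are taken from it):

```python
# Sketch of an unsloth + TRL LoRA fine-tuning run in 4-bit.
# Assumptions: unsloth's FastLanguageModel API, trl's SFTTrainer, and
# illustrative hyperparameters; substitute the real Gemma E2B checkpoint.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model in 4-bit quantization to cut the memory footprint.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path/to/gemma-e2b",  # placeholder for the Gemma E2B checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters: only small low-rank matrices are trained,
# while the quantized base weights stay frozen.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                # LoRA rank (assumed)
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# The ~100-example conversational JSONL dataset described above.
dataset = load_dataset("json", data_files="gandhara_dataset.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size of 8
        warmup_steps=5,
        num_train_epochs=3,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```

With gradient accumulation, the effective batch size is `per_device_train_batch_size * gradient_accumulation_steps`, which is how a small per-step batch still yields stable updates within a modest VRAM budget.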

The effectiveness of the fine-tuning is demonstrated through a comparative test. When asked a specific question about Kanishka I and his significance to Gandhara and Buddhism, the base Gemma 4-E2B model provides a brief, generic response. In stark contrast, the fine-tuned model delivers an extensive, well-structured, and highly detailed answer, showcasing a profound understanding of the historical and cultural nuances of the Gandhara civilization. This tangible improvement underscores the video’s main takeaway: fine-tuning with efficient tools like unsloth can transform general LLMs into domain-specific experts quickly and affordably, making advanced AI customization accessible to a broader audience.