This project holds a special place in my heart. It represents the very beginning of my coding journey - my first ever Python project, created during my AI & Data Science studies at SNS College of Engineering.
Back then, I didn't even own a laptop. I was just a student trying to understand the world of technology, with more curiosity than knowledge. Despite the challenges, I was determined to build something meaningful. This simple image-to-text generator became my first step into the vast world of programming.
This desktop application converts images into descriptive text using Google's Gemini AI. What started as a learning experiment has become a functional tool that demonstrates the power of AI in understanding visual content.
When I first wrote this code, I barely understood what APIs were or how GUI applications worked. Every line was a learning experience:
- Tkinter GUI - My first attempt at creating a user interface
- Google Gemini Integration - Learning to work with AI APIs
- File Handling - Understanding how to manage image uploads
- Error Handling - Discovering the importance of robust code
This project taught me that you don't need to be an expert to start building. Sometimes, the best way to learn is by doing.
This humble image-to-text generator was my first commit to GitHub. It marked the beginning of my transformation from a curious student without a laptop to someone who could actually build software. Every developer has a first project, and this is mine.
- Python 3.x
- Google Gemini API key
- Required packages: tkinter (bundled with most Python installations, so it is not installed with pip), Pillow, google-generativeai, requests
- Clone this repository:
    git clone https://github.com/Hariharanpugazh/Image-to-Text-Generator.git
    cd Image-to-Text-Generator
- Install dependencies:
    pip install Pillow google-generativeai requests
- Get your Google Gemini API key from Google AI Studio
- Set your API key as an environment variable:
    # Windows
    set GOOGLE_GEMINI_API_KEY=your_api_key_here
    # Linux/Mac
    export GOOGLE_GEMINI_API_KEY=your_api_key_here
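The environment variable from the step above can be read in Python roughly as follows. This is a minimal sketch, not the code in visions.py: the helper name `load_api_key` and the error message are assumptions made for illustration.

```python
import os

# Name of the environment variable used in the setup step above.
API_KEY_ENV_VAR = "GOOGLE_GEMINI_API_KEY"


def load_api_key(env=None):
    """Return the Gemini API key, or raise a clear error if it is missing.

    `load_api_key` is a hypothetical helper for illustration; visions.py may
    read the key differently.
    """
    # Accept any mapping so the function is easy to try without touching
    # the real environment; default to the process environment.
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"{API_KEY_ENV_VAR} is not set. Set it before running the app."
        )
    return key
```

Failing early with a readable message is friendlier than letting the first API call fail with an authentication error.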
Run the application:
    python visions.py
- Click "Upload Image" to select your image
- Click "Generate Text" to let AI describe your image
- Read the generated description in the text area
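Behind the two buttons, the flow is roughly: open the selected file, send it to Gemini with a prompt, and show the returned text. The sketch below illustrates that flow under stated assumptions; it is not the code in visions.py. The function names, the default prompt, and the way the model is passed in are hypothetical, and the model name is taken from the "Built with" list. The `google.generativeai` and Pillow imports are done lazily, so the file loads even before the dependencies are installed.

```python
# Assumed prompt for illustration; visions.py may use a different one.
DEFAULT_PROMPT = "Describe this image in detail."


def build_model(api_key, model_name="gemini-1.5-flash"):
    """Create a Gemini model client (requires google-generativeai)."""
    import google.generativeai as genai  # lazy import: optional dependency

    genai.configure(api_key=api_key)
    return genai.GenerativeModel(model_name)


def open_image(path):
    """Load an image file with Pillow (requires Pillow)."""
    from PIL import Image  # lazy import: optional dependency

    return Image.open(path)


def describe_image(model, image, prompt=DEFAULT_PROMPT):
    """Ask the model to describe the image and return plain text.

    `model` only needs a `generate_content` method whose result has a
    `.text` attribute, which is what GenerativeModel provides.
    """
    try:
        response = model.generate_content([prompt, image])
        return response.text.strip()
    except Exception as exc:  # e.g. network errors, invalid key, blocked content
        # Return a readable message so a GUI can show it in the text area.
        return f"Could not generate a description: {exc}"
```

With both packages installed and the key set, a call might look like `describe_image(build_model(key), open_image("photo.jpg"))`, where `photo.jpg` is a placeholder path. Passing the model in as an argument keeps the describing step easy to try with a stand-in object, without a network connection.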
If you're just starting out like I was, remember that every expert was once a beginner. This project might not be the most sophisticated application you'll ever see, but it represents something more important - the courage to start.
Your first project doesn't have to be perfect. It just has to be yours.
Built with:
- GUI: Tkinter (Python's built-in GUI library)
- AI Model: Google Gemini 1.5 Flash
- Image Processing: Pillow (the maintained fork of PIL, the Python Imaging Library)
- API Integration: google-generativeai (Google's Generative AI SDK for Python)
MIT License - Feel free to learn from this code and build something even better!