A modern, AI-powered platform for uploading, searching, classifying, and analyzing documents in the cloud.
🌐 Try it online now: Cloud Document Analytics Platform (https://cloud-based-document-analytics-serv.vercel.app)
- Features Overview
- Feature Comparison Table
- Screenshots
- How It Works
- Installation & Setup
- Usage Guide
- Tech Stack
- Security
- Contributing
- License
- Credits
- Project Requirements & Approach
- Algorithms & Platform Choices
- Documentation & Reporting
- Drag-and-Drop Upload: Upload PDF, DOC, DOCX, TXT, and more with a modern interface
- Web Scraping: Extract and analyze content directly from web pages via URL
- AI-Powered Classification: Automatic, explainable document categorization with multiple algorithms (AI, rule-based, hybrid)
- Advanced Search: Full-text, fuzzy, and filtered search with instant results
- Analytics Dashboard: Visualize document types, upload trends, and category distributions
- Secure Cloud Storage: All files and metadata stored securely with Supabase
- User Authentication: Secure sign-up, login, and access control
- Responsive UI: Layout adapts to desktop and mobile screens
- Download & Delete: Manage your documents with ease
- Explainable AI: See why documents are classified a certain way
- Category Tree: Hierarchical classification for academic, technical, business, and legal documents
- Real-Time Feedback: Toasts and progress indicators for all actions
- Persistent Stats: Track your document stats over time
- Role-Based Access: Admin and user roles supported
- Live Demo: Always-available online version for instant access
Feature | Local Version | Online Demo | Description |
---|---|---|---|
Drag-and-Drop Upload | ✅ | ✅ | Upload documents from your device |
Web Scraping (URL Import) | ✅ | ✅ | Import and analyze web pages |
AI Classification | ✅ | ✅ | Automatic, explainable document categorization |
Advanced Search | ✅ | ✅ | Full-text, fuzzy, and filtered search |
Analytics Dashboard | ✅ | ✅ | Visualize document types, trends, and categories |
Secure Cloud Storage | ✅ | ✅ | Files and metadata stored in Supabase |
User Authentication | ✅ | ✅ | Sign up, login, and access control |
Download & Delete | ✅ | ✅ | Manage your documents |
Explainable AI | ✅ | ✅ | See classification confidence and rationale |
Category Tree | ✅ | ✅ | Hierarchical document classification |
Real-Time Feedback | ✅ | ✅ | Toasts, progress bars, and instant updates |
Persistent Stats | ✅ | ✅ | Track document stats over time |
Role-Based Access | ✅ | ✅ | Admin and user roles |
Mobile Responsive | ✅ | ✅ | Layout adapts to desktop and mobile screens |
Live Demo | ❌ | ✅ | No setup required, use instantly online |
- Sign Up & Login:
  - Secure authentication with Supabase Auth
  - Role-based access for users and admins
- Upload Documents:
  - Drag and drop files or select from your device
  - Supported formats: PDF, DOC, DOCX, TXT, XLSX, PPTX, images, and more
  - Optionally, enter a URL to scrape and analyze web content
- AI Classification:
  - Choose your preferred classification method (AI, rule-based, hybrid)
  - Documents are categorized into Academic, Technical, Business, Legal, and subcategories
  - Confidence scores and algorithm details are shown
- Search & Filter:
  - Use the search bar to find documents by content, title, or metadata
  - Apply filters by type, category, or upload date
  - Sort results and view document details
- Analytics Dashboard:
  - Visualize your document collection with bar, pie, and line charts
  - See trends, type distributions, and category breakdowns
- Manage Documents:
  - Download, delete, or view details for each document
  - Real-time feedback for all actions
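
The Analytics Dashboard step above comes down to aggregating document metadata before charting it. Here is a minimal sketch of that aggregation, assuming a hypothetical `DocRecord` shape. The field names are illustrative and are not taken from the project's schema.

```typescript
// Hypothetical shape of a stored document record (assumed field names).
interface DocRecord {
  title: string;
  fileType: string;   // e.g. "pdf", "docx"
  category: string;   // e.g. "Academic"
  sizeBytes: number;
  uploadedAt: string; // ISO 8601 timestamp, e.g. "2025-06-01T10:00:00Z"
}

interface CollectionStats {
  count: number;
  totalSizeBytes: number;
  byType: Record<string, number>;
  byCategory: Record<string, number>;
  uploadsPerDay: Record<string, number>;
}

// Adds one to the counter stored under `key`, starting from zero.
function increment(counter: Record<string, number>, key: string): void {
  counter[key] = (counter[key] ?? 0) + 1;
}

// Single pass over the collection; each chart reads one field of the result.
function summarize(docs: DocRecord[]): CollectionStats {
  const stats: CollectionStats = {
    count: 0,
    totalSizeBytes: 0,
    byType: {},
    byCategory: {},
    uploadsPerDay: {},
  };
  for (const doc of docs) {
    stats.count += 1;
    stats.totalSizeBytes += doc.sizeBytes;
    increment(stats.byType, doc.fileType.toLowerCase());
    increment(stats.byCategory, doc.category);
    // The first 10 characters of an ISO timestamp are the YYYY-MM-DD date.
    increment(stats.uploadsPerDay, doc.uploadedAt.slice(0, 10));
  }
  return stats;
}
```

In this sketch, `byType` would feed a pie chart, `byCategory` a bar chart, and `uploadsPerDay` a line chart of upload trends.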
git clone git@github.com:Yosef-AlSabbah/Cloud-Based-Document-Analytics-Service.git
cd Cloud-Based-Document-Analytics-Service
npm install
- Copy .env.example to .env and fill in your Supabase credentials
npm run dev
- Open http://localhost:5173 in your browser
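
If the `.env` step is skipped, the app typically fails later with a less obvious error. The sketch below shows one way to check the Supabase credentials at startup. The variable names `VITE_SUPABASE_URL` and `VITE_SUPABASE_ANON_KEY` are assumptions, so use whatever keys `.env.example` actually lists.

```typescript
interface SupabaseConfig {
  url: string;
  anonKey: string;
}

// Reads the two credentials from an env-like object and fails fast if either
// is missing. The key names are assumed, not taken from the project.
function readSupabaseConfig(env: Record<string, string | undefined>): SupabaseConfig {
  const url = env["VITE_SUPABASE_URL"]?.trim();
  const anonKey = env["VITE_SUPABASE_ANON_KEY"]?.trim();

  const missing: string[] = [];
  if (!url) missing.push("VITE_SUPABASE_URL");
  if (!anonKey) missing.push("VITE_SUPABASE_ANON_KEY");

  if (!url || !anonKey) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return { url, anonKey };
}
```

In a Vite app this would be called once with `import.meta.env`. Vite only exposes variables prefixed with `VITE_` to client code.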
- Create an account or log in securely
- Drag and drop files or select from your device
- Optionally, enter a URL to scrape and analyze web content
- Choose your preferred classification method (AI, rule-based, hybrid)
- Use the search bar to find documents by content, title, or metadata
- Apply filters by type, category, or upload date
- Browse your documents in a sortable, filterable list
- Download, delete, or view details for each document
- See automatic document categorization with confidence scores
- Explore analytics: document type distribution, upload trends, and more
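
The search and filter steps above can be pictured with a small sketch. It assumes fuzzy matching is done with Levenshtein edit distance over title words, which is one common approach and not necessarily what the platform uses. `DocSummary` and `SearchFilters` are hypothetical types.

```typescript
interface DocSummary {
  title: string;
  category: string;
  fileType: string;
  uploadedAt: string; // ISO 8601 timestamp
}

interface SearchFilters {
  category?: string;
  fileType?: string;
  uploadedAfter?: string; // ISO 8601 timestamp
}

// Levenshtein edit distance with two rolling rows of the DP table.
function editDistance(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j]! + 1, curr[j - 1]! + 1, prev[j - 1]! + cost));
    }
    prev = curr;
  }
  return prev[b.length]!;
}

// True on a substring hit, or when some word is within `maxDistance` edits.
// Splitting on \W+ assumes mostly ASCII text.
function fuzzyMatches(query: string, text: string, maxDistance = 1): boolean {
  const q = query.trim().toLowerCase();
  if (q === "") return true;
  const haystack = text.toLowerCase();
  if (haystack.includes(q)) return true;
  return haystack
    .split(/\W+/)
    .some((word) => word !== "" && editDistance(q, word) <= maxDistance);
}

// Applies the query, then each optional filter, then sorts by title.
function searchDocs(docs: DocSummary[], query: string, filters: SearchFilters = {}): DocSummary[] {
  return docs
    .filter((d) => fuzzyMatches(query, d.title))
    .filter((d) => !filters.category || d.category === filters.category)
    .filter((d) => !filters.fileType || d.fileType === filters.fileType)
    // ISO timestamps in the same format compare correctly as strings.
    .filter((d) => !filters.uploadedAfter || d.uploadedAt >= filters.uploadedAfter)
    .sort((x, y) => x.title.localeCompare(y.title));
}
```

With this sketch, `searchDocs(docs, "lerning")` would still match a title containing "Learning", since the two words are one edit apart.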
Layer | Technology/Service |
---|---|
Frontend | React, TypeScript, Vite |
UI | Shadcn/UI, Lucide Icons, Tailwind |
Backend | Supabase (Postgres, Auth, Storage) |
AI/ML | Custom & hybrid classification |
Deployment | Vercel (CI/CD, CDN, Analytics) |
- All data is protected with Supabase Auth and Row Level Security
- Files are stored in user-specific buckets for privacy
- Environment variables are required for all sensitive credentials
- HTTPS enforced on the online demo
- User actions are logged for auditability
- Fork the repo and create your branch
- Make your changes and add tests if needed
- Open a pull request with a clear description
- Follow the code style and best practices
MIT License. See LICENSE for details.
- Developed by Yousef M. Y. Al Sabbah
- Islamic University of Gaza - Faculty of Information Technology
This project was developed as a cloud-based program for basic data analytics, document search, sorting, and classification. Below is a summary of the requirements and how they are addressed in this platform:
Requirement | How It Is Addressed |
---|---|
Collect a large number of PDF/Word documents | Upload via drag-and-drop, file picker, or web scraping from URLs. |
Store documents in the cloud | Uses Supabase for secure, scalable cloud storage and database. |
Update collection anytime | Upload new documents or scrape new sources at any time via the interface. |
Sort documents by title (extracted from document, not filename) | Title extraction from document content; sorting and filtering in the UI. |
Search documents for text/keywords | Full-text and fuzzy search with instant results; highlights found keywords in document previews. |
Highlight search text in output documents | Search results show highlighted keywords in context. |
Classify documents by a predefined tree using any algorithm | Hierarchical classification tree (Academic, Technical, Business, Legal, etc.) with AI, rule-based, or hybrid methods. |
Provide statistics (size, number, search/sort/classify time, etc.) | Analytics dashboard shows document count, size, upload trends, and operation timings. |
Use any programming language and cloud platform | Built with React/TypeScript (frontend), Supabase (cloud backend), Vercel (deployment). |
Well-documented, readable, and maintainable source code | Modular, commented codebase; clear folder structure; usage and contribution guides in README. |
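GitHub repository and cloud program link | GitHub Source Code (https://github.com/Yosef-AlSabbah/Cloud-Based-Document-Analytics-Service) and Live Demo (https://cloud-based-document-analytics-serv.vercel.app) |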
Write a report describing algorithms, platform, and usage | See below for a summary of algorithms and platform choices. |
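
To illustrate the "highlight search text" requirement in the table above, here is a minimal sketch that wraps matched keywords in `<mark>` tags. The function names are hypothetical, and the project may well render highlights as React elements instead of an HTML string.

```typescript
// Escapes characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Escapes characters that would otherwise be read as HTML markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Returns HTML where every case-insensitive keyword match is wrapped in <mark>.
function highlight(text: string, keywords: string[]): string {
  const terms = keywords.map((k) => k.trim()).filter((k) => k !== "");
  if (terms.length === 0) return escapeHtml(text);

  // Longest terms first, so "classification" wins over "class".
  terms.sort((a, b) => b.length - a.length);
  const pattern = new RegExp(`(${terms.map(escapeRegExp).join("|")})`, "gi");

  // Splitting on a single capturing group yields alternating pieces:
  // even indexes are plain text, odd indexes are the matched keywords.
  return text
    .split(pattern)
    .map((part, i) => (i % 2 === 1 ? `<mark>${escapeHtml(part)}</mark>` : escapeHtml(part)))
    .join("");
}
```

Escaping both the keywords and the surrounding text keeps user-supplied search terms from being interpreted as regex syntax or markup.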
- Title Extraction:
  - Extracts the actual document title from PDF/Word content using custom parsing utilities.
- Sorting:
  - Sorts documents by extracted title, not just filename, for more meaningful organization.
- Search:
  - Supports keyword, phrase, and fuzzy search. Highlights found terms in document previews.
- Classification:
  - Uses a hybrid approach: combines AI/ML (e.g., text embeddings, TF-IDF) with rule-based logic for robust, explainable classification.
  - Classification tree includes Academic, Technical, Business, Legal, and their subcategories.
- Analytics:
  - Tracks and displays statistics: document count, total size, upload/search/classification times, and trends over time.
- Cloud Platform:
  - Supabase for authentication, storage, and database; Vercel for deployment and CDN; React/TypeScript for frontend.
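
The hybrid classification idea described above can be sketched roughly as follows. This is not the project's classifier: the rules and keyword lists are invented for illustration, and a plain keyword-frequency score stands in for the AI/ML part (embeddings or TF-IDF).

```typescript
type Category = "Academic" | "Technical" | "Business" | "Legal";

interface Classification {
  category: Category | null;
  confidence: number; // 0..1
  method: "rule" | "statistical" | "none";
  reason: string;     // human-readable explanation of the decision
}

// Hypothetical high-precision rules, checked first.
const RULES: { pattern: RegExp; category: Category }[] = [
  { pattern: /\bhereinafter\b|\bwhereas\b/i, category: "Legal" },
  { pattern: /\babstract\b[\s\S]*\breferences\b/i, category: "Academic" },
];

// Hypothetical keyword lists used by the statistical fallback.
const KEYWORDS: Record<Category, string[]> = {
  Academic: ["research", "study", "hypothesis", "thesis"],
  Technical: ["api", "server", "install", "function"],
  Business: ["revenue", "market", "invoice", "quarterly"],
  Legal: ["contract", "liability", "clause", "court"],
};

function classify(text: string): Classification {
  // Step 1: a matching rule wins outright with a fixed confidence.
  for (const rule of RULES) {
    if (rule.pattern.test(text)) {
      return { category: rule.category, confidence: 0.9, method: "rule", reason: `matched ${rule.pattern}` };
    }
  }

  // Step 2: count keyword hits per category and pick the largest share.
  const words = text.toLowerCase().split(/\W+/).filter((w) => w !== "");
  let best: Category | null = null;
  let bestHits = 0;
  let totalHits = 0;
  for (const category of Object.keys(KEYWORDS) as Category[]) {
    const vocab = new Set(KEYWORDS[category]);
    const hits = words.filter((w) => vocab.has(w)).length;
    totalHits += hits;
    if (hits > bestHits) {
      best = category;
      bestHits = hits;
    }
  }
  if (best === null) {
    return { category: null, confidence: 0, method: "none", reason: "no rule or keyword matched" };
  }
  return {
    category: best,
    confidence: bestHits / totalHits,
    method: "statistical",
    reason: `${bestHits} of ${totalHits} keyword hits belong to ${best}`,
  };
}
```

Returning `method` and `reason` alongside the category is what makes the result explainable: the UI can show which path produced the label and why.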
- The source code is fully documented and organized for easy understanding and extension.
- This README serves as both a user and developer guide.
- For a detailed report on algorithms, platform decisions, and usage, see the attached project report template (if provided by your instructor).
- GitHub Repository: https://github.com/Yosef-AlSabbah/Cloud-Based-Document-Analytics-Service
- Live Cloud Program: https://cloud-based-document-analytics-serv.vercel.app
Last updated: June 8, 2025