πŸ“° Fine-tuned roberta-base classifier on the LIAR dataset. Accepts multiple input types β€” text, URLs, and PDFs β€” and outputs a prediction with a confidence score. It also leverages google/flan-t5-base to generate explanations and uses an agentic LangGraph pipeline to orchestrate agents for planning, retrieval, execution, fallback, and reasoning.

πŸ“˜ InformaTruth: AI-Driven News Authenticity Analyzer

🧠 Fine-tuned RoBERTa-based multi-modal fake news detector with FLAN-T5 explanation generation and URL/PDF/text support, orchestrated through a LangGraph-powered agentic pipeline with Planner, Retriever, Tool Router, Fallback Agent, and LLM Answerer agents, plus memory and dynamic tool augmentation.

demo.mp4

πŸš€ Live Demo

πŸ–₯️ Try it now: InformaTruth β€” Fake News Detection AI App


πŸ” Overview

In the digital age, misinformation spreads rapidly across news outlets, social media, and online platforms, and distinguishing credible journalism from deceptive content grows ever harder. InformaTruth addresses this by detecting fake news from text, PDFs, or website URLs using a fine-tuned RoBERTa model. It leverages a multi-agent LangGraph architecture that includes Planner, Retriever, Tool Router, and Explanation agents. Once a claim is classified, the system uses FLAN-T5 to generate human-readable reasoning; if local evidence is insufficient, it falls back on Wikipedia or DuckDuckGo search. This production-grade solution supports real-world fact-checking, multi-source ingestion, tool-augmented reasoning, and modular orchestration.
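As a minimal sketch of this classify-then-explain flow, the orchestration below injects the two models as plain callables so the control logic stands on its own. The function names, prompt wording, and commented pipeline wiring are illustrative assumptions, not the repository's actual code.

```python
def analyze(claim, classify, explain):
    """Classify a claim, then ask an explanation model to justify the verdict.

    `classify` returns {"label": ..., "score": ...}; `explain` maps a prompt
    string to generated text (e.g. a FLAN-T5 call).
    """
    pred = classify(claim)
    prompt = (
        f"Claim: {claim}\n"
        f"Verdict: {pred['label']} (confidence {pred['score']:.2f}).\n"
        "Explain this verdict in one sentence:"
    )
    return {
        "label": pred["label"],
        "score": pred["score"],
        "explanation": explain(prompt),
    }

# With real models, the callables could be wired via Hugging Face pipelines:
#   from transformers import pipeline
#   clf = pipeline("text-classification", model="./fine_tuned_liar_detector")
#   gen = pipeline("text2text-generation", model="google/flan-t5-base")
#   classify = lambda t: clf(t, truncation=True)[0]
#   explain  = lambda p: gen(p, max_new_tokens=60)[0]["generated_text"]
```

Keeping the models injectable also makes the orchestration step unit-testable without downloading any weights.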


βš™οΈ Tech Stack

| Category | Technology/Resource |
| --- | --- |
| Core Framework | PyTorch, Transformers, Hugging Face |
| Classification Model | Fine-tuned RoBERTa-base on the LIAR dataset |
| Explanation Model | FLAN-T5-base (zero-shot prompting) |
| Training Data | LIAR dataset (political fact-checking) |
| Evaluation Metrics | Accuracy, Precision, Recall, F1-score |
| Training Framework | Hugging Face Trainer |
| Orchestration | LangGraph (multi-agent directed acyclic execution graph) |
| Agents Used | PlannerAgent, InputHandlerAgent, ToolRouterAgent, ExecutorAgent, ExplanationAgent, FallbackSearchAgent |
| Input Modalities | Raw text, website URLs (via Newspaper3k), PDF documents (via PyMuPDF) |
| Tool Augmentation | DuckDuckGo Search API (fallback), Wikipedia (planned), ToolRouter logic |
| Web Scraping | Newspaper3k (HTML β†’ clean article) |
| PDF Parsing | PyMuPDF |
| Explainability | Natural-language justification generated with FLAN-T5 |
| State Management | Shared state object (LangGraph-compatible) |
| Deployment Interface | Flask (HTML, CSS, JS) |
| Hosting Platform | Render (Docker) |
| Version Control | Git, GitHub |
| Logging & Debugging | Custom logger, structured logs, print debugging |
| Input Support | Text, URLs, PDF documents |
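The multi-format ingestion layer listed above could look like the hedged sketch below, using the same libraries the table names (Newspaper3k for URLs, PyMuPDF for PDFs). The `extract_text` function name and its dispatch rules are illustrative assumptions, not the repository's actual API.

```python
def extract_text(source: str) -> str:
    """Return clean article text from raw text, a URL, or a PDF path."""
    if source.lower().startswith(("http://", "https://")):
        # URL branch: Newspaper3k downloads and strips the HTML boilerplate.
        from newspaper import Article
        article = Article(source)
        article.download()
        article.parse()
        return article.text
    if source.lower().endswith(".pdf"):
        # PDF branch: PyMuPDF extracts the text of every page.
        import fitz  # PyMuPDF
        with fitz.open(source) as doc:
            return "\n".join(page.get_text() for page in doc)
    # Otherwise the input is already raw text.
    return source
```

The imports are deferred into each branch so that plain-text input works even when the scraping and PDF dependencies are not installed.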

βœ… Key Features

  • πŸ”„ Multi-Format Input Support Accepts raw text, web URLs, and PDF documents with automated preprocessing for each type.

  • 🧠 Full NLP Pipeline Integrates summarization (optional), fake news classification (RoBERTa), and natural language explanation (FLAN-T5).

  • 🧱 Modular Agent-Based Architecture Built using LangGraph with modular agents: Planner, Tool Router, Executor, Explanation Agent, and Fallback Agent.

  • πŸ“œ Explanation Generation Uses FLAN-T5 to generate human-readable, zero-shot rationales for model predictions.

  • πŸ§ͺ Tool-Augmented & Fallback Logic Dynamically queries DuckDuckGo when local context is insufficient, enabling robust fallback handling.

  • 🧼 Clean, Modular Codebase with Logging Structured using clean architecture principles, agent separation, and informative logging.

  • 🌐 Flask with Web UI User-friendly, interactive, and responsive frontend for input, output, and visual explanations.

  • 🐳 Dockerized for Deployment Fully containerized setup with Dockerfile and requirements.txt for seamless deployment.

  • βš™οΈ CI/CD with GitHub Actions Automated pipelines for testing, linting, and Docker build validation to ensure code quality and production-readiness.


πŸ“¦ Project File Structure

InformaTruth/
β”‚
β”œβ”€β”€ .github/              # GitHub Actions
β”‚   └── workflows/
β”‚       └── main.yml 
β”‚
β”œβ”€β”€ agents/                            # Modular agents (planner, executor, etc.)
β”‚   β”œβ”€β”€ executor.py
β”‚   β”œβ”€β”€ fallback_search.py
β”‚   β”œβ”€β”€ input_handler.py
β”‚   β”œβ”€β”€ planner.py
β”‚   β”œβ”€β”€ router.py
β”‚   └── __init__.py
β”‚
β”œβ”€β”€ fine_tuned_liar_detector/         # Fine-tuned RoBERTa model directory
β”‚   β”œβ”€β”€ config.json
β”‚   β”œβ”€β”€ vocab.json
β”‚   β”œβ”€β”€ tokenizer_config.json
β”‚   β”œβ”€β”€ special_tokens_map.json
β”‚   β”œβ”€β”€ model.safetensors
β”‚   └── merges.txt
β”‚
β”œβ”€β”€ graph/                            # LangGraph state and builder logic
β”‚   β”œβ”€β”€ builder.py
β”‚   β”œβ”€β”€ state.py
β”‚   └── __init__.py
β”‚
β”œβ”€β”€ models/                           # Classification + LLM model loader
β”‚   β”œβ”€β”€ classifier.py
β”‚   β”œβ”€β”€ loader.py
β”‚   └── __init__.py
β”‚
β”œβ”€β”€ news/                             # Sample news or test input
β”‚   └── news.pdf
β”‚
β”œβ”€β”€ notebook/                         # Jupyter notebooks for experimentation
β”‚   β”œβ”€β”€ 1 Fine-Tuning.ipynb
β”‚   └── 2 Fine-Tuning with Multi Agent.ipynb
β”‚
β”œβ”€β”€ static/                           # Static files (CSS, JS)
β”‚   β”œβ”€β”€ css/
β”‚   β”‚   └── style.css
β”‚   └── js/
β”‚       └── script.js
β”‚
β”œβ”€β”€ templates/                        # HTML templates for Flask UI
β”‚   β”œβ”€β”€ dj_base.html
β”‚   └── dj_index.html
β”‚
β”œβ”€β”€ tests/                            # Unit tests
β”‚   └── test_app.py
β”‚
β”œβ”€β”€ train/                            # Training logic
β”‚   β”œβ”€β”€ config.py
β”‚   β”œβ”€β”€ data_loader.py
β”‚   β”œβ”€β”€ predictor.py
β”‚   β”œβ”€β”€ run.py
β”‚   β”œβ”€β”€ trainer.py
β”‚   └── __init__.py
β”‚
β”œβ”€β”€ utils/                            # Utilities like logging, evaluation
β”‚   β”œβ”€β”€ logger.py
β”‚   β”œβ”€β”€ results.py
β”‚   └── __init__.py
β”‚
β”œβ”€β”€ __init__.py                        
β”œβ”€β”€ app.png                           # Demo
β”œβ”€β”€ demo.webm                         # Demo video
β”œβ”€β”€ app.py                            # Flask app entry point
β”œβ”€β”€ main.py                           # Main script / orchestrator
β”œβ”€β”€ config.py                         # Configuration file
β”œβ”€β”€ setup.py                          # Project setup for pip install
β”œβ”€β”€ render.yaml                       # Render deployment config
β”œβ”€β”€ Dockerfile                        # Docker container spec
β”œβ”€β”€ requirements.txt                  # Python dependencies
β”œβ”€β”€ LICENSE                           # License file
β”œβ”€β”€ .gitignore                        # Git ignore rules
β”œβ”€β”€ .gitattributes                    # Git lfs rules
└── README.md                         # Readme

🧱 System Architecture

```mermaid
graph TD
    A[User Input] --> B{Input Type}
    B -->|Text| C[Direct Text Processing]
    B -->|URL| D[Newspaper3k Parser]
    B -->|PDF| E[PyMuPDF Parser]

    C --> F[Text Cleaner]
    D --> F
    E --> F

    F --> G[Context Validator]
    G -->|Sufficient Context| H[RoBERTa Classifier]
    G -->|Insufficient Context| I[Web Search Agent]

    I --> J[Context Aggregator]
    J --> H

    H --> K[FLAN-T5 Explanation Generator]
    K --> L[Output Formatter]

    L --> M["Web UI (Flask, HTML, CSS, JS)"]

    style M fill:#e3f2fd,stroke:#90caf9
    style G fill:#fff9c4,stroke:#fbc02d
    style I fill:#fbe9e7,stroke:#ff8a65
    style H fill:#f1f8e9,stroke:#aed581
```
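The fallback branch in the diagram (Web Search Agent β†’ Context Aggregator) could be sketched as below. The aggregation helper is the testable part; the commented retrieval call shows how it would look with the `duckduckgo_search` package, and all names and the 2000-character limit are illustrative assumptions.

```python
def aggregate_context(original: str, snippets: list[str], limit: int = 2000) -> str:
    """Context Aggregator: merge retrieved snippets behind the original text,
    capped at `limit` characters before it is handed to the classifier."""
    merged = "\n".join([original, *snippets])
    return merged[:limit]

# Fallback retrieval (requires `pip install duckduckgo-search`):
#   from duckduckgo_search import DDGS
#   snippets = [r["body"] for r in DDGS().text(claim, max_results=5)]
#   context = aggregate_context(claim, snippets)
```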

πŸ“Š Model Performance

| Epoch | Train Loss | Val Loss | Accuracy | F1 | Precision | Recall |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.3889 | 0.6674 | 0.7204 | 0.8285 | 0.7461 | 0.9313 |
| 2 | 0.4523 | 0.6771 | 0.7196 | 0.8259 | 0.7511 | 0.9173 |

Emphasis on Recall ensures the model catches most fake news cases.
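The table's metrics could be produced during fine-tuning with a Hugging Face Trainer-style `compute_metrics` hook backed by scikit-learn, as in the sketch below. The binary-label assumption (positive class = fake) is illustrative; the project's actual evaluation code may differ.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Compute accuracy/precision/recall/F1 from (logits, labels)."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Passed as `Trainer(..., compute_metrics=compute_metrics)`, this would report the same four columns after each evaluation epoch.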


🐳 Docker Instructions

Step 1: Build the Docker image

```bash
docker build -t informa-truth-app .
```

Step 2: Run the Docker container

```bash
docker run -p 8501:8501 informa-truth-app
```

βš™οΈ CI/CD Pipeline (GitHub Actions)

The CI/CD pipeline automates code checks, Docker image building, and Flask app validation.

Sample Workflow

```yaml
name: CI Pipeline
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install flake8 pytest

      - name: Run tests
        run: pytest tests/

      - name: Docker build
        run: docker build -t informa-truth-app .
```

🌐 Real-World Use Cases

  • Journalists and media watchdogs
  • Educators and students
  • Concerned citizens and digital media consumers
  • Social media platforms for content moderation

πŸ‘€ Author

Md Emon Hasan
πŸ“§ iconicemon01@gmail.com
πŸ”— GitHub πŸ”— LinkedIn πŸ”— Facebook πŸ”— WhatsApp

