
Financial Filing AI Assistant (Chatbot) 🚀

[Demo GIF]

genai-sec10k-insights is a full-stack Retrieval-Augmented Generation (RAG) web application that enables users to interactively query and analyze SEC 10-K filings from major companies. The system extracts key insights from long-form financial reports using a semantic search pipeline powered by OpenAI embeddings and a FAISS vector store, all deployed behind a FastAPI backend and custom frontend.


📌 Features

  • πŸ” Query Interface: Users can ask questions in natural language and get context-aware answers from 10-K filings.
  • πŸ“„ Document Source: Uses real SEC 10-K filings from companies like AAPL, TSLA, JPM, etc.
  • βš™οΈ Backend: FastAPI-based API server with secure authentication and RAG pipeline.
  • 🧠 Vector Search: OpenAI embeddings + FAISS index for fast and accurate semantic retrieval.
  • πŸ’» Frontend: Custom HTML, CSS, and JavaScript interface with clean UI and dynamic response rendering.
  • πŸ” Authentication: Random 6-digit terminal-generated access code required before accessing the app.

πŸ—‚οΈ Project Structure


genai-sec10k-insights/
├── backend/
│   └── app/
│       ├── main.py               # FastAPI entry point
│       ├── api_routes.py         # API route handlers
│       ├── config.example.py     # Config template
│       └── rag_pipeline.py       # RAG logic: load FAISS, embed, search
│
├── data/
│   ├── raw/                      # Raw 10-K filings (e.g., AAPL_10k.txt)
│   ├── parsed/                   # Chunked JSON from raw filings
│   └── faiss_index/              # Prebuilt FAISS index and mapping
│
├── sec_downloader/               # Utility scripts
│   ├── download_sec_filings.py   # Scrapes and saves raw 10-Ks
│   ├── chunk_text.py             # Splits long 10-K text into chunks
│   └── embed_and_store.py        # Embeds chunks and builds FAISS index
│
├── frontend/
│   ├── index.html                # Main chat UI
│   ├── auth.html                 # Auth page with blur effect
│   ├── styles.css                # Styling for layout and animation
│   ├── app.js                    # Query + response logic (AJAX)
│   └── assets/                   # Fonts, icons, etc.
│
├── notebooks/
│   └── mock_10k_40_questions.json  # Mock questions for testing
│
├── Dockerfile                    # Production-ready Docker image
├── requirements.txt              # Python dependencies
├── LICENSE
└── README.md                     # You're here!

🚦 Step-by-Step Walkthrough

Follow this guided walkthrough to experience the full pipeline from server startup to querying a 10-K report:


πŸ” 1. Start the FastAPI Server

From the project root, run the backend using:

cd backend
uvicorn app.main:app --reload

Once started, you'll see well-formatted terminal output styled with rich, including:

  • Project banner
  • Usage instructions
  • Export variable tips
  • A random 6-digit authentication code

Example:

🚀 genai-sec10k-insights is running on http://localhost:8000
🔒 Auth Code: 348215
🔗 Visit /chat to access the chatbot interface



🌐 2. Visit the Web App in Your Browser

Open:

http://localhost:8000

You’ll be taken to the authentication screen (auth.html) which has:

  • A minimal design with blurred background
  • An input field for the 6-digit code

Enter the code shown in your terminal and proceed.



💬 3. Access the Chatbot Interface

Once authenticated, you'll land on the main chat interface (index.html) where you can:

  • Ask any question related to the SEC 10-K reports of the 10 companies
  • View dynamic responses with smooth UI animations



📥 4. Use an Example Prompt (or Ask Your Own)

To help you get started, we’ve provided a few sample prompts in:


notebooks/mock_10k_40_questions.json

Here’s one you can try immediately:

{
  "question": "What does Microsoft report about cybersecurity risks?"
}


📌 Important Note: These are just example prompts; the chatbot is not limited to answering only those.


Since this app is built on top of the actual SEC 10-K filings from 10 companies (AAPL, MSFT, TSLA, AMZN, etc.), you can ask any meaningful financial or business-related question that could be answered from those filings.

✅ For example:

  • "What does Apple say about supply chain risks?"
  • "How does Amazon describe its revenue streams?"
  • "What litigation risks does Meta report?"

Just make sure your question is relevant to the content typically found in 10-K reports (i.e., risk factors, business overview, financials, strategy, etc.).

Once submitted, your question is embedded and semantically matched against the indexed document chunks, and an answer is generated from the most relevant 10-K sources.



βš™οΈ How It Works

1. Data Acquisition (sec_downloader)

  • 10-K reports are downloaded using download_sec_filings.py.
  • Files are stored in data/raw/ (see the illustrative download sketch below).
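For reference, a downloader like this can be sketched against SEC EDGAR's public submissions API. The snippet below is illustrative only and is not the repo's actual download_sec_filings.py: the CIK mapping and output naming are assumptions, and EDGAR requires a descriptive User-Agent header.

import pathlib
import requests

HEADERS = {"User-Agent": "your-name your-email@example.com"}  # EDGAR rejects anonymous requests
CIKS = {"AAPL": 320193, "MSFT": 789019}  # illustrative subset of the 10 covered tickers

def download_latest_10k(ticker: str, cik: int, out_dir: str = "data/raw") -> None:
    # Pull the company's filing history from the EDGAR submissions endpoint.
    subs = requests.get(f"https://data.sec.gov/submissions/CIK{cik:010d}.json", headers=HEADERS).json()
    recent = subs["filings"]["recent"]
    # Locate the most recent 10-K and fetch its primary document (HTML that still needs cleaning).
    for form, accession, doc in zip(recent["form"], recent["accessionNumber"], recent["primaryDocument"]):
        if form == "10-K":
            url = f"https://www.sec.gov/Archives/edgar/data/{cik}/{accession.replace('-', '')}/{doc}"
            filing = requests.get(url, headers=HEADERS).text
            pathlib.Path(out_dir).mkdir(parents=True, exist_ok=True)
            pathlib.Path(out_dir, f"{ticker}_10k.txt").write_text(filing)
            return

for ticker, cik in CIKS.items():
    download_latest_10k(ticker, cik)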

2. Preprocessing

  • chunk_text.py splits raw text into overlapping chunks with metadata.
  • Output chunks are saved as JSON in data/parsed/ (see the chunking sketch below).
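A minimal chunking sketch (this is not the repo's chunk_text.py; the chunk size, overlap, and metadata fields are assumptions chosen for illustration):

import json
import pathlib

def chunk_text(text: str, source: str, chunk_size: int = 1000, overlap: int = 200) -> list[dict]:
    # Slide a fixed-size window across the text; overlapping windows keep context intact at boundaries.
    step = chunk_size - overlap
    return [
        {"id": f"{source}-{i}", "source": source, "text": text[start:start + chunk_size]}
        for i, start in enumerate(range(0, max(len(text) - overlap, 1), step))
    ]

raw = pathlib.Path("data/raw/AAPL_10k.txt").read_text()
chunks = chunk_text(raw, source="AAPL_10k.txt")
pathlib.Path("data/parsed/AAPL_10k.json").write_text(json.dumps(chunks, indent=2))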

3. Embedding & Indexing

  • embed_and_store.py uses OpenAI’s text-embedding-3-small model to embed parsed chunks.
  • Embeddings are stored in a FAISS index (index.faiss and index.pkl) under data/faiss_index/ (see the sketch below).
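A condensed sketch of this step using the raw openai and faiss clients. Treat it as illustrative only: the presence of index.pkl suggests the actual script may go through a wrapper such as LangChain's FAISS store instead.

import json
import pathlib
import pickle

import faiss                      # pip install faiss-cpu
import numpy as np
from openai import OpenAI

client = OpenAI()                 # reads OPENAI_API_KEY from the environment
chunks = json.loads(pathlib.Path("data/parsed/AAPL_10k.json").read_text())

# Embed every chunk with the same model that will embed queries later.
resp = client.embeddings.create(model="text-embedding-3-small",
                                input=[c["text"] for c in chunks])
vectors = np.array([d.embedding for d in resp.data], dtype="float32")

# Build a flat L2 index and persist it next to a row -> chunk mapping.
index = faiss.IndexFlatL2(vectors.shape[1])
index.add(vectors)
pathlib.Path("data/faiss_index").mkdir(parents=True, exist_ok=True)
faiss.write_index(index, "data/faiss_index/index.faiss")
with open("data/faiss_index/index.pkl", "wb") as f:
    pickle.dump(chunks, f)        # lets search hits be traced back to chunk text + source file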

4. Backend (backend/app)

  • main.py initializes FastAPI with secure cookie auth.
  • rag_pipeline.py handles query embedding, similarity search via FAISS, and prompt construction (see the sketch after this list).
  • api_routes.py serves /chat route with POST support for queries.
  • A random 6-digit access code is printed at app launch and must be entered to unlock the app.
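The retrieval path inside rag_pipeline.py can be sketched roughly as follows; the function name, chat model, and data layout here are assumptions, not the repo's actual code:

import pickle

import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()
index = faiss.read_index("data/faiss_index/index.faiss")
with open("data/faiss_index/index.pkl", "rb") as f:
    chunks = pickle.load(f)

def answer(question: str, k: int = 4) -> dict:
    # 1) Embed the question with the same embedding model used for the document chunks.
    q = client.embeddings.create(model="text-embedding-3-small", input=question)
    q_vec = np.array([q.data[0].embedding], dtype="float32")
    # 2) Retrieve the k most similar chunks from the FAISS index.
    _, ids = index.search(q_vec, k)
    hits = [chunks[i] for i in ids[0]]
    # 3) Build a grounded prompt and let a chat model answer from the retrieved context.
    context = "\n\n".join(h["text"] for h in hits)
    completion = client.chat.completions.create(
        model="gpt-4o-mini",      # placeholder; use whatever model config.py specifies
        messages=[
            {"role": "system", "content": "Answer only from the provided 10-K excerpts."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return {"answer": completion.choices[0].message.content,
            "sources": sorted({h["source"] for h in hits})}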

5. Frontend (frontend)

  • auth.html: Initial page prompts for access code with a blurred background.
  • index.html: Interactive UI with user input box, loading animation, and styled answer display.
  • app.js: Handles sending queries, receiving responses, and displaying them in real time.
  • styles.css: Custom CSS for layout, transitions, and visual polish.

🚀 Running the App Locally

1. Clone the Repo

git clone https://github.com/r0han01/genai-sec10k-insights.git
cd genai-sec10k-insights

2. Set Up Environment

cd backend
cp app/config.example.py app/config.py
# Add your OpenAI API key and config in config.py
pip install -r requirements.txt

3. Run the App

cd backend
uvicorn app.main:app --reload

Access the app at: http://localhost:8000/chat

⚠️ Enter the access code displayed in the terminal on the auth.html page to unlock access.

4. (Optional) Docker Build

docker build -t genai-sec10k-app .
docker run -p 8000:8000 genai-sec10k-app

🔒 Authentication Mechanism

  • When the FastAPI server starts, a secure 6-digit numeric access code is printed in the terminal.
  • This must be entered on the frontend’s auth page before gaining access.
  • Auth state is stored in cookies to protect /chat and /docs (a minimal wiring sketch follows below).
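A minimal sketch of how such a gate can be wired in FastAPI. The endpoint path, cookie name, and dependency shown here are assumptions; the repo's main.py may organize this differently:

import secrets

from fastapi import Depends, FastAPI, HTTPException, Request, Response

app = FastAPI()
ACCESS_CODE = f"{secrets.randbelow(1_000_000):06d}"   # fresh 6-digit code at every startup
print(f"🔒 Auth Code: {ACCESS_CODE}")

@app.post("/auth")                                    # hypothetical endpoint name
def authenticate(code: str, response: Response):
    if code != ACCESS_CODE:
        raise HTTPException(status_code=401, detail="Invalid access code")
    # Mark the browser session as authenticated; protected routes check this cookie.
    response.set_cookie("authenticated", "true", httponly=True)
    return {"ok": True}

def require_auth(request: Request):
    if request.cookies.get("authenticated") != "true":
        raise HTTPException(status_code=403, detail="Enter the access code first")

@app.get("/chat", dependencies=[Depends(require_auth)])
def chat_page():
    return {"status": "authenticated"}                # the real route serves index.html instead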

📊 Companies Covered

The app includes processed 10-K filings from the following companies:

  • Apple (AAPL)
  • Amazon (AMZN)
  • JPMorgan Chase (JPM)
  • Coca-Cola (KO)
  • Meta (META)
  • Microsoft (MSFT)
  • NVIDIA (NVDA)
  • Pfizer (PFE)
  • Tesla (TSLA)
  • Walmart (WMT)

πŸ› οΈ Troubleshooting & Swagger Testing

If the frontend chatbot interface (/chat) does not seem to be working correctly (e.g., blank responses, stuck loading, or CORS errors), you can directly test the backend API using the built-in Swagger UI at:



http://localhost:8000/docs

✅ Why Use This?

  • Confirms whether the backend and RAG pipeline are functioning correctly.
  • Helps isolate whether issues originate from the frontend or backend.
  • Allows developers to quickly debug request/response flow without modifying UI.

🧪 How to Test via Swagger

  1. Make sure your FastAPI backend is running:

    uvicorn app.main:app --reload
    
  2. Open your browser and go to:

    http://localhost:8000/docs
    
  3. Scroll down to the POST /ask endpoint.

  4. Click on Try it out.

  5. In the request body, paste the following prompt:

    {
      "question": "What does Microsoft report about cybersecurity risks?"
    }
  6. Click Execute.

  7. You should receive a structured JSON response:

    {
      "answer": "Microsoft discusses cybersecurity risks related to...",
      "sources": [
        "MSFT_10k.txt"
      ]
    }

🧩 If It Works Here but Not in Frontend

If the Swagger /ask test returns valid responses but the frontend (/chat) does not:

  • ✅ It means your backend RAG pipeline is working fine.

  • ❌ The issue is likely in:

    • frontend/app.js (JavaScript fetch logic)
    • CORS configuration (make sure backend allows frontend origin)
    • Cookie or auth state mismatch

🔧 Recommended Fixes

  • Check the console in browser dev tools (F12) for network or JS errors.
  • Verify you're logged in with the correct 6-digit code.
  • Ensure fetch requests from the frontend hit the same /ask endpoint with correct Content-Type headers.

By using the Swagger UI, you can verify your backend independently and streamline debugging efforts.
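The same check can also be scripted instead of clicked through. Here is a short sketch using requests; it assumes /ask is reachable without the auth cookie, so if your setup protects that route, reuse the cookie from an authenticated browser session:

import requests

resp = requests.post(
    "http://localhost:8000/ask",
    json={"question": "What does Microsoft report about cybersecurity risks?"},
    timeout=60,
)
resp.raise_for_status()
payload = resp.json()
print(payload["answer"][:300])
print("sources:", payload["sources"])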

🧯 Stopping Docker Containers Safely

If you're running this app via Docker and need to stop it manually (especially if docker stop doesn't work or Docker is unresponsive), follow these steps:


πŸ” 1. List All Docker-Related Processes

Use ps and grep to identify Docker processes:

ps aux | grep docker

Sample output might include:

root     1234  ... /usr/bin/dockerd
root     2345  ... docker-containerd
user     3456  ... docker-proxy ...

Avoid killing core Docker daemons like dockerd or containerd.


🔪 2. Kill the Container Process by PID (Preferred)

Find the PID of the container process:

docker inspect --format '{{.State.Pid}}' genai-sec10k-container

Then stop it forcefully:

sudo kill -9 <PID>

Replace <PID> with the number returned above.


💥 3. Kill All Docker Container Processes (Advanced)

If you're confident and want to kill all Docker-related processes manually:

ps aux | grep docker | grep -v grep

Note the PIDs in column 2, then:

sudo kill -9 <pid1> <pid2> ...

⚠️ Warning: Only do this if you know what you're killing. Terminating Docker system processes may disrupt all containers or require restarting the Docker service.


✅ One-Liner (Recommended)

To quickly kill the container process cleanly:

sudo kill -9 $(docker inspect --format '{{.State.Pid}}' genai-sec10k-container)

This is the safest way to force-stop just the container, without impacting Docker itself.

🚀 Deployment on AWS EC2

You can deploy this full-stack RAG-based chatbot on an AWS EC2 instance using Docker. This section includes a full guide and real-world fixes for issues you might encounter.


🧱 Requirements

  • AWS Account
  • EC2 access with security group allowing:
    • Port 22 (SSH)
    • Port 8000 (FastAPI)
  • SSH key pair (.pem file)
  • Docker and Git installed on the EC2 instance
  • OpenAI API Key

🔧 1. Launch EC2 Instance

  • AMI: Amazon Linux 2023
  • Type: t2.micro (Free Tier eligible)
  • Enable auto-assign public IP
  • Security group must include:
    • SSH (port 22) β†’ 0.0.0.0/0
    • HTTP (port 80) β†’ 0.0.0.0/0 (optional)
    • Custom TCP (port 8000) β†’ 0.0.0.0/0

πŸ” 2. Connect via SSH

ssh -i /path/to/genai-key.pem ec2-user@<your-ec2-public-ip>

If you get "Too many authentication failures", use:

ssh -i /path/to/genai-key.pem -o IdentitiesOnly=yes ec2-user@<your-ec2-public-ip>

🐳 3. Install Docker and Git on EC2

sudo yum update -y
sudo yum install docker git -y
sudo service docker start
sudo usermod -aG docker ec2-user
exit

Re-login after the last step to activate Docker group changes.


📦 4. Clone the Repository

git clone https://github.com/r0han01/genai-sec10k-insights.git
cd genai-sec10k-insights

πŸ“ 5. Upload Your data/ Folder (FAISS index + 10-K chunks)

On your local machine, run:

scp -i /path/to/genai-key.pem -o IdentitiesOnly=yes -r \
~/CoursePractice/aiProject/genai-sec10k-insights/data \
ec2-user@<your-ec2-public-ip>:~/genai-sec10k-insights/

This avoids needing to rebuild the vector store on EC2.


🧪 6. Ensure Your Dockerfile Includes the data/ Directory

Make sure this line in Dockerfile is uncommented:

COPY data ./data

🧱 7. Build the Docker Image

docker build -t genai-sec10k-app .

🔑 8. Run the Container (Foreground with API Key)

docker run -p 8000:8000 \
  --name genai-sec10k-container \
  -e OPENAI_API_KEY=your-key-here \
  genai-sec10k-app

This runs the app in the foreground so you can see logs and the auth code.


🌐 9. Access Your Web App

Open your browser:

http://<your-ec2-public-ip>:8000
  • Enter the 6-digit code shown in your EC2 terminal
  • You’ll be redirected to the chatbot interface
  • Ask any question like:
What does Microsoft report about cybersecurity risks?

🧩 Common Issues & Fixes

  • Too many authentication failures: use -o IdentitiesOnly=yes with ssh or scp.
  • COPY data ./data fails: make sure data/ exists locally and isn't ignored by .gitignore.
  • App runs but doesn't answer: the FAISS vector index is missing; upload your data/ folder.
  • Input fields zoom in on mobile: set font-size: 16px for all input fields in styles.css.