Comprehensive EDA and machine learning analysis of Walmart sales data with feature engineering, multiple feature selection techniques, and predictive modeling. Handles 400k+ records with 130+ engineered features to forecast weekly sales and identify key business drivers.

Vd1299/Walmart-EDA-and-time-series

Walmart Sales Forecasting

Machine learning project for predicting weekly sales using advanced time series analysis, feature engineering, and XGBoost regression on Walmart historical data.

🎯 Overview

End-to-end sales forecasting pipeline with comprehensive EDA, anomaly detection, multiple feature selection techniques, and XGBoost time series modeling for Walmart's 400k+ sales records across 45 stores.

⚡ Key Features

  • Exploratory Data Analysis (EDA): Statistical insights, correlation analysis, and data visualization
  • Anomaly/Outlier Detection: Statistical and IQR-based outlier identification and removal
  • Advanced Feature Engineering: Time-based, lag, rolling window, trend and seasonality, and interaction features (130+ features)
  • Multiple Feature Selection: Univariate testing, Mutual Information, and XGBoost feature importance
  • Data Quality Pipeline: Comprehensive cleaning, validation, and problematic value handling
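The IQR-based outlier step listed above could look roughly like the sketch below. It is a minimal illustration, not the repository's actual code: the column name `Weekly_Sales`, the helper names, and the conventional 1.5 multiplier are all assumptions.

```python
import pandas as pd


def iqr_bounds(values: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences of the IQR rule for a numeric series."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep only the rows whose value in `column` lies inside the IQR fences."""
    low, high = iqr_bounds(df[column], k)
    return df[df[column].between(low, high)]


# Hypothetical toy data: one obviously extreme week among ordinary ones.
sales = pd.DataFrame({"Weekly_Sales": [100.0, 110.0, 105.0, 95.0, 102.0, 5000.0]})
cleaned = drop_iqr_outliers(sales, "Weekly_Sales")
print(cleaned)
```

On real retail data it may be worth computing the fences per store or per department, since a value that is extreme for a small department can be routine for a large one.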

📊 Dataset

Walmart historical sales data (2010-2012):

  • 400,000+ sales records across 45 stores and multiple departments
  • Store metadata, economic indicators (CPI, unemployment, fuel prices)
  • Holiday and seasonal information

🔧 Implementation

Pipeline

  1. EDA & Visualization → 2. Anomaly Detection → 3. Feature Engineering → 4. Feature Selection → 5. XGBoost Modeling
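As an illustration of step 3, lag and rolling-window features can be built per store and department with pandas. This is a sketch under assumptions: the column names (`Store`, `Dept`, `Date`, `Weekly_Sales`), the helper name, and the chosen lags and window are hypothetical and not taken from the repository.

```python
import pandas as pd


def add_lag_and_rolling(df, group_cols, target, lags=(1, 2), window=3):
    """Add per-group lag columns and a rolling mean that only sees past weeks."""
    out = df.sort_values([*group_cols, "Date"]).copy()
    grouped = out.groupby(group_cols)[target]
    for lag in lags:
        out[f"{target}_lag_{lag}"] = grouped.shift(lag)
    # Shift by one week before rolling so the current week's sales
    # never leak into the feature used to predict them.
    out[f"{target}_roll_mean_{window}"] = grouped.transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    return out


# Hypothetical toy data: two stores, one department each.
raw = pd.DataFrame({
    "Store": [1, 1, 1, 1, 2, 2],
    "Dept": [1, 1, 1, 1, 1, 1],
    "Date": pd.to_datetime([
        "2010-02-05", "2010-02-12", "2010-02-19", "2010-02-26",
        "2010-02-05", "2010-02-12",
    ]),
    "Weekly_Sales": [10.0, 20.0, 30.0, 40.0, 100.0, 200.0],
})
feats = add_lag_and_rolling(raw, ["Store", "Dept"], "Weekly_Sales")
print(feats)
```

The first rows of each group are `NaN` because no history exists yet; they need to be dropped or imputed before modeling.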

Feature Selection Methods

  • Univariate statistical testing
  • Mutual Information scoring
  • XGBoost importance ranking
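The first two methods can be sketched with scikit-learn as shown below. The choice of `f_regression` as the univariate test and the synthetic data are assumptions made for illustration; the README does not say which test or library the project uses. The third method would typically rank features by the `feature_importances_` attribute of a fitted XGBoost model, which is omitted here.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression, mutual_info_regression

# Synthetic data: one informative feature and two pure-noise features.
rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)
X = np.column_stack([signal, rng.normal(size=n), rng.normal(size=n)])
y = 3.0 * signal + rng.normal(scale=0.1, size=n)
feature_names = ["signal", "noise_a", "noise_b"]

# Univariate statistical test: F-statistic of each feature against the target.
f_selector = SelectKBest(score_func=f_regression, k=1).fit(X, y)
f_top = feature_names[int(np.argmax(f_selector.scores_))]

# Mutual information: also captures non-linear dependence on the target.
mi_scores = mutual_info_regression(X, y, random_state=0)
mi_top = feature_names[int(np.argmax(mi_scores))]

print("Top feature by F-test:", f_top)
print("Top feature by mutual information:", mi_top)
```

Comparing the rankings from several methods, and keeping features that score well across them, is one reasonable way to combine their results.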

Model

  • XGBoost Regressor: Optimized for time series forecasting
  • Time-aware feature engineering for temporal patterns
  • Cross-validation and hyperparameter tuning
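For time series, cross-validation folds should train on earlier weeks and validate on later ones. The sketch below uses scikit-learn's `TimeSeriesSplit` for this. To keep the example dependency-light it uses `GradientBoostingRegressor` as a stand-in model; that substitution, the helper name, and the synthetic data are assumptions, and an `xgboost.XGBRegressor` could be passed in the same way since it follows the scikit-learn estimator interface.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit


def time_series_cv_mae(model, X, y, n_splits=3):
    """Return the MAE of each fold, always training on the past only."""
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        fold_model = clone(model)  # fresh, unfitted copy for every fold
        fold_model.fit(X[train_idx], y[train_idx])
        preds = fold_model.predict(X[test_idx])
        scores.append(mean_absolute_error(y[test_idx], preds))
    return scores


# Synthetic weekly series with a yearly cycle (rows are in time order).
rng = np.random.default_rng(0)
t = np.arange(120)
X = np.column_stack([np.sin(2 * np.pi * t / 52), np.cos(2 * np.pi * t / 52)])
y = 10.0 * X[:, 0] + rng.normal(scale=0.5, size=t.size)

# Stand-in model, not XGBoost itself.
scores = time_series_cv_mae(GradientBoostingRegressor(random_state=0), X, y)
print("MAE per fold:", scores)
```

The same loop can wrap a hyperparameter search: evaluate each candidate setting with `time_series_cv_mae` and keep the one with the lowest average error.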

📈 Results

  • 130+ engineered features from raw data
  • Robust outlier detection improving data quality
  • Multi-algorithm feature selection optimizing model performance
  • XGBoost time series model for accurate sales forecasting
