nafisa-samia/Statistical-Analysis-of-Iris-Dataset

Statistical-Analysis-of-Iris-Dataset

Problem Statement: Perform statistical analysis on the Iris flower dataset.

Description: The Iris flower dataset contains 150 samples, 50 from each of three iris species: setosa, versicolor and virginica. It has four numerical input features (sepal length, sepal width, petal length and petal width) and one categorical target variable, species.
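
The structure described above can be checked with a short sketch. It assumes the copy of the dataset bundled with scikit-learn (`load_iris`), which may differ from the file used in this repository; the helper name `load_iris_frame` and the `species` column name are illustrative, not taken from the project code.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame() -> pd.DataFrame:
    """Return the Iris data as one DataFrame with a readable species column.

    Assumption: uses scikit-learn's bundled copy of the dataset.
    """
    iris = load_iris(as_frame=True)
    df = iris.frame.rename(columns={"target": "species"})
    # Map the integer class codes (0, 1, 2) to the species names.
    code_to_name = {i: str(name) for i, name in enumerate(iris.target_names)}
    df["species"] = df["species"].map(code_to_name)
    return df


df = load_iris_frame()
print(df.shape)                       # rows x columns
print(df.dtypes)                      # four float columns plus species
print(df["species"].value_counts())   # samples per species
```

Printing the per-species counts is a quick way to confirm that the classes are balanced before running any group comparisons.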

Libraries Used: NumPy, pandas, SciPy, Matplotlib, scikit-learn, statsmodels, seaborn

What we have learned so far from this project:

  • The dataset has four numerical columns and one categorical column, which is the target
  • The dataset is balanced: every species has the same number of instances (50)
  • Petal length and petal width are very highly correlated
  • The setosa species is the easiest to distinguish because its feature values are less spread out
  • The versicolor and virginica species are difficult to distinguish due to the overlapping of attributes
  • All input features (sepal length, sepal width, petal length and petal width) are statistically significant in distinguishing the species of iris flower
  • The three species (setosa, versicolor and virginica) have different petal lengths; only versicolor and virginica partially overlap
  • We have verified that the species means differ significantly for all four input features
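
Two of the findings above, the petal length/width correlation and the difference in species means, can be reproduced with a minimal sketch. It assumes scikit-learn's bundled copy of the dataset, the Pearson correlation coefficient, and a one-way ANOVA per feature at a 0.05 significance level; the project itself may have used different tests or thresholds.

```python
from scipy import stats
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data, with its own column names.
iris = load_iris(as_frame=True)
X = iris.data      # four numerical features
y = iris.target    # integer species codes 0, 1, 2

# Pearson correlation between petal length and petal width.
r = X["petal length (cm)"].corr(X["petal width (cm)"])
print(f"petal length vs petal width: r = {r:.3f}")

# One-way ANOVA per feature: do the three species share the same mean?
anova_p = {}
for col in X.columns:
    groups = [X.loc[y == code, col] for code in sorted(y.unique())]
    f_stat, p_value = stats.f_oneway(*groups)
    anova_p[col] = p_value
    print(f"{col}: F = {f_stat:.1f}, p = {p_value:.2e}")

# Per-species petal length range, to inspect where the species overlap.
petal_range = X["petal length (cm)"].groupby(y).agg(["min", "max"])
print(petal_range)
```

ANOVA only indicates that at least one species mean differs from the others; identifying which pairs differ would need a post-hoc comparison such as Tukey's HSD, available in statsmodels.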
