- Data science is the study of large quantities of data, which can reveal insights that help organizations make strategic choices.
- Data scientists need to be curious, judgmental, and argumentative.
- Many algorithms are used to extract insights from data.
A methodology is a system of methods and a guideline for decision-making during the scientific process.
Data science methodology guides the data scientist in solving complex problems with data.
Foundational methodology, a cyclical, iterative data science methodology developed by John Rollins, consists of 10 stages.
- Business understanding
- What is the problem you are trying to solve?
- Understand the business problem and determine the data needed to answer the core business question.
- Analytic Approach
- How can you use data to answer the question?
- If the question is to determine the probabilities of an action, then use a predictive model
- If the question is to show the relationships, then use a descriptive model
- If the question requires a yes or no answer, then use a classification model
- Data Requirements
- Identify the correct and necessary data content, formats, and sources needed for the specific analytical approach.
- Data Collection
  - Identify and gather available data sources (these can be in the form of structured, unstructured, and even semi-structured data relevant to the problem domain).
- Data Understanding
- Focused on exploring and analyzing the collected data to ensure that the data is representative of the problem to be solved.
- Data Preparation
- Where data is cleaned, transformed, and formatted for further analysis, including feature engineering and text analysis.
- Modeling
- Evaluation
- Deployment
- Feedback
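The cyclical flow through the stages above can be sketched as a toy, runnable example. All of the stage functions here are simplified stand-ins invented for illustration, not a real framework; in practice each stage involves substantial work and the loop repeats as feedback comes in.

```python
# Toy sketch of the methodology's core stages (hypothetical helpers).

def choose_approach(question):
    # Analytic Approach: yes/no questions suggest a classification model.
    return "classification" if question.endswith("(yes/no)?") else "descriptive"

def collect_data():
    # Data Collection: toy labeled records (text, label).
    return [("SPAM offer", 1), ("hello friend", 0), ("spam alert", 1)]

def prepare(records):
    # Data Preparation: a trivial cleaning step (lowercase the text).
    return [(text.lower(), label) for text, label in records]

def build_model(approach, data):
    # Modeling: trivially "model" the majority label.
    labels = [label for _, label in data]
    return max(set(labels), key=labels.count)

def evaluate(model, data):
    # Evaluation: accept the model if it matches at least half the labels.
    correct = sum(1 for _, label in data if label == model)
    return correct / len(data) >= 0.5

def run_methodology(question):
    approach = choose_approach(question)    # Analytic Approach
    data = prepare(collect_data())          # Collection + Preparation
    model = build_model(approach, data)     # Modeling
    return model if evaluate(model, data) else None  # Evaluation

print(run_methodology("Is this message spam (yes/no)?"))  # -> 1
```

In the full methodology the result would then be deployed, feedback gathered, and the loop re-entered to refine the approach.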
- CRISP-DM stands for Cross-Industry Standard Process for Data Mining.
- CRISP-DM, an open-source data methodology, combines several of the stages above into one stage and omits the Feedback stage, resulting in a six-stage methodology.
- Business Understanding
- Data Understanding [Combination of Data Requirements, Collection and Understanding]
- Data Preparation
- Modeling
- Evaluation
- Deployment
Based on Questions
- Descriptive Questions: What is the current status?
- Diagnostic Questions: Why did it happen?
- Predictive Questions: What is likely to happen?
- Prescriptive Questions: What should we do?
- Classification Questions: What category does this belong to?
Descriptive statistics are appropriately named, as they provide insights into the main features of our data.
- Mean - Average
- Median - Middle Value
- Mode - Most Frequently Occurring Value
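All three measures are available in Python's standard library `statistics` module; a quick sketch with an assumed toy dataset:

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # toy dataset for illustration

print(statistics.mean(data))    # average: 30 / 6 = 5
print(statistics.median(data))  # middle of sorted values: (3 + 5) / 2 = 4.0
print(statistics.mode(data))    # most frequently occurring value: 3
```

Note that with an even number of values, the median is the average of the two middle values.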
The most common way to gauge variability is the standard deviation.
It measures how much the values in a dataset vary around the mean.
A low standard deviation indicates values clustered tightly around the mean, while a high standard deviation indicates a wider spread around the mean.
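As a rough sketch (toy numbers assumed), the population standard deviation can be computed directly from its definition, the square root of the mean squared deviation, and checked against the standard library:

```python
import math
import statistics

clustered = [4, 5, 5, 6]   # values hug the mean (5)
spread = [0, 2, 8, 10]     # same mean (5), much wider spread

def pstdev(values):
    """Population standard deviation: sqrt of mean squared deviation."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

print(pstdev(clustered))  # small: values lie close to the mean
print(pstdev(spread))     # large: values lie far from the mean
print(math.isclose(pstdev(clustered), statistics.pstdev(clustered)))  # True
```

Both datasets have the same mean, so only the standard deviation distinguishes how tightly the values cluster.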