Class 12 Data Science

Jan 2, 2023

Exploratory Data Analysis:

Exploratory Data Analysis is the process of carrying out an initial analysis of the available data to find out more about the data. We usually try to find patterns, try to spot anomalies, and test any hypotheses or assumptions that we may have about the data. The process of Exploratory Data Analysis is done with the help of summary statistics and graphical representations.

Graphical representations matter most here: EDA is a way to explore data quickly and find patterns, and patterns are usually easiest to spot in a graph.

A Few Types of Exploratory Data Analysis are:

Univariate Analysis
Bivariate Analysis
Multivariate Analysis

Univariate Analysis:

Univariate analysis in statistics refers to the study of data with only one variable. “Uni” means one and “variate” means variable, hence univariate analysis is the statistical study of single-variable data.

There are many techniques available to study single-variable data. A few of them are as follows:

Find the central value of the dataset using measures such as the mean, median, and mode, each of which summarizes the data as a single value.
Measure the spread of the data using the variance and the interquartile range.
Visualize the data in the form of bar charts, pie charts, and histograms.
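As a minimal sketch of the first two techniques, the snippet below computes the mode, variance, and interquartile range with pandas. The marks values are hypothetical, invented purely for illustration.

```python
import pandas as pd

# Hypothetical marks, invented for illustration only
marks = pd.Series([35, 42, 42, 50, 58, 61, 67, 73, 80, 92])

# Central value: the mode is the most frequent value (there can be more than one)
mode_values = marks.mode().tolist()

# Spread: sample variance (pandas divides by n - 1 by default)
variance = marks.var()

# Spread: interquartile range = third quartile minus first quartile
q1 = marks.quantile(0.25)
q3 = marks.quantile(0.75)
iqr = q3 - q1

print("mode =", mode_values)
print("variance =", variance)
print("IQR =", iqr)
```

Note that mode() returns every value tied for the highest count, which is why the result is kept as a list.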

Example of univariate data:

Consider the following data of marks obtained by students in computer science. Here, the single variable under study is the marks obtained.

Let’s discuss three common ways of performing univariate analysis with Python.

Find out the center value using mean, median, and mode. Find the spread of data.
Find out the frequency or occurrence of a particular element.
Visualization of data using bar charts, pie charts, and histograms.

Find out the center value using mean, median, and mode. Find the spread of data.


import pandas as pd

#create DataFrame
df = pd.DataFrame({'points': [1, 1, 2, 3.5, 4, 4, 4, 5, 5, 6.5, 7, 7.4, 8, 13, 14.2],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4, 6, 8, 8, 9, 3, 2, 6],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12, 6, 6, 7, 8, 7, 9, 15]})

#view first five rows of DataFrame
print(df.head())

#calculate mean of 'points'
print("mean =",df['points'].mean())

#calculate median of 'points'
print("median =",df['points'].median())

#calculate mode of 'points' (a list, since several values can tie)
print("mode =",df['points'].mode().tolist())

#calculate standard deviation of 'points'
print("Standard Deviation =",df['points'].std())

output:

Find out the frequency or occurrence of a particular element.


import pandas as pd

#create DataFrame
df = pd.DataFrame({'points': [1, 1, 2, 3.5, 4, 4, 4, 5, 5, 6.5, 7, 7.4, 8, 13, 14.2],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4, 6, 8, 8, 9, 3, 2, 6],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12, 6, 6, 7, 8, 7, 9, 15]})

print(df["points"].value_counts())
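By default, value_counts() orders its result by frequency. As a small variation, the sketch below orders the counts by value and also reports relative frequencies. It reuses the same points values as the DataFrame above; the variable names are only illustrative.

```python
import pandas as pd

# Same 'points' values as in the DataFrame above
points = pd.Series([1, 1, 2, 3.5, 4, 4, 4, 5, 5, 6.5, 7, 7.4, 8, 13, 14.2])

# Counts ordered by the value itself rather than by frequency
counts = points.value_counts().sort_index()

# Relative frequencies: each count divided by the total number of rows
relative = points.value_counts(normalize=True).sort_index()

print(counts)
print(relative)
```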

Visualization of data using bar charts, pie charts, and histograms.

import pandas as pd
import matplotlib.pyplot as plt

#create DataFrame
df = pd.DataFrame({'points': [1, 1, 2, 3.5, 4, 4, 4, 5, 5, 6.5, 7, 7.4, 8, 13, 14.2],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4, 6, 8, 8, 9, 3, 2, 6],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12, 6, 6, 7, 8, 7, 9, 15]})

print(df["points"].value_counts())

#plot a histogram of 'points'
df.hist(column='points', grid=False, edgecolor='black')
plt.show()
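The code above draws only a histogram. A bar chart and a pie chart of the same column could be sketched as follows, assuming we plot how often each points value occurs. The output file name is hypothetical, and the figure is saved rather than shown so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show() instead
import matplotlib.pyplot as plt
import pandas as pd

# Same 'points' values as in the DataFrame above
points = pd.Series([1, 1, 2, 3.5, 4, 4, 4, 5, 5, 6.5, 7, 7.4, 8, 13, 14.2])

# Frequency of each distinct value, ordered by value
counts = points.value_counts().sort_index()
labels = [str(value) for value in counts.index]

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: one bar per distinct value, bar height = frequency
ax_bar.bar(labels, counts.values, edgecolor='black')
ax_bar.set_xlabel("points")
ax_bar.set_ylabel("frequency")
ax_bar.set_title("Bar chart")

# Pie chart: one wedge per distinct value, wedge size = share of rows
wedges, texts = ax_pie.pie(counts.values, labels=labels)
ax_pie.set_title("Pie chart")

# Hypothetical output file name
fig.savefig("points_charts.png")
```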

Bivariate Analysis:

Bivariate analysis is analyzing the relationship between two variables.

The different methods used to analyze bivariate data are as follows:

Scatterplots
Correlation Coefficients
Simple Linear Regression

Example 1: (Ice Cream Business)

The ice cream business often uses bivariate data about temperature and total sales of ice cream. As temperature increases, ice cream sales also tend to increase. Let us understand this with the table below.

This is an example of bivariate analysis since it uses only two variables: temperature and ice cream sales.

1. Scatterplots

A scatterplot displays the relationship between the two variables, temperature and ice cream sales.

import matplotlib.pyplot as plt

x = [20,25,30,35,40]
y = [1000,1200,1400,1500,1800]
plt.xlabel("Temperature")
plt.ylabel("Ice cream sales")
plt.scatter(x, y)
plt.show()

Output:

2. Correlation Coefficients

import numpy as np

x = [20,25,30,35,40]
y = [1000,1200,1400,1500,1800]
r=np.corrcoef(x,y)
print(r)

Output:

corrcoef() returns the correlation matrix, which is a two-dimensional array with the correlation coefficients. Here’s a simplified version of the correlation matrix you just created:

      x        y 
x    1.00    0.99
y    0.99    1.00

The values on the main diagonal of the matrix are always 1.

The upper-left value is the correlation coefficient of x with itself.

The lower-right value is the correlation coefficient of y with itself.

The other two values, upper right and lower left, are equal, and both represent the Pearson correlation coefficient for x and y. In this case, it’s approximately 0.99.
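Simple linear regression, the third method listed earlier, can be sketched on the same data. This is a minimal illustration using np.polyfit with degree 1, which is one of several ways to fit a straight line. The new temperature of 45 is a hypothetical value chosen only to show a prediction.

```python
import numpy as np

x = [20, 25, 30, 35, 40]              # temperature
y = [1000, 1200, 1400, 1500, 1800]    # ice cream sales

# Pearson correlation coefficient as a single number (an off-diagonal entry of the matrix)
r = np.corrcoef(x, y)[0, 1]

# Least-squares straight line: y = slope * x + intercept
slope, intercept = np.polyfit(x, y, 1)

# Hypothetical new temperature, used only to illustrate a prediction
new_temperature = 45
predicted_sales = slope * new_temperature + intercept

print("r =", round(r, 2))
print("slope =", round(slope, 2), "intercept =", round(intercept, 2))
print("predicted sales at", new_temperature, "=", round(predicted_sales, 2))
```

For this data the fitted line is roughly sales = 38 × temperature + 240, so each one-unit rise in temperature goes with about 38 more sales.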

Written by Vijaya Kumar Chinthala
