Today I will take you through two of the most frequently used terms in machine learning. Machine learning involves a lot of data, so it is important to know the naming conventions for data sets, namely variables and observations.

Defining a Data Set

Before answering what variables and observations are, let’s first look at the data set itself. What is a data set? A data set is a collection of data values in a matrix format. Don’t let the definition confuse you: in simple words, a data set is a collection of values arranged in rows and columns. For example:

         Column 1    Column 2
Row 1    item 1×1    item 1×2
Row 2    item 2×1    item 2×2

This is a 2×2 matrix with 2 rows and 2 columns, where item 1×1, item 1×2, item 2×1, and item 2×2 are the data items.
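As a rough illustration, the same 2×2 layout can be sketched in Python as a list of rows. The variable names (`data`, `n_rows`, `n_cols`) are chosen here only for the example, and a nested list is just one of several ways to hold such a table.

```python
# A 2x2 data set stored as a list of rows; each row is a list of data items.
data = [
    ["item 1x1", "item 1x2"],  # Row 1
    ["item 2x1", "item 2x2"],  # Row 2
]

n_rows = len(data)       # number of rows
n_cols = len(data[0])    # number of columns (taken from the first row)
print(f"{n_rows} rows x {n_cols} columns")

# Look up the item in row 2, column 1 (Python counts from 0).
print(data[1][0])
```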

What are Variables and Observations in a data set? - Machine Learning Future

Real-Life Example of Variables and Observations

We can represent any kind of data set with the framework above. Let’s look at a data set describing two people, Sundar and Nicki:

Gender    Age    Country
Male      45     India
Female    47     US

In the data set above, what are the variables and observations? Gender, Age, and Country are the variables (columns). The data items (Male, 45, India, Female, 47, and US) are the observed values, and each row of them is one observation: (Male, 45, India) and (Female, 47, US).
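To make the split concrete, here is a minimal sketch that stores this data set as a list of dictionaries, assuming the usual convention that each row is one observation. The names `records`, `variables`, `observations`, and `ages` are made up for the example.

```python
# Each dictionary is one row of the data set; the keys are the column names.
records = [
    {"Gender": "Male", "Age": 45, "Country": "India"},
    {"Gender": "Female", "Age": 47, "Country": "US"},
]

# Variables are the columns, read here from the keys of the first row.
variables = list(records[0].keys())

# Observations are the rows: one tuple of values per object.
observations = [tuple(row.values()) for row in records]

# All values recorded for a single variable (one column).
ages = [row["Age"] for row in records]

print(variables)
print(observations)
print(ages)
```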

Wait a minute! Why do I call them variables and observations?

Sundar and Nicki are two objects that have some features: here, Gender, Age, and Country. These features are variable in nature. For example, Age can take any real value and varies from person to person, hence the name variable. In an experiment, objects reveal their characteristics and the values those characteristics take. Hence, variables can be defined as the characteristics of an object, and observations are the values recorded for those characteristics.

Thus, depending on the number of variables, we categorize a data set as single-variable (univariate) or multi-variable (multivariate). A single-variable data set has only one column (variable), and a multi-variable data set has more than one column of data.
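The categorization above amounts to counting columns. A small sketch, assuming the data set is a non-empty list of dictionaries with the same keys in every row; the helper name `classify` is hypothetical and not part of any library.

```python
def classify(records):
    """Label a data set by how many variables (columns) it has."""
    if not records:
        raise ValueError("the data set has no rows")
    n_variables = len(records[0])  # keys of the first row = columns
    return "single variable" if n_variables == 1 else "multi-variable"


# Only one column (Age), so this is a single-variable data set.
ages_only = [{"Age": 45}, {"Age": 47}]

# Three columns, so this is a multi-variable data set.
people = [
    {"Gender": "Male", "Age": 45, "Country": "India"},
    {"Gender": "Female", "Age": 47, "Country": "US"},
]

print(classify(ages_only))
print(classify(people))
```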

With the explanation and examples above, we now know what variables and observations are.