Hussain Safwan
4 min read · Sep 16, 2020


Photo by Riho Kroll on Unsplash

Entropy is the measure of disorder: I'm sure most, if not all, of us have heard this definition back in high school physics class. And guess what, it has also made its way into data science, where the picture is even more vivid. Recently I've been studying this topic, and while I've found quite a number of resources that help you visualize what entropy is, I haven't found many that go all the way and show you how to actually calculate it. So I decided, why not walk through how to determine entropy and information gain with a real-life example? But first, a little theory.

Entropy and Information Gain

Since they both stem from the decision tree problem, I'm gonna use a decision tree for the explanation. Here's a traditional example. You've got historical data on the weather conditions under which the Mayor (of your city) plays golf, and say that one fine day, given the current weather data, you decide to predict whether the white-bearded man will play or not. Here's the previous data (a table of 10 observations over four features: outlook, temperature, humidity and windy, plus whether he played):

Now, to form the decision tree, you need to keep splitting the table until leaves are reached, and the question is: in what sequence of features should you split so that the leaves are reached in the minimum number of splits? The number of steps towards the leaves is inversely proportional to the homogeneity of a node, while entropy is the measure of how heterogeneous the node is. Note that the two terms are opposite sides of the same coin, i.e. the higher the entropy, the longer it takes to reach the leaves, hence we try to minimize it. Information gain, on the other hand, is the change in entropy (entropy is expected to fall) caused by splitting a node, and is the difference between the entropy of the parent and the weighted entropy of the children (the nodes formed after the partition). The higher a node's information gain, the more likely it is to be expanded. Since this is a write-up of a worked example, I'll stop the theory here. Need more of a brush-up? Click here, it's gonna be an awesome read!

Conventions

Here are some conventions and shorthand I'm gonna use in this write-up:

IG(node) = information gain from the node

H(node) = entropy of the node

E(a, b) = entropy of a node when a of its items are of one class and b are of the other

Please mind that we're not going to predict anything in this episode, just see how decision trees are constructed and what entropy and information gain have to do with it.


A walk-through

Let's try to solve the problem of determining in which order the above table should be split to form a decision tree of minimal height. Note that we've got 4 features and a binary terminal condition (play or not) here. The approach is to find IG(outlook), IG(temperature), IG(humidity) and IG(windy), and sort the features in descending order of IG. Let's focus on the IG of outlook.

To help you focus, here's a cutout of the original table with just the columns we need. Now let's chalk out the formulae.
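Spelled out, they're the standard decision-tree definitions:

IG(node) = H(parent) - sum over children c of P(c) * H(c)

H(child c) = E(a, b), where a and b are the yes and no counts inside c

E(a, b) = -(a/(a+b)) * log(a/(a+b)) - (b/(a+b)) * log(b/(a+b))

P(c) = (number of rows in child c) / (number of rows in the parent)

One heads-up: the rounded numbers below (0.24, 0.29 and so on) match log base 10; with the more usual log base 2 you'd get 0.81 and 0.97 instead. Since changing the base only scales every entropy by the same constant, the ranking of the features comes out the same either way.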

We go bottom-up and execute the last formula first. First, list the values of outlook, like this:

Total values: 3; sunny, overcast, rainy

sunny(4 occurrences) -> 3 yes, 1 no

overcast(2 occurrences) -> 2 yes, 0 no

rainy(4 occurrences) -> 1 yes, 3 no

Okay, now, entropy of sunny = H(sunny) = E(3, 1) = 0.24

Similarly, the entropy of overcast = E(2, 0) = 0, and for rainy, E(1, 3) = 0.24. Note that E(a, b) = E(b, a). And the P(c)s are 4/10, 2/10 and 4/10 respectively.

Therefore the weighted entropy of the children = (4/10)*0.24 + (2/10)*0 + (4/10)*0.24 = 0.192. There are a total of 6 yes and 4 no, which makes the entropy of the parent H(p) = E(6, 4) = 0.29. Hence the information gain is IG(outlook) = 0.29 - 0.192 = 0.098.
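If you'd rather let the computer check the arithmetic, here's a minimal Python sketch of the outlook calculation. It assumes nothing beyond the counts listed above and uses log base 10, since that's the base the rounded figures match:

```python
import math

def E(a, b):
    # Two-class entropy of a node holding `a` items of one class and `b` of the other.
    # Log base 10 is used here because that's what the article's rounded figures match.
    total = a + b
    entropy = 0.0
    for count in (a, b):
        if count:  # treat 0 * log(0) as 0
            p = count / total
            entropy -= p * math.log10(p)
    return entropy

# Yes/no counts per outlook value, straight from the list above
children = {"sunny": (3, 1), "overcast": (2, 0), "rainy": (1, 3)}
parent = (6, 4)  # 6 yes, 4 no in the whole table

n = sum(a + b for a, b in children.values())  # 10 rows in total
weighted_children = sum(((a + b) / n) * E(a, b) for a, b in children.values())

print(round(E(3, 1), 3))                          # 0.244 -> the 0.24 above
print(round(weighted_children, 3))                # 0.195 (0.192 above, from rounding 0.244 to 0.24)
print(round(E(*parent) - weighted_children, 3))   # 0.097, i.e. roughly the 0.098 above
```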

So, there you have it. Now all you've got to do is repeat the process for the other three features, rank them by their IGs and follow that order while partitioning. I know it's painstaking doing all of this by hand, so go ahead, write a script, automate it! And as always, enjoy the beauty!
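If you do want to automate it, here's a rough sketch of what such a script could look like with pandas. Treat the DataFrame name `golf` and the column names (including the `play` target) as placeholders for however you've loaded the table above:

```python
import math
import pandas as pd

def entropy(labels):
    # Entropy of a label column; log base 10 to stay consistent with the numbers above.
    probs = labels.value_counts(normalize=True)
    return -sum(p * math.log10(p) for p in probs if p > 0)

def information_gain(df, feature, target="play"):
    # IG(feature) = H(parent) - sum over feature values of P(value) * H(child)
    parent = entropy(df[target])
    weights = df[feature].value_counts(normalize=True)
    child_entropies = df.groupby(feature)[target].apply(entropy)
    return parent - (weights * child_entropies).sum()

# Assuming `golf` is a DataFrame with columns outlook, temperature, humidity, windy, play:
# ranking = sorted(golf.columns.drop("play"),
#                  key=lambda f: information_gain(golf, f), reverse=True)
# print(ranking)  # the order in which to split
```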
