Introduction to ROC Curve

The abbreviation ROC stands for Receiver Operating Characteristic. Its main purpose is to illustrate the diagnostic ability of classifier as the discrimination threshold is varied. It was developed during World War II when Radar operators had to decide if the blip on the screen is an enemy target, a friendly ship or just a noise.  […]

A Gentle Introduction to Precision and Recall.

The idea of this blog is to give an intuitive understanding of Precision and Recall for a binary classification problem. I will shy away from explaining it in a textbook way but rather will try to give an intuition. Nevertheless, let me write the textbook formula first: The problem with this nomenclature is that despite […]

How is automation changing data science and machine learning?

We have come a long way since the introduction of data science and machine learning. The recent study has found that the volume of business data doubles in less than 14 months. Today, the collection of data is no longer a problem, but the filtration, analysis, and maintenance of relevant information is a bigger issue. […]

A common trap when it comes to sampling from a population that intrinsically includes outliers

I will discuss a common fallacy concerning the conclusions drawn from calculating a sample mean and a sample standard deviation and more importantly how to avoid it. Suppose you draw a random sample , , … of size and compute the ordinary (arithmetic) sample mean  and a sample standard deviation from it.  Now if (and […]

Business Intelligence Organizations

I am often asked how the Business Intelligence department should be set up and how it should interact and collaborate with other departments. First and foremost: There is no magic recipe here, but every company must find the right organization for itself. Before we can talk about organization of BI, we need to have a […]

Predictive maintenance in Semiconductor Industry: Part 1

The process in the semiconductor industry is highly complicated and is normally under consistent observation via the monitoring of the signals coming from several sensors. Thus, it is important for the organization to detect the fault in the sensor as quickly as possible. There are existing traditional statistical based techniques however modern semiconductor industries have […]

Fuzzy Matching mit dem Jaro-Winkler-Score zur Auswertung von Markenbekanntheit und Werbeerinnerung

Für Unternehmen sind Markenbekanntheit und Werbeerinnerung wichtige Zielgrößen, denn anhand dieser lässt sich ableiten, ob Konsumenten ein Produkt einer Marke kaufen werden oder nicht. Zielgrößen wie diese werden von Marktforschungsinstituten über Befragungen ermittelt. Dafür wird in regelmäßigen Zeitabständen eine gleichbleibende Anzahl an Personen befragt, ob diese sich an Marken einer bestimmten Branche erinnern oder sich […]

Big Data has reduced the boundary between demand-centric dynamic pricing and user-behavior centric pricing!

Real-time pricing is also known as Dynamic pricing, and it is a method to plan and set highly flexible prices of the services or the products. Dynamic pricing is aimed to help the online organizations modify the costs on the fly in relation to the ever changing market conditions. All sorts of modifications are managed […]

Sentiment Analysis of IMDB reviews

  Download the Data Sources The data sources used in this article can be downloaded here:

The Inside Out of ML Based Prescriptive Analytics

With the constantly growing number of data, more and more companies are shifting towards analytic solutions. Analytic solutions help in extracting the meaning from the huge amount of data available. Thus, improving decision making. Decision making is an important aspect of businesses, and technologies like Machine Learning are enhancing it further. The growing use of […]