Data Science Basics You Should Know

What exactly is Data Science?

Data Science is a buzzword in today’s IT world. As with many technologies, people start using the term as jargon without understanding what it means or what falls within its purview. We will discuss some of these things in detail. Data Science has multiple components. When you talk about its components, you are essentially talking about big data and about the various roles within Data Science: what exactly a Data Scientist does, what a Data Curator does, what a Data Librarian does, and so on. Today, Data Science as a discipline inherently has to deal with huge amounts of data.

Role of Hadoop in Data Science

Dealing with data at this scale means big data, and a large number of frameworks built to handle it. Many frameworks are available, each with its own advantages and disadvantages. The most popular is Hadoop. When you talk about data science and the various analytics you have to run on this huge amount of data, you cannot really escape Hadoop. When you are doing purely statistical analysis, you may not care about Hadoop or any other big data framework. Hadoop is written in Java, so it helps if you know Java as well.

What is R?

R is a statistical programming language. You cannot really avoid R: when you need to apply various algorithms to this huge amount of data, whether to draw insights from it or to run machine learning algorithms on top of it, you end up working with R.

What is Apache Mahout?

Apache Mahout is a machine learning library provided by Apache. Why has it gained so much popularity? One reason is that it is built directly on mathematics. Data Science is not really about the volume of data. It is about getting insights from data. What kinds of insights? Think of today’s world of social media marketing and sites such as LinkedIn and Facebook, where a huge amount of data has to be taken care of. Mahout integrates directly with Hadoop, which allows it to leverage Hadoop’s processing power to run its algorithms on data at a huge scale. If you look at companies like LinkedIn and Facebook, you will find Mahout implementations.

Data Science is all about huge amounts of data that have to be sliced and diced in multiple ways to get the answers sought within a problem domain. The problem statement nowadays is, “You have told me enough about what I already know; tell me something I do not know.”
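As a rough illustration of what slicing and dicing can look like in practice, here is a minimal Python sketch that groups the same records along different dimensions. The sample records, the field names (region, channel, amount) and the helper total_by are all made up for this example; they are assumptions, not part of any particular framework, and a real data set would be far larger.

```python
from collections import defaultdict

# Hypothetical sample records; field names and values are invented for illustration.
ORDERS = [
    {"region": "North", "channel": "web", "amount": 120.0},
    {"region": "North", "channel": "store", "amount": 80.0},
    {"region": "South", "channel": "web", "amount": 200.0},
    {"region": "South", "channel": "web", "amount": 50.0},
]


def total_by(records, *keys):
    """Sum the 'amount' field, grouped by the given key fields."""
    totals = defaultdict(float)
    for record in records:
        # The group is identified by the tuple of values for the chosen keys.
        group = tuple(record[key] for key in keys)
        totals[group] += record["amount"]
    return dict(totals)


# The same data, cut along different dimensions.
by_region = total_by(ORDERS, "region")
by_channel = total_by(ORDERS, "channel")
by_region_and_channel = total_by(ORDERS, "region", "channel")

print(by_region)
print(by_channel)
print(by_region_and_channel)
```

Each call answers a different question about the same data, which is the idea behind looking at one data set in multiple ways.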

I am a Commerce, Computer and Law graduate. I have been running my own IT company since 1993. I like reading, exploring Hindu Sanskruti, travelling, and riding/driving.
