Data science is the science of organizing, packaging and delivering data (the OPD of data). It is often confused with data analytics or business analytics. While analytics is a part of data science, the reverse is not true. Data science covers every action that can be performed on data or facts: storing it, wrangling or formatting it into a more usable structure, and then analyzing it to derive useful interpretations.
Large volumes of data are stored in what are called data warehouses. From the data warehouse, the data is extracted with query languages such as SQL, or through the query interfaces of NoSQL databases, which communicate with the database and make the data available to the end user for analysis. Once the data has been extracted, the next step is to analyze it from different angles and perspectives, so that it can be summarized into useful information that helps increase revenue, cut costs or both.
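The extract-then-summarize step described above can be sketched in a few lines. This is only an illustration: it uses Python's built-in `sqlite3` module with an in-memory database standing in for a real data warehouse, and the `sales` table, its columns and its rows are all made up for the example.

```python
import sqlite3

# Hypothetical in-memory database standing in for a data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL, cost REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", 1200.0, 800.0),
        ("north", 900.0, 700.0),
        ("south", 1500.0, 1100.0),
    ],
)

# Extract with SQL: total revenue and total cost per region.
rows = conn.execute(
    "SELECT region, SUM(revenue), SUM(cost) "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

# Summarize into something an analyst can act on: profit per region.
summary = {
    region: {"revenue": revenue, "cost": cost, "profit": revenue - cost}
    for region, revenue, cost in rows
}

for region, figures in summary.items():
    print(region, figures)
```

The query does the extraction and aggregation inside the database, so only the small per-region summary is pulled into the program for further analysis.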
Many different tools are used for the different steps involved in data science; a few of them are listed below.
- MySQL: It can comfortably handle datasets of a few gigabytes.
- CSV files: One can get very far using .csv files as the primary storage medium.
- Hive/Shark/Redshift: For when the data is genuinely big, that is, big data.
- The R Project for Statistical Computing is one of the most popular tools for analysis. RStudio offers a brilliant user interface and a great computing environment for it.
- Pandas is a Python library and another great tool for data analysis.
- Tableau for visualization, Matplotlib for ad-hoc Python plotting and ggplot2 for plotting in R.
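To illustrate how far plain .csv files can go, here is a minimal sketch that reads CSV data and computes a per-group average using only Python's standard library. The column names and values are invented for the example, and the data is held in a string so the snippet runs on its own; in practice it would be read from a file on disk.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical CSV content; a real script would open a .csv file instead.
raw = """city,temperature
Oslo,4.0
Oslo,6.0
Cairo,30.0
Cairo,34.0
"""

# Group the temperature readings by city.
by_city = defaultdict(list)
for row in csv.DictReader(io.StringIO(raw)):
    by_city[row["city"]].append(float(row["temperature"]))

# Summarize each group with its average.
averages = {city: mean(values) for city, values in by_city.items()}
print(averages)
```

With pandas, the same grouping is usually written with `read_csv` followed by `groupby`, which is more convenient once the files grow beyond a handful of columns.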
Machine learning is a subfield of computer science that explores the study and construction of algorithms that can make predictions on data. It has strong ties to mathematical optimization, which delivers methods, theory and application domains to the field. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms is infeasible. R, again, is a great tool for predictive analysis.
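As a small illustration of learning a prediction rule from data instead of programming it explicitly, the sketch below fits a straight line by ordinary least squares, one of the simplest optimization problems in the field. The helper `fit_line` and the hours/scores data are hypothetical, and the example is written in plain Python rather than R to keep it dependency-free.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the squared error of y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for a single input variable.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: hours studied versus exam score.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

slope, intercept = fit_line(hours, scores)

# Use the learned parameters to predict the score for an unseen input.
prediction = slope * 5.0 + intercept
print(slope, intercept, prediction)
```

The relationship between hours and scores is never written into the program; it is recovered from the data, which is the essential difference from an explicitly coded rule.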
There are many other tools, and which ones to use largely depends on one’s expertise and comfort. Again, it must be noted that data science is not the same as data analytics or business analytics. While the latter primarily involve visualizing the raw data to find its meaning, data science is about the processes that extract knowledge or insights from data in various forms, structured or unstructured. In that sense it is a continuation of data analysis fields such as statistics, data mining and predictive analytics.