CodeSteps


Data Science – Overview

In simple terms, Data Science means applying science to data. What science do we apply to the data? Let's look into the details, starting with what data is.

Data Science – In Simple Terms

What is Data?

Data is a collection of details: financial details, human genome details, weather details, personal details, internet traffic details, the data generated by devices connected through IoT, geographical details, etc. We use all these details for a specific purpose. The data may contain high-level or fine-grained details. We call this raw data.

What Data do we collect & what do we do with it? – Data Analysis

Data Analysis plays a major role in Data Science. It is the process of discovering useful information in the data. Once the data is analyzed, we process it to produce the results we expect from it.

There are different steps in Data Analysis:

How do we get the Data? – Data Gathering

We collect the data using data gathering techniques. We may collect it from the Internet, from connected devices, from various sources like news channels, people, books, etc., or from data sets that have already been collected.

Data Filtering

A lot of data is available or generated, and it contains both useful and non-useful details. Do we need to keep the non-useful data? Not necessarily. We collect the data that is useful to us and related to our purpose; this is where Data Filtering comes into the picture. Data filtering helps us collect only the related data that suits our requirements. For example, for face recognition we need images of people's faces, not images of trees, cars, etc.

The quality of the data is also important. So in data filtering, we collect ONLY the quality data. If there is any invalid data, we simply ignore it.
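The two filtering rules above (keep only related data, ignore invalid data) can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names `kind` and `temperature_c`, and the helper functions are assumptions made for the example, not part of any particular tool.

```python
# Hypothetical raw records; the field names are assumptions for this sketch.
records = [
    {"kind": "weather", "temperature_c": 21.5},
    {"kind": "weather", "temperature_c": None},  # invalid: value is missing
    {"kind": "traffic", "temperature_c": 18.0},  # not related to our requirement
    {"kind": "weather", "temperature_c": 19.0},
]


def is_relevant(record):
    """Keep only records related to our requirement (weather data here)."""
    return record.get("kind") == "weather"


def is_valid(record):
    """Treat a record as valid only if it carries a numeric temperature."""
    return isinstance(record.get("temperature_c"), (int, float))


def filter_records(rows):
    """Return the rows that are both relevant and valid; ignore the rest."""
    return [row for row in rows if is_relevant(row) and is_valid(row)]


filtered = filter_records(records)
print(filtered)
```

Keeping the relevance check and the validity check in separate functions makes it easy to change one requirement without touching the other.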

Once the required, quality data is gathered, we must do the Data Modeling.

Data Modeling

Modeling the data means representing the data in a logical structure and also defining the physical schema. Data Modeling defines the data objects and their relationships. Data Modeling is a key concept in Data Science. Without Data Modeling, it is difficult to retrieve, update, and expand the data.
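As a rough illustration of "data objects and their relationships", here is a small logical model written with Python dataclasses. The `Device` and `Reading` classes and their fields are hypothetical names chosen for this sketch; a real model would also define the physical schema (for example, database tables), which is not shown here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reading:
    """One measurement taken by a device."""
    timestamp: str
    value: float


@dataclass
class Device:
    """A connected device; one device owns many readings (one-to-many)."""
    device_id: str
    location: str
    readings: List[Reading] = field(default_factory=list)

    def add_reading(self, reading: Reading) -> None:
        """Expand the data by attaching a new reading to this device."""
        self.readings.append(reading)

    def average(self) -> float:
        """Retrieve a summary of the data; 0.0 when there are no readings."""
        if not self.readings:
            return 0.0
        return sum(r.value for r in self.readings) / len(self.readings)


sensor = Device(device_id="sensor-1", location="roof")
sensor.add_reading(Reading("2024-01-01T10:00", 20.0))
sensor.add_reading(Reading("2024-01-01T11:00", 22.0))
print(sensor.average())
```

Because the relationship between a device and its readings is explicit in the model, retrieving, updating, and expanding the data become simple operations on well-defined objects.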

Data Storage

Once the data is available, we must store it in a place where it can be accessed and used for different purposes. How much storage do we need? That depends on the requirement. We may need megabytes (MB), gigabytes (GB), terabytes (TB), petabytes (PB), exabytes (EB), or even zettabytes (ZB) of storage. For example, storing a huge collection of movies can take TBs or even PBs of storage. Here is some useful information, using the binary convention in which each unit is 1024 times the previous one:

MB – 1024 kilobytes (KB)

GB – 1024 MB

TB – 1024 GB

PB – 1024 TB

EB – 1024 PB

ZB – 1024 EB
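The unit table above can be turned into a small helper that converts a raw byte count into the largest fitting unit. This is a minimal sketch that assumes the same 1024-based convention as the table; the function name `human_readable` is made up for the example.

```python
# Units in increasing order; each one is 1024 times the previous one.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def human_readable(num_bytes):
    """Express a byte count in the largest unit that keeps the value below 1024."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at ZB when nothing larger exists.
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024


print(human_readable(1024 ** 2))      # 1.00 MB
print(human_readable(5 * 1024 ** 4))  # 5.00 TB
```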

Data Formatting

Once the data is processed, it produces results, and the results should be formatted for reporting. Common formats are plain text, CSV, XML, and JSON.
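To make the formatting step concrete, the sketch below writes the same results as CSV and as JSON using only Python's standard library. The `results` rows and their field names are invented sample values for illustration, not real measurements.

```python
import csv
import io
import json

# Hypothetical processed results; the values are made up for this sketch.
results = [
    {"city": "Oslo", "avg_temp_c": 4.5},
    {"city": "Cairo", "avg_temp_c": 28.1},
]


def to_csv(rows):
    """Format the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Format the rows as indented JSON text."""
    return json.dumps(rows, indent=2)


csv_text = to_csv(results)
json_text = to_json(results)
print(csv_text)
print(json_text)
```

CSV suits flat, tabular results that will be opened in a spreadsheet, while JSON is a better fit when the results are nested or will be consumed by another program.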

Data Security

Storing the data and making it accessible is NOT sufficient. The data must be secured properly, with access restricted to authorized persons ONLY. This way, we can prevent misuse of the data. For example, personal details should be stored and accessed in a secure manner.
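The idea of restricting access to authorized persons can be illustrated with a very small access check. This is a deliberately simplified sketch: the user names, the stored record, and the `read_personal_details` function are all hypothetical, and a real system would also need authentication, encryption, and auditing, which are not shown.

```python
# Hypothetical allow list and data store, for illustration only.
AUTHORIZED_USERS = {"alice", "bob"}
PERSONAL_DETAILS = {
    "record-1": {"name": "Jane Doe", "phone": "555-0100"},
}


def read_personal_details(user, record_id):
    """Return a record only when the requesting user is authorized."""
    if user not in AUTHORIZED_USERS:
        raise PermissionError(f"{user} is not authorized to read personal details")
    return PERSONAL_DETAILS[record_id]


print(read_personal_details("alice", "record-1"))

try:
    read_personal_details("mallory", "record-1")
except PermissionError as error:
    print("Access denied:", error)
```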

So finally, what is Data Science?

It comprises all the things we discussed above: the modeling, tools, techniques, and processes used to extract useful information from the data.

Happy reading!

\Wesley/
