Big data refers to data of very large size, ranging from gigabytes to zettabytes and beyond. It is generated so quickly that new techniques, technologies and approaches are required to handle it. Big data is generated from online and offline processes, transactions, emails, videos, audio, images, click streams, logs, posts, books, photos, search queries, health records, social networking interactions, scientific data, sensors, and mobile phones, including their applications and traffic. These data are stored in databases or clouds, and their size grows massively. As a result, they become difficult to capture, organize, store, manage, share, analyze and visualize with typical database software tools. Big data concepts combine techniques and technologies that help experts, managers, directors, investors, companies and institutions gain deeper insight into their information assets and derive new ideas, approaches, values and perceptions from the analyzed data. Some statistics from the last two years are given below to illustrate the characteristics of data.
- The size of data was measured in petabytes in the 1990s and in exabytes in the 2000s; nowadays it is measured in zettabytes and more.
- Internet users have reached 2.92 billion, nearly 40% of all people on earth.
- Today, internet traffic is approximately 3,000 petabytes per month, and it is predicted to reach 83,299 petabytes in 2018.
- The number of mobile phone subscriptions has reached 6.57 billion worldwide.
- There are 1.85 billion active social network users, roughly 60% of all internet users.
- Facebook has 1.18 billion active users, while WhatsApp has 400 million, Google+ has 300 million, Twitter has 232 million and Tumblr has 230 million.
- More than 570 websites are created every minute.
- The CERN Data Centre processes about one petabyte of data every day.
- For Fortune 1000 companies, a 10% increase in data is reported to provide $65.7 million in extra income.
- The big data market is currently valued at $10.2 billion and is expected to reach $53.4 billion in 2017.
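To make the unit scales above concrete, the following Python sketch converts between decimal byte units and expresses the CERN figure of about one petabyte per day as a sustained rate. It assumes decimal (SI) prefixes, where 1 PB = 10^15 bytes; binary units such as PiB (2^50 bytes) would give slightly different numbers. The names `to_bytes` and `human_readable` are illustrative helpers, not part of any library.

```python
# Decimal (SI) byte units, as used when data sizes are quoted
# in gigabytes, petabytes, zettabytes, etc. (assumption: 1 kB = 1000 bytes).
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value: float, unit: str) -> float:
    """Convert a value expressed in the given decimal unit to bytes."""
    return value * 1000 ** UNITS.index(unit)


def human_readable(num_bytes: float) -> str:
    """Express a byte count using the largest unit that keeps the value >= 1."""
    index = 0
    while num_bytes >= 1000 and index < len(UNITS) - 1:
        num_bytes /= 1000
        index += 1
    return f"{num_bytes:.2f} {UNITS[index]}"


# About one petabyte per day, spread evenly over the seconds of a day.
SECONDS_PER_DAY = 24 * 60 * 60
per_second = to_bytes(1, "PB") / SECONDS_PER_DAY

if __name__ == "__main__":
    print(human_readable(per_second), "per second")
    # How many petabytes make up one zettabyte.
    print(to_bytes(1, "ZB") / to_bytes(1, "PB"))
```

Run as a script, it prints `11.57 GB per second` and `1000000.0`: one petabyte per day corresponds to roughly 11.6 GB every second, and a zettabyte is a million petabytes.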
Big data is a way of understanding not only the nature of data but also the relationships among data. Identifying the characteristics of the data helps define its patterns. The key characteristics of big data are classified into ten classes, as shown in Fig. 1.
Fig. 1. Big data classification
To clarify and express the features of big data, the five Vs (volume, velocity, variety, veracity and value) are frequently used to explain the nature of big data, as seen in Fig. 2. These features are briefly explained below:
Volume is the size of the data produced or generated. It is huge and may be measured in terabytes, petabytes, exabytes or more. Volume is an important feature for distinguishing big data from other data.
Velocity matters not only for big data but for all processes. The speed at which big data is generated or processed is crucial for the subsequent steps to meet demands and requirements.
Variety refers to the different forms of data. It covers the complexity of big data and imposes new requirements in terms of analysts, technologies and tools. Big data comes from a great variety of sources and generally falls into three types: structured, semi-structured and unstructured.
Veracity deals with the consistency and trustworthiness of big data. Recent statistics have shown that one in three decision makers does not trust the information gathered from big data because of its inaccuracy. Accordingly, collected or analyzed big data should come from a trusted origin, be protected from unauthorized access and be kept in a standard format, even if this is hard to achieve.
Value is the most important feature of big data; it delivers outputs that meet business demands or requirements. Access to and analysis of big data are highly prized, but they are almost useless if no value comes out of them. Value can take different forms, such as statistical reports, a previously invisible trend, cost-saving solutions, detected improvements or new ideas for better solutions and achievements.
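The three types can be illustrated with a minimal Python sketch that holds the same hypothetical customer record in each form and shows what it takes to read it. The record contents and function names are assumptions for illustration; the regular expression in `parse_unstructured` is only a crude placeholder for real text analytics.

```python
import csv
import io
import json
import re

# The same hypothetical customer record in three forms.
STRUCTURED = "id,name,city\n1,Alice,Ankara\n"                      # fixed columns (CSV)
SEMI_STRUCTURED = '{"id": 1, "name": "Alice", "tags": ["vip"]}'    # self-describing JSON
UNSTRUCTURED = "Alice from Ankara called on Monday about her order."  # free text


def parse_structured(text: str) -> list:
    """Rows follow a fixed schema given by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_semi_structured(text: str) -> dict:
    """Field names travel with the data, and fields may vary between records."""
    return json.loads(text)


def parse_unstructured(text: str) -> list:
    """No schema at all: only capitalized words are pulled out, as a crude
    stand-in for real text analytics such as entity extraction."""
    return re.findall(r"\b[A-Z][a-z]+\b", text)


if __name__ == "__main__":
    print(parse_structured(STRUCTURED))
    print(parse_semi_structured(SEMI_STRUCTURED)["tags"])
    print(parse_unstructured(UNSTRUCTURED))
```

Structured data can be read directly against its fixed schema, semi-structured data carries its own field names, and unstructured text needs extra interpretation before any fields can be extracted, which is one reason variety imposes new requirements on tools.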
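Some of these veracity requirements can be checked mechanically. The sketch below is a minimal Python example, assuming a hypothetical sensor record type, a hypothetical set of trusted sources and an assumed plausible temperature range; it flags records from untrusted origins, with missing fields or with out-of-range values. Protection from unauthorized access is a separate concern (access control) and is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical set of sources the organization has decided to trust.
TRUSTED_SOURCES = {"sensor-gateway", "crm"}


@dataclass
class Record:
    source: str
    sensor_id: str
    temperature_c: Optional[float]


def problems(record: Record) -> List[str]:
    """Return the veracity issues found in one record (empty list if none)."""
    issues = []
    if record.source not in TRUSTED_SOURCES:
        issues.append("untrusted source")
    if not record.sensor_id:
        issues.append("missing sensor id")
    if record.temperature_c is None:
        issues.append("missing value")
    elif not -50.0 <= record.temperature_c <= 60.0:  # assumed plausible range
        issues.append("value out of range")
    return issues


def split_trusted(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate records that pass every check from those that fail any check."""
    good, bad = [], []
    for r in records:
        (bad if problems(r) else good).append(r)
    return good, bad


if __name__ == "__main__":
    sample = [
        Record("crm", "s1", 21.5),
        Record("unknown", "s2", 20.0),
        Record("sensor-gateway", "", 500.0),
    ]
    good, bad = split_trusted(sample)
    for r in bad:
        print(r, problems(r))
```

Keeping the list of issues per record, rather than a single pass/fail flag, makes it easier to report why data was rejected, which supports the trust that decision makers need.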
Fig. 2. The 5 Vs of big data
Organizations in any industry that have big data can benefit from its careful analysis, gaining insight and depth to solve real problems. The potential of big data can be grouped into six main areas:
- Healthcare: clinical decision support systems, individual analytics applied to patient profiles, personalized medicine, performance-based pricing for personnel, analysis of disease patterns, DNA analysis, improved public health
- Public sector: creating transparency through accessible related data, discovering needs, improving performance, customizing actions for suitable products and services, decision making with automated systems to decrease risks, innovating new products and services
- Retail: in-store behavior analysis, variety and price optimization, product placement design, performance improvement, labor input optimization, distribution and logistics optimization, web-based markets
- Manufacturing: improved demand forecasting, supply chain planning, sales support, improved production operations, web-search-based applications
- Personal location data: smart routing, geo-targeted advertising, emergency response, urban planning, new business models
- Technology: reducing processing time, real-time analysis, rapid response in times of crisis, decision making with automated systems to reduce risks