AIT-614-DL1 Homework 1
Kirti Kaur Sidana
Big data is defined in many ways, but the key concepts of volume, variety and velocity remain the same across all the definitions. From my understanding, ‘Big Data can be interpreted as data that is huge in volume, contains different formats of data (structured, semi-structured, quasi-structured and/or unstructured), is complex, and is growing at great speed, so that it requires new tools and technologies to store, manage and process it for future business benefit’.
As mentioned in the course slides, there are four common business problems that organizations have faced for many years: reducing customer churn, increasing sales, cross-selling to customers, and preventing fraud. Data Science comes into play here by unifying statistics, data analysis, machine learning and related methods to produce impactful analyses of these and many other problems. In other words, it helps in solving Big Data problems.
Big Data Analytics
In simple words, Big Data Analytics is the set of tools and technologies (developed with the help of data science) used to examine big data, identify hidden patterns in it, and draw conclusions that support better decisions.
Big Data in today’s time
From clinical data associated with lab tests and physician visits to the administrative data surrounding payments and payers, this well of information is already expanding. When that data is coupled with greater use of precision medicine, there will be a big data explosion in health care, especially as genomic and environmental data become more ubiquitous. Medical and pharma giants have been collecting data on patients in order to study them. Technological advances are rapidly increasing the speed at which new data is created, and that data needs to be digested and analyzed in near real time to evaluate trends in drugs and symptoms, with the aim of providing low-cost premiums and healthcare. One recent example is Kaiser Permanente.
Kaiser gathered data from Medicaid programs and integrated healthcare programs (which illustrates variety) to determine that oral contraceptives containing drospirenone, which are widely used by women, increase the chance of blood clots by 77% compared with oral contraceptives that do not contain drospirenone. Understanding this pattern helped Kaiser lower the number of women going to the emergency room with blood clots, and hence decrease the number of payouts to hospitals.
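To make the 77% figure concrete, the short sketch below shows how a relative increase in risk is usually computed from two groups. The counts are hypothetical and were chosen only so that the ratio comes out to 1.77; they are not Kaiser's data, and the function name `relative_risk` is my own.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of the event rate in the exposed group to the rate in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical counts for illustration only (not taken from the Kaiser study):
# 177 clot cases per 100,000 users of drospirenone-containing contraceptives,
# 100 clot cases per 100,000 users of other oral contraceptives.
rr = relative_risk(177, 100_000, 100, 100_000)

# A relative risk of 1.77 corresponds to a 77% increase over the comparison group.
percent_increase = (rr - 1) * 100
print(f"relative risk = {rr:.2f}, increase = {percent_increase:.0f}%")
```

A statement such as "increases the chance by 77%" therefore describes the ratio between the two groups, not the absolute chance of a blood clot for any one woman.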
In this day and age, the terms Facebook and social media are almost interchangeable. With 1.9 billion profiles, which is more than 20% of the world’s population, Facebook is a perfect candidate for the study of big data. From politics to healthcare, every landscape can be altered just by studying and targeting Facebook’s data with the right tools and techniques. The most common way to collect Facebook’s data is the Facebook Graph API, v2.0 and up. The Graph API can be described as a social graph made up of nodes (a user, a user’s photo, a user’s comment, etc.), edges (the connections between nodes, such as a user’s photos) and fields (information about a node, such as a user’s date of birth, location or workplace).
Services that can use Facebook Data
Of late, Facebook has been a perfect ground for product or service research. One of the most effective ways to recruit participants is to encourage users to invite their friends into the study. Due to a recent policy update, however, incentivizing users to bring in more applicants is now prohibited. For example, a telecom company targeting an area can reach a user base by offering free credits on their current telecom service in exchange for completing a survey, granting access to their profile information, and allowing the company to post on their profile the prize or credit they received. This can snowball into a large number of participants, ultimately producing a big data set for research purposes.
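The node, edge and field structure of the Graph API described earlier can be illustrated with a small sketch that only builds request URLs and does not contact Facebook. The helper name `graph_url` and the `ACCESS_TOKEN` placeholder are my own assumptions. Which fields an application may actually read depends on the API version and on the permissions the user has granted.

```python
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com"


def graph_url(node, edge=None, fields=(), version="v2.0", access_token="ACCESS_TOKEN"):
    """Build a Graph API request URL for a node, optionally following one edge.

    Hypothetical helper for illustration; it performs no network request.
    """
    path = f"{GRAPH_BASE}/{version}/{node}"
    if edge:
        path += f"/{edge}"
    params = {}
    if fields:
        # Fields are requested as one comma-separated list.
        params["fields"] = ",".join(fields)
    params["access_token"] = access_token  # placeholder, not a real token
    return f"{path}?{urlencode(params, safe=',')}"


# Node with selected fields: the current user's id, name and birthday.
profile_url = graph_url("me", fields=("id", "name", "birthday"))

# Edge: the photos connected to the current user.
photos_url = graph_url("me", edge="photos")

print(profile_url)
print(photos_url)
```

A research application would send these URLs with a valid token and collect the JSON responses, subject to Facebook's platform policies.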
Data storage entities used with Big Data
From the analyst’s perspective, data repositories are of three kinds: Data Islands, Data Warehouses and Analytic Sandboxes. Data Islands, also known as spreadsheets, hold data structured in rows and columns. They have become stronger analysis tools over time, but they cannot be compared to tools that offer far more capabilities in less time and can deal with any type of data.
A data warehouse can store large amounts of data from multiple sources, which can be controlled and accessed in a restricted environment. An analytic sandbox provides high-speed computing while cutting down cost, and can deal effectively with complex data types.
Data is growing every second, and it can be used to predict the future course of action for any organization if it is analyzed and handled with the right tools and technologies (the drivers of Big Data Analytics).
There are many challenges in the current analytical architecture that are obscured and can be corrected by data scientists.
Center for Medical, Agricultural and Veterinary Entomology: Gainesville, Fl
Van Kui, Collecting Facebook data for big data research:
Jeevan Mathew Sajan, Data Science vs. Big Data vs. Data Analytics:
Bernard Marr, 4 ways Big Data will change every business:
Karthik Kambatla, Giorgos Kollias, Vipin Kumar, Ananth Grama, Trends in big data analytics: