Big data refers to the huge volumes of data, information, and statistics acquired by large organizations and ventures. Specialized software and data storage systems are created because it is impractical to process big data manually. Big data is used to discover patterns and trends and to make decisions related to human behavior and interaction with technology.
The role of a data scientist is normally associated with tasks such as predictive modeling, developing segmentation algorithms, building recommender systems and A/B testing frameworks, and working with raw, unstructured data.
The nature of their work demands a deep understanding of mathematics, applied statistics and programming. A data analyst and a data scientist share some skills, for example the ability to query databases. Both analyze data, but the decisions of a data scientist can have a greater impact on an organization.
In big data analytics, people often confuse the role of a data scientist with that of a data architect. In reality, the difference is quite simple. A data architect defines the tools and the architecture in which the data will be stored, whereas a data scientist uses that architecture. Of course, a data scientist should be able to set up new tools for ad-hoc projects when needed, but infrastructure definition and design should not be part of their role.
A collaborative research forum for multiple units within Oakland University, the Center for Data Science and Big Data Analytics facilitates multidisciplinary data science research that uses big data analytics techniques. The center combines the expertise of scientists from the biological and biomedical sciences with researchers in mathematics/statistics, engineering, business and finance. These experts use cutting-edge analytics, informatics and computing methodologies to conduct research and develop innovative solutions to high-impact problems across disciplines. In addition to serving as a research center emphasizing quantitative, data-based research in various disciplines, the experts in this center work with researchers from other disciplines by providing them with analytic support. Experts from this center are also available to consult with external industries and businesses.
Traditional healthcare analytics involves using patient and operational data to conduct statistical and quantitative analysis, build explanatory and predictive models, and apply fact-based management to drive healthcare decisions and actions. It is broadly concerned with the use, study, creation or synthesis of information artifacts such as databases, knowledge bases, mathematical/statistical models, data integration and transformation tools, and entire decision support systems.
The primary aim of healthcare analytics is to improve managerial decision making through access to better information. However, the amount of medical data generated and the heterogeneity of that data make traditional analytics inefficient, particularly given that much of the data is non-numerical. For example, notes written by physicians and nurses, images, and videos contain valuable information that needs to be factored into the analysis, yet current tools do not have adequate mechanisms to integrate these different types of data.
With the auto industry and its primary and secondary supplier industries nearby, there are streams of big datasets awaiting analysis. While larger companies have technical research centers of some sort, albeit inadequate for the purpose, smaller companies completely lack the resources and manpower to handle their big data. The Center for Data Science and Big Data Analytics at Oakland University would act as a bridge between different disciplines and industries and provide analytics services.
This research stream focuses on applying multivariate and Bayesian methods to big data problems, with special reference to finance. These datasets are huge, covering thousands of stocks, mutual funds, exchange-traded funds and other financial instruments, and are collected over years at daily, hourly, minute or even higher frequencies. The sheer volume of data on financial instruments and indexes, collected over years at per-minute frequency or on a per-price-change (tick) basis, together with the interconnectedness of the resulting price movements, poses a great challenge. The complexity is further compounded by events such as stock splits, mergers, stocks leaving the universe as companies fail, and new stocks entering it as companies are formed. Studying or predicting market behavior can only be done by analyzing the data jointly rather than on a per-stock basis. Such correlated data can be analyzed only with appropriate techniques, and given the complexity of the data, these techniques are bound to be computationally intensive. Modelling these data with Bayesian and Markov chain Monte Carlo (MCMC) methods would open new ways of analyzing them. Such problems inevitably require special expertise, intensive computational power and specialized analytics. A specific objective of this research is the efficient and effective analysis of financial data.
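To make the MCMC approach above concrete, here is a minimal sketch of Bayesian estimation via a Metropolis sampler applied to a single synthetic return series. All numbers are illustrative assumptions: a Normal likelihood with known daily volatility, a flat prior on the mean return, and a random-walk proposal. The realistic versions described above would instead model many correlated instruments jointly.

```python
import math
import random

random.seed(42)

TRUE_MEAN, VOL = 0.0005, 0.01                     # assumed daily mean return and volatility
returns = [random.gauss(TRUE_MEAN, VOL) for _ in range(2000)]

def log_likelihood(mu):
    # Normal log-likelihood of the data given mean mu (constant terms dropped)
    return -sum((r - mu) ** 2 for r in returns) / (2 * VOL ** 2)

mu, cur_ll = 0.0, log_likelihood(0.0)
samples = []
for step in range(5000):
    prop = mu + random.gauss(0, 0.0005)           # random-walk proposal
    prop_ll = log_likelihood(prop)
    if math.log(random.random()) < prop_ll - cur_ll:  # Metropolis acceptance rule
        mu, cur_ll = prop, prop_ll
    if step >= 1000:                              # discard burn-in draws
        samples.append(mu)

posterior_mean = sum(samples) / len(samples)
print(f"posterior mean daily return: {posterior_mean:.5f}")
```

The posterior mean should land close to the sample mean of the synthetic returns; with real tick-level data, the likelihood evaluation itself becomes the computational bottleneck, which is why such methods demand the intensive computing resources mentioned above.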
This research stream focuses on studying and identifying gene mutations that lead to cardiovascular diseases. Specifically, this research will use sensitized whole-genome ENU mutagenesis screens in mice to identify genes involved in the pathogenesis of several cardiovascular diseases, including venous thromboembolism, heart attacks and other vascular occlusive diseases such as sickle cell anemia. Whole-genome sequencing is used to identify the mutations, so this research generates terabytes of genomic sequencing data per experiment. The high volume of genomic sequence data produced necessitates computationally intensive analyses and large-scale data storage. The Center for Data Science and Big Data Analytics aims to conduct cutting-edge research in genomics and provide critical research and training opportunities for OU faculty and students.
This stream focuses broadly on fully computational evolutionary research using large datasets. It consists of two applied research areas supported by strong theoretical investigations. One of the major goals of this research is to explore the evolution of life through phylogenetic trees, with special attention to the evolution of early microbial life. It requires working with thousands of sequenced genomes obtained from available databases and reconstructing large evolutionary histories. The second goal of this stream is to explore the correlation between genotype and phenotype, with particular attention to pathogenic species. This requires the use of fully sequenced genomes and of techniques to reconstruct past evolutionary steps (ancestral state reconstruction), which demand computationally intensive applications. Both goals are supported by large-scale simulations that allow testing the accuracy of obtained estimates within a controlled environment and optimizing methodologies and software implementations. This research would greatly benefit from a venue in which expertise from other big data scientists could be tapped to design new and innovative ways to analyze and visualize data and to statistically evaluate the significance of the results.
Healthcare: Wireless and wired sensor network technologies for improving quality of life are considered key research areas in computer science and the healthcare application industry. The amount of data collected from patients across a region with many hospitals is huge, and it grows every minute. Analyzing a patient's sensor data (ECG, blood test results, ailments, treatments, allergies, etc.) from past years alongside current values, identifying similar patients, and comparing their treatments and responses is important for quality health care. Our research includes the design and development of small, low-power, accurate digital sensors; collection of various data at appropriate intervals; secure storage in large repositories; data mining algorithms; and testing and implementation.
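The "identifying similar patients" step above can be sketched as a nearest-neighbor search over numeric feature vectors. The patient IDs, features and values below are hypothetical, and a real pipeline would normalize each feature before comparing; this only illustrates the idea of ranking patients by distance.

```python
import math

# Hypothetical patient feature vectors:
# [resting heart rate (bpm), systolic BP (mmHg), fasting glucose (mmol/L)]
patients = {
    "P001": [72, 120, 5.4],
    "P002": [88, 145, 7.9],
    "P003": [70, 118, 5.1],
    "P004": [90, 150, 8.2],
}

def distance(a, b):
    # Plain Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(patient_id, k=2):
    """Return the k patients closest to the given one, nearest first."""
    target = patients[patient_id]
    others = [(pid, distance(target, vec))
              for pid, vec in patients.items() if pid != patient_id]
    return sorted(others, key=lambda t: t[1])[:k]

print(most_similar("P001"))   # P003 has the most similar readings
```

Once similar patients are found, their recorded treatments and responses can be compared against the current patient's, which is the comparison step described above.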
Internet of Things: The Internet of Things (IoT) is the network of physical objects such as devices, vehicles and buildings that are embedded with electronics, software, sensors and network connectivity, enabling them to collect and exchange data. At present, industries are connecting every object possible to the internet: front door locks, garage door openers, refrigerators and more. This evolution is being pushed onto us, and we need to worry not only about the security of this technology but also about the huge amount of data it generates every microsecond. Some of the data generated by IoT devices needs to be stored for historical reasons and for analyzing customer trends and behavior. Automotive manufacturers are moving towards connected cars. These cars will communicate with each other and with the internet, storing various vehicle data in huge databases. Each database may be maintained separately by an individual car company such as GM or Ford to analyze its cars' performance under varying road, weather and traffic conditions. This requires big data analytics.
Cloud based Manufacturing: In manufacturing plants, various machines are connected to the internet, and time-stamped sensor values from these machines are stored in the cloud. This allows the study of machine behavior, machine status and related factors. The analysis is used for condition-based maintenance of the machines, product quality analysis, and downtime analysis to improve productivity. We conduct research on cloud-based manufacturing and the analysis of this big data.
SAS is essentially a gold standard for scientific data analysis and data management. It is widely used worldwide by industries, businesses, research institutions, universities, hospitals, actuarial scientists, epidemiologists, and social and biomedical scientists.