Improving Governance through Data Science
 
The massive proliferation of digital technology has produced a huge and rapidly growing volume of digital data: from web data, e-commerce, fintech, and stock markets to scientific research, governance, mobile devices, social media, and IoT devices; the list is almost endless. Together these have led to a data explosion. International Data Corporation (IDC) predicts that the amount of digital data generated will reach 175 zettabytes by 2025 (one zettabyte is 10^21 bytes). Data is generated in such volume and at such a pace that no individual can readily interpret it; machines must be relied on to process and interpret it. It is therefore essential to be able to extract relevant insights from the data, which can then be leveraged as a competitive advantage.
The term data science was originally coined by Peter Naur, a Danish astronomer and computer scientist. An academic, he used it, as the “science of data”, to refer to computer science. In recent times, the vast amount of digital data being generated has renewed interest within academia, and top universities globally now offer formal courses with a defined curriculum so that students can specialize in the field.
The spread of data science, however, has been driven mostly by the technology industry, spurred by the business insights and predictions emerging in various domains. Data science is the use of tools and techniques, within a systematic methodology, to study data and extract meaning from it. It combines several streams: the scientific method, mathematics and statistics, specialized programming, advanced analytics, artificial intelligence and machine learning, and the art of storytelling. While data analytics examines and interprets existing conditions, data science aims to provide meaningful insights about the future.
Data Science Lifecycle
Data science is an interdisciplinary field and combines a spectrum of techniques, processes, and algorithms. A team working on data science would generally comprise mathematicians, statisticians, scientists, developers, systems engineers, and domain experts. Typically, data scientists adopt the following process:
- Capture: Gather raw structured and unstructured data from relevant sources
- Prepare and Maintain: Clean, deduplicate, organize, transform, and integrate the data
- Process: Examine patterns and determine the data’s suitability for analytics
- Analyse: Apply statistical analysis, predictive analytics, regression, and learning algorithms
- Communicate: Present insights in the form of reports, charts, and other data visualizations
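The five stages above can be sketched as a chain of small functions. The following is a minimal illustration in Python using only the standard library; the sample records, field names, and function names are all hypothetical, and a real pipeline would read from actual data sources and apply far richer checks at each stage.

```python
from statistics import mean

# Hypothetical raw records; the field names are illustrative only.
RAW = [
    {"id": "A1", "region": "north", "qty": "10"},
    {"id": "A1", "region": "north", "qty": "10"},   # exact duplicate
    {"id": "B2", "region": "south", "qty": " 7 "},  # untidy formatting
    {"id": "C3", "region": "south", "qty": None},   # missing value
]


def capture():
    """Capture: gather raw records (an in-memory stand-in for a real source)."""
    return list(RAW)


def prepare(records):
    """Prepare and Maintain: drop incomplete rows, deduplicate by id, fix types."""
    seen, clean = set(), []
    for r in records:
        if r["qty"] is None or r["id"] in seen:
            continue
        seen.add(r["id"])
        clean.append({"id": r["id"], "region": r["region"], "qty": float(r["qty"])})
    return clean


def process(records, min_rows=2):
    """Process: a deliberately crude suitability check before analysis."""
    return len(records) >= min_rows


def analyse(records):
    """Analyse: average quantity per region as a simple descriptive statistic."""
    by_region = {}
    for r in records:
        by_region.setdefault(r["region"], []).append(r["qty"])
    return {region: mean(values) for region, values in by_region.items()}


def communicate(summary):
    """Communicate: render the result as a short plain-text report."""
    return "\n".join(f"{region}: {avg:.1f}" for region, avg in sorted(summary.items()))


def run_pipeline():
    data = prepare(capture())
    if not process(data):
        raise ValueError("not enough clean data to analyse")
    return communicate(analyse(data))


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping each stage as a separate function mirrors the lifecycle: any one stage can be replaced, for example swapping the averaging step for a predictive model, without touching the others.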
Consider the Targeted Public Distribution System (TPDS) as a case study to illustrate the data science lifecycle. TPDS aims to identify the poor for the delivery of food grains and to distribute them in a transparent and accountable manner at the Fair Price Shop (FPS) level. Several states use Aadhaar-enabled systems to authenticate beneficiaries (over 79 crore) and distribute rations. Consequently, a huge amount of data is generated, with immense potential for applying data science techniques. These can help solve problems such as identifying genuine beneficiaries, detecting fraudulent transactions, removing ghost beneficiaries, forecasting demand and supply, planning food-grain procurement and warehouse stocking, logistics, and portability of beneficiaries across the country. The stages described above apply as follows:
- Capture: Data on eligible beneficiaries is captured based on the Socio Economic and Caste Census (SECC) or any state-specific criteria, as applicable.
- Prepare and Maintain: Transactional databases are not suited to analysis directly, so the data is first extracted, cleansed, deduplicated, and transformed. This process should handle a variety of data formats, as the datasets may be diverse.
- Process: In this phase, the data is examined for bias, patterns, and so on, using statistical tests and visualization techniques, to determine whether predictive analytics or machine learning algorithms are suitable.
- Analyse: In this phase, depending on the business objective, such as identifying ghost beneficiaries or detecting fraud or manipulation, statistical analysis, predictive analytics, regression, machine learning, and deep learning algorithms are applied to extract insights from the prepared data.
- Communicate: In this phase, data visualization techniques are used to present insights, such as hotspot areas for fraudulent transactions, likely ghost beneficiaries, forecast demand for food grains in various regions, and predicted additional stock requirements at various locations.
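Two of the objectives above, spotting possible ghost or duplicate beneficiaries and locating unusual Fair Price Shops, can be illustrated with a small sketch. It assumes hypothetical sample data and field names (`card`, `name`, `dob`, and the shop identifiers are invented for illustration) and uses two simple techniques chosen here for brevity: grouping on a normalised name and date of birth, and a modified z-score based on the median absolute deviation. Anything it flags is only a lead for field verification, not evidence of fraud.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sample; real TPDS schemas and identifiers will differ.
BENEFICIARIES = [
    {"card": "RC001", "name": "Asha Devi", "dob": "1980-04-12"},
    {"card": "RC002", "name": "asha  devi", "dob": "1980-04-12"},  # likely duplicate
    {"card": "RC003", "name": "Ravi Kumar", "dob": "1975-09-30"},
]

# Hypothetical monthly grain offtake (kg) per Fair Price Shop.
FPS_OFFTAKE = {"FPS-1": 980, "FPS-2": 1020, "FPS-3": 1005, "FPS-4": 2400, "FPS-5": 995}


def possible_duplicates(rows):
    """Group ration cards that share a normalised name and date of birth."""
    groups = defaultdict(list)
    for row in rows:
        # Lower-case the name and collapse repeated spaces before comparing.
        key = (" ".join(row["name"].lower().split()), row["dob"])
        groups[key].append(row["card"])
    return [cards for cards in groups.values() if len(cards) > 1]


def unusual_shops(offtake, threshold=3.5):
    """Flag shops whose offtake lies far from the median (modified z-score)."""
    values = list(offtake.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # no spread to measure against
    return sorted(
        shop for shop, v in offtake.items()
        if 0.6745 * abs(v - med) / mad > threshold
    )


if __name__ == "__main__":
    print("Cards to verify:", possible_duplicates(BENEFICIARIES))
    print("Shops to inspect:", unusual_shops(FPS_OFFTAKE))
```

A median-based measure is used instead of the mean and standard deviation because a single extreme shop would otherwise inflate the spread and mask itself.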
Applications of Data Science
The first step is for data science and business leaders to work together to identify concrete, practical use cases where data science can deliver value. Data science finds application in almost every sphere and across domains. Some of the common applications are:
- Finance – Customer segmentation, risk analytics, algorithmic trading
- Banking – Instant loan approval based on credit history and risk profiling, customer lifetime value, fraud detection
- Insurance – Detect fraudulent claims, assess applicants’ risk profiles to determine premiums, and assess weather conditions to create localized heat maps that predict claims
- Healthcare – Elderly care systems that combine sensors, data science, and cloud processing to monitor unusual behaviour and alert relatives and caregivers; medical imaging, drug discovery, bioinformatics, vaccine development, epidemiology, and correcting genetic issues through genomic data evaluation
- Agriculture – Determine cropping patterns for high yield, time of sowing, and fertilizer and irrigation needs
- Safety and Surveillance – Identify crime and accident hotspots to prevent untoward incidents
- TV Audience analytics platform – Employs deep analytics and machine learning to gather real-time insights into viewer behaviour
- E-Commerce – Assessing consumer behaviour, buying pattern, product recommendations, inventory management, logistics, analysing reviews
- Manufacturing – Automating manufacturing units, scheduling maintenance, detecting anomalies and product defects, predicting potential problems in the assembly line, warehouse management
- Transport – Self-driving cars using real-time object detection through 3D-printed sensors to guide the vehicle, enhanced driving experience, car monitoring systems, traffic analysis and best-route recommendation
- Energy and utilities – Reduce demand and production gap, detect power loss and thefts, predict outages
- Education – Smart education that upskills students according to their strengths and identifies the right skill set
- Sports – Training and fitness of sportspersons, assessing opponents’ strengths and weaknesses, and designing team composition and strategy
- Social Media – Determine user needs, suggest products, promotions, and advertisements, and detect and flag contentious or highly engaging posts
Data Science in Government
Data science has ample application in government. It can help extract useful information and knowledge from large volumes of data, providing the insights needed for data-driven decision-making through predictive causal analytics, prescriptive analytics, and machine learning. The government can harness the data to address implementation gaps, detect overlaps, target the right beneficiaries, and inform smart policy making, for instance through improved predictive analytics.