Predictive analysis can be defined as “the practice of extracting information from existing data sets in order to determine models and predict future results and trends.”
The term was first used by Matt Cutler in 2003, out of the desire to transform raw data into useful information that could be used not only to understand past patterns and trends, but also to predict future results accurately.
Predictive analysis is closely linked to data mining and machine learning: machines take in historical and current data and apply it to a predictive model in order to make forecasts.
This model does not tell you what will happen in the future. No one knows exactly what will happen tomorrow, or in a week.
Rather, it says that a certain event has a certain probability of happening, and that probability depends on the variables that influence the problem being analyzed.
The more accurate the model, the more reliable the probability it assigns to a given event.
For this reason, predictive analysis involves the search for significant relationships between the variables and the representation of these relationships in the models.
The variables analyzed fall into two groups:
· Response variables, which are the things we are trying to predict.
· Explanatory variables, or predictors, which are the things we observe, manipulate or control and that could relate to the response.
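To make the distinction concrete, here is a minimal sketch in Python. The customer records, the variable names and the coefficients are all invented for illustration; a real model would estimate its coefficients from data. The sketch separates predictors from the response and then uses a simple logistic formula to turn the predictors into a probability, not a certainty.

```python
import math

# Hypothetical customer records: two explanatory variables (predictors)
# and one response variable (did the customer cancel the service?).
customers = [
    {"months_as_customer": 3, "support_tickets": 4, "cancelled": 1},
    {"months_as_customer": 36, "support_tickets": 0, "cancelled": 0},
    {"months_as_customer": 12, "support_tickets": 2, "cancelled": 0},
]

PREDICTORS = ["months_as_customer", "support_tickets"]
RESPONSE = "cancelled"

# Separate what we observe (X) from what we want to predict (y).
X = [[row[name] for name in PREDICTORS] for row in customers]
y = [row[RESPONSE] for row in customers]

# Illustrative coefficients only (an assumption, not fitted values).
INTERCEPT = -1.0
WEIGHTS = [-0.05, 0.6]


def cancellation_probability(features):
    """Logistic model: map a weighted sum of predictors to a value in (0, 1)."""
    score = INTERCEPT + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-score))


probabilities = [cancellation_probability(features) for features in X]
for features, p in zip(X, probabilities):
    print(features, round(p, 3))
```

The output is one probability per customer: the model ranks the short-tenure customer with many support tickets as the most likely to cancel, but it never states that the cancellation will happen.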
Having seen the importance of the variables and of the predictive model, let us now see how predictive analysis can be classified:
· Based on the type of analysis approach being used;
· Based on the response that is provided to the analyst.
According to Thomas W. Miller, author of the book Modeling Techniques in Predictive Analytics, there are essentially three general approaches to research and modeling used in predictive analysis:
1) Traditional approach: This involves defining a specific theory or model based on statistical methods such as linear regression and logistic regression. The model is fitted to the data and checked with diagnostics, and it is validated before use.
2) Data-adaptive approach: Here we start with the data and search it for useful predictors. No theory or model is assumed before the analysis is performed.
Data-adaptive methods adapt to the available data, representing non-linear relationships and interactions between variables. The data then determine the model. As with traditional models, data-adaptive models are validated before they are used to make predictions.
3) Model-dependent approach: We start by specifying a model and use it to generate data, forecasts or recommendations. With this approach, models are improved by comparing the generated data with real data: do simulated consumers, companies and markets behave like real consumers, companies and markets? This comparison with real data serves as a form of validation.
Simulation and mathematical programming methods, the primary tools of operations research, are examples of model-dependent research.
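As an illustration of the traditional approach only, the sketch below specifies a linear model in advance, fits it by ordinary least squares, and checks it with simple diagnostics (the residuals and R squared). The advertising and sales figures are invented, and the function names are our own; this is one possible way to express the idea, not Miller's code.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the model y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variation in y that the fitted line accounts for."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: advertising spend (predictor) and units sold (response).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12.0, 19.0, 29.0, 37.0, 45.0]

intercept, slope = fit_line(ad_spend, units_sold)
fit_quality = r_squared(ad_spend, units_sold, intercept, slope)

# Diagnostics: residuals should be small and show no obvious pattern.
residuals = [y - (intercept + slope * x) for x, y in zip(ad_spend, units_sold)]

print(f"units_sold = {intercept:.2f} + {slope:.2f} * ad_spend")
print(f"R squared = {fit_quality:.4f}")
```

A data-adaptive method would skip the first step (choosing the linear form) and let the data suggest the shape of the relationship; a model-dependent method would instead generate simulated sales from an assumed model and compare them with the real figures.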
Which approach is the best?
There is no single right answer. Analyses show that what works best is a combination of models and methods.
Predictive analysis can also be classified by the type of answer it gives to the question being asked. In particular, there are methods that:
A) Answer the question "How much?" with a numeric variable. These are regression methods, which help us predict a response with a meaningful magnitude, such as the quantity sold (examples: What will the price of share x be in a month? What will the return on investment y be in a year?).
B) Answer the question "Which?" with a categorical variable. These are classification methods (examples: Which brand will be purchased? Which bank transaction is fraudulent?).
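The difference between the two kinds of answer can be sketched with a single technique used in two ways. The example below uses k-nearest neighbors, chosen here only because it is short to write; the ages, spending figures and brand labels are invented. For "How much?" it averages the numeric responses of the closest cases; for "Which?" it takes a majority vote among their categories.

```python
from collections import Counter


def nearest_neighbors(train, query, k):
    """Return the k training rows whose predictor value is closest to the query."""
    return sorted(train, key=lambda row: abs(row[0] - query))[:k]


def predict_how_much(train, query, k=3):
    """Regression: average the numeric responses of the nearest rows."""
    neighbors = nearest_neighbors(train, query, k)
    return sum(response for _, response in neighbors) / len(neighbors)


def predict_which(train, query, k=3):
    """Classification: majority vote among the labels of the nearest rows."""
    neighbors = nearest_neighbors(train, query, k)
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical data: (customer age, monthly spend) and (customer age, brand bought).
spend_by_age = [(20, 30.0), (25, 35.0), (30, 50.0), (45, 80.0), (50, 90.0), (60, 95.0)]
brand_by_age = [(20, "A"), (25, "A"), (30, "A"), (45, "B"), (50, "B"), (60, "B")]

expected_spend = predict_how_much(spend_by_age, 27)  # a number
likely_brand = predict_which(brand_by_age, 48)       # a category

print(expected_spend, likely_brand)
```

The predictor is the same in both cases (age); only the type of response variable changes, and with it the type of method.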
The quality of the answers obtained from the analysis certainly depends on the data available, but above all on the type of predictive model used.
For this reason, we now show the basic steps for creating a predictive model.
The creation of a predictive model requires a high level of competence in statistical methods.
As a result, it is typically the domain of data scientists, statisticians and other qualified data analysts. They are supported by computer engineers, who help to gather the relevant data and prepare it for analysis, and by software developers and business analysts, who help with data visualization, dashboards and reports.
Those involved in building a predictive model work through the following steps:
1. Definition of the project: The first activity is to understand the needs, priorities, wishes and resources of the organization, in order to define the objective the predictive analysis must reach and to estimate the cost and time needed for implementation.
2. Data preparation: All useful data must be obtained and made ready for the analysis. The data is collected, cleaned and often sliced and diced so that it can be used in the subsequent analytical phase.
3. Construction of the model: In this step the key variables that allow the events of interest to be predicted are identified. The algorithm best suited to the analysis must also be chosen (for example, a regression or a classification algorithm).
This is the heart of predictive analysis. Creating the right model with the right predictor variables will take most of your time and energy. It requires as much experience as creativity, and there is never a single exact or best solution. It is an iterative task: the forecasting model has to be optimized over and over again.
Although it may be tempting to think that big data will make predictive models more accurate, statistics show that beyond a certain point, feeding more data into a predictive model does not produce more accurate results.
Analyzing representative portions of the available information (sampling) can shorten model development and allow models to be deployed more quickly.
4. Validation of the model: Tests must be carried out to validate the model just built. Different validation tools are used depending on the model (e.g. residual analysis and confidence intervals for a regression model, or performance metrics such as the F1 score for a classification model).
5. Use of the model: The model is used for the purpose for which it was created and is updated with new data, in some cases in real time.
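Parts of steps 2 to 4 above can be sketched in a few lines of Python. This is only an outline under stated assumptions: the transaction rows, the predictions and all function names are invented for illustration, and real projects would normally rely on dedicated data and statistics libraries. The sketch cleans the raw rows, draws a random sample, and computes two of the validation measures mentioned above (residuals and the F1 score).

```python
import random


def prepare(records):
    """Step 2: drop rows with missing values and exact duplicates."""
    seen = set()
    cleaned = []
    for row in records:
        if None in row or row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned


def draw_sample(records, size, seed=0):
    """Step 3 (optional): work on a random sample to shorten development."""
    rng = random.Random(seed)
    return rng.sample(records, min(size, len(records)))


def residuals(actual, predicted):
    """Step 4, regression: observed value minus predicted value."""
    return [a - p for a, p in zip(actual, predicted)]


def f1_score(actual, predicted, positive=1):
    """Step 4, classification: harmonic mean of precision and recall."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical raw rows: (transaction amount, is_fraud).
raw = [(120.0, 0), (None, 1), (120.0, 0), (980.0, 1), (45.0, 0), (870.0, 1)]
clean = prepare(raw)
subset = draw_sample(clean, size=3)

# Hypothetical labels and predictions for five transactions.
actual = [1, 0, 1, 1, 0]
predicted = [1, 0, 0, 1, 1]
score = f1_score(actual, predicted)

print(len(clean), len(subset), round(score, 3))
```

Here the classifier finds two of the three fraudulent transactions and raises one false alarm, so precision and recall are both two thirds, and so is the F1 score.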
In terms of application areas, marketing, financial services and insurance companies were among the main promoters of predictive analytics, along with the big search engines and online service providers. Predictive analysis is also commonly used in areas such as healthcare, retail and manufacturing.
Business applications for predictive analysis include:
· Targeting online advertisements.
· Identifying customers who are about to abandon a service or product.
· Sending marketing campaigns to customers who are undecided about buying.
· Improving customer service.
· Analyzing customer behavior to determine buying patterns.
· Flagging potentially fraudulent financial transactions.
· Identifying patients at risk of developing a medical condition.
· Detecting imminent failures of components in industrial equipment before they occur.
These applications make it possible to achieve a tangible reduction in costs and/or an increase in revenues, thanks in particular to better allocation of resources or faster identification of problems (such as fraudulent behavior or machinery breakdowns).
Predictive analysis can provide managers and executives with decision-making tools that inform revenue forecasts, production optimization and even new product development. However, although useful, it is not suitable for everyone.
Dec 06, 2019