Auer and Griffiths (a) assessed self-reported problem gambling using the Problem Gambling Severity Index (PGSI) in a sample of European online gamblers. They applied artificial intelligence (AI) methods to predict self-reported problem gambling from a number of behavioral features derived from transactional data.
The study found that frequent session depositing and frequently depleting the gambling account were most predictive of self-reported problem gambling.
Several other studies investigating problem gambling have relied on the PlayScan problem gambling classification, a commercial player tracking tool. These studies did not explain in detail how PlayScan classifies high-risk gambling, other than that it is based on gambling behavior such as depositing, wagering, and playing duration.
The present authors are not aware of a generally agreed approach to identifying problem gambling from player tracking data.
Online gambling is a competitive market, and several studies have found that gamblers continue to gamble with other operators when they have reached a mandatory limit or have self-excluded (Auer and Griffiths, b; Håkansson and Widinghoff). For that reason, it is important that monitoring algorithms identify potentially problematic gambling as early as possible after gamblers have registered with a particular online gambling operator.
Therefore, the present study investigated whether it is possible to identify risky behavioral patterns among online gamblers in the first week after registration that are predictive of future high-risk gambling.
This could assist early prevention efforts and tailored responsible gaming measures by online gambling operators. The authors examined a sample of European online gamblers to study the association between gambling behavior during the first week after registration and high-risk gambling during the first 90 days after registration.
Because the study was exploratory, there was no specific hypothesis beyond investigating the association between the first week of gambling and high-risk gambling during the first three months after registration. It was anticipated that the findings would be helpful for policymakers and regulators, as well as for online gambling operators.
The authors were given access to an anonymized secondary dataset from a European online casino operator. Every transaction could be assigned to a single account. The dataset comprised player data from January 1 to April 30, inclusive.
The dataset comprised all gamblers who registered during the aforementioned study period. For each gambler, gambling behavior during the first seven days after registration was summarized in a set of variables (see Appendix 1 for a list of all the variables).
Apart from two demographic variables (i.e., age and gender), all variables described gambling behavior. Only gamblers who had at least one playing session during the first seven days after registration were selected for further analysis. The authors wanted to evaluate whether the first week of gambling was predictive of becoming a high-risk gambler at some point during the first 90 days after registration.
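As an illustration of how first-week variables could be derived from account transactions, the sketch below aggregates one gambler's transactions into a few of the features discussed in this study (days gambled, number of deposits, total deposited, amount lost). The record layout (`Transaction` with `day`, `kind`, `amount`), the event names, the day window, and treating any day with at least one bet as a gambling day are all hypothetical assumptions; the study's actual variable definitions are those listed in Appendix 1.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    day: int        # hypothetical: days since registration (0 = registration day)
    kind: str       # hypothetical event types: "deposit", "bet", "win", "withdrawal"
    amount: float

def first_week_features(transactions, first_day=0, n_days=7):
    """Aggregate transactions from an n_days window into illustrative features.

    The default window (days 0-6) is an assumption; set first_day=1 if
    "after registration" should exclude the registration day itself.
    """
    last_day = first_day + n_days
    week = [t for t in transactions if first_day <= t.day < last_day]
    deposits = [t.amount for t in week if t.kind == "deposit"]
    total_bet = sum(t.amount for t in week if t.kind == "bet")
    total_won = sum(t.amount for t in week if t.kind == "win")
    return {
        # Assumption: a "gambling day" is any day with at least one bet.
        "days_gambled": len({t.day for t in week if t.kind == "bet"}),
        "n_deposits": len(deposits),
        "total_deposited": sum(deposits),
        # Net loss = amount bet minus amount won.
        "amount_lost": total_bet - total_won,
    }

if __name__ == "__main__":
    history = [
        Transaction(0, "deposit", 50.0),
        Transaction(0, "bet", 40.0),
        Transaction(0, "win", 10.0),
        Transaction(2, "deposit", 30.0),
        Transaction(2, "bet", 60.0),
        Transaction(9, "deposit", 100.0),  # outside the first week, ignored
    ]
    features = first_week_features(history)
    # Mirrors the selection rule: keep only gamblers who gambled in week one.
    if features["days_gambled"] > 0:
        print(features)
```

Applying the function per account and dropping accounts with `days_gambled == 0` corresponds to the selection rule described above, under the stated assumptions.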
Based on gambling behavior, the risk classification system classifies gamblers daily into one of three categories: low-risk, medium-risk, or high-risk. It uses a number of metrics such as monetary deposit volume, frequency of deposits, gambling session length, amount of money lost, frequency of gambling, and gambling during the night.
The score takes into account up to six months of historical data. However, gamblers can sometimes be classified as high-risk as early as the day after they register, provided that they also gambled on the day of registration. Such gamblers usually deposit a lot of money, gamble for most of the day, place large bets, do not withdraw any winnings, and chase their losses.
For each of the gamblers who registered during the study period and gambled during the first week after registration, a binary target variable was computed. The variable indicated whether a gambler became high-risk on any day from the eighth day after the registration date up to 90 days after registration.
Gamblers could have become high-risk on any day during the 90 days after registration, and they could remain high-risk for any number of days.
A hierarchical logistic regression analysis was used to compute the correlation between demographics as well as gambling behavior and a future high-risk classification. The dependent variable was binary and indicated whether a gambler was classified high-risk at any time between the day after registration and 90 days after the registration.
Variables were classified into three groups. Age and being female were the control variables, a second set of variables reflected behavioral features, and a final set of variables reflected monetary intensity features.
First, a logistic regression which only included the control variables was carried out. Next, a logistic regression model which included the control variables and behavioral variables was carried out.
In order to determine whether the explanatory power improved after including the behavioral variables, a likelihood ratio test (Feder) was carried out.
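The likelihood ratio test compares two nested models: twice the difference between their maximized log-likelihoods is referred to a chi-square distribution whose degrees of freedom equal the number of parameters added to the larger model. The dependency-free sketch below assumes the log-likelihoods are already available from whatever software fitted the logistic regressions; the values in the usage example are hypothetical. The chi-square tail probability uses the closed-form recurrence that holds for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 2 (even case) or df = 1 (odd case) ...
    if df % 2 == 0:
        q, nu = math.exp(-half), 2
    else:
        q, nu = math.erfc(math.sqrt(half)), 1
    # ... then Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1),
    # evaluated in log space to avoid overflow for large x or df.
    while nu < df:
        q += math.exp((nu / 2.0) * math.log(half) - half - math.lgamma(nu / 2.0 + 1.0))
        nu += 2
    return q

def likelihood_ratio_test(loglik_restricted, loglik_full, n_added_params):
    """Compare a restricted model with a larger model that nests it."""
    statistic = 2.0 * (loglik_full - loglik_restricted)
    return statistic, chi2_sf(statistic, n_added_params)

if __name__ == "__main__":
    # Hypothetical log-likelihoods: control model vs. control + 4 behavioral variables.
    stat, p = likelihood_ratio_test(-9000.0, -7000.0, 4)
    print(f"LR statistic = {stat:.1f}, p = {p:.3g}")
```

The same function covers the second comparison (third model against second model), with `n_added_params` set to the number of monetary intensity variables.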
The monetary intensity variables were added in a third logistic regression model, and a likelihood ratio test was carried out between the third and the second model. To reduce and prevent multicollinearity among the variables (James et al.), variables whose variance inflation factor (VIF) exceeded a fixed threshold were excluded.
This threshold was also used by Hopfgartner et al. The average amount of money bet per session and the average amount of money won per session were excluded from the analysis because their VIF exceeded this threshold. The Nagelkerke R² compares the log-likelihood of a model with explanatory variables to that of the null model without any explanatory variables.
Like the R² of a linear regression, it ranges from 0 to 1. However, it does not represent the percentage of explained variance; rather, it indicates how strongly the independent variables are associated with the binary dependent variable.
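For reference, the Nagelkerke R² rescales the Cox and Snell R² so that its maximum is 1. With ℓ₀ the log-likelihood of the null (intercept-only) model, ℓ_M that of the fitted model, and n the number of observations, R²_CS = 1 − exp(2(ℓ₀ − ℓ_M)/n) and R²_N = R²_CS / (1 − exp(2ℓ₀/n)). The short sketch below implements these formulas; the outcome vector and the log-likelihood improvement in the example are hypothetical.

```python
import math

def null_loglik(y):
    """Log-likelihood of an intercept-only logistic model for binary outcomes y."""
    n = len(y)
    p = sum(y) / n  # the null model's fitted probability is the base rate
    if p in (0.0, 1.0):
        raise ValueError("outcome must contain both classes")
    return n * (p * math.log(p) + (1 - p) * math.log(1 - p))

def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke R^2: Cox-Snell R^2 divided by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell

if __name__ == "__main__":
    outcomes = [1] * 100 + [0] * 900  # hypothetical base rate of 10%
    ll0 = null_loglik(outcomes)
    # Hypothetical fitted model whose log-likelihood is 60 units above the null.
    print(nagelkerke_r2(ll0, ll0 + 60.0, len(outcomes)))
```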
Additionally, two machine learning models, a Random Forest (Rigatti) and a Gradient Boosting Machine (Doan and Kalita), were trained. In contrast to classical statistical methods such as logistic regression, machine learning methods use more parameters, which can lead to overfitting. This means that models might explain the data on which they were trained very well but not generalize to new datasets.
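Because performance is assessed on held-out test data, the data have to be split before the models are trained. The text does not state the split proportion or procedure; the sketch below assumes a 70/30 split stratified by the high-risk outcome, a common choice when one class is much rarer than the other, so that both subsets keep the same class proportions. The function name and its defaults are hypothetical.

```python
import random

def stratified_split(labels, test_fraction=0.3, seed=42):
    """Return (train_indices, test_indices) with class proportions preserved."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

if __name__ == "__main__":
    y = [1] * 10 + [0] * 90  # hypothetical labels: 1 = became high-risk
    train_idx, test_idx = stratified_split(y)
    print(len(train_idx), len(test_idx), sum(y[i] for i in test_idx))
```

The models would then be fitted on the training indices only and evaluated on the test indices, so that overfitting shows up as a gap between training and test performance.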
Model accuracy is reported based on the test data. A total of 37, gamblers registered between January 1 and April 30 with the online operator that provided the secondary dataset.
Out of the 37, gamblers, became high-risk for at least one day in the 90 days after registration 7. Table 1 reports the mean values for gamblers who became high-risk and gamblers who did not become high-risk during the 90 days after registration.
Gamblers who did not become high-risk were on average 30 years old, and gamblers who became high-risk were on average 38 years old. Future high-risk gamblers also displayed higher values on every metric computed over the first seven days after registration. In order to investigate whether the relationship between age and being a high-risk gambler was linear or non-linear, the authors classified players into age bands.
There appeared to be a positive association between age and the percentage of high-risk gamblers, with the largest value among those aged 39–55 years. Gamblers aged up to 21 years and those aged 22–28 years comprised the lowest percentages of high-risk gamblers.
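The percentage of high-risk gamblers per age band is a simple grouped proportion. In the sketch below the band edges are partly hypothetical: the text names bands of up to 21, 22–28, and 39–55 years plus an oldest band, while the 29–38 band and the exact lower boundary of the oldest band are assumptions.

```python
from bisect import bisect_left

# Hypothetical bands: inclusive upper age bound of every band except the last.
BAND_UPPER = [21, 28, 38, 55]
BAND_LABELS = ["up to 21", "22-28", "29-38", "39-55", "56 and over"]

def age_band(age):
    """Map an age in whole years to its (hypothetical) band label."""
    return BAND_LABELS[bisect_left(BAND_UPPER, age)]

def high_risk_rate_by_band(ages, high_risk):
    """Percentage of high-risk gamblers within each age band that has data."""
    counts = {}  # label -> [number high-risk, number of gamblers]
    for age, is_high_risk in zip(ages, high_risk):
        entry = counts.setdefault(age_band(age), [0, 0])
        entry[0] += int(is_high_risk)
        entry[1] += 1
    return {label: 100.0 * hr / n for label, (hr, n) in counts.items()}

if __name__ == "__main__":
    ages = [20, 25, 33, 45, 50, 60]  # hypothetical gamblers
    flags = [0, 0, 1, 1, 0, 1]
    for label, pct in sorted(high_risk_rate_by_band(ages, flags).items()):
        print(f"{label}: {pct:.0f}%")
```

`bisect_left` is used so that an age equal to a band's upper bound (for example 21) falls into that band rather than the next one.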
Gamblers older than 56 years had a lower percentage of high-risk gamblers compared to those aged between 39 and 55 years (Table 2). Appendix 2 shows the correlations between all variables, including high-risk status. There is a correlation of 0.
A combination of variance inflation factor analysis and examination of the bivariate correlations led to the exclusion of the average amount of money won per session and the average amount of money bet per session.
This can also be explained by the fact that the amount of money lost is the difference between the amount of money bet and the amount of money won. Variables that are exact linear combinations of other variables do not add explanatory power, but they increase collinearity and therefore make the estimates of regression models unstable.
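The variance inflation factor of a predictor is VIF_j = 1/(1 − R_j²), where R_j² comes from regressing that predictor on all the others; equivalently, the VIFs are the diagonal elements of the inverse of the predictors' correlation matrix. The dependency-free sketch below uses that identity. It also illustrates the point made above: when one variable is an exact linear combination of others (amount lost = amount bet − amount won), the correlation matrix is singular and the VIF is unbounded. The cutoff used in the study is not reproduced here, and the data in the example are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def invert(matrix, tol=1e-10):
    """Gauss-Jordan inversion with partial pivoting; fails on singular input."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            raise ValueError("singular correlation matrix: perfect collinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def variance_inflation_factors(columns):
    """Map each predictor name to its VIF (diagonal of the inverse correlation matrix)."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}

if __name__ == "__main__":
    bet = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical per-gambler averages
    won = [5.0, 18.0, 20.0, 45.0, 30.0]
    lost = [b - w for b, w in zip(bet, won)]
    print(variance_inflation_factors({"bet": bet, "won": won}))
    try:
        variance_inflation_factors({"bet": bet, "won": won, "lost": lost})
    except ValueError as err:
        print("lost = bet - won:", err)
```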
The number of monetary deposits had the largest correlation with becoming high-risk 0. A logistic regression model which included age and being female as independent variables and high-risk gambling as a binary dependent variable was carried out.
The control model reported a Nagelkerke R² of 0. The AIC of the control model was 18, In the next step, the behavioral variables were added to the logistic regression.
The Nagelkerke R² was 0. The AIC was 14, The lower the AIC value, the better the model quality. Table 3 reports the coefficients for each independent variable. Only being female was negatively correlated with becoming a high-risk gambler.
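The AIC trades model fit against complexity: AIC = 2k − 2 ln L, where k is the number of estimated parameters (including the intercept) and ln L is the maximized log-likelihood, so added variables only lower the AIC if they improve the fit by more than their parameter cost. A minimal sketch; the log-likelihoods and parameter counts in the example are hypothetical.

```python
def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * loglik

def best_by_aic(models):
    """models maps a name to (log-likelihood, number of parameters)."""
    return min(models, key=lambda name: aic(*models[name]))

if __name__ == "__main__":
    candidates = {                    # hypothetical values
        "control": (-9000.0, 3),      # intercept + age + female
        "behavioral": (-7000.0, 9),
    }
    for name, (ll, k) in candidates.items():
        print(name, aic(ll, k))
    print("lowest AIC:", best_by_aic(candidates))
```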
In the third step, the monetary intensity variables were added to the logistic regression. The AIC was 13, and therefore lower than for the model without the monetary intensity variables.
This means that the AIC also confirmed an improved model quality after adding the monetary intensity variables. Table 4 reports the coefficients for each independent variable in the third logistic regression model. In the multivariate logistic regression model, being female, average amount of money deposited per session, and average amount of money were negatively associated with becoming a high-risk gambler.
Table 4 also reports the odds ratios, exp(β), for each independent variable in the logistic regression model. An odds ratio of 1 indicates that the odds of becoming high-risk are not related to the independent variable.
An odds ratio greater than 1 means that the odds of becoming high-risk increase as the value of the independent variable increases. For example, each additional day on which a player gambles during the first seven days after registration increases the odds of becoming high-risk by the percentage implied by its odds ratio in Table 4. An odds ratio smaller than 1 means that the odds of becoming high-risk decrease as the value of the independent variable increases.
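Because exp(β) is a ratio of odds rather than of probabilities, percentages attached to odds ratios, as in the example above, are usually read as the change in the odds per one-unit increase of the predictor, (exp(β) − 1) × 100. The sketch below shows that conversion and how the same odds ratio translates into a change in probability, which depends on the baseline probability. The coefficient is hypothetical and is not one of the estimates in Table 4.

```python
import math

def percent_change_in_odds(beta, delta=1.0):
    """Percent change in the odds of the outcome for a delta-unit increase."""
    return (math.exp(beta * delta) - 1.0) * 100.0

def probability_after_change(p_baseline, beta, delta=1.0):
    """Apply the odds ratio exp(beta * delta) to a baseline probability."""
    odds = p_baseline / (1.0 - p_baseline) * math.exp(beta * delta)
    return odds / (1.0 + odds)

if __name__ == "__main__":
    beta_days = 0.5  # hypothetical coefficient for days gambled in week one
    print(f"odds ratio: {math.exp(beta_days):.3f}")
    print(f"change in odds per extra day: {percent_change_in_odds(beta_days):.1f}%")
    for p0 in (0.02, 0.10, 0.50):
        print(p0, round(probability_after_change(p0, beta_days), 4))
```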
Being female decreased the odds of becoming high-risk (see Table 4). One limitation of logistic regression is that it can only capture relationships that are linear in the log-odds, unless non-linear terms or interactions are specified explicitly.
It also relies on modeling assumptions, such as independent observations and the absence of strong multicollinearity. Therefore, the authors also trained a Random Forest and a Gradient Boosting Machine model.
The independent variables and the dependent variable were the same as for the aforementioned third logistic regression model. Figure 1 displays the receiver operating characteristic (ROC) curve and the area under the curve (AUC) for the two models based on the test dataset. The most important variables in terms of explanatory power were the total amount of money deposited, the number of deposits, the amount of money lost, and the average number of deposits per session.
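The AUC equals the probability that a randomly chosen gambler who became high-risk receives a higher predicted score than a randomly chosen gambler who did not, with ties counting one half. The minimal sketch below computes it from test-set labels and model scores using this pairwise (Mann–Whitney) definition; it does not reproduce the study's Random Forest or Gradient Boosting Machine, and the scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = high-risk) and scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(positives) * len(negatives))

if __name__ == "__main__":
    y_test = [0, 0, 1, 1]              # hypothetical test labels
    predicted = [0.1, 0.4, 0.35, 0.8]  # hypothetical model scores
    print(roc_auc(y_test, predicted))
```

The pairwise loop is quadratic in the number of gamblers, so for a test set of thousands of accounts a rank-based formula or a library routine would be the more practical choice.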
The present study was carried out in an attempt to identify early patterns of gambling that are predictive of becoming high-risk during the first 90 days after registration. Player tracking data from a sample of 37, European online gamblers were used. The average age was 38 years, which is in line with samples from other online gambling studies.
A univariate analysis found a lower percentage of high-risk gamblers among those aged up to 28 years compared to older gamblers. Various relationships between age and problem gambling have been identified previously.
For example, Raisamo et al. In a study of to year-old Australians, Abbott et al. The present study was based on player tracking data and did not assess gambling-related harm using a self-report gambling screen.
Young adults were less likely to become high-risk gamblers. However, this does not necessarily contradict the aforementioned findings, because young adults might perceive smaller losses as harmful due to their lower available income.
Older gamblers are more likely to have more available income, which can lead to higher losses and more frequent high-risk classifications.
However, older adults might not perceive such losses as harmful because the losses do not negatively impact their financial situation. In the multivariate logistic regression model, the direction of the association between gender and high-risk status was reversed relative to the univariate analysis, meaning that being female was associated with a lower likelihood of being high-risk.
The odds ratio for being female was below 1, indicating that the odds of becoming high-risk were lower for female gamblers. It is not uncommon for the direction of an association to differ between a multivariate and a univariate analysis, because independent variables are often correlated with each other.
Apart from the average session length in minutes and the average number of monetary deposits per session, each of the behavioral metrics in the logistic regression was significant. This was also indicated by a Nagelkerke R² value of 0. The addition of monetary intensity variables only slightly increased the Nagelkerke R² value from 0.
In the final model, only the average amount of money withdrawn per session and the total amount of money lost were not significant. The non-significance of the average number of deposits per session in the present study could be related to high correlations between the independent variables.
Although highly correlated variables were removed based on the variance inflation factor (VIF), it is still possible that another independent variable which is highly correlated with the average number of deposits per session was responsible for the non-significant correlation of the latter variable.
The correlation matrix in Appendix 2 shows a correlation of 0. The latter remained in the model and the odds ratio indicated that the odds of becoming high-risk increased by 5. Other factors not assessed here that may have contributed to high-risk gambling include factors specific to the gamblers themselves.
This was further supported by the machine learning models, which achieved an area under the curve (AUC) of 0. This is higher than any other AUC value reported by similar studies that the authors are aware of.
Among all the variables, the number of days on which a player gambled during the first week increased the odds of becoming high-risk the most. The present study has a number of limitations.
First, although the number of participants was large and representative of those who gambled on the website, the findings were based on a single anonymized secondary dataset from one European online casino operator.
Data from different operators might lead to other results, which limits the generalizability of the findings. For example, responsible gambling interactions could lower the number of high-risk players or prevent players from becoming high-risk at an early stage.
Responsible gaming procedures can also lead to player suspensions, which would also reduce the number of high-risk players over time.
Third, there was no information available on the nationalities of the gamblers. Given that there are often cultural differences between gamblers, it is not known whether the participants predominantly came from one or two countries or whether the sample was more geographically diverse.
Finally, there is a possibility that more than one person might have been gambling using the same account. Future replication studies should be conducted with data from different operators with different types of gamblers.
However, no causal conclusions can be drawn because the study was based on secondary data. The findings of the study will be of interest to many different stakeholders, including the gambling industry, gambling policymakers, and gambling regulators, as well as other researchers in the gambling studies field.
This means that online gambling operators could identify future high-risk players very early by monitoring metrics such as the amount of money deposited, the number of monetary deposits, the amount of money lost, and the number of monetary deposits per session.
Abbott, M. The prevalence, incidence, and gender and age-specific incidence of problem gambling: Results of the Swedish longitudinal gambling study (Swelogs). Addiction, 4.
Auer, M. The use of personalized behavioral feedback for online gamblers: An empirical study. Frontiers in Psychology, 6.
Self-reported losses versus actual losses in online gambling: An empirical study. Journal of Gambling Studies, 33(3).
The use of personalized messages on wagering behavior of Swedish online gamblers: An empirical study. Computers in Human Behavior.
Global limit setting as a responsible gambling tool: What do players think? International Journal of Mental Health and Addiction, 18(1).
An empirical attempt to operationalize chasing losses in gambling utilizing account-based player tracking data. Journal of Gambling Studies. Advance online publication.
Bozdogan, H. Psychometrika, 52(3).
Braverman, J. Accuracy of self-reported versus actual online gambling wins and losses. Psychological Assessment, 26(3).
Challet-Bouju, G. Modeling early gambling behavior using indicators from online lottery gambling tracking data: Longitudinal analysis. Journal of Medical Internet Research, 22(8).
Chóliz, M. The challenge of online gambling: The effect of legalization on the increase in online gambling addiction. Journal of Gambling Studies, 32(2).
Doan, T. Selecting machine learning algorithms using regression models. In IEEE International Conference on Data Mining Workshop (ICDMW).
Effertz, T. The effect of online gambling on gambling problems and resulting economic health costs in Germany. European Journal of Health Economics, 19(7).