Selection of features and prediction of wine quality using artificial neural networks
Abstract
This study introduces an alternative method for predicting wine quality using machine learning techniques such as linear regression and neural networks.
Our analysis is based on a real wine dataset provided by an established winery in Greece. First, we determine how quality depends on selected physicochemical features of the wine. We then apply well-known algorithms to improve the statistical results, together with methods based on principal component analysis (PCA) and linear regression for selecting the best possible number of variables.
Finally, after training artificial neural networks with various combinations of layers, we examine how the proposed statistical techniques improve the accuracy of wine quality prediction from the previously selected features.
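The abstract does not give the exact selection procedure, so the following is only a minimal sketch of one common form of regression-based feature selection: each candidate feature is scored by the R² of a one-variable linear regression on quality, and the highest-scoring features are kept. The data are synthetic, and the feature names (`alcohol`, `volatile_acidity`, `ph`), the helper functions, and the choice of keeping two features are all assumptions made for illustration; none of them come from the winery dataset or from the study's own code.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank_features(columns, target):
    """Rank features by the R^2 of a one-variable linear regression on target.

    For simple linear regression, R^2 equals the squared Pearson correlation,
    so no explicit model fit is needed for the ranking itself.
    """
    scores = {name: pearson(values, target) ** 2 for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic stand-in data (NOT the winery dataset); feature names are hypothetical.
rng = random.Random(0)
n = 200
columns = {
    "alcohol": [rng.gauss(12.0, 1.0) for _ in range(n)],
    "volatile_acidity": [rng.gauss(0.5, 0.1) for _ in range(n)],
    "ph": [rng.gauss(3.3, 0.15) for _ in range(n)],
}
# By construction, quality depends on alcohol and volatile acidity only;
# pH is pure noise in this toy example.
quality = [
    0.8 * a - 6.0 * v + rng.gauss(0.0, 0.3)
    for a, v in zip(columns["alcohol"], columns["volatile_acidity"])
]

ranking = rank_features(columns, quality)
selected = [name for name, _ in ranking[:2]]  # keep the two best-scoring features
print(ranking)
print(selected)
```

In a fuller pipeline, the columns named in `selected` would then be passed to the neural network as inputs. A univariate ranking like this ignores correlations between features, which is one reason a multivariate technique such as PCA may be used alongside it.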