Selection of features and prediction of wine quality using artificial neural networks


Abstract


The assessment of wine taste quality is a key factor for successful sales in the wine industry, where the aim is to meet consumers' needs. Quality is usually assessed by human experts, which makes the evaluation process expensive and time-consuming.
This study introduces an alternative method for predicting wine quality using machine learning techniques such as linear regression and neural networks.
Our data analysis is based on a real wine dataset provided by an established winery in Greece. First, we determine how quality depends on selected physicochemical features of the wine. We use well-known algorithms to improve the statistical calculations, together with specific methods based on principal component analysis (PCA) and linear regression for selecting the best possible number of variables.
After applying artificial neural networks and testing various combinations of layers, we show how the proposed statistical techniques improve the accuracy of wine quality prediction using the previously selected features.
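The abstract does not state which criterion is used to rank variables, so the following is only a minimal sketch of one common option: fit a linear regression on standardized data and keep the features with the largest absolute coefficients. Everything in it is an assumption made for illustration: the data are synthetic (not the winery dataset), the feature names `alcohol`, `volatile_acidity` and `ph` are hypothetical, and the helper functions are not taken from the paper.

```python
import random

def standardize(column):
    """Rescale a column to zero mean and unit variance."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / std for v in column]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def standardized_coefficients(columns, target):
    """Ordinary least squares on standardized data via the normal equations.

    No intercept is needed because every standardized column has zero mean.
    """
    z = [standardize(c) for c in columns]
    y = standardize(target)
    xtx = [[sum(p * q for p, q in zip(zi, zj)) for zj in z] for zi in z]
    xty = [sum(p * q for p, q in zip(zi, y)) for zi in z]
    return solve(xtx, xty)

# Synthetic stand-in data (assumption): quality depends on the first two
# hypothetical features only; the third is pure noise with respect to quality.
random.seed(0)
names = ["alcohol", "volatile_acidity", "ph"]
n_samples = 200
columns = [[random.gauss(0, 1) for _ in range(n_samples)] for _ in names]
quality = [2.0 * a - 1.0 * v + random.gauss(0, 0.1)
           for a, v, _ in zip(*columns)]

coefs = standardized_coefficients(columns, quality)

# Rank features by absolute standardized coefficient and keep the top two.
ranking = sorted(names, key=lambda name: -abs(coefs[names.index(name)]))
selected = ranking[:2]
print(ranking, selected)
```

On real data the selected subset would then be passed to the prediction model (a neural network in the study); the cut-off of two features here is arbitrary and would normally be chosen by validation.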

DOI Code: 10.1285/i20705948v14n2p389

Keywords: Linear regression; neural networks; physicochemical properties; prediction; statistical methods; wines





This work is licensed under a Creative Commons Attribution - NonCommercial - NoDerivs 3.0 Italy License.