A hybrid method for the extraction and classification of product features from user-generated contents


Abstract – The research presented in this paper focuses on the automatic management of knowledge about experience goods and services and their features, starting from real texts that Internet users generate online. We describe an experiment on a dataset of product reviews, on which we tested a set of rule-based and statistical solutions. The main goals are review classification, the extraction of relevant product features, and the systematization of those features into product-driven ontologies. Feature extraction is performed through a rule-based strategy grounded in SentIta, an Italian collection of subjective lexical resources. Features and reviews are classified with a distributional semantics algorithm. Finally, we address the organization of the extracted knowledge by integrating the subjective information produced by Internet users into a product-driven ontology. The Natural Language Processing (NLP) tool used in this work is LG-Starship, a hybrid framework for processing Italian texts based on Lexicon-Grammar theory.
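The abstract names three steps: lexicon-grounded rule-based feature extraction, classification of reviews, and population of a product-driven ontology. The following Python sketch only illustrates the general shape of such a pipeline and rests on several assumptions. The tiny `SENTIMENT_LEXICON` is a hypothetical stand-in for SentIta, with invented entries and scores. The extraction rule (take the closest content word before a sentiment adjective) is a simplification and not the paper's Lexicon-Grammar rules. Classification uses a nearest-centroid cosine classifier over word counts, substituted for the paper's distributional semantic algorithm, whose details are not given here. A nested dictionary stands in for the ontology. All names (`extract_features`, `classify`, `populate`, `TRAIN`) are hypothetical and are not part of LG-Starship.

```python
import math
import re
from collections import Counter

# Hypothetical toy stand-in for a SentIta-style lexicon: surface-form
# Italian adjectives mapped to invented polarity scores.
SENTIMENT_LEXICON = {"ottimo": 1.0, "buono": 0.5, "lento": -0.5, "pessimo": -1.0}

# Hypothetical function words skipped when looking back for the feature.
SKIP = {"il", "lo", "la", "è", "molto"}

# Hypothetical labelled examples used to build one centroid per class.
TRAIN = [
    ("la camera è pulita e il letto comodo", "hotel"),
    ("la pizza e il vino sono buoni", "ristorante"),
]


def tokenize(text):
    """Lowercase and split on Unicode word characters (no lemmatization)."""
    return re.findall(r"\w+", text.lower())


def extract_features(text, window=3):
    """Assumed rule: the closest non-skipped, non-adjective token within
    `window` tokens before a sentiment adjective is the evaluated feature."""
    tokens = tokenize(text)
    pairs = []
    for i, tok in enumerate(tokens):
        if tok not in SENTIMENT_LEXICON:
            continue
        # Walk backwards from i-1 down to i-window (never below index 0).
        for j in range(i - 1, max(i - window, 0) - 1, -1):
            if tokens[j] not in SKIP and tokens[j] not in SENTIMENT_LEXICON:
                pairs.append((tokens[j], SENTIMENT_LEXICON[tok]))
                break
    return pairs


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroids(labeled):
    """Sum the word-count vectors per class; summing suffices because
    cosine similarity ignores vector length."""
    cents = {}
    for text, label in labeled:
        cents.setdefault(label, Counter()).update(tokenize(text))
    return cents


def classify(text, cents):
    """Assign the class whose centroid is most similar to the text."""
    vec = Counter(tokenize(text))
    return max(cents, key=lambda label: cosine(vec, cents[label]))


def populate(reviews, cents):
    """Attach each (feature, polarity) pair under the review's predicted
    class: a flat dict standing in for a product-driven ontology."""
    tree = {}
    for text in reviews:
        label = classify(text, cents)
        for feature, score in extract_features(text):
            tree.setdefault(label, {}).setdefault(feature, []).append(score)
    return tree


if __name__ == "__main__":
    cents = centroids(TRAIN)
    print(extract_features("il servizio è ottimo ma il wifi è lento"))
    # → [('servizio', 1.0), ('wifi', -0.5)]
    print(classify("il letto della camera è comodo", cents))
    # → hotel
```

A real system would lemmatize tokens (so that, for example, feminine forms such as "ottima" also match), handle negation, and replace raw word counts with distributional vectors learned from a large corpus. The sketch leaves these out to keep each step visible.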

DOI Code: 10.1285/i22390359v22p137

Keywords: feature extraction; review classification; opinion mining; distributional semantics; feature ontology


This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Italy License.