A hybrid method for the extraction and classification of product features from user-generated contents


Abstract – This paper addresses the automatic management of knowledge about experience goods and services and their features, starting from real texts written online by internet users. We describe an experiment on a dataset of product reviews, on which we tested a set of rule-based and statistical solutions. The main goals are review classification, the extraction of relevant product features, and their systematization into product-driven ontologies. Feature extraction is performed through a rule-based strategy grounded on SentIta, an Italian collection of subjective lexical resources. Features and reviews are classified by means of a distributional semantic algorithm. Finally, we address the organization of the extracted knowledge by integrating the subjective information produced by internet users into a product-driven ontology. The Natural Language Processing (NLP) tool used in this work is LG-Starship, a hybrid framework for Italian text processing based on Lexicon-Grammar theory.

DOI Code: 10.1285/i22390359v22p137

Keywords: feature extraction; review classification; opinion mining; distributional semantics; feature ontology


