Digital voice-of-customer processing

15 March, 2023

Web 2.0 has revolutionized the way we interact online, enabling increased user participation and collaboration. This has led to the rise of digital Voice of Customer (VoC) as a valuable source of information for businesses seeking to understand customer needs, opinions, and expectations. Machine learning techniques, such as topic modelling algorithms, have been developed to analyze large collections of digital VoC and extract relevant information. However, an issue that remains critical is the validation of the results obtained from these algorithms.

A recent paper proposes a structured approach for validating the results of topic modelling algorithms used in quality management applications. Digital VoC, particularly online reviews, is a low-cost, unbiased, and reliable source of information for understanding customer opinions and requirements. Topic modelling algorithms can identify latent topics running through a collection of unstructured textual documents, but standardized procedures to evaluate their outputs are still lacking.

The paper addresses how to validate topic modelling algorithms, which identify latent topics in a set of documents. The most commonly used metric for evaluating their performance is the held-out likelihood, which measures how likely new, unseen documents are under the fitted model. Other metrics include semantic coherence and exclusivity, but they do not fully capture the semantic meaning of the topics.

Therefore, the paper proposes a supervised approach to validate the results of topic modelling algorithms, which involves human evaluators classifying a sample of documents based on identified quality determinants. The proposed approach can provide a more comprehensive assessment of the quality of the identified topics. A case study on car-sharing provider reviews is presented to illustrate the method.

The proposed method uses dynamic thresholds to identify the most relevant topics in each document, applying the Tukey fence, a non-parametric outlier detection method, to the distribution of the document's topic weights. The topics assigned by human evaluators are then compared with those generated by the topic modelling algorithm to calculate validation metrics such as accuracy, precision, recall, and negative predictive value. According to the paper, this identifies relevant topics more accurately and allows a preliminary assessment of the goodness of results.
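The dynamic threshold can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the fence is applied to one document's topic proportions, uses the conventional multiplier k = 1.5, and computes quartiles with the "inclusive" convention. The function name and the sample proportions are hypothetical.

```python
from statistics import quantiles


def relevant_topics(topic_weights, k=1.5):
    """Return indices of topics whose weight lies above the upper Tukey fence.

    Assumptions (not taken from the paper): k = 1.5 and quartiles computed
    with the "inclusive" convention of statistics.quantiles.
    """
    q1, _, q3 = quantiles(topic_weights, n=4, method="inclusive")
    upper_fence = q3 + k * (q3 - q1)  # threshold adapts to each document
    return [i for i, w in enumerate(topic_weights) if w > upper_fence]


# Hypothetical topic proportions of a single review over 8 topics.
doc = [0.02, 0.03, 0.55, 0.04, 0.03, 0.28, 0.02, 0.03]
print(relevant_topics(doc))  # [2, 5]
```

Because the fence is recomputed for every document, a review dominated by one topic and a review split between two topics each get their own cut-off, which a single fixed threshold cannot provide. Other quartile conventions shift the fence and may change which topics are flagged.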

The paper provides a practical method for compiling a confusion matrix, a dynamic threshold for assigning topics, and a comprehensive set of metrics for comparing the outputs of topic modelling algorithms with human-supervised classification. The proposed metrics have a well-defined co-domain and specific target values, allowing an immediate evaluation of the quality of the results obtained. Overall, the paper highlights the need for supervised validation and proposes a structured procedure that can become a reference for practitioners who face the problem of empirically validating the results of an analysis of digital VoC.
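As a rough illustration of how the confusion matrix leads to the metrics named above, the sketch below compares binary human labels with binary model assignments for one topic. The function name and the sample labels are hypothetical and do not come from the paper's case study; the formulas are the standard definitions of accuracy, precision, recall, and negative predictive value.

```python
def validation_metrics(human, model):
    """Compare human labels with model assignments (1 = topic assigned)."""
    pairs = list(zip(human, model))
    tp = sum(1 for h, m in pairs if h and m)          # both assign the topic
    tn = sum(1 for h, m in pairs if not h and not m)  # neither assigns it
    fp = sum(1 for h, m in pairs if not h and m)      # only the model assigns it
    fn = sum(1 for h, m in pairs if h and not m)      # only the human assigns it

    def ratio(num, den):
        # An empty denominator leaves the metric undefined.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical labels for ten reviews and a single topic.
human = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
model = [1, 0, 0, 0, 0, 1, 1, 0, 1, 0]
print(validation_metrics(human, model))
```

Each metric lies in the interval [0, 1], and a value of 1 means perfect agreement with the human classification, which is what makes the results immediately interpretable.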

Authors

Barravecchia, F., Mastrogiacomo, L., & Franceschini, F. (2022). Digital voice-of-customer processing by topic modelling algorithms: insights to validate empirical results. International Journal of Quality & Reliability Management, 39(6), 1453-1470. https://doi.org/10.1108/IJQRM-07-2021-0217
 
