Bibliographic data: a different analysis perspective


A bibliographic record for a research product comprises several kinds of information: authors, year, source, publisher, keywords, abstract, citations, and so on. Citations usually play a central role in bibliometric analysis. The study of textual information offers a different analysis perspective. The idea is that documents are mixtures of latent topics, where a topic is a probability distribution over words. In this paper we show how the scientific productivity of a research group can be described using topic models. Moreover, for the same sample, we test whether the other bibliometric measures follow the known distribution laws.
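To make the mixture idea concrete, the following is a minimal sketch rather than the method used in the paper. The vocabulary, the two topics, and the mixture weights are all invented for illustration. It computes the word distribution of one document as the weighted sum of its topics' word distributions.

```python
# Hypothetical toy example: vocabulary, topics and weights are assumptions,
# not data or estimates from the paper.
vocabulary = ["citation", "journal", "topic", "model"]

# Each topic is a probability distribution over the vocabulary (each row sums to 1).
topics = [
    [0.5, 0.4, 0.05, 0.05],   # a "bibliometrics"-like topic
    [0.05, 0.05, 0.5, 0.4],   # a "text mining"-like topic
]

# A document is a mixture of topics (the weights sum to 1).
doc_topic_weights = [0.25, 0.75]


def word_distribution(topic_weights, topics):
    """Return p(word | document) = sum_k p(topic k | document) * p(word | topic k)."""
    n_words = len(topics[0])
    return [
        sum(w * topic[j] for w, topic in zip(topic_weights, topics))
        for j in range(n_words)
    ]


doc_word_probs = word_distribution(doc_topic_weights, topics)
for word, p in zip(vocabulary, doc_word_probs):
    print(f"{word}: {p:.4f}")
```

In a fitted topic model both the topics and the per-document weights are latent and have to be estimated from the text; here they are fixed by hand only to show how the two levels combine.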

DOI Code: 10.1285/i20705948v5n3p353

Keywords: Text mining; topic models; bibliometrics; distribution laws


This work is licensed under a Creative Commons Attribution - NonCommercial - NoDerivs 3.0 Italy License.