A Modification of the Leacock-Chodorow Measure of the Semantic Relatedness of Concepts

Korzeniewski, Jerzy

dc.contributor.author	Korzeniewski, Jerzy
dc.date.accessioned	2021-03-05T12:48:56Z
dc.date.available	2021-03-05T12:48:56Z
dc.date.issued	2020-12-15
dc.identifier.issn	0208-6018
dc.identifier.uri	http://hdl.handle.net/11089/34127
dc.description.abstract	Measures of the semantic relatedness of concepts fall into two types: knowledge‑based methods and corpus‑based methods. Knowledge‑based techniques use human‑made dictionaries, thesauri and other artefacts as a source of knowledge. Corpus‑based techniques assess the semantic similarity of two concepts using large corpora of text documents. Some researchers claim that knowledge‑based measures outperform corpus‑based ones, but it is more important to observe that the latter are heavily corpus dependent. In this article, we propose a modification of the best WordNet‑based method of assessing semantic relatedness, i.e. the Leacock‑Chodorow measure. This measure has proven to be the best in several studies and has a very simple formula. We assess our proposal on two popular benchmark sets of pairs of concepts, i.e. the Rubenstein‑Goodenough set of 65 pairs of concepts and the Finkelstein set of 353 pairs of terms. The results show that our proposal outperforms the traditional Leacock‑Chodorow measure.	en
dc.description.abstract	Miary semantycznego podobieństwa pojęć można podzielić na dwa rodzaje: metody oparte na wiedzy i metody oparte na bazie tekstów. Techniki oparte na wiedzy stosują stworzone przez człowieka słowniki oraz inne opracowania. Techniki oparte na bazie tekstów oceniają podobieństwo semantyczne dwóch pojęć, odwołując się do obszernych baz dokumentów tekstowych. Niektórzy badacze twierdzą, że miary oparte na wiedzy są lepsze jakościowo od tych opartych na bazie tekstów, ale o wiele istotniejsze jest to, że te drugie zależą bardzo mocno od użytej bazy tekstów. W niniejszym artykule przedstawiono propozycję modyfikacji najlepszej metody pomiaru semantycznego podobieństwa pojęć, opartej na sieci WordNet, a mianowicie miary Leacock‑Chodorowa. Ta miara była najlepsza w kilku eksperymentach badawczych oraz można zapisać ją za pomocą prostej formuły. Nową propozycję oceniono na podstawie dwóch popularnych benchmarkowych zbiorów par pojęć, tj. zbioru 65 par pojęć Rubensteina‑Goodenougha oraz zbioru 353 par pojęć Finkelsteina. Wyniki pokazują, że przedstawiona propozycja spisała się lepiej od tradycyjnej miary Leacock‑Chodorowa.	pl
dc.language.iso	en
dc.publisher	Wydawnictwo Uniwersytetu Łódzkiego	pl
dc.relation.ispartofseries	Acta Universitatis Lodziensis. Folia Oeconomica;351	en
dc.rights.uri	https://creativecommons.org/licenses/by/4.0
dc.subject	text mining	en
dc.subject	WordNet network	en
dc.subject	semantic relatedness	en
dc.subject	Leacock-Chodorow measure	en
dc.subject	badanie tekstu	pl
dc.subject	Sieć WordNet	pl
dc.subject	podobieństwo semantyczne słów	pl
dc.subject	miara Leacock‑Chodorowa	pl
dc.title	A Modification of the Leacock-Chodorow Measure of the Semantic Relatedness of Concepts	en
dc.title.alternative	Modyfikacja miary semantycznego podobieństwa pojęć Leacock‑Chodorowa	pl
dc.type	Article
dc.page.number	97-106
dc.contributor.authorAffiliation	University of Łódź, Faculty of Economics and Sociology, Department of Demography, Łódź, Poland	en
dc.identifier.eissn	2353-7663
dc.references	Bird S., Loper E., Klein E. (2009), Natural Language Processing with Python, O’Reilly Media Inc., Sebastopol.	en
dc.references	Budanitsky A., Hirst G. (2006), Evaluating WordNet‑based Measures of Lexical Semantic Relatedness, “Computational Linguistics”, vol. 32, issue 1, pp. 13–47.	en
dc.references	Fellbaum Ch. (ed.) (1998), WordNet: An Electronic Lexical Database, The MIT Press, Cambridge.	en
dc.references	Hirst G., St‑Onge D. (1998), Lexical chains as representations of context for the detection and correction of malapropisms, [in:] Ch. Fellbaum (ed.), WordNet: An Electronic Lexical Database, The MIT Press, Cambridge, pp. 305–332.	en
dc.references	Jiang J., Conrath D. (1997), Semantic similarity based on corpus statistics and lexical taxonomy, Proceedings of International Conference on Research in Computational Linguistics, Taiwan, pp. 19–33.	en
dc.references	Leacock C., Chodorow M. (1998), Combining local context and WordNet similarity for word sense identification, [in:] Ch. Fellbaum (ed.), WordNet: An Electronic Lexical Database, The MIT Press, Cambridge, pp. 265–283.	en
dc.references	Lin D. (1998), Automatic retrieval and clustering of similar words, Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING–ACL ’98), Montreal, pp. 296–304.	en
dc.references	McInnes B., Pedersen T., Liu Y., Melton G., Pakhomov S. (2014), U‑path: An undirected path‑based measure of semantic similarity, Proceedings of the Annual Symposium of the American Medical Informatics Association, Washington, pp. 882–891.	en
dc.references	Resnik P. (1995), Using information content to evaluate semantic similarity, Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, pp. 448–453.	en
dc.references	Wu Z., Palmer M. (1994), Verbs semantics and lexical selection, Proceedings of the 32nd annual meeting on Association for Computational Linguistics, ACL ’94, Association for Computational Linguistics, Stroudsburg, pp. 133–138.	en
dc.references	Zugang C., Jia S., Yaping Y. (2018), An Approach to Measuring Semantic Relatedness of Geographic Terminologies Using a Thesaurus and Lexical Database Sources, “International Journal of Geo‑Information”, vol. 7(3), pp. 98–12.	en
dc.contributor.authorEmail	jerzy.korzeniewski@uni.lodz.pl
dc.identifier.doi	10.18778/0208-6018.351.06
dc.relation.volume	6
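
The abstract above notes that the Leacock‑Chodorow measure has a very simple formula. As a minimal sketch, the standard (unmodified) measure is usually written as -log(len / (2 * D)), where len is the number of nodes on the shortest is-a path between two concepts and D is the maximum depth of the taxonomy. The toy taxonomy and all function names below are assumptions made for illustration; they are not WordNet data and not the modification proposed in the article, which this record does not describe.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent); NOT WordNet data.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def path_to_root(concept):
    """Return the list of nodes from `concept` up to the root, inclusive."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def path_length_in_nodes(c1, c2):
    """Count the nodes on the shortest is-a path between c1 and c2."""
    up_from_c2 = {node: steps for steps, node in enumerate(path_to_root(c2))}
    for steps, node in enumerate(path_to_root(c1)):
        if node in up_from_c2:
            # edges from c1 + edges from c2 + 1 converts edges to nodes
            return steps + up_from_c2[node] + 1
    raise ValueError("concepts share no common ancestor")


def max_depth():
    """Maximum taxonomy depth, counted in nodes from the root."""
    return max(len(path_to_root(c)) for c in PARENT)


def lch_similarity(c1, c2):
    """Standard Leacock-Chodorow score: -log(len / (2 * D))."""
    return -math.log(path_length_in_nodes(c1, c2) / (2.0 * max_depth()))


if __name__ == "__main__":
    for pair in [("dog", "dog"), ("dog", "cat"), ("dog", "car")]:
        print(pair, round(lch_similarity(*pair), 4))
```

In this sketch, closer concepts receive higher scores: an identical pair scores highest, siblings score lower, and concepts joined only at the root score lowest. Evaluating such scores against human judgements, as the article does with the two benchmark sets, would require the real WordNet hierarchy rather than this toy tree.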

Files in this item

Name: 7817-Article_Text-24558-1-10-2 ...
Size: 680.0KB
Format: PDF

This item appears in the following collections

Acta Universitatis Lodziensis. Folia Oeconomica nr 351(6)/2020 [6]

Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by/4.0