Measuring associational thinking through word embeddings

Periñán-Pascual, Carlos

oai:riunet.upv.es:10251/188802

Authors: Carlos Periñán-Pascual
Publication date: 1 March 2022
Publisher: Springer-Verlag
DOI: https://doi.org/10.1007/s10462-021-10056-6

Abstract

The development of a model to quantify semantic similarity and relatedness between words has been the major focus of many studies in various fields, e.g. psychology, linguistics, and natural language processing. Unlike the measures proposed by most previous research, this article aims to estimate automatically the strength of association between words that may or may not be semantically related. We demonstrate that the performance of the model depends not only on the combination of independently constructed word embeddings (namely, corpus- and network-based embeddings) but also on the way these word vectors interact. The research concludes that the weighted average of the cosine-similarity coefficients derived from independent word embeddings in a double vector space tends to yield high correlations with human judgements. Moreover, we demonstrate that evaluating word associations through a measure that relies not only on the rank ordering of word pairs but also on the strength of associations can reveal some findings that go unnoticed by traditional measures such as Spearman's and Pearson's correlation coefficients.

Financial support for this research has been provided by the Spanish Ministry of Science, Innovation and Universities [grant number RTC 2017-6389-5], the Spanish "Agencia Estatal de Investigación" [grant number PID2020-112827GB-I00 / AEI / 10.13039/501100011033], and the European Union's Horizon 2020 research and innovation program [grant number 101017861: project SMARTLAGOON]. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.

Periñán-Pascual, C. (2022). Measuring associational thinking through word embeddings. Artificial Intelligence Review. 55(3):2065-2102. https://doi.org/10.1007/s10462-021-10056-6
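The weighted average of cosine-similarity coefficients mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes each word has one vector in a corpus-based space and one in a network-based space. The function names, the default weight of 0.5, and the toy vectors are hypothetical, since the abstract does not state how the weights are chosen or how the embeddings are built.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def association_strength(
    corpus_u: Sequence[float],
    corpus_v: Sequence[float],
    network_u: Sequence[float],
    network_v: Sequence[float],
    weight: float = 0.5,  # hypothetical weight given to the corpus-based space
) -> float:
    """Weighted average of the cosine similarities from two independent spaces."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    corpus_sim = cosine(corpus_u, corpus_v)
    network_sim = cosine(network_u, network_v)
    return weight * corpus_sim + (1.0 - weight) * network_sim


# Toy vectors (made up for illustration): the two words are identical in the
# corpus-based space and orthogonal in the network-based space.
score = association_strength([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
print(score)
```

Keeping the two spaces separate and combining only the similarity scores means the vectors never need to share a dimensionality or a coordinate system, which is one reason a score-level combination is simple to apply to independently built embeddings.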

Last time updated on 01/11/2022

This paper was published in RiuNet.
