Alipour, Ghafour and Bagherzadeh Mohasefi, Jamshid and Feizi-Derakhshi, Mohammad-Reza (2022) Learning Bilingual Word Embedding Mappings with Similar Words in Related Languages Using GAN. Applied Artificial Intelligence, 36 (1). ISSN 0883-9514
Learning Bilingual Word Embedding Mappings with Similar Words in Related Languages Using GAN.pdf - Published Version
Abstract
Cross-lingual word embeddings represent words from different languages in the same vector space. They support reasoning about semantics and comparing word meanings across languages and in multilingual contexts, which is necessary for bilingual lexicon induction, machine translation, and cross-lingual information retrieval. This paper proposes an efficient approach to learning a bilingual transformation mapping between monolingual word embeddings in language pairs. We choose ten languages from three language families and download their most recent Wikipedia dumps. Then, after some pre-processing steps, we use word2vec to produce word embeddings for them. We select seven language pairs from the chosen languages. Since the selected languages are related, they share thousands of identically spelled words with similar meanings. Using these identically spelled words and the word embedding models of each language, we create training, validation, and test sets for the language pairs. We then use a generative adversarial network (GAN) to learn the transformation mapping between the word embeddings of the source and target languages. The average accuracy of our proposed method across all language pairs is 71.34%. The highest accuracy, 78.32%, is achieved for the Turkish-Azerbaijani language pair, which is noticeably higher than prior methods.
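The data-preparation and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 80/10/10 split, and the toy embeddings are all assumptions made for illustration, and an ordinary least-squares linear map stands in for the GAN-trained mapping so that the example stays short and runnable with NumPy alone.

```python
import numpy as np

def shared_word_pairs(src_vocab, tgt_vocab):
    """Words spelled identically in both vocabularies (the seed dictionary)."""
    return sorted(set(src_vocab) & set(tgt_vocab))

def split_pairs(words, seed=0, train=0.8, val=0.1):
    """Shuffle and split into train/validation/test (ratios are assumed)."""
    rng = np.random.default_rng(seed)
    shuffled = [words[i] for i in rng.permutation(len(words))]
    n_train = int(train * len(words))
    n_val = int(val * len(words))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def fit_linear_map(X, Y):
    """Least-squares W with X @ W ~= Y. A stand-in, NOT the paper's GAN."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def precision_at_1(W, src_emb, tgt_emb, words):
    """Share of words whose mapped vector is nearest (cosine) to the same word."""
    tgt_words = list(tgt_emb)
    T = np.stack([tgt_emb[w] for w in tgt_words])
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    hits = 0
    for w in words:
        v = src_emb[w] @ W
        v = v / np.linalg.norm(v)
        hits += int(tgt_words[int(np.argmax(T @ v))] == w)
    return hits / len(words)

# Toy data (hypothetical): target vectors are a rotation of the source vectors.
rng = np.random.default_rng(0)
vocab = [f"w{i}" for i in range(30)]
src_emb = {w: rng.normal(size=5) for w in vocab}
R = np.linalg.qr(rng.normal(size=(5, 5)))[0]
tgt_emb = {w: src_emb[w] @ R for w in vocab[:25]}  # 25 shared words
tgt_emb.update({f"t{i}": rng.normal(size=5) for i in range(5)})  # target-only words

pairs = shared_word_pairs(src_emb, tgt_emb)
train, val, test = split_pairs(pairs)
W = fit_linear_map(np.stack([src_emb[w] for w in train]),
                   np.stack([tgt_emb[w] for w in train]))
acc = precision_at_1(W, src_emb, tgt_emb, test)
print(f"test precision@1: {acc:.2f}")
```

In practice the embeddings would come from word2vec models trained on each language's Wikipedia dump, and the mapping would be learned adversarially rather than in closed form; the seed-dictionary construction and nearest-neighbor scoring would remain similar in spirit.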
Item Type: | Article |
---|---|
Subjects: | GO for STM > Computer Science |
Depositing User: | Unnamed user with email support@goforstm.com |
Date Deposited: | 15 Jun 2023 08:09 |
Last Modified: | 03 Nov 2023 03:55 |
URI: | http://archive.article4submit.com/id/eprint/1082 |