Unsupervised does not mean uninterpretable: The case for word sense induction and disambiguation

Authors: Alexander Panchenko; Eugen Ruppert; Stefano Faralli (ORCID 0000-0003-3684-8815); Simone Paolo Ponzetto (ORCID 0000-0001-7484-2049); Chris Biemann
Type: Dataset
Year: 2017
Language: English
DOI: 10.7801/602
Additional URL: https://zenodo.org/records/485151
License: CC BY 4.0

This dataset contains the models for interpretable Word Sense Disambiguation (WSD) that were used in Panchenko et al. (2017). The paper can be accessed at https://www.lt.informatik.tu-darmstadt.de/fileadmin/user_upload/Group_LangTech/publications/EACL_Interpretability___FINAL__1_.pdf. The files were computed from a 2015 dump of the English Wikipedia. Their contents:

Induced Sense Inventories: wp_stanford_sense_inventories.tar.gz
    Contains three inventories (coarse, medium, fine).

Language Model (3-gram): wiki_text.3.arpa.gz
    Contains all n-grams up to n=3 and can be loaded into an index.

Weighted Dependency Features: wp_stanford_lemma_LMI_s0.0_w2_f2_wf2_wpfmax1000_wpfmin2_p1000.gz
    Contains weighted combinations of words and context features, each with its count and an LMI significance score.

Distributional Thesaurus (DT) of Dependency Features: wp_stanford_lemma_BIM_LMI_s0.0_w2_f2_wf2_wpfmax1000_wpfmin2_p1000_simsortlimit200_feature expansion.gz
    Contains a DT of context features. The context feature similarities can be used for context expansion.

For further information, consult the paper and the companion page: http://jobimtext.org/wsd/

Related publication: Panchenko, Alexander; Ruppert, Eugen; Faralli, Stefano; Ponzetto, Simone Paolo; Biemann, Chris (2017). Unsupervised does not mean uninterpretable: the case for word sense induction and disambiguation. https://madoc.bib.uni-mannheim.de/id/eprint/42007
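This record names several gzip-compressed files but does not state their column layout, so a cautious first step is to look at a few lines before writing a parser. The following is a minimal sketch using only the Python standard library. The helper name `peek_gzip_lines` is hypothetical, and the sketch assumes that the `.gz` files are UTF-8 text, which the record does not confirm. It does not apply to the `.tar.gz` archive, which would need to be unpacked first.

```python
import gzip
from itertools import islice


def peek_gzip_lines(path, n=5, encoding="utf-8"):
    """Return the first n lines of a gzip-compressed text file.

    The file is streamed, so large models are not read into memory.
    Assumption: the file is text in the given encoding; undecodable
    bytes are replaced rather than raising an error.
    """
    with gzip.open(path, mode="rt", encoding=encoding, errors="replace") as handle:
        # islice stops after n lines, so only the start of the file is decompressed.
        return [line.rstrip("\n") for line in islice(handle, n)]
```

For example, `peek_gzip_lines("wiki_text.3.arpa.gz", 10)` would return the first ten lines of the language model file once it has been downloaded to the working directory. Inspecting the output first avoids guessing at field separators or column order.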