This dataset contains the models for interpretable Word Sense Disambiguation (WSD) used in Panchenko et al. (2017). The paper is available at https://www.lt.informatik.tu-darmstadt.de/fileadmin/user_upload/Group_LangTech/publications/EACL_Interpretability___FINAL__1_.pdf.

The files were computed from a 2015 dump of the English Wikipedia. The dataset contains the following files:

    Induced Sense Inventories: wp_stanford_sense_inventories.tar.gz
        Contains three sense inventories of different granularity (coarse, medium, fine).
    Language Model (3-gram): wiki_text.3.arpa.gz
        Contains all n-grams up to n=3 and can be loaded into an index.
    Weighted Dependency Features: wp_stanford_lemma_LMI_s0.0_w2_f2_wf2_wpfmax1000_wpfmin2_p1000.gz
        Contains weighted word/context-feature pairs, each with its count and an LMI (Lexicographer's Mutual Information) significance score.
    Distributional Thesaurus (DT) of Dependency Features: wp_stanford_lemma_BIM_LMI_s0.0_w2_f2_wf2_wpfmax1000_wpfmin2_p1000_simsortlimit200_feature expansion.gz
        Contains a DT of context features. The similarities between context features can be used for context expansion.
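Assuming the language model file wiki_text.3.arpa.gz follows the standard ARPA back-off format (as its extension suggests), a minimal Python sketch for reading it into an in-memory dictionary could look like this. The helper names read_arpa and load_arpa_gz are hypothetical, and for a model of Wikipedia size a plain Python dict may need a lot of memory.

```python
import gzip
import io
import re
from typing import Dict, Iterable, Optional, Tuple

# Maps an n-gram (tuple of tokens) to (log10 probability, log10 back-off weight or None).
NGramTable = Dict[Tuple[str, ...], Tuple[float, Optional[float]]]

_SECTION = re.compile(r"^\\(\d+)-grams:$")


def read_arpa(lines: Iterable[str], max_order: int = 3) -> NGramTable:
    """Parse ARPA-format lines into an n-gram table (hypothetical helper)."""
    table: NGramTable = {}
    order = 0  # 0 means "not inside an n-gram section"
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        section = _SECTION.match(line)
        if section:
            order = int(section.group(1))
            continue
        if line.startswith("\\"):
            # \data\ or \end\ marker: leave any n-gram section
            order = 0
            continue
        if order == 0 or order > max_order:
            # header lines such as "ngram 1=..." or orders we do not keep
            continue
        parts = line.split()
        logprob = float(parts[0])
        ngram = tuple(parts[1 : 1 + order])
        backoff = float(parts[1 + order]) if len(parts) > 1 + order else None
        table[ngram] = (logprob, backoff)
    return table


def load_arpa_gz(path: str) -> NGramTable:
    """Read a gzip-compressed ARPA file such as wiki_text.3.arpa.gz."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return read_arpa(handle)


if __name__ == "__main__":
    sample = (
        "\\data\\\nngram 1=2\nngram 2=1\n\n"
        "\\1-grams:\n-1.0\t<s>\t-0.5\n-0.7\tbank\n\n"
        "\\2-grams:\n-0.2\t<s> bank\n\n\\end\\\n"
    )
    table = read_arpa(io.StringIO(sample))
    print(table[("<s>", "bank")])  # (-0.2, None)
```

Splitting on any whitespace rather than only on tabs keeps the parser tolerant of both layouts found in ARPA files.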
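LMI is commonly defined as the co-occurrence count multiplied by pointwise mutual information: LMI(w, f) = n(w, f) * log2(n(w, f) * N / (n(w) * n(f))), where N is the total number of word/feature co-occurrences. The sketch below computes this score from raw (word, feature) pairs, assuming the weighted dependency feature file follows this definition. The base-2 logarithm, the function name lmi_scores and the toy feature labels are assumptions, and the count filtering encoded in the file name is not reproduced.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def lmi_scores(pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """LMI(w, f) = n(w, f) * log2(n(w, f) * N / (n(w) * n(f))) for every observed pair."""
    joint = Counter(pairs)               # n(w, f)
    word_totals: Counter = Counter()     # n(w)
    feature_totals: Counter = Counter()  # n(f)
    for (word, feature), count in joint.items():
        word_totals[word] += count
        feature_totals[feature] += count
    total = sum(joint.values())          # N
    return {
        (word, feature): count
        * math.log2(count * total / (word_totals[word] * feature_totals[feature]))
        for (word, feature), count in joint.items()
    }


if __name__ == "__main__":
    # Toy co-occurrences with hypothetical feature labels.
    observed = [("bank", "featA"), ("bank", "featA"), ("bank", "featB"), ("river", "featB")]
    for pair, score in sorted(lmi_scores(observed).items()):
        print(pair, round(score, 3))
```

On the toy input, (river, featB) scores log2(4 * 1 / (1 * 2)) = 1.0, while (bank, featB) gets a negative score because it co-occurs less often than the marginal counts would predict.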
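Context expansion with the DT can be sketched as follows: each context feature observed for a target word is augmented with its most similar features from the DT. This sketch assumes the DT file has one tab-separated line per pair (feature, similar feature, similarity score); that column layout, the helper names read_dt, load_dt_gz and expand_context, and the top_n/per_feature cut-offs are hypothetical and should be checked against the actual file.

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# feature -> [(similar feature, similarity score), ...], most similar first
Thesaurus = Dict[str, List[Tuple[str, float]]]


def read_dt(lines: Iterable[str], top_n: int = 10) -> Thesaurus:
    """Parse DT lines ASSUMED to be 'feature<TAB>similar_feature<TAB>score'."""
    neighbours: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue
        try:
            score = float(parts[2])
        except ValueError:
            continue  # skip header or malformed lines
        if parts[0] != parts[1]:  # ignore self-similarity entries
            neighbours[parts[0]].append((parts[1], score))
    return {
        feature: sorted(similar, key=lambda item: item[1], reverse=True)[:top_n]
        for feature, similar in neighbours.items()
    }


def load_dt_gz(path: str, top_n: int = 10) -> Thesaurus:
    """Read a gzip-compressed DT file (column layout is an assumption)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return read_dt(handle, top_n)


def expand_context(context: Iterable[str], dt: Thesaurus, per_feature: int = 3) -> Set[str]:
    """Return the original context features plus up to per_feature DT neighbours of each."""
    original = list(context)
    expanded = set(original)
    for feature in original:
        for similar, _score in dt.get(feature, [])[:per_feature]:
            expanded.add(similar)
    return expanded


if __name__ == "__main__":
    dt_lines = ["featA\tfeatB\t12.0\n", "featA\tfeatC\t8.0\n", "featA\tfeatA\t99\n"]
    dt = read_dt(dt_lines)
    print(sorted(expand_context(["featA"], dt, per_feature=1)))  # ['featA', 'featB']
```

Only the neighbours of the original features are added, so expansion does not chain through newly added features; this keeps the expanded context close to what was actually observed.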

For further information, consult the paper and the companion page: http://jobimtext.org/wsd/