<?xml version="1.0" encoding="UTF-8" ?>
<abstract xmlns="http://eprints.org/ep2/data/2.0">This dataset contains the models for interpretable Word Sense Disambiguation (WSD) that were employed in Panchenko et al. (2017; the paper can be accessed at https://www.lt.informatik.tu-darmstadt.de/fileadmin/user_upload/Group_LangTech/publications/EACL_Interpretability___FINAL__1_.pdf).&#13;
&#13;
The files were computed from a 2015 dump of the English Wikipedia. Their contents are:&#13;
&#13;
    Induced Sense Inventories: wp_stanford_sense_inventories.tar.gz&#13;
    This file contains 3 inventories (coarse, medium, fine)&#13;
    Language Model (3-gram): wiki_text.3.arpa.gz&#13;
    This file contains all n-grams up to n=3 and can be loaded into an index&#13;
    Weighted Dependency Features: wp_stanford_lemma_LMI_s0.0_w2_f2_wf2_wpfmax1000_wpfmin2_p1000.gz&#13;
    This file contains weighted word and context-feature combinations, each with its count and an LMI significance score&#13;
    Distributional Thesaurus (DT) of Dependency Features: wp_stanford_lemma_BIM_LMI_s0.0_w2_f2_wf2_wpfmax1000_wpfmin2_p1000_simsortlimit200_feature expansion.gz&#13;
    This file contains a DT of context features. The context feature similarities can be used for context expansion&#13;
&#13;
For further information, consult the paper and the companion page: http://jobimtext.org/wsd/</abstract>
