<?xml version='1.0' encoding='utf-8'?>
<eprints xmlns='http://eprints.org/ep2/data/2.0'>
  <eprint id='https://madata.bib.uni-mannheim.de/id/eprint/602'>
    <eprintid>602</eprintid>
    <rev_number>4</rev_number>
    <eprint_status>archive</eprint_status>
    <userid>91485</userid>
    <dir>disk0/00/00/06/02</dir>
    <datestamp>2026-03-30 15:08:41</datestamp>
    <lastmod>2026-04-08 11:27:47</lastmod>
    <status_changed>2026-03-30 15:08:41</status_changed>
    <type>dataset</type>
    <metadata_visibility>show</metadata_visibility>
    <creators>
      <item>
        <name>
          <family>Panchenko</family>
          <given>Alexander</given>
        </name>
      </item>
      <item>
        <name>
          <family>Ruppert</family>
          <given>Eugen</given>
        </name>
      </item>
      <item>
        <name>
          <family>Faralli</family>
          <given>Stefano</given>
        </name>
        <orcid>0000-0003-3684-8815</orcid>
      </item>
      <item>
        <name>
          <family>Ponzetto</family>
          <given>Simone Paolo</given>
        </name>
        <orcid>0000-0001-7484-2049</orcid>
      </item>
      <item>
        <name>
          <family>Biemann</family>
          <given>Chris</given>
        </name>
      </item>
    </creators>
    <title>Unsupervised does not mean uninterpretable: The case for word sense induction and disambiguation</title>
    <subjects>
      <item>004</item>
    </subjects>
    <divisions>
      <item>30510</item>
    </divisions>
    <abstract>This dataset contains the models for interpretable Word Sense Disambiguation (WSD) that were employed in Panchenko et al. (2017; the paper can be accessed at https://www.lt.informatik.tu-darmstadt.de/fileadmin/user_upload/Group_LangTech/publications/EACL_Interpretability___FINAL__1_.pdf).

The files were computed from a 2015 dump of the English Wikipedia. Their contents:

    Induced Sense Inventories: wp_stanford_sense_inventories.tar.gz
    This file contains 3 inventories (coarse, medium, fine)
    Language Model (3-gram): wiki_text.3.arpa.gz
    This file contains all n-grams up to n=3 and can be loaded into an index
    Weighted Dependency Features: wp_stanford_lemma_LMI_s0.0_w2_f2_wf2_wpfmax1000_wpfmin2_p1000.gz
    This file contains weighted combinations of words and context features, each with its count and an LMI significance score
    Distributional Thesaurus (DT) of Dependency Features: wp_stanford_lemma_BIM_LMI_s0.0_w2_f2_wf2_wpfmax1000_wpfmin2_p1000_simsortlimit200_feature expansion.gz
    This file contains a DT of context features. The context feature similarities can be used for context expansion

For further information, consult the paper and the companion page: http://jobimtext.org/wsd/</abstract>
    <ubma_abstract_language>eng</ubma_abstract_language>
    <date>2017</date>
    <id_number>10.7801/602</id_number>
    <ubma_external_identifier>https://zenodo.org/records/485151</ubma_external_identifier>
    <ubma_access>metadata</ubma_access>
    <ubma_eprint_license>cc_by_4</ubma_eprint_license>
    <ubma_publications>
      <item>Panchenko, Alexander and Ruppert, Eugen and Faralli, Stefano and Ponzetto, Simone Paolo and Biemann, Chris (2017), &lt;a href=&apos;https://madoc.bib.uni-mannheim.de/id/eprint/42007&apos; target=&apos;new&apos;&gt;Unsupervised does not mean uninterpretable : the case for word sense induction and disambiguation&lt;/a&gt;</item>
    </ubma_publications>
    <ubma_id_number_checked>FALSE</ubma_id_number_checked>
  </eprint>
</eprints>
