Web Data Commons - Gold Standard for Feature Extraction
Item Type: | Dataset |
---|---|
Title: | Web Data Commons - Gold Standard for Feature Extraction |
Alternative Title: | Gold standard containing manually labeled product entity features for the product categories: phones, televisions, and headphones. |
Date: | 2015 |
Creator: | Petrovski, Petar, Primpeli, Anna, Meusel, Robert and Bizer, Christian |
Divisions: | School of Business Informatics and Mathematics > Wirtschaftsinformatik V (Bizer) |
DDC Classification: | 004 Computer science, internet |
---|---|
Keywords: | product corpus ; gold standard ; product matching ; feature extraction |
Abstract: | We labeled 4 distinct structural units from the HTML pages: (1) Microdata title, (2) Microdata description, (3) HTML tables and (4) HTML lists. The labeled set comprises 500 product entities, with 338 distinct labeled properties in total. It was created by three different annotators. The product entities were labeled as JSON objects. |
URL: | https://madata.bib.uni-mannheim.de/217/ |
---|---|
DOI: | https://doi.org/10.7801/217 |
Availability (Controlled): | Download |
Availability: | Data is available as a JSON file. Files per product category can be found here: http://webdatacommons.org/productcorpus/index.html#toc3 |
Publication(s) (MADOC): | Meusel, Robert, Primpeli, Anna, Meilicke, Christian, Paulheim, Heiko and Bizer, Christian (2015). Exploiting microdata annotations to consistently categorize product offers at web scale |
Project: | Project Title: Web Data Commons - Product Corpus; Project Description: Creation of input data to support and evaluate product matching and product feature extraction methods. |
File | Filename / Infos | Link |
---|---|---|
Text | Filename: WDC_GoldStandard_FeatureExtraction.json | Download (1MB) |
Depositing User: | Anna Primpeli |
---|---|
Date Deposited: | 19 May 2017 11:40 |
Last Modified: | 16 Jun 2017 10:46 |