Web Data Commons - Gold Standard for Product Matching
Item Type: | Dataset |
---|---|
Title: | Web Data Commons - Gold Standard for Product Matching |
Alternative Title: | Gold standard containing manually labeled product entity correspondences for the product categories: phones, televisions, and headphones. |
Date: | 2015 |
Creator: | Petrovski, Petar ; Primpeli, Anna ; Meusel, Robert ; Bizer, Christian |
Divisions: | School of Business Informatics and Mathematics > Wirtschaftsinformatik V (Bizer) |
DDC Classification: | 004 Computer science, internet |
Keywords: | product corpus ; gold standard ; product matching |
Abstract: | We manually generated 1,500 positive correspondences, 500 for each product category: phones, headphones, and televisions. At least one positive correspondence is included for each product in the product catalog. Additionally, to make the matching task more realistic, the annotators also annotated products closely related to the ones in the product catalog, such as phone cases, TV wall mounts, headphone cables, and earbuds. Furthermore, we created additional negative correspondences by exploiting transitive closure: because all products in the product catalog are distinct, every product description on a web page that has a positive correspondence to one catalog product yields a negative correspondence to every other product in the catalog. Using the two approaches, we ended up with 73,500 negative correspondences. |
URL: | https://madata.bib.uni-mannheim.de/218/ |
DOI: | https://doi.org/10.7801/218 |
Availability (Controlled): | Download |
Availability: | Data is available as JSON files. The product catalog and the gold standard can be found here: http://data.dws.informatik.uni-mannheim.de/productcrawl/product-matching-gold-standard/ |
Publication(s) (MADOC): | Meusel, Robert ; Primpeli, Anna ; Meilicke, Christian ; Paulheim, Heiko ; Bizer, Christian (2015), Exploiting microdata annotations to consistently categorize product offers at web scale |
Project: | Project Title: Web Data Commons - Product Corpus. Project Description: Creation of input data to support and evaluate product matching and product feature extraction methods. |
Full text not available from this repository.
Depositing User: | Anna Primpeli |
Date Deposited: | 19 May 2017 11:54 |
Last Modified: | 16 Jun 2017 10:47 |
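
The negative-correspondence step described in the abstract can be illustrated with a minimal Python sketch. It assumes positives are given as (offer, catalog product) pairs and that catalog products are pairwise distinct. The function name `derive_negatives` and all identifiers in the example are hypothetical; they are not taken from the published JSON files, whose schema is not described in this record.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (web offer id, catalog product id); ids are hypothetical


def derive_negatives(catalog_ids: Iterable[str], positives: Iterable[Pair]) -> Set[Pair]:
    """Derive negative pairs from positive pairs, assuming distinct catalog products.

    If an offer matches one catalog product, it cannot match any other
    catalog product, so each such pairing is emitted as a negative.
    """
    catalog = set(catalog_ids)
    positive_set = set(positives)
    negatives: Set[Pair] = set()
    for offer_id, matched_id in positive_set:
        for other_id in catalog:
            # Skip the matched product itself and any pair already labeled positive.
            if other_id != matched_id and (offer_id, other_id) not in positive_set:
                negatives.add((offer_id, other_id))
    return negatives


# Toy example with made-up identifiers.
catalog = ["phone_a", "phone_b", "phone_c"]
positives = [("offer_1", "phone_a"), ("offer_2", "phone_b")]
negatives = derive_negatives(catalog, positives)
```

In this toy example each offer with one positive match yields a negative for each of the two remaining catalog products. The manually annotated negatives for closely related products (cases, mounts, cables) are a separate source and are not modeled here.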