Web Data Commons - Product Data Corpus
Item Type: | Dataset |
---|---|
Title: | Web Data Commons - Product Data Corpus |
Alternative Title: | Product Data Corpus for product matching and product feature extraction |
Date: | 2015 |
Creator: | Bizer, Christian, Petrovski, Petar, Meusel, Robert and Primpeli, Anna |
Divisions: | School of Business Informatics and Mathematics > Wirtschaftsinformatik V (Bizer) |
DDC Classification: |
004 Computer science, internet |
Keywords: | product corpus |
Abstract: | A product data corpus containing over 5.6 million product records retrieved from the 32 most visited shopping websites according to the ranking provided by Alexa. The corpus revolves around three product categories: Mobile Phones, Headphones and Televisions. |
URL: | https://madata.bib.uni-mannheim.de/216/ |
DOI: | https://doi.org/10.7801/216 |
Availability (Controlled): | Download |
Availability: | Data is available as N-Quads and WARC files. Download source: http://data.dws.informatik.uni-mannheim.de/productcrawl/crawl-data-general/ |
Publication(s) (MADOC): |
Meusel, Robert, Primpeli, Anna, Meilicke, Christian, Paulheim, Heiko and Bizer, Christian (2015). Exploiting microdata annotations to consistently categorize product offers at web scale |
Reference URL (External): |
http://webdatacommons.org/productcorpus/index.html |
Project: |
Project Title: | Web Data Commons - Product Corpus |
Project Description: | Creation of input data to support and evaluate product matching and product feature extraction methods. |
Full text not available from this repository.
Depositing User: | Anna Primpeli |
---|---|
Date Deposited: | 19 May 2017 11:28 |
Last Modified: | 16 Jun 2017 10:46 |