Web Data Commons - Web Table Corpus 2015
Item Type: | Dataset |
---|---|
Title: | Web Data Commons - Web Table Corpus 2015 |
Alternative Title: | Web table corpus extracted from the July 2015 Common Crawl |
Date: | July 2015 |
Creator: | Bizer, Christian, Meusel, Robert, Lehmberg, Oliver, Ritze, Dominique and Zope, Sanikumar |
Divisions: | School of Business Informatics and Mathematics > Wirtschaftsinformatik V (Bizer) |
DDC Classification: |
004 Computer science, internet |
---|---|
Keywords: | web tables ; relational tables ; entity tables |
Abstract: | This dataset contains tables from the WDC Web Table Corpus 2015 that can be classified as entity or relational tables. An entity table usually describes exactly one entity with several attributes; the name of the entity itself is not contained in the table but can be inferred from the context. Of all 233 million extracted tables, 139,687,207 tables are of type entity. In relational tables, a set of similar entities is described with one or more attributes. |
URL: | https://madata.bib.uni-mannheim.de/209/ |
---|---|
DOI: | https://doi.org/10.7801/209 |
Availability (Controlled): | Download |
Availability: | The complete corpus consists of 99 archives of JSON files. The complete 2015 corpus is available here: http://webdatacommons.org/webtables/2015/downloadInstructions.html |
Publication(s) (MADOC): |
Lehmberg, Oliver, Ritze, Dominique, Meusel, Robert and Bizer, Christian (2016), A large public corpus of web tables containing time and context metadata
Reference URL (External): |
http://webdatacommons.org/webtables/2015/entitySta...
http://webdatacommons.org/webtables/2015/relationa... http://webdatacommons.org/webtables/2015/downloadI... |
Project: |
Project Title: Web Data Commons - Web Tables
Project Description: The Web contains vast amounts of HTML tables. Most of these tables are used for layout purposes, but a fraction of the tables is also quasi-relational, meaning that they contain structured data describing a set of entities, and are thus useful in application contexts such as data search, table augmentation, knowledge base construction, and for various NLP tasks. The WDC Web Tables data set consists of millions of relational Web tables that are contained in HTML tables found in the Common Crawl. |
Full text not available from this repository.
Depositing User: | Anna Primpeli |
---|---|
Date Deposited: | 15 May 2017 15:35 |
Last Modified: | 16 Jun 2017 10:45 |