Web Data Commons - RDFa, Microdata, and Microformat August 2012 Data Set
Item Type: | Dataset |
---|---|
Title: | Web Data Commons - RDFa, Microdata, and Microformat August 2012 Data Set |
Alternative Title: | Microformat, Microdata and RDFa data from the August 2012 Common Crawl web corpus |
Date: | August 2012 |
Creator: | Bizer, Christian ; Meusel, Robert |
Divisions: | School of Business Informatics and Mathematics > Wirtschaftsinformatik V (Bizer) |
DDC Classification: | 004 Computer science, internet |
Keywords: | microformats ; microdata ; RDFa ; schema.org ; structured data |
Abstract: | Microformat, Microdata, and RDFa data extracted from the August 2012 Common Crawl web corpus. We found structured data in 369 million of the 3 billion HTML pages contained in the crawl (12%). These pages originate from 2.2 million of the 40 million pay-level domains covered by the crawl (5%). Altogether, the extracted data sets consist of 7.3 billion RDF quads. |
URL: | https://madata.bib.uni-mannheim.de/205/ |
DOI: | https://doi.org/10.7801/205 |
Availability (Controlled): | Download |
Availability: | The data is available for download in N-Quads format. Download source: http://webdatacommons.org/structureddata/2012-08/stats/how_to_get_the_data.html |
Publication(s) (MADOC): |
Meusel, Robert ; Bizer, Christian ; Paulheim, Heiko (2015): A web-scale study of the adoption and evolution of the schema.org vocabulary over time
Meusel, Robert ; Paulheim, Heiko (2015): Heuristics for fixing common errors in deployed schema.org microdata
Meusel, Robert ; Petrovski, Petar ; Bizer, Christian (2014): The WebDataCommons Microdata, RDFa and Microformat Dataset Series
Bizer, Christian ; Eckert, Kai ; Meusel, Robert ; Mühleisen, Hannes ; Schuhmacher, Michael ; Völker, Johanna (2013): Deployment of RDFa, Microdata, and Microformats on the Web - A Quantitative Analysis |
Reference URL (External): |
http://webdatacommons.org/structureddata/2012-08/s...
http://webdatacommons.org/structureddata/2012-08/s... |
Project: |
Project Title: Web Data Commons - RDFa, Microdata, and Microformat Data Sets
Project Description: More and more websites have started to embed structured data describing products, people, organizations, places, and events into their HTML pages using markup standards such as RDFa, Microdata, and Microformats. The Web Data Commons project extracts this data from several billion web pages. |
Full text not available from this repository.
Depositing User: | Anna Primpeli |
Date Deposited: | 15 May 2017 14:45 |
Last Modified: | 16 Jun 2017 10:43 |
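The Availability entry states that the data is distributed as N-Quads, a line-based format in which each line holds a subject, a predicate, an object, and an optional graph term, terminated by " .". As a minimal sketch, assuming only that the downloaded files follow that line-based syntax, the Python below parses individual lines with a simplified regular expression and counts quads per graph term. The helper names `parse_nquad` and `count_by_graph` are hypothetical, the regex does not validate IRIs or escape sequences as strictly as the W3C N-Quads grammar does, and reading the graph term as the page the data was extracted from is an assumption. For production use, an RDF library such as rdflib (which can parse `format="nquads"`) would be the more robust choice.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# One RDF term: an IRI, a blank node (label may not end with "."),
# or a quoted literal with an optional language tag or datatype IRI.
_TERM = r'(<[^>]*>|_:[^\s]*[^\s.]|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
# subject predicate object [graph] .
_QUAD = re.compile(rf'^\s*{_TERM}\s+{_TERM}\s+{_TERM}(?:\s+{_TERM})?\s*\.\s*$')


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: Optional[str]  # None when the line is a plain triple


def parse_nquad(line: str) -> Optional[Quad]:
    """Parse one N-Quads line; return None for blank lines, comments, or unmatched lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = _QUAD.match(line)
    if match is None:
        return None
    return Quad(*match.groups())


def count_by_graph(lines: Iterable[str]) -> Counter:
    """Count quads per graph term (assumed here to identify the source page)."""
    counts: Counter = Counter()
    for line in lines:
        quad = parse_nquad(line)
        if quad is not None and quad.graph is not None:
            counts[quad.graph] += 1
    return counts
```

Because iterating over an open text file yields one line at a time, `count_by_graph` can be given a file object directly, which keeps memory use flat even for large files; `.most_common(10)` on the result then lists the graphs contributing the most quads.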