Web Data Commons - RDFa, Microdata, and Microformat October 2016 Data Set
Item Type: | Dataset |
---|---|
Title: | Web Data Commons - RDFa, Microdata, and Microformat October 2016 Data Set |
Alternative Title: | Microformat, Microdata and RDFa data from the October 2016 Common Crawl web corpus |
Date: | October 2016 |
Creator: | Bizer, Christian ORCID: https://orcid.org/0000-0003-2367-0237, Meusel, Robert and Primpeli, Anna |
Divisions: | School of Business Informatics and Mathematics > Wirtschaftsinformatik V (Bizer) |
DDC Classification: | 004 Computer science, internet |
Keywords: | microformats ; microdata ; RDFa ; schema.org ; structured data |
Abstract: | Microformat, Microdata and RDFa data extracted from the October 2016 Common Crawl web corpus. We found structured data within 1.24 billion HTML pages out of the 3.2 billion pages contained in the crawl (38%). These pages originate from 5.63 million different pay-level domains out of the 34 million pay-level domains covered by the crawl (16.5%). Altogether, the extracted data sets consist of 44.2 billion RDF quads. |
URL: | https://madata.bib.uni-mannheim.de/197/ |
DOI: | https://doi.org/10.7801/197 |
Availability (Controlled): | Download |
Availability: | The data is available for download as N-Quads. Download source: http://webdatacommons.org/structureddata/2016-10/stats/how_to_get_the_data.html |
Publication(s) (MADOC): |
Meusel, Robert, Bizer, Christian and Paulheim, Heiko (2015), A web-scale study of the adoption and evolution of the schema.org vocabulary over time
Meusel, Robert and Paulheim, Heiko (2015), Heuristics for fixing common errors in deployed schema.org microdata
Meusel, Robert, Petrovski, Petar and Bizer, Christian (2014), The WebDataCommons Microdata, RDFa and Microformat Dataset Series
Bizer, Christian, Eckert, Kai, Meusel, Robert, Mühleisen, Hannes, Schuhmacher, Michael and Völker, Johanna (2013), Deployment of RDFa, Microdata, and Microformats on the Web - A Quantitative Analysis |
Reference URL (External): |
http://webdatacommons.org/structureddata/2016-10/s...
http://webdatacommons.org/structureddata/2016-10/s... |
Project: |
Project Title: Web Data Commons - RDFa, Microdata, and Microformat Data Sets
Project Description: More and more websites have started to embed structured data describing products, people, organizations, places, and events into their HTML pages using markup standards such as RDFa, Microdata and Microformats. The Web Data Commons project extracts this data from several billion web pages. |
Full text not available from this repository.
Depositing User: | Anna Primpeli |
Date Deposited: | 10 May 2017 15:18 |
Last Modified: | 05 Mar 2024 13:55 |
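The data is distributed as N-Quads: each line holds one RDF statement (subject, predicate, object) followed by a graph label and a terminating period. The following is a minimal Python sketch for reading such lines, assuming the files have already been decompressed and that the fourth element identifies the web page the statement was extracted from. The helper names `parse_nquad` and `count_quads_per_graph` are hypothetical, and the regular expression is a simplification of the W3C N-Quads grammar (it does not validate IRIs or escape sequences and rejects lines with trailing comments). For real processing, a full RDF library such as rdflib would be the safer choice.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Simplified N-Quads term patterns (a sketch, not a W3C-conformant parser).
IRI = r'<[^>]*>'
BNODE = r'_:\S+'
# Quoted literal with backslash escapes, optional language tag or datatype IRI.
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^<[^>]*>)?'

# subject predicate object [graph] .
QUAD_RE = re.compile(
    rf'^({IRI}|{BNODE})\s+({IRI})\s+({IRI}|{BNODE}|{LITERAL})'
    rf'(?:\s+({IRI}|{BNODE}))?\s*\.\s*$'
)


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: Optional[str]  # assumed here to identify the source page


def parse_nquad(line: str) -> Optional[Quad]:
    """Parse one N-Quads line (hypothetical helper).

    Returns None for blank lines, comment lines, and lines the
    simplified pattern does not accept.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith('#'):
        return None
    match = QUAD_RE.match(stripped)
    if match is None:
        return None
    return Quad(*match.groups())


def count_quads_per_graph(lines: Iterable[str]) -> Counter:
    """Count parsed quads per graph label (hypothetical helper)."""
    counts: Counter = Counter()
    for line in lines:
        quad = parse_nquad(line)
        if quad is not None and quad.graph is not None:
            counts[quad.graph] += 1
    return counts
```

For example, `count_quads_per_graph(open(path, encoding="utf-8"))` over a decompressed file would tally how many extracted statements each page contributed. Streaming line by line keeps memory use proportional to the number of distinct graphs rather than to the file size.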