Web Data Commons - RDFa, Microdata, and Microformat November 2015 Data Set
Item Type: | Dataset |
---|---|
Title: | Web Data Commons - RDFa, Microdata, and Microformat November 2015 Data Set |
Alternative Title: | Microformat, Microdata and RDFa data from the November 2015 Common Crawl web corpus |
Date: | November 2015 |
Creator: | Bizer, Christian (ORCID: https://orcid.org/0000-0003-2367-0237); Meusel, Robert; Primpeli, Anna |
Divisions: | School of Business Informatics and Mathematics > Wirtschaftsinformatik V (Bizer) |
DDC Classification: | 004 Computer science, internet |
Keywords: | microformats ; microdata ; RDFa ; schema.org ; structured data |
Abstract: | Microformat, Microdata and RDFa data from the November 2015 Common Crawl web corpus. We found structured data within 541 million HTML pages out of the 1.77 billion pages contained in the crawl (30%). These pages originate from 2.72 million different pay-level-domains out of the 14.41 million pay-level-domains covered by the crawl (19%). Altogether, the extracted data sets consist of 24.38 billion RDF quads. |
URL: | https://madata.bib.uni-mannheim.de/202/ |
DOI: | https://doi.org/10.7801/202 |
Availability (Controlled): | Download |
Availability: | Data is available as N-Quads for downloading. Download source: http://webdatacommons.org/structureddata/2015-11/stats/how_to_get_the_data.html |
Publication(s) (MADOC): |
Meusel Robert, Bizer Christian, and Paulheim Heiko (2015), A web-scale study of the adoption and evolution of the schema.org vocabulary over time
Meusel Robert, and Paulheim Heiko (2015), Heuristics for fixing common errors in deployed schema.org microdata
Meusel Robert, Petrovski Petar, and Bizer Christian (2014), The WebDataCommons Microdata, RDFa and Microformat Dataset Series
Bizer Christian, Eckert Kai, Meusel Robert, Mühleisen Hannes, Schuhmacher Michael, and Völker Johanna (2013), Deployment of RDFa, Microdata, and Microformats on the Web - A Quantitative Analysis |
Reference URL (External): |
http://webdatacommons.org/structureddata/2015-11/s...
http://webdatacommons.org/structureddata/2015-11/s... |
Project: |
Project Title: Web Data Commons - RDFa, Microdata, and Microformat Data Sets
Project Description: More and more websites have started to embed structured data describing products, people, organizations, places, and events into their HTML pages using markup standards such as RDFa, Microdata and Microformats. The Web Data Commons project extracts this data from several billion web pages. |
Full text not available from this repository.
Depositing User: | Anna Primpeli |
Date Deposited: | 15 May 2017 14:05 |
Last Modified: | 05 Mar 2024 13:55 |