Web Data Commons - RDFa, Microdata, and Microformat November 2013 Data Set
Item Type: | Dataset |
---|---|
Title: | Web Data Commons - RDFa, Microdata, and Microformat November 2013 Data Set |
Alternative Title: | Microformat, Microdata and RDFa data from the November 2013 Common Crawl web corpus |
Date: | November 2013 |
Creator: | Bizer, Christian (ORCID: https://orcid.org/0000-0003-2367-0237) and Meusel, Robert |
Divisions: | School of Business Informatics and Mathematics > Wirtschaftsinformatik V (Bizer) |
DDC Classification: | 004 Computer science, internet |
Keywords: | microformats ; microdata ; RDFa ; schema.org ; structured data |
Abstract: | Microformat, Microdata and RDFa data from the November 2013 Common Crawl web corpus. We found structured data within 585 million HTML pages out of the 2.24 billion pages contained in the crawl (26%). These pages originate from 1.7 million different pay-level-domains out of the 12.8 million pay-level-domains covered by the crawl (13%). |
URL: | https://madata.bib.uni-mannheim.de/204/ |
DOI: | https://doi.org/10.7801/204 |
Availability (Controlled): | Download |
Availability: | The data is available for download as N-Quads. Download source: http://webdatacommons.org/structureddata/2013-11/stats/how_to_get_the_data.html |
Publication(s) (MADOC): |
Meusel, Robert; Bizer, Christian; Paulheim, Heiko (2015), A web-scale study of the adoption and evolution of the schema.org vocabulary over time
Meusel, Robert; Paulheim, Heiko (2015), Heuristics for fixing common errors in deployed schema.org microdata
Meusel, Robert; Petrovski, Petar; Bizer, Christian (2014), The WebDataCommons Microdata, RDFa and Microformat Dataset Series
Bizer, Christian; Eckert, Kai; Meusel, Robert; Mühleisen, Hannes; Schuhmacher, Michael; Völker, Johanna (2013), Deployment of RDFa, Microdata, and Microformats on the Web - A Quantitative Analysis |
Reference URL (External): |
http://webdatacommons.org/structureddata/2013-11/s...
http://webdatacommons.org/structureddata/2013-11/s... |
Project: |
Project Title: Web Data Commons - RDFa, Microdata, and Microformat Data Sets
Project Description: More and more websites have started to embed structured data describing products, people, organizations, places, and events into their HTML pages using markup standards such as RDFa, Microdata and Microformats. The Web Data Commons project extracts this data from several billion web pages. |
Full text not available from this repository.
Depositing User: | Anna Primpeli |
Date Deposited: | 15 May 2017 14:31 |
Last Modified: | 05 Mar 2024 13:56 |