Web Data Commons - RDFa, Microdata, and Microformat 2009/2010 Data Set
Item Type: | Dataset |
---|---|
Title: | Web Data Commons - RDFa, Microdata, and Microformat 2009/2010 Data Set |
Alternative Title: | Microformat, Microdata and RDFa data from 2009 Common Crawl web corpus |
Date: | 2009 |
Creator: | Bizer, Christian ORCID: https://orcid.org/0000-0003-2367-0237 and Meusel, Robert |
Divisions: | School of Business Informatics and Mathematics > Wirtschaftsinformatik V (Bizer) |
DDC Classification: | 004 Computer science, internet |
---|---|
Keywords: | microformats ; microdata ; RDFa ; schema.org ; structured data |
Abstract: | Microformat, Microdata and RDFa data from the 2009 Common Crawl web corpus. We found structured data within 147 million HTML pages out of the 2 billion pages contained in the crawl (5%). These pages originate from 19 million different pay-level-domains. Altogether, the extracted data sets consist of 5 billion RDF quads. |
URL: | https://madata.bib.uni-mannheim.de/206/ |
---|---|
DOI: | https://doi.org/10.7801/206 |
Availability (Controlled): | Download |
Availability: | Data is available as N-Quads for downloading. Download source: http://webdatacommons.org/structureddata/2010-09/stats/how_to_get_the_data.html |
Publication(s) (MADOC): |
Meusel, Robert and Bizer, Christian and Paulheim, Heiko (2015), A web-scale study of the adoption and evolution of the schema.org vocabulary over time
Meusel, Robert and Paulheim, Heiko (2015), Heuristics for fixing common errors in deployed schema.org microdata
Meusel, Robert and Petrovski, Petar and Bizer, Christian (2014), The WebDataCommons Microdata, RDFa and Microformat Dataset Series
Bizer, Christian and Eckert, Kai and Meusel, Robert and Mühleisen, Hannes and Schuhmacher, Michael and Völker, Johanna (2013), Deployment of RDFa, Microdata, and Microformats on the Web - A Quantitative Analysis |
Reference URL (External): |
http://webdatacommons.org/structureddata/2010-09/s...
http://webdatacommons.org/structureddata/2010-09/s... |
Project: |
Project Title: Web Data Commons - RDFa, Microdata, and Microformat Data Sets
Project Description: More and more websites have started to embed structured data describing products, people, organizations, places, and events into their HTML pages using markup standards such as RDFa, Microdata and Microformats. The Web Data Commons project extracts this data from several billion web pages. |
Full text not available from this repository.
Depositing User: | Anna Primpeli |
---|---|
Date Deposited: | 15 May 2017 14:54 |
Last Modified: | 05 Mar 2024 13:41 |