Microformat, Microdata and RDFa data extracted from the December 2014 Common Crawl web corpus. We found structured data within 620 million HTML pages out of the 2.01 billion pages contained in the crawl (roughly 30%). These pages originate from 2.72 million different pay-level domains out of the 15.68 million pay-level domains covered by the crawl (roughly 17%). Altogether, the extracted data sets consist of 20.48 billion RDF quads.
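
As a quick sanity check on the figures above, the shares and a couple of derived averages can be recomputed from the rounded counts. This is only an illustrative sketch: the variable names and the `share` helper are hypothetical, and the inputs are the rounded numbers quoted in the text, not the exact counts from the extraction.

```python
# Rounded figures as quoted in the text (assumption: exact counts differ slightly).
pages_with_data = 620e6      # HTML pages containing structured data
pages_total = 2.01e9         # pages in the December 2014 crawl
plds_with_data = 2.72e6      # pay-level domains with structured data
plds_total = 15.68e6         # pay-level domains covered by the crawl
quads_total = 20.48e9        # extracted RDF quads


def share(part: float, whole: float) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


page_share = share(pages_with_data, pages_total)
pld_share = share(plds_with_data, plds_total)

# Derived averages; these are not stated in the text, only implied by it.
quads_per_page = quads_total / pages_with_data
pages_per_pld = pages_with_data / plds_with_data

print(f"pages with structured data: {page_share:.1f}%")
print(f"pay-level domains with structured data: {pld_share:.1f}%")
print(f"average quads per page with data: {quads_per_page:.1f}")
print(f"average pages with data per pay-level domain: {pages_per_pld:.1f}")
```

Because the inputs are rounded, the recomputed shares can differ from the quoted percentages by up to about one point. The derived averages are means over a distribution that is likely highly skewed, so they should not be read as typical per-page or per-domain values.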