This dataset contains Microformat, Microdata, and RDFa data extracted from the August 2012 Common Crawl web corpus. We found structured data in 369 million of the 3 billion HTML pages contained in the crawl (12%). These pages originate from 2.2 million of the 40 million pay-level domains covered by the crawl (5%). Altogether, the extracted data sets consist of 7.3 billion RDF quads.
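The coverage shares above can be checked against the rounded counts. The short Python sketch below uses only the figures quoted in this description. Because those counts are themselves rounded, the results are approximate: about 12.3% of pages and 5.5% of pay-level domains. The average number of quads per page is a derived illustration and is not a figure reported in the source.

```python
# Rounded headline figures quoted above for the August 2012 Common Crawl extraction.
pages_total = 3_000_000_000
pages_with_data = 369_000_000
plds_total = 40_000_000
plds_with_data = 2_200_000
quads = 7_300_000_000


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


page_share = share(pages_with_data, pages_total)  # share of pages with structured data
pld_share = share(plds_with_data, plds_total)     # share of pay-level domains with structured data
quads_per_page = quads / pages_with_data          # derived average, for illustration only

print(f"pages with structured data: {page_share:.1f}%")
print(f"pay-level domains with structured data: {pld_share:.1f}%")
print(f"average quads per page with data: {quads_per_page:.1f}")
```

Since the inputs are rounded, treat these results as consistency checks on the stated percentages, not as more precise statistics.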