This data set contains Microformat, Microdata, and RDFa data extracted from the 2009 Common Crawl web corpus. We found structured data within 147 million HTML pages out of the 2 billion pages contained in the crawl (5%). These pages originate from 19 million different pay-level domains. Altogether, the extracted data sets consist of 5 billion RDF quads.
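
To make the term "RDF quad" concrete, the following minimal Python sketch reads a single quad and recovers the host of the page it came from. It rests on assumptions not stated above: that the quads are serialized one per line in an N-Quads-like syntax, and that the fourth element is the URL of the source page. The sample line and the helper names `parse_quad` and `source_host` are hypothetical, and the regular expression covers only the common term shapes rather than the full grammar.

```python
import re
from urllib.parse import urlparse

# One RDF term: an IRI in angle brackets, a blank node, or a quoted literal
# with an optional language tag or datatype IRI (simplified, not the full grammar).
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
QUAD = re.compile(
    r'^\s*' + TERM + r'\s+' + TERM + r'\s+' + TERM + r'\s+' + TERM + r'\s*\.\s*$'
)


def parse_quad(line):
    """Split one quad line into (subject, predicate, object, graph)."""
    match = QUAD.match(line)
    if match is None:
        raise ValueError("not a recognizable quad: " + line)
    return match.groups()


def source_host(quad):
    """Return the host name of the graph IRI (assumed to be the page URL)."""
    graph_iri = quad[3].strip("<>")
    return urlparse(graph_iri).hostname


# Hypothetical sample line, not taken from the data set.
line = (
    '<http://example.org/page#item> <http://schema.org/name> '
    '"Example Product"@en <http://example.org/page> .'
)

quad = parse_quad(line)
print(quad[1])            # predicate IRI
print(source_host(quad))  # host of the source page
```

The host name returned here is not yet a pay-level domain. Reducing a host such as `shop.example.co.uk` to its pay-level domain requires a public suffix list, which this sketch deliberately leaves out.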