webdatacommons.org
The Web Data Commons project extracts structured data from the Common Crawl, the largest web corpus available to the public, and provides the extracted data for download. It features datasets such as RDFa, Microdata, and Microformat extractions, Web Tables (147-233 million relational tables), and a Hyperlink Graph.
Not indexed
General
Property | Value |
---|---|
Link | http://webdatacommons.org |
Status | scheduled |
Catalog type | Open data portal |
Owner name | Web Data Commons |
Owner type | Academy |
Owner link | http://webdatacommons.org |
Owner location | Germany |
Software | custom (Custom software) |
Tags | structured data, Common Crawl, web tables, hyperlink graph, RDFa, Microdata, Microformat |
Access modes | open |
Content types | dataset |
API Status | uncertain |
Coverage
code | name |
---|---|
DE | Germany |
Languages
code | name |
---|---|
EN | English |
Feedback
If you notice any errors or missing data catalogs, please contact us at dateno@dateno.io or open an issue on GitHub. We will address it as soon as possible.