Web Data Commons

The Web Data Commons project extracts structured data from the Common Crawl, the largest publicly available web corpus, and provides the extracted data for download. Its datasets include RDFa, Microdata, and Microformat extractions; Web Tables (147-233 million relational tables); and a Hyperlink Graph.

Not indexed

General

Link: http://webdatacommons.org
Status: scheduled
Catalog type: Open data portal
Owner name: Web Data Commons
Owner type: Academy
Owner link: http://webdatacommons.org
Owner location: Germany
Software: custom (Custom software)
Tags: structured data, Common Crawl, web tables, hyperlink graph, RDFa, Microdata, Microformat
Access modes: open
Content types: dataset
API Status: uncertain

Coverage

DE: Germany

Languages

EN: English

Data catalogs and portals registry by Dateno. The source code is licensed under the MIT License, and the website content is licensed under the CC BY-SA 4.0 license.