Dataset | ROIS-DS Center for Open Data in the Humanities

Pre-Modern Japanese Text, owned by the National Institute of Japanese Literature, consists of image and text data and was released as open data. Some books also have summary, transcription, and tagging data. Cookbooks from the Edo period, drawn from the Dataset of Pre-Modern Japanese Text, were curated into recipe datasets by transcribing them, translating them into modern Japanese, and structuring them in a recipe format. As a by-product of transcription for the Dataset of Pre-Modern Japanese Text (PMJT), the shapes and coordinates of old Japanese characters (Kuzushiji) were compiled into another dataset for training, to make both machines and humans smarter. Adapted from the Kuzushiji Dataset, the KMNIST dataset is a drop-in replacement for the MNIST dataset. We provide three types of datasets, namely Kuzushiji-MNIST, Kuzushiji-49, and Kuzushiji-Kanji, for different purposes. The Seal Script Dataset is a machine-learning-friendly dataset of 'Tensho' character images cropped from old character dictionaries from Japan and China, to be used for the interpretation of seals.... Based on the digitization of magazines published in the early to mid-Meiji period (modern magazines), we release machine learning datasets for OCR and develop OCR software (Kindai-OCR).
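
As a rough illustration of what "drop-in replacement for MNIST" implies, the sketch below checks whether a pair of arrays follows the MNIST layout (28x28 grayscale images, ten classes). The helper name `is_mnist_compatible` and the synthetic arrays are hypothetical stand-ins for real data; nothing here reflects CODH's own tooling or file format.

```python
import numpy as np

# MNIST layout: 28x28 grayscale images stored as uint8, integer labels 0-9.
MNIST_IMAGE_SHAPE = (28, 28)
MNIST_NUM_CLASSES = 10


def is_mnist_compatible(images: np.ndarray, labels: np.ndarray) -> bool:
    """Return True if the arrays follow the MNIST layout (hypothetical helper)."""
    return bool(
        images.ndim == 3
        and images.shape[0] > 0
        and images.shape[1:] == MNIST_IMAGE_SHAPE
        and images.dtype == np.uint8
        and labels.shape == (images.shape[0],)
        and np.issubdtype(labels.dtype, np.integer)
        and labels.min() >= 0
        and labels.max() < MNIST_NUM_CLASSES
    )


# Synthetic data for illustration only; these are not real Kuzushiji images.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
fake_labels = rng.integers(0, MNIST_NUM_CLASSES, size=5)

print(is_mnist_compatible(fake_images, fake_labels))
```

A dataset that passes a check like this can usually be fed to code written for MNIST without changing input shapes or the number of output classes.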

Not indexed

General

Property Value
Link http://codh.rois.ac.jp/dataset/
Status active
Catalog type Datasets list
Owner name Center for Open Data in the Humanities (CODH)
Owner type Academy
Owner link http://codh.rois.ac.jp
Owner location Japan
Software custom (Custom software)
Tags government, has_api, Kuzushiji-MNIST, Kuzushiji-49, Kuzushiji-Kanji, Seal Script, Pre-modern Japanese text, OCR, Machine Learning, Text data, Image data
Access modes open
Content types dataset
API Status active

Coverage

code name
JP Japan

Languages

code name
JA Japanese

Feedback

If you notice any errors or missing data catalogs, please contact us at dateno@dateno.io or open an issue on GitHub. We will address it as soon as possible.

Data catalogs and portals registry by Dateno. The source code is licensed under the MIT License, and the website content is licensed under the CC BY-SA 4.0 license.