Dataset | ROIS-DS Center for Open Data in the Humanities
The Dataset of Pre-Modern Japanese Text (PMJT), owned by the National Institute of Japanese Literature, consists of image and text data and has been released as open data. Some books also include summary, transcription, and tagging data.

Edo-period cookbooks from the Dataset of Pre-Modern Japanese Text were curated into recipe datasets through transcription, translation into modern Japanese, and structuring into a recipe format.

As a by-product of transcribing the Dataset of Pre-Modern Japanese Text (PMJT), the shapes and coordinates of old Japanese characters (Kuzushiji) were compiled into another dataset, the Kuzushiji Dataset, intended as training data to make both machines and humans smarter.

Adapted from the Kuzushiji Dataset, the KMNIST dataset is a drop-in replacement for the MNIST dataset. We provide three types of datasets, namely Kuzushiji-MNIST, Kuzushiji-49, and Kuzushiji-Kanji, for different purposes.

The Seal Script Dataset is a machine-learning-friendly dataset of 'Tensho' (seal script) character images cropped from old Japanese and Chinese character dictionaries, for use in interpreting seals...

Based on the digitization of magazines published in the early-to-mid Meiji period (modern magazines), we release machine learning datasets for OCR and develop OCR software (Kindai-OCR).
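Because KMNIST is described as a drop-in replacement for MNIST, code that reads MNIST files should also read KMNIST files if they share MNIST's IDX layout. The sketch below is a minimal, stdlib-only IDX reader written under that assumption; it is not CODH's own loader, and the helper names `parse_idx`, `load_idx_gz`, and `get_image` are hypothetical. It handles only unsigned-byte (type `0x08`) arrays, which is the type MNIST uses for both images and labels.

```python
import gzip
import struct
from typing import List, Tuple


def parse_idx(data: bytes) -> Tuple[Tuple[int, ...], bytes]:
    """Parse an IDX buffer (the MNIST file layout) holding unsigned bytes.

    Header: two zero bytes, a type byte (0x08 = unsigned byte), a dimension
    count, then one big-endian uint32 per dimension. Returns (dims, payload).
    """
    if len(data) < 4:
        raise ValueError("buffer too short for an IDX header")
    zero1, zero2, dtype, ndim = struct.unpack(">BBBB", data[:4])
    if zero1 != 0 or zero2 != 0:
        raise ValueError("bad IDX magic number")
    if dtype != 0x08:
        raise ValueError("only unsigned-byte (0x08) IDX data is handled here")
    header_end = 4 + 4 * ndim
    if len(data) < header_end:
        raise ValueError("buffer too short for the declared dimensions")
    dims = struct.unpack(">" + "I" * ndim, data[4:header_end])
    payload = data[header_end:]
    expected = 1
    for d in dims:
        expected *= d
    if len(payload) != expected:
        raise ValueError(f"payload has {len(payload)} bytes, expected {expected}")
    return dims, payload


def load_idx_gz(path: str) -> Tuple[Tuple[int, ...], bytes]:
    """Read a gzip-compressed IDX file from disk and parse it."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())


def get_image(dims: Tuple[int, ...], payload: bytes, index: int) -> List[List[int]]:
    """Return image `index` from a (count, rows, cols) array as rows of pixel values."""
    if len(dims) != 3:
        raise ValueError("expected a 3-D (count, rows, cols) image array")
    count, rows, cols = dims
    if not 0 <= index < count:
        raise IndexError("image index out of range")
    start = index * rows * cols
    return [list(payload[start + r * cols : start + (r + 1) * cols]) for r in range(rows)]
```

For example, `dims, pixels = load_idx_gz("train-images-idx3-ubyte.gz")` would return dimensions such as `(count, 28, 28)` for an MNIST-style image file; the file name here follows MNIST's naming convention and is an assumption about how the KMNIST files are distributed. Keeping the payload as raw `bytes` avoids a NumPy dependency; converting it with `numpy.frombuffer` is a natural next step if NumPy is available.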
Not indexed
General
Property | Value |
---|---|
Link | http://codh.rois.ac.jp/dataset/ |
Status | active |
Catalog type | Datasets list |
Owner name | Center for Open Data in the Humanities (CODH) |
Owner type | Academy |
Owner link | http://codh.rois.ac.jp |
Owner location | Japan |
Software | custom (Custom software) |
Tags | government, has_api, Kuzushiji-MNIST, Kuzushiji-49, Kuzushiji-Kanji, Seal Script, Pre-modern Japanese text, OCR, Machine Learning, Text data, Image data |
Access modes | open |
Content types | dataset |
API Status | active |
Coverage
code | name |
---|---|
JP | Japan |
Languages
code | name |
---|---|
JA | Japanese |
Feedback
If you notice any errors or missing data catalogs, please contact us at dateno@dateno.io or open an issue on GitHub. We will address it as soon as possible.