Sketch2Data: Recovering Data from
Hand-Drawn Infographics

Computers & Graphics - Journal paper -
Presented at the ACM/EG Expressive Symposium

1Centre Inria d’Université Côte d’Azur, 2Université Paris-Saclay, CNRS, Inria, LISN, 3Reichman University
Teaser Image

We introduce a method to recover data values from glyph-based hand-drawn infographics. Given a visualization in a bitmap format and a user-defined parametric template of its glyphs, we leverage deep neural networks to detect and localize the visualization glyphs, and estimate the data values they represent.

Teaser gallery: Thoughts design, Dogs design, and Leaf design.

Motivation

Data collection and visualization have traditionally been seen as activities reserved for experts. However, by drawing simple geometric figures — known as glyphs — anyone can visually record their own data, as shown below. Still, the resulting hand-drawn infographics do not provide direct access to the underlying data, hindering digital editing of both the glyphs and their values.

Motivation Image

Method

Method Image

Overview of our method: given an input hand-drawn infographic (a), the user defines a parametric glyph template by specifying all elements that compose the glyph and their visual variations (b). Based on this template, we synthesize an annotated dataset of glyph drawings (c) and use it to train a glyph detector (d) and a parameter predictor (e). A simple interface (see videos below) allows users to review the recovered data, make corrections, and even refine the template to achieve higher accuracy.
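As a complement to this overview, below is a minimal sketch of the dataset-synthesis step (c). The "bar" glyph, its two parameters (a height encoding the data value and a tilt of its top edge), the parameter ranges, and the stroke jitter are illustrative assumptions, not the paper's actual template format.

# A minimal sketch of the dataset-synthesis step (c), assuming a hypothetical
# "bar" glyph template with two parameters: a height that encodes the data
# value, and a tilt of its top edge. Template, ranges, and jitter are
# illustrative only.
import random
from PIL import Image, ImageDraw

CANVAS = 128  # pixels per synthesized glyph crop

def render_glyph(height, tilt, jitter=2.0):
    """Rasterize one glyph with hand-drawn-style perturbations."""
    img = Image.new("L", (CANVAS, CANVAS), color=255)
    draw = ImageDraw.Draw(img)
    base_y = CANVAS - 10
    top_y = base_y - height * (CANVAS - 20)  # height is normalized to [0, 1]
    corners = [(40, base_y), (40 + tilt, top_y), (88 + tilt, top_y), (88, base_y)]
    # Perturb the corners to imitate wobbly, hand-drawn strokes.
    points = [(x + random.uniform(-jitter, jitter),
               y + random.uniform(-jitter, jitter)) for x, y in corners]
    draw.polygon(points, outline=0)
    return img

def synthesize_dataset(n):
    """Return (image, parameters) pairs for training the detector and predictor."""
    samples = []
    for _ in range(n):
        params = {"height": random.uniform(0.1, 1.0),
                  "tilt": random.uniform(-5.0, 5.0)}
        samples.append((render_glyph(**params), params))
    return samples

dataset = synthesize_dataset(1000)

In this sketch, randomizing the parameters and jittering the strokes stands in for the visual variations declared in the template, which is what allows networks trained on synthetic crops to be applied to real drawings.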

Glyph detection interface
Parameter estimation interface
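To make the parameter-estimation stage (e) concrete, here is a minimal PyTorch sketch of a predictor that regresses template parameters from a detected glyph crop. The architecture, crop size, and training loop are assumptions for illustration; the paper's actual networks may differ.

# A minimal sketch of a parameter predictor, assuming each detected glyph has
# already been cropped and resized to 128x128 grayscale.
import torch
import torch.nn as nn

class ParamPredictor(nn.Module):
    """Small CNN that regresses the template parameters of one glyph crop."""
    def __init__(self, n_params=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, n_params)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = ParamPredictor(n_params=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One training step on stand-in tensors (real training would iterate over the
# synthesized dataset from the previous sketch).
images = torch.rand(32, 1, 128, 128)   # glyph crops, grayscale, in [0, 1]
targets = torch.rand(32, 2)            # normalized (height, tilt) labels
optimizer.zero_grad()
loss = loss_fn(model(images), targets)
loss.backward()
optimizer.step()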

Application

Application 1: Reverse engineering existing infographics

We created a benchmark of 10 hand-drawn infographics collected from diverse sources, including data designers who advocate drawing for visualization as well as coursework from design schools.

Triangle design results image

For the Triangle design, we show the input image (top left), glyphs sampled from the training dataset (top right), detected glyphs (center left; green boxes indicate true positive detections and red boxes indicate false positive detections), the reconstruction from the estimated parameter values (center right; red outlines indicate glyphs for which one or several parameter values are erroneous), and the reconstruction from the ground truth parameter values (bottom left).
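For readers who want to reproduce this kind of scoring, below is a minimal sketch of how detections and parameter estimates could be evaluated. The IoU threshold, greedy matching, and parameter tolerance are assumptions rather than the paper's exact protocol.

# A minimal sketch of scoring detections and parameter estimates.
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def score_detections(predicted, ground_truth, iou_thresh=0.5):
    """Each prediction is a true positive (green box) if it overlaps an
    unmatched ground-truth glyph, otherwise a false positive (red box)."""
    unmatched, tp, fp = list(ground_truth), 0, 0
    for box in predicted:
        best = max(unmatched, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= iou_thresh:
            tp += 1
            unmatched.remove(best)
        else:
            fp += 1
    return tp, fp, len(unmatched)  # leftover ground truth = missed glyphs

def erroneous_glyphs(pred_params, gt_params, tol=0.1):
    """Glyphs outlined in red: any parameter off by more than the tolerance."""
    return [i for i, (p, g) in enumerate(zip(pred_params, gt_params))
            if any(abs(p[k] - g[k]) > tol for k in g)]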

Results for the other benchmark designs:

Application 2: Glyphs drawn by different people

We conducted a study to evaluate the second usage scenario, where users record data by drawing glyphs according to a prescribed template. We collected drawings of the same glyphs from multiple participants and evaluated the sensitivity of our approach to variations in individual drawing styles.

Leaf design results image

Results for the Leaf design from 12 participants. Input images (top; drawings from 12 participants, where PID indicates participant ID), detected glyphs (second row; green boxes indicate true positive detections and red boxes indicate false positive detections), reconstruction from the estimated parameter values (third row; red outlines indicate glyphs for which one or several parameter values are erroneous), and reconstruction from the ground truth parameter values (bottom).

Application 3: Editing hand-drawn infographics

Editing Leaf Template
Reverse engineering the input hand-drawn illustration (a), drawn by Participants 1, 2, and 3, into a clean vector infographic (b). This enables various operations such as editing the data table itself (c), redesigning (d) and recoloring (e) the marks of the Leaf template, and creating a new layout for the glyphs (f).
Swapping Templates for Thoughts
Reverse engineering the original hand-drawn illustration of Thoughts (a) into a clean vector infographic (b) allows the same data to be visualized using different templates (c). In this example: Lollipop template (d), Dog template (e), and Boyfriend template (f).
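To illustrate the kind of machinery behind these edits, here is a minimal sketch that regenerates a clean vector (SVG) infographic from a table of recovered values. The Leaf-like glyph and its single value parameter are hypothetical stand-ins for a user-defined template.

# A minimal sketch of rendering recovered data back into a vector infographic.
# The glyph shape, value mapping, and layout below are illustrative only.
def leaf_glyph_svg(x, y, value, color="#4a7c3f"):
    """One glyph as an SVG group; the stem length encodes the data value."""
    length = 10 + 40 * value  # map a value in [0, 1] to a stroke length
    return (f'<g transform="translate({x},{y})">'
            f'<line x1="0" y1="0" x2="0" y2="{-length}" stroke="{color}"/>'
            f'<ellipse cx="0" cy="{-length}" rx="8" ry="14" fill="{color}"/>'
            f'</g>')

def infographic_svg(values, spacing=40):
    """Lay out one glyph per recovered value along a horizontal baseline."""
    glyphs = [leaf_glyph_svg(20 + i * spacing, 80, v) for i, v in enumerate(values)]
    width = 40 + spacing * len(values)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="100">'
            + "".join(glyphs) + "</svg>")

# Editing the data table then amounts to re-running the renderer on new values.
with open("reconstruction.svg", "w") as f:
    f.write(infographic_svg([0.3, 0.7, 0.5, 0.9]))

Since the recovered data and the template are kept separate, edits like (c) to (f) amount to changing the value list, the glyph function, or the layout function and re-running the renderer.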

BibTeX

@article{QI2025104251,
  title   = {Sketch2Data: Recovering data from hand-drawn infographics},
  journal = {Computers \& Graphics},
  volume  = {130},
  pages   = {104251},
  year    = {2025},
  issn    = {0097-8493},
  doi     = {10.1016/j.cag.2025.104251},
  url     = {https://www.sciencedirect.com/science/article/pii/S0097849325000925},
  author  = {Anran Qi and Theophanis Tsandilas and Ariel Shamir and Adrien Bousseau}
}