OCR Proofreading Syntax Guide

This guide outlines a standardized Markdown format for proofreading OCR (Optical Character Recognition) text. It is designed for clarity, compatibility with version control, and ease of collaboration.

📘 Structural Markers

Use standard Markdown headings for book hierarchy:

| Element | Markdown Syntax | Example |
| --- | --- | --- |
| Book Title | `#` (H1) | `# BOOK ONE` |
| Chapter Title | `##` (H2) | `## CHAPTER ONE` |
| Section/Part | `###` (H3) | `### 1` |
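The heading levels above can be checked mechanically. The following is a minimal sketch, not part of RoboEdit: the function name `outline` and the labels `book`, `chapter`, and `section` are assumptions chosen for illustration.

```python
import re

# Heading depth -> label. The label names are illustrative assumptions.
LEVELS = {1: "book", 2: "chapter", 3: "section"}

# One to three '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,3})\s+(.*\S)\s*$")


def outline(text):
    """Return (label, title) pairs for every H1-H3 heading in `text`."""
    entries = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            depth = len(match.group(1))
            entries.append((LEVELS[depth], match.group(2)))
    return entries


sample = "# BOOK ONE\n\n## CHAPTER ONE\n\n### 1\n\nIt was a dark night.\n"
print(outline(sample))
```

A listing like this makes it easy to spot a chapter that was typed as H1 or a section that lost its heading marker.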


📄 Page Continuations

Books and manuscripts are scanned page by page with the RoboEdit OCR Scanner, so a sentence often spans two pages.

A page continuation marker may appear in only two places on a page (if applicable):

  1. "First paragraph continuation": at the top of a page, where a sentence continues from the previous page.

  2. "Final paragraph continuation": at the end of a page, where a sentence continues onto the next page.

You can use the same syntax to indicate a first or final paragraph continuation:

| Description | Syntax | Notes |
| --- | --- | --- |
| Page continuation marker | `[...]` | Insert where a page break occurred. |

During post-processing, the continued paragraphs will be joined together.
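The joining step might look like the sketch below. It is an illustration under stated assumptions, not RoboEdit's actual post-processing: `join_pages` is a hypothetical helper, two pages are merged only when the first ends with `[...]` and the next begins with it, and the two halves are joined with a single space.

```python
MARKER = "[...]"


def join_pages(pages):
    """Join page texts, merging paragraphs split across a page break.

    Assumption: a merge happens only when one page ends with the marker
    and the following page starts with it. Unmatched markers are left
    in place so a reviewer can find them.
    """
    result = ""
    for page in pages:
        text = page.strip()
        if not result:
            result = text
        elif result.endswith(MARKER) and text.startswith(MARKER):
            head = result[: -len(MARKER)].rstrip()
            tail = text[len(MARKER):].lstrip()
            result = head + " " + tail
        else:
            # Ordinary page break: start a new paragraph.
            result += "\n\n" + text
    return result


pages = [
    "He opened the door and [...]",
    "[...] stepped into the hall.\n\nIt was empty.",
    "A new page.",
]
print(join_pages(pages))
```

Words hyphenated across a page break are not handled here; they would need a separate rule.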


πŸ” Illegible or Uncertain Text

| Case | Syntax | Notes |
| --- | --- | --- |
| Completely unreadable text | `<<<illegible>>>` | Use for garbled OCR beyond repair. |
| Word of low confidence | `??word??` | Flag for manual review. |
| Inline guess with brackets | `She [ran?] quickly.` | Use if you're unsure but making a guess. |
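To collect everything that still needs manual review, the three markers can be located with regular expressions. This is a rough sketch; `find_flags` and the kind names are hypothetical, and the guess pattern assumes the bracketed word ends with `?` as in the example above.

```python
import re

# Kind names are illustrative; the patterns follow the syntax table above.
PATTERNS = {
    "illegible": re.compile(r"<<<illegible>>>"),
    "low_confidence": re.compile(r"\?\?(.+?)\?\?"),
    "guess": re.compile(r"\[([^\[\]]+)\?\]"),
}


def find_flags(text):
    """Return (line_number, kind, matched_text) for each flagged span."""
    flags = []
    for number, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                flags.append((number, kind, match.group(0)))
    return flags


sample = (
    "She [ran?] quickly.\n"
    "The ??navel?? officer\n"
    "<<<illegible>>> and then"
)
for flag in find_flags(sample):
    print(flag)
```

The page continuation marker `[...]` and reviewer comments in `[[...]]` do not match the guess pattern, because neither ends with `?]`.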


💬 Dialogue and Formatting

  • Use straight quotes (") unless typographic quotes are confirmed.

    • RoboEdit will generally coerce curly quotation marks and apostrophes to straight quotation marks and apostrophes automatically.

  • Em dashes (—) are preferred over double dashes (--) when typographically supported.

  • Italics: _italic_ or *italic*

  • Bold (rare in OCR text): **bold**
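The quote and dash rules above can be approximated in a few lines. This sketch is not RoboEdit's implementation; `normalize_punctuation` and its `use_em_dash` parameter are assumptions. The lookarounds skip HTML comment delimiters and runs of three or more hyphens.

```python
import re

# Curly quotes and apostrophes mapped to their straight equivalents.
QUOTE_MAP = str.maketrans({
    "\u201c": '"', "\u201d": '"',
    "\u2018": "'", "\u2019": "'",
})

# A double dash that is not part of '<!--', '-->' or a longer run of hyphens.
DOUBLE_DASH = re.compile(r"(?<![-!])--(?![->])")


def normalize_punctuation(text, use_em_dash=True):
    """Straighten curly quotes and optionally turn '--' into an em dash."""
    text = text.translate(QUOTE_MAP)
    if use_em_dash:
        text = DOUBLE_DASH.sub("\u2014", text)
    return text


print(normalize_punctuation("\u201cIt\u2019s late,\u201d she said--quietly."))
```

Passing `use_em_dash=False` leaves double dashes untouched, for output targets where em dashes are not typographically supported.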


✏️ Annotations (Optional for Reviewers)

| Use Case | Syntax |
| --- | --- |
| Inline comment (invisible) | `{<!-- check: is this 'navel'? -->}` |
| Visible reviewer comment | `[[check: word choice]]` |
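Before publishing, reviewer annotations usually need to be removed. The sketch below assumes such a cleanup step exists; it is not described in this guide, and `strip_annotations` is a hypothetical helper. It deletes both annotation forms and collapses the doubled spaces they leave behind.

```python
import re

INVISIBLE = re.compile(r"\{<!--.*?-->\}")  # {<!-- ... -->}
VISIBLE = re.compile(r"\[\[.*?\]\]")       # [[ ... ]]


def strip_annotations(text):
    """Remove reviewer annotations and tidy the spacing left behind."""
    text = INVISIBLE.sub("", text)
    text = VISIBLE.sub("", text)
    # Collapse runs of spaces or tabs created by the removals.
    return re.sub(r"[ \t]{2,}", " ", text)


line = "The naval {<!-- check: is this 'navel'? -->} officer [[check: word choice]] saluted."
print(strip_annotations(line))
```

Note that the whitespace cleanup also collapses intentional runs of spaces, so it should not be applied to indented or preformatted text.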


🔑 Footnotes

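Use Markdown-style footnotes if applicable. In the common `[^1]` style, for example, the reference `[^1]` goes in the text and the note is defined on its own line as `[^1]: Note text.`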
