OCR Proofreading Syntax Guide
This guide outlines a standardized markdown format for proofreading OCR (Optical Character Recognition) text. It is designed for clarity, compatibility with version control, and ease of collaboration.
Structural Markers
Use standard Markdown headings for book hierarchy:
| Element | Syntax | Example |
| --- | --- | --- |
| Book Title | `#` (H1) | `# BOOK ONE` |
| Chapter Title | `##` (H2) | `## CHAPTER ONE` |
| Section/Part | `###` (H3) | `### 1` |
Page Continuations
Books and manuscripts are scanned page by page with the RoboEdit OCR Scanner, so a sentence will often span two pages.
Page continuations should be placed only in two locations on a page (if applicable):
| Continuation | Location |
| --- | --- |
| "First paragraph continuation" | At the top of a page where a sentence continues from the previous page. |
| "Final paragraph continuation" | At the end of a page where a sentence continues to the next page. |
You can use the same syntax to indicate a first or final paragraph continuation:
| Marker | Syntax | Usage |
| --- | --- | --- |
| Page continuation marker | `[...]` | Insert where a page break occurred. |
During post-processing, the continued paragraphs will be joined together.
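The joining step could look roughly like the sketch below. It is not RoboEdit's actual post-processing. The function name `join_pages` and the choice to join with a single space are assumptions made for illustration. It merges two pages only when the first ends with `[...]` and the next begins with `[...]`; an unpaired marker is left in place so it can be reviewed by hand.

```python
MARKER = "[...]"

def join_pages(pages):
    """Join page texts, merging paragraphs that were split by a page break.

    Hypothetical helper: assumes each page is one string and that a split
    sentence is marked on both sides of the break.
    """
    out = ""
    for page in pages:
        text = page.strip()
        if out.endswith(MARKER) and text.startswith(MARKER):
            # Continuation on both sides: drop both markers and
            # rejoin the sentence with a single space.
            head = out[:-len(MARKER)].rstrip()
            tail = text[len(MARKER):].lstrip()
            out = head + " " + tail
        else:
            # No paired continuation: start a new paragraph.
            out = out + "\n\n" + text if out else text
    return out

pages = [
    "He walked to the [...]",
    "[...] door and knocked.\n\nNobody answered.",
]
print(join_pages(pages))
```

Words hyphenated across a page break are not handled here and would need a separate rule.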
Illegible or Uncertain Text
| Case | Syntax | Usage |
| --- | --- | --- |
| Completely unreadable text | `<<<illegible>>>` | Use for garbled OCR beyond repair. |
| Word of low confidence | `??word??` | Flag for manual review. |
| Inline guess with brackets | `She [ran?] quickly.` | Use if you're unsure but making a guess. |
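To build a review list from these markers, a reviewer could scan a proofread file with regular expressions. The sketch below is one possible approach. The name `find_flags` and the exact patterns are assumptions based only on the three syntaxes described above.

```python
import re

# Patterns assumed from the marker syntax in this guide.
FLAG_PATTERNS = {
    "illegible": re.compile(r"<<<illegible>>>"),
    "low_confidence": re.compile(r"\?\?(.+?)\?\?"),
    "guess": re.compile(r"\[([^\[\]]+)\?\]"),
}

def find_flags(text):
    """Return (offset, kind, matched_text) tuples in document order."""
    hits = []
    for kind, pattern in FLAG_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), kind, match.group(0)))
    return sorted(hits)

sample = "She [ran?] to the ??navel?? base <<<illegible>>>."
for offset, kind, matched in find_flags(sample):
    print(offset, kind, matched)
```

The guess pattern requires a `?` directly before the closing bracket, so page continuation markers (`[...]`) and `[[...]]` reviewer comments are not reported as guesses.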
Dialogue and Formatting
Use straight quotes (") unless typographic quotes are confirmed. RoboEdit will generally coerce curly quotation marks and apostrophes to straight quotation marks and apostrophes automatically.
Em dashes (—) are preferred over double dashes (--) when typographically supported.
Italics: _italic_ or *italic*
Bold (rare in OCR text): **bold**
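For text that has not passed through RoboEdit, the quote coercion described above can be approximated by hand. This is a minimal sketch, not RoboEdit's implementation. The name `normalize_quotes` is hypothetical, and the mapping covers only the four most common curly quote characters.

```python
# Map curly quotes and apostrophes to their straight equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})

def normalize_quotes(text):
    """Replace curly quotation marks and apostrophes with straight ones."""
    return text.translate(QUOTE_MAP)

print(normalize_quotes("\u201cDon\u2019t,\u201d she said."))
```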
Annotations (Optional for Reviewers)
| Type | Syntax |
| --- | --- |
| Inline comment (invisible) | `{<!-- check: is this 'navel'? -->}` |
| Visible reviewer comment | `[[check: word choice]]` |
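Reviewer comments in either form can be gathered into a single list for follow-up. The sketch below assumes the two syntaxes shown above. `collect_annotations` is a hypothetical helper and not part of any existing tool.

```python
import re

# Patterns assumed from the annotation syntax in this guide.
INLINE_COMMENT = re.compile(r"\{<!--\s*(.*?)\s*-->\}", re.DOTALL)
VISIBLE_COMMENT = re.compile(r"\[\[(.*?)\]\]", re.DOTALL)

def collect_annotations(text):
    """Return (kind, comment_text) pairs in document order."""
    found = []
    for match in INLINE_COMMENT.finditer(text):
        found.append((match.start(), "inline", match.group(1)))
    for match in VISIBLE_COMMENT.finditer(text):
        found.append((match.start(), "visible", match.group(1).strip()))
    # Sort by offset, then drop the offset from the result.
    return [(kind, comment) for _, kind, comment in sorted(found)]

sample = "The {<!-- check: is this 'navel'? -->}navel base [[check: word choice]] stood."
for kind, comment in collect_annotations(sample):
    print(kind, "->", comment)
```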
Footnotes
Use Markdown-style footnotes if applicable:
Example