Sunday, July 6, 2014

Tesseract open source OCR engine for Linux Mint

I wanted to convert a PDF to text on Linux Mint and looked around. The Software Manager in Linux Mint lists Tesseract. It is a command-line program, so to use it from a graphical interface you also have to install a third-party GUI. I used YAGF.
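If you would rather skip the GUI, the same PDF-to-text job can be scripted. This is only a rough sketch, not what YAGF does internally: it assumes the `pdftoppm` tool (from the poppler-utils package) is installed to turn PDF pages into images, since Tesseract 3.02 reads images rather than PDFs. The function names, the `page` file prefix and the 300 dpi default are my own choices for illustration.

```python
import subprocess
from pathlib import Path


def pdftoppm_command(pdf_path, image_prefix, dpi=300):
    """Command that renders each PDF page to <image_prefix>-<page>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(image_prefix)]


def tesseract_command(image_path, output_base, lang="eng"):
    """Command that OCRs one image; Tesseract writes <output_base>.txt."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def pdf_to_text(pdf_path, work_dir, lang="eng"):
    """Render the PDF to PNG pages, OCR them one at a time, return the text."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    prefix = work_dir / "page"  # hypothetical prefix for the page images
    subprocess.run(pdftoppm_command(pdf_path, prefix), check=True)

    pages = []
    # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
    for image in sorted(work_dir.glob("page-*.png")):
        output_base = image.with_suffix("")  # page-01.png -> page-01
        subprocess.run(tesseract_command(image, output_base, lang), check=True)
        pages.append(output_base.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)
```

Calling `pdf_to_text("scan.pdf", "ocr_work")` (both names are placeholders) would leave the page images and per-page text files in `ocr_work` and return the combined text. Pages are processed one after another rather than in parallel, which keeps only one Tesseract process running at a time.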

Note: this program is a memory hog and can freeze your computer if you are doing several things at once.

Tesseract doesn't have a built-in GUI, but there are several available from the 3rdParty page.

Version: 3.02.02-1
Size: 29MB to download, 78MB of disk space required
Impact on packages: liblept3 (installed), tesseract-ocr-eng (installed), tesseract-ocr-equ (installed), tesseract-ocr-osd (installed), tesseract-ocr (installed), libtesseract3 (installed)

The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google and is probably one of the most accurate open source OCR engines available. It can read a wide variety of image formats and convert them to text in over 40 languages.
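The language is picked on the command line with the `-l` option, and as far as I know Tesseract 3.02 and later accept several language codes joined with `+`. Here is a small sketch of building that argument. The helper names are made up for illustration, and a language such as `deu` is an assumption: it would only work after installing the matching language package (for example tesseract-ocr-deu), since only the English data is installed above.

```python
def language_argument(codes):
    """Join language codes such as ["eng", "deu"] into Tesseract's -l value."""
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def ocr_command(image_path, output_base, codes):
    """Build the tesseract command line for one image and a list of languages."""
    return ["tesseract", image_path, output_base, "-l", language_argument(codes)]
```

For example, `ocr_command("page.png", "page", ["eng", "deu"])` builds the argument list for `tesseract page.png page -l eng+deu`, which can then be passed to `subprocess.run`.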

YAGF is a graphical interface for the cuneiform and tesseract text recognition tools on the Linux platform. With YAGF you can scan images via XSane, import pages from PDF documents, perform image preprocessing and recognize text using cuneiform from a single command centre. YAGF also makes it easy to scan and recognize several images sequentially.

Version: 0.9.2-2

Size: 382KB to download, 828KB of disk space required

Impact on packages: yagf (installed)

Name | Platforms (one X per supported platform) | License | Description
YAGF | X | GPL v3 | A graphical front-end for cuneiform and tesseract
gImageReader | XX | GPL v3 | A graphical GTK frontend to tesseract-ocr
SunnyPage OCR | X | Proprietary | A GUI frontend for Tesseract OCR engine with automatic adjustment of image brightness, image processing and PDF support
VietOCR | XXX | Apache 2.0 | A GUI frontend for Tesseract OCR engine. Supports optical character recognition for Vietnamese and other languages supported by Tesseract
OCRFeeder | X | GPL v3 | OCRFeeder is a document layout analysis and optical character recognition system
PDF OCR X | XX | Proprietary | PDF OCR X is a simple drag-and-drop utility for Mac OS X and Windows that converts your PDFs and images into text documents or searchable PDF files
Lector | XX | GPL v2 | A graphical OCR solution for GNU/Linux based on Python, Qt4 and Tesseract OCR
Tesseract-OCR QT4 gui | X | Apache 2.0 | Tesseract-OCR QT4 gui is a simple GUI for tesseract
Lime OCR | X | GPL v3 | A simple, free OCR software for Windows using tesseract-ocr engine
Ocrivist | X | GPL v3 | Ocrivist is a utility which makes it possible to scan and OCR books and other printed documents to PDF or DjVu format
Tesseract-GUI | X | GPL v2 | Tesseract-GUI is not a front-end for tesseract-ocr, it is just a graphical way to use it with simple image manipulation through ImageMagick
QTesseract | X | LGPL v3 | Qt GUI for the Tesseract OCR
TessOCR(KISI) | X | Apache 2.0 | A free OCR tool

1 comment:

  1. if you like tesseract ocr, you may like this free online ocr tool using tesseract ocr 3.02