Welcome to MLink Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others



python - Improving image pre-processing for tesseract (video game screenshot)

I am trying to read item prices from a video game screenshot and am having difficulty pre-processing the image.

The rest of my code is "complete": once the text is extracted, I format it and write it to a CSV file for later use.

This is what I have come up with so far for the following images. I would like input on other thresholds or pre-processing tools that would make the OCR more accurate.

[Image: raw screenshot]

[Image: after gamma and denoising (left); binary threshold (right)]

[Image: the detected text]

As you can see, the result is very close but not perfect. I would like to make it more accurate, since I will eventually be processing many frames.

Here is my current code:

import cv2
import pytesseract
import pandas as pd
import numpy as np

# Tells pytesseract where the Tesseract executable is installed on the local computer
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

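img = cv2.imread("./image_frames/frame0.png")
# The crop to the price region was not included in the post; crop_img is a
# placeholder for it (here simply the whole frame)
crop_img = img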

# gamma to darken text to be same opacity?
def adjust_gamma(crop_img, gamma=1.0):
    # build a lookup table mapping the pixel values [0, 255] to
    # their adjusted gamma values
    invGamma = 1.0 / gamma
    table = np.array([((i / 255.0) ** invGamma) * 255
        for i in np.arange(0, 256)]).astype("uint8")
    # apply gamma correction using the lookup table
    return cv2.LUT(crop_img, table)

adjusted = adjust_gamma(crop_img, gamma=0.15)

# grayscale the image
gray = cv2.cvtColor(adjusted, cv2.COLOR_BGR2GRAY)
# denoising image
dst = cv2.fastNlMeansDenoising(gray, None, 10, 10, 10)

# binary threshold
thresh = cv2.threshold(dst, 35, 255, cv2.THRESH_BINARY_INV)[1]

# OCR configurations (3 is default)
config = "--psm 3"

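# Just show the images (distinct window names, so one does not overwrite another)
cv2.imshow("gray", gray)
cv2.imshow("denoised", dst)
cv2.imshow("thresh", thresh)
cv2.waitKey(0)  # imshow windows are only rendered while waiting for a key press
cv2.destroyAllWindows()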

# Reads text from the image and prints it to the console
text = pytesseract.image_to_string(thresh, config=config)
# remove double lines
text = text.replace('\n\n', '\n')

# remove the form-feed character ('\x0c') that Tesseract appends to its output
text = text.replace('\x0c', '')
print(text)

Any help is appreciated as I am very new to this!
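A quick way to see what gamma=0.15 actually does is to inspect the lookup table itself. The sketch below rebuilds the same table as adjust_gamma() (vectorised, without the Python loop) and finds the first original level that ends up above the threshold of 35 used with THRESH_BINARY_INV. Ignoring the denoising step, for a neutral grey pixel the gamma step plus that threshold therefore acts like one fixed global cutoff on the original intensity; for coloured text this is only approximate, because the table is applied to each BGR channel before the grayscale conversion. gamma_table() is a hypothetical helper name.

```python
import numpy as np


def gamma_table(gamma: float) -> np.ndarray:
    """Same 256-entry lookup table as adjust_gamma() in the question, vectorised."""
    inv_gamma = 1.0 / gamma
    levels = np.arange(256) / 255.0  # input levels scaled to [0, 1]
    return (levels ** inv_gamma * 255).astype("uint8")


table = gamma_table(0.15)

# THRESH_BINARY_INV at 35 turns a pixel black when its value is > 35.
# For a grey pixel, this is the first ORIGINAL level that ends up black.
cutoff = int(np.argmax(table > 35))

print("levels 0/64/128/192/255 ->", table[[0, 64, 128, 192, 255]])
print("effective global cutoff on the original level:", cutoff)
```

In other words, tuning gamma and the fixed threshold together mostly moves a single cutoff around, which is fragile when the background brightness varies across the screenshot.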


1 Answer


Step #1: Scale the image.

Step #2: Apply an adaptive threshold.

Step #3: Set the page segmentation mode (psm) to 6 (assume a single uniform block of text).

1 Scaling the image:

  • The original image is very small, so upscaling it first gives Tesseract larger, clearer characters to work with.

  • img = cv2.imread("udQw1.png")
    img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)

2 Apply adaptive-threshold

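  • Usually a simple (global) threshold is applied, but on your image it has no effect on the result. An adaptive threshold instead computes a separate cutoff for each pixel from the mean of its neighbourhood, which copes better with an uneven background.

  • Different images may need different blockSize and C values (the last two arguments of cv2.adaptiveThreshold).

  • For instance, for the 1st image: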

  • gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                cv2.THRESH_BINARY_INV, 15, 22)
  • Result: [image: adaptive-threshold output for the 1st image]
  • For instance, for the 2nd image:

  • gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                cv2.THRESH_BINARY_INV, 51, 4)
  • Result: [image: adaptive-threshold output for the 2nd image]

3 Set psm to 6, which treats the image as a single uniform block of text.

  • txt = pytesseract.image_to_string(thr, config="--psm 6")
  • Result for the 1st image:

    • Dragon Claymore
      1,388,888,888 mesos.
      Maple Pyrope Spear
      288,888,888 mesos.
      Element Pierce
      488,888,888 mesos.
      Purple Adventurer Cape
      97,777,777 mesos.
  • Result for the 2nd image:

    • Ring of Alchemist
      749,999,995 mesos.
      Dragon Slash Claw
      499,999,995 mesos.
      "Stormcaster Gloves
      149,999,995 mesos.
      Elemental Wand 6
      749,999,995 mesos.
      Big Money Chalr
      1 tor 249,999,985 mesos.|
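Since the goal is a CSV of prices, here is a sketch of how output in this shape could be turned into rows. It assumes the layout seen above (a name line followed by a "<number> mesos." line), keeps only the number directly before "mesos", and strips stray quote and pipe characters; OCR misreads such as "Chalr" are left untouched. parse_listings(), write_csv() and the column names are hypothetical. It uses the standard-library csv module; with pandas, pd.DataFrame(rows, columns=["item", "price_mesos"]).to_csv("prices.csv", index=False) would do the same.

```python
import csv
import io
import re
from typing import List, Tuple

# The (comma-grouped) number immediately before "mesos"
PRICE_RE = re.compile(r"(\d[\d,]*)\s*mesos")


def parse_listings(ocr_text: str) -> List[Tuple[str, int]]:
    """Pair each item-name line with the price line that follows it."""
    rows: List[Tuple[str, int]] = []
    name = None
    for raw in ocr_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = PRICE_RE.search(line)
        if match:
            if name is not None:  # ignore a price with no preceding name
                rows.append((name, int(match.group(1).replace(",", ""))))
            name = None
        else:
            name = line.strip(" \"'|")  # drop stray OCR quotes/pipes around names
    return rows


def write_csv(rows: List[Tuple[str, int]], fh) -> None:
    writer = csv.writer(fh)
    writer.writerow(["item", "price_mesos"])
    writer.writerows(rows)


sample = """Ring of Alchemist
749,999,995 mesos.
"Stormcaster Gloves
149,999,995 mesos.
Big Money Chalr
1 tor 249,999,985 mesos.|"""

rows = parse_listings(sample)
buf = io.StringIO()
write_csv(rows, buf)
print(buf.getvalue())
```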

Code for the 1st image:

import pytesseract
import cv2

img = cv2.imread("udQw1.png")
img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY_INV, 15, 22)
txt = pytesseract.image_to_string(thr, config="--psm 6")

Code for the 2nd image:

import pytesseract
import cv2

img = cv2.imread("7Y2yx.png")
img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY_INV, 51, 4)
txt = pytesseract.image_to_string(thr, config="--psm 6")
