@noahmetzger

Overhaul of the lstm_choice_mode

  • Made the lstm_choice_mode compatible with hocr_char_boxes.

  • Improved the choice mode algorithm for better performance.

  • Cut down the different modes to the two essential ones.

    • lstm_choice_mode=1 returns the timesteps segmented by character. This replaces modes 1 and 3 with no loss of information.
    • lstm_choice_mode=2 returns alternative choices per character based on the new CTC retrieval algorithm. This replaces modes 2 and 4. The former mode 2 is dropped entirely, as it worked only for languages with fewer characters than the LSTM has output channels.
  • Renamed lstm_choice_amount to lstm_choice_iterations because the old name was misleading.
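
To make "timesteps segmented by character" more concrete, here is a toy sketch of how the timesteps of a CTC-style best path could be grouped by the character they emit. This is only an illustration of the general idea, not Tesseract's implementation; the `BLANK` symbol, the function name `segment_timesteps`, and the input format are all assumptions made for the example.

```python
from typing import List, Tuple

BLANK = "_"  # hypothetical blank symbol, chosen for this example only


def segment_timesteps(best_path: List[str]) -> List[Tuple[str, List[int]]]:
    """Group timestep indices by the character they belong to.

    Consecutive identical labels are merged into one character;
    a blank separates characters (so "l_l" yields two "l" segments).
    """
    segments: List[Tuple[str, List[int]]] = []
    prev = BLANK
    for t, label in enumerate(best_path):
        if label == BLANK:
            # A blank ends the current character, if any.
            prev = BLANK
            continue
        if label == prev:
            # Same character continues: attach this timestep to it.
            segments[-1][1].append(t)
        else:
            # A new character starts at this timestep.
            segments.append((label, [t]))
        prev = label
    return segments


if __name__ == "__main__":
    for char, steps in segment_timesteps(list("_hh_e_ll_l_oo")):
        print(char, steps)
```

Each tuple pairs a recognized character with the timesteps that produced it, which is roughly the kind of grouping the description of mode 1 refers to.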

@zdenop zdenop merged commit 98c7aaa into tesseract-ocr:master Sep 6, 2019
@zdenop

zdenop commented Sep 6, 2019

thanks

@woodjohndavid

Unless I misunderstand what these options are supposed to do, it appears that there is a bug or oversight. Please refer to the user area thread:

https://groups.google.com/forum/#!topic/tesseract-ocr/5tC6appoUgE

There seems to be no way to prevent the LSTM engine from including duplicates in the generated text and/or hOCR output. The thread above gives a clear example of this.

Thanks.

@bertsky

bertsky commented Dec 2, 2019

> Unless I misunderstand what these options are supposed to do, it appears that there is a bug or oversight. Please refer to the user area thread:
>
> https://groups.google.com/forum/#!topic/tesseract-ocr/5tC6appoUgE

@woodjohndavid the above change (and the preceding ones relating to this config variable) is not active unless you set lstm_choice_mode to 1 or 2 – the default is 0.
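
As a minimal sketch of how these settings might be passed on the command line, the helper below assembles a `tesseract` invocation using the standard `-c VAR=VALUE` option and the `hocr` config. The function name `build_tesseract_args` and its parameters are hypothetical; the code only builds the argument list and does not run Tesseract.

```python
from typing import List, Optional


def build_tesseract_args(
    image: str,
    out_base: str,
    choice_mode: int = 0,
    iterations: Optional[int] = None,
    char_boxes: bool = False,
) -> List[str]:
    """Build an argument list for a tesseract hOCR run (illustrative only)."""
    if choice_mode not in (0, 1, 2):
        raise ValueError("lstm_choice_mode must be 0, 1 or 2")

    args = ["tesseract", image, out_base]
    if choice_mode != 0:
        # Mode 0 is the default, so the variable is only passed for 1 or 2.
        args += ["-c", f"lstm_choice_mode={choice_mode}"]
        if iterations is not None:
            # Renamed from lstm_choice_amount in this pull request.
            args += ["-c", f"lstm_choice_iterations={iterations}"]
    if char_boxes:
        args += ["-c", "hocr_char_boxes=1"]
    args.append("hocr")  # request hOCR output
    return args


if __name__ == "__main__":
    print(" ".join(build_tesseract_args("page.png", "out", choice_mode=2, char_boxes=True)))
```

With the default `choice_mode=0` no choice-related variable is added, which mirrors the point that the feature stays inactive unless it is explicitly set to 1 or 2.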

> There seems to be no way to prevent the LSTM engine from including duplicates in the generated text and/or hOCR output. The thread above gives a clear example of this.

This is an independent problem; let's discuss it further in issue #2738, which you created.
