Optical Character Recognition (OCR) FAQ

Q: What is OCR?
A: Optical Character Recognition (OCR) is a technology that converts scanned documents and image files into machine-readable text. When scanned documents or image files are uploaded to the VDR, they go through an OCR conversion that makes their text available for user search and analysis.

Q: Is OCR enabled for my Firmex site?
A: If your Firmex site was created after March 11, 2021, then OCR is enabled for your site. If your Firmex site was created before March 11, 2021, then OCR is disabled for your site. You may contact Firmex Support to enable OCR for projects going forward.

Q: Can I re-index my existing project using OCR so that my scanned documents can be searchable?
A: No. Firmex’s OCR functionality applies only to newly uploaded documents. To take advantage of OCR in existing projects, you will need to re-upload your documents.

Q: Is there an extra charge for OCR?
A: OCR is included in the price of your Firmex subscription.

Q: What document types does OCR support?
A: OCR supports PDF documents, MS Office documents (.docx, .xlsx, .pptx, .docm), and image files (.jpeg, .png, .tiff).
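
If you want to check a batch of files before uploading, the list above can be expressed as a small script. This is only a sketch: the function name `is_ocr_supported` is made up for illustration and is not part of any Firmex tool, and the set contains only the extensions listed in this answer. Whether variants such as .jpg or .tif are treated the same way is not stated here, so they are left out.

```python
from pathlib import Path

# Extensions taken literally from the FAQ answer above.
# Variants such as ".jpg" or ".tif" are not listed there, so they are
# deliberately excluded from this illustrative check.
SUPPORTED_EXTENSIONS = {
    ".pdf",
    ".docx", ".xlsx", ".pptx", ".docm",
    ".jpeg", ".png", ".tiff",
}


def is_ocr_supported(filename: str) -> bool:
    """Return True if the file extension is in the list above (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["contract.PDF", "scan.tiff", "notes.txt"]:
        print(name, is_ocr_supported(name))
```

The check looks only at the file name, so it cannot tell whether the file content actually matches its extension.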

Q: When I download documents that have been OCR’d, will the OCR’d text remain selectable?
A: Not at this time. Once downloaded, documents that have undergone OCR will not contain selectable text.

Q: Can OCR be enabled for one project but not another?
A: Not at this time. OCR is a site-level feature. It can either be enabled or disabled for your entire Firmex site.

Q: How is OCR text secured?
A: OCR text is protected with the same high standards of security and compliance that our documents currently adhere to.

Q: Can I search for OCR’d content in the Firmex Viewer and the Firmex Redact tool?
A: Only documents that had OCR applied before being uploaded to Firmex are searchable in the Firmex Viewer or the Firmex Redact tool. Currently, neither tool supports searching for content that was OCR’d by Firmex during document upload.

Q: How does OCR currently work with languages other than English?
A: Support depends on the script the language uses:

  • Latin-script languages (e.g., German, French, Italian): OCR works with these languages as long as the search term does not contain a special character such as an accented letter (e.g., ç, é, â, Ü).
  • Non-Latin-script languages (e.g., Mandarin, Russian, Japanese) are currently not supported by OCR.
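
To flag search terms that may run into the limitation above, a rough check could look like the sketch below. It assumes that "special character" means any character outside plain ASCII, which is an interpretation of this FAQ rather than a documented Firmex rule. The function name `may_miss_ocr_matches` is hypothetical.

```python
def may_miss_ocr_matches(search_term: str) -> bool:
    """Return True if the term contains any non-ASCII character.

    Assumption: a "special character" (such as ç, é, â, Ü) is treated here
    as anything outside plain ASCII. Non-Latin scripts are flagged too,
    since every character in them is non-ASCII.
    """
    return not search_term.isascii()


if __name__ == "__main__":
    for term in ["Vertrag", "façade", "Übersicht"]:
        print(term, may_miss_ocr_matches(term))
```

A flagged term is not guaranteed to fail. The check only marks terms that fall under the limitation described above.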

Q: How accurate is Firmex's OCR functionality?
A: Many factors affect the accuracy of the text that is indexed during the OCR process, including:

  • Image quality
  • Language
  • Font size
  • Font legibility
  • Contrast and brightness of the image

OCR extracts text as accurately as these factors allow.

Q: Is there a maximum image size for documents?
A: The maximum image size Firmex supports is 32,000 x 32,000 pixels. If any image in an uploaded document exceeds these dimensions, text will not be searchable for any of the images in that document. We recommend reducing the size of oversized images before uploading.
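
As a rough illustration of the limit above, the sketch below checks an image's pixel dimensions and, if needed, computes a reduced size that keeps the aspect ratio. It assumes an image counts as too large when either its width or its height exceeds 32,000 pixels, which is an interpretation of the wording above. The names `MAX_SIDE`, `exceeds_limit`, and `downscaled_size` are hypothetical, and the actual resizing would be done in whatever image tool you use.

```python
MAX_SIDE = 32_000  # maximum width and height in pixels, per the answer above


def exceeds_limit(width: int, height: int) -> bool:
    """Return True if either dimension is larger than the limit (assumed reading)."""
    return width > MAX_SIDE or height > MAX_SIDE


def downscaled_size(width: int, height: int) -> tuple[int, int]:
    """Return dimensions that fit within the limit, preserving aspect ratio.

    Images already within the limit are returned unchanged. Integer
    arithmetic is used so the longer side lands exactly on MAX_SIDE.
    """
    if not exceeds_limit(width, height):
        return (width, height)
    if width >= height:
        return (MAX_SIDE, max(1, height * MAX_SIDE // width))
    return (max(1, width * MAX_SIDE // height), MAX_SIDE)


if __name__ == "__main__":
    print(downscaled_size(64_000, 16_000))
    print(downscaled_size(1_000, 2_000))
```

Scaling the longer side down to the limit and shrinking the other side proportionally keeps the page from being distorted, which matters because distortion can itself reduce OCR accuracy.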

Q: How long does it take for an OCR document to be searchable after upload?
A: Depending on the number of images and files, it can take between 2 and 10 minutes before a document appears in the search results.
