ECM Workflow Extract Text error on Word and Excel Documents

rapat_mark · November 25, 2024, 2:42pm

I am using a workflow with Extract Text to fully OCR documents for old jobs that I am adding to ECM for customer service/warranty purposes. I added an iFilter to the server, and that works correctly for PDF files. From what I have read, the iFilter capability for Word and Excel documents is built into the OS. When I try to run the workflow against Word and Excel documents, I get the following error:

Document text cannot be set on documents that do not have pages, this includes unrendered and native documents.

Has anyone successfully used extract text against Word and Excel documents?

vleveris · November 25, 2024, 5:04pm

It may be that you need to update the ECM Client to process Office documents in the Configure Service tab. This will require the Office products to be installed on the same server where that ECM Client instance is installed. I’m not sure if this is the answer, but it is where I’d start.

rapat_mark · November 25, 2024, 9:23pm

Thanks for the idea, Victor. I installed M365 on the server, and I already had the check box to process Office documents checked on the Configure Services tab. Some are processing correctly now, but the newer extensions (xlsx, xlsm and docx) give an error that there is no iFilter defined for those extensions.
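Since the result depends on the file extension, it may help to sort the backlog before running the workflow. Below is a minimal Python sketch of that triage. The extension groupings are assumptions based only on what is reported in this thread (`.doc` and `.xls` are assumed to be the "older" formats that process correctly); they are not an official ECM list, and `triage` is a hypothetical helper name.

```python
from pathlib import Path

# Assumption: groupings reflect what this thread reports, not ECM documentation.
LEGACY_OFFICE = {".doc", ".xls"}            # assumed to be the formats that processed
OOXML_OFFICE = {".docx", ".xlsx", ".xlsm"}  # reported as "no iFilter defined"


def triage(paths):
    """Group file paths by how Extract Text is reported to handle them."""
    groups = {"pdf": [], "legacy_office": [], "ooxml_office": [], "other": []}
    for path in paths:
        ext = Path(path).suffix.lower()  # compare extensions case-insensitively
        if ext == ".pdf":
            groups["pdf"].append(path)
        elif ext in LEGACY_OFFICE:
            groups["legacy_office"].append(path)
        elif ext in OOXML_OFFICE:
            groups["ooxml_office"].append(path)
        else:
            groups["other"].append(path)
    return groups
```

Anything that lands in `ooxml_office` would be a candidate for PDF conversion or another import route.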

vleveris · November 25, 2024, 10:18pm

What version of the ECM Client do you have installed? I ask because, while xlsx, xlsm and docx aren’t exactly new, they may be too new for the version you’re running. I’m also not sure whether it would pick up macro-enabled Excel files at all.

I understand that this isn’t a great workaround, but each of those programs can export to PDF, which may solve the issue for now while you troubleshoot, assuming there’s any urgency to getting these files into ECM and processed.
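For a large backlog, exporting each file by hand may be impractical. One possible way to script the conversion is sketched below. Note that it swaps in LibreOffice's headless `soffice` command rather than the Office export described above, and it assumes LibreOffice is installed and on the PATH. The function names are hypothetical, and conversion fidelity for complex workbooks is not guaranteed.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, soffice="soffice"):
    """Build the LibreOffice headless command that converts one file to PDF."""
    return [
        soffice, "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_to_pdf(source, out_dir, soffice="soffice"):
    """Run the conversion and return the path where the PDF is expected."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if soffice exits with an error
    subprocess.run(build_convert_command(source, out_dir, soffice), check=True)
    return Path(out_dir) / (Path(source).stem + ".pdf")
```

Calling `convert_to_pdf("job1001.docx", "converted")` would be expected to write `converted/job1001.pdf`, which could then go through the existing PDF workflow.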

rapat_mark · November 25, 2024, 10:22pm

We are running 23.1.113. I was thinking about converting them to PDFs as well. That would work OK for this old data, but moving forward I will need to see how it should all work as new processes are developed. I have a ticket in with support as well. If they don’t have any other answers, I will probably move forward with converting them to PDFs. Thanks.

vleveris · November 25, 2024, 11:18pm

Another option for these is to create a Batch Import using the CSV format and an index file to populate the field data you need. The index data would likely be rather simple to gather from your Excel files, but potentially time-consuming for the Word documents. If you indexed the files and brought them in using the CSV method, then you wouldn’t need to OCR anything for those files.
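As a rough illustration of the index-file idea, the Python sketch below writes one CSV row per document. The column names (`FilePath`, `JobNumber`) are hypothetical placeholders: the layout that an ECM Batch Import actually expects is not stated in this thread, so it would need to be matched to your own import configuration.

```python
import csv
import io


def build_index_csv(rows, fieldnames):
    """Return CSV text with a header line and one line per document."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # each row is a dict keyed by the field names
    return buf.getvalue()


# Hypothetical example data; real values would come from the job records.
index_text = build_index_csv(
    [{"FilePath": "jobs/1001.docx", "JobNumber": "1001"}],
    ["FilePath", "JobNumber"],
)
```

Using the `csv` module rather than joining strings by hand means values that contain commas or quotes are escaped correctly.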

swilliasc111 · November 26, 2024, 2:29am

Hi Mark,

It looks like the Extract Text task is described as a task to use with PDFs.

The Extract Text Task

If you’re importing PDFs containing text elements, the Extract Text task can be a handy tool – but you’ll need a bit of extra setup to get it working properly. Watch this lesson to learn more!

rapat_mark · November 26, 2024, 2:38pm

I can’t find the original post or help screen that said Extract Text works for Office files too, but it is working for the older/simpler formats. I do think that converting documents to PDFs first is probably the best path. I need to work with other departments to figure out what we save moving forward. It looks like a lot of these documents contain information they should be looking up in Kinetic instead of on an old printed form.