IDC OCR Capture Accuracy -- Is Your Experience Similar?

Barb · July 6, 2022, 6:37pm

We have submitted thousands of copies of the same document, generated from the same system with the same layout and the same font and printed to PDF, and sometimes IDC doesn’t read the numbers correctly and/or reads only part of a field’s value, such as the company name or user’s name.

The vendor has offered to help us out but we have been waiting and waiting.

IDC performance is also quite slow, up to 20-30 minutes for one document to be captured.

MikeGross · July 6, 2022, 8:26pm

Keep after them - we’re not seeing anywhere near that much time per document. There are a few other factors to consider on the server - but they should be able to help you with that.

gpayne · July 6, 2022, 8:29pm

@Barb Is it a clean PDF? If it is not sensitive, can you DM it to me and I will send it through my dev system? My typical time is 2 minutes, and the longest (it made me nervous waiting) was a 107-page PO that took 7 minutes.

utaylor · July 6, 2022, 8:29pm

Thanks for sharing Mike. Keep at em @Barb !

Who are you working with?

utaylor · July 6, 2022, 8:29pm

That’s nuts Greg! 107 pages!

Barb · July 7, 2022, 2:45pm

These are usually clean PDFs and it doesn’t matter what document type/DFD it is using. We are told we need to upgrade and are waiting on a time from the vendor to do this.

We are working directly with Epicor and I reached out to every contact we’ve ever worked with at Epicor and Ancora and will see what happens next.

We have hundreds of documents per day and many users submitting. We don’t use the classification feature as we classify based on the folder it is dropped into. So ‘capture’ is the culprit. We have followed the vendor’s advice on hardware - waiting on a review with them.

MikeGross · July 7, 2022, 3:12pm

As I understand it, each of the services (classification, pre-processing of the image, OCR, field data extraction, and export) is essentially a separate service/queue within the Epicor IDC Server service, and each is managed independently by the server service and a queuing app it installs called RabbitMQ (check your Program Files directory).
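To illustrate why one slow stage matters when each stage works its own queue, here is a minimal sketch. The stage names come from the description above; the per-document timings are made-up placeholders (not measurements from any IDC server), the function names are hypothetical, and the model assumes one worker per stage.

```python
# Placeholder per-document processing times in seconds for each stage.
# These numbers are invented for illustration only.
STAGE_SECONDS = {
    "classification": 2.0,
    "pre-processing": 20.0,
    "ocr": 8.0,
    "field extraction": 5.0,
    "export": 1.0,
}


def bottleneck(stage_seconds):
    """Return (stage_name, seconds) for the slowest stage."""
    return max(stage_seconds.items(), key=lambda item: item[1])


def steady_state_docs_per_hour(stage_seconds):
    """Sustained throughput is capped by the slowest stage,
    assuming one worker per stage and a steady supply of documents."""
    _, slowest = bottleneck(stage_seconds)
    return 3600 / slowest


name, seconds = bottleneck(STAGE_SECONDS)
print(f"Slowest stage: {name} ({seconds:.0f} s per document)")
print(f"Approximate ceiling: {steady_state_docs_per_hour(STAGE_SECONDS):.0f} docs/hour")
```

Swapping in timings observed on your own server would show which stage is worth tuning first.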

Knowing that, we got into a conversation with Ancora guys and learned that the pre-processing of the image is what will kill a server - depending on what you are doing. For example, if you check all the boxes for de-speckle, orientation, rotation, blank removal, vector extraction, and graphics cleanup - you are seriously adding time to the loop for each and every document. And then there is document separation… and did you see this little suggestion?

On the ECM side -
Are you doing any OCR pre/post work on the ECM side also? I’m told the ECM OCR stuff is even slower because it has to farm out all the work to the active/registered ECM clients installed on the network. Yes, even though the import is processed at the server (folder watching), the processing is actually done in a distributed fashion by registered ECM Clients. We were told that if it got bad, we could install just that client piece on a few of our data center servers and get better performance.

Barb · July 7, 2022, 4:48pm

I unchecked the 3 checkboxes, just for PO’s so as not to interrupt our Payables team.

What happens is a single PO document (or other batch type) gets behind a half dozen or more Invoice batches all with “OCR performed - ready for Capture” status. Two batches will be “In Data Capture” at any given time.

Each Invoice Batch has a max of 5 documents (some have multiple pages but never hundreds) and each batch takes about 5 minutes in the “In Data Capture” step.

This means a single document waiting behind 6 batches is still waiting 30 minutes to get into Data Capture step and this is not acceptable.
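As a back-of-the-envelope check on the figures above, the wait can be sketched as a simple queue estimate. This is a simplified model, not how IDC actually schedules work: it assumes every batch takes the same time and that each "In Data Capture" slot runs at full speed. The function name `wait_minutes` is hypothetical.

```python
def wait_minutes(batches_ahead, minutes_per_batch, parallel_slots=1):
    """Estimate how long a newly queued batch waits before entering
    Data Capture, given equal-length batches ahead of it in the queue."""
    if parallel_slots < 1:
        raise ValueError("parallel_slots must be at least 1")
    # The new batch starts once enough full rounds have finished
    # to leave a slot free for it.
    full_rounds = batches_ahead // parallel_slots
    return full_rounds * minutes_per_batch


# Six invoice batches ahead, about 5 minutes each, handled one after another.
print(wait_minutes(6, 5))

# Same queue if two batches truly ran in parallel at full speed.
print(wait_minutes(6, 5, parallel_slots=2))
```

The one-at-a-time case matches the 30 minutes described above. If the two "In Data Capture" batches were truly independent, the estimate would drop to about 15 minutes, so a 30-minute wait suggests the two slots are sharing the same resources.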

MikeGross · July 7, 2022, 7:32pm

Yeah, I see what you are saying and I think you are experiencing what I was talking about, albeit larger scale. I don’t have stats for multiple document types X number of vendors X number of documents X (anything else particular about your setup like calculable fields in the DFD, validation formulas, etc. that might all be part of the problem)

Do you have any of those in the DFD that are doing look ups back to Epicor? You may want to consider doing some work in SQL to make those look ups local to IDC (through some SQL replication/synching) or something.

Keep after them and please let us know what comes of this. I am very curious as I’m about to roll out to 3 more companies by year end.

Barb · July 7, 2022, 7:48pm

We heard back from Epicor and Ancora today. We need a few hardware changes and a version upgrade (we were implemented on 7.2 earlier this year and were not told of a newer version). This will accommodate our approx 20,000 docs per month. This will be done tomorrow.

We don’t use Epicor. We don’t have IDC connected to our ERP system.

MikeGross · July 7, 2022, 7:53pm

Sorry - the posting here usually indicates ERP as well - but you are welcome nonetheless!!

Look forward to your results improving! I think a number of us would be interested in the particular specifications of the changes you are making once you’ve tried them out. This group is pretty technical and we enjoy the gory details

JerseyEric · July 7, 2022, 8:17pm

@Barb, like you, I have documents flowing from DocStar (Epicor IDC => Epicor ECM =>) into a non-Epicor ERP system. One process has a limited # of AP invoices going into Dynamics GP. Another process has customer POs (for pickup services) going into a 3rd party transportation system as our Sales Orders.

@Barb, you might not want to share which ERP system you’re integrating with DocStar. But if it’s Dynamics GP and the 20,000 docs per month are AP invoices, I would love to ask you a few questions privately. I’m on the verge of pursuing AP Automation for almost all of our AP invoices. So my questions would involve how you’re addressing some things with the Coding (GL distributions) and AP Batching.

Regardless of your ERP system, if you’re upgrading from IDC version 7.x to version 9.x, you might want to read the brief summary of my experience that’s in a Dec 2022 response within this discussion topic: Docstar and IDC - Upgrade to 21.2.x and 9.xx - notes from the field - #3 by MikeGross

Barb · July 7, 2022, 8:29pm

Yes, we are using Dynamics GP and most of our docs are vendor invoices. Thanks for the upgrade notes. We will be working directly with Ancora to upgrade.

Feel free to message me (if that’s a thing here).

gpayne · July 7, 2022, 8:49pm

I believe the default had those unchecked and I just left it that way. Our first consultant set the batch size to one file and I have done our other imports the same way.

I have separate IDC and ECM servers based on a DocStar recommendation and I am processing nowhere near 20,000 documents a month. My IDC VM is 4 CPU and 16 GB; ECM is 4 CPU and 24 GB.

I was curious about my timing so I did a batch of 38 POs and they finished in 12 minutes in IDC. The first one was ready to verify in 2 minutes, so I could have started verifying. We also went live on 7.x and I upgraded to 9.11 in March and am planning on 9.22 as soon as I can get time.
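For anyone wanting to compare their own timings, here is a rough sketch of the arithmetic using the numbers above (38 POs in 12 minutes) projected onto a 20,000-documents-per-month volume. It assumes processing time scales linearly with document count, which may not hold across different document types, page counts, or hardware. The function names are hypothetical.

```python
def seconds_per_document(documents, minutes):
    """Average processing time per document for a timed batch."""
    return minutes * 60 / documents


def hours_for_volume(documents, seconds_each):
    """Total processing hours for a volume at a fixed per-document rate."""
    return documents * seconds_each / 3600


rate = seconds_per_document(38, 12)
print(f"{rate:.1f} seconds per document")

monthly_hours = hours_for_volume(20_000, rate)
print(f"{monthly_hours:.0f} hours of processing for 20,000 documents")
```

At this rate the monthly volume works out to roughly 105 hours of processing, which gives a baseline for judging whether a given server is keeping up.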

Hopefully the upgrade to 9.x will help.

Barb · July 13, 2022, 3:10pm

Ancora upgraded us, and our hardware was improved and IDC is flying now!

utaylor · July 13, 2022, 3:18pm

Good to know Barb!

MikeGross · July 13, 2022, 3:23pm

Awesome!!!

Care to share versions and hardware specs with us so we can all do a little ‘at home’ comparison?

utaylor · July 13, 2022, 3:24pm

That would be cool to see.

Beth · July 13, 2022, 3:52pm

Awesome news, Barb!

Beth · July 13, 2022, 3:57pm

I didn’t know there is a new version out. Do you know what the differences are, Greg?
