Modeler
Methods for generating synthetic datasets that fit patterns seen in invoice text.
We will use an AI tool to generate the JSON required to render invoice images, which are then fed into the training pipeline. Several helper methods are available for this:
Inspectors
We will use the AI tool to impart different patterns, e.g. repeating line descriptions, multiple countries of origin, or the same part number used for different line descriptions. This is an iterative process. The following methods can be used to examine the JSON data generated this way before it is fed into the model, along with the invoice images, for training.
flatten_json_items
flatten_json_items (path, rows=50)
Load JSON and concatenate all ‘items’ lists into one DataFrame.
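The source gives only the signature and one-line description, so the following is a minimal sketch of what such a helper might look like, not the library's actual implementation. It is named `flatten_json_items_sketch` to keep it distinct from the real function. It assumes the JSON file holds either a single invoice object or a list of invoice objects, that each invoice keeps its line items under an `items` key, and that `rows` caps the number of rows returned; all three are assumptions.

```python
import json

import pandas as pd


def flatten_json_items_sketch(path, rows=50):
    """Hypothetical sketch: load JSON and stack every 'items' list into one DataFrame."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

    # Assumption: the top level is one invoice dict or a list of invoice dicts.
    invoices = data if isinstance(data, list) else [data]

    # Concatenate the line items of every invoice; invoices without 'items' are skipped.
    items = []
    for invoice in invoices:
        items.extend(invoice.get("items", []))

    df = pd.DataFrame(items)

    # Assumption: `rows` limits how many rows are returned for inspection.
    return df.head(rows)
```

With a frame like this, the patterns described above could be checked with ordinary pandas operations, for example grouping by a part-number column and counting distinct descriptions. The column names depend on the generated JSON and are not specified in the source.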