Modeler

Methods for generating synthetic data sets that fit patterns seen in invoice text

We will use an AI tool to generate the JSON required to render invoice images, which are then fed into the training pipeline. Several helper methods are available for this:

Inspectors

We will use the AI tool to introduce different patterns into the data, e.g. repeating line descriptions, multiple countries of origin, and the same part number used for different line descriptions. This is an iterative process. The following methods can be used to examine the JSON data generated this way before it is fed into the model, along with the invoice images, for training.
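To make the patterns above concrete, here is a minimal sketch of how they might be detected in a list of line items. The field names (`part_no`, `description`, `country_of_origin`) and the sample values are assumptions made for illustration; the actual JSON schema is not shown in this document.

```python
from collections import Counter, defaultdict

# Hypothetical line items; field names and values are illustrative assumptions.
items = [
    {"part_no": "P-100", "description": "Steel bolt M8", "country_of_origin": "DE"},
    {"part_no": "P-100", "description": "Hex bolt, zinc plated", "country_of_origin": "CN"},
    {"part_no": "P-200", "description": "Steel bolt M8", "country_of_origin": "DE"},
]

# Pattern 1: line descriptions that occur on more than one line.
description_counts = Counter(item["description"] for item in items)
repeated_descriptions = sorted(d for d, n in description_counts.items() if n > 1)

# Pattern 2: distinct countries of origin present in the data.
origins = sorted({item["country_of_origin"] for item in items})

# Pattern 3: part numbers that appear with more than one description.
descriptions_by_part = defaultdict(set)
for item in items:
    descriptions_by_part[item["part_no"]].add(item["description"])
reused_parts = sorted(p for p, d in descriptions_by_part.items() if len(d) > 1)
```

Checks like these can be rerun after each generation round to confirm that the intended patterns are actually present.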



flatten_json_items

 flatten_json_items (path, rows=50)

Load JSON and concatenate all ‘items’ lists into one DataFrame.
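
The following is a rough sketch of what such a helper could look like, not the actual implementation. It assumes the file holds either a single invoice object or a list of them, each with an `items` list, and that `rows` caps the number of rows returned; neither assumption is confirmed by the description above. The name `flatten_json_items_sketch` and the sample fields are hypothetical.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def flatten_json_items_sketch(path, rows=50):
    """Hypothetical stand-in for flatten_json_items; not the library's code."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    # Assumption: the file holds one invoice object or a list of them.
    invoices = data if isinstance(data, list) else [data]
    # Concatenate every invoice's 'items' list into one flat list of records.
    records = [item for inv in invoices for item in inv.get("items", [])]
    # Assumption: `rows` limits how many rows are returned.
    return pd.DataFrame(records).head(rows)


# Usage with made-up sample data written to a temporary file.
sample = [
    {"invoice_id": "A", "items": [
        {"part_no": "P1", "description": "Bolt"},
        {"part_no": "P1", "description": "Hex bolt"},
    ]},
    {"invoice_id": "B", "items": [{"part_no": "P2", "description": "Bolt"}]},
]
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "invoices.json"
    sample_path.write_text(json.dumps(sample), encoding="utf-8")
    items_df = flatten_json_items_sketch(sample_path)
```

Flattening to one DataFrame makes it easy to apply ordinary pandas operations such as `value_counts` or `groupby` when checking the generated patterns.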