Integrated development environments

Prediction: Companies will soon be able to develop and deploy production-grade natural language understanding (NLU) systems without ever leaving a managed graphical environment purpose-built for that task.

Any technology in its early days is like a messy workshop contraption, covered in oil and duct tape and decorated with oddly attached components. Once the community converges on a common set of knowledge and practices, a new class of tools is built with the benefit of that hindsight, and the work becomes streamlined.

Across the document extraction world, there is evidence of this convergence taking place:

  • Multi-task models allow a common modeling architecture to be used across many different types of problems.
  • Multilingual models allow a single model instance to handle inputs in multiple languages.
  • Pre-training and fine-tuning allow a single model instance to serve as the base for a family of special-purpose fine-tuned models.
  • Less focus on linguistically derived model design encourages convergence on standard building blocks that are independent of any particular task, which means less bespoke engineering and lower project risk.

All of these trends indicate that we're on our way to Henry Ford-style efficiency in the document extraction world. Production systems that once required a thousand separate bespoke parts will become standardized around a small set of common, generally useful practices.

At the product level, this trend will show up as a growing number of integrated development environments (IDEs) targeted at document extraction. Companies willing to accept the standard set of components and practices being converged upon will be able to pursue highly sophisticated projects with the ease and support of a visual development environment: upload a few documents, highlight the results you want extracted, and click the "Train" button.