Blueprint

Andrey Mishchenko, Adrian Blue, Dominique Danco, Abhilash Jindal

doi:10.14778/3554821.3554836

Abstract

Blueprint is a declarative domain-specific language for document extraction. Users describe document layout using spatial, textual, semantic, and numerical fuzzy constraints, and the language runtime extracts the field-value mappings that best satisfy the constraints in a given document. We used Blueprint to develop several document extraction solutions in a commercial setting. This approach to the extraction problem proved powerful: concise Blueprint programs achieved good accuracy on a broad set of use cases. However, a major goal of our work was to build a system that non-experts, and in particular non-engineers, could use effectively, and we found that writing declarative fuzzy constraint-based extraction programs was not intuitive for many users: becoming effective required a large up-front learning investment, and debugging was often challenging. To address these issues, we developed a no-code IDE for Blueprint, called Studio, as well as program synthesis functionality for automatically generating Blueprint programs from training data, which could be created by labeling document samples in our IDE. Overall, the IDE significantly improved the Blueprint development experience and the results users were able to achieve. In this paper, we discuss the design, implementation, and deployment of Blueprint and Studio. We compare our system with a state-of-the-art deep-learning-based extraction tool and show that our system can achieve comparable accuracy, with comparable development time, for appropriately chosen use cases, while providing better interpretability and debuggability.
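
To make the idea of fuzzy constraint-based extraction concrete, here is a minimal Python sketch. It is not Blueprint syntax and not the authors' runtime: the names `Word`, `text_is`, `right_of`, `is_amount`, and `extract` are hypothetical, and the scoring rule (sum of per-constraint scores in [0, 1], maximized by exhaustive search) is an assumption chosen for illustration only.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # left edge, in page units
    y: float  # top edge, in page units


# A constraint scores a candidate assignment (field name -> Word) in [0, 1].
Constraint = Callable[[Dict[str, Word]], float]


def text_is(field: str, expected: str) -> Constraint:
    """Textual constraint: 1.0 on a case-insensitive match, else 0.0."""
    return lambda a: 1.0 if a[field].text.lower() == expected.lower() else 0.0


def right_of(anchor: str, target: str, tolerance: float = 10.0) -> Constraint:
    """Spatial constraint: target is right of anchor, on roughly the same row.

    The score decays linearly with vertical offset (the "fuzzy" part).
    """
    def score(a: Dict[str, Word]) -> float:
        if a[target].x <= a[anchor].x:
            return 0.0
        dy = abs(a[target].y - a[anchor].y)
        return max(0.0, 1.0 - dy / tolerance)
    return score


def is_amount(field: str) -> Constraint:
    """Numerical constraint: the text parses as a number."""
    def score(a: Dict[str, Word]) -> float:
        try:
            float(a[field].text.replace(",", "").lstrip("$"))
            return 1.0
        except ValueError:
            return 0.0
    return score


def extract(
    words: List[Word], fields: List[str], constraints: List[Constraint]
) -> Tuple[Dict[str, Word], float]:
    """Return the field -> word assignment with the highest total score.

    Exhaustive search over all assignments; fine for a toy page only.
    """
    best_score, best = -1.0, {}
    for combo in permutations(words, len(fields)):
        assignment = dict(zip(fields, combo))
        total = sum(c(assignment) for c in constraints)
        if total > best_score:
            best_score, best = total, assignment
    return best, best_score


if __name__ == "__main__":
    page = [
        Word("Invoice", 10, 10),
        Word("2022", 120, 10),
        Word("Total", 10, 100),
        Word("$1,250.00", 120, 102),
    ]
    rules = [
        text_is("total_label", "total"),
        right_of("total_label", "total_value"),
        is_amount("total_value"),
    ]
    result, _ = extract(page, ["total_label", "total_value"], rules)
    print({name: word.text for name, word in result.items()})
```

In this toy page, the amount sits two units below the label's row, so the spatial constraint is only partly satisfied, yet that assignment still scores highest overall. Exhaustive search grows factorially with the number of fields, so a practical runtime would need a more efficient search strategy than this sketch uses.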

Journal: Proceedings of the VLDB Endowment
Publication Date: Aug 1, 2022
Citations: 1
