Unlock Hidden Data with Composable Enterprise
Meaningful and relevant information is often buried in images, scanned documents and PDFs. As a result, significant and valuable data remains “hidden” and inaccessible to automated data analysis. Unfortunately, off-the-shelf optical character recognition (OCR) and text extraction systems are typically inadequate, resulting in solutions that do not scale or are error-prone. Although custom solutions can be built, this approach can be complicated and time-consuming, and often requires significant capital investment.
Composable Enterprise provides an intelligent, accurate and scalable analytics environment for extracting hidden information found in documents and images. Built with a composable architecture that enables abstraction and integration of any software or analytical approach, Composable Enterprise can use state-of-the-art OCR libraries, image processing routines, and text analytics interchangeably to perform the necessary data extraction with high accuracy and at scale.
Composable Enterprise aims to simplify and automate big data analysis, where big data is defined as all data, regardless of format or structure.
Customer Experience: Insurance Sector
Composable Analytics, Inc. has partnered with one of the largest US life insurance companies, known for its forward-thinking approach to underwriting, life insurance and long-term care. This approach drives the firm to continuously launch new offerings and products that are data-driven and, where possible, based on Internet of Things (IoT) data collection. Transforming insurance underwriting, marketing and operations requires the use of advanced analytics and the availability of high-quality historical data. While most insurance firms have massive amounts of historical data, given their long-running lines of business and rigorous reporting requirements, much of that data is in paper format.
The underwriting, marketing and business analytics teams were intent on unlocking data from hundreds of thousands of scanned internal documents. The scanned documents represented a treasure trove of information for the firm’s underwriting and marketing efforts. If this hidden data could be extracted, cleansed and stored in a well-formed, queryable relational data model, analysts could use it in predictive analytics models, further improving underwriting, marketing and operations. After experiencing several pain points with off-the-shelf solutions, and estimating that a custom-built solution would require significant capital investment, the firm deployed Composable Enterprise on its on-premises cloud infrastructure and has processed over 500,000 documents to date.
Composable Enterprise, with its powerful data ingestion, processing, enrichment and distribution architecture, allowed the firm’s analysts to develop a fully tuned dataflow that ingests the individual document files, parses out specific page sections for either text extraction or image processing, cleanses the extracted information, enriches the data with external information and, finally, inserts the extracted data into a relational database management system.
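The shape of such a dataflow can be illustrated with a minimal Python sketch. This is not Composable Enterprise's API: the `Document` class, the stage functions, the `STATE_NAMES` lookup table and the `extracted` table layout are all hypothetical, and the `raw_text` field stands in for the output of a real OCR step. The sketch only shows the idea of chaining interchangeable stages and loading the result into a relational store (here, an in-memory SQLite database).

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Document:
    name: str
    raw_text: str  # stands in for text produced by an OCR step (assumption)
    fields: Dict[str, str] = field(default_factory=dict)


# A stage is any function that takes a document and returns a document,
# so stages can be swapped or reordered without changing the runner.
Stage = Callable[[Document], Document]


def parse(doc: Document) -> Document:
    """Split 'key: value' lines into named fields."""
    for line in doc.raw_text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            doc.fields[key.strip().lower()] = value
    return doc


def cleanse(doc: Document) -> Document:
    """Collapse runs of whitespace and trim each extracted value."""
    doc.fields = {k: " ".join(v.split()) for k, v in doc.fields.items()}
    return doc


# Hypothetical external reference data used for enrichment.
STATE_NAMES = {"MA": "Massachusetts", "NY": "New York"}


def enrich(doc: Document) -> Document:
    """Add a full state name when the state code is recognized."""
    state = doc.fields.get("state")
    if state in STATE_NAMES:
        doc.fields["state_name"] = STATE_NAMES[state]
    return doc


def run_dataflow(docs: Iterable[Document], stages: List[Stage],
                 conn: sqlite3.Connection) -> int:
    """Run every document through the stages, insert the fields,
    and return the total number of rows in the table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted "
        "(document TEXT, field TEXT, value TEXT)"
    )
    for doc in docs:
        for stage in stages:
            doc = stage(doc)
        conn.executemany(
            "INSERT INTO extracted VALUES (?, ?, ?)",
            [(doc.name, k, v) for k, v in doc.fields.items()],
        )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM extracted").fetchone()[0]
```

Because each stage shares the same signature, a different parser or enrichment source could be dropped into the `stages` list without touching `run_dataflow`, which is the property the composable architecture described above relies on.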
Composable Enterprise was successful in processing multiple document types and formats, utilizing scalable, quality-controlled dataflows. Specifically, Composable Enterprise was able to:
- Dynamically define a relational data model for the extracted data
- Automatically prepare the document files, stored as image files (e.g., TIFF or PNG) or as PDFs, for digitization, correcting document rotation, image contrast and other attributes
- Dynamically detect structure within the document, and parse the document into sections that contain text where OCR can be used (e.g., a numerical table), or that contain markings where image processing is needed (e.g., a check box)
- Embed state-of-the-art, industry-standard OCR and image processing analytics, in a plug-and-play fashion, to achieve high accuracy, quality and performance
- Cleanse and enrich all the extracted data using external data sets, for error-correction, verification and added insights
- Structure and stage the data
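Two of the capabilities listed above, routing a section to OCR or to image processing and correcting extracted values against an external data set, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the method used in the deployment: the section kinds, the 20% dark-pixel threshold, the similarity cutoff and all function names are hypothetical, and a region is assumed to be the already-cropped interior of a check box given as rows of grayscale values (0 = black, 255 = white).

```python
import difflib
from typing import List, Sequence

Region = List[List[int]]  # grayscale pixels: 0 = black, 255 = white


def route_section(kind: str) -> str:
    """Pick a processing path for a detected section (hypothetical kinds)."""
    if kind == "table":
        return "ocr"               # text content: send to an OCR engine
    if kind == "checkbox":
        return "image_processing"  # markings: analyze the pixels directly
    return "manual_review"         # unknown structure: flag for a person


def dark_fraction(region: Region, threshold: int = 128) -> float:
    """Fraction of pixels darker than the threshold."""
    pixels = [p for row in region for p in row]
    if not pixels:
        return 0.0
    return sum(p < threshold for p in pixels) / len(pixels)


def is_checked(region: Region, min_dark: float = 0.2) -> bool:
    """Treat a box as checked when enough of its interior is dark.
    The 0.2 cutoff is an assumed value that would need tuning."""
    return dark_fraction(region) >= min_dark


def correct_with_reference(value: str, reference: Sequence[str],
                           cutoff: float = 0.8) -> str:
    """Replace an extracted value with its closest reference entry,
    or keep it unchanged when nothing is similar enough."""
    matches = difflib.get_close_matches(value, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else value
```

For example, an OCR result of `Bostan` checked against a reference list containing `Boston` would be corrected, while a value with no close match is left as extracted so that it can be verified separately.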
Business Value in Practice
Composable Enterprise has opened up new opportunities by unlocking hidden data in scanned documents. This data, coupled with predictive analytics, can now be used to market and offer insurance products with a shorter and less invasive application and underwriting process, ultimately reducing risk and cost to the firm.
Metrics demonstrating the value achieved include:
- Data Throughput: Automated streaming of thousands of documents through a data extraction dataflow in hours, not days, weeks or months (under roughly 5 seconds per page)
- Data Variety: Automated parsing of numerous image and document types, including TIFF, PNG, PDF, DOCX, XML and HTML
- Data Availability: Full-stack data extraction, cleansing, enrichment and curation
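To relate the per-page figure in the throughput metric to wall-clock time, a back-of-the-envelope estimate can be written as below. Only the 5-second upper bound per page comes from the metric above; the number of pages per document and the number of parallel workers are assumptions introduced for illustration, and `estimated_hours` is a hypothetical helper.

```python
def estimated_hours(documents: int, pages_per_document: float,
                    seconds_per_page: float = 5.0, workers: int = 1) -> float:
    """Upper-bound processing time in hours, assuming pages are
    processed independently and split evenly across workers."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    total_seconds = documents * pages_per_document * seconds_per_page
    return total_seconds / workers / 3600


# Assumed workload: 5,000 documents of 4 pages each.
single = estimated_hours(5000, 4)               # one worker
parallel = estimated_hours(5000, 4, workers=4)  # four workers
print(f"{single:.1f} h on one worker, {parallel:.1f} h on four")
```

Under these assumed numbers the estimate is about 27.8 hours on a single worker and about 6.9 hours on four, which is consistent with processing thousands of documents in hours rather than days when the dataflow is run at scale.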
Composable Enterprise’s ability to unlock hidden data from documents is not limited to the insurance sector. Current projects involve using Composable Enterprise to automatically detect and extract information captured in financial reports, medical records and government documents. Composable Enterprise excels at ingesting data from any data source, regardless of structure or format, and processing the data through a complex analytic dataflow to produce actionable insights.
Composable Analytics, Inc. builds software that enables enterprises to rapidly adopt a modern data strategy and robustly manage unlimited amounts of data. Composable Enterprise, a full-stack analytics platform with built-in services for data orchestration, automation and analytics, accelerates data engineering, preparation and analysis. Built with a composable architecture that enables abstraction and integration of any software or analytical approach, Composable Enterprise serves as a coherent analytics ecosystem for business users who want to architect data intelligence solutions that leverage disparate data sources, live feeds and event data, regardless of the amount, format or structure of the data. Composable Analytics, Inc. is a rapidly growing data intelligence start-up founded by a team of MIT technologists and entrepreneurs. For more information, visit www.composableanalytics.com.