Going with the Data Flow
by Eric Bender. Published at Massachusetts Institute of Technology ILP News Office
Composable Analytics aims to make data analytics faster, friendlier and more flexible.
Complex business problems call for complex data analytics, and the complexities keep climbing. Analysts now bring back data from an ever-broadening collection of data sets, many of them massive. These disparate data must be painstakingly cleaned, extracted and integrated, and the results must be analyzed thoughtfully and presented understandably. Unsurprisingly, traditional tools such as spreadsheets often fail to pull all this together, and programming for customized analytics is often slow, expensive and inflexible.
Composable Analytics, a startup firm that spun out of MIT Lincoln Laboratory, visualizes this problem differently, quite literally, since its business intelligence platform lets analysts visually design their own workflows for gathering and manipulating data from a series of sources.
Today, data-driven organizations often collect their raw information from email or FTP downloads or third-party databases or other wildly diverse sources, says Andy Vidan, Composable co-founder and chief executive officer of the Cambridge, Massachusetts firm. “Typically a single analyst will grab that data, transform it, cleanse it and push it into a proprietary data warehouse,” he says. “We automate that process with a data flow approach, which liberates the staff to focus more on the analytics than on the process of bringing in the data.”
The Composable platform lets analysts author an application’s data flow by stringing together analytic modules: core building blocks that do everything from ingesting data to cleaning the data, performing analyses and visualizing the results. Optical character recognition (OCR) modules, for example, can pull text out of very complicated, multi-page, unstructured document image files.
A business user can create an analytical flow in which one module extracts typed text, another module extracts other areas of the document with check boxes or handwritten notes, a third module provides OCR for those components, and finally another module joins all the results together and pushes them into a final database or spreadsheet. “Once you’ve designed that data flow and executed it on one document, the system can scale out to hundreds or thousands or millions of documents,” says Lars Fiedler, Composable co-founder and chief technology officer. “And the time it takes to actually implement one of these distributed systems is much lower [than for a traditional system]. You just focus on your particular domain of information—on the document and the information you need to extract.”
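To make the module-chaining idea concrete, here is a minimal Python sketch of a data flow built from small steps and then applied to many documents. It is an illustration only, not Composable's API: every function name, the dictionary record format and the "|" separator between typed and handwritten content are assumptions invented for the example, and the "OCR" step is a stand-in that merely upper-cases text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, str]
Module = Callable[[Record], Record]


def extract_typed_text(doc: Record) -> Record:
    # Hypothetical module: take the part before "|" as the typed text.
    typed, _, _ = doc["raw"].partition("|")
    return {**doc, "typed": typed.strip()}


def extract_marked_regions(doc: Record) -> Record:
    # Hypothetical module: take the part after "|" as check boxes or notes.
    _, _, marked = doc["raw"].partition("|")
    return {**doc, "marked": marked.strip()}


def ocr_marked_regions(doc: Record) -> Record:
    # Stand-in for OCR: upper-case the marked region (not real recognition).
    return {**doc, "marked_text": doc["marked"].upper()}


def join_results(doc: Record) -> Record:
    # Final module: keep only the fields destined for a database or spreadsheet.
    return {"id": doc["id"], "typed": doc["typed"], "notes": doc["marked_text"]}


def compose(*modules: Module) -> Module:
    # Chain the modules so each one receives the previous module's output.
    def flow(doc: Record) -> Record:
        for module in modules:
            doc = module(doc)
        return doc
    return flow


document_flow = compose(
    extract_typed_text, extract_marked_regions, ocr_marked_regions, join_results
)


def run_on_many(flow: Module, docs: List[Record], workers: int = 4) -> List[Record]:
    # Apply the same flow to every document; results keep the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(flow, docs))
```

The flow is designed and checked once on a single record, for example `document_flow({"id": "a1", "raw": "Name: Ann | [x] approved"})`, and the same object is then handed to `run_on_many` for a whole batch. A thread pool is only a simple local substitute for the distributed execution Fiedler describes.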
Fiedler, a software developer, and Vidan, a physicist, met while working on distributed systems and big data projects for homeland security and defense applications at Lincoln Laboratory, a Department of Defense R&D Center at MIT. Fiedler was lead architect on the data analytics project that evolved into the Composable platform.
Early on, Fiedler and Vidan recognized that an army of analysts, in different organizations and different roles, all were trying to solve basically the same problems of sifting through floods of incoming information. “We decided to create a platform that allows them to do that very seamlessly, from ingesting the data and massaging it all the way up to visualizing that information in an intuitive way,” he says.
At Lincoln Laboratory, research scientists often work directly with users and see how their discoveries are applied. “That really fosters the idea of creating technology to create impact,” Vidan comments. “To take that one step further, if you have a piece of technology that’s innovative and disruptive, you may want to take it out to an even wider audience in the private sector.”
Given the entrepreneurial environment of the Laboratory and MIT in general, forming a startup was a logical next step, and the two launched Composable in 2014. The new company benefited strongly from the Laboratory’s focus on creating operational prototype systems. “We left the Laboratory with an operational product in hand,” Vidan says, “which allowed us to go directly to customers to learn about their specific needs and customize our technology for them.”
One of the startup’s first applications was inbound marketing for a small financial firm. “Using our system, we effectively grew their business by ten times in about three months, by ingesting massive amounts of data, cleansing it, and extracting and prioritizing their customer relationships,” Fiedler says.
Composable now is finding many other potential applications in financial firms, such as reconciling reports, bills, orders or trades. For instance, “you might think that linking bank accounts would be very easy, but in reality there are many different ways to reference accounts, and so the reconciliation process can be very difficult,” Fiedler says. “In our platform, you can develop matching algorithms to link all this financial data together.”
Another early application challenge came from a large insurance company that was trying to figure out how to handle hundreds of thousands of lab reports, doctor’s notes and other medical records. A streamlined Composable process for transcribing all these into digital form allowed the insurance firm to run better analyses that helped to tailor its policies. “Our platform can be applied across many industries, with very similar repeatable problems that can be solved with a more streamlined approach to data analytics,” Vidan says.
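The account-linking problem Fiedler describes, in which the same account is referenced in many different ways, can be illustrated with a very small matching routine in Python. This is a sketch under stated assumptions and not the algorithm used in the Composable platform: the function names are invented, and the normalization rule (drop punctuation and spaces, ignore case and leading zeros) is just one plausible convention. Real reconciliation usually needs fuzzier matching and human review.

```python
import re
from collections import defaultdict
from typing import Dict, List


def normalize_account(ref: str) -> str:
    # Assumed rule: keep letters and digits only, upper-case the result,
    # and drop leading zeros so "00123-456" and "123 456" compare equal.
    cleaned = re.sub(r"[^0-9A-Za-z]", "", ref).upper()
    return cleaned.lstrip("0")


def link_accounts(references: List[str]) -> Dict[str, List[str]]:
    # Group the raw references that share the same normalized key.
    groups: Dict[str, List[str]] = defaultdict(list)
    for ref in references:
        groups[normalize_account(ref)].append(ref)
    return dict(groups)
```

Calling `link_accounts(["00123-456", "123 456", "acct-9", "ACCT 9"])` groups the four spellings under two keys, one per underlying account.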
The Composable platform is designed to be self-service, so business users can run advanced analytical applications on their own, starting by downloading the software or trying a cloud-based version. Additionally, “it is a very flexible system and you can always plug in your custom code for your business needs and your workflows,” says Fiedler.
Another platform benefit is its ability to speed up analysis by providing “just-in-time analytics” rather than traditional batch-processed analytics. In a batch processing system, the analyst typically sets up a data warehouse and a single analytical query, which then goes out, processes the data and maybe a few hours later provides an answer, Vidan explains. But in the Composable platform, because the steps within the application are handled separately by the various analytic modules, each module can run on its own as soon as the data it needs are available. “With just-in-time analytics, you can query your data multiple times in a very fast and agile manner, get your answers, realize whether the question you’re asking is really the right question, and query your system again,” he says.
Just as crucial is the software’s support for collaboration. “Our platform is all about sharing your work and your analytical data flows,” says Fiedler. “An analyst can share his or her data flow application with other people in the organization who can clone it, adjust it to their own needs and build on top of it.” Unlike applications such as Microsoft Excel, where customization is tightly coupled to specific data sources, the Composable platform allows users to develop a data flow that is separated from the data, and easily share that data flow with colleagues.
Composable is constantly introducing innovations into the platform, re-inventing the concept of self-service business intelligence.
One such example is an approach to predictive analytics, in which “the system can understand the right analysis that should be done on a given type of data and assist the user in designing analytical applications,” says Vidan.
Today, a relatively small number of “digital master” firms are effectively leveraging their business intelligence systems to increase overall corporate productivity and profitability, Vidan says. “Our goal is to allow other companies to get on that same page and do the same analytical magic,” he adds. “We’re working with companies that can really visualize how data analytics can transform their businesses and disrupt their sectors.”