On DataOps, the DoD, and Operationalizing Data Science
Composable Analytics CEO Andy Vidan spoke with Toph Whitmore of Blue Hill Research about the emergence of DataOps. The full transcript of the interview is posted below.
On DataOps, the DoD, and Operationalizing Data Science: Questioning Authority with Composable Analytics’ Andy Vidan
Toph Whitmore, Blue Hill Research
Andy Vidan is the CEO of Cambridge, Massachusetts-based DataOps startup Composable Analytics. He founded the company two years ago with MIT colleague Lars Fiedler. They now lead Composable—self-funded and self-sustaining, by the way—and are establishing a beachhead in the nascent DataOps space. I recently spoke with him about the genesis of his company, what it’s like to (maybe) work with the U.S. DoD, and the challenge of evangelizing DataOps to line-of-business stakeholders.
TOPH WHITMORE: Tell me about Composable Analytics.
ANDY VIDAN: Composable Analytics grew out of a project at MIT’s Lincoln Laboratory. Lincoln Lab is an MIT lab that’s focused on U.S. Department of Defense and intelligence community work. There we saw the clear need for a single platform that can ingest all types of data and feed it to an intelligence analyst. An intelligence analyst within the Department of Defense is similar to a business analyst within the private sector. They’re sophisticated. They know their subject matter well, better than software developers may ever know it. But they’re not always technical, and when they have to deal with different data sets from different systems, with different formats and different structures, they typically need to use different tools.
We wanted to develop a single ecosystem to bring in data from all sorts of sources, and present it to the user for self-service analytics. For us, Big Data always meant all data. It’s not just massive amounts of data, or highest throughput, it’s that variability that comes with all data: tabular data, and tabular data, and more tabular data, but we also have to think about image files and PDFs and sound files and so on. That’s one area that was a focus for us. The other was making data accessible to an end user who is sophisticated and knows the subject matter but is not a technical person.
TW: You and Lars Fiedler developed Composable while working at Lincoln Labs. How did Composable evolve from an MIT idea into a commercial solution?
AV: Lincoln Laboratory is a well-kept secret.
TW: With the defense department involved, it probably has to be!
AV: Yes. It’s much like a Bell Labs, or the Jet Propulsion Lab that NASA runs with Caltech. Composable Analytics was initially funded directly by the DoD. The nice thing about Lincoln Labs is that you have that user interaction. You aren’t just writing research papers; you are actually building systems and meeting with end users (in this case, intelligence analysts and operators) to really get down to requirements and get a system that they would eventually use.
TW: Does Composable Analytics still serve the Department of Defense?
AV: Yeah. So I can’t really answer the question.
TW: Good enough!
AV: Our main focus is private sector.
TW: Tell me more about the Composable Analytics technology. What value propositions do you offer to an enterprise IT leader?
AV: Three things: orchestration, automation, and analytics. To me that really embodies what’s behind DataOps. Our platform, our ecosystem provides those three things for an enterprise and for users of data within that enterprise.
A real use case: We have a customer with a call center. You might call in to change your address after a real-estate purchase. Typically, the call center agent would change the address and hang up the phone, and everybody’s happy. In this case, the enterprise wants to use that little tidbit of information you just revealed about yourself to understand what other products and services you might be interested in. The fact that you purchased a home might mean you’re willing to purchase life insurance. You might mention your wife is having a baby. That might prompt you to open an educational savings account with the company. Now think about what that takes: in real time, take the call-center recording out of the voice-over-IP system, orchestrate that process, push it into a speech-to-text engine, run a data-science text analytic on the resulting text to find key words, trigger words, etc., and then push it all back into a CRM, like Microsoft Dynamics or Salesforce, so that the call center agent can see it on your profile and talk to you about it during that call, or the next time you call. That embodies orchestration, automation, plus analytics. We’re not talking just about tabular data here. We’re talking about going from call recordings to text to tabular data in a CRM. Those are the types of media use cases we’re addressing.
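The call-center flow can be pictured as a chain of stages, each taking and returning a call record. This is a minimal sketch under stated assumptions, not Composable’s implementation: the speech-to-text step is a placeholder that treats the "audio" as UTF-8 text, the CRM is a plain dictionary rather than Dynamics or Salesforce, and the trigger table, `CallRecord`, and every function name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical trigger phrases mapped to the product an agent might raise.
TRIGGERS: Dict[str, str] = {
    "bought a house": "life insurance",
    "purchased a home": "life insurance",
    "baby": "educational savings account",
}


@dataclass
class CallRecord:
    customer_id: str
    audio: bytes  # raw recording taken out of the VoIP system
    transcript: str = ""
    offers: List[str] = field(default_factory=list)


Stage = Callable[[CallRecord], CallRecord]


def speech_to_text(rec: CallRecord) -> CallRecord:
    # Placeholder for a real speech-to-text engine: the sketch treats "audio" as UTF-8 text.
    rec.transcript = rec.audio.decode("utf-8")
    return rec


def find_trigger_words(rec: CallRecord) -> CallRecord:
    # Simple text analytic: case-insensitive substring match against the trigger table.
    text = rec.transcript.lower()
    for phrase, product in TRIGGERS.items():
        if phrase in text and product not in rec.offers:
            rec.offers.append(product)
    return rec


def make_crm_writer(crm: Dict[str, List[str]]) -> Stage:
    # Stand-in for a CRM client: appends suggested offers to the customer's profile.
    def push_to_crm(rec: CallRecord) -> CallRecord:
        crm.setdefault(rec.customer_id, []).extend(rec.offers)
        return rec
    return push_to_crm


def run_pipeline(rec: CallRecord, stages: List[Stage]) -> CallRecord:
    # Orchestration: run each stage in order, feeding one stage's output to the next.
    for stage in stages:
        rec = stage(rec)
    return rec


if __name__ == "__main__":
    crm: Dict[str, List[str]] = {}
    call = CallRecord("cust-001", b"We just purchased a home, and my wife is having a baby.")
    run_pipeline(call, [speech_to_text, find_trigger_words, make_crm_writer(crm)])
    print(crm["cust-001"])  # → ['life insurance', 'educational savings account']
```

Keeping every stage as a plain function with the same input and output type is what makes the steps composable: a new analytic can be slotted between transcription and the CRM write without touching the other stages.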
TW: It sounds like a platform play. Are you essentially offering and delivering and serving pretty much the whole data value chain from ingestion through consumption?
AV: Yes, we are, and that’s where DataOps comes into play. There’s always raw data out there. At the end of the day your business users are getting value from applications, Excel or Dynamics or Power BI or Salesforce or NetSuite, whatever it is. But there’s a whole process that happens in between the raw data getting to the high-level application that needs orchestration, automation, and analytics. That’s our play. That’s where we live. That’s what we do well.
TW: I like to talk about the enterprise conflict between IT leadership and line-of-business stakeholders like my former marketer self. Toph-the-marketing-boy wants self-service everything—data immediacy without data-administration complexity. On the other side, IT leadership is tasked with ensuring auditability, lineage, governance, security. Which side of that customer equation do you target? IT side? Business influencer? Or both?
AV: Almost always the business side.
TW: Interesting. I confess that’s not what I expected!
AV: But whatever we do has that governance layer around it, has that auditing layer around it.
We talk about operationalizing data. In many cases, organizations have invested in PhD-level analysts to develop and test data models. They typically build a one-off analytic. It works beautifully, but at that point, the model has not yet provided any value to the organization.
That little data model or data analytic must fit into a larger data workflow, one that the organization supports, that works in conjunction with IT, and that hits production databases to grab data out, pulls it into the analytic model, does the data-science computation, then potentially pushes it out again into other production databases, production CRMs, maybe into ERP systems. It’s that part, the data-workflow management, that is missing in today’s Big Data solutions. That’s where the Composable platform comes in. It allows you to connect the data sets, plug and play the analytics (that you either write or bring in from other open-source libraries), and be part of this broader operational process.
TW: You’re preaching to the converted! Enterprises need to hear the DataOps gospel. But I think most face a challenge on both the data consumption and data management sides of the house: They must overcome conflicting objectives to collaborate. Do you find that it’s difficult to evangelize collaboration to these enterprise groups?
AV: No. It’s actually easy once we’re in. When enterprises use our platform as a framework for building these operational data flow nodes, we typically have good engagement with IT leaders because they see things are sandboxed correctly.
TW: What’s deployment like?
AV: The platform itself is a native distributed application with a native cloud architecture. It can be deployed on the cloud and can scale both horizontally and vertically right on the cluster. But it doesn’t have to be on a public cloud. You can spin up an instance of Composable on AWS or Azure, but it’s not required. We can run on-prem. Back to our Department of Defense legacy: we had to be able to run on-prem, and we do that. With some of our customers, such as those in the insurance industry, the data is sensitive, and we run on a cluster behind the corporate firewall, completely disconnected from the web.
We’re a small company. The entire team is technical. I’m doing most of the business development! There’s a mismatch there, but we are growing and now focusing on bringing in good sales engineers and account executives. For deployment, we approach the people who get the most value out of data and work with them to develop a small-scale, short-term pilot, typically one or two months at most. After they see the value, they buy into Composable as a licensed delivery platform.
TW: What’s Composable’s funding situation?
AV: We were lucky enough to leave MIT with a product and customers ready and waiting. From day one—the end of 2014—we’ve been completely client-funded.
TW: Will you look to subsidize growth with outside investment?
AV: Yes. I think 2017 is the year for us. We’re reaching a point where capital will help us scale out dramatically.
TW: Where is your customer base?
AV: All regions, but predominantly domestic. We have one major customer who’s a global oil-and-gas conglomerate with operations in South America and other parts of the world.
TW: I understand you’re producing an upcoming tradeshow?
AV: Yes—the DataOps Summit conference series. The event is in June here in our hometown in Boston. More details will come online at dataopssummit.com.
We envision this to be an annual event, focused on getting all the data professionals into the same room. That’s both the business side of the house and technical audiences, like software developers, data scientists, data engineers, IT operations, quality assurance engineers, and so on.
Many enterprises have invested in data science, and developed some cool data applications, and now must figure out how to put them in an operational workflow to actually generate value! That’s what we’re trying to illustrate with this DataOps Summit series. We’ll bring in executives from the business side (financial services, insurance, oil and gas, cybersecurity, and other verticals as well) and talk about what DataOps tools, techniques, and best practices they can put together around data operations. But we’ll listen, too: the technology vendors in the room, Composable and others, can work with them on a DataOps vision that we can all build towards.
TW: Where does Composable Analytics go from here?
AV: First, democratizing data science. Enterprise data users should be able to become more and more like data scientists. Our current end users are typically sophisticated business users, but not necessarily technical. Ultimately, they know the business better than anyone else. We’re creating a framework to help these users develop their own workflows. Composable lets you create visual data flows. If you can imagine a complex data pipeline, you can create it visually, as in a flow-based programming environment. We have a machine-learning computational framework behind this that will accelerate the process for an analyst to build these workflows. As that analyst selects different modules to build up the data flow, the system will recommend the next module to add. So, machine learning is accelerating the development of new machine-learning data flows. That’s pretty cool.
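One simple way to picture next-module recommendation is a frequency model over previously built flows: count which module followed which, then suggest the most common successors of the module the analyst just placed. The interview does not say which machine-learning technique Composable uses; this first-order (Markov-style) counter is a stand-in, and the class name and module names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List


class NextModuleRecommender:
    """Hypothetical first-order model: counts which module follows which in past flows."""

    def __init__(self) -> None:
        # _follows["CleanNulls"]["TrainModel"] == times TrainModel came right after CleanNulls
        self._follows: DefaultDict[str, Counter] = defaultdict(Counter)

    def fit(self, flows: List[List[str]]) -> "NextModuleRecommender":
        for flow in flows:
            # Pair each module with its immediate successor in the flow.
            for current, nxt in zip(flow, flow[1:]):
                self._follows[current][nxt] += 1
        return self

    def recommend(self, last_module: str, k: int = 3) -> List[str]:
        # .get() avoids creating an empty entry for modules never seen before.
        counts = self._follows.get(last_module)
        if not counts:
            return []
        return [name for name, _ in counts.most_common(k)]


if __name__ == "__main__":
    past_flows = [
        ["SqlQuery", "CleanNulls", "TrainModel", "WriteToCrm"],
        ["SqlQuery", "CleanNulls", "Plot"],
        ["CsvReader", "CleanNulls", "TrainModel"],
    ]
    rec = NextModuleRecommender().fit(past_flows)
    print(rec.recommend("CleanNulls"))  # → ['TrainModel', 'Plot']
```

A production recommender would likely condition on more than the last module placed, but even this counting baseline captures the idea of learning suggestions from the data flows analysts have already built.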
Second, there’s a lot of noise out there, and we’ve seen many organizations delay data-management solution adoption. Composable started as a self-service analytics platform, but over time has become a DataOps platform with orchestration, automation, and analytics aimed at getting people out of the rat’s nest of spreadsheets and thinking about data stores. We see DataOps being this transformative notion of best practices that allow organizations to say “Okay, we can do this.” We know how to do software development. We know how to do a production system. Now, let’s bring that to the data world and start to think about production and operational data science.