DataOps for the Analytical Workflow
Why DataOps for improving the analytical workflow?
While every data-driven organization embraces an analytical workflow, those that hope to become “digital masters,” as George Westerman of MIT calls them, are quickly adopting a DataOps approach.
The analytical workflow is the process in which an organization’s “data professionals,” that is, data scientists, data engineers, and business analysts, work together to identify, cleanse, mine, filter, pivot, and exploit data for insights and business value. The analytical workflow typically starts with a challenge or problem in a business process, and a hypothesis about its cause. For example, in the airline industry, the problem may be a delay in take-off times, and the hypothesized reason may be a recent change in airline boarding practices. Several steps are needed to identify the root cause, and typically it is not a lock-step process; rather, it involves multiple iterations. The steps in this iterative process include: identification, preparation, development, deployment, execution, and adjustment.
Working through this use case, we can identify how DataOps, enabled by Composable Enterprise, can improve the efficiency of the team of data professionals by reducing the latency between and within each of these steps.
Identifying and getting access to the source data for analysis can be an undertaking in itself for large distributed organizations. In the airline example, this may include getting access to flight log times, check-in records, and customer complaint data. And while you would think most data feeds would already have been identified in an organization, most problems are unanticipated and require data from sources not normally used.
During this process, the identified source data sometimes turns out not to be the data actually required to fully investigate the problem. This often becomes clear only after diving deep into the data, which can be a very time-consuming process. For example, the hypothesis may change, and it is now thought that maintenance issues are causing the delays. Maintenance records will need to be pulled to continue the investigation.
Composable Enterprise provides capabilities for all data professionals to share and reuse data views. While the data may be new to the analyst, most likely another analyst has investigated a previous problem that required the dataset. The data professionals can take the live view and use it within their dataflow application by simply dragging and dropping the component.
Data is never pretty. It’s hairy, covered with warts, and downright nasty sometimes … and data is never in the format data professionals want it in. The problems are commonplace: dates and times in different time zones, null and 0 values, strings vs. numerics, unstructured text, and contradictions in the data. This is where the majority of the time is typically spent. And sometimes preparing the data takes so long that the results become meaningless.
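To make these preparation problems concrete, here is a minimal sketch in plain Python. It is not Composable Enterprise code; the flight numbers, field names, and the helper functions `to_utc` and `parse_delay` are hypothetical and exist only for illustration. It normalizes local timestamps to UTC and keeps a missing delay distinct from a genuine delay of 0.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def to_utc(local_iso: str, utc_offset_hours: float) -> datetime:
    """Attach a known UTC offset to a naive local timestamp and convert it to UTC."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromisoformat(local_iso).replace(tzinfo=tz).astimezone(timezone.utc)


def parse_delay(raw) -> Optional[float]:
    """Return the delay in minutes, keeping 'missing' (None) distinct from a real 0."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text == "" or text.lower() in {"null", "n/a", "na"}:
        return None
    try:
        return float(text)
    except ValueError:
        return None  # unparseable text is treated as missing, not as zero


# Hypothetical raw rows: mixed offsets, a string number, a real zero, and a missing value.
raw_rows = [
    {"flight": "CE100", "scheduled": "2024-03-01T08:00:00", "utc_offset": -5, "delay_min": "12"},
    {"flight": "CE200", "scheduled": "2024-03-01T14:00:00", "utc_offset": 1, "delay_min": 0},
    {"flight": "CE300", "scheduled": "2024-03-01T22:30:00", "utc_offset": 9, "delay_min": "N/A"},
]

clean_rows = [
    {
        "flight": r["flight"],
        "scheduled_utc": to_utc(r["scheduled"], r["utc_offset"]),
        "delay_min": parse_delay(r["delay_min"]),
    }
    for r in raw_rows
]
```

Treating unparseable values as missing rather than as 0 is a design choice made for this sketch; a real analysis might instead route such rows to a review queue.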
Composable Enterprise provides capabilities to cleanse and filter the data with a dataflow methodology. This allows the data professionals to view the intermediate data at each step in the preparation process. This inspection alone can uncover problems within the systems and the overall business process that may not have been the original target of the analysis. In addition, multiple datasets can be joined together by connecting components using the dataflow language. The result is substantial time savings, and the cleansing processes can be shared and reused in future analyses.
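The idea of inspecting intermediate data at each step can be sketched in plain Python. This illustrates the general dataflow pattern only, under the assumption that each step is a function from rows to rows; it does not show Composable Enterprise's components or API, and `run_dataflow`, the step functions, and the sample data are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Step = Callable[[List[Row]], List[Row]]


def run_dataflow(rows: List[Row], steps: List[Step]) -> Dict[str, List[Row]]:
    """Run steps in order and keep every intermediate result, keyed by step name."""
    snapshots: Dict[str, List[Row]] = {"source": rows}
    current = rows
    for step in steps:
        current = step(current)
        snapshots[step.__name__] = current
    return snapshots


# Hypothetical inputs: flight delays plus a second dataset of late check-ins per flight.
flights = [
    {"flight": "CE100", "delay_min": 42.0},
    {"flight": "CE200", "delay_min": None},
    {"flight": "CE300", "delay_min": 5.0},
]
late_checkins = {"CE100": 17, "CE300": 2}


def drop_missing_delay(rows: List[Row]) -> List[Row]:
    return [r for r in rows if r["delay_min"] is not None]


def join_checkins(rows: List[Row]) -> List[Row]:
    # Join the second dataset on the flight number.
    return [{**r, "late_checkins": late_checkins.get(r["flight"])} for r in rows]


def only_delayed(rows: List[Row]) -> List[Row]:
    return [r for r in rows if r["delay_min"] > 15]


snapshots = run_dataflow(flights, [drop_missing_delay, join_checkins, only_delayed])
for name, rows in snapshots.items():
    print(name, len(rows))
```

Because every snapshot is retained, an analyst can see exactly where rows were dropped or joined, which is the property the dataflow approach relies on.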
There may be multiple paths a data professional can take to find the root cause. They may group the data by plane model, carrier, or geographic region. They may also want to compare departures over time against the deployment of new practices, weather, or seasonality trends. Each path involves several steps in the data processing chain. Unfortunately, building new and accurate analytic processes takes far too much time and too many resources.
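As a small illustration of exploring several of these paths side by side, the following sketch groups the same departures by different keys. The data, carrier names, and the `mean_delay_by` helper are hypothetical and chosen only to show the pattern.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical departure records.
departures = [
    {"carrier": "Carrier A", "model": "A320", "region": "East", "delay_min": 30},
    {"carrier": "Carrier A", "model": "B737", "region": "West", "delay_min": 10},
    {"carrier": "Carrier B", "model": "A320", "region": "East", "delay_min": 50},
    {"carrier": "Carrier B", "model": "B737", "region": "East", "delay_min": 20},
]


def mean_delay_by(rows, key):
    """Average delay in minutes for each distinct value of the given key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["delay_min"])
    return {value: mean(delays) for value, delays in groups.items()}


# Three analysis branches over the same data, ready for side-by-side comparison.
by_model = mean_delay_by(departures, "model")
by_carrier = mean_delay_by(departures, "carrier")
by_region = mean_delay_by(departures, "region")
```

Keeping the grouping key as a parameter means a new branch of the analysis costs one line rather than a new script.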
Composable Enterprise leverages a dataflow methodology for developing analytics. Users can string together reusable components (queries, filters, statistical functions). Multiple applications can be created, or multiple analysis branches within the same application can be developed for side-by-side comparisons of results. A visual mapping of how the data is being processed and joined is critical to extracting insight.
If interesting results are found through the creation process, the analytic may be deemed important enough to be ‘built in’ to the current business practice. Data professionals often use Matlab, R, SAS, and other statistical packages to do the heavy mathematical modeling. However, there is a disconnect between these technologies and production systems. Developers, who normally don’t use the analytical packages, receive the requirements and rewrite the methods to fit the production environment. In some circumstances, deployment takes longer than the initial development.
Composable Enterprise is a robust solution, created for a “multi-stakeholder” user base within an organization, that allows analytical techniques to be written and deployed within production systems. There is no need to rewrite the ‘gold standard’ or validated statistical methods for use in production, a step that would cause delays and could introduce bugs. An application can be authored, tested, and deployed in the environment by specific users who have permission. This lowers the bar tremendously for deploying new capabilities within an enterprise.
For large datasets, there may be long delays in execution time. It is also common that not all data professionals have the hardware resources available in production, which leaves unknowns about how a model will execute.
Composable Enterprise parallelizes the execution of the analytical process, resulting in a tremendous amount of time savings. Execution of the analytics occurs in a shared environment, which makes it very easy to execute another user’s work. There is no need to download all the required dependencies, build, and adjust configuration and connection strings. Analytics can be executed with the click of a button, and their parameters adjusted just as easily. It’s literally that easy.
Because data and processes are constantly in flux, analytics can become obsolete quickly. Adjustments will need to be made. For example, once a bottleneck in the airplane departures is found, another plaguing delay may be found elsewhere.
Within Composable Enterprise, analytical methods can be very easily modified. The system blurs the line between configuration adjustments and newly developed methods, because components within an application can be easily swapped out and reconfigured.
Embracing and implementing DataOps with Composable Enterprise shortens the analytic development cycle by integrating the whole process from start to end. By using a dataflow methodology, data professionals organize data flows without the need to learn another programming language, allowing analysts to become very efficient. Not only is the authoring of analytics efficient, but the execution is as well. Composable Enterprise parallelizes the execution steps of analytics by analyzing the dependencies between components.
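The general technique of parallelizing a dataflow by analyzing dependencies between components can be sketched as follows. This is an illustrative approximation using Python's standard library (`graphlib.TopologicalSorter`, Python 3.9+), not a description of how Composable Enterprise schedules work internally; the component names and the `run_component` and `execute` functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dataflow: each component maps to the set of components it depends on.
dependencies = {
    "load_flights": set(),
    "load_checkins": set(),
    "clean_flights": {"load_flights"},
    "join": {"clean_flights", "load_checkins"},
    "report": {"join"},
}


def run_component(name: str) -> str:
    """Stand-in for real work such as a query, a filter, or a statistical function."""
    return f"{name}: done"


def execute(graph):
    """Run components in waves; everything in one wave has all of its inputs ready."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves, results = [], {}
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())
            # Components in the same wave do not depend on each other,
            # so they can safely run concurrently.
            for name, result in zip(ready, pool.map(run_component, ready)):
                results[name] = result
            sorter.done(*ready)
            waves.append(ready)
    return waves, results


waves, results = execute(dependencies)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

In this sketch the two load components run in the same wave because neither depends on the other. Waiting for a whole wave to finish before starting the next keeps the example short; a production scheduler would typically start each component as soon as its own inputs are complete.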