Active Data Profiling

Nestlogic's approach to big data anomaly management is called Active Data Profiling (ADP), and is embodied in the Nestlogic ADP product line. The core assumptions behind ADP include:

  • Important anomalies will likely be found in segments of data, rather than individual records.
  • There are many variables or values that can delineate significant data segments, such as time, geography, technical indicators about internet clients, or business indicators about a system's actual users.
  • Data profiles for different segments should be compared in as many ways as is practical, so as to maximize the chance of uncovering anomalies.

In line with those principles:

  • Nestlogic ADP profiles (i.e., models) big datasets and streams.
  • Anomalies are data segments whose profiles deviate greatly from what the models suggest.
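
To make the idea of segment-level profiling concrete, here is a minimal Python sketch. It is not Nestlogic's algorithm, which is not described here; it only illustrates the general pattern of profiling each segment and flagging segments whose profile deviates strongly from the rest. The function names, the record fields (`region`, `latency_ms`), the choice of count and mean as the profile, and the modified z-score threshold of 3.5 are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, median


def profile_segments(records, segment_key, value_key):
    """Group records by one segment variable and build a simple profile
    (record count and mean value) for each segment."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[segment_key]].append(rec[value_key])
    return {
        seg: {"count": len(vals), "mean": mean(vals)}
        for seg, vals in groups.items()
    }


def anomalous_segments(profiles, stat="mean", threshold=3.5):
    """Return segments whose statistic deviates strongly from the median
    across all segments, using a modified z-score based on the median
    absolute deviation (MAD). The threshold is an illustrative choice."""
    values = [p[stat] for p in profiles.values()]
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # All segments look (nearly) identical; nothing to flag.
        return []
    return sorted(
        seg
        for seg, p in profiles.items()
        if 0.6745 * abs(p[stat] - center) / mad > threshold
    )


# Hypothetical sample data: request latency by region.
records = [
    {"region": "us", "latency_ms": 99}, {"region": "us", "latency_ms": 101},
    {"region": "eu", "latency_ms": 103}, {"region": "eu", "latency_ms": 105},
    {"region": "asia", "latency_ms": 97}, {"region": "asia", "latency_ms": 99},
    {"region": "sa", "latency_ms": 101}, {"region": "sa", "latency_ms": 103},
    {"region": "africa", "latency_ms": 390}, {"region": "africa", "latency_ms": 410},
]

profiles = profile_segments(records, "region", "latency_ms")
print(anomalous_segments(profiles))  # ['africa']
```

No single record here needs to be extreme on its own terms; the segment as a whole is flagged because its profile differs from the other segments' profiles. In the spirit of the third assumption above, the same two functions could be run over several segment variables (time bucket, client type, and so on) and several statistics.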

 

Nestlogic ADP

There are three main parts to Nestlogic ADP.

  • A stack of standard big data platform packages, including Hadoop, Spark, Kafka and others. (Nestlogic's team has extensive experience operating and using such technologies.)
  • Our algorithms for Active Data Profiling, which are designed to combine breadth, precision and performance.
  • Tools to see, analyze and share the anomalies discovered.