Big Data Anomalies

Why anomalies matter

Most uses of analytic technology fall into three categories:

  • Find problems, and help fix them.
  • Find opportunities, and help exploit them.
  • Monitor to see whether there are any problems or opportunities that need attention.

Data anomalies are central to all three.

For example, in internet marketing:

  • Data may arrive from a large number of internal and external systems, each with its own technology stack. The sooner you surface evidence of a malfunction, the sooner you can get it fixed.
  • Internet fraud can be hugely expensive, and it can be caught only through "tells" that distinguish it from genuine activity.
  • If you notice and analyze a small, localized increase in sales, you might find a way to multiply the effect.

Many enterprises face such challenges and opportunities.


Anomalies in big data

The big data era has introduced new challenges in anomaly identification and management.

  • Traditional anomaly detection typically looks for known data patterns, or for changes in pre-specified metrics. But the variety, variability and complexity of big data make such techniques inadequate on their own.
  • The alternative to known-pattern matching is to compare data to other data. But naive strategies of this kind compare every record with every other record, so their processing burden grows at least quadratically with data volume.
  • Velocity requirements have also increased, to the point that anomaly detection may soon need to be done at streaming speeds.
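To make the cost of naive "compare data to other data" concrete, here is a minimal sketch of one such strategy, chosen for illustration rather than taken from any specific product: a distance-based outlier score in which each record is scored by its distance to its k-th nearest neighbor. The function name `knn_outlier_scores` and its parameter `k` are hypothetical. Because every pair of points is compared, the number of distance computations is n(n-1)/2.

```python
import math
from typing import List, Sequence, Tuple


def knn_outlier_scores(
    points: Sequence[Sequence[float]], k: int = 1
) -> Tuple[List[float], int]:
    """Hypothetical naive outlier score: distance to the k-th nearest neighbor.

    Returns (scores, comparisons), where comparisons counts the pairwise
    distance computations performed: n * (n - 1) / 2.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # All-pairs distance matrix: this double loop is the quadratic part.
    dists = [[0.0] * n for _ in range(n)]
    comparisons = 0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])  # Euclidean distance (Python 3.8+)
            dists[i][j] = dists[j][i] = d
            comparisons += 1

    # A point far from even its nearest neighbors gets a high score.
    scores = []
    for i in range(n):
        others = sorted(dists[i][j] for j in range(n) if j != i)
        scores.append(others[k - 1])
    return scores, comparisons


# Usage: four clustered points and one obvious outlier.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores, comparisons = knn_outlier_scores(data, k=1)
print(comparisons)                  # 10
print(scores.index(max(scores)))    # 4, the point at (10, 10)
```

The arithmetic shows why this does not scale: 1,000 records need 499,500 comparisons, and doubling to 2,000 records needs 1,999,000, roughly four times as many. Practical systems therefore rely on indexing, sampling, or approximate methods rather than exhaustive comparison.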

The tools used to visualize and analyze big data anomalies must also cope with these challenges.