Configuring Anomaly Detection

Anomaly detection uses a built-in machine learning algorithm to discover hidden trends and outliers in your data without the need for manual analysis or custom development. Anomalies are data points that fall outside the overall pattern of distribution. Identifying the causes of anomalies and their correlations can help you make data-driven decisions.
You can customize the anomaly detection settings by specifying your own parameters and adding key driver analysis.
To configure anomaly detection in an insight:
  1. Click the Dashboards tab.
  2. Click Dashboard Designer in the bottom-left corner.
  3. On the Analyses dialog, click New analysis, or click the Options button next to an existing analysis and select Edit.
  4. On the analysis page, add a suggested or a custom anomaly detection insight.

    Add at least one date, one measure, and one dimension to the insight. You can add up to five dimension fields to the Categories field well, provided that they are not calculated fields.

    For more information on creating insights, see Adding Suggested Insights or Adding Custom Insights.

    Note: You can also add an anomaly detection insight by clicking the ML-powered insight notification button, which is displayed for visuals where an anomaly or key drivers analysis opportunity is identified. Follow the prompts to set up anomaly detection based on the data from the visual.

  5. To configure the anomaly detection settings, click the Menu options button in the corner of the insight and select Configure anomaly.
  6. In the Set up anomaly detection dialog, configure the anomaly detection settings.
    1. For Combinations to be analyzed, select the method for analyzing the combinations of fields in the Categories field well:
      • Hierarchical: Analyzes the fields hierarchically. For example, if you chose a date (T), a measure (N), and three dimension categories (C1, C2, and C3), the fields are analyzed as:
        T-N, T-C1-N, T-C1-C2-N, T-C1-C2-C3-N
      • Exact: Analyzes only the exact combination of fields in the Categories field well, in the order that they are listed. For example, if you chose a date (T), a measure (N), and three dimension categories (C1, C2, and C3), the fields are analyzed as:
        T-C1-C2-C3-N
      • All: Analyzes all the field combinations in the Categories field well. For example, if you chose a date (T), a measure (N), and three dimension categories (C1, C2, and C3), the fields are analyzed as:
        T-N, T-C1-N, T-C1-C2-N, T-C1-C2-C3-N, T-C1-C3-N, T-C2-N, T-C2-C3-N, T-C3-N
      If you chose only a date and a measure, the fields are analyzed by date and then by measure. The Fields to be analyzed section displays the list of fields from the field wells, for reference.
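The three combination methods can be illustrated with a short Python sketch. This is not the product's implementation; the function name `field_combinations` and the mode strings are hypothetical, chosen only to reproduce the T/N/C1..C3 examples listed above.

```python
from itertools import combinations


def field_combinations(mode, categories, time="T", measure="N"):
    """Return the field groupings analyzed for a combination method.

    `mode` is one of "hierarchical", "exact", or "all" (hypothetical
    identifiers that mirror the option names in the dialog).
    """
    if mode == "hierarchical":
        # Every prefix of the category list: [], [C1], [C1, C2], ...
        groups = [list(categories[:i]) for i in range(len(categories) + 1)]
    elif mode == "exact":
        # Only the full list, in the order given.
        groups = [list(categories)]
    elif mode == "all":
        # Every order-preserving subset of the category list.
        groups = [
            list(subset)
            for size in range(len(categories) + 1)
            for subset in combinations(categories, size)
        ]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ["-".join([time, *group, measure]) for group in groups]


if __name__ == "__main__":
    for mode in ("hierarchical", "exact", "all"):
        print(mode, field_combinations(mode, ["C1", "C2", "C3"]))
```

With three categories, Hierarchical yields four groupings, Exact yields one, and All yields eight, so the All method can increase the number of analyzed combinations quickly as categories are added.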
    2. For Name, enter a descriptive alphanumeric name with no spaces.
      This name is used as the name of the computation. If you edit the narrative that automatically displays in the insight, you can use the specified name to identify this computation.
    3. In the Display options section, customize what is displayed in the insight.
      These options are available:
      • Maximum number of anomalies to show: The number of outliers that you want to display in the insight.
      • Severity: The minimum level of severity for anomalies that you want to display in the insight.

        A level of severity is a range of anomaly scores that is characterized by the lowest actual anomaly score included in the range. All anomalies that score higher are included in the range. If you set the severity to Low and above, the insight displays all the anomalies that rank between low and very high. If you set the severity to Very high, the insight displays only the anomalies that have the highest anomaly scores.

      • Direction: The direction on the x-axis or y-axis that you want to identify as anomalous.

        The default option, [ALL], identifies all anomalous values, high and low. You can select Higher than expected or Lower than expected to identify only higher values or only lower values as anomalies.

      • Delta: The minimum deviation used to identify anomalies. Any amount higher than the threshold value counts as an anomaly.

        You can set an absolute value as the threshold. For example, if you enter 48, values are identified as anomalous when the difference between the value and the expected value is greater than 48. You can also set a percentage threshold. For example, if you enter 12.5%, values are identified as anomalous when the difference between the value and the expected value is greater than 12.5%.

      • Sort by: The sort method applied to the anomaly detection results.

        These options are available:

        • Weighted anomaly score: The anomaly score multiplied by the logarithm of the absolute value of the difference between the actual value and the expected value. This score is always a positive number.
        • Anomaly score: The actual anomaly score assigned to this data point.
        • Weighted difference from expected value: The anomaly score multiplied by the difference between the actual value and the expected value. This is the default option.
        • Difference from expected value: The actual difference between the actual value and the expected value.
        • Actual value: The actual value with no formula applied.
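The sort methods above can be read as simple formulas. The following Python sketch computes each one for a few made-up data points. The `Anomaly` class, the method identifiers, and the descending sort order are assumptions for illustration, and the logarithm base is not stated in this topic, so the natural logarithm is used.

```python
import math
from dataclasses import dataclass


@dataclass
class Anomaly:
    label: str
    score: float      # anomaly score assigned to the data point
    actual: float
    expected: float


def sort_key(anomaly, method):
    """Compute the value that a sort method ranks by (hypothetical names)."""
    diff = anomaly.actual - anomaly.expected
    if method == "weighted_anomaly_score":
        # Assumption: natural log. log(0) is undefined, so rank a zero
        # difference last.
        return anomaly.score * math.log(abs(diff)) if diff != 0 else float("-inf")
    if method == "anomaly_score":
        return anomaly.score
    if method == "weighted_difference":
        return anomaly.score * diff
    if method == "difference":
        return diff
    if method == "actual_value":
        return anomaly.actual
    raise ValueError(f"unknown sort method: {method}")


def sort_anomalies(anomalies, method="weighted_difference"):
    # Assumption: the largest value is listed first.
    return sorted(anomalies, key=lambda a: sort_key(a, method), reverse=True)


if __name__ == "__main__":
    data = [
        Anomaly("A", score=0.9, actual=110, expected=100),
        Anomaly("B", score=0.5, actual=200, expected=100),
        Anomaly("C", score=0.8, actual=50, expected=100),
    ]
    for method in ("weighted_difference", "anomaly_score", "weighted_anomaly_score"):
        print(method, [a.label for a in sort_anomalies(data, method)])
```

The example shows why the choice matters: the same three points are ranked differently depending on whether the sort emphasizes the score, the size of the deviation, or both.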

      Note: You can still explore all the results, regardless of what the insight displays.
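As a rough illustration of how the Severity, Direction, Delta, and Maximum number of anomalies settings narrow what the insight displays, consider the following Python sketch. All names here are hypothetical. The intermediate severity levels ("Medium" and "High") and the choice to measure a percentage delta relative to the expected value are assumptions, because this topic names only the Low and Very high levels and does not define the percentage base.

```python
from dataclasses import dataclass

# Assumption: only "Low" and "Very high" are named in this topic; the
# middle levels are placeholders for illustration.
SEVERITY_ORDER = ["Low", "Medium", "High", "Very high"]


@dataclass
class Anomaly:
    severity: str
    actual: float
    expected: float


def passes(anomaly, min_severity="Low", direction="ALL",
           delta=0.0, delta_is_percent=False):
    """Return True if the anomaly satisfies the display settings."""
    # Severity: keep the chosen level and everything above it.
    if SEVERITY_ORDER.index(anomaly.severity) < SEVERITY_ORDER.index(min_severity):
        return False

    diff = anomaly.actual - anomaly.expected
    # Direction: "ALL", "HIGHER" (than expected), or "LOWER" (than expected).
    if direction == "HIGHER" and diff <= 0:
        return False
    if direction == "LOWER" and diff >= 0:
        return False

    # Delta: absolute threshold, or percentage of the expected value (assumed).
    if delta_is_percent:
        if anomaly.expected == 0:
            return False  # percentage deviation is undefined; sketch skips it
        deviation = abs(diff) / abs(anomaly.expected) * 100
    else:
        deviation = abs(diff)
    return deviation > delta


def displayed(anomalies, max_count, **settings):
    """Apply the filters, then cap the result at the maximum count."""
    return [a for a in anomalies if passes(a, **settings)][:max_count]


if __name__ == "__main__":
    points = [
        Anomaly("Low", 150, 100),
        Anomaly("Very high", 88, 100),
        Anomaly("High", 113, 100),
    ]
    print(displayed(points, max_count=5, delta=12.5, delta_is_percent=True))
```

With a 12.5% delta, the point that is 12% below its expected value is filtered out, while the points that deviate by 50% and 13% are kept.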

    4. In the Schedule and alert options section, set the schedule for automatically running the insight recalculation.
      The schedule runs only for published dashboards. In the analysis, you can run it manually as needed.

      These options are available:

      • Occurrence: How often the recalculation is run. You can set it to run every hour, every day, every week, or every month.
      • Start schedule on: The date and time to start running this schedule.
      • Timezone: The time zone that the schedule runs in. To view the list of time zones, delete the current entry.
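To make the interaction between Occurrence and Start schedule on concrete, the following Python sketch lists the first few run times for a schedule. It is an approximation under stated assumptions: the function names and occurrence strings are hypothetical, month-end dates are clamped to the last day of shorter months (the product's actual behavior is not described here), and a fixed UTC offset stands in for a named time zone, so daylight saving changes are ignored.

```python
import calendar
from datetime import datetime, timedelta, timezone

STEP = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def add_months(start, months):
    """Shift a datetime by whole months, clamping the day if needed."""
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return start.replace(year=year, month=month, day=min(start.day, last_day))


def upcoming_runs(start, occurrence, count):
    """Return the first `count` run times, beginning at `start`."""
    if occurrence == "monthly":
        return [add_months(start, i) for i in range(count)]
    if occurrence in STEP:
        return [start + i * STEP[occurrence] for i in range(count)]
    raise ValueError(f"unknown occurrence: {occurrence}")


if __name__ == "__main__":
    start = datetime(2024, 1, 31, 9, 0, tzinfo=timezone.utc)
    for run in upcoming_runs(start, "monthly", 3):
        print(run.isoformat())
```

Computing each monthly run from the original start date, rather than from the previous run, keeps a schedule that starts on the 31st from drifting to the 28th or 29th after February.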

    5. In the Top contributors section, configure settings for analyzing the key drivers when an anomaly, or outlier, is detected.
      For example, you can see the top locations that contributed to a spike in printer throughput.

      Select the fields for the contribution analysis under Select fields. You can select up to four dimensions from your dataset, including dimensions that are not added to the field wells of the insight.

  7. Click Save.
  8. To run the anomaly detection, click the Run now button inside the insight.
    The time that anomaly detection takes to complete depends on how many unique data points you are analyzing. The process can take from a few minutes to a few hours. While processing runs in the background, you can still work with the analysis.

    Make sure that you wait for the processing to complete before you change the configuration, edit the narrative, or open the Explore anomalies page for the insight. The anomaly detection needs to run at least once before you can see results. If you think the status might be out of date, try refreshing the web browser page.

    Anomaly detection insights display different options or messages depending on the processing status:

    Option or message                          Status
    Run now button                             The job has not started yet.
    Analyzing for anomalies... message         The job is currently running.
    Narrative about the detected anomalies     The job has run successfully. The message also says when the insight calculation was last updated.
    Alert icon with an exclamation point (!)   There was an error during the last run.

    If the narrative is still displayed after an error, you can click Explore anomalies to use data from the previous successful run.