Fraud detection, a common use of AI, belongs to a more general class of problems — anomaly detection.
An anomaly is a generic, not domain-specific, concept. It refers to any exceptional or unexpected event in the data: a mechanical piece failure, an arrhythmic heartbeat, or a fraudulent transaction.
Basically, identifying a fraud means identifying an anomaly in the realm of a set of legitimate transactions. Like all anomalies, you can never be truly sure of the form a fraudulent transaction will take on. You need to take all possible “unknown” forms into account.
Here’s an interesting article on doing anomaly/fraud detection with a neural autoencoder.
Using a training set of just legitimate transactions, we teach a machine learning algorithm to reproduce the feature vector of each transaction. Then we perform a reality check on such a reproduction. If the distance between the original transaction and the reproduced transaction is below a given threshold, the transaction is considered legitimate; otherwise it is considered a fraud candidate (generative approach). In this case, we just need a training set of “normal” transactions, and we suspect an anomaly from the distance value.
Based on the histograms or on the box plots of the input features, a threshold can be identified. All transactions with input features beyond that threshold will be declared fraud candidates (discriminative approach). Usually, for this approach, a number of fraud and legitimate transaction examples are necessary to build the histograms or the box plots.