Richard A. Berk and Susan B. Sorenson
Mass violence, almost no matter how it is defined, is (thankfully) rare. Rare events are very difficult to study systematically: standard statistical procedures can fail badly, and usefully accurate forecasts of rare events are often little more than an aspiration. We offer an unconventional approach to the statistical analysis of rare events, illustrated by an extensive case study. The research we report seeks to learn about the attributes of very-high-risk perpetrators of intimate partner violence (IPV) and about the circumstances of their IPV incidents reported to the police. Very high risk is defined as having a high probability of committing a repeat IPV assault in which the victim is injured. Such individuals are a very small fraction of all IPV perpetrators, so these acts of violence are relatively rare. To learn about them nevertheless, we apply three algorithms sequentially, in a novel fashion, to data collected from a large metropolitan police department: stochastic gradient boosting, a genetic algorithm inspired by natural selection, and agglomerative clustering. We try to characterize not just perpetrators who on balance are predicted to re-offend, but those who are very likely to re-offend in a manner that injures the victim. The approach offers important lessons for forecasting mass violence.