Dig Into Data Mining

What's your favorite brand of toilet paper? How about deodorant?

Some stores, especially online retailers, don't have to ask. They already have the answers they're looking for, helping them sell more products. They know what you buy, in what amount, at what price, how you pay, and when you are most likely to come back to restock your supplies.

Moreover, they know who you are, the best ways to deliver advertising to you, and what else you are likely to buy at the same time you buy toilet paper or deodorant.

While the specific examples above may not cause envy among law enforcement, the fact that retailers have better analytical capacities than most law enforcement agencies should.

This fact frustrates Colleen "Kelly" McCue, a senior research scientist at RTI International, a non-profit research institute.

"In law enforcement, if you do your analysis wrong, you can compromise public safety," she says.

Before joining RTI International, McCue was program manager for the Richmond (Virginia) Police Department Crime Analysis Unit, where she pioneered the use of data mining and predictive analysis.

Data mining, also referred to as predictive analytics (or analysis), sense making or knowledge discovery, involves the systematic analysis of large data sets using automated methods, she explains. Wanting to help the law enforcement community learn more about data mining, she wrote "Data Mining and Predictive Analysis."

McCue is hopeful data mining will become more widespread in law enforcement, because she says it is within the grasp of agencies of all sizes and at all levels. In fact, she says agencies are already data mining to some extent in investigations (determining motive is one example), but they also can use data mining to predict and prevent criminal acts.

Today, a big emphasis is placed on counting crime, on counting what has already happened, she says.

"One of the things data mining and predictive analytics allows us to do is move from counting crime to anticipating, preventing and perhaps responding more effectively to it," she says. "We can focus on what we consider to be an effective use of our information and how we want to manage our resources and fight crime. If it is counting crime, that's great. But we know criminal behavior tends to be relatively predictable. By exploiting the data, we can be much more proactive in anticipating and preventing crime than we are now."

The importance of analysis
Data that means nothing in one case could solve another.

"All law enforcement data is very important," says Steve McCraw, director of homeland security in Texas. "A parking ticket, for example, could be a valuable lead in a conspiracy investigation being worked on a series of robberies."

Overall, law enforcement has become very good at collecting and compiling data, especially since the advent of computerized records management systems. Regional sharing initiatives and state-level fusion centers add to the data that individual agencies can tap into. National law enforcement data-sharing standards help make this possible.

While information sharing initiatives certainly are beneficial, McCue says "don't stop there." Once data is collected in a meaningful fashion, the next step is analysis, she notes.

Unfortunately, McCue adds, the importance of analysis is not universally understood today.

Yet, she says the process of analyzing the data is important to:

  1. confirm what you already know and,
  2. discover new information or relationships in data (knowledge discovery).

Jay Albanese, graduate director of criminal justice at Virginia Commonwealth University, says police need information more than ever before and it is increasingly difficult to obtain.

The rate at which police solve major crimes has been dropping nationwide over the past 10 to 15 years, he says. One reason is that there are more complicated crimes, affiliated with terrorists, organized crime or ethnic minorities, where language can be a barrier.

Data mining technology
Once the value of analysis is understood, McCraw says the question is "How do you sift through the data and find the key elements that can help prevent an act of terrorism or crime?"

Law enforcement chiefs, sheriffs and other managers want to work smarter, cheaper and faster, says McCraw, a former FBI assistant director of the Office of Intelligence.

"The way to do that is to do what the private industry has done and take advantage of the tremendous gains in information technology," he says, noting law enforcement should adopt the National Information Exchange Model for its records management systems.

"You want to be able to empower your personnel with the ability to find points of information they previously couldn't — and to find the links, the associations between data sets. That's very powerful."

Timely information also is key.

"You want to be able to exploit the data in your files as quickly as possible," he adds.

If it takes a week to show a supervisor the crimes that took place in one night, the information is dated; it's not as useful as showing a supervisor last night's crimes, says Albanese, former chief of the International Center for the National Institute of Justice.

"The longer the time lag between the incident and being able to get it into a useable form, the less useful it is," he says, noting reports should be electronically entered (not handwritten) so data can be included in analysis and acted on quickly.

Using an analytical overlay or filter with remote data entry, an investigator could enter relevant information while at a crime scene and receive a rapid analytical response, McCue says.

Specialized databases can be created for crime or intelligence analysis. These databases might be offense-specific, such as a homicide or robbery database, or associated with a pattern of crimes. Records management databases generally were not made for analysis. Rather, McCue says they were created for case management and general crime counting.

Unfortunately, analytical software is not inexpensive and software specifically for data mining and predictive analytics falls into the high end of the price range, McCue points out. Agencies sharing information could benefit from pooling their financial resources for data analysis. Predictive analytics requires specialized software. Other data mining can be done without sophisticated software, but, she adds, "the software really helps."

Link analysis tools, used to identify relationships in data, such as telephone calls, can be an economical point of entry into data mining, she suggests.
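As a rough illustration of what link analysis on telephone calls involves, the sketch below counts how often pairs of numbers are in contact and how many distinct contacts each number has. It is a minimal sketch, not a description of any commercial tool: the call records, the phone numbers and the function names are all made up for the example.

```python
from collections import Counter

# Hypothetical call records as (caller, recipient). All numbers are invented.
calls = [
    ("555-0101", "555-0102"),
    ("555-0102", "555-0101"),
    ("555-0101", "555-0103"),
    ("555-0104", "555-0103"),
    ("555-0101", "555-0102"),
]


def count_links(records):
    """Count calls between each pair of numbers, ignoring call direction."""
    links = Counter()
    for caller, recipient in records:
        pair = tuple(sorted((caller, recipient)))
        links[pair] += 1
    return links


def contacts_per_number(links):
    """Count how many distinct numbers each number is linked to."""
    contacts = Counter()
    for first, second in links:
        contacts[first] += 1
        contacts[second] += 1
    return contacts


links = count_links(calls)
contacts = contacts_per_number(links)

# Strongest relationships first, then the best-connected numbers.
for pair, call_count in links.most_common():
    print(pair, call_count)
for number, contact_count in contacts.most_common():
    print(number, contact_count)
```

Pairs with many calls and numbers with many distinct contacts are the kind of relationships an analyst might look at first.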

Natural data miners
With today's friendly, commercial-off-the-shelf software packages, McCue believes most agencies are capable of analyzing their own data.

In fact, she says investigators and crime analysts are natural data miners. Based on her experience, she says it's far easier to teach them how to use data mining tools and apply them to law enforcement than it is to teach statisticians how to work in law enforcement.

For those who are somewhat afraid of numbers and run an incalculable distance at the mention of "statistics," McCue offers comfort: "Data mining is an intuitive process. It's not statistics."

What is important is knowing:

  • what questions you want to answer,
  • what you need to analyze the data and
  • what you need the output to look like.

There also are rules of the road to avoid errors in analysis, but McCue says they're not very difficult.

"I think it's incredibly important that law enforcement agencies get over the fear and trepidation and technophobia or whatever they might have, and analyze their own data," McCue says. "Particularly in a specific department or region, agencies are going to have the tacit knowledge and domain expertise, and understand their data better than anyone else. I can't go in and learn a community to the depth they already know. They are going to have that domain or subject matter expertise on their community and department that's going to be necessary to evaluate the results and operationalize them effectively."

Albanese points out the New York City Police Department's CompStat, now used by a number of other agencies, is essentially an exercise in data mining. "It's looking more carefully, more systematically at the information police are already collecting," he says. "It's looking at reported crimes and different areas of the city, plotting them on maps, looking at trends, looking at the allocation of police around the city, looking for hotspots."
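The hotspot idea Albanese describes can be sketched in a few lines: divide the map into grid cells and count reported incidents per cell. This is only an illustrative sketch and not how CompStat itself is implemented; the coordinates, the cell size and the threshold are assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical incident locations in arbitrary map units (invented data).
incidents = [
    (1.2, 3.4), (1.7, 3.9), (1.1, 3.1),
    (5.5, 0.2), (5.9, 0.8),
    (8.0, 8.0),
]

CELL_SIZE = 1.0  # assumed grid cell width, in the same map units


def grid_cell(x, y, cell_size=CELL_SIZE):
    """Return the (column, row) of the grid cell containing the point."""
    return (int(x // cell_size), int(y // cell_size))


def find_hotspots(points, min_count=2):
    """Return cells with at least min_count incidents, busiest first."""
    counts = Counter(grid_cell(x, y) for x, y in points)
    return [(cell, n) for cell, n in counts.most_common() if n >= min_count]


hotspots = find_hotspots(incidents)
for cell, n in hotspots:
    print(f"cell {cell}: {n} incidents")
```

A real analysis would also consider time of day, offense type and how officers are currently allocated, as the quote suggests.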

Law enforcement also can use data mining to marshal community support for crime prevention. Armed with data about trends and patterns, police can turn to businesses, school groups and others, and show where help is needed.

"If a lot of theft activity is taking place near a mall, it only makes sense the shopping mall share responsibility for the efforts to prevent crime there," he says.

Long-term, he says, "We want to prevent crime, and crime prevention is really everybody's responsibility."

Addressing critics
Police managers who understand data mining can in turn educate the public about data mining and its benefits, as well as address critics.

"Police managers, command staff and public officials always need to be sensitive to public perceptions about how they do business," McCue says. "There's a move toward transparent government. People want to know how we do things, how we analyze data, what data we're looking at."

Working with the city council, legislators or an agency's oversight group is important when technology is upgraded, McCraw says, because it helps alleviate presumptions and misinformation.

Data mining is not an abusive technique to spy on citizens, says McCraw, who testified before Congress on the subject during his tenure with the FBI.

"It's using information technology to locate the information that you need among data you already have," he emphasizes.

Data mining is an analytical process. "The same rules that have always applied to legally permissible means of accessing data are always going to apply," McCue says.

Other criticisms of data mining are that it doesn't work and wastes resources.

"I think they are absolutely wrong," she adds. "We found it does work. When data mining is done by someone who knows data mining, and understands the limitations of law enforcement data and the analytical outcomes sought — or works with someone who does — data mining reduces errors."

While with the Richmond PD, McCue used data mining to reduce gunfire complaints by almost 50 percent on New Year's Eve 2003 and increase the number of illegal weapons seized by 246 percent from the previous year, while using fewer officers.

Some data mining tasks are more difficult than others. Very infrequent events, for example, are difficult to model.

"That is where I think it becomes really important law enforcement personnel do the analysis themselves or participate very actively in the analysis," she says.

Although measures are taken to reduce errors, they still happen, as they do with anything.

McCue uses a medical analogy as a reminder that not all errors in law enforcement are equal.

As long as a disease is identified effectively, screening tools are allowed a certain number of errors, or false positives. Yet, there are other situations in which there is no room for error. If someone who is ill is given the wrong antibiotic, the illness might not only go uncured, it could worsen.

Again, people doing data mining must work closely with people who understand law enforcement and criminal behavior so they can make informed decisions about the nature of the errors, which errors are acceptable and which are not, she says.

"Maybe if you put officers in the wrong location, they spend a night in the cold," she says. "That's not necessarily a big deal."

But if you're using data mining to determine motive and you make an error, she says, the misdirected resources can cause a crime to remain unsolved.

In her book, McCue gives the example of a model that is 97 percent accurate simply because it always predicts crime will not take place in a certain low-crime area. That is unacceptable, she says.
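The arithmetic behind that example is easy to reproduce. The sketch below assumes 100 hypothetical time periods in a low-crime area, with crime occurring in three of them; the data and function names are invented for illustration and are not taken from McCue's book. A model that always predicts "no crime" scores 97 percent on accuracy while catching none of the actual crimes.

```python
# Hypothetical data: 1 means a crime occurred in that period, 0 means none.
actual = [1] * 3 + [0] * 97

# A "model" that always predicts no crime.
predicted = [0] * 100


def accuracy(actual, predicted):
    """Share of periods where the prediction matched what happened."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def recall(actual, predicted):
    """Share of actual crimes that the model predicted."""
    true_positives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    actual_positives = sum(actual)
    return true_positives / actual_positives if actual_positives else 0.0


print(f"accuracy: {accuracy(actual, predicted):.0%}")
print(f"crimes caught: {recall(actual, predicted):.0%}")
```

Looking at which errors the model makes, rather than at overall accuracy alone, is what exposes the problem.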

"Getting inside the nature of the errors and making informed decisions is key," she says.

Predicting the need for predictive analysis
Once law enforcement agencies start looking at data mining, they realize that in many ways they're already doing it, she says.

Determining motive in violent crimes is one example she gives: "It's setting up decision trees: Was the victim at high risk or was the victim not at high risk? Was the victim killed in the location she or he was found, or was the victim moved? Was it a crime of opportunity?"
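The questions in that quote can be written down as a small decision tree. The sketch below is only an illustration of the structure: the order of the questions, the field names and the leaf labels ("hypothesis A" and so on) are placeholders invented for the example, not McCue's actual model or real investigative conclusions.

```python
def leaf(label):
    """A terminal node holding a placeholder label."""
    return {"label": label}


def node(question, yes, no):
    """An internal node: a yes/no question with a branch for each answer."""
    return {"question": question, "yes": yes, "no": no}


# Hypothetical tree built from the three questions in the quote.
tree = node(
    "victim_high_risk",
    yes=node("victim_moved", yes=leaf("hypothesis A"), no=leaf("hypothesis B")),
    no=node("crime_of_opportunity", yes=leaf("hypothesis C"), no=leaf("hypothesis D")),
)


def classify(tree, facts):
    """Follow the answers in facts down the tree; return the label and the path taken."""
    current = tree
    path = []
    while "question" in current:
        question = current["question"]
        answer = facts[question]
        path.append((question, answer))
        current = current["yes"] if answer else current["no"]
    return current["label"], path


case = {"victim_high_risk": True, "victim_moved": False, "crime_of_opportunity": False}
label, path = classify(tree, case)
print(label, path)
```

Recording the path as well as the label keeps the reasoning visible, so an investigator can check each step.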

McCue encourages capturing and extending some of the natural data mining that's already occurring and then bringing in additional law enforcement-specific tools. While more can be done today, even more will be needed tomorrow.

"The population of the United States is at an all-time high, so the volume of crime is going to rise as population increases," Albanese says. "As the population gets more diverse, solving crimes is going to get more and more difficult. Police need all the potential tools they can find, and I think data mining is a very useful tool."

McCraw asks, "How can you not be excited about being able to identify seemingly unidentifiable points that will enable you to prevent acts of terrorism or crime or even solve crimes?"