How does a retailer tease out, from among the millions of web browsers, which online visitors are actually going to spend money on its site?
That’s a timely problem at this time of year: single-day online sales topped $1 billion for the first time on Cyber Monday, the Monday after Thanksgiving.
Or how can scientists and regulators predict from a molecular stew of emissions whether a particular exhaust cloud comes from a car or a truck? Or how can bankers and Realtors predict from among thousands of homes in a particular market which ones are likely to be refinanced — and what they would sell for?
Each of those data-crunching questions has been a challenge laid down to computer science students around the globe by Minneapolis-based FICO in conjunction with the Computer Science Department at the University of California-San Diego. The two organizations recently announced the winners of their seventh annual UCSD-FICO Data Mining Contest.
FICO, formerly known as Fair Isaac, is best known by consumers for its FICO credit score. The company also bills itself as the leader in predictive analytics that aim to help corporate clients make better-informed decisions across a host of corporate functions.
Only two of the prize winners were from the United States: graduate student Jianfei Wu from North Dakota State University and an undergraduate from Duke who won in two separate categories. The rest of the winners came from India (four) and Korea (two), with one each from Ireland, New Zealand and Russia.
“Predictive analytics is a rapidly growing field that is changing the way businesses make decisions and develop strategies,” said Dr. Andrew Jennings, FICO’s chief research officer and the head of FICO Labs. “Once again, UCSD put on a great competition that encouraged students to utilize their creativity as well as their training in predictive analytics to solve the same type of problems that the world’s biggest companies face every day.”
The contest, which started with 20 teams from UCSD in 2004, has grown to more than 140 contestants from six continents this year. “We wouldn’t have predicted that [growth] but there is talent everywhere,” observed Professor Charles Elkan of UCSD’s Department of Computer Science and Engineering.
One reason more overseas students compete and win, Elkan believes, is that students outside the United States don’t have the same opportunity as American students “to show their stuff” at their own universities. “They have to do this in order to stand out,” he said. “Americans are less motivated to participate.”
Students with data mining expertise are in very high demand, he added. “Any business that wants to target customers more closely, or understand the behavior of customers, suppliers and employees,” uses data mining techniques. “Pretty much every business can be more efficient by making more accurate predictions,” he said.
This year, contestants were given anonymous data for more than 130,000 consumers. The data contained no personally identifiable information but did include details such as ZIP codes, which could be classified as urban or rural, for example. Based on that data, competitors built models to predict which consumers were most likely to buy products online. Winners were able to predict the likelihood of future purchases 68 percent of the time, Elkan said.
The hardest part of running the contest is getting the data set, he added. “For almost any data set, there are problems of privacy and anonymity. We work very hard to make sure everything we do respects [the research subjects’] privacy.”
The competition was divided into graduate and undergraduate divisions and worked with two data sets, one with raw data and the other with “transformed” data that had received some preliminary work. The top three finishers in each category and each division shared $10,000 in cash prizes.