By training, Dr. Zhu “Drew” Zhang is a computer scientist teaching at Iowa State University’s Ivy College of Business. By enthusiasm, practice and dedication, Zhang is a bit of a self-taught linguistics expert.
It started when he moved from his hometown in northern China to school in the south, where the difference in dialects was so dramatic he essentially learned a new language to function.
“I tried to mimic my dorm mates, and they kept laughing at me for three weeks, but in a few months they stopped laughing,” Zhang recalled. “There was motivation to speak the local language – I didn’t want to go shopping and need some local interpreter… that was a very mundane life goal.”
In the early 2000s, when Zhang moved to Michigan to pursue a PhD in computer science, the daily chore of practicing English – and a bit of chance – pulled him toward a professor studying computer systems that answered questions. Issuing instructions to Siri on a smartphone was still a few years off.
“That particular goal was considered the holy grail of the entire computing community, broadly speaking,” Zhang said. “There wasn’t anything close to what we have today, in terms of smartphones being able to do all kinds of things they do [today].”
Today, Zhang studies natural language processing, a subfield of artificial intelligence.
“Computers can’t quite speak human languages yet, but our research is trying to bring that dream closer,” Zhang said. “We want to make computers understand human language and speak human language.”
The challenge, Zhang said, is the ambiguity in human languages – like paraphrasing.
“Computers are not built to handle ambiguity,” Zhang said. “We have thousands of ways to say the same thing, ‘I love you’ or ‘I hate you’. As human beings, we’re extremely good at exploiting the subtlety of language.”
Zhang is working with ISU doctoral student Amulya Gupta to develop computational models based on deep learning that improve the accuracy of computers' language comprehension. Consumer product reviews proved a key source of insight for the systems Zhang and Gupta are training, and led to the development of a new model.
“We say, ‘hey, here is a pair of paraphrases, find some patterns.’ ‘Here is a pair of non-paraphrases, find some patterns in it.’ At a very high level, that’s how that algorithm works,” Zhang said. “It’s a machine-learning algorithm that is trained over a large number of sentence pairs. Half of those are paraphrases, half of those are not paraphrases.”
The pair tested the model on 50,000 sample sentences and found the algorithm to be 80 to 85 percent accurate at identifying patterns. Zhang and Gupta recently presented their research at the annual meeting of the Association for Computational Linguistics in Australia.
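The training setup Zhang describes, labeled sentence pairs split evenly between paraphrases and non-paraphrases, can be illustrated with a toy sketch. This is not Zhang and Gupta's deep learning model: it swaps in a simple word-overlap score and learns only a single cutoff from the examples. The sentence pairs and all function names here are made up for illustration.

```python
# Toy paraphrase detector: NOT the deep learning model from the research,
# just a word-overlap baseline trained on balanced, labeled sentence pairs.

def tokens(sentence):
    """Lowercase the sentence and split it into a set of words."""
    return set(sentence.lower().split())

def overlap(a, b):
    """Jaccard similarity: shared words divided by all distinct words."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def accuracy(pairs, threshold):
    """Fraction of pairs whose predicted label matches the true label."""
    correct = sum((overlap(a, b) >= threshold) == label for a, b, label in pairs)
    return correct / len(pairs)

def train_threshold(pairs):
    """Pick the overlap cutoff that classifies the training pairs best."""
    candidates = sorted({overlap(a, b) for a, b, _ in pairs})
    return max(candidates, key=lambda t: accuracy(pairs, t))

def is_paraphrase(a, b, threshold):
    """Predict whether two sentences say the same thing."""
    return overlap(a, b) >= threshold

# Hypothetical training data: half paraphrases (True), half not (False).
TRAIN_PAIRS = [
    ("the battery lasts all day", "the battery lasts the whole day", True),
    ("shipping was very fast", "the shipping was fast", True),
    ("the battery lasts all day", "the screen cracked in a week", False),
    ("shipping was very fast", "the fabric feels cheap", False),
]

threshold = train_threshold(TRAIN_PAIRS)
print(is_paraphrase("the battery lasts all day", "battery lasts all day", threshold))
print(is_paraphrase("the fabric feels cheap", "shipping was very fast", threshold))
```

A baseline like this fails on exactly the cases Zhang highlights: two sentences that share almost no words can still mean the same thing, which is why the researchers turn to models that learn richer patterns from many more pairs.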
Zhang’s goals have grown since he was grocery-shopping in a new town as a young student. Today, Zhang believes the near-term challenge in his field will be building a computer capable of conversation in human language – ambiguity and all.
“Computers are still quite bad at doing these things. We’re far from being there,” Zhang said. But, “this line of tech has advanced quite a lot in the last five years.”