Computer Science Distinguished Seminar
On the Naturalness of Software, and How to Exploit It
This Distinguished Seminar has been cancelled.
When: Monday, April 27, 2020
Where: PGH 232
Time: 11:00 AM
Speaker: Dr. Prem Devanbu, University of California, Davis
Host: Dr. Amin Alipour
While natural languages are rich in vocabulary and grammatical flexibility, most
human utterances are mundane and repetitive. This repetitiveness in natural language
has led to great advances in statistical NLP methods.
At UC Davis, we discovered back in 2012 that, despite the considerable power and
flexibility of programming languages, large software corpora are actually even more
repetitive than natural-language corpora. We were the first to show that this
“naturalness” of code could be captured in statistical models and exploited within
software tools. The field has since blossomed, with numerous applications, including
de-obfuscation, code synthesis, and defect finding. New groups have formed at Facebook,
Microsoft, and Google, as well as at several startups. In this talk, we will introduce
our earlier findings, present some recent results exploring why code-in-the-wild is so
repetitive, and describe some new ways of training deep-learning models to correct
student code.
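The "naturalness" claim can be made concrete with a toy experiment. The sketch below is a minimal illustration, not the speaker's actual setup (the original work used larger n-gram models trained on lexed source code): an add-alpha-smoothed bigram language model measures per-token cross-entropy, and a repetitive token stream, like code, scores lower (i.e., is more predictable) than a stream with little repetition.

```python
from collections import Counter, defaultdict
import math

def bigram_cross_entropy(train_tokens, test_tokens, alpha=0.1):
    """Per-token cross-entropy (bits) of test_tokens under an
    add-alpha-smoothed bigram model trained on train_tokens.
    Lower values mean the test stream is more predictable."""
    vocab = set(train_tokens) | set(test_tokens)
    counts = defaultdict(Counter)
    for a, b in zip(train_tokens, train_tokens[1:]):
        counts[a][b] += 1
    total_bits, n = 0.0, 0
    for a, b in zip(test_tokens, test_tokens[1:]):
        c = counts[a]
        p = (c[b] + alpha) / (sum(c.values()) + alpha * len(vocab))
        total_bits += -math.log2(p)
        n += 1
    return total_bits / n

# A highly repetitive stream (code-like) vs. a stream of mostly
# novel tokens: the repetitive one gets much lower cross-entropy.
repetitive = "x = x + 1 ;".split() * 20
varied = [f"tok{i}" for i in range(120)]
h_code = bigram_cross_entropy(repetitive[:60], repetitive[60:])
h_rand = bigram_cross_entropy(varied[:60], varied[60:])
```

Here `h_code` comes out far below `h_rand`, mirroring (in miniature) the finding that software corpora are even more predictable to such models than natural-language corpora.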
Bio:
Prem Devanbu received his B.Tech from IIT Madras, and his Ph.D. in Computer Science from Rutgers University under Alex Borgida. After working at Bell Labs and its various offshoots in New Jersey for many years, he joined the faculty of UC Davis in 1997. He is an ACM Fellow.