Rachel Greenstadt

September 18, 2015 at 12:00 PM in 380 Soda Hall

Title: De-anonymizing Programmers via Code Stylometry

Abstract: Source code authorship attribution is a significant privacy threat to anonymous code contributors. However, it may also enable attribution of successful attacks from code left behind on an infected system, or aid in resolving copyright, copyleft, and plagiarism issues in the programming fields. In this work, we investigate machine learning methods to de-anonymize source code authors of C/C++ using coding style. Our Code Stylometry Feature Set is a novel representation of coding style that draws on properties derived from abstract syntax trees. Our random forest and abstract syntax tree-based approach attributes more authors (1,600 and 250) with significantly higher accuracy (94% and 98%, respectively) on a larger data set (Google Code Jam) than has been previously achieved. Furthermore, these novel features are robust, difficult to obfuscate, and can be used in other programming languages, such as Python. We also find that (i) the code resulting from difficult programming tasks is easier to attribute than the code from easier tasks and (ii) skilled programmers (who can complete the more difficult tasks) are easier to attribute than less skilled programmers.
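
As a rough illustration of what abstract-syntax-tree-derived style features can look like, the Python sketch below counts how often each AST node type appears in a snippet and measures the tree's depth. It is not the Code Stylometry Feature Set described in the abstract: the function names `ast_node_frequencies` and `max_ast_depth` and the two sample snippets are hypothetical, it parses Python rather than C/C++, and the classification step (a random forest in the abstract) is omitted.

```python
import ast
from collections import Counter
from typing import Dict


def ast_node_frequencies(source: str) -> Dict[str, float]:
    """Relative frequency of each AST node type in `source` (hypothetical helper)."""
    tree = ast.parse(source)
    counts = Counter(type(node).__name__ for node in ast.walk(tree))
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def max_ast_depth(source: str) -> int:
    """Depth of the deepest node in the AST, counting the root as 1 (hypothetical helper)."""
    def depth(node: ast.AST) -> int:
        children = ast.iter_child_nodes(node)
        return 1 + max((depth(child) for child in children), default=0)
    return depth(ast.parse(source))


# Two made-up snippets that compute the same sum in different styles.
sample_a = "total = 0\nfor i in range(10):\n    total += i\n"
sample_b = "total = sum(i for i in range(10))\n"

features_a = ast_node_frequencies(sample_a)
features_b = ast_node_frequencies(sample_b)

# The explicit loop produces a `For` node; the generator version does not.
print(sorted(set(features_a) - set(features_b)))
print(sorted(set(features_b) - set(features_a)))
```

The two snippets behave identically but yield different node-type profiles, which is the kind of signal a classifier could learn from once many such feature vectors are collected per author.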

Bio: Rachel Greenstadt is an Associate Professor of Computer Science at Drexel University, where she researches the privacy and security properties of intelligent systems and the economics of electronic privacy and information security. Her work is at "layer 8" of the network: analyzing the content. She is a member of the DARPA Computer Science Study Group and runs the Privacy, Security, and Automation Laboratory (PSAL), a vibrant group of ten researchers. The privacy research community has recognized her scholarship with the PET Award for Outstanding Research in Privacy Enhancing Technologies, the NSF CAREER Award, and the Andreas Pfitzmann Best Student Paper Award. She is currently visiting Berkeley while on sabbatical and will be around until December.

Security Lab