They took syntax trees of decompiled contract functions ( https://eveem.org/json/0x06012c8cf97BEaD5deAe237070F9587f8E7... ) and ran machine learning on top of them.
This resulted in contract functions being grouped not by name but by behaviour.
Here's the outline:
https://twitter.com/willprice221/status/1104739673593835521
Here's a direct link to an example contract: http://35.198.178.99:8080/contract/0x06012c8cf97bead5deae237...
They were able to cluster various implementations of the same function (e.g. transfer), and also found that some bugs push the affected functions into separate clusters.
I think it may be one of the first analyses of contracts based on their decompiled bytecode (and not just a bag-of-words model of the assembly).
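To give a rough feel for how clustering on syntax trees can separate a buggy implementation, here is a minimal sketch. It is not the team's actual method or model: the nested-tuple tree shape, the operator names, the edge-based features and the 0.8 threshold are all assumptions made up for illustration, and the trees are not in the Eveem JSON format.

```python
# Hypothetical decompiled functions as nested tuples: (operator, child, ...).
# Leaves are plain strings (variable names), which the features ignore.
transfer_a = ("seq",
    ("require", ("ge", ("load", "balance", "caller"), "amount")),
    ("store", "balance", "caller", ("sub", ("load", "balance", "caller"), "amount")),
    ("store", "balance", "to", ("add", ("load", "balance", "to"), "amount")),
)
# Same structure, different variable names.
transfer_b = ("seq",
    ("require", ("ge", ("load", "balances", "sender"), "value")),
    ("store", "balances", "sender", ("sub", ("load", "balances", "sender"), "value")),
    ("store", "balances", "dest", ("add", ("load", "balances", "dest"), "value")),
)
# Buggy variant: the balance check is missing.
transfer_no_check = ("seq",
    ("store", "balance", "caller", ("sub", ("load", "balance", "caller"), "amount")),
    ("store", "balance", "to", ("add", ("load", "balance", "to"), "amount")),
)


def features(tree, parent="root"):
    """Collect (parent operator, child operator) edges; every leaf becomes 'leaf'."""
    if isinstance(tree, str):
        return {(parent, "leaf")}
    op, *children = tree
    edges = {(parent, op)}
    for child in children:
        edges |= features(child, op)
    return edges


def jaccard(a, b):
    """Jaccard similarity of two feature sets."""
    return len(a & b) / len(a | b)


def cluster(funcs, threshold=0.8):
    """Greedy one-pass clustering against the first member of each cluster."""
    clusters = []
    for name, tree in funcs.items():
        feats = features(tree)
        for members in clusters:
            if jaccard(feats, features(funcs[members[0]])) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


funcs = {
    "transfer_a": transfer_a,
    "transfer_b": transfer_b,
    "transfer_no_check": transfer_no_check,
}
print(cluster(funcs))
# [['transfer_a', 'transfer_b'], ['transfer_no_check']]
```

The two correct variants share every structural edge, so they land together regardless of naming, while the variant without the check loses the edges under `require` and falls below the threshold.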
Compared to analysing the original source code, this has a few benefits:
- way more data to work with (~3M contracts, versus ~200-300k with published sources)
- the code after decompilation has a form that is close to canonical: two contracts that implement similar functionality will look similar, whereas their source code can vary.
The team behind it is Will Price, Aleksey Studnev, Ankit Chiplunkar and Alexander Azarov (cc'ing them on this post).