Phishing attacks delivered via e-mail and short message service (SMS) continue
to rise. The first step in combating them is to collect and characterize more
of the phishing cases that reach end users. Without understanding these
characteristics, anti-phishing countermeasures cannot evolve. In this study, we
propose an approach using Twitter as a new observation point to immediately
collect and characterize phishing cases via e-mail and SMS that evade
countermeasures and reach users. Specifically, we propose CrowdCanary, a system
capable of structurally and accurately extracting phishing information (e.g.,
URLs and domains) from tweets posted by users who have actually discovered or
encountered phishing. In our three months of live operation,
CrowdCanary identified 35,432 phishing URLs from 38,935 phishing reports. We
confirmed that 31,960 (90.2%) of these phishing URLs were later detected by the
anti-virus engine, demonstrating that CrowdCanary is superior to existing
systems in both accuracy and volume of threat extraction. We also analyzed
the users who shared phishing threats, using the extracted phishing URLs, and
categorized them into two distinct groups: experts and non-experts. As a
result, we found that CrowdCanary could collect information found specifically
in non-expert reports, such as reports that identify the attack only by the
company brand name in the tweet, phishing information that appears only in an
image attached to the tweet, and information about the landing page before the
redirect.
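
As a rough illustration of the kind of structured extraction described above,
the following minimal sketch pulls URLs and domains out of the text of a
phishing report. It is not CrowdCanary's implementation. The function names
(refang, extract_indicators) and the handful of "defanging" patterns (hxxp,
[.], and similar obfuscations that reporters often use) are assumptions chosen
for the example, and a real system would need to handle many more variants as
well as images and brand-name-only reports.

```python
import re
from urllib.parse import urlsplit

# Hypothetical defanging patterns; real reports use many more variants.
_REFANG = [
    (re.compile(r"h(?:xx|XX)p(s?)://"), r"http\1://"),
    (re.compile(r"\[\.\]|\(\.\)|\[dot\]", re.IGNORECASE), "."),
    (re.compile(r"\[:\]"), ":"),
]
_URL = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def refang(text):
    """Undo common URL obfuscations so that URLs can be parsed normally."""
    for pattern, repl in _REFANG:
        text = pattern.sub(repl, text)
    return text


def extract_indicators(tweet_text):
    """Return the unique URLs and host names found in a report, in order."""
    urls, domains = [], []
    for match in _URL.finditer(refang(tweet_text)):
        # Drop punctuation that usually belongs to the sentence, not the URL.
        url = match.group(0).rstrip(".,;:!?)")
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed authority, e.g. an unbalanced bracket
        if host and url not in urls:
            urls.append(url)
            if host not in domains:
                domains.append(host)
    return {"urls": urls, "domains": domains}
```

Calling extract_indicators on a report such as "Fake SMS:
hxxps://login.example-bank[.]com/verify?id=1" yields the restored URL together
with its host name, which is the structured form that later checks (for
example, lookups against anti-virus engines) would consume.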