Abstract

When it comes to in-vehicle networks (IVNs), the controller area network (CAN) bus dominates the market; automobiles manufactured and sold worldwide depend on the CAN bus for safety-critical communications among the various components of the vehicle (e.g., the engine, the transmission, the steering column). Unfortunately, the CAN bus is inherently insecure; in fact, it completely lacks controls such as authentication, authorization, and confidentiality (i.e., encryption). Therefore, researchers have labored to develop automotive security enhancements. The automotive intrusion detection system (IDS) is especially popular in the literature due to its relatively low cost in terms of money, resource utilization, and implementation effort. That said, developing and evaluating an automotive IDS is often challenging; if researchers do not have access to a test vehicle, then they are forced to depend on publicly available CAN data, which is not without limitations. Lack of access to adequate CAN data, then, becomes a barrier to entry into automotive security research. We seek to lower that barrier to entry by introducing new CAN datasets to facilitate the development and evaluation of automotive IDSs. Our datasets—dubbed can-dataset, can-log, can-csv, can-ml, and can-train-and-test—provide CAN data from four different vehicles produced by two different manufacturers. The attack captures for each vehicle model are equivalent, enabling researchers to assess the ability of a given IDS to generalize to different vehicle models and even different vehicle manufacturers. Our datasets contain replayable .log files as well as labeled and unlabeled .csv files, thereby meeting a variety of development and evaluation needs. In particular, the can-train-and-test dataset offers nine unique attacks, ranging from denial of service (DoS) to gear spoofing to standstill; as such, researchers can select a subset of the attacks for training and save the remainder for testing in order to assess a given IDS against unseen attacks. Many of our attacks, particularly the spoofing-related attacks, were conducted during live, on-the-road experiments with real vehicles and therefore have known physical impacts. As a benchmark, we pit a number of machine learning IDSs against our datasets and analyze the results. We present our datasets—especially can-train-and-test—as a contribution to the existing catalogue of open-access datasets, in hopes of filling the gaps left by those datasets.
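
To illustrate one way the labeled .csv captures might be used to assess an IDS against unseen attacks, the sketch below trains a simple classifier on two attack types and tests it on a third. This is a minimal Python/scikit-learn sketch under assumptions: the file names (dos_train.csv, gear_spoof_train.csv, standstill_test.csv) and column names (timestamp, arbitration_id, attack) are illustrative placeholders, not the dataset's actual schema, and the feature set is deliberately simplistic.

    # Hypothetical sketch: train on a subset of attacks, test on an unseen attack.
    # File names and column names are assumptions, not the published schema.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report

    def load_frames(paths):
        # Concatenate several labeled capture files into one DataFrame.
        return pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)

    def featurize(df):
        # Minimal numeric features: arbitration ID (hex string -> int)
        # and message inter-arrival time.
        can_id = df["arbitration_id"].apply(lambda x: int(str(x), 16))
        gap = df["timestamp"].astype(float).diff().fillna(0.0)
        return pd.DataFrame({"can_id": can_id, "gap": gap})

    # Train on DoS and gear-spoofing captures; hold out standstill as the unseen attack.
    train_df = load_frames(["dos_train.csv", "gear_spoof_train.csv"])
    test_df = load_frames(["standstill_test.csv"])

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(featurize(train_df), train_df["attack"])

    pred = clf.predict(featurize(test_df))
    print(classification_report(test_df["attack"], pred))

In practice, an evaluation of this kind would draw on richer features (e.g., payload bytes, per-ID frequencies) and on all four vehicles, but the train-on-some-attacks, test-on-others pattern shown here is the usage the can-train-and-test split is designed to support.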
