The InterPlanetary File System (IPFS) has recently gained considerable attention. While prior research has focused on characterizing its performance and application support, it remains unclear (1) what kinds of files are stored in IPFS, (2) who provides these files, (3) whether these files are always accessible, and (4) what affects file access performance. To answer these questions, we measure and analyze over 4 million files associated with CIDs (content IDs) that appear in publicly available IPFS datasets. Our results reveal the following key findings: (1) Mixed file accessibility: although IPFS is not designed for permanent storage, a non-trivial portion of files, such as those of NFTs (non-fungible tokens) and video streams, often require multiple retrieval attempts, potentially blocking NFT transactions and degrading the user experience. (2) Dominance of NFT and video files: about 50% of stored files are NFT-related, followed by a large share of video files, about half of which are pirated movies and adult content. (3) Centralization of content providers: a small number of peers (the top 50), mostly cloud nodes hosted by tech companies, serve a large portion (95%) of files, deviating from IPFS's intended design goal of decentralization. (4) High variation in download throughput and lookup time: retrievals of large files achieve lower average throughput due to the greater overhead of resolving file chunk CIDs, and lookups of files hosted by non-cloud nodes take longer. We hope our findings offer valuable insights for (1) IPFS application developers, who should take these characteristics into account when building applications on top of IPFS, and (2) IPFS system developers seeking to improve IPFS and similar systems for Web3.