Cancer diagnosis, prognosis, and prediction of therapeutic response rely on data from multiple modalities, such as histology slides, cell graphs, and molecular profiles derived from genomic data. With the growing availability of multimodal biomedical data, many fusion strategies have been proposed to address data heterogeneity and to exploit the intersections and complementarities of different data sources. However, most of them suffer from parameter redundancy and inefficiency during fusion. In this work, we present a Cycle-Attention Transformer (Cy-Atten) framework that learns a more robust joint representation for multimodal cancer survival analysis. Cy-Atten combines a cycling mechanism with sparse attention and adds a modality at a theoretical cost of O(N), which provably reduces parameter redundancy. Empirically, on two cancer datasets from the TCGA database, the fusion component requires only on the order of 10^3 parameters for three modalities. Beyond being lightweight, Cy-Atten outperforms several state-of-the-art multimodal fusion methods in survival analysis and grade classification on both cancers. Our work can be reproduced at https://github.com/dlmscl/Cycle-Attention-Transformer.
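
The following is a minimal, illustrative sketch (not the paper's implementation) of how a cycle-attention fusion step over several modality embeddings could look. It assumes modality embeddings of a shared dimension, a single shared cross-attention block reused across modalities, and two fusion cycles; the sparse-attention component described in the abstract is omitted and replaced with dense multi-head attention for simplicity. The point it illustrates is that adding a modality adds one cross-attention call per cycle, so fusion cost grows linearly in the number of modalities.

```python
# Hypothetical sketch of cycle-attention fusion; names and structure are assumptions,
# not the authors' released code.
import torch
import torch.nn as nn

class CycleAttentionFusion(nn.Module):
    def __init__(self, dim=64, heads=4, cycles=2):
        super().__init__()
        self.cycles = cycles
        # One shared cross-attention block reused for every modality
        # (an assumption of this sketch; it keeps the fusion parameter count small).
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, modality_tokens):
        # modality_tokens: list of [batch, seq_len_i, dim] tensors, one per modality.
        # Initialize the fused query as the mean over all modality tokens.
        fused = torch.cat(modality_tokens, dim=1).mean(dim=1, keepdim=True)  # [B, 1, dim]
        for _ in range(self.cycles):
            # Cycle the fused query through each modality in turn; each pass is one
            # cross-attention call, so cost scales linearly with the number of modalities.
            for tokens in modality_tokens:
                update, _ = self.attn(query=fused, key=tokens, value=tokens)
                fused = self.norm(fused + update)
        return fused.squeeze(1)  # joint representation, [B, dim]

# Usage with three hypothetical modalities (e.g. histology, cell graph, genomic features).
histology = torch.randn(8, 16, 64)
cell_graph = torch.randn(8, 10, 64)
genomics = torch.randn(8, 1, 64)
joint = CycleAttentionFusion()([histology, cell_graph, genomics])
print(joint.shape)  # torch.Size([8, 64])
```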