In modern information transmission systems, a special place is occupied by telecommunication systems (TS), which include digital data transmission systems (DSTS) containing specialized computers (SEVMs). These systems must ensure fast and reliable transmission of discrete data over communication lines. Because the loss of even one transmitted character is unacceptable, the main reliability indicator of such a computer is the probability of failure-free operation. Correcting codes and duplication are used to increase the probability of failure-free operation of the storage device (MS), and majority redundancy is used to increase that of the computer's arithmetic-logic unit (ALU). The disadvantage of the majority method is the high hardware cost of redundancy, which reduces its efficiency. Correcting codes have three disadvantages. First, as a rule, they are used only within separate redundancy: they detect and correct errors in storage devices, while majority redundancy is used to detect and correct errors in the ALU. Second, existing correcting codes cannot monitor all arithmetic and logical operations. Third, hardware costs rise sharply when algebraic linear codes are used to detect and correct multiple errors. Errors in the ALU of a computer processor can be detected by duplication in loaded (hot) standby mode with replacement. However, this requires solving the main problem of computer duplication: selecting monitoring tools that identify the failed channel, that is, tools that detect errors in the storage and information-processing devices.
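The channel-identification problem of duplication can be illustrated with a minimal sketch. Purely for illustration, it assumes that each channel carries a single even-parity check bit alongside its data word. The function names `parity` and `select_output` are hypothetical, and this is not the method proposed in the paper. When the two channel outputs disagree, the channel whose word fails its own check is declared faulty and the other channel's output is used.

```python
from typing import Optional, Tuple


def parity(word: int) -> int:
    """Even-parity check bit of a non-negative integer word (assumed detector)."""
    return bin(word).count("1") % 2


def select_output(a: int, pa: int, b: int, pb: int) -> Tuple[Optional[int], str]:
    """Choose the output of a duplicated pair of channels.

    a, b   -- data words produced by channels A and B
    pa, pb -- check bits carried with each word
    Returns (word, verdict); word is None when the fault cannot be localized.
    """
    if a == b and pa == pb:
        return a, "agree"            # channels agree: no fault visible
    a_ok = parity(a) == pa           # does channel A pass its own check?
    b_ok = parity(b) == pb           # does channel B pass its own check?
    if a_ok and not b_ok:
        return a, "B faulty"
    if b_ok and not a_ok:
        return b, "A faulty"
    return None, "undecided"         # both pass or both fail the check


if __name__ == "__main__":
    w = 0b1011
    print(select_output(w, parity(w), w ^ 0b0100, parity(w)))  # single-bit error in B
```

A single parity bit cannot localize the fault when a channel suffers an even number of bit errors: such a word still passes its check, and the pair is reported as "undecided". This is why the choice of monitoring code determines how reliably the failed channel is identified.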
The purpose of the research is to develop a scientific and methodological apparatus for computer backup by duplication, with error detection in the backup channels based on an algebraic linear code. The code detects errors not only in storage devices but is also adapted to detect errors in arithmetic and logical operations. Results. The operation of specialized computers (SEVMs) of telecommunication systems was analyzed, and reliability indicators for them were selected. Requirements for computer backup methods are formulated. The detecting ability and hardware costs of the majority redundancy method, the duplication method and correcting codes were compared. The expediency of using duplication to increase the probability of failure-free operation and the survivability of self-healing digital computers is substantiated, with algebraic linear codes used to identify the faulty channel. This significantly reduces the hardware cost of building the monitoring equipment, allowing 10-30% of the backup equipment to be used for this purpose. An algebraic linear code is proposed in which, unlike well-known codes, the values of the check bits correspond to the direct and inverse values of the information bits. This makes it possible to detect errors when information is read from the inverse outputs of the memory, to correct single errors, to detect double errors, and to monitor the logical inversion operation needed to represent a negative number in two's complement code. As a result, the code can be adapted to monitor the arithmetic and logical operations of the computer processor.
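The proposed code is not specified in this abstract, so the sketch below uses a standard extended Hamming (8,4) SEC-DED code as a stand-in. It shows two of the properties discussed: single-error correction with double-error detection, and predictable check bits under logical inversion. Because the code is linear over GF(2), the codeword of the inverted data equals the codeword of the data XOR the codeword of the all-ones data word. For this particular code, that codeword is itself all ones, so inverting a codeword yields another valid codeword. The names `encode`, `decode`, `invert4` and `negate4` are hypothetical.

```python
DATA_POS = (3, 5, 6, 7)   # positions of the 4 data bits in the Hamming (7,4) word
PARITY_POS = (1, 2, 4)    # positions of the Hamming check bits; index 0 = overall parity


def encode(data: int) -> list[int]:
    """Encode a 4-bit value as an 8-bit extended Hamming (SEC-DED) word (index 0..7)."""
    bits = [0] * 8
    for i, pos in enumerate(DATA_POS):
        bits[pos] = (data >> i) & 1
    for p in PARITY_POS:
        # check bit p covers every other position whose index has bit p set
        bits[p] = sum(bits[k] for k in range(1, 8) if k & p and k != p) % 2
    bits[0] = sum(bits[1:]) % 2   # overall parity enables double-error detection
    return bits


def decode(word: list[int]) -> tuple[int, str]:
    """Return (data, status) with status 'ok', 'corrected' or 'double'."""
    bits = list(word)
    syndrome = 0
    for p in PARITY_POS:
        if sum(bits[k] for k in range(1, 8) if k & p) % 2:
            syndrome |= p
    overall = sum(bits) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        bits[syndrome] ^= 1       # single error; syndrome 0 means bit 0 itself flipped
        status = "corrected"
    else:
        status = "double"         # nonzero syndrome with even overall parity
    data = sum(bits[pos] << i for i, pos in enumerate(DATA_POS))
    return data, status


def invert4(x: int) -> int:
    """Logical inversion of a 4-bit word."""
    return ~x & 0xF


def negate4(x: int) -> int:
    """Two's complement negation of a 4-bit word: invert, then add 1."""
    return (invert4(x) + 1) & 0xF


if __name__ == "__main__":
    w = encode(0b0110)
    print(w, [1 - b for b in w] == encode(invert4(0b0110)))
```

With this stand-in, a word read from the inverse memory outputs can be checked directly. The inversion step of two's complement negation in `negate4` also maps a valid codeword to a valid codeword. The subsequent addition of 1 is not covered by this check.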
The probability of failure-free operation was assessed for a duplicated computer with general redundancy, in which the proposed code detects and corrects single errors in the backup memory channels and detects errors in the backup channels of the processor ALU. It was also assessed for a computer with separate redundancy, in which a Hamming code detects errors in the backup channels of the duplicated memory and the majority method corrects errors in the backup channels of the processor ALU. The comparison shows that general redundancy of the computer based on the proposed code yields a higher probability of failure-free operation of the computer and its functional devices than separate redundancy of the memory and the processor ALU, and that this holds over the entire period of operation. The practical significance is that the proposed scientific and methodological apparatus reduces the hardware cost of identifying a faulty channel when duplication is organized. It also detects errors in the memory and the processor ALU in no more time than standard methods of monitoring the computer system require.
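A comparison of this kind can be sketched with textbook reliability formulas. The sketch assumes exponentially distributed failures with a hypothetical rate `lam`, an ideal majority voter, and a monitoring coverage `coverage`, defined as the probability that the monitoring correctly identifies the failed channel of a duplicated pair. This is a generic model, not the assessment performed in the paper, and it ignores the failure rate of the monitoring and voting hardware itself.

```python
import math


def channel_reliability(failure_rate: float, hours: float) -> float:
    """Exponential model: probability that a single channel survives `hours`."""
    return math.exp(-failure_rate * hours)


def duplication(p: float, coverage: float = 1.0) -> float:
    """Duplicated pair: survives if both channels work, or if exactly one fails
    and the monitoring identifies the failed channel (probability `coverage`)."""
    return p * p + 2.0 * coverage * p * (1.0 - p)


def majority_2_of_3(p: float) -> float:
    """Triple modular redundancy with an ideal voter: at least 2 of 3 channels work."""
    return 3.0 * p**2 - 2.0 * p**3


if __name__ == "__main__":
    lam = 1e-4  # hypothetical failure rate, per hour
    for t in (1_000, 5_000, 10_000):
        p = channel_reliability(lam, t)
        print(f"t={t:>6} h  simplex={p:.4f}  "
              f"dup(c=0.99)={duplication(p, 0.99):.4f}  TMR={majority_2_of_3(p):.4f}")
```

With perfect coverage, duplication gives 1 - (1 - p)^2. This exceeds the 2-of-3 majority value 3p^2 - 2p^3 for any 0 < p < 1 (the difference is 2p(1 - p)^2), while using two channels instead of three. In practice, the advantage depends on how close the coverage of the monitoring code is to 1.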