Abstract
This paper presents a novel k-winners-take-all (k-WTA) competitive learning (CL) hardware architecture for on-chip learning. The architecture is based on an efficient pipeline that allows the k-WTA competition processes associated with different training vectors to be performed concurrently. The pipeline employs a novel codeword swapping scheme so that neurons failing the competition for a training vector are immediately available for the competitions for subsequent training vectors. The architecture is implemented on a field-programmable gate array (FPGA) and is used as a hardware accelerator in a system on a programmable chip (SOPC) for real-time on-chip learning. Experimental results show that the SOPC attains a significantly lower training time than other k-WTA CL counterparts operating with or without hardware support.
Highlights
The k-winners-take-all (k-WTA) operation is a generalization of the winner-take-all (WTA) operation. The k-WTA operation selects the k competitors whose activations are larger than the remaining input signals.
Three types of area cost are considered in this experiment: the number of logic elements (LEs), the number of embedded memory bits, and the number of embedded multipliers.
With the aid of the codeword swapping scheme, the system throughput rises substantially because competitions associated with different training vectors can be performed in parallel.
Summary
The k-winners-take-all (k-WTA) operation is a generalization of the winner-take-all (WTA) operation. This paper presents a novel k-WTA hardware architecture that performs concurrent winner-detection operations over different input sets. In existing architectures, the competition process can be performed for only one input training vector at a time, so they may provide only moderate acceleration. In the proposed design, when a training vector reaches the final stage of the pipeline, a hardware-based neuron updating process is activated. In addition to high throughput and low area cost, the proposed architecture moves the k codewords best matching an input training vector to the final k stages of the pipeline, owing to the codeword swapping scheme. Experimental results show that the proposed architecture attains a high speedup over its software counterpart for k-WTA CL training and has lower latency than existing hardware architectures. Our design is an effective alternative for applications where real-time k-WTA operations and/or CL training are desired.
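To make the k-WTA competitive-learning step concrete, the following is a minimal software sketch (not the paper's hardware pipeline): for each training vector, the k codewords closest to it win the competition, and each winner is moved toward the vector. The function name `kwta_cl_step`, the distance metric, and the learning rate are illustrative assumptions, not details from the paper.

```python
import numpy as np

def kwta_cl_step(codewords, x, k, lr=0.05):
    """One illustrative k-WTA competitive-learning step (assumed formulation).

    Selects the k codewords (neurons) closest to training vector x as winners
    and moves each winner toward x by learning rate lr, in place.
    """
    # Squared Euclidean distance from x to every codeword.
    dists = np.sum((codewords - x) ** 2, axis=1)
    # Indices of the k winners: the k smallest distances (partial sort).
    winners = np.argpartition(dists, k)[:k]
    # Competitive-learning update: pull each winner toward x.
    codewords[winners] += lr * (x - codewords[winners])
    return winners
```

The hardware version described in the paper pipelines this competition across training vectors, with the codeword swapping scheme freeing losing neurons immediately for the next vector; this sketch shows only the per-vector selection and update logic.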