Abstract

Pattern matching, that is, finding the occurrences of a pattern in a text, arises frequently in many applications. The task of splitting a character stream or text into words is called tokenization. Search engines use tokenizers, and the first phase of a compiler outputs a stream of tokens for a given high-level language program. The pattern rules are specified as regular expressions. Many tools have been developed to generate tokenizers automatically, but they are mostly sequential. The advent of multicore architectures has made it essential for software tools to exploit features such as multiple threads and SIMD instructions. This work attempts to parallelize tokenization through a simple prototype implementation of a parallel lexical analyzer that recognizes the tokens of the given source code. Each Synergistic Processing Element (SPE) of the Cell processor works on a block of source code and tokenizes it independently. The Power Processing Element (PPE) is responsible for splitting the source code into a finite number of blocks to be distributed among the processing elements. Each SPE sends its stream of identifiers to the PPE, which maintains the symbol table. The parallel lexical analyzer runs on the IBM Cell Processor simulator, and execution times are plotted while varying the source-code size and the number of processing elements.
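As a rough illustration of the block-split-and-tokenize scheme described above, the sketch below uses POSIX threads standing in for SPEs and the main thread standing in for the PPE; it is not the paper's Cell SDK implementation, and the fixed-size block boundaries are a simplifying assumption (a real splitter would align cuts to token boundaries).

```c
/* Minimal sketch of parallel tokenization: the "PPE" role splits the
 * source into blocks; each "SPE" worker scans its block for identifiers.
 * Assumption: naive fixed-size cuts that may split a token at a boundary. */
#include <ctype.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NUM_WORKERS 4          /* stands in for the number of SPEs */

typedef struct {
    const char *block;         /* start of this worker's slice */
    size_t      len;           /* slice length in bytes        */
    int         id;            /* worker index, for reporting  */
} work_t;

/* Each worker scans its block and reports the identifiers it finds.
 * In the real design these would be sent back to the PPE, which
 * inserts them into the shared symbol table. */
static void *tokenize_block(void *arg)
{
    work_t *w = (work_t *)arg;
    size_t i = 0;
    while (i < w->len) {
        if (isalpha((unsigned char)w->block[i]) || w->block[i] == '_') {
            size_t start = i;
            while (i < w->len &&
                   (isalnum((unsigned char)w->block[i]) || w->block[i] == '_'))
                i++;
            printf("worker %d: identifier %.*s\n",
                   w->id, (int)(i - start), w->block + start);
        } else {
            i++;
        }
    }
    return NULL;
}

int main(void)
{
    const char *source = "int main ( ) { int count = 0 ; return count ; }";
    size_t len = strlen(source);
    size_t chunk = (len + NUM_WORKERS - 1) / NUM_WORKERS;

    pthread_t threads[NUM_WORKERS];
    work_t    work[NUM_WORKERS];

    /* "PPE" role: split the source into blocks and hand them out. */
    for (int i = 0; i < NUM_WORKERS; i++) {
        size_t off = (size_t)i * chunk;
        work[i].block = source + (off < len ? off : len);
        work[i].len   = off < len ? (len - off < chunk ? len - off : chunk) : 0;
        work[i].id    = i;
        pthread_create(&threads[i], NULL, tokenize_block, &work[i]);
    }
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
```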
