TY - GEN
T1 - CSE
T2 - 51st Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2018
AU - Zhuo, Youwei
AU - Cheng, Jinglei
AU - Luo, Qinyi
AU - Zhai, Jidong
AU - Wang, Yanzhi
AU - Luan, Zhongzhi
AU - Qian, Xuehai
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/12/12
Y1 - 2018/12/12
N2 - Finite State Machine (FSM) is known to be embarrassingly sequential because the next state depends on the current state and input symbol. Enumerative FSM breaks the data dependencies by cutting the input symbols into segments and processing all segments in parallel. With unknown starting state (except the first segment), each segment needs to calculate the state transitions, i.e., state→state, for all states, each one is called an enumeration path. The current software and hardware implementations suffer from two drawbacks: 1. large amount of state→state computation for the enumeration paths; and 2. The optimizations are restricted by the need to correctly performing state→state and only achieve limited improvements. This paper proposes CSE, Convergence Set Enumeration based parallel FSM. Unlike prior approaches, CSE is based on a novel computation primitive set→set, which maps N states to M states without giving the specific state→state mappings (which state is mapped to which). The set→set has two key properties: 1. if M is equal to 1, i.e., all N states are mapped to the same state, the state→state for all the N states are computed; 2. using one-hot encoding, the hardware implementation cost of state→state is the same as set→set. The convergence property ensures that M is always less than N. The key idea of CSE is to partition the original all S states into n state sets i.e., convergence sets. Using set→set to process each CS if the states converge to a single state, then we have successfully computed the enumeration path for each state in CS; otherwise, we may need to re-execute the stage when the outcome of the previous stage falls in CS. CSE is realized by two techniques: 1. convergence set prediction, which generates the convergence sets with random input based profiling that maximizes the probability of each CS converging to one state; 2. global re-execution algorithm, which ensures the correctness by re-executing the non-converging stages with known input state. Essentially, CSE reformulates the enumeration paths as set-based rather than singleton-based. %Given a sequence of input symbols, a set FSM maps a state set of N states to another state set of M states without computing any enumeration path. We evaluate CSE with 13 benchmarks. It achieved on average 2.0/2.4x and maximum 8.6/2.7x speedup compared to Lookback Enumeration and Parallel Automata Processor, respectively.
AB - Finite State Machine (FSM) is known to be embarrassingly sequential because the next state depends on the current state and input symbol. Enumerative FSM breaks the data dependencies by cutting the input symbols into segments and processing all segments in parallel. With unknown starting state (except the first segment), each segment needs to calculate the state transitions, i.e., state→state, for all states, each one is called an enumeration path. The current software and hardware implementations suffer from two drawbacks: 1. large amount of state→state computation for the enumeration paths; and 2. The optimizations are restricted by the need to correctly performing state→state and only achieve limited improvements. This paper proposes CSE, Convergence Set Enumeration based parallel FSM. Unlike prior approaches, CSE is based on a novel computation primitive set→set, which maps N states to M states without giving the specific state→state mappings (which state is mapped to which). The set→set has two key properties: 1. if M is equal to 1, i.e., all N states are mapped to the same state, the state→state for all the N states are computed; 2. using one-hot encoding, the hardware implementation cost of state→state is the same as set→set. The convergence property ensures that M is always less than N. The key idea of CSE is to partition the original all S states into n state sets i.e., convergence sets. Using set→set to process each CS if the states converge to a single state, then we have successfully computed the enumeration path for each state in CS; otherwise, we may need to re-execute the stage when the outcome of the previous stage falls in CS. CSE is realized by two techniques: 1. convergence set prediction, which generates the convergence sets with random input based profiling that maximizes the probability of each CS converging to one state; 2. global re-execution algorithm, which ensures the correctness by re-executing the non-converging stages with known input state. Essentially, CSE reformulates the enumeration paths as set-based rather than singleton-based. %Given a sequence of input symbols, a set FSM maps a state set of N states to another state set of M states without computing any enumeration path. We evaluate CSE with 13 benchmarks. It achieved on average 2.0/2.4x and maximum 8.6/2.7x speedup compared to Lookback Enumeration and Parallel Automata Processor, respectively.
KW - Accelerator
KW - Finite State Machine
KW - Parallelism
KW - Speculation
UR - http://www.scopus.com/inward/record.url?scp=85060033866&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060033866&partnerID=8YFLogxK
U2 - 10.1109/MICRO.2018.00012
DO - 10.1109/MICRO.2018.00012
M3 - Conference contribution
AN - SCOPUS:85060033866
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 29
EP - 41
BT - Proceedings - 51st Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2018
PB - IEEE Computer Society
Y2 - 20 October 2018 through 24 October 2018
ER -