It is well established that reverse-carpooling-based network coding can significantly improve the efficiency of multi-hop wireless networks. In a stochastic environment, however, some packets find no coding opportunity because they lack a coding pair. Should such packets wait for a future opportunity, or should they be transmitted without coding? To help answer this question, we formulate a stochastic dynamic program whose objective is to minimize the long-run average cost per unit time incurred due to transmissions and delays. In particular, we develop optimal control actions that balance the cost of transmission against the cost of delay. In the process, we address a crucial question: what should be observed as the state of the system? We show analytically that the queue lengths alone suffice, provided the system can be modeled as a Markov process. We then show that a stationary policy based on queue lengths is optimal and describe a procedure for finding such a policy. Finally, we substantiate our results with simulation experiments in more general settings.