Energy detection has been used almost exclusively in spectrum sensing. This paper studies the optimality (or sub-optimality) of the energy detector in spectrum sensing for two systems: one employing a single node and the other employing multiple nodes, i.e., cooperative spectrum sensing. We consider both Gaussian and fading channels, with different signaling from the primary user. For the single-node case, we show that the energy detector is provably optimal in most cases; in the case where it is not theoretically optimal, its performance is nearly indistinguishable from that of the true optimal detector. For cooperative spectrum sensing, however, the problem becomes extremely complicated. The presence of the common signal from the primary user introduces dependence among the observations at different nodes, and it is well known that, for decentralized detection with dependent observations, designing optimal local decision rules is typically an NP-hard problem. Using a recently proposed framework for distributed detection with dependent observations, we establish the optimality of the energy detector for several cooperative spectrum sensing systems and point out the difficulties for the remaining cases.