In this paper, we propose a regular vine copula-based method for fusing multiple deep neural network classifiers in multi-sensor human activity recognition. We account for cross-modal dependence by employing regular vine copulas, which are flexible graphical models capable of characterizing complex dependence among multiple modalities. Multiple deep neural networks extract high-level features from the sensing modalities, with each network processing the data collected by a single sensor. The extracted high-level features are then combined using a regular vine copula model. Numerical experiments demonstrate the effectiveness of our approach.
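To make the fusion step concrete, here is a minimal sketch of copula-based decision fusion under simplifying assumptions that are not taken from the paper. It assumes three modalities, each network reduced to one scalar score per sample, Gaussian per-class marginals for those scores, Gaussian pair copulas, and a fixed three-dimensional C-vine rooted at modality 1 (a C-vine is one special case of a regular vine; no structure selection or parameter fitting is shown). Each class score is log p(c) plus the marginal log-densities plus the vine copula log-density evaluated at the marginal CDF values. All names (`cvine3_log_density`, `class_log_score`, `MODELS`, the activity labels) and all parameter values are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, maps copula scale to normal scores
EPS = 1e-12                # keeps probabilities strictly inside (0, 1) for inv_cdf


def _clamp(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def gauss_pair_log_density(u: float, v: float, rho: float) -> float:
    """Log-density of a bivariate Gaussian copula with correlation rho."""
    x, y = STD_NORMAL.inv_cdf(_clamp(u)), STD_NORMAL.inv_cdf(_clamp(v))
    one_minus = 1.0 - rho * rho
    return (-0.5 * math.log(one_minus)
            - (rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * one_minus))


def h_func(u: float, v: float, rho: float) -> float:
    """Conditional CDF C(u | v) of a Gaussian pair copula (the vine 'h-function')."""
    x, y = STD_NORMAL.inv_cdf(_clamp(u)), STD_NORMAL.inv_cdf(_clamp(v))
    return _clamp(STD_NORMAL.cdf((x - rho * y) / math.sqrt(1.0 - rho * rho)))


def cvine3_log_density(u, rho12: float, rho13: float, rho23_1: float) -> float:
    """3-D C-vine rooted at variable 1 (a special case of a regular vine).

    c(u1, u2, u3) = c12(u1, u2) * c13(u1, u3) * c23|1(h(u2|u1), h(u3|u1))
    """
    u1, u2, u3 = u
    tree1 = gauss_pair_log_density(u1, u2, rho12) + gauss_pair_log_density(u1, u3, rho13)
    tree2 = gauss_pair_log_density(h_func(u2, u1, rho12), h_func(u3, u1, rho13), rho23_1)
    return tree1 + tree2


def class_log_score(z, model) -> float:
    """log p(c) + sum_i log f_i(z_i | c) + log c(F_1(z_1|c), F_2(z_2|c), F_3(z_3|c) | c)."""
    log_marg, u = 0.0, []
    for zi, mu, sd in zip(z, model["means"], model["stds"]):
        t = (zi - mu) / sd
        log_marg += -0.5 * t * t - math.log(sd * math.sqrt(2.0 * math.pi))
        u.append(NormalDist(mu, sd).cdf(zi))  # marginal CDF value fed to the copula
    return math.log(model["prior"]) + log_marg + cvine3_log_density(u, *model["rhos"])


def classify(z, models) -> str:
    """Pick the activity whose copula-based joint log-score is largest."""
    return max(models, key=lambda c: class_log_score(z, models[c]))


# Hypothetical per-class parameters (not from the paper): Gaussian marginals for
# each modality's scalar score plus the vine correlations (rho12, rho13, rho23|1).
MODELS = {
    "walking": {"prior": 0.5, "means": (1.0, 1.0, 1.0), "stds": (0.5, 0.5, 0.5),
                "rhos": (0.6, 0.5, 0.3)},
    "sitting": {"prior": 0.5, "means": (-1.0, -1.0, -1.0), "stds": (0.5, 0.5, 0.5),
                "rhos": (0.2, 0.1, 0.0)},
}

if __name__ == "__main__":
    print(classify((0.9, 1.1, 1.2), MODELS))  # prints "walking"
```

The marginals and the copula are kept separate so that each modality's score distribution can be modeled on its own while the vine captures cross-modal dependence one pair at a time. With every pair correlation set to zero, the copula term vanishes and the rule reduces to an independence (naive Bayes style) combination of the per-modality likelihoods.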