The field of Distributed Constraint Optimization (DCOP) has gained momentum in recent years, thanks to its ability to address various applications related to multi-agent coordination. Nevertheless, solving DCOPs is computationally challenging. Thus, in large-scale, complex applications, incomplete DCOP algorithms are necessary. Recently, researchers have introduced a promising class of incomplete DCOP algorithms based on sampling. However, this paradigm requires a large number of samples to ensure convergence. This paper exploits the property that sampling is amenable to parallelization and introduces a general framework, called Distributed MCMC (DMCMC), that is based on a dynamic programming procedure and uses Markov Chain Monte Carlo (MCMC) sampling algorithms to solve DCOPs. Additionally, DMCMC harnesses the parallel computing power of Graphics Processing Units (GPUs) to speed up the sampling process. The experimental results show that DMCMC can find good solutions up to two orders of magnitude faster than other incomplete DCOP algorithms.