Unmanned aerial vehicles (UAVs) are expected to be an integral part of wireless networks, and determining collision-free trajectories in multi-UAV non-cooperative scenarios is a challenging task. In this paper, we consider a path planning optimization problem to maximize the collected data from multiple Internet of Things (IoT) nodes under realistic constraints. The considered multi-UAV non-cooperative scenarios involve random number of other UAVs in addition to the typical UAV, and UAVs do not communicate with each other. We translate the problem into an Markov decision process (MDP). Dueling double deep Q-network (D3QN) is proposed to learn the decision making policy for the typical UAV, without any prior knowledge of the environment (e.g., channel propagation model and locations of the obstacles) and other UAVs (e.g., their missions, movements, and policies). Numerical results demonstrate that real-time navigation can be efficiently performed with high success rate, high data collection rate, and low collision rate.