Autonomous trajectory generation in complex environments is a challenging task for multi-rotor unmanned aerial vehicles (UAVs), which are highly maneuverable in three-dimensional motion. Safe and effective operation of these UAVs requires obstacle avoidance strategies as well as advanced trajectory planning and control schemes that ensure stability and energy efficiency. Solving these problems analytically within a single framework is extremely difficult when the UAV must fly long distances through a complex environment. To address this challenge, we adopt a two-level optimization strategy. At the higher level, a sequence of waypoints is selected that leads the UAV from its current position to the destination. At the lower level, an optimal trajectory is generated analytically between each pair of adjacent waypoints. The goal of trajectory generation is to maintain the stability of the UAV, while the goal of waypoint planning is to select waypoints that minimize the total control thrust consumed over the entire trip while avoiding collisions with obstacles. The entire framework is implemented using deep reinforcement learning, which learns the highly complex, nonlinear interaction between the two levels as well as the influence of the environment. We also investigate a progressive learning strategy that both reduces convergence time and improves the quality of the results. In addition, we investigate the tuning of the gains in the optimal trajectory scheme using a genetic algorithm and report the results. The experimental results demonstrate that the proposed approach generates obstacle-free waypoints with minimal control energy and produces optimal trajectories with optimized velocity, acceleration, jerk, and control thrust.