Understanding how people and vehicles move through a city is important for applications such as emergency evacuation and rescue, city planning, and urban management. In this paper, we aim to predict citywide crowd flows over a future period by modeling the spatiotemporal patterns of recent crowd flows, in order to support urban management. We present a novel deep model for this task, called "AttConvLSTM", which combines a convolutional LSTM (ConvLSTM) and Convolutional Neural Networks (CNNs) with an attention mechanism: the ConvLSTM keeps spatial information as intact as possible during sequential analysis, and the attention mechanism focuses on important crowd flow variations that the recurrent module cannot identify. We conducted extensive experiments for performance evaluation on three large datasets: the Beijing Taxi dataset, the Rome Taxi dataset, and the Chengdu Didi chauffeuring trace. The experimental results show that AttConvLSTM significantly outperforms several widely used baselines in terms of Root Mean Squared Error (RMSE) and Mean Absolute Percentage Error (MAPE), indicating that our approach can handle crowd flows with different dynamics in both the spatial and temporal domains and make valid predictions several steps ahead.
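For readers unfamiliar with the two evaluation metrics named above, the following is a minimal sketch of how RMSE and MAPE are commonly computed over paired observed and predicted flow values. It is an illustration under stated assumptions, not the evaluation code used in the experiments: the function names `rmse` and `mape`, the flat-list input format, the choice to report MAPE as a percentage, and the choice to skip zero-valued observations in MAPE are all assumptions made here.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between observed and predicted flows."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, returned in percent.

    Assumption: entries whose observed value is zero are skipped,
    because the relative error is undefined there.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not ratios:
        raise ValueError("MAPE is undefined when every observed value is zero")
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical flow counts for four grid cells at one time step.
    observed = [120.0, 80.0, 0.0, 45.0]
    forecast = [110.0, 90.0, 3.0, 45.0]
    print("RMSE:", rmse(observed, forecast))
    print("MAPE (%):", mape(observed, forecast))
```

RMSE penalizes large absolute errors, so it is dominated by high-traffic regions, while MAPE measures relative error and weights low-traffic regions more heavily; reporting both gives a more balanced view of prediction quality.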