Automatic counting of people entering or exiting a region of interest is important for both business and security applications. This paper introduces an automatic and robust people counting system that can count multiple people who interact in the region of interest using only one camera. Two-level hierarchical tracking is employed. For cases not involving merges or splits, a fast blob tracking method is used. To handle interactions among people more thoroughly and reliably, the system uses the mean shift tracking algorithm. Using the first-level blob tracker in general, and employing mean shift tracking only in the case of merges and splits, saves power and makes the system computationally efficient. The system setup parameter can be learned automatically in a new environment from a 3- to 5-minute video of people going in or out of the target region one at a time. On a 2 GHz Pentium machine, the system runs at about 33 fps on 320×240 images without code optimization. Average accuracy rates of 98.5% and 95% are achieved on videos with normal traffic flow and videos with many cases of merges and splits, respectively.
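
The two-level dispatch described above can be illustrated with a minimal sketch. It assumes blobs are axis-aligned bounding boxes and that a merge or split is detected by box overlap between consecutive frames. The abstract does not state how the system detects these events, so the overlap test, the `Box` format, and the names `overlaps` and `choose_tracker` are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical blob format: (x, y, width, height) of a foreground bounding box.
Box = Tuple[int, int, int, int]


def overlaps(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def choose_tracker(prev: List[Box], curr: List[Box]) -> str:
    """Pick the tracking level for one frame transition.

    Returns "mean_shift" when any blob merges or splits, otherwise "blob".
    The overlap criterion is an assumption made for illustration only.
    """
    # Merge: one current blob overlaps two or more previous blobs.
    for c in curr:
        if sum(overlaps(p, c) for p in prev) >= 2:
            return "mean_shift"
    # Split: one previous blob overlaps two or more current blobs.
    for p in prev:
        if sum(overlaps(p, c) for c in curr) >= 2:
            return "mean_shift"
    # No interaction detected: the fast first-level blob tracker suffices.
    return "blob"
```

In this sketch the more expensive second-level tracker is selected only for frames where blobs interact, which mirrors the efficiency argument made above: most frames take the cheap path.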