Detecting and tracking moving objects are important and challenging problems that have attracted much attention from the research community. In most cases, however, tracking the objects alone is not enough. The goal should be to detect occurrences of events of interest, which is important for applications such as video surveillance and video browsing and indexing. Event detection, in turn, introduces the challenge of providing the flexibility to specify customized events of varying complexity and to enter them into a system in a generic way; event definitions should not be predefined and hard-coded. We introduce a spatio-temporal event detection system that lets users specify multiple composite events of high complexity and then detects their occurrence automatically. Events can be defined on a single camera view or across multiple camera views. Semantically higher-level event scenarios are built from building blocks, which we call primitive events, by combining them with operators. More importantly, newly defined composite events can themselves be combined with each other. This layered structure makes it possible to define events of increasing complexity. The event definitions are written to an XML file, which is then parsed and communicated to the tracking engines running on the videos of the corresponding cameras. With the proposed system, we move from detecting "a person exiting the building" to detecting "a person coming from the south corridor of the building and then exiting the building".
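
The layered composition described above can be illustrated with a minimal sketch. It assumes a hypothetical data model: the class names, the operator names (`SEQUENCE`, `AND`), the camera identifiers, and the XML element and attribute names are all invented for illustration and are not the system's actual schema. The sketch shows only how primitive events can be combined by an operator into a composite event, how composites can be nested, and how a definition can be serialized to XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Union


@dataclass
class PrimitiveEvent:
    """A building-block event tied to one camera view (hypothetical schema)."""
    name: str
    camera: str

    def to_xml(self) -> ET.Element:
        return ET.Element("primitive", name=self.name, camera=self.camera)


@dataclass
class CompositeEvent:
    """Two events joined by an operator; operands may themselves be composite."""
    name: str
    operator: str  # illustrative operator names such as "SEQUENCE" or "AND"
    left: "Event"
    right: "Event"

    def to_xml(self) -> ET.Element:
        node = ET.Element("composite", name=self.name, operator=self.operator)
        node.append(self.left.to_xml())
        node.append(self.right.to_xml())
        return node


Event = Union[PrimitiveEvent, CompositeEvent]


def depth(event: Event) -> int:
    """Number of layers in an event definition (a primitive has depth 1)."""
    if isinstance(event, PrimitiveEvent):
        return 1
    return 1 + max(depth(event.left), depth(event.right))


def build_example() -> CompositeEvent:
    """Assumed encoding of 'coming from the south corridor, then exiting'."""
    in_corridor = PrimitiveEvent("person_in_south_corridor", camera="cam_corridor")
    exits = PrimitiveEvent("person_exits_building", camera="cam_exit")
    return CompositeEvent("corridor_then_exit", "SEQUENCE", in_corridor, exits)


def serialize(event: Event) -> str:
    """Wrap one event definition in a root element and return it as XML text."""
    root = ET.Element("events")
    root.append(event.to_xml())
    return ET.tostring(root, encoding="unicode")
```

Calling `serialize(build_example())` yields an XML string for a two-camera composite event. Because a `CompositeEvent` accepts other composites as operands, the same two classes are enough to express definitions of arbitrary depth.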