Thermography is widely used to inspect thermal anomalies in building façade systems. Computer vision-based techniques offer a way to detect such anomalies autonomously, which can significantly improve the efficiency of decision-making for building envelope retrofitting and maintenance. However, the traditional performance metrics used to evaluate image segmentation-based anomaly identification methods do not accurately reflect the true performance of the segmentation models. One major problem is that labelling in this task is highly subjective, and traditional metrics do not account for this. In addition, traditional metrics are skewed towards lower scores because they are highly sensitive to the overlap ratio. In this work, we present a novel performance metric that is robust to both drawbacks. Experimental results show, both qualitatively and quantitatively, that the scores produced by our metric align better with the scores provided by building performance experts.
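
To make the overlap sensitivity concrete, the following minimal sketch assumes intersection over union (IoU) as a representative traditional metric; the abstract does not name a specific metric, and the helper names, mask sizes, and offsets are hypothetical. It is not the metric proposed in this work. It shows how a prediction that is only slightly displaced from a subjectively drawn label can still receive a low score.

```python
# Illustrative only: IoU is assumed here as a stand-in for a "traditional"
# overlap-based metric. All names and sizes are hypothetical.

def square_mask(top, left, size):
    """Return the set of (row, col) pixels covered by a square region."""
    return {(r, c)
            for r in range(top, top + size)
            for c in range(left, left + size)}


def iou(mask_a, mask_b):
    """Intersection over union of two pixel sets."""
    union = mask_a | mask_b
    if not union:
        return 1.0  # two empty masks are treated as a perfect match
    return len(mask_a & mask_b) / len(union)


# A 10x10 anomaly as drawn by one annotator.
expert_label = square_mask(top=5, left=5, size=10)
# The same-sized region, shifted by 2 pixels in each direction.
prediction = square_mask(top=7, left=7, size=10)

score = iou(expert_label, prediction)
print(f"IoU = {score:.3f}")  # prints "IoU = 0.471"
```

In this toy case the two regions share 64 of 136 pixels in their union, so a 2-pixel shift of a 10-pixel-wide region already pushes the score below 0.5, even though both regions mark essentially the same anomaly.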