A conflict probe is an air traffic management tool that can predict conflicts well in advance, using information on aircraft position, speed and plans along with forecasts of wind and temperature profiles. Strategic (early) resolution of conflicts has the potential to reduce the cost associated with conflict resolution maneuvers. A conflict probe would be especially useful in a free flight environment, which is expected to have a less structured traffic flow than the current operating environment. This paper presents a comprehensive methodology to quantitatively evaluate the performance of a conflict probe, using real traffic data and expanded separation criteria. The methodology is generic and can be applied to any conflict probe; it can therefore provide a framework for a comparative study of conflict probes. Several metrics of conflict probe performance have been developed and evaluated. The missed alert rate and false alert rate are primary metrics that quantify the reliability of a conflict probe; the mean conflict warning time and the errors in key conflict prediction parameters, such as minimum horizontal and vertical separations, are important secondary metrics that quantify its accuracy. Work is underway to validate this evaluation methodology by applying it to the Center-TRACON Automation System's Conflict Probe Tool, using real traffic data from the Denver Air Route Traffic Control Center; some preliminary results are presented as an example.
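As a rough illustration of how the primary metrics and the mean warning time might be tallied, the following minimal sketch computes them from a list of labeled aircraft-pair encounters. The abstract does not state the exact formulas, so the definitions here are assumptions: the missed alert rate is taken as missed conflicts divided by actual conflicts, and the false alert rate as false alerts divided by all alerts issued. The `Encounter` record and the `probe_metrics` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Encounter:
    """One aircraft pair (hypothetical record layout, not from the paper)."""
    predicted: bool  # the probe issued a conflict alert for this pair
    actual: bool     # a conflict occurred under the chosen separation criteria
    warning_time_s: Optional[float] = None  # alert lead time, if known


def probe_metrics(encounters):
    """Tally reliability metrics under the assumed definitions above."""
    actual = [e for e in encounters if e.actual]
    alerts = [e for e in encounters if e.predicted]
    missed = [e for e in actual if not e.predicted]
    false_alerts = [e for e in alerts if not e.actual]
    # Warning time is averaged only over correctly predicted conflicts
    # that carry a recorded lead time.
    timed_hits = [
        e for e in actual if e.predicted and e.warning_time_s is not None
    ]

    return {
        # Assumed definition: missed conflicts / actual conflicts.
        "missed_alert_rate": len(missed) / len(actual) if actual else None,
        # Assumed definition: false alerts / alerts issued.
        "false_alert_rate": len(false_alerts) / len(alerts) if alerts else None,
        "mean_warning_time_s": (
            sum(e.warning_time_s for e in timed_hits) / len(timed_hits)
            if timed_hits
            else None
        ),
    }
```

Each metric is returned as `None` when its denominator is empty, so a data set with no actual conflicts or no alerts does not raise a division error. Other normalizations (for example, false alerts per total aircraft pairs examined) are equally plausible and would change only the denominators.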