In a number of recent editorials I have addressed the problem of (reliability) modeling. Developing adequate, usable models is a process that depends on two conflicting parameters: the dynamics of the influx of new technology and the time required to develop adequate models of that technology. The first aspect is strongly related to the rate at which technology innovates, while the second is currently dominated by the time required to obtain sufficient data to form the (often statistical) basis of new models. This last process in particular has become increasingly complicated in recent years. Finding a statistical basis for models of products that are predominantly hardware based is already becoming more complex due to the increasing complexity of the hardware; developing models for products that are predominantly software based is even harder, as the increasingly large state space complicates things further. I have not yet seen a reliability model of a computer as a function of the use of MSWord...

It could very well be, however, that the problem created by these increasingly complex systems also carries its solution. The computing power and connectivity of many modern devices are such that it is now quite possible for a product to continuously observe its interaction with the user and, in case of a problem, send back not only an error code but also a very detailed history that may contain valuable clues about what happened in the field and what led to the failure. On an experimental level we are currently working here at Eindhoven University on so-called Self-Observing Systems that, more or less in real time, report detailed use cases, user problems, and field failures. Without revealing any confidential data on this project, I can share that the data obtained are unusually detailed and rich, and allow very fast model development.
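To make the idea concrete, the following is a minimal sketch of such a self-observing system. All names here (the class, the event strings, the error code) are hypothetical illustrations, not part of our project: the device keeps a bounded history of user interactions and, on failure, packages the error code together with that recent trace rather than the error code alone.

```python
import json
import time
from collections import deque


class SelfObservingSystem:
    """Sketch: record a bounded history of user interactions and,
    on failure, report an error code together with the recent trace."""

    def __init__(self, history_size=100):
        # A bounded buffer: only the most recent interactions are kept,
        # which also limits how much user data ever leaves the device.
        self.history = deque(maxlen=history_size)

    def observe(self, event):
        # Record each user interaction with a timestamp.
        self.history.append({"t": time.time(), "event": event})

    def build_failure_report(self, error_code):
        # Instead of sending only an error code, attach the detailed
        # usage history that preceded the failure.
        return json.dumps({
            "error_code": error_code,
            "trace": list(self.history),
        })


# Usage: observe a few interactions, then report a (hypothetical) failure.
sos = SelfObservingSystem(history_size=3)
for e in ["open_document", "insert_table", "save"]:
    sos.observe(e)
report = json.loads(sos.build_failure_report("E42"))
```

The bounded buffer is a deliberate design choice: it gives the analyst the context that preceded the failure without shipping a complete usage log, which already touches the privacy question raised below.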
We are not the only ones following this approach; anyone who has recently used a Microsoft product and encountered a problem with it will have seen a screen requesting permission to send a detailed error report or trace back to Microsoft. I expect these data to be very valuable for a company like Microsoft. With this approach it is certainly possible to create very rich data sets containing a great deal of information that can help us build better and more accurate reliability models, even for very complex systems, as a function of wide ranges of product use. From the research perspective, however, it creates a new problem. Whereas in the past reliability engineers struggled to extract meaningful models from very sparse, information-poor data sets, they now have to bring structure to extremely large, information-rich ones. So far I have seen only very few such models.

Another issue is the legal perspective. The data obtained can be so rich that they contain information an end-user may not want to share. In this context I can well imagine why a company like Microsoft asks for explicit approval from the end-user before any failure report is sent back. Similarly, telephone companies can track the whereabouts of every individual GSM user by tracing the signals of their phone. How to balance obtaining models that are valuable, also for end-users, in creating opportunities to improve the reliability of their products and services against ensuring their privacy is a matter still unresolved. Therefore, I warmly appreciate your reactions!
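One crude illustration of what "bringing structure" to such data might mean: given a set of failure reports, each carrying the trace of events that preceded it, count which event immediately preceded each error code. The reports and event names below are invented for the example; real analyses would of course be far richer.

```python
from collections import Counter

# Hypothetical failure reports: each pairs an error code with the trace
# of user events that preceded the failure.
reports = [
    {"error_code": "E42", "trace": ["open", "insert_table", "save"]},
    {"error_code": "E42", "trace": ["open", "save"]},
    {"error_code": "E7",  "trace": ["open", "print"]},
]


def last_event_before_failure(reports):
    """For each error code, count which event immediately preceded the
    failure -- one simple way to impose structure on rich failure data."""
    stats = {}
    for r in reports:
        code, trace = r["error_code"], r["trace"]
        if trace:
            stats.setdefault(code, Counter())[trace[-1]] += 1
    return stats


stats = last_event_before_failure(reports)
```

Even this trivial aggregation inverts the old problem: the difficulty is no longer too little data, but choosing which of many possible structures to extract from it.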