This paper studies the problem of learning queries from tree-structured data. Tree-structured data is modeled as a node-labeled tree $$T$$, and applying a query $$q$$ to $$T$$ returns a set $$q(T)$$, which is a subset of the nodes of $$T$$. For a tree-node pair $$(T,t)$$, where $$t$$ is a node in $$T$$, $$q$$ is said to accept the pair if $$t\in q(T)$$ and to reject it if $$t\notin q(T)$$. For a query class $$\mathcal{L}$$ and sets of tree-node pairs $$E_p$$ and $$E_n$$, the tree query learning problem is to find a query $$q\in \mathcal{L}$$ such that (1) $$q$$ rejects all pairs in $$E_n$$, and (2) the number of pairs in $$E_p$$ accepted by $$q$$ is maximized. The hardness of the corresponding tree query learning problems is analyzed for four query classes: $$\mathcal Q^{\tiny /}$$, $$\mathcal Q^{\tiny /,*}$$, $$\mathcal Q^{\tiny /,//}$$ and $$\mathcal Q^{\tiny /,[]}$$. For $$\mathcal Q^{\tiny /}$$, a PTime algorithm is given. For $$\mathcal Q^{\tiny /,*}$$ and $$\mathcal Q^{\tiny /,//}$$, the problems are shown to be NP-complete. For $$\mathcal Q^{\tiny /,[]}$$, the problem is shown to be NP-hard by considering two constrained fragments of $$\mathcal Q^{\tiny /,[]}$$. In addition, for $$\mathcal Q^{\tiny /,*}$$, $$\mathcal Q^{\tiny /,[]}$$ and $$\mathcal Q^{\tiny /,//}$$, it is shown that no $$n^{1-\epsilon}$$-approximation algorithm exists for any $$\epsilon>0$$.
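To make the learning objective concrete, the following Python sketch encodes trees, tree-node pairs, the accept/reject condition, and the quantity being maximized. It rests on a hypothetical reading that the abstract does not confirm: a query in $$\mathcal Q^{\tiny /}$$ is taken to be an absolute path $$/l_1/l_2/\cdots/l_k$$ using only the child axis, evaluated from the root. Under that assumption a query accepts $$(T,t)$$ exactly when it equals the label path from the root of $$T$$ to $$t$$, so the label paths of the positive pairs form a finite candidate set, and `learn` simply scores those candidates. This is an illustrative observation, not the PTime algorithm from the paper. All names (`Node`, `evaluate`, `accepts`, `score`, `label_path`, `learn`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumption: a Q^{/} query /l1/l2/.../lk is encoded as a tuple of labels.
Query = tuple[str, ...]


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can sit in sets
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


Pair = tuple[Node, Node]  # (root of T, node t in T)


def evaluate(q: Query, root: Node) -> set[Node]:
    """q(T): nodes reached from the root by following q's labels along child edges."""
    if not q or root.label != q[0]:
        return set()
    frontier = [root]
    for label in q[1:]:
        frontier = [c for n in frontier for c in n.children if c.label == label]
    return set(frontier)


def accepts(q: Query, pair: Pair) -> bool:
    tree, t = pair
    return t in evaluate(q, tree)


def score(q: Query, E_p: list[Pair], E_n: list[Pair]) -> Optional[int]:
    """Number of accepted pairs in E_p, or None if q accepts some pair in E_n."""
    if any(accepts(q, p) for p in E_n):
        return None
    return sum(accepts(q, p) for p in E_p)


def label_path(root: Node, t: Node) -> Optional[Query]:
    """Labels on the root-to-t path, or None if t does not occur under root."""
    if root is t:
        return (root.label,)
    for c in root.children:
        sub = label_path(c, t)
        if sub is not None:
            return (root.label,) + sub
    return None


def learn(E_p: list[Pair], E_n: list[Pair]) -> tuple[Optional[Query], int]:
    """Best feasible candidate among the label paths of the positive pairs.

    Returns (None, 0) if no candidate rejects all of E_n; then any query
    selecting no node is optimal with value 0.
    """
    best, best_score = None, 0
    for tree, t in E_p:
        q = label_path(tree, t)
        s = score(q, E_p, E_n) if q is not None else None
        if s is not None and s > best_score:
            best, best_score = q, s
    return best, best_score


# Usage: T = r(a(b), a(c)) and T2 = r(a(b)).
b, c = Node("b"), Node("c")
a1, a2 = Node("a", [b]), Node("a", [c])
T = Node("r", [a1, a2])
b2 = Node("b")
T2 = Node("r", [Node("a", [b2])])

E_p = [(T, b), (T, a1), (T2, b2)]
E_n = [(T, a2)]
print(learn(E_p, E_n))  # (('r', 'a', 'b'), 2)
```

Here `/r/a` would accept the positive pair $$(T, a_1)$$ but also the negative pair $$(T, a_2)$$, so it is infeasible; `/r/a/b` rejects $$E_n$$ and accepts two of the three positive pairs.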