Proper Superset Research Articles

Sample compression schemes are schemes for “encoding” a set of examples in a small subset of examples. The long-standing open sample compression conjecture states that, for any concept class C of VC-dimension d, there is a sample compression scheme in which samples for concepts in C are compressed to samples of size at most d. We show that every order over C induces a special type of sample compression scheme for C, which we call order compression scheme. It turns out that order compression schemes can compress to samples of size at most d if C is maximum, intersection-closed, a Dudley class, or of VC-dimension 1 – and thus in most cases for which the sample compression conjecture is known to be true. Since order compression schemes are much simpler than sample compression schemes in general, their study seems to be a promising step towards resolving the sample compression conjecture. We reveal a number of fundamental properties of order compression schemes, which are helpful in such a study. In particular, order compression schemes exhibit interesting graph-theoretic properties as well as connections to the theory of learning from teachers. To obtain small compressed sets, order compression schemes for a concept class C must often use a proper superset H ⊃ C as a hypothesis space. We thus further compare order compression schemes for C to order compression schemes for such hypothesis spaces, leading to a study of a number of mutually related combinatorial parameters specifying compressibility.
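As a rough illustration of what a sample compression scheme does, the sketch below uses a textbook toy class that is not taken from the abstract: thresholds on the real line, where a concept labels x as 1 exactly when x ≤ t. This class has VC-dimension 1, and a labeled sample can be compressed to at most one example. The function names `compress` and `reconstruct` are hypothetical, and this is a generic sample compression scheme, not the order compression scheme the authors define.

```python
def compress(sample):
    """Keep only the largest positively labeled example (or nothing).

    `sample` is a list of (x, label) pairs with label in {0, 1}, assumed
    to be consistent with some threshold concept "label is 1 iff x <= t".
    """
    positives = [x for x, label in sample if label == 1]
    return [(max(positives), 1)] if positives else []


def reconstruct(compressed):
    """Turn the compressed set back into a hypothesis (a function x -> label)."""
    if not compressed:
        # No positive example was kept: predict 0 everywhere.
        return lambda x: 0
    threshold = compressed[0][0]
    return lambda x: 1 if x <= threshold else 0


# Toy sample labeled by the threshold t = 3.0 (an assumption for this demo).
sample = [(0.5, 1), (2.0, 1), (3.5, 0), (7.0, 0)]
kept = compress(sample)
hypothesis = reconstruct(kept)

# The reconstructed hypothesis must agree with every example in the sample.
consistent = all(hypothesis(x) == label for x, label in sample)
```

The defining requirement is the last line: the hypothesis rebuilt from the compressed set has to label the whole original sample correctly, even though only one example was stored.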

Joint mining of multiple datasets can often discover interesting, novel, and reliable patterns which cannot be obtained solely from any single source. For example, in bioinformatics, jointly mining multiple gene expression datasets obtained by different labs or during various biological processes may overcome the heavy noise in the data. Moreover, by joint mining of gene expression data and protein-protein interaction data, we may discover clusters of genes which show coherent expression patterns and also produce interacting proteins. Such clusters may be potential pathways. In this article, we investigate a novel data mining problem, mining frequent cross-graph quasi-cliques, which is generalized from several interesting applications in bioinformatics, cross-market customer segmentation, social network analysis, and Web mining. In a graph, a set of vertices S is a γ-quasi-clique (0 < γ ≤ 1) if each vertex v in S directly connects to at least γ ⋅ (|S| − 1) other vertices in S. Given a set of graphs G_1, …, G_n and parameter min_sup (0 < min_sup ≤ 1), a set of vertices S is a frequent cross-graph quasi-clique if S is a γ-quasi-clique in at least min_sup ⋅ n graphs, and there does not exist a proper superset of S having the property. We build a general model, show why the complete set of frequent cross-graph quasi-cliques cannot be found by previous data mining methods, and study the complexity of the problem. While the problem is difficult, we develop practical algorithms which exploit several interesting and effective techniques and heuristics to efficaciously mine frequent cross-graph quasi-cliques. A systematic performance study is reported on both synthetic and real data sets. We demonstrate some interesting and meaningful frequent cross-graph quasi-cliques in bioinformatics. The experimental results also show that our algorithms are efficient and scalable.
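The definition above can be checked directly on small inputs. The following is a minimal brute-force sketch, not the authors' mining algorithm: it assumes each graph is an undirected adjacency dictionary mapping a vertex to its set of neighbours, and the function names and the two toy graphs are hypothetical. Because the quasi-clique property is not preserved under adding or removing vertices, the maximality check tries every proper superset, which is only feasible for tiny vertex sets.

```python
from itertools import combinations


def is_quasi_clique(adj, S, gamma):
    """True if each vertex of S has at least gamma * (|S| - 1) neighbours in S."""
    S = set(S)
    need = gamma * (len(S) - 1)
    return all(len(adj.get(v, set()) & (S - {v})) >= need for v in S)


def is_frequent(graphs, S, gamma, min_sup):
    """True if S is a gamma-quasi-clique in at least min_sup * n of the n graphs."""
    hits = sum(is_quasi_clique(g, S, gamma) for g in graphs)
    return hits >= min_sup * len(graphs)


def is_frequent_cross_graph_quasi_clique(graphs, S, gamma, min_sup):
    """Frequent, and no proper superset of S is frequent (exhaustive check)."""
    S = set(S)
    if not is_frequent(graphs, S, gamma, min_sup):
        return False
    all_vertices = set().union(*(g.keys() for g in graphs))
    rest = sorted(all_vertices - S)
    for k in range(1, len(rest) + 1):
        for extra in combinations(rest, k):
            if is_frequent(graphs, S | set(extra), gamma, min_sup):
                return False
    return True


# Two toy graphs on the same vertex set (hypothetical data).
g1 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
g2 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": set()}
graphs = [g1, g2]

triangle_ok = is_frequent_cross_graph_quasi_clique(graphs, {"a", "b", "c"}, 1.0, 1.0)
edge_ok = is_frequent_cross_graph_quasi_clique(graphs, {"a", "b"}, 1.0, 1.0)
```

Here `{"a", "b", "c"}` is a clique in both graphs and cannot be extended, while `{"a", "b"}` is frequent but fails the maximality condition because the triangle is a proper superset with the same property.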

Articles published on Proper Superset

DEDEKIND-FINITE CARDINALS HAVING COUNTABLE PARTITIONS

Improved Bounds for Induced Poset Saturation

Analog neuron hierarchy

Unary Watson-Crick automata

Reconstructing One-Articulated Networks with Distance Matrices

The saturation number of induced subposets of the Boolean lattice

Acyclicity in edge-colored graphs

Practical Algorithms for Finding Extremal Sets

Order compression schemes

Self-Stabilizing Algorithms for Maximal 2-packing and General k-packing (k ≥ 2) with Safe Convergence in an Arbitrary Graph

ON A SUPERCLASS OF A-GRAMMARS

Reduced Convex Bodies in Finite Dimensional Normed Spaces: A Survey

Locality for quantum systems on graphs depends on the number field

A Polynomial-Time Algorithm for Estimating the Partition Function of the Ferromagnetic Ising Model on a Regular Matroid

The Parallel versus Branching Recurrences in Computability Logic

The countable versus uncountable branching recurrences in computability logic

Mining frequent cross-graph quasi-cliques

Hadwiger's conjecture for quasi‐line graphs

Balanced parentheses strike back

WHEN CHURCH-ROSSER BECOMES CONTEXT FREE
