Approximation and estimation capability of vision transformers for hierarchical compositional models
- Book Chapter
2
- 10.1007/978-3-642-38886-6_42
- Jan 1, 2013
In recent years, hierarchical compositional models have been shown to possess many appealing properties for object class detection, such as coping with a potentially large number of object categories. The reason is that they encode categories by hierarchical vocabularies of parts that are shared among the categories. On the downside, this sharing and their purely reconstructive nature cause problems when categorizing visually similar categories and separating them from the background. In this paper, we propose a novel approach that preserves the appealing properties of generative hierarchical models while improving their discrimination properties. We achieve this by introducing a network of discriminative nodes on top of the existing generative hierarchy. The discriminative nodes are sparse linear combinations of activated generative parts. We show in experiments that the discriminative nodes consistently improve a state-of-the-art hierarchical compositional model. The results show that our approach considers only a fraction of all nodes in the vocabulary (less than 10%), which also makes the system computationally efficient.
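The abstract describes discriminative nodes as sparse linear combinations of activated generative parts. A minimal sketch of that scoring idea, assuming part activations arrive as a dictionary from part id to activation strength; the part ids, weights, bias and function names are all hypothetical and the sketch does not reproduce how the authors learn the weights.

```python
from typing import Dict, Iterable

# Hypothetical sketch: a discriminative node scores an image as a sparse
# linear combination of activated generative parts. Ids and weights are
# illustrative assumptions, not values from the paper.

def node_score(activations: Dict[int, float],
               weights: Dict[int, float],
               bias: float = 0.0) -> float:
    """Sum weight * activation over the parts this node references.

    Only the node's nonzero weights are visited, so the cost scales with
    the node's sparsity rather than with the vocabulary size.
    """
    return bias + sum(w * activations.get(part, 0.0)
                      for part, w in weights.items())


def vocabulary_fraction(nodes: Iterable[Dict[int, float]], vocab_size: int) -> float:
    """Fraction of vocabulary parts referenced by at least one node."""
    used = set()
    for weights in nodes:
        used.update(p for p, w in weights.items() if w != 0.0)
    return len(used) / vocab_size


if __name__ == "__main__":
    activations = {3: 0.9, 17: 0.4, 42: 0.7}   # parts fired by the generative layer
    car_node = {3: 1.5, 42: -0.5}              # sparse weights of one node
    print(node_score(activations, car_node, bias=-0.2))
    print(vocabulary_fraction([car_node, {17: 2.0}], vocab_size=100))
```

Because each node only touches the parts it has weights for, a small fraction of the vocabulary is enough to evaluate all nodes, which is the efficiency argument the abstract makes.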
- Book Chapter
- 10.1007/978-3-642-41181-6_54
- Jan 1, 2013
Our goal is to identify hierarchical compositional models from highly cluttered data. The data to learn from are assumed to be imperfect in two respects. First, a large portion of the data comes from background clutter. Second, data generated by a recursive compositional model are subject to random replacements of correct descendants by randomly chosen ones at every level of the hierarchy. In this paper, we study the limits and capabilities of an approach based on likelihood maximization. The algorithm makes explicit probabilistic assignments of individual data points to the compositional model and to background clutter. It uses these assignments to focus effectively on the data coming from the compositional model and to iteratively estimate their compositional structure.
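The alternation between probabilistic assignment and re-estimation described above has the shape of an expectation-maximization loop. The sketch below is a deliberately simplified stand-in, assuming the "compositional model" is reduced to a single 1-D Gaussian and the clutter to a uniform density on `[lo, hi]`; it is not the paper's recursive model, and all names are hypothetical.

```python
import math
from typing import List, Tuple

# Simplified stand-in (assumption): model = 1-D Gaussian, clutter = uniform
# density on [lo, hi]. EM alternates soft assignments (E-step) with
# responsibility-weighted re-estimation (M-step).

def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def em_model_vs_clutter(data: List[float], lo: float, hi: float,
                        mu: float, sigma: float, pi_model: float = 0.5,
                        iters: int = 50) -> Tuple[float, float, float, List[float]]:
    background = 1.0 / (hi - lo)          # uniform clutter density
    resp = [0.0] * len(data)
    for _ in range(iters):
        # E-step: posterior probability that each point came from the model.
        for i, x in enumerate(data):
            m = pi_model * gaussian_pdf(x, mu, sigma)
            b = (1.0 - pi_model) * background
            resp[i] = m / (m + b)
        # M-step: estimates are weighted by the responsibilities, so points
        # attributed to clutter barely influence the model parameters.
        total = sum(resp)
        mu = sum(r * x for r, x in zip(resp, data)) / total
        var = sum(r * (x - mu) ** 2 for r, x in zip(resp, data)) / total
        sigma = max(math.sqrt(var), 1e-3)  # guard against variance collapse
        pi_model = total / len(data)
    return mu, sigma, pi_model, resp
```

For example, `em_model_vs_clutter([4.8, 5.0, 5.1, 5.2, 4.9, 0.5, 9.3, 2.0, 7.7], 0.0, 10.0, mu=4.0, sigma=2.0)` should settle near the tight group around 5.0 while assigning the scattered points to clutter.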
- Research Article
- 10.1016/j.ijmedinf.2024.105646
- Oct 5, 2024
- International Journal of Medical Informatics
Background: Large-scale health data has significant potential for research and innovation, especially longitudinal data, which offers insights into prevention, disease progression, and treatment effects. Yet analyzing this type of data is complex, as data points are documented repeatedly along a timeline. As a consequence, extracting cross-sectional tabular data suitable for statistical analysis and machine learning can be challenging for medical researchers and data scientists alike, and existing tools lack a balance between ease of use and comprehensiveness.
Objective: This paper introduces HERALD, a novel domain-specific query language designed to support the transformation of longitudinal health data into cross-sectional tables. We describe the basic concepts, the query syntax, a graphical user interface for constructing and executing HERALD queries, and an integration into Informatics for Integrating Biology and the Bedside (i2b2).
Methods: The syntax of HERALD mimics natural language and supports different query types for selection, aggregation, analysis of relationships, and searching for data points based on filter expressions and temporal constraints. Using a hierarchical concept model, queries are executed individually on each patient's data, and the results are assembled into tabular output. HERALD is closed, meaning that queries process data points and generate data points. Queries can refer to data points produced by previous queries, providing a simple but powerful nesting mechanism.
Results: The open-source implementation consists of a HERALD query parser, an execution engine, and a web-based user interface for query construction and statistical analysis. The implementation can be deployed as a standalone component and integrated into self-service data analytics environments like i2b2 as a plugin. HERALD can be a valuable tool for data scientists and machine learning experts, as it simplifies the process of transforming longitudinal health data into tables and data matrices.
Conclusion: The construction of cross-sectional tables from longitudinal data can be supported through dedicated query languages that strike a reasonable balance between language complexity and transformation capabilities.
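To make the abstract's three ideas concrete (per-patient execution, closed queries that map data points to data points, and nesting via earlier results), here is a small Python sketch. It is hypothetical throughout: it does not use HERALD's syntax, parser or API, and every class and function name is invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Optional

# Hypothetical illustration only; NOT HERALD syntax or its implementation.

@dataclass(frozen=True)
class DataPoint:
    patient: str
    concept: str
    when: date
    value: float

# A closed query: data points in, data points out.
Query = Callable[[List[DataPoint]], List[DataPoint]]


def select(concept: str) -> Query:
    return lambda pts: [p for p in pts if p.concept == concept]


def first(query: Query) -> Query:
    def run(pts: List[DataPoint]) -> List[DataPoint]:
        return sorted(query(pts), key=lambda p: p.when)[:1]
    return run


def mean(query: Query) -> Query:
    def run(pts: List[DataPoint]) -> List[DataPoint]:
        hits = query(pts)
        if not hits:
            return []
        avg = sum(p.value for p in hits) / len(hits)
        latest = max(hits, key=lambda p: p.when)
        return [DataPoint(latest.patient, "mean", latest.when, avg)]
    return run


def to_table(points: List[DataPoint],
             queries: Dict[str, Query]) -> Dict[str, Dict[str, Optional[float]]]:
    by_patient: Dict[str, List[DataPoint]] = defaultdict(list)
    for p in points:
        by_patient[p.patient].append(p)
    table: Dict[str, Dict[str, Optional[float]]] = {}
    for patient, pts in by_patient.items():
        row: Dict[str, Optional[float]] = {}
        for name, query in queries.items():   # executed per patient, in order
            out = query(pts)
            row[name] = out[0].value if out else None
            # Nesting: store results as data points under the query's name
            # so later queries can select them.
            pts = pts + [DataPoint(patient, name, o.when, o.value) for o in out]
        table[patient] = row
    return table
```

Usage might look like `to_table(points, {"first_hba1c": first(select("hba1c")), "mean_hba1c": mean(select("hba1c")), "nested": first(select("mean_hba1c"))})`, yielding one row per patient and one column per named query.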
- Conference Article
8
- 10.1109/cvpr42600.2020.01430
- Jun 1, 2020
This paper proposes to learn a hierarchical compositional AND-OR model for interpretable image synthesis by sparsifying the generator network. The proposed method adopts the scene-objects-parts-subparts-primitives hierarchy in image representation. A scene has different types (i.e., OR), each of which consists of a number of objects (i.e., AND). This can be formulated recursively across the scene-objects-parts-subparts hierarchy and terminates at the primitive level (e.g., a wavelet-like basis). To realize this AND-OR hierarchy in image synthesis, we learn a generator network that consists of two components: (i) Each layer of the hierarchy is represented by an over-complete set of convolutional basis functions; off-the-shelf convolutional neural architectures are used to implement the hierarchy. (ii) Sparsity-inducing constraints are introduced in end-to-end training, which induce a sparsely activated and sparsely connected AND-OR model from the initially densely connected generator network. A straightforward sparsity-inducing constraint is used: only the top-$k$ basis functions are allowed to be activated at each layer (where $k$ is a hyperparameter). The learned basis functions are also capable of image reconstruction to explain the input images. In experiments, the proposed method is tested on four benchmark datasets. The results show that meaningful and interpretable hierarchical representations are learned, with better image synthesis and reconstruction quality than the baselines.
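The top-$k$ constraint above can be illustrated with a few lines of plain Python. This is a minimal sketch, assuming one layer is represented as a flat list of basis-function responses and that "activated" means having one of the $k$ largest responses; the function names are hypothetical, and the authors' version operates inside a convolutional generator during end-to-end training rather than on plain lists.

```python
import heapq
from typing import List


def top_k_sparsify(activations: List[float], k: int) -> List[float]:
    """Keep the k largest responses in one layer and zero out the rest."""
    if k <= 0:
        return [0.0] * len(activations)
    # Indices of the k largest responses (ties resolved by original order).
    keep = set(heapq.nlargest(k, range(len(activations)),
                              key=activations.__getitem__))
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


def sparsify_layers(layers: List[List[float]], k: int) -> List[List[float]]:
    """Apply the same top-k constraint independently to every layer."""
    return [top_k_sparsify(layer, k) for layer in layers]
```

With `k = 2`, the layer `[0.1, 0.9, -0.3, 0.5, 0.7]` keeps only `0.9` and `0.7`. Because a single hyperparameter fixes how many units may fire per layer, the sparsity level is controlled directly rather than through a tuned penalty weight.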
- Research Article
4
- 10.1016/j.cviu.2015.04.006
- Jul 10, 2015
- Computer Vision and Image Understanding
Adding discriminative power to a generative hierarchical compositional model using histograms of compositions
- Conference Article
16
- 10.1109/icpr.2016.7900171
- Dec 1, 2016
Hierarchical feature learning based on convolutional neural networks (CNNs) has recently shown significant potential in various computer vision tasks. While CNNs allow high-quality discriminative feature learning, their downside is the lack of explicit structure in features, which often leads to overfitting, an inability to reconstruct from partial observations, and limited generative abilities. Explicit structure is inherent in hierarchical compositional models; however, these lack the ability to optimize a well-defined cost function. We propose a novel analytic model of a basic unit in a layered hierarchical model with both explicit compositional structure and a well-defined discriminative cost function. Our experiments on two datasets show that the proposed compositional model performs on par with standard CNNs on discriminative tasks while, thanks to explicit modeling of the structure in the feature units, affording straightforward visualization of parts and faster inference due to the separability of the units.
- Research Article
10
- 10.1016/j.ajog.2022.05.026
- May 14, 2022
- American Journal of Obstetrics and Gynecology
Predictors of same-day discharge following benign minimally invasive hysterectomy
- Research Article
2
- 10.1016/j.patcog.2023.109397
- Feb 9, 2023
- Pattern Recognition
Human-centered deep compositional model for handling occlusions
- Book Chapter
98
- 10.1007/978-3-540-88688-4_56
- Jan 1, 2008
We describe a new method for unsupervised structure learning of a hierarchical compositional model (HCM) for deformable objects. The learning is unsupervised in the sense that we are given a training dataset of images containing the object in cluttered backgrounds, but we do not know the position or boundary of the object. The structure learning is performed by a bottom-up and top-down process. The bottom-up process is a novel form of hierarchical clustering that recursively composes proposals for simple structures to generate proposals for more complex structures. We combine standard clustering with the suspicious coincidence principle and the competitive exclusion principle to prune the number of proposals to a practical number and avoid an exponential explosion of possible structures. The hierarchical clustering stops automatically when it fails to generate new proposals and outputs a proposal for the object model. The top-down process validates the proposals and fills in missing elements. We tested our approach by using it to learn a hierarchical compositional model for parsing and segmenting horses on the Weizmann dataset. We show that the resulting model is comparable with (or better than) alternative methods. The versatility of our approach is demonstrated by learning models for other objects (e.g., faces, pianos, butterflies, monitors). It is worth noting that the lower levels of the object hierarchies automatically learn generic image features, while the higher levels learn object-specific features.
- Conference Article
79
- 10.1109/cvpr.2014.109
- Jun 1, 2014
This paper proposes a framework for recognizing complex human activities in videos. Our method describes human activities in a hierarchical discriminative model that operates at three semantic levels. At the lower level, body poses are encoded in a representative but discriminative pose dictionary. At the intermediate level, encoded poses span a space where simple human actions are composed. At the highest level, our model captures temporal and spatial compositions of actions into complex human activities. Our human activity classifier simultaneously models which body parts are relevant to the action of interest as well as their appearance and composition using a discriminative approach. By formulating model learning in a max-margin framework, our approach achieves powerful multi-class discrimination while providing useful annotations at the intermediate semantic level. We show how our hierarchical compositional model provides natural handling of occlusions. To evaluate the effectiveness of our proposed framework, we introduce a new dataset of composed human activities. We provide empirical evidence that our method achieves state-of-the-art activity classification performance on several benchmark datasets.
- Research Article
48
- 10.1016/j.imavis.2016.11.004
- Nov 27, 2016
- Image and Vision Computing
Sparse composition of body poses and atomic actions for human activity recognition in RGB-D videos
- Conference Article
2
- 10.1109/cvpr.2019.01188
- Jun 1, 2019
In this work, we consider the problem of learning a hierarchical generative model of an object from a set of images that show examples of the object in the presence of variable background clutter. Existing approaches to this problem are limited by strong a priori assumptions about the object's geometric structure and require segmented training data for learning. In this paper, we propose a novel framework for learning hierarchical compositional models (HCMs) that do not suffer from these limitations. We present a generalized formulation of HCMs and describe a greedy structure learning framework that consists of two phases: bottom-up part learning and top-down model composition. Our framework integrates the foreground-background segmentation problem into the structure learning task via a background model. As a result, we can jointly optimize the number of layers in the hierarchy, the number of parts per layer, and a foreground-background segmentation based on class labels only. We show that the learned HCMs are semantically meaningful and achieve competitive results compared with other generative object models at object classification on a standard transfer learning dataset.
- Conference Article
4
- 10.1109/cvprw.2010.5543894
- Jun 1, 2010
Door detection using wearable cameras helps people with severe vision impairment to access unknown environments independently. The goal of this paper is to robustly detect different doors and classify them as office doors, elevators, exits, etc. These tasks are challenging due to two factors: 1) small inter-class variations between different objects such as office doors and elevators, and 2) only part of an object being captured due to occlusions or continuous camera movement in a mobile system. To overcome these challenges, we propose a Hierarchical Compositional Model (HCM) approach that incorporates context information into the model decomposition process of a part-based HCM to handle partially captured objects as well as large intra-class variations in different environments. Our preliminary experimental results demonstrate promising door detection performance over a wide range of scales, viewpoints, and occlusions.
- Conference Article
1
- 10.1109/cvprw.2009.5204336
- Jun 1, 2009
In this work we consider the problem of object parsing, namely detecting an object and its components by composing them from image observations. We address the computational complexity of the inference problem. To this end, we exploit our hierarchical object representation to efficiently compute a coarse solution to the problem, which we then use to guide search at a finer level. Starting from our adaptation of the A* parsing algorithm to the problem of object parsing, we propose a coarse-to-fine approach that is capable of detecting multiple objects simultaneously. We extend this work to automatically learn a hierarchical model for a category from a set of training images for which only the bounding box is available. Our approach consists of (a) automatically registering a set of training images and constructing an object template, (b) recovering object contours, (c) finding object parts based on contour affinities, and (d) discriminatively learning a parsing cost function.
- Research Article
3
- 10.1080/03610926.2012.755199
- Mar 4, 2015
- Communications in Statistics - Theory and Methods
In the biological, medical, and social sciences, multilevel structures are very common. Hierarchical models that take into account the dependencies among subjects within the same level are necessary. In this article, we introduce a semiparametric hierarchical composite quantile regression model for hierarchical data. This model (i) keeps the easy interpretability of the simple parametric model; (ii) retains some of the flexibility of the complex nonparametric model; (iii) relaxes the assumptions that the noise variance and higher-order moments exist and are finite; and (iv) takes the dependencies among subjects within the same hierarchy into consideration. We establish the asymptotic properties of the proposed estimators. Our simulation results show that the proposed method is more efficient than the least-squares-based method for many non-normally distributed errors. We illustrate our methodology with a real biometric data set.
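For reference, composite quantile regression combines check losses at several quantile levels $\tau_k = k/(K+1)$, $k = 1, \dots, K$, with a separate intercept per level and a shared slope vector. The sketch below evaluates that standard linear objective; it is an assumption-labeled illustration only, and it does not implement the article's semiparametric hierarchical model or its estimators. Function names are hypothetical.

```python
from typing import List, Sequence


def check_loss(u: float, tau: float) -> float:
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def cqr_objective(y: Sequence[float], x: Sequence[Sequence[float]],
                  beta: Sequence[float], intercepts: List[float]) -> float:
    """Standard linear CQR objective with K = len(intercepts) quantile levels.

    sum_k sum_i rho_{tau_k}(y_i - b_k - x_i^T beta), tau_k = k / (K + 1).
    """
    K = len(intercepts)
    taus = [k / (K + 1) for k in range(1, K + 1)]
    total = 0.0
    for tau, b in zip(taus, intercepts):
        for yi, xi in zip(y, x):
            fit = b + sum(xj * bj for xj, bj in zip(xi, beta))
            total += check_loss(yi - fit, tau)
    return total
```

Because the check loss grows only linearly in the residual, the objective does not require finite error variance, which matches point (iii) of the abstract's motivation.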