
Superpixel representation. Unlike the traditional Vision Transformer, which uniformly partitions images into non-overlapping patches of fixed size, our superpixel approach divides an image into distinct, irregular regions, each designed to cluster pixels based on shared semantics. Clustering pixels by shared semantics makes the representation robust to image distortions, such as rotation and occlusion, compared with traditional pixel- or patch-based methods. While convolutional neural networks and Vision Transformers are the go-to solutions for image classification, their model sizes make them expensive to train and deploy; to leverage semantic priors from the data instead, we propose a novel Superpixel-informed INR (S-INR). Specifically, the Fisher score derived from a generative model, obtained via multi-instance factor analysis (MIFA) [31], is introduced for superpixel representation. The core module consists of two attention blocks, pixel-to-superpixel cross-attention and superpixel-to-pixel cross-attention, which alternately update the superpixel and pixel features. One difficulty, however, is that the association between pixels and superpixels is not differentiable for back-propagation, which makes it hard to embed the superpixel representation into neural network architectures [29].
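To make the first step concrete, the following is a minimal sketch (not the paper's exact pipeline) of pooling pixel features into one descriptor per superpixel, assuming a precomputed superpixel label map such as one produced by SLIC; the function name and mean-pooling choice are illustrative assumptions.

```python
import numpy as np

def superpixel_pool(features, labels):
    """features: (H, W, C) pixel features; labels: (H, W) superpixel ids.
    Returns (K, C) mean feature per superpixel, K = number of superpixels."""
    H, W, C = features.shape
    flat_feat = features.reshape(-1, C)
    flat_lab = labels.reshape(-1)
    K = int(flat_lab.max()) + 1
    sums = np.zeros((K, C))
    np.add.at(sums, flat_lab, flat_feat)              # scatter-add features per superpixel
    counts = np.bincount(flat_lab, minlength=K)[:, None]
    return sums / counts                              # mean descriptor per superpixel
```

Because superpixels are irregular, the result is a set of K descriptors rather than a fixed grid, which is exactly what makes them harder to feed to standard architectures than patches.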
Then, to fully exploit the semantic information within and across the generalized superpixels, S-INR includes two elaborately designed modules, namely the exclusive attention-based MLPs and a shared dictionary matrix. Recently, implicit neural representations (INRs) have attracted increasing attention for multi-dimensional data recovery; alternatively, input complexity can be reduced following the intuition that adjacent similar pixels contain redundant information. Related work in hyperspectral image classification has explored discriminative sub-dictionary learning with an adaptive multiscale superpixel classification strategy under the sparse representation framework, but for multiscale sparse representation the shape of the regions does not adaptively change according to context structure. The proposed method therefore considers both the local and the global information during the extraction of the superpixel representation. A closely related line of work is SPFormer, a novel Vision Transformer enhanced by superpixel representation.
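The two S-INR modules described above can be sketched as follows. This is a hedged illustration of the idea as stated here, not the authors' exact architecture: each generalized superpixel gets its own ("exclusive") small MLP mapping coordinates to coefficients, and all superpixels share one dictionary matrix; the layer sizes, ReLU activation, and function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(in_dim, hidden, out_dim):
    """Build one exclusive two-layer MLP with randomly initialized weights."""
    W1 = rng.normal(size=(in_dim, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(size=(hidden, out_dim)); b2 = np.zeros(out_dim)
    def mlp(x):
        h = np.maximum(x @ W1 + b1, 0.0)   # ReLU hidden layer
        return h @ W2 + b2                 # dictionary coefficients
    return mlp

n_superpixels, coord_dim, n_atoms, value_dim = 3, 2, 8, 1
D = rng.normal(size=(n_atoms, value_dim))               # shared dictionary matrix
mlps = [make_mlp(coord_dim, 16, n_atoms) for _ in range(n_superpixels)]

def s_inr_value(coord, sp_id):
    """Predict the data value at `coord` inside superpixel `sp_id`."""
    coeffs = mlps[sp_id](coord)            # exclusive MLP -> coefficients
    return coeffs @ D                      # combine with the shared dictionary
```

The shared dictionary is what lets information flow across superpixels, while the exclusive MLPs specialize to the semantics within each one.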
Classical superpixel algorithms, such as SLIC [1], rely on hard associations between each image pixel and superpixel; recent works, such as superpixel sampling networks (SSN) [24], resolve this issue by turning the hard associations into soft, differentiable ones. Pixel-level representations are typically high resolution, so global self-attention over them is intractable due to its quadratic complexity and efficient processing requires a local sliding-window approach. The superpixel representation, by contrast, uniquely conserves boundary information, enabling the maintenance of high-resolution features crucial for detailed tasks, and divides the image into irregular, semantically coherent regions that effectively capture intricate details. In a similar spirit, Sigrid (Superpixel-Integrated Grid) is a structured image representation that leverages superpixels' perceptually coherent regions to construct a compact and meaningful input grid; each cell in this grid stores descriptors derived from an entire superpixel, including both appearance and shape-based features. Our method integrates superpixel-based feature representation with an SCA mechanism.
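The SSN-style soft association can be sketched in a few lines. This is an illustrative sketch of the general technique, not SSN's exact formulation: each pixel receives a softmax weighting over superpixel centers, and superpixel features become association-weighted means, so every operation is differentiable; the temperature and squared-distance metric are assumptions.

```python
import numpy as np

def soft_association(pixel_feats, centers, temperature=1.0):
    """pixel_feats: (N, C); centers: (K, C). Returns (N, K) soft assignments."""
    d2 = ((pixel_feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    q = np.exp(logits)
    return q / q.sum(axis=1, keepdims=True)         # softmax over superpixels

def soft_pool(pixel_feats, assoc):
    """Differentiable superpixel features: association-weighted pixel means."""
    w = assoc / assoc.sum(axis=0, keepdims=True)    # normalize per superpixel
    return w.T @ pixel_feats                        # (K, C)
```

Replacing SLIC's argmin assignment with this softmax is precisely what allows gradients to reach the superpixel step during back-propagation.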
Addressing the limitations of traditional Vision Transformers' fixed-size, non-adaptive patch partitioning, SPFormer employs superpixels that adapt to the image's content (Jieru Mei, Liang-Chieh Chen, Alan Yuille, Cihang Xie, "SPFormer: Enhancing Vision Transformer with Superpixel Representation," Transactions on Machine Learning Research (TMLR), December 2024; preprint arXiv: 2401.02931). Relatedly, Adaptive Superpixel Coding (ASC) is a transformer-compatible layer that integrates an adaptive superpixel mechanism, enabling the decoupling of the image's grid-based structure from its representation structure. Plain INRs, however, simply map coordinates via a multi-layer perceptron (MLP) to corresponding values, ignoring the inherent semantic information of the data. Integrating the advantages of both the superpixel and the Bag-of-Features (BOF) representations yields a Bundle-of-Superpixel (BOSP) representation. Superpixel representation also enables localization of single and multiple actions within frames: a sparse spatio-temporal video representation graph is constructed with superpixels as nodes, facilitating weakly supervised action labeling using random walks. This approach effectively addresses the limitations of traditional pixel- and patch-based methods by blending local details with global contextual information efficiently. Compared to the traditional pixel representation of the image, the superpixel representation greatly reduces the number of image primitives and improves representational efficiency.
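The pixel-to-superpixel attention block mentioned above can be illustrated with a minimal single-head cross-attention sketch: superpixel features act as queries attending over pixel features (keys and values), and swapping the roles gives the superpixel-to-pixel direction. Head count, the absence of learned projections, and function names are illustrative assumptions, not the exact module from any of the cited works.

```python
import numpy as np

def cross_attention(queries, keys, values):
    """queries: (K, C); keys/values: (N, C). Returns updated (K, C) features."""
    scale = np.sqrt(queries.shape[-1])
    scores = queries @ keys.T / scale               # (K, N) attention logits
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)         # softmax over pixels
    return attn @ values                            # aggregate pixel values

def update_superpixels(sp_feats, px_feats):
    # pixel-to-superpixel direction: superpixels query the pixels
    return cross_attention(sp_feats, px_feats, px_feats)
```

Alternating `update_superpixels(sp, px)` with `cross_attention(px, sp, sp)` reproduces the alternating update pattern described in the text.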
This prior can be exploited by clustering such pixels into superpixels and connecting adjacent superpixels. This flexibility presents us with a choice regarding the information about a superpixel we wish to include in its graph representation, i.e., which features of a superpixel are relevant to the image classification. Concretely, the visual module consists of four stages: superpixel-level representation, multiscale superpixel graph construction, difference graph convolution, and a multi-level fusion strategy.