Abstract: Remainder control is crucial to the development and manufacturing of aerospace products, and remainder state recognition is an important part of it. The key is to effectively extract local features from high-noise images. However, existing methods are not modeled specifically for remainder scenes, and generic vision models tend to overfit the noise, making it difficult to filter noisy signals effectively. To solve this problem, this paper proposes a learnable Filter Network, which replaces the heavy self-attention mechanism with a learnable filter that learns the interactions among spatial locations. It then incorporates a mask for frequency-domain feature extraction to learn the emphasis placed on different frequency bands. Experiments demonstrate that the method performs better in remainder recognition scenarios, outperforms convolution and self-attention models, and has lower time complexity.
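
As a rough illustration of the idea summarized above, the following NumPy sketch shows one way a frequency-domain filter with a band mask could be applied to a feature map. It is not the paper's implementation: the function names `frequency_filter` and `low_pass_mask`, the tensor layout `(H, W, C)`, and the fixed radial low-pass mask are all assumptions made for the example. In the proposed network the filter weights, and presumably the mask, would be learned during training rather than set by hand.

```python
import numpy as np

def frequency_filter(x, weight, band_mask=None):
    """Mix spatial locations by filtering in the frequency domain.

    x:         real feature map, shape (H, W, C)            (assumed layout)
    weight:    complex filter, shape (H, W // 2 + 1, C)     (learnable in practice)
    band_mask: optional real mask, shape (H, W // 2 + 1)    (hypothetical band weighting)
    """
    H, W, _ = x.shape
    # 2D real FFT over the two spatial axes.
    X = np.fft.rfft2(x, axes=(0, 1))
    # Element-wise product in the frequency domain, which corresponds to a
    # global circular convolution in the spatial domain.
    X = X * weight
    if band_mask is not None:
        # Emphasize or suppress frequency bands, shared across channels.
        X = X * band_mask[:, :, None]
    # Back to the spatial domain with the original size.
    return np.fft.irfft2(X, s=(H, W), axes=(0, 1))

def low_pass_mask(H, W, cutoff):
    """Hand-made radial low-pass mask, used here only as a stand-in
    for a mask that the network would learn."""
    fy = np.fft.fftfreq(H)[:, None]    # vertical frequencies, cycles per sample
    fx = np.fft.rfftfreq(W)[None, :]   # non-negative horizontal frequencies
    radius = np.sqrt(fy ** 2 + fx ** 2)
    return (radius <= cutoff).astype(np.float64)

# Usage: filter a random 8x8 feature map with 3 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))
weight = np.ones((8, 8 // 2 + 1, 3), dtype=np.complex128)  # identity filter
y = frequency_filter(x, weight, low_pass_mask(8, 8, cutoff=0.25))
```

With an all-ones filter and no mask the layer returns its input unchanged, which gives a simple sanity check. Because the mixing is done with an FFT, the cost grows as O(N log N) in the number of spatial locations N, whereas pairwise self-attention grows as O(N^2).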