Computer Vision Papers
Papers with Code computer vision papers section: provides access to research papers along with their corresponding code.
Semantic Segmentation
- U-Net: Convolutional Networks for Biomedical Image Segmentation (2015)
- Deep Residual Learning for Image Recognition (2015)
- MobileNetV2: Inverted Residuals and Linear Bottlenecks (2018)
- MMDetection: Open MMLab Detection Toolbox and Benchmark (2019)
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (2020)
- PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (2016)
- FCOS: Fully Convolutional One-Stage Object Detection (2019)
Image Classification
- Deep Residual Learning for Image Recognition (2015)
- Very Deep Convolutional Networks for Large-Scale Image Recognition (2014)
- MobileNetV2: Inverted Residuals and Linear Bottlenecks (2018)
- Densely Connected Convolutional Networks (2016)
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (2019)
- CSPNet: A New Backbone that can Enhance Learning Capability of CNN (2019)
- Rethinking the Inception Architecture for Computer Vision (2015)
Contrastive Learning
- A Simple Framework for Contrastive Learning of Visual Representations (2020)
- Momentum Contrast for Unsupervised Visual Representation Learning (2020)
- Improved Baselines with Momentum Contrastive Learning (2020)
- SimCSE: Simple Contrastive Learning of Sentence Embeddings (2021)
- Unsupervised Learning of Visual Features by Contrasting Cluster Assignments (2020)
- Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination (2018)
Image Generation
- Improved Techniques for Training GANs (2016)
- Improved Training of Wasserstein GANs (2017)
- Progressive Growing of GANs for Improved Quality, Stability, and Variation (2017)
- A Style-Based Generator Architecture for Generative Adversarial Networks (2018)
- Self-Attention Generative Adversarial Networks (2018)
- SinGAN: Learning a Generative Model from a Single Natural Image (2019)
- Denoising Diffusion Probabilistic Models (2020)
- Analyzing and Improving the Image Quality of StyleGAN (2020)
Key papers
- AlexNet (2012): This convolutional neural network architecture was one of the first large CNNs to significantly outperform traditional computer vision methods on the ImageNet dataset. It demonstrated the power of deep learning for computer vision.
- Rich feature hierarchies for accurate object detection and semantic segmentation (2014): Introduced the R-CNN algorithm for object detection, which classifies region proposals using CNN features. R-CNN significantly improved detection accuracy and kicked off rapid progress in object detection research.
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2015): Replaced the external region proposal step of R-CNN with a Region Proposal Network that shares convolutional features with the detector, making detection significantly faster than R-CNN.
- Mask R-CNN (2017): Extended Faster R-CNN with a branch that predicts an object mask in parallel with the bounding box, so a single model both detects and segments objects.
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (2019): Introduced a compound scaling method that scales network depth, width, and input resolution together. At the time of publication, EfficientNet achieved state-of-the-art accuracy on ImageNet with significantly fewer parameters than previous models.
- Densely Connected Convolutional Networks (2017): Introduced dense connectivity between layers to strengthen feature propagation and reduce vanishing gradients.
- You Only Look Once: Unified, Real-Time Object Detection (2016): Framed object detection as a regression problem to apply a single neural network to the full image, enabling extremely fast detection.
- Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs (2014): Combined atrous (dilated) convolution, which produces dense feature maps from a classification network, with a fully connected CRF that sharpens object boundaries. Significantly improved segmentation accuracy.
- U-Net: Convolutional Networks for Biomedical Image Segmentation (2015): Introduced an encoder-decoder network with skip connections for biomedical image segmentation. Became widely used for semantic and instance segmentation.
- Very Deep Convolutional Networks for Large-Scale Image Recognition (2014): Demonstrated the benefit of convolutional neural network depth using smaller 3x3 filters stacked across 16-19 weight layers.
- Deep Residual Learning for Image Recognition (2015): Developed deep residual networks using skip connections to enable training 100+ layer networks. Drove rapid progress in image classification.
- Dynamic Routing Between Capsules (2017): Proposed capsule networks with dynamic routing to better model hierarchical relationships than standard CNNs.
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (2017): Presented adversarial training for unpaired image-to-image translation between domains without matched image pairs.
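
Three of the mechanisms mentioned in the list above (stacked 3x3 filters, atrous convolution, and residual skip connections) can be illustrated with a minimal plain-Python sketch. The function names and toy inputs are illustrative assumptions, not code from the papers, and the convolution is written as a cross-correlation, which is how deep learning libraries typically implement it.

```python
from typing import Callable, List, Sequence


def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field of `num_layers` stride-1 convolutions applied in sequence."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel: int, channels: int) -> int:
    """Weights (biases ignored) in one kernel x kernel conv with `channels` in and out."""
    return kernel * kernel * channels * channels


def atrous_conv1d(
    signal: Sequence[float], kernel: Sequence[float], rate: int
) -> List[float]:
    """'Valid' 1-D atrous (dilated) convolution: kernel taps sit `rate` samples apart."""
    span = (len(kernel) - 1) * rate + 1  # input samples covered by one output value
    return [
        sum(w * signal[start + i * rate] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def residual(
    x: Sequence[float], f: Callable[[Sequence[float]], Sequence[float]]
) -> List[float]:
    """Skip connection: the block outputs f(x) + x rather than f(x) alone."""
    return [fx + xi for fx, xi in zip(f(x), x)]


if __name__ == "__main__":
    # Three stacked 3x3 layers cover the same 7x7 window as one 7x7 layer...
    print(stacked_receptive_field(3))
    # ...while using fewer weights (here with 64 channels in and out).
    print(3 * conv_weights(3, 64), conv_weights(7, 64))
    # A 3-tap kernel with rate 2 spans 5 input samples without extra weights.
    print(atrous_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 0, -1], rate=2))
    # If the learned branch outputs zeros, the block reduces to the identity.
    print(residual([1.0, 2.0], lambda v: [0.0 for _ in v]))
```

The last call shows why skip connections help with very deep networks: a residual block only has to learn a correction to its input, so a branch that learns nothing still passes the signal through unchanged.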