Computer Vision Papers

Papers with Code computer vision papers section: Provides access to research papers along with their corresponding code.

Semantic Segmentation

U-Net: Convolutional Networks for Biomedical Image Segmentation (2015)
Deep Residual Learning for Image Recognition (2015)
Mask R-CNN (2017)
MobileNetV2: Inverted Residuals and Linear Bottlenecks (2018)
MMDetection: Open MMLab Detection Toolbox and Benchmark (2019)
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (2020)
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (2016)
FCOS: Fully Convolutional One-Stage Object Detection (2019)
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (2018)

Image Classification

Contrastive Learning

Image Generation

Key papers

AlexNet (2012): One of the first large CNNs to significantly outperform traditional computer vision methods on the ImageNet dataset. It demonstrated the power of deep learning for computer vision.
Rich feature hierarchies for accurate object detection and semantic segmentation (2014): Introduced the R-CNN algorithm for object detection, which uses region proposals and CNN features to detect objects in images. R-CNN significantly improved detection accuracy and kicked off rapid progress in object detection research.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2015): Improved on R-CNN by introducing a Region Proposal Network that shares convolutional features with the detection network, making object detection significantly faster than with R-CNN.
Mask R-CNN (2017): Extended Faster R-CNN with a branch that predicts an object mask in parallel with the bounding box, so a single model both detects and segments objects.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (2019): Proposed a compound scaling method that balances network depth, width, and input resolution. At the time of publication, EfficientNet achieved state-of-the-art accuracy on ImageNet with significantly fewer parameters.
Densely Connected Convolutional Networks (2017): Introduced dense connectivity, in which each layer receives the feature maps of all preceding layers, to strengthen feature propagation and alleviate vanishing gradients.
You Only Look Once: Unified, Real-Time Object Detection (2016): Framed object detection as a regression problem, applying a single neural network to the full image to predict bounding boxes and class probabilities, which enables extremely fast detection.
Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs (2014): Combined atrous (dilated) convolution, which computes dense feature maps efficiently, with a fully connected CRF that sharpens object boundaries. Significantly improved segmentation accuracy.
U-Net: Convolutional Networks for Biomedical Image Segmentation (2015): Introduced an encoder-decoder network with skip connections for biomedical image segmentation. Became widely used for semantic and instance segmentation.
Very Deep Convolutional Networks for Large-Scale Image Recognition (2014): Demonstrated the benefit of convolutional neural network depth by stacking small 3x3 filters across 16-19 weight layers.
Deep Residual Learning for Image Recognition (2015): Developed deep residual networks, whose skip connections make it possible to train networks with more than 100 layers. Drove rapid progress in image classification.
Dynamic Routing Between Capsules (2017): Proposed capsule networks with dynamic routing to model hierarchical relationships better than standard CNNs.
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (2017): Combined adversarial training with a cycle-consistency loss to translate images between domains without matched image pairs.
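
As a rough illustration of the atrous (dilated) convolution mentioned in the DeepLab entry above, the following minimal pure-Python sketch applies a 1-D filter with gaps between its taps. The function name atrous_conv1d, the toy signal, and the kernel are illustrative assumptions, not code from any of the listed papers; real models use 2-D convolutions from a deep learning framework.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """Valid (unpadded) 1-D cross-correlation with dilation `rate`.

    With rate=1 this is an ordinary convolution layer's sliding window.
    With rate>1 the kernel taps are spaced `rate` samples apart, so the
    same number of weights covers a wider stretch of the input.
    """
    # Number of input samples one output value "sees" (its receptive field).
    span = (len(kernel) - 1) * rate + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + k * rate] for k, w in enumerate(kernel)))
    return out


# Hypothetical toy input: a ramp, filtered with a simple difference kernel.
signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

dense = atrous_conv1d(signal, kernel, rate=1)   # each output spans 3 samples
atrous = atrous_conv1d(signal, kernel, rate=2)  # each output spans 5 samples
print(dense)
print(atrous)
```

Both calls use the same three weights; only the spacing between taps changes, which is why dilation widens the receptive field without adding parameters.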
