CSC4040: Computer Vision

CSC4040 teaches students how to build systems that extract meaningful information from visual data. Computer vision sits at the intersection of image processing, machine learning, and geometry, powering applications from autonomous vehicles and medical imaging to facial recognition and augmented reality. This course develops both the mathematical foundations and the practical coding skills required to implement vision algorithms using industry-standard tools.

Edge detection method comparison

Method	Approach	Strengths	Limitations
Sobel operator	First-derivative approximation using 3x3 convolution kernels	Fast computation; good for detecting horizontal and vertical edges separately	Sensitive to noise; produces thick edges rather than single-pixel edge lines
Canny edge detector	Multi-stage: Gaussian blur, gradient magnitude, non-maximum suppression, hysteresis thresholding	Produces thin, connected edges; adjustable sensitivity via two thresholds; widely used baseline	Requires parameter tuning (sigma, low/high thresholds); slower than single-pass methods
Laplacian of Gaussian (LoG)	Second-derivative operator after Gaussian smoothing; detects zero crossings	Detects edges at multiple scales; isotropic (direction-independent)	Sensitive to noise without adequate smoothing; computationally heavier; produces closed contours that may include false edges
Structured Edge Detection (SED)	Random forest classifier trained on image patches and ground-truth boundaries	Learns what constitutes a meaningful boundary from labeled data; outperforms hand-crafted filters on natural images	Requires training data; model-dependent; slower inference than simple gradient operators
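
The difference between the first-derivative and second-derivative rows of the table can be seen on a one-dimensional intensity profile. The sketch below is illustrative only: the profile values are made up, and the 3-tap kernels are the 1D derivative parts of the Sobel and Laplacian operators rather than the full 2D filters.

```python
def correlate_1d(signal, kernel):
    """Slide a small kernel over the signal (valid positions only)."""
    half = len(kernel) // 2
    return [
        sum(k * signal[i + j - half] for j, k in enumerate(kernel))
        for i in range(half, len(signal) - half)
    ]


# Hypothetical blurred step edge: dark on the left, bright on the right.
profile = [0, 0, 0, 2, 5, 8, 10, 10, 10]

# First derivative (central difference): responds over several samples.
first = correlate_1d(profile, [-1, 0, 1])

# Second derivative (1D Laplacian): changes sign at the edge center.
second = correlate_1d(profile, [1, -2, 1])

print(first)   # [0, 2, 5, 6, 5, 2, 0]
print(second)  # [0, 2, 1, 0, -1, -2, 0]
```

The first-derivative response is nonzero across five samples, which is why raw gradient output looks thick, while the second derivative crosses zero at the same position where the first derivative peaks.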

What CSC4040 covers

The course begins with image processing fundamentals: how digital images are represented as matrices of pixel intensities (grayscale) or multi-channel arrays (RGB, HSV), and how mathematical operations on these matrices produce useful transformations. Image convolution is the core operation: sliding a small kernel (filter) across the image and computing the weighted sum at each position. Gaussian blur kernels smooth images and reduce noise, sharpening kernels enhance edges, and gradient kernels (Sobel, Prewitt) detect intensity transitions that correspond to boundaries between objects. Histogram analysis reveals the distribution of pixel intensities in an image, and histogram equalization redistributes those intensities to improve contrast, which is particularly valuable for medical imaging and surveillance where lighting conditions vary. Morphological operations (erosion, dilation, opening, closing) use structuring elements to modify shapes in binary images, enabling noise removal, gap filling, and object separation. Szeliski (2022) emphasizes that mastering these low-level operations is essential because every high-level vision task (detection, recognition, segmentation) builds on them. Students implement these operations using OpenCV, the most widely used computer vision library, which provides optimized C++ implementations with Python bindings that make it accessible for rapid prototyping while remaining fast enough for real-time applications.
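
As one illustration of the operations above, histogram equalization can be sketched in plain Python using the common cumulative-distribution remapping. This is a minimal teaching version that assumes a non-empty 8-bit grayscale image stored as a list of rows; the function name and the sample image are hypothetical, and OpenCV's own implementation may differ in rounding details.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a grayscale image given as a list of rows of ints."""
    pixels = [p for row in image for p in row]

    # Count how many pixels fall on each intensity level.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution: number of pixels at or below each level.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if cdf_min == total:
        return [row[:] for row in image]  # constant image: nothing to spread

    # Lookup table that stretches the CDF to cover the full output range.
    lut = [
        max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
        for c in cdf
    ]
    return [[lut[p] for p in row] for row in image]


# A low-contrast 3x3 image whose values sit in the narrow band 50..54.
image = [[50, 50, 51], [51, 52, 52], [53, 53, 54]]
equalized = equalize_histogram(image)
print(equalized)  # [[0, 0, 73], [73, 146, 146], [219, 219, 255]]
```

The five input levels end up spread across the full 0 to 255 range while keeping their original ordering, which is the contrast improvement the paragraph describes.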

Feature detection, matching, and object recognition form the second major pillar of CSC4040. Features are distinctive, repeatable patterns in an image (corners, blobs, edges) that can be detected reliably across different viewpoints, scales, and lighting conditions. The Harris corner detector identifies points where image intensity changes significantly in more than one direction, making them stable reference points for matching across images. SIFT (Scale-Invariant Feature Transform) and SURF (Speeded-Up Robust Features) detect keypoints that are robust to scale changes and rotation, compute descriptor vectors that capture the local appearance around each keypoint, and match descriptors across images using distance metrics. ORB (Oriented FAST and Rotated BRIEF) provides a faster, patent-free alternative suitable for real-time applications. Once features are matched between images, geometric verification using RANSAC (Random Sample Consensus) eliminates outlier matches and estimates the transformation (homography or fundamental matrix) relating the two views. These techniques enable image stitching (panoramas), visual odometry (estimating camera motion), 3D reconstruction, and augmented reality overlays.

Image classification and segmentation using convolutional neural networks (CNNs) represent the modern approach to object recognition: rather than hand-crafting feature detectors, CNNs learn hierarchical features directly from labeled training data. CSC4040 covers classical architectures (LeNet, AlexNet, VGGNet, ResNet) and practical techniques including transfer learning, where a model pre-trained on a large dataset (ImageNet) is fine-tuned for a specific task with limited training data (Gonzalez & Woods, 2018).
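
The RANSAC verification step mentioned above can be sketched on a deliberately simple model. The example below assumes the two views differ only by a 2D translation, so a single match is enough to form a hypothesis; a homography would need four matches and a linear solve. The function name, thresholds, and match coordinates are all hypothetical.

```python
import random


def ransac_translation(matches, threshold=2.0, iterations=100, seed=0):
    """Estimate a 2D translation (dx, dy) from point matches with outliers.

    matches: non-empty list of ((x1, y1), (x2, y2)) pairs.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: one match fully determines a translation.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1

        # Count matches that agree with this hypothesis.
        inliers = [
            (a, b) for a, b in matches
            if abs(b[0] - a[0] - dx) <= threshold
            and abs(b[1] - a[1] - dy) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers

    # Refit on all inliers of the best hypothesis (mean displacement).
    n = len(best_inliers)
    dx = sum(b[0] - a[0] for a, b in best_inliers) / n
    dy = sum(b[1] - a[1] for a, b in best_inliers) / n
    return (dx, dy), best_inliers


# Six consistent matches shifted by (10, -5) plus three bad matches.
matches = [
    ((0, 0), (10, -5)), ((5, 3), (15, -2)), ((8, 1), (18, -4)),
    ((2, 9), (12, 4)), ((7, 7), (17, 2)), ((4, 6), (14, 1)),
    ((1, 1), (40, 40)), ((3, 2), (-20, 7)), ((6, 5), (6, 60)),
]
model, inliers = ransac_translation(matches)
print(model, len(inliers))  # (10.0, -5.0) 6
```

The three bad matches each support only themselves, so the hypothesis with six supporters wins and the outliers never influence the final estimate.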

Key topics in CSC4040

Image representation: pixel grids, color spaces (RGB, HSV, grayscale, LAB), bit depth, resolution, and the relationship between spatial and intensity resolution
Image filtering and convolution: Gaussian blur, sharpening, median filtering for salt-and-pepper noise, bilateral filtering for edge-preserving smoothing
Edge detection: Sobel and Prewitt gradient operators, Canny edge detector (non-maximum suppression, hysteresis thresholding), Laplacian of Gaussian
Histogram analysis: intensity histograms, histogram equalization, adaptive histogram equalization (CLAHE), color histogram comparison for image retrieval
Morphological operations: erosion, dilation, opening, closing, morphological gradient, top-hat and black-hat transforms, structuring element selection
Feature detection and description: Harris corners, SIFT, SURF, ORB, FAST keypoints, BRIEF descriptors, feature matching with brute-force and FLANN matchers
Geometric transformations: affine and projective (homography) transformations, RANSAC for robust estimation, image stitching, camera calibration
Object recognition: template matching, sliding window detection, HOG (Histogram of Oriented Gradients) + SVM, CNN-based classification and detection (YOLO, SSD, Faster R-CNN)
Image segmentation: thresholding (fixed global, Otsu's automatic global threshold, adaptive local thresholding), watershed algorithm, GrabCut, semantic segmentation with fully convolutional networks (FCN), instance segmentation
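
To make the thresholding entry in the list above concrete, here is a minimal sketch of Otsu's method, which picks a single global threshold by maximizing the between-class variance of the two resulting pixel groups. It assumes 8-bit intensities in a flat list; the function name and sample values are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form the low class, pixels > t the high class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the low class so far
    sum0 = 0  # intensity sum of the low class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # low class still empty
        w1 = total - w0
        if w1 == 0:
            break     # high class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


# Hypothetical bimodal data: four dark pixels and four bright pixels.
pixels = [10, 12, 11, 13, 200, 205, 198, 210]
t = otsu_threshold(pixels)
binary = [p > t for p in pixels]
print(t)  # 13
```

Every threshold from 13 to 197 separates the two groups equally well here, and this sketch reports the first one it finds.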

OpenCV functions every CSC4040 student should know

cv2.imread() / cv2.imshow(): load and display images; cv2.cvtColor() for color space conversion (BGR to grayscale, HSV, LAB)
cv2.GaussianBlur() / cv2.medianBlur() / cv2.bilateralFilter(): noise reduction with different trade-offs between smoothing and edge preservation
cv2.Canny(): the standard edge detection pipeline; cv2.Sobel() and cv2.Laplacian() for gradient computation
cv2.findContours() / cv2.drawContours(): extract and visualize object boundaries from binary edge maps or thresholded images
cv2.ORB_create() / orb.detectAndCompute() / cv2.BFMatcher(): create an ORB detector, detect keypoints and compute descriptors, and match features across image pairs for recognition and stitching
cv2.warpPerspective() / cv2.findHomography(): apply geometric transformations for image alignment, panorama creation, and perspective correction

Frequently asked questions

What is image convolution and why is it fundamental to computer vision?

Image convolution is the mathematical operation of sliding a small matrix (called a kernel or filter) across an image and computing the weighted sum of pixel values at each position. The kernel determines what the operation does: a Gaussian kernel smooths the image by averaging neighboring pixels (weighted by distance from center), a Sobel kernel detects edges by approximating the gradient (rate of intensity change), and a sharpening kernel enhances edges by boosting the difference between each pixel and its blurred neighborhood. Convolution is fundamental because a large share of image processing operations can be expressed as a convolution or a sequence of convolutions: blurring, gradient-based edge detection, embossing, and many feature extraction steps all apply different kernels through the same operation. Nonlinear operations such as median filtering and morphology are the main exceptions. This is also why convolutional neural networks (CNNs) are so effective for vision tasks: instead of relying on hand-designed kernels, CNNs learn their kernel weights directly from training data. The first layers tend to learn simple edge detectors, later layers learn combinations of edges (corners, textures), and deeper layers learn complex patterns (object parts, faces). In CSC4040, students implement convolution from scratch to understand the mathematics, then use OpenCV's optimized functions for practical applications.
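
A from-scratch version of the sliding weighted sum might look like the sketch below. It is a minimal illustration with made-up image values: it handles only "valid" positions (no border padding) and does not flip the kernel, so strictly speaking it computes cross-correlation, which is also what cv2.filter2D does and which matches true convolution whenever the kernel is symmetric.

```python
def filter2d(image, kernel):
    """Weighted sum of each kernel-sized window (valid positions only).

    image and kernel are lists of rows. The kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                kernel[i][j] * image[y + i][x + j]
                for i in range(kh)
                for j in range(kw)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# Hypothetical 4x4 image with a vertical step edge in the middle.
image = [[0, 0, 10, 10] for _ in range(4)]

box_blur = [[1 / 9] * 3 for _ in range(3)]       # averages a 3x3 window
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient

blurred = filter2d(image, box_blur)  # values near 3.33 and 6.67
edges = filter2d(image, sobel_x)
print(edges)  # [[40, 40], [40, 40]]
```

The same function produces smoothing or edge detection depending only on the kernel passed in, which is the point the answer above makes.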

How does the Canny edge detector work and what makes it better than simple gradient methods?

The Canny edge detector uses a multi-stage pipeline that produces thin, well-localized, connected edge lines. Stage one applies Gaussian smoothing to reduce noise (noise produces false gradients that look like edges). Stage two computes the gradient magnitude and direction at every pixel using Sobel operators in the x and y directions. Stage three performs non-maximum suppression: at each pixel, it checks whether the gradient magnitude is the local maximum along the gradient direction, and suppresses (sets to zero) any pixel that is not a local maximum. This thins edges to roughly one pixel wide, unlike raw Sobel output, which produces thick, blurry edges. Stage four applies hysteresis thresholding with two thresholds: pixels above the high threshold are definitely edges, pixels below the low threshold are definitely not edges, and pixels between the two thresholds are edges only if they connect to a definite edge pixel. This connectivity requirement eliminates isolated noise pixels that fall between the thresholds while preserving weak but genuine edge segments that connect to strong edges. The result is a cleaner, more connected edge map than single-stage gradient methods typically produce. In OpenCV, cv2.Canny(image, low_threshold, high_threshold) performs the gradient, non-maximum suppression, and hysteresis stages in one call; it does not blur the input itself, so the smoothing stage is usually applied beforehand with cv2.GaussianBlur().
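
The hysteresis stage is the least intuitive part of the pipeline, so a small sketch may help. It assumes a gradient-magnitude grid that has already been through non-maximum suppression, uses 8-connectivity, and treats values equal to a threshold as passing it; the function name and the sample magnitudes are hypothetical.

```python
from collections import deque


def hysteresis(magnitude, low, high):
    """Keep strong pixels plus weak pixels 8-connected to a strong one."""
    h, w = len(magnitude), len(magnitude[0])
    edges = [[False] * w for _ in range(h)]
    queue = deque()

    # Seed the search with definite edges (magnitude >= high).
    for y in range(h):
        for x in range(w):
            if magnitude[y][x] >= high:
                edges[y][x] = True
                queue.append((y, x))

    # Grow outward through weak pixels (magnitude >= low).
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and not edges[ny][nx]
                        and magnitude[ny][nx] >= low):
                    edges[ny][nx] = True
                    queue.append((ny, nx))
    return edges


# One strong pixel (90), two weak pixels attached to it, one isolated weak pixel.
magnitude = [
    [0, 0, 0, 0, 0, 0],
    [90, 60, 55, 0, 0, 60],
    [0, 0, 0, 0, 0, 0],
]
result = hysteresis(magnitude, low=50, high=80)
print(result[1])  # [True, True, True, False, False, False]
```

The weak pixels touching the strong one survive, while the isolated weak pixel at the right edge is discarded even though its magnitude is the same as an accepted neighbor.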

What is transfer learning and why is it important for computer vision projects?

Transfer learning means taking a model that was trained on one task (usually a large, general dataset like ImageNet with roughly 1.2 million training images across 1,000 categories) and adapting it to a different, often smaller and more specific task. For CNNs, this works because the features learned by early layers (edges, textures, color gradients) are general-purpose visual features useful for most image tasks, while later layers learn task-specific features (e.g., dog breed characteristics for a dog classification model). In practice, transfer learning involves loading a pre-trained model (ResNet-50, VGG-16, EfficientNet), removing or replacing the final classification layer(s) to match the new task's number of classes, and fine-tuning the model on the new dataset. Fine-tuning can freeze early layers (keeping their general features fixed) and train only the later layers, which works well when the new dataset is small; or it can train all layers with a small learning rate, which works better when the new dataset is large enough to support full model updates. Transfer learning matters because training a deep CNN from scratch typically requires a very large labeled dataset (often millions of images) and significant compute time. With transfer learning, you can often achieve strong performance on a custom task with hundreds or thousands of images instead. CSC4040 projects frequently use transfer learning to build classifiers for domain-specific image datasets.

What is the difference between image classification, object detection, and image segmentation?

These three tasks represent increasing levels of visual understanding. Image classification assigns a single label to the entire image: "this image contains a cat." It answers the question "what is in this image?" but says nothing about where. Standard architectures include VGGNet, ResNet, and EfficientNet. Object detection finds and localizes multiple objects within an image: "there is a cat at coordinates (x1,y1,x2,y2) and a dog at (x3,y3,x4,y4)." It draws bounding boxes around each detected object and classifies them. Standard architectures include YOLO (You Only Look Once), SSD (Single Shot Detector), and Faster R-CNN. Image segmentation assigns a class label to every pixel in the image rather than just drawing boxes. Semantic segmentation labels each pixel with a class (all "cat" pixels, all "background" pixels) but does not distinguish between different instances of the same class. Instance segmentation (Mask R-CNN) goes further by distinguishing individual objects: "this group of pixels is cat #1, this other group is cat #2." The computational cost increases from classification (process image once, output one label) through detection (process image, output multiple boxes and labels) to segmentation (process image, output a label map the same size as the input). CSC4040 covers all three tasks, starting with classical methods (sliding window + HOG for detection, watershed for segmentation) before introducing CNN-based approaches.
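
The three output types can be compared on a toy example. The sketch below starts from a made-up instance-segmentation result and reduces it step by step to the coarser outputs, which shows how each task carries less spatial detail than the one before. The label values, the class name, and the helper function are all hypothetical.

```python
# Hypothetical instance-segmentation output, one integer per pixel:
# 0 = background, 1 = cat #1, 2 = cat #2.
instance_map = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]


def boxes_from_instances(instance_map):
    """Reduce a per-pixel instance map to one (x1, y1, x2, y2) box per object."""
    boxes = {}
    for y, row in enumerate(instance_map):
        for x, obj in enumerate(row):
            if obj == 0:
                continue
            x1, y1, x2, y2 = boxes.get(obj, (x, y, x, y))
            boxes[obj] = (min(x1, x), min(y1, y), max(x2, x), max(y2, y))
    return boxes


# Semantic segmentation: per-pixel class, instances merged (1 = "cat").
semantic_map = [[1 if obj else 0 for obj in row] for row in instance_map]

# Detection: one bounding box per object.
boxes = boxes_from_instances(instance_map)

# Classification: a single label for the whole image.
image_label = "cat" if boxes else "background"

print(boxes)        # {1: (1, 0, 2, 1), 2: (4, 1, 5, 2)}
print(image_label)  # cat
```

Going in the other direction is not possible without a model: a label cannot be turned back into boxes, and boxes cannot be turned back into exact pixel masks.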