CSC4030 introduces students to the theory and practice of machine learning, the branch of computer science that enables systems to learn from data rather than following explicitly programmed instructions. As organizations across every industry integrate ML into their products and decision-making, the ability to train, evaluate, and deploy models has become a core competency for computer science graduates.
Machine learning approach comparison
| Approach | Training Data | Goal | Common Algorithms |
|---|---|---|---|
| Supervised learning | Labeled (input-output pairs) | Predict labels for unseen inputs | Linear regression, logistic regression, decision trees, random forests, SVMs, neural networks |
| Unsupervised learning | Unlabeled (inputs only) | Discover hidden structure in data | K-means clustering, DBSCAN, hierarchical clustering, PCA, autoencoders |
| Reinforcement learning | Reward signals from environment | Learn optimal action sequences | Q-learning, deep Q-networks, policy gradient, actor-critic methods |
| Semi-supervised learning | Small labeled set plus large unlabeled set | Leverage unlabeled data to improve predictions | Self-training, label propagation, generative models, contrastive learning |
What CSC4030 covers
Supervised learning forms the backbone of the course. Students begin with linear regression for continuous prediction tasks and logistic regression for binary classification, learning how gradient descent minimizes a loss function (mean squared error for regression, cross-entropy for classification) by iteratively adjusting model parameters in the direction that reduces prediction error. Decision trees partition the feature space using information gain or Gini impurity to make predictions at leaf nodes, and random forests aggregate many decorrelated trees through bagging to reduce variance and improve generalization. Support vector machines (SVMs) find the maximum-margin hyperplane that separates classes, using the kernel trick to handle non-linearly separable data by implicitly mapping features into higher-dimensional spaces. Each algorithm embodies different assumptions about the data, and a central skill CSC4030 develops is the ability to select the right algorithm for a given dataset and problem type. Géron (2022) emphasizes that no single algorithm dominates across all tasks; the "no free lunch" theorem states that, averaged over all possible problems, no algorithm outperforms any other, so every algorithm has datasets where it performs poorly and practical ML requires understanding each algorithm's strengths and weaknesses.
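The gradient descent loop described above can be sketched for the simplest case, one-feature linear regression with mean squared error. This is a minimal illustration in plain Python rather than a course-provided implementation; the function names, learning rate, step count, and toy data are all assumptions chosen for the example.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error of the current model on every training example.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient, which is the direction that reduces loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mean_squared_error(xs, ys, w, b):
    """Average squared prediction error of the line y = w * x + b."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data generated from y = 2x + 1 (illustrative values only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_regression(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, mse={mean_squared_error(xs, ys, w, b):.6f}")
```

A learning rate that is too large makes the updates overshoot and diverge, while one that is too small needs many more steps, which is why the learning rate is usually treated as a hyperparameter to tune.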
Neural networks and deep learning occupy the second half of the course. A basic feedforward neural network consists of an input layer, one or more hidden layers of neurons (each computing a weighted sum of inputs plus a bias, passed through a nonlinear activation function like ReLU or sigmoid), and an output layer. Backpropagation computes the gradient of the loss function with respect to each weight by applying the chain rule backward through the network, and stochastic gradient descent (or variants like Adam, RMSprop) uses these gradients to update weights. Deep neural networks (networks with many hidden layers) learn hierarchical representations: early layers detect low-level features (edges, textures in image data; phonemes in audio), and deeper layers compose these into high-level concepts (faces, objects, words). Convolutional neural networks (CNNs) apply learned filters across spatial dimensions, making them the standard architecture for image classification and object detection. Recurrent neural networks (RNNs) and their variants (LSTMs, GRUs) process sequential data by maintaining hidden state across time steps, making them suitable for text, speech, and time-series analysis. CSC4030 requires students to implement models using TensorFlow, PyTorch, or Keras, working through the full pipeline from data loading and preprocessing through model definition, training, evaluation, and hyperparameter tuning (Goodfellow et al., 2016).
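As a rough sketch of the feedforward computation described above, the following plain-Python example passes one input through a hidden ReLU layer and a sigmoid output neuron. The weights and biases are made-up values chosen for illustration, and the helper names (`dense`, `forward`) are hypothetical; a course assignment would normally use TensorFlow, PyTorch, or Keras layers instead.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(weighted sum + bias) per neuron."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def forward(inputs):
    # Hidden layer: 2 inputs -> 2 ReLU neurons (weights are made-up values).
    hidden = dense(inputs, [[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu)
    # Output layer: 2 hidden values -> 1 sigmoid neuron, read as a probability.
    return dense(hidden, [[1.0, -2.0]], [0.0], sigmoid)[0]


probability = forward([2.0, 1.0])
print(probability)
```

Training would adjust the weight and bias values; here they are fixed so that only the forward computation is shown.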
Key topics in CSC4030
- Supervised learning algorithms: linear regression, logistic regression, decision trees, random forests, gradient boosting (XGBoost, LightGBM), support vector machines, k-nearest neighbors
- Unsupervised learning: K-means and K-medoids clustering, DBSCAN, hierarchical (agglomerative) clustering, principal component analysis (PCA), t-SNE for visualization
- Neural network fundamentals: perceptrons, multi-layer feedforward networks, activation functions (ReLU, sigmoid, tanh, softmax), backpropagation, gradient descent optimizers (SGD, Adam, RMSprop)
- Deep learning architectures: convolutional neural networks (CNNs) for images, recurrent neural networks (RNNs, LSTMs, GRUs) for sequences, autoencoders for dimensionality reduction
- Feature engineering: feature selection, extraction, scaling (normalization, standardization), encoding categorical variables (one-hot, label, target encoding), handling missing data
- Model evaluation: accuracy, precision, recall, F1-score, ROC curves and AUC, confusion matrices, cross-validation (k-fold, stratified), bias-variance trade-off
- Hyperparameter tuning: grid search, random search, Bayesian optimization, learning rate scheduling, early stopping, regularization (L1/L2, dropout, batch normalization)
- Reinforcement learning fundamentals: Markov decision processes, reward signals, exploration vs. exploitation, Q-learning, policy gradient methods
- Ensemble methods: bagging (random forests), boosting (AdaBoost, gradient boosting), stacking, voting classifiers
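To make the model evaluation metrics in the list above concrete, here is a small sketch that derives precision, recall, and F1 from confusion-matrix counts for a binary classifier. It uses plain Python instead of scikit-learn's metrics module to stay self-contained; the labels and predictions are invented for illustration, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    # Precision: of everything predicted positive, how much was correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everything actually positive, how much was found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented labels: 3 positives and 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(precision, recall, f1, accuracy)
```

On imbalanced data, accuracy can look high even when the rare class is mostly missed, which is why precision and recall are reported alongside it.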
Industry frameworks and tools used in CSC4030
- TensorFlow: Google's open-source framework for building and deploying ML models; includes tf.keras for high-level model construction, TensorBoard for visualization, and TensorFlow Lite for mobile deployment
- PyTorch: dynamic computation graph framework originally developed at Facebook (now Meta) and favored in research; eager execution for intuitive debugging, torchvision for image tasks, and strong GPU acceleration support
- Keras: high-level API (bundled with TensorFlow as tf.keras; Keras 3 can also run on JAX and PyTorch backends) that provides a clean, modular interface for defining layers, compiling models, and running training loops with minimal boilerplate
- scikit-learn: Python library providing consistent APIs for classical ML algorithms, preprocessing utilities, model selection tools, and evaluation metrics; typically the starting point before moving to deep learning
- Jupyter Notebooks: interactive computing environment used throughout the course for exploratory data analysis, model prototyping, and presenting results with inline visualizations
Frequently asked questions
What is the difference between supervised and unsupervised learning?
Supervised learning trains a model on labeled data, where each training example includes both the input features and the correct output (label). The model learns a mapping from inputs to outputs and generalizes that mapping to predict labels for new, unseen inputs. Examples include predicting house prices from square footage and location (regression) or classifying emails as spam or not spam (classification). Unsupervised learning works with unlabeled data: the model receives only input features and must discover structure on its own. Common tasks include clustering (grouping similar customers for market segmentation), dimensionality reduction (compressing high-dimensional data for visualization or preprocessing), and anomaly detection (identifying unusual transactions that might indicate fraud). The practical difference is data availability: supervised learning requires someone to label the training data, which is expensive and time-consuming for large datasets. Unsupervised learning works with raw data but has no target to learn from, so on its own it does not predict labels. Many real-world ML systems combine both: they use unsupervised methods to discover structure and create features, then feed those features into a supervised model for prediction.
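The clustering task mentioned above can be sketched with a one-dimensional version of k-means (Lloyd's algorithm). This is an illustrative plain-Python sketch under simplifying assumptions: the data are a hypothetical list of customer spend values, the starting centroids are supplied by hand, and the function name `kmeans_1d` is invented for the example. Note that no labels are passed in at any point.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)), key=lambda i: abs(value - centroids[i])
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend per customer; there are no labels.
spend = [12.0, 15.0, 14.0, 90.0, 95.0, 100.0]
centroids, clusters = kmeans_1d(spend, centroids=[0.0, 50.0])
print(centroids, clusters)
```

The resulting cluster index could then serve as an extra feature for a supervised model, which is the combination of the two approaches the answer describes.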
What is overfitting and how can it be prevented?
Overfitting occurs when a model fits the training data too closely, capturing noise and random fluctuations rather than the underlying pattern. An overfit model achieves high accuracy on training data but performs poorly on new, unseen data because it has memorized specific training examples rather than learning generalizable rules. Think of a student who memorizes every practice exam answer but cannot solve a slightly different problem on the real exam. Preventing overfitting involves several strategies: using more training data (the most effective remedy when feasible), applying regularization (L1/Lasso adds a penalty proportional to the absolute value of weights, pushing some to zero; L2/Ridge adds a penalty proportional to the square of weights, shrinking all weights toward zero), using dropout in neural networks (randomly deactivating neurons during training to prevent co-adaptation), and applying early stopping (monitoring validation loss during training and stopping when it begins to increase even as training loss continues to decrease). Cross-validation does not prevent overfitting by itself, but it gives a more reliable estimate of model performance and so helps detect it. The bias-variance trade-off is the underlying framework: overfitting means high variance (the model is too sensitive to the training data), underfitting means high bias (the model is too simple to capture the pattern), and the goal is to find the model complexity that minimizes total error.
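Early stopping, as described above, comes down to a small amount of bookkeeping over the per-epoch validation losses. The sketch below assumes a `patience` parameter (the number of non-improving epochs to tolerate), which is a common convention but not something the course material specifies; the loss values and the function name are invented for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the validation loss has gone `patience` epochs
    without improving; best_epoch marks the weights worth restoring.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember where it happened.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return epoch, best_epoch
    # Ran out of epochs without triggering the stop condition.
    return len(val_losses) - 1, best_epoch


# Invented validation losses: improving at first, then rising (overfitting).
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2)
print(stop_epoch, best_epoch)
```

A larger patience tolerates noisy validation curves at the cost of a few extra epochs of training.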
How does backpropagation work?
Backpropagation is the algorithm that computes how much each weight in a neural network contributed to the prediction error, so that gradient descent can adjust the weights to reduce that error. It works in two phases. In the forward pass, input data flows through the network layer by layer: each neuron computes a weighted sum of its inputs, adds a bias, applies an activation function, and passes the result to the next layer. At the output layer, the loss function compares the network's prediction to the true label and produces a single error value. In the backward pass, backpropagation applies the chain rule of calculus to compute the gradient (partial derivative) of the loss with respect to each weight in the network, starting from the output layer and working backward to the input layer. Each weight's gradient tells you how much a small change in that weight would change the loss. The optimizer then updates the weights: plain SGD subtracts the gradient scaled by the learning rate, while variants such as Adam adapt the step size per weight, in both cases nudging the network toward lower loss. This process repeats over many iterations, and each full pass over the training data is called an epoch. The key insight is that the chain rule makes gradient computation tractable even in deep networks with millions of parameters, because each layer's gradient is computed from the gradient already obtained for the layer after it, so intermediate results are reused rather than recomputed.
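The two phases can be traced by hand on a deliberately tiny network: one input, one sigmoid hidden neuron, and one linear output with squared-error loss. The sketch below is an illustration under those assumptions, not a general implementation; the parameter values are arbitrary, and the finite-difference check is included only to confirm that the chain-rule gradients are correct.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x, y):
    """Forward pass: returns hidden activation, prediction, and loss."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)   # hidden activation
    y_hat = w2 * h + b2        # linear output
    loss = (y_hat - y) ** 2    # squared error on one example
    return h, y_hat, loss


def backward(params, x, y):
    """Backward pass: gradients of the loss for (w1, b1, w2, b2)."""
    w1, b1, w2, b2 = params
    h, y_hat, _ = forward(params, x, y)
    d_yhat = 2.0 * (y_hat - y)   # dL/dy_hat
    d_w2 = d_yhat * h            # output-layer weight
    d_b2 = d_yhat                # output-layer bias
    d_h = d_yhat * w2            # gradient handed back to the hidden layer
    d_z = d_h * h * (1.0 - h)    # sigmoid'(z) = h * (1 - h)
    d_w1 = d_z * x               # hidden-layer weight
    d_b1 = d_z                   # hidden-layer bias
    return [d_w1, d_b1, d_w2, d_b2]


def numerical_gradient(params, x, y, eps=1e-6):
    """Central finite differences, used only to check backward()."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((forward(up, x, y)[2] - forward(down, x, y)[2]) / (2 * eps))
    return grads


params = [0.5, -0.2, 1.5, 0.1]  # arbitrary starting values
analytic = backward(params, x=1.0, y=2.0)
numeric = numerical_gradient(params, x=1.0, y=2.0)

# One plain SGD step: subtract the gradient scaled by the learning rate.
learning_rate = 0.1
updated = [p - learning_rate * g for p, g in zip(params, analytic)]
print(forward(params, 1.0, 2.0)[2], forward(updated, 1.0, 2.0)[2])
```

Notice that `d_h` reuses `d_yhat`, and `d_w1` reuses `d_z`: each layer's gradient builds on the one computed just before it, which is what keeps the backward pass cheap.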
Why use k-fold cross-validation instead of a simple train/test split?
A simple train/test split divides data into two parts: train the model on one, evaluate on the other. The problem is that a single split can be misleading. If the test set happens to contain easy examples, the model looks better than it is; if it contains hard examples, the model looks worse. The performance estimate has high variance depending on which examples end up in which set. K-fold cross-validation addresses this by dividing the data into k roughly equal parts (folds), training the model k times (each time using k-1 folds for training and 1 fold for testing), and averaging the k performance scores. This produces a more reliable estimate because every example is used for testing exactly once and for training in the other k-1 iterations. Common choices are k=5 or k=10. Stratified k-fold ensures that each fold preserves the class distribution of the full dataset, which is important for imbalanced datasets where one class is rare. Leave-one-out cross-validation (k equals the number of samples) provides the lowest-bias estimate but is computationally expensive for large datasets. In CSC4030, cross-validation is the standard method for model comparison: when deciding between a random forest and a gradient boosting model, you compare their cross-validated scores rather than single train/test scores to make a more trustworthy decision.
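The fold bookkeeping behind k-fold cross-validation can be sketched in a few lines. This plain-Python version is an assumption-laden illustration: it splits indices into contiguous folds without shuffling or stratification, and `score_fold` is a hypothetical callback standing in for "train on these indices, evaluate on those". In practice a library splitter would usually be used instead.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def cross_validate(n_samples, k, score_fold):
    """Average a user-supplied score over the k folds."""
    scores = [
        score_fold(train, test) for train, test in k_fold_indices(n_samples, k)
    ]
    return sum(scores) / len(scores), scores


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Because the folds here are contiguous, data that is sorted by class or by time should be shuffled (or split with a stratified or time-aware scheme) before using this kind of split.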