Capella University — Computer Science

CSC4030: Introduction to Machine Learning

A complete guide to Capella's CSC4030, covering supervised and unsupervised learning, neural networks, deep learning architectures, industry frameworks like TensorFlow and PyTorch, feature engineering, model evaluation, and hyperparameter tuning.

Undergraduate Level | Neural Networks | TensorFlow & PyTorch | APA 7th Edition

CSC4030 introduces students to the theory and practice of machine learning, the branch of computer science that enables systems to learn from data rather than following explicitly programmed instructions. As organizations across every industry integrate ML into their products and decision-making, the ability to train, evaluate, and deploy models has become a core competency for computer science graduates.

Machine learning approach comparison

Approach | Training Data | Goal | Common Algorithms
Supervised learning | Labeled (input-output pairs) | Predict labels for unseen inputs | Linear regression, logistic regression, decision trees, random forests, SVMs, neural networks
Unsupervised learning | Unlabeled (inputs only) | Discover hidden structure in data | K-means clustering, DBSCAN, hierarchical clustering, PCA, autoencoders
Reinforcement learning | Reward signals from environment | Learn optimal action sequences | Q-learning, deep Q-networks, policy gradient, actor-critic methods
Semi-supervised learning | Small labeled + large unlabeled set | Leverage unlabeled data to improve predictions | Self-training, label propagation, generative models, contrastive learning

What CSC4030 covers

Supervised learning forms the backbone of the course. Students begin with linear regression for continuous prediction tasks and logistic regression for binary classification, learning how gradient descent minimizes a loss function (mean squared error for regression, cross-entropy for classification) by iteratively adjusting model parameters in the direction that reduces the loss. Decision trees partition the feature space using information gain or Gini impurity and make predictions at leaf nodes, while random forests aggregate many decorrelated trees, built through bagging and random feature selection, to reduce variance and improve generalization. Support vector machines (SVMs) find the maximum-margin hyperplane that separates classes, using the kernel trick to handle data that is not linearly separable by implicitly mapping features into higher-dimensional spaces. Each algorithm embodies different assumptions about the data, and a central skill CSC4030 develops is the ability to select the right algorithm for a given dataset and problem type. Géron (2022) emphasizes that no single algorithm dominates across all tasks; the "no free lunch" theorem implies that, without assumptions about the data, no algorithm can be expected to outperform the others on every problem, so practical ML requires understanding each algorithm's strengths and weaknesses.
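To make the gradient descent idea concrete, here is a minimal sketch in plain Python that fits a one-feature linear regression by minimizing mean squared error. The function names, the learning rate, the step count, and the toy data are all illustrative assumptions; in coursework a library such as scikit-learn would normally handle this.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error ** 2).
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient to reduce the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mean_squared_error(xs, ys, w, b):
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_regression(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, mse={mean_squared_error(xs, ys, w, b):.6f}")
```

On this noise-free data the fitted parameters approach the generating values (slope 2, intercept 1). A learning rate that is too large would make the updates diverge instead, which is one reason the rate is treated as a hyperparameter to tune.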

Neural networks and deep learning occupy the second half of the course. A basic feedforward neural network consists of an input layer, one or more hidden layers of neurons (each computing a weighted sum of inputs plus a bias, passed through a nonlinear activation function like ReLU or sigmoid), and an output layer. Backpropagation computes the gradient of the loss function with respect to each weight by applying the chain rule backward through the network, and stochastic gradient descent (or a variant such as Adam or RMSprop) uses these gradients to update the weights. Deep neural networks (networks with many hidden layers) learn hierarchical representations: early layers detect low-level features (edges and textures in image data; phonemes in audio), and deeper layers compose these into high-level concepts (faces, objects, words). Convolutional neural networks (CNNs) apply learned filters across spatial dimensions, making them a standard architecture for image classification and object detection. Recurrent neural networks (RNNs) and their variants (LSTMs, GRUs) process sequential data by maintaining hidden state across time steps, making them suitable for text, speech, and time-series analysis. CSC4030 requires students to implement models using TensorFlow, PyTorch, or Keras, working through the full pipeline from data loading and preprocessing through model definition, training, evaluation, and hyperparameter tuning (Goodfellow et al., 2016).
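The layer computation described above (weighted sum, plus bias, through an activation) can be sketched in a few lines of plain Python. The weights, biases, and the 2-2-1 network shape below are hand-picked assumptions for illustration only; a framework such as PyTorch or Keras would learn them from data.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hypothetical parameters: 2 inputs -> 2 ReLU hidden neurons -> 1 sigmoid output.
hidden_weights = [[1.0, -1.0], [0.5, 0.5]]
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, 2.0]]
output_biases = [0.0]


def forward(x):
    hidden = dense(x, hidden_weights, hidden_biases, relu)
    return dense(hidden, output_weights, output_biases, sigmoid)[0]


hidden_example = dense([2.0, 1.0], hidden_weights, hidden_biases, relu)
print(hidden_example)        # [1.0, 0.5]
print(forward([2.0, 1.0]))   # sigmoid(2.0), roughly 0.88
```

Without the nonlinear activations, the two layers would collapse into a single linear map, which is why every hidden layer applies one.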

Key topics in CSC4030

Industry frameworks and tools used in CSC4030

  • TensorFlow: Google's open-source framework for building and deploying ML models; includes tf.keras for high-level model construction, TensorBoard for visualization, and TensorFlow Lite for mobile deployment
  • PyTorch: dynamic computation graph framework, originally developed by Facebook (now Meta) and favored in research; eager execution for intuitive debugging, torchvision for image tasks, and strong GPU acceleration support
  • Keras: high-level API (bundled with TensorFlow as tf.keras; Keras 3 also runs on JAX and PyTorch backends) that provides a clean, modular interface for defining layers, compiling models, and running training loops with minimal boilerplate
  • scikit-learn: Python library providing consistent APIs for classical ML algorithms, preprocessing utilities, model selection tools, and evaluation metrics; typically the starting point before moving to deep learning
  • Jupyter Notebooks: interactive computing environment used throughout the course for exploratory data analysis, model prototyping, and presenting results with inline visualizations

Frequently asked questions

What is the difference between supervised and unsupervised learning?

Supervised learning trains a model on labeled data, where each training example includes both the input features and the correct output (label). The model learns a mapping from inputs to outputs and generalizes that mapping to predict labels for new, unseen inputs. Examples include predicting house prices from square footage and location (regression) or classifying emails as spam or not spam (classification). Unsupervised learning works with unlabeled data: the model receives only input features and must discover structure on its own. Common tasks include clustering (grouping similar customers for market segmentation), dimensionality reduction (compressing high-dimensional data for visualization or preprocessing), and anomaly detection (identifying unusual transactions that might indicate fraud). The practical difference is data availability: supervised learning requires someone to label the training data, which is expensive and time-consuming for large datasets. Unsupervised learning works with raw, unlabeled data, but because there is no ground-truth label, its output cannot be trained or scored against a known correct answer. Many real-world ML systems combine both: they use unsupervised methods to discover structure and create features, then feed those features into a supervised model for prediction.
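As an illustration of the unsupervised side, the sketch below runs k-means (Lloyd's algorithm) on one-dimensional data in plain Python. The customer-spend numbers, the starting centroids, and the function name are assumptions chosen for the example; note that no labels are supplied anywhere.

```python
def k_means_1d(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data with caller-supplied starting centroids."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = sum(members) / len(members)
    return centroids, assignments


# Hypothetical unlabeled data: annual spend (in thousands) for six customers.
spend = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, assignments = k_means_1d(spend, centroids=[0.0, 5.0])
print(centroids)     # [1.5, 10.0]
print(assignments)   # [0, 0, 0, 1, 1, 1]
```

The cluster indices that come out (0 and 1) could then be used as an engineered feature for a supervised model, which is the combined workflow described above.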

What is overfitting and how do you prevent it?

Overfitting occurs when a model fits the training data too closely, capturing noise and random fluctuations rather than the underlying pattern. An overfit model achieves high accuracy on training data but performs poorly on new, unseen data because it has memorized specific training examples rather than learning generalizable rules. Think of a student who memorizes every practice exam answer but cannot solve a slightly different problem on the real exam. Preventing overfitting involves several strategies: using more training data (often the most effective remedy when feasible), applying regularization (L1/Lasso adds a penalty proportional to the absolute value of weights, pushing some to zero; L2/Ridge adds a penalty proportional to the square of weights, shrinking all weights toward zero), using dropout in neural networks (randomly deactivating neurons during training to prevent co-adaptation), and applying early stopping (monitoring validation loss during training and stopping when it begins to increase even as training loss continues to decrease). Cross-validation does not prevent overfitting by itself, but it helps detect it by giving a more reliable estimate of performance on unseen data. The bias-variance trade-off is the underlying framework: overfitting means high variance (the model is too sensitive to the training data), underfitting means high bias (the model is too simple to capture the pattern), and the goal is to find the model complexity that minimizes total error.
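Early stopping is the easiest of these strategies to show in code. The sketch below, in plain Python, decides when to stop from a list of per-epoch validation losses; the `patience` parameter, the function name, and the loss values are illustrative assumptions rather than any particular framework's API.

```python
def early_stopping_epoch(validation_losses, patience=2):
    """Return (stop_epoch, best_epoch) for per-epoch validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs; the weights saved at `best_epoch` would be restored.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Never triggered: training ran for every epoch supplied.
    return len(validation_losses) - 1, best_epoch


# Hypothetical validation losses: improving, then rising as the model overfits.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2)
print(stop_epoch, best_epoch)   # 5 3
```

A larger patience tolerates noisy validation curves at the cost of a few wasted epochs; deep learning frameworks typically offer an equivalent callback, so this logic rarely needs to be written by hand.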

How does backpropagation work in neural networks?

Backpropagation is the algorithm that computes how much each weight in a neural network contributed to the prediction error, so that gradient descent can adjust the weights to reduce that error. It works in two phases. In the forward pass, input data flows through the network layer by layer: each neuron computes a weighted sum of its inputs, adds a bias, applies an activation function, and passes the result to the next layer. At the output layer, the loss function compares the network's prediction to the true label and produces a single error value. In the backward pass, backpropagation applies the chain rule of calculus to compute the gradient (partial derivative) of the loss with respect to each weight in the network, starting from the output layer and working backward to the input layer. Each weight's gradient tells you how much a small change in that weight would change the loss. The optimizer (SGD, Adam, or another variant) then updates each weight by subtracting its gradient scaled by the learning rate, nudging the network toward lower loss. This process repeats for many iterations, usually spread over multiple epochs (an epoch is one full pass over the training data). The key insight is that the chain rule makes gradient computation tractable even in deep networks with millions of parameters, because each layer's gradient is computed from the gradient already obtained for the layer after it, so intermediate results are reused rather than recomputed.
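The two phases can be traced by hand on a deliberately tiny network. The sketch below, in plain Python, assumes one input, one sigmoid hidden neuron, one linear output, and a squared-error loss; the starting weights and the training example are made up. It checks the chain-rule gradients against finite differences and then takes one SGD step.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Forward pass: 1 input -> 1 sigmoid hidden neuron -> 1 linear output."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)
    y_hat = w2 * h + b2
    return h, y_hat


def loss(params, x, y):
    _, y_hat = forward(params, x)
    return 0.5 * (y_hat - y) ** 2


def backprop(params, x, y):
    """Gradients of the loss w.r.t. (w1, b1, w2, b2) via the chain rule."""
    w1, b1, w2, b2 = params
    h, y_hat = forward(params, x)
    d_y_hat = y_hat - y            # dL/dy_hat
    d_w2 = d_y_hat * h             # output-layer weight
    d_b2 = d_y_hat                 # output-layer bias
    d_h = d_y_hat * w2             # error signal passed back to the hidden layer
    d_z = d_h * h * (1.0 - h)      # through the sigmoid: sigmoid'(z) = h * (1 - h)
    d_w1 = d_z * x
    d_b1 = d_z
    return [d_w1, d_b1, d_w2, d_b2]


def numerical_gradient(params, x, y, eps=1e-6):
    """Central finite differences, used only to sanity-check backprop."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up, x, y) - loss(down, x, y)) / (2 * eps))
    return grads


params = [0.5, -0.3, 0.8, 0.1]   # hypothetical starting weights
x, y = 1.5, 1.0                  # one hypothetical training example
analytic = backprop(params, x, y)
numeric = numerical_gradient(params, x, y)

# One SGD step: subtract the gradient scaled by the learning rate.
learning_rate = 0.1
updated = [p - learning_rate * g for p, g in zip(params, analytic)]
print(loss(params, x, y), loss(updated, x, y))
```

Note how `d_h` reuses `d_y_hat` and `d_z` reuses `d_h`: each earlier layer's gradient is built from the one computed just after it. Frameworks automate exactly this bookkeeping through automatic differentiation.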

What is cross-validation and why is a simple train/test split not always sufficient?

A simple train/test split divides data into two parts: train the model on one, evaluate on the other. The problem is that a single split can be misleading. If the test set happens to contain easy examples, the model looks better than it is; if it contains hard examples, the model looks worse. The performance estimate has high variance depending on which examples end up in which set. K-fold cross-validation addresses this by dividing the data into k roughly equal parts (folds), training the model k times (each time using k-1 folds for training and the remaining fold for testing), and averaging the k performance scores. This produces a more reliable estimate because every example is used for testing exactly once and for training in the other k-1 iterations. Common choices are k=5 or k=10. Stratified k-fold ensures that each fold preserves the class distribution of the full dataset, which is important for imbalanced datasets where one class is rare. Leave-one-out cross-validation (k equals the number of samples) gives a nearly unbiased estimate, but the estimate can have high variance and the procedure is computationally expensive for large datasets. In CSC4030, cross-validation is the standard method for model comparison: when deciding between a random forest and a gradient boosting model, you compare their cross-validated scores rather than single train/test scores to make a more trustworthy decision.
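The fold bookkeeping behind k-fold cross-validation can be sketched in plain Python as follows. The function name is an assumption, and the folds here are contiguous and unshuffled for readability; in practice the data would usually be shuffled (or stratified) first, which scikit-learn's `KFold` and `StratifiedKFold` handle.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

With 10 samples and k=3 the test folds have sizes 4, 3, and 3, and together they cover every index exactly once. Scoring a model on each `(train, test)` pair and averaging the k results gives the cross-validated estimate.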