What I Learned Taking CS6601: Artificial Intelligence
By Anthony Mattas

I expected my earlier OMSCS classes to prepare me for CS6601, but I quickly realized they hadn't.
Artificial Intelligence has a reputation as one of the toughest classes in the program, and that's true. You can't just fit it in after work and expect to keep up.
I was surprised by how much the material actually made sense. After years of hearing "AI" as just a buzzword, it was satisfying to learn the real techniques behind intelligent systems. The course begins with a simple idea that stuck with me: AI is about making decisions when you don't know what will happen next.
Here's how that idea shows up throughout the course.
Search
The course starts with search, which makes sense because the main question is simple: how do you figure out the steps to reach a goal when you can't see the whole path?
It's like trying to find your way through a maze. You start at the entrance and need to reach the exit, but you can only see the paths right in front of you. You never see the whole maze at once. Search algorithms help you explore efficiently so you don't get lost or waste time.
A* search takes things a step further. It uses the actual cost of the path you've already traveled and a "heuristic," which is just a smart guess about how far you still have to go. Think of driving with GPS: your app tracks how far you've gone and estimates the straight-line distance to your destination. A* uses both to pick the best roads to try first.
The key is that a good heuristic never overestimates the distance left. If you guess there are 5 miles to go, the real distance should be 5 miles or more. This "optimistic guess" rule helps A* find the best path without checking every option.
The math is simple and elegant:

f(n) = g(n) + h(n)

In this formula, g(n) is the actual cost to reach a spot, h(n) is the estimated cost from there to the goal, and f(n) is their total. A* always chooses the option with the lowest f(n) to explore next.
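To make that concrete, here is a minimal sketch of A* in Python. The graph format and the heuristic function are assumptions for illustration, not the course's project code:

```python
import heapq

def a_star(graph, start, goal, heuristic):
    """Find a lowest-cost path from start to goal.

    graph: dict mapping node -> list of (neighbor, step_cost)
    heuristic: optimistic estimate of the remaining cost from a node to the goal
    """
    # The frontier is a priority queue ordered by f(n) = g(n) + h(n).
    frontier = [(heuristic(start), 0, start, [start])]
    best_g = {start: 0}

    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for neighbor, step_cost in graph.get(node, []):
            new_g = g + step_cost
            # Only explore a neighbor if this is the cheapest route to it so far.
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                heapq.heappush(frontier, (new_g + heuristic(neighbor), new_g,
                                          neighbor, path + [neighbor]))
    return None, float("inf")
```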
Playing Games
Making an AI that plays games is different from solving a maze. In a maze, the walls don't change. In a game, you have an opponent who reacts to your moves and tries to beat you.
The Minimax algorithm is good for this. The main idea is simple: you try to get the highest score, and your opponent tries to lower it. Picture a tree of all possible moves. On your turn, you pick the best move for yourself. On their turn, they pick the move that's worst for you. You go back and forth down the tree, and the results move back up.
The problem is that game trees grow very quickly. Even a simple board game can have more possible positions than there are atoms in the universe. You can't check every option.
Alpha-beta pruning helps with this. If you already have a move that scores 10, and you find another branch where your opponent can force you down to 5, you can skip that whole branch. There's no reason to waste time on it. This trick lets you search about twice as deep in the same amount of time.
Evaluation functions let you stop searching before the game is over. Instead of playing out every move, you look a few steps ahead and estimate who's winning based on things like piece count or board position. A good evaluation function is what makes the difference between a grandmaster-level AI and one that plays poorly.
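Putting the pieces together (depth-limited search, an evaluation function at the cutoff, and alpha-beta pruning), a sketch looks roughly like this. The game interface here, with get_moves, apply, is_over, and evaluate, is hypothetical:

```python
def alphabeta(state, depth, alpha, beta, maximizing, game):
    """Minimax with alpha-beta pruning and a depth cutoff."""
    if depth == 0 or game.is_over(state):
        return game.evaluate(state)  # estimate who's winning at this position

    if maximizing:
        best = float("-inf")
        for move in game.get_moves(state):
            best = max(best, alphabeta(game.apply(state, move), depth - 1,
                                       alpha, beta, False, game))
            alpha = max(alpha, best)
            if beta <= alpha:
                break  # the opponent would never let us reach this branch: prune it
        return best
    else:
        best = float("inf")
        for move in game.get_moves(state):
            best = min(best, alphabeta(game.apply(state, move), depth - 1,
                                       alpha, beta, True, game))
            beta = min(beta, best)
            if beta <= alpha:
                break  # we already have a better option elsewhere: prune
        return best
```

You would call it from the root with alpha at negative infinity and beta at positive infinity, then play whichever legal move scored best.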
Probability
The first part of the course assumes you know all the rules and can see everything clearly. The second part is more like real life: sensors are noisy, results are uncertain, and you rarely have all the information you want.
Bayes' Rule answers a simple question: given what I just observed, how should I update my beliefs?
P(A | B) = P(B | A) × P(A) / P(B)

In simple terms, the chance that A is true given B happened equals (how likely B is if A is true) times (how likely A was before), divided by (how likely B is overall).
Here's a classic example: Imagine a patient tests positive for a rare disease. The test is 99% accurate, and only 1 in 10,000 people have the disease. What are the chances the patient actually has it?
Most people would guess "99%!" But Bayes' Rule gives a different answer. Since the disease is so rare, most positive results are actually false alarms. The real chance is only about 1%. This surprising math comes up all the time in real AI systems, from spam filters to medical diagnosis.
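You can check that number in a few lines. I'm reading "99% accurate" as both a 99% true-positive rate and a 1% false-positive rate, which is the usual way this puzzle is stated:

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' Rule."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

print(posterior(prior=1 / 10_000, sensitivity=0.99, false_positive_rate=0.01))
# -> about 0.0098, i.e. roughly a 1% chance of actually having the disease
```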
Bayes Networks build on this idea. Real-world probability problems get complicated quickly. A system with 20 yes/no variables already has over a million possible states. Bayes Networks help by showing which things actually depend on each other. The key insight is that not everything affects everything else, so you can break big problems into smaller, manageable parts.
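A back-of-the-envelope comparison shows how much that buys you. The "at most two parents per variable" structure below is an assumption for illustration:

```python
# Specifying the full joint distribution over 20 yes/no variables takes
# one probability per state (minus one, since they all sum to 1).
full_joint = 2 ** 20 - 1          # 1,048,575 numbers

# In a Bayes Network where each variable depends on at most two others,
# each node only needs a probability for every combination of its
# parents' values: at most 2**2 = 4 numbers per node.
bayes_net = 20 * 2 ** 2           # 80 numbers

print(full_joint, bayes_net)
```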
Machine Learning
The main challenge in machine learning is finding the right balance between underfitting and overfitting.
Underfitting happens when your model is too simple and misses patterns in the data. Overfitting is the opposite: your model is so complex that it memorizes the training data, including noise and quirks, and then does poorly on new data.
K-fold cross-validation helps you spot overfitting. You split your data into k equal parts. Train on k - 1 parts and test on the remaining one. Repeat this k times, each time leaving out a different part, then average the results. This way, every data point is used for both training and testing, giving you a realistic idea of how the model will perform in the real world.
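Here is a small sketch of the procedure. The train_fn and score_fn callables are placeholders for whatever model and metric you're actually using:

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k roughly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(train_fn, score_fn, data, k=5):
    """Average test score over k train/test splits."""
    folds = k_fold_indices(len(data), k)
    scores = []
    for i in range(k):
        test = [data[j] for j in folds[i]]
        train = [data[j] for fold in folds[:i] + folds[i + 1:] for j in fold]
        model = train_fn(train)              # fit on k - 1 folds
        scores.append(score_fn(model, test)) # evaluate on the held-out fold
    return sum(scores) / k
```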
Most models learn using gradient descent. Imagine you're blindfolded on a hilly field, trying to find the lowest point. You feel which way the ground slopes under your feet. Gradient descent means you keep stepping downhill, following the steepest direction each time until you reach the bottom.
θ_new = θ_old - α ∇J(θ_old)

Here, θ stands for your model's parameters (the things it's learning), α is the learning rate (how big each step is), and ∇J(θ) is the gradient of your loss function (which direction is downhill). The math uses partial derivatives and chain rules, which I'll talk about again soon.
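As a toy example, here's the whole loop for a single parameter with a hand-written gradient; real models do the same thing in many dimensions at once:

```python
def gradient_descent(grad, theta, learning_rate=0.1, steps=100):
    """Repeat theta <- theta - learning_rate * grad(theta), stepping downhill."""
    for _ in range(steps):
        theta = theta - learning_rate * grad(theta)
    return theta

# Minimize J(theta) = (theta - 3)**2, whose gradient is 2 * (theta - 3).
print(gradient_descent(grad=lambda t: 2 * (t - 3), theta=0.0))  # -> very close to 3.0
```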
Decision trees work differently. They ask questions that best split your data into useful groups. For example, "Is it raining?" could separate sunny-day from rainy-day activities. The algorithm chooses questions that give the most information, reducing uncertainty about what you want to predict.
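"Most information" is usually measured with entropy. Here's a small sketch of information gain for a candidate split; how you actually group the data depends on the question being asked:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Uncertainty, in bits, of a list of labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

def information_gain(labels, groups):
    """How much splitting the labels into these groups reduces uncertainty."""
    total = len(labels)
    remainder = sum(len(group) / total * entropy(group) for group in groups)
    return entropy(labels) - remainder

# "Is it raining?" splits six activities into rainy-day and sunny-day groups.
labels = ["indoor", "indoor", "indoor", "outdoor", "outdoor", "outdoor"]
groups = [["indoor", "indoor", "indoor"], ["outdoor", "outdoor", "outdoor"]]
print(information_gain(labels, groups))  # -> 1.0 bit: a perfect split
```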
Hidden Markov Models
The course ends with systems that change over time. Markov models describe situations in which the future depends only on the present, not on the entire history of events leading up to it. For example, tomorrow's weather depends on today's weather, but not directly on what happened last week.
Hidden Markov Models (HMMs) are more complex because you can't directly see the real state of the system. You only get noisy signals. In speech recognition, the words someone says are the hidden states, and the audio waveform you hear is the signal you actually observe.
The main question is: given a sequence of observations, what's the most likely sequence of hidden states that produced them? The Viterbi algorithm solves this with dynamic programming. Instead of checking every possible path, which would take forever, it keeps track of only the best path to each state at every step. At the end, you trace back to find the best sequence.
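A bare-bones version of the idea looks like this. It uses plain dictionaries and raw probabilities (a real implementation would work in log space to avoid underflow), so treat it as a sketch rather than the assignment's solution:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for a list of observations.

    start_p[s], trans_p[prev][s], and emit_p[s][obs] are probabilities;
    the dictionaries here stand in for a real model.
    """
    # best[t][s] = probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Which previous state gives the best path into s?
            prev = max(states, key=lambda p: best[t - 1][p] * trans_p[p][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][observations[t]]
            back[t][s] = prev

    # Trace the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        last = back[t][last]
        path.insert(0, last)
    return path
```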
Building the Viterbi algorithm from scratch and carefully tracking back-pointers to find the best path were some of the most satisfying moments of debugging in the course. There's nothing like staring at a screen full of probability values at 2 am, finding that one small mistake, and watching your program finally give the right answer.
The Experience
I loved this class.
The material is really interesting, and the projects are tough but fair. I finished the course feeling like I truly understood the basics of AI, not just how to use tools that feel like magic.
The textbook is Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig. It's the main book on classical AI. I found it so useful and interesting that I bought the hardcover to keep. It's one of those books that changes how you think about problem-solving, not just in AI but for complex problems in general. If you take this course, make sure to do the readings.
Still, this course has a lot of math. I had only taken Calc 1 and Calc 2, so I had to put in extra effort to understand things like gradient descent and optimization. When you're dealing with partial derivatives and chain rules, having solid calculus basics really helps. I spent a lot of time on Khan Academy and 3Blue1Brown videos to catch up. If your calculus is rusty, plan to spend extra time on it.
The current professor, Dr. Thomas Ploetz, is excellent. He's passionate about the subject and truly cares about student success. What stands out is that he hosts his own office hours instead of leaving everything to the TAs. For a class this size, that's rare and shows how much he wants students to succeed. This class is tough, and he knows it, but he's there to help you get through.
The TAs are great too. They make the class interesting and challenging, and they respond quickly on the forums. When you need help, you can tell they really know their stuff. They're not just following a grading guide. That support makes a big difference in a technical class like this.
What Stuck With Me
Six principles emerged from the semester:
- Good heuristics make search manageable. The difference between an impossible problem and a practical solution often comes down to domain knowledge built into heuristics.
- Assume your opponent is perfect. In competitive situations, being too optimistic will get you beaten. Always plan for the worst case.
- Update your beliefs with Bayes' Rule. Prior knowledge, combined with new evidence, gives you an updated belief. It's the fundamental equation of rational learning.
- Independence assumptions make scaling possible. Even when they're not exactly right, they often work surprisingly well in practice.
- Always test your model on data it hasn't seen before. The only honest measure of a model is how it performs on new data.
- Vectorization is a must. In real-world machine learning, efficient implementation is just as important as getting the algorithm right.
Final Thoughts
CS6601 connected theory to practical skills in a way that made the lessons stick. Each project turned abstract ideas into real, working systems.
The workload is heavy. Some weeks, I wondered why I signed up as I stared at failing code after a long day. But finishing each assignment reminded me that I made the right choice.
If you're considering taking CS6601, know that it's tough but highly rewarding.
Categories: Artificial Intelligence, OMSCS