Exploring Machine Learning Algorithms: A Friendly Tour

You meet machine learning every day.
Your inbox blocks spam.
Your maps dodge traffic.
Your stream picks that perfect song.

So let us kick off with a simple idea.
Algorithms are tools.
Each one solves a certain kind of problem.
Pick the right tool, add a bit of care, and you get useful answers.

The first time I trained a model, it felt like teaching a curious puppy to fetch—clumsy at first, then surprisingly good.


Start With the Questions You Care About

Before names or math, ask this: what do you want to predict or discover?

  • A number, like house price next month.
  • A category, like “spam” or “not spam.”
  • A hidden pattern, like natural groups of customers.
  • A sequence of choices, like actions in a game.

Then choose a family of algorithms that fit.
No magic. Just matchmaking.


Supervised Learning: When You Have Answers Already

Supervised learning learns from examples that include the “right” answer.

Predicting numbers: Linear Regression

Think of a straight ruler placed through dots on a graph.
Linear Regression tries to draw that best straight line.
Great for quick, clear baselines.
Example: predict a house's price from its size, location, and age.
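
Here is a minimal sketch, assuming scikit-learn and a few made-up houses (size in square meters, a location score, age in years):

```python
# Linear Regression on invented housing data -- the numbers are illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[70, 8, 5], [120, 6, 20], [90, 9, 2], [60, 5, 35]])  # size, location score, age
y = np.array([310_000, 365_000, 420_000, 180_000])                 # sale prices

model = LinearRegression().fit(X, y)
print(model.predict([[100, 7, 10]]))  # estimated price for a new house
```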

Predicting categories: Logistic Regression

The name is heavier than it needs to be.
Logistic Regression is a clean, reliable classifier.
It estimates the chance that something is “yes” or “no.”
Example: will a customer churn next quarter?
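
A minimal sketch, assuming scikit-learn and two invented features (logins last month, open support tickets):

```python
# Logistic Regression for churn -- the data is made up for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[25, 0], [2, 3], [18, 1], [1, 5], [30, 0], [3, 2]])  # logins, tickets
y = np.array([0, 1, 0, 1, 0, 1])                                   # 1 = churned

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[4, 2]])[0, 1])  # estimated chance this customer churns
```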

If-this-then-that trees: Decision Trees

Picture a flowchart you can point at.
Decision Trees split data into simple rules.
Easy to explain. Easy to visualize.
They can overfit, so watch depth.
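
A minimal sketch, assuming scikit-learn and its built-in iris dataset; max_depth is the guardrail:

```python
# A shallow Decision Tree, printed as plain if/else rules.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree))  # the flowchart you can point at
```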

Forests and boosting: Random Forests + Gradient Boosting

Want more accuracy with guardrails?
Random Forests build many trees, then vote.
Stable. Strong.
Gradient Boosting (like XGBoost or LightGBM) stacks small trees that fix each other’s mistakes.
Often top tier on tabular data.
Use them when you want strong performance with reasonable training time.
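
A minimal comparison sketch, assuming scikit-learn and its breast cancer toy dataset (HistGradientBoostingClassifier is scikit-learn's LightGBM-style booster; XGBoost and LightGBM themselves are separate libraries):

```python
# Many trees that vote vs. small trees that fix each other's mistakes.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
for model in (RandomForestClassifier(n_estimators=200, random_state=0),
              HistGradientBoostingClassifier(random_state=0)):
    score = cross_val_score(model, X, y, cv=5).mean()
    print(type(model).__name__, round(score, 3))
```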

Neighbors who vote: k-Nearest Neighbors (kNN)

Find the closest examples, then let them vote.
Simple. Intuitive.
Can be slow on huge data, but shines on well-scaled, smaller sets.
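
A minimal sketch, assuming scikit-learn and its wine dataset; the scaler matters because kNN works on distances:

```python
# Scale first, then let the 5 nearest neighbors vote.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
print(round(cross_val_score(knn, X, y, cv=5).mean(), 3))
```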


Unsupervised Learning: When You Do Not Have Labels

Now you want structure without answers.
You look for shape in the fog.

Clustering: k-Means

Imagine tossing magnets on a metal sheet.
Points pull toward the nearest magnet.
k-Means groups items by closeness.
Use it to segment customers or products.
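
A minimal segmentation sketch, assuming scikit-learn and two invented columns (monthly spend, visits per month):

```python
# k-Means on made-up customers; scaling keeps one column from dominating distance.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

X = np.array([[20, 1], [25, 2], [200, 8], [220, 10], [90, 4], [95, 5]], dtype=float)
X_scaled = StandardScaler().fit_transform(X)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
print(km.labels_)  # cluster id for each customer
```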

Hidden axes: Principal Component Analysis (PCA)

Too many columns?
Principal Component Analysis reshapes the space so most variation fits into fewer directions.
Like folding a map to the main roads.
Use it for visualization or to speed up downstream models.
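
A minimal sketch, assuming scikit-learn and its breast cancer dataset: squeeze 30 columns into 2 for a plot.

```python
# PCA keeps the directions with the most variation.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(StandardScaler().fit_transform(X))
print(X_2d.shape, round(pca.explained_variance_ratio_.sum(), 2))  # (569, 2) and the share of variance kept
```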


Deep Learning: When Patterns Are Complex

Some data is rich and messy.
Images. Sound. Long text.
Neural networks shine here.

The basics: Neural Networks

Artificial Neural Networks stack simple units that learn layered patterns.
They need more data and more compute.
They can surpass classic models when the signal is complex.
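
A minimal sketch, assuming scikit-learn and its small digits dataset (the dataset choice is mine, not from the article):

```python
# A small feed-forward network: two hidden layers of simple units.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
net.fit(X_train, y_train)
print("test accuracy:", round(net.score(X_test, y_test), 3))
```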

Images: Convolutional Neural Networks (CNN)

Convolutional Neural Networks scan images with small filters.
They catch edges, textures, shapes.
Great for classifying photos or spotting defects on a line.
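
The article does not name a framework, so here is a minimal sketch assuming PyTorch, just to show the shape of a tiny CNN:

```python
# Small filters scan the image; pooling shrinks it; a linear layer makes the call.
import torch
import torch.nn as nn

tiny_cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # filters catch edges and textures
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                   # assumes 32x32 inputs and 10 classes
)

fake_batch = torch.randn(4, 3, 32, 32)           # four random "images" just to check shapes
print(tiny_cnn(fake_batch).shape)                # torch.Size([4, 10])
```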

Sequences: Recurrent Neural Networks (RNN) + Transformers

Recurrent Neural Networks read data step by step.
Text. Time series. Click streams.
Modern practice often prefers Transformers, which pay attention to all words at once.
Useful for Natural Language Processing (NLP): translation, summarization, and more.
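
A minimal sketch, assuming the Hugging Face transformers library (the article names the architecture, not a specific package):

```python
# A pre-trained Transformer pipeline for summarization.
from transformers import pipeline

summarizer = pipeline("summarization")  # downloads a default pre-trained model
text = ("Machine learning shows up in everyday tools. Spam filters sort email, "
        "maps route around traffic, and streaming services pick the next song.")
print(summarizer(text, max_length=25, min_length=5)[0]["summary_text"])
```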

Tip: start simple. Move to deep learning when simpler models plateau, or when the task itself demands it.


How You Judge Models Without Fooling Yourself

Good models are honest.
So you need honest tests.

Split fair: Train, Validation, Test

  • Train set: the data the model learns from.
  • Validation set: the data you tune on.
  • Test set: the data you never touch until the end.

Then use cross-validation to rotate validation folds.
It gives a sturdier estimate of performance.
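
A minimal sketch of that routine, assuming scikit-learn and its breast cancer dataset:

```python
# Hold out a test set, tune with cross-validation, touch the test set once at the end.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression())
print("cross-val accuracy:", round(cross_val_score(model, X_train, y_train, cv=5).mean(), 3))

model.fit(X_train, y_train)
print("final test accuracy:", round(model.score(X_test, y_test), 3))
```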

Pick the right yardstick

For balanced classification, accuracy works.
For rare positives, use precision and recall.
Precision asks, "Of everything I called positive, how much was actually right?"
Recall asks, "Of all the real positives, how many did I catch?"
The F1 score balances the two.
For ranking, try Area Under the Curve (AUC).
For numbers, use mean absolute error for easy interpretation.

A quick rule of thumb:
If missing a positive hurts, boost recall.
If false alarms hurt, boost precision.
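
A minimal sketch of those yardsticks, assuming scikit-learn; the labels and scores are made up:

```python
# Precision, recall, F1, and AUC on toy predictions.
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

y_true   = [0, 0, 1, 1, 1, 0, 1, 0]                    # real labels
y_pred   = [0, 1, 1, 0, 1, 0, 1, 0]                    # the model's hard calls
y_scores = [0.1, 0.6, 0.8, 0.4, 0.9, 0.2, 0.7, 0.3]    # the model's probabilities

print("precision:", precision_score(y_true, y_pred))   # of predicted positives, how many were right
print("recall:   ", recall_score(y_true, y_pred))      # of real positives, how many were caught
print("f1:       ", f1_score(y_true, y_pred))
print("auc:      ", roc_auc_score(y_true, y_scores))
```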


Bias, Variance, and the Overfitting Trap

Two forces pull your model.

  • Bias: the model is too simple. It misses patterns.
  • Variance: the model is too wiggly. It memorizes noise.

Overfitting is the classic pothole.
You crush the training set, then stumble on new data.
Guardrails help.
Use regularization, early stopping, cross-validation, and simpler features.

Think of tuning like focusing a camera.
A little twist can sharpen the picture.
Too much makes it blur again.
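
A minimal regularization sketch, assuming scikit-learn and its diabetes dataset; a larger alpha shrinks coefficients and tames a too-wiggly model:

```python
# Ridge regression at three regularization strengths.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)
for alpha in (0.01, 1.0, 100.0):
    score = cross_val_score(Ridge(alpha=alpha), X, y, cv=5).mean()
    print(f"alpha={alpha:<6} cross-val R^2={round(score, 3)}")
```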


Features Beat Fancy

Clean data wins.
Often by a mile.

Fix missing values.
Standardize scales when distance matters.
Create features that match the real-world story.
Calendar effects. Lags. Ratios. Domain rules.

A modest model with strong features often outperforms a flashy one with messy inputs.
It is like cooking: fresh ingredients beat exotic spices added at the last minute.
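
A minimal preprocessing sketch, assuming scikit-learn; the numbers are invented:

```python
# Fill missing values, standardize scales, then fit a distance-based model on top.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0], [2.0, np.nan], [3.0, 180.0], [np.nan, 220.0]])
y = np.array([0, 0, 1, 1])

pipe = make_pipeline(SimpleImputer(strategy="median"),
                     StandardScaler(),
                     KNeighborsClassifier(n_neighbors=1))
pipe.fit(X, y)
print(pipe.predict([[2.5, 210.0]]))
```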


A Quick Field Guide: What To Try First

  • Tabular data, mixed types: start with Logistic or Linear Regression.
    Then try Random Forests.
    Then Gradient Boosting if you need extra lift.
  • Few rows, many columns: try Linear or Logistic with regularization.
    Consider PCA before k-Nearest Neighbors.
  • Images: begin with a small Convolutional Neural Network.
    Fine-tune a pre-trained model if data is limited.
  • Text: start with a simple bag-of-words or term frequency features.
    Then move to Transformer fine-tuning if needed.
  • Anomalies: try Isolation Forest or simple thresholds on well-chosen features.
    Keep the logic readable for alerts.

Remember the gentle path.
Baseline first.
Then raise the bar.
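
For the anomaly item in the field guide, a minimal Isolation Forest sketch, assuming scikit-learn and synthetic points:

```python
# Isolation Forest flags the points that are easiest to isolate.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # everyday points
odd = np.array([[8.0, 8.0], [-7.0, 9.0]])                # two obvious outliers
X = np.vstack([normal, odd])

iso = IsolationForest(random_state=0).fit(X)
print(iso.predict(odd))  # -1 marks an anomaly, 1 marks a normal point
```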


Tiny Examples To Make It Concrete

  • You want to predict energy use next hour.
Train Linear Regression with time, temperature, and day of week.
    Add lags for the last few hours.
  • You want to reduce churn in a subscription app.
    Train Logistic Regression to start.
Use recent activity, support tickets, and plan type as features.
    Then test Gradient Boosting for lift.
  • You want to group products for a new layout.
Run k-Means on price, size, and purchase frequency.
    Name clusters with plain labels.
    Share the story with your team.

Short. Specific. Useful.
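
For the energy example, a minimal lag-feature sketch, assuming pandas and scikit-learn; the readings are invented:

```python
# Add lags of the last two hours, then fit a simple Linear Regression.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "temperature": [18, 19, 21, 24, 26, 25, 23, 20],
    "energy_kwh":  [30, 31, 34, 40, 45, 44, 39, 33],
})
df["lag_1"] = df["energy_kwh"].shift(1)   # energy used one hour ago
df["lag_2"] = df["energy_kwh"].shift(2)   # energy used two hours ago
df = df.dropna()

features = ["temperature", "lag_1", "lag_2"]
model = LinearRegression().fit(df[features], df["energy_kwh"])
print(round(model.score(df[features], df["energy_kwh"]), 3))  # in-sample R^2 on the toy data
```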


Keep It Responsible

Models touch people.
Treat them with care.

Check data for bias.
Monitor drift over time.
Keep an audit trail for decisions.
Explain choices in simple terms.
Respect privacy laws and company policy.

A model that is fair, stable, and explainable earns trust.
Trust is your real metric.


How You Can Learn by Doing

Start small.
Try a public dataset.
Use a friendly library like scikit-learn in Python.
Plot results.
Write down what worked and what did not.
Then share a short note with a teammate or friend.

You will build intuition faster than you expect.
Brick by brick.


Wrap Up

Machine learning is not a maze.
It is a toolbox.
You pick a tool, shape your data, test with care, then refine.

So explore with a light touch.
Begin with the question.
Choose a simple model.
Add features that tell the real story.
Level up only when needed.

Then ship something that helps people.
That is the real win.

Ali Reza Rashidi is a BI analyst with over nine years of experience and the author of three books on data and management.
