
Common Concepts in Machine Learning

April 8, 2017 · AI · 3 min read
Tecker Yu
AI Native Cloud Engineer × Part-time Investor

Features (Attributes)

Usually the columns of the training sample set; each feature can be viewed as the name of one column. For example, to distinguish bird species, features such as weight and back color can be used.

Feature Instance

The values stored in a specific feature column, for example one bird's weight

Types of Features

  • Numerical
  • Binary (similar to Boolean type)
  • Categorical (e.g., color values)
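
As a minimal sketch of the three feature types, the snippet below builds a tiny table of bird samples and labels each column. The feature names (`weight_g`, `webbed_feet`, `back_color`) and all values are made up for illustration, and `feature_type` is a hypothetical helper, not part of any library.

```python
# Hypothetical bird data: each dict is one training sample (row),
# and each key is a feature (column). All values are invented.
samples = [
    {"weight_g": 1000.1, "webbed_feet": False, "back_color": "brown"},
    {"weight_g": 3000.7, "webbed_feet": True, "back_color": "gray"},
    {"weight_g": 4.1, "webbed_feet": False, "back_color": "green"},
]


def feature_type(values):
    """Label a feature column as binary, numerical, or categorical."""
    # bool is a subclass of int in Python, so check for binary first.
    if all(isinstance(v, bool) for v in values):
        return "binary"
    if all(isinstance(v, (int, float)) for v in values):
        return "numerical"
    return "categorical"


# Gather the values of each column and report its type.
feature_types = {
    name: feature_type([row[name] for row in samples])
    for name in samples[0]
}
print(feature_types)
```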

Training Set

A collection of data samples used to train machine learning algorithms

Training Sample

Each row of the sample set is a training sample

Target Variable

The prediction result of the machine learning algorithm

Type of target variable:

  • In classification algorithms, the target variable is typically nominal and is called the class
  • In regression algorithms, the target variable is typically continuous

Before training, the value of the target variable must be known for every training sample. For example, features such as a bird's height, weight, and color can be used to determine its species. The species is the target variable, and the species name is the specific value of the target variable.
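
A small sketch of how features and the target variable can be separated in practice. It assumes each row stores the target (species) in its last position. The rows and the generic species labels are invented for illustration.

```python
# Hypothetical training samples: (weight_g, back_color, species).
# The last element of each row is the target variable.
rows = [
    (1000.1, "brown", "hawk"),
    (4.1, "green", "hummingbird"),
    (1100.0, "brown", "hawk"),
]

# Split every row into its features and its target value.
features = [row[:-1] for row in rows]
targets = [row[-1] for row in rows]

# For classification, the distinct target values are the classes.
classes = sorted(set(targets))
print(classes)
```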

Knowledge Representation

Refers to the work of presenting machine classification results

Forms include:

  • Rule sets
  • Probability distributions
  • Instances in training sample sets

Why must the target values be known in the training set?

So that machine learning algorithms can discover relationships between features and target variables

A main task of machine learning: Classification

Dividing instance data into appropriate categories

Basic classification process:

  1. Obtain all feature information
  2. Train the algorithm (it learns how to classify)
  3. Test the effectiveness of the trained algorithm

How to test algorithm effectiveness?

To test effectiveness, two separate sample sets are typically used: training data and test data
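
One common way to obtain the two sets is to shuffle the available samples and hold part of them back for testing. The sketch below assumes a 25% test fraction and a fixed seed. `train_test_split` here is a hypothetical stdlib-only helper written for this example, not a library function.

```python
import random


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and split it into (train, test)."""
    shuffled = samples[:]
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(20))  # stand-in for 20 samples
train, test = train_test_split(data)
print(len(train), len(test))
```

Keeping the test samples out of training matters because accuracy measured on data the algorithm has already seen tends to be overly optimistic.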

Another important task in machine learning: Regression

Mainly used for predicting numerical data
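
As an illustration of predicting a numerical value, here is a minimal sketch of one simple regression technique, ordinary least squares for a straight line. The post does not name a specific method, so this choice, the `fit_line` helper, and the data are assumptions.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)

# Predict the continuous target for an unseen input.
prediction = slope * 5.0 + intercept
print(slope, intercept, prediction)
```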

Program starts running:

The training sample set, including the target variable, is input to the algorithm => Training completes => Test data is input without the target variable => Predictions are compared with the actual target values (for regression, by the size of the difference) => The actual accuracy of the algorithm is obtained
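
The flow above can be sketched end to end with a very simple classifier. This example assumes a 1-nearest-neighbor rule, chosen only because it is short. The data points and class labels are invented, and `predict` is a hypothetical helper.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def predict(train_x, train_y, query):
    """Return the target value of the training sample closest to query."""
    nearest = min(
        range(len(train_x)),
        key=lambda i: squared_distance(train_x[i], query),
    )
    return train_y[nearest]


# Training set: features together with the known target variable.
train_x = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9)]
train_y = ["A", "A", "B", "B"]

# Test set: the targets are held back and used only for scoring.
test_x = [(0.9, 1.1), (5.1, 5.1), (1.1, 0.9), (3.4, 3.4)]
test_y = ["A", "B", "A", "A"]

# Predict from the features alone, then compare with the real targets.
predictions = [predict(train_x, train_y, x) for x in test_x]
correct = sum(p == t for p, t in zip(predictions, test_y))
accuracy = correct / len(test_y)
print(predictions, accuracy)
```

The last test point sits closer to the "B" samples although its real class is "A", so the measured accuracy comes out below 100%.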

Supervised Learning vs Unsupervised Learning

Supervised learning means the algorithm knows what to predict, i.e., has clear objectives

Regression and classification both belong to supervised learning

Unlike supervised learning, unsupervised learning has no class labels or target values

Unsupervised Learning

Purpose:

  • Reduce the dimensionality of data features

Clustering (Analysis)

Group similar objects into the same cluster, so that the data set is divided into several groups or subsets. This is equivalent to dividing one training set into multiple training sets, where the data features in each new training set are similar.
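
A minimal sketch of clustering on a single numerical feature. It assumes the k-means procedure with two clusters, which is only one of many clustering methods and is not named in the post. The weights and starting centers are invented, and `k_means_1d` is a hypothetical helper.

```python
def k_means_1d(values, centers, iterations=10):
    """Cluster 1-D values with a basic k-means loop."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Invented bird weights in grams; no target variable is supplied.
weights = [4.0, 4.5, 5.0, 1000.0, 1100.0, 1200.0]
centers, clusters = k_means_1d(weights, centers=[0.0, 500.0])
print(centers)
print(clusters)
```

Each resulting cluster contains only samples with similar weights, which matches the idea of splitting one sample set into several more homogeneous ones.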

Density Estimation

The process of finding a statistical description of the data. It can be understood as building a frequency distribution histogram that describes the data, summarizing it in fewer dimensions so that it is more intuitive.
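
The histogram view can be sketched as follows, assuming fixed-width bins and relative frequencies. The values and the bin width are invented for illustration, and `histogram` is a hypothetical helper.

```python
from collections import Counter


def histogram(values, bin_width):
    """Map each bin's left edge to the fraction of values in that bin."""
    # Floor division assigns every value to the index of its bin.
    counts = Counter(int(v // bin_width) for v in values)
    total = len(values)
    return {b * bin_width: counts[b] / total for b in sorted(counts)}


values = [1.2, 1.8, 2.5, 2.7, 2.9, 3.1, 7.4, 7.9]
hist = histogram(values, bin_width=1.0)
print(hist)
```

Dividing each fraction by the bin width as well would turn the relative frequencies into a simple density estimate whose total area is 1.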

Brief Discussion on Algorithm Selection

  • Goal-oriented: Choose supervised or unsupervised learning algorithms based on the task to be accomplished
  • Data source-oriented: Determine what kind of data needs to be analyzed or collected

General Steps for Developing Machine Learning Applications

  1. Collect data

  2. Prepare input data

  3. Analyze input data (manual)

  4. Train algorithm (machine learning)

  5. Test algorithm

  6. Use algorithm
