Because different classification algorithms in machine learning differ in classification performance and accuracy, it can be difficult for researchers to choose which algorithm to use. This paper uses the face data published by Cambridge University as the experimental data. The experiment first reduces the dimensionality of the data with the principal component analysis (PCA) algorithm to extract the main features, and then classifies the data with logistic regression, linear discriminant analysis (LDA), the K-nearest neighbor algorithm (KNN), the support vector machine (SVM), and the ensemble algorithm AdaBoost. By comparing the classification performance and complexity of the different algorithms, the paper evaluates them using precision, recall, F1-score, and AUC.

With the rise of artificial intelligence and machine learning, face recognition technology is widely used in daily life, for example in station security checks, attendance punch-in, and secure payment.

The data set contains 400 photos in total. We use the machine learning library scikit-learn, provided for Python, to process the data, and part of the data set is displayed in the figure below.

Partial face images from the ORL data set

Each picture in the experimental data has 4096 features. Since the number of features is much greater than the number of samples, the model is prone to overfitting during training. Therefore, the principal component analysis algorithm is used to reduce the feature dimensionality and select K principal features as the input of the data set. The main idea of PCA is to project high-dimensional data onto the directions of largest variance, so that as much information as possible is retained after dimensionality reduction.

This paper uses the cumulative contribution rate of the first K eigenvalues, η = Σ_{i=1}^{K} λ_i / Σ_{i=1}^{d} λ_i, to measure how much information is retained after dimensionality reduction.

Through scikit-learn processing, the relationship between the retained-information rate after dimensionality reduction and the number of principal components K is obtained from the PCA model, as shown in the figure below.

Relationship between reduction rate and K characteristics

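The dimensionality-reduction step above can be sketched with scikit-learn. The snippet below is a minimal illustration using synthetic data of the same shape as the paper's (400 samples, 4096 features); the real ORL images can be loaded with scikit-learn's `fetch_olivetti_faces`. Asking `PCA` for a float `n_components` keeps just enough components to retain that fraction of the variance, matching the paper's 98% retention criterion:

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative stand-in for the ORL data: 400 samples, 4096 features each.
rng = np.random.RandomState(7)
X = rng.rand(400, 4096)

# Keep enough components to retain 98% of the variance, as in the paper.
pca = PCA(n_components=0.98, svd_solver="full")
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                       # (400, K) with K far below 4096
print(pca.explained_variance_ratio_.sum())   # at least 0.98
```

The number of retained components K depends on the data; on the real face images the paper obtains K = 195.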

Supervised learning is the most widely used branch of machine learning in industry, and classification and regression are its main methods. The linear classifier is the most basic and commonly used machine learning model. This paper uses linear logistic regression to classify and recognize faces.

The prediction function of linear regression is:

h_θ(x) = θ^T x

where h_θ(x) is the prediction function, x is the feature vector, and θ is the parameter vector. To handle the classification problem, the value of the function should fall in [0, 1], so this paper introduces the Sigmoid function:

g(z) = 1 / (1 + e^{−z})

Combined with the linear regression prediction function, the hypothesis of logistic regression is:

h_θ(x) = g(θ^T x) = 1 / (1 + e^{−θ^T x})

For a simple binary classification into class A or class B, the Sigmoid output can be used as the probability density function of the sample, and the classification result can be expressed as a probability:

P(y = 1 | x; θ) = h_θ(x),  P(y = 0 | x; θ) = 1 − h_θ(x)

The cost function describes the difference between the predicted value and the true value of the model. For multiple data samples, the average of the per-sample costs is taken, denoted J(θ). Based on maximum likelihood estimation, the cost function J(θ) is:

J(θ) = −(1/m) Σ_{i=1}^{m} [ y^{(i)} log h_θ(x^{(i)}) + (1 − y^{(i)}) log(1 − h_θ(x^{(i)})) ]
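The Sigmoid and cost-function definitions above can be written directly in NumPy. The tiny data set below is purely hypothetical, chosen only to show that a sensible θ yields a lower cost than θ = 0:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Cross-entropy cost J(theta), averaged over the m samples."""
    m = len(y)
    h = sigmoid(X @ theta)  # h_theta(x) for every sample at once
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Hypothetical toy data: 4 samples, 2 features, two separable classes.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, 0, 0])
theta = np.array([0.5, 0.5])

print(cost(theta, X, y))          # lower than the cost at theta = 0
print(cost(np.zeros(2), X, y))    # log(2) when h = 0.5 everywhere
```

Gradient descent on J(θ) (or scikit-learn's `LogisticRegression`, which the experiments use) then finds the θ that minimizes this cost.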

Linear Discriminant Analysis (LDA)^{[5, 6]}, also known as Fisher Linear Discriminant (FLD), was introduced into the field of machine learning by Belhumeur. LDA is a supervised dimensionality-reduction technique that can both reduce dimensionality and classify; it projects data features from a high-dimensional space to a low-dimensional space. The core idea is that, after projection, the projected points of samples of the same category should be as close as possible, while the distance between the centers of different categories should be as large as possible.

The within-class scatter matrix is:

S_w = Σ_{j=1}^{c} Σ_{x ∈ X_j} (x − μ_j)(x − μ_j)^T

where μ_j is the mean vector of class j.

The between-class scatter matrix is:

S_b = Σ_{j=1}^{c} N_j (μ_j − μ)(μ_j − μ)^T

where N_j is the number of samples in class j and μ is the overall mean vector.
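The two scatter matrices can be computed directly from their definitions, and scikit-learn's `LinearDiscriminantAnalysis` performs the projection and classification. The sketch below uses hypothetical two-class toy data, not the paper's face images:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical toy data: two well-separated 3-D Gaussian classes.
rng = np.random.RandomState(7)
X = np.vstack([rng.randn(20, 3) + 2.0, rng.randn(20, 3) - 2.0])
y = np.array([0] * 20 + [1] * 20)

# Within-class scatter: S_w = sum_j sum_{x in X_j} (x - mu_j)(x - mu_j)^T
mus = [X[y == c].mean(axis=0) for c in (0, 1)]
S_w = sum((X[y == c] - mus[c]).T @ (X[y == c] - mus[c]) for c in (0, 1))

# Between-class scatter: S_b = sum_j N_j (mu_j - mu)(mu_j - mu)^T
mu = X.mean(axis=0)
S_b = sum(20 * np.outer(m - mu, m - mu) for m in mus)

lda = LinearDiscriminantAnalysis(n_components=1)
X_proj = lda.fit_transform(X, y)   # project 3-D data down to 1-D
print(lda.score(X, y))             # classification accuracy on the toy data
```

With c classes, LDA can project to at most c − 1 dimensions, which is why `n_components=1` is the maximum for this two-class example.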

Among the N training samples, find the k nearest neighbors of the test sample x. Suppose the data set has c categories ω_1, ω_2, …, ω_c, and the numbers of the k neighbors of x that belong to each category are k_1, k_2, …, k_c; x is then assigned to the category ω_i with the largest k_i.

The core idea of the K-nearest neighbor algorithm ^{[7]} is to calculate the distance between an unlabeled sample and each sample in the data set, take the K nearest samples, and let these K neighbors vote to decide the class of the unlabeled sample.

KNN classification steps:

Assuming that x_test is an unlabeled sample and x_train is the set of labeled samples, the algorithm proceeds as follows:

K nearest neighbor algorithm

For distance measurement, the Euclidean distance, Manhattan distance, or Chebyshev distance is usually used. The Euclidean distance is the most common: for two points a = (a_1, …, a_n) and b = (b_1, …, b_n) in n-dimensional Euclidean space,

d(a, b) = √( Σ_{i=1}^{n} (a_i − b_i)² )
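The KNN steps above — compute Euclidean distances, take the K smallest, and vote — can be sketched in a few lines. The training points below are hypothetical, chosen only to make the vote unambiguous:

```python
import numpy as np
from collections import Counter

def knn_predict(x_test, x_train, y_train, k=3):
    """Classify x_test by majority vote among its k nearest neighbors."""
    # 1. Euclidean distance from x_test to every training sample.
    dists = np.sqrt(((x_train - x_test) ** 2).sum(axis=1))
    # 2. Indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    # 3. Majority vote among the k neighbor labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Hypothetical labeled samples: two clusters near (0,0) and (5,5).
x_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
y_train = np.array([0, 0, 1, 1])

print(knn_predict(np.array([0.2, 0.1]), x_train, y_train, k=3))  # class 0
```

In the experiments, scikit-learn's `KNeighborsClassifier` implements the same procedure with efficient neighbor search structures.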

Support Vector Machine (SVM) is a supervised classification model whose goal is to find a separating hyperplane that maximizes the margin between the two classes.

Classification model diagram

Let this plane be represented by g(x) = 0 and its normal vector by w. The actual distance between a point and the plane is r, and the point's distance from the plane can be measured by the absolute value of g(x) (called the functional margin).

A penalty function is a kind of restriction function: in constrained nonlinear programming, the constraint term added to the objective is called the penalty function, and its coefficient is called the penalty factor. The objective function of the soft-margin support vector machine with penalty factor C is:

min_{w,b,ξ} (1/2)‖w‖² + C Σ_{i=1}^{m} ξ_i
s.t. y_i (w^T x_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, …, m

Classification model diagram after introducing the penalty factor C
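The soft-margin objective above is what scikit-learn's `SVC` optimizes; its `C` parameter is exactly the penalty factor, with larger C tolerating fewer margin violations. A minimal sketch on hypothetical, linearly separable toy data:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two linearly separable clusters.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Linear kernel; C is the penalty factor from the soft-margin objective.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.predict([[0.5, 0.5], [3.5, 3.5]]))  # [0 1]
print(clf.support_vectors_)                   # points on the margin
```

On the face data, nonlinear kernels such as RBF can be substituted via the `kernel` parameter without changing the rest of the code.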

Advantages of SVM:

Naive Bayes is a classification method based on Bayes' theorem together with the assumption of conditional independence between features.

For C_k ∈ {C_1, C_2, …, C_m}, only the posterior probabilities of the m categories are required, and the sample is assigned to the category with the largest value. Bayes' theorem gives:

P(C_k | x) = P(x | C_k) P(C_k) / P(x)

Because x has n feature components, and for a given data set P(C_k) and P(x) are fixed values, the joint probability formula together with the conditional-independence assumption gives:

P(x | C_k) = P(x_1 | C_k) P(x_2 | C_k) ⋯ P(x_n | C_k) = Π_{i=1}^{n} P(x_i | C_k)
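For continuous features such as PCA components, each per-feature likelihood P(x_i | C_k) is typically modeled as a Gaussian, which is what scikit-learn's `GaussianNB` does. A sketch on hypothetical two-class toy data:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data: two Gaussian clusters centered at +3 and -3.
rng = np.random.RandomState(7)
X = np.vstack([rng.randn(30, 2) + 3.0, rng.randn(30, 2) - 3.0])
y = np.array([0] * 30 + [1] * 30)

# GaussianNB fits a per-feature Gaussian P(x_i | C_k) for each class and
# multiplies them under the conditional-independence assumption.
nb = GaussianNB().fit(X, y)

probs = nb.predict_proba([[3.0, 3.0]])[0]  # posteriors P(C_k | x)
print(probs.argmax())                      # predicted class index
print(probs.sum())                         # posteriors sum to 1
```

The predicted class is simply the argmax of the posterior, consistent with the decision rule above.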

The data set in this paper is the ORL face data set of Cambridge University, 400 photos in total. After experimental analysis, PCA is set to retain 98% of the information after dimensionality reduction, and 195 principal components of each picture are selected as the data input. To make the experimental results generalize better, 80% of the data set is used as the training set and 20% as the test set, and 10-fold cross-validation is used during model training. To ensure that every run uses the same data split, a fixed random seed of 7 is set. For the evaluation standard, this paper selected the precision (P), recall (R), F1 value, and the AUC area. Relative to the prediction results, the precision indicates how many of the samples predicted as positive are truly positive. There are two possibilities: a positive sample is predicted as positive (TP), or a negative sample is predicted as positive (FP). The formula is:

P = TP / (TP + FP)

The recall refers to the original samples: it indicates how many of the positive samples were predicted correctly. There are also two possibilities: a positive sample is predicted as positive (TP), or a positive sample is predicted as negative (FN). The formula is:

R = TP / (TP + FN)

F1 combines the precision and the recall; a higher F1 indicates a more effective method:

F1 = 2PR / (P + R)
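The evaluation pipeline described above — PCA retaining 98% of the variance, an 80/20 split with random seed 7, and precision/recall/F1 scoring — can be sketched end to end. Synthetic data stands in for the ORL images here, and logistic regression stands in for the five classifiers compared in the paper:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score

# Synthetic stand-in for the face data (the paper uses the ORL set).
rng = np.random.RandomState(7)
X = np.vstack([rng.randn(50, 100) + 1.0, rng.randn(50, 100) - 1.0])
y = np.array([0] * 50 + [1] * 50)

# 80/20 split with the fixed random seed of 7, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=7)

# Fit PCA on the training set only, then transform both splits.
pca = PCA(n_components=0.98).fit(X_tr)
clf = LogisticRegression(max_iter=1000).fit(pca.transform(X_tr), y_tr)
y_pred = clf.predict(pca.transform(X_te))

print(precision_score(y_te, y_pred))
print(recall_score(y_te, y_pred))
print(f1_score(y_te, y_pred))
```

Swapping `LogisticRegression` for `LinearDiscriminantAnalysis`, `SVC`, `KNeighborsClassifier`, or `GaussianNB` reproduces the five PCA+classifier combinations compared in the table below; on the multi-class ORL data the metrics would additionally need an averaging mode such as `average="macro"`.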

THE COMPARISON OF EXPERIMENTAL RESULTS OF FIVE CLASSIFICATION ALGORITHMS

| Metric | PCA+LR | PCA+LDA | PCA+SVM | PCA+KNN | PCA+NB |
|---|---|---|---|---|---|
| Precision (%) | 99 | 98 | 94 | 59 | 91 |
| Recall (%) | 99 | 96 | 91 | 45 | 85 |
| F1-score (%) | 99 | 96 | 91 | 44 | 85 |

Face recognition technology [