
Increase Speed For Svm With Polynomial Kernel

I am new to machine learning. I am using Support Vector Machines (SVM) with 'polynomial' kernel for multi-class classification. My dataset size is (56010395, 4) in the form of (no

Solution 1:

SVM training time scales at least quadratically with the number of samples. For O(n^2) scaling, the time is proportional to c * n^2. Your model configuration takes about 20 seconds with 100k samples on my machine, giving a constant of around c = 2e-9. So the expected training time for 56 010 395 samples is about 72 days, probably significantly more.

So either subsample your dataset, or use another classifier. You can use a small Multilayer Perceptron to get an expressiveness similar to an SVM with a polynomial kernel. It can be trained with mini-batches using SGD. Hinge loss is the same kind of loss that an SVM uses.
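As a rough illustration of mini-batch training with hinge loss, here is a sketch using scikit-learn's SGDClassifier on explicit degree-3 polynomial features. Note that this is a linear model on expanded features, swapped in for the small MLP because it optimizes hinge loss directly; it is not equivalent to SVC with a polynomial kernel. The helper name train_minibatch_hinge, the batch size, the epoch count, and the size of the subsample used to fit the scaler are all assumptions for illustration, not tuned values.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def train_minibatch_hinge(X, y, batch_size=1000, epochs=5, seed=0):
    """Train a hinge-loss linear model on cubic features, one mini-batch at a time.

    Hypothetical helper for illustration; returns (poly, scaler, clf).
    """
    rng = np.random.default_rng(seed)
    poly = PolynomialFeatures(degree=3, include_bias=False)
    scaler = StandardScaler()

    # Fit the feature expansion and the scaler on a random subsample only,
    # so the full dataset never has to be expanded in memory at once.
    sample = rng.choice(len(X), size=min(len(X), 10000), replace=False)
    scaler.fit(poly.fit_transform(X[sample]))

    clf = SGDClassifier(loss='hinge', alpha=1e-4, random_state=seed)
    classes = np.unique(y)  # partial_fit needs the full label set on the first call

    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            X_batch = scaler.transform(poly.transform(X[idx]))
            clf.partial_fit(X_batch, y[idx], classes=classes)
    return poly, scaler, clf


if __name__ == '__main__':
    X, y = make_moons(n_samples=20000, noise=0.1, random_state=1)
    poly, scaler, clf = train_minibatch_hinge(X[:15000], y[:15000])
    X_test = scaler.transform(poly.transform(X[15000:]))
    print('held-out accuracy:', clf.score(X_test, y[15000:]))
```

Each pass touches every sample once, so the cost per epoch grows linearly with the number of samples rather than quadratically.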

Btw, you basically always need to optimize the hyperparameter C for SVM. The best-practice way is to do 5-fold cross-validation in a grid search. So you should plan to train at least 50 models (for example, 10 candidate values of C times 5 folds)...
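A minimal sketch of such a search with scikit-learn's GridSearchCV, run on a small synthetic dataset so it finishes quickly. The helper name tune_C, the grid of 10 values between 1e-2 and 1e2, and the 300-sample dataset are assumptions chosen for illustration; a sensible grid depends on your data.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def tune_C(X, y, c_values, n_folds=5):
    """Grid-search C for a cubic-polynomial SVC using k-fold cross-validation."""
    search = GridSearchCV(
        SVC(kernel='poly', degree=3, gamma='auto'),
        param_grid={'C': list(c_values)},
        cv=n_folds,
    )
    search.fit(X, y)
    return search


if __name__ == '__main__':
    # Small dataset on purpose: every candidate is fitted once per fold.
    X, y = make_moons(n_samples=300, noise=0.1, random_state=1)
    search = tune_C(X, y, np.logspace(-2, 2, 10))
    n_fits = len(search.cv_results_['params']) * search.n_splits_
    print('cross-validation fits:', n_fits)
    print('best C:', search.best_params_['C'])
```

With 10 candidates and 5 folds this performs 50 cross-validation fits, plus one final refit on the full data, which is why the per-model training time matters so much.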

import time

from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
import numpy
import pandas

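def time_training(estimator, n_samples):
    """Fit `estimator` on a synthetic dataset and return the fit time in seconds."""
    X, y = make_moons(n_samples=n_samples, noise=0.1, random_state=1)
    # Duplicate the two moon features to get 4 columns, like the question's data
    X = numpy.concatenate([X, X], axis=1)
    assert (X.shape[1] == 4), X.shape

    # Only the 70% training split is used for fitting
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    start = time.time()
    estimator.fit(X_train, y_train)
    end = time.time()
    t = end - start
    print('took', n_samples, t)
    return t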

def main():
    model = SVC(kernel='poly', degree=3, C=1.0, gamma='auto')
    sizes = numpy.array((100, 1e3, 1e4, 2e4, 4e4, 6e4, 1e5, 1.1e5, 1.2e5)).astype(int)
    times = [ time_training(model, s) for s in sizes ]

    df = pandas.DataFrame({
        'samples': sizes,
        'time': times,
    })
    df.to_csv('temp/svmtrain.csv')

if __name__ == '__main__':
    main()

[jon@jon-thinkpad ~]$ python3 temp/svm-training-time.py
took 100 0.0006172657012939453
took 1000 0.00444340705871582
took 10000 0.26808977127075195
took 20000 1.1068146228790283
took 40000 3.8822362422943115
took 60000 8.051671743392944
took 100000 20.05191993713379
took 110000 36.83517003059387
took 120000 61.012284994125366
>>> 0.26/(10000**2)
2.6e-09
>>> 20/(100000**2)
2e-09
>>> 2e-9*(56e6**2)/(3600*24)
72.5925925925926
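The same extrapolation can be written as a small function and applied to several of the measured timings, to see how sensitive the estimate is to which measurement you pick. This is only a sketch under the O(n^2) assumption; the function name extrapolate_days is made up for illustration, and the timings are the ones printed by the benchmark run.

```python
SECONDS_PER_DAY = 3600 * 24


def extrapolate_days(n_measured, t_measured, n_target):
    """Assume t = c * n^2, estimate c from one measurement, and extrapolate.

    Returns (c, estimated training time in days for n_target samples).
    """
    c = t_measured / n_measured ** 2
    return c, c * n_target ** 2 / SECONDS_PER_DAY


if __name__ == '__main__':
    # (n_samples, seconds) pairs taken from the benchmark output
    measurements = [
        (10000, 0.26808977127075195),
        (100000, 20.05191993713379),
        (120000, 61.012284994125366),
    ]
    for n, t in measurements:
        c, days = extrapolate_days(n, t, 56010395)
        print(f'n={n}: c={c:.2e}, estimated {days:.0f} days')
```

The implied constant is not stable across the measurements: the largest run gives roughly twice the constant of the 100k run, so the estimates range from roughly 70 to over 150 days. That is consistent with the scaling being worse than quadratic at these sizes.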
