Increase Speed for SVM with Polynomial Kernel
Solution 1:
SVM training time scales quadratically with the number of samples, or worse. For O(n^2), the time is proportional to c * n^2.
Your model configuration takes about 20 seconds with 100k samples on my machine, giving a constant of around c = 2e-9. So the expected training time for 56 010 395 samples is about 72 days, probably significantly more.
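The same quadratic model can be turned around to ask how many samples fit into a given time budget. This is only a rough sketch: it assumes t = c * n^2 holds exactly with the c = 2e-9 measured above, and the function names are made up for illustration.

```python
import math

C_MEASURED = 2e-9  # seconds per sample^2, from the ~20 s / 100k-sample timing


def estimated_seconds(n_samples, c=C_MEASURED):
    """Training time under the assumed model t = c * n^2."""
    return c * n_samples ** 2


def max_samples_for_budget(budget_seconds, c=C_MEASURED):
    """Largest n for which c * n^2 stays within budget_seconds."""
    return int(math.sqrt(budget_seconds / c))


full_days = estimated_seconds(56_010_395) / (3600 * 24)
one_hour_n = max_samples_for_budget(3600)
print(f"full dataset: about {full_days:.1f} days")
print(f"one-hour budget: about {one_hour_n:,} samples")
```

Since the real scaling can be worse than quadratic, treat the sample count as an upper bound rather than a promise.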
So either subsample your dataset or use another classifier. A small multilayer perceptron can give expressiveness similar to an SVM with a polynomial kernel, and it can be trained in mini-batches using SGD. Hinge loss is the same kind of loss that an SVM uses.
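A minimal sketch of the mini-batch idea, using scikit-learn's MLPClassifier and its partial_fit method. Two caveats: MLPClassifier optimizes log-loss, not hinge loss, and its default solver is Adam rather than plain SGD, so this only illustrates the mini-batch training part. The layer sizes, learning rate, batch size and the helper name train_mlp_minibatch are illustrative choices, not tuned values.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def train_mlp_minibatch(X, y, batch_size=100, epochs=50, seed=0):
    """Train a small MLP one mini-batch at a time with partial_fit."""
    model = MLPClassifier(hidden_layer_sizes=(16, 16), learning_rate_init=0.01,
                          random_state=seed)
    classes = np.unique(y)  # partial_fit needs the full label set on the first call
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        order = rng.permutation(len(X))  # reshuffle the samples every epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            model.partial_fit(X[idx], y[idx], classes=classes)
    return model


# Small synthetic dataset, same generator as the timing script
X, y = make_moons(n_samples=2000, noise=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
mlp = train_mlp_minibatch(X_train, y_train)
mlp_accuracy = mlp.score(X_test, y_test)
print("test accuracy:", mlp_accuracy)
```

Because each call only sees one batch, the batches could just as well be read from disk in chunks, so the full dataset never has to fit in memory.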
By the way, you basically always need to optimize the hyperparameter C for an SVM. The best practice is to do 5-fold cross-validation in a grid search, so you should plan to train at least 50 models...
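For a sense of where the 50 comes from, here is a sketch of such a search with scikit-learn's GridSearchCV: 10 candidate values of C times 5 folds gives 50 fits. The range of C values and the small synthetic dataset are assumptions chosen so the example runs quickly.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Small synthetic subsample so that all 50 fits finish quickly
X, y = make_moons(n_samples=500, noise=0.1, random_state=1)

param_grid = {'C': np.logspace(-2, 2, 10)}  # 10 candidate values (illustrative range)
search = GridSearchCV(
    SVC(kernel='poly', degree=3, gamma='auto'),
    param_grid,
    cv=5,  # 5 folds x 10 candidates = 50 fits, plus one final refit
)
search.fit(X, y)
print('best C:', search.best_params_['C'])
print('mean CV accuracy:', search.best_score_)
```

Every one of those fits pays the same quadratic cost, which is why running the search on a subsample is usually the only practical option at this data size.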
import time
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
import numpy
import pandas


def time_training(estimator, n_samples):
    # Synthetic two-class data; the 2 columns are duplicated to get 4 features
    X, y = make_moons(n_samples=n_samples, noise=0.1, random_state=1)
    X = numpy.concatenate([X, X], axis=1)
    assert (X.shape[1] == 4), X.shape
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # Time only the call to fit()
    start = time.time()
    estimator.fit(X_train, y_train)
    end = time.time()
    t = end - start
    print('took', n_samples, t)
    return t


def main():
    model = SVC(kernel='poly', degree=3, C=1.0, gamma='auto')
    sizes = numpy.array((100, 1e3, 1e4, 2e4, 4e4, 6e4, 1e5, 1.1e5, 1.2e5)).astype(int)
    times = [time_training(model, s) for s in sizes]
    df = pandas.DataFrame({
        'samples': sizes,
        'time': times,
    })
    df.to_csv('temp/svmtrain.csv')


if __name__ == '__main__':
    main()

[jon@jon-thinkpad ~]$ python3 temp/svm-training-time.py
took 100 0.0006172657012939453
took 1000 0.00444340705871582
took 10000 0.26808977127075195
took 20000 1.1068146228790283
took 40000 3.8822362422943115
took 60000 8.051671743392944
took 100000 20.05191993713379
took 110000 36.83517003059387
took 120000 61.012284994125366

>>> 0.26/(10000**2)
2.6e-09
>>> 20/(100000**2)
2e-09
>>> 2e-9*(56e6**2)/(3600*24)
72.5925925925926
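
The quadratic assumption can also be checked against the measurements instead of taken for granted. This sketch fits a power law t = c * n^k to the timings printed above (rounded, single runs on one machine, so the result is only indicative). The fit itself is my addition, not part of the original measurement.

```python
import numpy as np

# (n_samples, seconds) pairs copied from the timing run above, rounded
samples = np.array([100, 1000, 10000, 20000, 40000, 60000, 100000, 110000, 120000])
seconds = np.array([0.000617, 0.00444, 0.268, 1.107, 3.882, 8.052, 20.05, 36.84, 61.01])

# A power law t = c * n^k is a straight line in log-log space:
# log(t) = k * log(n) + log(c)
k, log_c = np.polyfit(np.log(samples), np.log(seconds), deg=1)
c = np.exp(log_c)
print(f"fitted exponent k = {k:.2f}, constant c = {c:.2e}")
```

Note that from 100k to 120k samples the time roughly triples while n^2 grows by only 1.44x, which is in line with the "or worse" caveat.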