cdgore/mlspark

ML Spark

Machine learning algorithms implemented in Scala on Spark

Currently 4 models are included:

  • Gaussian Naive Bayes – Naive Bayes classifier for continuous features. Assumes the likelihoods follow a Gaussian distribution, $P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$. The posterior for each class is estimated from the class prior probability and the likelihoods of all features for that class, $P(y \mid x) \propto P(y) \prod_i P(x_i \mid y)$.
  • K Means – Performs k-means clustering on data samples labeled by class. The distance function distMeasure may be either euclidean (default) or cosine. Distance functions are passed internally as partially defined functions for extensibility. The mean and standard deviation of each cluster are calculated and recorded, which is useful for generating radial basis functions based on distance from the clusters.
  • Logistic Regression – Binary logistic regression classifier with L2 regularization. The loss function is minimized with gradient descent.
  • Softmax Logistic Regression – Multi-class logistic regression with optional regularization: L1, L1 with clipping, L2, or none (default). Regularization gradient update functions are specified and passed as partials for extensibility.
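
The Gaussian Naive Bayes description can be illustrated with a small local sketch. It is not this repository's Spark code: plain Scala collections stand in for RDDs, and the names `GaussianNBSketch`, `ClassStats`, `fit` and `predict` are hypothetical. It fits a prior, mean and variance per class, then picks the class maximizing $\log P(y) + \sum_i \log P(x_i \mid y)$. The small `epsilon` added to each variance is an assumption to keep the density defined.

```scala
// Sketch only: plain Scala collections stand in for Spark RDDs, and every
// name here (GaussianNBSketch, ClassStats, fit, predict) is hypothetical.
object GaussianNBSketch {

  // Class prior P(y) plus a per-feature mean and variance for that class.
  final case class ClassStats(prior: Double, means: Array[Double], variances: Array[Double])

  // log P(x_i | y) for a Gaussian with mean mu and variance sigma2.
  def logGaussianPdf(x: Double, mu: Double, sigma2: Double): Double =
    -0.5 * math.log(2 * math.Pi * sigma2) - math.pow(x - mu, 2) / (2 * sigma2)

  // P(x_i | y), the Gaussian likelihood from the model list.
  def gaussianPdf(x: Double, mu: Double, sigma2: Double): Double =
    math.exp(logGaussianPdf(x, mu, sigma2))

  // Estimate priors, means and variances from (label, features) pairs.
  // epsilon keeps variances strictly positive (an assumption, not from the repo).
  def fit(data: Seq[(Int, Array[Double])], epsilon: Double = 1e-9): Map[Int, ClassStats] = {
    val n = data.size.toDouble
    data.groupBy(_._1).map { case (label, rows) =>
      val xs = rows.map(_._2)
      val d = xs.head.length
      val means = Array.tabulate(d)(j => xs.map(_(j)).sum / xs.size)
      val variances = Array.tabulate(d) { j =>
        xs.map(x => math.pow(x(j) - means(j), 2)).sum / xs.size + epsilon
      }
      label -> ClassStats(rows.size / n, means, variances)
    }
  }

  // log P(y) + sum_i log P(x_i | y): the log of the unnormalized posterior.
  def logPosterior(stats: ClassStats, x: Array[Double]): Double =
    math.log(stats.prior) + x.indices.map { i =>
      logGaussianPdf(x(i), stats.means(i), stats.variances(i))
    }.sum

  // Pick the class whose posterior is largest.
  def predict(model: Map[Int, ClassStats], x: Array[Double]): Int =
    model.maxBy { case (_, stats) => logPosterior(stats, x) }._1
}
```

Working in log space avoids the underflow that multiplying many small likelihoods directly can cause; exponentiating `logPosterior` recovers the unnormalized posterior if it is needed.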
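
The K Means entry describes pluggable distance functions and per-cluster means and standard deviations. A minimal local sketch of that idea, assuming standard Lloyd iterations from caller-supplied initial centers, might look like this. `KMeansSketch`, `Cluster`, `fit` and the `Distance` type are hypothetical names, and how the repository uses the class labels is not shown here. With cosine distance this sketch still uses the arithmetic mean as the centroid, a common simplification.

```scala
// Sketch only: local Lloyd iterations on Scala collections, not the repo's Spark code.
// KMeansSketch, Cluster, Distance and fit are hypothetical names.
object KMeansSketch {
  type Point = Array[Double]
  type Distance = (Point, Point) => Double

  // Distance functions are plain values, so callers can plug in their own.
  val euclidean: Distance = (a, b) =>
    math.sqrt(a.indices.map(i => math.pow(a(i) - b(i), 2)).sum)

  // Cosine distance: 1 minus the cosine similarity.
  val cosine: Distance = (a, b) => {
    val dot = a.indices.map(i => a(i) * b(i)).sum
    val normA = math.sqrt(a.map(v => v * v).sum)
    val normB = math.sqrt(b.map(v => v * v).sum)
    1.0 - dot / (normA * normB)
  }

  // Per-cluster mean (centroid) and per-dimension standard deviation.
  final case class Cluster(mean: Point, stdDev: Point)

  private def meanOf(ps: Seq[Point]): Point =
    Array.tabulate(ps.head.length)(j => ps.map(_(j)).sum / ps.size)

  private def stdDevOf(ps: Seq[Point], mu: Point): Point =
    Array.tabulate(mu.length)(j => math.sqrt(ps.map(p => math.pow(p(j) - mu(j), 2)).sum / ps.size))

  // Index of the nearest center for every point.
  private def assign(points: Seq[Point], centers: Seq[Point], dist: Distance): Seq[Int] =
    points.map(p => centers.indices.minBy(k => dist(p, centers(k))))

  private def membersOf(points: Seq[Point], labels: Seq[Int], k: Int): Seq[Point] =
    points.zip(labels).collect { case (p, l) if l == k => p }

  def fit(points: Seq[Point], initial: Seq[Point], iterations: Int,
          distMeasure: Distance = euclidean): Seq[Cluster] = {
    var centers = initial
    for (_ <- 1 to iterations) {
      val labels = assign(points, centers, distMeasure)
      // Move each center to the mean of its members; an empty cluster keeps its center.
      centers = centers.indices.map { k =>
        val members = membersOf(points, labels, k)
        if (members.isEmpty) centers(k) else meanOf(members)
      }
    }
    // Final assignment: record each cluster's mean and standard deviation.
    val labels = assign(points, centers, distMeasure)
    centers.indices.map { k =>
      val members = membersOf(points, labels, k)
      if (members.isEmpty) Cluster(centers(k), Array.fill(centers(k).length)(0.0))
      else { val mu = meanOf(members); Cluster(mu, stdDevOf(members, mu)) }
    }
  }
}
```

Passing `distMeasure = KMeansSketch.cosine` switches the metric without touching the clustering loop. The recorded `stdDev` could then scale a radial basis function such as $\exp(-d^2 / (2\sigma^2))$ around each mean.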
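
For the binary Logistic Regression entry, a minimal sketch of L2-regularized batch gradient descent is shown below. It assumes the objective is the mean log-loss plus $\frac{\lambda}{2}\lVert w \rVert^2$ and labels in $\{0, 1\}$; the repository's exact loss scaling, bias handling and step schedule may differ. `LogisticRegressionSketch`, `train`, `predictProb`, `lambda` and `learningRate` are hypothetical names.

```scala
// Sketch only: batch gradient descent on plain Scala collections, not the repo's
// Spark code. LogisticRegressionSketch, train and predictProb are hypothetical names.
object LogisticRegressionSketch {
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  def dot(w: Array[Double], x: Array[Double]): Double =
    w.indices.map(i => w(i) * x(i)).sum

  // Minimizes mean log-loss + (lambda / 2) * ||w||^2 over labels y in {0, 1}.
  // There is no separate bias: append a constant 1.0 feature if one is needed
  // (it is then regularized like any other weight, a simplification).
  def train(data: Seq[(Double, Array[Double])], lambda: Double,
            learningRate: Double, iterations: Int): Array[Double] = {
    val d = data.head._2.length
    val n = data.size.toDouble
    var w = Array.fill(d)(0.0)
    for (_ <- 1 to iterations) {
      // Gradient of the mean log-loss: (sigmoid(w . x) - y) * x, averaged over samples.
      val grad = Array.fill(d)(0.0)
      for ((y, x) <- data) {
        val err = sigmoid(dot(w, x)) - y
        for (j <- 0 until d) grad(j) += err * x(j) / n
      }
      // Step along the negative gradient; lambda * w is the gradient of the L2 term.
      w = Array.tabulate(d)(j => w(j) - learningRate * (grad(j) + lambda * w(j)))
    }
    w
  }

  // Estimated P(y = 1 | x).
  def predictProb(w: Array[Double], x: Array[Double]): Double = sigmoid(dot(w, x))
}
```

The L2 term shrinks every weight toward zero in proportion to its size, which is what keeps the weights finite on linearly separable data.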
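
The Softmax Logistic Regression entry mentions regularization gradient updates passed in as functions. A hedged sketch of that design follows: each regularizer is a function value applied after the data-gradient step. The names `SoftmaxSketch`, `RegUpdate`, `noReg`, `l1`, `l1Clipped` and `l2` are hypothetical, and reading "L1 with clipping" as "the L1 step may not push a weight past zero" is an assumption about the repository's meaning.

```scala
// Sketch only: local softmax regression, not the repo's Spark code.
// SoftmaxSketch, RegUpdate and the regularizer names are hypothetical.
object SoftmaxSketch {
  // A regularization step: (weight, learningRate) => regularized weight.
  type RegUpdate = (Double, Double) => Double

  val noReg: RegUpdate = (w, _) => w

  def l2(lambda: Double): RegUpdate = (w, lr) => w - lr * lambda * w

  // Plain L1 subgradient step; it can overshoot past zero.
  def l1(lambda: Double): RegUpdate = (w, lr) => w - lr * lambda * math.signum(w)

  // Assumed meaning of "clipping": the L1 step stops at zero instead of crossing it.
  def l1Clipped(lambda: Double): RegUpdate = (w, lr) =>
    if (w > 0) math.max(0.0, w - lr * lambda)
    else if (w < 0) math.min(0.0, w + lr * lambda)
    else 0.0

  // Numerically stable softmax: subtract the max score before exponentiating.
  def softmax(z: Array[Double]): Array[Double] = {
    val m = z.max
    val e = z.map(v => math.exp(v - m))
    val s = e.sum
    e.map(_ / s)
  }

  // weights(k) is the weight vector for class k.
  def scores(weights: Array[Array[Double]], x: Array[Double]): Array[Double] =
    weights.map(wk => wk.indices.map(j => wk(j) * x(j)).sum)

  def train(data: Seq[(Int, Array[Double])], numClasses: Int, learningRate: Double,
            iterations: Int, reg: RegUpdate = noReg): Array[Array[Double]] = {
    val d = data.head._2.length
    val n = data.size.toDouble
    var weights = Array.fill(numClasses, d)(0.0)
    for (_ <- 1 to iterations) {
      // Cross-entropy gradient: (p_k - 1{k == y}) * x_j, averaged over samples.
      val grad = Array.fill(numClasses, d)(0.0)
      for ((y, x) <- data) {
        val p = softmax(scores(weights, x))
        for (k <- 0 until numClasses; j <- 0 until d) {
          val indicator = if (k == y) 1.0 else 0.0
          grad(k)(j) += (p(k) - indicator) * x(j) / n
        }
      }
      // Data-gradient step, then the pluggable regularization step.
      weights = Array.tabulate(numClasses, d) { (k, j) =>
        reg(weights(k)(j) - learningRate * grad(k)(j), learningRate)
      }
    }
    weights
  }

  // Predicted class: index of the highest score.
  def predict(weights: Array[Array[Double]], x: Array[Double]): Int = {
    val s = scores(weights, x)
    s.indices.maxBy(s(_))
  }
}
```

Because `train` only sees a `RegUpdate`, adding another penalty (for example an elastic-net mix) means writing one more small function rather than changing the training loop.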
