MMLSpark - Microsoft Machine Learning for Apache Spark
MMLSpark provides a number of deep learning and data science tools for Apache Spark, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK) and OpenCV. This lets you quickly create powerful, highly scalable predictive and analytical models for large image and text datasets.
MMLSpark requires Scala 2.11, Spark 2.1+, and either Python 2.7 or Python 3.5+. See the API documentation for Scala and for PySpark.
Its features include:
- Easily ingest images from HDFS into Spark
- Pre-process image data using transforms from OpenCV
- Featurize images with pre-trained deep neural nets via CNTK
- Use pre-trained bidirectional LSTMs from Keras for medical entity extraction
- Train DNN-based image classification models on N-Series GPU VMs on Azure
- Train classification and regression models easily via implicit featurization of data
- Compute a rich set of evaluation metrics including per-instance metrics
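To make the last item concrete, here is a minimal sketch of how a per-instance metric differs from an aggregate one. It is plain Python and does not use MMLSpark's API; the function names, the sample data, and the choice of log loss as the per-row metric are all assumptions made for illustration.

```python
import math

def per_instance_log_loss(y_true, p_pos, eps=1e-15):
    """Return one log-loss value per row for binary labels (0 or 1).

    Hypothetical helper for illustration; not part of MMLSpark.
    """
    losses = []
    for y, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        losses.append(-math.log(p) if y == 1 else -math.log(1.0 - p))
    return losses

def accuracy(y_true, p_pos, threshold=0.5):
    """Aggregate metric: share of rows whose thresholded prediction matches the label."""
    hits = sum(1 for y, p in zip(y_true, p_pos) if (p >= threshold) == (y == 1))
    return hits / len(y_true)

# Made-up labels and predicted probabilities of the positive class.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.7]

losses = per_instance_log_loss(labels, probs)
# Index of the row the model got most confidently wrong.
worst = max(range(len(losses)), key=losses.__getitem__)

print("accuracy:", accuracy(labels, probs))
print("per-row log loss:", [round(v, 3) for v in losses])
print("worst row index:", worst)
```

The aggregate number summarizes the whole dataset, while the per-row values show which individual examples contribute most to the error, which is what makes them useful for debugging a model.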
https://github.com/Azure/mmlspark