A Guide to Python Machine Learning and Data Science Frameworks
A Beginner’s Guide to Python Machine Learning Frameworks
All libraries below are free, and most are open-source.
Machine Learning
General-purpose Machine Learning
 scikit-learn  machine learning in Python
 Shogun  machine learning toolbox
 xLearn  High Performance, Easy-to-use, and Scalable Machine Learning Package
 Reproducible Experiment Platform (REP)  Machine Learning toolbox for Humans
 modAL  a modular active learning framework for Python3
 Sparkit-learn  PySpark + Scikit-learn = Sparkit-learn
 mlpack  a scalable C++ machine learning library (Python bindings)
 dlib  A toolkit for making real world machine learning and data analysis applications in C++ (Python bindings)
 MLxtend  extension and helper modules for Python’s data analysis and machine learning libraries
 tick  module for statistical learning, with a particular emphasis on time-dependent modelling
 sklearn-extensions  a consolidated package of small extensions to scikit-learn
 civisml-extensions  scikit-learn-compatible estimators from Civis Analytics
 scikit-multilearn  multi-label classification for Python
 tslearn  machine learning toolkit dedicated to time-series data
 seqlearn  a sequence classification toolkit for Python
 pystruct  Simple structured learning framework for Python
 sklearn-expertsys  Highly interpretable classifiers for scikit-learn, producing easily understood decision rules instead of black box models
 skutil  A set of scikit-learn and h2o extension classes (as well as caret classes for Python)
 sklearn-crfsuite  scikit-learn-inspired API for CRFsuite
 RuleFit  implementation of the RuleFit algorithm
 metric-learn  metric learning algorithms in Python
 pyGAM  Generalized Additive Models in Python
 luminol  Anomaly Detection and Correlation library
Automated machine learning
 TPOT  Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming
 auto-sklearn  an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator
 MLBox  a powerful Automated Machine Learning Python library.
Ensemble methods
 ML-Ensemble  high performance ensemble learning
 brew  Python Ensemble Learning API
 Stacking  Simple and useful stacking library, written in Python.
 stacked_generalization  library for machine learning stacking generalization.
 vecstack  Python package for stacking (machine learning technique)
Imbalanced datasets
 imbalanced-learn  module to perform under-sampling and over-sampling with various techniques
 imbalanced-algorithms  Python-based implementations of algorithms for learning on imbalanced data.
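To make the over-sampling idea mentioned above concrete, here is a minimal sketch of random over-sampling in plain Python. It is not the API of imbalanced-learn or of any other library in this list; the function name `random_oversample` and its parameters are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until every class is as
    frequent as the largest one. Hypothetical helper, for illustration only."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Rows of this class in the original data.
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Draw with replacement to make up the shortfall.
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


# Toy data: four samples of class 0 and one sample of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [5.0]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

After resampling, both classes have four samples. The libraries above typically offer more refined strategies than plain duplication, so treat this only as a picture of the basic idea.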
Random Forests
Extreme Learning Machine
 Python-ELM  Extreme Learning Machine implementation in Python
 Python Extreme Learning Machine (ELM)  a machine learning technique used for classification/regression tasks
 hpelm [GPU]  High performance implementation of Extreme Learning Machines (fast randomized neural networks).
Kernel methods
 pyFM  Factorization machines in python
 fastFM  a library for Factorization Machines
 tffm  TensorFlow implementation of an arbitrary order Factorization Machine
 liquidSVM  an implementation of SVMs
 scikit-rvm  Relevance Vector Machine implementation using the scikit-learn API
Gradient boosting
 XGBoost [GPU]  Scalable, Portable and Distributed Gradient Boosting
 LightGBM [GPU]  a fast, distributed, high-performance gradient boosting framework by Microsoft
 CatBoost [GPU]  an open-source library for gradient boosting on decision trees by Yandex
 InfiniteBoost  building infinite ensembles with gradient descent
 TGBoost  Tiny Gradient Boosting Tree
Deep Learning
Keras
 Keras  a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano
 keras-contrib  Keras community contributions
 Hyperas  Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization
 Elephas  Distributed Deep learning with Keras & Spark
 Hera  Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser.
 dist-keras  Distributed Deep Learning, with a focus on distributed training
 Conx  The On-Ramp to Deep Learning
PyTorch
 PyTorch  Tensors and Dynamic neural networks in Python with strong GPU acceleration
 torchvision  Datasets, Transforms and Models specific to Computer Vision
 torchtext  Data loaders and abstractions for text and NLP
 torchaudio  an audio library for PyTorch
 ignite  high-level library to help with training neural networks in PyTorch
 PyToune  a Keras-like framework and utilities for PyTorch
 skorch  a scikit-learn-compatible neural network library that wraps PyTorch
 PyTorchNet  an abstraction to train neural networks
 Aorun  aims to provide a Keras-like API with PyTorch as the backend.
 pytorch_geometric  Geometric Deep Learning Extension Library for PyTorch
TensorFlow
 TensorFlow  Computation using data flow graphs for scalable machine learning by Google
 TensorLayer  Deep Learning and Reinforcement Learning Library for Researchers and Engineers.
 TFLearn  Deep learning library featuring a higher-level API for TensorFlow
 Sonnet  TensorFlow-based neural network library by DeepMind
 TensorForce  a TensorFlow library for applied reinforcement learning
 tensorpack  a Neural Net Training Interface on TensorFlow
 Polyaxon  a platform that helps you build, manage and monitor deep learning models
 Horovod  Distributed training framework for TensorFlow
 tfdeploy  Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running NumPy
 hiptensorflow [AMD]  ROCm/HIP-enabled TensorFlow
 TensorFlow Fold  Deep learning with dynamic computation graphs in TensorFlow
 tensorlm  wrapper library for text generation / language models at char and word level with RNN
 TensorLight  a high-level framework for TensorFlow
 Mesh TensorFlow  Model Parallelism Made Easier
Theano
Warning: Theano development has ceased
 Theano  a Python library that allows you to define, optimize, and evaluate mathematical expressions
 Lasagne  Lightweight library to build and train neural networks in Theano
 nolearn  scikit-learn-compatible neural network library (mainly for Lasagne)
 Blocks  a Theano framework for building and training neural networks
 platoon  Multi-GPU mini-framework for Theano
 NeuPy  a Python library for Artificial Neural Networks and Deep Learning
 scikit-neuralnetwork  Deep neural networks without the learning cliff
 Theano-MPI  MPI Parallel framework for training deep learning models built in Theano
MXNet
 MXNet  Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler
 Gluon  a clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet)
 MXbox  simple, efficient and flexible vision toolbox for mxnet framework.
 gluon-cv  implementations of state-of-the-art deep learning models in computer vision.
 gluon-nlp  NLP made easy
 MXNet [AMD]  HIP port of MXNet
Caffe
 Caffe  a fast open framework for deep learning
 Caffe2  a lightweight, modular, and scalable deep learning framework
 hipCaffe [AMD]  the HIP port of Caffe
CNTK
 CNTK  Microsoft Cognitive Toolkit (CNTK), an open-source deep-learning toolkit
Chainer
 Chainer  a flexible framework for neural networks
 ChainerRL  a deep reinforcement learning library built on top of Chainer.
 ChainerCV  a Library for Deep Learning in Computer Vision
 ChainerMN  scalable distributed deep learning with Chainer
 scikit-chainer  scikit-learn-like interface to Chainer
 chainer_sklearn  Sklearn (scikit-learn) like interface for Chainer
Others
 SKIL  Skymind’s platform for distributed training of machine learning models, tracking machine learning experiments, deploying models to production and managing them over their lifecycle.
 Neon  Intel Nervana™ reference deep learning framework committed to best performance on all hardware
 Tangent  Source-to-Source Debuggable Derivatives in Pure Python
 autograd  Efficiently computes derivatives of numpy code
 Myia  deep learning framework (pre-alpha)
 nnabla  Neural Network Libraries by Sony
Model explanation
 Auralisation  auralisation of learned features in CNN (for audio)
 CapsNetVisualization  a visualization of the CapsNet layers to better understand how it works
 lucid  a collection of infrastructure and tools for research in neural network interpretability.
 Netron  visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks)
 FlashLight  visualization tool for your neural network
 tensorboard-pytorch  TensorBoard for PyTorch (and Chainer, MXNet, NumPy, …)
 anchor  code for the “High-Precision Model-Agnostic Explanations” paper
 aequitas  Bias and Fairness Audit Toolkit
 Contrastive Explanation  Contrastive Explanation (Foil Trees)
 yellowbrick  visual analysis and diagnostic tools to facilitate machine learning model selection
 scikit-plot  an intuitive library to add plotting functionality to scikit-learn objects
 shap  a unified approach to explain the output of any machine learning model
 ELI5  a library for debugging/inspecting machine learning classifiers and explaining their predictions
 Lime  Explaining the predictions of any machine learning classifier
 FairML  a Python toolbox for auditing machine learning models for bias
 L2X  Code for replicating the experiments in the paper Learning to Explain: An Information-Theoretic Perspective on Model Interpretation
 PDPbox  partial dependence plot toolbox
 pyBreakDown  Python implementation of R package breakDown
 PyCEbox  Python Individual Conditional Expectation Plot Toolbox
 Skater  Python Library for Model Interpretation
 tensorflow/model-analysis  Model analysis tools for TensorFlow
 themis-ml  a library that implements fairness-aware machine learning algorithms
 treeinterpreter  interpreting scikit-learn’s decision tree and random forest predictions
Reinforcement Learning
 OpenAI Gym  a toolkit for developing and comparing reinforcement learning algorithms.
Distributed computing systems
 PySpark  exposes the Spark programming model to Python
 Veles  Distributed machine learning platform by Samsung
 Jubatus  Framework and Library for Distributed Online Machine Learning
 DMTK  Microsoft Distributed Machine Learning Toolkit
 PaddlePaddle  PArallel Distributed Deep LEarning by Baidu
 dask-ml  Distributed and parallel machine learning
 Distributed  Distributed computation in Python
Probabilistic methods
 pomegranate [cp]  probabilistic and graphical models for Python
 pyro  a flexible, scalable deep probabilistic programming library built on PyTorch.
 ZhuSuan  Bayesian Deep Learning
 PyMC  Bayesian Stochastic Modelling in Python
 PyMC3  Python package for Bayesian statistical modeling and Probabilistic Machine Learning
 sampled  Decorator for reusable models in PyMC3
 Edward  A library for probabilistic modeling, inference, and criticism.
 InferPy  Deep Probabilistic Modelling Made Easy
 GPflow  Gaussian processes in TensorFlow
 PyStan  Bayesian inference using the No-U-Turn sampler (Python interface)
 gelato  Bayesian dessert for Lasagne
 sklearn-bayes  Python package for Bayesian Machine Learning with scikit-learn API
 bayesloop  Probabilistic programming framework that facilitates objective model selection for time-varying parameter models
 PyFlux  Open source time series library for Python
 skggm  estimation of general graphical models
 pgmpy  a python library for working with Probabilistic Graphical Models.
 skpro  supervised domain-agnostic prediction framework for probabilistic modelling by The Alan Turing Institute
 Aboleth  a bare-bones TensorFlow framework for Bayesian deep learning and Gaussian process approximation
 PtStat  Probabilistic Programming and Statistical Inference in PyTorch
 PyVarInf  Bayesian Deep Learning methods with Variational Inference for PyTorch
 emcee  The Python ensemble sampling toolkit for affine-invariant MCMC
 hsmmlearn  a library for hidden semi-Markov models with explicit durations
 pyhsmm  Bayesian inference in HSMMs and HMMs
 GPyTorch  a highly efficient and modular implementation of Gaussian Processes in PyTorch
 Bayes  Python implementations of Naive Bayes algorithm variants
Genetic Programming
 gplearn  Genetic Programming in Python
 DEAP  Distributed Evolutionary Algorithms in Python
 karoo_gp  A Genetic Programming platform for Python with GPU support
 monkeys  A strongly-typed genetic programming framework for Python
 sklearn-genetic  Genetic feature selection module for scikit-learn
Optimization
 Spearmint  Bayesian optimization
 SMAC3  Sequential Model-based Algorithm Configuration
 Optunity  a library containing various optimizers for hyperparameter tuning.
 hyperopt  Distributed Asynchronous Hyperparameter Optimization in Python
 hyperopt-sklearn  hyperparameter optimization for scikit-learn
 sklearn-deap  use evolutionary algorithms instead of grid search in scikit-learn
 sigopt_sklearn  SigOpt wrappers for scikit-learn methods
 Bayesian Optimization  A Python implementation of global optimization with Gaussian processes.
 SafeOpt  Safe Bayesian Optimization
 scikit-optimize  Sequential model-based optimization with a scipy.optimize interface
 Solid  A comprehensive gradient-free optimization framework written in Python
 PySwarms  A research toolkit for particle swarm optimization in Python
 Platypus  A Free and Open Source Python Library for Multiobjective Optimization
 GPflowOpt  Bayesian Optimization using GPflow
 POT  Python Optimal Transport library
 Talos  Hyperparameter Optimization for Keras Models
Natural Language Processing
 NLTK  modules, data sets, and tutorials supporting research and development in Natural Language Processing
 CLTK  The Classical Language Toolkit
 gensim  Topic Modelling for Humans
 PSIToolkit  a natural language processing toolkit by Adam Mickiewicz University in Poznań
 pyMorfologik  Python binding for Morfologik (Polish morphological analyzer)
 skift  scikit-learn wrappers for Python fastText.
 Phonemizer  Simple text to phonemes converter for multiple languages
Computer Audition
 librosa  Python library for audio and music analysis
 Yaafe  Audio features extraction
 aubio  a library for audio and music analysis
 Essentia  library for audio and music analysis, description and synthesis
 LibXtract  is a simple, portable, lightweight library of audio feature extraction functions
 Marsyas  Music Analysis, Retrieval and Synthesis for Audio Signals
 muda  a library for augmenting annotated audio data
 madmom  Python audio and music signal processing library
Computer Vision
 OpenCV  Open Source Computer Vision Library
 scikit-image  Image Processing SciKit (Toolbox for SciPy)
 imgaug  image augmentation for machine learning experiments
 imgaug_extension  additional augmentations for imgaug
 Augmentor  Image augmentation library in Python for machine learning
 albumentations  fast image augmentation library and easy-to-use wrapper around other libraries
Feature engineering
 Featuretools  automated feature engineering
 scikit-feature  feature selection repository in Python
 skl-groups  scikit-learn add-on to operate on set/“group”-based features
 Feature Forge  a set of tools for creating and testing machine learning features
 boruta_py  implementations of the Boruta all-relevant feature selection method
 BoostARoota  a fast xgboost feature selection algorithm
 few  a feature engineering wrapper for sklearn
 scikit-rebate  a scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning
 scikit-mdr  a sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction.
 tsfresh  Automatic extraction of relevant features from time series
Data manipulation & pipelines
 pandas  powerful Python data analysis toolkit
 sklearn-pandas  Pandas integration with sklearn
 alexander  wrapper that aims to make scikit-learn fully compatible with pandas
 blaze  NumPy and Pandas interface to Big Data
 pandasql  allows you to query pandas DataFrames using SQL syntax
 pandas-gbq  Pandas Google BigQuery
 xpandas  universal 1d/2d data containers with Transformers functionality for data analysis by The Alan Turing Institute
 Fuel  data pipeline framework for machine learning
 Arctic  high performance datastore for time series and tick data
 pdpipe  easy pipelines for pandas DataFrames.
 SSPipe  Python pipe (|) operator with support for DataFrames, NumPy and PyTorch
 meza  a Python toolkit for processing tabular data
 pandas-ply  functional data manipulation for pandas
 Dplython  Dplyr for Python
 pysparkling  a pure Python implementation of Apache Spark’s RDD and DStream interfaces
 quinn  PySpark methods to enhance developer productivity
 Dataset  helps you conveniently work with random or sequential batches of your data and define data processing
 swifter  a package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
Statistics
 statsmodels  statistical modeling and econometrics in Python
 stockstats  supplies a wrapper, StockDataFrame, based on pandas.DataFrame, with inline stock statistics/indicators support.
 simplestatistics  simple statistical functions implemented in readable Python.
 weightedcalcs  pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more
 scikit-posthocs  Pairwise Multiple Comparisons Post-hoc Tests
 pysie  a Python implementation of a statistical inference engine
 Sacred  a tool to help you configure, organize, log and reproduce experiments by IDSIA
 Xcessiv  a web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling
 Persimmon  A visual dataflow programming language for sklearn
Visualization
 Matplotlib  plotting with Python
 seaborn  statistical data visualization using matplotlib
 Bokeh  Interactive Web Plotting for Python
 HoloViews  stop plotting your data: annotate your data and let it visualize itself
 Alphalens  performance analysis of predictive (alpha) stock factors by Quantopian
 python-ternary  ternary plotting library for Python with matplotlib
 Naarad  framework for performance analysis & rating of sharded & stateful services.
Evaluation
Computations
 numpy  the fundamental package needed for scientific computing with Python.
 Dask  parallel computing with task scheduling
 bottleneck  Fast NumPy array functions written in C
 minpy  NumPy interface with mixed backend execution
 CuPy  NumPy-like API accelerated with CUDA
 scikit-tensor  Python library for multilinear algebra and tensor factorizations
 numdifftools  solve automatic numerical differentiation problems in one or more variables
 quaternion  Add built-in support for quaternions to NumPy
 adaptive  Tools for adaptive and parallel sampling of mathematical functions
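Several entries above deal with numerical differentiation. As a rough picture of the underlying technique, here is a minimal central-difference sketch using only the standard library. It does not reproduce the API of numdifftools or any other listed package; the function names `central_difference` and `gradient`, and the default step size `h`, are assumptions made for illustration.

```python
import math


def central_difference(f, x, h=1e-5):
    """Approximate f'(x) with the symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def gradient(f, point, h=1e-5):
    """Approximate the gradient of a multivariate f at `point`,
    perturbing one coordinate at a time."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad


# Derivative of sin at 0 should be close to cos(0) = 1.
d_sin = central_difference(math.sin, 0.0)

# Gradient of x^2 + 3y at (2, 1) should be close to [4, 3].
g = gradient(lambda p: p[0] ** 2 + 3.0 * p[1], [2.0, 1.0])
print(d_sin, g)
```

A fixed step size is a simplification: too large a step increases truncation error, too small a step amplifies floating-point rounding, which is why dedicated libraries choose or adapt the step more carefully.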
Spatial analysis
 GeoPandas  Python tools for geographic data
 PySal  Python Spatial Analysis Library
Quantum Computing
 QML  a Python Toolkit for Quantum Machine Learning
Conversion
 sklearn-porter  transpile trained scikit-learn estimators to C, Java, JavaScript and others
 ONNX  Open Neural Network Exchange
 MMdnn  a set of tools to help users interoperate among different deep learning frameworks.
See Also