The latest news, announcements, and technical background on the Core Engine.
June 26th, 2020 - Hamza Tahir
Just a few days ago, I was able to share my thoughts on the state of Machine Learning in production, and why it's (still) broken, at MLOps World 2020. Read on for a write-up of my presentation, or check out the recording of the talk on YouTube.
June 11th, 2020 - Hamza Tahir
One attempt to ensure that ML models generalize in unknown settings is splitting data. This can be done in many ways, from 3-way (train, test, eval) splits to k-splits with cross-validation. The underlying reasoning is that by training an ML model on a subset of the data and evaluating it on unknown data, one can reason much better about whether the model has underfit or overfit in training.
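Both strategies mentioned above can be sketched in a few lines with scikit-learn; the dataset here is synthetic and purely illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold

# Synthetic dataset: 50 samples with 2 features each
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

# 3-way split: 60% train, 20% eval, 20% test
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=42
)
X_eval, X_test, y_eval, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=42
)

# k-splits with cross-validation: every sample is held out exactly once
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} validation")
```

The model is then fit only on the training portion, tuned against the eval (or validation) folds, and scored once on the held-out test set.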
June 6th, 2020 - Hamza Tahir
Okay, let's make it clear at the start: This post is NOT intended for people who are doing one-off, siloed projects like participating in Kaggle competitions, or doing hobby projects in Jupyter notebooks to learn the trade. The value of throw-away, quick, dirty script code is obvious there, and it has its place. Rather, it is intended for ML practitioners working in a production setting. So if you're working in an ML team that is struggling to manage technical debt while pumping out ML models, this one's for you.
May 17th, 2020 - Benedikt Koller
No way around it: I am what you call an "Ops guy". In my career I've admin'ed more servers than I've written code. Over twelve years in the industry have left their permanent mark on me. For the last two of those, I've been exposed to a new beast: Machine Learning. My hustle is bringing Ops knowledge to ML. These are my thoughts on that.
May 7th, 2020 - Baris Can Durak
In the last decade, machine learning applications have proven their capabilities and potential in various domains. Especially in the past few years, they have gained rapid prominence in the gaming industry, and there are now countless projects creating an endless array of models that interact with different games.
May 4th, 2020 - Hamza Tahir
Over the last few years at maiot, we have regularly dealt with datasets that contain millions of data points. Today, I want to write about how we use our machine learning platform, the Core Engine, to build production-ready distributed training pipelines. These pipelines are capable of dealing with millions of data points in a matter of hours. If you also want to build large-scale deep learning pipelines, sign up for the Core Engine for free here and follow along.
May 1st, 2020 - Hamza Tahir
Around 87% of machine learning projects never make it to production. There is a disconnect between machine learning being done in Jupyter notebooks on local machines and models actually being served to end users to provide real value.
February 27, 2020 - Hamza Tahir - Cross-posted on the TensorFlow Blog
Principal Component Analysis (PCA) is a dimensionality reduction technique, useful in many different machine learning scenarios. In essence, PCA reduces the dimension of input vectors in a way that retains the maximal variance in your dataset. Reducing the dimensionality of the model input can increase the performance of the model, reduce the size and resources required for training, and decrease non-random noise.
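The variance-retention idea above can be illustrated with a minimal scikit-learn sketch. The data is synthetic (an assumed 3-factor structure embedded in 10 dimensions), chosen so that a few components capture nearly all the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 3 latent factors mixed into 10 observed features,
# plus a small amount of noise
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 10))

# Project the 10-dimensional inputs down to 3 principal components
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)
print(pca.explained_variance_ratio_.sum())
```

Because the data truly lives near a 3-dimensional subspace, the three retained components explain almost all of the variance, while the discarded directions mostly contain noise.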