Financially Responsible Feature Engineering

April 22, 1:14 pm - 1:24 pm (10 Minutes)


Anyone who has tried doing machine learning at scale knows it can get expensive. The costs of training models on on-demand compute and storing features in low-latency databases can quickly get out of hand, and we’re often forced to make hard decisions about what to include in a model to keep it financially viable.

Not all features are created equal: they vary widely in their relevance to a particular use case and in how much they cost to productionise. As such, data scientists need to be able to make intelligent decisions about feature cost versus impact, often with incomplete information.

Drawing on experience from productionising machine learning at scale at Atlassian, this talk will explore how to better make these decisions, including:

– An exploration of the potential costs involved when productionising features
– How to estimate the cost of a feature before productionising it
– What tradeoffs can be made
– A technique for factoring feature cost into a model’s measured performance during training and feature selection
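
To make the last point concrete, here is a minimal sketch of one way feature cost could be weighed against model performance during feature selection. It is an illustration only, not the technique presented in the talk: the function name, the feature names, the cost figures and the `cost_weight` exchange rate are all hypothetical, and the toy `evaluate` function stands in for a real train-and-validate step.

```python
from typing import Callable, Dict, FrozenSet, List


def cost_aware_forward_selection(
    feature_costs: Dict[str, float],
    evaluate: Callable[[FrozenSet[str]], float],
    cost_weight: float,
) -> List[str]:
    """Greedy forward selection on (metric - cost_weight * total feature cost).

    Hypothetical helper: `evaluate` returns a validation metric for a feature
    set, and `cost_weight` converts cost units into metric units.
    """
    selected: List[str] = []
    best_score = evaluate(frozenset())  # baseline: no features, zero cost
    remaining = set(feature_costs)

    while remaining:
        candidates = []
        for name in sorted(remaining):
            trial = frozenset(selected + [name])
            total_cost = sum(feature_costs[f] for f in trial)
            penalised = evaluate(trial) - cost_weight * total_cost
            candidates.append((penalised, name))

        score, name = max(candidates)
        if score <= best_score:
            break  # no remaining feature pays for itself
        selected.append(name)
        remaining.remove(name)
        best_score = score

    return selected


# Made-up numbers for illustration: estimated monthly cost per feature and
# the metric lift each one contributes in a toy additive model.
costs = {"page_views_7d": 100.0, "realtime_embedding": 2000.0, "account_age": 10.0}
gains = {"page_views_7d": 0.05, "realtime_embedding": 0.06, "account_age": 0.02}


def toy_evaluate(features: FrozenSet[str]) -> float:
    # Stand-in for training a model and scoring it on validation data.
    return 0.70 + sum(gains[f] for f in features)


selected = cost_aware_forward_selection(costs, toy_evaluate, cost_weight=0.0001)
print(selected)
```

With these made-up numbers, the expensive `realtime_embedding` feature gives the largest raw lift but is left out once its cost is priced in. Setting `cost_weight` to 0 recovers ordinary greedy forward selection.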

Joshua Hansen

ML Platform Tech Lead

Atlassian

Joshua is a web developer turned ML infrastructure engineer and tech lead on Atlassian’s Machine Learning Platform team. He has been building out Atlassian’s ML infrastructure for the last three years, and before that worked on various products, including Jira Service Management.

Geoff Sims

Principal Data Scientist

Atlassian

Geoff is a Principal Data Scientist at Atlassian, the software company behind Jira, Confluence & Trello. He works with the product teams, using machine learning at scale to deliver smarter in-product experiences and recommendations to Atlassian’s millions of active users. Prior to this, he was in the Customer Support & Success division, leveraging a range of NLP techniques to automate and scale the support function.

Before joining Atlassian, Geoff applied data science methodologies across the retail, banking, media, and renewable energy industries. He began his foray into data science as a research astrophysicist, studying astronomy from the coldest & driest location on Earth: Antarctica.