Machine Learning Engineer, Liftoff Mobile
Yunshi Zhao is a Machine Learning Engineer at Liftoff, a mobile app optimization platform for marketing and monetizing apps at scale. Her responsibilities range from researching and training models to deploying and monitoring them in production. She is also part of the diversity, equity, and inclusion (DEI) committee at Liftoff, focusing on representation in engineering. Before transitioning to startup life, she worked as a data scientist and an aerospace engineer. Here, she talks about machine learning development, best practices, use cases, and ML in production.
What is Liftoff’s mission as a company and what made you want to join the team?
Liftoff Mobile is a technology and programmatic advertising company. The organization has a lot of products in different areas of the advertising technology ecosystem, especially in the wake of our merger with Vungle last year. But the main mission is to help mobile apps grow and monetize.
I really like the vertically integrated system, where you get to do everything across the model lifecycle. At most companies, you’re hired to do data science and model development, but then you hand the model off to a different engineer to deploy it. At Liftoff, the ML group does it all, and that was really appealing to me.
How did you train for this role and what are your tips for anyone interested in transitioning into AI?
Luckily, my previous job in aerospace engineering used a lot of the same math, so I would say anyone with a strong math background would have an easier time making the transition to being a machine learning engineer (MLE). For the programming part, there are so many online resources to help ramp up on the software and there’s also such a big community of people you can ask for help. If you don’t have the math background, you can always start with something that’s not as heavy on the programming part. Data science and data analytics are good starting points and then you can slowly work your way up to MLE. I think of this progression as a video game, where you advance through all the different levels.
What vertical are you focused on at Liftoff and what does your day-to-day look like?
I work on the demand-side platform (DSP), a system that helps advertisers buy the right ad for the right price. Our team’s main job is to build conversion models that predict the probability of conversion for down-funnel events. My day-to-day really depends on the project I’m working on, but it usually involves kicking off model experiments. Before a kickoff, I’ll often work in our code base: updating the model, changing how we train it, or making changes in the bidding path, the part of the code where the model is used to bid on an ad. Liftoff has a strong documentation culture, so I also do a lot of writing for any ideas I want to propose or thought experiments I want to share. I also meet with other teams to better understand the business metrics and how our model should behave in that business context.
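To make the idea concrete, here is a minimal, hypothetical sketch of what a conversion model boils down to: a binary classifier over impression features whose output probability can feed a bid. The feature names, file paths, and model choice below are illustrative assumptions, not Liftoff’s actual pipeline.

```python
# Illustrative conversion model: predict the probability of a down-funnel event
# (e.g., install or purchase) from impression features. All names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss

# Hypothetical training data: one row per impression, label = converted or not.
df = pd.read_parquet("impressions.parquet")
features = ["app_category", "device_type", "hour_of_day", "historical_ctr"]
X = pd.get_dummies(df[features], columns=["app_category", "device_type"])
y = df["converted"]

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# The predicted probability of conversion is what would feed into bid pricing.
p_convert = model.predict_proba(X_val)[:, 1]
print("validation log loss:", log_loss(y_val, p_convert))
```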
Scalability is an important part of infrastructure, especially with your use case in advertising technology. What are some things to keep in mind for the scalability of data?
Our Kafka pipeline processes two gigabytes of data per second, which is a lot of data. Much of our system is built around knowing exactly what data we need to process, and feature analysis is a challenge mainly because a lot of our system is built in-house for a narrow use case. It works really well for the original case we built it for, and everything is really fast: prediction is fast and continuous training is fast, but feature analysis is harder. Since we have such a large data set, it’s not easy to do feature analysis natively the way you might for other use cases. It’s definitely something we always talk about when we decide on any system in our company.
Speaking about systems and integrations at Liftoff, each company goes through build versus buy evaluations, seeing what they can do in-house versus what is worth outsourcing. What are some of your current or future ML systems and infrastructure tools?
Most of the products we’re using right now are built in-house because the company wanted to move fast. A lot of our systems are really lean and were built for a specific use case. For example, we have an experiment tracking tool where you can go in and see some of the metrics of each experiment’s performance. It’s really simple and can’t do a lot of the fancy things that experiment tracking tools on the market can do, but it does the job.
Right now we do have a push to move toward more standardized tooling, because expansion can be a bit of a pain point. Before, our ML was focused more on the conversion models, but now we have so many other ML applications, for example, pacing the budget and the market price. Every time we try to build a new model, the narrow use cases make it a bit hard, and it’s also really hard to onboard people onto an in-house product built for narrow cases. Because of that, we’re investigating other tools that might be more flexible and apply to the other ML applications in our company.
Do you have any favorite tools in your tech stack or things that make life a little easier for you as a machine learning engineer?
I really like Trino. It’s simple and I can investigate data quickly. Our data set is large, so any data analysis at the impression level is really slow, but our product analytics team made daily and hourly analytics tables that roll the raw data up into the dimensions we care about. It’s nothing fancy, but I like it a lot because it’s really easy to look at the data without waiting forever for a query to run.
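As a rough illustration of that workflow, here is a hypothetical query against a pre-aggregated hourly table via the Trino Python client. The host, catalog, schema, and table names are assumptions for the example, not Liftoff’s actual setup.

```python
# Hypothetical example: query a pre-aggregated hourly analytics table through Trino
# instead of scanning raw impression-level data. All identifiers are illustrative.
import trino

conn = trino.dbapi.connect(
    host="trino.example.internal",  # assumed Trino coordinator
    port=8080,
    user="analyst",
    catalog="hive",
    schema="analytics",
)

sql = """
SELECT campaign_id,
       hour,
       sum(impressions) AS impressions,
       sum(conversions) AS conversions,
       sum(spend_usd)   AS spend_usd
FROM hourly_campaign_stats          -- hypothetical pre-aggregated table
WHERE hour >= current_timestamp - INTERVAL '1' DAY
GROUP BY 1, 2
ORDER BY 2, 1
"""

cur = conn.cursor()
cur.execute(sql)
for campaign_id, hour, imps, convs, spend in cur.fetchall():
    print(campaign_id, hour, imps, convs, spend)
```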
What are some best practices in the ML model lifecycle in terms of model training, development, and experimentation?
For training, I think having a good protocol is important. Whenever we experiment at Liftoff, we write a report with the whole protocol so everyone knows exactly what we’re doing, and the system we built also ensures reproducibility. The experiment tracking and sharing I mentioned before is an important tool as well.
In terms of models in production, I would say that it depends on the type of application. For us, model freshness is important, so we have to build a system that can continuously train and deploy new models. But since that is automated, we also want some safety, so we built automated safety checks to make sure we never ship a bad model.
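Here is a minimal sketch of what such an automated safety gate could look like, assuming a holdout set and a currently serving model to compare against. The specific checks and thresholds are illustrative assumptions, not Liftoff’s actual criteria.

```python
# Hypothetical safety gate for continuous training: a freshly trained model must
# pass basic checks against a holdout set and the incumbent before promotion.
from dataclasses import dataclass
import numpy as np
from sklearn.metrics import log_loss


@dataclass
class SafetyReport:
    passed: bool
    reasons: list


def safety_check(candidate, incumbent, X_holdout, y_holdout,
                 max_logloss_regression=0.01, calib_tolerance=0.15):
    reasons = []

    p_new = candidate.predict_proba(X_holdout)[:, 1]
    p_old = incumbent.predict_proba(X_holdout)[:, 1]

    # 1. Predictions must be finite, valid probabilities.
    if not np.all(np.isfinite(p_new)):
        reasons.append("non-finite predictions")

    # 2. Offline loss must not regress too much versus the serving model.
    ll_new, ll_old = log_loss(y_holdout, p_new), log_loss(y_holdout, p_old)
    if ll_new > ll_old + max_logloss_regression:
        reasons.append(f"log loss regressed: {ll_new:.4f} vs {ll_old:.4f}")

    # 3. Calibration: mean predicted conversion rate should track the observed rate.
    observed = float(np.mean(y_holdout))
    if observed > 0 and abs(p_new.mean() - observed) / observed > calib_tolerance:
        reasons.append(f"calibration off: predicted {p_new.mean():.4f}, observed {observed:.4f}")

    return SafetyReport(passed=not reasons, reasons=reasons)

# Usage sketch: only promote the candidate if the report passes, otherwise alert.
# report = safety_check(new_model, prod_model, X_holdout, y_holdout)
```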
Another best practice for model experimentation, when you decide to roll out a model, is not to look only at the aggregate. Sometimes a model looks better at the aggregate level, but there’s actually much more to look at. For example, because our model is used by so many campaigns, it’s always good to see the impact distribution across all campaigns.
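A small, hypothetical sketch of that idea: instead of comparing a single overall number, compute per-campaign lift between test arms and look at its distribution. The column and arm names are assumptions for illustration.

```python
# Hypothetical per-campaign impact analysis for an A/B test. Column names are
# illustrative: one row per (campaign_id, arm) with a conversion_rate metric.
import pandas as pd

results = pd.read_parquet("ab_test_results.parquet")

per_campaign = results.pivot_table(
    index="campaign_id", columns="arm", values="conversion_rate"
)
per_campaign["lift"] = (
    per_campaign["treatment"] - per_campaign["control"]
) / per_campaign["control"]

# An aggregate improvement can hide a long tail of campaigns that got worse.
print("overall mean lift:", per_campaign["lift"].mean())
print(per_campaign["lift"].describe(percentiles=[0.05, 0.25, 0.5, 0.75, 0.95]))
print("campaigns hurt by the new model:", int((per_campaign["lift"] < 0).sum()))
```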
What’s important to think about once models are live in production and having an impact on people in the real world?
For models in production, we have dashboards that we use to keep track of metrics and ensure that the model in production is healthy. Because Liftoff is a fairly large company, there are teams that help us monitor campaign health. They’re more on the front lines and can help us understand if the model is performing well. We take precautions in the testing phase as well. Whenever we develop a model, we do an A/B test. And when we do roll out, we have a rigorous rollout plan involving the MLEs, the teams who manage the campaigns, the technical product manager, and also the customer-facing teams. We plan it out like this and test carefully so that when we’re in production, hopefully we don’t see any big surprises.
In the ad tech space, you’re getting feedback on your models pretty fast. With the majority of your use cases, do you get the ground truths back quickly, or are a lot of the models a delayed ground truth, where you need to look for drift in production?
Some events are pretty quick. For example, installs are usually pretty fast but purchases are usually slower. So we do have some attribution delay and we do have some techniques to correct that in our model training. We do get ground truths pretty quickly, but I like to put quotations around “ground truths” because most machine learning models have a feedback loop issue, and I think in our case it’s probably worse because the way our model behaves actually affects what traffic we buy. So there’s always a bias in the sample we see. So yes, we do have ground truths, but we don’t always know if that’s the ground truth of the whole population or just for the sample we get.
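One hypothetical way to handle attribution delay (not necessarily Liftoff’s technique) is to estimate, from historical attributions, how likely a conversion is to have been observed given the age of the impression, and then downweight very recent negative examples, since some of them are conversions that simply haven’t arrived yet. All file paths and column names below are assumptions for the example.

```python
# Illustrative delayed-attribution correction: reweight recent negatives by the
# empirical probability that a conversion would already have been attributed.
import numpy as np
import pandas as pd

# Hypothetical history of attributed conversions with their observed delays.
history = pd.read_parquet("attributed_conversions.parquet")
delays_hours = history["conversion_delay_hours"].to_numpy()


def p_observed_by(age_hours):
    """Empirical probability a conversion is attributed within `age_hours`."""
    return float(np.mean(delays_hours <= age_hours))


# Hypothetical recent training examples with their current age and label so far.
train = pd.read_parquet("recent_impressions.parquet")
age = train["impression_age_hours"].to_numpy()
label = train["converted"].to_numpy()

# Very recent negatives get a lower weight because their true label is still
# uncertain; positives keep full weight.
p_obs = np.array([p_observed_by(a) for a in age])
weight = np.where(label == 1, 1.0, np.clip(p_obs, 0.1, 1.0))

# These weights could then be passed as sample_weight to the model's fit() call.
```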
Can you share your thoughts about diversity in engineering and what signals whether or not a company is doing a good job with this?
I’d say it’s quite hard to find a diverse engineering group because, unfortunately, colleges aren’t really that diverse. The Liftoff engineering team is open to trying and is actively working to make things better. The key is having someone take a more active role in helping the company identify things it can change. It’s important to speak up, and you know you have a good team when they listen to your feedback, whether it’s negative or positive, and then take concrete action. It’s exciting to be part of the solution.