Live Coding Interviews in the Era of LLMs

JB Lorenzo
Published in OLX Engineering
6 min read · Oct 18, 2023


Image generated using the title of this article

The introduction of ChatGPT, Copilot, Codex, and other tools makes it easier to write code, but it also makes it easier to pass live coding interviews, unless we do something about it.

Disclaimer: In this article, we explore what happens if we don’t act on this, what can be done to adjust to the new normal, and what the future of tech interviews might look like. We are not yet proposing changes to the interview process, because we have not proven anything meaningful yet. This article reflects on the implications of action and inaction on the live coding interview process with regard to allowing LLMs during it.

Context

At OLX, we promote the use of a tool called PlusOne, an internal LLM tool made to help us code, coach ourselves, and more. This reveals an interesting gap:

If OLX promotes the use of such a tool once you are hired, why are we not evaluating candidates on how effectively they use these tools during hiring?

The impact of not addressing this is that people who can’t adapt to these tools will likely lag behind other engineers, and their performance will suffer. We would only notice this a few months after hiring, and it is better to notice it as early as possible.

For those who are not aware: ChatGPT is a Large Language Model that can generate code if you give it a problem statement and the platform and language it should use. Copilot is a tool from GitHub that can autocomplete large amounts of code, e.g. writing unit tests. Codex is the OpenAI model that powers Copilot.

A few weeks ago, we saw two interview candidates using Copilot/Codex during a live coding interview. While we had talked about the possibility of that happening during our interviewers' sync, we were not really prepared to evaluate candidates who use those tools against candidates who don’t. Because of this, I was interested in the latest thinking on:

  • Whether to allow Copilot/LLMs during live coding interviews
  • If we allow them, how to evaluate candidates who use them during the interview
  • If we don’t allow them, how to evaluate whether candidates know how to use these tools

In this article, we will explore what happens if we don’t act on this, what can be done to adjust to the new normal, and what the future of tech interviews might look like.

Related Work

Several other articles have discussed this topic. One discusses the critical thinking and creativity skills that the AI tools we are given cannot take away.

In another article, Vidal talks about how soft skills become more important in the light of these tools that help with generating code.

A course on how to pair program with an LLM was published in collaboration with Google. It shows some basics of using an LLM while writing code.

A McKinsey study shows that developers are more productive using generative AI, but the gain depends on the complexity of the task.

Problem

Not addressing this topic will lead to a few scenarios:

  1. Candidates will not be able to show their full potential during interviews when we don’t allow them to use the Copilot they are used to.
  2. We will hire candidates without knowing whether they can learn to work with these tools.
  3. We spend less time evaluating critical thinking skills because of setting up code, downloading dependencies, a slow machine, and so on.
  4. We will not be able to identify candidates who work very well with these tools.

These all impact the quality of hires by decreasing success rates for people who cannot use LLM tools.

If we do not allow these tools, we fall back into the age of interviews before they existed: people rely on their own problem-solving skills, and the amount of output they produce during a live coding interview is limited by their machine’s compilation speed and their ability to debug quickly.

If we allow Copilot-like tools, less effort goes into generating the initial code, allowing developers to start quickly. Several cases could happen depending on how good the Copilot/LLM tools are, in decreasing order of help from the Copilot/LLM:

  1. They don’t have access to these tools. In this case, we fall back to classical interviewing.
  2. The Copilot generates the whole solution for them, with no need to edit/debug, and it already covers the edge cases we would be throwing at them. We would only be able to evaluate their prompt engineering skills.
  3. The Copilot generates code that works for the normal cases and is missing edge cases. We would be able to evaluate their prompt engineering skills, their code review skills, their ability to identify edge cases, and their ability to adapt the code.
  4. The Copilot generates barely working code that doesn’t meet the requirements. The candidate would have to correct the LLM and guide it to a working solution, as in the case above, or break the problem down so that the tool can help with some parts of it.
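
As a hypothetical illustration of case 3 (all function names here are invented for the sketch): the tool produces code that passes the happy path but misses an edge case, and the candidate catches it during review.

```python
# Hypothetical illustration of case 3: LLM-generated code that handles
# normal inputs but misses an edge case.

def average(numbers):
    # Plausible LLM output: correct for non-empty input...
    return sum(numbers) / len(numbers)  # ...but raises ZeroDivisionError on []

def average_fixed(numbers):
    # The candidate's fix after spotting the empty-input edge case in review.
    if not numbers:
        return 0.0
    return sum(numbers) / len(numbers)

print(average_fixed([1, 2, 3]))  # 2.0
print(average_fixed([]))         # 0.0
```

What we would evaluate here is not the arithmetic but whether the candidate probes the generated code with inputs the prompt never mentioned.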

These cases reveal some of the (new) key concepts that we need to pay attention to when evaluating candidates.

Key concepts

Please take the following with a grain of salt, since it is based on self-reflection:

If we shift the technical coding interview from pure code generation to prompt engineering plus code review plus refactoring (and, of course, debugging and testing), some of the following key concepts might appear:

  • Prompt Engineering — e.g. writing the problem in a form that an LLM understands.
  • Code Review — e.g. to find edge cases and see if the code works as intended.
  • Refactoring — e.g. changing the architecture of the generated code
  • Debugging — e.g. if the generated code doesn’t work
  • Testing — e.g. to confirm that the generated code works in various cases
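
To make the Code Review and Testing concepts concrete, here is a minimal sketch; the `is_palindrome` function is a made-up stand-in for LLM-generated code under review, not any real interview question of ours.

```python
# Hypothetical LLM-generated code that a candidate is asked to review.
def is_palindrome(s):
    # Normalize: keep only alphanumeric characters, lowercased.
    cleaned = "".join(ch.lower() for ch in s if ch.isalnum())
    return cleaned == cleaned[::-1]

# Testing: the candidate confirms normal cases AND edge cases behave as intended.
assert is_palindrome("racecar")
assert not is_palindrome("hello")
assert is_palindrome("")                                 # edge case: empty string
assert is_palindrome("A man, a plan, a canal: Panama")   # punctuation and case
```

The quick assertions at the bottom are exactly the kind of probing we would want to see: the candidate does not trust the generated code until the edge cases pass.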

In order to assess seniority using these key concepts, we also have to benchmark what a junior vs. mid vs. senior interviewee looks like. For example, a Senior Engineer would identify more edge cases, or would be able to explain the engineering choices the tool made and perhaps what happens to those decisions if the assumptions change.

View of the Future

Live coding interviews are a way to test how someone would perform in a real-life situation, in a short amount of time, in a controlled environment. The tradeoffs in this process are the amount of time it takes and the accuracy and granularity of the evaluation we can make.

I was inspired by a talk I once heard, which suggested that, ideally, we would also have the candidate work together with the team to evaluate a real-life collaborative situation.

Ideally, the interview process would involve people from the team and a Copilot-like tool, which mirrors what a normal working day looks like. The problem with this approach is that it takes a lot of the interviewers’ time. We already have the problem of engineers spending a lot of time on interviews, which we mitigate by limiting the time per week allocated to it. We also treat interviewing as part of a Senior Engineer’s responsibilities.

Conclusion

While using LLMs won’t replace our technical skills, it will definitely add to our suite of second-brain tools like Google Search, StackOverflow, or our own notes.

In my opinion, it will still take some time until companies adapt to allowing these tools, or become able to evaluate people who use these AI tools during live coding interviews. The call to action is for us to start talking about how to evaluate them. Otherwise, we won’t know whether the people we are hiring can adapt to working with these tools, and those who can’t will be left behind.

We will be diving deeper into this topic in another article, e.g. what we have tried, what the results are, and our recommendations.
