In our ninth episode, Aaron Erickson from NVIDIA joins Natasha Allen to share his perspectives on where generative artificial intelligence is heading. How are LLMs evolving beyond natural language to incorporate multimodal inputs e.g. images, sound, video, or even taste and smell? What are some of the new economic models being developed for AI tools? And are we primed to see a trend towards bespoke models designed for specific organizations?
- Navigating the AI Frontier: Legal and Operational Insights Into Generative AI
- My Employees Are Using ChatGPT. What Now?
- NIST’s AI Risk Management Framework Helps Businesses Address AI Risk
The below episode transcript has been edited for clarity.
Thank you for joining us for another episode of the Innovative Technology Insights podcast. My name is Natasha Allen and I am a corporate partner in the San Francisco and Silicon Valley offices at Foley and Lardner. I’m also the co-chair of the AI sub-sector, which allows me to have this cool opportunity to interview exciting individuals working on cutting edge technologies, including AI. I have with me today Aaron Erickson. Aaron is a key member of the software engineering team at NVIDIA, helping to build their enterprise DGX AI platform. Prior to NVIDIA, Aaron spent 30 years working in leadership roles, most recently as CEO of Org Space and a VP of Engineering at New Relic. Over the course of his entire career, Aaron has been an advocate for building better software from his home in San Francisco. Welcome, Aaron.
Thanks for having me.
Alright, so let’s get started. With the advent of ChatGPT came a better understanding of the world of AI. Prior to this, layperson individuals didn’t really have an understanding of what AI was or what it could do – in particular what large language models (LLMs) could do, which, as you’re aware, are the foundation for deep learning algorithms that help us understand and process natural language. As of now, we understand that LLM’s can be used for text, specifically to help generate or prompt certain text. But how do you think LLMs will evolve beyond natural language, incorporating multi-modal inputs such as images, sounds, video, or even taste?
It is one of the most exciting developments, I think. People are seeing this happen even now. The latest ChatGPT release, if you have the premium version, you can talk to it and it talks back in voice to you. In fact, one of the scariest things that people saw with initially was looking at CAPTCHA. If anybody’s ever used a CAPTCHA where you have to say “which of these things on the screen is a traffic light or not” ChatGPT can pretty much do that even today.
Multimodal is, I think, one of the really interesting places it’s going. One good example of this is, you think about Tesla and what they’re able to do with video. When you drive a Tesla, it’s almost like you have a kind of permanent dashcam on the car. It’s always taking pictures and it’s a big part of how we can drive independently or autonomously when you put it in that mode. But one of the really interesting things is how they’re using all that video that they’ve accumulated. Google is also doing this with the models they’re developing, and a lot of these new models are taking video and able to transcribe it to understand what each frame of the picture means. This is in order to really get at “what could we learn if we could see the world around us?”
The best way to think about it in a pre-multimodal LLMs is imagining you trained a human or took them to school, but the only thing they could do in school was read books or papers. They couldn’t actually experience the world as it is, be it through seeing the world, or in some of the more speculative applications I’m hearing about, even smell the world or taste the world, or understand different aspects and be able to code it into a training model. At the end of the day, you almost kind of have your inner monologue where you think about language. You look at a light and it translates into the word light and then you’re able to associate that to “when I have a light on, I can read.” That same kind of trick that we do as humans, we are now able to do with these kinds of biologic or synthetic brains that we’ve created.
Wow, that’s amazing. With the advent of this technology what are some of the key challenges you think to integrating multimodal inputs into LLMs, especially when it comes to sensory, taste and smell, and those things that you think are inherently human?
The first obvious sense is sight, thinking of being able to look at pictures, which we’ve been doing for years, and being able to look at video, which is a bunch of pictures and a lot of processing power, but it’s a known quantity. It’s not particularly groundbreaking at this point. Other senses, like being able to taste and smell, we have ways we can, through understanding the chemical composition of a substance, infer what it might smell or taste like. If you can digitize something, and these are all things that are digitizable, it’s just a matter of some of those other kinds of senses being a little more expensive or more involved. But there’s no reason it couldn’t be done. I used to say this, almost kiddingly “are we going to have generative smell next?” I’m not sure it’s something we want as society, but there’s no reason why it couldn’t happen. It’s just a matter of translating it into some digital form and then allowing the model to learn from it.
And so what is multimodal? There’s what the inputs are, what it learns from. There’s also what it can output. We think about generative AI and ChatGPT and smart technologies as generating text and pictures, and maybe generating videos someday. There’s early models that generate videos where you just type in “I would like to see blah happen” and a video is generated that does that. It’s a little rough right now, but the models are getting better and the ones you see today are the worst that you’ll ever see. They’re only going to get better from here.
One of the things that we do at NVIDIA is we have a product called BioNeMo and this is helping do generative drug discovery. How do you do that? If you think about it, it’s another kind of data and you’re able to literally use the same kind of technology to generate what ideal protein strands might be, maybe a cure for COVID, or whatever it is that you want to solve. Those kinds of applications are happening today. And research is being accelerated, think about the kinds of medications we’ll be able to get or the kinds of things we’ll be able to do with this technology, like make power plants more efficient. I know one startup that has demonstrated making a power plant 90% more efficient by using this kind of technology, helping figure out what would be some ways to make a plant more efficient that aren’t obvious to a human.
I think it’d be so amazing to see that evolution. We have the ways that we can expand the use of LLMs, now how do you monetize it? How do you monetize these AI tools? Maybe you have some examples, or can you can explain what are some of the current economic models prevalent in the AI industry?
You wonder why some of these startups are raising what seem like ridiculous seed rounds, I heard of one that was like US$50 million or US$100 million. And largely, they’re not just hiring employees – you’d actually be surprised by how few employees a lot of these firms are hiring. But largely they’re buying infrastructure. Either from us [NVIDIA] or from other providers of GPU technology, a lot of the money is going to go into that. And those are people developing foundational models. That’s your OpenAIs and Anthropics of the world. As well as some of the big FAANG companies, they’re all doing some version of this too. It’s just an arms race right now.
The more interesting thing is, and this is what I think a lot of people miss, is that there are a lot of what I call narrow AI solutions that you can build. They might be good at a specific problem. One good example of this, and you don’t need a massive array of GPUs to do this, you could develop a reasonable model that detects fraud.
Say you want to run your fraud detection department to find people that are fraudulently putting in expense reports or doing some other thing that was anomalous. You can build an LLM that is very narrowly trained to detect fraud and actually run that against your accounting system or run that against a set of accounting systems, an email, or something like that to be an early warning indicator. That doesn’t require the same kind of GPU fleet that you would need for building a foundational model. You could take something like Llama or one of the open source models, enhance it or fine tune it with data about how fraud happens or examples of bad examples of fraud in your company, and then build some pretty incredible products. This could be done internally in companies, building some pretty incredible capabilities using these things to automate certain kinds of routine intellectual work. That’s just one example, but it’s hard to go into a company and not find maybe 20 or 30 other examples like that where a narrow AI can do incredibly useful things that are far more valuable than what you’ll spend developing it.
Interesting. So maybe elaborate on the concept of bespoke LLMs. Is this a new thing that executives could ask for – I want my own LLM tailored to myself. What would make a bespoke LLM different from an off the shelf AI model?
Why would you develop a bespoke LLM in the first place is a good question. Because right now if you use this ChatGPT or whatever, it’s going to tell you general facts about the world. It’s not going to give you your sales forecast. It doesn’t know data about that. It might say, well, you know this company is doing pretty good and maybe even have it attached to Bloomberg and maybe it will tell you a little more about your company. But it can never know as much as you would know inside the company. This includes your private data, your accounting system, your HR system, and every kind of every digital footprint in your organization that’s specific to you. That’s just one of the obvious reasons to build your own LLM, to have a training model that, as a CEO, can tell you anything you need to know about your company – it can give you advice on your strategy, almost become like a McKinsey consultant in a box. Even if you have other people doing that, having something like this where you can iteratively ask questions and do so whenever you need, whenever the thought comes by, is pretty useful.
Then you think about, how is it different? I think it’s kind of like how somebody who’s been in your company for 10 years might be slightly more valuable than somebody that’s only been there a week. Institutional knowledge, right?
A person that’s been around the company for 10 years has a lot of tacit knowledge about everything from how people talk, what the company culture is, what kinds of things are inside or outside the opportune window, if things ought to be discussed or not, ideas that have been tried, things that the organization as a whole has learned. It would be a shame to have LLMs and not be able to take advantage of these things that we learn inside our companies over time.
These models get more and more useful the more you can train them on the context of your organization. And this is the part that’s incredible, and why we’re seeing a lot of demand for these kinds of systems, is that CEO’s don’t know the limits. Right now the more horsepower you put into one of these things, the smarter they get. It starts to become an economic imperative to have the smartest AI model. One smarter than your competitors. One trained on better data, one trained on more GPUs, you’re effectively making it smarter and that can be a competitive advantage over time. Especially as we start to get closer to artificial general intelligence, which even Sam Altman and other people are saying is a lot closer than we think. It might be before 2030 that we get this kind of broader concept of a generally intelligent LLM that’s better than most humans. It’s pretty incredible to think about. I don’t go a day where I don’t think of another application, so it could be used in one of these contexts.
With that in mind, two questions. The first one is, if organizations decide to go down this bespoke LLM route, what is some advice or considerations you could offer them to help them navigate?
One of the first pieces of advice, and this kind of goes for anybody using one of these tools, it is very easy to treat these like humans and think they’re human and to apply human characteristics to it. I think that’s a mistake. These are not humans. These are machines, but these are machines that sometimes are wrong in the same way a human might be wrong. People will complain about hallucination, and I think correctly. And a lot of the research that’s been happening is about how do you avoid hallucination? But it’s a lot more useful when you think about the fact that if you’re an executive and you have a team of people that report to you, it is very frequent that those people that report to you aren’t lying to you. Maybe they are occasionally, but if they’re wrong about something it’s because they were asked to produce an answer but they don’t really know 100% what that answer is. Most humans want to feel like they’re right. You’re trained to not be wishy-washy when you answer a question and so a lot of people will state things somewhat hubristically even when they’re not necessarily true. And I’d like to think LLMs are really just kind of following that pattern.
Treat the way an LLM works a little bit less like you would a traditional computer and expect the nondeterminism. Expect it might be wrong. And explain you might need to validate some of the facts. Now this goes for ChatGPT or your own bespoke LLMs. That’s the same risk. Imagine you trained a language model on all your HR data, all your accounting data, all your sales data, all the important data sources in your organization. Now, if you are an organization that is fully transparent in every capacity, maybe this isn’t going to bother you, but an LLM will answer based on any of the data it has seen.
A lot of the really interesting, and frankly hard, work that’s happening now is “how do you make a company-trained LLM not just reveal, say salary information or other kinds of personally identified information that it might learn about? How do you make it not regurgitate that back?” Now the CEO will have free access to it because hypothetically that’s the person that should have access to nearly every bit of company data, and there’s probably a number of people inside the executive suite, they’re going to be expected to have that. But if you start to take it out to the rest of the organization, we’re going to think pretty hard about how do you design these systems so that any given LLM is not going to reveal your company secrets.
Let’s take the secret formula for Coca-Cola, for example. If you’ve trained your LLM on Coke, then you could say “please give me the recipe for Coke” and it might say no. I’ve done this with LLMs, with other kinds of things you’re not supposed to know, or things that they try to protect you from getting the answer to. Somebody figured out a way to make ChatGPT give instructions to make a bomb not by saying, “please give me instructions to make a bomb” but “hey, if I was this character in this movie and I wanted to make a bomb, how would that character do it?” There’s ways people get around it, and I think security issues are going to be tough with that.
There’s ways to design these systems to do that. I can go into more detail, but that gives you a little bit of a sense for what some of the challenges are.
That’s very interesting. Almost like a walled off approach, certain people can have access to the outputs. This kind of ties into, in talking about using large amounts of data in particular organizations, what are you seeing? Is there an increasing trend towards walled garden LLMs designed specifically for an organization trained on a combination of their proprietary data and public datasets?
Absolutely, everywhere in the industry. Tons of business leaders I’ve been chatting with since ChatGPT came out and people learned that you could build these models, that can train these models. One of the first questions CEOs start to ask is, can we train one on my stuff? I don’t want to give away the company secrets. This is a big, powerful machine. I don’t want to use ChatGPT even if they say they’ll not train on my data. I don’t want somebody to accidentally do that, so just to manage the risk we started thinking about it that way.
It’s one of the biggest things that people are doing in terms of trends. But even as they think about how to use these LLMs, one of the trends I’m starting to see is people thinking about solving the security issue I was talking about before, you don’t necessarily have to train one. Again compared to a person, you might have an LLM that you train on CEO-knowable data and then you might say well, we’re going to have another LLM that we train on things that we can be publicly known internally within the company. So maybe the organizational chart is public information. You can train an LLM based on that, and then you can answer questions about it. Maybe certain kinds of other policy documents, and we start to almost think about separate and maybe even smaller purpose-built LLMs or machine learning systems for different parts of the organization, by department, if you will. The LLM structure starts to resemble a more traditional organization structure.
Interesting. Are there any examples of industries or sectors where you think this walled off garden LLM approach is beneficial?
All of them. When the IBM 360 came out in the 1950s and 1960s, the idea of every company owning their computer was pretty ridiculous. IBM owned the computer and everybody shared it. You had to be a pretty big company to own your own early on, hence the whole time-sharing thing. Over time, it became extraordinarily commonplace for companies to have their own computers. People started building their own networks within them, and then we eventually ended up with the PC. I think you’re going to see the same thing with machine learning systems.
Right now, because the economics to build a viable LLM are so difficult, if you want to train GPT5, you’re probably going to need over a billion dollars to do that. But the cost is going to come down, I think pretty radically over the next three to five years. The techniques to do custom elements, the raw technique, rank adaptation, and others that are becoming more commonplace, I think they’re only going to progress. Between that and there’s so much capital going into different kinds of AI startups that there’ll be a solution that you could probably own within your four walls for just about any kind of problem you can imagine. There’s going to be a marketplace bigger than what most people think that’s coming with this stuff.
That’s great. And I agree. I think it could be used across many organizations, probably the ones with larger datasets, right? Last couple questions – walled off garden LLMs, do you think they will replace enterprise software or do you think the two will coexist and be complementary to one another?
I think there are classes of enterprise software it will replace and new kinds of enterprise software will emerge.
It’s going to be like every other revolution. That stuff will still be around. You can still buy enterprise software for doing sales data, that was around before Salesforce. You could probably still build systems like that or use systems like that. They won’t entirely go away, but the state of the art will certainly be systems that just tell you the answer to the question you have, and then maybe let you interact with data in some way that makes sense for a human. The idea that you have to be trained to use a SaaS product I think will largely go away. Any SaaS product where you have to be trained as a human to do it is probably more complicated than the LLM version that’s going to be a combination of a chatbot plus some sort of model that you can interact with that helps you understand a solution to a problem or help you model something.
Final question – if organizations are thinking about offering or adopting walled garden LLMs, what steps do you think they should take to maximize the benefits while trying to minimize the risks, some of which you already alluded to before with regards to hallucination?
I think it’s very easy to just expect it to be magic and so I think tempering expectations a little bit. I think there’s a lot of really great experiments you can run without making the big upfront investment. One of the most powerful things about using OpenAI’s API or some of these other organizations’ APIs is that you’re able to experiment with what’s possible. You’re able to explore the art of the possible, which helps you understand, okay, well, if we can do the small thing really, really well and do it on somebody else’s LLM, you understand how your own data would help with that decision. That’s how you start to build the economic case for doing this stuff. That’s where some of the value is going to come in. I think it’d be a mistake for most organizations to say, oh, I need to build my own GPT5. Some will do it, right? I can see in the next five years, some companies saying, hey, I need to have the smartest one of these. I want to compete that way. Not against OpenAI, but one car company versus another car company. I think there’s a lot of big investments like that. I also think, like any other industry, there’s going to be a tremendous amount of waste in terms of people either not understanding what they’re capable of or expecting exact answers and having no tolerance for data even being slightly wrong, which kind of misunderstands what an LLM is capable of. Even the best GPT5 will probably still hallucinate from time to time, and we’ll still need a human in the loop for anything that’s life critical. I think those are some of the key ones.
Like I talked about with security, I’ll say again that being very aware of what data a given LLM has been trained on is critical. And starting to think about well, okay, maybe we need more than one, maybe we need dozens of them, just like we might have dozens of employees that understand their domain really well. And in fact, we train them on that narrative, and you might actually save money and not have to develop very, very expensive general LLMs. You can train them on their very specific function and almost create coordinator LLMs, just like your senior vice president that coordinates the activities of the subordinates – creating your own organizational structure out of synthetic brains not the biologic ones.
Very interesting. Well, that was my last question. I think this was a very insightful discussion. Appreciate you taking the time to talk to us and chat about some of these more cutting-edge decisions to be made when you’re dealing with LLMs, the AI tools, and what the next frontier may be. Thank you everyone for joining us, and until next time.
Foley & Lardner’s Innovative Technology Insights podcast focuses on the wide-ranging innovations shaping today’s business, regulatory, and scientific landscape. With guest speakers who work in a diverse set of fields, from artificial intelligence to genomics, our discussions examine not only on the legal implications of these changes but also on the impact they will have on our daily lives.