EPISODE 1693 [INTRO] [0:00:00] ANNOUNCER: One of the most promising applications of large language models is giving non-experts the ability to easily query their own data. A potential positive side effect is reducing the ad hoc data analysis requests that often strain data teams. Sarah Nagy is the co-founder and CEO at Seek, which is using natural language processing to change how teams work with data. She joins the podcast to talk about the platform and providing a natural language interface to databases. This episode of Software Engineering Daily is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him. [EPISODE] [0:00:47] SF: Sarah, welcome to the show. [0:00:48] SN: Thank you so much for having me, Sean. [0:00:50] SF: Yes. Thanks very much for being here. So, we're talking about Seek today and empowering business users to be able to query their data without the need or assistance of a data team. I feel like that's a problem that many companies struggle with. How do you empower the people in the organization to be able to use their data? Data teams end up, I think, a lot of times getting bombarded with requests. People show up for a meeting, someone is presenting data, and maybe someone asks a question that's somewhat outside of what the data shows. Then, a lot of times, if someone doesn't have an answer for it immediately, they have to go off and schedule a follow-up meeting in order to get an answer. That learning cycle really grinds, and the data team gets thrashed answering questions while balancing their other work. So, we've had these different tools over the years to try to solve this problem, like these visual tools like Tableau and Looker, and so on. But why do you think this is still a problem for organizations? Why do they really struggle to actually be able to answer these kinds of data questions without constantly going back to the data team to get answers? [0:01:50] SN: I mean, I think there's a few causes here. The most obvious one is that you cannot ask new questions about data without bothering a human. You just can't. Just think about that really deeply. That's not scalable. What if you have 100,000 people in your business? Imagine the number of data analysts you would need on the data team to support that many new questions. So, I think a lot of organizations have come to accept this fact that, you know what, people just can't ask questions about their data. They're going to have hard-coded information that is built for lots of people to be able to look at. That's just what people get today. So, it's just kind of the status quo that I think everybody's used to, because this used to be impossible until large language models, basically. But I think that's probably the number one issue here. I guess there's other issues too, though. Personally, as a data scientist, I saw this happen even at startups. There would be multiple people on the business side with questions, and sometimes I would get asked a question by someone, and then a colleague on my team would get asked a question by someone else. We would do the same exact work, not knowing that each other was doing that work. When we found out, we were so disappointed. Wow, that was a huge waste of time. Imagine that in very large companies. Actually, the anecdote I'm recalling happened to me at one of the startups that I worked at.
So, I think that's the other problem. It doesn't even have to be new questions. It can just be, how do you get access to a question that somebody already asked? That, in itself, is a powerful scaling factor, and it doesn't even require large language models. It just requires better software. [0:03:45] SF: Yes. I mean, I think that's a super common issue, even outside of data teams: a lot of times, the answer actually exists somewhere. But either the person doesn't know where to go to access it, or it's just easier to ask on Slack or some other channel, "Does anybody know the answer?" versus going and actually seeking the answer through some piece of software. But that creates a lot of thrashing as well as distraction, because constantly people are having to divert from the work that they should be doing to go and answer a question that maybe they've already answered 10 times somewhere else. [0:04:21] SN: Yes. I think that discipline is kind of key here. I have definitely seen this with other types of software. For example, Jira, or Linear, or Shortcut - productivity software like that. Typically, people don't have a choice what ticketing software they use. It's determined by the tech leaders. It's just a convention. Everybody in the organization has to use this software. I do think, on the data side, we see that with BI tools. But there's all this ad hoc stuff happening outside the BI tools, because there's really no product to capture that activity. So, if you are installing a new product, perhaps in addition to BI tools, that's meant for all this activity taking place outside the BI tools, it is important to stress that this is something you need to use if you're a business user. In Seek's case, the more you use it, the better and better it gets, because it learns from your data. So, I think that's one more reason to encourage people to keep using it. But I do think you need to have that discipline of, "Oh, I'm not going to go outside the system. I'm going to use this software, because that's what it's intended for." It's an investment that's going to make this so much more powerful for not just you, but everyone else, not just in the short term, but especially in the long term. [0:05:54] SF: Yes. Let's get into Seek a bit. What is Seek doing to help solve this problem, and what is the product and the vision of what you're trying to accomplish? [0:06:02] SN: Yes. I mean, it kind of helps to talk a bit about my background, just to give some context on why I started Seek. I used to be a data scientist. Before that, I actually started out as an astrophysicist. So, I pivoted into data science and actually started out as a quant on Wall Street doing high-frequency trading. Then, I became more of a data scientist over time in the startup world, and then later, the hedge fund world. I kept coming across the same problem pretty much everywhere I worked. I expected my career path to involve a lot of really compelling research that would provide meaningful outcomes to whatever business I was working at, because I just knew how powerful the data was. I was just so excited to play with the data and find alpha. That's just not really what ended up happening. There was just so much work to be done helping non-technical people just get basic insights from the data. That's what I was doing most of my career.
Having people come to my desk, ask me questions. Sometimes they were kind of funny questions, like, "Hey, Sarah. I want to make this trade." Maybe it was between two different fast food companies or something like that, and looking up market share for cheeseburgers or chicken sandwiches. That was the kind of work I was actually doing day to day, pulling data to help people. I was just wondering, why am I doing this? Why are things so inefficient? It's because of some of the problems we already talked about, and just the fact that the tools the non-technical people had access to couldn't meet the majority of their needs. So, there was all this overflow. In 2020, that's when I first came across large language models that were actually pretty good at writing code. The thing that really stood out to me was, these models could really solve a lot of this problem, because they can understand questions, and anybody can ask a question, and they can write code that can be very similar to the kind of code that a lot of data people might find themselves writing and not really want to write. So, in 2021, that's when I decided to basically quit my job and bet my career on large language models. I ended up starting Seek that September, actually, so prior to ChatGPT. Although, we were always based on large language models, and the goal of Seek, even then, all the way to this day, is to give everyone in a company a natural language interface that can be used instead of bothering the data team. So, you don't want to wait two weeks for someone from the data team. Or you want to ask more than just one question. If it takes two weeks on average to get your question answered, imagine what it would be like for 10 questions. I've been on the other side of that, and things just slipped through the cracks. On average, it's kind of a linear scale. So, yes, that's what Seek does. That's essentially what the product is. [0:09:28] SF: As a non-technical user, what am I doing with Seek? What is the experience like? Am I essentially just saying, "Hey, what is the average income of, I don't know, our customer base?" or something like that, and then it goes and figures out how to convert that into code and execute that code to pull back a response? [0:09:48] SN: Yes. Exactly. We have a few different ways to access Seek. We have a web interface that is not that different a form factor from what people are used to, like ChatGPT. We also have a Slack bot. Some companies we work with live in Slack and they actually don't want to use the web interface. So, they can use the Slack bot instead. Those are kind of the two main ways. We also have an embedded use case as well. We've had quite a few companies that really think our technology is cool and say, "Hey, I have data that I want to make more accessible to my customers," and plug Seek on top of that. That's another use case we have. [0:10:32] SF: What does that pipeline look like from when I put something in, in natural language, whether it's through the Slack bot or through the web interface? Can you walk me through what's happening behind the scenes to turn that into code? Then, how is Seek actually connecting to my data sources in order to be able to go and find the answer? [0:10:54] SN: Yes. Maybe we start with the data warehouse or data lake connection. The way we connect to sources of data isn't really that complicated, to be honest.
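[EDITOR'S NOTE: For readers who want something concrete, here is a minimal sketch of what connecting to a warehouse through a vendor SDK typically looks like, using Snowflake's Python connector. The credentials, table, and query are placeholders; this illustrates the general pattern, not Seek's actual integration code.]

    # Minimal sketch: query a warehouse through its Python SDK.
    # Credentials and the SQL below are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        user="ANALYST",               # placeholder
        password="********",          # placeholder
        account="my_org-my_account",  # placeholder
        warehouse="ANALYTICS_WH",
        database="SALES_DB",
        schema="PUBLIC",
    )
    try:
        cur = conn.cursor()
        # A generated query would be executed here and the rows returned to the user.
        cur.execute("SELECT region, AVG(order_total) FROM orders GROUP BY region")
        for region, avg_total in cur.fetchall():
            print(region, avg_total)
    finally:
        conn.close()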
For the most part, we just use the SDK. We do have a partnership with Snowflake, however, where we use Snowpark Container Services. That's a new product that Snowflake is coming out with. It means apps can actually live inside of the Snowflake cloud. So, some of our customers are working with versions of Seek that just live in their Snowflake cloud, which is pretty cool. That's a slightly different type of connection. Something that we've also heard pretty good feedback about is that we can connect to multiple sources of - I know it's an outdated term, but big data. There's no updated term for big data. I get made fun of, but I keep saying it until I can find a better term. Oftentimes, enterprises don't have all their data in just one place like they should, theoretically. So, we do try to make it easy to just have Seek sit on top of multiple databases, whatever you have. That's something that I want to point out as well. I can also talk about your other question, which was, how exactly does it work? How does Seek, seek? [0:12:18] SF: Yes. [0:12:19] SN: So, Seek is powered by our production model called SEEKER-1. SEEKER-1 is a combination of actually about a dozen different types of models. We take a systems engineering approach to ML in that way. We've never sought out to train just one LLM to do everything. There's a lot of talk right now about agents. We've always kind of been an agent, just by definition, in that we take sequential steps. We're looking at all the data in the database, which could be a lot of data, so you need systems to be able to analyze all that data and make recommendations about what to actually query with the LLM. You need the LLM piece as well. That's what makes it all possible. Actually, we have multiple LLMs throughout different parts of the process. We have LLMs that write code, but we have other LLMs that do other kinds of things as well, even assess the viability of the code, and how likely it is to actually be acceptable to a non-technical person. Then, yes, as part of the confidence scores feature, we have things like classifiers, and we have a whole human feedback component as well, to help the system get smarter over time. Part of the system also has a patent pending for some of the human feedback piece. [0:13:51] SF: So, with this multi-model approach, some of the value there is essentially specialization. You're not going to use a large language model for every single part of that, and sometimes you might want to use a large language model that has been essentially fine-tuned or augmented in some way to solve a specific problem. But then you're also going to use other types of AI models, maybe a traditional classifier or something like that, as part of the entire ML pipeline, essentially, to solve these really specific problems so you get a really high sort of confidence level. [0:14:21] SN: Yes. I mean, that's exactly the approach that we take. Some of the pieces of the system are quite simple on their own - the information retrieval, for example. But you do need to put them together in a very elegant way. We can see that with other systems that take this approach. Have you heard of Devin? [0:14:38] SF: No, I haven't. [0:14:40] SN: Oh, let me put you on. Devin is, I don't know, it's kind of like Seek, but for software engineering. It went viral a couple of weeks ago, and it's a very similar approach. It's an agent that can take sequential steps to solve software engineering tasks. So, it is kind of cool to see this new crop of systems that use LLMs. In our case, we do have a proprietary LLM that we trained. But it's not the make or break of the product. It's really the way we designed the system.
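[EDITOR'S NOTE: To make the "system of models" idea concrete, here is a toy sketch of a pipeline like the one Sarah describes: an information-retrieval stage narrows the schema, a code-writing model drafts SQL, a reviewing model assesses viability, and a classifier attaches a confidence score. Every function here is a simplified stand-in, not Seek's implementation.]

    # Toy sketch of a multi-stage "system of models" pipeline. Each stage would
    # be a separate model in a real system; these are simplified stand-ins.
    CATALOG = {
        "orders": ["order_id", "customer_id", "order_total", "region", "created_at"],
        "employees": ["employee_id", "salary", "hire_date", "term_date"],
    }

    def retrieve_relevant_tables(question):
        # Stage 1, information retrieval: narrow a big catalog down to the few
        # tables that plausibly answer the question (real systems use embeddings).
        words = set(question.lower().replace("?", "").split())
        return {
            table: cols for table, cols in CATALOG.items()
            if words & {c.split("_")[0] for c in cols}
        }

    def draft_sql(question, tables):
        # Stage 2, code-writing LLM: a model call would go here; we hard-code
        # a plausible draft so the pipeline runs end to end.
        table = next(iter(tables))
        return f"SELECT region, AVG(order_total) FROM {table} GROUP BY region"

    def assess_viability(question, sql):
        # Stage 3, reviewing LLM: a second model judges whether the draft
        # would actually be acceptable to a non-technical user.
        return sql.upper().startswith("SELECT")

    def confidence(question, sql):
        # Stage 4, classifier trained on human feedback: score the answer so
        # low-confidence results can be flagged or withheld.
        return 0.9 if "group by" in sql.lower() else 0.5

    question = "What is the average order total by region?"
    tables = retrieve_relevant_tables(question)
    sql = draft_sql(question, tables)
    if assess_viability(question, sql):
        print(sql)
        print("confidence:", confidence(question, sql))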
[0:15:13] SF: Yes. In some ways, the secret sauce is not necessarily one specific thing, but the aggregate of putting all these things together in a way that helps solve this use case. [0:15:25] SN: Exactly. We haven't even touched on the user experience piece of it. A lot of people haven't even used ChatGPT. So, if you want them to use Seek, you need to have a really solid user experience, and customer service, to be able to support people. That's actually a really overlooked piece of all of this that's just super, super important. [0:15:50] SF: Yes. One of the things that I've been thinking about recently, as more and more of these kind of Copilot experiences crop up, is whether a freeform text interface is always the best user experience for someone, because you're giving them essentially infinite choice about what they can do. Some of the advantage of a more traditional graphical user interface is you can really force someone down a path. It limits the experience that they have, which can be a problem sometimes. But it also means that they're less likely to step on a landmine and end up with an experience that's not what they want. Here, if you give someone access to just freeform text, to ask anything they want about the data, they might not even know what a reasonable question to ask is. How do you help them get to a place where they're asking things where they can actually get responses that make sense? [0:16:40] SN: Yes. Exactly. If you don't know anything about the data, which most non-technical people don't, how can you expect to come up with useful questions about the data? Actually, it's helpful to think about where questions usually come from. Usually, it is in response to some other data that they're looking at. Because of that, we do have places inside of Seek where, for example, say you have a great chat with Seek, and you want to save it for future use and be able to revisit it, even the next day. We just realized, "Hey, you know what, let's just put a button that will let you save it, and even display it in a nicer way, rather than you scrolling through a chat every day." Having that kind of functionality in the platform makes it so that, as you're developing a literacy for the data and getting more exposure to the parts of the data that you care about, you're going to have more questions, and more intelligent questions, that you can then take into the chat. [0:17:52] SF: Yes. I think that in comparison to something like just using ChatGPT, which is a very general tool - sure, I could probably dump some data into it and have it analyze it. Of course, there's potential risks there. If it's my company data, I might not want to do that. But the other thing, too, is that it's going to generate some sort of response. So, maybe I'm going to have to essentially translate that response into some sort of output that is useful in the business context, whereas Seek is a more specialized tool solving a specific problem. I would think, and please let me know, from a user experience standpoint -
if I'm generating some graphic, maybe I want to put that into a report or a slideshow. You could have those natural integration points, because this is essentially a tool for business. [0:18:38] SN: Yes, totally. We've gotten a lot of requests from customers for things like being able to export the data in all the ways they need, to be compatible with Excel, Google Sheets, PowerPoint, and Google Slides. Those were very easy things for us to build. But until we built them, people were constantly asking about it. [0:19:02] SF: Yes. If they're not there, then sure, people can do it, but it adds additional friction. It's not as seamless an experience, essentially. Going back to the code generation part of it, do you think the bar is higher for code generation than it would be for something like natural language? Because if an LLM spits out something, or even if I write something in English but miss a word, you can still figure out what the intention probably is. But if I miss an instruction in generated code, then it just doesn't work. So, is that something that you've had to develop specialized technology to handle, and how do you think through generating code that's immediately runnable versus something that maybe has mistakes within it? [0:19:50] SN: Yes. I definitely think the tolerance for inaccuracy is zero in our world. The reason for that - I mean, even when I was starting to build Seek, I knew this was the case, because I was just like, "Okay, if I was at a hedge fund, giving insights to business users, people were making trades on that. There's no room for error whatsoever." So, I realized that as I was starting Seek in 2021. Even from the beginning, we had guardrails around the product. The patent pending aspect I mentioned earlier, this is a big piece of it, is protecting against hallucinations. I also think, going back to the agent discussion that's happening right now, agents can actually be kind of helpful for checking work, or you can have different types of agents to prevent hallucinations of different types. It doesn't have to just be one LLM that spits something out and that's it. It's funny, I was thinking about my own process as a data analyst and a data scientist. I was like, "How did I make sure that my answers were correct?" Or even as a student in high school, how did I check my work? It's really a series of sequential steps, and there's no reason an LLM can't do that kind of work if you build that kind of repetition, or those kinds of loops. Long story short, yes. I mean, the only other thing I would add, that maybe I didn't realize, is that unstructured data is not the walk in the park that maybe everybody thinks it is. Getting unstructured data questions wrong is equally bad. It's solvable in some ways, because you can present the user with the source directly, so they can just look at it themselves. But what if you didn't retrieve the source correctly? Then you're still getting answers wrong. I wouldn't say unstructured data is a piece of cake, either. To be honest, I kind of naively thought it was much easier than working with structured data. But it turns out, there's plenty of challenges there as well.
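[EDITOR'S NOTE: One simple form of the "check your work in sequential steps" loop Sarah describes is validating generated SQL against the database engine itself before showing anything to a user, feeding errors back on retries. A hedged sketch using SQLite's EXPLAIN; generate_sql is a stand-in for an LLM call, with a deliberately broken first draft.]

    # Sketch of a guardrail loop: never show a user SQL that the database
    # itself cannot even parse and plan. generate_sql stands in for an LLM.
    import sqlite3

    def generate_sql(question, previous_error=None):
        # A real system would prompt an LLM here, including the error message
        # on retries so the model can correct itself.
        if previous_error is None:
            return "SELECT AVG(price FROM products"   # deliberately broken draft
        return "SELECT AVG(price) FROM products"

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (price REAL)")

    sql, last_error = None, None
    for attempt in range(3):
        sql = generate_sql("What is the average price?", last_error)
        try:
            conn.execute("EXPLAIN " + sql)   # plan the query without running it
            break                            # parses and plans: safe to execute
        except sqlite3.Error as e:
            last_error = str(e)              # feed the error back and retry
    else:
        sql = None                           # give up rather than guess

    print("validated SQL:", sql)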
[0:22:12] SF: Yes. I mean, working with unstructured data is a largely unsolved problem. Companies essentially have been sitting on a mountain of this stuff forever. Even with data lakes and so forth coming in the last decade, it takes so much manual work to get up and running, and so much maintenance, that a lot of that data just goes unused. I think that LLMs are helping unlock some of this, but there's still an endless amount of problems beyond this to solve to do it at scale and do it accurately as well. From the code generation point of view, you end up having to essentially translate a natural language query into multiple types of runnable code, depending on what the data source is. So, maybe some of the time it's running a SQL query behind the scenes, but maybe some other type of data source requires a different technique to pull the data, and then you have to aggregate those things together. Can you talk a little bit about how some of that works? [0:23:06] SN: Yes. In terms of the language that we work with, our focus definitely started as SQL. The reason for that was, when I was thinking about what's the very manual, repetitive work that I'm doing that I want to automate, it wasn't so much the Python stuff. A lot of it was just the SQL stuff. So, that came first. The reason I found it so repetitive is because, from my perspective, I found it to be an easier language than Python. I think that's up for debate, knowing what I know now. But at the time, the SQL code just felt like so much more of a grind than Python, because you can do machine learning stuff in Python. That's how I decided to start with SQL. Obviously, you can't solve every problem with SQL. We are going to be supporting other languages like Python, but SQL was what we felt would solve a lot of problems just in and of itself. [0:24:02] SF: Right. Makes sense. Then, how do you handle ambiguous situations? Let's say I'm asking for, I don't know, the average price of something. In theory, that could mean I want to compute it, like run an average function over a table column. But somebody might also have modeled the data as already computing the average, where I have an average price column. Or even something like, pull all data from Vancouver, but that could be Vancouver, Washington, or Vancouver, BC, and a human working as a data scientist within that company would probably understand, based on the nature of the business, what the intention exactly was. But how does the model figure that out? Or maybe it doesn't? Does it warrant, essentially, follow-up questions? How does that work? [0:24:49] SN: Yes. Disambiguation is definitely something that is super helpful in situations like that. But I think it also depends on what human analysts do. If it's a situation where there's a meaningful difference between all these subtle examples, and a human data analyst should know how to handle it, you can just train Seek to also know how to handle it. Because Seek does adapt to each company's unique style of writing code, through our training process and our feedback process from usage, it's going to pick up on those subtleties from the training anyway. The disambiguation can help early on in the process. But yes, our philosophy is, if a human can distinguish between these things, there's really nothing stopping you from training Seek to also be able to emulate that. [0:25:44] SF: Okay.
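[EDITOR'S NOTE: A minimal, hypothetical illustration of how company-specific conventions can be taught to a model: curated question/SQL pairs are placed in the prompt so that ambiguous terms like "Vancouver" or "average price" resolve the way this particular company's analysts would resolve them. The examples and prompt format are illustrative only, not Seek's training process.]

    # Hypothetical: resolve ambiguity the way this company's analysts would,
    # via curated few-shot examples included in the generation prompt.
    COMPANY_EXAMPLES = [
        # Here, "Vancouver" always means Vancouver, BC, and "average price"
        # means the precomputed avg_price column, not AVG(price).
        ("Show sales in Vancouver",
         "SELECT * FROM sales WHERE city = 'Vancouver' AND region = 'BC'"),
        ("What is the average price of widgets?",
         "SELECT avg_price FROM product_stats WHERE product = 'widgets'"),
    ]

    def build_prompt(question):
        shots = "\n\n".join(f"Q: {q}\nSQL: {sql}" for q, sql in COMPANY_EXAMPLES)
        return (
            "Write SQL in this company's style, following these examples.\n\n"
            f"{shots}\n\nQ: {question}\nSQL:"
        )

    print(build_prompt("What was the average price in Vancouver last month?"))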
You can do some personalization or training at the company level. Can you also personalize to the individual, so that Seek can understand how I ask questions, or what kind of answers I need, versus someone else? [0:25:57] SN: You can. Yes. I think what's more common is people will group different types of users together, although you could do this with an individual as well. Say you have a bunch of salespeople, and then you have a bunch of HR people. There are a lot of reasons you want to keep those groups separate. One of them is just governance. You do not want the salespeople seeing the HR data. But another one is metrics. Salespeople are going to define churn in a much different way than HR people are going to define churn. One is revenue churn. The other is employee churn. You could have people asking the same exact question, but it's referring to totally different metrics. So, we've solved that problem partially just with governance - the ability to partition different types of data for different types of users - and that solves a lot of those issues. [0:26:53] SF: This way, the HR department maybe is able to get access to employee information. But me, as an employee in, I don't know, customer service, I'm not able to just pull up my colleague's salary or termination date or something like that. [0:27:08] SN: Yes. We get asked this question a lot, about HR data especially. People ask that all the time. "How do I know my HR data is protected?" I'm like, "Wow. You really thought we didn't think of that? That was one of the first things we thought of, because, yes, you have to protect that sensitive data from all the other user groups." [0:27:28] SF: Well, to be fair, to give those people some credit, I have seen people consistently miss this point when it comes to LLMs and generative AI. A lot of times, they think about protecting the perimeter of the model by using something like a private LLM instance. But it's not necessarily about outside forces. Can you trust that everybody within, who does have access, should have the same level of access? [0:27:50] SN: Yes. I think, within the modern data stack community, there are just certain best practices that people are used to, like governance, like metrics, and we want to bring as much of that stuff into Seek, because there's a reason they're best practices. So, yes. [0:28:09] SF: Then, in terms of the deployment models, what are the options as a business to run this? Is this something I'm running within my cloud data centers? Is it more of a managed service? How does that work? [0:28:23] SN: We have both, actually. We started with a managed service. That's just pretty easy to use, because you just get it from the browser. We did start to expand into more custom types of deployments, mostly things like in-VPC, or Snowpark Container Services, as we started working with more and more enterprises. Right now, we actually try to encourage enterprises, if they're on Snowflake, to go with the Snowpark Container Services route. A lot of people are excited about that right now, and it's just so much easier than going through procurement, going through InfoSec. So, that's been a popular choice for us. But yes, just like any other type of enterprise software, I think a lot of enterprises can't do a managed service. [0:29:10] SF: Yes.
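[EDITOR'S NOTE: A toy sketch of the governance idea discussed a few minutes earlier: partition which data and which metric definitions each user group can reach, so "churn" resolves to revenue churn for sales and employee churn for HR, and HR tables are simply invisible to everyone else. All names and policies here are hypothetical.]

    # Hypothetical governance layer: the same question, asked by different
    # groups, resolves against different schemas and metric definitions.
    POLICIES = {
        "sales": {
            "schemas": ["sales_db"],
            "metrics": {"churn": "SELECT revenue_churn_rate FROM sales_db.kpis"},
        },
        "hr": {
            "schemas": ["hr_db"],
            "metrics": {"churn": "SELECT employee_churn_rate FROM hr_db.kpis"},
        },
    }

    def resolve_metric(user_group, metric):
        policy = POLICIES.get(user_group)
        if policy is None or metric not in policy["metrics"]:
            # Data outside the group's partition is simply not queryable.
            raise PermissionError(f"{user_group} cannot query {metric!r}")
        return policy["metrics"][metric]

    print(resolve_metric("sales", "churn"))  # revenue churn
    print(resolve_metric("hr", "churn"))     # employee churn
    # resolve_metric("sales", "salary")      # would raise PermissionError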
For those listening that aren't familiar with Snowpark Container Services, I think it was launched at Snowflake Summit a year ago in private preview, and is now, I think, in public beta. But essentially, you can run a Docker container on a managed Kubernetes within someone's Snowflake account. In terms of the investment there, why invest in Snowpark Container Services, where someone's deploying that to run within their environment, versus something like a native app, where someone's pulling that in from the marketplace and it offers a similar level of integration? [0:29:41] SN: Oh, well, we actually do both. Snowpark Container Services and native apps are meant to solve this one big problem of installing an app with a click of a button inside of Snowflake. Native apps used to be, I think you could call it, a little more lightweight before Snowflake introduced Snowpark Container Services. So, a sophisticated product like Seek would have had a really hard time being a native app last year. But with SPCS, all of a sudden, it's possible. We're even deploying private models in SPCS for some of our clients. It's an end-to-end private version of Seek. Whereas for others, we can plug into whatever LLM they would want us to plug into. But that's the level of sophistication that SPCS allows us. [0:30:31] SF: Yes, that makes sense. Then, in terms of actually learning the data model, or the schema, if we're talking about some sort of structured data source like a database - how does Seek go about actually learning it? I'm assuming I'm doing some configuration. I'm pointing it to these different data sources. But how does it crawl, or essentially analyze, the schema to understand how to actually run queries against it? A lot of times, people don't do a great job of using the metadata available in the database to even explain what a column is. So, how does it figure out that this column means this thing, in order to be able to translate a natural language query into some sort of runnable query that's going to pull the data correctly? [0:31:14] SN: Yes. I mean, this is really - text-to-data is a very hard problem. I think doing this correctly is still an area of research that's getting better and better. But we've been building these models for, I think, over two years. Not, I think. I mean, I know. We've just learned so many nuances along the way of how to properly connect business context to the actual data. There's a lot of things that we thought of that we might tell you, and then you'd think, "Oh, that seems so obvious." But most people probably wouldn't necessarily think of them until they were told. We just learned them through trial and error. I will say, what's kind of interesting is we do have some customers that didn't have metadata when they started working with us. It's not that uncommon. We wanted to make sure that we were able to support customers that didn't have perfectly clean data, perfectly labeled data. No company has perfectly labeled data. That's going to be a very, very small TAM. If we had gone the route of requiring perfect data, that would have been like one in a thousand companies. We decided to build something more flexible. I have this quote I haven't used in a while, but I used to say this all the time. Gordon Ramsay has this phrase that I really like, "Let the knife do the work for you." Meaning, yes, don't stress out your hands and your arms.
Just know how to properly use the knife. We say the same thing about deep learning: let the deep learning do the work for you. Let it just learn, on its own, from what your data team is actually doing day to day, to train the model. Believe it or not, that can be just as powerful as a super well-built-out knowledge graph. I will point out, we do work with knowledge graphs, too. We do work with companies that have that perfect data. They're on the dbt semantic layer. They have state-of-the-art knowledge graphs. Obviously, that's very, very well primed for large language models. But those are the super early adopters, and it's going to take a long time for everybody else to get there. [0:33:34] SF: Okay. Then, I know you're doing a lot more than just SQL, but specifically on that, people have been trying to kill SQL for 40 years. There's been domain-specific languages. A lot of people have tried to bury it, and it keeps coming back. It doesn't seem to be going anywhere. But do you think this sort of approach, where we can make it so that people can actually ask natural language queries of their data sources, means that English, or whatever language you speak, will end up being the SQL killer? [0:34:10] SN: I think it's inevitable that there's going to be a world someday where a lot fewer people are writing SQL code manually. There are definitely also new languages coming out, like you said, that are trying to kill SQL or even Python. I'm not necessarily saying that I think SQL will prevail. I think there could be some of these new languages that end up flourishing. Definitely, in our longer-term roadmap, we are accounting for that. So yes, in the longer term, I wouldn't be surprised if a new language were to kill SQL, or even merge SQL- and Python-type work together in the data community. In the short term, I do think that a lot of SQL is going to get automated by large language models. There's just no reason to write this code out by hand, as long as the models are doing their job - and really, the systems of models are doing their job - and getting questions right. As long as it's accurate, the system will start to replace that type of manual code, so that data teams can do the stuff that they hoped to be doing when they started out. [0:35:23] SF: Do you think in five years there'll be fewer people writing SQL day-to-day than there are today? Or do you think it'll be the same or more? [0:35:31] SN: Okay. I think it's a good question, because the flip side could be, "Well, couldn't AI empower more types of people to learn SQL or even learn Python?" That's definitely an interesting hypothesis. From my experience - unfortunately, I wish the world weren't this way, but I've just found it to be this way - there are the types of people that don't want to write any code or work with any data. Then, there are the people that do. And there are people in between, but they're very few and far between. Why is that? I think it's because data people exist for a reason. Their job is to be the expert on the data. It's really hard to be an expert on data. Data is so complicated. Even one little mistake can change an entire insight. So, to put that burden on a non-technical person, I think, is sometimes a recipe for disaster, depending on the use case. Sometimes you can have a hybrid type of user be successful.
But I think, for the most part, it's less risky just to keep the non-technical people from having to worry about it, and let the AI be the expert, instead of them always asking the human experts. [0:36:58] SF: Okay. Then, I guess, to that point, what does the world look like if Seek is wildly successful and you reach your goals for the vision of the company? [0:37:08] SN: I mean, I would love to see pretty much every data analyst, data scientist, analytics engineer, data engineer - all the data roles. Every data person I meet, more often than not, they're usually really, really smart people that come from some sort of interesting background, like STEM or some sort of engineering background. There's always a story of why they decided to go into data. I always see a lot of myself when I talk with them, because I have that kind of story too. I would just love to see these really smart people get the chance to fly. They can even use a copilot. Seek can be used as a copilot. It's not the super value-add of the product, but it can be, and there are other copilots out there too. With the help of Seek taking stuff off their plate from the business users, and all these awesome copilots that are coming out, think of the amount of questions they can answer that are their own questions. It's not about the business people being able to ask more questions. Now, it's about, if you're a data person, you have your own questions that you haven't been able to answer, and now that's back in your control. That's what's going to unlock so much value for businesses - people answering their own questions. [0:38:31] SF: Do you think that it could potentially change the nature of the skill set for being a data analyst, where maybe it's less about your ability to write great SQL to pull data, and it becomes more about how good you are at articulating new types of questions, to seek answers that maybe were really difficult to even think about before? It's more out-of-the-box thinking? [0:38:58] SN: Yes. I do think that, if we're talking about the data people right now, sometimes you do want to think in code. Maybe you don't always want a natural language interface, actually. Maybe you just want to code and get to that flow state where you're exploring. I think that's pretty reasonable to assume. Now, if you're on the business side, it's not like that. You do think in language, and Seek can kind of be your mouse and keyboard, so to speak, to finally navigate all this data. But yes, if you're a coder, you have the freedom to use both. I still code, actually, to this day, and I write a lot of code. But I use three or four different copilots as well. I mean, that's my own experience. [0:39:45] SF: Awesome. Well, as we start to wrap up, is there anything else you'd like to share? [0:39:50] SN: Well, just to give a shameless plug here. If anybody's interested in contacting me or my team, we're happy to show you a demo. It's really easy to get in touch with me. Just go to seek.ai. You can find me on LinkedIn. You can find me on Twitter. Just Sarah Nagy, N-A-G-Y. Hit me up. [0:40:11] SF: Awesome. Well, Sarah, thank you so much for being here and sharing a little bit about what you're doing at Seek and some of the things that are going on behind the scenes.
I think it's a really interesting problem to be working on, and I really enjoyed our conversation. Cheers. [0:40:26] SN: Yes. This was such a great conversation. Again, thank you so much. [END]