Developer Advocate Background and AWS Open Source Fork OpenSearch - Percona Podcast 37

by Kyle Davis, Matt Yonkovit

Link to listen and subscribe: PodBean

The Percona HOSS, Matt Yonkovit, sits down with Kyle Davis, Senior Developer Advocate at AWS, to talk about his background as a Developer Advocate before diving into the details of AWS’s new open source fork, OpenSearch. Kyle covers the history of OpenSearch, the license, challenges, and roadmap, and fills us in on how to use and contribute to OpenSearch. If you did not see it, Kyle recently and generously donated his knowledge by handling an entire track at Percona Live focused on OpenSearch. For a deep dive on how to use and deploy OpenSearch, check out all his sessions, which are available for free on percona.community or on our YouTube channel.

YouTube

Link: https://youtu.be/0SqS0tt6YhI

Kyle Davis

AWS, Senior Developer Advocate

Kyle Davis is the Senior Developer Advocate with Open Distro for Elasticsearch at AWS. While a relative newcomer to AWS, Kyle has a long history with software development and databases. When not working, Kyle enjoys 3D printing and getting his hands dirty in his Edmonton, Alberta-based home garden.

Matt Yonkovit

The HOSS, Percona

Matt is currently working as the Head of Open Source Strategy (HOSS) for Percona, a leader in open source database software and services. He has over 15 years of experience in the open source industry, including over 10 years of executive-level experience leading open source teams. Matt’s experience merges the technical and business aspects of the open source database experience with a passion for hands-on development, management, and the leadership of building strong teams. During his time he has created or managed business units responsible for service delivery (consulting, support, and managed services), customer success, product management, marketing, and operations. He currently leads Percona’s OSPO, community, and developer relations efforts. He hosts the HOSS Talks FOSS podcast, writes regularly, and shares his MySQL and PostgreSQL knowledge as often as possible.

Transcript

Matt Yonkovit: Hi, everybody. Welcome to another episode of The HOSS Talks FOSS. I’m Matt Yonkovit, the HOSS here at Percona. And I’m here with Kyle Davis. Kyle, how are you doing today?

Kyle Davis: I am great. Thanks for asking.

Matt Yonkovit: Great. So if you don’t know, Kyle works for AWS, and his project, his baby, is the OpenSearch project, which is the fork of Elasticsearch that just recently came out and recently went GA. So we’re glad to have Kyle here. If you hadn’t seen, Kyle also did probably the longest marathon session at Percona Live that we had. I think he had, what, six sessions in a row, just on OpenSearch. So, you know, a big deep dive there, which is awesome. But Kyle, thank you for joining us. I wanted to talk to you a little bit about your background, because I’ve talked to a lot of different people over the last few months, and I don’t always talk to someone in a dev advocacy role. This is a very popular position that I’m starting to see crop up in lots of different organizations. So maybe talk a little bit about your background and give us, you know, your journey to how you became a dev advocate at AWS and what that means.

Kyle Davis: Yeah, sure. Well, thanks for having me. This is something I’m pretty passionate about. I’m one of those people who probably should have been riding my bike when I was a kid, and instead I was writing software. I have a big passion for that. Before I was in developer advocacy, I had a career diversion into higher education, so I worked at universities for about a decade. So for me, it’s all about making sure that I can talk and share all sorts of information about developers and developer activities with people who are trying to learn this new set of software, or trying to get involved with it in some way or another. As for my background before this, I was a full stack developer, and I ran my own company for a while. Then I started using this software called Redis, and I worked for Redis Labs for a long while, in the developer advocacy division as well. So for me, I’m a database guy. You know, that’s where I find a lot of my passion, and, frankly, the neatest stuff to work with is in databases. So I like to be able to share my passion with other folks. For this open source project, I was actually hired for Open Distro for Elasticsearch, which is kind of the predecessor to OpenSearch. I took this role because I wanted to work more with open source. I’ve worked with open source projects throughout my career, but I really wanted the experience of working with a really interesting thing and trying to get contributors and that sort of thing. So in my role as a developer advocate, I have kind of a split responsibility. One is to make sure people who are interested in OpenSearch get what they need: how do they get the resources they need, is the product delivering what they need? So I can work as that bridge between the developers who are using it and those who want to develop it. And then my other responsibility is about contributors and getting people to contribute to the software as well, so giving people the resources that they might need to get started.

Matt Yonkovit: So yeah, it’s that dual path. You’ve got the feedback loop that you’re trying to get. I gotta be honest, with every open source project I’ve ever worked on, or any product, whether it’s open source or not, there’s always that feedback problem. It’s like, I want more feedback, I can’t get the feedback I need from the users. And it seems like that is a missing component at a lot of software companies and in a lot of products, because you want to make sure that you’re developing a product for the users that are going to use it. A lot of times, you end up having two or three very vocal users who end up driving most of the roadmap, which isn’t good either. So it’s about building out those relationships, so you can listen and hear. And it’s interesting, because I’ve seen this dev advocate or evangelist role take on different forms in many different companies. You mentioned two of the big ones, which are feedback and contributions. But a lot of companies don’t necessarily even want contributions, so this ends up being a lot of evangelism as well, where you’re out there at Percona Live, for instance, and you’re telling people how to use it and how to install it. So you’ve kind of got these three prongs: you’ve got the feedback, you’ve got the contributions, and you’ve got that evangelism, the look at all of the cool things you can do, let me show you, let me teach you. And that sounds like it’s kind of your background, having come from university, having a passion to teach people how to do something more interesting. For people who are trying to get into a dev advocate role, what advice would you have for them?

Kyle Davis: Gosh, right now, it’s great. I’m part of a DevRel Slack called the DevRel Collective. When I joined, which was a few years ago, there were probably like 500 people; it now has over 2,000 people. Yeah, a really active community. And, as you indicated, there are lots of organizations that are hiring for more developer advocates. So if you’re interested in this, basically the ingredients to be a developer advocate are: you want to know yourself, right; you need to be able to learn quickly, because you may get a job working on something that you’re not fully familiar with, right. Maybe you’re relatively new to it, so you have to dive in and be able to get there. And then you need to be open to talking to people. It’s really a people-person type of role that has technology as its background, right. So thinking about communication with developers and being empathetic to what they need are key components of being a developer advocate.

Matt Yonkovit: Now, it’s interesting. Developer advocate sounds like it’s all developer focused. But in a lot of cases, it ends up reaching more than just developers, because you’ve got your infrastructure people, you’ve got your DBAs, you’ve got your sysadmins, you’ve got your DevOps engineers, you’ve got all of these different players in the space. I think a lot of companies are looking at how to reach those developers, but a lot of what we do ends up reaching not only developers but those infrastructure folks as well.

Kyle Davis: That’s a great point. The interesting thing is that there’s not a good term that encompasses everybody. We’ve seen organisations jump through hoops, trying to say technical evangelism or something along those lines, to encompass people who operate software, people who are DBAs, people who are, name these things. And so, as a field, developer advocates generally just use the word developer to mean not only people who write software, which is probably the best definition of developers, but also people who run the software and people who enable others to run the software on the infrastructure side. So it’s kind of a catch-all, for sure. With OpenSearch, for example, we have people who are writing Python, or Java, or whatever, every day, but we also have a lot of folks that are more on the infrastructure side, doing monitoring and log analytics and trying to make sure that their services are staying up. So we have to make sure that we’re addressing everybody who would be using the software, and so often the software addresses more than one group of people. So it’s a challenge. And I think part of that is that there is not a good word for it.

Matt Yonkovit: Yeah, well, I come from the database space as well. I’ve been in the database space now for 20-plus years. And I’ll be honest, my experience, and this is me talking, is that developers don’t like databases. Right? It’s like the necessary evil for their application. Which is great, because a lot of the new technologies that are coming out, whether it’s Database as a Service or whatever, enable developers to move quickly: click the button, and it’s up and running. But that shifts a lot of the focus for people who are trying to do that evangelism and trying to get people excited, or show them what they can do. Because a lot of the features are ones where, yeah, they’re totally needed, security’s totally needed, but a developer’s mindset doesn’t always go to those security or stability or scalability issues. They’re thinking features. So, and I’m picking your brain here, how do you reach developers who might not necessarily care as much about the database internals or the functionality? That’s a challenge that I’ve seen many companies across the space run into.

Kyle Davis: Oh, yeah, it’s a challenge for sure. The one thing I think is interesting about it is when you get down to the basics of it, right? In my mind, databases and, you know, programming languages are really two sides of the same coin. And when you start exposing developers to that, it makes a lot of sense. I mean, if you look at SQL, right, it derives from Ada. It has the same roots as Ada, right. I learned Ada early in my career, which is a bizarre language to learn. And when I figured that out, I was like, oh my gosh, the world opened up to me. So I try to bring that along for a lot of folks, to show people that a database is just a really specialized, domain-specific language that will allow you to do really powerful things that you didn’t think were possible, right? So often people will work around and do everything possible to avoid using some feature in the database. And when you expose them to some of the features they can use, it’s like, everything that I’ve done so far has been too much, and I can just use this one little feature that will really make things a lot easier on myself. But it’s also a challenge, because there are DBAs out there, and I love DBAs, but there are DBAs out there that want people to stay one metre away from their database, right? For stability, or lots of other reasons too.

Matt Yonkovit: Yeah, no, it’s so interesting. You know, like, you talked about, like sometimes people avoid using features, because they want to prevent locking or whatever reason, they’re like, Oh, I’m not gonna use this database feature. I had a company that I worked for 10 years ago, when I was doing consulting. And they actually didn’t want to use database joins, in case they wanted to change the backend. So they would basically read two tables, and then join it in Java.

Kyle Davis: Yep. 100%.
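
The pattern described in this exchange can be made concrete with a minimal sketch in Python, using the standard library's `sqlite3` module. The `customers` and `orders` tables and their rows are made up for illustration and are not from the conversation. The sketch produces the same result twice: once by reading two tables and joining them in application code (the approach Matt describes, shown here in Python rather than Java), and once by letting the database perform the join.

```python
import sqlite3


def build_db():
    """Create an in-memory database with two small, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0);
    """)
    return conn


def join_in_application(conn):
    """Read both tables in full, then match rows by hand in application code."""
    customers = dict(conn.execute("SELECT id, name FROM customers"))
    orders = conn.execute(
        "SELECT id, customer_id, total FROM orders ORDER BY id"
    ).fetchall()
    return [(customers[cid], total) for _oid, cid, total in orders if cid in customers]


def join_in_database(conn):
    """Let the database do the matching with a single JOIN."""
    return conn.execute("""
        SELECT c.name, o.total
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        ORDER BY o.id
    """).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(join_in_application(conn))
    print(join_in_database(conn))
```

Both functions return the same rows. The application-side version has to pull every row of both tables across the connection first, which is the cost that the database join avoids.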

Matt Yonkovit: Why would you do that? Yeah, it’s weird stuff like that that still happens. But thanks for telling us a little bit about the DevRel background. Now, coming into AWS, you mentioned you’ve been there for a bit of time. You started out coming in to work on getting code contributions for Open Distro, and then things kind of changed, right? It changed to this fork called OpenSearch. So for those listening who might not have seen your sessions, maybe just give us a high-level view of what OpenSearch is?

Kyle Davis: Yeah, I think it’s important to understand the context. So in the beginning, there was Elasticsearch, right, and Kibana. And there were multiple different ways you could get them, right: there was an open source version, and then there was the version that had proprietary code that you couldn’t use if you, for example, had an organizational prohibition on using certain licences, right. Some of it might have been free, some of it might have been paid for, so on and so forth. And that worked fine for a lot of folks. Basically, AWS said, our users are asking for a lot more, so what’s the best way to solve that? Elasticsearch and Kibana both have pluggable architectures. So they built a series of plugins, released them as open source, and then packaged them together with an installer and a couple of other tools and called it Open Distro for Elasticsearch, somewhat similar to a Linux distribution. That happened in 2019, right. And that provided a lot of function for people because it was pure open source, right: it’s Apache 2, up and down the stack. And you had the features that you needed. Some of these features were just table stakes, like security, like being able to make sure someone had to provide a password to get into Elasticsearch. And we all know that having a database without a password is just asking for trouble, no matter how much you try to secure it. I’ve never heard of a problem occurring because somebody didn’t, yeah. And so that existed for a while, and I think we had a lot of adoption of it across a variety of industries. People wanted to adopt it because they wanted to not have lock-in, or they wanted to have the same thing that AWS’s Amazon Elasticsearch Service ran on, right. So they wanted to have the same type of functionality, either on-prem or in the service that they were using.

Matt Yonkovit: Portability between the cloud and on-prem.

Kyle Davis: Yep, for sure. And you know, that’s totally reasonable. So that existed for a while, and then in January of 2021, we were in a meeting, and one of the developer managers on Open Distro said, hey, there was a blog post posted this morning, and it said that there’s going to be a licence change on Elasticsearch. And my thought was, okay, it’s been a good few months here, I need to go find a new job. I thought it was all over, right. Come to find out, Elastic changed the licence to a proprietary licence, SSPL or the Elastic License. And that meant that Open Distro was not something that was possible to develop anymore. There’s no base for that. So after Elasticsearch 7.10.2, the licence was changed to this dual licence, the Elastic License and SSPL, neither of which is an open source licence. They’re proprietary licences. They may be defined as source available, but they’re not open source by the Open Source Definition, right. And so, after a lot of deliberation, the decision was made at AWS to say, we’re going to stick a bunch of engineers on this, and let’s create a purely open source fork of Elasticsearch and Kibana. That involved this renaming, because it is a fork, so we needed to rename. As well, the code base for Elasticsearch and Kibana was a little bit unusual because it was mixed, right: some files were proprietary and some files were open source. So there was a lot of meticulous separation of that, right. Basically, we had to strip out all of that history and build new build processes. We had to strip out anything that was like marketing stuff that was being put into it, and any type of phone home or telemetry collection that was being used; that all had to be stripped out. And then we took the plugins and tools that were part of the Open Distro project and combined them with these new forks of Elasticsearch and Kibana. Elasticsearch became OpenSearch, and Kibana became OpenSearch Dashboards. And so, combining those together with the plugins, you have this new kind of stack out here that has all the great features that you would expect, some of which are premium features that other organisations charge money for. And now you have it all as an open source, Apache 2.0 stack that you can use pretty much anywhere. Apache 2.0 is really permissive and used in a lot of places.

Matt Yonkovit: And I noticed that there are actually quite a few companies that have started to contribute as well and work with you. So it’s not just an AWS project; it’s a community-driven project. There are other, bigger companies out there that are starting to become part of this community and this collective. How has that growth been going?

Kyle Davis: Yeah, so being community driven is really important. And community driven can mean a lot of different things to many different people, right? From our perspective, we want to build what people want, right? And as part of that, we want people to contribute, right? So we’ve been really aggressively looking for other folks to help us along the way. When I say us, I’m saying AWS in this case, and then collectively, we mean the OpenSearch project. So, you know, Logz.io has been an active contributor. We have a lot of drive-by contributions. We have a lot of people from the Lucene project, which is one of the things that OpenSearch and Elasticsearch are based on, right, who have come in, dived in, and helped us really build some great software. We have a partners programme for those people who want to have a kind of critical relationship, but in reality, everything occurs in the open, right. If you want to be part of the partners programme, it’s a pull request, right? So we really feel like this is all part of it, and we’re working on being as transparent as possible, right? As we go along, we’re bringing more people into the fold, and that’s going to take the form of different people in different organisations being able to have complete control over parts of the project, right. So it’ll be something that we see as not just an AWS thing; if you have the desire, the drive, and the engineering to contribute to it, we want to empower you to do that. Right. There are some complications: OpenSearch is a very complex project, and the release is not simple. But I think there are some architectural changes eventually coming down the pike that will make this an easier project for multiple stakeholders to have true ownership of.

Matt Yonkovit: Yeah, and honestly, the fork process for software this large is a daunting task. I mean, that’s a very big undertaking. It’s not something that you can undertake lightly. What were some of the challenges in forking such a large project? Like, what did you run into where it was, we needed to overcome this, we needed to overcome that? Because it’s generally kind of a last resort in most companies to fork something. You know, you want to contribute upstream if you can, obviously, but then they changed the licensing. So maybe talk to us a little bit about that process, some of the gotchas in the big fork, and, you know, the things you had to overcome.

Kyle Davis: Yeah, it’s challenging. It’s super challenging. That’s been a constant discussion: can we contribute this upstream somewhere, or is it something that we need to fork? Because forking something should be the last-ditch effort. And this is a case where, yeah, it should be. I think, for a lot of folks, if you have a cursory use of open source projects, you just go and hit the fork button, and you’ve got a fork. But that’s not the reality when you fork something that has so many components and such a large code base, with 10 years of development involved in it. So as far as what we had to go through, right, one part of that was the separating out of the different licences. That’s done, and we never want to do that again. Incredibly painful. Yes, it was a lot of work. We have two different projects: we have Kibana, which we turned into OpenSearch Dashboards, and Elasticsearch, which we turned into OpenSearch. And it took two different teams doing this, because they’re different programming languages. One team, that was OpenSearch, said, we’re going to look through every single file, and OpenSearch Dashboards said, we’re going to copy, replace, and clean up the mess, right, across the whole thing. I don’t think either one was really good. It was hard either way. So what we did was keep on finding little bits and pieces here and there. You know, right before we released our release candidate, we were going through, and we saw, for one half of one second, a logo come up that was not a logo for OpenSearch. And we had to delay our release, right? That was something that we looked at and said, that’s not OK to go forward. In time, we came to find out that the logo was embedded in a dependency, so it wasn’t something in the code, right? So we had to go and figure out why that was. But literally, it was on the screen for 480 milliseconds. And we had one eagle-eyed developer who was kind of QAing and said, wait a minute, I just saw something. So just stuff like that. I mean, that’s an eyeblink, right. And then we’ll find things like embedded URLs that follow a format that OpenSearch doesn’t use, and we’ll find these different pieces here and there. We’re going to keep finding these little pieces everywhere. But we’re very confident that all this kind of proprietary stuff is gone. And so if you wanted to fork it, Matt, if you said, I want to fork OpenSearch, you could, and you wouldn’t have to go through that same kind of painful process. And that’s important, because it keeps everyone honest.

Matt Yonkovit: Yeah, because I think that’s the beauty of open source, right? You have the opportunity to take the software and do what you need to, and if whoever the maintainer is isn’t keeping up with what you feel you need, you have the opportunity to take it and try something else. That’s a powerful thing in the open source space, and I think a lot of us have valued the ability to do that. Because you can not only contribute, but if your contribution doesn’t make it in, you always have that opportunity to make your own version, build your own libraries, things like that. This has been classic in the MySQL space and the Postgres space for years. A lot of times, what you’ll see is a company will come along and say, hey, I want this special feature, I’m going to hook into my security infrastructure, my monitoring infrastructure, and then it just doesn’t make it upstream. So they maintain a fork, and they have their own internal thing. Many companies have done that over the years. Cool. So, you’ve gone through the work now, and you just recently went GA. Congratulations on that. So now it is officially GA and ready for primetime. It’s out there. And is OpenSearch now what’s running behind the scenes as well, like the full stack for the service for AWS?

Kyle Davis: Not quite. That’s something that we’re going to be working on. I mean, right now, I’m on a call every day about that. So what will eventually happen is this: right now, the service is Amazon Elasticsearch Service, and we’re changing that into Amazon OpenSearch Service. We’ll have some sort of "formerly known as Amazon Elasticsearch Service", so people can identify it. And then that’ll be the prime, you know, the first thing you’re offered. But if you are an existing user, or you want to use some old version, we do maintain old versions. So it’s not like you’ll be forced to use OpenSearch, right? You can still select Open Distro-based ones. I think it’s 19 versions back; we keep everything alive. So if you want to stick with it, go for it. My job is not really so much working with the service, so I can answer just brief questions about it. But the one thing I do want to say is that the thing we really want to tighten up on is this: we want OpenSearch Service to be a kind of model version of OpenSearch. We want to make sure that if there’s an open source release, it’s on the service as soon as possible. Right now we’re in a period of weeks before it’ll be out on the service. We want that to be tightened up as much as possible. So if we release OpenSearch 2.4 in the future, we’d love, and I don’t know if we’re ever going to get there or not, for you to be able to get it on the service within hours. Our release process would, ideally, be that way.

Matt Yonkovit: And I think that that allows people to run the mixed use cases. And I think that’s a great outcome.

Kyle Davis: Can I mention one thing here? I do want to say something about that, though. If you want to use OpenSearch right now, there are services that provide OpenSearch. Bonsai search, for example, is offering it to their customers right now. I think within one day, they were able to get OpenSearch GA up on their service. They beat us to it, which is great.

Matt Yonkovit: Yeah, no, and that’s what you want to see. You want to see that collaboration and working together. As these projects become more successful, the more people who can benefit from them, the better off the ecosystem is. Right? It’s about making the pie bigger for everyone, and then everyone is going to share it. So I do think it’s an excellent thing that you actually had someone else beat you to a service for OpenSearch. So that’s cool.

Kyle Davis: We’re proud of that.

Matt Yonkovit: So if people are looking to get started, and maybe you know, number one, start using and number two, maybe contribute, what does that look like right now?

Kyle Davis: Sure. So the easiest way to get started is to go to OpenSearch.org. In the top right-hand corner, there is a button that says get started, and the first thing there is the Docker Compose file that will launch a cluster and OpenSearch Dashboards. So if you want to play around with it, within a few minutes you’re up and running. You just, you know, download the YAML file and run Docker Compose up, and have a coffee, because you have to download lots of dependencies. That’s just the way Docker works. Then you’ll be able to get in there and go to Dashboards. On the front page, there’s some sample data. It’ll load the sample data, and you can start querying and getting your hands dirty with visualisations. Super easy.

Matt Yonkovit: Great. And if they’re looking to contribute, or maybe participate in some of the discussions around features and roadmaps, where can they do that?

Kyle Davis: Sure. So there are a couple of different ways to do that. Like I said, we do everything on GitHub, right? So you can look at our roadmap and dive in; opensearch-project is our GitHub ID. And also, there’s OpenSearch.org, if you want to just follow those links. Basically, there are lots of "help wanted" tags out there, so just dive right in. The nice thing about the project is that we don’t use a CLA, which is a kind of barrier to entry. For those who haven’t used those before, you have to sign a legal document, and it’s kind of a pain in the butt. But we do use what’s called the DCO, the Developer Certificate of Origin, which basically means you sign off your commits. It’s lighter weight, and it still provides the protection that your code is contributed in a way that makes a lot of sense. It’s something the Linux Foundation uses. So you can get started really quickly. We do have community meetings that we hold biweekly, so if you want to join in on those, you can get the lowdown there, and it’ll be really easy to bridge your way into the project. We have forums as well, where you can engage if you have questions or anything like that, and you can see what’s the most appropriate way to dive in. We’ve had a lot of people who have asked on the forums, hey, is this possible? And we say yes, do it, and they’ve done it. And that’s great for everyone.
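
Under the DCO convention, signing off a commit means the commit message carries a `Signed-off-by: Name <email>` trailer, which `git commit -s` adds automatically. As a rough illustration of what such a check looks for, here is a minimal sketch in Python. The helper names `find_signoffs` and `has_signoff` are hypothetical, and this is not the check the OpenSearch project itself runs.

```python
import re

# Matches a trailer line such as: Signed-off-by: Jane Doe <jane@example.com>
SIGNOFF_RE = re.compile(
    r"^Signed-off-by: (?P<name>.+) <(?P<email>[^<>\s]+@[^<>\s]+)>$"
)


def find_signoffs(commit_message):
    """Return (name, email) pairs for every sign-off line in a commit message."""
    signoffs = []
    for line in commit_message.splitlines():
        match = SIGNOFF_RE.match(line.strip())
        if match:
            signoffs.append((match.group("name"), match.group("email")))
    return signoffs


def has_signoff(commit_message):
    """True if the commit message carries at least one sign-off line."""
    return bool(find_signoffs(commit_message))


if __name__ == "__main__":
    # Hypothetical commit message, as produced by `git commit -s`.
    message = "Fix typo in docs\n\nSigned-off-by: Jane Doe <jane@example.com>\n"
    print(find_signoffs(message))
    print(has_signoff("Fix typo in docs"))
```

The sketch only inspects the text of the message. It does not verify that the name and email match the commit author, which a real DCO check would typically also do.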

Matt Yonkovit: Yeah, it’s good to get instant feedback and be able to contribute back. So that’s awesome. And so what does the future hold? You’ve been having these kinds of open discussions, so where does the roadmap take you in the next 6, 9, 12, 36 months? Who knows. Whatever you’ve got planned.

Kyle Davis: So the best thing about OpenSearch, and I think it’s really a credit to the people who are doing it, is that we have this policy of having open roadmaps. We have a GitHub project, basically, where we keep our roadmap available, and you can get to it; OpenSearch.org has links to it. It’s a bit of a long URL, but you can take a look at it and scroll down and see what we’re doing. And that’s the roadmap that we work from. So we use semantic versioning. Right now, 1.0 is out, and late August is the timeframe that we’re looking at for 1.1. We’ll do kind of a release train schedule, so about every month, we’re going to release something. And we’re looking at OpenSearch 2.0 coming out in January of 2022. The version 1.x range is largely compatible with Elasticsearch 7.10.2; there are just a few little nuances and setup things that are different. And 2.0 will bring the possibility of breaking changes, but that comes with a lot of ability to start advancing things really quickly. We already have some breaking changes planned. They’ll be pretty minor. We’re going to get rid of some language that’s not inclusive, which will break some things. We want to move away from some things that are common in the industry but may not be the best for the whole industry. That will be a breaking change. It’s nominal, but we’re looking at that. And we’ll also start looking at adding some more features and changing some architecture in ways that will really make it better for everybody.
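
The semantic versioning convention mentioned here can be illustrated with a minimal sketch in Python: a minor bump such as 1.0 to 1.1 is expected to stay compatible, while a major bump such as 1.x to 2.0 is where breaking changes are allowed. The helper names `parse_version` and `may_contain_breaking_changes` are hypothetical, the sketch assumes plain `MAJOR.MINOR.PATCH` strings, and it is not part of any OpenSearch tooling.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text):
    """Parse a plain 'MAJOR.MINOR.PATCH' string into a comparable Version."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def may_contain_breaking_changes(current, target):
    """Under semantic versioning, only a higher major version may break compatibility."""
    return parse_version(target).major > parse_version(current).major


if __name__ == "__main__":
    print(may_contain_breaking_changes("1.0.0", "1.1.0"))
    print(may_contain_breaking_changes("1.1.0", "2.0.0"))
```

Because `Version` is a named tuple, versions also compare in release order, so `parse_version("1.0.0") < parse_version("1.1.0")` holds.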

Matt Yonkovit: Great. Well, Kyle, thank you for hanging out with me this morning. I do appreciate it. And I appreciate you guys being transparent and helping out the community with all the things you’ve done with OpenSearch, and we look forward to seeing where the project goes.

Kyle Davis: It’s gonna be fun to watch.

Matt Yonkovit: Wow, what a great episode that was! We really appreciate you coming and checking it out. We hope that you love open source as much as we do. If you liked this video, go ahead and subscribe to us on the YouTube channel. Follow us on Facebook, Twitter, Instagram and LinkedIn. And of course, tune into next week’s episode. We really appreciate you coming and talking open source with us.
