Adding 4th and final post from Simon [deploy site]

diemol · diemol · commit 87d52f341cf7 · 2021-02-17T20:25:06.000+01:00
diff --git a/site_source_files/content/blog/2021/what-is-coming-in-selenium-4-the-new-selenium-grid.md b/site_source_files/content/blog/2021/what-is-coming-in-selenium-4-the-new-selenium-grid.md
@@ -0,0 +1,119 @@
++++
+Description = "In the fourth and final post in his series, Simon Stewart continues talking about what's coming in Selenium 4 and reviews what's new in the Selenium Grid."
+Title = "The new Selenium Grid"
+Date = 2021-02-05
+Author = "shs96c"
+AuthorLink = "https://twitter.com/shs96c"
+tags = ["selenium"]
+categories = ["general", "releases"]
++++
+
+>In the fourth and final post in his series, Simon Stewart continues talking about what's 
+>coming in Selenium 4 and reviews what's new in the Selenium Grid. 
+
+We’ve covered a lot of ground in the past few blog posts, including how to contribute to the project, and some 
+details of what you can expect as a Selenium user. But there’s more to Selenium than just the APIs you use to write 
+your tests, and one of the big features we’ve not covered yet is the refreshed Selenium Grid: a mechanism that 
+allows you to distribute your tests across a fleet of machines.
+
+Before going further, it’s always nice to acknowledge where we’ve come from, not only because it’s interesting, but 
+also because it helps explain the “why” of the refreshed design.
+
+Way back in the mists of time (2008), Jennifer Bevin and Jason Huggins worked on a system called the Selenium Farm 
+at Google. This was a fleet of machines sitting in a cupboard somewhere that allowed you to run the original 
+Selenium protocol. This was, of course, at Google scale, so there was more than one cupboard :)
+
+This allowed people at Google to distribute their tests and scale out beyond their individual machines. 
+This was such a great idea that when Jennifer talked about the Farm at a Selenium meetup, Philippe 
+Hanrigou (then at ThoughtWorks) decided to write an Open Source implementation of the same thing, 
+which he called “Selenium Grid”.
+
+Selenium Grid was a lovely piece of technology, but it had one drawback: it only spoke the original 
+Selenium RC protocol. That was fine, but WebDriver spoke a different wire protocol, which was called the 
+JSON Wire Protocol, and folks wanted to be able to use both Selenium RC and WebDriver at the same time.
+
+This is where Francois Reynaud enters the picture. He was working at eBay, reporting to Michael Palotas, 
+and he had written something like the original Selenium Grid, but which also worked with the JSON Wire 
+Protocol. They very kindly contributed that work to the Selenium project, and it formed the basis of the 
+Selenium Grid 2. We decided at the time that the Selenium standalone server would effectively be a 
+“grid of one”: it would have everything you needed in it to set up a Selenium Grid, as well as to work as 
+a single standalone server. Merging the code and getting it stable took a while, but thanks to the efforts 
+of Francois, Kristian Rosenvold, and many others, we merged Grid 2 into the main Selenium project and shipped 
+Selenium 2 in 2011.
+
+Even if 2011 doesn’t feel that long ago, the modern world has changed quite significantly. In 2011, we didn't 
+have Docker. We didn't have Kubernetes, we didn't really have AWS. So, the Selenium Grid didn't know that those 
+things were coming in, and wasn't written to take advantage of them. Fortunately, we did have virtual machines 
+back then, and the Grid 2 had been designed to be able to support them.
+
+This was the hook used by an excellent project called Zalenium. Developed by Diego Molina, Zalenium added a really 
+nice UI and support for Docker and Kubernetes—all built on top of the Grid 2. This has allowed the Selenium Grid to 
+continue to be relevant and useful to this day, which is an amazing achievement.
+
+But, like I said, it took us time to stabilise Grid 2: about six months of hard work, largely led by Kristian. 
+That’s because while Grid 2 was sophisticated, the code was hard to read and hard to maintain—and very few people 
+could do that. Worse, the merging of the Grid 2 and the original Selenium server was pretty crude; effectively 
+there were two separate servers shipped in the same binary. This led to problems occurring in the Grid but not 
+when run in standalone mode, and vice versa.
+
+With Selenium 4 we’ve decided to bite the bullet and address these three concerns. Firstly, we want something that 
+is easier to work on and maintain. Secondly, we want to merge the servers into a single unit. Thirdly, we want 
+something that will take advantage of the world of modern infrastructure we have available to us now, not just in 
+the form of Docker and Kubernetes, but also to be able to use emerging technologies such as distributed tracing.
+
+To do this, we looked at the functionality the Grid provided, and modelled each piece as a component that we can 
+either run “in memory” (allowing us to have a single standalone server), or in a more distributed way, ranging 
+from the “hub and node” architecture that we’re familiar with from the original Selenium Grid to a fully 
+distributed design.
+
+The first component is the “router”. It serves as the entry point to the Grid. You can expose it to the internet 
+and it directs requests to the Grid. It’s designed to be stateless, and so you can add more routers to the Grid as 
+you find you need them.
+
+When the router sees a new session request, it places it on a “session queue.” The session queue is read by a 
+component called the “distributor,” which maintains a model of all the places in the Grid where a session could be 
+run: we call them “slots”. The slots are hosted by components called “nodes,” and each node can have one or more 
+slots. When the distributor pulls a new session request from the queue, it identifies the most appropriate slot 
+to use, and forwards the request to the node which owns the slot. Once the node has started the session, the 
+distributor puts the session ID and the URL of the Node that is running the test into the “session map”—you can 
+think of the session map as being a simple map of a session ID to URL—and the new session response is sent 
+back to the waiting test.
+
+Requests for running sessions (that is, for most WebDriver calls!) are handled slightly differently. The router 
+will use the session map to look up which node to forward the request to, avoiding the need to involve the 
+distributor entirely. This means that you can just keep adding nodes to the Grid, and there are fewer bottlenecks 
+in the architecture to slow your requests down.
+
+Conceptually, there are these five moving pieces within the Grid. But there's actually a sixth one, which is a 
+message bus. The five Grid components communicate internally through the message bus, but the actual components 
+that you need to think about when you're thinking about the Grid are the Router, Session Queue, Distributor, 
+Session Map, and Nodes.
+
+When you run Selenium Grid 4 in “standalone” mode, you actually get a “grid of one”. We wire up all these 
+components in a single process, but they’re all still there.
+
+You can also run it in the traditional Hub and Node approach that you've seen with Selenium Grid 2, where you 
+fire up a Hub and a Node, and the Node registers with the Hub. If you've used Selenium Grid recently, that's probably the architecture 
+that you're familiar with. In this case, most of the components (the router, session queue, and distributor) are 
+running in the Hub, and the Nodes run the sessions themselves.
+
+What’s new with Grid 4 is that you can go into a fully distributed mode if you want to. Typically, you would want 
+to use something like Kubernetes for this, and some of the key components are designed to store data using a 
+database or Redis for better reliability and scalability.
+
+Something to note is that when running a distributed Grid, it becomes really difficult to figure out what is 
+happening, particularly when something goes wrong. To alleviate that problem, we have adopted OpenTelemetry to 
+bring observability into the Grid. What do we mean by observability? It simply means that we want to be able to 
+see everything that happens.
+
+Finally, we’d like to expose information about the running Grid in meaningful and useful ways. The original
+Grid supported both JMX (a Java management API) and an HTML-based console. While nice, this didn’t make it easy
+to query specific areas of the Grid that you might be interested in that weren’t surfaced by the UI (for example, 
+how many slots are available, or which node a particular session is running on). To provide more 
+flexibility, we’ve chosen to provide a GraphQL endpoint for the Grid. To ensure that the GraphQL endpoint is 
+sufficiently flexible, we’re building the new Grid console using it. That should allow you to also extract useful 
+metrics and information from the Grid for your monitoring needs.
+
+Those are some highlights of the new Selenium Grid. What are you the most excited about? 
+
+*This was originally posted at https://saucelabs.com/blog/whats-coming-in-selenium-4-the-new-selenium-grid*
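
The session flow described in the post (router, session queue, distributor, session map, nodes and their slots) can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not Selenium's implementation: the `Node` and `Grid` classes, the node URLs, and the `session-N` IDs are all hypothetical, and "most appropriate slot" is simplified to "first free slot for the requested browser".

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import count
from typing import Deque, Dict, List, Optional


@dataclass
class Node:
    """Hypothetical node: hosts one slot per entry in `browsers`."""
    url: str
    browsers: List[str]
    busy: List[bool] = field(init=False)

    def __post_init__(self) -> None:
        self.busy = [False] * len(self.browsers)

    def free_slot(self, browser: str) -> Optional[int]:
        """Return the index of a free slot for `browser`, if any."""
        for index, (name, in_use) in enumerate(zip(self.browsers, self.busy)):
            if name == browser and not in_use:
                return index
        return None


class Grid:
    """All components wired into one process, like a 'grid of one'."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes
        self.session_queue: Deque[str] = deque()  # pending new-session requests
        self.session_map: Dict[str, str] = {}     # session ID -> node URL
        self._ids = count(1)

    def new_session(self, browser: str) -> Optional[str]:
        """Router role: queue the request, then let the distributor place it."""
        self.session_queue.append(browser)
        return self._distribute()

    def route(self, session_id: str) -> str:
        """Router role for running sessions: a session map lookup only."""
        return self.session_map[session_id]

    def _distribute(self) -> Optional[str]:
        """Distributor role: take a request off the queue and pick a slot."""
        browser = self.session_queue.popleft()
        for node in self.nodes:
            slot = node.free_slot(browser)
            if slot is not None:
                node.busy[slot] = True
                session_id = f"session-{next(self._ids)}"
                self.session_map[session_id] = node.url
                return session_id
        return None  # simplification: no free slot means the request is dropped


grid = Grid([
    Node("http://node-a:5555", ["chrome", "firefox"]),
    Node("http://node-b:5555", ["chrome"]),
])
first = grid.new_session("chrome")
second = grid.new_session("chrome")
print(first, "->", grid.route(first))
print(second, "->", grid.route(second))
```

Note that `route` never touches the distributor, which mirrors the point in the post that requests for running sessions only need the session map.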