Abstract
In 2008 MySQL Cluster separated from the main MySQL Server to become a new product with a different development pace to better suit the ‘telco’ customers and users. Today we are at MySQL Cluster version 6.3 and some exciting new features are coming up which make it even more appealing for any setup which needs to be highly available.
This talk will be a short introduction into MySQL Cluster with some technical details on how it works. Additionally, some use cases will be shown how and where it could useful.
Additional material
Here you can find all available material for this talk.
PDFs
Audio recordings
Video recordings
Transcript
Geert Vanderkelen: [0:00] So, my name is Geert Vanderkelen. I’m a Belgian and live in Germany. I work for Sun Microsystems. I’ve been one of the MySQL guys who were taken over, assimilated by Sun last year, and I’ll be going to another company this year. [0:19] I work from home. So, we are support engineers. We do global support, 24 hours a day, in three time zones – Australia, India, and the Americas. So, we have our support team.
[0:35] I always try to mention – it’s not sales, but it’s pretty cool to know – that there is actually a support organization behind MySQL, since long. This is nothing new, more than five years now, six years.
[0:50] So, oh, five minutes.
[0:54] This talk is going to be about MySQL Cluster, technical side of it, where to use, where not to use it, details, features, defining it. This is going to be very short. Usually, I do this in one hour. Normally, you do that in one day.
[1:22] So, what is MySQL Cluster? It’s a distributed, in-memory storage. Distributed meaning it’s over network. So, your data is not locally stored on your MySQL servers, but it’s somewhere “in the cloud,” like they say now. So, it’s distributed over network. Every MySQL server is fetching the data from data nodes. I’m going to show a picture after this one.
[1:51] It’s going to be to make your data highly available. So, that’s the main purpose of MySQL Cluster. It was also developed by Ericsson, back in the days. MySQL AB bought the software innovation as well. And the whole team came to MySQL. I’m not sure in which year this was, but it was somewhere late ’90s.
[2:15] It’s also very important, shared nothing. It means that it doesn’t share any disk with each other. So, every data is stored on a machine, which is completely independent of the other machines. We have customers actually using SANs to store all the data in one place, which is a little bit stupid. But yeah, it’s also possible, but that’s not the point of MySQL Cluster.
[2:39] It’s transactional. So, the second service engine of MySQL, which is transactional, and of course, its assets. I’m not going to go further in, because we don’t have time.
[2:52] This is how MySQL Cluster works. The data is stored in the data nodes. These are processes running on machines here. These are separated from the MySQL servers, everyone knows. So, the gray ones are the usual MySQL servers, and then you have the applications connecting to these MySQL servers.
[3:16] So, basically, it’s transparent to the client, to the applications. So, basically, you can use Asterisk or whatever you want, connect to a MySQL server, and that one is connecting to the data nodes.
[3:29] As you can see, you can have billions of MySQL servers. Still true. You can only have about 200 MySQL servers connecting to data nodes. It’s a little bit more, but that would be scalable. So, you have 200 MySQL servers who are looking at the same data at the same time. If one permits a transaction, the other one sees it immediately. So, that’s a pretty cool thing.
[3:55] Very important to MySQL Cluster is that you can bypass the MySQL servers using NDB API. And this is one of the major features of MySQL Cluster. Our biggest telecom customers are writing their own applications using NDB API. So, they’re actually not using SQL anymore, because this is much, much faster.
[4:21] If you want to know more about NDB API, just go online, just check it out on Google. Sometimes they’re painful to program with it, but it’s pretty cool.
[4:33] Also, the MySQL servers connect with the storage engine, which is using the NDB API as well.
[4:41] Let’s go a bit further in detail, all the nodes. Every node in MySQL Cluster is a process, it’s on a machine. There can be multiple cluster nodes on one machine. That’s possible. It’s, of course, not the point, but it’s possible to actually put everything on one machine. These are just little processes, little.
[5:03] So, the data nodes contain data and indexes of your data. They’re also responsible of handling transactions inside the cluster. So, they’re actually the kernel of the MySQL Cluster.
[5:18] They’re all connected to each other, so if one fails, the others know. And you can have up to 48 data nodes. So, if you want to scale out, have more memory available, because it is, of course, in memory, then you put more data nodes.
[5:40] Why is it 48? Ask the Swedish guys. I don’t know. Maybe they drank a little bit too much vodka.
[5:48] Actually, if you are at 10 and you want more, then you do something wrong. So, actually, 10 would be the maximum, in the end, practical maximum. This is very theoretical. So, why 48? It’s a good question, actually.
[6:05] The SQL Nodes. These are the MySQL servers. We also call them the API nodes, because they are using the NDB API to connect to the cluster. They connect to all the data nodes, and they are fetching data.
[6:21] There are some examples of NDB API applications coming with MySQL – for example, the NDB restore with which we can restore backups into the MySQL Cluster.
[6:38] Management Nodes. These are responsible to actually handle the configuration of MySQL Cluster. Each node has to come up, needs to know where the other nodes are. So, they are all connecting to the management nodes to fetch the configuration.
[6:55] This one is also important for logging, so monitoring the MySQL Cluster. It shouldn’t be up all the time, but it’s very good to actually keep it running, and sometimes have two of them available so you have logs. It does help if something happens. It’s highly available, but when a data node crashes, you want to know why. These are important for that, to see what happened in the cluster. We usually don’t skip support when you don’t have logs.
[7:29] You have, also, management APIs, so you can make your cool applications for monitoring MySQL Cluster. For example, you can make a nice plug-in for Nagios, or whatever monitoring system you are using.
[7:44] Now, actually, I’m going too fast, so if you have any questions, raise your hands and ask away.
[7:53] The minimum requirement for a production MySQL Cluster is three machines. You need two for the data nodes. So, each machine has on it one data node process, and then you need one for the MySQL Server or your API applications. This means, three, minimum.
[8:17] You can install MySQL Cluster on your laptop. It’s possible, to check it out. It’s certainly highly available. Data nodes, using lots and lots of memory. So, if you have money to buy machines with 265 GB of RAM, do it. It is quite expensive, but you need lots of memory and lots of disk space as well, because everything that is installed in memory is at one point check-pointed to disk.
[8:49] So, if the process crashes, if the machine crashes – yeah, I’m dying myself now, I need my notes. So, if it goes away, if it fails, then it starts again by reading the data from disk and puts everything in memory. So, I will show that later on, how it works. But, therefore, it needs lots of disk space because it makes a few checkpoints. Very important.
[9:26] I am not sure how much people are using MySQL Cluster or have used it before – raise your hands if you have.
[9:33] Wow, cool we have two and a half. [laughs] MySQL Cluster 7.0 – I’ll talk about the version later on – brings multi-threading, so it is faster. Now, we can actually use MySQL Cluster with multi-core CPUs. Before, it was actually a single thread, because it was made for old switches of Ericsson. It is the legacy. So, now MySQL Cluster 7.0 will bring multi-threading.
[10:08] If you want to scale out, that you can do with more MySQL servers that connect to data nodes. So, you can have your PBX system on this server with MySQL running and then you can add a 100 more if you want. This is a way to scale out.
[10:32] Very important, this dedicated network – that is my next slide actually – is in the data center. I gave a talk last week, same thing, and this was around data centers and stuff like that. So, data nodes should be, first of all, behind the firewall – it stands there in the middle – because, there is actually absolutely no security in the data nodes. If you put it on the Internet, “the cloud,” then everyone can actually access your data nodes and actually shut it down. So, it is not really highly available. So, you have to put it up behind a firewall.
[11:15] On a separate network, just any traffic that goes… If you are, for example, downloading a movie and you are on the same switch as the data nodes, you might have problems. It might crash data nodes. It is very sensitive to lots of latency on the network. That is because there is heart-beating between the data nodes.
[11:39] So, different racks, different buildings, if the latency allows it. So, data nodes should be grouped. I will show that later on as well, but this is important – its own network and trusted users on MySQL. There is a nice little trick in MySQL, I am not sure if anyone knows this, but you can execute shell commands under MySQL prompt. This way you can actually shut down the whole cluster also. I didn’t show it completely, but it gives you an idea of what you can do there – hacking.
[12:15] So, does MySQL deliver performance? But, the focus is really on high availability. So, it’s performing well, but for example, if the network is slow, it really kills the whole thing. So, that is why you need a dedicated network. So, don’t expect it to be very much faster as, for example, InnoDB. So, local stored data is still faster than data that has to come from a network. But still, it’s quite cool, it’s performing quite well.
[12:56] It is scalable. That is what I mentioned before. You can have up to 48 data nodes. If you do more than 10, you did something wrong. In total, you can have in the system 225. Before it was 64, but now it was put higher to 225 nodes in total.
[13:21] It is very easy to do MySQL Cluster. Just alter the table, and you need to give it the engine, NDBcluster, and your table is in the cluster. That is how easy it is.
[13:37] We have another thing that is pretty cool in MySQL Cluster, that is the geographical replication. It is actually a normal MySQL application, but I am not sure if you guys have heard about row-based replication. This was developed for Cluster, this was important for Cluster. I’ll show a nice little graphic later on; it shows how it works.
[14:04] What it doesn’t do, so if you have big data sets, MySQL Cluster might not be ideal. If you have lots of blogs and lots of forums, and your data set is one terabyte, it will not fit in memory. That is pretty obvious. So, basically you want to hold hot data in the Cluster and the data warehousing. Pull stuff out of MySQL Cluster and put it somewhere in InnoDB MySQL server. So, you want to offload data from the Cluster.
[14:41] We have also disk-based data, so not everything should be in memory. This is still a feature that is getting better each version. Basically, you can put non-indexed data on disk, that doesn’t have to be in memory. This saves a lot.
[15:03] I have a friend in Belgium who tried to do Drupal with Cluster and failed, because it was too big. So, not every project will just fit in there. You have to actually write your application for MySQL Cluster.
[15:19] Joins are very bad, for example. If you joined two tables, that is quite hard because MySQL server needs to get all the data from one table, needs to get all data from the other table, and then calculate the result. So, this is quite heavy. It is two operations on the network. So, joins are a little bit problematic in MySQL Cluster. That is why primary key lookups are very good.
[15:49] BLOBS are still painful. So, big BLOBS; it is also getting better each version. I might have mentioned before also you cannot just put any application in there. You have to think about it a little bit more. But, it’s possible.
[16:11] OK. This is more technical stuff here now. So, I’m going to show how data is distributed in the data nodes. How backups work and fail over works and how the replication works between two data centers, for example.
[16:34] I figure everyone knows a little bit about databases. You can partition a table horizontally so you have a table and you can put in two pieces. Basically, you have a table here and say this is partition one, this is partition two.
[16:49] This is what happens in clusters automatically. So, it partitions a table between data nodes. It keeps a copy on each data node from the other one so you have actually data stored twice in the cluster. And this is how it works. It’s the only animation I have. But, I have two slides with animation. It’s pretty impressive.
[17:15] So, we split our table so we have two fragments or two partitions. It goes to each data node. These are data nodes, processes running on two separate machines. That means the data is stored on the data nodes. The problem here is, of course, if this one fails, you miss half of your data. So then, you have only that one and that one will not work because it says I cannot work with half of my data, which is normal.
[17:51] So, what is done inside MySQL cluster is synchronous replication. Each data node will replicate partition of the other one. So, if now this data node fails, the other one will automatically in a split second – not even a second, flash – take over and make the whole data available. This works actually.
[18:23] If this one comes back and if it was, for example, a minute it goes away, then it will check with its partner and say what did change and it will sync automatically. If this one was gone for a day, which is possible, and it comes back, it will completely re-sync because the changes were too much and it has to take the whole data again from its partner.
[18:51] This all works online. This works without interruption. It should work without interruption. The applications don’t see anything about this operation so this is what high availability is all about in MySQL Cluster – keeping a replica of the data.
[19:14] This is more… Yeah?
[19:16] Audience Member: [off-mic question 19:20]
Geert: [19:17] Oh, sorry, sorry. Wait.
[19:20] Audience Member: [off-mic question 19:21 – 19:33]
Geert: [19:37] Can data be inconsistent? No, it’s not possible.
[19:41] Audience Member: [off-mic question 19:42]
Geert: [19:42] Split brain?
[19:44] Audience Member: [off-mic question 19:45]
Geert: [19:50] Yeah.
[19:51] Audience Member: [off-mic question 19:51]
Geert: [20:02] No. This is the complete cluster; so you mean if this one fails and the client’s writing that one? But, it’s synchronized and… [20:14] Audience Member: [off-mic question 20:15]
Geert: [20:20] Yeah.
[20:20] Audience Member: [off-mic question 20:21]
Geert: [20:30] So, what do the users do when this one goes away? Nothing. When this one crashes, the transactions going on will stop, so they have to be retried. But, the applications itself, like the MySQL servers, they should not care. They should not really care. [20:51] So, if a data node is crashing and a transaction is going on, that transaction will be stopped, which is fair and is good. If the client again tries it, it will go to the other data node. Data is always available. You can always write in there. It’s no problem.
[21:13] Audience Member: [off-mic question 21:13 – 21:38]
Geert: [21:38] So, split brain situation. This is actually what I’m now going to show or trying to show. This is again with four data nodes, a little more complicated. The tables are fragmented in four. F1, F3 goes to the first node group. The other two fragments go to the other one. [22:00] Every node group manages its own data. So this one is now replicated just like before. So, we have this node group is responsible for F2, F4. They are replicating. That one is responsible for the other partitions. So…
[22:19] Audience Member: [off-mic question 22:19]
Geert: [22:22] No. So this node, this is the normal setup. So, it’s called a number of replicas. You can set it to one, two, or three. When you set it to three, we have no clue what happens. We don’t test that. So this is a situation people use that, but we don’t do that. This is actually the best situation you can have. We can set it to one if you want to live on the edge, but then you should not use cluster because then you don’t have highly available data. [22:58] So, this is a normal setup. Usually, it’s four data nodes and two node groups. Here’s an example how you could put two data node processes on one machine. So, you have Host A, Host B. Both are running data nodes. If Host B crashes and fails, then the other data node will continue because it has all the data.
[23:34] Actually, it’s wrong. That’s actually wrong. [laughs] Forget it. [laughs] This one should say two? No. I had too much coffee. One, two… I changed it once. No. No.
[24:05] Audience Member: [off-mic question 24:05]
Geert: [24:07] No. [24:09] Audience Member: [off-mic question 24:09]
Geert: [24:14] No. Yes. Mea culpa. Well, actually, the thing is, if you would switch those, if you put F3 to here and F4 here, then it will fail, because then you have – you need always one data node in one group. [24:44] So, if you have your two data nodes failing, like one and two – so, data nodes one and two fails – then the other data node group will also fail because it cannot survive. So, you need always one data node in each group. That’s the idea.
[25:01] Audience Member: [off-mic question]
Geert: [25:05] A split-brain is something else, that’s where the data nodes, if you have two data nodes and the network between them is gone, then one data node only can survive, the other one has to stop. This is called arbitration, and this is done by the management node, usually. All set, MySQL servers can do that. So, that’s the split-brain, that’s something very cool, but should not happen. Yeah? [25:37] Audience Member: [off-mic question 25:38]
Geert: [25:44] So, does it make sense to actually have two machines with each one allowing two data nodes? Yes. Before MySQL Cluster 6.2 we said no, but now it’s actually OK. So, if you have a four-core machine and you have lots of memory, then this makes sense. [26:05] Audience Member: [off-mic question 26:06]
Geert: [26:06] Yeah, you use your resources more on one machine. Usually, it’s one data node per system, per machine. That’s how it’s mostly done. Yes, question? OK, cool. Good. I have five minutes. [26:22] Fail-over. This is a configuration file. It’s pretty cool, it’s called config.ini. This is set up where I’m going to show how the fail-over works.
[26:36] As you can see, there’re the MySQL servers in gray and there’s the white data nodes where the data is stored. If a MySQL server crashes, it fails, then the applications have to go to the other MySQL server. Nothing happens here, these ones actually don’t care if the MySQL server goes away. So, the application is responsible to actually switch to another MySQL server.
[27:05] If a data node fails – so, this one, it says “boom” – the other data nodes will take over and all the data stays available. There, nothing happens except the transactions will be canceled and the applications have to retry the transactions. Usually, if you use transactions you actually have to do that anyway, so it shouldn’t be a problem.
[27:33] So, that’s actually how fail-over works in data nodes. And this is, again, transparent; the applications don’t notice anything about this, except for transactions, of course, but that’s stopped. The data node, like I’ve said before, comes back, resyncs with its partner, and everything comes again online. It’s a beautiful world, MySQL Cluster. It’s pretty cool.
[28:00] So, the management node, if that one crashes, it’s OK. They can continue.
[28:07] Very important in any system is hot backups. It’s also part of the high availability of MySQL Cluster. You can make backups without any problems. You just start backup and the data is done on each data node. It’s the only hot backup, actually, we have in MySQL right now.
[28:30] You don’t use MySQLDump. It’s inconsistent by definition because there is no concept of table locking in MySQL Cluster. So in NDB, you cannot lock tables. You can do it on a MySQL server. You can say “lock tables” and so on, but the other MySQL servers will not know these locks. They just still write in them. So, you cannot use MySQLDump for that.
[29:00] The schema has to be dumped with MySQLDump. This is important. So, if you use MySQL Cluster, use MySQLDump to dump the schema. OK, let’s continue. Yes?
[29:14] Audience Member: [off-mic question 29:11]
Geert: [29:17] Yep. [29:18] Audience Member: [off-mic question 29:16]
Geert: [29:24] So, start backup just starts to backup on the data nodes, and the data is dumped on the data nodes. That’s it. [29:32] Audience Member: [off-mic question 29:32]
Geert: [29:33] It’s on the local machines of the data nodes. So, where the process is running, the process is dumping its data on disk. [29:41] Audience Member: [off-mic question 29:43]
Geert: [29:44] It’s on disk. For data nodes, it’s on disk. I forgot to mention, if you do this, you actually have to go on the data nodes and take this backup and store it somewhere else; otherwise it doesn’t make sense to make backups. But, this is done locally on the data nodes. Vince? [30:02] Audience Member: [off-mic question 30:02]
Geert: [30:06] Yeah. So, backups do restore, do have schema, the problem is that they don’t have procedures. They don’t have the views, the triggers, all the stuff that’s coming with MySQL. So, if you have, if you use this stuff, then you don’t have to adopt backup. So, MySQL Cluster, the data nodes don’t have any clue about store procedures. That’s why you use MySQLDump to make backups there from the schema. [30:37] Audience Member: [off-mic question 30:37]
Geert: [30:45] Mm-hmm. [30:45] Audience Member: [off-mic question 30:45]
Geert: [30:52] So, each node will make a backup of its primary fragment, if it has a primary fragment. If this node is down, if you do the backup then, it will make a backup of both partitions. [31:10] Audience Member: [off-mic question 31:10]
Geert: [31:19] So, do you need all the data nodes ready to make a backup? No, you don’t. You can have this one down and this one down, and you still have all your data. [31:29] Audience Member: [off-mic question 31:29]
Geert: [31:41] Yeah. That’s why you actually need to copy the data over from the data nodes to a safe place, because if the machine burns, you might have trouble as well. [31:54] Audience Member: [off-mic question 31:54]
Geert: [32:01] Yeah. OK, so geographic replication. I still have two minutes, I guess. I’m going to show the picture, that’s always nice. So, you have one cluster somewhere, running in London, and you have two MySQL servers doing binary logging. So, this is like normal, usual MySQL replication, but you have two of them. All these changes go to both the nodes. [32:36] Now you have somewhere – let’s say in San Francisco – another cluster, and that’s your slave. So what you do is – so the replication, it’s actually the same way as doing it in MySQL. You take a backup of the master and you restore the backup on this one, and then you start at a certain point if you’re binary logging.
[33:03] The point is you have two of them, you have a channel going on – so this is, for example, the provider AT&T, let’s say. Now, the cool thing is, if this one goes away – if there’s a ship cutting the lines in the ocean, it happens – you can actually, I don’t know, take a satellite or whatever and go to the other way, channel two. And this is quite easy to do, it’s explained in the manual, but you’ve got to switch from one provider and go another route. So, you can reroute the complete replication from one channel to the other one.
[33:46] Audience Member: [off-mic question 33:46]
Geert: [33:52] No, this one… You can only have one active and the other one is passive. You can play around a little bit with that stuff, and you can also have three or four channels; this is only showing one. But, the point is, here also you have your high availability and replication. So, there is no single point of failure anywhere. I hope – yeah, no. [laughs] Yes? [34:27] Audience Member: [off-mic question 34:25]
Geert: [34:38] So, how does one know where to start again? These are stored in a cluster table, so the point – it’s explained in the manual how it really works, but there’s a table which says, “Where am I,” and this one can know, and thus you can see it, you can just do a “select.” [34:58] Audience Member: [off-mic question 34:58]
Geert: [35:07] Well, each time this slave is applying something from the master it will put it in a table and say, “I’m here, I’ve done that.” And then you can actually go back to the other one. They will be different, but there’s a mapping going on. So, there’s a special table to do that. That’s too much detail for this one; in the manual it says that. [35:32] OK, so, for this conference, MySQL Cluster is pretty cool for anything that is related to telecommunication. So voice-over-IP solutions are running on MySQL Cluster – lots of commercial stuff, because that’s what we do, actually in MySQL. We have HLR solutions, we have for example Alcatel/Lucent, who is doing its HLR backend with MySQL Cluster. You’d be amazed how much people are actually using MySQL Cluster with their mobiles. So, it’s pretty scary sometimes. Session management is also used, for websites.
[36:16] So, what I’m now asking, because I heard that lots of folks say that MySQL Cluster is complicated; it’s complex. It’s more complex than MySQL in master replication or something like that, master-slave replication, but it would be nice to actually have more users using us in the open source world. And tell us, “Look, this really looks bad and you have to do it differently.”
[36:41] We’ll work on this to make it more attractive, to make it more beautiful and simple and easier to use, but we need the open-source world to actually say what they want. We have our customers, of course, that tell us, but it’s not like there are thousands of them. So, please use it. If you want to see who is using us, it’s got some of this on the website – and how they are using us.
[37:11] OK, I’m just going to – yep. That’s the version number of MySQL cluster; it’s 7.0, the latest version. The official version is MySQL 5132ndb705. If you drink two beers, it’s pretty hard. It’s a different development cycle, so that’s why we have a different product. So, MySQL is not equal to MySQL Cluster. It’s out of MySQL 5.1 so you don’t have it there anymore.
[37:40] Everything is still GPL, so you can just download it and use it, and we have a different section in our manual which describes all the changes. OK?
[37:53] And 7.0 brings a cluster for Windows, which is very important for us because then we have more people going to use it. OK? These are the things we’re going to… So, any questions? Yep. Sorry, I didn’t see you?
[38:16] Audience Member: [off-mic question 38:16]
[38:22] Yeah, what is the split-brain situation? So, this part of the room and this part of the room are communicating to each other. Now, you put a wall in between. You can’t communicate anymore, so you can’t process anything. So, if you survive, then you do stuff. You make something new, but this one doesn’t know about it.
[38:47] If you put the wall away, then this one is likely to say, “Hey, you have new data, I don’t have it.” So, it’s inconsistent. If you insert something there, it will not be there. So, the primary key’s different sequence went on, no problem, but if it comes back, there is a clash between the two. So, the data is inconsistent.
[39:12] Audience Member: [off-mic question 39:12]
Geert: [39:16] There is a solution to split brains: you just kick one out. So if a data node – if I’m standing here and I’m the arbitrator, and the wall comes in the room, then I’m here and this group can continue. I’m not there, so you guys see that and you say, “We support you. We don’t do anything anymore.” So, I’m the arbitrator, I’m the one who says, “You guys can continue, they have to go.” [39:49] Audience Member: [off-mic question 39:49]
Geert: [40:02] Well, yeah. Please go to his talk, [inaudible 40:04] , so then you can actually see a nice use case and see how MySQL Cluster is there used, which is pretty cool. So, any questions more? OK, thank you.
[40:20] [applause]










































