>> Okay. Hi, guys. So this is Jay Pipes. He's a MySQL employee. He's in charge of North American Community Relations. I met him this week at the MySQL User Conference, which is down in Santa Clara. Now, the deal is that we're going to host this on Google Video. So if you have confidential questions, if you could leave them till the cameras are turned off, that would be cool. With that, no more ado, Jay Pipes. >> PIPES: All right. Nice to meet all of you. I'm going to be giving a talk that I gave on, I think, Tuesday or Wednesday, on "Performance Tuning Best Practices for MySQL." Some of the things are version specific, but for the most part it's pretty general. We're going to be covering profiling and benchmarking core concepts, sources of problems that you might encounter, index guidelines, schema guidelines, coding techniques, and lastly we'll go over some general things about tuning server parameters. Since the audience is, I think, sort of mixed levels of experience, just shout out questions or tell me if you want more background on something. All right, benchmarking concepts. The general idea behind benchmarking MySQL is, obviously, to have a track record. When you change things in your code, in your application, or the configuration files with MySQL, or your schema design, or your indexing, you want to make sure you have a track record of what you're doing. Because a lot of times, if you don't keep a track record and don't do benchmarking, you won't really know whether a specific change that you made had a detrimental impact or whether it actually increased performance. So, the baseline--you always want to have a baseline for your starting point. And always give yourself a target. Don't make it a moving target; you always want to have an end goal. Say you want a 20% increase in performance--that's your goal. Don't just leave it open-ended, because then you'll spend years and years trying to tune milliseconds out of your application when there's really no benefit to it. So, change only one thing at a time. A pretty basic concept, but it goes a long way in benchmarking. If you change two things and you rerun a test, how do you know which one of those two things actually caused the difference between the two tests? So, always change one thing at a time and rerun the benchmarks. And record everything: configuration files, information on the hardware, information on the operating system, the version of MySQL, and everything to do with the benchmarks. Keep it in a separate folder and keep everything for historical record. A lot of times you'll think that you won't need information when you're running benchmarks, and then six months later you realize, "Wow, it would be really useful if I had--oh, I threw it away." So, always keep information from your benchmarks so you have a historical record of what you did. And specifically with MySQL, you want to disable the query cache whenever you're running benchmarks. The reason is it will skew the results that you get from read-intensive queries, and write-intensive queries as well. So, disable the query cache by setting query_cache_size equal to zero when you run benchmarks. Prof--yes, go ahead. >> [INDISTINCT] benchmarking for [INDISTINCT] problem with the operating system, caches [INDISTINCT]? >> PIPES: Right. Yes.
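For reference, here's a minimal sketch of that last point--disabling the query cache for a benchmarking session. This assumes a user with the SUPER privilege; you can also set it in the configuration file instead:

    -- Turn the query cache off so cached result sets don't skew the runs.
    SET GLOBAL query_cache_size = 0;

    -- Verify it took effect before starting the benchmark.
    SHOW VARIABLES LIKE 'query_cache_size';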
The question was that what you were experiencing was the operating system caching skewing the results of the benchmark. Generally, the idea is either you can disable caching, if it's actually getting in the way, or if you bump up the number of runs in the benchmark tests you can minimize the skew effect. So that's usually what we recommend. But when you get into specific caches, there are very specific things to do, so it kind of depends on the exact scenario. But yes, bumping up the number of runs in the benchmarks generally tends to mitigate a lot of that. Does that answer your question? Okay. Profiling concepts. Profiling is a little different than benchmarking. Generally, you're profiling a production or test system that's running, and you're trying to diagnose what's going on while it's running. So instead of finding the stress or the load that the system can take, you're actually trying to pinpoint where the bottlenecks are in the code while it's running. With MySQL, the best tool that you have is the EXPLAIN command. So, regardless of what version of MySQL you are running, you really want to get familiar with everything in the EXPLAIN command. Understand what the access types are. Does everyone know what the EXPLAIN command is? Everyone use it? Okay. Does everyone know what every single access type is? Okay. It's a sort of poorly named column in the EXPLAIN results, but it actually describes the path, or the optimization pattern, by which the optimizer is saying, "Okay, I'm going to access these results and join them to these results." And each of those things--ref, ref_or_null, eq_ref, range--all the different access types mean different things. And as for the performance that you get from each of them, you can go into the MySQL manual, depending on your version, and see which ones have better performance for different types of queries. It's not always the same. So you want to make sure you understand those. Understand, obviously, that if you see ALL in the access type, it means it's doing a full table scan. Little things like that. Be aware of what they all mean and what the performance impact of the different ones is. Then there's using the Slow Query Log and mysqldumpslow to parse the query log. Very easy to set up: you turn on log-slow-queries and then you give it a long_query_time, and anything above that gets logged to the Slow Query Log. There's also--I think in 4.1 we put in an option so that any SELECT statement that does not use an index on a table can get logged to the Slow Query Log, regardless of how fast it runs. And that's useful for when you're doing an indexing strategy and you're trying to determine, okay, given my patterns of access, do I have indexes on the fields that I'm actually accessing in my WHERE clause and ON clause, that kind of thing. So: low-hanging fruit, law of diminishing returns. Tackle the stuff that's going to get you the absolute best return on investment for your time. I know you guys are Google, so a millisecond definitely does count. But if you've got a choice between going from five milliseconds to four milliseconds and going from a minute to a second, it's a no-brainer. Concentrate on the stuff that's going to give you the biggest impact. Using Mytop--Jeremy Zawodny wrote a nice little tool.
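For reference, the slow-query-log setup he's describing lives in the [mysqld] section of the configuration file. The option names below are as in the 4.1/5.0-era manuals--spellings vary across versions, so treat this as a sketch, and the log path is just a placeholder:

    [mysqld]
    log-slow-queries = /var/log/mysql/slow.log
    long_query_time  = 2
    log-queries-not-using-indexes

    # Then summarize the log, e.g. sorted by query time:
    #   mysqldumpslow -s t /var/log/mysql/slow.log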
Most of you are probably familiar with the top utility in UNIX. It's almost identical, but just for MySQL. It gives you the ability, on a repeating delay, to show you the process list of what's going on in MySQL, show the various SHOW commands and status variables, and it gives you a nice little interface to profile the system while it's running. And one nice thing about Mytop that I really like is that you can use it for remote hosts as well. So you can run it on your local machine and access a remote host. It is a nice little tool. Okay. Sources of problems. This is kind of my personal opinion of where I rank the law of diminishing returns--what I actually go for immediately. The performance team tends to think that sometimes you can tune a server parameter and that's the biggest bang for the least amount of time. But I think, really, where performance problems come in is definitely the top three here. Poor indexing choices will kill an application faster than really anything else. If you're querying on a field and you don't have an index on it and you're repeatedly hitting that, it's going to kill the application faster than whether sort_buffer_size can be adjusted. Same thing with inefficient or bloated schema design. I talked about this in the performance talk--I've seen a number of cases where people designing new applications actually de-normalize from the beginning because they think it's going to give them better performance. Don't do that. There are specific cases where you can de-normalize the schema and get some performance benefits, but don't do it right away just because you think it's going to give you a performance benefit. Use good, efficient, normalized schemas. And we'll go into detail on this later. Bad coding practices. MySQL has some inefficiency problems with subqueries, but in general, using joins is a lot more efficient than using subqueries. We're going to show some specific examples where you can really get a performance increase by simply changing the way that you code SQL. Server variables not tuned properly--that's a source of problems, but in my opinion, only very specific situations can be helped by tuning server parameters. It's not some catch-all. It totally depends on how much memory you have in the machine, what the architecture is, what operating system you're running on, what version of MySQL, what storage engines you're using, the type of application that you have, whether it's OLTP or heavy reads--all these things. Yes, tuning server parameters can help, but it's very specific to certain situations. And it also requires retesting and rerunning: you tweak one thing, you have to rerun the benchmark test to see if you got the performance gain out of it. It can be time consuming to tweak server variables, whereas it's pretty easy to change some SQL statements and immediately rerun them and see if you got a huge performance gain from doing that. And finally, hardware and network bottlenecks: I/O bound, CPU bound, network latency. There is a good book that I brought along that I recommend for that, and when we get to the last slide, I'll show you that. All right, indexing guidelines.
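As a sketch of the remote-host usage he mentions, a typical mytop invocation looks something like this (the host, user, and password are placeholders; flags per mytop's documentation):

    # Watch a remote server's process list, refreshing every 3 seconds.
    mytop -u monitor -p secret -h db1.example.com -s 3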
A poor or missing index: definitely the fastest way to kill an application, like I said. Look for covering index opportunities. We're going to go into some slides here to explain what that is. But a covering index--especially for--who's using MyISAM versus InnoDB? InnoDB, raise hands. MyISAM, raise hands. Nobody is using MyISAM? For a single thing, okay. >> [INDISTINCT] >> PIPES: Say again? >> [INDISTINCT] >> PIPES: Got you, okay. Well, we'll explain some things about covering index opportunities that are very specific to InnoDB. But a covering index basically means that MySQL can get all the information from the index records--which are by definition slimmer than the data records, because they don't contain all the fields in the table--and use that information to complete whatever the query was without having to go into the data records. One of the key concepts to this: the slimmer your index records are, the more you can fit into a single index block, and the more you can fit into an index block, the fewer reads you're going to do, and the faster your application is going to be. So definitely look around for covering index opportunities. When you're deciding which fields you actually want to index, make sure you take a look at what the selectivity of the field is. Selectivity is the cardinality--the number of unique values contained in that field--divided by the total number of values in the table. So, obviously, a unique index would have a selectivity of one, because it's the unique count over the total count. That's the best-case scenario, but you can get situations where you've got awful selectivity. If you're querying just on that field, you may get awful performance from it, or MySQL may just decide not to use the index, because it ceases to be useful. I think generally--[INDISTINCT] could talk a little bit more about this--but I think the number is about 30%. The optimizer sees that, okay, you do a query, SELECT * FROM whatever WHERE column A equals two, and it will take a look and estimate, from the information it has on the index, that this particular query returns around 30% or more of the rows in the table. If it does, it's actually much more efficient to just sequentially scan the table than it is to go into the index, randomly seek and grab a bunch of records, and then do lookups back to the data records. So, be aware of the uniqueness of the values in your index. For less selective fields, where you don't have very much selectivity, you can sometimes tack them onto a multicolumn index--especially with GROUP BY queries, and we'll show an example of this--and you get some efficiencies there. But just be aware of the selectivity. On multicolumn indexes, like the ones I just explained, the order of the fields is very important. You have to look at the queries that you're running against the database. If you have the columns mixed up, or they're not optimal, the optimizer just won't use the index. And I'll show you a perfect example of something that happens all the time with this on GROUP BY queries. And as the database grows, you want to ensure that the distribution stays good. If you set up a new application and you go through a couple of months and you've got a couple of hundred thousand records in there--okay, great.
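As a minimal sketch, you can estimate the selectivity he's describing for a candidate column like this (the orders table and status column are hypothetical):

    -- Selectivity = distinct values / total rows; closer to 1.0 is better.
    SELECT COUNT(DISTINCT status) / COUNT(*) AS selectivity
    FROM orders;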
Everything looks great. The index strategy that you designed for it is wonderful. But that doesn't necessarily mean that in nine months, when you've got millions of records in there, the index strategy that you initially picked out is still going to be optimal. Make sure that you understand how your data changes over time. Business practices can change. So if you've got, let's say, seven distinct values in a field--a status field, say--and you've got an index on that, okay. And over time, 90% of the field values come to contain one specific thing. That index ceases to be as useful as it was when you initially designed the application, because 90% of the rows contain the same value. So just be aware that as your database grows and your application changes, you need to reexamine what you initially thought was a good index strategy. Okay. Remove redundant indexes for faster write performance. Obviously, for every index that you put on a field, you get a slight performance impact on writes, because it's got to write an index record as well as the data record--every time you do an insert, and every time you update, if you update the key value, it's got to write to the index. Unfortunately, MySQL allows you to create redundant indexes on the exact same things. So I've seen cases where you get 40 indexes on a table and 10 of them are redundant--because over time, teams of people work on it and no one wants to change other people's stuff, so they just add their own indexes with different names and all that. But they're redundant; they're on the same fields. So as time goes by, make sure you check and remove redundant indexes to get some write performance back. Okay. Common index problem here, kind of a web 2.0 example. You've got a tags table, which has got a tag ID and the text for the tag, and a products table. Both of these tables have an auto-incrementing primary key, and then you've got a relationship table that stores the many-to-many relationship between products and tags. And notice that the primary key on this Products2Tags table is (product ID, tag ID). Makes sense? Pretty basic? Okay. Here are two examples of pretty common queries run on this Products2Tags table. Only the top one is going to use the index. And the reason is the order of the columns in the index. In the top one, you're grouping by what's essentially the product ID, so it's going to be able to take the left side of the index and use it in order to count the right-side value, which is the tag ID. In the bottom one, you're reversing that. You're saying, for each distinct tag, give me the count of products. You can't use that primary key index, so it won't be used, and you get no read performance from it. The remedy would be to create an index that has tag ID on the left side. Okay. Now, there are two examples here where it says CREATE INDEX. The top example creates an index, ix_tag, on just the tag ID. For all you guys using InnoDB, that's exactly what you'd want to do. For MyISAM, you'd want to do the bottom one, to create a covering index opportunity. Does anyone know why the top one should be used for InnoDB and not the bottom one? What's the difference between secondary indexes in InnoDB versus MyISAM? >> Because it went back to the primary key. >> PIPES: Exactly.
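The slides aren't captured in the transcript, so here's a reconstruction of the schema and queries being discussed, under the naming he uses (treat the exact identifiers as assumptions):

    CREATE TABLE products2tags (
      product_id INT UNSIGNED NOT NULL,
      tag_id     INT UNSIGNED NOT NULL,
      PRIMARY KEY (product_id, tag_id)
    );

    -- Uses the primary key: product_id is the leftmost column.
    SELECT product_id, COUNT(*) FROM products2tags GROUP BY product_id;

    -- Can't use the primary key: tag_id is not the leftmost column.
    SELECT tag_id, COUNT(*) FROM products2tags GROUP BY tag_id;

    -- Remedy for InnoDB (the primary key is implicitly appended):
    CREATE INDEX ix_tag ON products2tags (tag_id);

    -- Remedy for MyISAM (explicit covering index); use one or the other:
    CREATE INDEX ix_tag_product ON products2tags (tag_id, product_id);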
Because it's a clustered index organization in InnoDB, every time you add an index on a field that's not the primary key--every non-primary-key index--the primary key is actually appended to every single index record. So if you've got a primary key (product ID, tag ID) and you add an index on (tag ID, product ID), you're being redundant, because you've already got the product ID in the index record in InnoDB. So it's important to understand the difference between the clustered organization and MyISAM, which is non-clustered--where it's just a record number that's attached to each index record instead of the clustering key. Be aware of those differences. So, I was talking about redundant indexes; that one on the bottom would be redundant for InnoDB. Did the mic just change? >> [INDISTINCT]. >> PIPES: Okay. Okay. Sorry. Schema guidelines. An inefficient schema: another great way to kill your application. Use the smallest data types you really need. If you don't have four billion rows in the table, don't use BIGINT. It's just the basics. Keep them as trim as possible. If you have a maximum of a hundred values, don't use INT; use TINYINT--you can fit 255 unique values in it. Don't use CHAR(200) when a VARCHAR or a smaller CHAR will do. Little things like this make a huge impact on performance. And the reason is, remember that every index record has got to contain that width. So if you've got an index on a CHAR(40), every single index record has got 40 bytes in it. Well, it's a little bit different, because there's some index prefix compression going on, but the general concept stays. If you have an index on first_name, just index the first 10 or 15 characters of it. You fit many more index records into a single key block, and therefore you get fewer reads and faster performance. When you've got many columns in a table, especially many nullable columns, consider splitting the table into two tables, like a main and a detail table, something like that. If you've got many nullable columns, or columns that are rarely accessed or rarely searched on, put those in a separate table and join them on a one-to-one relationship, okay? And the reason for that is you can pull many more of the main records that you're actually searching on into memory. Especially if you don't need the detail fields, you have a lot more opportunity to get more performance from that. Consider horizontally splitting many-row tables, using partitioning or MyISAM MERGE tables--which, since you guys are using InnoDB, doesn't matter. Partitioning is in 5.1, so you guys are a little ways away from that. It's all right. You can also do your own application-level partitioning, where you're naming different sets of tables, and then--for the one person that was using 5.0--you can use views in order to aggregate many tables into a single view. And always remember that fewer reads means faster results. And this is the limit of my graphical ability right here. That's about it. It's just a little flow: smaller data types lead to narrower index records, more records per block, fewer reads, faster performance. That's the general thing--how to get better performance from your indexes and schema. InnoDB: choose the smallest possible clustering key.
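A minimal sketch of the prefix-index idea (hypothetical customers table; 10 is just the example prefix length he gives):

    -- Index only the first 10 characters of first_name, so more index
    -- records fit per key block and each lookup does fewer reads.
    CREATE INDEX ix_first_name ON customers (first_name(10));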
Since, like I just explained, it's appended to every single secondary index record. So if you've got a 12-byte primary key on your InnoDB table, that's 12 bytes being appended to every single index record, every single time you insert into the table. So be aware of that. Don't use surrogate keys when a naturally occurring one exists. We see this all the time in the forums, and I often wonder what the use of it is and why people do it. You've got a naturally occurring (product ID, tag ID) key--it's a naturally occurring primary key. Yet a lot of people will just put an auto-increment in there, just for the heck of it. But you're never really going to be looking up by that record ID; you're always going to be looking up based on product ID or tag ID. Some people call it a surrogate key or a false key. Get rid of them; they're just wasting space. Coding guidelines. I tend to call it chunky coding habits: cutting everything down into the smallest possible units you can. Thinking of SQL in terms of procedural programming is a bad idea. It leads to--well, correlated subqueries and other things I hate. You want to be an SQL programmer who thinks in sets. Don't think in terms of for-loops, okay? Think in terms of merging and intersecting sets of data. You'll be able to cut things up into chunks, and you'll be much more efficient in the way you code SQL. The 5.0 guy: use stored procedures. If you're not using stored procedures, you're missing out on a performance increase in 5.0. InnoDB--how many of you are using SELECT COUNT(*) on InnoDB tables? I know I saw you. Okay. Don't. InnoDB cannot optimize SELECT COUNT(*) queries, okay? So your table and your schema will not scale well. The more records you get in there, the longer it takes, because it requires actually reading through and counting all the records. Whereas with MyISAM, it can get the count from the index; with InnoDB, it cannot. It actually has to count all the values. And the reason for that is the way that InnoDB implements what's called multi-version concurrency control. There's a version for each row, so that row-level locking can occur in InnoDB. So there are complexities involved with actually counting the number of values--which version are you comparing? So use counter tables. If you're doing SELECT COUNT(*) on products, have a separate table that stores the number of products, and when you insert into the product table, increment it; when you delete, decrement it. Okay? That's how to scale InnoDB for performance. We've seen this in a number of big forum applications--they always do things like count the number of threads and count how many users are logged in. This is a big problem when people switch from MyISAM to InnoDB without looking at the code--just simply switch the engines--yeah. >> Well, also consider that you don't necessarily want to know the exact number of rows. Do you care if you've got 1,000 threads or 1,100 threads? It really doesn't matter; you want a ballpark figure. So even if your numbers are not fully up to date, it doesn't matter. If you run a cron job that does the InnoDB count once an hour, then put that value [INDISTINCT]. You could put it in another table. You could put it in a file that you include into whatever page is displayed. That works fine.
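A sketch of the counter-table pattern he describes--names are hypothetical, and in practice you'd wrap the increment in the same transaction as the insert (or, in 5.0, a trigger):

    CREATE TABLE product_counter (
      num_products INT UNSIGNED NOT NULL
    ) ENGINE=InnoDB;

    -- Maintain the count alongside writes to products:
    UPDATE product_counter SET num_products = num_products + 1;  -- after INSERT
    UPDATE product_counter SET num_products = num_products - 1;  -- after DELETE

    -- Constant-time read, instead of scanning every row version:
    SELECT num_products FROM product_counter;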
Then it reduces the problem to once an hour, rather than every time you run that query. >> PIPES: All right. Isolating index fields on one side of the equation. You know, your basic math class: you want to put the single variable on one side and the rest of the equation on the other. We'll show an example of how to do this in a second. And use calculated fields when they're necessary. Let's go into the example of isolating index fields on one side of the equation. Okay. Whenever you apply a function to a field that's indexed, the optimizer can't use the index on it, okay? So you want to look for opportunities where you can rewrite SQL statements like the one on the top here into an SQL statement with the exact same result, but where you've isolated the indexed field without any function operating on it. You've just converted the statement into a different way of representing it, okay? In this case, you're just taking the TO_DAYS() comparison and converting it into CURRENT_DATE minus an interval of seven days. Just make sure you understand the greater-than-and-equals and all that, so you're including the boundary. Now, what's wrong with the second part? How can we improve the performance of that? Anyone know? >> [INDISTINCT] >> PIPES: Say again? >> [INDISTINCT] >> PIPES: Yes, well, yeah, basically that. The CURRENT_DATE function is a non-deterministic function, and the problem with that is that the query cache won't cache this query. But the thing about CURRENT_DATE is, it only returns a different value once every, what, 24 hours? So if you're running this, say, a hundred thousand times in a day, instead of running it once, it's going to run a hundred thousand times. With the query cache, MySQL actually puts the query into the cache and stores the results, and every time you issue that exact same query, it just takes it right out of the query cache--it never even has to hit the storage engine or the parser or anything. So in your Perl scripts or your PHP scripts or whatever you're writing in, just insert the date literal instead of using CURRENT_DATE, okay? A lot of people don't realize this, but it's a very good way to get performance out of your query cache. Okay, using calculated fields. When you've got a query on an email address with a leading wildcard, like LIKE '%.com', an index can't be used. The reason is that you can't have a wildcard on the left side of the constant you're comparing with LIKE. If you put the wildcard on the right--at the end--so you have a constant string and then the wildcard, the index can be used, because it can match on the left part of the values, okay? So the way to handle this is to do a little trickery and actually put a reversed email address in your table. With 5.0 you can use a trigger: it inserts the reverse of the email address you're inserting directly into that new reversed-email-address field, okay? And then, on the bottom, you can just do the LIKE against a reversed, concatenated string, and now you actually can use an index on that field. So that's a way of using calculated fields--fields that store different values based on other fields in the table--to get performance out of something you'd never be able to use an index on otherwise. >> [INDISTINCT] >> PIPES: Yes.
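Reconstructed from his description (the orders and customers tables, column names, and the date literal are assumptions), the two rewrites look roughly like this:

    -- Before: the function wraps the indexed column, so no index is used.
    SELECT * FROM orders
    WHERE TO_DAYS(CURRENT_DATE()) - TO_DAYS(order_created) <= 7;

    -- After: the indexed column stands alone on one side.
    SELECT * FROM orders
    WHERE order_created >= CURRENT_DATE() - INTERVAL 7 DAY;

    -- Better still for the query cache: have the application embed the
    -- date literal, since CURRENT_DATE() defeats query-cache lookups.
    SELECT * FROM orders
    WHERE order_created >= '2006-04-25' - INTERVAL 7 DAY;

    -- Calculated field: store and index the reversed address, so a
    -- trailing wildcard replaces the unusable leading one.
    SELECT * FROM customers
    WHERE rv_email LIKE CONCAT(REVERSE('.com'), '%');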
This is something where, if you have an application that's looking up all the email addresses in your database that end in .com or aol.com or whatever it is, you don't want to be running that top one every single time someone requests it. You want to be operating on the reversed email address and using the index on it. Okay. Learn to use joins. Like I said earlier, eliminating correlated subqueries is a great way to get performance out of your application right now. Timour, who's going to be speaking in a little bit, is on the optimizer team, and he's working on subquery optimizations, but for now there's just a better way of doing things, with joins, okay? It's easier, and it follows the set-based programming principles of SQL instead of the procedural-language principles that most of us are very familiar with--the for-loop, operating on everything one at a time. So let's go to some examples here. Don't try to outthink the optimizer team. They're pretty smart--you'll meet Timour in a little bit, and he's really smart. You can give the MySQL server a join hint, using something like STRAIGHT_JOIN, but the problem with that is, while you may get some performance benefit if the optimizer currently can't figure out the right path--and I've only seen this in one or two instances--well, in two months your data set changes, and the optimizer that you've now overridden with your STRAIGHT_JOIN hint might pick a suboptimal plan--except you forgot that you put STRAIGHT_JOIN in there, so now you're stuck with a suboptimal plan. So if you are going to use a join hint, make sure you thoroughly document where you put it in the code, and don't forget about it. Because as your data set changes, the optimizer could much more easily pick an optimal path for the query execution. Okay. Eliminating correlated subqueries. With 4.1--it looked like most of you are on 4.1--you can use subqueries. I tend to avoid them like the plague. We see this a lot on the forums, this kind of thing. Most of you, as advanced SQL programmers, would probably not do anything like this, but we see it every day. The idea is to get the maximum sold price from an order items table for each product. And for the procedural programmer--who's a pro coder but isn't necessarily familiar with SQL, hasn't been trained in set-based thinking--this is a very natural way of doing it. You're saying, "Okay, I want my products table. I'm selecting from my products table, and I want each of those products. And for each one, I want to get the maximum price that it was sold for in the order items table," correlating on the product ID. It's a very natural way for a procedural programmer to think of things. But remember, it's "for each product"--that's the way people think about it, and that's not the set-based way of doing things. On the right is the set-based way. You're joining. You say, "Okay, I've got two sets of information. I've got my products and I've got my order items. And I want to transform these order items by grouping them on product ID and then finding the maximum price." So that's my set: the maximum price for each product. And then I have my products set and my maximum-price set, and I join them. That is actually two operations, right? Two conceptual operations: grouping, and joining to the products.
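A sketch of the two forms he contrasts, with hypothetical products and order_items tables:

    -- Correlated subquery: re-executed once per product row.
    SELECT p.product_id, p.name,
           (SELECT MAX(oi.price)
              FROM order_items oi
             WHERE oi.product_id = p.product_id) AS max_sold_price
    FROM products p;

    -- Set-based join: group once, join once.
    SELECT p.product_id, p.name, m.max_sold_price
    FROM products p
    INNER JOIN (
        SELECT product_id, MAX(price) AS max_sold_price
        FROM order_items
        GROUP BY product_id
    ) m ON m.product_id = p.product_id;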
On the left, you're actually doing a query for every single product, so it doesn't scale--it grows linearly. If you have a thousand products, you're doing a thousand subqueries to get the maximum price from the order items table. If you have a million, you do a million queries. On the right, you're still doing two queries. So, think about how you can get rid of correlated subqueries. This is one of the best ways to get better performance out of your SQL code. Using derived tables. It might seem like a big mess, but it's actually more efficient on the right, and it goes back to the same principle: how many queries are you actually issuing? On the left, you've got a join from customers to orders, and in the WHERE clause you have a correlated subquery--not with a GROUP BY, with an aggregate. So you're saying: the customers join to the orders, but I want to group by the company of the customer, not the customer ID. And the order date equals the maximum, the last order date, in the orders table where the customer is the same--it's correlated to your customers table C--and then group by the company. Well, you're going to be doing a subquery, that SELECT MAX, for each customer in the orders table. Now, on the right, we say, "Okay, there's our customers set." And we're inner joining on what's called a derived table--a subquery in the FROM clause. That's one set: we're taking the maximum, the last order date, for each distinct customer ID in our orders table. That's a set. And then we're joining back to the orders table. So there are three operations--three conceptual sets--to get that result, versus the non-scaling method of a correlated subquery in the WHERE clause, which has to be executed for every single customer in the orders table. Okay. All right. And this is my last slide. I tend to kind of skim over server parameters, because--I'm very happy to talk specifics with any one of you, but generalizing about server parameters only goes so far; there are very specific situations where they can help, where you can get some definite performance boost. Be aware of which server parameters are global versus per thread--per connection made to the database. Okay? For instance, key_buffer_size is a global variable. sort_buffer_size is per thread. So be aware of that. Don't assume that sort_buffer_size is for the whole database and bump it up, because it's actually per thread. So make small changes, preferably a single change, to your configuration file, and retest. See if you get a performance benefit from that specific change. Server parameters often provide a quick solution, but it can be temporary. Really, the lasting performance benefit is optimizing your application: your indexes, your schema, your coding. The query cache is off by default--the size is zero for your query cache--so if you haven't changed it, it's not on. You can do SHOW VARIABLES and it'll say query_cache_type is ON, I think, but there's no size to it, so it actually isn't used by default. So unless you've turned it on, you're not using your query cache. If you're doing read-intensive applications--like, how many are doing lots and lots of writes versus lots and lots of reads? Who has 90% reads or more? Okay. Do you have your query cache turned on? Who has the query cache turned on and knows they've added size to it? One guy in the back, all right. Two, all right, excellent. Okay. If you're doing read-intensive applications, turn the query cache on.
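Reconstructed from his description (table and column names assumed), the derived-table rewrite looks roughly like this:

    -- Left: correlated subquery; the MAX runs once per customer.
    SELECT c.company, o.order_date
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.customer_id
    WHERE o.order_date = (
        SELECT MAX(order_date)
        FROM orders
        WHERE customer_id = c.customer_id)
    GROUP BY c.company;

    -- Right: derived table; three conceptual sets, a fixed number of passes.
    SELECT c.company, o.order_date
    FROM customers c
    INNER JOIN (
        SELECT customer_id, MAX(order_date) AS last_order_date
        FROM orders
        GROUP BY customer_id
    ) last_o ON last_o.customer_id = c.customer_id
    INNER JOIN orders o
        ON o.customer_id = last_o.customer_id
       AND o.order_date  = last_o.last_order_date;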
Why? Because it doesn't have to execute the query. If it's the exact same query and your data set isn't changing much, it just takes it directly from the query cache. The query cache actually stores the result set. It doesn't just compile the query and store it so it can be re-executed; it actually stores the rows, so it doesn't have to do any parsing or optimization. It just sends the rows right back to the client. So if you're doing repeated queries like that, turn the query cache on and give it some size. In the configuration file it's query_cache_size; set it to, whatever, 32 meg or 64 meg. You can play around with it. You can also do it online--I think it's SET GLOBAL query_cache_size equals something or other; it's in the manual. So you can do it online or in your configuration file. Beware of--well, since you don't use MyISAM tables--how many of you think you're not using MyISAM at all? I hope not. Okay. MySQL's internal database is in MyISAM, okay? That's why, if you do SHOW STATUS, you'll actually see MyISAM operations going on. So even if all your tables are InnoDB, the MySQL internal database is MyISAM, so you still need to put something in there. Leave the defaults for key_buffer_size so the mysql database can live happily. But innodb_buffer_pool_size, that's your main one for InnoDB. That determines how much memory is allocated for storing data and index pages in InnoDB. I think Heikki recommends that about 50 to 80% of the total memory on the box should be allocated to the InnoDB buffer pool. Okay. Other important InnoDB stuff: innodb_log_file_size and innodb_log_buffer_size. I think he recommends that the log file size maxes out at a gig, but you can adjust it depending on what your situation is. And I can take some questions. Oh yeah--memory is the cheapest, fastest, easiest way to get better performance out of MySQL. Don't even ask your boss, just stick another piece of RAM in there. It really is. MySQL just loves RAM. The more RAM you give it--it's like Pac-Man, it just eats the stuff up. It loves it. So that's the easiest and cheapest way to get better performance from a hardware situation. Yeah? >> You talked about [INDISTINCT] >> PIPES: Okay. The question was, if you've got a logical key in a character field--say, for instance, I don't know, a Social Security number, something like that--is there a benefit to making what you're calling a surrogate key out of an auto-incrementing ID? There is a benefit to that, a big benefit. Depending on the width--like, if you have a CHAR(4) versus an INT, you're probably not going to see much of a performance difference. But if you've got a CHAR(8), CHAR(16)--that's your product ID or your record ID or whatever--versus an auto-incrementing key, you're going to get a lot better performance from the numeric data type in four bytes. It's optimized for that. >> Why do you go up to [INDISTINCT] >> PIPES: Okay. There's that string data type question. You mean versus, like, VARCHAR(4000) or something, versus TEXT or whatever? Maybe [INDISTINCT] can talk to that. I don't know if there's a specific amount of performance benefit you get from doing VARCHAR versus TEXT. I think in InnoDB there may be something, but... >> Yeah.
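Pulling together the parameter names he mentions, a hedged [mysqld] sketch for a mostly-InnoDB, read-heavy box (the values are placeholders for illustration, not recommendations):

    [mysqld]
    query_cache_size        = 64M
    key_buffer_size         = 16M   # small, but present: the mysql system database is MyISAM
    innodb_buffer_pool_size = 2G    # roughly 50-80% of RAM on a dedicated box
    innodb_log_file_size    = 256M
    innodb_log_buffer_size  = 8M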
So the question is, is there a benefit to using CHAR rather than VARCHAR in certain cases? >> PIPES: Or TEXT versus VARCHAR. >> Or TEXT versus VARCHAR. In MyISAM, TEXT fields are actually stored separately--TEXT and BLOBs. The actual row contains a pointer to where the TEXT value is. In the case of a CHAR or VARCHAR, it's part of the row. In InnoDB, it's the same up to a point; in certain cases you'll just find everything in the row, depending on which version and so on. InnoDB itself doesn't care whether the fields are CHAR or VARCHAR--internally it basically always uses VARCHAR. So in InnoDB you don't need to make that distinction--"okay, I want to optimize for fixed-length fields," that kind of stuff. It doesn't care about that. MyISAM would care. So that's the basic story. Does that answer your question? Okay. >> PIPES: Any other questions? No one? All right. Well, thanks, guys. Thanks for letting us come here.