>> Okay. Hi, guys. So this is Jay Pipes. He's a MySQL employee. He's in charge of North
American Community Relations. I met him this week at the MySQL User Conference which is
down in Santa Clara. Now, the deal is that we're going to host this on Google Video.
So if you have confidential questions, if you could leave them till the cameras are
turned off, that would be cool. With that, without further ado, Jay Pipes.
>> PIPES: All right. Nice to meet all of you. I'm going to be giving a talk that I gave
on, I think, Tuesday or Wednesday, on "Performance Tuning Best Practices for MySQL." Some of
the things are version specific but for the most part, it's pretty general. We're going
to be covering Profiling and Benchmarking Core Concepts, Sources of Problems that you
might encounter, Index Guidelines, Schema Guidelines, Coding Techniques, and lastly,
we'll go over some general things about tuning server parameters. Since the audience is,
I think, sort of mixed levels of experience, we'll just shout out questions or tell me
if you want me to expand on anything. All right, Benchmarking Concepts. The general idea behind
benchmarking MySQL is to, obviously, have a track record. You know, when you change
things in your code, in your application, or the configuration files with MySQL, or
your schema design, or you're indexing, you want to make sure you have a track record
of what you're doing. Because a lot of times, if you--if you don't keep a track record and
don't do benchmarking, you won't really know whether a specific change that you made had
a detrimental impact or whether it actually increased performance. So, the baseline--and
you always want to have a baseline as your starting point. And always give yourself
a target. Don't make it a moving target; you always want to have an end goal, you know.
Say you want to get a 20% increase in performance. So that's your goal. But don't just
leave it open-ended, because
then you'll just spend years and years and years trying to, you know, tune milliseconds
out of your--out of your application when there's really no benefit to it. So, change
only one thing at a time. A pretty basic concept but it really--it goes a long way in benchmarking.
If you change two things and you rerun a test, how do you know which one of those two things
actually affected the difference between the two tests? So, always change one thing at
a time and rerun the benchmarks. And record everything, configuration files, information
on the hardware, information on obviously the operating system, the version of MySQL,
and everything to do with the benchmarks. Keep it in a separate folder and keep everything
for historical record. A lot of times, you'll think that you won't need information when
you're running benchmarks. And then, six months later, you realize that, "Wow! It would be
really useful if I had--oh, I threw it away". So, always keep information from your benchmarks
so you have a historical record of what you did. And specifically with MySQL, you want
to disable the query cache whenever you're running benchmarks. The reason is it will
skew the results that you get from read-intensive queries and write-intensive queries as well.
So, disable the query cache by setting query_cache_size equal to zero when you run benchmarks.
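For instance, a minimal sketch of that setting (query_cache_size is the real variable name; the statements are just illustrative):

    -- Turn the query cache off for benchmark runs:
    SET GLOBAL query_cache_size = 0;

    -- Verify it's off:
    SHOW VARIABLES LIKE 'query_cache_size';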
Prof… Yes. Go ahead.
>> [INDISTINCT] benchmarking for [INDISTINCT] problem with the operating system caches [INDISTINCT]?
>> PIPES: Right. Yes. The question was that what you were experiencing was the operating
system caching skewing the results of the benchmark. Generally, the idea is either you can disable
caching if it's actually getting in the way or if you bump up the actual number of runs
in the benchmark tests you can minimize the skew effect. So that's usually what we recommend.
But when you get into specific caches, there are very specific things to do, so it kind of
depends on the exact scenario you're in. But, yes, bumping up the number of runs
in the benchmarks generally tends to mitigate a lot of that, so. Does that answer your question?
Okay. Profiling Concepts, profiling is a little different than benchmarking. Generally, you're
profiling a production or test system, trying to diagnose what's going on while it's
running. So, instead of finding the
stress or the load that the system can take, you're actually trying to pinpoint where the
bottlenecks are in the code while it's running. With MySQL, the best tool that you have is
the EXPLAIN command. So, regardless of what version of MySQL you are running, you really
want to get familiar with everything in the EXPLAIN command. Understand what the access
types mean. Does everyone know what the EXPLAIN command is? Everyone use it? Okay. Does everyone
know what every single access type is? Okay. It's a sort of poorly named column
in the EXPLAIN results, but it actually describes the path, the optimization pattern,
by which the optimizer says, "Okay, I'm going to access these results this way
and join them to these results that way." And each of those things, the ref, the
ref_or_null, the eq_ref, the range, all the different access types mean different things. And the
performance that you get from each of them, you can go into the MySQL manual, depending on
your version. You can see which ones have a better performance for different types of
queries. It's not always the same. So, you want to make sure you understand those. Understanding
obviously that if you see ALL in the access type, it means it's doing a full table scan.
You know, just little things like that. Be aware of what they all mean and what the performance
impact of the different ones are.
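For instance, a quick sketch against the products-and-tags example that comes up later in this talk (the query itself is made up):

    EXPLAIN SELECT p.product_id, t.tag_text
      FROM products p
      JOIN products2tags pt ON pt.product_id = p.product_id
      JOIN tags t ON t.tag_id = pt.tag_id
     WHERE p.product_id = 42;
    -- In the "type" column, const, eq_ref, ref, and range are good signs;
    -- ALL means a full table scan.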
So, use the Slow Query Log and mysqldumpslow to parse the query log. It's very easy to
set up: log-slow-queries, and then you give it a long_query_time; anything above that
gets logged to the Slow Query Log. There's also, I think in 4.1, we put in an option
so that any SELECT statement that does not use an index on a table can get logged to the Slow Query Log
regardless of how fast it runs. And that's useful for when you're doing an indexing
strategy and you're trying to determine, okay, given my patterns of access, do I have
indexes on those fields that I'm actually accessing, you know, in my WHERE clause and
ON clause and that kind of thing.
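As a sketch, the relevant my.cnf options look like this (4.1/5.0-era option names; the path and threshold are illustrative):

    [mysqld]
    log-slow-queries = /var/log/mysql/slow.log
    long_query_time  = 2
    # 4.1 and up: also log queries that use no index, regardless of time
    log-queries-not-using-indexes

You'd then point mysqldumpslow at that log file to summarize the worst offenders.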
So, low-hanging fruit, the law of diminishing returns: tackle the stuff that's going
to get you the absolute best return on investment for your time. Don't concentrate
on, you know, I know you guys are Google. So, a millisecond definitely does count. But,
if you've got a choice between, you know, going from five milliseconds to four
milliseconds and going from a minute to a second, it's a no-brainer. You
know, concentrate on the stuff that--that's going to give you the biggest impact. Using
Mytop. Jeremy Zawodny wrote a nice little tool. Most of you are probably familiar with
the top utility in UNIX. It's almost identical but it's just for MySQL. And it gives you
the ability, on a repeating delay, to show you the process list of what's
going on in MySQL, show the various SHOW command output and status variables,
and it gives you a nice little interface to profile the system while it's running. And one
nice thing about Mytop that I really like is that you can--you can use it for remote
hosts as well. So, you can run it on your local machine and access a remote
host. So it is a nice little tool. Okay. Sources of Problems. This is kind of my personal
opinion of where I rank the law of diminishing returns, like what I actually go for
immediately. The performance team tends to think that, you know, sometimes you can--you
can tune a server parameter and that's like the biggest bang for the least, you know,
amount of time. But I think, really, where performance problems come in is definitely
the top three here. Poor Indexing Choices will kill an application faster than
really anything else. If you're querying on a field and you don't have an index on it
and you're repeatedly hitting that, it's going to kill the application faster than, you know,
whatever sort_buffer_size is adjusted to. So, same thing with the Inefficient or Bloated
Schema Design. You know, I talked about this in the performance talk; I've seen a number of
cases where people designing new applications actually de-normalize from the beginning
because they think it's going to give them better performance. Don't do that. You
know, there's specific cases where you can de-normalize the schema and you can get some
performance benefits. But don't do it right away. You know, just because you think it's
going to get, you know, a performance benefit. Use good, efficient, normalized schemas. And
we'll talk--we'll go into detail on this later. Bad coding practices. MySQL has some inefficiency
problems with sub-queries. But in general, using joins is a lot more efficient than using
sub-queries. We're going to show some specific examples where you can really get a performance
increase by simply changing the way that you code SQL. Server Variables
Not Tuned Properly, that's a source of problems. But in my opinion, very specific situations
can be--can be helped by tuning server parameters. It's not some catch-all, you know, it totally
depends on, you know, how much memory you have in the machine, what the architecture
is, what operating system you're running on, what version of MySQL, what storage engines
you're using, your, you know, the type of the application that you have, whether it's
OLTP or heavy reads, you know, all these things. Yes. Tuning server parameters can help but
it's very specific to certain situations. And it also requires retesting and rerunning.
But, you know, you tweak one thing, you have to rerun the benchmark tests to
see if you got the performance gain out of it. It can be time consuming to tweak
server variables, whereas it's pretty easy to, you know, change some SQL statements
and immediately rerun it and see if you got a huge performance gain from doing that. And
finally, Hardware and--and Network Bottlenecks. I/O, CPU bound, network latency. Right. There
is a good book that I brought along that I recommend for that and when we get to the
last slide, I'll show you that, so. All right, Indexing Guidelines. A poor or missing index,
definitely the fastest way to kill an application, like I said. Look for covering index opportunities.
We're going to go into some slides here to kind of explain what that is. But the covering
index is--especially for my--who's using MyISAM versus InnoDB? InnoDB, raise hands. MyISAM,
raise hands. Nobody is using MyISAM? For a single thing, okay.
>> [INDISTINCT] >> PIPES: Say again?
>> [INDISTINCT] >> PIPES: Got you, okay. Well, we'll explain
some things about covering index opportunities that are very specific to InnoDB. But covering
index, basically what it means is that MySQL can get all the information from the
index records which are by definition slimmer than the data records because they don't contain
all the fields in the table and use that information to complete whatever the query was without
having to go into the data records. One of the key concepts to this, the slimmer that
your index records are, the more that you can fit into a single index block and the
more that you can fit into an index block, the fewer reads you're going to do, the faster
your application's going to be. So, covering index opportunities definitely look around
for those.
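A minimal covering-index sketch (the table and column names are made up):

    -- The index contains every column the query touches:
    CREATE INDEX ix_email_name ON customers (email, first_name);

    -- MySQL can answer this from the index blocks alone; EXPLAIN shows
    -- "Using index" in the Extra column when this happens.
    SELECT first_name FROM customers WHERE email = 'bob@example.com';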
When you're deciding on which fields you actually want to index, make sure you
take a look at what the selectivity of the field is. Selectivity is the cardinality
or the number of unique values that are contained in that field divided by the total number
of values in the table. So, obviously a unique index would have a selectivity of one,
because it's, you know, the unique count over the total count. That's obviously the best-case scenario but
you can get situations where you've got awful selectivity. If you're querying just on that
field, you may get awful performance, because the optimizer may just decide not to use
the index, since it ceases to be useful. I think generally, [INDISTINCT]
could talk a little bit more about this, but I think the number is about 30%. Your
optimizer sees that, okay, you do a query, SELECT * FROM whatever WHERE column_a = 2,
and it will take a look and estimate, from the information it has on the index, whether
this particular query returns around 30% or more of the rows in the table. If it does,
it's actually much more efficient to just
sequentially scan the table than it is to go into the index and randomly seek and grab
a bunch of records and then do lookups back to the data records. So, be aware of the
uniqueness of the values in your index. For low-selectivity fields, where you don't
have very much selectivity, you can sometimes tack them onto a multicolumn index so that,
especially with GROUP BY queries, and we'll show an example of this, you get
some efficiencies there. But just be aware of the selectivity.
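You can eyeball selectivity with a quick query, something like this (names made up):

    -- Selectivity = distinct values / total rows:
    SELECT COUNT(DISTINCT status) / COUNT(*) AS selectivity FROM orders;
    -- Near 1: a great index candidate.
    -- Near 0 (say, seven statuses over millions of rows): probably not worth
    -- indexing alone; consider it as a trailing column in a multicolumn index.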
On multicolumn indexes, like the ones just explained, the order of the fields is very important. You
have to look at the queries that you're running against the database. You know, if you have
the columns mixed up or they're not optimal, the optimizer just won't use the index. And
I'll show you a perfect example of something that happens all the time with this on group
by queries. And as the database grows you want to ensure that the distribution is good.
You know, if you set up a new application and, you know, you go through
a couple of months and you've got a couple of hundred thousand records in there. Okay, great.
Everything looks great. The index strategy that you designed for it is wonderful but
that doesn't necessarily mean that in nine months, when you've got millions of records
in there, the index strategy that you initially picked out is going to be optimal,
you know. Make sure that you understand how your data changes over time; you know,
business practices can change. And so, if you've got values, let's say you've got seven
values in a--in a field, right. And you've got an index on that, seven distinct values
that you're putting into like a status field. Okay. And over time 90% of the field values
contain one specific thing, that index ceases to be as useful as it was when you initially
designed the application, because 90% of the rows contain the same value. So just be aware
that as your database grows and your application changes that you need to reexamine, you know,
what you initially thought was a good index strategy. Okay. Remove redundant indexes for
faster write performance. Obviously, every index that you put on a field you get a, you
know, slight performance impact on writes because it's got to write an index record
as well as the data record, every time you do an insert, and every time you update,
if you update the key value, it's got to write to the index. So unfortunately, MySQL
allows you to create redundant indexes on the exact same things. So, I've seen cases
where you get 40 indexes on the table and, you know, 10 of them are redundant. You know,
because over time teams of people work on it and no one wants to change other people's
stuff, you know. So, they just add their own indexes and with different names and all that
but, they're redundant. They're on the same field. So, you know, as time goes by make
sure you check, remove redundant indexes to get some write performance. Okay. Common Index
Problem here. Kind of a web 2.0 example. You've got a tags table, which has a tag
ID and the text for the tag, and a products table. Both of these tables have an auto-incrementing
primary key and then you've got a relationship table that stores the many to many relationship
between products and tags. And if you notice, the primary key on this Products2Tags
table is (product ID, tag ID). Makes sense? Pretty basic? Okay. Here are two examples of
pretty common queries run on this Products2Tags table. Only the top one's going to use the
index on it. And the reason is because of the order of the columns in the index. On
the top one, you're grouping by the--what's essentially the product ID. And so, it's going
to be able to take that left side of the index and use it in order to count the right side
value which is the tag ID. Now in the bottom you're reversing that. You're saying for each
distinct tag give me the--give me the count of products. You can't use the index. You
can't use that primary key index. So, you won't use it, you get no read performance
from it. The remedy would be to create an index that has tag ID on the left side. Okay.
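Reconstructing the slide's examples (the schema is as described above; the index names are made up):

    -- PRIMARY KEY (product_id, tag_id)
    -- Uses the primary key: product_id is the leftmost index column.
    SELECT product_id, COUNT(*) FROM products2tags GROUP BY product_id;

    -- Can't use it: tag_id is not the leftmost column.
    SELECT tag_id, COUNT(*) FROM products2tags GROUP BY tag_id;

    -- The two remedies discussed next:
    CREATE INDEX ix_tag ON products2tags (tag_id);
    CREATE INDEX ix_tag_product ON products2tags (tag_id, product_id);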
Now, there are two examples here where it says CREATE INDEX. The top example, CREATE
INDEX ix_tag on just the tag ID: for all you guys using InnoDB, that's exactly what
you'd want to do. For MyISAM, you'd want to do the bottom one, to create a covering
index opportunity. Does anyone know why the top one should be used for InnoDB and not
the bottom one? What's the difference between the secondary indexes in InnoDB versus MyISAM?
>> Because it went back to the primary key.
>> PIPES: Exactly. Because it's a clustered index organization in InnoDB, every time
you add an index on a field that's not the primary key, every non-primary-key index,
the primary key is actually appended to every
single index record. So, if you've got a primary key product ID, tag ID and you add an index
on tag ID, product ID you're being redundant because you've already got the product ID
actually in the index record in InnoDB. So, it's important to understand the difference
between the clustered organization and MyISAM, which is non-clustered, where it's just
a record number that's attached to each index record instead of the clustering key. So, be aware
of those differences. So, I was talking about redundant indexes. That one on the bottom
would be redundant for InnoDB. Did the mic just change?
>> [INDISTINCT]. >> PIPES: Okay. Okay. Sorry. Schema Guidelines.
Inefficient schema, another great way to kill your application. Use the smallest data types
possible that you really need. If you don't have four billion rows in the table,
don't use BIGINT. I mean, it's just the basics, you know. Keep them as trim as possible,
you know, if you have a maximum of a hundred values, don't use INT. Use TINYINT; you
can fit, you know, 255 unique values in it. Don't use CHAR(200), you know, when a VARCHAR
or a smaller CHAR will do. Little things like this make a huge impact on performance.
And the reason is remember that every index record has got to contain, you know, that
width. So, if you've got an index on a CHAR(40), every single record's got 40 bytes in it. And
well, a little bit different because there's some index prefix compression that’s going
on, but, you know, the general concept stays. If you have an index on first_name,
just index the first 10 characters of it, or the first 15 characters. You fit many
more index records into a single key block, and therefore you get fewer reads and
faster performance.
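A couple of quick sketches of those points (the table and column names are made up):

    -- A status flag with a handful of values doesn't need an INT:
    ALTER TABLE orders MODIFY status TINYINT UNSIGNED NOT NULL;

    -- Prefix index: only the first 10 characters go into each index record:
    CREATE INDEX ix_first_name ON customers (first_name(10));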
When you've got many columns in a table, especially many nullable columns, consider
splitting the table into two tables, like a main and a detail table, something
like that. If you've got many nullable columns or columns that are rarely even accessed or
rarely searched on, put those in a separate table and join them on the one-to-one, you
know, relationship, okay? And the reason for that is you can pull many more of the main
records that you are actually searching on into memory. Especially if you don't
need the detail records of those fields, you have a lot more opportunities to get more
performance from that.
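A sketch of that kind of one-to-one split (names made up):

    -- Hot, frequently searched columns stay in the main customers table;
    -- rarely used, mostly-NULL columns move to a detail table.
    CREATE TABLE customer_details (
      customer_id INT UNSIGNED NOT NULL PRIMARY KEY,
      fax_number  VARCHAR(20),
      notes       TEXT
    );

    -- Join one-to-one only when the detail columns are actually needed:
    SELECT c.name, d.notes
      FROM customers c
      JOIN customer_details d ON d.customer_id = c.customer_id;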
Consider horizontally splitting many-row tables using partitioning or the MyISAM MERGE
tables, which, since you guys are using InnoDB, doesn't matter.
Partitioning is in 5.1, so, you guys are a little ways away from that. It's all right.
You can also do, you know, your own application-level partitioning, where
you're naming different sets of tables and then, for the one person that was
using 5.0, you can use views in order to aggregate many tables into a single view. And always
remember that, you know, fewer reads means faster results. And this is the limit
of my graphical ability right here. That's about it. It's just, you know, a little flow:
smaller data types placed in narrower index records. More records per block, fewer reads,
faster performance. That's just the general flow of how to get better performance from
your indexes and schema. InnoDB: choose the smallest possible clustering key since, like
I just explained, it's appended to every single secondary index record. So, if you've got
a 12-byte primary key on your InnoDB table, that's 12 bytes per index
record being appended every single time you insert into the table. So, be aware
of that. Don’t use surrogate keys when a naturally occurring one exists. We see this
all the time in the forums. And I often wonder what the use of it is and why people do it.
You've got a naturally occurring product ID, tag ID key. It's a naturally occurring primary
key. Yet a lot of people will just kind of put an auto-increment in there just for the
heck of it, you know, but you're never really going to be looking up by record ID. You're
always going to be looking up based on product ID or tag ID. So, some people call it a surrogate
key or a false key; get rid of them, they're just wasting space. Coding Guidelines. I tend
to call it chunky coding habits: cutting everything down into the smallest possible units you
can. You know, thinking of SQL in terms of procedural programming is a bad idea. It leads
to, well, correlated sub-queries and other things I hate. You want to be
an SQL programmer that thinks in sets. Don’t think in terms of for-loops, okay. Think in
terms of merging and intersecting sets of data. And you'll be able to cut it up into
chunks and you'll be much more efficient in the way you code SQL. The
5.0 guy: use stored procedures. If you're not using stored procedures, you're missing out
on a performance increase in 5.0. InnoDB: how many of you are using SELECT COUNT(*) on InnoDB
tables? I know I saw you. Okay. Don't. InnoDB cannot optimize SELECT COUNT(*) queries,
okay? So, your table and your schema will not scale well, okay? The more records you
get in there, it requires actually reading through and counting all the records. Whereas
with MyISAM, it can actually get the count from the index. With InnoDB, it cannot. It
actually has to count all the values. And the reason for that is the way that InnoDB
implements what's called multi-version concurrency control. There's a version for each
row, so that row-level locking can occur in InnoDB. So there are complexities involved
with actually counting, you know, the number of values, because, well, which version
are you comparing? So use counter tables. If you're doing, you know, SELECT COUNT(*)
on products, have a separate table that stores the number of products, and when
you insert into the product table, increment it, when you delete, decrement it. Okay. That’s
how to scale InnoDB for performance.
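A counter-table sketch (names made up; the UPDATEs run alongside your normal writes):

    CREATE TABLE product_count (cnt INT UNSIGNED NOT NULL) ENGINE=InnoDB;
    INSERT INTO product_count VALUES (0);

    -- Whenever you insert into products:
    UPDATE product_count SET cnt = cnt + 1;
    -- Whenever you delete from products:
    UPDATE product_count SET cnt = cnt - 1;

    -- Reading the count is now a one-row lookup instead of a full scan:
    SELECT cnt FROM product_count;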
We've seen this with a number of big forum applications that always do, you know,
count the number of threads and then count how
many users are logged in. This is a big problem when, you know, people switch from
MyISAM to InnoDB without looking at the code, just simply switch the engines, you know. Yeah.
>> Well, also consider you don't actually
necessarily want to know the exact number of rows. Do you care if you've got 1,000
threads or 1,100 threads? It really doesn't matter. You want the ballpark figure. So even
if your numbers are not fully up-to-date, it doesn't matter. If you want to run a cron
job that does the InnoDB count once an hour, then put that value [INDISTINCT]. You could
put it in another table. You could put it in a file that you can include into whatever page
is displayed; that works fine. Then you've reduced the problem to once an hour rather than every
time you run that query.
>> PIPES: All right. Isolating indexed fields on one side of the equation. You know,
your basic math class: you want to put the variable on one side and the rest of the
equation on the other, you know. Well,
we'll show an example of how to do this in a second. And using calculated fields when
they're necessary, you know. Let's go into the example of isolating index fields on one
side of the equation. Okay. Here, whenever you apply a function to a field that's indexed,
the optimizer can't use the index on it, okay. So you want to look for opportunities where
you can rewrite SQL statements like on the top here into the exact same SQL statement
or the exact same result, but you've isolated the index field without any function operating
on it, you know, on the left side or whatever. And just converted the actual statement
into just a different way of representing it, okay. Now, in this case, you're just taking
the TO_DAYS() calls and converting them into CURRENT_DATE minus an interval of seven days,
you know. Just make sure you understand the greater-than-or-equals and all that, you know,
so you're including the boundary.
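A reconstruction of that rewrite (the column name is assumed):

    -- Top: the indexed column is buried inside a function, so no index is used.
    SELECT * FROM orders
     WHERE TO_DAYS(CURRENT_DATE()) - TO_DAYS(order_date) <= 7;

    -- Bottom: same result, but the indexed column stands alone on one side.
    SELECT * FROM orders
     WHERE order_date >= CURRENT_DATE() - INTERVAL 7 DAY;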
Now, what's wrong with the second part? How can we improve that, performance-wise?
Anyone know? >> [INDISTINCT]
>> PIPES: Say again? >> [INDISTINCT]
>> PIPES: Yes, well, yeah, basically that. The CURRENT_DATE function is a non-deterministic
function, and the problem with that is that the query cache won't cache this query. Well,
the thing about current date is, it only returns a different value once every what, 24 hours?
So if you're running this, you know, let's say a hundred thousand times in a day, instead
of running this once, it's going to run a hundred thousand times. You know, with the
query cache, it actually puts the query into the cache and stores the results, and every
time you issue that exact same query, it just takes it right from the query cache; it
never even has to hit the storage engine, the MyISAM or whatever. It takes it right from the query cache. So
in your Perl scripts or your PHP scripts or whatever you're using to write this, just insert
the date literal instead of using CURRENT_DATE, okay. A lot of people don't realize this, but
it's a very good way to get performance out of your query cache.
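So the cache-friendly version of that query, sketched, is just this, with the literal computed in your application code:

    SELECT * FROM orders WHERE order_date >= '2006-04-21';
    -- Identical query text all day long, so the query cache can keep serving it.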
Okay, using calculated fields. When you've got a leading wildcard on an email address
search, like LIKE '%.com', the index can't be used. The reason
is because you can't have a wildcard on the left side of a constant when comparing with
LIKE. If you put the wildcard on the right, at the end, so you have a constant string and
then a wildcard, the index can be used, because you can match on the left side of the values,
okay. So the way to do this is to do a little trickery and put a reversed email address
actually in your table, okay. With 5.0, you can use a trigger that inserts the
reverse of the email address that you're inserting directly into that new reversed email address
field, okay. And then on the bottom, you can just do the LIKE with the reversed CONCAT,
and now you actually can use an index on that field. So that's a way of using calculated
fields, or, you know, fields that store different values based on other fields
in the table, to get performance out of something that you'd never otherwise be able
to use an index on.
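Putting the whole calculated-field trick together, a sketch (the table and column names are made up; REVERSE() and triggers are real 5.0 features):

    ALTER TABLE customers ADD email_rev VARCHAR(80) NOT NULL;
    CREATE INDEX ix_email_rev ON customers (email_rev);

    -- Keep the reversed copy in sync on insert:
    CREATE TRIGGER trg_customers_ins BEFORE INSERT ON customers
    FOR EACH ROW SET NEW.email_rev = REVERSE(NEW.email);

    -- The leading wildcard becomes a trailing wildcard, so the index works:
    SELECT * FROM customers
     WHERE email_rev LIKE CONCAT(REVERSE('.com'), '%');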
>> [INDISTINCT]
>> PIPES: Yes. This is something, you know, if you have an application that's looking
up all the email addresses in your database that end in .com or aol.com or whatever
it is, you don't want to be running that top one every single time someone requests
that. You want to be operating on the reversed email address and
using the index on it. Okay. Learn to use joins. Like I said earlier, eliminating correlated
subqueries is a great way to get performance out of your application currently. Like, Timour
who's going to be speaking in a little bit, is on the optimizer team and he's working
on subquery optimizations but there's, you know, there's just a better way of doing things
with the joins, okay? It's easier and it follows the set-based programming principles of SQL
instead of the procedural language principles that most of us are very familiar with, you
know. The for-loop, the operating on, you know, everything one at the time. So, let's
go to some examples here. Don't try to out-think the optimizer team; the team is
pretty smart. You'll meet one of them in a little bit, and he's really smart. You can
give the MySQL server a join hint, using, like, STRAIGHT_JOIN, but the problem with
that is: while you may get some performance benefit if the optimizer currently can't
figure out the right path, and I've only seen this in like one or two instances, well,
in two months your data set changes, and the optimizer that you've now overridden with
your STRAIGHT_JOIN hint would pick a different plan, but you forgot that you put STRAIGHT_JOIN
in there, and now you've got a suboptimal plan. So, if you are going to use a join hint,
make sure you thoroughly document where you put it in the code and don’t forget about
it. Because as your data set changes, the optimizer could much more easily pick an optimal
path for the query execution. Okay. Eliminating correlated subqueries. With 4.1, and it looked
like most of you are on 4.1, you could use subqueries. I tend to avoid them like the
plague. We see this a lot on the forums, this kind of thing. Most of you are, like, advanced SQL
programmers; you'd probably not do anything like this, but we see it every day. You just,
I mean, the thing is, to try and get the maximum sold price from an order items table for each
product. And the procedural programmer, you know, who's a pro coder but isn't necessarily
familiar with SQL, hasn't been trained in set-based, you know, thinking.
This is a very natural way of doing something. You're saying, "Okay. I want my products table.
I'm selecting from my products table. Okay. And I want each of those products." And for
each one, I want to get the maximum price that it was sold in the order items table.
You know, correlating on the product ID. It's a very natural way for a procedural programmer
to think of things, you know. But remember, it's for each product, I mean, that's the
way that people think about it. That's not the set-based way of doing things. On the
right is the set-based way. You're joining, you say, "Okay. I've got two sets of information.
I've got my products and I got my order items. Okay. And I want to transform these order
items by grouping them on a product id and then finding the maximum price." So, that’s
my set: the maximum price for each product. And then, I have my products set and
the maximum-price set, and I join them. That actually is two operations, right?
Two conceptual operations, grouping and joining to the products. On the left, you are actually
doing a query for every single product, so it scales linearly: you have a thousand
products, you're doing a thousand subqueries to get the maximum
price from the order items table. You have a million, you do a million queries. You're
still doing two queries on the right. So, think about how you can get rid of correlated
subqueries. This is one of the best ways to get better performance out of your SQL code.
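A reconstruction of the two versions just described (column names are assumed):

    -- Left: correlated subquery, re-executed once per product.
    SELECT p.product_id,
           (SELECT MAX(oi.price)
              FROM order_items oi
             WHERE oi.product_id = p.product_id) AS max_price
      FROM products p;

    -- Right: set-based join plus GROUP BY; two conceptual operations, total.
    SELECT p.product_id, MAX(oi.price) AS max_price
      FROM products p
      JOIN order_items oi ON oi.product_id = p.product_id
     GROUP BY p.product_id;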
Using derived tables, it might seem like a big mess but it’s actually more efficient
on the right. And it goes back to the same principle of how many queries are you actually
issuing? On the left, you've got a join from customers to orders and in the where clause,
you have a correlated subquery in it, okay? Not with a GROUP BY, with an aggregate.
So, you're saying, the customers join to the orders but I want a group by the company of
the customer, not the customer id. And in order date equals the maximum or the last
order date in the orders table where the customer is the same; it's correlated to your
customers table c, and then you group by the company. Well, you're going to be doing
that SELECT MAX sub-query for each customer in the orders table. Now, on the right,
we take our customers set, and we're inner joining on what's called a derived table, a sub-query
in the FROM clause. That’s one set. We're taking the maximum or the last order date
for each distinct customer ID in our orders table; that's a set. And then
we're joining back to the orders table, so there are three operations, three conceptual
sets, to get that result, versus the non-scaling method of a correlated sub-query in the WHERE
clause, which has to be executed for every single customer in the orders table. Okay.
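Sketching that derived-table version (table and column names are assumed):

    -- The subquery in the FROM clause runs once, not once per customer.
    SELECT c.company, o.order_date
      FROM customers c
      JOIN orders o ON o.customer_id = c.customer_id
      JOIN (SELECT customer_id, MAX(order_date) AS last_order
              FROM orders
             GROUP BY customer_id) lo
        ON lo.customer_id = o.customer_id
       AND lo.last_order  = o.order_date;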
All right. And this is my last slide. I tend to kind of skim over server parameters
because, well, I'm very happy to talk specifics with any one of you, but generalizing
about server parameters only goes so far; there are very specific situations where
they can help, where you can get some definite performance boost. Be aware of which
server parameters are global versus per thread, per connection made to the database.
Okay. For instance, key_buffer_size is a global variable. sort_buffer_size is per thread,
so be aware of that. Don't, you know, assume that sort_buffer_size is for the whole database and bump it
up because it's actually per thread. So make small changes, preferably a single change
to your configuration file, retest. See if you get a performance benefit from that specific,
you know, change. Server parameters often provide a quick solution, but it can be temporary.
Really, the performance benefit is in optimizing your application: your indexes, your schema, your coding. Query
cache: it's off by default, so the size is zero for your query cache; if you haven't
changed it, it's not on. You can do SHOW VARIABLES and it'll say query_cache_type is
ON, I think, but there's no size to it, so it actually isn't in use by default.
So unless you turned it on, you're not using your query cache. If you're doing read-intensive
applications, like, how many are doing, like, lots and lots of writes versus lots and lots
of reads? Who has 90% reads or more? Okay. Do you have your query cache turned on?
Who has the query cache turned on and knows they've added size to it? One guy in the
back, all right. Two, all right, excellent. Okay. If you're doing read-intensive applications,
turn the query cache on. Why? Because it doesn’t have to execute the query. If it's the exact
same query and your dataset isn’t changing much, it just takes it directly from the query
cache. The query cache actually stores the result set. It doesn't just, like, compile
a query and store the query so it can be re-executed; it actually stores the rows, so
that it doesn't have to do any of the parsing or the optimization.
It just sends the rows right back to the client. So if you're doing repeated queries like that,
turn the query cache on and give it some size. In the configuration file, it's query_cache_size;
set it to, whatever, 32 meg or 64 meg. You can play around with it. You can also do it
online; I think it's SET GLOBAL query_cache_size equals something or other, it's in the
manual. So you can do it online or you can do it in your configuration file.
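Both ways, sketched (the sizes are illustrative):

    -- Online:
    SET GLOBAL query_cache_size = 64 * 1024 * 1024;

    -- Or in my.cnf:
    --   query_cache_size = 64M

    -- Watch Qcache_hits versus Qcache_inserts to judge the benefit:
    SHOW STATUS LIKE 'Qcache%';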
Now, beware, since you think you don't use MyISAM tables: how many of you think that
you're not using MyISAM at all? I hope not. Okay. MySQL's internal database is MyISAM. Okay. So that's why, if you
do show status, you'll see, actually, that you have MyISAM operations going on. So even
if all your tables are InnoDB, the MySQL internal database is MyISAM, so you still
need to give it, you know, something there. So leave the defaults for key_buffer_size
so the MySQL database can, you know, live happily. But the InnoDB buffer pool size,
that's your main one for InnoDB; it determines how much memory is allocated for storing
data and index pages in InnoDB. I think Heikki recommends that about 50 to 80% of the
total memory on the box should be allocated to the InnoDB buffer pool.
Okay. Other important InnoDB stuff is innodb_log_file_size and innodb_log_buffer_size.
I think he recommends that the log file size maxes out at a gig, but you can adjust
it depending on, you know, what your situation is. And I can take some questions.
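Pulling those together, a my.cnf sketch for a mostly-InnoDB box (all sizes are illustrative, not recommendations for your hardware):

    [mysqld]
    innodb_buffer_pool_size = 2G     # 50-80% of RAM on a dedicated server
    innodb_log_file_size    = 256M
    innodb_log_buffer_size  = 8M
    key_buffer_size         = 32M    # keep some for the MyISAM internal db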
Oh, yeah, memory is the cheapest, fastest, easiest way to get better performance out
of MySQL. Don't even ask your boss. Just stick another piece of RAM in there. It really is.
MySQL just loves RAM, you know. The more RAM you give it, it's like Pac-Man; it just eats
the stuff up. It loves it. So, that's the easiest and cheapest way to get better performance
from a hardware situation. Yeah?
>> You talked about [INDISTINCT]
>> PIPES: Okay. The question was if you've got a logical key in a character field, right?
Say for instance--I don’t know, Social Security number, something like that. Is there a benefit
to making a--what you're calling a surrogate key out of an auto incrementing ID? There
is a benefit to that, a big benefit to that. It depends: if you have, you know, a CHAR(4)
versus an INT, you're probably not going to see much of a performance difference. But
if you've got, you know, a CHAR(8) or CHAR(16) that's your product ID or your record ID
or whatever it is, versus an auto-incrementing key, you're going to get a lot better
performance from the numeric data type in four bytes. It's optimized for that.
>> Why do you go up to [INDISTINCT]
>> PIPES: Okay. The string data types. You mean, like, VARCHAR(4000) or something versus
TEXT or whatever? There's some--maybe [INDISTINCT] can talk to that. I don't know if there's
a specific, you know, how much performance benefit you get from doing VARCHAR versus
text. I think in InnoDB, there may be something, but…
>> Yeah. So the question is, is there a benefit to using CHAR rather than a VARCHAR in certain cases?
>> PIPES: Or text versus VARCHAR.
>> Or text versus VARCHAR. In MyISAM, TEXT and BLOB fields are actually stored separately;
the actual row contains a pointer to where the text field is. In the case of a CHAR
or VARCHAR, it's part of the row. In InnoDB, it's the same up to a point; in certain
cases you'd just find everything in the row, depending on which version and so on.
InnoDB itself doesn't care whether the things are CHAR or VARCHAR; it always internally
uses VARCHAR, basically. So, in InnoDB you don't need to make that distinction or optimize
for fixed-length fields, that kind of stuff. It doesn't care about that. MyISAM would
care, so that's the basic story. Does that answer your question? Okay.
>> PIPES: Any other questions? No one? All right. Well, thanks guys. Thanks for letting
us come here.