  • What's going on, everybody, and welcome to part four of the data science and data analysis with Python and pandas tutorial series.

  • In this video, we're gonna be continuing off of the last video, where we got our basic correlation table.

  • And what we want to do in this video is focus on visualizing that correlation table, and some of the other things that are gonna come along with doing that.

  • So to begin, let's just get back to basically where we were.

  • So import pandas as pd, import numpy as np, and then df = pd.read_csv, and we can do datasets/minwage.csv, and then we want, like, that act_min_wage and that loop.

  • I think it would be this loop here, right?

  • This one I don't want. Ah, yeah, that was all just confirmation of what the problem was.

  • So really, just this right here.

  • Copy that from the previous tutorial.

  • Um, and then we want to set the min_wage_corr here.

  • So this becomes our min_wage_corr.

  • Uh, it's not .head().

  • That's, like, a typo, really.

  • And then min_wage_corr.head().

  • Okay, cool.

  • So this is our correlation table.

  • Now we want to graph it.

  • So, uh, if you haven't already, open up a terminal or command prompt and pip install matplotlib.

  • But if you've been following the series, you've already done that. This would be a hard series to jump into randomly.

  • So I'm gonna assume everybody's got the things we've installed so far.

  • So first of all, with matplotlib, we're just gonna do import matplotlib.pyplot as plt.

  • And then we can do plt.matshow, and we're just gonna matshow that DataFrame.

  • So min_wage_corr.

  • Um and that should be enough.

  • Oh, plt.

  • We probably have to call plt.show() in here.

  • Yeah.

  • Okay.
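
The steps so far can be sketched as below. This is only a rough sketch: the minimum wage CSV and the loop from the previous video are not reproduced here, so `toy_wages` and `toy_corr` are made-up stand-ins for `act_min_wage` and `min_wage_corr`, filled with random numbers.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Toy stand-in for act_min_wage: one column per state, one row per year.
# The numbers are random and only there so that .corr() has something to chew on.
rng = np.random.default_rng(0)
toy_wages = pd.DataFrame(
    rng.normal(size=(20, 4)).cumsum(axis=0),
    columns=["Alabama", "Michigan", "Minnesota", "Nevada"],
)

# Pairwise correlation between the columns, like min_wage_corr in the video.
toy_corr = toy_wages.corr()

# matshow takes the DataFrame directly; no .values needed.
image = plt.matshow(toy_corr)
# plt.show()  # uncomment when running as a script; notebooks usually render inline
```

The axes come out labeled 0, 1, 2, 3 rather than with state names, which is the problem the next part of the video works on.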

  • So, okay, that got us pretty far, pretty quick, pretty simple.

  • Unfortunately, as you'll find is the case with matplotlib all the time, you're likely gonna need to do a lot of customization.

  • So if you want to come in here... I don't even know how to navigate my own site, apparently.

  • There we go.

  • You can come in here and learn all this stuff about customizing matplotlib if you want.

  • Uh, I'm not gonna spend too much time on that, but we are going to do at least some customization here just to make this work, because I feel like, one, the colors are wrong, and two, the labels being numbers means nothing to us.

  • It's not helpful.

  • That's not a helpful visualization.

  • So how can we make this a little better?

  • Well, right out of the gate, I would think, okay, hey, here's what we could do.

  • We could say labels equals, and then just do a list comprehension.

  • So c for c in min_wage_corr... I see the issue there. Anyway, .columns, and then fix this.

  • And then let's just do, like, the first two letters, so that would be, like, the first two.

  • It's gonna be a long day.

  • Okay, so this will be like, you know, if it's Alabama, Al; Michigan, Mi; and so on.

  • So we could do something really basic like that.
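
Written out, that list comprehension might look like the sketch below. The `columns` list is a hypothetical stand-in for `min_wage_corr.columns`, and the slice `c[:2]` is my reading of "the first two letters".

```python
# Hypothetical column names standing in for min_wage_corr.columns.
columns = ["Alabama", "Michigan", "Nevada"]

# Keep only the first two characters of each name.
labels = [c[:2] for c in columns]
print(labels)  # ['Al', 'Mi', 'Ne']
```

Note that slicing keeps the original capitalization, so these are prefixes ("Al"), not postal codes ("AL").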

  • And then let's fix some of the other things. So fig = plt.figure, and then figsize will be...

  • We'll just make this a 12 by 12.

  • So if you want to start customizing matplotlib, you have to kind of pull it out even further.

  • So first of all... well, in this case, we actually didn't do, like, .values or anything.

  • We just passed the entire DataFrame, and it works, so, well, that's great.

  • Uh, but then if you want to start modifying things, you can't modify plt.

  • You have to modify an axes.

  • Well, to have an axes, you have to have a subplot, and to have a subplot, you need a figure. So we have our figure.

  • Now we want to do our axes.

  • So we're gonna say ax = fig.add_subplot.

  • And in here, I'm going to pass 111. What this means is, this is a one by one.

  • Um, let's say the figure...

  • All the subplots on the figure are in a one-by-one grid, and this is number one.

  • Um, so this just means there's gonna be one graph. Now, uh, what we want to do is ax.matshow, and we want to matshow min_wage_corr.

  • But then we also want to change the cmap.

  • That's a color map, and the cmap we want to use here is plt.cm.RdYlGn: red, yellow, green.

  • And now we have that.

  • Ah, we could just go ahead and just do this real quick and, boom.

  • Um, now it's a heat map.

  • Looks more like what we would expect.
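
Here is a minimal sketch of the figure, subplot, and colormap steps. The small hand-typed `toy_corr` table is an assumption standing in for `min_wage_corr`; the calls themselves (`plt.figure`, `fig.add_subplot(111)`, `ax.matshow` with `cmap=plt.cm.RdYlGn`) are the ones named in the video.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up correlation-style table, standing in for min_wage_corr.
names = ["Alabama", "Michigan", "Nevada"]
toy_corr = pd.DataFrame(
    [[1.0, 0.8, -0.2], [0.8, 1.0, 0.1], [-0.2, 0.1, 1.0]],
    index=names,
    columns=names,
)

fig = plt.figure(figsize=(12, 12))  # the figure is the outer container
ax = fig.add_subplot(111)           # 1x1 grid of subplots, and this is number 1
# RdYlGn maps low values to red and high values to green.
image = ax.matshow(toy_corr, cmap=plt.cm.RdYlGn)
# plt.show()  # uncomment when running as a script
```

Holding on to `ax` is the point of the extra steps: the later tick and label changes are methods on the axes, not on `plt`.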

  • Um, but our ticks aren't really labeled yet, so that's the next thing we want to do.

  • So the next thing we want to say is ax.set_x... er, ah, ax.set_yticklabels, and we want to pass labels.

  • And then we'll do the exact same thing here.

  • And we probably want to do that before the show.

  • I can't get away with that.

  • Oh, I do x and y. Okay.

  • Okay, cool.

  • So we have the labels here, but we don't have all of them. So matplotlib is kind of truncating them, because it doesn't want to put too many labels on the axes and make it hard to read.

  • If these were numbers, then this would probably be descriptive enough for us, but they're not numbers, and it's not.

  • So what we need to do is tell matplotlib, hey, show me all of them.

  • So the next thing we can do here is actually just tell matplotlib to show all of the labels, and the way that we can do that is ax.set_xticks.

  • And then we use numpy's arange, np.arange, and that will be for the len of labels.

  • And we'll do the exact same thing for y. x, y, boom.

  • Um, no.

  • Why didn't that work? set_xticks...

  • Why didn't that work?

  • Um, why isn't that working?

  • Hold on, everybody.

  • Mm.

  • Uh, yticklabels, labels, matshow.

  • So maybe we'll, uh, do the matshow first and then do the modifications.

  • Yeah.

  • So I'm guessing that even after we set this, then we did this, and it, like, reset it for whatever reason.

  • So first do the matshow, then modify the ticks, and I bet we can't do it.

  • Showed now right?

  • Like that will still be messed up.

  • Or would it?

  • And what's up with the text?

  • Fascinating.

  • Okay, whatever, we'll do it this way.

  • Um, Okay, great.

  • So that's looking pretty awesome.
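
The ordering that ended up working can be sketched like this. As far as I can tell, the reason is that `matshow` installs its own tick locators on the axes, so any ticks set before it get replaced; treat that explanation as my assumption rather than something stated in the video. `toy_corr` is again a made-up stand-in for `min_wage_corr`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for min_wage_corr (an identity matrix, just to have a square table).
names = ["Alabama", "Michigan", "Minnesota", "Nevada"]
toy_corr = pd.DataFrame(np.eye(len(names)), index=names, columns=names)
labels = [c[:2] for c in toy_corr.columns]

fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(111)

# 1) Draw first, so that matshow's own tick setup happens before ours.
ax.matshow(toy_corr, cmap=plt.cm.RdYlGn)

# 2) Then ask for one tick per row and per column.
ax.set_xticks(np.arange(len(labels)))
ax.set_yticks(np.arange(len(labels)))

# 3) Finally attach the text labels to those ticks.
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
# plt.show()  # uncomment when running as a script
```

Setting the tick positions before the tick labels also keeps the number of labels and the number of ticks in step.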

  • Um, now what?

  • Well, if we look... it's probably really hard to see here, but, like, what if I just print labels out?

  • Um, well, you see, we actually have quite a few overlapping that are, like, the same name, like Mi, Mi, so Michigan, Minnesota, and Ne, Ne, Ne, Ne... like, four or five Ne's here.

  • That's a problem.
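
A quick way to see the collisions without squinting at the plot is to count the labels. This sketch uses a handful of real state names picked for illustration; it is not the column list from the video's data set.

```python
from collections import Counter

# A few state names chosen to show the problem; not the full list from the video.
states = ["Michigan", "Minnesota", "Nebraska", "Nevada", "New York", "Ohio"]
labels = [s[:2] for s in states]

# Count how often each two-letter prefix occurs and keep the ambiguous ones.
counts = Counter(labels)
duplicates = {label: n for label, n in counts.items() if n > 1}
print(duplicates)  # {'Mi': 2, 'Ne': 3}
```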

  • So maybe we want to actually make these right, so we could hard-code them ourselves, because we might actually know them.

  • Or we could bring in an outside data set.

  • So a lot of times... in this case, we only have, like, 39 states that actually had minimum wage data, so we could totally fill that in by hand.

  • But later you might find you have a data set that's much larger, and then you've got some sort of values that you just want to map to that data set, and it would just be very cumbersome to do by hand.

  • So how could we fix this?

  • Well, first we need some way of mapping the state name to, um, you know, that two-letter thing.

  • So the first thing I would do is just go online and look for one, right?

  • So, um, state abbreviations.

  • So I would start here, and then, basically, I think I started...

  • Yeah, ended up finding this one.

  • I think I started here, and then it eventually led to this one.

  • So this was the first one I found.

  • Um, but we can't use pandas to pull from here, because this website, uh, actively declines robots.

  • So then I found infoplease.com, and this one does not.

  • Yes, we can overcome, Uh, anything that tries to decline access to a robot.

  • That's not a problem.

  • Um, the problem I have is making a tutorial on doing that to a specific website.

  • I think you could run into certain issues if you do that.

  • So I'm going to avoid doing that.

  • So we'll use this site instead, because this site does not block bots.

  • So cool.

  • So that's the one that we want to use now.

  • Um, so let's go ahead and read in that data.

  • So the first thing that we would do is... uh, you might have this in a separate script.

  • So I'm just gonna write everything out again.

  • Import pandas as pd.

  • Especially because we might have issues with this in a little bit.

  • Um, and then what we want to do is dfs, so dfs = pd.read_html.

  • And don't run this yet. And then that website that we just found.

  • It's, like, infoplease.com.

  • I'll put the link in the description, or in the text-based material in the description, and you can find a link there.

  • Or you could just google exactly how I did and find that one. Super simple.

  • Now, to use read_html, we need a few packages: we need lxml, html5lib, and BeautifulSoup4. So come to the command prompt: pip install html5lib.

  • Actually, let me just do it in order: lxml, html5lib, bs4.

  • So go ahead and install those.

  • If you are on Mac, you will need to run an extra command.

  • Go to the text-based version of the tutorial to figure that out.

  • If you're on, like, some sort of company computer with some sort of proxy system, what you're gonna want to do is also pip install, uh, requests.

  • You might as well have requests as well.

  • It's a much smarter way of accessing the Internet.

  • Um, so go ahead and grab that too, even if you're not on a proxy. So normal people should be able to just run this and get a return.

  • But when I run this, I get this nasty red nonsense about SSL certificate fail.

  • So to overcome that, um, instead, I'm going to import requests, and then... actually, I'll just say this.

  • I'll say web = requests.get, and I'm gonna paste that in there, and then we'll read_html web. Run that.

  • Oh, could not read.

  • requests.get...

  • Really? Um, I can't decide if I need to restart the kernel because I just installed this stuff, or what.

  • pd.read_html...

  • Okay.

  • Okay.

  • So, actually, what we need to say is web.text.

  • What I need to say is web.text, right?

  • Okay, great.

  • So, um, but again, if you aren't having issues, you probably don't need this. If you continue to have SSL cert issues, you could also say verify=False in the requests.get.

  • For some reason, I was getting a certificate issue with read_html but not requests, despite requests verifying the SSL certificate by default. I have no idea what's going on there, but anyway, cool.
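
The workaround described above might be sketched as a small helper. The function name `fetch_tables`, the timeout, and the `example.com` URL are my own placeholders, not from the video; the `StringIO` wrapper is there because newer pandas versions warn when `read_html` is handed raw HTML text directly.

```python
from io import StringIO

import pandas as pd
import requests


def fetch_tables(url, verify=True):
    """Download a page with requests and parse its <table> elements with pandas."""
    # verify=False switches off SSL certificate checking; treat it as a last resort.
    web = requests.get(url, verify=verify, timeout=30)
    web.raise_for_status()  # fail loudly on HTTP errors such as 403 or 404
    # read_html needs the page text (or a file-like object), not the Response itself.
    return pd.read_html(StringIO(web.text))


# Usage (needs network access; substitute the abbreviations page you found):
# dfs = fetch_tables("https://example.com/state-abbreviations")
```

If plain `pd.read_html(url)` works on your machine, none of this is needed.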

  • So we have data frames.

  • So what pandas read_html does is it will parse that website and, um, then return a list of DataFrames based on all of the tables it finds.

  • So even if it only finds one table, it's still a list of that one DataFrame.

  • So, for example, for df in dfs, print df.head(), and you can see here that we get two DataFrames. One is clearly states.

  • The other one is, like, territories, which we probably have in our, um, minimum wage data, probably Puerto Rico and Guam.

  • But for now, we're going to be focused on, uh, the main one, which will be dfs[0].

  • So what I'm gonna say is state_abbv, for abbreviation, equals dfs[0].

  • Then we'll say state_abbv.head().
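
To see the "list of DataFrames" behavior without hitting a website, here is a sketch that parses a tiny hand-written page. The table contents and column names are illustrative assumptions, not a copy of the page used in the video, and `state_abbv` is my spelling of the variable name.

```python
from io import StringIO

import pandas as pd

# Tiny hand-written page with two tables, loosely shaped like a states table
# followed by a territories table.
html = """
<table>
  <tr><th>State</th><th>Postal Code</th></tr>
  <tr><td>Alabama</td><td>AL</td></tr>
  <tr><td>Michigan</td><td>MI</td></tr>
</table>
<table>
  <tr><th>Territory</th><th>Postal Code</th></tr>
  <tr><td>Guam</td><td>GU</td></tr>
</table>
"""

dfs = pd.read_html(StringIO(html))  # always a list, one DataFrame per table
for df in dfs:
    print(df.head())

state_abbv = dfs[0]  # the first table is the one with the states
```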

  • Okay, great.

  • So that's exactly what we want.

  • We want state, and then we actually don't want the abbreviation.

  • We want the postal code, right?

  • We want the two letter thing.

  • So what I want to do now is, just in case we have issues with this... like, a lot of times, if I'm writing a script that gets information from the web, I really probably just want to run that one time, just in case that website decides I've made too many requests or, you know, whatever.

  • So while we're at it, we might as well just save this: state_abbv.to_csv, and then we'll save that in datasets.

  • And we'll make that, uh, state_abbv.csv.

  • Now, a couple things to think about when you save things to a CSV.

  • Remember, pandas believes, as it should, that your index is meaningful.

  • So when you save something to a CSV file, um, CSV does not understand that anything's an index.

  • So when you save, um... I'm trying to carefully remember; it's like either...

  • I think index=False on save.

  • So pandas to_csv.

  • And then when you read it in, I want to say it's just index_col.

  • So yeah, index default is True.

  • So when we go to save this, it's gonna assume we want to save the index.

  • So say we do that, then we read it back in: state_abbv = pd.read_csv.

  • And when we go to read this in, suddenly what we have is two of these columns, and as time goes on, it's only gonna keep getting bigger and bigger and bigger.

  • So if you are consistently saving and reloading some sort of data set, this will cause trouble.

  • So instead, what you want to do is handle it in one or the other.

  • You know, one way is you could say index_col=0, and every time we read this in, 0, 1, 2, 3, 4 will always be your index column.

  • Alternatively, we could even say here index=False.

  • Save that.

  • Now the index data won't even be saved, so we could do this, and it looks yet again the same.

  • Or we could go even further and say index_col=0, and boom.

  • Now we just have the state abbreviation and postal code.
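
The save-and-reload behavior can be checked with a small round trip. This sketch writes to in-memory strings instead of a file under datasets/ so that it runs anywhere, and the two-row table is a made-up stand-in for the scraped one.

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for the scraped state table.
state_abbv = pd.DataFrame(
    {"State": ["Alabama", "Michigan"], "Postal Code": ["AL", "MI"]}
)

# The default save writes the index as an extra, unnamed first column...
with_index = state_abbv.to_csv()
# ...so a plain read picks it up as a new data column.
reloaded = pd.read_csv(StringIO(with_index))
print(list(reloaded.columns))  # ['Unnamed: 0', 'State', 'Postal Code']

# Fix 1: tell read_csv that the first column is the index.
fixed_on_read = pd.read_csv(StringIO(with_index), index_col=0)

# Fix 2: do not write the index in the first place.
without_index = state_abbv.to_csv(index=False)
fixed_on_save = pd.read_csv(StringIO(without_index))

# Both together: no saved index, and the State column becomes the index.
by_state = pd.read_csv(StringIO(without_index), index_col=0)
```

Repeating the plain save and plain read in a loop would add one more unnamed column each time, which is the growth described above.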

  • So there's, you know, a million ways that we can do this thing that we're trying to do.

  • So the next thing I want us to do is go ahead and convert this to some sort of meaningful dictionary.

  • So the way I'm going to say that is abbv_dict = state_abbv.

  • Um, and then what we want to say is, the only column we're actually interested in here is postal code. Again, don't forget double square brackets.

  • Otherwise it's gonna treat it like a Series.

  • And then we say .to_dict, underscore there, dot to