  • What's going on, everybody, and welcome to part four of the data science and data analysis with Python and pandas tutorial series.

  • In this video, we're gonna be continuing off of the last video, where we got our basic correlation table.

  • And what we want to do in this video is focus on visualizing that correlation table, and some of the other things that are gonna come along with doing that.

  • So to begin, let's just get basically back to where we were.

  • So import pandas as pd, import numpy as np, and at least df = pd.read_csv, and we can do datasets/minwage.csv, and then we want to do, like, that act_min_wage and that loop.

  • I think it would be this loop here, right?

  • This one. I don't want... ah, yeah, that was all just confirmation of what the problem was.

  • So really, just this right here.

  • Copy that from the previous tutorial.

  • Um, and then we want to set the min_wage_corr here.

  • So this becomes our min_wage_corr.

  • Uh, it's not .head().

  • That's, like, a typo, really.

  • And then min_wage_corr.head().

  • Okay, cool.

  • So this is our correlation table.

  • Now we want to graph it.

  • So, uh, if you haven't already, open up a terminal or command prompt and pip install matplotlib.

  • But if you've been following the series, you've already done that. It would be a hard series to jump into randomly.

  • So I'm gonna assume everybody's got the things we've installed so far.

  • So, first of all, with matplotlib, we're just gonna do import matplotlib.pyplot as plt.

  • And then we can do plt.matshow, and we're just gonna matshow that DataFrame.

  • So, min_wage_corr.

  • Um, and that should be enough.

  • Oh, plt...

  • We probably have to call plt.show() in here.

  • Yeah.

  • Okay.
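
A minimal sketch of the step just described, assuming a small made-up table in place of the tutorial's minimum wage data (the state names and random values below are illustrative only). It shows that plt.matshow accepts a correlation DataFrame directly.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the minimum wage table: one column per state.
rng = np.random.default_rng(0)
wages = pd.DataFrame(
    rng.normal(size=(20, 4)).cumsum(axis=0),
    columns=["Alabama", "Alaska", "Michigan", "Minnesota"],
)
min_wage_corr = wages.corr()

# matshow takes the DataFrame as-is; no need to pull out .values first.
plt.matshow(min_wage_corr)
# In a script or notebook, finish with plt.show() to display the figure.
```

With no further customization, the axes are labeled with positions (0, 1, 2, ...) rather than state names.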

  • So, okay, that got us pretty far, pretty quick, pretty simple.

  • Unfortunately, as you'll find is the case with matplotlib all the time, you're likely gonna need to do a lot of customization.

  • So if you want to come in to... I don't even know how to navigate my own site, apparently.

  • There we go.

  • Data visualization: come in here, and you can learn all this stuff about customizing matplotlib if you want.

  • Uh, I'm not gonna spend too much time on that, but we are going to do at least some customization here just to make this work, because I feel like, one, the colors are wrong, and two, the labels being numbers means nothing to us.

  • It's not helpful.

  • That's not a helpful visualization.

  • So how can we make this a little better?

  • Well, right out of the gate, I would think, okay, here's what we could do.

  • We could say labels equals, and then just do a list comprehension.

  • So c for c in min_wage_corr... I see the issue there anyway... .columns, and then fix this.

  • And then let's just do, like, the first two letters of each name.

  • It's gonna be a long day.

  • Okay, so this will be like, you know, if it's Alabama, Al; Michigan, Mi; and so on.

  • So we could do something really basic like that.
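
As a small illustration of that list comprehension, assuming a hypothetical set of column names in place of the real min_wage_corr.columns:

```python
import pandas as pd

# Hypothetical columns standing in for the real correlation table.
min_wage_corr = pd.DataFrame(columns=["Alabama", "Alaska", "Michigan", "Minnesota"])

# Keep only the first two characters of each column name.
labels = [c[:2] for c in min_wage_corr.columns]
print(labels)  # ['Al', 'Al', 'Mi', 'Mi']
```

Note that slicing keeps the original capitalization and does nothing to keep labels unique.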

  • And then let's fix some of the other things. So fig = plt.figure, and then figsize will be...

  • We'll just make this a 12 by 12.

  • So if you want to start customizing matplotlib, you have to kind of pull it out even further.

  • So first of all...

  • Well, in this case, we actually didn't do, like, .values or anything.

  • We just passed the entire DataFrame, and it works, so that's great.

  • Uh, but then if you want to start modifying things, you can't modify plt.

  • You have to modify an axes.

  • Well, to have an axes, you have to have a subplot, and to have a subplot, you need a figure. So we have our figure.

  • Now we want to do our axes.

  • So we're gonna say ax = fig.add_subplot.

  • And in here, I'm going to pass 111. What this means is, this is a one by one...

  • Um, let's say the figure...

  • All the subplots on the figure are in a one-by-one grid, and this is number one.

  • Um, so this just means there's gonna be one graph. Now, what we want to do is ax.matshow, and we want to matshow min_wage_corr.

  • But then we also want to change the cmap.

  • That's a color map, and the cmap we want to use here is plt.cm.RdYlGn: red, yellow, green.

  • And now we have that.

  • We could just go ahead and do this real quick, and boom.

  • Um, now it's a heat map.

  • Looks more like what we would expect.

  • But our ticks aren't really labeled yet, so that's the next thing we want to do.
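
The figure, subplot, axes chain described above can be sketched roughly as follows. The data here is a made-up stand-in for the minimum wage correlation table; only the plotting calls mirror what is described.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data, not the tutorial's dataset.
rng = np.random.default_rng(1)
min_wage_corr = pd.DataFrame(
    rng.normal(size=(30, 5)).cumsum(axis=0),
    columns=["Alabama", "Alaska", "Michigan", "Minnesota", "Nevada"],
).corr()

fig = plt.figure(figsize=(12, 12))  # the figure comes first
ax = fig.add_subplot(111)           # 1x1 grid of subplots, this is number 1

# Draw on the axes rather than on plt, with a red-yellow-green color map.
ax.matshow(min_wage_corr, cmap=plt.cm.RdYlGn)
# Finish with plt.show() when running interactively.
```

Working through ax instead of plt is what makes the later tick customization possible.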

  • So the next thing we want to say is ax.set_... er, ax.set_yticklabels, and we want to pass labels.

  • And then we'll do the exact same thing here.

  • And we probably want to do that before the show.

  • I can't get away with that.

  • Oh, I need to do x and y. Okay.

  • Okay, cool.

  • So we have the labels here, but we don't have all of them.

  • Matplotlib is kind of truncating them, because it doesn't want to put too many labels on the axes and make it hard to read.

  • If these were numbers, then this would probably be descriptive enough for us, but they're not numbers, and it's not.

  • So what we need to do is tell matplotlib,

  • hey, show me all of them.

  • So the next thing we can do here is actually just tell matplotlib to show all of the labels, and the way that we can do that is ax.set_xticks.

  • And then we use np.arange, and that will be for the len of labels.

  • And we'll do the exact same thing for y. x, y, boom.

  • Um, no.

  • Why didn't that work? set_xticks...

  • Why didn't that work?

  • Um, why isn't that working?

  • Hold on, everybody.

  • Mm.

  • Uh, yticklabels, labels, matshow.

  • So maybe we'll do the matshow first and then do the modifications.

  • Yeah.

  • So I'm guessing that even after we set this, then we did this, and it, like, reset it for whatever reason.

  • So first do the matshow, then modify the ticks, and I bet we can't do plt.show now, right?

  • Like, that will still be messed up.

  • Or would it?

  • And what's up with the text?

  • Fascinating.

  • Okay, whatever, we'll do it this way.

  • Um, okay, great.

  • So that's looking pretty awesome.
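
A sketch of the ordering that ended up working: call matshow first, then set one tick per label, then set the label text. The data and state names are hypothetical; the explanation in the comments (matshow installing its own tick locators) is my reading of the behavior and not something stated in the video.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for min_wage_corr.
rng = np.random.default_rng(2)
min_wage_corr = pd.DataFrame(
    rng.normal(size=(30, 4)).cumsum(axis=0),
    columns=["Alabama", "Alaska", "Michigan", "Minnesota"],
).corr()
labels = [c[:2] for c in min_wage_corr.columns]

fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(111)

# 1) Draw first: matshow appears to set up its own tick placement,
#    so ticks configured before this call can be overridden.
ax.matshow(min_wage_corr, cmap=plt.cm.RdYlGn)

# 2) Then force one tick per row/column...
ax.set_xticks(np.arange(len(labels)))
ax.set_yticks(np.arange(len(labels)))

# 3) ...and attach the text labels to those ticks.
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
```

Setting the tick positions before the tick labels keeps the number of labels and ticks in agreement.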

  • Um, now what?

  • Well, if we look... it's probably really hard to see here, but, like, what if I just print labels out?

  • Well, you see, we actually have quite a few overlapping that are, like, the same name, like Mi, Mi, so Michigan, Minnesota, and Ne, Ne, Ne... we've got, like, four or five Ne's here.

  • That's kind of a problem.
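
One quick way to see the collisions is to count the two-letter labels. This is a sketch on an illustrative handful of state names, not the actual column list from the dataset.

```python
from collections import Counter

# Illustrative subset of state names (an assumption, not the real columns).
state_names = [
    "Michigan", "Minnesota",
    "Nevada", "New Hampshire", "New Jersey", "New York",
    "Alabama",
]

labels = [name[:2] for name in state_names]

# Keep only the labels that are shared by more than one state.
duplicates = {label: n for label, n in Counter(labels).items() if n > 1}
print(duplicates)  # {'Mi': 2, 'Ne': 4}
```

Any label with a count above one is ambiguous on the plot.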

  • So maybe we want to actually make these right. We could hard-code them ourselves, because we might actually know them.

  • Or we could bring in an outside data set.

  • In this case, we only have, like, 39 states that actually had minimum wage data, so we could totally fill that in by hand.

  • But later you might find you have a data set that's much larger, and then you've got some sort of values that you just want to map to that data set, and it would just be very cumbersome to do by hand.

  • So how could we fix this?

  • Well, first we need some way of mapping the state name to, you know, the two-letter thing.

  • So the first thing I would do is just go online and look for one, right?

  • So, um, state abbreviations.

  • So I would start here, and then basically...

  • Yeah, I ended up finding this one.

  • I think I started here, and that eventually led to this one.

  • So this was the first one I found.

  • Um, but we can't use pandas to pull from here, because this website, uh, actively declines robots.

  • So then I found infoplease.com, and this one does not.

  • Yes, we can overcome, uh, anything that tries to decline access to a robot.

  • That's not a problem.

  • The problem I have is making a tutorial on doing that to a specific website.

  • I think you could run into certain issues if you do that.

  • So I'm going to avoid doing that.

  • So we'll use this site instead, because this site does not block bots.

  • So, cool.

  • So that's the one that we want to use now.

  • Um, so let's go ahead and read in that data.

  • So the first thing that we would do... uh, you might have this in a separate script.

  • So I'm just gonna write everything out again.

  • import pandas as pd.

  • Especially because we might have issues with this in a little bit.

  • Um, and then what we want to do is dfs, so dfs = pd.read_html.

  • And don't run this yet. And then that website that we just found.

  • It's infoplease.com.

  • I'll put either the link in the description or the text-based material in the description, and you can find a link there.

  • Or you could just Google exactly how I did and find that one. Super simple.

  • Now, to use read_html, we need a few packages: lxml, html5lib, and BeautifulSoup4. So come to the command prompt: pip install html5lib...

  • Actually, let me just do it in order: lxml, html5lib, bs4.

  • So go ahead and install those.

  • If you are on Mac, you will need to run an extra command.

  • Go to the text-based version of the tutorial to figure that out.

  • If you're on, like, some sort of company computer with some sort of proxy system,

  • what you're gonna want to do is also pip install, uh, requests.

  • You might as well have requests as well.

  • It's a much smarter way of accessing the internet.

  • Um, so go ahead and grab that too, even if you're not on a proxy. Normal people should be able to just run this and get a return.

  • But when I run this, I get this nasty red nonsense about an SSL certificate failure.

  • So to overcome that, um, instead, I'm going to import requests and then... actually, just say this.

  • I'll say web = requests.get, and I'm gonna get this, paste that in there, and then we'll read_html web. Run that.

  • Oh, could not read...

  • requests.get...

  • Really?

  • Um, I can't decide if I need to restart the kernel because I just installed this stuff, or what.

  • pd.read_html...

  • Okay.

  • Okay.

  • So, actually, what we need to say is web.text.

  • What I need to pass is the text, right?

  • Okay, great.

  • So, um, but again, if you aren't having issues, you probably don't need this. If you continue to have SSL cert issues, you could also say verify=False in requests.get.

  • For some reason, I was getting a certificate issue with read_html but not with requests,

  • despite requests verifying the SSL certificate by default. I have no idea what's going on there, but anyway, cool.

  • So we have DataFrames.

  • So what pandas read_html does is

  • it will parse that website and then return a list of DataFrames based on all of the tables it finds.

  • So even if it only finds one table, it's still a list of that one DataFrame.

  • So, for example, for df in dfs: print(df.head()), and you can see here that we get two DataFrames. One is clearly states.
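
To show the list-of-DataFrames behavior without touching a real website, here is a sketch that parses a small inline HTML snippet. The HTML and its contents are invented for illustration, and it assumes one of the parser packages mentioned earlier (lxml, or bs4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Invented HTML with two tables, loosely shaped like a states/territories page.
html = """
<table>
  <tr><th>State</th><th>Postal Code</th></tr>
  <tr><td>Alabama</td><td>AL</td></tr>
  <tr><td>Alaska</td><td>AK</td></tr>
</table>
<table>
  <tr><th>Territory</th><th>Postal Code</th></tr>
  <tr><td>Guam</td><td>GU</td></tr>
</table>
"""

# read_html returns one DataFrame per <table> it finds, always as a list.
dfs = pd.read_html(StringIO(html))
for df in dfs:
    print(df.head())

state_abbv = dfs[0]  # the first table is the one we care about
```

For a live page fetched with requests, the same idea would apply to the response text, for example by wrapping web.text in StringIO before passing it to read_html.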

  • The other one is, like, territories, which we probably have in our, um, minimum wage data, probably Puerto Rico and Guam.

  • But for now, we're going to be focused on, uh, the main one, which will be dfs[0].

  • So what I'm gonna say is state_abbv, for abbreviation, equals dfs[0].

  • Then we'll say state_abbv.head().

  • Okay, great.

  • So that's exactly what we want.

  • We want state, and then we actually don't want the abbreviation.

  • We want the postal code, right?

  • We want the two letter thing.

  • So what I want to do now is, just in case we have issues with this... like, a lot of times,

  • if I'm writing a script that gets information from the web, I really probably just want to run that one time, just in case that website decides I've made too many requests or, you know, whatever.

  • So while we're at it, we might as well just save this: state_abbv.to_csv, and then we'll save that in datasets.

  • And we'll make that, uh, state_abbv.csv.

  • Now, a couple of things to think about when you save things to a CSV.

  • Remember, pandas believes, as it should, that your index is meaningful.

  • But when you save something to a CSV file, um, CSV does not understand that anything's an index.

  • So when you save, um, you want to say... I'm trying to carefully remember... it's, like, either index...

  • I think it's index=False on save.

  • So, pandas to_csv.

  • And then when you read it in, I want to say it's just index_col.

  • So yeah, the index default is True.

  • So when we go to save this, it's gonna assume we want to save the index.

  • So say we do that, then we read it back in: state_abbv = pd.read_csv.

  • And when we go to read this in, suddenly

  • what we have is two of these columns, and as time goes on,

  • it's only gonna keep getting bigger and bigger and bigger.

  • So if you are consistently saving and reloading some sort of data set, this will cause trouble.

  • So instead, what you want to do is fix it in one or the other.

  • You know, one way is you could say index_col=0, and every time we read this in, 0, 1, 2, 3, 4 will always be your index column.

  • Alternatively, we could even say here index=False.

  • Save that.

  • Now the index data won't even be saved, so we could do this, and it looks yet again the same.

  • Or we could go even further and say index_col=0, and boom.

  • Now we just have the state, abbreviation, and postal code.

  • So there's, you know, a million ways that we can do this thing that we're trying to do.
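
The index round-trip problem and both fixes can be reproduced in memory. This sketch uses a tiny hypothetical table and an in-memory buffer in place of the datasets file on disk.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-row version of the state abbreviation table.
state_abbv = pd.DataFrame(
    {"State": ["Alabama", "Alaska"], "Postal Code": ["AL", "AK"]}
)

# Default save writes the index as an unnamed first column...
csv_with_index = state_abbv.to_csv()
reloaded = pd.read_csv(StringIO(csv_with_index))
print(list(reloaded.columns))  # ['Unnamed: 0', 'State', 'Postal Code']

# Fix 1: do not write the index at all.
csv_no_index = state_abbv.to_csv(index=False)
clean = pd.read_csv(StringIO(csv_no_index))

# Fix 2: write the index, but tell read_csv that column 0 is the index.
indexed = pd.read_csv(StringIO(csv_with_index), index_col=0)

# Going further: no saved index, and the State column becomes the index.
by_state = pd.read_csv(StringIO(csv_no_index), index_col=0)
```

Without one of these fixes, every save-and-reload cycle adds another unnamed column.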

  • So the next thing I want us to do is go ahead and convert this to some sort of meaningful dictionary.

  • So the way I'm going to do that is abbv_dict = state_abbv...

  • Um, and then what we want to say is, the only column we're actually interested in here is Postal Code. Again,

  • don't forget the double square brackets.

  • Otherwise it's gonna treat it like a Series.

  • And then we say .to_dict, underscore there, .to_dict().

  • And now let's just do abbv_dict real quick.

  • Boom.

  • So now, um, we have pretty much what we want, but it starts with Postal Code.

  • So I'm gonna say abbv_dict = abbv_dict['Postal Code'].

  • Now let's print abbv_dict.

  • Okay, now that is a dictionary that we can easily map to a column.
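
A sketch of that conversion on a hypothetical two-row table indexed by state name. The column names follow what is said in the video; the rows are made up.

```python
import pandas as pd

# Hypothetical table with State as the index, as after read_csv(index_col=0).
state_abbv = pd.DataFrame(
    {"Abbreviation": ["Ala.", "Alaska"], "Postal Code": ["AL", "AK"]},
    index=pd.Index(["Alabama", "Alaska"], name="State"),
)

# Double brackets keep a DataFrame, so to_dict() nests by column name:
# {'Postal Code': {'Alabama': 'AL', 'Alaska': 'AK'}}
abbv_dict = state_abbv[["Postal Code"]].to_dict()

# Peel off the outer layer to get a plain state -> postal code mapping.
abbv_dict = abbv_dict["Postal Code"]
print(abbv_dict)  # {'Alabama': 'AL', 'Alaska': 'AK'}
```

As an aside, calling to_dict() on the single-bracket Series, state_abbv["Postal Code"].to_dict(), produces the flat mapping in one step.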

  • So now, let's try... um, yeah, we'll just say this.

  • Actually, we don't have to map it to a column; we could just say labels equals,

  • and then it will be abbv_dict[c]

  • for c in min_wage_corr.columns.

  • Try that out.

  • Okay, we don't have Federal (FLSA), or whatever.

  • So what we need to do is just set a new value in our abbv_dict, and we'll just add this in really quick.

  • We'll just say that... I always do that.

  • Um, and why did you just do that to me?

  • Uh, we're just gonna say that equals FLSA. That's fine.

  • Um, okay.

  • And then we'll run this again.

  • Okay?

  • Yeah.

  • Right, now we're missing Guam, and we'll be missing Puerto Rico.

  • So let's just fix both of those as well

  • while we're at it: Guam and Puerto Rico, PR.

  • And that should be GU.

  • And that should hopefully be the only hard-coding we have to actually do.

  • Cool.

  • Okay, now we have labels.
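
A sketch of the lookup with the hand-added entries. The starting dictionary and the column list are hypothetical and much shorter than the real ones, and the exact key "Federal (FLSA)" is an assumption about how that column is spelled in the dataset.

```python
# Hypothetical starting point: a mapping scraped for regular states only.
abbv_dict = {"Alabama": "AL", "Alaska": "AK"}

# Hypothetical columns of the correlation table, including non-state entries.
columns = ["Alabama", "Alaska", "Federal (FLSA)", "Guam", "Puerto Rico"]

# A plain abbv_dict[c] lookup raises KeyError for anything not in the mapping,
# so check first which names are missing.
missing = [c for c in columns if c not in abbv_dict]
print(missing)  # ['Federal (FLSA)', 'Guam', 'Puerto Rico']

# Hard-code the few entries the scraped table did not cover.
abbv_dict["Federal (FLSA)"] = "FLSA"
abbv_dict["Guam"] = "GU"
abbv_dict["Puerto Rico"] = "PR"

labels = [abbv_dict[c] for c in columns]
print(labels)  # ['AL', 'AK', 'FLSA', 'GU', 'PR']
```

Listing the missing keys up front avoids fixing them one KeyError at a time.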

  • So then we

  • come up here, take this code

  • here.

  • Copy that, come on down...

  • Probably should have used Page Down.

  • But that's all right. Paste in our new calculation for labels and show our beautiful new graph.

  • Now we don't have the overlap.

  • Everything has the proper name.

  • That looks pretty good.

  • We did it.

  • Okay, so I think that's an okay stopping point.

  • And we've actually covered quite a bit with, like, read_html and fixing a bunch of stuff.

  • Yeah, so pretty cool.

  • So I think in the next tutorial, what I'd like to do is, now that we have minimum wage data, we can begin to, like, compare data sets and look at data between multiple data sets and try to derive meaning in some way.

  • Uh, we're not gonna find out anything too spectacular, but it will give us a good opportunity to just kind of bring in more outside and different data sets that actually aren't even from the same source.

  • Like, sometimes you're gonna find you've got one data set from over here, and it's from one totally different provider

  • than another one, and then sometimes they're not even organized the same way.

  • Maybe not the same index, or a different type of index, or, like, a different granularity, and so on.

  • So anyways, we're gonna get a little messier in the next video, um, maybe contain that all in one.

  • And then hopefully right after that, we'll get into doing some machine learning.

  • Just a real basic example of machine learning with pandas.

  • So anyways, that's it for now.

  • Questions, comments, concerns, whatever,

  • feel free to leave them below.

  • As always,

  • thanks for watching, thanks for the support, the subscriptions, the donations, the memberships, all the good stuff, and I will see you guys in another video.

Visualizing Correlation Table - Data Analysis with Python and Pandas p.4

Published by 林宜悉 on January 14, 2021