
  • What is going on, everybody, and welcome back to another data science with Python and pandas tutorial video. In this video, we're gonna be building on everything up to this point.

  • We're gonna continue with the minimum wage data set for one more video.

  • This one, uh, and kind of combine two other data sets just to see more about this whole relationships thing and learn a few more cool features of pandas before I finalize this little miniseries with doing machine learning.

  • Ah, basically a machine learning workflow with pandas in the next video.

  • With that, let's get into it.

  • So I'm gonna be using two new data sets here.

  • We're gonna be using the 2016 presidential election vote by county as well as unemployment by county.

  • So pretty cool.

  • So it'd be nice to combine all of these data sets into one major data set and see if voting proclivities, the unemployment rate, and the minimum wage have any relationships with each other, right? Cool.

  • So that's the idea, and really we're just going to see if there's any relationship.

  • It's pretty hard to, based on that data alone, draw any real conclusions. We're just doing it to learn more about pandas, really. So if you find yourself offended by the results of this video, um, not my problem.

  • So anyways, here we go: import pandas as pd.

  • And then we're gonna bring in unemp_county as pd.read_csv, and that is datasets slash,

  • and this one is unemployment-by-county-us slash output.csv.

  • So this one came with his own little directory, basically.

  • And then, um, we could do unemp_county.head().

  • Okay.

  • And there we get rate, that's our unemployment rate, and

  • then it's by county, and then year. So cool.
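
A minimal sketch of the load-and-peek step just described. The real output.csv is not available here, so an in-memory stand-in is used; the column names (Year, Month, State, County, Rate) and all values are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for datasets/unemployment-by-county-us/output.csv.
# Column names and values are assumed, not taken from the real file.
csv_text = """Year,Month,State,County,Rate
2015,February,Mississippi,Newton County,6.1
2015,February,Colorado,Denver County,4.2
2014,January,Colorado,Denver County,6.0
"""

# pd.read_csv accepts a file path or any file-like object.
unemp_county = pd.read_csv(io.StringIO(csv_text))

# head() shows the first rows (five by default).
print(unemp_county.head())
```

With the real file, the StringIO object would simply be swapped for the path string.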

  • So now what we want to do is combine this, plausibly, with our minimum wage.

  • So to do that, we're just gonna come over here and basically take all of this data Copy.

  • Come over here, paste Boom!

  • Awesome.

  • Okay, so now what we want to do is, ah, we saw that missing data.

  • So let's go ahead and fix that real quick.

  • Um, in fact, can I just do this?

  • Yes, I shall.

  • Working.

  • No, wait.

  • I just want to do this.

  • I don't want the corr.

  • I just want to drop it.

  • So instead we'll say act_min_wage equals act_min_wage, uh, dot dropna, and then act_min_wage.head(). Fix that capital W. Okay, great.
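
A rough sketch of the dropna step. It assumes act_min_wage is a wide table with years as the index and one column per state, which is what the later .loc lookup by year and state suggests. The video does not say which axis is dropped; axis=1 (drop whole states) is an assumption here, and the table itself is toy data.

```python
import numpy as np
import pandas as pd

# Toy wide table: years as the index, one column per state.
act_min_wage = pd.DataFrame(
    {
        "Colorado": [7.0, 8.33],
        "Mississippi": [np.nan, np.nan],
        "Alabama": [5.0, 5.5],
    },
    index=pd.Index([2011, 2012], name="Year"),
)

# axis=1 drops every column (state) that contains at least one NaN.
act_min_wage = act_min_wage.dropna(axis=1)
print(act_min_wage.head())
```

With the default axis=0, rows (years) containing any NaN would be dropped instead.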

  • So now what do we want to do?

  • So we want to take the minimum wage and we want to kind of add that to a new column in our unemployment county data set.

  • So the way I think I want to do that is by creating a new column and then mapping the values.

  • There's probably a better way than this, but it gives me the opportunity to show you guys, at the most basic level, a method that will always work for creating new column values in pandas.

  • This is not the most efficient, and in fact, it's going to probably take a moment, probably a minute or two, to run.

  • But I'm gonna show it anyways because it always works.

  • And then we'll talk about it and show a couple other examples that are going to be, like, way faster.

  • But anyway, uh, get... so...

  • Let's do define get_min_wage.

  • This is a function that's going to take two parameters: it's gonna take a year and a state. So recall, minimum wage is by state.

  • But our unemployment data is by county, and our voting data is also by county.

  • So, unfortunately, by county,

  • the minimum wage actually does kind of differ.

  • So right away, we've got a slight violation here.

  • Um, but that's okay.

  • The show must go on.

  • And I'm really just trying to show you guys an example anyways, so whatever, but sure, that's a problem.

  • I just don't have minimum wage data by county. I'm sure there's a file out there that exists.

  • Feel free to use that.

  • If you're trying to make an actual report here, just take note that this isn't quite fair.

  • Uh, anyway, what we want to do: we'll do a try and then, uh, except.

  • And if we hit an exception, we're gonna return np.nan. Also, have we...

  • Did we... we did import it, right?

  • Because we used NaN, right, in here.

  • Wait, everybody just relax a second.

  • Where, when did we... We never imported NumPy.

  • Not on this one.

  • What?

  • Okay, well, you're gonna need NumPy.

  • Mmm.

  • I'm in the twilight zone.

  • Okay.

  • Whatever. Um, yeah.

  • Okay.

  • I'm still Okay.

  • Whatever.

  • Here we go.

  • Um, return act_min_wage...

  • because we just wanna get the value.

  • Right.

  • So we're trying to get a minimum wage by year and state, so return act_min_wage.

  • Um, and we can't just say year, state, because act_min_wage is indexed by year.

  • So actually, here we do .loc, and then we grab the year, so .loc[year], okay.

  • And then, well, actually, so it would be year... But no, we're gonna say .loc[year], okay, as a reference to the index column, and then we want to get that specific,

  • um, state.

  • So that's a function that will do that.

  • We can quickly test that function by doing something like this.

  • get_min_wage, 2012 and, uh, Colorado, and, in fact, 2012 should be a string. Cool.

  • So that's 8.33. Probably impossible to see, but hopefully on your screen.

  • You see it?
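
The lookup function described above can be sketched as follows. The table is toy data (only the 8.33 for Colorado in 2012 echoes the value read out in the video), and integer year labels are assumed; if the Year index holds strings, the year would need to be passed as "2012" instead. Catching KeyError specifically, rather than a bare except, is a choice made for this sketch.

```python
import numpy as np
import pandas as pd

# Toy stand-in for act_min_wage: years as the index, states as columns.
act_min_wage = pd.DataFrame(
    {"Colorado": [7.0, 8.33], "Alabama": [5.0, 5.5]},
    index=pd.Index([2011, 2012], name="Year"),
)

def get_min_wage(year, state):
    """Return the minimum wage for (year, state), or NaN if either is missing."""
    try:
        # .loc[year] selects the row by index label; [state] then picks the column.
        return act_min_wage.loc[year][state]
    except KeyError:
        # Unknown year or unknown state: report the value as missing.
        return np.nan

print(get_min_wage(2012, "Colorado"))
print(get_min_wage(2012, "Mississippi"))
```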

  • Okay, so now we've got that.

  • Now we're gonna map this function to a new column in pandas, and the way that we can do that is just like the regular mapping function in Python in general: we can map a function to a list or array in Python using map.

  • And this is like I said before, not going to be the most efficient way.

  • But this is a way that will always work.

  • Especially when you have things like functions with multiple parameters, or you have a func... like, you want it...

  • You're not just mapping, like, one value to another place.

  • Maybe you want to run some sort of true function or calculation or, um, algorithm.

  • Right.

  • You need something more advanced?

  • This is your guy.

  • Okay, so this is how you're gonna do it.

  • So first of all, I'm gonna throw in %%time,

  • just to measure time here.

  • That is an IPython notebook

  • specific command.

  • Don't do that if you're writing in PyCharm or something. I don't even know, that might work in PyCharm, but definitely not in IDLE or Sublime Text.

  • So we're gonna map a new column.

  • So it's unemp_county, and we're gonna say, um, min_wage, and that is going to be equal to, um, I think I'm just gonna write map for now.

  • We'll have to encase this as well, but we're gonna say we want to map get_min_wage.

  • We're gonna map that function.

  • And then you just pass all the parameters of that function.

  • So year: what year are we gonna map? unemp_county year. And then what's the state?

  • Same thing, right? unemp_county, um, state. And actually, year's a capital

  • Y, Year.

  • Okay, so we map that.

  • Unfortunately, I don't think this will work.

  • I'm gonna run it just because I'm on fresh pandas and all that.

  • And I'm curious if it does, um, did that work?

  • Everybody relax a second.

  • I have to check this now: unemp_county.

  • I didn't think that was gonna work because, I think, like, it used to work in Python 2, and then with Python 3 you actually had to do list(map(...)).

  • No way that worked.

  • No, you get, uh... that's hilarious.

  • That's adorable.

  • Okay, so yeah, we do have to cast it.

  • So, I mean, that actually worked.

  • You just get map objects, so actually, we have to convert this to a list.

  • Oh, that was so exciting,

  • so fast.

  • Okay, so now it'll actually be calculated.

  • Here we go.

  • And then whenever that is done, I just want to...

  • We could also see when it gets a number.

  • But I'm just gonna do an unemp_county.head() here.

  • Cool.

  • Okay.

  • So again, all that's doing is we're just converting it to a list at the very end because, you know, with pandas, you could just say a new column is a list of things, and it doesn't matter how you generated that list.

  • It doesn't have to be via Python.

  • Well, it does have to be via Python initially, I guess, but it doesn't have to be via

  • pandas.
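
Roughly, the mapping step looks like the sketch below. The data frames are toy stand-ins and the column names Year, State, and Rate are assumed. The key points from the discussion are that map() takes one iterable per function parameter, and that in Python 3 it returns a lazy map object, so it has to be wrapped in list() before being assigned as a column.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the two tables (values are illustrative).
act_min_wage = pd.DataFrame(
    {"Colorado": [7.0, 8.33], "Alabama": [5.0, 5.5]},
    index=pd.Index([2011, 2012], name="Year"),
)
unemp_county = pd.DataFrame(
    {
        "Year": [2012, 2012, 2011],
        "State": ["Colorado", "Mississippi", "Alabama"],
        "Rate": [8.0, 9.5, 9.1],
    }
)

def get_min_wage(year, state):
    try:
        return act_min_wage.loc[year][state]
    except KeyError:
        return np.nan

# map() walks both columns in lockstep and calls get_min_wage(year, state)
# once per row; list() forces the lazy map object to be evaluated.
unemp_county["min_wage"] = list(
    map(get_min_wage, unemp_county["Year"], unemp_county["State"])
)
print(unemp_county.head())
```

In a notebook, putting %%time on the first line of the cell would report how long this takes.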

  • Okay, um, but yeah, so this is going to be relatively slow.

  • If you can do a .map, that's great.

  • Otherwise, a .apply or something like that.

  • There are some other options for us, and I'll

  • kind of show an option anyway down the line.

  • But once that's done, cool, we are good to go.

  • So the first thing that I would do once that is done is I would just check to see unemp_county dot...

  • In fact, let's just do this: unemp_county, uh, rate and then min_wage, whoops.

  • Did that capital W again.

  • So don't forget rate is the unemployment rate.

  • We probably should change the name at some point, but that's good.

  • Um, and then this is minimum wage.

  • And the question is, is there a relationship there?

  • Do they correlate together?

  • And then the next question would be pretty much the exact same thing, except to see, do they vary together, so covariance. Um, and we are still waiting on that calculation. Oh, my gosh, it's taking forever.

  • Um, so probably while we wait on that, we'll just load in the presidential data.

  • Um, and then if someone knows a quicker way to map a function with multiple parameters... so it can't just have one parameter, because I know, yes, there are quicker ways if it

  • just only had one, but what about when it has two parameters?

  • I don't know.
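
One possible answer to that question, which is not shown in the video: reshape the wide minimum wage table into long form and let a merge do the two-key lookup in one vectorized step. This is a swapped-in technique, sketched on toy data with assumed column names (Year, State, Rate); with_wage and long_form are hypothetical names.

```python
import numpy as np
import pandas as pd

act_min_wage = pd.DataFrame(
    {"Colorado": [7.0, 8.33], "Alabama": [5.0, 5.5]},
    index=pd.Index([2011, 2012], name="Year"),
)
unemp_county = pd.DataFrame(
    {
        "Year": [2012, 2012, 2011],
        "State": ["Colorado", "Mississippi", "Alabama"],
        "Rate": [8.0, 9.5, 9.1],
    }
)

# Wide -> long: one row per (Year, State) pair with its minimum wage.
long_form = act_min_wage.reset_index().melt(
    id_vars="Year", var_name="State", value_name="min_wage"
)

# A left merge keeps every unemployment row, in its original order;
# (Year, State) pairs with no match get NaN, like the except branch would.
with_wage = unemp_county.merge(long_form, on=["Year", "State"], how="left")
print(with_wage)
```

The result should match the list(map(...)) approach row for row, while avoiding a Python-level function call per row.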

  • The cool thing is, though, that it just knows to map.

  • Um, you know, by row. Like, it's still pretty magical, even though it takes a lot of time.

  • Okay, but anyway, we are done. unemp_county.head(), done.

  • And then here we can see.

  • Is there any correlation?

  • So it doesn't look like there is much of a true relationship, but it does look like they actually do vary together pretty significantly.

  • Um, so that's interesting.

  • I guess that would probably suggest that there's other factors at play, and those other factors have a high correlation to these things.

  • But these things between each other don't.

  • That's kind of how I would read into it.

  • I could be totally wrong.

  • I'm just here to teach you guys pandas, not to, actually, uh, find true answers.
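
A small sketch that may help when reading the .corr() and .cov() tables side by side. The numbers are made up and the column names Rate and min_wage are assumed. It shows that correlation is the covariance divided by both standard deviations, so the size of a covariance depends on the units of the columns while the correlation does not.

```python
import numpy as np
import pandas as pd

# Made-up values, purely for illustration.
df = pd.DataFrame(
    {
        "Rate": [4.0, 6.5, 5.2, 8.1, 7.3],
        "min_wage": [8.3, 7.25, 9.0, 7.25, 8.0],
    }
)

corr = df[["Rate", "min_wage"]].corr()
cov = df[["Rate", "min_wage"]].cov()

# Pearson correlation = covariance / (std of x * std of y).
rescaled = cov.loc["Rate", "min_wage"] / (df["Rate"].std() * df["min_wage"].std())

# Expressing Rate in different units multiplies the covariance by the same
# factor but leaves the correlation unchanged.
scaled = df.assign(Rate=df["Rate"] * 100)

print(corr)
print(cov)
print(scaled.cov())
```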

  • Anyway, let's load in the presidential data now.

  • So say pres16 equals pd.read_csv.

  • And this would be datasets slash pres16results.csv. Now, uh, pres16.head().

  • And what I get here is, uh, county, fips.

  • I'm not sure what fips is.

  • cand, st, which I believe is state, but I don't know for sure what st stands for.

  • Ah, percent report.

  • That's probably how many places reported.

  • I don't know. Votes.

  • That's, I guess,

  • total?

  • Oh, this is how many votes for that person.

  • This is how many votes in total were made?

  • Yeah, and then this is what percentage for

  • which candidate.

  • So as you can see, for each county and each candidate, we get all this data.

  • But to keep things simple, let's just do Donald Trump.

  • And then the value we're gonna track is the percent.

  • So for each county that we find, our question will be So how many people in that county voted for Donald Trump?

  • And we'd like to check correlation and covariance between the unemployment rate and the minimum wage and their relationship with the people voting for Trump.

  • Okay, so let's do that.

  • So how do we do that?

  • Because now we actually need to combine all this data on county and state and then share the percentage vote.

  • So that sounds like a really complex thing that's really hard to jumble in your head, as to

  • how you might do that, but it's not too, too hard.

  • Curiously, um, I think I'll just continue on this, but, um, okay, so,

  • what we want to do now is, let's just grab... because our, um, unemployment data, unemp...

  • I'm sorry.

  • unemp_county, we're gonna call it 2015 in a second. unemp_county...

  • Uh, let's just do a len of unemp_county, right?

  • That's a very big data frame.

  • So I think what I'd like to do is just get the latest unemployment county data, which I believe is February 2015.

  • And, um, yeah, so we could make that much smaller.

  • So I will do the following.

  • So we have Unemployment County.

  • So just to make sure, because we're going to be dealing with so many data sets here, it's gonna be easy to get mixed up, right?

  • So we have year and month.

  • So now we want to filter this data frame down to just 2015 February.

  • Okay, so how do we do that?

  • So the way that we're gonna do that is similar to what we had before, but we need, like, two things.

  • So now what we're gonna say is county_2015 is equal to unemp_county.

  • Um, and then this is probably long, so I'm just... actually, no, it wasn't.

  • Anyways, probably long.

  • So I'll have to zoom out.

  • But for now, unemp_county where, and then we have two clauses.

  • Basically, so we're gonna say clause number one and, you know, clause number two.

  • So, under what scenarios do we want

  • county_2015?

  • Well, we want it to be the case where this, uh, and then we're gonna say year is 2015, and then we also... like, I'm just gonna copy this whole thing.

  • And then we're gonna say month equals February.

  • February.

  • Awesome.
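
The two-clause filter can be sketched like this, on toy data with assumed column names (Year, Month, State, County, Rate). Each clause is a boolean Series, and & combines them row by row.

```python
import pandas as pd

# Toy stand-in for unemp_county.
unemp_county = pd.DataFrame(
    {
        "Year": [2015, 2015, 2014],
        "Month": ["February", "January", "February"],
        "State": ["Colorado", "Colorado", "Colorado"],
        "County": ["Denver County", "Denver County", "Denver County"],
        "Rate": [4.2, 4.5, 6.0],
    }
)

# The parentheses are required: & binds more tightly than ==, so without
# them Python would try to evaluate 2015 & unemp_county["Month"] first.
county_2015 = unemp_county[
    (unemp_county["Year"] == 2015) & (unemp_county["Month"] == "February")
]
print(county_2015)
```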

  • Now that we've done that, um, ooh,

  • I don't know if I accidentally just ran that twice.

  • I shouldn't... No, that shouldn't matter, because we're creating a copy.

  • Um, when we, like,

  • do mappings, though, it really will screw things up.

  • Especially if you replace the current column name.

  • So we gotta be pretty careful about running things accidentally multiple times sometimes.

  • Uh anyway, cool.

  • Um, Mississippi, NaN. Didn't we drop the NaNs?

  • Okay.

  • I thought we dropped NaNs, though.

  • Like, way up here.

  • Right?

  • Didn't we, friend?

  • Or is that min_wage?

  • No, wait.

  • Maybe, I guess, because we had the unemployment rate, but not a minimum wage value for that specific county.

  • I'm not really sure.

  • Um, we're gonna keep moving on and see if we hit any further issues.

  • Probably something to dig into, though.

  • Maybe.

  • But I got so much I want to show so far, so we'll continue for now.

  • Cool.

  • Is that okay?

  • Anybody who disagrees speak up.

  • If I don't hear anything, we'll continue.

  • So what I'm gonna do now is, we need to...

  • So our presidential data set has, like, if we do state, let's say pres16, uh, st, dot unique.

  • Okay, so these are actual states.

  • Um, but they are the two-letter,

  • you know, postal code.

  • Right?

  • So we need to convert this state to being the postal code rather than the full state name.

  • So how could we do that?

  • Well, turns out, up to this point, we actually already did that, right?

  • Well, we've already got the postal codes and the state.

  • We've got a dictionary, we can just map it.

  • So let's do that.

  • So what we're gonna say is state_abbv, um, state abbreviation, for now, pd.read_csv.

  • And we saved it because we're smart cookies that way.

  • And to be honest with you, um, as I was making the series, I did not plan that.

  • It just came in.

  • Just was super convenient.

  • Ah, state_abbv.csv, and we will read in.

  • Uh, let me just print out state_abbv.head() real quick.

  • And let's just get this the way we want it.

  • So, first of all, we're going to say, uh, index_col equals zero, right? And then we only care about the postal codes, so state_abbv equals state_abbv...

  • Don't forget double brackets there or you get screwed, and it's a Series. Would be horrible.

  • Okay, great.

  • So cool.

  • So now what we want to do is convert that to a dictionary,

  • so where Alabama maps to AL and so on.

  • So to do that, we're just gonna say state_abbv_dict

  • equals state_abbv,

  • um, I think first we would say .to_dict(), and then let me just print that real quick.

  • Let's just do state_abbv_dict, right?

  • So then we'll just say we convert it to a dict, and then we'll say Postal Code.

  • Boom.
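
A sketch of building that lookup dictionary. The CSV contents below are a made-up stand-in for state_abbv.csv; only the "Postal Code" column name comes from the discussion above, and the other headers are assumptions.

```python
import io
import pandas as pd

# Hypothetical stand-in for state_abbv.csv (headers other than
# "Postal Code" are assumed).
csv_text = """State/District,Abbreviation,Postal Code
Alabama,Ala.,AL
Alaska,Alaska,AK
Colorado,Colo.,CO
"""

# index_col=0 makes the full state name the index.
state_abbv = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Double brackets keep this a one-column DataFrame rather than a Series.
state_abbv = state_abbv[["Postal Code"]]

# DataFrame.to_dict() returns {column -> {index -> value}}, so indexing by
# the column name leaves a plain {state name -> postal code} dict.
state_abbv_dict = state_abbv.to_dict()["Postal Code"]
print(state_abbv_dict)
```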

  • Now we've got the dictionary that we can now map to the column, so we can actually just rename that column right in place. We can say county_2015

  • State is equal to county_2015 State .map, and then we can actually just map the dictionary.

  • And it's just beautiful and magical, and it works.

  • So we map that.

  • And this would be something you would not want to run twice.
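
A small sketch of why this step should only run once, using a toy frame and a two-entry dictionary. Series.map with a dict replaces each value by its dictionary entry, and any value that is not a key becomes NaN, which is exactly what happens to already-abbreviated states on a second pass.

```python
import pandas as pd

state_abbv_dict = {"Alabama": "AL", "Colorado": "CO"}
county_2015 = pd.DataFrame({"State": ["Alabama", "Colorado"], "Rate": [6.0, 4.2]})

# First pass: full names are keys in the dict, so they map to postal codes.
once = county_2015["State"].map(state_abbv_dict)

# Second pass: "AL" and "CO" are not keys, so every value becomes NaN.
twice = once.map(state_abbv_dict)

print(once.tolist())
print(twice.tolist())
```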

  • Yeah, uh, county_2015...

  • Where do we define county_2015?

  • Okay, here.

  • So I could run that again.

  • Right.

  • So, uh, the reason why we're seeing that is because of county_2015.

  • So we could...

  • We could have actually said unemp_county

  • dot...

  • We could run a .copy() there, and that should fix our error.

  • I'm too afraid, though.

  • I just don't want to have to run everything again.

  • But yeah, that's probably what we would need to do, is throw it in.

  • Let's just try that really quick: .copy().

  • Um, so we throw in the copy, and then let's run all of this again.
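
A sketch of the .copy() fix on toy data. The message seen in the video is not named; it is assumed here to be pandas' SettingWithCopyWarning, which can appear when assigning into a frame that was produced by filtering another one. Where exactly .copy() goes in the expression is also an assumption.

```python
import pandas as pd

unemp_county = pd.DataFrame(
    {
        "Year": [2015, 2014],
        "Month": ["February", "February"],
        "State": ["Colorado", "Colorado"],
        "Rate": [4.2, 6.0],
    }
)

# .copy() makes county_2015 an independent frame, so assigning to one of
# its columns is unambiguous and never touches unemp_county.
county_2015 = unemp_county[
    (unemp_county["Year"] == 2015) & (unemp_county["Month"] == "February")
].copy()

county_2015["State"] = county_2015["State"].map({"Colorado": "CO"})
print(county_2015)
```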

  • Cool.

  • Okay, so now let's just do county_2015.tail().

  • Okay, great.

  • So we have everything there, Uh, that we need So everything's been converted?

  • Um, yeah.

  • So now what we want to do is, let's check.

  • So print, uh, len of county_2015.

  • And now let's check the length of pres16.

  • Okay, so pres16's longer.

  • So let's map that to 2015.

  • Now.

  • Um, let's do pres16.columns real quick. Whoops.

  • So we have a couple of issues here.

  • Um, we want to map these things together, so we're gonna merge them together, or join. Um, basically, merge and join on the back end of pandas are actually the same code.

  • It's just the default is different, but you can actually do the exact same operation with either one.

  • Um, so yeah, so you can pretty much interchange those things.
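
A quick sketch of that point on toy frames: DataFrame.join defaults to a left join on the index, DataFrame.merge defaults to an inner join on shared columns, and with matching arguments the two give the same result.

```python
import pandas as pd

left = pd.DataFrame({"Rate": [6.0, 4.2, 5.5]}, index=["a", "b", "c"])
right = pd.DataFrame({"pct": [0.6, 0.3]}, index=["b", "c"])

# join: index-on-index, how="left" unless told otherwise.
joined_default = left.join(right)
joined_inner = left.join(right, how="inner")

# merge: how="inner" by default; the index flags make it match join.
merged_inner = left.merge(right, left_index=True, right_index=True, how="inner")

print(joined_default)
print(joined_inner)
print(merged_inner)
```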

  • Ah, but so the first thing to note is how do we want to merge these together?

  • Well, we would merge them based on the year and the state... I'm sorry, not the year and the state.

  • The year is the same because everything is only one year, so both of these are one point in time.

  • So we would actually...

  • We want to merge them together on state and county.

  • Those are the two different things.

  • So the first thing that we want to do is make sure state and county

  • are identical between these two data sets, and right now they're not.

  • One is st, and then the other is county,

  • but it's lowercase c, whereas here it's capital S and full State and then County.

  • So let's fix those two things first.

  • So first, we're going to say, uh, in fact, let me make a little more space.

  • So first we're gonna say pres16.rename.

  • And then we will rename the following columns, and we're going to do, uh, lowercase-c county gets renamed to uppercase-C County.

  • Simple.

  • And then we want to do state as well.

  • So st maps to capital-S State, and then we're going to do this inplace

  • equals True. pres16.head(). Great. So State...

  • Assuming county here is correct.

  • It's just, this state is US, and that's totally fine.

  • No big deal.
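
The rename step, sketched on a one-row toy frame. The column names county, st, cand, and pct are assumed from the discussion, and the values are made up.

```python
import pandas as pd

# Toy stand-in for the presidential results table.
pres16 = pd.DataFrame(
    {
        "county": ["Denver County"],
        "st": ["CO"],
        "cand": ["Donald Trump"],
        "pct": [0.19],
    }
)

# The dict maps old column names to new ones; columns that are not
# listed keep their names. inplace=True modifies pres16 directly.
pres16.rename(columns={"county": "County", "st": "State"}, inplace=True)
print(pres16.head())
```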

  • So now to join these two things together.

  • What I would like to do is join them, and we can join them on an index.

  • That's the easiest way.

  • So for both of these, we're just gonna set the county and state as a double index, basically. You can have an unlimited number of indexes.

  • All right, um, so that's what we're gonna do.

  • First, we're gonna do the same thing for both of these.

  • So we're gonna say for df in, and then we're gonna list...

  • Um, we're going to run this exact same code with both anyways, so county_2015 and then pres16.

  • What do we want to do?

  • Well, we're gonna say df.set_index, and we are going to set a double index. We're going to say County, comma, and, whoops, hope I didn't screw anything up, State, and then, uh, inplace equals True.
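
A sketch of the double-index loop on toy frames with assumed columns. Because inplace=True mutates each frame object, looping over a list of the frames updates the originals.

```python
import pandas as pd

county_2015 = pd.DataFrame(
    {"County": ["Denver County"], "State": ["CO"], "Rate": [4.2]}
)
pres16 = pd.DataFrame(
    {"County": ["Denver County"], "State": ["CO"], "pct": [0.19]}
)

# Passing a list of column names to set_index builds a MultiIndex
# with one level per name.
for df in [county_2015, pres16]:
    df.set_index(["County", "State"], inplace=True)

print(county_2015)
print(pres16)
```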

  • Cool.

  • Now what we want to say is, we still haven't filtered pres16 by... like, if we say, uh, pres16.head() again,

  • uh, we still have, um, all of these.

  • So we got Donald Trump, Hillary Clinton, Gary Johnson and so on.

  • And right now, I just want to grab Donald Trump. Later,

  • maybe you'd want to add in all these other ones and just see. Like, Gary Johnson is the libertarian.

  • So, uh, the relationship between, like, minimum wage and people who vote for Gary Johnson might be significantly different because, like, he would probably be a proponent of no minimum wage.

  • Right?

  • And so anyway, um, maybe you'd want all of these candidates.

  • But for now, let's focus purely on Donald Trump.

  • And so that will serve for the biggest divide between people.

  • You know, generally between Hillary Clinton and Donald Trump was where the majority of the votes went.

  • So anyway, pres16.head(), so...

  • The next thing we want to say is, let's just say pres16 is going to be equal to pres16 where pres16 cand is equal to Donald Trump.

  • And then, like, the only thing we care about at that point... we don't care about any of these other columns except for the percent column, right?

  • So the next thing to say is pres16 equals pres16.

  • And then again, don't forget, to make this a true data frame,

  • we're

  • curious only about the percent column. Then pres16.dropna, inplace

  • equals True, just to get rid of any mess.

  • And then pres16.head() when we're done. Hopefully no errors.
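
A sketch of the candidate filter on toy data. The column names cand and pct are assumed and the values are made up. The video uses dropna with inplace=True; this sketch chains a plain dropna() instead, which gives the same rows.

```python
import numpy as np
import pandas as pd

pres16 = pd.DataFrame(
    {
        "County": ["Denver County", "Denver County", "Newton County", "Kent County"],
        "State": ["CO", "CO", "MS", "DE"],
        "cand": ["Donald Trump", "Hillary Clinton", "Donald Trump", "Donald Trump"],
        "pct": [0.19, 0.74, 0.70, np.nan],
    }
).set_index(["County", "State"])

# Keep only the rows for one candidate.
pres16 = pres16[pres16["cand"] == "Donald Trump"]

# Double brackets keep a one-column DataFrame; dropna removes rows
# whose percentage is missing.
pres16 = pres16[["pct"]].dropna()
print(pres16.head())
```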

  • Great.

  • Okay, so now, uh, we're ready to merge these things together.

  • Okay, so to do that, we'll make another data frame, all_together.

  • That's gonna be equal to county_2015.merge.

  • And we're going to merge pres16, and then we're gonna merge on equals, we'll do County and State, you know, to merge on County, State. And then all_together.dropna, inplace equals True.

  • And, uh, let's just see that real quick.

  • See where we stand.

  • Uh, all_together.head(). Let's drop that year column.

  • So all_together.drop, and we're gonna drop Year.

  • Uh, axis equals one because we're trying to drop columns, not rows.

  • Uh, and also inplace equals True.

  • And then again, all_together... I can't type together very fast.
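
The merge step, sketched on toy frames that already carry the (County, State) double index. All values are made up. The names in on= may refer to index levels as well as columns; merge defaults to an inner join, so only pairs present in both frames survive.

```python
import numpy as np
import pandas as pd

county_2015 = pd.DataFrame(
    {
        "County": ["Denver County", "Newton County", "Autauga County"],
        "State": ["CO", "MS", "AL"],
        "Year": [2015, 2015, 2015],
        "Rate": [4.2, 6.1, 5.5],
        "min_wage": [8.23, np.nan, 7.25],
    }
).set_index(["County", "State"])

pres16 = pd.DataFrame(
    {
        "County": ["Denver County", "Newton County", "Kent County"],
        "State": ["CO", "MS", "DE"],
        "pct": [0.19, 0.70, 0.50],
    }
).set_index(["County", "State"])

# Inner merge on the two shared index levels.
all_together = county_2015.merge(pres16, on=["County", "State"])

# Drop rows with any missing value, then drop the constant Year column
# (axis=1 means columns, not rows).
all_together.dropna(inplace=True)
all_together.drop("Year", axis=1, inplace=True)
print(all_together.head())
```

From here, all_together.corr() and all_together.cov() work directly because every remaining column is numeric.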

  • Okay, now let's grab the correlation and covariance.

  • So all_together.corr() and then all_together.cov().

  • Okay, so it appears that, um, these things... like, let's say, the percentage vote with minimum wage, for example: it appears the percentage vote is negatively correlated with minimum wage.

  • So, let's say, as the percentage that voted for Trump goes up, minimum wage goes down, or vice versa.

  • So the higher the minimum wage is, the less likely you were to vote for Trump.

  • Um, that seems to be the relationship, which kind of makes sense, like, higher minimum wages are, um, more of a liberal-type issue.

  • So that would make sense that those things are correlated, but curiously enough, they don't appear to, um, vary together.

  • So I'm not really sure how to read into that, to be honest with you. Uh, otherwise, like, the unemployment rate doesn't appear to covary with, uh, who you voted for, and then also doesn't appear really to correlate either.

  • So interesting.

  • It appears that, um, politics are way more complicated than we thought.

  • I guess it's not just simple anyway.

  • Uh, I think that's enough for pandas.

  • We've done a lot of stuff with pandas. Again,

  • there are so many things that we can actually do with pandas that it would take me forever to show you guys, like, all of the things that you could plausibly do.

  • Like, for example, we mostly just did stuff with CSVs.

  • We didn't really read from databases or anything like that, or use other data files.

  • And there really is, like, so many things that we can do.

  • But for the most part, I tend to find myself reading in data, kind of reshaping that data in some way, doing basic analysis, like describes.

  • And then the next tutorial I'm gonna show you guys is the machine learning stuff.

  • So how do you go from a data frame and feed that through an ML model?

  • That's what I'm gonna be showing you guys in the next tutorial with a different data set.

  • We're gonna be leaving politics behind, um, and yeah, so questions, comments, concerns, whatever, feel free to leave them below. Again, any of the findings that we found here are really not statistically significant, even though, really, the finding was we didn't find anything.

  • So anyway, uh, that's all.

  • For now.

  • Yeah.

  • Questions, Concerns.

  • Leave them below. discord.gg/sentdex.

  • I will see you guys in the next video.


Combining multiple datasets - Data Analysis with Python and Pandas p.5

Published by 林宜悉 on January 14, 2021