
  • If you enjoy content like this, please subscribe to the LucidProgramming channel for more programming tutorials.

  • In this video, we're going to give a very quick overview of both the Beautiful Soup module and the requests module in Python.

  • If you're not familiar with them, here's a very brief introduction: requests is a module that allows us to access various resources on the web, letting us navigate to websites and obtain information from them, and Beautiful Soup is going to allow us to parse that information.

  • Once we navigate to those websites, we might want to extract certain types of content from them, and Beautiful Soup is what lets us do that.

  • So this is very much going to be a from-the-ground-up video.

  • The only things I assume you have installed on your machine are Python and pip, for installing Python packages.

  • If you have both of those installed, you should be able to follow along one-to-one, and everything should work out just fine.

  • We're going to go over some of the basics of both the requests and Beautiful Soup modules in the first part of the video.

  • Then we're going to bring everything we learned together into a real project, or rather a very simple project, I should say.

  • This is really meant to be a very minimal introduction to both of these modules, and just a way to see an example of how you can be effective with both of them.

  • So let's get started by installing both the requests and Beautiful Soup modules on your machine.

  • You probably already have requests installed, but we'll make sure that you have it installed in any case.

  • Go ahead and open up a terminal (or, if you're on Windows, the Command Prompt) and type in the following two lines.

  • First, pip install requests.

  • If you already have this installed on your machine like I do, you'll see some "Requirement already satisfied" messages pop up here.

  • That's totally fine.

  • If you don't, it will be installed on your machine.

  • You probably should have this installed already.

  • The other one is Beautiful Soup 4, so go ahead and type pip install bs4 (the official package name on PyPI is beautifulsoup4; bs4 just pulls it in). I already have this installed on my machine,

  • so we're good to go.

  • All right, so we've got everything we need installed.

  • I'm just going to close that window, and we're going to move back to our file here.

  • So we've done this part: we've installed both of these modules.

  • What we're going to do next is import them and use them: we're going to import requests, and then we're going to import one specific class from the bs4 module we installed, the BeautifulSoup class.

  • This is going to allow us to, in Beautiful Soup's terms, "soupify" the content we obtain from a website, and that makes it possible to extract information from that content.

  • Okay, so the first thing we're going to do is use the requests module. Before we go any further, I want to mention that all of the code, along with the comments that supplement what I'm saying here, is going to be provided

  • on GitHub, and a link to it will be in the description, so you can go ahead and download it

  • if you want some additional information on what I'm doing or saying. All right, back to this: we're going to create a variable called result and set it to requests.get, passing in the website that we want to access.

  • In this case, I'm just accessing the Google home page.

  • I'm going to move my browser over here; this is the page I'm accessing. Then, to make sure the page was actually accessed, we're going to print out the status code, which is the HTTP status code letting us know whether or not the page was accessible.

  • A 404 is an HTTP code letting you know that the content you're looking for is not present.

  • A 200 code lets you know that the content you're looking for is present and the response is OK.
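A minimal sketch of the status-code check, assuming the requests package is installed. The helper names describe_status and check_page are hypothetical, and using the standard library's http.HTTPStatus to turn the number into a reason phrase is an addition, not something shown in the video.

```python
from http import HTTPStatus

import requests


def describe_status(code: int) -> str:
    """Return a readable label such as '200 OK' or '404 Not Found'."""
    return f"{code} {HTTPStatus(code).phrase}"


def check_page(url: str) -> str:
    """Fetch url with requests.get (needs network access) and describe its status."""
    result = requests.get(url, timeout=10)
    return describe_status(result.status_code)


print(describe_status(200))  # 200 OK
print(describe_status(404))  # 404 Not Found
# Usage (requires network access):
# print(check_page("https://www.google.com"))
```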

  • So I'm going to go ahead and save this

  • and run it.

  • So we run python on this file, which is the Beautiful Soup and requests .py script.

  • If I run this,

  • let me just clear the terminal, because it's kind of messy,

  • and run that again,

  • we see that the code we get back from the google.com home page is 200, which indicates that the page is indeed accessible.

  • Another thing we can do is print out some other information about the web page we just accessed, like the HTTP headers. For more information on both headers and status codes, you can follow the links in the comments, which point to the Wikipedia articles on both topics.

  • I'm just going to save this, and we'll see what the headers look like.

  • This is just some extra information about the google.com home page; for instance, we can see here that the domain is .google.com.

  • There's some other information here as well that might be of use. I'm just going to go ahead and close that.

  • That's more or less just to verify that not only is the page valid, but it's also indeed the page we wanted to obtain.
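A small sketch of inspecting headers. requests exposes them as result.headers, a case-insensitive dictionary; the summarize_headers and fetch_headers helpers and the chosen header names are illustrative assumptions, and the offline demo values are made up rather than copied from Google's real response.

```python
import requests
from requests.structures import CaseInsensitiveDict


def summarize_headers(headers, names=("Content-Type", "Date", "Server")) -> dict:
    """Pick a few common headers out of a headers mapping, if present."""
    return {name: headers[name] for name in names if name in headers}


def fetch_headers(url: str) -> CaseInsensitiveDict:
    """Return the response headers for url (needs network access)."""
    return requests.get(url, timeout=10).headers


# Offline demo: requests stores headers in a CaseInsensitiveDict,
# so "Content-Type" finds the "content-type" entry.
demo = CaseInsensitiveDict({"content-type": "text/html; charset=ISO-8859-1", "server": "gws"})
print(summarize_headers(demo))
# Usage (requires network access):
# print(summarize_headers(fetch_headers("https://www.google.com")))
```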

  • Let's keep moving down here.

  • The next thing we're going to do is extract the content of the page, which is actually the source of the page,

  • and store it in a variable.

  • The object we created here, result, has these built-in attributes, status_code and headers.

  • That's how we were able to print out both of those pieces of information.

  • Another one it has is .content, which returns the source of that page (as raw bytes), and we're going to store that in a variable called src, for source.

  • Let's go ahead and run that; I'm also going to print out the source so you can see what it looks like.

  • We see a bunch of output here, and this is really just the web source of the google.com home page, so I'm going to go ahead and delete that print.

  • That's a lot of stuff to just throw on the screen.

  • But it's great that we can actually access it.

  • Now we turn to Beautiful Soup, because it's going to allow us to actually do something with that content.

  • So now that we've obtained that content, navigated to the web page, and verified that we're on the web page we want,

  • we're going to pass that src variable into the BeautifulSoup class, creating a soup object.

  • This is really just an object that Beautiful Soup creates from the source, and it allows us to extract the types of information we want from it.

  • We're going to store that object in a variable called soup.

  • So I'm passing in src here, and there's an extra parameter, "lxml". You can more or less ignore it for now: it just tells Beautiful Soup which parser to use. Note that lxml is a separate package (pip install lxml); Python's built-in "html.parser" works too.

  • If you leave it out, you'll essentially get a warning telling you to put it in there.

  • So it's there for all intents and purposes,

  • but you don't need to worry too much about why it's here.
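A hedged sketch of the soup-creation step. The video passes "lxml"; this version defaults to "html.parser", Python's built-in parser, so it runs without lxml installed. make_soup is a hypothetical helper, and the byte string stands in for result.content.

```python
from typing import Union

from bs4 import BeautifulSoup


def make_soup(src: Union[bytes, str], parser: str = "html.parser") -> BeautifulSoup:
    """Parse page source (e.g. result.content) into a BeautifulSoup object.

    Pass parser="lxml" to match the video, if the lxml package is installed.
    """
    return BeautifulSoup(src, parser)


# result.content from requests is bytes; BeautifulSoup accepts bytes or str.
src = b"<html><head><title>Google</title></head><body><a href='/about'>About</a></body></html>"
soup = make_soup(src)
print(soup.title.string)  # Google
```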

  • So we have the soup object.

  • One thing we can do now that we have it is ask for something like all of the links on the page.

  • To do that, we create a variable, which I call links, and say soup.find_all. find_all is a method provided by the BeautifulSoup object we're accessing here.

  • And what do I want to find all of on this page?

  • I want to find all of the a tags.

  • So "a" is the argument being passed, and it tells Beautiful Soup: find all of the a tags, or all of the links, on this page and store them in the variable links. Then we're going to print those out to the screen.

  • Just so we don't get too much content, I'm going to comment out the previous print statements here, so we don't have too much output on the screen. So we've got links, and we'll print them out.

  • So save that, give it a run, and let's see what we get.

  • This is the output we just got, and we can see that we have a list: you can see the opening square bracket here

  • and the closing square bracket, which indicate a Python list.

  • The contents of this list are all of the a tags on the page.

  • Let's take a look at some of these links.

  • So we have "Images".

  • We have one that has the text "Maps".

  • Let me just move this back over here.

  • Indeed,

  • if we look over here,

  • we have "Images", which points to the link in the top right, then "Maps", etc. So we could look for the other links in this list, and they would correspond to the actual links on the page, since we've just searched for all of the links on the page.
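The find_all("a") call can be tried offline on a small hypothetical page; the markup below is invented and is not Google's actual source.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a home page's source.
SAMPLE_HTML = """
<html><body>
  <a href="https://images.example.com/">Images</a>
  <a href="https://maps.example.com/">Maps</a>
  <a href="/about">About Example</a>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
links = soup.find_all("a")  # every <a> tag, in document order (a list subclass)

print(len(links))  # 3
for link in links:
    print(link)
```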

  • So that's that.

  • I'll go ahead and comment that out so we don't have excess output here, and let's keep going down.

  • Getting the list of all links is one thing, but we might want to extract only certain types of links.

  • Maybe the page we're requesting, the page we've just soupified, has specific links that we're actually after.

  • This example says: actually, I care about all of the links that have a certain string in their text.

  • So what do I mean by the text?

  • If I bring this back up: essentially, I want to look for all of the links that have, let's say, the word "About" in them.

  • So I want to see all of the links that have the word "About" anywhere in them.

  • That is, for any a tag where the word "About" appears between the opening a and the closing /a, I want that link.

  • That's one example, one use case out of many that you could come up with in this type of workspace.

  • So we loop through all of the links we obtained above and check whether the word "About" is in the text of the link.

  • What I'm doing is using the .text part of that variable.

  • As I loop through the links list, every element in that list is actually a Beautiful Soup element.

  • That's what allows me to access .text on each of them, because it's defined on every element extracted and stored in this list.

  • So I'm saying: look at the text of each link in that list; is the word "About" in that text?

  • If so, print out that link, because I care about it, and also print out the URL it actually points to as well.

  • So that's what this is doing here.

  • A link also has an attrs attribute (a dictionary, not a function), and I'm looking for a specific attribute of that link.

  • Inside the a tag there are attributes, one of which is href, and I've said: give me the contents of href inside the link tag, that is, inside this a tag.

  • So it's a bit of a Russian-doll kind of structure going on there.

  • So let's save that and run it.

  • What we have here is the link: we found the "About Google" link right here, so that's good.

  • We also have the href attribute of this link.

  • For instance, we see this whole thing: this is the whole a tag.

  • So we have everything from the start of the a tag to the end of it, including the text and including the href.

  • Then, in the next line, we said: inside the a tag there's this attribute called href, and we want to access the contents of that attribute.

  • That's precisely this,

  • and that's what we printed out right here.
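Filtering by link text and reading href might look like the sketch below. links_containing is a hypothetical helper and the sample markup is invented; the "About" match is case-sensitive, as in the walkthrough.

```python
from typing import List

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the page used in the video.
SAMPLE_HTML = """
<a href="https://images.example.com/">Images</a>
<a href="https://about.example.com/">About Example</a>
<a href="https://maps.example.com/">Maps</a>
"""


def links_containing(html: str, word: str) -> List[str]:
    """Return the href of every <a> tag whose text contains word (case-sensitive)."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    for link in soup.find_all("a"):
        if word in link.text:
            print(link)                       # the whole <a ...>...</a> tag
            hrefs.append(link.attrs["href"])  # just the href attribute value
    return hrefs


print(links_containing(SAMPLE_HTML, "About"))
```

link.attrs["href"] raises a KeyError for an a tag without an href; link.get("href") returns None instead, which may be safer on real pages.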

  • So I hope that makes sense.

  • I know that's drilling down quite a bit, and you can get pretty complicated with these types of expressions, but I think once you understand this level, you can apply different levels of complexity to your own

  • problems, situations, and scenarios. Anyway, we have this structure set up.

  • We have a general idea of how to use both requests and Beautiful Soup.

  • Let's go ahead and try to apply this to a more elaborate web page; the web page I had in mind is this one right here.

  • This is just a collection of briefings and statements given at the White House.

  • You can see the date of this upload is September 12th,

  • and these are the most recent remarks, briefings, and statements that have something to do with the White House.

  • So there's a whole list of them, and the goal of what we're going to do with Beautiful Soup and requests is to navigate to this page,

  • find a way to extract all of the links on it, specifically all of the links in this feed here.

  • So that's what we're going to do.

  • We're going to store them in a list, and then you can think about doing whatever you like with that list.

  • Of course, the possibilities are quite endless, depending on your goals.

  • But this gives you the scaffolding, the structure, the idea to take this and then do whatever you want for your own purposes.

  • Let me just minimize this for now and go back to this code.

  • What we're going to do is open up a new tab. I have another file that I've created, with some initial comments, which I've called White House example .py.

  • I'm going to open that file here, so you can see that I've opened a Python file called White House example .py,

  • and it just has some comments indicating what this is all about.

  • So we want to obtain links from this website, as we just mentioned.

  • The comments also describe what this website is all about.

  • And the goal, as I mentioned before, is to extract all of the links on that page and then print them out as a list.

  • So we can pretty much reuse the majority of what we already have.

  • We can more or less just copy and paste some things from the previous file.

  • So let's start off by doing the very basic things.

  • Let's go ahead and import requests.

  • Let's go ahead and say from bs4 import BeautifulSoup.

  • Then let me go back over to this tab, because I think it's a little bit easier to just copy and paste

  • some of these things. We want to do a very similar thing where we use the requests.get method, and this time we're not going to paste in google.com

  • but the website of these statements.

  • So let's bring this website up, copy this link here, go back over there, and paste it in.

  • So this is now the website we're accessing for the statements whose content we want to extract.

  • Okay, so we've got that.

  • Next, let's go back over here and see what else we did.

  • One of the other things we did, once we had navigated to the site, is store its content in a variable called src, so that's pretty much a direct copy and paste.

  • We can just move that right in there.

  • So now we have an src variable that holds the content from the result.

  • Next, going back over to the initial file,

  • I want to create a soup object.

  • So I'm going to copy that and move it over there.

  • That's also exactly the same.

  • So I've got the source of the web page.

  • I've created a soup object based on that source that will allow us to parse it.

  • And now let's write some new code.

  • I'm going to create a list, which I'll call urls; this will be the list we populate with the links we care about.

  • And then I want to loop through all the links.

  • So let's go back to the page here.

  • We have a couple of links, these links right here on the page.

  • Let's just go ahead and inspect them.

  • If you're on Chrome or Firefox and you right-click an element on your page, there should be an option like "Inspect" or "Inspect Element".

  • If we click that, we'll be taken to this panel over here, which shows us exactly which part of the page's source this element corresponds to.

  • You can see that as I move over this content, various things are highlighted to indicate that they correspond to the code I'm mousing over here.

  • It's a very nice tool.

  • One thing we can observe is that all of these links (if I click on this one as well) are contained in these h2 tags. Inside the h2 tags we have the a tag, the link we're actually after, and you can see that all of the links on this page are in fact embedded this way.

  • So all of these statements sit between h2 tags, and one thing we can do,

  • which we didn't see exactly before, is tell Beautiful Soup: find all of the h2 tags on the given page, then we'll loop through those and extract a link from each one.

  • Then we'll have the list we're after. Let me minimize that.

  • So we've got our empty list.

  • So now let's do this.

  • Let's say for h2_tag in soup.find_all, and then here,

  • remember, if we go back to the initial file, we wanted to find all the links, and what we did was find all of the elements with the a tag.

  • Well, in this case, we don't want to find all of the a tags just yet.

  • We want to find all of the elements with the h2 tag, so we'll say: find all of the elements in this soup with the tag h2. And then, if you recall from that page, inside the h2 tag there's an a tag, and that's what we want to extract.

  • So we'll minimize this and say: now that we're looping through all of the h2 tags, let a_tag equal h2_tag.find.

  • And what do we want to find?

  • We want to find the a tag inside of it.

  • So there's find_all and there's find. find_all returns a list:

  • all of the matching elements on the page.

  • There might be no items of this form on the page, there might be one, or there might be many.

  • In any case, it returns a list.

  • find, on the other hand, finds just a single element:

  • the first tag that matches, which we store in the variable a_tag here.

  • Now, assuming we have our a tag, we're going to add it to

  • the urls list that we have up here.

  • So we say urls.append and pass in a_tag.

  • Actually, I don't want to add just the a tag. I want to do something similar to what we did over here, where I took not just the link but the link's href attribute.

  • So, going back to this: we've looped through all of the h2 tags on the page, and inside the loop we have this line that says: inside that h2 tag there should be an a tag.

  • Furthermore, I want the href attribute,

  • the actual link that's inside that href attribute.

  • So let's add that;

  • it's very similar.

  • So we say a_tag.attrs and then ["href"], just like we did before.

  • Okay, so that's pretty much all we need there. Then, just to make sure we actually have a properly populated list,

  • let's print out urls and see what we have. Save, clear the terminal, and run python on the White House example .py file.

  • If we run that, we can see that we now have a list of all of the links corresponding to each of the elements on this page.

  • So we've successfully extracted the links for each of these briefings and statements.

  • Pretty cool, and not very much code either.

  • You can see this is quite concise.
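A sketch of the White House example, split so the parsing can be tested without the network. The page URL is not reproduced here, so scrape_h2_links takes it as a parameter; both function names and the sample markup are hypothetical. Unlike the walkthrough, it skips any h2 without a link, because find returns None in that case and None.attrs would raise an AttributeError.

```python
from typing import List

import requests
from bs4 import BeautifulSoup


def extract_h2_links(html) -> List[str]:
    """Collect the href of the first <a> inside every <h2> tag."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for h2_tag in soup.find_all("h2"):
        a_tag = h2_tag.find("a")  # first <a> inside this <h2>, or None
        if a_tag is not None and a_tag.has_attr("href"):
            urls.append(a_tag.attrs["href"])
    return urls


def scrape_h2_links(url: str) -> List[str]:
    """Fetch url (needs network access) and return the links inside its <h2> tags."""
    result = requests.get(url, timeout=10)
    result.raise_for_status()  # stop early on 4xx/5xx responses
    return extract_h2_links(result.content)


# Offline demo with hypothetical markup shaped like the briefings feed.
SAMPLE_FEED = """
<h2><a href="https://example.com/briefing-1/">Remarks 1</a></h2>
<h2><a href="https://example.com/briefing-2/">Statement 2</a></h2>
<h2>Section heading without a link</h2>
"""
print(extract_h2_links(SAMPLE_FEED))
```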

  • We're going to be going over how to create certain objects in Beautiful Soup,

  • and we're going to look at how to use those objects to extract content you're interested in from web resources.

  • So the first thing we're going to do is import Beautiful Soup.

  • Again, assuming you have it installed, we'll say from bs4 import BeautifulSoup, which is the class we're going to use to soupify the HTML content we'll eventually parse in this video. That content is described by this variable here, which is a string containing a very, very simple HTML web page.

  • We're doing it like this for two reasons. The first is

  • to keep things very simple: we just want to illustrate basic concepts about how to parse this type of content.

  • The other is to keep it reproducible.

  • We could see this example on an actual website somewhere on the Internet.

  • However, if the source code for that website changes in any way, the video may become incorrect as a result, because the source changed and our code might no longer do the right thing.

  • Whenever you happen to be watching, it could give you a different result than what you actually expect.

  • So this string describes a very simple HTML web page that we're going to use as our example, and to give you a sense of what it looks like,

  • we're going to write this content to a file.

  • What I'm doing here is creating a file called index.html and writing the content we defined above into it,

  • that is, the HTML doc string. I'm just going to write that out to a file.
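A sketch of the write-to-file step. The exact string from the video isn't reproduced here; html_doc below is a hypothetical stand-in based on the "three sisters" example in the Beautiful Soup documentation.

```python
from pathlib import Path

# Hypothetical stand-in for the video's HTML doc string.
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>
"""

# Write the string to index.html in the current directory so it can be opened in a browser.
Path("index.html").write_text(html_doc, encoding="utf-8")
```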

  • Then let's run this; I'll clear the terminal first.

  • I'm going to run this file, which is called beautiful soup objects.

  • We don't see any output, because there's no output to be seen.

  • But if we go to the directory the file lives in, we can see that it did create this index.html web page, so we can open it up in a browser here.

  • This is the HTML rendering of the string we wrote to the file.

  • You can see there's some very simple content on this page.

  • There's some stuff between bold tags, stuff between paragraph tags, links, things like that.

  • We're going to see how to use Beautiful Soup objects to extract content like this, and my hope is that once you see it applied in this somewhat contrived and simple setting, you can apply these ideas to your own setting.

  • So I'm just going to close this,

  • minimize it, and go back to our code.

  • So let's keep moving down here.

  • The next thing we're going to do is create a soup object based on the HTML doc string variable we defined above.

  • This is going to allow us to parse the HTML content.

  • One thing I want to mention,

  • which I don't believe I mentioned in the quick start guide, is a neat prettify method provided by the BeautifulSoup class, which lets you output the HTML code in a nicely formatted way.

  • So I'm going to save this

  • and give it a run.

  • We see that the HTML code is printed out very nicely.

  • It's indented properly and looks very clean,

  • formatted how you'd expect it to be

  • if you were writing this code yourself. Alternatively, if we get rid of the prettify call, just remove it there, save, and run it again,

  • you can see that the output we got scrunches everything together.

  • So it really depends on what you're going for.

  • If you're trying to get a big-picture sense of a more complicated web page, it might be nice to look at the prettified form.

  • Otherwise, if you want to see everything scrunched onto the screen,

  • it might be fine to look at it like this.

  • So it really depends on what you're going for.

  • But it's nice to know that this type of function exists.
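To compare the two outputs side by side, here is a sketch using a tiny hypothetical document and the built-in "html.parser".

```python
from bs4 import BeautifulSoup

html_doc = "<html><head><title>Demo</title></head><body><p><b>Bold</b> text</p></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")

compact = str(soup)       # everything on one line, as parsed
pretty = soup.prettify()  # one tag or string per line, indented by nesting depth

print(compact)
print(pretty)
```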

  • I'm going to comment this out so we don't have a ton of output as we move along in this file, and the next thing we're going to cover is tags.

  • We're going to take a look at this HTML source, and the first thing we'll do is print out the bold tag.

  • If we do soup.b, this gives us the first occurrence of the bold tag in the HTML we're parsing.

  • If I save this and give it a run, the output is this content here:

  • "The Dormouse's story" between these bold tags.

  • I also want to point out that this HTML content is really taken straight from the Beautiful Soup documentation page, with some minor alterations.

  • So the bold tag we just extracted is this one

  • right here.

  • Beautiful Soup starts from the top of the file,

  • keeps moving down, finds the first thing with bold tags, and says: okay, I'm going to return this.

  • Alternatively, if we said not soup.b but soup.p,

  • this would do the same thing, but with the p tag.

  • So here we have class="title" and "The Dormouse's story". Indeed,

  • if we go up to the top of this HTML source and start from here,

  • and keep going down,

  • it's the same thing.

  • Here is the first occurrence of a p tag in the file,

  • so that's what gets printed out there.
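The soup.b and soup.p shortcuts in a self-contained sketch; the short document is a hypothetical stand-in for the video's HTML.

```python
from bs4 import BeautifulSoup

# Short hypothetical document in the spirit of the video's example.
html_doc = """<html><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time... <b>Bold text</b></p>
</body></html>"""

soup = BeautifulSoup(html_doc, "html.parser")

print(soup.b)  # first <b> tag in document order
print(soup.p)  # first <p> tag in document order
```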

  • So I'm going to comment that out. Maybe we don't want to print just the first occurrence,

  • but do something similar for every occurrence.

  • So we're going to build up to a way to find all of the occurrences of a given tag.

  • Another thing we can do that's very similar to what we just did: instead of soup.b or soup.p, we can use the find function.

  • If we say soup.find and give it the tag we're looking for, this again gives us the first occurrence in the HTML doc that has a bold tag.

  • If we save this (I'll clear the terminal so we don't have too much output), we see that this is the same first occurrence of the bold tag we got before with soup.b.

  • So let's go back here

  • and comment that out.

  • Now, maybe what we want is to find all of the occurrences of the bold tag in the HTML document we're parsing.

  • So I'm going to uncomment this; the function we're going to use here is find_all.

  • This is a built-in method of a soup object that finds all of the content in the soup matching this bold tag. If we save this and run it, we'll see that it returns a list where each element is a Beautiful Soup object corresponding to an element with a bold tag.

  • So that's also good to know.
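find and find_all side by side, on a hypothetical document. One detail worth knowing: find returns None when nothing matches, while find_all returns an empty list.

```python
from bs4 import BeautifulSoup

html_doc = """<html><body>
<p><b>The Dormouse's story</b></p>
<b>Bold text</b>
<b id="1">Test 1</b>
</body></html>"""

soup = BeautifulSoup(html_doc, "html.parser")

first_bold = soup.find("b")    # the same tag object as soup.b
all_bold = soup.find_all("b")  # list of every <b> tag (possibly empty)
missing = soup.find("table")   # None, because there is no <table>

print(first_bold)
print(all_bold)
print(missing, soup.find_all("table"))  # None []
```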

  • So let's keep moving down in this file.

  • Another thing we can look at is something called a name.

  • If we do soup.b.name, this gives us the name of the tag.

  • For instance, let's take the first occurrence of the bold tag like we did before

  • and tack on .name. If we save this and give it a run, we see that the name of the tag is b, so that's essentially what this gives us.

  • It's telling us the name of the tag we've just printed out.

  • We can also change the name and have that reflected in the source.

  • For instance, if for whatever reason we wanted to change the name of this tag, we could do it as follows: we define a variable tag equal to soup.b, which is the first bold element in the HTML document we're parsing.

  • We can print that out just to see what it is.

  • I'm going to comment this out so we don't get too much output.

  • So, to recap, we've created a variable called tag, which is equal to soup.b, the first bold occurrence in the HTML document.

  • We print that out and see what it corresponds to, which is something we've already seen before.

  • Then we use tag.name, which is the same property

  • we printed out here.

  • Instead of it being equal to bold, or b, we set it equal to blockquote.

  • It could be some other thing; we can set it to whatever we want.

  • I'm just using blockquote

  • as an example. Then we print out the resulting tag.

  • So we've found the tag, which is the first bold one, printed it out to verify what it was, and, since this is a mutable object, actually changed the bold tag into a blockquote.

  • If we print that out, we'll see that the text is no longer between b tags

  • but between blockquote tags.

  • You can see it starts and finishes with the blockquote tag.

  • So I'm going to comment that out, and let's keep moving on.
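Renaming a tag in place, as described above, in a minimal sketch; the one-line document is hypothetical.

```python
from bs4 import BeautifulSoup

html_doc = '<p class="title"><b>The Dormouse\'s story</b></p>'
soup = BeautifulSoup(html_doc, "html.parser")

tag = soup.b
print(tag.name)  # b

tag.name = "blockquote"  # Tag objects are mutable; the tree is changed in place
print(tag)               # <blockquote>The Dormouse's story</blockquote>
print(soup)              # the parent <p> now contains the <blockquote>
```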

  • So let's go on to attributes.

  • So let's define a given tag, much like we did before, except that in this case it's the third element of the list returned when we find all of the bold objects in our HTML document.

  • If we go back up in our source (let me not even go up that high)

  • to this function here, where we found all of the bold tags, remember that it returned a list of all of the Beautiful Soup objects that corresponded to a bold tag in our HTML document.

  • So what I'm essentially doing here is saying: find all of those elements and just give me the element at index 2 of that list.

  • And then let's just print that out.

  • Let's see what we get.

  • So in this case, it's a bold tag that has an id attribute, with the text Test 1 in between, followed by the closing bold tag.

  • So let's go ahead and work with this and see what we can do.

  • As we saw, this specific tag has an attribute: it's a bold tag, but it has an attribute inside of it called id.

  • We can access that attribute by writing tag followed by index notation.

  • The indexing operator is overloaded, so we can access the id field by writing tag['id'], where id is the name of the attribute we want to access. If we save that and run it, we can see the value of the id field we just accessed.

  • That's what's being printed out there.
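Here is a short sketch of picking one element out of find_all and reading its id. The document and the id value boldest are made up for illustration and are not taken from the video's file.

```python
from bs4 import BeautifulSoup

# Hypothetical document; tag contents and the id value are illustrative only
html_doc = '<b>zero</b><b>one</b><b id="boldest">Test 1</b>'
soup = BeautifulSoup(html_doc, "html.parser")

bold_tags = soup.find_all("b")  # every <b> Tag, in document order
tag = bold_tags[2]              # index 2 is the third <b> element
print(tag)                      # <b id="boldest">Test 1</b>

# Attributes are read with dictionary-style indexing
print(tag["id"])                # boldest

# .get() returns None for a missing attribute instead of raising KeyError
print(tag.get("title"))         # None
```

Indexing with tag['name'] is convenient when you know the attribute exists. tag.get('name') is the safer choice when scraping pages where it might be absent.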

  • That's pretty handy, so let's move right along to another example of this.

  • Let me just comment these lines out so we don't get too much output.

  • I'll clear the terminal as well.

  • So now let's go ahead and consider another tag for the sake of example.

  • So I'm creating another variable here, which I'm again calling tag, and I'm going to print it out to see what we have.

  • This is the element at index 3 of the list returned for all of the bold tags on our page.

  • I'll save this and run it.

  • This time we have another bold tag, which has more than one attribute.

  • Before, we had just one attribute, which was id.

  • Now we also have a second attribute, called another attribute.

  • Notice that id is very widely used in HTML.

  • You'll see that a lot of different elements have an id set to something.

  • The other attribute, on the other hand, is not one you'd typically see.

  • It could be named anything at all.

  • What we're going to see here is how to extract not just one attribute, but this other attribute as well.

  • Moving right along, we can print out tag['id'],

  • just like we did before.

  • We're printing out the id attribute of the tag we've defined up here.

  • We can do the same thing for another attribute, which is the name of the other attribute that we're trying to access.

  • If we save this and run it, you'll notice that it successfully retrieves id.

  • The id is equal to very bold; that's what's printed out here.

  • And we printed out another attribute as well, which is equal to 1.
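The following sketch reads two attributes from the same tag. HTML attribute names cannot contain spaces, so the second attribute is spelled another_attribute here. That spelling, like the rest of the snippet, is an assumption rather than the video's exact markup.

```python
from bs4 import BeautifulSoup

# Hypothetical tag with two attributes; "another_attribute" is an illustrative name
html_doc = '<b id="verybold" another_attribute="1">Test 2</b>'
soup = BeautifulSoup(html_doc, "html.parser")

tag = soup.find_all("b")[0]
print(tag["id"])                 # verybold
print(tag["another_attribute"])  # 1

# Attribute values come back as strings, so convert if you need a number
count = int(tag["another_attribute"])
print(count + 1)                 # 2
```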

  • And just as we were able to alter the name earlier, because these objects are mutable, we can do the same thing for attributes as well; we'll see that in a little bit.

  • So let me just comment these lines out here.

  • Let's create another tag variable.

  • I'll set it equal to the same tag that we had before, and just to verify that,

  • Let's print that out.

  • So this is the same tag.

  • So I'm just setting a variable tag equal to that, and then I'm just printing that out.

  • What if we wanted to see all of the attributes that this tag actually has?

  • We know because we looked at it: it had an id attribute and another attribute called another attribute.

  • But what if we didn't know what those were?

  • What if we just wanted to see all of the attributes this particular tag had? Beautiful Soup gives us the attrs attribute (A-T-T-R-S), which lets us see all of the attributes of a given tag.

  • So if we save this and print it out, we see that it returns a dictionary where each key is the name of an attribute and each value is that attribute's value.

  • So the key another attribute has the value 1, which is one entry in the dictionary, and likewise the key id has the value very bold. That's pretty neat as well.
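Because attrs is an ordinary Python dict, the usual dictionary operations work on it. This sketch assumes the same hypothetical tag as before.

```python
from bs4 import BeautifulSoup

# Hypothetical tag; attribute names and values are illustrative
html_doc = '<b id="verybold" another_attribute="1">Test 2</b>'
soup = BeautifulSoup(html_doc, "html.parser")
tag = soup.b

# .attrs maps each attribute name to its value
attributes = tag.attrs
print(attributes)

# Iterate over every attribute without knowing the names in advance
for name, value in attributes.items():
    print(f"{name} = {value}")

print("id" in attributes)  # True
```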

  • Let me comment this out.

  • Actually, I'll keep the tag and just get rid of these print statements.

  • As I mentioned before, these objects are also mutable.

  • So we can change the values of the attributes to something else if we wish. Just to recap, we have this variable tag, which is equal to the element we saw before.

  • Let me save that and run it.

  • This variable tag is equal to this Beautiful Soup object here, and what we can do is access the attribute another attribute; recall that its value was 1.

  • What we're doing here is setting that value to something else.

  • In this case, I'm just setting it equal to 2.

  • Let's print out the result of that change.

  • Notice that the first time, before we make any alterations, the attribute's value is 1.

  • However, once we change another attribute to 2, notice that the change is reflected here, so we're able to actually change the values of those attributes, which is kind of neat.
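Here is a sketch of changing an attribute value through the same indexing syntax used to read it. The tag and attribute names are hypothetical stand-ins.

```python
from bs4 import BeautifulSoup

# Hypothetical tag; attribute names and values are illustrative
html_doc = '<b id="verybold" another_attribute="1">Test 2</b>'
soup = BeautifulSoup(html_doc, "html.parser")
tag = soup.b

print(tag["another_attribute"])       # 1

# Assigning through the index operator overwrites the value in place
tag["another_attribute"] = "2"
print(tag["another_attribute"])       # 2

# tag refers to a node in the tree, so a fresh lookup sees the change too
print(soup.b["another_attribute"])    # 2
```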

  • Now I'm going to comment this out, and I'll comment out these print statements as well.

  • So let's keep moving on.

  • Actually, I'm going to keep the tag that we had before and illustrate one further point about it.

  • Since these objects are mutable, another thing we can do is delete attributes from a tag.

  • Just as we can use the del keyword in Python to remove an element from a list, we can use del to remove an attribute.

  • And this is what we're doing here.

  • We're saying: take the tag that had those two attributes, id and another attribute, and remove id.

  • So let's print out the tag before the deletion, just to give a sense of where we're starting from: this thing here.

  • Then I wrote del tag['id'], and notice that after doing that and printing out the resulting tag, there's no longer an id field in it.

  • We can also do the same thing for the other field if we wish, and then print out the result.

  • Notice that, again, we're starting with this Beautiful Soup object with both of these fields, and we've deleted both another attribute and id from it.

  • We've printed out the result, which showcases what you can do with this.
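This sketch shows deleting attributes with del, using the same hypothetical tag. Each print shows a single attribute or none, so the output does not depend on attribute ordering.

```python
from bs4 import BeautifulSoup

# Hypothetical tag; attribute names and values are illustrative
html_doc = '<b id="verybold" another_attribute="1">Test 2</b>'
soup = BeautifulSoup(html_doc, "html.parser")
tag = soup.b

# del removes an attribute, much like removing a key from a dict
del tag["id"]
print(tag)  # <b another_attribute="1">Test 2</b>

del tag["another_attribute"]
print(tag)  # <b>Test 2</b>
```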

  • I'm going to comment all of those out now, and let's also comment out this tag.

  • Let's keep moving on down and see what else we can do.

  • Next, we're going to look at strings.

  • We're going to define a variable tag, the same one we've been working with for the past couple of examples, and to showcase that, I'm going to print it to the screen.

  • So this is the tag variable that we're working with now.

  • It's the same thing we've been working with for the past couple of examples.

  • Beautiful Soup also provides a .string attribute.

  • Let's print that out and see what we get.

  • When we print that out, we get the content between the tags.

  • So notice that what Beautiful Soup considers the string is the text between the opening and closing tags, not the tags themselves.

  • So this is the string content that we're printing out here, and that's what this is giving to us.
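This sketch shows .string on a hypothetical document, plus a caveat worth knowing. .string only works when a tag has a single child. For mixed content it returns None, and get_text() is the usual alternative.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical document with one simple tag and one tag with mixed content
html_doc = '<b id="verybold">Test 2</b><p>Mixed <i>content</i></p>'
soup = BeautifulSoup(html_doc, "html.parser")

tag = soup.b
print(tag.string)                               # Test 2
print(isinstance(tag.string, NavigableString))  # True (a str subclass)

# .string is None when a tag has more than one child node
print(soup.p.string)                            # None

# get_text() joins all descendant text instead
print(soup.p.get_text())                        # Mixed content
```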

  • So let's just keep moving down here.

  • So I think we're pretty much at the end of the file.

  • I'm going to get rid of these two print statements.

  • Another thing we can do, given the mutability we saw before, is alter the string content between these tags.

  • So if I write tag.string.replace_with(...), a method that replaces the content of that string with something else, and replace the content with This is another string, then if we print out the result, we'll see what we get.

  • So let me just kind of review what I'm doing here.

  • I'm creating the variable tag we've seen before and printing out the original tag with no alterations whatsoever. Then I'm using the replace_with method to actually alter the text inside that Beautiful Soup object, and I'm printing out the resulting tag.

  • So we should see the original tag with Test 2 in between, and then, after we've replaced this content with This is another string, the same tag with the text altered.

  • Let me save that, and since we have a lot of output on the screen, let me clear it and run this.

  • So indeed, this is what we see.

  • We see the initial Beautiful Soup object with Test 2 unaltered.

  • Then we replaced that, using the method Beautiful Soup gives us, with this text here:

  • This is another string.
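A minimal sketch of the string replacement follows, again on a hypothetical one-attribute tag. Assigning to tag.string directly is another way to get the same result.

```python
from bs4 import BeautifulSoup

# Hypothetical tag; the id value is illustrative
html_doc = '<b id="verybold">Test 2</b>'
soup = BeautifulSoup(html_doc, "html.parser")
tag = soup.b
print(tag)  # <b id="verybold">Test 2</b>

# replace_with swaps the existing text node for the new text
tag.string.replace_with("This is another string")
print(tag)  # <b id="verybold">This is another string</b>
```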

  • So those are just some of the objects in Beautiful Soup and some of the things you might want to do with them.

  • When you're parsing a page, you might want to make these changes and then write the result to a file, for various reasons.
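Here is a sketch of the "modify, then write to a file" workflow mentioned above. The output path under the system temp directory and the file name modified_page.html are hypothetical choices. soup.prettify() could be written instead of str(soup) if indented output is preferred.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical page content
html_doc = '<html><body><b id="verybold">Test 2</b></body></html>'
soup = BeautifulSoup(html_doc, "html.parser")

# Make a change, as in the examples above
soup.b.string.replace_with("This is another string")

# Hypothetical output location; any writable path works
out_path = Path(tempfile.gettempdir()) / "modified_page.html"
out_path.write_text(str(soup), encoding="utf-8")

# Reading the file back and re-parsing confirms the edit was saved
reloaded = BeautifulSoup(out_path.read_text(encoding="utf-8"), "html.parser")
print(reloaded.b.string)  # This is another string
```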

  • There's a lot of different things that you can do.

  • I hope this showcases what you can do, and that it's broadly applicable to the sites you encounter in your own scenarios.

  • So if you have any questions or comments, don't hesitate to leave them below.

Beautiful Soup Tutorial - Web Scraping in Python

Published by 林宜悉 on January 14, 2021