
    2023.06.27 | admin | 177 views








    Share interest,

    Spread happiness,

    Increase knowledge, and leave a good impression.

    Dear you,

    This is the Learning Yard!

    Today the editor brings you

    a worked example of crawling web content with regular expressions.

    This post takes about 5 minutes to read. Thank you for your patience.


    Over the holidays I learned to batch-crawl web page text with Python, and I am sharing it with you here. If you have any questions, feel free to send me a private message and I will do my best to answer.


    This time I crawl web text using regular expressions. Regular expressions are only one of several ways to extract text from a web page; XPath, BeautifulSoup, and other methods also work. The target of the crawl is the average price for each city. The first step in crawling is, of course, to collect the target pages: I look for the pattern that the page URLs share, then use a for loop to generate the URLs and put them into a list.
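
    The URL-collection step can be sketched as follows. The original post does not name the site, so the URL pattern and the city slugs here are purely hypothetical placeholders.

```python
# Hypothetical URL pattern: the real site and its URL scheme are not given
# in the post, so example.com is used as a stand-in.
BASE_URL = "https://example.com/price/{city}/"

# Illustrative city slugs (assumed, not taken from the post).
cities = ["beijing", "shanghai", "guangzhou"]

# Build the list of target pages with a plain for loop.
urls = []
for city in cities:
    urls.append(BASE_URL.format(city=city))

print(urls)
```

    Keeping the city names in their own list makes it easy to pair each city with its computed value later on.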


    Next comes the standard request step: write the request headers (so that the request looks like it comes from a browser), call the get function and store the response in a variable named resp, then read the result as text so that it is easy to work with.
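
    A minimal sketch of this request step, assuming the requests library is what provides the get function. The User-Agent string, the timeout, and the helper name fetch_text are illustrative choices, not taken from the post.

```python
import requests

# A browser-like User-Agent header; the exact string is an assumption.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}


def fetch_text(url, timeout=10):
    """Request one page and return its body as text (hypothetical helper)."""
    resp = requests.get(url, headers=HEADERS, timeout=timeout)
    resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return resp.text
```

    Calling fetch_text(url) for each entry in the URL list would then give one text string per page.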


    I use the re.findall function from the regular expression module, which finds all matches and returns them as a list. The next step is to work with the data in that list: first limit it to 100 items, then put those 100 into a list named data1.
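
    The matching and trimming step might look like this. The HTML snippet and the pattern are made up for illustration, since the real pattern depends on the markup of the target page.

```python
import re

# Made-up markup standing in for a downloaded page: 120 price entries.
sample_html = "".join(
    f'<span class="price">{5000 + i}</span>' for i in range(120)
)

# Hypothetical pattern: capture the number inside each price span.
PRICE_PATTERN = re.compile(r'<span class="price">(\d+(?:\.\d+)?)</span>')

# findall returns every captured group as a string, in page order.
matches = PRICE_PATTERN.findall(sample_html)

# Keep only the first 100 matches, as described in the text.
data1 = matches[:100]

print(len(matches), len(data1))
```

    Slicing with [:100] is safe even when a page yields fewer than 100 matches; the list is simply shorter in that case.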


    Since the matched content in the list is in string format, the numeric strings must be converted to floating-point numbers before they can be used in calculations. The calculation here is the average of these 100 values (this shows the power of a programming language, which can process data in batches).
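
    A small sketch of the conversion and averaging, using three made-up values in place of the 100 matched strings.

```python
# Made-up sample values; in the real run data1 holds 100 matched strings.
data1 = ["5000", "5100.5", "4899.5"]

# Convert each numeric string to a float so it can be used in arithmetic.
prices = [float(s) for s in data1]

# Average = sum of the values divided by how many there are.
average = sum(prices) / len(prices)

print(average)
```

    If a page could return no matches at all, the division would need a guard against an empty list.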


    The last step is to pair the two sets of data (the cities and their averages), sort the pairs by value, combine them into a dictionary, and put the result into a table. This requires the pandas library, and the result is a table with two columns of data. This completes the whole process: crawling numeric data from web pages, limiting the number of values, calculating, sorting, and putting the result into a table.
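
    The final pairing, sorting, and table step could be sketched like this. The city names, the values, the column names, and the descending sort order are all assumptions for illustration.

```python
import pandas as pd

# Made-up inputs: one average per city, in matching order.
cities = ["CityA", "CityB", "CityC"]
averages = [5000.0, 7200.5, 6100.0]

# Pair each city with its average and sort the pairs by value
# (highest first; the post does not say which direction it used).
pairs = sorted(zip(cities, averages), key=lambda p: p[1], reverse=True)

# Combine the sorted pairs into a dictionary of two columns.
data = {
    "city": [name for name, _ in pairs],
    "average_price": [value for _, value in pairs],
}

# Put the dictionary into a two-column table.
table = pd.DataFrame(data)
print(table)
```

    An equivalent route is to build the DataFrame first and call its sort_values method on the value column.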



    Tags: regular expressions