Game Prediction System Part 2
Data Retrieval
_Data Source and version control_
Crawler with urllib and BS4
- One thing is different from the last part: the WAN PLUS web page has changed and is now a static page, so we can just use bs4 to get what we want.
- The first step is to get the order number of each game series. The order matters because it is part of the event URL: in http://www.wanplus.com/event/173.html the order is 173, so we must find the order of every series. At first I wanted to collect all the orders myself, but I found that it was too much work. So I handed the job to bs4, as follows:
Code1:
# games_order = dict()
- _These are some orders that I had found earlier._
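A hand-collected mapping like this can be turned into event URLs and fetched with urllib, since the page is now static. The following is only a minimal sketch: the series name in `games_order`, the helper names `event_url` and `fetch_soup`, and the User-Agent header are assumptions for illustration, not the code from this project.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Hypothetical entry: the series name is made up, 173 is the order
# taken from the example URL in this post.
games_order = {"Example Series": 173}


def event_url(order):
    """Build the event page URL for one series order."""
    return "http://www.wanplus.com/event/{}.html".format(order)


def fetch_soup(url, timeout=10):
    """Download a static page and parse it with bs4 (not called here)."""
    # Assumption: a browser-like User-Agent; the site may not require it.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        return BeautifulSoup(resp.read(), "html.parser")


urls = [event_url(order) for order in games_order.values()]
```

Calling `fetch_soup(urls[0])` would then return a parsed page ready for `find` and `find_all`.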
Code2:
conn = sqlite3.connect('lol.sqlite')
- _Use SQL to create the table._
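As a rough illustration of the table setup, here is a sketch using the standard `sqlite3` module. The table name `games` and its columns are assumptions, since the real schema is not shown in this post, and an in-memory database stands in for `lol.sqlite`.

```python
import sqlite3

# In-memory database for the sketch; the post itself uses 'lol.sqlite'.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: only `id` is unique, because the same teams and
# series names repeat across many rows.
cur.execute(
    """
    CREATE TABLE IF NOT EXISTS games (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        series TEXT,
        team_a TEXT,
        team_b TEXT,
        duration TEXT
    )
    """
)

# Two made-up rows with identical teams and series: both must be accepted.
rows = [
    ("Example Series", "Team A", "Team B", "31:20"),
    ("Example Series", "Team A", "Team B", "28:05"),
]
cur.executemany(
    "INSERT INTO games (series, team_a, team_b, duration) VALUES (?, ?, ?, ?)",
    rows,
)

# Without commit() the inserts would be lost when a file-backed
# connection is closed.
conn.commit()

row_count = cur.execute("SELECT COUNT(*) FROM games").fetchone()[0]
```

Keeping every column except `id` non-unique means repeated matchups in the same series do not raise an integrity error.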
for i in range(8):
- _In this code segment, I use bs4 to read each order from the href attribute of the a tags. At first I wanted to use a regex to pull out the numbers, but then I found that the order is just a portion of the URL text, so I use it directly._
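The idea of reading the order straight out of the href can be sketched like this. The HTML snippet is made up for the example, and the assumption that event links look like `/event/<order>.html` is based only on the URL shown earlier in this post.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for one listing page.
html = """
<ul>
  <li><a href="/event/173.html">Series A</a></li>
  <li><a href="/event/174.html">Series B</a></li>
  <li><a href="/about.html">About</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

prefix, suffix = "/event/", ".html"
event_ids = []
for a in soup.find_all("a", href=True):
    href = a["href"]
    # The order is simply the part of the href between prefix and suffix,
    # so plain slicing is enough and no regex is needed.
    if href.startswith(prefix) and href.endswith(suffix):
        event_ids.append(href[len(prefix):-len(suffix)])
```

Links that do not point at an event page, such as `/about.html` here, are skipped by the prefix check.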
Code3:
for url in new_urls:
- _By finding 'match-team', 'team-name' and 'team-vs', we get the team information and the game information, including duration and series name, so that we can put them into our database 'lol'._
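To make the lookup by class name concrete, here is a small sketch. The class names come from this post, but the markup around them is invented, and treating the `team-vs` text as the duration is purely an assumption; the real page layout may differ.

```python
from bs4 import BeautifulSoup

# Invented markup: only the class names are taken from the post.
html = """
<div class="caption-outer">Example Series</div>
<div class="match-team">
  <span class="team-name">Team A</span>
  <span class="team-vs">31:20</span>
  <span class="team-name">Team B</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Series name from the caption block.
series = soup.find("div", class_="caption-outer").get_text(strip=True)

games = []
for block in soup.find_all("div", class_="match-team"):
    names = [t.get_text(strip=True) for t in block.find_all(class_="team-name")]
    vs = block.find(class_="team-vs")
    # Assumption: the text inside 'team-vs' carries the game duration.
    vs_text = vs.get_text(strip=True) if vs is not None else ""
    if len(names) == 2:
        games.append((series, names[0], names[1], vs_text))
```

Each tuple in `games` has one value per column, so it could be passed directly to a parameterized `INSERT` statement.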
There are also some tips and notes:
- in bs4, a class name should be passed like this:
title = soup.find('div', class_='caption-outer')
For other usages, read the documentation.
- nothing in the db should be unique except id
- let bs4 find the things you don't want to collect by hand
- don't forget to commit your database after everything is done
- how to classify the number of BO is still a problem; I hope to resolve it later
That's all for getting the game information. In the next part we may start the machine learning portion, very exciting!
- _See you_