Game Prediction System Part 2
Data Retrieval
_Data Source and version control_
Crawler with urllib and BS4
- Something is different from the last part: the web page on WAN PLUS has changed and is now a static web page, so we can simply use bs4 to get what we want.
 - The first step is to get each game series' order. The order is very important because the URLs contain it. For example, in http://www.wanplus.com/event/173.html the order is 173, and other series have other numbers, so we must find every series' order. At first I wanted to find all the orders myself, but I found that it was too much work. So I handed the job to bs4, as follows:
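To make the role of the order concrete, here is a minimal sketch that assumes every series page follows the http://www.wanplus.com/event/<order>.html pattern shown above. The helper names event_url, event_order and fetch are hypothetical. The browser-like User-Agent header is also an assumption, because some sites reject urllib's default one.

```python
from posixpath import basename, splitext
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# Assumed URL pattern, taken from the example http://www.wanplus.com/event/173.html
EVENT_URL = "http://www.wanplus.com/event/{}.html"


def event_url(order):
    """Build the series page URL for a given event order, e.g. 173."""
    return EVENT_URL.format(order)


def event_order(url):
    """Recover the order from a URL such as http://www.wanplus.com/event/173.html."""
    path = urlparse(url).path            # "/event/173.html"
    stem, _ = splitext(basename(path))   # ("173", ".html")
    return int(stem)


def fetch(url, timeout=10):
    """Download a static page and return its HTML text (assumes UTF-8)."""
    # Hypothetical header: a browser-like User-Agent in case the default is blocked.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Because the page is static, the HTML returned by fetch(event_url(173)) can go straight into bs4 with no browser automation.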
 
Code1:
```python
# games_order = dict()
```
- _These are some orders that I found by hand earlier._
 
Code2:
```python
import sqlite3
conn = sqlite3.connect('lol.sqlite')
```
- _Use SQL to create the table._
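A table for this data could look roughly like the sketch below. The table name games and its columns are hypothetical, chosen from the fields mentioned later in this post (teams, duration, series name). It follows two of the tips from the end of this post: only id is unique, and every write is committed.

```python
import sqlite3

# Hypothetical schema: the names are illustrative, not the original table.
# Only "id" is unique, so the same teams or series can appear in many rows.
SCHEMA = """
CREATE TABLE IF NOT EXISTS games (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    series   TEXT,
    team_a   TEXT,
    team_b   TEXT,
    duration TEXT
)
"""


def open_db(path="lol.sqlite"):
    """Connect to the database file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def save_game(conn, series, team_a, team_b, duration):
    """Insert one game using a parameterized query (no manual quoting)."""
    conn.execute(
        "INSERT INTO games (series, team_a, team_b, duration) VALUES (?, ?, ?, ?)",
        (series, team_a, team_b, duration),
    )
    # Without commit, the row is discarded when the connection is closed.
    conn.commit()
```

Passing ":memory:" instead of "lol.sqlite" gives a throwaway database, which is handy for trying the crawler without touching the real file.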
 
```python
for i in range(8):
    ...  # loop body not shown
```
- _In this code segment, I use bs4 to find every order in the href attribute of each a tag. At first I wanted to use a regex to extract the numbers, but then I found that the href text is already a portion of the URL, so I just used it directly._
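The idea of reusing the href text instead of a regex can be sketched like this. The function name find_event_urls is hypothetical, and so is the assumption that series links look like /event/173.html. urljoin then turns each relative href into a full URL.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE = "http://www.wanplus.com/"


def find_event_urls(html):
    """Collect full series URLs from <a href> attributes, keeping page order, no duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for a in soup.find_all("a", href=True):
        href = a["href"]
        # Assumption: series links have the form "/event/<order>.html".
        if href.startswith("/event/") and href.endswith(".html"):
            url = urljoin(BASE, href)  # "/event/173.html" -> full URL
            if url not in urls:
                urls.append(url)
    return urls
```

Working with the href text directly means the URL never has to be rebuilt from a number, which is exactly why the regex turned out to be unnecessary.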
 
Code3:
```python
for url in new_urls:
    ...  # loop body not shown
```
- _Find 'match-team', 'team-name' and 'team-vs' to get the team information and the game information, including duration and series name, so that we can put them into our database 'lol'._
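Here is a rough sketch of pulling these fields out with bs4. The page layout is a guess: it assumes a div with class match-team that holds two team-name elements and one team-vs element, and parse_match is a hypothetical name. Check the real HTML before relying on it.

```python
from bs4 import BeautifulSoup


def parse_match(html):
    """Extract team names and the team-vs text from one match block (assumed layout)."""
    soup = BeautifulSoup(html, "html.parser")
    block = soup.find("div", class_="match-team")
    if block is None:
        return None  # no match block on this page
    names = [t.get_text(strip=True) for t in block.find_all(class_="team-name")]
    vs = block.find(class_="team-vs")
    return {
        "teams": names,
        # Join nested text pieces with single spaces, trimming whitespace.
        "vs_text": vs.get_text(" ", strip=True) if vs else "",
    }
```

The returned dictionary can then be split into columns and written to the 'lol' database.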
 There are also some tips and notes:
- To search by class name in bs4, use the class_ keyword argument, like this: title = soup.find('div', class_='caption-outer'). For other usages, read the documentation.
 - No column in the database except id should be unique.
 - Let bs4 find anything you don't want to dig out by hand.
 - Don't forget to commit your database after everything is done.
 - How to classify each series' BO (best-of) format is still a problem; I hope to resolve it later.
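The class_ tip exists because class is a reserved word in Python. The sketch below uses a made-up HTML fragment and shows three equivalent ways to find the same element.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; only the class name comes from the tip above.
html = '<div class="caption-outer big"><h2>Final</h2></div><div class="other">x</div>'
soup = BeautifulSoup(html, "html.parser")

# "class" is a Python keyword, so bs4 accepts "class_" as the keyword argument.
title = soup.find("div", class_="caption-outer")

# Equivalent: pass the attribute through the attrs dictionary.
same_via_attrs = soup.find("div", attrs={"class": "caption-outer"})

# Equivalent: a CSS selector.
same_via_css = soup.select_one("div.caption-outer")
```

All three match the element even though it carries a second class (big), because bs4 checks each class value separately.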
 
 That's all for collecting the game information. In the next part we may start on the machine learning portion. I'm very excited!
- _See you_