Game Prediction System Part 1
Data Retrieval
_Data Source and version control_
Crawler with PyQt and BS4
BeautifulSoup is only a web page parser. It cannot handle AJAX or other dynamic pages on its own, or at least doing so is very difficult. A video on YouTube offers a solution: it simulates a browser client to render the dynamic page, so you can then parse the AJAX-loaded page with BS4, as follows:
Code:
    import sys
- However, this approach cannot move on to the next page, so we can only get the first of the 144 pages. For that reason, this system does not use it to get the player and team data; Selenium is the better way to do that. As for BS4, we will see it again when retrieving the games data.
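To make the limitation concrete, here is a minimal sketch of why a plain parser sees nothing on an AJAX page. It uses the standard-library `html.parser` as a stand-in for BS4 (an assumption made only to keep the sketch dependency-free), and both HTML snippets, the `players` table id, and the `loadRows` script call are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page source: the table body is empty in the raw HTML and
# would only be filled in by JavaScript running in a real browser.
RAW_HTML = """
<html><body>
  <table id="players"><tbody></tbody></table>
  <script>
    // In a browser this would fetch the rows via AJAX and insert them.
    loadRows("players");
  </script>
</body></html>
"""

# What the DOM might look like after a browser has run the script.
RENDERED_HTML = """
<html><body>
  <table id="players"><tbody>
    <tr><td>Player A</td></tr>
    <tr><td>Player B</td></tr>
  </tbody></table>
</body></html>
"""


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell in the document."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Script contents also arrive here, but never inside a <td>.
        if self.in_cell and data.strip():
            self.cells.append(data.strip())


def extract_cells(html):
    """Parse the given HTML string and return the text of its table cells."""
    parser = CellCollector()
    parser.feed(html)
    return parser.cells


static_cells = extract_cells(RAW_HTML)         # parser never runs the script
rendered_cells = extract_cells(RENDERED_HTML)  # browser-rendered DOM has rows
print(static_cells)
print(rendered_cells)
```

The parser only reads the markup it is given, so the rows appear only once something browser-like has rendered the page first, which is the job PyQt or Selenium takes on here.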
A few things to watch out for:
- PyQt4 is the best choice; it is the most mature release.
- At the time of writing, both PyQt4 and PyQt5 were only compatible with Python 3.4.2 or earlier (a very big pitfall).
- Once you have changed your Python version, you had better update your system environment variables to match.
- Everything on the BeautifulSoup side stays the same as before and in what follows.
Crawler with Selenium
- Selenium is a library that simulates browser behavior, so you can get the AJAX data that you cannot get with an ordinary request.
- First you need to install Selenium and geckodriver, as I said before.
- A DB browser is also necessary to visualize your database.
- Code:
    # -*- coding:utf-8 -*-
- The code is easy to understand: we create a sqlite3 database and put our data into it, and that's it.
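The storage step can be sketched on its own, without the browser part. This is a minimal illustration, not the original listing: the `teams` table, its columns, and the sample rows are all hypothetical placeholders for whatever the crawler actually scrapes.

```python
import sqlite3

# Hypothetical rows standing in for what the crawler scraped.
scraped_teams = [
    ("Team A", 10, 5),
    ("Team B", 7, 8),
]

# ":memory:" keeps this sketch self-contained. A real run would pass a file
# name instead, so that a DB browser can open the result afterwards.
conn = sqlite3.connect(":memory:")

# Assumed schema: one row per team with a name and a win/loss record.
conn.execute(
    "CREATE TABLE IF NOT EXISTS teams ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "name TEXT NOT NULL, "
    "wins INTEGER, "
    "losses INTEGER)"
)

# Parameterised inserts avoid building SQL strings from scraped text.
conn.executemany(
    "INSERT INTO teams (name, wins, losses) VALUES (?, ?, ?)",
    scraped_teams,
)
conn.commit()

rows = conn.execute(
    "SELECT name, wins, losses FROM teams ORDER BY id"
).fetchall()
print(rows)
conn.close()
```

Using `?` placeholders rather than string formatting matters for scraped data, since names can contain quotes or other characters that would break a hand-built SQL statement.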
There are also some points to note:
- Once you have geckodriver, if you want to use `driver = webdriver.Firefox()`, you need to add the folder containing geckodriver to the system `PATH` environment variable. Hold Shift and right-click to get the full path.
- The way to get the players data is almost the same as for the teams.
- The name column in the table should not be unique, because one player can be on several teams in different periods.
- That’s it
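The point about the name column can be sketched as a schema. The `players` table, its `team` and `season` columns, and the composite `UNIQUE` constraint are assumptions made for illustration; the only thing taken from the notes above is that `name` alone must not be unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: the name may repeat, but one (name, team, season)
# combination should only be stored once.
conn.execute(
    """
    CREATE TABLE players (
        id     INTEGER PRIMARY KEY AUTOINCREMENT,
        name   TEXT NOT NULL,          -- deliberately NOT unique
        team   TEXT NOT NULL,
        season TEXT NOT NULL,
        UNIQUE (name, team, season)    -- assumption: one row per stint
    )
    """
)

# Hypothetical data: the same player on two teams in different periods.
stints = [
    ("Player A", "Team X", "2015"),
    ("Player A", "Team Y", "2016"),
]
conn.executemany(
    "INSERT INTO players (name, team, season) VALUES (?, ?, ?)",
    stints,
)

# Re-inserting an identical stint violates the composite constraint.
try:
    conn.execute(
        "INSERT INTO players (name, team, season) VALUES (?, ?, ?)",
        stints[0],
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

player_a_rows = conn.execute(
    "SELECT team, season FROM players WHERE name = ? ORDER BY season",
    ("Player A",),
).fetchall()
print(player_a_rows, duplicate_rejected)
conn.close()
```

With a `UNIQUE` constraint on `name` alone, the second insert for the same player would fail; the composite constraint still protects against storing the same scraped row twice when the crawler is re-run.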
That's all for today. In the next part we will get the games data using bs4 and urllib, which will be much easier. See you~