
Python first project - Homework Scraping


    This project helps me scrape answers for my homework, especially chemistry, because I kept failing chemistry. The program shown above is the simple version of the homework scraper, and it only works for multiple-choice questions. It also only handles a specific question format. In my case all of my questions are in Chinese, and luckily there is a specific phrase that most sites use to conclude an answer: "故选" ("therefore choose"). The questions in my homework file follow a similar pattern, marked by "选择题" ("multiple-choice questions") or "一、" (the heading of the first section), which makes separating the questions easier. I used Beautiful Soup 4 to scrape the answers and the docx module to read the homework .docx file.
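    As a rough illustration of the two text patterns described above, here is a minimal sketch that works on plain strings only. It assumes the paragraph texts have already been read from the .docx file and the page text has already been pulled out of the HTML. The function names, the regular expression, and the use of "二、" as the end of the section are my assumptions for this example, not the original program.

```python
import re

# Assumed pattern: "故选", optional colon or whitespace, then option letters A-D.
ANSWER_PATTERN = re.compile(r"故选[:：\s]*([A-D]+)")


def extract_choice(page_text):
    """Return the option letter(s) after the first "故选" marker, or None."""
    match = ANSWER_PATTERN.search(page_text)
    return match.group(1) if match else None


def multiple_choice_section(paragraphs):
    """Collect the paragraphs that belong to the multiple-choice section.

    The section starts after a paragraph containing "选择题" or starting
    with "一、". Stopping at "二、" is an assumption about the file layout.
    """
    section = []
    inside = False
    for text in paragraphs:
        text = text.strip()
        if not text:
            continue  # skip empty paragraphs
        if not inside:
            if "选择题" in text or text.startswith("一、"):
                inside = True  # heading found, start collecting after it
            continue
        if text.startswith("二、"):
            break  # next section heading, stop collecting
        section.append(text)
    return section
```

    With python-docx and Beautiful Soup, the inputs would typically come from the paragraph texts of the document and from the text of the parsed page.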

    The program also has an advanced version, as I like to call it. Where the simple version scrapes answers for multiple-choice questions, the advanced version scrapes answers for written questions (essay, short answer, etc.). The simple version uses only bs4, which has a more limited scraping range than Selenium. For the advanced version, I had to write a scraping-format function for each website that is likely to have the answer. I search with Baidu because it is like Google for Chinese users, so it is better suited to finding answers to Chinese questions. I use Selenium rather than bs4 because I want to identify which site each result comes from, and scraping Baidu is different from scraping Google: every link shown in the search results is indirect. With Selenium, I can open the link, get the direct URL of the site, and then identify the site so I can use the matching scraping format. Unfortunately, this method takes a long time to get an answer for a single question, because Selenium is built mainly for interacting with sites.
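    The "open the link, read the direct URL, pick the matching format" idea can be sketched roughly like this. The `driver` argument is assumed to be a Selenium WebDriver (only its `get` method and `current_url` attribute are used). The site `example.com`, the function `scrape_example_com`, and the `SCRAPERS` table are hypothetical placeholders, not the sites or functions from the real program.

```python
from urllib.parse import urlparse


def resolve_direct_url(driver, indirect_link):
    """Open an indirect search-result link and return the URL the browser ends up on."""
    driver.get(indirect_link)
    return driver.current_url


def scrape_example_com(page_source):
    """Hypothetical per-site format function; a real one would parse the page."""
    return page_source.strip()


# Hypothetical mapping from a site's domain to its scraping-format function.
SCRAPERS = {
    "example.com": scrape_example_com,
}


def pick_scraper(direct_url, scrapers):
    """Return the format function registered for the URL's site, or None."""
    host = urlparse(direct_url).hostname or ""
    for domain, scraper in scrapers.items():
        # Match the domain itself and any subdomain such as "www.".
        if host == domain or host.endswith("." + domain):
            return scraper
    return None
```

    Keeping the per-site functions in a dictionary means supporting a new site is mostly a matter of writing one more function and adding one entry.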

    In addition, I added a timeout option. Every site that times out is noted in a txt file so that it can be skipped the next time the program encounters the same site.
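    A minimal sketch of the timeout log might look like the following. It assumes one host name per line in the txt file; the file layout and the function names are my assumptions for illustration. With Selenium, the timeout itself could be set with `driver.set_page_load_timeout(seconds)`, and `record_timeout` would be called when the page load raises a timeout error.

```python
from pathlib import Path
from urllib.parse import urlparse


def load_skipped_sites(log_path):
    """Read the txt log and return the set of hosts that timed out before."""
    path = Path(log_path)
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def record_timeout(log_path, url, skipped):
    """Note the host of a timed-out URL, once, in both the set and the txt file."""
    host = urlparse(url).hostname
    if host and host not in skipped:
        skipped.add(host)
        with open(log_path, "a", encoding="utf-8") as log_file:
            log_file.write(host + "\n")


def should_skip(url, skipped):
    """Return True if the URL belongs to a host that timed out earlier."""
    return urlparse(url).hostname in skipped
```

    The set is loaded once at startup, and each search result is checked with `should_skip` before the browser opens it.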














