A Python program that automates browser actions for web scraping!

I wrote this program during an internship in the summer after my junior year of high school. The start-up I was interning at provided a point-of-sale service for restaurants as its product, and they thought it could be useful to scrape the menus of specific locations of a chain, since a menu can vary by location.

The program uses the Python language bindings for Selenium WebDriver to automate a Chrome browser and interact with the webpage. This lets us scrape JavaScript-generated content that we can't get from normal scraping (using requests.get()). It also uses BeautifulSoup to parse the HTML received from the browser.
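To illustrate the difference, here is a minimal sketch (with a placeholder URL, not part of the original program): requests.get() returns the HTML exactly as served, before any JavaScript runs, while the browser-driven approach returns the fully rendered page:

import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

url = 'https://example.com/menu' # placeholder URL

# static fetch: the raw HTML, before any JavaScript executes
static_html = requests.get(url).text

# browser fetch: Chrome runs the page's JavaScript first
chrome_options = Options()
chrome_options.add_argument('--headless')
browser = webdriver.Chrome(options=chrome_options)
browser.get(url)
rendered_html = browser.page_source # includes JS-generated elements
browser.quit()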

Note: The program normally runs in headless mode. This was disabled in the video above for demonstration purposes.

Python Code Snippets (full code available here)
Getting options of an item in the menu:
import time
from bs4 import BeautifulSoup

def get_item(browser, item_id): # item_id is the html id of the menu item
    """ given an id, scrape a menu item and all of its options """
    button = browser.find_element_by_id(item_id)
    # click on the item to open options chooser:
    browser.execute_script("arguments[0].click();", button)
    time.sleep(1)

    innerHTML = browser.page_source
    html = BeautifulSoup(innerHTML, 'html.parser') # feed html to parser

    _options = {}
    # divide into option sections
    options = html.find_all('div', class_='menuItemModal-options')
    for option in options:
        name = option.find(class_='menuItemModal-choice-name').text
        choices = option.find_all('span', class_='menuItemModal-choice-option-description')
        if ' + ' in choices[0].text:
            # divide into option, price pairs
            _choices = {choice.text.split(' + ')[0]: choice.text.split(' + ')[1]
                        for choice in choices}
        else:
            _choices = [choice.text for choice in choices]
        _options[name] = _choices
    return _options
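As a hypothetical usage example (the id below is made up; real ids come from the ids list built earlier in the full program), the returned dictionary mirrors the option sections shown on the page:

item_options = get_item(browser, 'menuItem-12345') # made-up id for illustration
# item_options might look something like:
# {
#     'Size': {'Small': '$0.00', 'Large': '$2.50'}, # choices listed with ' + ' prices
#     'Toppings': ['Lettuce', 'Tomato', 'Onion']    # choices without prices
# }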
Getting page HTML with Selenium:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless") # run in headless mode

browser = webdriver.Chrome(options=chrome_options)
browser.get(url)
time.sleep(10) # give page time to load everything
innerHTML = browser.page_source
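The fixed time.sleep(10) always waits the full ten seconds. An alternative sketch (not how the original program does it) uses Selenium's explicit waits to continue as soon as a known element appears; the class name below is an assumption, not taken from the real page:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser.get(url)
# block until the first menu section is present (at most 10 seconds)
WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, 'menu-category')) # assumed class name
)
innerHTML = browser.page_source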
Compiling the menu and writing it to a JSON file:

(cat_titles, cat_items, prices, and ids were defined earlier in the program)

import os
import json

full_menu = {}
for ind, title in enumerate(cat_titles): # category titles
    all_items = []
    # iterate through all items in a category
    for ind2, itm_name in enumerate(cat_items[ind]):
        item = {}
        item['name'] = itm_name
        item['price'] = prices[ind][ind2]
        item['options'] = get_item(browser, ids[ind][ind2])
        all_items.append(item)
    full_menu[title] = all_items
# directory containing this script, used to build the JSON file path
path = os.path.dirname(os.path.realpath(__file__))
with open(f'{path}/data.json', 'w') as f:
    json.dump(full_menu, f, indent=4) # writing to file with pretty printing
print('[Finished]')
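A quick way to sanity-check the output (not part of the scraper itself) is to read the file back and count the items in each category:

with open(f'{path}/data.json') as f:
    menu = json.load(f)
for category, items in menu.items(): # each value is a list of item dicts
    print(f'{category}: {len(items)} items')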