Monireh Ebrahimi a6729508c4 changed planning prompt, execution prompt, few shot examples, and browser executions to fix the demo for flight reservation 6 ヶ月 前
..
agent a6729508c4 changed planning prompt, execution prompt, few shot examples, and browser executions to fix the demo for flight reservation 6 ヶ月 前
README.md a44dda15ac updated browser navigation use case readme 6 ヶ月 前

README.md

Building an Intelligent Browser Agent with Llama 4 Scout

This project provides a comprehensive guide to creating an AI-powered browser agent capable of autonomously navigating and interacting with websites. By leveraging the capabilities of Llama 4 Scout, Playwright, and Together AI, this agent can perform tasks seamlessly while understanding both visual and textual content.

Features

  • Visual Understanding: Utilizes screenshots for visual comprehension of web pages
  • Autonomous Navigation: Capable of navigating and interacting with web elements.
  • Natural Language Instructions: Executes tasks based on natural language commands.
  • Persistent Session Management: Maintains browser sessions for continuous interaction.

    Example Tasks

  • Search for a product on Amazon.

  • Find the cheapest flight to Tokyo.

  • Purchase tickets for the next Warriors game.

    What's in this Project?

  • Environment setup instructions

  • Browser automation guides using Playwright

  • Structured prompting techniques for guiding the LLM in task execution

  • Content comprehension utilizing Llama 4 Scout

  • Creating a persistent and intelligent browser agent for real-world applications

    Demo

    For a detailed explanation and demo video, visit: Blog Post and Demo Video

    Prerequisite for Running the Notebook

  • Before getting started, please make sure to setup Together.ai and get an API key from here.

    Collaborators

    Feel free to reach out with any questions or feedback!

  • Miguel Gonzalez: X | LinkedIn

  • Dimitry Khorzov: X | LinkedIn