{"id":2519,"date":"2023-09-20T19:01:31","date_gmt":"2023-09-20T23:01:31","guid":{"rendered":"https:\/\/linacolucci.com\/?p=2519"},"modified":"2023-10-10T13:30:45","modified_gmt":"2023-10-10T17:30:45","slug":"the-future-of-the-internet","status":"publish","type":"post","link":"https:\/\/linacolucci.com\/2023\/09\/the-future-of-the-internet\/","title":{"rendered":"The future of the internet"},"content":{"rendered":"\n
The ideas in this blog post were developed through conversations with Sidney Primas<\/a>, Andrew Weitz<\/a>, and Andrew Mark<\/a>. Credit is shared. <\/em><\/p>\n\n\n\n The internet today is made for human consumption. Websites have beautiful aesthetics. There are popup modals, dropdown menus, and eye-catching animations. <\/p>\n\n\n\n As soon as you try to have an agent1<\/a><\/sup> use a website, however, you realize how poorly designed the internet is for automation. I know because I tried. <\/p>\n\n\n\n I built agents that automated our monthly invoicing process at Infinity AI, ordered drinks from Starbucks, made restaurant reservations in Palo Alto, and found high-quality news articles that I could tweet about2<\/a><\/sup>. <\/p>\n\n\n\n The web is challenging for agents to navigate because: <\/p>\n\n\n\n LLMs cannot \u201csee\u201d yet. Therefore, agents get context about a webpage through scraping3<\/a><\/sup>, which pulls out a site\u2019s text but usually not its visual structure. A lot of useful information is lost in scraping. For example, the text describing a button and the button itself might be far apart in the DOM but obviously next to each other on the rendered page. This makes it hard for the agent to reason, from the scraped results, about what the button does. <\/p>\n\n\n\n Agents need to know the result of each action so that they can reason about what to do next. For example, when we, as humans, click on a dropdown button, we see the options it reveals and then reason about which one to click next. Agents need to do the same thing by scraping the webpage before and after their last action and identifying the diff. <\/p>\n\n\n\n However, popup modals often do not lead to any obvious change in the page\u2019s code. Error messages might show up in a completely different location in the DOM from where the action was taken. 
Or the HTML changes in a way that makes it hard to tell whether an existing element changed or a new element appeared on the page. <\/p>\n\n\n\n If agents don\u2019t recognize that their actions had an effect, they get stuck in endless loops, repeating the same action over and over. <\/p>\n\n\n\n There are lots of stupidly simple things that we as humans do on a webpage that are hard for agents: clicking on buttons (there are SO many different button implementations), knowing whether an element is hidden or visible (QuickBooks is especially annoying for this), figuring out whether an element is the same or different (HTML attributes can change every time you load a site), and many others. <\/p>\n\n\n\n * * *<\/p>\n\n\n\n Embodied multi-modal models (i.e., LLMs that can see4<\/a><\/sup>) will make it easier to navigate the web, but the fact remains that the internet is not designed with agents as first-class citizens. This will change. <\/p>\n\n\n\n In the future, a majority of online transactions will happen through agents, and the internet needs to adapt accordingly. <\/p>\n\n\n\n Every website will have both a human-centric view (i.e., what exists today) and an agent-centric view. The agent-centric view of each website will consist of three things: <\/p>\n\n\n\nToday, the internet is a challenge for agents <\/strong><\/h3>\n\n\n\n
\n
\n
\n
An internet with agents as first-class citizens<\/strong><\/h3>\n\n\n\n
\n