Alan Blanchet

interact — vision-grounded computer-use MCP

  • MCP
  • computer-use
  • VLM
  • LiteLLM
  • Rust
  • MIT

MCP server that lets any agent act on what it sees in the browser and on a real desktop (navigate, click, type, scroll, drag). It returns text diffs of what changed instead of raw screenshots. GUI grounding fuses VLM detection with the AT-SPI accessibility tree. A LiteLLM multi-provider router handles cost-aware automatic model selection, ranked from public benchmarks (MMMU, ScreenSpot-Pro, Video-MME). An isolated software-GL sandbox lets GPU-accelerated, Flutter, and Electron apps render. Installs into the major agent clients and files GitHub issues automatically. MIT licensed.
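
To illustrate the idea of returning a text diff rather than a screenshot, here is a minimal sketch in Rust. It assumes the UI state before and after an action has already been flattened into one descriptive line per element; that line format and the function name `diff_snapshots` are hypothetical and are not taken from the project. The comparison is a plain set difference, which may well be simpler than what the project actually does.

```rust
use std::collections::BTreeSet;

/// Compare two textual UI snapshots and report only what changed.
///
/// Each entry is one line describing a UI element, for example
/// "button 'Save' (enabled)". This snapshot format is an assumption
/// made for illustration, not the project's actual output.
pub fn diff_snapshots(before: &[&str], after: &[&str]) -> Vec<String> {
    // Ordered sets give a deterministic (sorted) report.
    let old: BTreeSet<&str> = before.iter().copied().collect();
    let new: BTreeSet<&str> = after.iter().copied().collect();

    let mut changes = Vec::new();

    // Lines present before the action but gone afterwards.
    for line in old.difference(&new) {
        changes.push(format!("- {}", line));
    }

    // Lines that only exist after the action.
    for line in new.difference(&old) {
        changes.push(format!("+ {}", line));
    }

    // Unchanged elements are omitted, which is the point of the diff:
    // the agent reads a few lines instead of a full screen capture.
    changes
}
```

Calling `diff_snapshots` with a snapshot taken before a click and one taken after it yields lines prefixed with `-` for elements that disappeared and `+` for elements that appeared; an element whose state changed shows up as one of each.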