Side project: generating clickmaps as metadata for screenshots
01 Jan 2012
A long time ago, my mom persuaded me to play Farmville so she could have more Farmville neighbors and earn the rewards that came with them. Since Farmville was pretty repetitive, I created an automated Farmville player in a VM that played the game 24/7, much to the dismay of my mother and other players, because the autoplayer blasted away their puny highscores in less than a week.
For a paper I submitted not long ago, I also needed this kind of automation. After submission, I realized that my automation could be improved if I could somehow determine which locations on screen are click hotspots for an application. For example, a menu bar obviously has special meaning to any application, and so does a text field. On the other hand, clicking the title bar or plain blank space usually does nothing at all.
The main idea behind this "clickmap" discovery is to observe the shape of the mouse pointer as it moves across the screen. Applications typically set a different pointer shape for different kinds of widgets (an I-beam over a text field, a hand over a link), so the shape at a given position hints at what a click there would do. The X11 protocol defines a set of standard shapes that should be implemented by any pointer "theme", collectively called the "X11 cursor font" (XCursorFont). Each mouse pointer (cursor) shape is then a glyph from that font, and each of those shapes is represented by a unique color in the output "clickmap" image.
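To make the shape-to-color idea concrete, here is a minimal sketch. It assumes the probe has already worked out which cursor-font shape is showing and hands over its glyph index. The `XC_*` indices are the standard ones from `<X11/cursorfont.h>` (each shape uses an even index, with the next odd index holding its mask, giving 77 shapes in 154 glyphs); the `shape_color` palette itself is a hypothetical choice, not the one used for the clickmap in this post.

```python
# Standard glyph indices from <X11/cursorfont.h> (only a few listed here).
XC_NUM_GLYPHS = 154
CURSOR_GLYPHS = {
    "hand2": 60,      # pointing hand, common over links
    "left_ptr": 68,   # plain arrow
    "watch": 150,     # busy
    "xterm": 152,     # I-beam, common over text fields
}


def shape_color(glyph: int) -> tuple:
    """Map a cursor-font glyph index to a unique RGB color (hypothetical palette)."""
    if not 0 <= glyph < XC_NUM_GLYPHS or glyph % 2:
        raise ValueError(f"not a cursor-font shape index: {glyph}")
    index = glyph // 2  # 0..76, one per shape
    # 53 is odd and therefore coprime with 256, so the red channel alone is
    # already unique for every index; green and blue only add visual contrast.
    return ((index * 53) % 256, (index * 97) % 256, (index * 193) % 256)


print(shape_color(CURSOR_GLYPHS["xterm"]))
```

Deriving the color arithmetically keeps the mapping stable between runs, so clickmaps of different screens can be compared pixel by pixel.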
There are some problems with this setup. The current shape of the mouse pointer is determined by the top-most application over which the mouse pointer hovers. That means the application is responsible for changing the pointer shape, and if the application is slow, there can be a delay between the time the cursor is positioned and the time the cursor shape is updated. Another problem is that for each pixel on the screen, an X11 protocol message must be sent to position the cursor, then another message to query the cursor shape, followed by receiving a reply containing the cursor shape ID. All of this makes imaging the clickmap very slow.
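The per-pixel procedure described above can be sketched as a plain raster scan. This is not the original code: `warp`, `query_shape` and `color_of` are hypothetical callbacks standing in for the X11 pointer-warp request, the cursor-shape query and the shape-to-color lookup, so the loop structure can be shown without depending on any particular X11 binding.

```python
import time
from typing import Callable, List, Tuple

RGB = Tuple[int, int, int]


def image_clickmap(
    width: int,
    height: int,
    warp: Callable[[int, int], None],    # hypothetical: moves the pointer
    query_shape: Callable[[], object],   # hypothetical: returns current shape
    color_of: Callable[[object], RGB],   # maps a shape to its clickmap color
    delay_s: float = 50e-6,
) -> List[List[RGB]]:
    """Probe every pixel in raster order and return a grid of colors."""
    clickmap = []
    for y in range(height):
        row = []
        for x in range(width):
            warp(x, y)             # message 1: position the pointer at (x, y)
            time.sleep(delay_s)    # give the application time to update its cursor
            shape = query_shape()  # message 2 plus its reply: current shape
            row.append(color_of(shape))
        clickmap.append(row)
    return clickmap
```

Every pixel costs at least one full request/reply round trip plus the delay, which is why the total time grows linearly with the number of pixels and cannot be parallelized: there is only one pointer.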
A final problem is that the core X11 protocol itself does not provide a way to query the current cursor shape. Luckily, there is an extension to the X11 protocol, called the XFixes extension, which allows retrieval of the current mouse pointer image (intended for screenshot and remote desktop software).
All in all, creating a "clickmap" of the current screen is not very practical because it is so slow. If the clickable hotspots on screen change while the clickmap is being imaged (e.g. animated menus), the output image is trash.
Nevertheless, here is the code in case you want to play around with it, as well as a screenshot of a portion of my screen with its accompanying clickmap (both sized 1024x768). Successive pixels were queried with a delay of 50 microseconds between them. The entire imaging took 3m43s.
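A quick back-of-the-envelope check shows where those 3m43s go. This assumes every pixel of the 1024x768 region was probed exactly once and that the 50-microsecond delay was the only deliberate wait; real sleeps usually overshoot the requested time, so part of the "overhead" below may still be sleeping.

```python
WIDTH, HEIGHT = 1024, 768
DELAY_S = 50e-6          # per-pixel delay stated in the post
TOTAL_S = 3 * 60 + 43    # 3m43s = 223 s

pixels = WIDTH * HEIGHT                     # one probe per pixel (assumed)
sleep_s = pixels * DELAY_S                  # time spent in the deliberate delay
per_pixel_us = TOTAL_S / pixels * 1e6       # average wall-clock cost per probe
overhead_us = per_pixel_us - DELAY_S * 1e6  # round trips, scheduling, etc.

print(f"{pixels} pixels, {sleep_s:.1f} s of delay, "
      f"{per_pixel_us:.0f} us per pixel, {overhead_us:.0f} us of it not delay")
```

Roughly 39 of the 223 seconds are the nominal delay; the remaining time, about 234 microseconds per pixel, is spent on the protocol round trips and everything around them.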