Sunday, March 06, 2011

Remote Desktop Control with Automated Skype

[Figure: Design.png]

Introduction
Most personal computers have a dynamic IP address and sit behind a firewall. Instant data exchange between them (screen sharing in particular) therefore requires a mediator with a permanent IP address that both parties can reach with outbound requests and that can relay data between them. Cloud computing solutions provide such a mediator, e.g., Microsoft Azure AppFabric Service Bus, and other options are also available. Skype provides, free of charge, a ready-made infrastructure not only for data exchange but also for streaming the screen image of a remote machine. Currently, however, this is passive screen sharing: one Skype user can see the screen of another Skype user but cannot control the remote machine.

This article presents a way to automate Skype to achieve active screen sharing, which allows one Skype user to control the machine of another Skype user through Skype's built-in screen sharing.

Background

Skype itself offers few options for automation. A programming API was announced on the Skype site [1], but it is not yet available (at least to ordinary users). The only Skype API I found was the in-process COM object Skype4COM.dll. Skype4COM permits operations such as management of Skype user accounts and calls. However, most Skype settings are not addressed, and screen sharing is left completely out of its scope. Clearly, other automation techniques have to be combined with Skype4COM to achieve active screen sharing.

Skype automation in this article includes the following techniques working together:

usage of Skype4COM.dll provided by the Skype company,
control of Skype windows from outside of the process with Windows messages,
activation of Skype application menu commands (to utilize this technique, the Skype setting Tools->Options...->General settings->Visual style of the window should be set to Classic Windows),
emulation of user actions like mouse clicks and movements, and writing text to the appropriate Windows controls, and finally
injection of foreign code into a running Skype process.

Concept

The main idea of the project is to employ the Skype communication infrastructure for all data exchange between the two sides. The screen image of the target machine (the machine whose screen is exposed) is transferred to a remote machine as video. On the remote machine, the Skype window containing the image is subclassed by code injected into the Skype process. The subclassing window procedure senses mouse movements and clicks, text input, etc., and generates commands for the target machine to reproduce these actions there. The commands are short text messages containing the name of the action ("function") and the relevant parameters ("arguments"). For example, the command to perform a left mouse button click carries the coordinates of the click point as parameters. Commands are transferred between the machines as Skype text messages.
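To make the command idea concrete, here is a minimal Python sketch of how a "function plus arguments" command could be packed into a single line of text and unpacked on the other side. The wire format (space-separated tokens with shell-style quoting) and the names Command, encode, decode, dispatch, LeftClick and TypeText are my assumptions for illustration only; the article does not specify the actual format used by the drivers.

```python
import shlex
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Command:
    """One remote-control command (hypothetical structure)."""
    function: str                # action name, e.g. "LeftClick" (assumed name)
    arguments: Tuple[str, ...]   # parameters, kept as strings on the wire


def encode(cmd: Command) -> str:
    """Serialize a command into one line of text that fits a chat message."""
    # shlex.join quotes arguments that contain spaces or special characters.
    return shlex.join([cmd.function, *cmd.arguments])


def decode(message: str) -> Command:
    """Parse a text message back into a command."""
    parts = shlex.split(message)
    if not parts:
        raise ValueError("empty command message")
    return Command(parts[0], tuple(parts[1:]))


def dispatch(message: str, handlers: Dict[str, Callable[..., object]]) -> object:
    """Decode a message and call the handler registered for its function name."""
    cmd = decode(message)
    handler = handlers.get(cmd.function)
    if handler is None:
        raise KeyError(f"unknown command: {cmd.function}")
    return handler(*cmd.arguments)
```

On the adviser side, something like encode(Command("LeftClick", ("120", "45"))) would produce the text to send. On the user side, dispatch would route the received text to a handler that converts the arguments to integers and simulates the click. Keeping the arguments as strings on the wire leaves type conversion to each handler.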

How Does It Work?

In our scenario, a Skype user who needs technical support shares the screen of his/her machine with another Skype user, referred to below as the adviser. Dedicated applications run on the adviser and user machines, namely AdviserSkypeDriver.exe and UserSkypeDriver.exe (these applications have nothing to do with kernel drivers). The SkypeDrivers share several components: WindowFinderNET finds a window by its window class, HumanActionSimulation reproduces mouse clicks and movements and text typing, and SkypeAutoHelper actually controls Skype by means of HumanActionSimulation and Skype4COM.

Both SkypeDrivers have a "Start / Attach to Skype" button. By pressing this button, the adviser and the user either start Skype or attach the driver to an already running instance of Skype, and set Skype to View->Compact View mode (this simplifies further actions). If the driver attaches to Skype for the first time, a warning dialog box appears requesting permission, and the driver automatically presses the "Allow access" button. When the attach operation succeeds, the drivers enable their "Close Skype" buttons.

Note: Currently, the above operations are partly based on timeouts, so they take some time (and may even fail, alas). Please be patient.
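As an aside on timeout-based steps: a common way to make them less fragile than a fixed sleep is to poll for the expected condition until a deadline passes. The sketch below shows that general pattern in Python. It is not the code of the drivers, and the names wait_for, probe, timeout and interval are assumptions of mine.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(probe: Callable[[], Optional[T]],
             timeout: float = 10.0,
             interval: float = 0.25) -> Optional[T]:
    """Call probe() until it returns a non-None value or the timeout elapses.

    Returns the probe result on success, or None if the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result            # condition met, stop waiting early
        if time.monotonic() >= deadline:
            return None              # give up; the caller decides what to do
        time.sleep(interval)
```

Here probe could be, for instance, a function that looks for a dialog window and returns its handle once the window exists. The caller continues as soon as the window appears instead of always waiting the full timeout, and gets an explicit None when the step fails.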

Now the adviser and the user should each type the Skype handle of the other party into the only text box of their respective SkypeDriver forms. At this point, the user ceases his/her activity, and the adviser takes control of the user machine.

Read more: CodeProject