Code Storm: July 2005

Friday, July 29, 2005

Hide data in the stream

Streams are a facility built into NTFS that attaches metadata to files; this facility is similar to the extended attributes in an OS/2 file system. Unfortunately, using the term streams could be confusing because it's so overloaded (especially when it comes to files).

A new definition for streams

A file on an NTFS volume is composed of a primary stream and zero or more secondary streams. The primary stream is the data that you normally access, while the secondary streams reside in parallel with it. Unlike the primary stream (which is unnamed), each secondary stream has a name that is unique within the file. A secondary stream can hold any amount of any kind of data. However, streams are only available on the NTFS file system, so if a file with secondary streams is moved to another file system, you'll lose the secondary streams.

Secondary streams are invisible

Secondary streams are invisible to both Windows Explorer and the console. In fact, neither one counts stream data in the file sizes it reports, so the space a stream occupies is never attributed to the file. Stream support is only partially implemented in Windows, even though streams have been around since Windows NT 3.1. To make Explorer stream-aware, you need to install an Explorer add-on.

Accessing secondary streams

You can access secondary streams with the standard CreateFile, ReadFile, and WriteFile Win32 APIs, or with any other API (such as MFC's CFile) or executable that uses these functions for low-level file access. To access a secondary stream, append a colon followed by the name of the secondary stream to the file name. Stream names are case-insensitive, just like file names.

Here's the command you use to write to the stream from the console:

echo foo > bar.txt:title

(Note about the above command: You can't read the stream back from the console with the Type command. Redirecting the stream into the More command, as in more < bar.txt:title, does work.)

Here's the code you use when writing to the secondary stream, title, in C:

#include <windows.h>

DWORD BytesWritten = 0;
const DWORD BufferLength = 3;
/* OPEN_ALWAYS creates bar.txt (and the stream) if it does not exist yet */
HANDLE hFile = CreateFile("bar.txt:title", GENERIC_WRITE,
    0, NULL, OPEN_ALWAYS, 0, NULL);
WriteFile(hFile, "foo", BufferLength, &BytesWritten, NULL);
CloseHandle(hFile);

Here's the code you use when writing to the secondary stream in Python:

f = open("bar.txt:title", "w")
f.write("foo")
f.close()

Below are some examples of reading the bar.txt:title stream. First, here's the code for reading the stream in C:

BYTE buffer[25];
const DWORD BufferSize = sizeof(buffer);
DWORD BytesRead = 0;
HANDLE hFile = CreateFile("bar.txt:title", GENERIC_READ, 0, NULL,
    OPEN_EXISTING, 0, NULL);
/* BytesRead receives the number of bytes actually stored in the stream */
ReadFile(hFile, buffer, BufferSize, &BytesRead, NULL);
CloseHandle(hFile);

Here's the code you use when reading the stream in Python:

f = open("bar.txt:title", "r")
print(f.read())
f.close()
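The naming rule is the same in every language: file name, colon, stream name. As a minimal sketch (not code from the examples above), the two Python snippets can be wrapped in small helpers. The function names stream_path, write_stream, and read_stream are made up for illustration, and the read and write helpers only behave as described when the file lives on an NTFS volume.

```python
import os


def stream_path(filename, stream_name):
    """Build the 'file:stream' name used to address a secondary stream."""
    if not stream_name or ":" in stream_name:
        raise ValueError("stream name must be non-empty and contain no colon")
    return "%s:%s" % (filename, stream_name)


def write_stream(filename, stream_name, text):
    """Write text to a secondary stream (assumes an NTFS volume)."""
    with open(stream_path(filename, stream_name), "w") as f:
        f.write(text)


def read_stream(filename, stream_name):
    """Read the contents of a secondary stream (assumes an NTFS volume)."""
    with open(stream_path(filename, stream_name), "r") as f:
        return f.read()


# Only attempt the round trip on Windows, where the colon syntax applies.
if __name__ == "__main__" and os.name == "nt":
    write_stream("bar.txt", "title", "foo")
    print(read_stream("bar.txt", "title"))
```

Keeping the path construction in one function means the colon rule is written down once, and a stream name that would produce an ambiguous path is rejected before any file is touched.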

The potential for misuse

If you start writing large amounts of data to secondary streams, you're going to create problems for users of the software. They'll run out of disk space, yet none of the file sizes that Explorer or the console reports will account for it. When you use streams in moderation, they can be invaluable tools for keeping key configuration data out of the user's sight.

But people can misuse streams. For instance, worms and viruses could use them maliciously. I don't know of any virus scanners that check secondary streams for malicious code. As I said earlier, secondary streams can store any type of data, and that includes executable code. Hopefully the anti-virus companies catch this problem before hackers start using streams in their malicious code.

Update your application with BITS

BITS (Background Intelligent Transfer Service) is a new service that ships with Windows XP and later. It allows large files to be uploaded or downloaded in the background, and it can notify you when the transfer is complete.

What’s it good for?

Windows XP's version of Windows Update makes heavy use of BITS to download updates in the background while the user is doing other things or while the system is unattended. I would say that following Microsoft's lead here is the way to go: you can easily use BITS to download updates for your apps automatically.

In the past, developers have written their own update code or just depended on the user to check the application's web site. Using BITS to update your application gives the user more control and is also more convenient. It also reassures more paranoid users, like me, who get nervous when they notice that the application they are using keeps going out to the Internet for no apparent reason.

Making use of BITS.

The first step in using BITS is to register a job for the queue.

//make sure to include Bits.h
//error checking has been left out
HRESULT hr = 0;
IBackgroundCopyManager* TransManager = NULL;
IBackgroundCopyJob *pCopyJob = NULL;
GUID ID;

//Your threading model here
hr = CoInitializeEx(NULL, COINIT_APARTMENTTHREADED);

//Set the impersonation level to RPC_C_IMP_LEVEL_IMPERSONATE
hr = CoInitializeSecurity(NULL, -1, NULL, NULL,
                          RPC_C_AUTHN_LEVEL_CONNECT,
                          RPC_C_IMP_LEVEL_IMPERSONATE,
                          NULL, EOAC_NONE, 0);

//Create an IBackgroundCopyManager instance.
hr = CoCreateInstance(__uuidof(BackgroundCopyManager), NULL,
                      CLSCTX_LOCAL_SERVER,
                      __uuidof(IBackgroundCopyManager),
                      (void**) &TransManager);

//Create the download job; its GUID comes back in ID
hr = TransManager->CreateJob(L"DemoJob",
                             BG_JOB_TYPE_DOWNLOAD,
                             &ID, &pCopyJob);

//Add files to the job
hr = pCopyJob->AddFile(L"http://MyServer/Path/MyFile.Ext", L"c:\\Path\\MyFile.Ext");

One limitation of BITS that I think is pretty severe is that the content on the server cannot be dynamic. For instance, you can't specify a URL like http://www.myserver.com/Update.asp?CurrentVersion=1.0 and have the server give you the right version or tell you that you're up to date. One simple solution is to have your installer set up the new job and delete the old one: whenever an update is installed, the installer removes the old job and creates a new job pointing at the future file name.
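To make the "future file name" idea concrete, here is a minimal sketch in Python of how an installer might derive the static URL for the next job. Everything in it is an assumption made for illustration: the function name next_update_url, the MyApp-major.minor.exe naming scheme, and the rule that the next release simply bumps the minor version. The only real requirement is that the installer and the server agree on the name in advance.

```python
def next_update_url(base_url, installed_version):
    """Return the static URL where the release after installed_version is expected.

    Assumes a hypothetical 'major.minor' version string and a hypothetical
    file naming scheme of MyApp-<major>.<minor>.exe on the server.
    """
    major, minor = (int(part) for part in installed_version.split("."))
    return "%s/MyApp-%d.%d.exe" % (base_url.rstrip("/"), major, minor + 1)


print(next_update_url("http://MyServer/Path", "1.0"))
# prints http://MyServer/Path/MyApp-1.1.exe
```

Because the URL is fixed at install time, the job simply fails to find the file until the next release is published under that name.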

Checking the status of a job.

There are a number of ways to check the status of a job. You can poll for the job's status and perform some action based on it, or you can register to receive a notification when the job has succeeded or failed. You can actually go one step further and register a program to be executed when the job succeeds.

The last option probably makes the most sense for updating an application. In our example application, the installer registers a job with BITS and also registers a program to be run when the job succeeds, with a command-line argument consisting of the job ID. When the transfer job succeeds, our little program runs, retrieves the job information (including the path of the downloaded installer), and runs the installer. Voilà, our application is updated.

//Continuing from previous code.
WCHAR wJobID[48];
WCHAR wParameters[257];
IBackgroundCopyJob2* pCopyJob2 = NULL;
const WCHAR *wProgram = L"C:\\program files\\MyApp\\UpdateUtil.exe";

//Build the command line: program name followed by the job ID
StringFromGUID2(ID, wJobID, 48);
wsprintfW(wParameters, L"%s %s", wProgram, wJobID);

//SetNotifyCmdLine is a method of IBackgroundCopyJob2
pCopyJob->QueryInterface(__uuidof(IBackgroundCopyJob2), (void**)&pCopyJob2);
hr = pCopyJob2->SetNotifyCmdLine(wProgram, wParameters);

//remember to release your COM objects
pCopyJob2->Release();
pCopyJob->Release();
TransManager->Release();

Wrap up.

One of the first useful axioms I learned when I got into this industry was, “never code what you can steal”. This isn’t exactly stealing but it’s in the same vein. The less code you write, the less code you have to test, and the less code you have to maintain.

Wednesday, July 27, 2005

Taking Advantage of COM with Python

Microsoft's Component Object Model (COM) implements a cross-language object model. COM is the basis for most of Microsoft's recent technologies, such as WMI and Microsoft Office. COM allows you to write code in one language, like C++, and use it in another, such as Python or VB. Python is capable of both using COM objects and creating COM objects. This support has been around, in one form or another, for a number of years.

Python has excellent support for COM. In fact, it is so good that COM is a legitimate way to extend Python. If you need to write a Python extension that will run exclusively on Windows, strongly consider using COM; the resulting COM objects can also be reused from other languages.

To use COM in Python, you must install the Python Win32 extensions, which can be downloaded from http://sourceforge.net/projects/pywin32/. The pywin32 package comes in a nice installer package, so compiling isn't necessary.

Using a COM component

The steps for using a COM object are the same in all languages that support COM. First you must instantiate the object. Instantiation can be built directly into the language, like JScript's ActiveXObject function, or it can be supported via a library, like Python's Dispatch function. The only difference between the two approaches is that with library-based support you have to import the library code first. Once the object is instantiated, you can use its properties, fields, and methods just as if they were part of a normal Python object.

To start, run the makepy utility on the component that you intend to use. This isn't a mandatory step, but it has some nice advantages. Specifically, named constants defined in the typelib will be available to you when you import the constants class, and if you're using the PythonWin IDE, you'll get IntelliSense for the properties and methods of the component.

The easiest way to run makepy is through the PythonWin IDE. Launch PythonWin, and under the Tools menu, choose makepy. This opens a simple little dialog that contains a list of all the registered COM components on the system.

You can use a COM component without running the makepy utility, but you'll then have to use numeric values in place of the named constants that you normally use for parameters and such.

Using a COM component in Python is nearly as simple as using the component in VB. First you need to import the COM support module:

import win32com.client as w32c

If you ran the makepy utility on the component, then the constants class has all of the named constants defined in the typelib:

from win32com.client import constants

Once the module is imported, you can instantiate the COM component simply:

Obj = w32c.Dispatch(r"Object.name")

To instantiate a COM component, you need to know the class name of the component. You can find it in the documentation for the component or, if you know the GUID, by looking in the registry for the object. However, if you run the makepy utility on the component, both the name of the component and its GUID will be listed in the generated Python file.

Once you've instantiated a component, you have access to all the properties and methods of that component. For example, the following code calls the load and transformNode methods of the MSXML control:

xmldoc = w32c.Dispatch(r"Msxml2.DOMDocument")
rval = xmldoc.load(xml)
xmldoc.transformNode(xsldoc)

The file transform.py that accompanies this article shows a complete example that demonstrates how to use the MSXML COM component. To use that example, you need to have the MSXML components installed. (MSXML is installed along with Internet Explorer, so it is usually present on most systems.)

Using COM, you can drive other applications such as Microsoft Office. For instance, Python can be combined with MS Word to create document management systems. Python COM integration is extremely powerful in this regard. On most Windows systems there are numerous COM objects installed. You can browse through them using the makepy utility. Anything that shows up in the makepy utility can be used in Python.

Creating a COM component

Let's look now at what may be the coolest part of Python's COM integration.

Python comes with a large library of modules that do not have equivalents in other languages. By using Python to create COM objects, it is easy to expose some of these modules to other languages.

Creating COM objects in other languages, such as C++, requires a lot of code and knowledge. It also requires that you be familiar with IDL, which is a separate language. So, to create a COM object in C++, you actually must know two languages.

Contrast that to creating a COM object in Python.

import md5

class MD5Obj:
    # These three fields are what turn the class into a COM object
    _public_methods_ = ["Update", "Hash", "Hex"]
    _reg_progid_ = "Python.MD5"
    _reg_clsid_ = "{895E2BD0-0DA7-11D9-9669-0800200C9A66}"
    MD5 = None

    def __init__(self):
        self.MD5 = md5.new()

    def Update(self, val):
        self.MD5.update(val)

    def Hash(self):
        return self.MD5.digest()

    def Hex(self):
        return self.MD5.hexdigest()

Notice that there are no non-Python constructs, no IDL, and no funny preprocessor macros, just plain old Python. The code wraps the MD5 module and exposes the methods that it needs. Notice also that this class does not inherit from some magical COM class. What makes this a COM class is three fields: _public_methods_, _reg_progid_, and _reg_clsid_.

_public_methods_ tells pywin32 which methods you want publicly exposed. Usually you want only the methods that are part of your public interface listed in this field. _reg_progid_ is just a string that gives the object a semi-unique name that can be used to instantiate the object. _reg_clsid_ is a GUID (Globally Unique ID) that is the actual unique name of the object; it is also called the class ID. The class ID can be used to instantiate an object, but there are various reasons not to do this; it's better to use the program ID. There are a number of utilities for creating GUIDs available on the Internet, as well as in various development environments.
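If you have a reasonably recent Python, you don't need a separate utility to create a GUID. A minimal sketch, assuming the standard uuid module is available (it is not part of the Python 2.3 mentioned elsewhere on this page); the function name new_clsid is made up for illustration:

```python
import uuid


def new_clsid():
    """Return a fresh GUID in the braced, upper-case form used for _reg_clsid_."""
    return "{%s}" % str(uuid.uuid4()).upper()


# Run this once and paste the result into the class definition.
print(new_clsid())
```

Generate the value once and hard-code it. A class ID that changes every time the script runs would register a new object each time. The pywin32 package also offers pythoncom.CreateGuid() for the same purpose.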

To have the COM object work in a development environment, you have to register it. The easiest way to do this is to add a little code that registers the object when the script file is run:

if __name__ == '__main__':
    import win32com.server.register
    win32com.server.register.UseCommandLine(MD5Obj)

That's it. Just run the script and the object is registered. There's no separate utility, like regsvr32, to worry about. You can't get much simpler than that.

Using the object from another language, such as JScript, is just like using any other COM object:

var md5 = new ActiveXObject("Python.MD5");
md5.Update("This is a test");
WScript.Echo(md5.Hash());
WScript.Echo(md5.Hex());

The full example is available in the MD5Obj.py script.
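Because the class is plain Python, its logic can be exercised before it is ever registered as a COM object. The sketch below is my own restatement of the class for Python 3, not the author's script: it swaps the old md5 module for hashlib and encodes text as UTF-8 before hashing, both of which are assumptions on my part.

```python
import hashlib


class MD5Obj:
    # Same three COM fields as in the article's class
    _public_methods_ = ["Update", "Hash", "Hex"]
    _reg_progid_ = "Python.MD5"
    _reg_clsid_ = "{895E2BD0-0DA7-11D9-9669-0800200C9A66}"

    def __init__(self):
        self.MD5 = hashlib.md5()

    def Update(self, val):
        # hashlib works on bytes; encoding text as UTF-8 is an assumption
        if isinstance(val, str):
            val = val.encode("utf-8")
        self.MD5.update(val)

    def Hash(self):
        return self.MD5.digest()

    def Hex(self):
        return self.MD5.hexdigest()


# Call the public methods directly, the way a COM client would
obj = MD5Obj()
obj.Update("This is a test")
print(obj.Hex())
```

Testing the object this way keeps the register step out of the edit-and-run loop until the interface has settled.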

Wrapping Up

Even though this was a simple example, everything shown here holds true for complex object hierarchies. You have full access to COM's facilities. You can use attributes, read-only attributes, methods, and so on. You've got access to it all as well as the full power of Python. It's all easy as (wait for it...) pie.

Consider using Python to prototype your objects before jumping into C++. This gives you the chance to evaluate your object interfaces without having to go through the build and register step each time you change something.

Who knows, you may find that there is no reason to implement the objects in C++ after all.

SIDEBAR:

Using PyScript

Windows' Component Object Model (COM) isn't the only way to integrate Python with Windows. You can also use PyScript.

PyScript integrates Python with the Windows Scripting Host (WSH), Windows' official scripting engine. WSH is interesting because it's scripting language-agnostic, and because it provides an API for adding new scripting languages to the system.

Using WSH, Python has full access to many of the core Windows features, such as mapping drives, controlling network printers, and remote scripting. WSH also allows Python to be used in login scripts. And since WSH has poor support for creating graphical user interfaces (GUIs), Python is a nice complement. By using PyScript you get everything that WSH provides as well as all the capabilities and just plain fun that Python provides.

Python's Win32 library has WSH integration, but it's not enabled by default. To enable PyScript you need to run the pyscript.py file. Its default location is C:\Python23\Lib\site-packages\win32comext\axscript\client\pyscript.py. Running this script simply registers Python with WSH. To create a PyScript file, use the pys extension instead of the py or pyw extensions.

(Because of the many exploits involving VBScript and WSH, the pyscript.py registration script does not enable Explorer associations. Considering that many network admins are actively disabling Explorer associations with .vbs and .js files, this is a smart default. If you would like to have Explorer integration, I have included a reg file that will add it for you. If you have an odd setup or use an older version of Python, you will need to edit the reg file.)

The example script, example.pys, first creates an MD5 object using Python's MD5 module.

import md5
m = md5.new()
m.update("%systemroot%\\system32\\wscript.exe")

Next, it creates a link on the desktop to the wscript.exe application. The MD5 hash is appended to the end of the link description.

WshShell = WScript.CreateObject("WScript.Shell")
Desktop = WshShell.SpecialFolders("Desktop")
link = WshShell.CreateShortcut(Desktop + "\\Wscript.lnk")
link.Description = "Link to the Windows Scripting Host" + " " + m.hexdigest()
link.IconLocation = "wscript.exe,1"
link.TargetPath = "%systemroot%\\system32\\wscript.exe"
link.Save()

Sunday, July 24, 2005

Here are two great books on Common Lisp.

Practical Common Lisp

On Lisp

They're both available for download.

An interesting Python function for working with Subversion

This is a function I wrote for working with SVN repositories. It works like os.walk.

"""
Utilities for working with Subversion repositories

"""
import pysvn


def walk (targeturl):
  """
  Walk through a Subversion repository, yielding a
  (url, dirs, files) tuple for each directory, like os.walk.

  """
  dirStack = []
  theEnd = 0
  client = pysvn.Client()
  # prompt, notify, logmessage and endOfPath are helpers not shown in this post
  client.callback_get_login = prompt
  client.callback_notify = notify
  client.callback_get_log_message = logmessage

  ls = client.ls(targeturl, recurse = False)
  files = []
  dirs = []
  for item in ls:
      if item["kind"] == pysvn.node_kind.dir:
          dirStack.append(item)
          dirs.append(endOfPath(item["name"]))
      if item["kind"] == pysvn.node_kind.file:
          files.append(endOfPath(item["name"]))
  yield (targeturl, dirs, files)

  while len(dirStack):
      files = []
      dirs = []
      base = dirStack[theEnd]
      del dirStack[theEnd]
    
      currentDir = base["name"]
      ls = client.ls(currentDir, recurse = False)
      for item in ls:
          if item["kind"] == pysvn.node_kind.dir:
              name = endOfPath(item["name"])
              dirs.append(name)
              dirStack.append(item)
          if item["kind"] == pysvn.node_kind.file:
              name = endOfPath(item["name"])
              files.append(name)

      yield (currentDir, dirs, files)


    
  return
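The listing calls a helper, endOfPath, that isn't shown. A minimal sketch of what such a helper might look like, assuming its job is to return the last component of a slash-separated repository path or URL (this is my guess at its behavior, not the author's code):

```python
def endOfPath(path):
    """Return the last component of a '/'-separated repository path or URL."""
    # Drop any trailing slash first so a directory URL still yields its name
    return path.rstrip("/").rsplit("/", 1)[-1]


# Hypothetical usage of walk, mirroring os.walk (the URL is a placeholder):
#
#   for url, dirs, files in walk("http://svn.example.com/repo/trunk"):
#       print(url, dirs, files)
```

With a helper like this, the dirs and files lists hold bare names, while the first element of each tuple stays a full URL, the same split that os.walk makes between a directory path and its entries.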

Obligatory first post

I am a software engineer located in western Washington. I tend to use a lot of different programming languages in my development projects. Since I enjoy learning new languages, this works out well. I also enjoy experimenting with new gadgets. For instance, this post is being written from an Ogo, which is a little device that only does email and IM.
