Jump to content

Main menu Navigation ●Main page ●Contents ●Current events ●Random article ●About Wikipedia ●Contact us ●Donate Contribute ●Help ●Learn to edit ●Community portal ●Recent changes ●Upload file

●Create account ●Log in ●Create account ● Log in Pages for logged out editors learn more ●Contributions ●Talk

User:JesseW/wdb format

●User page ●Talk ●Read ●Edit ●View history Tools Actions ●Read ●Edit ●View history General ●What links here ●Related changes ●User contributions ●User logs ●View user groups ●Upload file ●Special pages ●Permanent link ●Page information ●Get shortened URL ●Download QR code Print/export ●Download as PDF ●Printable version Appearance From Wikipedia, the free encyclopedia < User:JesseW

Yesterday, I was presented with a file format reverse engineering puzzle. My mom had a old address book database that she wanted to recover. It had been created in Microsoft Works 6.0 (I think) database format (the files with the .wdb extension). My first thought was to run strings(1) on it, and see if the data was in connected ascii text. This worked, mostly, as the text did show up; but it was not organized into fields and records, which would make it harder to recreate a database from it. When she explained that she was going to need the whole thing, not just a few entries, I said I would work on recovering more of the format.

My first attempts at finding a wdb parser involved a web search. Google can parse wdb files - but they don't publish their translation programs, and I didn't want to upload our personal address book to the web and wait for google to index it. I found various commerical programs that claimed to be able to recover wdb files, but I wanted to understand the format, not just buy a product which, might, or might not, work. I was unable to find any discussion in the free software/open source world on the details of the format(which is the main reason I am writing this right now). I did find various tutorials on reverse engineering, including a nice page on reversing file formats by John Rittenhouse, and the Reverse Engineering Wikibook. Both of these helped give me some pointers in my work. I also found a few other wdb files to examine via google, which I used a little in my work on the format.

Since I was unable to find a already written tool to do the job, I decided to investigate the format and write one myself. I used Python, along with Emacs, (and a little use of hexdump) as my tools of choice, as they are what I am most familiar with. I examined the files fruitlessly for a while - identifying various often used characters, but not getting a sense of what they meant. The breakthrough happened when I realized there was a recurring, 10 byte header before each of the ascii text strings representing the values of the fields(which were terminated by a NULL). This header always began with a \x0f byte and NULL, followed by 4 data bytes alternated with NULLs.

The format, as I currently understand it, is as follows:

A header, from the beginning of the file, ending with \x25\x00\xf2 and 244 NULL bytes
A 2nd header-like piece beginning with \xffT - i.e. \xff\x54 and extending for 4096 bytes, that contains column/field names and other things, and seems to begin at byte position 6144, i.e. 2**12+2**11.
Each field value has a 10 byte header, with the following format: {type byte} {type byte, part 2} {data byte 1} \x00 {db 2} \x00 {db 3} {db 3, part 2} {db 4} \x00. data byte 2 is the field number, data byte 3 is the record number (adding data byte 3, part 2 when it goes over 256 records). I don't know what data bytes 1 or 4 do; although in the address book, data byte 1 has 35 different values, and data byte 4 is nearly always \x03, most of the rest of it is \x02, and \x00 in only three lines.
- The type byte's I've seen are:
  
  "6\x00" (0x36)
  
  which serves as a continuation line. I'm not sure when it is used.
  
  "\x0f\x00" (0x0f)
  
  strings, null terminated.
  
  "\x0e\x00" (0x0e)
  
  standard double float (I think, although this may be wrong...)
  
  "[T" (0x5b)
  
  propritary varient of standard float. The exponent, rather than being 7 bits, is extended to 11 bits, shrinking the mantissa. If this is taken into account, integers expressed in this form work. However, decimals do not. I don't yet know why.

I've written a program to parse the format and translate it to CSV format; it is available here. If you know anything else about the format, or have any questions, please let me know. Any feedback gratefully accepted.

This tool works fine for me! thanks for the job. --eldraco (talk) 00:01, 11 May 2008 (UTC)

Retrieved from "https://en.wikipedia.org/w/index.php?title=User:JesseW/wdb_format&oldid=639533853" ●This page was last edited on 25 December 2014, at 00:58 (UTC). ●Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization. ●Privacy policy ●About Wikipedia ●Disclaimers ●Contact Wikipedia ●Code of Conduct ●Developers ●Statistics ●Cookie statement ●Mobile view