
Bookmark utility part 5 - Importing bookmarks

Parsing and building trees.

2002 - Week 45 - Havard Rast Blok

This week features three fairly large tasks. The concept is not difficult to grasp, but the solutions might involve more Java code than other exercises you have seen on Remember Java. The task, or tasks, is to import bookmarks from other programs, namely the three major browsers: Internet Explorer, Netscape and Opera. They all save their bookmarks differently, but all use text files with a more or less simple format. I will reveal the formats to you. So let's start! :)

In the first three exercises you will parse the bookmarks of the browsers and construct a Java tree structure from the information. The tree could then be viewed in a JTree or saved to disk in a new format. Clearly, this is useful in a bookmark application.

Internet Explorer

Microsoft's browser features the simplest file format, because each link is stored in a separate file. The tree structure of the favorites is realized by the actual directory structure on your disk. The name of the file, without the extension .url, is the title of the bookmarked page. Inside each file, you will find that it always has "[InternetShortcut]" on the first line. The URL of the page is in clear text on the line which starts with "URL=".

What you have to do is start your program in the base directory of the favorites. In Windows 2000, this would be C:\Documents and Settings\<username>\Favorites. Read all the files with the extension .url, and collect their names and URLs. Then gather all the subdirectories of Favorites and repeat this algorithm for each of them.
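The algorithm above could be sketched roughly like this. It is only an illustration, not the solution file: the class name IEFavoritesSketch, the small Node class and the use of java.nio.file (Java 7 or later) are my own assumptions, and your tree structure from last week may look different.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class IEFavoritesSketch {

    /** Hypothetical tree node: a folder has url == null, a link has no children. */
    static class Node {
        final String name;
        final String url;
        final List<Node> children = new ArrayList<>();
        Node(String name, String url) { this.name = name; this.url = url; }
    }

    /** Reads one directory: a link per .url file, a sub folder per subdirectory. */
    static Node readFolder(Path dir) throws IOException {
        Path last = dir.getFileName();
        Node folder = new Node(last == null ? dir.toString() : last.toString(), null);
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (Files.isDirectory(entry)) {
                    folder.children.add(readFolder(entry)); // repeat for the subdirectory
                } else if (name.toLowerCase().endsWith(".url")) {
                    String url = readUrl(entry);
                    if (url != null) {
                        // The file name without ".url" is the title of the page.
                        String title = name.substring(0, name.length() - 4);
                        folder.children.add(new Node(title, url));
                    }
                }
            }
        }
        return folder;
    }

    /** Returns the text after "URL=" in a shortcut file, or null if there is none. */
    static String readUrl(Path file) throws IOException {
        // ISO-8859-1 is an assumption; it accepts any byte sequence without failing.
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("URL=")) {
                    return line.substring(4).trim();
                }
            }
        }
        return null;
    }
}
```

You would call readFolder with the path of the Favorites directory. Note that the order of entries within a directory is not guaranteed, so sort the children if you need a stable order.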

Hint: For retrieving and traversing files and directories, see the last questions of Week 50 - 2001. For storing tree structures in Java, see last week's exercises.


IEBookmarks.java

Opera

Opera may not be the most famous browser, but it is the best alternative browser on the market, available on eight different operating systems including Windows, Linux and Mac OS. If you are not familiar with this browser, you might want to download it for free from their website and check it out.

Opera also features a fairly simple way to store its bookmarks. They are stored in a single file, "opera6.adr", in the preference or user directory, rather than in several files and directories, but the file is quite straightforward to parse. Each entity, URL or folder, is encoded in four or six lines. The first line states the type, "#FOLDER" or "#URL", on a single line. The next line lists the name of the entity after "NAME=". If it is a folder, this is all you need, but if it is a URL, you need to read the next line as well, starting with "URL=".

The file is encoded in pre-order, which means that a folder name is stated first, followed by all the entities of that folder, including sub folders (which are also in pre-order). This means that you first need to create a root folder in your program. If a URL is read, you just add it to the current folder. If a new folder is read, it becomes a sub folder of the current one, recursively. When you read a single "-" (hyphen) on a line, you have reached the end of a folder, and you need to step up to the previous level of the tree hierarchy.
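A minimal sketch of this parsing could look as follows. It only handles the lines described above and skips everything else. The class name OperaBookmarksSketch and the Node class are made up for the illustration, and instead of recursion it keeps the currently open folders on an explicit stack, which amounts to the same thing. Trimming each line is a defensive assumption in case the lines are indented.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class OperaBookmarksSketch {

    /** Hypothetical tree node: a folder has url == null, a link has no children. */
    static class Node {
        final String name;
        final String url;
        final List<Node> children = new ArrayList<>();
        Node(String name, String url) { this.name = name; this.url = url; }
    }

    static Node parse(BufferedReader in) throws IOException {
        Node root = new Node("Opera bookmarks", null);
        Deque<Node> open = new ArrayDeque<>(); // folders we are currently inside
        open.push(root);
        String type = null; // "#FOLDER" or "#URL" while an entity is being read
        String name = null;
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.equals("#FOLDER") || line.equals("#URL")) {
                type = line;
                name = null;
            } else if (line.startsWith("NAME=")) {
                name = line.substring(5);
                if ("#FOLDER".equals(type)) {
                    // The name is all a folder needs: add it and step into it.
                    Node folder = new Node(name, null);
                    open.peek().children.add(folder);
                    open.push(folder);
                    type = null;
                }
            } else if (line.startsWith("URL=") && "#URL".equals(type)) {
                open.peek().children.add(new Node(name, line.substring(4)));
                type = null;
            } else if (line.equals("-") && open.size() > 1) {
                open.pop(); // end of folder: step up one level
            }
        }
        return root;
    }
}
```

To try it on a real file, wrap a FileReader for "opera6.adr" in a BufferedReader and pass it to parse.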


OperaBookmarks.java

Netscape Navigator or Firefox

Navigator features the most complex encoding for its bookmarks. They are stored in an HTML format, so you might think they would be easy to parse, but it turns out that there are a few things to look out for here.

As with the other browsers, there are two entities: URLs and folders. Folder names are within <H3>...</H3> tags, but make sure you only read the text between the opening and closing tags, and not the attributes inside the opening tag. The sub entities of a folder are surrounded by a <DL> and a matching </DL> at the bottom. Finally, a URL is encoded as a link, e.g.: <A HREF="http://...">Page title...</A>. There is only one link on each line.
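One possible way to do this is line by line with regular expressions, as sketched below. The class name NetscapeBookmarksSketch and the Node class are made up for the illustration. The sketch assumes that each folder name and each link sits on its own line, and that the contents of a folder end at a line containing a closing </DL> tag, which is how the bookmark files I have seen are laid out. A real HTML parser would be more robust.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NetscapeBookmarksSketch {

    /** Hypothetical tree node: a folder has url == null, a link has no children. */
    static class Node {
        final String name;
        final String url;
        final List<Node> children = new ArrayList<>();
        Node(String name, String url) { this.name = name; this.url = url; }
    }

    // Group 1 is the text between the tags; attributes inside <H3 ...> are skipped.
    private static final Pattern FOLDER =
            Pattern.compile("<H3[^>]*>(.*?)</H3>", Pattern.CASE_INSENSITIVE);
    // Group 1 is the HREF value, group 2 the page title.
    private static final Pattern LINK =
            Pattern.compile("<A\\s+[^>]*?HREF=\"([^\"]*)\"[^>]*>(.*?)</A>",
                    Pattern.CASE_INSENSITIVE);

    static Node parse(BufferedReader in) throws IOException {
        Node root = new Node("Netscape bookmarks", null);
        Deque<Node> open = new ArrayDeque<>(); // folders we are currently inside
        open.push(root);
        String line;
        while ((line = in.readLine()) != null) {
            Matcher folder = FOLDER.matcher(line);
            Matcher link = LINK.matcher(line);
            if (folder.find()) {
                Node sub = new Node(folder.group(1), null);
                open.peek().children.add(sub);
                open.push(sub); // the following entries belong to this folder
            } else if (link.find()) {
                open.peek().children.add(new Node(link.group(2), link.group(1)));
            } else if (line.toUpperCase().contains("</DL>") && open.size() > 1) {
                open.pop(); // end of the folder's entries: step up one level
            }
        }
        return root;
    }
}
```

The check open.size() > 1 keeps the root in place when the outermost list is closed at the end of the file.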


NetscapeBookmarks.java

Finally, this week, you could combine all of the above parsers into one program. And of course your program would not only read the different formats; the user would also have the choice to save bookmarks in the desired format.
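As an illustration of the saving part, here is a rough sketch that writes a tree back out in the Opera layout described earlier, using a pre-order traversal. The class name OperaWriterSketch and the Node class are made up, and the output contains only the lines discussed in this exercise. A file that the real browser accepts may well need additional lines, so treat this as a starting point rather than a finished exporter.

```java
import java.util.ArrayList;
import java.util.List;

class OperaWriterSketch {

    /** Hypothetical tree node: a folder has url == null, a link has no children. */
    static class Node {
        final String name;
        final String url;
        final List<Node> children = new ArrayList<>();
        Node(String name, String url) { this.name = name; this.url = url; }
    }

    /** Writes the entries of the root folder; the root itself is not written. */
    static String writeAll(Node root) {
        StringBuilder out = new StringBuilder();
        for (Node child : root.children) {
            write(child, out);
        }
        return out.toString();
    }

    /** Pre-order: the folder first, then everything inside it, then "-". */
    static void write(Node node, StringBuilder out) {
        if (node.url != null) {
            out.append("#URL\n");
            out.append("NAME=").append(node.name).append('\n');
            out.append("URL=").append(node.url).append('\n');
        } else {
            out.append("#FOLDER\n");
            out.append("NAME=").append(node.name).append('\n');
            for (Node child : node.children) {
                write(child, out);
            }
            out.append("-\n"); // end of this folder
        }
    }
}
```

The same traversal works for the other formats; only the text written for a folder, a link and the end of a folder changes.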



site: Håvard Rast Blok
updated: 16 July 2010