Soup for Squeak

19 January, 2009

Zulq Alam has been working on Soup, a Squeak port of Beautiful Soup, the tolerant HTML/XML parser written in Python, which is extremely useful when you need to scrape data from a web page. He has recently announced a working release and given some examples of its usage.

Zulq notes that there’s still plenty of work to do on this port:

  • No attempt is made to deal with different character sets and encodings.
  • The parser will not convert entity or character references.
  • The parser will not accept options such as whether to convert entities, which entities to convert, what to parse, etc.
  • The parser handles only HTML; there are no configurations for other XML flavours yet.
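
To make the point about entity and character references concrete, here is a small sketch of what that conversion looks like. It is written in Python, uses only the standard library's html module, and is not Soup's or Beautiful Soup's API; the function name convert_references and the sample string are made up for illustration.

```python
from html import unescape


def convert_references(text):
    """Replace HTML entity and character references with the characters they name.

    Hypothetical helper for illustration; it simply wraps html.unescape.
    """
    return unescape(text)


# A named entity (&amp;), a numeric character reference (&#169;),
# and another named entity (&eacute;) in one made-up string.
raw = "Fish &amp; Chips &#169; caf&eacute;"
converted = convert_references(raw)
print(converted)
```

With the current Soup release, text like the raw string above would presumably come back with its references intact, so a step of this kind would have to be applied separately.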

He adds that the project repository is globally writable, and he looks forward to your feedback and contributions.
