Tag: XML

  • Soup for Squeak

    Zulq Alam has been working on Soup, a Squeak port of Beautiful Soup, the tolerant HTML/XML parser written in Python that is extremely useful when you need to scrape data from a web page. He recently announced a working release and gave some examples of its usage.

    Zulq notes that there’s still plenty of work to do on this port:

    • No attempt is made to deal with different character sets and encodings.
    • The parser will not convert entity or char references.
    • The parser will not accept options such as whether to convert entities, which entities to convert, what to parse, etc.
    • The parser will only do HTML; there are no configurations for other XML flavours yet.
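
    To make the second limitation concrete, here is a minimal sketch of what converting entity and character references means. It is written in Python with the standard library's `html.unescape`, not with Soup's own API (which is not shown in this post); the helper name `convert_references` and the sample string are made up for illustration.

```python
import html


def convert_references(text: str) -> str:
    """Replace named entities and numeric character references
    with the characters they stand for.

    Hypothetical helper for illustration; it simply wraps the
    standard library's html.unescape.
    """
    return html.unescape(text)


# A named entity (&amp;), a decimal reference (&#169;) and a
# hexadecimal reference (&#xA9;), as they might appear in scraped HTML.
raw = "Fish &amp; chips &#169; 2008 &#xA9;"

print(raw)                      # the text as a non-converting parser returns it
print(convert_references(raw))  # the text after conversion
```

    A parser that skips this step hands back the raw references, so the conversion would presumably have to be done by the caller after parsing.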

    He adds that the project repository is globally writable, and he looks forward to your feedback and contributions.