Outline of speeches given at Reed College (OCT 2002)
and at Multimedia Internet Developers Group (SEP 2002)
The files associated with this demonstration can be found at www.reed.edu/~cosmo/2002/ant/
XML is a simplified subset of SGML, the parent of HTML. SGML itself is still in use.
well-formed vs. valid
Well-formedness is valuable because the application doesn't have to worry about syntax, and you don't have to worry about IE "guessing" what you meant to code.
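A minimal sketch of a well-formedness check, assuming Python and its standard xml.etree.ElementTree module (neither is part of the demo files; the sample strings are made up). It checks syntax only: ElementTree does not validate against a DTD or schema, so "valid" is a separate question.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False on any syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples: the second never closes its <i> element.
good = "<note>Don't click <i>panic</i></note>"
bad = "<note>Don't click <i>panic</note>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

A browser would try to render the second sample anyway; an XML parser refuses it, so nothing downstream has to guess.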
DTD: limited control reduces usefulness; non-XML syntax is annoying; no embedded documentation
XML Schema: not as widely implemented, but complex, flexible, transformable (into documentation).
Samples courtesy XML resume library
All XML captures at least some meta-data by virtue of its tag names.
Adding meta-data is easy. e.g. course lecture with
Adding functionality (i.e. new elements and attributes) doesn't break the format, which increases its life span.
Example: Cenquest course topic overview with graphic direction and audio direction within.
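A sketch of why new elements don't break old consumers, assuming Python's standard xml.etree.ElementTree. The element names (overview, topic, title, graphic-direction, audio-direction) are hypothetical stand-ins, not the actual Cenquest format.

```python
import xml.etree.ElementTree as ET

def topic_titles(xml_text):
    """An 'old' consumer: reads the <title> of each <topic>, ignores the rest."""
    root = ET.fromstring(xml_text)
    return [topic.findtext("title") for topic in root.findall("topic")]

# Version 1 of a hypothetical course overview.
v1 = "<overview><topic><title>Unicode</title></topic></overview>"

# Version 2 adds an attribute and two new elements.
v2 = """<overview>
  <topic id="t1">
    <title>Unicode</title>
    <graphic-direction>Show the code chart</graphic-direction>
    <audio-direction>Read slowly</audio-direction>
  </topic>
</overview>"""

print(topic_titles(v1))
print(topic_titles(v2))
```

The old reader returns the same titles for both versions because it asks for elements by name and never depends on position or on the absence of anything else.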
In XML an entity is a piece of boilerplate, analogous to a server-side include
Unlike HTML, where entities (e.g. "&copy;" for ©) represent special chars
In XML use Unicode character references (&#169; or &#xA9; for ©) to represent special chars
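A sketch of both uses, assuming Python's standard xml.etree.ElementTree. The entity name "venue" and the element name "c" are made up for illustration. It shows an entity declared as boilerplate, the decimal and hex character references for the copyright sign, and that the HTML name &copy; is not known to a plain XML parser.

```python
import xml.etree.ElementTree as ET

# An entity as boilerplate: declared once in the internal DTD subset,
# expanded by the parser wherever &venue; appears.
boilerplate = ET.fromstring(
    '<!DOCTYPE c [<!ENTITY venue "Reed College">]><c>Given at &venue;</c>'
).text

# Numeric character references: decimal 169 and hex A9 are the same code point.
decimal = ET.fromstring("<c>&#169; 2002</c>").text
hexadecimal = ET.fromstring("<c>&#xA9; 2002</c>").text

# &copy; is an HTML entity; XML predefines only lt, gt, amp, apos and quot.
try:
    ET.fromstring("<c>&copy; 2002</c>")
    html_name_accepted = True
except ET.ParseError:
    html_name_accepted = False

print(boilerplate)
print(decimal == hexadecimal)
print(html_name_accepted)
```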
Side note: HTML within XML
3 ways to include HTML within XML so you can add emphasis and underline citations.
- nested elements: <note>Don't click <i>panic</i></note>
- entity escape: <note>Don't click &lt;i&gt;panic&lt;/i&gt;</note>
- CDATA section: <note><![CDATA[Don't click <i>panic</i>]]></note>
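What the parser hands back for each of the three approaches, sketched with Python's standard xml.etree.ElementTree (an assumption; any XML parser behaves the same way here). Only the nested form produces a real i element; the other two deliver the markup as plain text that the consumer must interpret later.

```python
import xml.etree.ElementTree as ET

nested = ET.fromstring("<note>Don't click <i>panic</i></note>")
escaped = ET.fromstring("<note>Don't click &lt;i&gt;panic&lt;/i&gt;</note>")
cdata = ET.fromstring("<note><![CDATA[Don't click <i>panic</i>]]></note>")

# Nested elements: the parser sees structure (one child named "i").
print([child.tag for child in nested], repr(nested.text))

# Entity escape and CDATA: no children, just a string containing "<i>".
print(len(escaped), repr(escaped.text))
print(len(cdata), repr(cdata.text))
```

The escaped and CDATA forms are indistinguishable after parsing; CDATA is only easier to type.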
Unicode assigns a number (a code point) to every character in every current language and many past languages, and leaves room for languages that do not exist yet.
Unicode breaks the one-byte-per-character assumption, and with it much existing software.
Java, however, is Unicode-native.
Unicode Encoding
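ISO-8859-1: 1 byte/char; same as Windows-1252 except that bytes 128-159 are control codes rather than printable chars
UTF-8: 1 byte/char for ASCII; 2 to 4 bytes/char for everything else (accented Western European letters take 2)
UTF-16: 2 bytes/char for almost all languages in current use; 4 bytes/char for characters outside the Basic Multilingual Plane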
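The byte counts can be checked directly. A small sketch in Python (an assumption, as are the four sample characters); "utf-16-be" is used so that no byte-order mark is added to the count.

```python
def byte_length(char, encoding):
    """Bytes needed to store char in encoding, or None if it cannot be encoded."""
    try:
        return len(char.encode(encoding))
    except UnicodeEncodeError:
        return None

SAMPLES = {
    "LATIN CAPITAL A": "A",
    "E WITH ACUTE": "\u00e9",
    "EURO SIGN": "\u20ac",
    "MUSICAL G CLEF": "\U0001d11e",  # outside the Basic Multilingual Plane
}
ENCODINGS = ["iso-8859-1", "utf-8", "utf-16-be"]

table = {
    name: [byte_length(char, encoding) for encoding in ENCODINGS]
    for name, char in SAMPLES.items()
}
for name, sizes in table.items():
    print(name, dict(zip(ENCODINGS, sizes)))

# The same byte means different things in the two 1-byte encodings.
byte_0x80_cp1252 = b"\x80".decode("cp1252")      # the euro sign
byte_0x80_latin1 = b"\x80".decode("iso-8859-1")  # a C1 control code
```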
Concept of parser internal representation - all encodings become the same
M.a.n.g.l.e.d. .U.T.F: on Linux and OS X, use hexdump -C myfile to troubleshoot (e.g. an em dash)
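A sketch of what that troubleshooting reveals, in Python (an assumption; the helper names are made up). ascii_column imitates the right-hand column of hexdump -C, which prints a dot for every non-printable byte, so UTF-16 text shows up with a dot after each ASCII letter. The second half shows the three UTF-8 bytes of an em dash and what they become when misread as Windows-1252.

```python
def hex_column(data):
    """Space-separated hex bytes, like the middle column of `hexdump -C`."""
    return " ".join(f"{byte:02x}" for byte in data)

def ascii_column(data):
    """Right-hand column of `hexdump -C`: printable ASCII as-is, else '.'."""
    return "".join(chr(byte) if 0x20 <= byte <= 0x7E else "." for byte in data)

# UTF-16 (little-endian) puts a zero byte after every ASCII character.
utf16_bytes = "Mangled UTF".encode("utf-16-le")
print(ascii_column(utf16_bytes))

# An em dash (U+2014) is three bytes in UTF-8.
em_dash_utf8 = "\u2014".encode("utf-8")
print(hex_column(em_dash_utf8))

# Read those bytes as Windows-1252 and one character becomes three.
misread = em_dash_utf8.decode("cp1252")
print(len(misread), [hex(ord(char)) for char in misread])
```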
A Glyph is the visual representation of a character. Dash and minus might share the same glyph.
Unicode can't help if your fonts don't contain the glyph. Will your system be able to extract it from a font like Adobe Symbol? Maybe.
If the data make sense as a document, a source control system can be a useful repository
Version control systems such as CVS or Visual SourceSafe offer security, versioning, differencing, and sharing.
Process on Client, as pushed by Macromedia, Microsoft?
Process on Server, as pushed by Sun, IBM, application server vendors?
Process at author-time or periodically with build system like Ant!
Open Source XML tools at Apache Jakarta and Apache XML
Java-based, truly cross-platform
use as build tool
Build file format, explanation of targets, tasks, and filesets
Copy Task
Copy Task with wild card
Delete Task
Replace Task
XSLT Task
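A sketch of the build file format, embedded in a Python string so it can be inspected with the standard xml.etree.ElementTree (Python is an assumption). The task names and attributes are Ant's own; the project name, target names, paths, and the @DATE@ token are hypothetical and are not taken from the demo files. The helper lists each target with the targets it depends on and the tasks it runs.

```python
import xml.etree.ElementTree as ET

# Hypothetical build.xml: three targets, each a list of tasks.
BUILD_XML = """\
<project name="demo" default="html" basedir=".">
  <target name="clean">
    <delete dir="build"/>
  </target>
  <target name="prepare" depends="clean">
    <!-- Copy task: one file -->
    <copy file="src/linklist.xml" tofile="build/linklist.xml"/>
    <!-- Copy task with wild card: a fileset selects the files -->
    <copy todir="build">
      <fileset dir="src" includes="**/*.xsl"/>
    </copy>
    <!-- Replace task: swap a token for a value inside a file -->
    <replace file="build/linklist.xml" token="@DATE@" value="OCT 2002"/>
  </target>
  <target name="html" depends="prepare">
    <xslt in="build/linklist.xml" out="build/linklist.html"
          style="build/LinkList2html.xsl"/>
  </target>
</project>
"""

def describe(project):
    """Map each target name to (targets it depends on, task names in order)."""
    summary = {}
    for target in project.findall("target"):
        depends = [d.strip() for d in target.get("depends", "").split(",") if d.strip()]
        tasks = [task.tag for task in target]
        summary[target.get("name")] = (depends, tasks)
    return summary

project = ET.fromstring(BUILD_XML)
summary = describe(project)
print("default target:", project.get("default"))
for name, (depends, tasks) in summary.items():
    print(name, "depends on", depends, "runs", tasks)
```

Running ant with no arguments would start at the default target and work backwards through depends, so clean runs first, then prepare, then html.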
XSLT transforms XML into another XML format or into non-XML
template-driven (declarative rather than procedural; i.e. more about what to get done than how to do it)
<xsl:template > and <xsl:value-of>
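A toy analogy for the template-driven style, written in Python rather than XSLT (Python's standard library has no XSLT processor, so this is an illustration of the idea, not an XSLT engine). Each rule says what to emit for one element name; the rules never say in which order to walk the document. The element names linklist, link, and title are hypothetical.

```python
import xml.etree.ElementTree as ET

TEMPLATES = {}

def template(match):
    """Register a rule for one element name, loosely like <xsl:template match=...>."""
    def register(rule):
        TEMPLATES[match] = rule
        return rule
    return register

def apply_templates(node):
    """Render each child with whichever rule matches its tag; skip the rest."""
    return "".join(
        TEMPLATES[child.tag](child) for child in node if child.tag in TEMPLATES
    )

@template("linklist")
def linklist_rule(node):
    return "<ul>" + apply_templates(node) + "</ul>"

@template("link")
def link_rule(node):
    # findtext stands in for <xsl:value-of select="title"/>; no escaping is done.
    return "<li>" + node.findtext("title") + "</li>"

def transform(xml_text):
    root = ET.fromstring(xml_text)
    return TEMPLATES[root.tag](root)

result = transform(
    "<linklist>"
    "<link><title>Reed</title></link>"
    "<link><title>Ant</title></link>"
    "</linklist>"
)
print(result)
```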
XPath: a language for finding any node within an XML doc (analogous to, but much more powerful than, directory paths).
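A few path expressions, sketched with Python's standard xml.etree.ElementTree, which supports only a subset of XPath (child steps, .//, and simple attribute predicates). The document structure and URLs are hypothetical, not the actual linklist format from the demo.

```python
import xml.etree.ElementTree as ET

# Hypothetical link list document.
DOC = """\
<linklist target="www.reed.edu">
  <link rank="1"><title>First</title><url>http://example.org/a</url></link>
  <link rank="2"><title>Second</title><url>http://example.org/b</url></link>
</linklist>
"""
root = ET.fromstring(DOC)

# Like a directory path: every url that is a child of a link child.
urls = [url.text for url in root.findall("./link/url")]

# More than a directory path: filter on an attribute value.
second_title = root.find("./link[@rank='2']/title").text

# Any depth: every title element anywhere below the root.
all_titles = [title.text for title in root.findall(".//title")]

print(urls)
print(second_title)
print(all_titles)
```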
Google api request as XML SOAP return
This example Ant buildfile asks Google for a list of sites linked to the target sites, then transforms the information as outlined below.
Direct XSLT
SOAP to linklist Google2LinkList.xsl
linklist to html LinkList2html.xsl
linklist to word html (as above but change the extension to .doc)
linklist to wml LinkList2wml.xsl
PDF and Postscript: FOP
linklist to Formatting Objects (XSL:FO) LinkList2fo.xsl
XSL:FO to PDF
SVG and JPEG: Batik
linklist to SVG LinkList2SVG.xsl
SVG to JPEG, TIFF
Macromedia Flash: javaswf2
swf2xml and xml2swf