I never thought I would hear myself saying this, but I think the world needs another file format for storing images.
Like many people in this industry, I have often had to fight the file format fight, converting images endlessly from format A to format B and back again to achieve some result or work around some application limitation. More than once I have said to myself "It's only pixels, darn it! How many sensible ways can there possibly be to store these things?" And now I find myself advocating the creation of another one? What gives?
Here is where my head is at. In my day job I regularly come across situations where very tight control over the presentation of textual information is required. Situations in which it is important to know for sure that information appears in a browser pretty much exactly as it appears on the paper produced through a good old-fashioned publishing cycle. Situations where allowing a browser to rearrange text and graphics to suit itself would be extremely undesirable.
Obviously, I could create images of the relevant material, perhaps in JPEG or TIFF, and drop those into the web pages. This solves the layout problem, but at the expense of creating a whole bunch of other problems. The text can no longer be seen by search engines. Browsers have nothing to work with in trying to make the underlying text copy/pasteable. Browsers have their hands tied in trying to support accessibility requirements. And so on.
Alternatively, I could drop the layout-sensitive information into a PDF and pop that onto the web page. This is better in many respects but still falls short. PDF is a page painter. Inside a PDF you tell the computer to move to X,Y. Draw some text. Move to some other X,Y. Draw some more text. And so on. By the time the text hits PDF, critical information about what text follows what other text is missing. Simply put, the flow order of the text has disappeared. This is a real problem as anyone who has attempted to extract text from PDF can tell you. For simple cases it works great. For complex cases involving, say, multiple columns, tables or footnotes... Well, let's just say that a variety of infuriatingly bad things can happen.
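To make the flow-order problem concrete, here is a minimal sketch in Python. It is not a real PDF parser: the `ops` list is a made-up stand-in for a page's paint operations, the coordinates are simplified (y grows down the page), and `naive_extract`, `column_aware_extract` and `gutter_x` are hypothetical names chosen for illustration.

```python
# Hypothetical paint operations for a two-column page: (x, y, text).
# Assumption: y is measured downward from the top of the page.
ops = [
    (50, 100, "Left column, line 1."),
    (300, 100, "Right column, line 1."),
    (50, 120, "Left column, line 2."),
    (300, 120, "Right column, line 2."),
]


def naive_extract(ops):
    """Recover text by sorting fragments top-to-bottom, then left-to-right."""
    ordered = sorted(ops, key=lambda op: (op[1], op[0]))
    return [text for _x, _y, text in ordered]


def column_aware_extract(ops, gutter_x):
    """Read the whole left column before the right one.

    gutter_x is a guess the extractor has to make, because the page
    itself carries no record of where one column ends and the next begins.
    """
    left = [op for op in ops if op[0] < gutter_x]
    right = [op for op in ops if op[0] >= gutter_x]
    return naive_extract(left) + naive_extract(right)


print(naive_extract(ops))              # columns come out interleaved
print(column_aware_extract(ops, 200))  # reading order restored, by guesswork
```

The naive pass interleaves the two columns line by line. The second pass gets the order right only because it was handed the gutter position, which is exactly the kind of information that never made it onto the page.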
And thus we arrive at my tentative conclusion, which is a wish list for a new file format. I want:
- a file format that is primarily an image. Something that a browser can render without any risk to the visual representation of the primarily textual information therein.
- the ability to embed HTML markup within the file, so that markup and text can be carried around with the image. Applications such as search engines and copy-and-paste tools would then have access to the text as text rather than as image pixels.
It is possible, I guess, to do this with XMP[1], but my sense of it so far is that (a) it requires stretching the use case of XMP to breaking point and (b) folks are not using XMP for this in any great numbers.
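For what it is worth, the "image that carries its own markup" idea can be roughed out with an existing container. The sketch below is only an illustration under stated assumptions, not a proposal for the format itself: it uses PNG's international text chunk (iTXt) rather than XMP, the keyword `html` is my own invention and not a registered one, and `make_png` and `read_itxt` are hypothetical helper names. The image is a single white pixel, just enough to form a valid file.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(ctype: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def make_png(html: str, keyword: str = "html") -> bytes:
    """Return a 1x1 grayscale PNG with the markup stored in an iTXt chunk."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    # iTXt layout: keyword, NUL, compression flag, compression method,
    # language tag + NUL, translated keyword + NUL, UTF-8 text.
    itxt = (keyword.encode("latin-1") + b"\x00" + b"\x00\x00"
            + b"\x00" + b"\x00" + html.encode("utf-8"))
    idat = zlib.compress(b"\x00\xff")  # filter byte 0, one white pixel
    return (PNG_SIGNATURE + chunk(b"IHDR", ihdr) + chunk(b"iTXt", itxt)
            + chunk(b"IDAT", idat) + chunk(b"IEND", b""))


def read_itxt(png: bytes) -> dict:
    """Collect uncompressed iTXt chunks as {keyword: text}."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos, found = len(PNG_SIGNATURE), {}
    while pos + 8 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if ctype == b"iTXt":
            keyword, rest = data.split(b"\x00", 1)
            if rest[0] == 0:  # compression flag 0 means plain UTF-8 text
                _lang, rest = rest[2:].split(b"\x00", 1)
                _translated, text = rest.split(b"\x00", 1)
                found[keyword.decode("latin-1")] = text.decode("utf-8")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return found


png = make_png("<p>Text that must <em>not</em> reflow.</p>")
print(read_itxt(png)["html"])
```

An image viewer would ignore the text chunk and show only the pixels, while a search engine or copy/paste tool that knew about the keyword could read the markup back out. Nothing ties the markup to the pixels, though, and no existing tool looks for such a chunk, so the question of whether a proper format is needed still stands.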
Am I nuts? Have I missed something? Can it really be that the world needs another file format?
[1] http://www.adobe.com/products/xmp/