<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:blogger='http://schemas.google.com/blogger/2008' xmlns:georss='http://www.georss.org/georss' xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-1383812139713991234</id><updated>2024-11-01T03:40:20.198-07:00</updated><category term="XMP"/><category term="PDF/A"/><category term="PDF/D"/><category term="PDF"/><category term="ISO 32000"/><category term="Parser"/><category term="Solid Framework"/><category term="Content Streams"/><category term="File Structure"/><category term="HTML"/><category term="PDF to Word"/><category term="Undefined Behavior"/><category term="Adobe"/><category term="Compliance Reports"/><category term="Obsolete"/><title type='text'>Pragmatic PDF</title><subtitle type='html'>Pragmatic PDF covers the subjective issues relating to www.pdf-d.org.  
It is a forum to discuss ideas that don’t belong in the specification yet.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/blank.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>25</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-2654754354051423985</id><published>2010-11-17T17:51:00.000-08:00</published><updated>2010-11-17T22:04:13.927-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Adobe"/><category scheme="http://www.blogger.com/atom/ns#" term="PDF to Word"/><category scheme="http://www.blogger.com/atom/ns#" term="Solid Framework"/><title type='text'>Solid Documents technology included in Acrobat X</title><content type='html'>&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikhsFoOE9y4DYAKWFrdhnGneAkow6nFxFhle3BUR75d3nLEKrDL7cvLytvDf575SS3DM8rVjzGegUAqHBly7pXAfFmz6NE-9CWIGhdBZRjua02LKCevkUfkOJS7oMdj4gazH9DBPXAads/s1600/acrobat+uses+solidframework.png&quot;&gt;&lt;img style=&quot;float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 263px; height: 152px;&quot; 
src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikhsFoOE9y4DYAKWFrdhnGneAkow6nFxFhle3BUR75d3nLEKrDL7cvLytvDf575SS3DM8rVjzGegUAqHBly7pXAfFmz6NE-9CWIGhdBZRjua02LKCevkUfkOJS7oMdj4gazH9DBPXAads/s400/acrobat+uses+solidframework.png&quot; border=&quot;0&quot; alt=&quot;&quot; id=&quot;BLOGGER_PHOTO_ID_5540703294478039650&quot; /&gt;&lt;/a&gt;&lt;br /&gt;&lt;p class=&quot;body&quot; style=&quot;font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 9pt; text-decoration: none; color: rgb(0, 0, 0); &quot;&gt;Adobe has licensed Solid Framework SDK for Adobe® Acrobat® X.&lt;/p&gt;&lt;p class=&quot;body&quot; style=&quot;font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 9pt; text-decoration: none; color: rgb(0, 0, 0); &quot;&gt;Adobe Acrobat X takes advantage of Solid Documents’ PDF to Word and Excel conversion capabilities, allowing Acrobat X users to easily reuse and repurpose PDF content.&lt;/p&gt;&lt;p class=&quot;body&quot; style=&quot;font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 9pt; text-decoration: none; color: rgb(0, 0, 0); &quot;&gt;&lt;br /&gt;&lt;/p&gt;&lt;p class=&quot;body&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-size: 12px;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class=&quot;body&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-size: 12px;&quot;&gt;&quot;After reviewing the available options, we chose to use Solid Framework technology for the conversion of PDF files to Microsoft® Word and Excel in Adobe® Acrobat® X. 
The document reconstruction quality is very good and the Solid Documents team has been a pleasure to work with on this project,&quot; said Aman Deep Nagpal, Senior Product Manager, Acrobat Solutions, at Adobe.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class=&quot;body&quot; style=&quot;font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 9pt; text-decoration: none; color: rgb(0, 0, 0); &quot;&gt;&lt;/p&gt;&lt;/blockquote&gt;&lt;p class=&quot;body&quot; style=&quot;font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 9pt; text-decoration: none; color: rgb(0, 0, 0); &quot;&gt;&lt;a href=&quot;http://www.soliddocuments.com/pdf/_solidframework_adobe_x/300&quot;&gt;Read more..&lt;/a&gt;&lt;/p&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/2654754354051423985/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2010/11/solid-documents-technology-in-acrobat-x.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/2654754354051423985'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/2654754354051423985'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2010/11/solid-documents-technology-in-acrobat-x.html' title='Solid Documents technology included in Acrobat X'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" 
url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikhsFoOE9y4DYAKWFrdhnGneAkow6nFxFhle3BUR75d3nLEKrDL7cvLytvDf575SS3DM8rVjzGegUAqHBly7pXAfFmz6NE-9CWIGhdBZRjua02LKCevkUfkOJS7oMdj4gazH9DBPXAads/s72-c/acrobat+uses+solidframework.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-1205836049921530259</id><published>2010-04-28T16:21:00.000-07:00</published><updated>2010-11-17T17:54:25.098-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="PDF to Word"/><category scheme="http://www.blogger.com/atom/ns#" term="Solid Framework"/><title type='text'>PDF to Word for the Mac</title><content type='html'>We reached a major milestone at &lt;a href=&quot;http://www.soliddocuments.com/&quot;&gt;Solid Documents&lt;/a&gt; today with the release of &lt;a href=&quot;http://www.mac-pdf-converter.com/&quot;&gt;Solid PDF to Word for Mac.&lt;/a&gt;&lt;br /&gt;&lt;img style=&quot;display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 273px; height: 154px;&quot; src=&quot;http://www.mac-pdf-converter.com/images/solidpdfmac_feature_273x154.png&quot; alt=&quot;&quot; border=&quot;0&quot; /&gt;For the last 8 months our engineering group has been hard at work porting our core technology first to 64-bit and then to OSX.  Although the UI of the Mac product is Cocoa (Objective C++), the underlying engine is our new and improved portable &lt;a href=&quot;http://www.soliddocuments.com/products.htm?product=SolidFramework&quot;&gt;Solid Framework&lt;/a&gt; Nucleus.  This is our first application that uses our SDK the same way an SDK customer would.&lt;br /&gt;&lt;br /&gt;In our NUnit-like automated test environment we run Solid Framework managed code using &lt;a href=&quot;http://www.mono-project.com/&quot;&gt;Mono&lt;/a&gt; on OSX. 
The C# ecosystem provides an excellent cross-platform experience.&lt;br /&gt;&lt;br /&gt;Here is a sneak preview of some of the significant improvements to Solid Framework that you can expect to see soon in the v7 release. This list is not exhaustive .. we like surprises ..&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-size:130%;&quot;&gt;&lt;span style=&quot;font-weight: bold;&quot;&gt;Single DLL&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;We now wrap all the native code inside a single managed assembly called SolidFramework.dll for ease of deployment. The native code is automatically extracted on first run, and integrators have some flexibility regarding this thanks to &lt;a href=&quot;http://developer.soliddocuments.com/2009/12/solid-framework-updated-with-new.html&quot;&gt;SolidFramework.Configuration.Installer&lt;/a&gt;.&lt;br /&gt;&lt;span style=&quot;font-size:130%;&quot;&gt;&lt;span style=&quot;font-weight: bold;&quot;&gt;64-Bit&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;On both Windows and OSX we now support x64 and x86 native code. If you build your C# project as Any CPU, it will automatically use the correct native code depending on the current platform.&lt;br /&gt;&lt;span style=&quot;font-size:130%;&quot;&gt;&lt;span style=&quot;font-weight: bold;&quot;&gt;Single Threaded&lt;/span&gt; &lt;/span&gt;&lt;br /&gt;The portable subset (Solid Framework Nucleus) that we used to provide the conversion functionality in Solid PDF to Word for Mac is single threaded. This is really helpful when building enterprise applications or in a server environment.&lt;br /&gt;&lt;span style=&quot;font-weight: bold;font-size:130%;&quot; &gt;Office Open XML&lt;/span&gt;&lt;br /&gt;Solid Framework now creates .docx and .xlsx without needing Office to be present. This is also useful in server environments. 
These formats are now understood by the majority of word processors including Word 2003/2007/2010, Corel WordPerfect, Open Office, Google Docs and iWork Pages.&lt;br /&gt;&lt;span style=&quot;font-size:130%;&quot;&gt;&lt;span style=&quot;font-weight: bold;&quot;&gt;Geometric NSE&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;A new mechanism to resolve &quot;non-standard encoding&quot; issues for glyph to character mapping has been developed which does not rely on OCR. Once again, a useful change for when you don&#39;t want to depend on Office being present.</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/1205836049921530259/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2010/04/pdf-to-word-for-mac.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/1205836049921530259'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/1205836049921530259'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2010/04/pdf-to-word-for-mac.html' title='PDF to Word for the Mac'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-2766139613654926755</id><published>2009-09-19T15:37:00.001-07:00</published><updated>2011-06-03T18:40:44.579-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="PDF/A"/><category scheme="http://www.blogger.com/atom/ns#" term="XMP"/><title type='text'>PDF/A: what&#39;s new in Solid PDF Tools v6</title><content type='html'>Last week Solid Documents released an upgrade to Solid PDF Tools. 
In a nutshell, with Solid PDF Tools you can:&lt;div&gt;&lt;ul&gt;&lt;li&gt;convert from &lt;a href=&quot;http://www.soliddocuments.com/convert/PDF-to-Word/303/11&quot;&gt;PDF to Word&lt;/a&gt;&lt;/li&gt;&lt;li&gt;export tables from &lt;a href=&quot;http://www.soliddocuments.com/convert/PDF-to-Excel/303/11&quot;&gt;PDF to Excel&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;http://www.soliddocuments.com/pdf/Scan-To-Word/303/11&quot;&gt;scan directly to Word&lt;/a&gt;&lt;/li&gt;&lt;li&gt;edit PDF files (page manipulation, text touchup, etc.)&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;http://www.soliddocuments.com/convert/Verify-PDF-A/303/11&quot;&gt;validate PDF/A&lt;/a&gt;&lt;/li&gt;&lt;li&gt;convert &lt;a href=&quot;http://www.soliddocuments.com/convert/PDF-to-PDF-A/303/11&quot;&gt;PDF to PDF/A&lt;/a&gt;&lt;/li&gt;&lt;li&gt;create structured PDF files from Office applications &lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;For a complete list, visit the &lt;a href=&quot;http://www.soliddocuments.com/features.htm?product=SolidPDFTools&quot;&gt;Solid PDF Tools features&lt;/a&gt; page.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;With version 6, the product now exports PDF/A validation and conversion reports as per the specifications from the &lt;a href=&quot;http://www.pdf-d.org/compliance-reports.htm&quot;&gt;PDF/D Consortium&lt;/a&gt;.  The validator and converter have also been greatly improved to:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;provide much improved support for XMP validation&lt;/li&gt;&lt;li&gt;pass 100% of the Bavaria Test Suite cases (v5 already passed the Isartor cases)&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;With the work we&#39;ve done to improve our PDF/A technology, we think version 6 is now one of the best PDF/A tools on the market. 
This PDF/A functionality is also available to both .NET and C++ developers through our Solid Framework SDK product.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/2766139613654926755/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/09/pdfa-whats-new-in-solid-pdf-tools-v6.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/2766139613654926755'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/2766139613654926755'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/09/pdfa-whats-new-in-solid-pdf-tools-v6.html' title='PDF/A: what&#39;s new in Solid PDF Tools v6'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-1073245387576334146</id><published>2009-09-10T14:52:00.000-07:00</published><updated>2009-09-10T15:00:39.225-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="PDF/A"/><category scheme="http://www.blogger.com/atom/ns#" term="XMP"/><title type='text'>RDF for PDF/A-1 Predefined XMP Schemas Updated</title><content type='html'>Today we shared the latest update of the &lt;a href=&quot;http://www.pdf-d.org/downloads/pdfa.rdf.zip&quot;&gt;pdfa.rdf&lt;/a&gt; (now 1.1) schema used by the Solid Documents &lt;a href=&quot;http://www.validatepdfa.com/&quot;&gt;PDF/A Validator&lt;/a&gt; and &lt;a href=&quot;http://www.soliddocuments.com/products.htm?product=SolidPDFTools&quot;&gt;PDF to PDF/A Converter&lt;/a&gt;.&lt;div&gt;&lt;br 
/&gt;&lt;/div&gt;&lt;div&gt;At over 2500 lines long this is probably the largest use of the PDF/A extension schema definitions on the planet. Our own &lt;a href=&quot;http://www.pdf-d.org/pdfa-compliance.htm&quot;&gt;pdfaValidate schema&lt;/a&gt; has been updated too. It now includes some new properties (&#39;default&#39;, &#39;subst&#39;, &#39;predefined&#39; and &#39;count&#39;) to help us use this RDF schema to build our data-driven PDF/A XMP validator.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Feel free to use this data to build your own PDF/A XMP validator but remember to give back: if you have corrections or improvements, please share them with the PDF/A community. Better still, join the &lt;a href=&quot;http://www.pdf-d.org/&quot;&gt;PDF/D Consortium&lt;/a&gt;.&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/1073245387576334146/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/09/rdf-for-pdfa-1-predefined-xmp-schemas.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/1073245387576334146'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/1073245387576334146'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/09/rdf-for-pdfa-1-predefined-xmp-schemas.html' title='RDF for PDF/A-1 Predefined XMP Schemas Updated'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-1646515483828381781</id><published>2009-07-24T16:18:00.000-07:00</published><updated>2009-07-27T10:43:25.782-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="PDF/A"/><category scheme="http://www.blogger.com/atom/ns#" term="PDF/D"/><category scheme="http://www.blogger.com/atom/ns#" term="XMP"/><title type='text'>Info Dictionary vs XMP Metadata</title><content type='html'>The PDF/A-1 specification goes to great lengths to describe a mapping between the Entries in the legacy PDF Document Information Dictionary and their corresponding values in the more modern Document XMP Metadata. Section 3.4 in &lt;a href=&quot;http://www.pdfa.org/doku.php?id=pdfa:en:techdoc&quot;&gt;TechNote 0003&lt;/a&gt; describes how the values in the Document Information Dictionary must be mirrored in XMP. Section 3.3 describes the requirements for Document Information Entries.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;However, these requirements are &lt;b&gt;not &lt;/b&gt;symmetric. 
That is, it is perfectly legal for a PDF/A document to contain Document XMP Metadata without including a Document Information Dictionary.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Keeping the entries of the legacy and the more modern structures in sync is a headache for the software developer, and this pursuit is littered with ambiguous scenarios. For example, many of the XMP Metadata fields can have multiple values: multiple dc:title values for multiple languages, or a seq of multiple authors for dc:creator rather than a single author. Each Entry in the legacy Document Information Dictionary, by contrast, is a simple string. There are no conventions on how to order or delimit these strings when mapping multiple fields from XMP to these single string values.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The solution is simple: don&#39;t use Document Information Dictionaries! 
Accept that Document XMP Metadata is the way forward and move on.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We&#39;ll be adding this to PDF/D as a constraint: the Info dictionary will be illegal in PDF/D - legacy software be damned.&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/1646515483828381781/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/07/info-dictionary-vs-xmp-metadata.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/1646515483828381781'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/1646515483828381781'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/07/info-dictionary-vs-xmp-metadata.html' title='Info Dictionary vs XMP Metadata'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-6290874508148364774</id><published>2009-07-01T12:40:00.000-07:00</published><updated>2009-07-01T15:19:34.935-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="HTML"/><category scheme="http://www.blogger.com/atom/ns#" term="PDF/A"/><category scheme="http://www.blogger.com/atom/ns#" term="Solid 
Framework"/><title type='text'>Solid Framework v6 includes PDF/A Validation</title><content type='html'>It was a quiet month here at Pragmatic PDF central.  We&#39;ve been hard at work on the finishing touches of our &lt;a href=&quot;http://www.soliddocuments.com/features.htm?product=SolidFramework&quot;&gt;Solid Framework v6&lt;/a&gt; upgrade.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Major changes include:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;new &lt;a href=&quot;http://www.soliddocuments.com/register_site.htm?product=SolidFramework&quot;&gt;enterprise license model&lt;/a&gt; (in addition to republisher model)&lt;/li&gt;&lt;li&gt;PDF/A Validation&lt;/li&gt;&lt;li&gt;PDF to PDF/A Conversion&lt;/li&gt;&lt;li&gt;PDF to flowing HTML conversion&lt;/li&gt;&lt;li&gt;support for 64 bit Windows&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;We also have a much more elaborate set of &lt;a href=&quot;http://www.soliddocuments.com/documentation.htm?product=SolidFramework&quot;&gt;sample code&lt;/a&gt; than before in the form of some free applications. &lt;a href=&quot;http://www.pdf-internals.com/&quot;&gt;Solid PDF Navigator&lt;/a&gt; is 100% free and illustrates what can be achieved with the Free license of SolidFramework.  &lt;a href=&quot;http://www.pdf-internals.com/&quot;&gt;Solid PDF Mechanic&lt;/a&gt; uses the new free Developer license to allow exploration of all the premium Solid Framework features. All features are fully functional but include watermarks and &quot;not for resale&quot; text.  
To take advantage of either the Free or Developer license, simply &lt;a href=&quot;http://www.soliddocuments.com/download.htm?product=SolidFramework&quot;&gt;download&lt;/a&gt; Solid Framework and start using it immediately.&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;div style=&quot;text-align: center;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot; ;font-size:13px;&quot;&gt;&lt;b&gt;PDF viewer including Page Pane and standard navigation controls similar to Acrobat Reader.&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;http://www.pdf-internals.com/images/solidpdfnavigator_pdfexplorer.png&quot;&gt;&lt;img style=&quot;display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 640px; height: 480px;&quot; src=&quot;http://www.pdf-internals.com/images/solidpdfnavigator_pdfexplorer.png&quot; border=&quot;0&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div style=&quot;text-align: center;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold; &quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:small;&quot;&gt;Explorer view allows navigation and examination of PDF internals.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;http://www.pdf-internals.com/images/solidpdfnavigator_pdfviewer.png&quot;&gt;&lt;img style=&quot;display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 640px; height: 480px;&quot; src=&quot;http://www.pdf-internals.com/images/solidpdfnavigator_pdfviewer.png&quot; border=&quot;0&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/6290874508148364774/comments/default' title='Post Comments'/><link rel='replies' 
type='text/html' href='http://www.pragmaticpdf.com/2009/07/solid-framework-v6-includes-pdfa.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/6290874508148364774'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/6290874508148364774'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/07/solid-framework-v6-includes-pdfa.html' title='Solid Framework v6 includes PDF/A Validation'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-9204855184779730730</id><published>2009-05-11T11:16:00.000-07:00</published><updated>2011-06-03T20:27:33.136-07:00</updated><title type='text'>PDF: right up there with COBOL</title><content type='html'>And this is a good thing.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;PDF is an amazing document format: it is both backward and forward compatible:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;PDF 1.1 files from 1993 can still be perfectly understood by today&#39;s PDF tools&lt;br /&gt;&lt;/li&gt;&lt;li&gt;PDF files created by today&#39;s tools can still be viewed by older PDF software&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;Could this be one of the reasons, along with technological soundness, why PDF is ubiquitous?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;What other parts of our industry can claim such success without leaving data or customers behind every three years after &quot;upgrade season&quot;?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot; ;font-size:24px;&quot;&gt;Not Google&lt;/span&gt;&lt;br 
/&gt;&lt;/div&gt;&lt;div&gt;A few years back Google offered a very simple &lt;a href=&quot;http://googleajaxsearchapi.blogspot.com/2009/03/google-code-labs-and-soap-search-api.html&quot;&gt;Google SOAP Search API&lt;/a&gt; to allow 3rd parties to easily use the Google search engine to add native search to their websites. By native, I mean no ads from Google and 100% custom UI. We used this API as a quick fix to get search on the &lt;a href=&quot;http://www.soliddocuments.com/&quot;&gt;Solid Documents&lt;/a&gt; web site.  In 2006, Google &quot;deprecated&quot; this API and required web developers to migrate to their new and improved AJAX version of the same thing. In August 2009, the API will cease to function altogether.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;To be fair, the service was free.  However, that&#39;s supposed to be the benefit of going with Google rather than Microsoft.  It is hardly a benefit if they pull the rug out from under you. The least they could have done was provide some sort of legacy wrapper for the new API.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If you cannot rely on an API to exist for the life of your business, then it would be foolish to build your infrastructure on it. Luckily, search was a cheap way for us to learn to steer well clear of any &quot;enterprise&quot; offerings from Google in future. No, we will not be using Google Apps (the &quot;enterprise&quot; version of Gmail plus Google Docs). And we certainly will not be building anything using the &lt;a href=&quot;http://code.google.com/appengine/&quot;&gt;Google App Engine&lt;/a&gt;. I don&#39;t care how cool it is: I&#39;m willing to bet that your app will no longer be running 10 years from now. This blog uses a free service acquired by Google. 
Hmm....&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:x-large;&quot;&gt;Not Microsoft&lt;/span&gt;&lt;/div&gt;&lt;div&gt;What set me off on this tirade was our hosted Exchange upgrade this week. We drank the Kool-Aid and outsourced &#39;generic&#39; parts of our IT including our e-mail. This week they upgraded us from Exchange 2003 to Exchange 2007. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;On the positive side, they didn&#39;t lose my e-mail. However, the transition has been anything but smooth. It included instructions like clearing your Blackberry to &#39;out of box&#39; state. In other words, assuming that the only thing you do with your Blackberry is use it as a client for their e-mail server. Most people I know have at least one other app that they regularly use on their Blackberry (&quot;telephone&quot; anyone?). So, plenty of time was wasted backing up and restoring address books and re-installing 3rd party applications. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Pretty much the only thing that worked after the transition was e-mail. One of the primary reasons we originally switched from our own simple open source e-mail server to Exchange was to take advantage of collaborative features of Outlook like shared calendars and address books. None of that worked after the transition.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:x-large;&quot;&gt;If it ain&#39;t broke ..&lt;/span&gt;&lt;/div&gt;&lt;div&gt;.. don&#39;t fix it! One of the key features expected from any &quot;Enterprise Solution&quot; should be longevity. Just like railways and roads, one should expect a bit of maintenance over the lifetime of the tool but one does not expect to have to toss the whole thing out and replace it every 4 years. 
Some of the open source projects deal with this issue a little better but that&#39;s not all roses either: anyone else remember the upgrade to PHP 5 or is it just me?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I understand that sometimes you need to throw out the legacy to make progress. Shutting down analog TV in the US is a great example of this. However, when it comes to expectations for enterprise business solutions, 4 years is a very low bar. For Exchange, part of the blame goes to Apptix and part to Microsoft:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;When I look for Exchange 2003 on Microsoft&#39;s site I&#39;m redirected to the Exchange 2010 pages. You have to dig deep on TechNet to find 2003 info. Even then, it is not clear how long Microsoft intends to support it.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Apptix should have offered the 2007 migration as an option rather than a compulsory disruption to all of their clients and their businesses. Part of their plan should have been to keep running Exchange 2003 for &lt;a href=&quot;http://en.wikipedia.org/wiki/Luddite&quot;&gt;Luddites&lt;/a&gt; like me. Remind me again what the benefit of the 2007 upgrade was?&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;In the event that breaking changes to an API, file format or service are unavoidable, a responsible enterprise service provider will provide a smooth transition path to their customers.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:x-large;&quot;&gt;Back to Solid PDF&lt;/span&gt;&lt;/div&gt;&lt;div&gt;Aside from one small change in the way table reconstruction worked in a very early version of &lt;a href=&quot;http://www.soliddocuments.com/products.htm?product=SolidConverterPDF&quot;&gt;Solid Converter PDF&lt;/a&gt;, the publicly exposed APIs of our SDK have remained constant for 7 years now. 
That first minor change we made taught us our lesson: even as we&#39;ve migrated from a COM SDK to our more recent &lt;a href=&quot;http://www.soliddocuments.com/products.htm?product=SolidFramework&quot;&gt;.NET Solid Framework&lt;/a&gt;, we&#39;ve taken great care to avoid breaking customer apps that rely on our older APIs.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When we released &lt;a href=&quot;http://developer.soliddocuments.com/search/label/Scripting&quot;&gt;Solid Script&lt;/a&gt;, our command line syntax for our desktop applications had to change but we offered a legacy wrapper that translates old command lines into the newer scripts. Even this is not a big issue though since the software we created 7 years ago still works just as well as it did the day it was purchased. No forced upgrades due to changing file formats or &#39;deprecated&#39; APIs.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When &lt;a href=&quot;http://www.soliddocuments.com/convert/PDF-A1b/303/11&quot;&gt;PDF/A&lt;/a&gt; was announced in 2005 we immediately recognized the value this added to an already awesome file format and decided to make archiving functionality one of the pillars of our business. The PDF/A standard underlines the already proven long term vision we have for both customer documents and PDF products:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Think 40 years, not 4 years&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Think incremental non-breaking improvements, not disruptive change&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Wouldn&#39;t it be grand if the bigger players had a similar definition of long term? 
With all the focus today on sustainability and conservation, why do they continue to waste our time, money and energy?&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/9204855184779730730/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/05/pdf-right-up-there-with-cobol.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/9204855184779730730'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/9204855184779730730'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/05/pdf-right-up-there-with-cobol.html' title='PDF: right up there with COBOL'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-2276182685672700192</id><published>2009-04-30T21:40:00.000-07:00</published><updated>2009-05-04T15:24:47.252-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="HTML"/><category scheme="http://www.blogger.com/atom/ns#" term="PDF"/><title type='text'>Structured Content: PDF to HTML</title><content type='html'>A while back I included the following as one of the areas of interest of the &lt;a href=&quot;http://www.pdf-d.org/about.htm&quot;&gt;PDF/D Consortium&lt;/a&gt;:&lt;br /&gt;&lt;blockquote style=&quot;font-style: italic;&quot;&gt;Structured Documents and Single Sourcing: improving round-trips to document software&lt;/blockquote&gt;What did I mean by Structured Documents? 
For years Solid Documents has been &lt;a href=&quot;http://www.soliddocuments.com/products.htm?product=SolidConverterPDF&quot;&gt;converting PDF files to Word documents&lt;/a&gt; with a focus on retaining format and layout to allow customers to repurpose the content. While this is a great solution for a large number of customers, it is not the only type of reconstruction that is interesting.&lt;br /&gt;&lt;br /&gt;PDF is by nature a &quot;document&quot; format: the layout is in the form of pages. Content also needs to exist in alternate formats like a continuously flowing stream. Use cases for continuously flowing content include:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;conversion to HTML to reflow for form factors other than &quot;pages&quot;&lt;/li&gt;&lt;li&gt;conversion to content management systems where structure is more important than layout and formatting&lt;/li&gt;&lt;li&gt;conversion for alternate readers for people with disabilities (text to speech, etc)&lt;/li&gt;&lt;/ul&gt;Reconstruction for these use cases focuses more on the structure of the document than on the layout and formatting. For example, we need to take unstructured PDF files and recognize columns, tables, lists, headers and footers, etc. This allows us to organize the content in a logical structure. Ultimately, we&#39;ll recognize topics and sections too so that we can produce logical hierarchies from plain old non-tagged PDF files.&lt;br /&gt;&lt;br /&gt;One great example of where conventional PDF pages are not the most appropriate way to read a document is on the small screens of handheld devices. 
For example, the typical Blackberry has a 3&quot;x2&quot; screen with a resolution something like 320x240 pixels.&lt;br /&gt;&lt;br /&gt;In this diagram the little rectangles represent the viewing area on a Blackberry when viewing a document laid out on 8.5&quot;x11&quot; pages.&lt;br /&gt;&lt;br /&gt;&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxJ1nQTXgtv926krHOIZP5FDqCzFK-uC61KVnkJRKJSUhamWrSvEjpfHvjQszqtBWMQhdUuXqL3gDg5fmnbZp88QuRypvTobQWkHQ9TlhJPuPMDTlJBVtpRq8cMNHq3p9zKMcVhG9uLc4/s1600-h/pdf2mobile.png&quot;&gt;&lt;img style=&quot;display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 275px;&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxJ1nQTXgtv926krHOIZP5FDqCzFK-uC61KVnkJRKJSUhamWrSvEjpfHvjQszqtBWMQhdUuXqL3gDg5fmnbZp88QuRypvTobQWkHQ9TlhJPuPMDTlJBVtpRq8cMNHq3p9zKMcVhG9uLc4/s400/pdf2mobile.png&quot; border=&quot;0&quot; alt=&quot;&quot; id=&quot;BLOGGER_PHOTO_ID_5332090937974719618&quot; /&gt;&lt;/a&gt;&lt;br /&gt;For 100% zoom we get about 100 pixels per inch. Think &lt;span style=&quot;font-weight: bold;&quot;&gt;bad quality fax&lt;/span&gt; machine resolution.&lt;br /&gt;&lt;br /&gt;&lt;div&gt;For 50% we get a mere 50 pixels per inch which is &lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;worse than &lt;/span&gt;&lt;span style=&quot;font-weight: bold;&quot;&gt;really bad fax quality&lt;/span&gt;. However, because of the layout, you need to move your little screen &quot;window&quot; both left-to-right and top-to-bottom to scroll the page. With or without columns, the amount of scrolling to read a single page is quite tedious.&lt;br /&gt;&lt;br /&gt;There is already a much better format for reading documents at lower resolution. This format is HTML. 
Back in the 90&#39;s when the internet was becoming popular for web browsing, screen resolutions for desktop machines were in the same ball park as handheld device resolutions today. Even with a 640x480 pixel handheld screen resolution, the physical size is still a limitation, typically still 3&quot;x2&quot;.&lt;br /&gt;&lt;br /&gt;Assuming one can reconstruct PDF files as continuously flowing documents, then the next step would be to convert them to HTML. If the target device is a handheld, then the complexity of the HTML should be kept to a minimum. This means simplifying the fonts, using CSS for styles and using HTML elements that look great even in the simplest browsers. Based on experimentation we&#39;ve seen that XHTML 1.0 is well supported by the HTML viewers on most handheld devices.&lt;br /&gt;&lt;br /&gt;To see how well our &lt;a href=&quot;http://www.pdf2mobile.com/&quot;&gt;PDF to HTML&lt;/a&gt; reconstruction works, you can experiment with it at &lt;a href=&quot;http://www.pdf2mobile.com/&quot;&gt;www.pdf2mobile.com&lt;/a&gt; without needing a mobile device.&lt;br /&gt;&lt;br /&gt;Next, we want to make it really easy to use from any handheld device. Assuming you receive an e-mail on your Blackberry with a PDF document attached to it, simply forward it to &lt;a href=&quot;mailto:convert@pdf2mobile.com&quot;&gt;convert@pdf2mobile.com&lt;/a&gt;.&lt;br /&gt;The service will convert it to HTML and e-mail it back. Alternatively, if you have a handheld device with an e-mail client that renders HTML then you can forward your e-mail to &lt;a href=&quot;mailto:detach@pdf2mobile.com&quot;&gt;detach@pdf2mobile.com&lt;/a&gt; - it will be returned as an HTML e-mail rather than an HTML attachment.&lt;br /&gt;&lt;br /&gt;We&#39;re interested in your feedback (&lt;a href=&quot;mailto:standards@soliddocuments.com&quot;&gt;standards@soliddocuments.com&lt;/a&gt;) on our conversion and our HTML format. 
This PDF to HTML conversion functionality will be available for other uses in the next release of &lt;a href=&quot;http://www.soliddocuments.com/products.htm?product=SolidConverterPDF&quot;&gt;Solid Converter PDF&lt;/a&gt;.&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/2276182685672700192/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/04/structured-content-pdf-to-html.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/2276182685672700192'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/2276182685672700192'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/04/structured-content-pdf-to-html.html' title='Structured Content: PDF to HTML'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxJ1nQTXgtv926krHOIZP5FDqCzFK-uC61KVnkJRKJSUhamWrSvEjpfHvjQszqtBWMQhdUuXqL3gDg5fmnbZp88QuRypvTobQWkHQ9TlhJPuPMDTlJBVtpRq8cMNHq3p9zKMcVhG9uLc4/s72-c/pdf2mobile.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-2567098439214638495</id><published>2009-04-30T20:47:00.001-07:00</published><updated>2009-05-01T11:53:04.275-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="PDF/A"/><category scheme="http://www.blogger.com/atom/ns#" term="XMP"/><title type='text'>XML Comments in XMP</title><content type='html'>Nowhere in the XMP or RDF specifications is any mention of XML comments.&lt;div&gt;&lt;br 
/&gt;&lt;/div&gt;&lt;div&gt;On validating our vast set of PDF files gathered from the wild, thanks to sites like &lt;a href=&quot;http://www.freepdftoword.org/&quot;&gt;www.freepdftoword.org&lt;/a&gt;, &lt;a href=&quot;http://www.pdf2mobile.com/&quot;&gt;www.pdf2mobile.com&lt;/a&gt; and &lt;a href=&quot;http://www.validatepdfa.com/&quot;&gt;www.validatepdfa.com&lt;/a&gt; we have run into a multitude of cases where XMP produced by reputable (read &quot;Adobe&quot;) products includes XML comments.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;After consulting with colleagues at Adobe, &lt;a href=&quot;http://www.soliddocument.com/products.htm?product=SolidPDFTools&quot;&gt;Solid Documents&lt;/a&gt; and &lt;a href=&quot;http://www.pdflib.com/developer/xmp-metadata/free-xmp-validator/&quot;&gt;PDFlib&lt;/a&gt; we reached consensus on this topic. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Two conclusions:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;Since XML comments are legal XML and not explicitly prohibited, we conclude that they are allowed.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;XML comments may be dropped when converting PDF files based on this clause from the &lt;a href=&quot;http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-comments&quot;&gt;XML specification&lt;/a&gt;:&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;div&gt;&lt;blockquote&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-style: italic;&quot;&gt;&quot;an XML processor MAY, but need not, make it possible for an application to retrieve the text of comments&quot;&lt;/span&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/2567098439214638495/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.pragmaticpdf.com/2009/04/xml-comments-in-xmp.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/2567098439214638495'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/2567098439214638495'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/04/xml-comments-in-xmp.html' title='XML Comments in XMP'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-8484362504937818627</id><published>2009-04-28T20:20:00.000-07:00</published><updated>2009-04-29T16:44:17.501-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="PDF/A"/><title type='text'>Non-1.4 Features in PDF/A</title><content type='html'>&lt;span style=&quot;font-size:130%;&quot;&gt;Can PDF/A files include features from PDF 1.5, 1.6 or 1.7 (ISO 32000-1)?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This is a recurring FAQ that comes from a superficial understanding of the PDF/A ISO-19005-1 specification. While PDF/A-1 was defined based on the PDF Reference for 1.4 of the format, it quite clearly allows non-1.4 features.  One of the great features of PDF is that &quot;unknown things&quot; are ignored by conforming readers. 
This feature is part of all PDF specifications that have ever existed, including 1.4 on which PDF/A-1 is based.&lt;br /&gt;&lt;br /&gt;To quote &lt;a href=&quot;http://www.acrobatusers.com/blogs/leonardr/&quot;&gt;Leonard Rosenthol&lt;/a&gt;, PDF Standards Architect for Adobe Systems:&lt;br /&gt;&lt;blockquote&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-style: italic;&quot;&gt;&quot;There is no question about this by the committee.  In fact, we just rediscussed this last week at our meeting in Germany due to a comment from one of the various national bodies about potentially changing this position (aka allowing &#39;private data&#39; or &#39;unknown keys&#39;) and the agreement was that we are still in agreement that &#39;unknown things&#39; are allowed PROVIDED THEY DO NOT CHANGE the visual appearance of the page.&quot;&lt;/span&gt;&lt;/blockquote&gt;The &lt;a href=&quot;http://www.aiim.org/standards/article.aspx?ID=29510&quot;&gt;ISO 19005-1 Application Notes from AIIM&lt;/a&gt; provide answers to many of these issues. 
To quote the Application Notes:&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-style: italic;&quot;&gt;&lt;/span&gt;&lt;blockquote&gt;&lt;span style=&quot;font-style: italic;&quot;&gt;“A conforming PDF/A file has three kinds of content:&lt;/span&gt;&lt;br /&gt;&lt;ul style=&quot;font-style: italic;&quot;&gt;&lt;li&gt;content that affects the final visual reproduction of the composite entity;&lt;/li&gt;&lt;li&gt;other visual content such as annotations, form fields, etc.&lt;/li&gt;&lt;li&gt;non-printing content such as bookmarks, metadata, etc.&lt;/li&gt;&lt;/ul&gt;&lt;span style=&quot;font-style: italic;&quot;&gt;The PDF/A-1 standard states that a conforming file may include valid PDF features beyond those described in the standard provided they do not affect final visual reproduction of the composite entity and are included as part of PDF Reference 1.4.”&lt;/span&gt;&lt;/blockquote&gt;&lt;span style=&quot;font-style: italic;&quot;&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-size:130%;&quot;&gt;&lt;br /&gt;What does this mean?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;One of the goals of PDF/A is to make the appearance of PDF files predictable and reproducible over time. Including &quot;private&quot; features does not affect this goal or break PDF/A-1 assuming the rules are followed.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-size:130%;&quot;&gt;Examples&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If a feature from a more recent version of PDF can be ignored by a PDF/A-1 Compliant Reader without affecting the visual appearance of the PDF then this goal is met. An example of such a feature would be /PrintScaling (PDF 1.6) in the /ViewerPreferences dictionary. 
Any “future” feature that does not affect appearance is exactly the same as a “private” feature between reader and writer: conforming PDF/A Readers should ignore it.&lt;br /&gt;&lt;br /&gt;Features from later versions of PDF which do indeed affect visual appearance are explicitly prohibited in the PDF/A 19005-1 specification. For example, 6.5.2 states that annotation types not defined in PDF Reference 1.4 are prohibited (along with a few that are defined in 1.4). This means newer annotation types like 3D are clearly prohibited.  Another good example of a &quot;feature from the future&quot; that clearly alters appearance is /UserUnit (PDF 1.6) in the /Page dictionary, which is prohibited because it scales the page and therefore affects appearance.&lt;br /&gt;&lt;br /&gt;Other features are implicitly forbidden. For example, an image compressed using JPXDecode filter (JPEG2000 - PDF 1.5) would be ignored by a conforming PDF/A-1 reader but the absence of this image would affect the visual appearance of the PDF. Hence, JPEG2000 should not be used in PDF/A-1 files. Another example is setting BitsPerComponent to 16 for images (PDF 1.5): since this value was introduced after 1.4, it is forbidden because ignoring it would lead to undefined behavior in readers capable only of rendering 1.4. 
Many of these cases are covered explicitly in the PDF/A Competence Center&#39;s &lt;a href=&quot;http://www.pdfa.org/doku.php?id=pdfa:en:isartor_test_suite&quot;&gt;Isartor Test Suite&lt;/a&gt; and also the PDF/D Consortium extensions to the PDF/A test suite: &lt;a href=&quot;http://www.pdf-d.org/pdfa-compliance.htm&quot;&gt;PDF/D Compliance Tests&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Comments and questions most welcome.</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/8484362504937818627/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/04/non-14-features-in-pdfa.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/8484362504937818627'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/8484362504937818627'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/04/non-14-features-in-pdfa.html' title='Non-1.4 Features in PDF/A'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-2487900704568846920</id><published>2009-04-21T15:59:00.000-07:00</published><updated>2009-04-21T22:30:47.067-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="PDF/A"/><category scheme="http://www.blogger.com/atom/ns#" term="XMP"/><title type='text'>Ambiguities in PDF/A Extension Schemas</title><content type='html'>The &lt;a href=&quot;http://www.pdfa.org/doku.php?id=pdfa:en:techdoc&quot;&gt;PDF/A XMP Technotes&lt;/a&gt; are not clear on the subject of optional/required for properties of the extension schemas.&lt;div&gt;&lt;br 
/&gt;&lt;/div&gt;&lt;div&gt;Discussion with engineers from other PDF/A companies has resulted in the &quot;if it doesn&#39;t say &#39;Optional&#39; then it must be &#39;Required&#39;&quot; assumption, which most of us are trying to abide by. The only properties in any of the extension schemas marked as Optional are:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;pdfaSchema:schema - Optional description of schema&lt;br /&gt;&lt;/li&gt;&lt;li&gt;pdfaType:field - Optional description of the structured fields&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;That&#39;s it!  All the rest must thus be &#39;Required&#39;.  Not so fast!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If this were true then both pdfaSchema:property and pdfaSchema:valueType would always be required, which means that &lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;all &lt;/span&gt;extension schemas &lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;must &lt;/span&gt;include both properties and custom value types. 
When we were creating &lt;a href=&quot;http://www.pdf-d.org/pdfa-compliance.htm&quot;&gt;RDF definitions &lt;/a&gt;for all the pre-defined PDF/A schemas, we noticed this issue because it made it impossible to correctly define the &quot;Dimensions&quot; valueType schema: this schema has one custom value type and no properties.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;Exception #1: &lt;/span&gt;at least one of pdfaSchema:property and pdfaSchema:valueType should be present.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We&#39;ve noticed with our vast test set accumulated through our free online services like &lt;a href=&quot;http://www.freepdftoword.org/&quot;&gt;www.freepdftoword.org&lt;/a&gt; and &lt;a href=&quot;http://www.validatepdfa.com/&quot;&gt;www.validatepdfa.com&lt;/a&gt; that several Adobe products create schemas which omit one or more of pdfaProperty:description, pdfaType:description and pdfaField:description. All three of these properties are purely descriptive in the same sense as the two properties mentioned above as &#39;Optional&#39;.  We believe that these fields should also be optional but, for now, our validator still flags their absence as an error (not a fatalError though since we can add these fields to the schemas, containing filler content, to &quot;fix&quot; the issue).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;Proposed Exception #2: &lt;/span&gt;pdfaProperty:description, pdfaType:description and pdfaField:description should be &#39;Optional&#39; properties. Existing PDF/A creators are omitting them and it makes sense.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A value type containing fields is required to have a pdfaType:namespaceURI property. 
We&#39;ve noticed customer samples created by reputable products which omit this field. In the case of the omission, the assumed namespace for the value type is simply the same as the namespace of the schema with a slash and the name of the type appended to it.  Our validator marks this issue as an Error (and not a fatalError) since it can easily be repaired by explicitly inserting the assumed namespace.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;Example:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;Schema namespace:&lt;/div&gt;&lt;div&gt;&amp;lt;pdfaSchema:namespaceURI&amp;gt;http://www.acme.com/ns/email/1/&amp;lt;/pdfaSchema:namespaceURI&amp;gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Value type name:&lt;/div&gt;&lt;div&gt;&amp;lt;pdfaType:type&amp;gt;mailaddress&amp;lt;/pdfaType:type&amp;gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Assumed namespace of value type if pdfaType:namespaceURI  is absent:&lt;/div&gt;&lt;div&gt;   http://www.acme.com/ns/email/1/mailaddress/&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold; &quot;&gt;Proposed Exception #3: &lt;/span&gt;if pdfaType:namespaceURI is absent, construct a default namespace for the value type as described above.&lt;br /&gt;&lt;/div&gt;&lt;div&gt; &lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/2487900704568846920/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/04/ambiguities-in-pdfa-extension-schemas.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/2487900704568846920'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/2487900704568846920'/><link rel='alternate' 
type='text/html' href='http://www.pragmaticpdf.com/2009/04/ambiguities-in-pdfa-extension-schemas.html' title='Ambiguities in PDF/A Extension Schemas'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-4503555915661500464</id><published>2009-03-04T15:35:00.000-08:00</published><updated>2009-03-04T15:46:53.292-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="ISO 32000"/><category scheme="http://www.blogger.com/atom/ns#" term="PDF"/><title type='text'>Flatness: Ambiguity in ISO 32000-1</title><content type='html'>&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;From the ISO 32000-1 specification:&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;Table 53, 8.4.1 describing initialization of graphic state at the start of each page:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The precision with which curves shall be rendered on the output device (see 10.6.2, &quot;Flatness Tolerance&quot;). The value of this parameter (positive number) gives the maximum error tolerance, measured in output device pixels; smaller numbers give smoother curves at the expense of more computation and memory use. Initial value: 1.0. 
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;Table 57, 8.4.4 describing the &quot;i&quot; operator:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Set the flatness tolerance in the graphics state (see 10.6.2, &quot;Flatness Tolerance&quot;). &lt;span class=&quot;Apple-style-span&quot; style=&quot;font-style: italic;&quot;&gt;flatness &lt;/span&gt;is a number in the range 0 to 100; a value of 0 shall specify the output device&#39;s default flatness tolerance.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;Table 58, 8.4.5 describing the graphic state parameter dictionary entry&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-style: italic;&quot;&gt;FL&lt;/span&gt;:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Number, (Optional; PDF 1.3) The flatness tolerance (see 10.6.2, &quot;Flatness Tolerance&quot;).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;10.6.2 Flatness Tolerance&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The flatness tolerance controls the maximum permitted distance in device pixels between the mathematically correct path and an approximation constructed from straight line segments, as shown in Figure 54. Flatness may be specified as the operand of the &lt;span class=&quot;Apple-style-span&quot; style=&quot;font-style: italic;&quot;&gt;i&lt;/span&gt; operator (see Table 57) or as the value of the &lt;span class=&quot;Apple-style-span&quot; style=&quot;font-style: italic;&quot;&gt;FL &lt;/span&gt;entry in a graphics state parameter dictionary (see Table 58). 
It shall be a positive number.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold; &quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Observation&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It appears to me that the above clauses are referring to exactly the same thing. If that is correct, then the range and default value for flatness tolerance is ambiguous:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Either the default is 1.0 or it is 0: pick one.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Either the range is 0 to 100, or is a positive number (any value &gt; 0): pick one.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Comments?&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/4503555915661500464/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/03/flatness-ambiguity-in-iso-32000-1.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/4503555915661500464'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/4503555915661500464'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/03/flatness-ambiguity-in-iso-32000-1.html' title='Flatness: Ambiguity in ISO 32000-1'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-4107438476982133352</id><published>2009-02-26T14:31:00.000-08:00</published><updated>2009-03-04T15:38:35.966-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="ISO 32000"/><category scheme="http://www.blogger.com/atom/ns#" term="PDF"/><category scheme="http://www.blogger.com/atom/ns#" term="PDF/D"/><category scheme="http://www.blogger.com/atom/ns#" term="Undefined Behavior"/><title type='text'>Anomalous Situations - Best Practices</title><content type='html'>PDF ISO-32000 has a note in clause 12.6.2 that is just dying to get the PDF/D Best Practices treatment:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-style: italic;&quot;&gt;&quot;Conforming readers should attempt to provide reasonable behavior in anomalous situations. For example, self-referential actions should not be executed more than once, and actions that close the document or otherwise render the next action impossible should terminate the execution sequence.&quot;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-style: italic;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;How about insisting that the Next entry in Action dictionaries shall only contain acyclic graphs of actions?  
When would endless loops of action sequences ever be a good thing?&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/4107438476982133352/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/02/anomalous-situations-best-practices.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/4107438476982133352'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/4107438476982133352'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/02/anomalous-situations-best-practices.html' title='Anomalous Situations - Best Practices'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-6093088206886185530</id><published>2009-02-26T10:45:00.000-08:00</published><updated>2009-02-26T11:10:12.778-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="XMP"/><title type='text'>Preferred prefix for Colorant Basic Value Type</title><content type='html'>&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;xapG vs xmpG&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;From the &lt;a href=&quot;http://www.adobe.com/devnet/xmp/pdfs/xmp_specification.pdf&quot;&gt;XMP Specification (2005)&lt;/a&gt;:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; 
href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvs1XK2oRtP7qx6H03OWECQyMfnbFrHhtmTi6RckUgr7cxXxTois378BjgH3W70QaHan_c3kQfp4lp_cBKBOqnkXWrxjAz2yuSpd5GKL4G72uTXzekfXEIKgNSAXUj-OzDt7OMoUxhw5Q/s1600-h/colorant2005.png&quot;&gt;&lt;img style=&quot;float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 74px;&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvs1XK2oRtP7qx6H03OWECQyMfnbFrHhtmTi6RckUgr7cxXxTois378BjgH3W70QaHan_c3kQfp4lp_cBKBOqnkXWrxjAz2yuSpd5GKL4G72uTXzekfXEIKgNSAXUj-OzDt7OMoUxhw5Q/s400/colorant2005.png&quot; border=&quot;0&quot; alt=&quot;&quot; id=&quot;BLOGGER_PHOTO_ID_5307183201935582738&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style=&quot;text-align: left;&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style=&quot;text-align: left;&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style=&quot;text-align: left;&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style=&quot;text-align: left;&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style=&quot;text-align: left;&quot;&gt;From the &lt;a href=&quot;http://www.adobe.com/devnet/xmp/pdfs/XMPSpecificationPart2.pdf&quot;&gt;XMP Specification Part 2 (2008)&lt;/a&gt;:&lt;br /&gt;&lt;/div&gt;&lt;div style=&quot;text-align: left;&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEis7L9vE1yxBTRnOlWTLywvm4PexsY13FLSaN1-ZxgTTIWugsJwGaF_6Hv0Bm0XiTWGl_ZzheHU4IKMVsfknpYKZncl1ztEJP2_cFJIlvYrz-dGUHPvin3wElvv8Zae5JBk4gi7B_A07O0/s1600-h/colorant2008.png&quot;&gt;&lt;img style=&quot;float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 103px;&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEis7L9vE1yxBTRnOlWTLywvm4PexsY13FLSaN1-ZxgTTIWugsJwGaF_6Hv0Bm0XiTWGl_ZzheHU4IKMVsfknpYKZncl1ztEJP2_cFJIlvYrz-dGUHPvin3wElvv8Zae5JBk4gi7B_A07O0/s400/colorant2008.png&quot; 
border=&quot;0&quot; alt=&quot;&quot; id=&quot;BLOGGER_PHOTO_ID_5307182724767299234&quot; /&gt;&lt;/a&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I Googled Adobe&#39;s site for clarification on this change, hoping to find a note on the subject: nada.&lt;/div&gt;&lt;div&gt;For the purposes of our XMP validator we&#39;re obviously going to assume that the most recent version is correct. The reason I made this blog post is so that it will pop up in Google when the next person stumbles into this question, wondering if it is a typo or a deliberate change.&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/6093088206886185530/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/02/preferred-prefix-for-colorant-basic.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/6093088206886185530'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/6093088206886185530'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/02/preferred-prefix-for-colorant-basic.html' title='Preferred prefix for Colorant Basic Value Type'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" 
url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvs1XK2oRtP7qx6H03OWECQyMfnbFrHhtmTi6RckUgr7cxXxTois378BjgH3W70QaHan_c3kQfp4lp_cBKBOqnkXWrxjAz2yuSpd5GKL4G72uTXzekfXEIKgNSAXUj-OzDt7OMoUxhw5Q/s72-c/colorant2005.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-7999463332455348201</id><published>2009-02-25T13:30:00.000-08:00</published><updated>2009-02-25T15:43:54.548-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="PDF/A"/><category scheme="http://www.blogger.com/atom/ns#" term="XMP"/><title type='text'>Open Source PDF/A RDF Schemas</title><content type='html'>Inspired by the Isartor test set for validating PDF/A compliance we are working on a similar style set of negative tests for basic XMP compliance (&lt;a href=&quot;http://www.pdfa.org/doku.php?id=pdfa:en:techdoc&quot; title=&quot;PDF/A XMP TechNotes&quot; rel=&quot;nofollow&quot; target=&quot;_blank&quot;&gt;PDF/A XMP TechNotes&lt;/a&gt;).&lt;div&gt;&lt;br /&gt;While it is clear that this work needs to be done, nobody appears to be tackling it. PDF/A 19005-1 is now heading into its 3rd year so we&#39;re attempting to fill this gap.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;While each vendor will obviously implement their own XMP validator for PDF/A validation and conversion, there are some areas where we can easily collaborate. We believe that it is in all our interests to openly share an RDF and PDF/A compliant XMP implementation of the pre-defined schemas required to validate PDF/A files.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;Today we released our first version of the &lt;a href=&quot;http://www.pdf-d.org/pdfa-compliance.htm&quot;&gt;PDF/A pre-defined schemas&lt;/a&gt; in RDF form. 
You can find these resources at the &lt;a href=&quot;http://www.pdf-d.org/pdfa-compliance.htm&quot;&gt;PDF/D website&lt;/a&gt;.&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/7999463332455348201/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/02/open-source-pdfa-rdf-schemas.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/7999463332455348201'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/7999463332455348201'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/02/open-source-pdfa-rdf-schemas.html' title='Open Source PDF/A RDF Schemas'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-2634154202871987275</id><published>2009-02-23T14:28:00.000-08:00</published><updated>2009-02-23T14:31:46.718-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Compliance Reports"/><category scheme="http://www.blogger.com/atom/ns#" term="PDF/A"/><title type='text'>Isartor Truth</title><content type='html'>As promised, we&#39;ve posted more tools for standardized compliance testing.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Today we added:&lt;/div&gt;&lt;div&gt;- Isartor Truth: an XML file with the expected results of the Isartor PDF/A tests&lt;/div&gt;&lt;div&gt;- CompareReports.exe: a tool to compare the above truth file to output from a validator&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For more on our efforts to improve mechanical comparison of compliance testing reports, 
please visit the &lt;a href=&quot;http://www.pdf-d.org/compliance-reports.htm&quot;&gt;PDF/D&lt;/a&gt; site.&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/2634154202871987275/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/02/isartor-truth.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/2634154202871987275'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/2634154202871987275'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/02/isartor-truth.html' title='Isartor Truth'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-5899090263813224078</id><published>2009-02-20T11:56:00.001-08:00</published><updated>2011-06-03T19:29:22.573-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="XMP"/><title type='text'>XMP: bag vs Bag, seq vs Seq</title><content type='html'>The &lt;a href=&quot;http://www.w3.org/TR/rdf-schema/#ch_bag&quot; rel=&quot;nofollow&quot;&gt;RDF specification&lt;/a&gt; clearly uses &quot;Bag&quot;, &quot;Alt&quot; and &quot;Seq&quot; for the names of these container elements. 
This is a requirement for the names of these array container elements:&lt;div&gt;rdf:Bag, rdf:Alt and rdf:Seq&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Starting with the &lt;a href=&quot;http://www.adobe.com/devnet/xmp/pdfs/XMPSpecificationPart1.pdf&quot;  rel=&quot;nofollow&quot;&gt;XMP Specification Part 1&lt;/a&gt;, the use of &quot;bag &amp;lt;type&amp;gt;&quot; (as in &quot;bag Text&quot;) was introduced as a notation to describe array types in schemas. This document is consistent in using the lowercase variant for type descriptions only.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I believe that the titlecase variant of this notation, first seen in &lt;a href=&quot;http://www.adobe.com/devnet/xmp/pdfs/XMPSpecificationPart2.pdf&quot; rel=&quot;nofollow&quot;&gt;XMP Specification Part 2&lt;/a&gt;, was introduced in error (example: XMP Media Management property definition for xmpMM:Ingredients is &quot;Bag ResourceRef&quot;).  &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This inconsistency really didn&#39;t matter while it was limited to being used as a notation format only in documentation. The arrival of PDF/A extension schemas changed all that. Specifically, as mentioned in &lt;a href=&quot;http://www.pdfa.org/doku.php?id=pdfa:en:techdoc&quot; rel=&quot;nofollow&quot;&gt;TechNote 0009&lt;/a&gt; clause 4.5, this notation is now used in the PDF/A extension schemas for the pdfaProperty:valueType and the pdfaField:valueType properties.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Our validator will support both variants but will generate warnings for the titlecase version. 
In other words, we are recommending the use of the lowercase variants as a best practice for &lt;a href=&quot;http://www.pdf-d.org/&quot;&gt;PDF/D&lt;/a&gt;.&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/5899090263813224078/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/02/xmp-bag-vs-bag-seq-vs-seq.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/5899090263813224078'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/5899090263813224078'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/02/xmp-bag-vs-bag-seq-vs-seq.html' title='XMP: bag vs Bag, seq vs Seq'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-8142685030609861116</id><published>2009-02-20T08:51:00.000-08:00</published><updated>2009-02-25T11:47:23.148-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="XMP"/><title type='text'>XMP pdfaValidate Schema</title><content type='html'>In building our new and improved validator we decided to use the pdfaExtension schema (and friends) to define all the schemas we are validating including all the pre-defined schemas. This process of eating our own dogfood has exposed numerous holes in both the XMP Specification and the PDF/A Specification.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The most obvious hole, which has already been discussed within the PDF/A Competence Center Working Group (TWG), is the loose nature of the definition of basic types in XMP. 
As mentioned earlier in my blog, one example is &quot;Choice of &amp;lt;type&amp;gt;&quot; and &quot;Open Choice of &amp;lt;type&amp;gt;&quot;. Another issue raised in TWG discussions is the ambiguous use of case (seq vs Seq, bag vs Bag, etc).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The XMP Specification makes provision for extending existing Properties with Qualifier Properties that are ignored by applications that are not aware of them. We used this feature and the pdfaValidate schema to extend pdfaProperty and add validation information. When defining the schemas we wish to validate, we now add the following attributes:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;status&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;Description: &lt;/span&gt;Used by the validator to flag errors of omission or inclusion, or to raise warnings.&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;Type:&lt;/span&gt; Closed Choice of Text&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;Values:&lt;/span&gt; required|prohibited|deprecated|restricted|recommended|ignored&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&#39;deprecated&#39; is similar to &#39;prohibited&#39;, except that validators flag it as a warning and not an error.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;constraint&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: 
bold;&quot;&gt;Description:&lt;/span&gt; Regular expression used to constrain &quot;Closed Choice of &amp;lt;type&amp;gt;&quot; values. We still need a way to flag Open vs Closed.&lt;/div&gt;&lt;div&gt;Regular expressions always need to match all input (start with &#39;^&#39; and end with &#39;$&#39;). Other valid constraint values include:&lt;/div&gt;&lt;div&gt;&#39;base64&#39;: used to validate Thumbnail xapGImg:image property for example.&lt;/div&gt;&lt;div&gt;Numeric ranges like: &#39;[0,255]&#39;,  &#39;(0,)&#39;, &#39;[-128,127]&#39;, etc.&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;Type:&lt;/span&gt; Text&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;standard&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;Description:&lt;/span&gt; This value determines which specification is violated when constraints are not met.&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;Type:&lt;/span&gt; Closed Choice of Text&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;Values:&lt;/span&gt; pdf|pdfa|pdfd|xmp&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;clause&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;Description: &lt;/span&gt;This is the clause in the specification which is violated when constraints are not met.&lt;/div&gt;&lt;div&gt;&lt;span 
class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;Type:&lt;/span&gt; Text&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-weight: bold;&quot;&gt;Value:&lt;/span&gt; string, typically dot delimited integers&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We are continuing to work on our full set of these schemas for validation of PDF/A. These will then be available to &lt;a href=&quot;http://www.pdf-d.org/&quot;&gt;PDF/D Consortium &lt;/a&gt;members. During this process, we may add more features to the pdfaValidate schema.&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/8142685030609861116/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/02/xmp-pdfavalidate-schema.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/8142685030609861116'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/8142685030609861116'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/02/xmp-pdfavalidate-schema.html' title='XMP pdfaValidate Schema'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-3225392044900404906</id><published>2009-02-15T11:22:00.000-08:00</published><updated>2009-02-16T14:32:26.245-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="XMP"/><title type='text'>XMP Validator</title><content type='html'>I&#39;ve been working on building a better XMP validator. 
My idea was to define all the pre-defined schemas as pdfaExtension schemas and pre-load them into my validator. With this approach, I only need one validator (that validates pdfaExtension schemas) to validate all the pre-defined schemas as well as any user-defined schemas.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Part of pulling this off requires that I have RDF schemas for all the common pre-defined schemas. I thought I&#39;d start with the PDF/A identification schema since it appeared almost trivial. It didn&#39;t take long before I ran into &quot;undefined&quot; ground.  I thought I could use &quot;Closed Choice of Integer&quot; to define the &lt;span class=&quot;Apple-style-span&quot; style=&quot;font-style: italic;&quot;&gt;part &lt;/span&gt;property (only one choice: 1) and &quot;Closed Choice of Text&quot; to define the &lt;span class=&quot;Apple-style-span&quot; style=&quot;font-style: italic;&quot;&gt;conformance &lt;/span&gt;property (two choices: A or B). So, using the samples I found in the tech notes on the PDF/A Competence Center site, I set out to create my first pdfaExtension schema.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Soon I discovered that how to define a &quot;Choice&quot; is not defined in these tech notes. Next step was to wade through XMP documentation at Adobe. This doesn&#39;t really help much because, being new to this domain, it is not easy to tell when something is specific to XMP, RDF or pdfaExtension. On page 62 of the &lt;a href=&quot;http://www.aiim.org/documents/standards/xmpspecification.pdf&quot;&gt;XMP Specification&lt;/a&gt; a Closed Choice is described. A &lt;span class=&quot;Apple-style-span&quot; style=&quot;font-style: italic;&quot;&gt;vocabulary &lt;/span&gt;and &lt;span class=&quot;Apple-style-span&quot; style=&quot;font-style: italic;&quot;&gt;lists &lt;/span&gt;are mentioned.  I can only assume this means defining a list of values using Bag, Alt or Seq. 
An example would really help to clarify.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I&#39;m all ears ..&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;(Here is my work in progress: &lt;a href=&quot;http://www.pdf-d.org/downloads/sample.rdf&quot;&gt;sample.rdf&lt;/a&gt;)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Next Idea: pdfaValidate Schema&lt;/span&gt;&lt;/div&gt;&lt;div&gt;There are not a lot of examples out there. Simple examples showing how to define a Closed Choice field would be great. The same goes for defining &quot;Property Qualifiers&quot;. From what I read in the XMP specification they would be an ideal solution for me:&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-style: italic;&quot;&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-style: italic;&quot;&gt; &quot;Property qualifiers allow values to be extended without breaking existing usage.&quot;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;The specification has pretty block diagrams but no sample code.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In the absence of decent implementation documentation I decided to just take a swing at it and came up with something that I think is probably what the XMP Specification describes as &quot;Property Qualifiers&quot;. I created an RDF schema with two properties for validation:&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;status: Closed Choice of Text - required|prohibited|restricted|recommended|ignored&lt;br /&gt;&lt;/li&gt;&lt;li&gt;constraint: Text - regular expression for constraining simple literal fields for PDF/A compliance.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here it is as RDF. 
I included the definition of pdfaValidate schema and included a &quot;constrained&quot; version of my pdfaid RDF schema definition as an example: &lt;a href=&quot;http://www.pdf-d.org/downloads/pdfaValidate.rdf&quot;&gt;pdfaValidate.rdf&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-style: italic;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;Now I have what I need to make simple &quot;constrained&quot; RDF definitions for all the pre-defined schemas that we need to validate for PDF/A compliance. Moving right along ..&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/3225392044900404906/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/02/xmp-validator.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/3225392044900404906'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/3225392044900404906'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/02/xmp-validator.html' title='XMP Validator'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-6973333931020552412</id><published>2009-02-09T12:10:00.000-08:00</published><updated>2009-02-26T14:36:53.732-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Parser"/><category scheme="http://www.blogger.com/atom/ns#" term="PDF/D"/><title type='text'>More on Numbers</title><content type='html'>Earlier I discussed Numbers in a general post about &lt;a 
href=&quot;http://www.pragmaticpdf.com/2009/02/parsing-pdfd.html&quot;&gt;improving PDF for easier parsing&lt;/a&gt;.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I have two more notes to add on the subject of numbers.&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:x-large;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;&quot;.&quot; is not a number&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;PDF ISO-32000-1:2008 states that:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-style: italic;&quot;&gt;A real value shall be written as one or more decimal digits with an optional sign and a leading, trailing, or embedded PERIOD (2Eh) (decimal point).&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-style: italic;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;Adobe Acrobat Reader 9 clearly ignores this and accepts a single period as zero. This example (&lt;a href=&quot;http://www.pdf-d.org/downloads/7-3-3-t01-fail-b.pdf&quot;&gt;7-3-3-t01-fail-b.pdf&lt;/a&gt;) from our &lt;a href=&quot;http://www.pdf-d.org/pdf-compliance.htm&quot;&gt;PDF 1.4 test set&lt;/a&gt; clearly shows that the colors red (on the RGB page) and black (on the CMYK page) were parsed with no problem.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:small;&quot;&gt;1 0 . 
rg 72 72 72 72 re f&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:small;&quot;&gt;0 1 0 rg 72 216 72 72 re f&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:small;&quot;&gt;0 0 1 rg 72 360 72 72 re f&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:small;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:small;&quot;&gt;..&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:small;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:small;&quot;&gt;0 1 1 rg 72 72 72 72 re f&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:small;&quot;&gt;1 0 1 rg 72 216 72 72 re f&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  
style=&quot;font-size:small;&quot;&gt;1 1 0 rg 72 360 72 72 re f&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:small;&quot;&gt;. 0 0 rg 72 504 72 72 re f&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Numbers in PDF/D&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In addition to earlier notes on parsing numbers, a lone period used as a number will be considered an error in PDF/D. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Also, across our tens of thousands of test files, we have often seen number arguments in content streams terminated directly by the operator, like this:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:small;&quot;&gt;... 2 0 0 2 0 0cm ...&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Acrobat does not tolerate this but we have seen other PDF software (including our own) look past this error. 
PDF/D will require delimiters or whitespace to terminate number tokens.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/6973333931020552412/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/02/more-on-numbers.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/6973333931020552412'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/6973333931020552412'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/02/more-on-numbers.html' title='More on Numbers'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-7479555075492123033</id><published>2009-02-08T09:03:00.000-08:00</published><updated>2009-02-26T14:36:53.732-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Content Streams"/><category scheme="http://www.blogger.com/atom/ns#" term="Obsolete"/><category scheme="http://www.blogger.com/atom/ns#" term="PDF/D"/><title type='text'>Resources</title><content type='html'>Resources for a Page&#39;s Contents entry are defined in the Resources dictionary of that Page or inherited from one of the ancestor nodes of that Page in the page node tree.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For XObjects, patterns, Type 3 Fonts and annotations that have content streams, the Resources dictionary will be included in the content stream&#39;s dictionary. 
Unlike early versions of PDF, Resources cannot be inherited from the page tree for these objects (ISO 32000-1 mentions this obsolete functionality too).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;ProcSets are obsolete and are excluded from PDF/D Resource dictionaries.&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/7479555075492123033/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/02/resources.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/7479555075492123033'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/7479555075492123033'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/02/resources.html' title='Resources'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-1387994825400156027</id><published>2009-02-08T08:56:00.001-08:00</published><updated>2009-02-26T14:36:53.732-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Content Streams"/><category scheme="http://www.blogger.com/atom/ns#" term="Parser"/><category scheme="http://www.blogger.com/atom/ns#" term="PDF/D"/><title type='text'>BX and EX</title><content type='html'>The last time a content operator was added to PDF was with PDF 1.2.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Since we are defining a file format and not the behaviour of a conforming reader, dropping BX and EX falls within the PDF/D philosophy of minimizing cruft. 
In the unlikely event that a future version of PDF adds new operators, we can add them back in a future iteration of PDF/D so that conforming writers can use the new operators.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For now, there is no need for BX and EX: as with PDF/A, any operator in a content stream that is not defined in ISO 32000-1 is considered an error.&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/1387994825400156027/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/02/bx-and-ex.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/1387994825400156027'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/1387994825400156027'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/02/bx-and-ex.html' title='BX and EX'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-8370077308014181292</id><published>2009-02-05T14:16:00.001-08:00</published><updated>2009-03-04T15:38:35.967-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="ISO 32000"/><category scheme="http://www.blogger.com/atom/ns#" term="PDF"/><category scheme="http://www.blogger.com/atom/ns#" term="PDF/D"/><category scheme="http://www.blogger.com/atom/ns#" term="Undefined Behavior"/><title type='text'>Defining the Undefined</title><content type='html'>Despite being such an enormous specification, &lt;span class=&quot;blsp-spelling-error&quot; id=&quot;SPELLING_ERROR_0&quot;&gt;PDF&lt;/span&gt; ISO 32000-1 still has 
some holes in it.  Each time I encounter such a scenario I&#39;m going to write about it and start to lock down behavior for &lt;span class=&quot;blsp-spelling-error&quot; id=&quot;SPELLING_ERROR_1&quot;&gt;PDF&lt;/span&gt;/D. Please correct me if I miss something and if the scenario I&#39;m describing is actually defined.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Empty Object&lt;/span&gt;&lt;/div&gt;&lt;div&gt;The specification does not mention the meaning of empty indirect objects like:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;10 0 obj&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;&lt;span class=&quot;blsp-spelling-error&quot; id=&quot;SPELLING_ERROR_2&quot;&gt;endobj&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I&#39;ve tried to read between the lines to fathom the meaning of this emptiness but it simply is not defined. An obvious choice would be to treat such an object as the null object. 
I believe Acrobat Reader does this.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Variations on this theme that &lt;span class=&quot;Apple-style-span&quot; style=&quot;font-style: italic;&quot;&gt;are &lt;/span&gt;defined include an indirect object containing an empty dictionary or an indirect object that is simply the null object:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;11 0 obj&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;&lt;&lt;&gt;&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;&lt;span class=&quot;blsp-spelling-error&quot; id=&quot;SPELLING_ERROR_3&quot;&gt;endobj&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;12 0 obj&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;null&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;&lt;span class=&quot;blsp-spelling-error&quot; id=&quot;SPELLING_ERROR_4&quot;&gt;endobj&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In addition, indirect references to undefined objects are treated as the null object (7.3.10) and a dictionary entry whose value is null shall be treated the same as if the entry does not exist (7.3.7). 
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;&lt;span class=&quot;blsp-spelling-error&quot; id=&quot;SPELLING_ERROR_5&quot;&gt;PDF&lt;/span&gt;/D&lt;/span&gt;&lt;/div&gt;&lt;div&gt;To simplify the specification and reduce &lt;span class=&quot;blsp-spelling-corrected&quot; id=&quot;SPELLING_ERROR_6&quot;&gt;unnecessary&lt;/span&gt; bloat, the only null object shall be:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:medium;&quot;&gt;null&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Illegal in &lt;span class=&quot;blsp-spelling-error&quot; id=&quot;SPELLING_ERROR_7&quot;&gt;PDF&lt;/span&gt;/D:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;indirect references to undefined objects&lt;br /&gt;&lt;/li&gt;&lt;li&gt;empty indirect objects&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Best Practices in &lt;span class=&quot;blsp-spelling-error&quot; id=&quot;SPELLING_ERROR_8&quot;&gt;PDF&lt;/span&gt;/D:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In addition, we consider it a best practice to omit an entry from a dictionary rather than to include an entry with a null value.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;&quot;Zero&quot; Object&lt;/span&gt;&lt;/div&gt;&lt;div&gt;Another illegal behavior I&#39;ve seen in customer files is indirect references that look like this:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-size: medium;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;courier new&#39;;&quot;&gt;... 
0 0 R ...&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Sometimes I&#39;ve also seen this object actually defined like:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;0 0 obj&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;...&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-family:&#39;courier new&#39;;&quot;&gt;endobj&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;ISO 32000-1 clearly states that this is illegal which means it is obviously illegal for PDF/D too. In 7.3.10 the object number is defined as a positive integer. Last time I checked, 0 is not one of the positive integers.&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/8370077308014181292/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/02/defining-undefined.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/8370077308014181292'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/8370077308014181292'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/02/defining-undefined.html' title='Defining the Undefined'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-1382707352083946863</id><published>2009-02-03T11:04:00.000-08:00</published><updated>2009-02-26T14:36:53.732-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="File Structure"/><category scheme="http://www.blogger.com/atom/ns#" term="PDF/D"/><title type='text'>XRef stream vs xref</title><content type='html'>That didn&#39;t take long! I&#39;ve been urged to compromise on legacy features already.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Members of the PDF/A camp, including one of my software engineers (Sergey), are concerned that dropping the old style xref tables means that no PDF/A-1 file can possibly be PDF/D compliant.  I was looking forward with the hope that PDF/A-2 support would be good enough but they disagree.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, I&#39;ve decided to take a step back and allow old style xref tables to exist in PDF/D but with plenty of constraints:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;only a single xref table (no Prev field in Trailer)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;no hybrid files (no XRefStm in trailer)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;no deleted objects (no f type in the xref table except for the first entry)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;generation numbers always zero&lt;/li&gt;&lt;li&gt;only one section (implies consecutive object numbers starting at 1)&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;These simplifications mean that the end of the PDF file will always look like some variant of this:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-size: small;&quot;&gt;xref&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-size: small;&quot;&gt;0 4&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span 
class=&quot;Apple-style-span&quot; style=&quot;font-size: small;&quot;&gt;0000000000 65535 f &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-size: small;&quot;&gt;0000000009 00000 n &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-size: small;&quot;&gt;0000000122 00000 n &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-size: small;&quot;&gt;0000000175 00000 n &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-size: small;&quot;&gt;trailer&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-size: small;&quot;&gt;&lt;&lt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-size: small;&quot;&gt;  /Size 4&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-size: small;&quot;&gt;  /Root 2 0 R&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-size: small;&quot;&gt;&gt;&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-size: small;&quot;&gt;startxref&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-size: small;&quot;&gt;226&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-size: small;&quot;&gt;%%EOF&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The other valid PDF/D entries in the Trailer are ID, Info and Encrypt.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;As with my earlier PDF/D constraints, incremental updates and the dead objects that come with them are eliminated. 
So is linearization.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There you have it: the minimum required functionality of old style xref tables to make it possible for PDF/A-1 files to be PDF/D compliant.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/1382707352083946863/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/02/xref-stream-vs-xref.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/1382707352083946863'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/1382707352083946863'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/02/xref-stream-vs-xref.html' title='XRef stream vs xref'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1383812139713991234.post-5946447208098962666</id><published>2009-02-01T17:18:00.000-08:00</published><updated>2009-02-26T14:36:53.733-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="File Structure"/><category scheme="http://www.blogger.com/atom/ns#" term="Parser"/><category scheme="http://www.blogger.com/atom/ns#" term="PDF/D"/><title type='text'>Parsing PDF/D</title><content type='html'>&lt;div&gt;We’ve spent a lot of energy optimizing the C++ parser behind the Solid Documents products. Our C# parser has also been a learning experience. Once a PDF parser is well optimized, it will always end up spending the majority of the time in parsing numbers. 
This has proven to be true for both our native and managed code parsers.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Instead of focusing on the parser, I decided to take a look at the other half of the problem: the file format itself. What if I could change the PDF format to make it easier to parse? PDF/D will be constraining the features of PDF to a subset, so why not also make some improvements that will make parsing not just faster but also more reliable?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;More reliable, you ask?  Yes. Removing multiple ways of doing things obviously has minor performance benefits, but the bigger benefit is simplification of the code needed to deal with multiple variations of essentially the same thing.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;EOL&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;PDF defines end-of-line as one or two characters: 0x0D, 0x0A, or 0x0D followed by 0x0A.  ISO 32000-1 and ISO 19005-1 go to some effort to constrain the end-of-line characters surrounding the data of streams more tightly.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Why not just define end-of-line as 0x0A and call it good? That would still be 100% PDF/A compliant too. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;WhiteSpace&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;At the parser level, whitespace is a special case of a delimiter for PDF. In string objects, it is data and, including UTF-16BE, there are at least 15 valid data whitespace characters. 
What I’m talking about here is whitespace at the parser level, not inside string objects.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;While we are getting rid of 0x0D as an end-of-line character, we may as well get rid of a few of the whitespace alternatives too. Who needs tab (0x09) and form feed (0x0C) when space (0x20) and our new end-of-line (0x0A) will do just fine?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Comments&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Comments are a pain for the PDF parser developer. They can appear anywhere whitespace is legal and they continue to the next end-of-line. More importantly, who cares? Aside from the pseudo comments used for the PDF file header and end-of-file tokens, comments serve no purpose whatsoever. There are other ways of putting application-specific data in PDF files, if that was what you were thinking, so let’s toss comments out too.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Numbers&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The + character will have to go. It adds no value. Most PDF parsers first treat a number as an integer and then switch to a real mode as soon as a decimal point is encountered. Integer parsing is more efficient than real parsing. For this reason, whole numbers should be written as integers and not as reals. For example, favor 42 over 42. or 42.0&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Strings&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A lot can be done to simplify string parsing. We can start by removing the escaped end-of-line that allows multi-line strings. 
In addition, we can drop the idea of “matched parentheses” and simply escape all parentheses.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Hex strings are useful for representing byte strings as plain text and little else. Hex strings start with the same delimiter as dictionaries, which makes parsing more complex than if each used a unique delimiter: &lt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Since most PDF files are binary anyway, regular strings can be used to represent byte strings and hex strings are no longer needed.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Fixed Formats&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We should fix the format of the header and end-of-file comments. This way we can search for them as strings rather than parsing them. Given \n as 0x0A, something like “%PDF-1.5\n%ÿÿÿÿ\n” for the header and “\n%%EOF\n” for the end-of-file marker should be fine.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In addition, we should lock down the syntax surrounding the ‘obj’ and ‘endobj’ keywords so that damaged PDF files can be repaired more reliably. For example, “\nendobj\n\d+ 0 obj\n” makes an easy target for a regular expression search, where “\d+” matches the object number.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, any feedback or input? More ideas for putting a PDF parser on a diet? 
Comment here or find contact details at &lt;a href=&quot;http://www.pdf-d.org/&quot;&gt;PDF/D&lt;/a&gt;.&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.pragmaticpdf.com/feeds/5946447208098962666/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.pragmaticpdf.com/2009/02/parsing-pdfd.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/5946447208098962666'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1383812139713991234/posts/default/5946447208098962666'/><link rel='alternate' type='text/html' href='http://www.pragmaticpdf.com/2009/02/parsing-pdfd.html' title='Parsing PDF/D'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>