SoatDev IT Consulting
SoatDev IT Consulting
  • About us
  • Expertise
  • Services
  • How it works
  • Contact Us
  • News
  • February 8, 2024
  • Rss Fetcher

If you save a file as a PDF twice, you won’t get exactly the same file both times. To illustrate this, I created an LibreOffice document containing “Hello world.” and saved it twice, first as humpty.pdf then as dumpty.pdf. Then I compared the two files.

    % diff humpty.pdf dumpty.pdf
    Binary files humpty.pdf and dumpty.pdf differ

That’s curious. Both files are the same size, 7260 bytes, but something is different inside.

Next I dumped both files to hexadecimal and compared the output.

.   % diff  

This produced two ranges of differences. Here's the first:

    >
    ---
    > 00001a60: ... 3339 ...  064339-06'00')>>

The files differ in two consecutive bytes. The ASCII representation at the end of the lines shows what these bytes mean. Apparently these two bytes are part of a time stamp. The first was produced at 6:43:22 this morning and the second was produced 17 seconds later at 6:43:39.

There's another block of differences further down the file. I'll leave out the hex representation of the bytes to save space and just include the positions and the ASCII representation.

    .
     ]./DocChecksu
    >.startxref
    ---
    > 00001bc0: ...  13 0 R./ID [ <7D
    > 00001bd0: ...  1441609E44A5446A
    > 00001be0: ...  8A0F9A4E96FF49>.
    > 00001bf0: ...  <7D1441609E44A54
    > 00001c00: ...  46A8A0F9A4E96FF4
    > 00001c10: ...  9> ]./DocChecksu
    > 00001c20: ...  m /A7A3CD305537E
    > 00001c30: ...  B3DC35BA5EB4678F
    > 00001c40: ...  EDA.>>.startxref

The text DocChecksum jumps out. This looks like a 32-bit check sum. If I had to guess, I'd say it's probably CRC-32. And apparently there's some sort of 32-bit hash before the checksum: CB4 ...B6A in humpty.pdf and 7D1...F49 in dumpty.pdf. This must be some sort of hash. The hash is repeated twice in each file. Maybe this is some sort of versioning information, and the hash is repeated because the initial and final versions of the file are the same.

The fact that the files were saved 17 seconds apart changed two bytes in the timestamps. But changing these two bytes caused the two 32-byte hash codes to change.

Related posts

  • Checksum polynomials
  • Vehicle Identification Number checksum
  • Notes on computing hash functions

The post If you save a file as PDF twice, you get two different files first appeared on John D. Cook.

Previous Post
Next Post

Recent Posts

  • Deel wants Rippling to hand over any agreements involving paying the alleged spy
  • Telecommunications & IT Companies Form the Next Frontier for Growth & Cyber Crime across Africa
  • Bolt Launches Flight Tracking for Seamless Airport Pick-Ups
  • Uber eyes B2B logistics push in India through state-backed open commerce network
  • VUZ gets $12M to scale immersive video experiences across emerging markets and the U.S.

Categories

  • Industry News
  • Programming
  • RSS Fetched Articles
  • Uncategorized

Archives

  • May 2025
  • April 2025
  • February 2025
  • January 2025
  • December 2024
  • November 2024
  • October 2024
  • September 2024
  • August 2024
  • July 2024
  • June 2024
  • May 2024
  • April 2024
  • March 2024
  • February 2024
  • January 2024
  • December 2023
  • November 2023
  • October 2023
  • September 2023
  • August 2023
  • July 2023
  • June 2023
  • May 2023
  • April 2023

Tap into the power of Microservices, MVC Architecture, Cloud, Containers, UML, and Scrum methodologies to bolster your project planning, execution, and application development processes.

Solutions

  • IT Consultation
  • Agile Transformation
  • Software Development
  • DevOps & CI/CD

Regions Covered

  • Montreal
  • New York
  • Paris
  • Mauritius
  • Abidjan
  • Dakar

Subscribe to Newsletter

Join our monthly newsletter subscribers to get the latest news and insights.

© Copyright 2023. All Rights Reserved by Soatdev IT Consulting Inc.