xparrot:

So out of morbid curiosity I tried Tumblr’s export feature to download my blog. Pressed the button, waited. After a few hours I got an email saying it was ready.

For my relatively modest blog of ~5K posts/reblogs, it produced a zip file of about 12 GB. It didn’t say how big the file was when downloading; I just had to wait until it was done. Once downloaded, Windows 10’s native zip management couldn’t handle it, insisting it was a broken archive. An ancient dusty install of 7-Zip popped it right open, though.

Inside were two folders and one .xml file:

“Media” clocks in at 12 GB and consists of 14K gifs, jpgs, pngs, mp3s, mp4s, and movs. “Posts” has a folder of individual stripped-down HTML docs of every post, plus an xml doc that also seems to be every post in a single 21 MB file. (This doc does appear to include my 800 draft posts.) And the messages .xml file is all your messages/chats (which I admit is nice to have a backup of, though the xml is of course unreadable without some kind of reader).

But the best, the BEST (sarcasm level 8) part of this is that all those HTML/XML post files? They link back to the files ON TUMBLR. They don’t have internal links to the files in that Media folder. So when you open one of your downloaded posts in your browser, the images you’re seeing are from Tumblr’s servers – for as long as those images are posted and you’re online.

So this reblogged post, ‘capped from my downloaded archive:

uses this image:

 <img src="https://66.media.tumblr.com/efd7a566507f52caf565b2f6a1fe2dc1/tumblr_pik7a6RyYO1qkusc6o4_500.png "/>
                           </p>

Meanwhile, files in the Media folder are just named by the post ID number, not those strings. It would be possible, I’m guessing, to run a conversion that switches all the media links over to the ID-numbered files in the Media folder – but that’s beyond my limited regex skills. Without that, you’re left with a pile of media with no organization whatsoever, and a folder full of posts with broken media.
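For anyone who wants to attempt that conversion, here is a rough Python sketch of the idea, not a tested tool. It leans on several guesses that the export does not confirm: that each HTML post file is named after its post ID, that media files are named `<post_id>.<ext>` or `<post_id>_<n>.<ext>`, that the images in a post appear in the same order as its sorted media files, and that the HTML docs sit in a subfolder of “Posts” (called `html` here purely for illustration). The function names and the `../../Media` relative path are made up for this example.

```python
import re
from pathlib import Path

# Matches src="..." values that point at a Tumblr media host,
# e.g. https://66.media.tumblr.com/... (stray whitespace before the
# closing quote, as in the snippet above, is tolerated).
TUMBLR_SRC = re.compile(r'src="https?://[^"/]*media\.tumblr\.com/[^"]*"')


def local_media_for(post_id, media_names):
    """Return the media file names that seem to belong to post_id.

    ASSUMPTION: files are named "<post_id>.<ext>" or "<post_id>_<n>.<ext>".
    Plain string sorting is used, so "_10" sorts before "_2".
    """
    return sorted(
        name for name in media_names
        if name.startswith(post_id + ".") or name.startswith(post_id + "_")
    )


def rewrite_media_links(html, post_id, media_names, media_dir="../../Media"):
    """Swap the k-th Tumblr-hosted src in html for the k-th local file.

    ASSUMPTION: images appear in the post in the same order as the
    sorted local files. Links with no local match are left untouched.
    """
    local = iter(local_media_for(post_id, media_names))

    def swap(match):
        name = next(local, None)
        if name is None:
            return match.group(0)  # nothing local left: keep the remote link
        return f'src="{media_dir}/{name}"'

    return TUMBLR_SRC.sub(swap, html)


def convert_archive(root):
    """Rewrite every post under root in place (hypothetical folder layout)."""
    root = Path(root)
    media_names = [p.name for p in (root / "Media").iterdir()]
    for page in (root / "Posts" / "html").glob("*.html"):
        html = page.read_text(encoding="utf-8")
        fixed = rewrite_media_links(html, page.stem, media_names)
        page.write_text(fixed, encoding="utf-8")
```

If the guesses hold, calling `convert_archive("path/to/unzipped/export")` on a copy of the archive would point each post at its local files. Try it on a copy first, since it overwrites the HTML docs, and spot-check a few posts with multiple images, because the ordering guess is the shakiest part.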

In conclusion – you’re probably better off using a 3rd-party downloader.

(Things like this make me wonder if they’re not trying to repackage the site for advertisers at all but are just trying to kill it quick. Then again, trying to ascribe any kind of firm rational motive to whoever’s in charge of this feels like accusing a clogged toilet of having an agenda…)