The Road to Digital Publication – Part 5

by Amanda S. Green

Before we go into the actual upload process, including choosing where to sell your e-books and how to choose the best meta tags and description, let’s take a step back to talk about the steps leading up to conversion. I know it goes without saying, but back up your work to multiple locations throughout the creation process. You never know when a hard drive or thumb drive or external hard drive will fail. But there’s another reason why as well, one I was forcibly reminded of yesterday as I was working on converting a title.

Let’s start with a question. How many of you pay attention to the file size of your short story or novel? We are a people who have been trained to think of length in terms of pages, not bytes. But as writers, especially as writers who are thinking about self-publishing our own e-books, we need to train ourselves to think in terms of bytes. For one thing, most outlets do have a maximum upload size. For another, unless you are doing your own html coding from a text editor and your work has never seen the inside of a word processing program, you need to know what the “norm” is for those times when nasty code fragments attack.

No matter what word processing program you use, there is going to be background code written into it. Unlike with pure html coding where you set the style at the very beginning of the document with just a few lines of code, most word processing programs set it for every paragraph, including every detail and a lot that’s not needed. That can lead to hundreds, even thousands of lines of code that can and often do go wrong if you aren’t careful.
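To get a rough feel for how much of a file is background code rather than story, here is a minimal Python sketch. It assumes the HTML export fits in memory and counts anything outside tags (and outside style or script blocks) as your actual words. The function name and the sample strings are made up for illustration; this is not something any word processor or Sigil provides.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text a reader would see, skipping <style> and <script>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text_share(html_source: str) -> float:
    """Return the fraction of the file (by characters) that is readable text."""
    if not html_source:
        return 0.0
    parser = _TextCollector()
    parser.feed(html_source)
    parser.close()
    text = "".join(parser.chunks).strip()
    return len(text) / len(html_source)


if __name__ == "__main__":
    # Hypothetical one-paragraph samples: hand-coded vs. per-paragraph styling.
    lean = "<p>Hello world</p>"
    bloated = (
        '<p class="MsoNormal" style="margin:0in;font-size:12pt">'
        '<span style="font-family:Times">Hello world</span></p>'
    )
    print(f"lean:    {visible_text_share(lean):.2f}")
    print(f"bloated: {visible_text_share(bloated):.2f}")
```

The exact number matters less than a sudden drop between two versions of the same manuscript, which suggests extra code has crept in.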

Compounding the problem is what happens when you write in one program, open and edit in another, then go back to the original. This is what happens sometimes when you send your manuscript out to beta readers or someone who is helping edit before you send your manuscript out or before you publish it yourself. What can happen at this point is that the different programs have now put conflicting or even duplicate coding into the background of your document, bloating its size.

I’d known that moving between Word and Word Perfect and Open Office can screw with the visible formatting of a document. Sarah has been on the receiving end of my rants about what happens when you mix Word and Word Perfect, and Kate and I have commiserated about the Word and Open Office issues since we work in both Windows and Linux. But I hadn’t really thought about what was going on in the background.

Until yesterday.

Yesterday, I was working with a file that had come back from one of our editors. On the surface, everything looked great. I made a couple of small changes, made sure all the legal stuff was there and pulled up the cover to insert into the process. Saved the working file as both a final version DOC file and as a filtered HTML file (more on this later) and tried importing into Sigil, tried being the operative word.

For the first time ever, Sigil choked. It sputtered. It stopped working. So I did a forced shutdown of the program and tried again. Same result. I scratched my head. Checked to make sure I was importing the right file. Tried again. After all, third time’s the charm, right?


So I started looking at the file data. And almost dropped my teeth. The filtered HTML file was over 10 mb. WHAT?!? No way should it have been anywhere near that large. This was a novel, for Pete’s sake.  Less than 150,000 words. No interior illustrations. That meant something was not right.

Still stunned by the size of the HTML file, I checked the size of the DOC file I’d been using. Yep, you guessed it. It was too large as well. It came in weighing more than 3.5 mb. Time to check the original files. The author’s file came in at 787 kb. The file I’d sent out was 789 kb. Something was very wrong.

I looked at the code and quickly decided there was no way I was going to have time to go through every line of word processing code from different programs to find out what had slipped in. That left me with two options:

  • take the editor’s file, save as a TXT file, import into a text editor and then hand code it completely. This is, honestly, my preference. However, time constraints too often keep me from doing it any longer, or
  • take the editor’s file, save as a TXT file, import into a text editor and save again — getting rid of the last of the junk code — and then open it again in a word processing program to do a quicker, if dirtier, formatting for conversion.

As I said, I prefer working with pure html/css code and style. However, after having been out of the office for the better part of two weeks, and that after having been out earlier for another emergency, I didn’t have time to hand code a novel and still get to everything else I need to do this week. So I opted for the second choice – and it confirmed my suspicions. Even though it didn’t show on a visual check of either the word processing file or the underlying html code, additional “stuff” had been coded in.

How did I know? Very simple. When I saved out as a TXT file, all formatting should have been lost. Oh, returns were still there, but indents, italics, bolds, page breaks, header styles, all that should have been gone. So imagine my surprise when I opened the TXT file back up in Word and there were 3 to 5 spaces at the beginning of every paragraph. Or 1 space at the beginning of every chapter title. None of which were present in any of the previous files. There were other anomalies as well. EEK. That meant I had to go to every paragraph, remove the spaces, set the first line indents and then search and set header styles, font styles, etc. Tedious work.
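Removing stray spaces from the start of every paragraph is the kind of chore a short script can take over. The Python sketch below is one possible approach, assuming each paragraph sits on its own line in the TXT file. The function name is hypothetical, and it only strips spaces, tabs and non-breaking spaces; indents and styles still have to be set afterward in your word processor or in your HTML.

```python
def strip_leading_whitespace(text: str) -> str:
    """Remove spaces, tabs and non-breaking spaces from the start of each line.

    Blank lines are kept so paragraph and chapter breaks survive.
    A trailing newline at the very end of the text is not preserved.
    """
    cleaned = [line.lstrip(" \t\u00a0") for line in text.splitlines()]
    return "\n".join(cleaned)


if __name__ == "__main__":
    sample = " Chapter One\n\n    The first paragraph.\n   The second paragraph."
    print(strip_leading_whitespace(sample))
```

Run it on a copy of the TXT file, never on your only version, and compare the result against the original before moving on.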

And it is done. I’ll take another look at the file this morning to make sure my cat didn’t add anything interesting to it last night. You never know what a cat will do when your back is turned. Was it worth all the work? Absolutely. The file size now is where it should be. Including cover image, small images at every scene break, front and end material, the file weighs in at 1.15 mb for the HTML, one-tenth of the previous HTML file size. The DOC size is basically the same. Good so far. Even better, no problem importing the HTML into Sigil.

If I had the time to hand code, the file would be even smaller, but not by much.

The lesson: watch your file size. If it starts jumping dramatically without you having added anything to it, you have a problem. If you’ve been working between different programs, or if you’ve sent it out to someone and they are using a different program, that’s often the cause. It is also why you want to make sure you have at least one backup of your work at each phase of the process. That way you can check size and formatting, and you can work from one copy while checking it against what you received back from someone.
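If you keep a backup from each phase, checking for a size jump can be automated. Here is a minimal Python sketch, assuming you can list your versions in the order they were created. The file names, the function names and the "more than double" threshold are all assumptions chosen for illustration, and the sample sizes are rounded from the figures mentioned earlier in this post.

```python
import os


def flag_size_jumps(versions, factor=2.0):
    """Return names of versions more than `factor` times larger than the one before.

    `versions` is a list of (name, size_in_bytes) tuples in chronological order.
    """
    flagged = []
    for (_, prev_size), (name, size) in zip(versions, versions[1:]):
        if prev_size > 0 and size > prev_size * factor:
            flagged.append(name)
    return flagged


def sizes_on_disk(paths):
    """Pair each path with its current size in bytes."""
    return [(path, os.path.getsize(path)) for path in paths]


if __name__ == "__main__":
    # Hypothetical file names; sizes are approximations of the ones in the post.
    history = [
        ("author.doc", 787_000),
        ("sent_out.doc", 789_000),
        ("from_editor.doc", 3_500_000),
    ]
    print(flag_size_jumps(history))
```

With real files you would pass `sizes_on_disk([...])` your own list of backup paths. A flagged file is not proof of junk code, since adding images also grows a file, but it tells you where to start looking.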

Most of all, don’t be afraid to roll your sleeves up and dig into the underlying code when you see a problem.  If you don’t see the answer there and the problem still exists, step away for a bit and then come back. If you still don’t see the answer and there’s no one you can grab to ask, follow these steps:

  • Save as a TXT file
  • Open the TXT file in your text editor
  • hand code your HTML here and save
  • preview in your browser to make sure your coding is right
  • save again as HTML or HTML filtered,

or

  • Save as a TXT file
  • Open the TXT file in your text editor
  • save the file again as a TXT file
  • open in your word processing program
  • set first line indents
  • set font — type and size
  • set your heading styles for chapter headers
  • insert page breaks at the end of each chapter
  • go back and put in all italics and bolds you used in the original file
  • save as a DOC file
  • save as an HTML filtered file
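For the hand-coding route, a small script can at least lay down the skeleton. The Python sketch below turns a cleaned TXT file into bare HTML with the style set once at the top of the document instead of on every paragraph. It is a starting point under stated assumptions, not a finished converter: the CSS values are placeholders, the rule that any line starting with "Chapter " is a heading is a guess about your manuscript, and italics and bolds still have to be put back by hand.

```python
import html

# Assumed house style: first-line indents and a page break before each chapter.
STYLE = """p { margin: 0; text-indent: 1.5em; }
h2 { page-break-before: always; text-align: center; }"""


def txt_to_html(text: str, title: str) -> str:
    """Wrap each non-blank line of plain text in <p>, or <h2> for chapter lines."""
    body = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines only separate paragraphs
        escaped = html.escape(line, quote=False)
        if line.lower().startswith("chapter "):
            body.append(f"<h2>{escaped}</h2>")
        else:
            body.append(f"<p>{escaped}</p>")
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        f"<style>\n{STYLE}\n</style>\n"
        "</head>\n<body>\n" + "\n".join(body) + "\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    sample = "Chapter One\n\n   It was a dark & stormy night.\n"
    print(txt_to_html(sample, "Demo"))
```

Preview the result in your browser, as in the steps above, before importing it anywhere.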

Once you are satisfied with your HTML file, import it into Sigil (or your preferred conversion program). You can also upload the HTML file to most outlets now. However, I recommend waiting until you’ve converted to whatever format that outlet sells. This lets you see what your product will look like in their e-reader program. It is also another chance to check for issues with your conversion process. Save as an EPUB in Sigil (or the appropriate format in the program you are using). Then go in and set your meta tags, etc.

The moral of the story is that if you write in one word processing program and your betas or editors work in another, don’t work with the file that has been through the different programs. Most times, you’ll have no problems. But there is always that one time where it will blow up in your face. There is a reason why most legacy publishers and agents who do in-file editing and comments request you use Word. It cuts down on the visible formatting conflicts between programs, and it also cuts down on the potential for hidden coding conflicts.

Next time, I’ll go into the upload process, meta tagging, blurbs and descriptions and one last quality check of your work.

Edited to add:

One more word of caution. Whether you are writing, editing or acting as beta reader, be sure you turn off smart quotes in your preferences. I’d also add turning off all auto-correct options because they can cause issues. But smart quotes are a big problem and you’ll save yourself a lot of grief if you turn them off and never, ever turn them back on.
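If smart quotes have already made it into a file, for example one that came back from a beta reader, they can be swapped for straight quotes in the TXT stage. This Python sketch is one way to do it, not a replacement for turning the option off. The function name is hypothetical, and it only handles the four common curly quote characters.

```python
# Map the four common curly quote characters to their straight equivalents.
SMART_TO_STRAIGHT = str.maketrans({
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote / apostrophe
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
})


def straighten_quotes(text: str) -> str:
    """Replace curly quotes with plain ASCII quotes; everything else is untouched."""
    return text.translate(SMART_TO_STRAIGHT)


if __name__ == "__main__":
    print(straighten_quotes("\u201cDon\u2019t,\u201d she said."))
```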

11 thoughts on “The Road to Digital Publication – Part 5”

    1. Actually, the biggest problem is getting my head back into the game. I’d forgotten how mentally and emotionally exhausting waiting to hear from doctors, worrying about family, etc., can be. And yes, I could have sent this back to the tech to do, but I wanted to try to track down the problem so it wouldn’t happen again.

      1. Well, dealing with “software issues” is more “fun” than dealing with family medical problems and/or family deaths. [Sad Smile]

        I’d prefer to be dealing with a major coding problem than dealing with Mom’s problems. (Yes, I’m a little grumpy right now.)

        1. Paul, sorry you’re having to deal with problems. Been there, been doing that and completely understand the grumpy. Take care.

  1. I am switching from Open Office to Word. (I got a PC tablet for school and am using the handwriting functions of Microsoft Office so…)
    I’m not looking forward to setting up my initial manuscript template and finding out where to turn stuff off. I’ve got a word counter on my Open Office template that I found once and could never find again 😉

  2. It doesn’t always work but usually save as RTF. Then load the RTF then resave as doc (or docx). What I did for Save the dragon was export from Open Office, which was a lot cleaner HTML than MS Word, and then run a custom clean up program to sort out the places where OO had gone mad (OO had a tendency to have redundant styles at the ends of paragraphs).

    1. The problem with that is that some RTF programs have a lot of hidden junk as well. I know. I’ve had it bite me before. No, the problem with the file in question was that it had gone through so many different programs and operating systems it was well and truly screwed. It doesn’t happen often, but it does sometimes. But thanks for the advice.

      1. And then there’s “Track changes” which is guaranteed to screw things up. Even if you share between the *same* versions of word!

        1. Oh yeah. At least you can turn that off after you’ve accepted/rejected and save out again as another file to help clear out that sort of junk. Another issue comes when someone takes a DOC file, uses Calibre to convert to RTF and then someone else opens in Word or OO. The amount of added code is amazing.
