Lies, Damn lies, and Statistics

About that “most books only sell 12 copies” bunkum, which by all accounts runs perilously close to perjury… the statistics have been mangled worse than most of the folks I see in physical therapy. Here’s a look at what’s behind the smokescreen.

Courtesy of J.L. Curtis, we can find a lot more information at Lincoln Michel’s Countercraft Substack: https://countercraft.substack.com/p/no-most-books-dont-sell-only-a-dozen

Unlike YouTube, where you should never read the comments (well, Art History by Travis Lee Clark has really heartwarming comments, but that’s one of the exceptions to the rule), this article has some really interesting comments.

Including one from Kristen McLean of NPD BookScan, which puts a few REAL numbers on what she sees…

Hey y’all, it’s Kristen McLean, lead industry analyst from NPD BookScan. I thought I would chime in with some numbers here, since that statistic from the DOJ is super-misleading, and I’m not sure where it originally came from, since we did not provide it directly.

It is possible it came from our data, and was provided by one of the publisher parties, but based on the 58,000 figure, it’s not obvious what exactly it includes in terms of “publisher frontlist”. 58,000 titles is way too small a number for “all frontlist books published in a year by every publisher”–that’s more like 487,000 frontlist titles–so it’s clear it’s a slice but I’m not sure HOW it was sliced.

NPD BookScan (BookScan is owned by The NPD Group, not Nielsen, BTW), collects data on print book sales from 16,000 retail locations, including Amazon print book sales. Included in those numbers are any print book sales from self-publishing platforms where the author has opted for extended distribution and a print book was sold by Amazon or another retailer. So that 487K “new book” figure is all frontlist books in our data showing at least 1 unit sale over the last 52 weeks coming from publishers of all sizes, including individuals.

Lots of press outlets have been calling about it today, so I did a little digging to see if I could reverse-engineer the citation, and am happy to share our numbers here for clarity.

Because this is clearly a slice, and most likely provided by one of the parties to the suit, I decided to limit my data to the frontlist sales for the top 10 publishers by unit volume in the U.S. Trade market. My ISBN list is a little smaller than the one quoted in the DOJ, but the principles will be the same.

The data below includes frontlist titles from Penguin Random House, Simon & Schuster, Hachette Book Group, HarperCollins, Scholastic, Disney, Macmillan, Abrams, Sourcebooks, and John Wiley. The figures below only include books published by these publishers themselves, not publishers they distribute.

Here is what I found. Collectively, 45,571 unique ISBNs appear for these publishers in our frontlist sales data for the last 52 weeks (thru week ending 8-24-2022).

In this dataset:

>>>0.4% or 163 books sold 100,000 copies or more

>>>0.7% or 320 books sold between 50,000-99,999 copies

>>>2.2% or 1,015 books sold between 20,000-49,999 copies

>>>3.4% or 1,572 books sold between 10,000-19,999 copies

>>>5.5% or 2,518 books sold between 5,000-9,999 copies

>>>21.6% or 9,863 books sold between 1,000-4,999 copies

>>>51.4% or 23,419 sold between 12-999 copies

>>>14.7% or 6,701 books sold under 12 copies

So, only about 15% of all of those publisher-produced frontlist books sold less than 12 copies. That’s not nothing, but nowhere as janky as what has been reported.

BUT, I think the real story is that roughly 66% of those books from the top 10 publishers sold less than 1,000 copies over 52 weeks. (Those last two points combined)

And less than 2% sold more than 50,000 copies. (The top two points)

Now data is a funny thing. It can be sliced and diced to create different types of views. For instance we could run the same analysis on ALL of those 487K new books published in the last 52 weeks, which includes many small press and independently published titles, and we would find that about 98% of them sold less than 5,000 copies in the “trade bookstore market” that NPD BookScan covers. (I know this IS a true statistic because that data was produced by us for The New York Times.)

But that data does not include direct sales from publishers. It does not include sales by authors at events, or through their websites. It does not include eBook sales which we track in a separate tool, and it doesn’t include any of the amazing reading going on through platforms like Substack, Wattpad, Webtoons, Kindle Direct, or library lending platforms like OverDrive or Hoopla.

BUT, it does represent the general reality of the ECONOMICS of the publishing market. In general, most of the revenue that keeps publishers in business comes from the very narrow band of publishing successes in the top 8-10% of new books, along with the 70% of overall sales that come from BACKLIST books in the current market. (Backlist books have gained about 4% in share from frontlist books since the pandemic began, but that is a whole other story.)

The long and short of it is publishing is very much a gambler’s game, and I think that has been clear from the testimony in the DOJ case. It is true that most people in publishing up to and including the CEOs cannot tell you for sure what books are going to make their year. The big advantage that publisher consolidation has brought to the top of the market is deeper pockets and more resources to roll those dice. More money to get a hot project. More money to influence outcomes through marketing, more access to sales and distribution mechanisms, and easier access to the gatekeepers who decide what books make it onto retailers’ shelves. And better ability to distribute risk across a bigger list of gambles.

It is largely a numbers game and I’m not just saying that because I’m a numbers gal. It’s a tough business.

Hope this is helpful.

***
So there’s some truth behind a tall tale.
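
For anyone who wants to check the arithmetic in the comment above, here is a minimal sketch in Python. The bucket counts are copied straight from the comment; the names `BUCKETS`, `shares`, and `combined_share` are made up for this illustration and have nothing to do with any BookScan tool.

```python
# Bucket counts copied from the quoted BookScan comment
# (top-10 publishers, frontlist ISBNs, 52 weeks through week ending 8-24-2022).
# The labels and function names are hypothetical, chosen only for this sketch.
BUCKETS = [
    ("100,000 or more", 163),
    ("50,000-99,999", 320),
    ("20,000-49,999", 1015),
    ("10,000-19,999", 1572),
    ("5,000-9,999", 2518),
    ("1,000-4,999", 9863),
    ("12-999", 23419),
    ("under 12", 6701),
]


def shares(buckets):
    """Return (total, {label: percent of total, rounded to one decimal})."""
    total = sum(count for _, count in buckets)
    return total, {label: round(100 * count / total, 1) for label, count in buckets}


def combined_share(buckets, labels):
    """Percent of all titles that fall in any of the named buckets."""
    total = sum(count for _, count in buckets)
    picked = sum(count for label, count in buckets if label in labels)
    return round(100 * picked / total, 1)


if __name__ == "__main__":
    total, pct = shares(BUCKETS)
    print(f"total ISBNs: {total}")
    for label, count in BUCKETS:
        print(f"{label:>16}: {count:>6} titles, {pct[label]}%")
    print("under 1,000 copies:", combined_share(BUCKETS, {"12-999", "under 12"}), "%")
    print("50,000 or more:", combined_share(BUCKETS, {"100,000 or more", "50,000-99,999"}), "%")
```

The eight buckets add up to the 45,571 ISBNs quoted, the bottom two come to 66.1% (the “roughly 66%” under 1,000 copies), and the top two come to 1.1% (the “less than 2%” over 50,000).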

If you want other tall tales with some truth behind them, but far more hilarious? My Calmer Half is in this hunting stories anthology:

How NOT To Shoot A Fish and Other Deer That Got Away, available on Amazon!

17 thoughts on “Lies, Damn lies, and Statistics”

  1. I think her response would have been improved if she’d gone to at least the news report of the original claim, if not gotten the actual statement from the court record– and I’m a little gob-smacked at the lead industry analyst from BookScan having to go to the New York Times to get the BookScan numbers.

    Now, this:
    For instance we could run the same analysis on ALL of those 487K new books published in the last 52 weeks, which includes many small press and independently published titles, and we would find that about 98% of them sold less than 5,000 copies in the “trade bookstore market” that NPD BookScan covers.

    Looks like it might be fairly close to the original subject of the testimony, with a dropped zero and a different sample-period– for example, at the same trial they mentioned how ….I think it’s Random Penguin House? …. had somewhat recently done a massive downsizing in their romance publishing, and I seem to remember there were some supply chain issues for paper and such, thus fewer books published (and fewer people signing up for a distribution deal on their small-press item, with the in-person stores closed).

  2. For anybody who hasn’t seen the claim in the wild, this seems to be the source:
    https://mailchi.mp/hotsheetpub/bn-backlist?e=d650b3b0a0
    DOJ vs PRH: The Key Questions of the Trial
    with the specific claim being:
    during the trial, a couple of depressing statistics were shared: of the 58,000 trade titles published per year, fully half of those titles “sell fewer than one dozen books.” (Not a typo, that’s one dozen.) More broadly, 90 percent of titles sell fewer than 2,000 units. Even a small advance of a few thousand dollars would not earn out at standard royalty rates.

  3. “51.4% or 23,419 sold between 12-999 copies” and “14.7% or 6,701 books sold under 12 copies”

    Two thirds of the new dead-tree books released don’t crack 1000 copies? 70% of sales are the backlist? Are you kidding me? That’s ridiculous.

    That means that The Phantom, aka some random house-painter with a PC and some time on his hands, did pretty good compared to the big publishing houses with all the experts and all the push. I’m in the middle of that 51% pack.

    Frankly, that a newb solo-effort book should be able to do that well is not reasonable. I call shenanigans.

    Could it be they are not addressing the actual customers? I’d be very interested to know what, particularly, is selling out of their back-lists. How many new books that die are Woke? How many old books that keep selling are Woke?

    I will say this, my trips to various bookstores over the years since around 2012 have revealed shrinking SF/F sections that contain half a book case of Tolkien and a scattering of other old titles.

    https://phantomsoapbox.blogspot.com/2017/09/somebody-called-me-liar-on-interwebz.html

    This post of mine was University of Waterloo main book store in 2017, which had two (2) book cases of SF/F at the time. (One of the Vile666 flying monkeys doubted my word, so I took pictures.) Waterloo is notable for being -the- Engineering school everybody wants to go to in Canada. The nation’s future top technologists and science nerds are educated there. They aren’t buying SF? Really?

    When I went to school dinosaurs roamed the Earth and the SF/F section was a whole row in McMaster University bookstore, SF on one side and F on the other. Just sayin’.

    1. Even if we’re assuming that the data was sliced in a massively stupid manner– basically, did a raw draw of all data posted for ‘new’ books in the last 12 months, including stuff that’s in the system and launched yesterday– that is still pretty….wow.

      1. They must have access to better numbers, one would think. Obviously anything they say in the lawsuit is going to be massaged to fit their argument, but the killer is the 70% of sales is back-catalog.

        That means ALL new titles are 30% of sales. I see a number like that, if I’m the CEO I’m on the phone firing my editors and beating the bushes for new talent to save my mortally wounded company. But no, they are acting like everything is Just Fine.

        1. Worse, these are with the numbers where we know the backlog folks are told they sold exactly as many books as is required to not get their book rights back.

          And they *don’t* involve buying the books directly from the publisher, like schools do for mass purchasing school books.

    2. ‘12 is about when I started reading eBooks because the local B&N kept reducing its SF/F section. At the time I think it was down to 3 cases, front and back. Last time I looked, about 2 years ago, it was half that. When I was in high school the Star Trek section by itself was that big.

  4. By the corrected numbers, I’m in the top 11% of sellers, and I don’t do much print and I don’t do a lot of advertising. So someone like Christopher Nuttall, Larry Correia, or Chris Kennedy is in the elite of the elite. Note that two of those are indie/tiny press publishers, and Larry is Larry.

    1. Nah, Larry Correia is just a D-list author. 😀

      He’ll tell you so himself. See Larry’s List if you doubt it.

      After all, he merely built his Evil Lair on Yard Moose Mountain with the meager proceeds of his writing habit.

      1. I have seen that list. I’m thinking maybe he needs to take another think regarding his position on there, given the -amazingly bad- numbers listed above.

        All new books are 30% of annual sales? 3.7% or about 1500 titles break 10K sales? That puts him firmly in the top 2% or better. Unicorn!

  5. “It’s all a front for money laundering” moves up in probability….

    They are not counting direct sales to libraries and library systems, or direct paper sales through Amazon, or ebook sales of any kind. Ridiculous.

      1. The most expensive book I’ve gotten through Inter-Library Loan, and that was still in print, was a study on the background of international water law. $800, and 50 libraries had copies, fewer would allow it to leave the reference section. However, it is one of those books that if you need it, you really, really need it. Like the $1200 (and now out of print and in need of updating) petro-geology guide to the Iranian Plateau and surrounding areas.
