Wednesday, March 01, 2006

Write this way

Note: I wrote this to my fellow grad students in Integrative Biology at Berkeley. If you're not a biologist, I may not be talking to you. But feel free to read on and find out.


We write at least two kinds of papers. There are conceptual/theoretical papers, which push things forward by advancing new ideas, and data-heavy papers, most of which are written to either validate or contradict ideas already in existence--often, ideas that were first floated in conceptual/theoretical papers. And of course there are papers that do both, but I think there are relatively few of those; that is, a lot of data papers are prostituted in a skimpy cloak of buzzwords, but few present genuinely new ideas. At least in my field. Your mileage may vary.

That's nothing new, and if that's all I had to say I wouldn't have written this.

The two kinds of papers get written differently, too. Conceptual papers can sometimes be banged out in an afternoon, especially if the ideas have been knocking around the author's head for a while. You already know what the other kind are like, because you're writing them right now. To take a substantive data paper from conception to submission in under a month is almost inconceivable; to do the same in under a year is still remarkable. But I think most concept papers should be written and submitted as fast as possible; if it takes more than a month to make that happen, the ideas are probably getting stale (or maybe we're just unused to the idea of moving so fast).

Maybe we should think about publishing the two kinds of papers differently. By the time a data study has been completed and written up as a data paper, any contained ideas are years old anyway. For data papers, the delays inherent in the current system of academic publishing are irritating but not crippling. For many conceptual papers, they might be. One of the great lessons of life is that if you thought of it, someone else could, too. The closer you are to the cutting edge, the more you should worry about someone else beating you into print. Another way of putting that is, if you're not worried about someone beating you into print, maybe you should be working on something more important (then again, maybe you're so far ahead of the field that you don't have to worry about imitators).

Fast-track journals are not ideal for data papers. It takes too much stuffing to get all those details into three or four pages. Online publication loosened the girdle, allowing the jiggling fat of data to spill out as supplementary information, which is often many times longer than the printed paper. But Science and Nature are the gatekeepers to the land of jobs and tenure, and we all know it.

In a rational system we might keep fast-track journals mainly for conceptual papers, or for data papers that are most urgent and will suffer least by being condensed, and "regular" journals for most data papers, and everyone would understand that the two kinds of journals were not to be segregated by importance of work published therein. But I suspect that it wouldn't work. As long as there are fast-track and slow-track journals, everyone will want to be in the fast-track journals, and that yearning will drive the economy of scientific publication.

What if we did away with journals entirely? Ever heard of arXiv?

If you have, don't worry, this will be short. If you haven't, arXiv is an e-print archive for mathematical and scientific papers. It's geared towards math, physics, and computer science, but I suspect that's an artifact of history; there's no reason arXiv wouldn't work for any field*. Most papers in those fields are "published" on arXiv as soon as they're written, usually in advance of or concurrent with submission to a journal (there is a minimal amount of screening to make sure the papers aren't complete garbage). This has had a big effect on what journals in those fields are for. Instead of being primarily for the dissemination of knowledge, the journals are now mostly sources of status. They confer a sort of legitimacy on the papers they publish. Meanwhile, the rest of the field has long since digested the new information and moved on. The role of "I had the idea first and I can prove it!" claim-staking has moved from the journals to arXiv--which, given the number of abuses of the peer-review system that I know of, has got to be a good thing.

* Some divisions of the humanities may never adopt such a system, either because their members lack the technical competence, or because instant distribution of papers would only highlight their absence of content.

What if "I had the idea first and I can prove it!", henceforth called creativity credit, expanded to take in blogs? It's not as crazy as it might sound. People are already citing physics and math blogs in the comments on arXiv. To my mind, it's only a matter of time before links to blogs are included in the arXiv papers themselves, and once they're in arXiv there is probably no barrier to getting them into print.

I'm sure that each of you is sitting on an idea or two, or maybe a dozen, that you haven't told anyone about. You haven't had time to do the work, but the idea is promising enough that you don't want to just give it away, either. Maybe you'll get to it yourself someday. Maybe you'll farm it out to a friend, collaborator, or a grad student of your own.

But maybe you won't. Maybe you'll never get around to doing anything with it, or maybe someone else will take the idea to fruition in the meantime. What a waste! Either the idea sits around, going unused by the community, or someone else has it and does the work and takes the credit. Wouldn't it be nice to get creativity credit without having to do the work? It would be sorta like giving the idea to one of your students, only better, because you could give it to _any_ student, and you'd get some credit without having to force your name onto a paper that your student wrote (I detest advisors that do that, I hope you're not working with one, and I hope you don't become one).

For example, open any ornithology textbook and you'll read that birds lost their teeth to save mass and improve flight performance. What a load of crap! Enamel is dense stuff, but compared to the mass of the whole body the mass of Archaeopteryx's teeth was trivial. Never mind that an entire radiation of Cretaceous birds had teeth and did just fine with them, or that a lot of extant birds have monster beaks that weigh many times what their ancestral teeth did anyway. As far as I can tell, every group of vertebrates that ever evolved a beak got rid of their teeth shortly thereafter (ornithischian dinos did keep their teeth--but not in the beak part of the jaw). I don't know why birds traded teeth for beaks, but it seems obvious that they did, and that it had nothing to do with making their heads lighter. Admittedly, I haven't done a lot of heavy lifting here, but if the unsupported-and-obviously-wrong hypothesis is good enough to be textbook boilerplate, then my unsupported-but-probably-right hypothesis is surely good enough to publish as is. But I haven't published it, and neither has anyone else. I assume no one else has published it either because they haven't noticed it, or because (like me) they've got better things to do than scrounge up the data to support it.

Some of you are probably thinking, "Fuck you! If you're not willing to do the work, why should you get credit for the idea?" It's not that I don't value grunt-work, or producing data papers. I do. Generating reams of data is often a good way to discover new ideas, and publishing those reams of data is good science, because everyone can play with the data and maybe come up with surprising new insights.

But if there's no credit for ideas themselves, people have no incentive to distribute them. We're back to sitting on our ideas until a likely grad student comes along in a decade or two.

What I'd like is for ideas and work to be rewarded differently, and independently. I think that distribution of new ideas instantly, by way of some sort of Bio-arXiv, which might in turn link to or encompass the time-stamped blogs of everyone in the field, would be good for several reasons.

1. It would be a great source of project ideas for new students or for people looking for a change of pace. I've always admired people who list outstanding problems at the end of their papers, for just that reason.
2. It would be a good idea-sorter. If all the ideas currently in existence were freely available to everyone, and some of them still weren't being worked on, it might indicate that they're just not very interesting. Or maybe that they're extremely interesting, but hard to tackle empirically. You see? This would give a whole new way to parse the big ideas in our fields and pick out the promising avenues from the less promising.
3. It would be less wasteful, of time and effort. No more waiting years for new ideas to emerge. No more sinking years into a project only to find out that someone else had already solved the problem. And--hopefully--no more good ideas and good data languishing in unpublished theses and dissertations. With an arXiv-like system, filing and e-publication could be synonymous.

I had a few other pros lined up, but it's late and they've already slipped out of my mind. Too bad I didn't arXiv them. :-)

Finally, I don't think the question is, "Will biology catch up with arXiv-world?" It's not even, "When?" It's, "What am I going to do to get ready for the transition?"

Here's a tip: start writing down your ideas.

Responses welcome.


