The Dangers of Preprint Servers

Now that I have moved (at least partially!) into academic administration, my colleagues ask for advice on publishing strategy.  A situation has occurred with one of my colleagues that has made me question my understanding of precedence of research results.  I’d love some feedback to help me understand what went wrong here.

My colleague, call him R1, proved a couple of theorems in a fast-moving subfield of optimization.  He wrote up the results and on March 1 submitted the paper to The Slow but Prestigious Journal of Optimization, which I will call SJ (the characters get confusing, so the inset Cast of Characters may help).  He also posted the paper on the well-known eprint servers Optimization Online and ArXiv (OO/A).  The paper began its slow, arduous, and thorough refereeing at SJ.

On August 1, R1 received a perky email from researcher R2 with a paper attached saying “Thought you might be interested!”.  The paper contains a subset of R1’s results with no reference to R1’s work.  This is not a preprint, however, but an “article in advance” for a paper published in Quick and Fast Journal of Optimization, QJ.  QJ is a journal known for its fast turn-around time.  The submission date of R2’s work to QJ is March 15 (i.e. two weeks after R1 posted on OO/A and submitted to SJ).

R1 lets R2 know of his paper, pointing to OO/A.  R1 never hears anything more from R2.

R1 contacts the editors of QJ suggesting some effort be made to correct the literature with regard to the precedence of this work.  QJ declines to change R2’s paper since it has already been published, and the large commercial publisher (LCP) does not allow changes to published articles (and, besides, R2 won’t agree to it).

OK, what about publishing a precedence acknowledgement in the form of a letter to the editor?  I find this somewhat less than satisfying since the letter to the editor is separate from the paper and no one reads journals as “issues” anymore.  But at least QJ would be attempting to correct this mess.  And here is where I get both confused and outraged.  The editor’s response is:

Also, during consultations with [LCP]’s office, it became clear that LCP does not approve of publishing a precedence acknowledgement towards a paper in public domain (preprint server). I hope you would agree that the fact that a paper is posted on a preprint server does not guarantee its content is valuable or even correct – such (partial) assurances can be obtained only during peer-review process.

Hold on, what?  QJ and LCP are saying that they will ignore anything that is not in a peer-reviewed journal!  R2 does not have to say anything about R1’s result since it has not been refereed.  Further, unless R1 gets the paper published in SJ with the March 1 submission date, QJ will not publish a precedence acknowledgement.  If the paper gets rejected by SJ and my colleague then publishes in Second Tier Journal on Optimization, clearly the submission date there will be after QJ’s date so R2 takes precedence.  If the paper doesn’t get published, then R2 and QJ will simply act as if R1 and OO/A do not exist.

I find this situation outrageous.  I thought the point of things like OO/A is to let people know of results before journals like SJ finish their considered process of stamping their imprimatur on papers.  If the results are wrong, then following authors at least have to point out the flaws sometime during the process.

Now I don’t know if R2 saw R1’s paper at OO/A.  But if he did, R1’s posting at OO/A at least warned him that he had better get his paper submitted.  Of course, R1’s paper might have helped R2 get over some roadblocks in R2’s proof or otherwise aided him in finishing (or even starting, though there are no overt signs of plagiarism) his paper.  But it seems clear there was absolutely no advantage for R1 to post on OO/A, and clear disadvantages to doing so.  R1 would have been much better served to keep his results hidden until acceptance at SJ or elsewhere.

This all seems wrong.  R1 put out the result to the public first.  How did R1 lose out on precedence here?  What advice should I be giving colleagues about this?  Here is what I seem to have learned:

  1. If you don’t have any ideas for a paper, it is a good idea to monitor OO/A for results.  If you find one, quickly write it up in your own words and submit it to QJ (but don’t post on OO/A).  If you get lucky and the referees miss OO/A (or follow LCP’s rule and ignore anything not in the refereed literature), then you win!
  2. Conversely, if you have a result, for God’s sake, don’t tell anyone.  Ideally, send it to QJ who can get things out fast.  If you must, submit it to SJ but don’t post the preprint, present it at INFORMS, or talk about it in your sleep.

This all seems perverse.  How should I think about this?  Has anyone faced something similar?  Does anyone see a satisfactory resolution to this situation?  And, for those on editorial boards, does your journal have policies similar to or different from those of LCP?  Is this ever discussed within journal boards?  Is all this a well-known risk?

 

Which Average do you Want?

Now that I am spending a sentence as an academic administrator in a business school (the Tepper School at Carnegie Mellon University), I get first-hand knowledge of the amazing number of surveys, questionnaires, inquiries, and other information gathering methods organizations use to rank, rate, or otherwise evaluate our school. Some of these are “official”, involving accreditation (like AACSB for the business school and Middle States for the university). Others are organizations that provide information to students. Biggest of these, for us, is Business Week, where I am happy to see that our MBA program went up four positions from 15th to 11th in the recent ranking. We administrators worry about this so faculty don’t have to.

Responding to all these requests takes a huge amount of time and effort. We have a full-time person whose job is to coordinate these surveys and to analyze the results of them. Larger schools might have three or four people doing this job. And some surveys turn out to be so time-intensive to answer that we decline to be part of them. Beyond Grey Pinstripes was an interesting ranking based on sustainability, but it was a pain to fill out, which seems to be one reason for its recent demise.

As we go through the surveys, I am continually struck by the vagueness in the questions, even for questions that seem to be asking for basic, quantitative information. Take the following commonly asked question: “What is the average class size in a required course?”. Pretty easy, right? No ambiguity, right?

Let’s take a school with 4 courses per semester, and two semesters of required courses. Seven courses are “normal” classes run in 65-student sections, while one course is divided into 2 half-semester courses, each run in 20-student seminars (this is not the Tepper School but illustrates the issue). Here are some ways to calculate the average size:

A) A student takes 9 courses: 7 at 65 and 2 at 20 for an average of 55.
B) If you weight over time, it is really 8 semester-courses: 7 at 65 and 1 at 20 for an average of 59.4.
C) There are about 200 students, so each normal course needs 3 sections and each half-semester course needs 10. The school therefore offers 21 sections of 65-student classes and 20 sections of size 20 for an average of 43.

Which is the right one? It depends on what you are going to use the answer for. If you want to know the average student experience, then perhaps calculation B is the right one. An administrator might be much more concerned about calculation C, and that is what you get if you look at the course lists of the school and take the average over that list. If you look at a student’s transcript and just run down the size for each course, you get A.
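To make the three calculations concrete, here is a rough Python sketch of the hypothetical school above. The function names are mine, the cohort of exactly 200 students stands in for “about 200”, and I assume every section runs full, so treat this as an illustration of the arithmetic rather than how any survey actually defines the number.

```python
# Hypothetical school from the example: 7 full-semester required courses
# taught in 65-student sections, plus 1 course split into two half-semester
# seminars of 20 students each. COHORT = 200 stands in for "about 200".
FULL_COURSES = 7
FULL_SIZE = 65
HALF_COURSES = 2
HALF_SIZE = 20
COHORT = 200


def transcript_average():
    """(A) Every course on a student's transcript counts once."""
    total = FULL_COURSES * FULL_SIZE + HALF_COURSES * HALF_SIZE
    return total / (FULL_COURSES + HALF_COURSES)


def time_weighted_average():
    """(B) A half-semester course gets half the weight of a full course."""
    weight = FULL_COURSES * 1.0 + HALF_COURSES * 0.5
    total = FULL_COURSES * 1.0 * FULL_SIZE + HALF_COURSES * 0.5 * HALF_SIZE
    return total / weight


def section_average():
    """(C) Average over every section on the school's course list.

    Assumes each section is full, as the example does.
    """
    full_sections = FULL_COURSES * round(COHORT / FULL_SIZE)  # 7 * 3 = 21
    half_sections = HALF_COURSES * round(COHORT / HALF_SIZE)  # 2 * 10 = 20
    seats = full_sections * FULL_SIZE + half_sections * HALF_SIZE
    return seats / (full_sections + half_sections)


if __name__ == "__main__":
    print("A (transcript):   ", transcript_average())
    print("B (time-weighted):", time_weighted_average())
    print("C (per section):  ", section_average())
```

Same school, same data, three defensible “averages”: the only thing that changes is what gets counted once.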

We know enough about other schools that we can say pretty clearly that different schools will answer this in different ways, and I have seen all three calculations being used on the same survey by different schools. But the surveying organization will then happily collect the information, put it in a nice table, and students will sort and make decisions based on these numbers, even though the definition of “average” will vary from school to school.

This is reminiscent of a standard result in queueing theory that says that the system view of a queue need not equal a customer’s view. To take an extreme example, consider a store that is open for 8 hours. For seven of those hours, not a single customer appears. But a bus comes by and drops off 96 people who promptly stand in line for service. Suppose it takes 1 hour to clear the line. On average, the queue length was 48 during that hour. So, from a system point of view, the average (over time) queue length was (0(7)+48(1))/8=6. Not too bad! But if you ask the customers “How many people were in line when you arrived?”, the average is 48 (or 47 if they don’t count themselves). Quite a difference! What is the average queue length? Are you the store or a customer?
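The store example can be checked with a few lines of Python. This is only a sketch under my own assumptions: the line drains at a steady rate during the busy hour, and customers step off the bus one at a time, so each sees a slightly different line. The function names are made up for the illustration.

```python
HOURS_OPEN = 8
BUS_SIZE = 96          # customers who arrive together
CLEAR_TIME_HOURS = 1   # time needed to serve the whole busload


def system_view():
    """Time-averaged queue length over the whole day.

    With a steady drain (an assumption), the line averages half the
    busload while it is being cleared and is empty otherwise.
    """
    busy_average = BUS_SIZE / 2
    idle_hours = HOURS_OPEN - CLEAR_TIME_HOURS
    return (0 * idle_hours + busy_average * CLEAR_TIME_HOURS) / HOURS_OPEN


def customer_view(count_self=True):
    """Average line length as reported by the customers.

    The k-th person off the bus finds k - 1 people already in line
    (k if they count themselves).
    """
    seen = [k if count_self else k - 1 for k in range(1, BUS_SIZE + 1)]
    return sum(seen) / len(seen)


if __name__ == "__main__":
    print("store's view:   ", system_view())
    print("customer's view:", customer_view(), "or", customer_view(False))
```

Under these assumptions the store sees an average of 6 while the customers report an average within half a person of the 48 (or 47) quoted above. The gap comes entirely from whether you average over time or over customers.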

Not surprisingly, if we can get tripped up on a simple question like “What’s your average class size?”, filling out the questionnaires can get extremely time consuming as we figure out all the different possible interpretations of the questions. And, given the importance of these rankings, it is frustrating that the results are not as comparable as they might seem.

Sports with a vague Operations Research connection

It is pretty clear that academic administration and blogging are perfect substitutes, at least in regard to time, if not satisfaction.  After having an easy period earlier in the year when I racked up a dozen blog posts, administrative needs sucked up all my time, leading to the buildup of dust-bunnies at Ye Olde Blog.  But it is the end of term, so perhaps I can get things cleaned out.

Let me point out two recent sports-oriented items.  First is a fascinating dynamic map from Slate showing the winning of sports championships in the four major US sports (football, baseball, hockey, and basketball).  The progression is fascinating, and the graphical display gives far more information than the static listing does.  It is a great example of the value of visualization, even if I can’t quite figure out what the value is.  The graphic to the left shows a particularly good year:  1979 when Pittsburgh really was “The City of Champions”.

Second, there were two good articles on sports scheduling.  The first was on NFL scheduling in the New York Times.  Lots of people sent me this, since I’m part of the group that does Major League Baseball Scheduling.  The article does a great job of talking about all the difficulties there are in agreeing on a schedule.  Ironically, some of these difficulties come from the ease with which it is possible to get NFL schedules.  When it is possible to ask “What if we had Pittsburgh play New England in week 3?” and get back appropriate schedules quickly, it is tempting to ask a near-endless set of questions, particularly when there are many interested parties and no particular rules for aggregating preferences.

Baseball scheduling doesn’t provide the same quick response.  Due partially to the size of the schedule (2430 games or 780 series rather than the NFL’s 256 games) but due mainly to the scheduling difficulty of “good trips” (an issue of minimal importance to the NFL since teams return home after almost every game), the turn-around time on MLB schedules is measured in days or weeks, not minutes or hours.  Which brings me to the second article:  an article in the LA Times on baseball scheduling.  It even quotes my partner Doug Bureman:

Bureman, whose company also does the scheduling for several major-college conferences, summed up the job this way:

“We’re kind of in the business of seeking perfection, knowing that you’re never going to get there.”

That is for sure:  we are a long way from perfection!  But this year has been fascinating due to realignment issues:

All of this gets even more jumbled in 2013 when MLB realigns, with the Houston Astros moving to the American League and both leagues having 15 teams. (Currently there are 16 in the NL, 14 in the AL.) Interleague games will then be spread through the season instead of being bunched together around midseason as they are now.

Feeney and her group are currently working on that 2013 schedule, and have found it to be quite a challenge. “We’re still struggling with the format,” she said.

For a sports scheduler, this “struggle” is a once-in-a-lifetime opportunity, and it has been tremendously fun and interesting to work out how that format might work.

In between bouts of academic administration!

 

The Importance of Accurate Data

I have been spending the last couple of weeks assigning faculty to courses and helping staff think about scheduling issues. I wish I could say that I have been using operations research techniques to do this sort of work. After all, most of my work has been in some form of timetabling optimization. But that has not been the case: for the most part I have simply done the work manually. Partially this is because I inherited a schedule that was 90% done, so I was really in a “rework” phase. But the main reason is that I am new at this job, so I don’t really understand the constraints (though I think I have a pretty good idea of the objective and variables). Gene Woolsey of the Colorado School of Mines had the philosophy that his students had to go out and do a job before they could do any modeling or optimization. So students worked production lines or helped drivers deliver packages first. Only after spending a few weeks on the job could they think about how operations research could improve things. If I were Gene’s student, I would definitely pick an application in sports or entertainment rather than, say, high-rise steelwork.  For now, I am emulating that approach by first handling the courses manually, then thinking about optimization.

Doing the course assignment and scheduling has been eye-opening, and a little worrisome. Just as I worry at the beginning of the season for every sports league I schedule (“Why are there three teams in Cleveland this weekend?”), I worried over the beginning of the fall term as the first of my assignments rolled out. Would all the faculty show up? Would exactly one faculty member show up for each course? Oh, except for our three co-taught courses. And …. etc. etc.

It turns out there is one issue I hadn’t thought of, though fortunately it didn’t affect me. From the University of Pennsylvania (AP coverage based on the Under the Button blog entry):

PHILADELPHIA (AP) — University of Pennsylvania students who were puzzled by a no-show professor later found out why he missed the first day of class: He died months ago.

The students were waiting for Henry Teune (TOO’-nee) to teach a political science class at the Ivy League school in Philadelphia on Sept. 13.

University officials say that about an hour after the class’s start time, an administrator notified students by email that Teune had died. The email apologized for not having canceled the class.

I hadn’t thought to check on the life status of the faculty.  I guess I will add “Read obituaries” to my to-do list.

Teaching and Research

For the last few years, I have been dabbling in academic administration, first as Associate Dean for Research and now as Senior Associate Dean, Education here at the Tepper School of Business.  While there are frustrations in this position (“There are how many courses not covered?  And are all the adjuncts on vacation in Aruba now?”), some aspects are wonderful.  Working with new faculty is a great pleasure,  a pleasure that alone almost offsets the hassles.  I love the excitement and the energy and the feeling that anything is possible.

This was easy on the research side of the organization:  my job was to create a great research environment (subject to resource constraints, of course!), and that was very rewarding to do.  On the education side, my job is a bit different.  While some faculty love teaching, for others it seems to take time away from what they really want to do: research.  How can they do any research if they have to do any teaching?

Teaching is hard, and takes time and energy.  Does it take time away from research?   While I can talk to new faculty about how teaching and research intersect, and how one builds on the other, I can see a fair amount of eye-rolling.  Of course, I would say that:  that’s my job!  And when I explain that the entire “sports scheduling” part of my career happened due to an offhand conversation with an MBA student, the response is a mixture of “That’s what I have to look forward to?  Sports Scheduling?” and “Sure, teaching might be OK for practical types, but what about us theory types?”

Thanks to a colleague (thanks Stan!), I think I now have the perfect riposte.  This is from Richard Feynman’s “Surely you’re joking, Mr. Feynman!”:

I don’t believe I can really do without teaching. The reason is, I have to have something so that when I don’t have any ideas and I’m not getting anywhere I can say to myself, “At least I’m living; at least I’m doing something; I am making some contribution” — it’s just psychological.

When I was at Princeton in the 1940s I could see what happened to those great minds at the Institute for Advanced Study, who had been specially selected for their tremendous brains and were now given this opportunity to sit in this lovely house by the woods there, with no classes to teach, with no obligations whatsoever. These poor bastards could now sit and think clearly all by themselves, OK? So they don’t get any ideas for a while: They have every opportunity to do something, and they are not getting any ideas. I believe that in a situation like this a kind of guilt or depression worms inside of you, and you begin to worry about not getting any ideas. And nothing happens. Still no ideas come.

Nothing happens because there’s not enough real activity and challenge: You’re not in contact with the experimental guys. You don’t have to think how to answer questions from the students. Nothing!

In any thinking process there are moments when everything is going good and you’ve got wonderful ideas. Teaching is an interruption, and so it’s the greatest pain in the neck in the world. And then there are the longer periods of time when not much is coming to you. You’re not getting any ideas, and if you’re doing nothing at all, it drives you nuts! You can’t even say “I’m teaching my class.”

If you’re teaching a class, you can think about the elementary things that you know very well. These things are kind of fun and delightful. It doesn’t do any harm to think them over again. Is there a better way to present them? The elementary things are easy to think about; if you can’t think of a new thought, no harm done; what you thought about it before is good enough for the class. If you do think of something new, you’re rather pleased that you have a new way of looking at it.

The questions of the students are often the source of new research. They often ask profound questions that I’ve thought about at times and then given up on, so to speak, for a while. It wouldn’t do me any harm to think about them again and see if I can go any further now. The students may not be able to see the thing I want to answer, or the subtleties I want to think about, but they remind me of a problem by asking questions in the neighborhood of that problem. It’s not so easy to remind yourself of these things.

So I find that teaching and the students keep life going, and I would never accept any position in which somebody has invented a happy situation for me where I don’t have to teach. Never.

If not teaching ruined the minds of those at the Institute for Advanced Study, imagine the effect on us mere mortals!  So teach, already!  And if you want to teach a bit extra, I happen to have a few courses that need to be covered….