Main Page
From Journal of Computational Linguistics Research
JCLR/Enhanced-CL Discussion Page
Following discussion on the NLPers blog, this page is to discuss a possible proposal to the ACL (due June 4) for an open-access journal in NLP (informally, "JCLR", modeled after JMLR). You will need permissions from me to edit this; email me at 'me-at-hal3-dot-name' for an account.
Stuart Shieber suggests just making CL an open access journal. I'm 100% behind this. I think we should still discuss what we would want CL to look like if we have free rein.
Here are some big questions I think we should try to answer:
Scope
I see the scope roughly as comprising the *ACL conferences, HLT, EMNLP, IJCNLP, RANLP, and CoNLL (and apologies to any obvious conference I left off). This is somewhat different from the other journals in the area (CSL and ACM TSLP have more speech stuff; NLE tends to be more applied; etc.).
Differentiation / Motivation
Why is this needed? I can see a few big things that I would like to have happen:
1. Online discussion boards
2. Easy submission of supplementary materials. (ryanmcd) This might also include supplementary material not from the authors themselves. For example, I just got a short manuscript by email where someone took my parser, reduced the model size to 1% of the original, and got the same parsing accuracy. These of course would need to be refereed (ryanmcd).
3. VERY fast turn-around time
4. (ryanmcd) A recommender system based on previously viewed articles.
5. (ryanmcd) Most viewed articles. Best rated articles. Most discussed. Other metrics that are commonly used by retailers. Some of these are already used by BioMed Central.
6. Allowing more papers to be published each issue (fewer restraints on page limit). Question raised: "Are there papers that ought to be published but aren't currently?" Ans: "We don't know, but this could encourage researchers to extend conference papers with deeper analysis, experimentation, implementation details, etc." (pereira) We should think more broadly than the current "journal paper" format. Many conference submissions would do better with journal-quality reviewing and editing. PLoS ONE is an example of how volume can grow when the focus is on sound science independently of "importance" (or popularity).
7. (kevinduh) Repository of tutorial or seminal articles on different sub-fields of NLP hosted on the main CL/JCLR site. This helps beginners jump into NLP research, and is especially important in terms of outreach if we expect the field to grow in coming years. I'm thinking something along the lines of either www.kernel-machines.org or ACM Computing Surveys.
8. (kevinduh) The journal should foster a sense of community. Several ways to do this include: 1) occasional messages from the president, 2) short summary reports from recent workshop/conference chairs, 3) list of upcoming conferences, 4) having author photos and biographies in each featured article, 5) practical tips and tricks (or other kinds of columns). In other words, the journal shouldn't consist only of technical papers, but also of a rich set of other articles.
Fernando suggests a few more options that I (hal) think are worth considering: can the role of a journal become increasingly tied to the conferences? The problem is that people have too much reviewing to do already; can we leverage journal reviewing to aid conference reviewing? I think: yes. On the topic of reviewing, I am very much in favor of the following... currently, when you review, there are two boxes: Comments to Authors and Comments to Committee. Why not add a third: Comments to become public upon publication. This would foster the discussion board idea, and enable good feedback from really qualified reviewers on papers. The only downside I see to this (since it's opt-in, there aren't many) is that changes to papers may change these comments. For conferences, this is a problem because reviewers won't want to go back and edit. But for a journal it may not be.
Are there other things that we should consider? I think it's worth thinking way outside the box here. What would be really useful to the community that we currently lack? Just because no other journal does it doesn't mean it's not worth considering.
Some more thoughts (by hal): There could be continuing "special issues" on newly-released data and newly-released tools (to give a formal place to "announce" such things). Also, JMLR just started publishing conference/workshop proceedings -- ACL tends to do this for us anyway, but it's interesting to note.
Priorities
What are the most important items above that should definitely be included in the proposal?
In order of priority, I think these are most important. Feel free to reorder or change. (kevinduh)
1. Open access
2. Fast turn-around time
2.5. Encouraged supplemental material
3. More papers (esp. detailed version of conference papers) (pereira) (or alternative to conference papers, which have quality issues)
4. Discussion boards and supplementary material (hal: perhaps kick-started with reviewer comments)
5. (Automatic) citation hyperlinking and other search enablers
6. hal: Integration with conferences (ala Fernando's suggestion)
Facilitation
I mentioned in the blog post that SPARC supports such efforts.
I have also been contacted by someone at BioMed Central, which has spun off Chem Central and PhysMath Central and is interested in spinning off a CS Central as well. According to their email, they're one of the largest OA publishers and are potentially interested in supporting this effort; I'll post back when I know more.
(ryanmcd) One thing about BioMed Central to keep in mind is that they defer part of the costs to authors. For some journals this means charging the authors fees of hundreds of dollars. JMLR on the other hand is free for the authors.
(ryanmcd) I think it would be good to raise the possibility that this could be a part of the Anthology effort.
(ryanmcd) What about contacting JMLR? Maybe one could bootstrap off their infrastructure.
There are probably other options as well.
(hal) Update: The BioMed Central processing charge is 750 UKP, which can be prepaid by institutions. This seems high to me. It may be possible to try to get a grant to subsidize it. The big pro of going with them is the support: they basically run everything (referees, revisions, indexing, promotion, etc.). The big con is that the cost historically associated with institutions buying journals now falls on institutions paying for authors to publish.
Notes
I just received the following email from Julia Hockenmaier, which I think is worth consideration (she tells me it's okay to post):
I think there are a lot of things wrong with our current conference publication model, and a move towards longer, quick turnaround journal papers would definitely improve the standard in our field. CL full papers are clearly more archival than anything. But I wouldn't want to lose this category of papers, if only for the level of detail the reviewers often get into. Yes, when you submit a biology paper, you complain to the editor if it's been two months since submission and you haven't heard back yet. And PNAS only gives reviewers 10 days to return their reviews from the day they invite them. But then these reviews are a lot shorter than what we are used to, so I'm not sure we really want that model either.
There also doesn't seem to be enough awareness in the field that you can also just submit a squib to CL, which has apparently a much shorter turnaround time. Part of the problem is that many people just shelve their copy of CL as soon as it arrives in the mail. Either you know the work already, or you're not that interested anyway. In a way, CL needs to get sexier. But I'm not entirely sure we need a whole new journal to make that happen. Immediate electronic access (and notification) would definitely help, and perhaps also a better website. Perhaps something tied in with ACLweb? I agree with those comments on the blog or wiki that said a journal should foster a sense of community. For example, one thing I quite like about NLE is Robert Dale's industry watch. My sense of 'belonging' to the ACL community comes very much from attending many conferences over the years, and this is something we should be careful not to lose in the long term if we move to a journal model. Losing this sense of community would be my main long-term worry in moving towards a journal model. This could be prevented by either allowing a special issue of CL with extended versions of the best conference papers (difficult with so many conferences per year though), or, if we want something like an editor's choice paper in our journal, perhaps an invitation for the best few papers per year to present an extended talk on their paper at the conference? [don't know how easy it would be to pick this]. At any rate, there should still be enough interesting papers for people to want to go to a conference even if they aren't presenting.
You might actually want to liaise with the people behind LiLT ('Linguistic issues in Language Technology'), a new electronic journal that is probably different in scope from what you have in mind, but their experiences might still be valuable. I'm not sure there has actually been an issue of this journal yet, but have a look at http://www.stanford.edu/~azaenen/
Anyway, here's my long list of things that I think are wrong with the current conference model:
The submission process: A non-negligible number of ACL/EMNLP/... conference submissions are being submitted *simultaneously* to several conferences or workshops whenever that is a possibility. Given how many of us know at least a few papers that we think should have made it into one of these conferences at some point, and given how much complaining there always is about the review process, authors probably act perfectly rationally by doing this. And people will always try to aim high as well as play safe at the same time. But it still creates an unnecessary burden on everybody. Disallowing double submissions might, however, lead to fewer submissions to workshops, which would be a shame.
The review process: Reviews in our field actually tend to be fairly detailed, but review loads have risen so much in the last few years that by the end of the conference season it's either impossible to find reviewers, or people simply don't spend enough time on their reviews anymore. I'm also not sure that the 'area chair' model of ACL is still workable. It seems simply impossible for one person to be able to manage 40-50 submissions and find/know the right reviewers for all of them. I think the senior PC model at HLT/NAACL works much better. It's still a lot of work, but at least you can actually think about each submission. And, please, somebody in power, abolish the bidding system (or make clear on the CFP and the submission form that the abstracts have to be detailed enough for potential reviewers to make an informed bid)! [and hide the full papers, as was done this year at ACL, but not last year]. By stacking the conference submission deadlines, we now have almost a de facto 'revise and resubmit' model in our conferences. This also means we don't quite have the once-a-year scramble that we used to have. However, for a journal submission, the reviewers might be chosen more carefully than is currently possible for most conferences, and revised versions typically get back to the same reviewers. I think that at EMNLP they tried to choose at least one of the ACL reviewers for each resubmission. This might be a better model, since this minimizes the chances of one disapproving reviewer being able to 'sabotage' a paper, but creates more work for more people. [oh, and please, can we get to a model where each ACL membership comes with a persistent START account?]
The format of the papers: 8 pages in 11 pt font with the ACL bibstyle is simply not enough space to give all the technical details that are required to understand a system. Either allow supplementary materials (in order to keep papers short and snappy), increase the paper length (without hard copy proceedings, this is surely not a problem anymore), or decrease the font size to 10 pt. Short papers work in experimental (wetlab) fields, where everybody agrees on protocols etc. Also, what exactly is the status of poster papers? Everybody knows they typically didn't get scores as high as the talk papers, but they still count as full papers. Yet, neither the review forms nor the reviewers' recommendations seem to take that into account. Either have true posters (with only abstracts published), or have more parallel sessions.
Facts and Perspectives on Open Access Journals in General
SPARC Open Access brochure: Short 6-page PDF document arguing why everyone should care about open access and why it benefits everyone (authors, readers, research community, society).
ALPSP/AAAS "Facts about Open Access", 2005: Independent study of open access journals conducted via extensive surveys and interviews. This lengthy report is a worthwhile read for those interested in understanding all the statistics and arguments for/against open access journals.
Steve Lawrence (NEC Research), "Online or Invisible?" Nature, Volume 411, Number 6837, p. 521, 2001. An analysis of citation rates for online vs. offline computer science articles from 1989 to 2000. The mean numbers of citations for online and offline articles are 7.03 and 2.74, respectively. (Note: this study concerns articles that are available online; under the current non-open-access Computational Linguistics, authors retain the right to put their articles online.)
(kevinduh) My own perspective now is that the success of open access depends on two things: a viable financial model and high editorial standards. If these two things are in place (and I believe they are definitely achievable in CompLing), then an open-access model has the potential to really accelerate research and benefit authors, readers, and society.
Email Discussion
The following issues were discussed by email (Hal, Ryan, Stuart, Kevin, Fernando) and not on the wiki:
1. General consensus that modifying the existing CL is the best bet.
2. Overhead: Fernando and Stuart went back and forth on this. MIT absorbs most of the JMLR costs.
3. Microtome can cover printing costs if MIT Press doesn't want to. We don't know the situation with respect to the contract with MIT Press. This is good for archival purposes.
4. The model should be completely online with a good workflow system (many are freely available).
5. For cost, Fernando quotes about $80k-$150k / year to cover web server, bandwidth, offsite backup, part-time webmaster, and part-time low-level editorial. Stuart thinks it should be less than $15k, using existing ACL infrastructure. There is some hint that external funding may be possible, but having a long term solution would be better.
6. Questionable whether it's a good idea to rely on volunteers to keep things running (ala JMLR). Not doing so requires money -- how much could be leeched off of ACL?

