How I Automated Reference Editing with Google Scholar, Zotero, and Citation Style Language
Updated: Jul 4, 2023
Reference Lists and Me: From Zero to Hero
Until very recently, I used to hate editing references. I know some copyeditors enjoy it (did they also enjoy lining up their toys in a straight line as kids?); well, tastes differ. So, it was perhaps just as well that for most of my career I successfully dodged reference editing. Or perhaps it's the other way around: reference editing successfully dodged me.
In the editing company I worked for, reference editing was done by technical editors, not by the language editors. It was only after I became a freelancer that I had to confront the reference editing problem. Again, I was lucky. The publishing services company that gave me books to edit had a couple of technical editors who did all the reference editing. I also edited journal papers, but their reference lists were either excluded from editing or short enough to be manageable.
However, I was forced to do something about reference editing when I started working for a new client that had exacting standards: the style was well defined (a CMoS variant), and proofreaders carefully checked that everything was in order. They caught the smallest deviations from the house style. Moreover, the reference lists were long, usually spanning many pages. I studied the reference style thoroughly and, to my surprise, began enjoying some aspects of reference editing, such as the detective work of tracking down the sources and verifying that the information was correct. I found mistakes; sometimes even the authors or titles were wrong. Setting the record straight gave me satisfaction; I knew this was valuable, albeit thankless, work. But a more practical concern began to trouble me: the amount of my time and energy that the reference lists were consuming. I became curious enough to log the time spent on editing references, and found that a sizable proportion of my editing time was going into them. Often the reference lists came in terrible shape, and my battles with them were exhausting me. I knew I'd have to do something about it.
Fast forward to a few months later. Yay! I had succeeded in automating reference editing to my satisfaction, using only free tools. The process is not 100% automated; some manual intervention is needed sometimes, but this is inevitable, I think, even with the most expensive tools. The benefits were immense. The new automated workflow required considerably less time and energy. In fact, reference editing had become a frictionless process. This breakthrough was timely, as I had just begun editing a book with long reference lists in each chapter. With the new automated reference editing process, I was able to slice through the reference lists like a knife through butter.
After finishing the book, for a couple of months I did not have to edit references. When I next sat down to edit a reference list, I was shocked: I had forgotten some key steps of my automated workflow, and had to spend some time recollecting and reconstructing them.
In what follows, I describe my automated reference editing workflow. I'm pretty sure I'll need to consult this post again in the future! I also give the sources from which I learned what I present here. I make no claims for originality, except that I managed to integrate disparate pieces of knowledge into a cohesive workflow.
The Starting Point: Google Scholar
The first step for most reference entries is to track them down in Google Scholar. This works especially well with journal articles and most reports. Web pages and newspaper articles will need to be hunted down in Google. Let's begin with the following journal article:
Baldauf, M., Garlappi, L., & Yannelis, C. (2020) Does climate change affect real estate prices? Only if you believe in it. The Review of Financial Studies, Vol.33, Iss.3:1256-1295.
Our task is to put this reference, which is currently in a style I'm unfamiliar with, in "Chicago Manual of Style 17th edition author-date" style. Let's locate this paper in Google Scholar. Pasting the title into Google Scholar's search box usually does the trick:
It's the first hit. I must've run thousands of Google Scholar searches over the past 18 years, blissfully oblivious of the "Cite" button. Click it. Now.
The following pop-up window opens:
I was amazed when I first beheld this window! It presents the reference citation in a few common styles. Note the four names at the bottom of the window: BibTeX, EndNote, RefMan, and RefWorks. EndNote and RefWorks are commercial reference managers, and RefMan refers to Reference Manager, another commercial program. Click BibTeX.
The browser displays some mysterious-looking code (this is BibTeX):
Select all of the code by clicking inside the code and hitting CTRL-A. Copy the code with CTRL-C.
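If you have never seen BibTeX before, the following rough sketch may help demystify it. The entry is a reconstruction from the Baldauf reference given above, not a verbatim copy of what Google Scholar returns; the citation key `baldauf2020does` and the helper `parse_bibtex_entry` are made up for illustration, and the regular expression only copes with simple, flat entries like this one.

```python
import re

# Reconstructed from the Baldauf reference in the text; the citation key is hypothetical.
BIBTEX = """@article{baldauf2020does,
  title={Does climate change affect real estate prices? Only if you believe in it},
  author={Baldauf, M. and Garlappi, L. and Yannelis, C.},
  journal={The Review of Financial Studies},
  volume={33},
  number={3},
  pages={1256--1295},
  year={2020}
}"""


def parse_bibtex_entry(text):
    """Split one simple BibTeX entry into its type, key, and a dict of fields."""
    header = re.match(r"\s*@(\w+)\{([^,]+),", text)
    if header is None:
        raise ValueError("not a BibTeX entry")
    # Matches field={value} pairs; does not handle nested braces or quoted values.
    fields = dict(re.findall(r"(\w+)\s*=\s*\{([^{}]*)\}", text))
    return {"type": header.group(1), "key": header.group(2), "fields": fields}


entry = parse_bibtex_entry(BIBTEX)
# BibTeX separates authors with the word "and".
authors = entry["fields"]["author"].split(" and ")
print(entry["type"], authors, entry["fields"]["year"])
```

The point is only that BibTeX is a labeled record (title, author, journal, and so on), which is why a reference manager can take it apart so easily.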
The Plot Thickens: Enter Zotero
Our work in Google Scholar is almost done. For the next step, install Zotero Desktop and the Zotero Connector browser extension. Installing these is straightforward.
My Zotero Desktop looks like this:
Your Zotero Desktop will have nothing in it — but that's going to change, starting now. Click the folder icon below "File" to create a new collection. Call it "Test". The empty Test collection opens.
Switch back to Zotero. Click File and then Import from Clipboard. Zotero will use the BibTeX code you had earlier copied to the clipboard. What happens now?
It's magical! Examine the pane on the extreme right, and you'll see what I mean. The reference has been stored in Zotero and broken down into its components. Every item in this pane is editable, so in case you see anything amiss here, click the field and edit it.
While writing this post, I came across another method of getting the reference into Zotero. Try it now. Instead of clicking Cite in Google Scholar, open the journal paper by clicking its title. After the paper opens in the browser, click the Zotero browser extension and Zotero will save all the information it can scrape from the Web page to the Test collection.
Also, some journal publishers have set up a direct link for import into Zotero. Here's an example:
This Web page has a Cite button, and clicking on it opens the displayed popup. Clicking "Export citation to BibTeX" displays this:
Clicking "OK" will import the data on the Baldauf paper directly into your Zotero desktop app. Note that this link is mediated by the Zotero Connector browser extension, so you will need to install the extension. See "Adding items to Zotero" in the Zotero documentation for more details on how to get metadata into Zotero.
Let's now return to our Test collection in Zotero, which will now have two entries for Baldauf, one for each method:
Now you are perhaps getting an inkling of what this game is about. The Baldauf reference has been stored inside the Test collection in Zotero as a database record. Any style (CMoS, APA) can now be applied to produce the output we need. That's the basic idea.
I must say here that my route to Zotero was roundabout. I had first explored JabRef and Bibtex4Word using this video by James Azam:
At around the 6 minute mark, he describes the technique of importing a reference into JabRef using Google Scholar. I found out that Zotero is a better choice than JabRef for those who use only MS Word. Those who use both LaTeX and MS Word would probably be better served by JabRef.
Cashing In: Generating the Baldauf Reference List Entry in MS Word
Let's now use Zotero to generate the Baldauf reference entry in MS Word. Open a blank Word document. Switch to Zotero and right-click either of the two Baldauf entries. Click "Create Bibliography from Item". The "Create Citation/Bibliography" pop-up window is displayed.
The top pane shows the citation styles installed in Zotero. Clicking "Manage Styles" will allow you to install new citation styles in Zotero. Select the required citation style, and the "Copy to Clipboard" radio button. Click "OK". The Baldauf reference entry is now in your clipboard. Click inside the blank Word document and hit CTRL-V (or click Paste on the toolbar). The Baldauf reference entry is now in your Word document.
Mission accomplished!
The entire bibliography in your Zotero collection can be generated with one click. To do this, click on any item in the Test collection and hit CTRL-A. Both items in Test are now highlighted. Right-click the highlighted items, and proceed as before. Note that besides copying the bibliography to the clipboard, it can be saved in RTF and HTML formats.
In-text citations can be generated similarly; select the "Citations" output mode.
I had earlier used the Zotero Word add-in to generate bibliographic items in Word. Today (January 13, 2023), Vivek Kumar, the founder of the Indian Copyeditors Forum group on Facebook, asked me a question after reading this post: Can the entire reference list be generated in one step instead of generating it item by item?
I quickly found it was indeed possible, using the simple method described above, which is a vast improvement on the convoluted method using the Zotero Word add-in that I had described earlier to generate the reference items one by one. I think the add-in is for authors, who can use it to build their bibliographies step by step as they write their paper.
Thank you, Vivek, for your sharp question that made this considerable improvement in my workflow possible. Also, whereas I'd earlier thought that it was preferable to generate the reference items and check them one by one, I now see that it's much more efficient to generate the entire reference list in one step after populating Zotero and quickly scan it for errors. This manual check is necessary, because Zotero does stumble occasionally.
Applying Reference Styles On The Fly
To appreciate the power of this approach, consider the following two references that came up recently in my practice:
Manjunatha, A. V., & Ramappa, K. P. (2017). Farmer Suicides in Karnataka. Bengaluru: Institute for Social and Economic Change. Retrieved from Institute for Social and Economic Change website: http://www.isec.ac.in/Farmer-suicide-Karnataka-Final-report-2005201_AVM.pdf
Manjunatha, A. V., & Ramappa, K. P. (2017). Farmer Suicides in Karnataka (Agriculture Development and Rural Transformation Centre Report). Institute for Social and Economic Change. http://www.isec.ac.in/Farmer-suicide-Karnataka-Final-report-2005201_AVM.pdf
The two references are very similar, but there are some differences. The location (Bengaluru) is included in the first reference but not in the second. The series title (Agriculture Development and Rural Transformation Centre Report) is included in the second reference but not in the first. In the first reference, "Retrieved from ..." introduces the URL; the second reference lacks this introductory text.
This reference was stored as a report (Item Type = Report) in Zotero. The first reference was generated by applying the "APA 6th edition" style, and the second reference was generated by applying the "APA 7th edition" style. If you were familiar with the APA 6 style and then had to suddenly move to the APA 7 style, you would have to learn about — and remember — the often-minor differences between the two styles. On the other hand, once you have the reference in a Zotero database, you can apply any style in the world to it — without your having to know anything about the style! Isn't that amazing power? It is — and that's the power of automation. Also, once you have a reference in a Zotero database, the next time it appears in a reference list, you can generate it immediately.
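To make the "one record, many styles" idea concrete, here is a toy sketch in Python. It is not how Zotero or CSL work internally (real styles are CSL files, not hand-written functions); the field names and the two `render_*` functions are hypothetical and only reproduce the two report entries shown above.

```python
# One stored record, two output styles. Field names are hypothetical, not Zotero's schema.
RECORD = {
    "authors": ["Manjunatha, A. V.", "Ramappa, K. P."],
    "year": "2017",
    "title": "Farmer Suicides in Karnataka",
    "series": "Agriculture Development and Rural Transformation Centre Report",
    "place": "Bengaluru",
    "institution": "Institute for Social and Economic Change",
    "url": "http://www.isec.ac.in/Farmer-suicide-Karnataka-Final-report-2005201_AVM.pdf",
}


def join_authors(authors):
    """Join author names with commas and an ampersand before the last one."""
    if len(authors) == 1:
        return authors[0]
    return ", ".join(authors[:-1]) + ", & " + authors[-1]


def render_apa6_like(r):
    """Place and publisher, then 'Retrieved from <publisher> website:' before the URL."""
    return (f"{join_authors(r['authors'])} ({r['year']}). {r['title']}. "
            f"{r['place']}: {r['institution']}. "
            f"Retrieved from {r['institution']} website: {r['url']}")


def render_apa7_like(r):
    """Series title in parentheses, no place, bare URL."""
    return (f"{join_authors(r['authors'])} ({r['year']}). {r['title']} "
            f"({r['series']}). {r['institution']}. {r['url']}")


print(render_apa6_like(RECORD))
print(render_apa7_like(RECORD))
```

The record never changes; only the rendering rule does. That separation is what lets you switch styles without retyping anything.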
The Zotero Browser Extension
Now let's see how this process works with a magazine article. We'll look for the article titled "Ola S1 Pro Electric Scooter Catches Fire In Pune, Company Launches Investigation." Naturally, we use Google, not Google Scholar, for this search. It's the first hit on Google. Clicking the link opens the article:
Click the Zotero browser extension and save this article in Zotero. This is what is saved:
It's clear that key information such as the article author and date has been missed. This, I suppose, is not surprising. Zotero cannot possibly handle the wide variety of websites, especially as many of them store metadata in nonstandard ways. I suppose journals are more disciplined about this, which is why the Zotero browser extension captured data well from the Baldauf journal website. All fields in Zotero can be edited, so you can click on the fields and enter the information manually. For example, you can click on the "Item Type" field and change it from Web Page to Magazine Article.
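I have not looked at how the Zotero extension is implemented, but one widely used convention helps explain the difference: many journal sites embed bibliographic data in `<meta>` tags with names such as `citation_title` and `citation_author`, whereas news and magazine sites often do not. The sketch below reads such tags from a made-up page head using only Python's standard library; the HTML and the class name `CitationMetaParser` are assumptions for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page head; the citation_* names follow a meta-tag convention many journal sites use.
SAMPLE_HTML = """<html><head>
<meta name="citation_title" content="Does climate change affect real estate prices? Only if you believe in it">
<meta name="citation_author" content="Baldauf, M.">
<meta name="citation_author" content="Garlappi, L.">
<meta name="citation_author" content="Yannelis, C.">
<meta name="citation_journal_title" content="The Review of Financial Studies">
<meta name="citation_publication_date" content="2020">
</head><body></body></html>"""


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*"> tags into a dict of lists."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        # Keep only tags that follow the citation_* naming convention.
        if name.startswith("citation_") and attr.get("content"):
            self.found.setdefault(name, []).append(attr["content"])


parser = CitationMetaParser()
parser.feed(SAMPLE_HTML)
print(parser.found)
```

A page without such tags gives a scraper very little to work with, which is when the manual editing described above comes in.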
References can also be added manually to Zotero. Click the "+" symbol above the middle pane, select the Item Type, and fill in the fields manually.
I found the following Zotero tutorial from the Paul V. Galvin Library helpful. It helped me come to grips with Zotero.
The Cherry on the Cake: Customizing Reference Styles with Citation Style Language (CSL)
So far, so good. But what if you have to apply a style that differs from a standard style? For example, what if the required style is "Chicago Manual of Style 17th edition author-date", but without quotation marks around the article title; or perhaps the article title should be in sentence case. Of course, these changes can be done manually — but there's a much better way. These modifications can be generated automatically, but another tool has to be used: CSL.
The first step is to head over to the CSL website:
This will give you information on the CSL language. If you have not heard of it (I certainly hadn't until a few months ago), I suggest you get yourself a coffee and spend some time here. I cannot praise the folks who developed CSL enough. This is where you can learn about how the CSL project began, and where it stands now.
Can't wait to get your hands dirty? Without further ado, load
Type "Chicago" in the search box. A surprisingly large variety of Chicago styles is displayed. Click the "Edit" button on the very first style: "Chicago Manual of Style 17th edition author-date". This is the result:
All the style customization action happens here. The display looks complicated, for a good reason — because it is complicated! But the good news is that anyone can learn enough about the Visual Editor (the highlighted text on the title bar of the Web page shows that we are in the Visual Editor component of the CSL Editor) to be able to make simple modifications to a standard style, which is all we'll need to do in most practical settings. Let's get to know the Visual Editor better.
The narrow left pane contains the code. The top pane contains example in-text citations and the corresponding bibliography entries. The bottom pane shows editable details of the code element that is selected in the left pane. Notice that the Info element is selected in the left pane. The bottom pane shows the details of the Info element. If you click Global Formatting Options in the left pane, the element below Info, you'll see that the display in the bottom pane changes accordingly.
Let's first add a few more examples to the top pane. Hover over "Example citations" in the upper-right corner, and click Citation 1. A pop-up window opens.
Check the five options after "chapter": article-journal, report, book, webpage, and article-newspaper. Close the window. The top pane is populated with these additional items.
We're not finished with the top pane yet. Let's add our Baldauf reference to the top pane. How is that done? We need to return to Zotero Desktop. Locate the first Baldauf entry (the one generated using BibTeX, not using the Zotero browser extension) in our Test collection. Right-click it. Click Export Item. Select CSL JSON. Save the exported item in a convenient location. Open the JSON file in a text editor of your choice (I use Notepad++). You'll see code that looks similar to the BibTeX code you saw in Google Scholar. Click inside the code, hit CTRL-A to select all of it, and then hit CTRL-C to copy it.
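For orientation, here is roughly what a CSL JSON record for the Baldauf paper looks like, loaded with Python's `json` module. This is a hand-written approximation, not the exact file Zotero exports: the `id` value is hypothetical, and the given names are initials because that is all the reference quoted earlier provides.

```python
import json

# Hand-written approximation of a CSL JSON export; the "id" value is hypothetical.
CSL_JSON_TEXT = """[
  {
    "id": "baldauf2020",
    "type": "article-journal",
    "title": "Does climate change affect real estate prices? Only if you believe in it",
    "container-title": "The Review of Financial Studies",
    "volume": "33",
    "issue": "3",
    "page": "1256-1295",
    "author": [
      {"family": "Baldauf", "given": "M."},
      {"family": "Garlappi", "given": "L."},
      {"family": "Yannelis", "given": "C."}
    ],
    "issued": {"date-parts": [[2020]]}
  }
]"""

items = json.loads(CSL_JSON_TEXT)
item = items[0]
# Names are stored in parts (family, given), which is what lets a style reorder or abbreviate them.
families = [person["family"] for person in item["author"]]
year = item["issued"]["date-parts"][0][0]
print(item["type"], families, year)
```

Note the `type` value, `article-journal`: it is the same label that appears in the Visual Editor's list of example document types.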
Now return to the Visual Editor. Again, hover over "Example citations" in the upper-right corner of the top pane, and click Citation 1. The familiar pop-up window opens. This time cast your gaze at the bottom of the window. See Advanced there? Click it, and scroll down to unveil a text box titled "Add new reference". Click inside the text box and paste the JSON code. Then click "Add new reference". Done! Return to the top pane, and you'll see Baldauf in the Bibliography (it'll be the second item). Look in the Example Citations section, and you'll see the Baldauf citation there. Mission accomplished!
Now let's think about customizing the style. There is a useful trick you should know that eases this task. Click the name "Baldauf" in the Bibliography. You'll see that all the name elements in the Bibliography are highlighted in blue. Importantly, the corresponding code element in the left pane is unveiled and highlighted, as is the corresponding element in the bottom pane. It's not called the Visual Editor for nothing! Without this visual aid, identifying the variables in the left pane to work on would be like searching for a needle in a haystack.
Now, let's look at a few customization examples. First, let's say our style requires initials in author names. For this, click the Name element in the Names group. The bottom pane changes correspondingly:
Click the Enable button in the bottom pane. Notice that the names in the Bibliography have initials in them now! If you don't feel a spark of excitement on seeing this, you're not human. Let's be a little more demanding. We now want the initials separated by periods. Enter a period in the "initialize-with" text box. The initials now have periods in them!
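As a mental model of what the "initialize-with" setting does, consider this toy Python function. It is only an illustration and certainly simpler than what a real CSL processor does (it ignores hyphenated names, for instance); the function name `initialize` and the given name "Jane Quinn" are made up.

```python
def initialize(given, initialize_with):
    """Reduce each given name to its first letter followed by initialize_with."""
    # Treat existing periods as separators so "A. V." and "Anna Vera" are handled alike.
    parts = given.replace(".", " ").split()
    return "".join(part[0].upper() + initialize_with for part in parts).strip()


# "Jane Quinn" is a made-up given name used only for illustration.
print(initialize("Jane Quinn", ""))    # initials with nothing after them
print(initialize("Jane Quinn", "."))   # a period after each initial
print(initialize("Jane Quinn", ". "))  # a period and a space after each initial
```

Whatever you type into the text box is appended to every initial, so a period gives tightly packed initials and a period plus a space gives spaced ones.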
Next, click the title of the Baldauf article. Notice that all the titles in the Bibliography have been selected. The variable "title" is highlighted in the left window, and the corresponding title details are shown in the bottom window. Let's now remove the quotation marks from the titles. Notice that the quotation marks in the bottom window (below "Text formatting") are highlighted. Click them, and the quotation marks around the titles vanish. Notice the scary warning that has appeared at the top of the screen. Recall the adage about barks and bites, and confidently click Dismiss.
Many of the changes made in the Visual Editor can be undone. For example, hover on Edit on top of the left window, and click Undo. The quotation marks around the title reappear. Click Redo to restore the status quo.
Now for my final customization example. Notice that all titles in the Bibliography are in title case. Let's say we want journal article titles alone to be in sentence case. To achieve this, we need to work with conditionals. Click Conditional in the title group, and next click the + symbol in the upper-right corner of the left panel. An Else-if button appears. Click it. Notice that a new Else-if condition has been added to the code tree and highlighted. The display in the bottom panel has changed accordingly. Click the drop-down and select "article-journal" as the document type. If all has gone well, you should be seeing this:
Click again on + and in the pop-up that opens, click Text. The display should now look like this:
We're almost there! In the bottom panel, change Type to "variable", and just below, change Variable to "title". The display should now look like this:
Notice that two titles in the Bibliography have been selected; they're the only two journal articles in the example list. Notice also that the journal article titles alone are now in sentence case. That's because the default case of the title variable is sentence case. This is controlled by the text-case setting in the "Text formatting" window further down in the bottom pane (you may have to scroll down to see it). To see this, click the drop-down in "text-case", and select Uppercase. The journal article titles are now in uppercase. Click the last option, "sentence", to restore sentence case.
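The logic of the conditional we just built can be sketched in a few lines of Python. This is a toy model under my own assumptions, not the rules a CSL processor applies: `to_sentence_case` simply lowercases everything and re-capitalizes the first letter and any letter following "?", "!", or ":", so it would wrongly flatten proper nouns.

```python
import re


def to_sentence_case(title):
    """Lowercase everything, then capitalize the first letter and any letter after ? ! or :."""
    lowered = title.lower()
    return re.sub(r"(^|[?!:]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), lowered)


def render_title(item):
    """Mimic the else-if branch: only article-journal items get sentence case."""
    if item["type"] == "article-journal":
        return to_sentence_case(item["title"])
    return item["title"]


article = {"type": "article-journal",
           "title": "Does Climate Change Affect Real Estate Prices? Only If You Believe In It"}
report = {"type": "report", "title": "Farmer Suicides in Karnataka"}
print(render_title(article))  # journal article: converted
print(render_title(report))   # any other type: left as stored
```

The crude case conversion is also a reminder of why the generated list still needs a manual scan: automatic sentence-casing cannot know which words are proper nouns.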
We're done modifying this style! Let's save this under a new name (this is necessary, as otherwise when we save this style, the original Chicago style will be overwritten). Click Info at the top of the left pane. The Title field in the bottom window shows the title of the style: "Chicago Manual of Style 17th edition (author-date)". Change this to "Chicago Manual of Style 17th edition (author-date) For Blog" (or any other name you like). Hover on Style in the upper-left corner of the left panel, and click Save Style. A dialog box appears asking for confirmation to modify the style ID and link to the style.
Click OK. A pop-up window appears. Click "Download Style", and save the CSL file on your computer.
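The downloaded CSL file is plain XML, and the renaming step above boils down to changing the title and ID stored in its info section. The sketch below does the same thing with Python's standard library on a stripped-down stand-in for a style file; the ID URLs are placeholders and `rename_style` is a hypothetical helper, so treat it as an illustration of the file's structure and not as a replacement for the Visual Editor.

```python
import xml.etree.ElementTree as ET

CSL_NS = "http://purl.org/net/xbiblio/csl"  # XML namespace used by CSL 1.0 style files

# A stripped-down stand-in for a downloaded style; the id URL is a placeholder.
STYLE_XML = f"""<style xmlns="{CSL_NS}" class="in-text" version="1.0">
  <info>
    <title>Chicago Manual of Style 17th edition (author-date)</title>
    <id>http://example.org/styles/original-style</id>
  </info>
</style>"""


def rename_style(xml_text, new_title, new_id):
    """Return the style XML with a new <title> and <id> inside <info>."""
    ET.register_namespace("", CSL_NS)  # keep CSL as the default namespace on output
    root = ET.fromstring(xml_text)
    info = root.find(f"{{{CSL_NS}}}info")
    info.find(f"{{{CSL_NS}}}title").text = new_title
    info.find(f"{{{CSL_NS}}}id").text = new_id
    return ET.tostring(root, encoding="unicode")


renamed = rename_style(
    STYLE_XML,
    "Chicago Manual of Style 17th edition (author-date) For Blog",
    "http://example.org/styles/chicago-author-date-for-blog",
)
print(renamed)
```

Giving the modified style its own title and ID is what keeps it from being confused with the original Chicago style.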
For the final step, return to Zotero Desktop. Click Edit and then Preferences. Click the Cite tab. The Style Manager window appears. Click the + symbol below the list of displayed available styles, navigate to where you saved the modified Chicago style, and select it. That's it! The modified Chicago style is now stored in Zotero and can be used to generate bibliography/reference entries in MS Word.
I should mention here that nodes in the CSL variable tree in the left pane can be moved around. For example, see what happens when the "date (macro)" node (highlighted in the figure below) is dragged down to the end of the layout, after "access (macro)".
This is how the left pane should appear after the drag operation:
Notice that the year is now at the end of the example reference entries in the top pane. However, it is not separated by a period and space. In the bottom pane, locate the prefix text box in the Affixes section and enter a period followed by a space (". "). The year in the reference examples is now correctly punctuated. Undo these operations now. I just wanted to show that nodes in the variable tree can be moved around.
If you have an example bibliographic entry, you can search the CSL style repository for the style that matches the entry most closely. To do this, click the "Search by example" tab in the Visual Editor title bar, and modify the style in the left pane to match your desired style. You can see the corresponding changes in the top pane as you make changes in the left pane. I haven't tried this, but it's good to know that the feature is available.
I got my start with the Visual Editor by watching the following tutorial by Sebastian Karcher, one of the core contributors to CSL. What can I say! He's an amazing teacher.
While writing this post, I stumbled upon a couple of useful CSL resources. The first is the official CSL documentation:
The following is a paper by Arthur Emil van der Pol. The title is self-explanatory.
I should in fairness mention that although I use Zotero, Mendeley offers similar functionality. Try out both and choose one.
In conclusion, a disclaimer. I'm by no means an expert in the tools I've presented here. All I have is a working knowledge acquired over the past few months. This has sufficed for whatever my reference lists have thrown at me so far, but I will undoubtedly have to learn more in the future. Anything noteworthy I find will be posted in this blog. And, finally, all those involved in the CSL and Zotero/Mendeley projects: take a bow!