Journal of Operations Management, Volume 68, Issue 2, pp. 114-129. Editorial (Free Access).

Empirical research methods department: Mission, learnings, and future plans

Guangzhi Shang (corresponding author), Department of Business Analytics, Information Systems, & Supply Chain, College of Business, Florida State University, Tallahassee, Florida, USA. Email: gshang@business.fsu.edu. ORCID: 0000-0002-3287-6520.

Mikko Rönkkö, Jyväskylä University School of Business and Economics, University of Jyväskylä, Jyväskylä, Finland. ORCID: 0000-0001-7988-7609.

First published: 14 February 2022. https://doi.org/10.1002/joom.1171

Handling Editors: Suzanne de Treville and Tyson Browning.
1 DEPARTMENT HISTORY AND OVERVIEW

A recent trend in management journals is to involve specialized methods editors to check the quality of an article's empirical execution (Antonakis et al., 2019; Bergh & Oswald, 2020; Hardwicke et al., 2019). The Journal of Operations Management (JOM) has engaged methods specialists explicitly in the review process for about 5 years now. The initial transition into a matrix organization (Guide & Ketokivi, 2015), in which each article is assigned to a subject area editor (vertical) and, if necessary, also a review-team member specializing in a method or a theory (horizontal), was superseded by the creation of the Empirical Research Methods in Operations Management (now shortened to Empirical Research Methods, ERM) department by the current editors-in-chief (Browning & de Treville, 2018). In this editorial, we give an overview of the current operations of the ERM department, present a collection of common method issues we have gathered, and discuss the strategic direction of the department. The ERM department serves our research community in two ways: (1) by publishing manuscripts about the use of ERM in OM and (2) by performing method checks on incoming manuscripts before they go to a topical department. Both roles align with the journal's evolution in recent years toward greater methodological rigor. We address these two roles in turn.
2 PUBLISHING METHODOLOGICAL ARTICLES IN JOM

JOM has a tradition of publishing methodological articles, including those for qualitative research such as case studies (Barratt et al., 2011; Handfield & Melnyk, 1998; Ketokivi & Choi, 2014; Meredith, 1998; Stuart et al., 2002), and plans to maintain this tradition going forward. To carry this tradition forward and encourage it further, we would like to clarify two fundamental questions for interested authors: How do we determine whether a study is methodologically focused and hence suitable for the ERM department to handle? And if an article is assigned to the department as a methods article, how do we determine whether it makes a methodological contribution?

2.1 A classification scheme

Recognizing that it is difficult to define a catchall standard, we offer the develop-review-import classification, with suitable methods manuscripts broadly categorized into the three classes shown in Table 1.

TABLE 1. Develop-review-import classification of methodological articles

Develop
- Description: Developing a new empirical method or presenting primary methodological evidence for the usefulness of a technique for studying an OM problem. Prediction and machine learning articles fall into this class.
- Potential pitfalls: Articles presenting original methodological research that is not OM-specific would be better suited to research methods journals.
- Examples: Forecasting and machine learning models (Chuang et al., 2021; Ilk et al., 2020; Ketzenberg et al., 2020; Pak et al., 2020; Petropoulos et al., 2018)

Review
- Description: Reviewing the application of existing methods in OM research. An article in this class would begin with a comprehensive survey of published research, then provide a summary of issues identified, and follow up with recommendations for improvement.
- Potential pitfalls: Presenting a purely descriptive review without clearly identifying problems or areas of improvement in current practice.
- Examples: Endogeneity (Lu et al., 2018); mediation (Malhotra et al., 2014)

Import
- Description: Importing a (class of) specific method into OM from other disciplines. The emphasis for this class is on the extent of applications in OM and the accompanying demonstration.
- Potential pitfalls: The imported method is already known to OM researchers, or the imported method represents too much of a niche for OM research.
- Examples: Inattentive survey responding (Abbey & Meloy, 2017); multilevel factor analysis (Ketokivi, 2019)

Articles in the develop category provide original methodological contributions. Typically, they develop an empirical method to solve an OM problem that common techniques in the empirical toolbox do not fully address. Many studies utilizing prediction, forecasting, and machine learning models as part of their method development process naturally fit under this class (Chuang et al., 2021; Ilk et al., 2020; Ketzenberg et al., 2020; Pak et al., 2020; Petropoulos et al., 2018). Whether such research goes to a topical department or to the ERM department depends on the novelty of the method and the degree to which the contribution comes from the method rather than from the research project more generally. Articles in this class must present an evaluation of the method's performance using tools such as Monte Carlo simulations and/or predictive-accuracy tests. The technical aspects of method contributions should be written at the standard required for applied research-methods journals such as Organizational Research Methods while maintaining a contextual focus on OM.

Articles in the review category address how well the OM community is applying a method (e.g., Lu et al., 2018; Malhotra et al., 2014).
A typical article in this class starts with a comprehensive survey of published papers using the focal method, then provides a summary of issues identified, and follows up with recommendations for improving current research practice. It is important that the recommendations are supported by citations to methodological evidence, such as original methodological studies or high-quality textbooks (e.g., econometrics books) that provide proofs or other forms of evidence. Authors must also remember that the reader is an applied OM researcher who will not benefit, for example, from a technical discussion of the exclusion condition of instrumental variables in sample-selection models, but would greatly benefit from insights regarding how to search for and identify such variables in the OM context. Lu et al. (2018), handled by the ERM department as a review article, strikes a good balance between these two considerations. The authors did a thorough review of empirical OM studies, focusing on how these studies addressed endogeneity, and identified several issues. Importantly, Lu et al. (2018) not only explained how these issues could be addressed but also provided clear pointers to widely used econometrics books and well-known econometrics journals that contain the proofs behind these techniques.

The final class, import, contains articles that introduce methods used in other disciplines to OM (e.g., Ketokivi, 2019). The emphasis in this class is on the applicability of the technique to an OM context, demonstrated using an empirical example. Articles in this class can either present techniques that are new to OM or test the applicability of existing techniques in the OM context (e.g., Abbey & Meloy, 2017). The review and import classes do overlap: A new method might address an existing OM problem better, and this improvement could be demonstrated with a review of current research practice.
2.2 Submission guidelines

Prospective authors writing a methodological article for JOM, especially an article in the review or import classes, should start by submitting a proposal to the ERM department to get early feedback from the editors. A proposal should briefly describe the topic and explain why the methodological area warrants an introduction or review. A recently published commentary on the use of experiments (Eckerd et al., 2021) started as a proposal that evolved considerably as it moved toward publication, leading to rich and meaningful discussions in the department and at the journal more generally. This preliminary process is carried out prior to the review process: Success in this step does not guarantee publication. We are excited to report that the department is currently considering several interesting proposals that are making promising progress toward publication.

3 PERFORMING PRE-REVIEW METHOD CHECKS

The second role of the department is to support other departments by performing pre-review method checks. A regular review typically assesses the quality of an article from three aspects: (1) theoretical contribution, (2) practical implications, and (3) rigor of research design and execution. The first two aspects require strong knowledge of the relevant OM literature, but the third one less so. That is, although variations exist in what is emphasized across disciplines, there is very little, if anything at all, in method that is specific to OM. Although novel research designs and analysis techniques are constantly being introduced, many basic principles remain unchanged over time. This creates the opportunity for standardized quality-control checks: Manuscripts submitted to JOM, regardless of their topical area, can be processed initially via the ERM department to assess their methodological rigor (Figure 1 of Browning & de Treville, 2018).
The goal of a pre-review method check is to identify any major weaknesses in the research design or analysis before the article is sent to a topical department for regular review. The outcome of a pre-review method check is (1) approval to move directly to a topical department, (2) a request for revision of the methodological part of the manuscript, or (3) a rejection due to methodological issues. When requesting a revision, we try our best to explain as clearly and practically as possible what is required to make the empirical part of the article publishable. Nevertheless, there is always a risk that even if the empirical part of a manuscript is deemed acceptable by the ERM department, the manuscript is later rejected by the topical editor because of a lack of contribution. For this reason, the pre-review method check is typically done for just one round. In addition, successfully passing the pre-review method check does not prevent the regular review process from revealing further methodological issues. In many cases, the method check focuses on common issues that the ERM department has cataloged from past submissions.

From a reviewer-development point of view, method-focused reviews allow us to engage early-career OM scholars who do not yet have the experience to serve as topical reviewers for JOM. The requirements for a good methods reviewer are somewhat different from those for a regular reviewer. Evaluation of theoretical contribution and practical implications requires a perspective that is largely gained through experience. In contrast, method review requires expertise in a certain type of methodology, which can be gained through doctoral-level coursework and self-learning. In our view, for method reviews, the freshness of method knowledge is more important than experience in writing full review reports. Therefore, while we check most manuscripts sent for method reviews ourselves, we are actively building a pool of method reviewers.
To expand this pool, we extend a warm invitation to late-stage doctoral students and junior colleagues starting their academic careers. For those interested, we have developed a recruitment form (if you have a print copy of the article, you can find the form at https://tinyurl.com/jom-methods-review).

Doing methods reviews can be attractive for junior scholars for several reasons. First, it provides a great opportunity to practice and develop one's own research method skills. Although there is no one-size-fits-all process for performing method reviews, the reviewer should expect some guidance and mentoring from the editors, which might include a template explaining common problems, notes from the initial evaluation by the editor, pointers to books, articles, and other learning materials about the issues identified in the initial evaluation, feedback on the initial version of a review, and even a short conference call to discuss the article. Based on our experience, student reviewers have found this process particularly motivating and helpful in developing confidence in their own skills. It is our sincere hope that these reviews become developmental for both the authors and the reviewers.

Second, doing method reviews can boost one's career in two ways. Academic work generally involves research, teaching, and service. While junior scholars often have no lack of research and teaching opportunities, professional service opportunities might be more difficult to come by because of the seniority typically required. Method reviews for JOM allow one to take on early-career professional service duties. Furthermore, doing method reviews provides a way to be involved in the JOM community and offers a path to the editorial review board (ERB) of the journal.
The department keeps track of reviewer workload and performance, and we have a policy of recommending method reviewers who have completed five high-quality method reviews to the editors-in-chief for ERB appointment.

4 A COLLECTION OF COMMON METHODOLOGICAL ISSUES

This section presents a collection of common issues we have encountered during pre-review method checks. The manuscripts sent for method checks have without exception been quantitative studies (qualitative research, such as case studies, has been assessed methodologically during the regular review process), so our focus here is purely on issues that arise in quantitative research. We explain each issue concisely, with citations to the methodological literature that provides full explanations. While the list provided here is not exhaustive, our aim has been to make it useful as a pre-submission checklist, hopefully saving authors at least one round of revision. This checklist complements but does not replace resources like the American Psychological Association guidelines (Applebaum et al., 2018) or the excellent The Reviewer's Guide to Quantitative Methods in the Social Sciences (Hancock et al., 2019). We organize the issues under the topics of research design, endogeneity, analysis, and interpretation and reporting. Although the endogeneity issues could be classified under the other categories, we decided to collect them in one category due to their relative frequency. Many issues are intertwined, and the categorization is merely an attempt to organize the material. The rest of this section is presented in an issue-recommendation format.

4.1 Research design

4.1.1 Randomized field versus quasi/natural experiments

Issue: The term "field experiment" is sometimes used loosely to represent one of two types of research design: the randomized field experiment or the quasi/natural experiment. They are, however, very different designs and require drastically different strategies for causal identification (Shaver, 2020).
A field experiment is an analog of a controlled lab experiment carried out in the real world (i.e., the field). The key features of a field experiment are that (1) there is a treatment group and a control group and (2) participants are assigned to these groups randomly. For example, Nguyen and Kim (2019) randomized 328 textile, clothing, leather, and footwear firms into control, quality-management meeting, and placebo meeting conditions to test whether quality-management meetings had an impact on quality management practices after 1- or 2-year lags. Chuang et al. (2016) randomized 60 retail stores into a treatment group that received audits and a control group that did not to study the effect of audits on operational performance. In contrast, a natural experiment is a special case of observational study, where some units have experienced the treatment and others have not, but the treatment assignment is not randomized. In some cases, the assignment can be treated as quasi-random, while in other cases causal identification is achieved by a difference-in-differences strategy, sometimes with data preprocessed by a matching or synthetic control method to achieve better comparability of treatment and control units (e.g., Arslan et al., 2019). Other approaches, such as regression discontinuity design (Calvo et al., 2019) and instrumental variables (Angrist & Krueger, 1991), can also be used, as permitted by data and context.

Recommendation: Use the field experiment terminology precisely (i.e., to represent the randomized field experiment). Chatterji et al. (2016), Eden (2017), and Lambrecht and Tucker (2015) provide guidelines for conducting randomized field experiments. Also consult the relevant parts of Eckerd et al. (2021), Grant and Wall (2009), Lonati et al. (2018), and Sieweke and Santoni (2020).
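The difference-in-differences logic described above can be sketched on simulated data. The sketch below is illustrative only: the effect size, group sizes, baseline gap, and time shock are all hypothetical numbers, not values from any study cited here.

```python
import random
import statistics

random.seed(42)
TRUE_EFFECT = 2.0  # hypothetical treatment effect we try to recover

def simulate_unit(treated: bool):
    """One unit observed pre- and post-treatment. Treated units get a
    higher baseline (selection into treatment), and all units share a
    common time shock, so both naive comparisons are biased."""
    baseline = random.gauss(5.0 if treated else 0.0, 1.0)  # unit effect
    time_shock = 1.5                                       # common trend
    pre = baseline + random.gauss(0, 1)
    post = baseline + time_shock + (TRUE_EFFECT if treated else 0.0) + random.gauss(0, 1)
    return pre, post

treated = [simulate_unit(True) for _ in range(2000)]
control = [simulate_unit(False) for _ in range(2000)]

def change(units):
    """Average post-minus-pre change within a group of units."""
    return statistics.fmean(post - pre for pre, post in units)

# DiD: change among treated units minus change among controls.
# Unit baselines and the common time shock both difference out,
# leaving (approximately) the treatment effect.
did = change(treated) - change(control)
print(f"DiD estimate: {did:.2f} (true effect: {TRUE_EFFECT})")
```

Note how a naive post-period comparison would pick up the baseline gap, and a naive pre/post comparison among the treated would pick up the common trend; only the double difference isolates the effect, and only under the parallel-trends assumption built into this simulation.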
4.1.2 Experimental vignettes

Issue: The idea of an experiment is that we administer a treatment or stimulus (typically via a survey) that triggers a causal process, which in turn produces an outcome (often measured on the same survey). A variety of issues arise in the process of conducting research using experimental vignettes. We refer readers to two method review articles recently published in JOM for full details (Eckerd et al., 2021; Lonati et al., 2018). We emphasize that whether an experimental vignette is the appropriate tool for studying a particular research question depends critically on the underlying causal mechanism and theorization. For example, if the causal mechanism is at the individual level and the theory motivating this mechanism concerns individual cognition, then a vignette would appear to be appropriate. Using vignettes to test organizational theory, however, requires the informant to take the stimulus to the organization, wait for the causal process that it triggers to play out, and then report the outcome. This is unlikely to happen in a typical vignette study given its short duration. A significant difference in the outcome between treatment and control groups is not sufficient evidence that an organizational-level causal process has played out, because this difference can be explained simply by implicit theories (Podsakoff et al., 2003) and demand effects (Lonati et al., 2018). That is, informants simply respond based on what they think would happen or what they think they should respond.

Recommendation: Exercise extreme caution when designing experimental vignettes outside the scope of individual causal processes, such as those covered by behavioral operations. Vignettes are a valuable design for answering some research questions but are not appropriate for all questions. Explicitly address the possibility of demand effects and the risk caused by the use of non-consequential decision-making (Eckerd et al., 2021; Lonati et al., 2018).
4.1.3 Mturk and other online panel samples

Issue: Online panels such as Amazon's Mechanical Turk (Mturk) are used as the data source for a number of studies published in JOM (e.g., Abbey et al., 2019; Ball et al., 2018; Cantor & Jin, 2019). A key debate about the appropriateness of this data source centers on whether subjects possessing the expertise required by the focal research can be reliably recruited on these platforms (Aguinis et al., 2021; Lee et al., 2018; Porter et al., 2019). If subjects are elicited for their experience or perceptions as consumers, such as preference for remanufactured products in Agrawal et al. (2015), Mturk is often a reliable and relatively low-cost data source. If subjects are instead asked to answer questions requiring professional expertise from certain management domains (e.g., marketing and sales; eight in total as defined by Mturk), an abundance of caution needs to be exercised to ensure that (1) a nontrivial fraction of the online panel labor pool indeed possesses the required domain expertise and (2) this qualified group of subjects is reliably identified. The first criterion is to a certain extent subjective and also delicate to argue. On the one hand, we acknowledge that some management tasks can be stylized in such a fashion (e.g., the newsvendor model) that managers and students perform similarly in experiments (Bolton et al., 2012). From this perspective, it appears that at least some Mturk subjects can be relied upon for experiments requiring domain background they do not possess. On the other hand, if the experimental context requires experiential knowledge that takes years to develop inside a company or is only available to managers above a certain level, Mturk subjects lack face validity, since they "are often unemployed or underemployed and are known to misrepresent their qualifications" (Aguinis et al., 2021).
Conditional on reasonable satisfaction of the first criterion, rigor along the lines of the second criterion can be improved via a two-stage data collection procedure.

Recommendation: When faced with a potential concern regarding the qualification of Mturk workers as experimental subjects (the first criterion), we advise authors to be proactive about this issue by presenting arguments for the match between the experimental context and the Mturk labor background. In the data collection process, the authors can insert a prescreening study before administering their main study. The prescreening study should contain no indication of which domain expertise is sought for the main study. Its purpose is rather to let subjects state an area of expertise and answer a few questions related to that area. Only subjects who stated the sought-for area and correctly answered the questions for that area are then used for the second-stage data collection (the main study). This two-stage procedure should select the desired subjects if they answer truthfully and should also lower the impact of misrepresented expertise otherwise. In the event that Mturk is unsuitable as the main data source, we should not underestimate its value as a supplemental data source, given its broad accessibility and low cost. One such scenario arises when Mturk is used to collect a replication sample for a main study that is highly targeted to a specific population or prohibitively costly to rerun (Lee et al., 2018). Another scenario is when the experimental design requires a very large subject pool due to the number of manipulations (Mummolo & Peterson, 2019; Quidt et al., 2018). Last, we note that while online panels share similarities in their purposes, their subject populations can exhibit nontrivial differences (Eyal et al., 2021), which implies that insights from the literature regarding the Mturk population do not automatically apply to other platforms such as Qualtrics or Prolific.
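The selection rule in the two-stage procedure can be sketched as a simple filter. Everything below is hypothetical: the worker records, the field names, the expertise area, and the passing threshold are illustrative choices, not part of any Mturk API or any cited study.

```python
# Hypothetical prescreen records: each worker states an expertise area
# (with no hint of which area the main study seeks) and answers a short
# quiz on that area. All names and numbers are illustrative.
prescreen = [
    {"worker": "w1", "area": "procurement", "quiz_correct": 3, "quiz_total": 3},
    {"worker": "w2", "area": "marketing",   "quiz_correct": 3, "quiz_total": 3},
    {"worker": "w3", "area": "procurement", "quiz_correct": 1, "quiz_total": 3},
]

SOUGHT_AREA = "procurement"  # known to the researcher, never shown to workers
PASS_SHARE = 1.0             # require all quiz items correct (a design choice)

def qualifies(record: dict) -> bool:
    """Invite only workers who both stated the sought-for area and
    demonstrated it on the quiz; stating the area alone is not enough."""
    return (record["area"] == SOUGHT_AREA
            and record["quiz_correct"] / record["quiz_total"] >= PASS_SHARE)

invited = [r["worker"] for r in prescreen if qualifies(r)]
print(invited)  # → ['w1']
```

Because the prescreen never reveals which area is sought, claiming an area one does not hold offers no reliable payoff, which is the mechanism by which the two-stage design lowers the impact of misrepresented expertise.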
Moving forward, if Mturk samples are to become more widely used in OM, future research should sharpen our understanding of who the Mturk workers responding to OM studies really are. For example, more objective demographic data on the informants can be collected from company websites, from social media profiles (e.g., LinkedIn), and by calling the informants. After all, it is important to provide a warrant (Ketokivi & Mantere, 2021) for the sample of subjects we use.

4.1.4 Adapting measures from the literature

Issue: When authors describe a measure or variable, we frequently run into statements such as "our measure is adapted from (a citation)" or "we follow (a citation) to create our variable." Citing a source is desirable yet insufficient, for two reasons. First, citation should not replace a logical and transparent discussion of how well the operationalization captures the underlying construct in the particular context. Second, it is not uncommon that a reviewer checks the cited study and realizes that the original measure differs substantially from the measure in the submitted manuscript. Not only does such scale adaptation give the reviewer a negative impression of the paper, but it also risks contaminating the OM literature with untested and even questionable scales.

Recommendation: Be specific in describing and explaining adaptations of scales from previous research. If the construct and the items are identical to those in prior literature, this should be stated explicitly (and the items should be presented in quotation marks). Otherwise, authors should include an appendix table that shows the exact wording of the original items (in quotation marks) compared to their adaptation.

4.1.5 Cross-sectional, single-source surveys

Issue: Arguing for causal impact using cross-sectional survey data is not advised: Such arguments are unlikely to survive the review process at JOM.
An editorial by Guide and Ketokivi (2015, section 4.1) makes a blanket recommendation: "For survey researchers, we have a very simple recommendation: either give up single-informant surveys or stop making strong claims about common method bias." We add three points. First, we are skeptical of the value of any post hoc test unless good marker variables are employed (Richardson et al., 2009). Second, general "procedural remedies" like maintaining anonymity are unlikely to eliminate the problem, but collecting data from multiple sources certainly mitigates it. Third, discussion of method variance should move beyond models in which a single method factor is assumed to adequately represent method effects, toward analysis of items and how they are or are not affected by different sources of method variance (Spector et al., 2017).

Recommendation: For the dual benefit of having more success at publishing (for authors) and sparing review resources (for the journal), we join Guide and Ketokivi in advising researchers not to use single-informant surveys, especially as the sole data source. The success rate for such research designs over the past few years has been close to zero at JOM.

4.1.6 Role of control variables

Issue: In contrast to the usual extensive explanation of the choice of dependent and key independent variables (i.e., those related to hypotheses), authors tend to invest less in explaining their use of control variables. In some cases, authors list controls without explanation, or explain their inclusion with a sentence like "Following prior literature (a number of citations), we control for …" While control variables are not expected to receive the same level of attention as the focal dependent and independent variables, it is important to explain why a control variable presents an alternative explanation for the correlation between the independent and dependent variables, not just why it is expected to influence the dependent variable (Spector, 2019).
If our goal is to identify β1, the causal impact of X1 on Y, control variables play two basic roles. First, consider X2, which is correlated with X1 and has an impact on Y, captured by β2. This means X2 is directly tied to the causal identification of β1. If X2 can be collected or proxied (with good quality), it is highly recommended to include it as a control variable in the regression, since this ensures the unbiasedness (e.g., in ordinary least squares [OLS] regression) and consistency (e.g., in many nonlinear models) of the estimated β1. (Note that "unbiasedness" and "consistency" are sometimes used interchangeably by authors, but they are different. Loosely stated, an unbiased estimator produces an estimate β̂1 whose mean equals β1 in finite samples (e.g., sample size = 1000). A consistent estimator does the same asymptotically, that is, as the sample size approaches infinity. For example, many instrumental variable estimators in nonlinear models are consistent but not necessarily unbiased; OLS estimates for linear models are both unbiased and consistent.) If X2 unfortunately cannot be measured, it gives rise to omitted variable bias when estimating β1, the most common reason behind endogeneity, which we discuss further in a later section. Thus, control variables suc