"Caution also must be used because, in most cases, deletion [of outliers] helps us to support our hypotheses. Given the importance of inter-subjectivity and the separation of theoretical and empirical evidence in the testing of hypotheses, choosing a course of action post hoc that is certain to increase our chances of finding what we want to find is a dangerous practice." - Cortina (2002)
1. What is an Outlier?
Salgado et al. (2016) define outliers as extreme values, either high or low, that deviate markedly from the other observed data points. They are also called abnormalities, discordant or deviant observations, or anomalies. Contrary to the traditional view of outliers as problems to be eliminated, Herman Aguinis, Ryan K. Gottfredson, and Harry Joo (2013) suggest that outliers can offer valuable theoretical insights and reveal unique phenomena in various research contexts.
To categorize and handle outliers effectively, Aguinis, Gottfredson, and Joo (2013) identify three types: error, interesting, and influential. Error outliers stem from inaccuracies in the data. Interesting outliers are data points that cannot be confirmed as errors and that may hold valuable information. Influential outliers are neither errors nor interesting anomalies, yet they significantly affect statistical analyses and substantive conclusions.
2. Detecting and Handling Outliers
Before choosing a detection method, it is crucial to examine how the data are distributed. The process can begin with a probability plot, followed by a statistical test of normality. If the results are inconclusive or point to non-normality, techniques designed for detecting outliers in non-normally distributed data become essential.
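The screening step described above can be sketched with Python's standard library alone. This is a minimal illustration, not a procedure taken from the cited authors: the function names and the sample readings are hypothetical, the plotting positions (i - 0.5) / n are one convention among several, the plot correlation is used as an informal summary rather than a formal normality test, and the modified z-score (median and MAD, with the 0.6745 constant and a 3.5 cutoff) is a common rule of thumb that I am assuming here as the fallback for data that do not look normal.

```python
from statistics import NormalDist, mean, median


def probability_plot_points(values):
    """Pair each sorted observation with its theoretical normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


def plot_correlation(values):
    """Pearson correlation of the probability plot; near 1 suggests normality."""
    pairs = probability_plot_points(values)
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def modified_z_scores(values):
    """Robust z-scores built from the median and the median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; the modified z-score is undefined")
    return [0.6745 * (v - med) / mad for v in values]


# Hypothetical measurements with one suspicious value.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 10.0, 10.1, 25.0]

r_all = plot_correlation(readings)
scores = modified_z_scores(readings)
flagged = [v for v, z in zip(readings, scores) if abs(z) > 3.5]
r_clean = plot_correlation([v for v in readings if v not in flagged])

print("plot correlation, all data:", round(r_all, 3))
print("flagged for inspection:", flagged)
print("plot correlation, without flagged values:", round(r_clean, 3))
```

A flagged value is only a candidate for inspection. Whether it is an error, an interesting case, or an influential one still has to be decided from the research context.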
Kazumi Wada (2020) proposes various methods, including multivariate outlier detection techniques and regression imputation. These methods rely on robust tools such as MSD estimators, Fast-MCD, BACON, the iteratively reweighted least squares (IRLS) algorithm, weight functions, and measures of scale to address outliers in datasets with multiple variables.
Aguinis, Gottfredson, and Joo (2013) provide principles for identifying and handling outliers according to their type. Error outliers are detected with visual and quantitative techniques, whereas interesting outliers are identified by their potential significance within the specific research domain. How influential outliers are detected depends on the statistical technique in use.
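To give a feel for how IRLS combines a weight function with a measure of scale, here is a minimal sketch for a straight-line fit. It does not reproduce Wada's procedures: the Huber weight function, the tuning constant 1.345, the MAD-based scale, the function names, and the data are all assumptions chosen for illustration.

```python
from statistics import median


def weighted_line_fit(xs, ys, ws):
    """Weighted least squares for y = intercept + slope * x."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_irls(xs, ys, k=1.345, max_iter=50, tol=1e-8):
    """Refit repeatedly, down-weighting cases with large residuals."""
    ws = [1.0] * len(xs)  # the first pass is ordinary least squares
    intercept, slope = weighted_line_fit(xs, ys, ws)
    for _ in range(max_iter):
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        # Robust measure of scale: MAD of the residuals, rescaled (assumption).
        scale = median(abs(r) for r in residuals) / 0.6745
        if scale == 0:
            break
        cutoff = k * scale
        # Huber weight function: 1 inside the cutoff, cutoff / |r| outside it.
        ws = [1.0 if abs(r) <= cutoff else cutoff / abs(r) for r in residuals]
        new_intercept, new_slope = weighted_line_fit(xs, ys, ws)
        converged = (abs(new_intercept - intercept) < tol
                     and abs(new_slope - slope) < tol)
        intercept, slope = new_intercept, new_slope
        if converged:
            break
    return intercept, slope, ws


# Hypothetical data close to y = 1 + 2x, with the last response corrupted.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [3.1, 4.8, 7.15, 8.9, 11.05, 13.2, 14.85, 17.1, 18.95, 50.0]

_, ols_slope = weighted_line_fit(xs, ys, [1.0] * len(xs))
_, robust_slope, weights = huber_irls(xs, ys)

print("ordinary least squares slope:", round(ols_slope, 3))
print("IRLS (Huber) slope:", round(robust_slope, 3))
print("final weights:", [round(w, 3) for w in weights])
```

The final weights double as a diagnostic: cases with weights far below 1 are the ones the fit treated as outlying. Note that this kind of estimator protects mainly against outlying responses, which is one reason multivariate methods such as those Wada discusses exist.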
Addressing outliers involves transparently describing the chosen procedures and acknowledging the type of outlier of interest. For error outliers, adjusting data points or removing cases is recommended, with a clear rationale provided. Interesting outliers should be studied quantitatively or qualitatively, analyzing differences between groups or investigating factors affecting outliers. Handling influential outliers varies based on the analysis technique.
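For a simple regression, one way to make an influential case visible and to report it transparently is to refit the model without each observation in turn and record how far the slope moves. The sketch below is only an illustration under stated assumptions: the leave-one-out slope shift is a simplified stand-in for formal influence diagnostics, and the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def leave_one_out_slope_shifts(xs, ys):
    """Change in slope caused by each case: full-sample slope minus refit slope."""
    _, full_slope = fit_line(xs, ys)
    shifts = []
    for i in range(len(xs)):
        _, slope_without_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        shifts.append(full_slope - slope_without_i)
    return full_slope, shifts


# Hypothetical data: eight cases near y = 2x and one distant case at x = 20.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 20]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 10.0]

full_slope, shifts = leave_one_out_slope_shifts(xs, ys)
most_influential = max(range(len(shifts)), key=lambda i: abs(shifts[i]))
slope_without = full_slope - shifts[most_influential]

# Report both results instead of silently dropping the case.
print("slope with all cases:", round(full_slope, 3))
print("most influential case index:", most_influential)
print("slope without that case:", round(slope_without, 3))
```

Reporting the estimate both with and without the case, together with the reason for any exclusion, keeps the decision open to scrutiny rather than made post hoc.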
3. The Future of Outliers
Instead of treating outliers as problems to be removed, researchers and data scientists should systematically define, detect, and handle them in order to reach better substantive conclusions. We anticipate a shift toward consistent and transparent practices in handling outliers, contributing to more accurate modelling and analysis in future research.
References:
Point for reflection? Please feel free to explore resources in the reference list.