Disease Detection From Twitter Data Using Natural Language Processing and Machine Learning

Authors: Öztürk, Ali; Durak, Üsame; Badilli, Fatma
Type: Article
Language: English
Published: 2020
Repository date: 2022-02-26
Journal: konjes (ISSN 2667-8055)
DOI: https://doi.org/10.36306/konjes.650150
Article page: https://dergipark.org.tr/tr/pub/konjes/issue/57976/650150
Full text download: https://dergipark.org.tr/tr/download/article-file/860622
Handle: https://hdl.handle.net/20.500.13091/2073
DergiPark ID: 650150
Access: Open access (info:eu-repo/semantics/openAccess)
Turkish title: TWİTTER VERİLERİNDEN DOĞAL DİL İŞLEME VE MAKİNE ÖĞRENMESİ İLE HASTALIK TESPİTİ ("Disease Detection from Twitter Data with Natural Language Processing and Machine Learning")
Keywords: Twitter; Disease Recognition; Natural Language Processing; Machine Learning

Abstract (also given in Turkish in the original record):
In this study, we determined whether the messages written by Twitter users are about a disease and, if so, which disease. To this end, supervised and unsupervised machine learning algorithms were tested and compared using features extracted with the TF-IDF (term frequency-inverse document frequency) and BOW (bag-of-words) methods. The data were collected from Twitter with Python scripts, and the algorithms were implemented with the Scikit-Learn library for Python. The unsupervised clustering algorithms reached an accuracy of 68.60%, while the supervised classification algorithms reached an accuracy of 97.48%.
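The abstract names BOW and TF-IDF features fed to supervised classifiers. The following minimal pure-Python sketch shows how the two feature types are computed and how a classifier can be trained on them. It rests on several assumptions. The tweets and disease labels are hypothetical toy data, not the study's dataset. The whitespace tokenizer is a simplification. Because the abstract does not name the classifiers that were compared, a nearest-centroid rule (cosine similarity to each class's TF-IDF centroid) stands in for them. The IDF uses the smoothed formula and L2 row normalization that Scikit-Learn's `TfidfVectorizer` applies by default.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace (simplified; a real tweet pipeline
    # would also strip URLs, @mentions, punctuation and stop words).
    return text.lower().split()


def build_vocabulary(docs):
    # Sorted list of every distinct token seen in the training documents.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bow(docs, vocab):
    # Bag-of-words: raw count of each vocabulary term per document.
    # Tokens that are not in the vocabulary are ignored.
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocab])
    return rows


def fit_idf(count_rows):
    # Smoothed IDF learned from training counts:
    #   idf(t) = ln((1 + n) / (1 + df(t))) + 1
    n = len(count_rows)
    n_terms = len(count_rows[0])
    df = [sum(1 for row in count_rows if row[j] > 0) for j in range(n_terms)]
    return [math.log((1 + n) / (1 + d)) + 1 for d in df]


def tfidf(count_rows, idf):
    # Weight counts by IDF, then L2-normalize each document vector.
    rows = []
    for row in count_rows:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        rows.append([x / norm for x in weighted])
    return rows


def centroids(vectors, labels):
    # Per-class sum of TF-IDF vectors (same direction as the mean),
    # L2-normalized so a dot product with a normalized document is cosine.
    sums = {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for j, x in enumerate(vec):
            acc[j] += x
    model = {}
    for label, acc in sums.items():
        norm = math.sqrt(sum(x * x for x in acc)) or 1.0
        model[label] = [x / norm for x in acc]
    return model


def predict(vec, model):
    # Stand-in classifier: the class with the highest cosine similarity.
    return max(model, key=lambda c: sum(a * b for a, b in zip(vec, model[c])))


# Hypothetical toy tweets and labels (not the study's data).
train_docs = [
    "high fever and a bad cough all week",
    "cough and fever again today",
    "my blood sugar is high diabetes check",
    "insulin dose for diabetes this morning",
    "great football match last night",
    "watching the football match with friends",
]
train_labels = ["flu", "flu", "diabetes", "diabetes", "none", "none"]

vocab = build_vocabulary(train_docs)
train_counts = bow(train_docs, vocab)
idf = fit_idf(train_counts)
model = centroids(tfidf(train_counts, idf), train_labels)

test_docs = ["fever and cough since yesterday", "new diabetes insulin plan"]
preds = [predict(v, model) for v in tfidf(bow(test_docs, vocab), idf)]
print(preds)  # ['flu', 'diabetes']
```

In Scikit-Learn itself, `CountVectorizer` and `TfidfVectorizer` build the BOW and TF-IDF matrices, and `sklearn.neighbors.NearestCentroid` offers a comparable classifier. The standard-library version is used here only so that each step is visible. Note that the IDF is fitted on the training tweets only and reused for unseen tweets, so test-time words outside the training vocabulary contribute nothing.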