{"id":937,"date":"2022-11-14T10:54:23","date_gmt":"2022-11-14T09:54:23","guid":{"rendered":"https:\/\/bilogic.hu\/?page_id=937"},"modified":"2022-11-14T10:55:10","modified_gmt":"2022-11-14T09:55:10","slug":"easy-data-cleaning-and-a-bit-complex-missing-data-handling-making-your-dataset-ready-to-be-analysed","status":"publish","type":"page","link":"https:\/\/bilogic.hu\/en\/easy-data-cleaning-and-a-bit-complex-missing-data-handling-making-your-dataset-ready-to-be-analysed\/","title":{"rendered":"Easy data cleaning and a bit complex missing data handling \u2013 making your dataset ready to be analysed"},"content":{"rendered":"\n<p>Data cleansing and preprocessing can take up 40-70% of the workflow. We should start any analysis, modelling, or visualisation by getting to know the data and preparing the dataset for the next steps. Basic steps of data cleansing:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Dropping useless or duplicate columns.<\/li>\n\n\n\n<li>Coding categorical variables:\n<ul class=\"wp-block-list\">\n<li>Only if necessary: for visualisation, the categorical form can be easier to understand, but some methods handle numerical data only.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Observing missing data:\n<ul class=\"wp-block-list\">\n<li>Look at the missing values for each variable and consider whether further steps are needed.<\/li>\n\n\n\n<li>Check whether the values are missing at random or whether there could be a relationship or pattern among them.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Observing text data:\n<ul class=\"wp-block-list\">\n<li>Remove text errors.<\/li>\n\n\n\n<li>Remove extra whitespace.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Date-time formatting.<\/li>\n<\/ul>\n\n\n\n<p>Some ideas on missing data handling:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Using the median for numerical variables.<\/li>\n\n\n\n<li>Use the median because, unlike the mean, it is not sensitive to outliers. 
Of course, in some justified cases, the mean can also be used.<\/li>\n\n\n\n<li>If we have categorical variables too, we can compute the measures of central tendency within each subgroup, which gives a less biased solution.<\/li>\n\n\n\n<li>Using regression for numerical variables:\n<ul class=\"wp-block-list\">\n<li>Estimate the missing values from the other variables.<\/li>\n\n\n\n<li>We can use either mean or median regression.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Using the mode for categorical data.<\/li>\n\n\n\n<li>Using any classification or categorization method for categorical data:\n<ul class=\"wp-block-list\">\n<li>Binary or multinomial logistic regression.<\/li>\n\n\n\n<li>k-nearest neighbour method.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p>These methods can be used to handle data that are missing at random. Before picking a method, think about the analysis we are going to undertake, and choose the method accordingly. (E.g., if we want to analyse the data with regression, our results would be biased if we also used regression for data imputation.)<\/p>\n\n\n\n<p><em>The data cleaning part of the article is based on <\/em><a href=\"https:\/\/towardsdatascience.com\/the-simple-yet-practical-data-cleaning-codes-ad27c4ce0a38\"><em>Admond Lee\u2019s article<\/em><\/a><em> titled \u201cThe Simple Yet Practical Data Cleaning Codes\u201d.<\/em><\/p>\n\n\n\n<div style=\"height:100px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Data cleansing and preprocessing can take up 40-70% of the workflow. We should start any analysis, modelling, or visualisation by getting to know the data and preparing the dataset for the next steps. Basic steps of data cleansing: Some ideas on missing data handling: These methods can be used to handle data that are missing at random. 
Before we&hellip;&nbsp;<a href=\"https:\/\/bilogic.hu\/en\/easy-data-cleaning-and-a-bit-complex-missing-data-handling-making-your-dataset-ready-to-be-analysed\/\" class=\"\" rel=\"bookmark\">Read More &raquo;<span class=\"screen-reader-text\">Easy data cleaning and a bit complex missing data handling \u2013 making your dataset ready to be analysed<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"neve_meta_sidebar":"","neve_meta_container":"","neve_meta_enable_content_width":"","neve_meta_content_width":0,"neve_meta_title_alignment":"","neve_meta_author_avatar":"","neve_post_elements_order":"","neve_meta_disable_header":"","neve_meta_disable_footer":"","neve_meta_disable_title":"","footnotes":""},"class_list":["post-937","page","type-page","status-publish","hentry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Easy data cleaning and a bit complex missing data handling \u2013 making your dataset ready to be analysed - BI LOGIC<\/title>\n<meta name=\"description\" content=\"Fedezze fel a Powered BI \u00e1ltal ny\u00fajtott \u00fczleti intelligencia szolg\u00e1ltat\u00e1sokat \u00e9s tapasztalja meg, hogyan felfedheti \u00fczlet\u00e9nek val\u00f3di \u00e9rt\u00e9keit. 
Az adatok elemz\u00e9s\u00e9t\u0151l az \u00fczleti d\u00f6nt\u00e9sekig, a Bilogic az \u00d6n partnere a digit\u00e1lis \u00e1talakul\u00e1sban.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/bilogic.hu\/en\/easy-data-cleaning-and-a-bit-complex-missing-data-handling-making-your-dataset-ready-to-be-analysed\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Easy data cleaning and a bit complex missing data handling \u2013 making your dataset ready to be analysed - BI LOGIC\" \/>\n<meta property=\"og:description\" content=\"Fedezze fel a Powered BI \u00e1ltal ny\u00fajtott \u00fczleti intelligencia szolg\u00e1ltat\u00e1sokat \u00e9s tapasztalja meg, hogyan felfedheti \u00fczlet\u00e9nek val\u00f3di \u00e9rt\u00e9keit. Az adatok elemz\u00e9s\u00e9t\u0151l az \u00fczleti d\u00f6nt\u00e9sekig, a Bilogic az \u00d6n partnere a digit\u00e1lis \u00e1talakul\u00e1sban.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/bilogic.hu\/en\/easy-data-cleaning-and-a-bit-complex-missing-data-handling-making-your-dataset-ready-to-be-analysed\/\" \/>\n<meta property=\"og:site_name\" content=\"BI LOGIC\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/bilogickft\" \/>\n<meta property=\"article:modified_time\" content=\"2022-11-14T09:55:10+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/bilogic.hu\/wp-content\/uploads\/2021\/10\/logo.png\" \/>\n\t<meta property=\"og:image:width\" content=\"127\" \/>\n\t<meta property=\"og:image:height\" content=\"127\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/bilogic.hu\/en\/easy-data-cleaning-and-a-bit-complex-missing-data-handling-making-your-dataset-ready-to-be-analysed\/\",\"url\":\"https:\/\/bilogic.hu\/en\/easy-data-cleaning-and-a-bit-complex-missing-data-handling-making-your-dataset-ready-to-be-analysed\/\",\"name\":\"Easy data cleaning and a bit complex missing data handling \u2013 making your dataset ready to be analysed - BI LOGIC\",\"isPartOf\":{\"@id\":\"https:\/\/bilogic.hu\/#website\"},\"datePublished\":\"2022-11-14T09:54:23+00:00\",\"dateModified\":\"2022-11-14T09:55:10+00:00\",\"description\":\"Fedezze fel a Powered BI \u00e1ltal ny\u00fajtott \u00fczleti intelligencia szolg\u00e1ltat\u00e1sokat \u00e9s tapasztalja meg, hogyan felfedheti \u00fczlet\u00e9nek val\u00f3di \u00e9rt\u00e9keit. Az adatok elemz\u00e9s\u00e9t\u0151l az \u00fczleti d\u00f6nt\u00e9sekig, a Bilogic az \u00d6n partnere a digit\u00e1lis \u00e1talakul\u00e1sban.\",\"breadcrumb\":{\"@id\":\"https:\/\/bilogic.hu\/en\/easy-data-cleaning-and-a-bit-complex-missing-data-handling-making-your-dataset-ready-to-be-analysed\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/bilogic.hu\/en\/easy-data-cleaning-and-a-bit-complex-missing-data-handling-making-your-dataset-ready-to-be-analysed\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/bilogic.hu\/en\/easy-data-cleaning-and-a-bit-complex-missing-data-handling-making-your-dataset-ready-to-be-analysed\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Kezd\u0151lap\",\"item\":\"https:\/\/bilogic.hu\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Easy data cleaning and a bit complex missing data handling \u2013 making your dataset ready to be 
analysed\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/bilogic.hu\/#website\",\"url\":\"https:\/\/bilogic.hu\/\",\"name\":\"BiLogic\",\"description\":\"Felfedj\u00fck \u00fczlete \u00e9rt\u00e9keit a Powered BI seg\u00edts\u00e9g\u00e9vel\",\"publisher\":{\"@id\":\"https:\/\/bilogic.hu\/#organization\"},\"alternateName\":\"BiLogic\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/bilogic.hu\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/bilogic.hu\/#organization\",\"name\":\"BiLogic\",\"alternateName\":\"BiLogic\",\"url\":\"https:\/\/bilogic.hu\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/bilogic.hu\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/bilogic.hu\/wp-content\/uploads\/2021\/10\/logo_big-1.jpg\",\"contentUrl\":\"https:\/\/bilogic.hu\/wp-content\/uploads\/2021\/10\/logo_big-1.jpg\",\"width\":501,\"height\":127,\"caption\":\"BiLogic\"},\"image\":{\"@id\":\"https:\/\/bilogic.hu\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/bilogickft\",\"https:\/\/www.linkedin.com\/company\/bi-logic-kft\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Easy data cleaning and a bit complex missing data handling \u2013 making your dataset ready to be analysed - BI LOGIC","description":"Fedezze fel a Powered BI \u00e1ltal ny\u00fajtott \u00fczleti intelligencia szolg\u00e1ltat\u00e1sokat \u00e9s tapasztalja meg, hogyan felfedheti \u00fczlet\u00e9nek val\u00f3di \u00e9rt\u00e9keit. 
Az adatok elemz\u00e9s\u00e9t\u0151l az \u00fczleti d\u00f6nt\u00e9sekig, a Bilogic az \u00d6n partnere a digit\u00e1lis \u00e1talakul\u00e1sban.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/bilogic.hu\/en\/easy-data-cleaning-and-a-bit-complex-missing-data-handling-making-your-dataset-ready-to-be-analysed\/","og_locale":"en_US","og_type":"article","og_title":"Easy data cleaning and a bit complex missing data handling \u2013 making your dataset ready to be analysed - BI LOGIC","og_description":"Fedezze fel a Powered BI \u00e1ltal ny\u00fajtott \u00fczleti intelligencia szolg\u00e1ltat\u00e1sokat \u00e9s tapasztalja meg, hogyan felfedheti \u00fczlet\u00e9nek val\u00f3di \u00e9rt\u00e9keit. Az adatok elemz\u00e9s\u00e9t\u0151l az \u00fczleti d\u00f6nt\u00e9sekig, a Bilogic az \u00d6n partnere a digit\u00e1lis \u00e1talakul\u00e1sban.","og_url":"https:\/\/bilogic.hu\/en\/easy-data-cleaning-and-a-bit-complex-missing-data-handling-making-your-dataset-ready-to-be-analysed\/","og_site_name":"BI LOGIC","article_publisher":"https:\/\/www.facebook.com\/bilogickft","article_modified_time":"2022-11-14T09:55:10+00:00","og_image":[{"width":127,"height":127,"url":"https:\/\/bilogic.hu\/wp-content\/uploads\/2021\/10\/logo.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/bilogic.hu\/en\/easy-data-cleaning-and-a-bit-complex-missing-data-handling-making-your-dataset-ready-to-be-analysed\/","url":"https:\/\/bilogic.hu\/en\/easy-data-cleaning-and-a-bit-complex-missing-data-handling-making-your-dataset-ready-to-be-analysed\/","name":"Easy data cleaning and a bit complex missing data handling \u2013 making your dataset ready to be analysed - BI LOGIC","isPartOf":{"@id":"https:\/\/bilogic.hu\/#website"},"datePublished":"2022-11-14T09:54:23+00:00","dateModified":"2022-11-14T09:55:10+00:00","description":"Fedezze fel a Powered BI \u00e1ltal ny\u00fajtott \u00fczleti intelligencia szolg\u00e1ltat\u00e1sokat \u00e9s tapasztalja meg, hogyan felfedheti \u00fczlet\u00e9nek val\u00f3di \u00e9rt\u00e9keit. Az adatok elemz\u00e9s\u00e9t\u0151l az \u00fczleti d\u00f6nt\u00e9sekig, a Bilogic az \u00d6n partnere a digit\u00e1lis \u00e1talakul\u00e1sban.","breadcrumb":{"@id":"https:\/\/bilogic.hu\/en\/easy-data-cleaning-and-a-bit-complex-missing-data-handling-making-your-dataset-ready-to-be-analysed\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/bilogic.hu\/en\/easy-data-cleaning-and-a-bit-complex-missing-data-handling-making-your-dataset-ready-to-be-analysed\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/bilogic.hu\/en\/easy-data-cleaning-and-a-bit-complex-missing-data-handling-making-your-dataset-ready-to-be-analysed\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Kezd\u0151lap","item":"https:\/\/bilogic.hu\/en\/"},{"@type":"ListItem","position":2,"name":"Easy data cleaning and a bit complex missing data handling \u2013 making your dataset ready to be analysed"}]},{"@type":"WebSite","@id":"https:\/\/bilogic.hu\/#website","url":"https:\/\/bilogic.hu\/","name":"BiLogic","description":"Felfedj\u00fck \u00fczlete \u00e9rt\u00e9keit a Powered BI 
seg\u00edts\u00e9g\u00e9vel","publisher":{"@id":"https:\/\/bilogic.hu\/#organization"},"alternateName":"BiLogic","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/bilogic.hu\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/bilogic.hu\/#organization","name":"BiLogic","alternateName":"BiLogic","url":"https:\/\/bilogic.hu\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/bilogic.hu\/#\/schema\/logo\/image\/","url":"https:\/\/bilogic.hu\/wp-content\/uploads\/2021\/10\/logo_big-1.jpg","contentUrl":"https:\/\/bilogic.hu\/wp-content\/uploads\/2021\/10\/logo_big-1.jpg","width":501,"height":127,"caption":"BiLogic"},"image":{"@id":"https:\/\/bilogic.hu\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/bilogickft","https:\/\/www.linkedin.com\/company\/bi-logic-kft"]}]}},"_links":{"self":[{"href":"https:\/\/bilogic.hu\/en\/wp-json\/wp\/v2\/pages\/937","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/bilogic.hu\/en\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/bilogic.hu\/en\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/bilogic.hu\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/bilogic.hu\/en\/wp-json\/wp\/v2\/comments?post=937"}],"version-history":[{"count":2,"href":"https:\/\/bilogic.hu\/en\/wp-json\/wp\/v2\/pages\/937\/revisions"}],"predecessor-version":[{"id":939,"href":"https:\/\/bilogic.hu\/en\/wp-json\/wp\/v2\/pages\/937\/revisions\/939"}],"wp:attachment":[{"href":"https:\/\/bilogic.hu\/en\/wp-json\/wp\/v2\/media?parent=937"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}