Saturday, June 3, 2023
Billy's HOME 比利强

Data Collection and Feature Extraction for Machine Learning

November 16, 2019
in Blog, Machine Learning

How to uncover relevant patterns in large amounts of data

If we want machines to act and think like humans, we need to look at how we learned to walk and talk in the first place.

The answer, of course, is data. Babies learn by absorbing a whole lot of information (data) and analyzing it to identify similarities and patterns.

In my previous article on identifying business processes that can be ML-enabled, I showed how to find out which of your processes can benefit from machine learning.

The next step to enabling Machine Learning is data — because, without data, there is no machine learning.

When you think about data, many questions emerge:

  • Where is your data?
  • How much data do you need?
  • What kind of data should you collect?
  • What are the data patterns (features)?
  • How can you identify and extract features from data for Machine Learning?

Let’s address these questions one by one so you can apply the answers to your business.

Preparing and Collecting Data

Where is your data?

Data may be stored in your application database or by other third-party service providers.

For example, if you want to analyze your users’ spending behavior, you may need to pull their purchasing records from your own database.
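
As a minimal sketch of pulling purchasing records, the snippet below uses Python's built-in `sqlite3` module with an in-memory database. The `purchases` table, its columns, and the sample rows are all hypothetical; your own application database will have a different schema and driver.

```python
import sqlite3

# Hypothetical schema: a `purchases` table with user_id, item, and amount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_id INTEGER, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, "book", 12.5),
        (1, "lamp", 30.0),
        (2, "book", 12.5),
        (2, "desk", 150.0),
        (2, "pen", 2.0),
    ],
)

# Aggregate each user's purchase history into simple spending figures.
rows = conn.execute(
    "SELECT user_id, COUNT(*), SUM(amount) "
    "FROM purchases GROUP BY user_id ORDER BY user_id"
).fetchall()

spending = {user_id: {"orders": n, "total": total} for user_id, n, total in rows}
conn.close()
```

Here `spending` maps each user to an order count and a total amount, which is the kind of structured summary a later feature selection step can work with.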

Conversely, if you want to understand user interests, you might need to find third-party service providers who specialize in collecting that kind of data.

How much data do you need?

This is an interesting question, but it has no definite answer because “how much” data you need depends on how many features there are in the data set (which we’ll cover in the next section).

I do recommend collecting as much data as possible. Feature selection will help you pick the useful nuggets out of your big data. Keep in mind, though, that “big data” takes more time to analyze.

What kind of data should you collect?

Data can be categorized into two types: Structured and Unstructured. Structured Data refers to well-defined types of data that are stored in search-friendly databases, while Unstructured Data is “everything” you can collect — but it’s not search-friendly.

Structured Data:

  • Numbers, dates, strings, etc.
  • Generally takes up less storage

Unstructured Data:

  • Text files and emails
  • Media files (videos, music, photos)
  • Other large files

According to Gartner, over 80% of an enterprise’s data will be unstructured.
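
To make the distinction concrete, here is a small sketch contrasting a structured record with a piece of unstructured text, and showing one simple way to turn the text into countable values. The record fields, the email text, and the `word_counts` helper are made up for illustration.

```python
import re
from collections import Counter

# A structured record: every field has a fixed name and type.
structured = {"user_id": 42, "signup_date": "2019-11-16", "total_spend": 164.5}

# Unstructured data: free text with no fixed schema (made-up email body).
email = "Hi team, the new camera arrived. The camera works great, thanks!"

def word_counts(text):
    """Lower-case the text, split it into words, and count each word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

counts = word_counts(email)
```

The structured record can be queried directly by field name, while the email only becomes comparable across messages once it is converted into something like these word counts.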

Identifying and Extracting Features

To assist our discussion of data extraction, let’s put down some simple terms.

  • Data: all of the information you can collect, which can be Structured or Unstructured
  • Data set: your collection of data
  • Feature: a measurable attribute or pattern found in your data set; used as input when training models
  • Model: the result of training a Machine Learning algorithm on your data

What are Features?

As defined above, features are the patterns in your data set that can be used to train models. Good features (which we’ll learn to identify in a moment) can help you to increase the accuracy of your Machine Learning model when predicting or making decisions.

Your data set will have many features, but not all are relevant. With feature selection, you can avoid wasting time calculating and collecting useless patterns that you’ll have to remove later.

Feature selection helps you simplify your ML models and enables faster, more effective training by:

  • Removing unused data;
  • Avoiding “Garbage In Garbage Out”;
  • Reducing overfitting;
  • Improving the accuracy of a Machine Learning model;
  • Avoiding the curse of dimensionality.

Next, let’s talk about methods for feature selection.

How can you identify and extract features from data for Machine Learning?

Now that we’ve covered data collection, it’s time to apply different feature selection methods. This will help you filter useful content from your data for your Machine Learning models.

The three general methods for this are Filter, Wrapper, and Embedded.

Filter Methods

Filter methods use statistical calculations to score (or rate) each feature independently of any Machine Learning model. Based on the scores, you decide which features to keep and which to remove. Because each feature is typically scored on its own, this approach ignores the relationships between the features themselves.
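
The scoring idea can be sketched in plain Python. This example assumes the Pearson correlation with the target as the score, which is only one of several statistics a filter method might use; the feature names and numbers are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)

def filter_select(features, target, k):
    """Score every feature on its own and keep the k highest-scoring names."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Made-up data: monthly spend is the target we want to predict.
features = {
    "visits": [1, 2, 3, 4, 5, 6],
    "age": [30, 25, 41, 22, 35, 28],
    "constant": [7, 7, 7, 7, 7, 7],
}
target = [10, 19, 31, 42, 48, 61]

selected, scores = filter_select(features, target, k=1)
```

No model is trained at any point: `selected` is chosen purely from the per-feature scores, which is what makes filter methods fast.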

Wrapper Methods

Wrapper methods repeatedly add features to, or remove them from, a candidate subset, train a model on each subset to measure its performance, and keep the subset that performs best. Because a model is trained for every candidate subset, this costs a lot of computation time.

Unlike filter methods, which rely on statistical scores alone, wrapper methods use a real model to pick the subset of features based on measured performance.
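
One common wrapper strategy is greedy forward selection. The sketch below assumes a very simple stand-in model (a nearest-centroid classifier scored with leave-one-out accuracy) so that it runs without any libraries; the data, labels, and helper names are hypothetical.

```python
def loo_accuracy(rows, labels, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given columns."""
    correct = 0
    for i in range(len(rows)):
        # Build per-class sums from every row except the held-out one.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(rows, labels)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(cols))
            for k, c in enumerate(cols):
                acc[k] += row[c]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # Squared distance from the held-out row to this class centroid.
            return sum(
                (rows[i][c] - sums[label][k] / counts[label]) ** 2
                for k, c in enumerate(cols)
            )

        if min(sums, key=dist) == labels[i]:
            correct += 1
    return correct / len(rows)

def forward_select(rows, labels):
    """Greedily add the feature that most improves the model's score."""
    n_features = len(rows[0])
    chosen, best_score = [], 0.0
    while len(chosen) < n_features:
        trial = None
        for c in range(n_features):
            if c in chosen:
                continue
            score = loo_accuracy(rows, labels, chosen + [c])
            if score > best_score:
                best_score, trial = score, c
        if trial is None:
            break  # no remaining feature improves the score
        chosen.append(trial)
    return chosen, best_score

# Made-up data: column 0 separates the classes, column 1 is noise.
rows = [[1.0, 5.0], [1.2, 1.0], [0.8, 9.0], [3.0, 4.0], [3.2, 8.0], [2.8, 2.0]]
labels = ["a", "a", "a", "b", "b", "b"]

chosen, best_score = forward_select(rows, labels)
```

Every candidate subset requires re-evaluating the model, which is where the extra computation time comes from as the number of features grows.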

Embedded Methods

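Embedded methods perform feature selection as part of training the model itself. LASSO regression is a common example: its L1 penalty can shrink the coefficients of unhelpful features all the way to zero, which removes them from the model. Ridge regression is often mentioned alongside it, but its L2 penalty only shrinks coefficients toward zero and rarely eliminates a feature outright. In other words, with a method like LASSO you don’t need to select features yourself.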
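
To illustrate how an L1 penalty drops a feature during training, here is a rough pure-Python sketch of LASSO fitted by coordinate descent. It is a teaching example under simplifying assumptions (tiny made-up data, a fixed number of passes, columns centered so no intercept is needed); in practice you would more likely reach for a library implementation.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the threshold band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso(X, y, alpha, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1 on centered data."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                Xc[i][j] * (yc[i] - sum(Xc[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w

# Made-up data: y depends strongly on column 0 and only weakly on column 1.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3.1, 2.9, -2.9, -3.1]

w_lasso = lasso(X, y, alpha=0.5)  # the L1 penalty zeroes the weak feature
w_plain = lasso(X, y, alpha=0.0)  # no penalty: ordinary least squares
```

With the penalty switched on, the coefficient of the weak second column becomes exactly zero, so the feature is effectively removed while the model is being fitted; with `alpha=0.0` both coefficients stay non-zero.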

In Conclusion

Congratulations, you’ve made it to the end of another article! I hope your understanding of Machine Learning is expanding.

Now that we’ve covered the basics of collecting data and selecting features, let’s recap the ground you’ve gained in my series so far:

  • You have a clear definition of Artificial Intelligence and Machine Learning;
  • You know how to identify which processes in your business can be Machine Learning-enabled;
  • You know how to find your data and select features from it for Machine Learning models.

In Part 4, I’ll focus on the Machine Learning model.

Thank you for reading! Follow me here and on social media to make sure you don’t miss the next installment. If you found this article useful, a share and some claps would mean the world to me and help fuel the rest of my series.

Questions or comments? I’d be more than happy to answer them here or via email.

You can also find me on LinkedIn, Facebook, Instagram, and my personal website.


© 2010-2022 Billy Tang
Supported By Growth SpeedUp Company
