Billy's HOME 比利强

Data Collection and Feature Extraction for Machine Learning

November 16, 2019
in Blog, Machine Learning

How to uncover relevant patterns in large amounts of data

If we want machines to act and think like humans, we need to look at how we learned to walk and talk in the first place.

The answer, of course, is data. Babies learn by absorbing a whole lot of information (data) and analyzing it to identify similarities and patterns.

In my previous article on identifying business processes that can be ML-enabled, I showed how to find out which of your processes can benefit from machine learning.

The next step in enabling Machine Learning is data, because without data there is no Machine Learning.

When you think about data, many questions emerge:

  • Where is your data?
  • How much data do you need?
  • What kind of data should you collect?
  • What are the data patterns (features)?
  • How can you identify and extract features from data for Machine Learning?

Let’s address these questions one by one so you can apply the answers to your business.

Preparing and Collecting Data

Where is your data?

Data may be stored in your application database or by other third-party service providers.

For example, if you want to analyze your users’ spending behavior, you may need to pull their purchasing records from your own database.

Conversely, if you want to understand user interests, you might need to find third-party service providers who specialize in collecting that kind of data.
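
For the first case, pulling purchasing records out of your own database, a minimal sketch might look like the following. It assumes a hypothetical `purchases` table with `user_id`, `amount`, and `purchased_at` columns, and uses an in-memory SQLite database with made-up rows so the example runs on its own. Your real schema and database engine will differ.

```python
import sqlite3

# Hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (user_id INTEGER, amount REAL, purchased_at TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, 20.0, "2019-10-01"),
        (1, 35.5, "2019-10-15"),
        (2, 12.0, "2019-10-03"),
        (2, 8.0, "2019-11-02"),
        (2, 40.0, "2019-11-09"),
    ],
)


def spending_summary(connection):
    """Return one row per user: (user_id, order_count, total_spent, avg_order)."""
    query = """
        SELECT user_id,
               COUNT(*)    AS order_count,
               SUM(amount) AS total_spent,
               AVG(amount) AS avg_order
        FROM purchases
        GROUP BY user_id
        ORDER BY user_id
    """
    return connection.execute(query).fetchall()


summary = spending_summary(conn)
for row in summary:
    print(row)
```

Each row of the summary is already a small set of candidate features (order count, total spend, average order value) that you could later feed into feature selection.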

How much data do you need?

This is an interesting question, but it has no definite answer, because the amount of data you need depends on how many features the data set contains (we’ll cover features in the next section).

I do recommend collecting as much data as possible. Feature selection will help you filter the useful nuggets out of your big data. Keep in mind, though, that the more data you collect, the longer it takes to analyze.

What kind of data should you collect?

Data can be categorized into two types: Structured and Unstructured. Structured Data refers to well-defined types of data stored in search-friendly databases, while Unstructured Data is “everything else” you can collect, which is not search-friendly.

Structured Data:

  • Numbers, dates, strings, etc.
  • Typically needs less storage

Unstructured Data:

  • Text files and emails
  • Media files (videos, music, photos)
  • Other large files

According to Gartner, over 80% of an enterprise’s data will be unstructured.
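
To make the difference concrete, here is a small sketch of turning one piece of unstructured data (a free-form email) into structured, numeric features. The feature names (`word_count`, `mentions_refund`, and so on) and the sample text are invented for illustration. Real feature extraction from text is usually far richer than this.

```python
import re
from collections import Counter


def extract_text_features(text):
    """Turn free-form text into a small dictionary of numeric features."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {
        "word_count": len(words),
        "unique_words": len(counts),
        "mentions_refund": counts["refund"],
        "has_question": int("?" in text),
    }


email = "Hi, I want a refund. Can I get a refund for my order?"
features = extract_text_features(email)
print(features)
```

The output is a row of numbers that fits neatly into a database table, which is exactly what makes it "structured".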

Identifying and Extracting Features

To assist our discussion of feature extraction, let’s define some simple terms.

  • Data: all of the information you can collect, which can be Structured or Unstructured
  • Data set: your collection of data
  • Feature: a pattern (a measurable property) found in your data set; features are the inputs used to train models
  • Model: what a Machine Learning algorithm produces after it has been trained on your data set

What are Features?

As defined above, features are the patterns in your data set that can be used to train models. Good features (which we’ll learn to identify in a moment) can help you to increase the accuracy of your Machine Learning model when predicting or making decisions.

Your data set will have many features, but not all of them are relevant. With feature selection, you can avoid wasting time collecting and calculating useless patterns that you’ll have to remove later.

Feature selection helps you simplify your ML models and enables faster, more effective training by:

  • Removing unused data;
  • Avoiding “Garbage In Garbage Out”;
  • Reducing overfitting;
  • Improving the accuracy of a Machine Learning model;
  • Avoiding the curse of dimensionality.

Next, let’s talk about methods for feature selection.

How can you identify and extract features from data for Machine Learning?

Now that we’ve covered data collection, it’s time to apply different feature selection methods. This will help you filter useful content from your data for your Machine Learning models.

The three general methods for this are Filter, Wrapper, and Embedded.

Filter Methods

The Filter Method uses statistical calculations to compute a score (or rating) for each feature, independently of any Machine Learning model. Based on the scores, you decide which features to keep and which to remove. However, because each feature is scored on its own, this method ignores the relationships between the features themselves.
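
As a rough sketch of the idea, the snippet below scores each feature by the absolute Pearson correlation between that feature and the target, then keeps the top `k`. Pearson correlation is just one possible scoring statistic, chosen here as an assumption. The feature names and numbers are toy data. In practice you would more likely reach for a library helper such as scikit-learn's `SelectKBest`.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


def filter_select(features, target, k):
    """Score every feature on its own and keep the k highest-scoring names."""
    scores = {
        name: abs(pearson(values, target)) for name, values in features.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy data: only `visits` moves together with `spend`.
features = {
    "visits": [1, 2, 3, 4, 5, 6],
    "age": [30, 25, 41, 22, 36, 28],
    "constant": [1, 1, 1, 1, 1, 1],
}
spend = [12, 19, 33, 41, 48, 62]

selected, scores = filter_select(features, spend, k=1)
print(selected, scores)
```

No model is trained anywhere in this code, which is what makes the Filter Method fast. It is also why two redundant features can both score highly: each one is judged in isolation.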

Wrapper Methods

The Wrapper Method repeatedly adds features to, or removes features from, a candidate subset and trains a model to measure how each subset performs, until the best-performing subset is found. However, training a model for every candidate subset costs a lot of computation time.

Unlike the Filter Method, in which you must use statistical calculations to develop a subset of features, the Wrapper Method uses a real model to pick the best subset of features based on real performance.
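
One common wrapper strategy is forward selection: start with no features and keep adding whichever feature improves the model most. The sketch below is a simplified illustration, not a production implementation. It assumes a 1-nearest-neighbour classifier scored by leave-one-out accuracy as the "real model", and the three columns (informative, misleading, constant) are toy data invented for the example. Libraries such as scikit-learn offer ready-made versions (for example `SequentialFeatureSelector`).

```python
def loo_accuracy(rows, labels, columns):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the given column indices."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue  # never compare a row with itself
            dist = sum((row[c] - other[c]) ** 2 for c in columns)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def forward_selection(rows, labels, n_features):
    """Greedily add the feature that most improves the model's score."""
    selected, best_score = [], 0.0
    while len(selected) < n_features:
        candidates = [c for c in range(n_features) if c not in selected]
        scored = [
            (loo_accuracy(rows, labels, selected + [c]), c) for c in candidates
        ]
        score, column = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no candidate improves the model, so stop
        selected.append(column)
        best_score = score
    return selected, best_score


# Toy data. Column 0 separates the classes, column 1 is misleading,
# column 2 is constant.
rows = [
    [1.0, 5, 1], [1.2, 1, 1], [0.9, 4, 1], [1.1, 2, 1],
    [3.0, 1, 1], [3.2, 5, 1], [2.9, 2, 1], [3.1, 4, 1],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

selected, best_score = forward_selection(rows, labels, n_features=3)
print(selected, best_score)
```

Notice how many times `loo_accuracy` runs, once per candidate feature per round. That repeated model evaluation is the computational cost mentioned above.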

Embedded Methods

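The Embedded Method performs feature selection as part of model training itself, so you don’t need to select features in a separate step. LASSO Regression is the classic example: its L1 penalty can shrink the coefficients of unhelpful features to exactly zero, which removes them from the model. RIDGE Regression is often mentioned alongside it, but its L2 penalty only shrinks coefficients toward zero and does not eliminate features on its own.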
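
To show selection happening inside training, here is a minimal LASSO sketch using coordinate descent with soft-thresholding. It assumes the data is already centered (no intercept term) and uses a tiny toy data set where the target depends strongly on feature 0, very weakly on feature 1, and not at all on feature 2. The penalty strength `alpha=0.5` is an arbitrary choice for the demo. A real project would normally use a tested implementation such as scikit-learn's `Lasso`.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero, returning exactly 0.0 inside the threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimise (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1, one weight at a time."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            z = sum(row[j] ** 2 for row in X) / n
            if z == 0:
                continue  # an all-zero column can never get a weight
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for row, target in zip(X, y):
                prediction = sum(w[k] * row[k] for k in range(p))
                rho += row[j] * (target - prediction + w[j] * row[j])
            rho /= n
            w[j] = soft_threshold(rho, alpha) / z
    return w


# Toy, centered data: y = 3 * x0 + 0.1 * x1 + 0 * x2.
X = [
    [1, 1, 1],
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]
y = [3.1, 2.9, -2.9, -3.1]

weights = lasso_coordinate_descent(X, y, alpha=0.5)
kept = [j for j, weight in enumerate(weights) if weight != 0.0]
print(weights, kept)
```

The weights for features 1 and 2 come out as exactly zero, so the trained model itself tells you which features it kept. Raising `alpha` removes more features, and lowering it keeps more.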

In Conclusion

Congratulations, you’ve made it to the end of another article! I hope your understanding of Machine Learning is expanding.

Now that we’ve covered the basics of collecting data and selecting features, let’s recap the ground you’ve gained in my series so far:

  • You have a clear definition of Artificial Intelligence and Machine Learning;
  • You know how to identify which of your business processes can be Machine Learning-enabled;
  • You know how to find your data and select features from big data for Machine Learning models.

In Part 4, I’ll focus on the Machine Learning model.

Thank you for reading! Follow me here and on social media to make sure you don’t miss the next installment. If you found this article useful, a share and some claps would mean the world to me and help fuel the rest of my series.

Questions or comments? I’d be more than happy to answer them here or via email.

You can also find me on LinkedIn, Facebook, Instagram, and my personal website.

© 2010-2022 Billy Tang