Friday, December 13, 2019
Billy's HOME 比利强

Data Collection and Feature Extraction for Machine Learning

November 16, 2019
in Blog, Machine Learning

How to uncover relevant patterns in large amounts of data

If we want machines to act and think like humans, we need to look at how we learned to walk and talk in the first place.

The answer, of course, is data. Babies learn by absorbing a whole lot of information (data) and analyzing it to identify similarities and patterns.

In my previous article on identifying business processes that can be ML-enabled, I showed how to find out which of your processes can benefit from machine learning.

The next step toward enabling Machine Learning is data, because without data there is no machine learning.

When you think about data, many questions emerge:

  • Where is your data?
  • How much data do you need?
  • What kind of data should you collect?
  • What are the data patterns (features)?
  • How can you identify and extract features from data for Machine Learning?

Let’s address these questions one by one so you can apply the answers to your business.

Preparing and Collecting Data

Where is your data?

Data may be stored in your application database or by other third-party service providers.

If you want to, for example, analyze your users’ spending behavior, you may need to pull out their purchasing records from your own database.
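As a rough sketch of what pulling purchasing records could look like, the snippet below uses Python's built-in sqlite3 module with an in-memory database. The table name `purchases`, its columns, and the sample rows are all hypothetical; your own application database will have a different schema and probably a different database engine.

```python
import sqlite3

# Hypothetical schema and sample rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (user_id INTEGER, amount REAL, purchased_at TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, 20.0, "2019-10-01"),
        (1, 35.5, "2019-10-15"),
        (2, 12.0, "2019-10-03"),
        (2, 8.0, "2019-11-02"),
        (2, 40.0, "2019-11-09"),
    ],
)


def spending_per_user(connection):
    """Return one row per user: (user_id, number of purchases, total spent)."""
    query = """
        SELECT user_id, COUNT(*), SUM(amount)
        FROM purchases
        GROUP BY user_id
        ORDER BY user_id
    """
    return connection.execute(query).fetchall()


rows = spending_per_user(conn)
for user_id, n_purchases, total in rows:
    print(user_id, n_purchases, total)
```

Each returned row is already a small piece of structured data (purchase count and total per user) that can later serve as input for a model.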

Conversely, if you want to understand user interests, you might need to find third-party service providers who specialize in collecting that kind of data.

How much data do you need?

This is an interesting question, but it has no definite answer because “how much” data you need depends on how many features there are in the data set (which we’ll cover in the next section).

I do recommend collecting as much data as possible. Feature selection will help you filter the useful nuggets out of your big data. Keep in mind, though, that the more data you collect, the more time it takes to analyze.

What kind of data should you collect?

Data can be categorized into two types: Structured and Unstructured. Structured Data refers to well-defined types of data that are stored in search-friendly databases, while Unstructured Data is “everything” else you can collect, which is not search-friendly.

Structured Data:

  • Numbers, dates, strings, etc.
  • Requires less storage

Unstructured Data:

  • Text files and emails
  • Media files (videos, music, photos)
  • Other large files

According to Gartner, over 80% of an enterprise’s data will be unstructured.
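Unstructured data usually has to be turned into structured values before a model can use it. The sketch below shows one simple way to do that for text: counting how often a few chosen words appear in an email. The function name `text_to_features`, the vocabulary, and the sample email are assumptions made up for this illustration.

```python
import re
from collections import Counter


def text_to_features(text, vocabulary):
    """Count how often each vocabulary word appears in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # A Counter returns 0 for words that never appear.
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary and email text.
vocabulary = ["refund", "invoice", "thanks"]
email = "Thanks for the invoice. I would like a refund, since the invoice is wrong."

features = text_to_features(email, vocabulary)
print(features)  # [1, 2, 1]
```

The free-form email becomes a fixed-length list of numbers, which is the kind of structured input most Machine Learning models expect.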

Identifying and Extracting Features

To assist our discussion of feature extraction, let’s define a few simple terms.

  • Data: all of the information you can collect, which can be Structured or Unstructured
  • Data set: your collection of data
  • Feature: a measurable attribute or pattern found in your data set; features are the inputs used for training models
  • Model: what your Machine Learning algorithm produces after it has been trained on your data

What are Features?

As defined above, features are the patterns in your data set that can be used to train models. Good features (which we’ll learn to identify in a moment) can help you to increase the accuracy of your Machine Learning model when predicting or making decisions.

Your data set will have many features, but not all are relevant. With feature selection, you can avoid wasting time calculating and collecting useless patterns that you’ll have to remove later.

Feature selection helps you simplify your ML models and enables faster, more effective training by:

  • Removing unused data;
  • Avoiding “Garbage In Garbage Out”;
  • Reducing overfitting;
  • Improving the accuracy of a Machine Learning model;
  • Avoiding the curse of dimensionality.

Next, let’s talk about methods for feature selection.

How can you identify and extract features from data for Machine Learning?

Now that we’ve covered data collection, it’s time to apply different feature selection methods. This will help you filter useful content from your data for your Machine Learning models.

The three general methods for this are Filter, Wrapper, and Embedded.

Filter Methods

The Filter Method uses statistical calculations to compute a score (or rating) for each feature, independently of any Machine Learning model. Based on the scores, you can decide which features to keep and which to remove. However, because each feature is scored on its own, this method ignores the relationships between the features themselves.
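To make this concrete, here is a minimal sketch of one possible filter: scoring each feature by the absolute Pearson correlation with the target and keeping those above a threshold. Pearson correlation is only one of several statistics you could use, and the feature names, sample values, and the 0.5 threshold are assumptions chosen for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)


def filter_select(columns, target, threshold):
    """Score every feature on its own, then keep those scoring at or above the threshold."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    return scores, kept


# Hypothetical data set: three candidate features and the amount each user spent.
columns = {
    "visits": [1, 2, 3, 4, 5, 6],
    "cart_size": [2, 1, 4, 3, 6, 5],
    "shoe_size": [9, 7, 8, 8, 7, 9],
}
target = [10, 20, 30, 40, 50, 60]

scores, kept = filter_select(columns, target, threshold=0.5)
print(kept)  # ['visits', 'cart_size']
```

No model is trained at any point; the decision rests entirely on the per-feature scores, which is what makes filters fast.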

Wrapper Methods


The Wrapper Method repeatedly adds features to, or removes features from, a candidate subset and uses a model to measure the performance of each subset until the best-performing one is found. However, this costs a lot of computation time.

Unlike the Filter Method, in which you must use statistical calculations to develop a subset of features, the Wrapper Method uses a real model to pick the best subset of features based on real performance.
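The sketch below illustrates one wrapper strategy, greedy forward selection: start with no features and keep adding whichever feature improves the model's score the most, stopping when nothing helps. The model here is a tiny 1-nearest-neighbour classifier scored with leave-one-out accuracy, picked only because it fits in a few lines; the data set and function names are hypothetical.

```python
def loo_accuracy(rows, labels, feature_idx):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the features listed in feature_idx."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue  # never compare a row with itself
            dist = sum((row[k] - other[k]) ** 2 for k in feature_idx)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def forward_selection(rows, labels, n_features):
    """Greedily add the feature that raises the score most; stop when none does."""
    selected, best_score = [], 0.0
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(loo_accuracy(rows, labels, selected + [f]), f) for f in candidates]
        if not scored:
            break
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break
        selected.append(feature)
        best_score = score
    return selected, best_score


# Hypothetical data: feature 0 separates the classes, features 1 and 2 are noise.
rows = [
    [1.0, 5.0, 0.0],
    [1.2, 1.0, 0.0],
    [0.9, 3.0, 1.0],
    [3.0, 5.1, 0.0],
    [3.1, 1.1, 1.0],
    [2.9, 3.1, 1.0],
]
labels = [0, 0, 0, 1, 1, 1]

selected, best_score = forward_selection(rows, labels, n_features=3)
print(selected, best_score)  # [0] 1.0
```

Every candidate subset requires re-evaluating the model, which is where the extra computation time comes from.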

Embedded Methods


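The Embedded Method performs feature selection as part of training the model itself, using algorithms with built-in regularization such as LASSO Regression or RIDGE Regression. LASSO can shrink the coefficients of unhelpful features all the way to zero, which removes them from the model, while RIDGE only shrinks coefficients toward zero without removing any feature. In other words, with a method like LASSO you don’t need to select features yourself.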
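As an illustration of selection happening inside training, here is a minimal LASSO sketch solved by coordinate descent, minimizing (1/(2n))·‖y − Xw‖² + alpha·‖w‖₁. It assumes the features are already centered and leaves out the intercept to stay short; the data, the alpha value of 0.5, and the function names are assumptions made for this example, not a production implementation.

```python
def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha; anything within [-alpha, alpha] becomes 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Fit LASSO weights (no intercept) by cycling through one weight at a time."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the residual that ignores feature j
            z = 0.0    # mean squared value of feature j
            for row, target in zip(X, y):
                prediction_without_j = sum(row[k] * w[k] for k in range(p) if k != j)
                rho += row[j] * (target - prediction_without_j)
                z += row[j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


# Hypothetical centered data: y depends strongly on feature 0,
# very weakly on feature 1, and not at all on feature 2.
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
y = [3.1, 2.9, -2.9, -3.1]

w = lasso_coordinate_descent(X, y, alpha=0.5)
print(w)
```

With these values, only the first weight stays non-zero (close to 2.5), so the training procedure itself has discarded the two weak features.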

In Conclusion

Congratulations, you’ve made it to the end of another article! I hope your understanding of Machine Learning is expanding.

Now that we’ve covered the basics of collecting data and selecting features, let’s recap the ground you’ve gained in my series so far:

  • You have a clear definition of Artificial Intelligence and Machine Learning;
  • You know how to identify which of your business processes can be Machine Learning-enabled;
  • You know how to find your data and features from big data for Machine Learning models.

In Part 4, I’ll focus on the Machine Learning model.

Thank you for reading! Follow me here and on social media to make sure you don’t miss the next installment. If you found this article useful, a share and some claps would mean the world to me and help fuel the rest of my series.

Questions or comments? I’d be more than happy to answer them here or via email.

You can also find me on LinkedIn, Facebook, Instagram, and my personal website.

© 2019 Billy Tang.
