Billy's HOME 比利强

Data Collection and Feature Extraction for Machine Learning

November 16, 2019
in Blog, Machine Learning

How to uncover relevant patterns in large amounts of data

If we want machines to act and think like humans, we need to look at how we learned to walk and talk in the first place.

The answer, of course, is data. Babies learn by absorbing a whole lot of information (data) and analyzing it to identify similarities and patterns.

In my previous article on identifying business processes that can be ML-enabled, I showed how to find out which of your processes can benefit from machine learning.

The next step to enabling Machine Learning is data — because, without data, there is no machine learning.

When you think about data, many questions emerge:

  • Where is your data?
  • How much data do you need?
  • What kind of data should you collect?
  • What are the data patterns (features)?
  • How can you identify and extract features from data for Machine Learning?

Let’s address these questions one by one so you can apply the answers to your business.

Preparing and Collecting Data

Where is your data?

Data may be stored in your application database or by other third-party service providers.

If you want to, for example, analyze your users’ spending behavior, you may need to pull out their purchasing records from your own database.
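As a minimal sketch of what "pulling out purchasing records" could look like, the snippet below uses Python's built-in `sqlite3` module against an in-memory database. The `purchases` table, its column names, and the sample rows are all hypothetical, made up for illustration; your own schema will differ.

```python
import sqlite3


def spending_per_user(conn):
    """Return {user_id: (order_count, total_spent)} from the hypothetical purchases table."""
    rows = conn.execute(
        "SELECT user_id, COUNT(*), SUM(amount) "
        "FROM purchases GROUP BY user_id ORDER BY user_id"
    ).fetchall()
    return {user_id: (count, total) for user_id, count, total in rows}


# In-memory database with made-up rows, standing in for a real application database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_id INTEGER, amount REAL, purchased_at TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, 20.0, "2019-11-01"),
        (1, 35.5, "2019-11-03"),
        (2, 12.0, "2019-11-02"),
    ],
)

summary = spending_per_user(conn)
print(summary)
```

Per-user totals like these are one possible starting point for the features discussed later in this article.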

Conversely, if you want to understand user interests, you might need to find third-party service providers who specialize in generating such content.

How much data do you need?

This is an interesting question, but it has no definite answer because “how much” data you need depends on how many features there are in the data set (which we’ll cover in the next section).

I do recommend collecting as much data as possible. Feature selection will help you filter the useful nuggets out of your big data. Keep in mind, though, that the more data you collect, the more time it takes to analyze.

What kind of data should you collect?

Data can be categorized into two types: Structured and Unstructured. Structured Data refers to well-defined types of data that are stored in search-friendly databases, while Unstructured Data is “everything” you can collect — but it’s not search-friendly.

Structured Data:

  • Numbers, dates, strings, etc.
  • Less storage

Unstructured Data:

  • Text files and emails
  • Media files (videos, music, photos)
  • Other large files

According to Gartner, over 80% of an enterprise’s data will be unstructured.
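To make the difference concrete, here is a small sketch with made-up data. The structured record already has named, typed fields you can query directly, while the free-text email has to be processed (here with a simple word count, one of many possible approaches) before it yields anything a model can use.

```python
import re
from collections import Counter

# Structured record: every field has a known name and type (made-up example).
structured_row = {"user_id": 42, "amount": 19.99, "purchased_at": "2019-11-16"}

# Unstructured data: free text with no predefined fields (made-up example).
email_text = "Hi, I love the new camera. The camera arrived late, but I love it."


def word_counts(text):
    """Lowercase the text, split it into words, and count each word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


counts = word_counts(email_text)
print(structured_row["amount"])           # structured: direct field lookup
print(counts["camera"], counts["late"])   # unstructured: counts derived from raw text
```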

Identifying and Extracting Features

To assist our discussion of feature extraction, let’s define some simple terms.

  • Data: all of the information you can collect, which can be Structured or Unstructured
  • Data set: your collection of data
  • Feature: patterns found in your data set; used to help you extract relevant data for training models
  • Model: your Machine Learning algorithm

What are Features?

As defined above, features are the patterns in your data set that can be used to train models. Good features (which we’ll learn to identify in a moment) can help you to increase the accuracy of your Machine Learning model when predicting or making decisions.

Your data set will have many features, but not all are relevant. With feature selection, you can avoid wasting time calculating and collecting useless patterns that you’ll have to remove later.

Feature selection helps you simplify your ML models and enables faster, more effective training by:

  • Removing unused data;
  • Avoiding “Garbage In Garbage Out”;
  • Reducing overfitting;
  • Improving the accuracy of a Machine Learning model;
  • Avoiding the curse of dimensionality.

Next, let’s talk about methods for feature selection.

How can you identify and extract features from data for Machine Learning?

Now that we’ve covered data collection, it’s time to apply different feature selection methods. This will help you filter useful content from your data for your Machine Learning models.

The three general methods for this are Filter, Wrapper, and Embedded.

Filter Methods

[Figure: Filter Methods]

The Filter Method uses statistical calculations to compute a score (or rating) for each feature, independently of any Machine Learning model. Based on the scores, you decide which features to keep and which to remove. However, because each feature is scored on its own, this method ignores the relationships between the features themselves.
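Here is a minimal sketch of a filter-style selection. It assumes the absolute Pearson correlation with the target as the score, which is only one of many possible statistics. The feature names and numbers are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)


def filter_select(features, target, k):
    """Score each feature by |correlation with target| and keep the top k names."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Made-up data: one column per feature, one value per user.
features = {
    "visits":    [1, 2, 3, 4, 5, 6],
    "cart_size": [2, 1, 4, 3, 6, 5],
    "shoe_size": [7, 7, 7, 7, 7, 7],
    "noise":     [3, 1, 2, 3, 1, 2],
}
target = [10, 20, 30, 40, 50, 60]  # e.g. monthly spend

selected, scores = filter_select(features, target, k=2)
print(selected)
```

Notice that no model is trained at any point: the scores come from the data alone, which is what makes this approach fast.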

Wrapper Methods

[Figure: Wrapper Methods]

The Wrapper Method repeatedly adds features to, or removes them from, a candidate subset and trains a model to measure each subset's performance, until it finds the best-performing one. Because a model has to be trained for every candidate subset, this costs a lot of computation time.

Unlike the Filter Method, in which you must use statistical calculations to develop a subset of features, the Wrapper Method uses a real model to pick the best subset of features based on real performance.
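The sketch below shows one possible wrapper strategy, greedy forward selection. As an assumption for illustration, the "real model" is a tiny 1-nearest-neighbour classifier scored by leave-one-out accuracy; any model and metric could take its place. The data is made up so that only column 0 separates the two classes.

```python
def loo_accuracy(rows, labels, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier using only `cols`."""
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_dist = None, float("inf")
        for j, other in enumerate(rows):
            if i == j:
                continue  # never compare a row with itself
            dist = sum((row[c] - other[c]) ** 2 for c in cols)
            if dist < best_dist:
                best_j, best_dist = j, dist
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def forward_select(rows, labels, n_features):
    """Greedily add the feature that most improves accuracy; stop when nothing helps."""
    selected, best_score = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        scored = [(loo_accuracy(rows, labels, selected + [c]), c) for c in remaining]
        score, col = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no remaining feature improves the model
        selected.append(col)
        remaining.remove(col)
        best_score = score
    return selected, best_score


# Made-up data: column 0 separates the classes, columns 1 and 2 do not.
rows = [
    [1.0, 5.0, 0.3],
    [1.2, 1.0, 0.9],
    [0.9, 3.0, 0.1],
    [3.0, 4.8, 0.8],
    [3.1, 1.2, 0.2],
    [2.9, 3.1, 0.5],
]
labels = [0, 0, 0, 1, 1, 1]

selected, best_score = forward_select(rows, labels, n_features=3)
print(selected, best_score)
```

Even on this toy data the model is evaluated several times per round, which hints at why wrapper methods get expensive on large data sets.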

Embedded Methods

[Figure: Embedded Methods]

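In the Embedded Method, feature selection is built into the training of the model itself. LASSO Regression is the classic example: its L1 penalty can shrink some coefficients exactly to zero, which drops those features from the model. RIDGE Regression is often mentioned alongside it, but its L2 penalty only shrinks coefficients toward zero and does not remove features by itself. In other words, with a method like LASSO you don’t need to select features yourself.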
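To illustrate how a model can select features while it trains, here is a minimal sketch of LASSO fitted by coordinate descent in plain Python. It assumes centered data with no intercept, and the penalty strength `alpha` and the data are made up for illustration. In practice you would more likely reach for a library implementation.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimise (1/(2n)) * sum((y - Xw)^2) + alpha * sum(|w|), one coefficient at a time."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        for j in range(d):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction from all features except j.
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


# Made-up, centered data: y depends strongly on column 0 and only weakly on column 1.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [3.0 * x0 + 0.2 * x1 for x0, x1 in X]

w = lasso_coordinate_descent(X, y, alpha=0.5)
print(w)
```

With this penalty strength, the weak feature's coefficient lands on exactly zero, so the model has effectively discarded it during training.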

In Conclusion

Congratulations, you’ve made it to the end of another article! I hope your understanding of Machine Learning is expanding.

Now that we’ve covered the basics of collecting data and selecting features, let’s recap the ground you’ve gained in my series so far:

  • You have a clear definition of Artificial Intelligence and Machine Learning;
  • You know how to identify which of your business processes can be Machine Learning-enabled;
  • You know how to find your data and features from big data for Machine Learning models.

In Part 4, I’ll focus on the Machine Learning model.

Thank you for reading! Follow me here and on social media to make sure you don’t miss the next installment. If you found this article useful, a share and some claps would mean the world to me and help fuel the rest of my series.

Questions or comments? I’d be more than happy to answer them here or via email.

You can also find me on LinkedIn, Facebook, Instagram, and my personal website.


© 2019 Billy Tang.
