Data Mining Talk & Release of Dataset


In my previous blog post, I mentioned that I would be releasing the Pokemon spawn data used for the analysis and providing more details on the upcoming data mining w/ Pokemon talk. Apologies for the delay; I’ve been bogged down by lots of work (and a holiday). To make up for it, I’ll also be releasing a new set of data that I’ve not worked on. Go ahead and write about the nest migration!

And now… Here’s the information about the upcoming data mining talk!

Data Mining w/ Pokemon Go (Singapore)

Together with NTU Cultural Activities Club (CAC), my company has planned a sponsored data mining talk for you guys!

The details of the talk are as follows:

Date: 9 Oct 2016 (Sunday)
Venue: NTU (Exact location to be released to confirmed participants)
Time: 3pm-4pm
Vacancies: 30 left (out of 60)
Sign up:

The talk will cover the following materials:

  • Data Collection
    Using simple tools to understand how web/mobile applications communicate with the web server. This was how we managed to create a script to automatically ‘ask’ the server what Pokemon are around in Singapore.
  • Data Storage
    Storing and retrieving data from a database
  • Data Mining
    Hands-on experience at performing data mining on the Pokemon dataset. The program used has a drag-and-drop interface and is extremely easy to use, even for absolute beginners.

You will also gain access to the newest Pokemon spawn data to work with. I promise not to work on the dataset, so that you will be the first to uncover the gems in the data. By the end of the talk, you should be able to answer these questions yourself:

  • How rare is Chansey?
  • Where do I find Charmander?
  • Is Hougang the best place to hunt for Pokemon?
  • Where is the Farfetch’d?
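
As a rough illustration of how the first question could be approached, here is a minimal Python sketch that counts how often one pokemon_id appears and reports its share of all spawns. The sample IDs below are made up for illustration and are not taken from the real dataset; the function name spawn_share is also just a name chosen for this sketch. Chansey's Pokedex ID is 113.

```python
from collections import Counter

def spawn_share(pokemon_ids, target_id):
    """Return the fraction of spawns that belong to target_id."""
    counts = Counter(pokemon_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts[target_id] / total

# Made-up sample of pokemon_id values, NOT taken from the real dataset.
sample_ids = [16, 19, 16, 113, 19, 16, 41, 19, 16, 19]

CHANSEY = 113  # Pokedex ID of Chansey
share = spawn_share(sample_ids, CHANSEY)
print(f"Chansey makes up {share:.1%} of the sample spawns")
```

With the real data, you would pass in the pokemon_id column instead of the sample list. A low share suggests a rare Pokemon, though the result depends on how evenly the data was collected across Singapore.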

The program is designed for beginners who want a peek into data science. If you are an educator or HR personnel, contact me for a private session after the talk.

Do note that the vacancies are limited and only confirmed participants will be admitted into the venue. Be sure to sign up to reserve your seats. We will send out a confirmation email/SMS to all confirmed participants.

Download Pokemon Spawn Data (Singapore)

You can access the old and updated data dump in the following Google Drive Folder:

The folder contains two main types of dump.

The two zip files contain MongoDB dumps. Download these files only if you know how to operate MongoDB. If you would like to know more about restoring MongoDB dumps, visit this link to find out more. The old data contains the dataset used for the previous data mining task documented on my blog. The new data contains data up to two days ago, and I’ve not personally worked on it.

The CSV file contains only the newer set of data. Most data mining or statistical tools will allow you to import data in CSV format. You may check out R, RapidMiner or Tableau to work on the data.
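
If you would rather script things yourself, the CSV can also be read with plain Python. The sketch below is only an illustration: it assumes the file has a header row whose column names match the fields described under "Understanding the fields", which you should verify against the actual download. The sample rows and the file name pokemon_spawns.csv are made up.

```python
import csv
import io

def load_spawns(csv_file):
    """Read spawn rows from an open CSV file into a list of dicts."""
    rows = []
    for row in csv.DictReader(csv_file):
        rows.append({
            "pokemon_id": int(row["pokemon_id"]),
            "latitude": float(row["latitude"]),
            "longitude": float(row["longitude"]),
            # Kept as text here; the exact timestamp format is not assumed.
            "disappear_time": row["disappear_time"],
        })
    return rows

# Made-up two-row sample standing in for the real CSV file.
sample_csv = io.StringIO(
    "pokemon_id,latitude,longitude,disappear_time\n"
    "25,1.3521,103.8198,1475000000\n"
    "113,1.3700,103.8900,1475003600\n"
)
spawns = load_spawns(sample_csv)
print(len(spawns), "rows loaded; first pokemon_id:", spawns[0]["pokemon_id"])

# With the downloaded file (the file name here is hypothetical):
# with open("pokemon_spawns.csv", newline="") as f:
#     spawns = load_spawns(f)
```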

I’ll be showing how to use RapidMiner to analyse the dataset in the upcoming talk.

Understanding the fields

The dataset has a few fields. Here is an explanation of each:

  • pokemon_id – The Pokedex ID of the Pokemon. For example, Pikachu is 25. Do not use the Pokemon name field for your work as some rows are missing that value. Use the pokemon_id instead.
  • latitude, longitude – The geographic coordinates of the Pokemon spawn.
  • disappear_time – The despawn time for the Pokemon in Unix Timestamp format. To convert dates to/from the timestamp you may use this tool.
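
If you prefer to convert the timestamps in code, here is a minimal Python sketch that turns a Unix timestamp into Singapore time (UTC+8). It assumes disappear_time is stored in seconds; some tools store milliseconds instead, so the in_milliseconds flag (a parameter invented for this sketch) covers that case. Check a few values in the dataset to see which applies.

```python
from datetime import datetime, timedelta, timezone

# Singapore Standard Time is UTC+8 with no daylight saving.
SGT = timezone(timedelta(hours=8), "SGT")

def despawn_time_sgt(disappear_time, in_milliseconds=False):
    """Convert a Unix timestamp to a timezone-aware Singapore datetime."""
    seconds = disappear_time / 1000 if in_milliseconds else disappear_time
    return datetime.fromtimestamp(seconds, tz=SGT)

# 1475996400 is 9 Oct 2016, 07:00:00 UTC, which is 3pm in Singapore.
print(despawn_time_sgt(1475996400).isoformat())
```

Once the timestamps are proper datetimes, you can group spawns by hour of day or day of week to look for time-based patterns.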

Learning RapidMiner Basics

Once again, if you want to learn how to perform basic data mining using simple drag-and-drop software, be sure to sign up for the upcoming talk before it is full!

Terms of Use

By downloading the dataset, you agree to:

  • Not reproduce the files in any form of publication without written permission from the author.
  • Give the author permission to publicise your use of the dataset.

So if you have any interesting project/insights from the use of the data, please do let me know. I will gladly share your insights with the rest of the community.