What is a ‘Dataset’?
To develop powerful artificial intelligence, an algorithm needs to be trained on high-quality data. For example, AlphaGo’s neural networks were trained on about 30 million moves from 160,000 games played by high-level players (6th to 9th dan) on the public KGS Go server (https://ww.gokgs.com/). Likewise, millions of facial photos are needed to train an AI model that determines age and gender from a face, and voice data from people of various ages, genders, and regions is necessary to develop AI speakers (voice recognition). These collections of data, which are used for AI training (machine learning, deep learning, and more), are called datasets.
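
As a rough illustration of the idea above, a dataset can be thought of as a list of examples, each pairing an input with the labels a model should learn to predict. The sketch below is only a toy: the `FaceExample` class, the file names, and the label values are all made up for illustration and do not come from any real dataset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceExample:
    """One training example: an input (photo) paired with its labels."""
    image_path: str  # hypothetical file name, for illustration only
    age: int         # label: age in years
    gender: str      # label: "F" or "M" in this toy example


# A made-up, four-item dataset; a real one would hold millions of examples.
toy_dataset = [
    FaceExample("face_0001.jpg", 23, "F"),
    FaceExample("face_0002.jpg", 41, "M"),
    FaceExample("face_0003.jpg", 35, "F"),
    FaceExample("face_0004.jpg", 67, "M"),
]


def label_counts(dataset):
    """Count examples per gender label, a basic check of dataset balance."""
    return Counter(example.gender for example in dataset)


def split_inputs_and_labels(dataset):
    """Separate inputs from labels, the form most training code expects."""
    inputs = [example.image_path for example in dataset]
    labels = [(example.age, example.gender) for example in dataset]
    return inputs, labels
```

Checking how many examples carry each label, as `label_counts` does, is one simple way to judge whether a dataset covers "various ages, genders, and regions" evenly.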

Check out the list of Open Datasets below:
World Bank Open Data: https://data.worldbank.org/
Kaggle Datasets: https://www.kaggle.com/datasets
ImageNet: http://image-net.org
MNIST Dataset: http://yann.lecun.com/exdb/mnist/
Program Overview
- With the AI Training Data Grant Program, SelectStar supports AI R&D by granting AI training datasets to selected teams or individuals: small businesses · startups · research institutions · individual researchers. (This is not a monetary grant program.)

- SKTelecom True Innovation, Kakao Ventures, and the Korea Artificial Intelligence Association have joined as program partners.

- A total of 10 teams will be selected, and each will receive a dataset worth $70,000. (SelectStar will build and deliver the dataset proposed in the applicant's application form.)

- The selected teams' business plans will be delivered to Kakao Ventures for review of their investment potential. (Only upon the selected team's request; not mandatory.)

- The selected teams must use the dataset for their own business or for research purposes. In addition, all or part of each dataset will be freely opened to the public to support the advancement of the field of artificial intelligence. (For companies, how much of the dataset is opened, whether none, part, or all, will be decided through negotiation. We understand that opening an entire AI training dataset can be a critical matter for your business, but we strongly advise you to open as much as you can.)

* Please check the notes and the PDF announcement file on this page for policy updates. The PDF announcement file may be revised, so please keep checking it until your final submission. (If you have already submitted, we will contact you individually about any updated notices.)

Program Duration: From the application deadline of November 30th, 2020, until final production of the datasets


Detailed Timeline
2020.11.30. || Application Deadline
2020.12.04. || Application Review and Selection of Teams for Presentation Interview (planned)

- Application judges will be selected from both internal members of SelectStar and external advisory members

- The 15 selected applications will move on to the next round, the Presentation Interviews

2020.12.09. || Presentation Interviews (held online via Zoom)
2020.12.11. || Announcement of Final Teams

- Through the Presentation Interviews, the final 10 teams will be selected.

2020.12.15~ || Program kickoff meeting
2021.01.00. || Dataset production begins
Purpose of the Program & How We Collect/Label Datasets
Visit SelectStar’s official website (https://selectstar.ai) to see examples of image/video/audio/text collecting & labeling.
The program supports the production of AI training datasets for small businesses · startups · research institutions · individual researchers developing A.I. algorithms. Moreover, we aim to freely open all or part of the datasets to the public for the advancement of the field of Artificial Intelligence. (For companies, how much of the dataset is opened, whether none, part, or all, will be decided through negotiation. Research institutions and government institutions are required to open the dataset to the public.) Therefore, we recommend this program to teams whose datasets will be universally useful for A.I. development, rather than narrowly targeted at a specific business or research project.

Dataset production (collecting & labeling) for the selected teams will be carried out on SelectStar’s web/app crowdsourcing platform, CashMission (see https://selectstar.ai for details). CashMission collects or labels data by crowdsourcing the tasks to a large number of members of the general public. Therefore, in your application, please propose a dataset that can be collected and processed on CashMission's web/mobile-based crowdsourcing platform.
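
When many people label the same item, their answers have to be combined into one final label. This page does not describe how CashMission does that, so the sketch below simply assumes plain majority voting, one common approach, purely for illustration. The function names, the `min_agreement` threshold, and the sample answers are all hypothetical.

```python
from collections import Counter


def majority_label(answers, min_agreement=0.5):
    """Return the most common answer, or None if agreement is too low.

    answers: labels submitted by different workers for one item.
    min_agreement: fraction of workers that must be exceeded
                   (a hypothetical threshold, not a CashMission setting).
    """
    if not answers:
        return None
    label, votes = Counter(answers).most_common(1)[0]
    if votes / len(answers) > min_agreement:
        return label
    return None  # no clear majority: the item would need more answers


def aggregate(worker_answers):
    """Map each item ID to its majority label (None when workers disagree)."""
    return {item: majority_label(ans) for item, ans in worker_answers.items()}


# Made-up answers from three workers per image.
worker_answers = {
    "img_001": ["cat", "cat", "dog"],
    "img_002": ["dog", "dog", "dog"],
    "img_003": ["cat", "dog", "bird"],
}
labels = aggregate(worker_answers)
```

Items where workers cannot agree, such as `img_003` here, come back as `None`. This is one reason clear labeling guidelines matter for a proposed dataset: ambiguous tasks produce more disagreement.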

For more information on ‘CashMission,’ please visit https://selectstar.ai. If necessary, teams may contact us (email: contact@selectstar.ai) to discuss the feasibility of collecting and labeling (annotating) the proposed dataset. However, please do so well in advance. (Just before the deadline, we are unlikely to be able to answer emails because of the high volume of inquiries. SelectStar is not responsible for any consequences of unanswered inquiries.)

Teams can propose a new, original dataset, or an existing dataset that needs to be modified for a specific environment (e.g., an advanced ImageNet dataset, an English dialect dataset, a dataset of the world's traditional foods).
Primary Screening Criteria
Practicality and universality of the dataset · Feasibility of collecting/labeling the data with the CashMission platform · Excellence of the research/development plan using the dataset
(Final scores will not be disclosed. If any issue or violation of our policy is found, we may reject a team even after selection.)
Announcement
Download link (Announcement date : Nov. 2nd, 2020) : PDF file
Program Eligibility
Startups, small companies, research institutions, university labs, individual researchers.
How to Apply
Download and complete the Application Bundle (Application form, Program Agreement, Dataset Proposal).
Then fill out the following Google Form and upload the bundle there: https://forms.gle/Tygs9Jo6cH5hj8vM6
(A Google account is required to upload.)

Download link for the application bundle: .docx format (Last update: Nov 3rd)
Frequently Asked Questions


SelectStar is a startup that collects and labels training data for AI. We started as a team of 6 college friends, and now we have more than 60 members. SelectStar is one of the fastest-growing AI data startups in the industry.


We gain many advantages from this program. Here are some of them:

First, publicity by helping research.
SelectStar has already become one of the top AI data startups, but many people still don't know about us. We expect to gain academic publicity through this program. That's why we put a 'Usage Plan' on the application form. In particular, selected research institutes are required to submit a paper to a journal or conference after conducting research using the granted dataset.

Second, to reach out to more future customers
Our clients range from small startups to large international corporations like LG. Through this program, we expect to reach researchers in universities and research institutes who might become our future clients.

Third, most importantly, to make 'open datasets' to establish our international presence as an AI data leader
By contributing to the AI research field, we can establish our presence as an AI data leader in the industry. At first, we planned to make and open datasets by ourselves. Then one of our colleagues came up with a bright idea.
"If we are gonna spend the money to make open datasets anyways, why don't we help researchers/startups and make open datasets at the same time?"
Boom! It all started from here :-)


So, except for "1. Cooperate with our policy of 'aiming to make open datasets'" and "2. Selected teams should do their best to make good use of the dataset for R&D and acknowledge us (hopefully in research papers) according to the 'Usage Plan' in their application,"
there are NO STRINGS ATTACHED!!
No hidden fees or payments at all!

(For companies, how much of the dataset is opened, whether none, part, or all, will be decided through negotiation. We understand that opening an entire AI training dataset can be a critical matter for your business, but we strongly advise you to open as much as you can.)


Not much. The application form, including the 'Usage Plan', is only 4~5 pages long. Since this isn't a monetary grant program, there are no financial audits at all.
You don't have to write a full report on dataset usage either. You can just show us your project's progress via Zoom or e-mail. We prefer chatting via e-mail or Zoom calls.

If you are selected for an interview, you DO have to prepare a presentation, though. But keep it short and simple. No fancy graphic design. We don't want this to turn into a presentation contest. We only want to learn more about you and your idea. You may choose to just talk through the application form as your presentation. That is perfectly fine with us if your form contains everything you want to talk about.

Still, if you are selected as one of the final teams, you will have to give us lots of detail on the specs and guidelines for your dataset. We are gonna bother you a lot with questions about them, because the more we talk about the dataset guidelines, the better the output will be.

Contact Us
E-mail. contest@selectstar.ai
SelectStar Website selectstar.ai