Hosting a Challenge via EvalAI

This document provides an overview of how to host a code-upload based challenge on EvalAI. A code-upload based challenge is usually a reinforcement learning challenge in which participants upload their trained model in the form of a Docker image. The environment is also packaged as a Docker image.

The information below is adapted from the EvalAI documentation.

Step 1: Set Up Main Repo

Use the EvalAI-Starters repository as a template. For information on how to create a repository from a template, see the GitHub documentation.

Step 2: Set Up Challenge Configuration

Open challenge_config.yml in the repository and update the values of its fields to match the characteristics of the challenge. More information about each field can be found in the EvalAI documentation.

Note that the following two fields must be set to the following values:

  1. remote_evaluation: True

  2. is_docker_based: True
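For example, the relevant lines of challenge_config.yml would look like this (all other fields omitted; this is an illustrative fragment, not the full configuration):

```yaml
# Required settings for a code-upload (reinforcement learning) challenge:
remote_evaluation: True
is_docker_based: True
```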

For evaluation to be possible, an AWS Elastic Kubernetes Service (EKS) cluster may need to be created, which requires the following information:

  1. aws_account_id

  2. aws_access_key_id

  3. aws_secret_access_key

  4. aws_region

This information needs to be emailed to the EvalAI team, who will set up the infrastructure in the AWS account.

Step 3: Define Evaluation Code

An evaluation file needs to be created to specify which metrics are computed at each phase. This code will also evaluate the participants’ submissions and post scores to the leaderboard. The environment image is created by the host, and the agent image is pushed by the participants.

The overall structure of the evaluation code is fixed for architectural reasons.

To define the evaluation code:

  1. Open the file located in EvalAI-Starters/code_upload_challenge_evaluation/environment/.

  2. Edit the evaluator_environment class.

    class evaluator_environment:
        def __init__(self, environment="CartPole-v0"):
            self.score = 0
            self.feedback = None
            self.env = gym.make(environment)

        def get_action_space(self):
            return list(range(self.env.action_space.n))

        def next_score(self):
            self.score += 1

    There are three methods:

    a) __init__: initialization method
    b) get_action_space: returns the action space of the agent in the environment
    c) next_score: updates the score achieved by the agent

Additional methods can be added as needed.
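As an illustration of adding such a method, the sketch below extends the wrapper with a reset helper. This is not part of the starter repo; MockEnv is a hypothetical stand-in so the sketch runs without gym installed (in the real evaluation script the wrapper holds `self.env = gym.make(environment)`).

```python
# Minimal sketch of adding an extra method to the evaluator wrapper.
# MockEnv is a hypothetical stand-in for a gym environment.

class MockEnv:
    """Stand-in with 2 discrete actions, like CartPole-v0."""

    n_actions = 2

    def reset(self):
        return [0.0, 0.0, 0.0, 0.0]  # initial observation


class evaluator_environment:
    def __init__(self):
        self.score = 0
        self.feedback = None
        self.env = MockEnv()

    def get_action_space(self):
        return list(range(self.env.n_actions))

    def next_score(self):
        self.score += 1

    # Additional host-defined method: clear episode state between rollouts.
    def reset(self):
        self.score = 0
        self.feedback = None
        return self.env.reset()


wrapper = evaluator_environment()
wrapper.next_score()
print(wrapper.get_action_space())  # [0, 1]
print(wrapper.score)               # 1
wrapper.reset()
print(wrapper.score)               # 0
```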

  3. Edit the Environment class in the same file.

    class Environment(evaluation_pb2_grpc.EnvironmentServicer):
        def __init__(self, challenge_pk, phase_pk, submission_pk, server):
            self.challenge_pk = challenge_pk
            self.phase_pk = phase_pk
            self.submission_pk = submission_pk
            self.server = server

        def get_action_space(self, request, context):
            message = pack_for_grpc(env.get_action_space())
            return evaluation_pb2.Package(SerializedEntity=message)

        def act_on_environment(self, request, context):
            global EVALUATION_COMPLETED
            if not env.feedback or not env.feedback[2]:
                action = unpack_for_grpc(request.SerializedEntity)
                env.next_score()
                env.feedback = env.env.step(action)
            if env.feedback[2]:
                if not LOCAL_EVALUATION:
                    update_submission_result(
                        env, self.challenge_pk, self.phase_pk, self.submission_pk
                    )
                print("Final Score: {0}".format(env.score))
                print("Stopping Evaluation!")
                EVALUATION_COMPLETED = True
            return evaluation_pb2.Package(
                SerializedEntity=pack_for_grpc(
                    {"feedback": env.feedback, "current_score": env.score}
                )
            )

    gRPC servers are used to get actions in the form of messages from the agent container. This class can be edited to fit the needs of the current challenge. Serialization and deserialization of the messages sent across gRPC are needed. The following two methods may be helpful for this:

    a) unpack_for_grpc: deserializes entities from a request/response sent over gRPC. This is useful for receiving messages (for example, actions from the agent).
    b) pack_for_grpc: serializes entities to be sent in a request over gRPC. This is useful for sending messages (for example, feedback from the environment).
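In the starter template these helpers are thin wrappers around pickle. A minimal, self-contained sketch of the round trip (the feedback dict mirrors the shape returned by act_on_environment):

```python
import pickle

def pack_for_grpc(entity):
    """Serialize a Python object into bytes for a gRPC Package message."""
    return pickle.dumps(entity)

def unpack_for_grpc(entity):
    """Deserialize bytes received over gRPC back into a Python object."""
    return pickle.loads(entity)

# Round trip: what the environment packs, the agent can unpack unchanged.
feedback = {"feedback": ([0.1, 0.2], 1.0, False, {}), "current_score": 1}
wire_bytes = pack_for_grpc(feedback)
print(unpack_for_grpc(wire_bytes) == feedback)  # True
```

Pickle keeps the helpers simple, but any scheme that maps objects to bytes (e.g. JSON or protobuf messages) would work, as long as both containers agree on it.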

  4. Edit the requirements file based on the packages required by the environment.

  5. Edit the environment Dockerfile located in EvalAI-Starters/code_upload_challenge_evaluation/docker/environment/ if need be.

  6. Fill in the Docker environment variables in docker.env located in EvalAI-Starters/code_upload_challenge_evaluation/docker/environment/.

  7. Create a Docker image and upload it to Amazon Elastic Container Registry (ECR). More information on pushing a Docker image to ECR can be found in the AWS documentation.

    docker build -t <image_name> -f <file_path_to_Dockerfile> .
    aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.<region>
    docker tag <image_id> <aws_account_id>.dkr.ecr.<region><my-repository>:<tag>
    docker push <aws_account_id>.dkr.ecr.<region><my-repository>:<tag>
  8. Add the environment image to the challenge configuration: for each challenge phase, add the link to its environment image.

        - id: 1
          environment_image: <docker image uri>
  9. Create a starter example for the agent: participants are expected to create a Docker image with the policy and the methods to interact with the environment. To create the agent environment:

    a) Create the starter script. A template is provided in EvalAI-Starters/code_upload_challenge_evaluation/agent/.

    import evaluation_pb2
    import evaluation_pb2_grpc
    import grpc
    import os
    import pickle
    import time

    time.sleep(30)  # give the environment container time to start

    LOCAL_EVALUATION = os.environ.get("LOCAL_EVALUATION")

    if LOCAL_EVALUATION:
        channel = grpc.insecure_channel("environment:8085")
    else:
        channel = grpc.insecure_channel("localhost:8085")

    stub = evaluation_pb2_grpc.EnvironmentStub(channel)

    def pack_for_grpc(entity):
        return pickle.dumps(entity)

    def unpack_for_grpc(entity):
        return pickle.loads(entity)

    flag = None

    while not flag:
        # 0 is a placeholder action; replace it with your policy's output.
        base = unpack_for_grpc(
            stub.act_on_environment(
                evaluation_pb2.Package(SerializedEntity=pack_for_grpc(0))
            ).SerializedEntity
        )
        flag = base["feedback"][2]
        print("Agent Feedback", base["feedback"])
        print("*" * 100)

    b) Edit requirements.txt located in EvalAI-Starters/code_upload_challenge_evaluation/requirements based on package requirements.
    c) Edit the Dockerfile (if need be) located in EvalAI-Starters/code_upload_challenge_evaluation/docker/agent/, which will be run to interact with the environment.
    d) Edit docker.env located in EvalAI-Starters/code_upload_challenge_evaluation/docker/agent/ as needed.


Step 4: Edit Challenge HTML Templates

Update the HTML templates in EvalAI-Starters/templates. The submission-guidelines.html should be detailed enough for participants to prepare and upload their submissions. Participants are expected to submit links to their Docker images using evalai-cli (see the evalai-cli documentation for more information). The command is:

evalai push <image>:<tag> --phase <phase_name>

At this point, the challenge configuration can be submitted for review, and the EvalAI team will be notified. They will review and approve the challenge.

How to Submit Results