
ENTRYPOINT

a container’s lifecycle depends on its starting process, so how do we define that process? this is where ENTRYPOINT comes into action. it’s a docker feature, but we can also pass command-line arguments to it through kubernetes.

suppose we have an ubuntu image and we want to run the sleep process on start. but, like most processes, sleep takes an argument. so, how can we pass it?

for instance, in this example dockerfile

FROM ubuntu

CMD sleep 5

we are running the sleep process with the argument 5. we can set it up like this, or we can define the executable as ENTRYPOINT and pass the argument via CMD like this:

FROM ubuntu

ENTRYPOINT ["sleep"]
CMD ["5"]

after which we can build: docker build -t my-sleepy-ubuntu . and run: docker run my-sleepy-ubuntu 5.

this will run the sleep process with the argument of 5.

if we don’t pass the command line argument, the image will default to 5. to override the entrypoint itself, we can run: docker run --name my-sleepy-ubuntu --entrypoint myscript my-sleepy-ubuntu 20

to run this docker image within a kubernetes pod, we can pass the argument as args in the pod definition.

apiVersion: v1
kind: Pod
metadata:
  name: my-sleepy-ubuntu
spec:
  containers:
    - name: my-sleepy-ubuntu
      image: my-sleepy-ubuntu
      args: ["10"]

under the hood, kubernetes will run it using something equivalent to: docker run my-sleepy-ubuntu 10.
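
if we ever need to override the ENTRYPOINT itself from kubernetes, the command field in the pod definition overrides docker’s ENTRYPOINT, while args overrides CMD. a minimal sketch (the values here are just illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: my-sleepy-ubuntu
spec:
  containers:
    - name: my-sleepy-ubuntu
      image: my-sleepy-ubuntu
      command: ["sleep"]   # overrides the image's ENTRYPOINT
      args: ["10"]         # overrides the image's CMD ("5")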

Environment Variables

entrypoints are mainly about which process to run, but, as developers, we mostly use environment variables to pass configuration into our applications.

key value pair

a simple way to define environment variables in docker is a key-value pair. the imperative way looks something like this: docker run -e APP_PORT=8081 my-backend-app

this can be done in kubernetes by passing the key-value pair under the env key in our pod definition:

apiVersion: v1
kind: Pod
metadata:
  name: my-backend-app
spec:
  containers:
    - image: my-backend-app
      name: my-backend-app
      ports:
        - containerPort: 8080
      env:
        - name: APP_PORT
          value: "8081"

config maps

what if we have multiple pod definition files? the configuration will be scattered across multiple files, making it difficult to manage. what if there was a way to manage these configurations in a single place and use them across all other pod definitions? this is where config maps come into action.

  • config maps using the imperative way: kubectl create configmap my-app-config --from-literal=PORT=8081 --from-literal=COLOR=red or, directly specify a configuration file: kubectl create configmap my-app-config --from-file=app_config.properties

  • config maps using the declarative way:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  APP_PORT: "8081"
  COLOR: red

now, we can inject this as:

apiVersion: v1
kind: Pod
metadata:
  name: my-backend-app
spec:
  containers:
    - image: my-backend-app
      name: my-backend-app
      ports:
        - containerPort: 8080
      env:
        - name: APP_PORT
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: APP_PORT
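
if we want to inject every key from the config map as environment variables at once, instead of picking them one by one, we can use envFrom with a configMapRef. a minimal sketch of just the relevant part of the container spec:

      envFrom:
        - configMapRef:
            name: app-config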

but, what if we have hundreds of configuration entries or config maps? for that, we can mount a config map into the pod as a volume.

volumes:
  - name: app-config-volume
    configMap:
      name: app-config
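
the volume alone isn’t enough; it also has to be mounted into the container. a minimal sketch, assuming a mount path of /etc/app-config (each key in the config map becomes a file under that path):

  containers:
    - image: my-backend-app
      name: my-backend-app
      volumeMounts:
        - name: app-config-volume
          mountPath: /etc/app-config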

so, we’ve got config maps to store our environment-specific configuration. but should we put sensitive values like passwords and api keys, which are easily viewable by other users, in config maps? that’s an issue!

to solve this, kubernetes has kubernetes secrets.

secrets

secrets are like config maps, but their values are stored in an encoded (base64) format.

  • create secrets using the imperative approach: kubectl create secret generic my-app-secret --from-literal=DB_Host=mysql --from-literal=DB_Password=hemlo or using a file: kubectl create secret generic my-app-secret --from-file=app-secrets.properties

  • create secrets using the declarative approach:

apiVersion: v1
kind: Secret
metadata:
  name: my-app-secret
data:
  DB_Host: mysql
  DB_Password: hemlo

but, there’s a problem: the data isn’t encoded yet. to encode it manually, we can use the base64 executable in our unix environment: echo -n 'mysql' | base64. this command encodes the given text into base64, which we can then put in our secret.
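
with the values encoded (echo -n 'mysql' | base64 gives bXlzcWw= and echo -n 'hemlo' | base64 gives aGVtbG8=), the secret definition from above becomes:

apiVersion: v1
kind: Secret
metadata:
  name: my-app-secret
data:
  DB_Host: bXlzcWw=
  DB_Password: aGVtbG8=

to decode a value back, we can pipe it through base64 --decode, e.g. echo -n 'bXlzcWw=' | base64 --decode.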

unlike config maps, if we view secret information using kubectl describe secret SECRET_NAME, it will show the attributes but hide the values. to view the (encoded) values, we have to get the output as yaml using kubectl get secret SECRET_NAME -o yaml.

now, to use this in our pod,

  • ENV
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - image: my-app-image
      name: my-app
      ports:
        - containerPort: 8081
      envFrom:
        - secretRef:
            name: my-app-secret
  • SINGLE ENV
env:
  - name: DB_Password
    valueFrom:
      secretKeyRef:
        name: my-app-secret
        key: DB_Password
  • VOLUMES ENV
volumes:
  - name: app-secrets-volume
    secret:
      secretName: my-app-secret

with the volume approach, kubernetes creates one file per key in the secret. the content of each file is the corresponding secret value.
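
as with config maps, the volume has to be mounted into the container. a minimal sketch, assuming a mount path of /opt/app-secrets (so /opt/app-secrets/DB_Password would contain the password value):

  containers:
    - image: my-app-image
      name: my-app
      volumeMounts:
        - name: app-secrets-volume
          mountPath: /opt/app-secrets
  volumes:
    - name: app-secrets-volume
      secret:
        secretName: my-app-secret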

but, at the end of the day, secrets are not encrypted in etcd, only encoded. so, it’s better to enable encryption at rest. we can also use third-party secret store providers (AWS, Azure, GCP, HashiCorp Vault, etc.).

security context

docker security

  • process isolation within docker: docker isolates the processes within containers by using namespaces. for instance, if we run a sleep process in a container from the ubuntu image, inside the container (using ps aux) we’ll see it with PID 1. but if we look at the processes on the host system, we’ll see the same process with a different process id. this is how docker isolates processes between environments.

  • user isolation within docker: the docker host uses the root user, and by default docker runs processes within containers as root. we can change this to any other user for security. for instance, docker run --user=69420 ubuntu sleep 1000 will run the sleep process within the container with user-id 69420. this can also be set in the dockerfile:

    FROM ubuntu
    
    USER 69420

    but, even so, if we run the processes within the container as the root user, is it the same root user as on the host? doesn’t that bring security concerns? because of this, docker applies a limited set of security features (linux capabilities) to the container’s root user, so it never gets full root privileges on the host.

    tldr; root user within container != root user within host

    just like docker, these can be configured in kubernetes as well. but, unlike docker, where we can set these configurations only on containers, kubernetes allows us to set them on either containers or pods.

    setting the security context at the pod level, running all processes in the pod as user-id 1000:

    apiVersion: v1
    kind: Pod
    metadata:
      name: web-pod
    spec:
      securityContext:
        runAsUser: 1000
      containers:
        - name: ubuntu
          image: ubuntu
          command: ["sleep", "3600"]

    we can also set the security context at the container level (capabilities are only supported at the container level):

    apiVersion: v1
    kind: Pod
    metadata:
      name: web-pod
    spec:
      containers:
        - name: ubuntu
          image: ubuntu
          command: ["sleep", "3600"]
          securityContext:
            runAsUser: 1000
            capabilities:
              add: ["MAC_ADMIN"]

access accounts

kubernetes has two types of accounts.

  • user accounts
  • service accounts

user accounts

user accounts are used by human entities in the project. for instance, there might be two users accessing and administering the cluster:

  • administrator account
  • developer account

service accounts

service accounts are generally used by services. these act as the authentication and authorization identity for services in a cluster. for instance, there might be services such as:

  • prometheus: which needs cluster info to generate performance metrics.
  • jenkins: which needs cluster access to deploy applications.

create a service account: kubectl create serviceaccount ACCOUNT_NAME and list service accounts: kubectl get serviceaccounts
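
the same can be done declaratively; a minimal sketch for the my-sa service account referenced later in this section:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-sa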

with the creation of a service account, kubernetes generates a TOKEN. the token is stored in a SECRET object and linked to the service account. we can view the token using kubectl describe secret SECRET_NAME; the secret name can be found in the service account object.

now, this token can be used as a BEARER auth token by services to access the kubernetes cluster.

the service account token can also be mounted as a volume inside a pod, so that an application running within the cluster can read it directly instead of having the token supplied from outside.

having said that, the default service account token is automatically mounted in every pod in the kubernetes cluster. it can be found at /var/run/secrets/kubernetes.io/serviceaccount/token within the pod.

here’s a way to include our custom service account using declarative approach:

apiVersion: v1
kind: Pod
metadata:
  name: my-web-app

spec:
  containers:
    - image: my-web-app
      name: my-web-app
  serviceAccountName: my-sa

updating the service account of a pod requires the pod to be recreated.

resource requirements

within kubernetes, pods are distributed across nodes by the kubernetes scheduler based on their resource requirements. if the resources required by a pod aren’t available on a node, the scheduler tries to place it on a different node with sufficient resources.

if it doesn’t find any such node, the pod goes into the PENDING state with an event stating that there weren’t enough resources to schedule it.

by default, unless specified in the pod definition, kubernetes assumes the following resources for the pod (in practice these defaults come from a LimitRange configured in the namespace, if one exists):

Resource    Amount
CPU         0.5
Memory      256Mi

a CPU value of 0.5 is equivalent to 500m, i.e. half a vCPU/core on cloud platforms (1 CPU = 1 vCPU). for memory, 1Mi (mebibyte) is 1024 * 1024 bytes, so 256Mi is 256 * 1024 * 1024 bytes.

note, Mi != M, as 1M (megabyte) = 1000 * 1000 bytes whereas 1Mi = 1024 * 1024 bytes. the same goes for Ki/K, Gi/G, etc.

to override the default values, we can define resource requirement in our pod definition file.

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - image: my-app-image
      name: my-app
      resources:
        requests:
          memory: "1Gi"
          cpu: 1

we can also set limits to our pod. this can be done by setting up the limits object in our pod definition file.

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - image: my-app-image
      name: my-app
      resources:
        requests:
          memory: "1Gi"
          cpu: 1
        limits:
          memory: "2Gi"
          cpu: 2

if a pod tries to consume more CPU than its limit, kubernetes will throttle the CPU. but, if a pod tries to consume more memory than its limit, kubernetes won’t terminate it immediately; if it keeps exceeding the limit, the pod will be terminated with an OOM (out of memory) error.

taints & tolerations

taints and tolerations are a mechanism to restrict which pods can be scheduled on a node.

suppose we have Node #1, where we want to allow only Pod 1 to be scheduled and keep other pods away:

  1. set a taint on the node: add a taint to Node #1, limiting it to pods with the matching toleration: kubectl taint nodes NODE_NAME KEY=VALUE:TAINT_EFFECT where NODE_NAME is the name of the node, KEY=VALUE is the identifying pair a toleration has to match, and TAINT_EFFECT is one of:
  • NoSchedule
  • PreferNoSchedule
  • NoExecute

example: kubectl taint nodes node1 app=blue:NoSchedule

  2. set tolerations on pods: add a toleration to the pods which we want to allow to run on Node #1
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - image: my-app-image
      name: my-app
  tolerations:
    - key: "app"
      operator: "Equal"
      value: "blue"
      effect: "NoSchedule"

now my-app is allowed to be scheduled on node1, but the toleration doesn’t restrict my-app to node1; it can still be scheduled on other nodes. while scheduling pods onto node1, however, the kubernetes scheduler will only place pods that tolerate the taint. to restrict certain pods to only certain nodes, we have kubernetes node affinity.

the kubernetes scheduler, by default, doesn’t schedule any pods on the master node because of a default taint applied to it. we can change this behaviour by removing the taint, but it’s generally not a good practice.

node selectors

node selectors are used to limit a pod to be placed only on selected nodes. suppose we have 3 nodes, out of which Node #1 is the most powerful. now, if we want to place Pod #1 on Node #1, we can use either a node selector or node affinity.

node selector is the simpler approach, where we have a straightforward assignment of a specific pod to a specific node. to add a node selector to the pod, we add the nodeSelector field:

nodeSelector:
  size: Large

here, size: Large acts as a label identifying which node the pod should be placed on. for this to work, the node must already have that key-value label assigned. to assign the label to a node, we can use: kubectl label nodes node-1 size=Large which creates the label size=Large on the node-1 node.
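
putting it together, a minimal pod definition with the node selector might look like this (image name is just illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - image: my-app-image
      name: my-app
  nodeSelector:
    size: Large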

but, a node selector only works with a single label and a simple equality match. what if we had multiple labels and needed complex expressions to define placement? this is where node affinity comes into action.

node affinity

node affinity also ensures pods are hosted on particular nodes, but it allows us to provide advanced expressions to limit pod placement in the cluster. an equivalent of the nodeSelector example above, where pods will only be deployed to nodes with the size: Large label:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: size
              operator: In
              values:
                - Large

now, if we want to add our pod to either Large or Medium nodes, we can just add the - Medium item in the values.
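
for instance, the match expression would then look like:

            - key: size
              operator: In
              values:
                - Large
                - Medium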

requiredDuringSchedulingIgnoredDuringExecution is a type of node affinity that describes the pod’s lifecycle. splitting it gives: requiredDuringScheduling and ignoredDuringExecution. it means that while scheduling the pod onto a node, the given labels are required, but during execution, if someone changes the labels, it won’t affect pods that are already running.

in total nodeAffinity has three types:

  • requiredDuringSchedulingIgnoredDuringExecution
  • preferredDuringSchedulingIgnoredDuringExecution
  • requiredDuringSchedulingRequiredDuringExecution
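
preferredDuringSchedulingIgnoredDuringExecution is the softer variant: the scheduler tries to honour the rule but will still place the pod elsewhere if no matching node is available. a minimal sketch (the weight value is illustrative):

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
            - key: size
              operator: In
              values:
                - Large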