kubernetes configuration
commands, arguments, config maps, security contexts and service accounts
because a container’s lifecycle depends on its starting process, how do we define that process? this is where ENTRYPOINT comes into action. it’s a docker feature, but we can use it to pass command-line arguments through kubernetes as well.
suppose we have an ubuntu image and we want to run the sleep process upon start. but, like most processes, sleep takes an argument. so, how can we pass it?
for instance, in this example dockerfile
FROM ubuntu
CMD sleep 5
we are running the sleep process with the argument 5. we can set it up like this, but we can also define the executable as the ENTRYPOINT and pass the argument like this:
FROM ubuntu
ENTRYPOINT ["sleep"]
CMD ["5"]
after which we can build and run it:
docker build -t my-sleepy-ubuntu .
docker run my-sleepy-ubuntu 5
this will run the sleep process with the argument of 5. if we don’t pass the command-line argument, the image will default to 5 (the CMD value).
to override this entrypoint, we can run: docker run --name my-sleepy-ubuntu --entrypoint myscript my-sleepy-ubuntu 20
to run this docker image within a kubernetes pod, we can pass the argument as args:
apiVersion: v1
kind: Pod
metadata:
  name: my-sleepy-ubuntu
spec:
  containers:
  - name: my-sleepy-ubuntu
    image: my-sleepy-ubuntu
    args: ["10"]
under the hood, kubernetes will run it using something like: docker run my-sleepy-ubuntu 10
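for reference, a minimal sketch of how the pod spec fields map to the dockerfile instructions: command overrides ENTRYPOINT and args overrides CMD (sleep2.0 here is a hypothetical replacement binary, not something in the image above):

apiVersion: v1
kind: Pod
metadata:
  name: my-sleepy-ubuntu
spec:
  containers:
  - name: my-sleepy-ubuntu
    image: my-sleepy-ubuntu
    command: ["sleep2.0"]   # overrides ENTRYPOINT ["sleep"]
    args: ["10"]            # overrides CMD ["5"]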
entrypoints are mainly used for processes, but, as developers, we mostly use environment variables to get these configurations into our applications.
a simple way to define environment variables in docker is with a key-value pair. an imperative way would be something like this:
docker run -e APP_PORT=8081 my-backend-app
this can be done in kubernetes by passing the key-value pair within our env key configuration:
apiVersion: v1
kind: Pod
metadata:
  name: my-backend-app
spec:
  containers:
  - image: my-backend-app
    name: my-backend-app
    ports:
    - containerPort: 8080
    env:
    - name: APP_PORT
      value: "8081"
what if we have multiple pod definition files? the configuration will be scattered across multiple files, making it difficult to manage. what if there was a way to manage these configurations in a single place and use it across all the pod definitions? this is where config maps come into action.
config maps using the imperative way:
kubectl create configmap my-app-config --from-literal=PORT=8081 --from-literal=COLOR=red
or, directly specify the application configuration file:
kubectl create configmap my-app-config --from-file=app_config.properties
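here, app_config.properties is assumed to be a plain key=value properties file, something along these lines:

APP_PORT=8081
COLOR=red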
config maps using the declarative way:
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  APP_PORT: "8081"
  COLOR: red
now, we can inject this as:
apiVersion: v1
kind: Pod
metadata:
  name: my-backend-app
spec:
  containers:
  - image: my-backend-app
    name: my-backend-app
    ports:
    - containerPort: 8080
    env:
    - name: APP_PORT
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: APP_PORT
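instead of referencing keys one by one, we can also inject the entire config map into the container with envFrom (a sketch reusing the app-config map from above, placed under the container spec):

    envFrom:
    - configMapRef:
        name: app-config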
but, what if we have hundreds of configuration entries? for that, we can mount the config map into the pod as a dedicated volume.
volumes:
- name: app-config-volume
  configMap:
    name: app-config
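the volume then has to be mounted into the container with volumeMounts; each key in the config map becomes a file under the mount path (the path /etc/app-config here is an arbitrary choice):

    volumeMounts:
    - name: app-config-volume
      mountPath: /etc/app-config
      readOnly: true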
so, we’ve got config maps to store our environment-specific constants. but should we put sensitive values like passwords and keys, which are easily viewable by other users, into config maps? that’s an issue!
to solve this, kubernetes has secrets.
secrets are like config maps but these are stored in an encoded format.
create secrets using the imperative approach: kubectl create secret generic my-app-secret --from-literal=DB_Host=mysql --from-literal=DB_Password=hemlo
or using a file
kubectl create secret generic my-app-secret --from-file=app-secrets.properties
create secrets using the declarative approach:
apiVersion: v1
kind: Secret
metadata:
  name: my-app-secret
data:
  DB_Host: mysql
  DB_Password: hemlo
but there’s a problem: the data is yet to be encoded. to encode it manually, we can use the base64 executable in our unix environment:
echo -n 'mysql' | base64
this command will encode the given text into base64, which we can then put into our secret definition.
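with the encoded values in place (mysql encodes to bXlzcWw= and hemlo to aGVtbG8=), the secret definition becomes:

apiVersion: v1
kind: Secret
metadata:
  name: my-app-secret
data:
  DB_Host: bXlzcWw=
  DB_Password: aGVtbG8=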
unlike config maps, if we view secret information using kubectl describe secret SECRET_NAME, it will show the attributes but hide the values. to view the (encoded) values, we have to get the object as yaml using kubectl get secret SECRET_NAME -o yaml.
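the values in that yaml output are still only base64-encoded, so anyone with access can decode them back:

echo -n 'bXlzcWw=' | base64 --decode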
now, to use this in our pod:
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - image: my-app-image
    name: my-app
    ports:
    - containerPort: 8081
    envFrom:
    - secretRef:
        name: my-app-secret
    env:
    - name: DB_Password
      valueFrom:
        secretKeyRef:
          name: my-app-secret
          key: DB_Password
  volumes:
  - name: app-secrets-volume
    secret:
      secretName: my-app-secret
with the volume approach, kubernetes creates one file per key in the secret; the content of each file is that key’s value.
but, at the end of the day, secrets are not encrypted in etcd, only encoded. so, it’s better to enable encryption at rest.
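as a rough sketch (the key name and key value below are placeholders), encryption at rest is configured with an EncryptionConfiguration file passed to kube-apiserver via the --encryption-provider-config flag:

apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded 32-byte key>   # placeholder
  - identity: {}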
we can also use third-party secret store providers such as AWS, Azure, GCP or HashiCorp Vault.
process isolation within docker
docker isolates the processes within containers by using namespaces. for instance, if we run a sleep process within a container from the ubuntu image, we’ll be able to view it inside the container (using ps aux) with a PID of 1.
but, if we look into the processes within the host system, we’ll be able to see the same process but with a different process id.
this is how docker isolates processes in different environments.
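a quick way to see this (assuming docker is installed and ps is available in the image; the container name sleepy is arbitrary):

docker run -d --name sleepy ubuntu sleep 1000
docker exec sleepy ps aux          # inside the container, sleep shows up as PID 1
ps aux | grep 'sleep 1000'         # on the host, the same process has a different PID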
user isolation within docker
the docker host uses the root user and, by default, docker runs processes within containers as the root user. we can change this to any other user for security.
for instance, docker run --user=69420 ubuntu sleep 1000 will run the sleep process within the container with the user-id 69420.
this can also be baked into the image itself via the dockerfile:
FROM ubuntu
USER 69420
but even so, if we run the processes within the container as the root user, is it the same root user as on the host? doesn’t that bring security concerns? because of this, docker implements a set of security features (linux capabilities) that limit what the root user within a container is allowed to do compared to the root user on the host.
tldr; root user within container != root user within host
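for reference, docker exposes these controls directly on docker run (the capability names below are just examples):

docker run --cap-add MAC_ADMIN ubuntu sleep 1000   # grant an extra capability
docker run --cap-drop KILL ubuntu sleep 1000       # remove a capability
docker run --privileged ubuntu sleep 1000          # all capabilities (avoid this)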
just like docker, these can be configured in kubernetes as well. but, unlike docker, where we can set these configurations only on containers, kubernetes allows us to set them on either containers or pods.
running processes within the whole pod with user-id 1000:
apiVersion: v1
kind: Pod
metadata:
  name: web-pod
spec:
  securityContext:
    runAsUser: 1000
  containers:
  - name: ubuntu
    image: ubuntu
    command: ["sleep", "3600"]
we can also set the security context at the container level instead (capabilities, in particular, are only supported at the container level, not the pod level):
apiVersion: v1
kind: Pod
metadata:
  name: web-pod
spec:
  containers:
  - name: ubuntu
    image: ubuntu
    command: ["sleep", "3600"]
    securityContext:
      runAsUser: 1000
      capabilities:
        add: ["MAC_ADMIN"]
kubernetes has two types of accounts: user accounts and service accounts.
user accounts are used by human users. for instance, an administrator accessing the cluster to manage it, or a developer accessing it to deploy applications.
service accounts are generally used by applications and services. they act as the authentication and authorization identity for services interacting with the cluster, for instance a monitoring application like prometheus or an automation tool like jenkins that needs to query the kubernetes api.
create service account: kubectl create serviceaccount ACCOUNT_NAME
get service accounts: kubectl get serviceaccounts
with the creation of a service account, kubernetes generates a TOKEN. the token is stored in a SECRET object and linked with the service account (on kubernetes v1.24+ this secret is no longer created automatically). we can view the token using kubectl describe secret SECRET_NAME; the secret name can be found in the service account object.
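on newer clusters (v1.24+), a token for a service account can instead be requested on demand (my-sa is the service account used later in this section):

kubectl create token my-sa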
now, this token can be used as a BEARER auth token by services to access the kubernetes cluster.
the service account token can also be mounted as a volume inside a pod running in the cluster, so the application can pick it up automatically instead of it being supplied externally.
having said that, the default service account’s token is automatically mounted into all the pods in the kubernetes cluster.
it can be found at the location /var/run/secrets/kubernetes.io/serviceaccount/token within the pod.
here’s a way to include our custom service account using the declarative approach:
apiVersion: v1
kind: Pod
metadata:
  name: my-web-app
spec:
  containers:
  - image: my-web-app
    name: my-web-app
  serviceAccountName: my-sa
updating the service account of a pod requires the pod to be recreated.
within kubernetes, pods are distributed across nodes by the kubernetes scheduler based on their resource requirements. if the resources required by a pod aren’t available on a node, kubernetes tries to place it on a different node that has sufficient resources for the pod.
if it doesn’t find any such node, the pod goes into the PENDING state with an event stating that there weren’t sufficient resources for the pod to be scheduled.
by default, unless specified, kubernetes assumes the following resource requests for a pod (strictly speaking, such defaults only apply when a LimitRange is configured in the namespace):
| Resource | Default request |
|---|---|
| CPU | 0.5 |
| Memory | 256Mi |
a CPU value of 0.5 is equivalent to 500m; a full count of 1 CPU is equivalent to 1 vCPU on cloud platforms (or 1 hyperthread), and the smallest possible value is 1m.
a memory value of 256Mi (mebibytes) is 256 * 1024 * 1024 bytes, since 1Mi = 1024 * 1024 bytes.
note: Mi != MB, as 1MB = 1000 * 1000 bytes, whereas 1Mi = 1024 * 1024 bytes. the same applies to Ki vs KB, Gi vs GB, etc.
to override the default values, we can define resource requirement in our pod definition file.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - image: my-app-image
    name: my-app
    resources:
      requests:
        memory: "1Gi"
        cpu: 1
we can also set limits for our pod. this is done by adding a limits object under resources in the pod definition file.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - image: my-app-image
    name: my-app
    resources:
      requests:
        memory: "1Gi"
        cpu: 1
      limits:
        memory: "2Gi"
        cpu: 2
if a container tries to consume more CPU than its limit, kubernetes will throttle the CPU. memory, however, can’t be throttled: a container may exceed its memory limit temporarily, but if it does so constantly, kubernetes will terminate it with an out-of-memory (OOM) error.
taints and tolerations are a mechanism to restrict which pods can be scheduled on a node.
suppose we have Node #1, where we only want to allow Pod 1 to be scheduled and keep all other pods away. we can taint Node #1, so that only pods with a matching toleration can be placed on it:
kubectl taint nodes NODE_NAME KEY=VALUE:TAINT_EFFECT
NODE_NAME: name of the node
KEY=VALUE: the key-value pair that a pod’s toleration has to match
TAINT_EFFECT: what happens to pods that do not tolerate the taint: NoSchedule, PreferNoSchedule or NoExecute
example:
kubectl taint nodes node1 app=blue:NoSchedule
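the same taint can be removed later by appending a minus sign to it:

kubectl taint nodes node1 app=blue:NoSchedule-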
now, to allow a pod to be scheduled on the tainted Node #1, we add a matching toleration to its definition:
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - image: my-app-image
    name: my-app
  tolerations:
  - key: "app"
    operator: "Equal"
    value: "blue"
    effect: "NoSchedule"
now my-app is allowed to be scheduled on node1, but it isn’t restricted to node1; it can still be scheduled on other nodes. the taint only guarantees that, on node1, the kubernetes scheduler will place nothing but pods that tolerate the taint.
to restrict certain pods to only certain nodes, we have node selectors and node affinity.
the kubernetes scheduler, by default, doesn’t schedule any pods on the master node because of the default taint applied to the master node. we can change this behaviour by removing that taint, but it’s generally not a good practice.
node selectors are used to restrict a pod so that it can only be placed on selected nodes.
suppose we have 3 nodes, out of which Node #1 is the most powerful. now, if we want to place Pod #1 on Node #1, we can use either a node selector or node affinity.
node selector is the simpler approach, where we have a straightforward assignment of a specific pod to a specific node. to add a node selector to the pod, we add the nodeSelector field under the pod spec:
nodeSelector:
  size: Large
here, size: Large acts as a label identifying which node the pod should be placed on. for this to work, the node must already have been assigned that key-value label.
to assign the key-value pair to a node, we can use the command: kubectl label nodes node-1 size=Large
this command will create a label of size: Large on the node-1 node (label values are case-sensitive, so it has to match the nodeSelector exactly).
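putting it together, a minimal pod sketch using that label (the image name is reused from the earlier examples):

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - image: my-app-image
    name: my-app
  nodeSelector:
    size: Large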
but node selectors only work for simple, single-label matches. what if we had multiple labels and needed complex expressions to define the placement? this is where node affinity comes into action.
node affinity also ensures pods are hosted on particular nodes, but it allows us to write advanced expressions to limit pod placement in the cluster.
an equivalent of the nodeSelector example, where the pods will only be deployed to nodes with the size: Large label:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: size
          operator: In
          values:
          - Large
now, if we want our pod to go to either Large or Medium nodes, we can just add a - Medium item to the values list.
requiredDuringSchedulingIgnoredDuringExecution is a type of node affinity that describes behaviour across the pod’s lifecycle. splitting it up gives: requiredDuringScheduling & ignoredDuringExecution.
it means that while scheduling pods on nodes, the given labels are required, but during execution, if someone changes the labels, it won’t have any impact on pods that are already running.
in total, nodeAffinity has three types: requiredDuringSchedulingIgnoredDuringExecution, preferredDuringSchedulingIgnoredDuringExecution, and a planned requiredDuringSchedulingRequiredDuringExecution.