Recent Wikipedia edits raising alarms in Estonia suggest attempts to reshape historical narratives, with implications for AI training and ideological biases.
Annnnnd that’s why I downloaded a snapshot of Wikipedia a few months ago and host it locally.
Sad that it’s necessary, but with modern AI tooling, we have everything we need to destroy knowledge on an industrial scale.
How do you self-host Wikipedia? Any good guides on how to do it?
Wikipedia has guides for it; check the section on downloading Wikipedia. The most popular offline client atm is the Kiwix reader.
Kiwix is the easiest way to do it. If you have Docker/Kubernetes, there’s a Docker image at ghcr.io/kiwix/kiwix-serve, and the K8s manifest to deploy is as simple as:

    apiVersion: v1
    kind: Service
    metadata:
      name: wikipedia-service
    spec:
      selector:
        app: kiwix-server
      ports:
        - port: 80
          targetPort: 8080
      clusterIP: None
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: wikipedia-server
      labels:
        app: kiwix-server
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: kiwix-server
      template:
        metadata:
          name: wikipedia-server
          labels:
            app: kiwix-server
        spec:
          containers:
            - name: kiwix-server
              image: kiwix/kiwix-serve:3.8.0
              imagePullPolicy: IfNotPresent
              # Serve the ZIM file from the mounted volume on port 8080
              command:
                - /usr/local/bin/kiwix-serve
                - --port=8080
                - --verbose
                - /data/wikipedia_en_all_maxi.zim
              ports:
                - containerPort: 8080
                  protocol: TCP
              volumeMounts:
                - name: data
                  mountPath: /data
                  readOnly: true
              # limits has to sit under resources to be valid
              resources:
                limits:
                  memory: "128Mi"
                  cpu: "2000m"
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: wikipedia-mirror

Then you just need to download a copy of the mirror file wikipedia_en_all_maxi.zim and put it in the appropriate place (the volume that gets mounted at /data):

    wget https://download.kiwix.org/zim/wikipedia_en_all_maxi.zim
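The Deployment refers to a PersistentVolumeClaim named wikipedia-mirror that isn't defined in the manifest, so you'd need one of your own. Here is a rough sketch of what it could look like. The storage size and access mode are my assumptions, not something from the original setup; check the actual size of the ZIM file before picking a number, and add a storageClassName if your cluster has no default storage class.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # Must match claimName in the Deployment
  name: wikipedia-mirror
spec:
  accessModes:
    # Assumption: a single pod on one node mounts the volume
    - ReadWriteOnce
  resources:
    requests:
      # Assumption: placeholder size, adjust to fit the ZIM file plus headroom
      storage: 150Gi
```

The ZIM file then has to end up at the root of that volume, since kiwix-serve is started with /data/wikipedia_en_all_maxi.zim.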