Consider an Author type that has a list of posts, and a Post type that allows retrieval of its Author. With GraphQL you can rather easily write complex cyclical queries such as this one:
# cyclical query
# depth: 8+
query cyclical {
  author(id: "xyz") {
    posts {
      author {
        posts {
          author {
            posts {
              author {
                ... {
                  ... # more deep nesting!
                }
              }
            }
          }
        }
      }
    }
  }
}
# inline fragment
# depth: 1
query inlineShallow {
  authors
  ... on Query {
    posts
  }
}
# named fragment
# depth: 1
query namedShallow {
  ... namedFragment
}

fragment namedFragment on Query {
  posts
}
Maximum Query Depth caps how deeply nested an incoming query may be. A GraphQL server can then easily reject queries that exceed that depth.
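To make the idea concrete, here is a minimal sketch of how depth could be measured statically from a parsed query using graphql-js. The selectionDepth helper is purely illustrative (not part of any library) and ignores fragment resolution, which a real implementation would handle:

import { parse } from 'graphql'

// Depth of a node = 1 + the deepest of its selections (0 for leaf fields).
function selectionDepth(node) {
  if (!node.selectionSet) return 0
  return 1 + Math.max(...node.selectionSet.selections.map(selectionDepth))
}

const ast = parse('query { author(id: "xyz") { posts { title } } }')
const depth = selectionDepth(ast.definitions[0]) // 3 for this query

if (depth > 4) {
  throw new Error('query is too deep')
}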
graphql-depth-limit
Apollo GraphQL has no built-in feature for limiting query depth; graphql-depth-limit is recommended instead.
Limiting query depth with express-graphql and graphql-depth-limit is done using validationRules as follows:
import depthLimit from 'graphql-depth-limit'
import express from 'express'
import graphqlHTTP from 'express-graphql'
import schema from './schema'

const app = express()

const DepthLimitRule = depthLimit(
  4,
  { ignore: ['whatever', 'trusted'] },
  depths => console.log(depths)
)

const graphqlMiddleware = graphqlHTTP({
  schema,
  validationRules: [
    DepthLimitRule,
  ],
})

app.use('/graphql', graphqlMiddleware)
The first argument to depthLimit specifies the maximum depth; queries with a depth of 5 or more will be rejected with a validation error.
The second argument is an options object whose ignore field lists fields to exclude from the depth count, and the third is a callback that receives the computed depths.
A Maximum Query Depth of 4 would then consider the query below invalid:
# invalid query
# depth: 9+
query cyclical {
  author(id: "xyz") {
    posts {
      author {
        posts {
          author {
            posts {
              author {
                ... {
                  ...
                }
              }
            }
          }
        }
      }
    }
  }
}
This is what will be returned to the client:
{
  "errors": [
    {
      "message": "'cyclical' exceeds max... depth of 4",
      "locations": [
        {
          "line": ...,
          "column": ...
        }
      ]
    }
  ]
}
Since the abstract syntax tree is analyzed statically, there is no additional load on the GraphQL server; the rejected query never executes.
Depth alone cannot catch shallow but abusive queries; that is where query complexity comes in. Query complexity assigns each field a cost, with a default of 1:
query simple {
  author(id: "xyz") {   # complexity: 1
    posts(first: 5) {   # complexity: 5
      title             # complexity: 1
    }
  }
}
The query would then have a total complexity of 7. If we had set a maximum complexity of 5 on our schema, this query would fail.
Note that posts is more expensive than author here because complexity is based on arguments (the first: 5).
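As a back-of-the-envelope illustration (just arithmetic, not any particular library's algorithm), the total is the sum of the per-field costs shown in the comments:

// Illustrative arithmetic only; per-field costs come from the comments above.
const authorCost = 1  // plain object field
const postsCost = 5   // driven by the `first: 5` argument
const titleCost = 1   // scalar field

const totalComplexity = authorCost + postsCost + titleCost // 7, which exceeds a limit of 5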
graphql-validation-complexity
As with query depth, Apollo GraphQL has no built-in feature for limiting query complexity; graphql-validation-complexity is recommended instead.
Limiting query complexity with graphql-validation-complexity and express-graphql is fairly straightforward:
import { createComplexityLimitRule } from 'graphql-validation-complexity'
import express from 'express'
import graphqlHTTP from 'express-graphql'
import schema from './schema'

const app = express()

const ComplexityLimitRule = createComplexityLimitRule(1000, {
  scalarCost: 1,
  objectCost: 10, // Default is 0.
  listFactor: 20, // Default is 10.
})

const graphqlMiddleware = graphqlHTTP({
  schema,
  validationRules: [
    ComplexityLimitRule,
  ],
})

app.use('/graphql', graphqlMiddleware)
The configuration object sets the custom global costs for scalars, objects and lists respectively; objectCost and listFactor default to 0 and 10.
You can also limit query complexity by passing cost factors with getCost and getCostFactor callbacks in field definitions.
const expensiveField = {
  type: ExpensiveItem,
  getCost: () => 60,
};

const expensiveList = {
  type: new GraphQLList(MyItem),
  getCostFactor: () => 100,
};
Or in your GraphQL schema definitions as follows:
type CustomCostItem {
  expensiveField: ExpensiveItem @cost(value: 50)
  expensiveList: [MyItem] @costFactor(value: 100)
}
Query complexity is, however, still hard to get exactly right (particularly with mutations), because costs are estimated by developers and often need to be updated as the schema evolves.
graphql-cost-analysis
graphql-cost-analysis makes calculating query costs somewhat easier by allowing fine-grained control over specific fields with the @cost directive.
It then parses incoming queries and computes their cost.
Custom query costs can be dropped right into the GraphQL schema as follows:
# you can define a cost directive on a type
type TypeCost @cost(complexity: 3) {
  string: String
  int: Int
}

type Query {
  # will have the default cost value
  defaultCost: Int

  # will have a cost of 2 because this field does not depend on its parent fields
  customCost: Int @cost(useMultipliers: false, complexity: 2)

  # complexity should be between 1 and 10
  badComplexityArgument: Int @cost(complexity: 12)

  # the cost will depend on the `limit` parameter passed to the field
  # then the multiplier will be added to the `parent multipliers` array
  customCostWithResolver(limit: Int): Int
    @cost(multipliers: ["limit"], complexity: 4)

  # for recursive cost
  first(limit: Int): First
    @cost(multipliers: ["limit"], useMultipliers: true, complexity: 2)

  # you can override the cost setting defined directly on a type
  overrideTypeCost: TypeCost @cost(complexity: 2)
  getCostByType: TypeCost

  # you can specify several field parameters in the `multipliers` array
  # then the values of the corresponding parameters will be added together
  # here, the cost will be `parent multipliers` * (`first` + `last`) * `complexity`
  severalMultipliers(first: Int, last: Int): Int
    @cost(multipliers: ["first", "last"])
}

type First {
  # will have the default cost value
  myString: String

  # the cost will depend on the `limit` value passed to the field and the value of `complexity`
  # and the parent multipliers args: here the `limit` value of the `Query.first` field
  second(limit: Int): String @cost(multipliers: ["limit"], complexity: 2)

  # the cost will be the value of the complexity arg even if you pass a `multipliers` array
  # because `useMultipliers` is false
  costWithoutMultipliers(limit: Int): Int
    @cost(useMultipliers: false, multipliers: ["limit"])
}
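The directives alone do not enforce anything; the cost analysis rule still has to be registered on the server. The sketch below is based on the library's documented usage, assuming graphql-cost-analysis's default export builds a validation rule from a maximumCost (and optionally the request's variables); check the library's README for the exact options:

import costAnalysis from 'graphql-cost-analysis'
import bodyParser from 'body-parser'
import express from 'express'
import graphqlHTTP from 'express-graphql'
import schema from './schema'

const app = express()
app.use(bodyParser.json()) // so req.body.variables is available below

// Build the middleware per request so the rule can see the query variables.
app.use('/graphql', graphqlHTTP(req => ({
  schema,
  validationRules: [
    costAnalysis({
      maximumCost: 1000,             // reject queries estimated above this cost
      variables: req.body.variables, // lets the rule resolve multipliers like `limit`
    }),
  ],
})))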
Queries can also be whitelisted and persisted with persistgraphql. This is possible because Apollo Client uses static queries.
persistgraphql
persistgraphql is a build-time tool that extracts static GraphQL queries from .graphql files.
Each query is assigned an id, and the mapping is stored in a JSON file. This is how queries are whitelisted and persisted.
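The generated file (commonly named extracted_queries.json; the query text and ids below are purely illustrative) is essentially a map from query text to id:

{
  "query getAuthor { author(id: \"xyz\") { posts { title } } }": 1,
  "query getPosts { posts { title } }": 2
}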
First we install persistgraphql CLI tool (together with the Apollo Network Interface).
$ npm install --save persistgraphql
Then we point the tool to our frontend source directory for .graphql files.
$ persistgraphql src/
Or to a file containing GraphQL
$ persistgraphql queries.graphql
If it is called on a directory, it will step recursively through each .graphql file.
It is also possible to extract GraphQL queries from JavaScript files using --extension=js --js as follows:
$ persistgraphql src/index.js --js --extension=js
Apollo Client then gets persisted query support by using persistgraphql's custom network interface, which is a drop-in replacement for the standard network interface.
Only query ids are sent to the server, not the full query document.
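On the client, a minimal sketch of wiring the extracted query map into Apollo Client's (legacy) network interface might look like the following, assuming persistgraphql's addPersistedQueries helper as described in its README:

import ApolloClient, { createNetworkInterface } from 'apollo-client'
import { addPersistedQueries } from 'persistgraphql'
import queryMap from './extracted_queries.json'

// Wrap the standard network interface so that only query ids are sent.
const networkInterface = addPersistedQueries(
  createNetworkInterface({ uri: '/graphql' }),
  queryMap
)

const client = new ApolloClient({ networkInterface })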
If you use the client network interface, you can roll your own persisted-query middleware in the GraphQL server as follows:
import queryMap from '../extracted_queries.json'
import { invert } from 'lodash'

...

app.use(
  '/graphql',
  (req, resp, next) => {
    if (config.persistedQueries) {
      const invertedMap = invert(queryMap)
      req.body.query = invertedMap[req.body.id]
    }
    next()
  },
)
persistgraphql gives you an understanding of how persisted queries work under the hood.
The alternative is to simply add the automatic persisted queries link, as illustrated below, upgrade to the latest version of Apollo Engine, and you're set.
apollo-link-persisted-queries
The apollo-link-persisted-queries library implements automatic persisted queries for Apollo Client as a custom Apollo Link, used together with apollo-link-http.
$ npm install apollo-link-persisted-queries --save
Then in your client setup:
import { createPersistedQueryLink } from "apollo-link-persisted-queries"
import { createHttpLink } from "apollo-link-http"
import { InMemoryCache } from "apollo-cache-inmemory"
import ApolloClient from "apollo-client"

// use this with Apollo Client
const link = createPersistedQueryLink().concat(createHttpLink({ uri: "/graphql" }))

const client = new ApolloClient({
  cache: new InMemoryCache(),
  link: link,
})
That's it; your GraphQL server will now start whitelisting and persisting queries. See the library's documentation for configuration options.
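For context, this is roughly what the automatic persisted queries protocol sends on the wire (field names per Apollo's APQ spec; the hash value below is illustrative):

// First request: only the SHA-256 hash of the query is sent.
const firstRequest = {
  operationName: 'simple',
  variables: {},
  extensions: {
    persistedQuery: {
      version: 1,
      sha256Hash: 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', // illustrative
    },
  },
}
// If the server has never seen this hash, it responds with a PersistedQueryNotFound
// error and the client retries with the full query text; after that, the hash alone is enough.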
Another protection is rate limiting (throttling) based on server time: we could, for example, give each client a budget of 1000ms of server time, with clients gaining back 50ms of server time per second (the leak rate).
A query that takes 200ms to complete can then only be called 5 times within a second; the 6th call is blocked until more server time is added back to the client's budget.
Meanwhile, the client gains back 200ms every 4 seconds, after which it can call the query once more.
As you might expect, this naturally allows more calls of less expensive queries that take less server time to compute, and vice versa.
Such rate-limiting constraints are easy to express in GraphQL API docs, but, as stated earlier, it is hard to accurately estimate how much server time a given query will take (without running it first) when queries are generated by the client.
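A minimal sketch of such a leaky-bucket throttle, using a hypothetical in-memory store and the numbers from the example above (a 1000ms budget leaking back at 50ms per second), could look like this; the same structure works if the cost is query complexity instead of server time:

// A rough sketch only: a per-client leaky bucket tracking consumed server time,
// kept in an in-memory Map (a real setup would use Redis or similar).
const CAPACITY_MS = 1000   // maximum budget per client
const LEAK_MS_PER_SEC = 50 // budget restored per second

const buckets = new Map()  // clientId -> { used, lastLeak }

function tryConsume(clientId, costMs) {
  const now = Date.now()
  const bucket = buckets.get(clientId) || { used: 0, lastLeak: now }

  // Leak: restore budget proportional to the seconds elapsed since the last check.
  const restored = ((now - bucket.lastLeak) / 1000) * LEAK_MS_PER_SEC
  bucket.used = Math.max(0, bucket.used - restored)
  bucket.lastLeak = now

  if (bucket.used + costMs > CAPACITY_MS) {
    buckets.set(clientId, bucket)
    return false // blocked until enough budget has leaked back
  }

  bucket.used += costMs
  buckets.set(clientId, bucket)
  return true
}

// e.g. tryConsume('client-1', 200) succeeds 5 times in quick succession, then fails.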
Rate limiting can instead be based on query complexity rather than server time. Take this simple query, with a total complexity of 3, for example:
query simple {
  author(id: "xyz") { # complexity: 1
    posts {           # complexity: 1
      title           # complexity: 1
    }
  }
}
Now, instead of using maximum server time, we use query cost: we can cap the bucket size at a maximum cost of 9 with a leak rate of 0.5 per second.
A client can then only call simple 3 times before being blocked, and must wait at least 6 seconds before calling it again.
GitHub actually uses complexity-based rate limiting on its GraphQL API.