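Webpack from First Principles (Part 4): Writing tapable by Hand

This is part of a series that explores how Webpack works under the hood by hand-writing pieces of it, such as Webpack itself, loaders, and plugins. This article covers how to write the tapable library by hand.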

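What is tapable

As the earlier articles in this series showed, webpack is essentially built on an event-flow mechanism: its workflow chains the various plugins together, and the core that makes this possible is tapable. tapable is somewhat similar to the events library in Node.js, and its core principle likewise relies on the publish-subscribe pattern.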
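
To make the comparison with the Node.js events library concrete, here is a minimal publish-subscribe sketch using EventEmitter. The event name 'compile' and the listener labels are made up for illustration; the only point is that subscribers register first and all of them run when the event is published.

```javascript
const EventEmitter = require('events')

const emitter = new EventEmitter()
const log = []

// subscribe: each listener is stored under the (hypothetical) event name 'compile'
emitter.on('compile', (name) => log.push(`listener1 ${name}`))
emitter.on('compile', (name) => log.push(`listener2 ${name}`))

// publish: the listeners run synchronously, in registration order
emitter.emit('compile', 'strugglebak')

console.log(log) // both entries are present, listener1 first
```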

Here are the most commonly used tapable hooks:

const {
	SyncHook,
	SyncBailHook,
	SyncWaterfallHook,
	SyncLoopHook,

	AsyncParallelHook,
	AsyncParallelBailHook,
	AsyncSeriesHook,
	AsyncSeriesBailHook,
	AsyncSeriesWaterfallHook
 } = require("tapable")

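Hooks whose names start with Sync are synchronous; those whose names start with Async are asynchronous.

The synchronous hooks register listeners with tap and fire them with call (the asynchronous hooks use tapAsync/callAsync or tapPromise/promise, covered later). For example: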

const { SyncHook } = require('tapable')

class Test {
  constructor() {
    this.hooks = {
      // create a new hook
      arch: new SyncHook(['name'])
    }
  }
  // register the listeners
  tap() {
    this.hooks.arch.tap('test1', (name) => {
      console.log('test1', name)
    })
    this.hooks.arch.tap('test2', (name) => {
      console.log('test2', name)
    })
  }
  // fire the hook
  call() {
    // 'strugglebak' is what each tap callback receives as name
    this.hooks.arch.call('strugglebak')
  }
}

const t = new Test()
t.tap()
t.call()

Running it prints:

test1 strugglebak
test2 strugglebak

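As the example shows, tapable collects the functions subscribed through tap into an array and, when call is invoked, runs them one by one in registration order.

Implementing SyncHook

Given the above, the code is straightforward: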

class SyncHook {
  constructor(args) {
    this.tasks = []
  }

  tap(name, task) {
    this.tasks.push(task)
  }
  // call may receive more than one argument
  call(...args) {
    this.tasks.forEach(task => task(...args))
  }
}

Test code:

const hook = new SyncHook(['name'])
hook.tap('test1', (name) => {
  console.log('test1', name)
})
hook.tap('test2', (name) => {
  console.log('test2', name)
})
hook.call('strugglebak')

Output:

test1 strugglebak
test2 strugglebak
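
The simplified SyncHook above ignores the args array passed to its constructor. One way to put it to use, sketched below, is to treat it as the list of declared parameter names and pass listeners only that many arguments. The trimming rule and the class name TrimmedSyncHook are assumptions made for illustration, not a copy of tapable's internals.

```javascript
class TrimmedSyncHook {
  constructor(args = []) {
    this.args = args // declared parameter names, e.g. ['name']
    this.tasks = []
  }

  tap(name, task) {
    this.tasks.push(task)
  }

  call(...args) {
    // keep only as many arguments as were declared in the constructor
    const declared = args.slice(0, this.args.length)
    this.tasks.forEach(task => task(...declared))
  }
}

const trimmedHook = new TrimmedSyncHook(['name'])
const received = []
trimmedHook.tap('test1', (...params) => received.push(params))
trimmedHook.call('strugglebak', 'extra')
console.log(received) // the listener only saw 'strugglebak'
```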

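Implementing SyncBailHook

With this hook, as soon as any listener returns a value other than undefined, that listener still runs to completion, but none of the listeners registered after it are executed.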

class SyncBailHook {
  constructor(args) {
    this.tasks = []
  }

  tap(name, task) {
    this.tasks.push(task)
  }

  call(...args) {
    // run the listeners in order and stop as soon as one returns a non-undefined value
    // (a for...of loop also copes with the case where nothing has been registered)
    for (const task of this.tasks) {
      if (task(...args) !== undefined) break
    }
  }
}

Test code:

const hook = new SyncBailHook(['name'])
hook.tap('test1', (name) => {
  console.log('test1', name)
})
hook.tap('test2', (name) => {
  console.log('test2', name)
})
hook.tap('test3', (name) => {
  console.log('test3', name)
  return name
})
hook.tap('test4', (name) => {
  console.log('test4', name)
})
hook.call('strugglebak')

Output:

test1 strugglebak
test2 strugglebak
test3 strugglebak
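
Because a bail hook stops at the first non-undefined return value, it is natural for call to hand that value back to the caller. The variant below is a sketch of that idea; the name ReturningBailHook is made up for this example.

```javascript
class ReturningBailHook {
  constructor() { this.tasks = [] }
  tap(name, task) { this.tasks.push(task) }
  call(...args) {
    for (const task of this.tasks) {
      const ret = task(...args)
      // the first non-undefined value ends the run and becomes the result
      if (ret !== undefined) return ret
    }
    // every listener returned undefined
    return undefined
  }
}

const bailHook = new ReturningBailHook()
const calls = []
bailHook.tap('test1', (name) => { calls.push('test1') })
bailHook.tap('test2', (name) => { calls.push('test2'); return `stopped by test2 for ${name}` })
bailHook.tap('test3', (name) => { calls.push('test3') })
const result = bailHook.call('strugglebak')
console.log(calls, result) // test3 never ran
```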

Implementing SyncWaterfallHook

This hook passes the return value of each listener as the argument to the next one, forming a pipeline.

class SyncWaterfallHook {
  constructor(args) {
    this.tasks = []
  }

  tap(name, task) {
    this.tasks.push(task)
  }

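  call(...args) {
    // nothing registered: hand the first argument back unchanged
    if (this.tasks.length === 0) return args[0]
    const [first, ...rest] = this.tasks
    const ret = first(...args)
    // values flow down the pipeline, so reduce is a good fit here
    return rest.reduce((acc, cur) => {
      return cur(acc)
    }, ret) // the initial value is the first listener's return value
  }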
}

Test code:

const hook = new SyncWaterfallHook(['name'])
hook.tap('test1', (name) => {
  console.log('test1', name)
  return 'test1'
})
hook.tap('test2', (data) => {
  console.log('test2', data)
  return 'test2'
})
hook.tap('test3', (data) => {
  console.log('test3', data)
  return 'test3'
})
hook.tap('test4', (data) => {
  console.log('test4', data)
  return 'test4'
})
hook.call('strugglebak')

Output:

test1 strugglebak
test2 test1
test3 test2
test4 test3
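
One edge case the implementation above does not cover: if a listener in the middle returns nothing, the next listener receives undefined. A common way to deal with this, sketched below under the hypothetical name KeepingWaterfallHook, is to carry the previous value forward whenever a listener returns undefined.

```javascript
class KeepingWaterfallHook {
  constructor() { this.tasks = [] }
  tap(name, task) { this.tasks.push(task) }
  call(first, ...rest) {
    // only the first argument flows through the pipeline; the rest stay fixed
    return this.tasks.reduce((acc, task) => {
      const ret = task(acc, ...rest)
      // keep the previous value when a listener returns undefined
      return ret === undefined ? acc : ret
    }, first)
  }
}

const waterfall = new KeepingWaterfallHook()
waterfall.tap('upper', (value) => value.toUpperCase())
waterfall.tap('silent', (value) => { /* returns undefined on purpose */ })
waterfall.tap('exclaim', (value) => `${value}!`)
const finalValue = waterfall.call('strugglebak')
console.log(finalValue) // STRUGGLEBAK!
```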

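Implementing SyncLoopHook

With this hook, a listener that returns a value other than undefined is executed again, and keeps being re-executed until it returns undefined. (The simplified version here repeats only that one listener. The tapable README describes a stricter rule in which the hook restarts from the first listener; in the test further down the looping listener is the first one, so both rules print the same result.)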

class SyncLoopHook {
  constructor(args) {
    this.tasks = []
  }

  tap(name, task) {
    this.tasks.push(task)
  }

  call(...args) {
    this.tasks.forEach((task) => {
      let ret
      // a listener that returns a non-undefined value is run again
      do {
        ret = task(...args)
      } while (ret !== undefined)
    })
  }
}

Test code:

const hook = new SyncLoopHook(['name'])
let count = 0
hook.tap('test1', (data) => {
  console.log('test1', data)
  return ++count === 3 ? undefined : 'keep going'
})
hook.tap('test2', (data) => {
  console.log('test2', data)
})
hook.tap('test3', (data) => {
  console.log('test3', data)
})
hook.tap('test4', (data) => {
  console.log('test4', data)
})
hook.call('strugglebak')

Output:

test1 strugglebak
test1 strugglebak
test1 strugglebak
test2 strugglebak
test3 strugglebak
test4 strugglebak
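
The implementation above repeats only the listener that returned a non-undefined value. The tapable README describes a slightly different rule for loop hooks: when a listener returns a non-undefined value, the hook restarts from the first listener. The sketch below (the class name RestartingLoopHook is hypothetical) follows that rule so the two behaviours can be compared.

```javascript
class RestartingLoopHook {
  constructor() { this.tasks = [] }
  tap(name, task) { this.tasks.push(task) }
  call(...args) {
    let index = 0
    while (index < this.tasks.length) {
      const ret = this.tasks[index](...args)
      // a non-undefined result sends us back to the first listener
      index = ret === undefined ? index + 1 : 0
    }
  }
}

const loopHook = new RestartingLoopHook()
const order = []
let retries = 0
loopHook.tap('test1', () => { order.push('test1') })
loopHook.tap('test2', () => {
  order.push('test2')
  // ask for a restart twice, then let the loop continue
  return ++retries < 3 ? 'again' : undefined
})
loopHook.tap('test3', () => { order.push('test3') })
loopHook.call('strugglebak')
console.log(order.join(' ')) // test1 test2 test1 test2 test1 test2 test3
```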

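Implementing AsyncParallelHook

First, note that this is an asynchronous parallel hook. Here "asynchronous parallel" means that the final callback runs only after all of the concurrently started asynchronous listeners have finished.

For example: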

const { AsyncParallelHook } = require('tapable')

class Test {
  constructor() {
    this.hooks = {
      arch: new AsyncParallelHook(['name'])
    }
  }
  // register the listeners
  tapAsync() {
    this.hooks.arch.tapAsync('test1', (name, cb) => {
      // asynchronous code
      setTimeout(() => {
        console.log('test1', name)
        cb()
      }, 1000)
    })
    this.hooks.arch.tapAsync('test2', (name, cb) => {
      // asynchronous code
      setTimeout(() => {
        console.log('test2', name)
        cb()
      }, 1000)
    })
  }
  // fire the hook
  callAsync() {
    this.hooks.arch.callAsync('strugglebak', () => {
      console.log('end')
    })
  }
}

const t = new Test()
t.tapAsync()
t.callAsync()

Output:

test1 strugglebak
test2 strugglebak
end

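In other words, the callback passed to callAsync (the one that prints end) runs only after every listener registered with tapAsync has called its cb.

The principle is roughly this: each listener calls cb when its asynchronous work finishes, and cb increments a counter. Once the counter equals the number of registered listeners, all of the asynchronous work is done and the final callback passed to callAsync can run. So the code can be written like this: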

class AsyncParallelHook {
  constructor(args) {
    this.tasks = []
  }

  tapAsync(name, task) {
    this.tasks.push(task)
  }

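  callAsync(...args) {
    // the last argument is the final callback
    const finalCallback = args.pop()
    // nothing registered: finish right away, otherwise finalCallback would never run
    if (this.tasks.length === 0) return finalCallback()
    let count = 0
    // every listener calls done when it finishes; the last one to finish triggers finalCallback
    const done = () => {(++count === this.tasks.length) && finalCallback()}
    this.tasks.forEach((task) => {
      task(...args, done)
    })
  }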
}

Test code:

const hook = new AsyncParallelHook(['name'])
hook.tapAsync('test1', (data, cb) => {
  setTimeout(() => {
    console.log('test1', data)
    cb()
  }, 1000)
})
hook.tapAsync('test2', (data, cb) => {
  setTimeout(() => {
    console.log('test2', data)
    cb()
  }, 1000)
})
hook.callAsync('strugglebak', () => {
  console.log('end')
})

Output:

test1 strugglebak
test2 strugglebak
end
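
The counter-based done above assumes that every listener succeeds. If listeners follow the Node-style convention of passing an error as the first argument to cb, the final callback can instead be invoked once with the first error. This is only a sketch of that idea; ErrorAwareParallelHook is a made-up name, and the listeners call cb synchronously just to keep the example short.

```javascript
class ErrorAwareParallelHook {
  constructor() { this.tasks = [] }
  tapAsync(name, task) { this.tasks.push(task) }
  callAsync(...args) {
    const finalCallback = args.pop()
    if (this.tasks.length === 0) return finalCallback()
    let count = 0
    let finished = false
    const done = (error) => {
      if (finished) return // the final callback has already run
      if (error) {
        finished = true
        return finalCallback(error)
      }
      if (++count === this.tasks.length) {
        finished = true
        finalCallback()
      }
    }
    this.tasks.forEach(task => task(...args, done))
  }
}

const parallelHook = new ErrorAwareParallelHook()
const outcomes = []
parallelHook.tapAsync('ok', (name, cb) => cb())
parallelHook.tapAsync('fails', (name, cb) => cb(new Error(`failed for ${name}`)))
parallelHook.tapAsync('late', (name, cb) => cb())
parallelHook.callAsync('strugglebak', (error) => {
  outcomes.push(error ? error.message : 'end')
})
console.log(outcomes) // the final callback ran exactly once, with the error
```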

The idea above is a lot like Promise.all, so this hook can also be rewritten with Promises:

class AsyncParallelHook {
  constructor(args) {
    this.tasks = []
  }

  tapPromise(name, task) {
    this.tasks.push(task)
  }

  promise(...args) {
    // each task returns a promise here
    const tasks = this.tasks.map(task => task(...args))
    return Promise.all(tasks)
  }
}

Test code:

const hook = new AsyncParallelHook(['name'])
hook.tapPromise('test1', (data) => {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      console.log('test1', data)
      resolve(data)
    }, 1000)
  })
})
hook.tapPromise('test2', (data) => {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      console.log('test2', data)
      resolve(data)
    }, 1000)
  })
})
hook.promise('strugglebak').then(() => {
  console.log('end')
})

Output:

test1 strugglebak
test2 strugglebak
end

Implementing AsyncSeriesHook

Note that this is an asynchronous series hook: async task 2 may start only after async task 1 has finished. One way to implement this is with callbacks:

class AsyncSeriesHook {
  constructor(args) {
    this.tasks = []
  }

  tapAsync(name, task) {
    this.tasks.push(task)
  }

  callAsync(...args) {
    // the last argument is the final callback
    const finalCallback = args.pop()
    let index = 0
    const next = () => {
      if (index === this.tasks.length) return finalCallback()
      const task = this.tasks[index++]
      // next is handed to each listener as its cb, so the listeners run one after another
      // the idea is similar to express middleware
      task(...args, next)
    }
    next()
  }
}

Test code:

const hook = new AsyncSeriesHook(['name'])
hook.tapAsync('test1', (data, cb) => {
  setTimeout(() => {
    console.log('test1', data)
    cb()
  }, 1000)
})
hook.tapAsync('test2', (data, cb) => {
  setTimeout(() => {
    console.log('test2', data)
    cb()
  }, 1000)
})
hook.callAsync('strugglebak', () => {
  console.log('end')
})

Output:

test1 strugglebak
test2 strugglebak
end

Of course, anything asynchronous can also be rewritten with Promises, as follows:

class AsyncSeriesHook {
  constructor(args) {
    this.tasks = []
  }

  tapPromise(name, task) {
    this.tasks.push(task)
  }

  promise(...args) {
    // each task returns a promise when it is called
    // chaining with reduce (a bit like the idea in redux) runs them in series:
    // a task starts only after the previous promise has resolved
    // starting from Promise.resolve() also covers the case of no registered tasks
    return this.tasks.reduce((acc, cur) => {
      return acc.then(() => cur(...args))
    }, Promise.resolve())
  }
}

Test code:

const hook = new AsyncSeriesHook(['name'])
hook.tapPromise('test1', (data) => {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      console.log('test1', data)
      resolve(data)
    }, 1000)
  })
})
hook.tapPromise('test2', (data) => {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      console.log('test2', data)
      resolve(data)
    }, 1000)
  })
})
hook.promise('strugglebak').then(() => {
  console.log('end')
})

Output:

test1 strugglebak
test2 strugglebak
end

Implementing AsyncSeriesWaterfallHook

As the name suggests, this is an asynchronous + series + waterfall hook, so the code can be written like this:

class AsyncSeriesWaterfallHook {
  constructor(args) {
    this.tasks = []
  }

  tapAsync(name, task) {
    this.tasks.push(task)
  }

  callAsync(...args) {
    const finalCallback = args.pop()
    let index = 0
    // iterate asynchronously
    const next = (error, data) => {
      // a listener reported an error: stop and pass it to the callAsync callback
      if (error) return finalCallback(error)
      // no listener left (or none registered): call the callAsync callback
      if (index === this.tasks.length) return finalCallback()
      const isFirst = index === 0
      // advance before calling the task, so a task that calls cb synchronously still moves forward
      const task = this.tasks[index++]
      if (isFirst) { // the first registered listener
        // next is passed in as cb
        task(...args, next)
      } else { // any later listener
        // the value is being passed along now, so pass data rather than args
        task(data, next)
      }
    }
    next()
  }
}

Test code:

const hook = new AsyncSeriesWaterfallHook(['name'])
hook.tapAsync('test1', (data, cb) => {
  setTimeout(() => {
    console.log('test1', data)
    cb(null, 'hello world')
  }, 1000)
})
hook.tapAsync('test2', (data, cb) => {
  setTimeout(() => {
    console.log('test2', data)
    cb(null)
  }, 1000)
})
hook.callAsync('strugglebak', () => {
  console.log('end')
})

Output:

test1 strugglebak
test2 hello world
end

Of course, this too can be rewritten with Promises:

class AsyncSeriesWaterfallHook {
  constructor(args) {
    this.tasks = []
  }

  tapPromise(name, task) {
    this.tasks.push(task)
  }

  promise(...args) {
    // nothing registered: resolve with the first argument unchanged
    if (this.tasks.length === 0) return Promise.resolve(args[0])
    const [first, ...rest] = this.tasks
    const ret = first(...args)
    return rest.reduce((acc, cur) => { // same idea as the SyncWaterfallHook written earlier
      return acc.then((data) => cur(data))
    }, ret)
  }
}

Test code:

const hook = new AsyncSeriesWaterfallHook(['name'])
hook.tapPromise('test1', (data) => {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      console.log('test1', data)
      resolve('hello world')
    }, 1000)
  })
})
hook.tapPromise('test2', (data) => {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      console.log('test2', data)
      resolve(null)
    }, 1000)
  })
})
hook.promise('strugglebak').then(() => {
  console.log('end')
})

Output:

test1 strugglebak
test2 hello world
end

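Summary

From the above we can see a pattern:

tapable has 3 ways to register a listener: tap (synchronous), tapAsync (asynchronous, the listener receives a cb), and tapPromise (the listener returns a promise)
tapable has 3 ways to fire a hook: call (synchronous), callAsync (asynchronous, takes a final cb), and promise (returns a promise, consumed with .then)

In the examples above the two sets pair up one-to-one: tap with call, tapAsync with callAsync, and tapPromise with promise.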

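We also saw that tapable hooks are divided into synchronous and asynchronous hooks, and that the asynchronous hooks are further divided into parallel and series hooks.

With an asynchronous parallel hook, the asynchronous operations are all started together, so whichever one is faster finishes first.
With an asynchronous series hook, each asynchronous operation waits for the previous one: the next one starts only after the previous one has finished.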
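
The difference between the two can be observed by recording when each listener starts. The sketch below uses two small helper functions, runParallel and runSeries (both hypothetical, standing in for the callAsync methods of the parallel and series hooks), and shows that the parallel version starts every task immediately while the series version starts only the first one.

```javascript
// each task records when it starts, then finishes after a short delay
const makeTask = (label, log) => (cb) => {
  log.push(`start ${label}`)
  setTimeout(cb, 10)
}

// parallel: start everything now, finish when all tasks have called back
function runParallel(tasks, finalCallback) {
  if (tasks.length === 0) return finalCallback()
  let count = 0
  tasks.forEach(task => task(() => { if (++count === tasks.length) finalCallback() }))
}

// series: start the next task only from the previous task's callback
function runSeries(tasks, finalCallback) {
  let index = 0
  const next = () => (index === tasks.length ? finalCallback() : tasks[index++](next))
  next()
}

const parallelLog = []
const seriesLog = []
runParallel([makeTask('a', parallelLog), makeTask('b', parallelLog)], () => parallelLog.push('end'))
runSeries([makeTask('a', seriesLog), makeTask('b', seriesLog)], () => seriesLog.push('end'))

// right after the calls return, before any timer has fired:
console.log(parallelLog) // both tasks have already started
console.log(seriesLog)   // only the first task has started
```

Once the timers fire, both logs end with 'end'; the series log simply gets there one task at a time.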