【学习笔记】人工智能规划与决策 (Study Notes: AI Planning and Decision Making)

2022-04-26

6 topics

  1. Classical AI Planning and Acting
  2. Decision Theory
    • Rational and Judgmental Decision Making
    • Decision Networks
  3. Markov Decision Process
  4. Reinforcement Learning
  5. POMDP
  6. Game Theory and Multi-agent Decision Making

Classical AI Planning and Acting

Rational and Judgmental Decision Making

Uncertainty

  • Partially Observable
  • Non-deterministic

Rational Decision Making

  • Combining belief and desire
    • belief: judgment of unknown states (uncertainty)
    • desire: preference of consequences (utility)

Utility Theory

  • utility function expressing the desirability of a state: U(s)
  • expected utility of an action given evidence: EU(a∣e)
  • Principle of maximum expected utility (MEU): the agent should choose the action that maximizes the expected utility.
  • Six Axioms

Decision Networks

Bayesian Network → DAG

Decision Network, a.k.a. Influence Diagram

  • Chance Node
  • Decision Node
  • Utility Node

Value of Information
$$VPI_e(E_j) = \left( \sum_k P(E_j = e_{jk} \mid e)\, EU(\alpha_{e_{jk}} \mid e, E_j = e_{jk}) \right) - EU(\alpha \mid e)$$

Markov Decision Process

  • MDP = Transition Function + Reward Function
  • Expected utility of executing policy π starting in state s: $U^{\pi}(s) = E\left[\sum_{t=0}^{\infty} \gamma^t R(s_t)\right]$
  • Value function: $U^{\pi^*}(s)$, the expected utility of executing the optimal policy $\pi^*$ starting in state s.
  • Optimal policy starting in state s: $\pi_s^* = \underset{\pi}{\operatorname{argmax}}\, U^{\pi}(s)$. The optimal policy is independent of the starting state, so it can simply be denoted $\pi^*$. (see AIMA-3rd P651)
  • Utility function (value function) of state s: $U^{\pi^*}(s)$ can be written as $U(s)$.
    • Utility function U(s) vs. reward function R(s)
      • U(s) is the long-term total reward from s onward.
      • R(s) is the short-term reward of being in s.
  • After finding Utility function U(s), we can easily get the optimal policy — choose the action that maximizes the expected utility of the subsequent state:

$$\pi^*(s) = \underset{a \in A(s)}{\operatorname{argmax}} \sum_{s'} P(s' \mid s, a)\, U(s')$$

  • Utility Function: the key to finding the optimal policy.

  • Two methods to find optimal policy (get the utility function):

    • Value Iteration (VI)
      1. calculate the utility of each state
      2. use the state utilities to select an optimal action in each state.
    • Policy Iteration (PI)
      1. Policy Evaluation
      2. Policy Improvement

Value Iteration

  • Bellman Equation

$$U(s) = R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U(s')$$

Utility of a state U(s):

  • immediate reward for that state R(s)
  • expected utility of the next state, $\sum_{s'} P(s' \mid s, a)\, U(s')$, assuming that the agent chooses the optimal action a.

Note: the exact utilities of the states are $U^{\pi}(s) = E\left[\sum_{t=0}^{\infty} \gamma^t R(s_t)\right]$; they are exactly the unique solutions of the Bellman equation.
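
A minimal value-iteration sketch on a toy MDP. The dict-based model (P[s][a] as a list of (s′, probability) pairs, R[s], the 3-state chain, and the stopping threshold) is an illustrative assumption, not from the lecture:

```python
# Value iteration: U(s) = R(s) + gamma * max_a sum_s' P(s'|s,a) U(s')
# Toy 3-state MDP (illustrative); state 2 is absorbing.
P = {  # P[s][a] = list of (next_state, probability)
    0: {"left": [(0, 0.9), (1, 0.1)], "right": [(1, 0.8), (0, 0.2)]},
    1: {"left": [(0, 0.8), (1, 0.2)], "right": [(2, 0.9), (1, 0.1)]},
    2: {"stay": [(2, 1.0)]},
}
R = {0: -0.04, 1: -0.04, 2: 1.0}
gamma = 0.9

def value_iteration(P, R, gamma, eps=1e-6):
    U = {s: 0.0 for s in P}
    while True:
        U_new, delta = {}, 0.0
        for s in P:
            # Bellman update: immediate reward + discounted best expected next utility
            best = max(sum(p * U[s2] for s2, p in P[s][a]) for a in P[s])
            U_new[s] = R[s] + gamma * best
            delta = max(delta, abs(U_new[s] - U[s]))
        U = U_new
        if delta < eps * (1 - gamma) / gamma:   # standard stopping criterion
            return U

def greedy_policy(U, P):
    # pi*(s) = argmax_a sum_s' P(s'|s,a) U(s')
    return {s: max(P[s], key=lambda a: sum(p * U[s2] for s2, p in P[s][a])) for s in P}

U = value_iteration(P, R, gamma)
print(U, greedy_policy(U, P))
```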

Policy Iteration

Intuitive idea

  • If one action is clearly better than all others, then the exact utilities on the states need not be precise.

Two steps

  • Policy Evaluation:
    • calculate the utility of executing policy $\pi_i$ starting in state s

$$U_i(s) = R(s) + \gamma \sum_{s'} P(s' \mid s, \pi_i(s))\, U_i(s')$$

  • Policy Improvement:
    • if the current action is not the optimal action, i.e.
      $$\max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U[s'] > \sum_{s'} P(s' \mid s, \pi[s])\, U[s']$$
    • then update the policy and the utility function
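
A corresponding policy-iteration sketch with the same illustrative dict-based MDP; policy evaluation here runs a fixed number of simplified Bellman updates (modified-PI style) instead of solving the linear system:

```python
# Policy iteration: alternate policy evaluation and policy improvement.
P = {  # P[s][a] = list of (next_state, probability); illustrative toy MDP
    0: {"left": [(0, 0.9), (1, 0.1)], "right": [(1, 0.8), (0, 0.2)]},
    1: {"left": [(0, 0.8), (1, 0.2)], "right": [(2, 0.9), (1, 0.1)]},
    2: {"stay": [(2, 1.0)]},
}
R = {0: -0.04, 1: -0.04, 2: 1.0}

def evaluate(pi, U, P, R, gamma, k=30):
    # Policy evaluation: U_i(s) = R(s) + gamma * sum_s' P(s'|s,pi(s)) U_i(s')
    for _ in range(k):
        U = {s: R[s] + gamma * sum(p * U[s2] for s2, p in P[s][pi[s]]) for s in P}
    return U

def expected_u(U, P, s, a):
    return sum(p * U[s2] for s2, p in P[s][a])

def policy_iteration(P, R, gamma=0.9):
    pi = {s: next(iter(P[s])) for s in P}      # arbitrary initial policy
    U = {s: 0.0 for s in P}
    while True:
        U = evaluate(pi, U, P, R, gamma)
        unchanged = True
        for s in P:
            # Policy improvement: switch if some action beats pi(s) under current U
            best = max(P[s], key=lambda a: expected_u(U, P, s, a))
            if expected_u(U, P, s, best) > expected_u(U, P, s, pi[s]):
                pi[s], unchanged = best, False
        if unchanged:
            return pi, U

print(policy_iteration(P, R))
```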

Improved Versions of PI

  • Modified Policy Iteration
  • Asynchronous Policy Iteration
  • General Policy Iteration

Reinforcement Learning

Definition

lecture 7, P8-9

  • relies on MDP

    • optimal policy = the policy that maximizes the expected total reward
  • uses observed rewards to learn the optimal policy

  • unknown transition function and reward function

Structure

Three agent designs

  • Model-based

    • A utility-based agent learns a model P and a utility function U on states, and uses them to select actions that maximize the expected outcome utility.
  • Model-free

    • A Q-learning agent, learns an action-utility function Q, or Q-function, giving the expected utility of taking a given action in a given state.
  • Reflex

    • A reflex agent learns a policy π that maps directly from states to actions.

Learning Strategy

  • Learn Utility Function U
    • Direct Utility Estimation
    • Passive Adaptive Dynamic Programming
    • Passive Temporal-Difference Learning
  • Learn action-utility function Q (Q-function)
    • Active Temporal-Difference Learning
  • Learn a policy π that maps directly from states to actions
    • Policy Search

Known or Unknown Policy

  • Known Policy (Passive Learning)
    • Direct Utility Estimation
    • Passive Adaptive Dynamic Programming
    • Passive Temporal-Difference Learning
  • Unknown Policy (Active Learning)
    • Active Adaptive Dynamic Programming
    • Q-Learning
      • Active Temporal-Difference Learning

Passive Learning

  • Known: policy π
  • Unknown: transition function P(s′∣s,a) and reward function R(s).
  • Goal: evaluate how good the policy is, i.e. learn the utility function U.

Note: The passive learning task is similar to the policy evaluation task, which is part of the policy iteration algorithm.

Direct Utility Estimation (Monte Carlo Learning, MC)

  • The exact utility of state s is

$$U^{\pi}(s) = E\left[\sum_{t=0}^{\infty} \gamma^t R(s_t)\right]$$

  • The core idea of MC
    • The utility of a state is the expected total reward from that state onward (called the expected reward-to-go), and each trial provides a sample of this quantity for each state visited.
    • Running infinitely many trials, the sample average of the total reward from that state onward converges to the true utility.

Pros and Cons

  • Simple: reduces the RL problem to an inductive (supervised) learning problem.
  • Slow: has to wait until the end of each episode.
  • Converges very slowly: the variance can be high, as the variance of every trial accumulates.
    • It ignores the fact that the utilities of states are not independent; each is related to the utilities of its successor states.
      • By ignoring the connections between states, direct utility estimation misses opportunities for learning. More broadly, we can view direct utility estimation as searching for U in a hypothesis space that is much larger than it needs to be, in that it includes many functions that violate the Bellman equations. For this reason, the algorithm often converges very slowly.

Therefore, using the Bellman equation is a better choice.
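
A small sketch of direct utility estimation: average the observed reward-to-go over complete trials. The trial format (one list of (state, reward) pairs per episode) and the example trial are illustrative assumptions:

```python
from collections import defaultdict

def mc_utility_estimates(trials, gamma=1.0):
    # Each trial is a complete episode: a list of (state, reward) pairs.
    totals, counts = defaultdict(float), defaultdict(int)
    for trial in trials:
        # Reward-to-go at step t: sum_{k>=t} gamma^(k-t) * r_k, built back-to-front.
        g, returns = 0.0, []
        for _, r in reversed(trial):
            g = r + gamma * g
            returns.append(g)
        returns.reverse()
        for (s, _), g_t in zip(trial, returns):
            totals[s] += g_t            # each visit gives one sample of U(s)
            counts[s] += 1
    return {s: totals[s] / counts[s] for s in totals}

# One illustrative trial through a 4x3 grid world ending in the +1 terminal state:
trials = [[("(1,1)", -0.04), ("(1,2)", -0.04), ("(1,3)", -0.04),
           ("(2,3)", -0.04), ("(3,3)", -0.04), ("(4,3)", 1.0)]]
print(mc_utility_estimates(trials))
```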

Adaptive Dynamic Programming (ADP)

Simplified Bellman equation (as used in policy iteration):
$$U^{\pi}(s) = R(s) + \gamma \sum_{s'} P(s' \mid s, \pi(s))\, U^{\pi}(s')$$


Need to learn the transition function $P(s' \mid s, \pi(s))$:

  • Estimate the transition function $P(t \mid s, a)$ from the trials
    • $P(t \mid s, a) \leftarrow N_{s'|sa}[t, s, a] / N_{sa}[s, a]$
  • Learn the utility function U(s) by policy evaluation
    • $U(s) = R(s) + \gamma \sum_{s'} P(s' \mid s, \pi(s))\, U(s')$
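
A passive-ADP sketch in the same spirit: count observed transitions to get a maximum-likelihood transition model, then run policy evaluation on it. The class layout and the (s, a, s′) observation format are illustrative assumptions:

```python
from collections import defaultdict

class PassiveADP:
    # Passive ADP: learn P(s'|s,pi(s)) from counts, then evaluate the fixed policy pi.
    def __init__(self, pi, R, gamma=0.9):
        self.pi, self.R, self.gamma = pi, R, gamma
        self.N_sa = defaultdict(int)      # N_sa[s,a]
        self.N_sas = defaultdict(int)     # N_s'|sa[s,a,s']
        self.U = defaultdict(float)

    def observe(self, s, a, s2):
        self.N_sa[(s, a)] += 1
        self.N_sas[(s, a, s2)] += 1

    def P(self, s2, s, a):
        # Maximum-likelihood estimate P(s'|s,a) = N_s'|sa[s,a,s'] / N_sa[s,a]
        return self.N_sas[(s, a, s2)] / self.N_sa[(s, a)] if self.N_sa[(s, a)] else 0.0

    def policy_evaluation(self, k=50):
        states = list(self.R)
        for _ in range(k):
            for s in states:
                a = self.pi[s]
                exp_u = sum(self.P(s2, s, a) * self.U[s2] for s2 in states)
                self.U[s] = self.R[s] + self.gamma * exp_u
        return dict(self.U)

# Illustrative usage: a fixed policy on two states and a few observed transitions.
agent = PassiveADP(pi={"A": "go", "B": "go"}, R={"A": -0.04, "B": 1.0})
for s, a, s2 in [("A", "go", "B"), ("A", "go", "A"), ("A", "go", "B"), ("B", "go", "B")]:
    agent.observe(s, a, s2)
print(agent.policy_evaluation())
```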

Temporal-Difference Learning (TD)

  • Another way to utilize Bellman equation (i.e. relation to the successor states)

    • use the observed transitions to adjust the utilities of the observed states so that they agree with the constraint equations.
      • $U^{\pi}(s) \leftarrow U^{\pi}(s) + \alpha \left( R(s) + \gamma U^{\pi}(s') - U^{\pi}(s) \right)$ (see the sketch after this list)
      • where α is the learning rate; the updates converge if α decreases appropriately with the number of times the state has been visited.
  • Similar to gradient descent.

  • TD vs. ADP

    • ADP needs to estimate a transition model, while TD does not need one.
      • The environment supplies the connection between neighboring states in the form of observed transitions.
    • Both try to make local adjustments to the utility estimates in order to make each state “agree” with its successors (Bellman equation). But they still have differences:
      • TD adjusts a state to agree with its observed successor.
      • ADP adjusts the state to agree with all of the successors that might occur.
      • TD makes a single adjustment per observed transition
      • ADP makes as many as it needs to restore consistency between the utility estimates U and the environment model P.
    • Thus, TD can be viewed as a crude but efficient first approximation to ADP.
  • TD (ADP) vs. MC

    • TD (ADP) can perform online learning at every step, while MC needs to wait until the end of an episode.
    • TD depends only on one measured reward while MC considers all rewards.
      • TD target has lower variance, but is biased, while MC target is unbiased but has higher variance.
    • TD (ADP) usually converges faster than MC in practice.
  • ADP vs. TD (MC)
    • ADP learns the model (transition function) and then solves for the value while TD and MC don’t need model.
    • ADP is more data efficient
      • ADP requires less data from the real world, while TD and MC require more data for learning.
    • TD is more computationally efficient
      • TD doesn’t need to compute expectations but ADP needs to do so.
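
A sketch of the passive TD update above; the decaying learning-rate schedule α(n) = 60/(59 + n) is just one illustrative choice that decreases with the visit count:

```python
from collections import defaultdict

class PassiveTD:
    # Passive TD: nudge U(s) toward the one-sample target R(s) + gamma * U(s').
    def __init__(self, gamma=0.9):
        self.gamma = gamma
        self.U = defaultdict(float)
        self.N = defaultdict(int)       # visit counts, used to decay alpha

    def alpha(self, n):
        return 60.0 / (59.0 + n)        # decreasing learning rate (illustrative)

    def update(self, s, r, s2):
        self.N[s] += 1
        target = r + self.gamma * self.U[s2]                 # one-sample TD target
        self.U[s] += self.alpha(self.N[s]) * (target - self.U[s])

# Illustrative usage: feed observed (state, reward-at-state, next_state) transitions.
td = PassiveTD()
for s, r, s2 in [("A", -0.04, "B"), ("B", 1.0, "B"), ("A", -0.04, "B")]:
    td.update(s, r, s2)
print(dict(td.U))
```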

Active Learning

  • the agent will need to learn a complete model with outcome probabilities for all actions, rather than just the model for the fixed policy.
  • we need to take into account the fact that the agent has a choice of actions.

Adaptive Dynamic Programming

Note: the policy π is no longer fixed; it changes as transitions and rewards are learned.

Algorithm:

  • Repeat until convergence
    • Estimate the transition function $P(t \mid s, a)$ based on the current optimal policy
      • $P(t \mid s, a) \leftarrow N_{s'|sa}[t, s, a] / N_{sa}[s, a]$
    • Perform Policy Iteration or Value Iteration
      • Learn utilities defined by the optimal policy
      • choose optimal policy based on utilities

The ADP agent is greedy, so it needs to trade off:

  • Exploitation: maximize value as reflected by current estimate
  • Exploration: learn more about the model to potentially improve long term gain

How to balance Exploitation and Exploration:

  • Greedy in the Limit of Infinite Exploration, or GLIE:
    • Try each action in each state an unbounded number of times, to avoid a finite probability of missing an optimal action.
    • Eventually become greedy, so that the policy is optimal w.r.t. the true model.

GLIE

  • ϵ-greedy exploration
    • Choose greedy action with probability (1 − ϵ); random action with probability ϵ.
  • optimistic estimate of the utility
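
A small sketch of ε-greedy action selection, one way to realize GLIE; the ε = 1/t decay and the toy Q-table are illustrative assumptions:

```python
import random

def epsilon_greedy(Q, s, actions, t):
    # GLIE epsilon-greedy: explore with probability eps (decaying), exploit otherwise.
    eps = 1.0 / t                                   # illustrative decay schedule
    if random.random() < eps:
        return random.choice(actions)               # exploration: random action
    return max(actions, key=lambda a: Q.get((s, a), 0.0))   # exploitation: greedy

# Illustrative usage with a toy Q-table:
Q = {("s0", "left"): 0.2, ("s0", "right"): 0.5}
for t in range(1, 6):
    print(t, epsilon_greedy(Q, "s0", ["left", "right"], t))
```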

Q-Learning

  • Model-free algorithm
    • takes actions by referring to a Q-table
  • Finds the optimal policy by maximizing the expected total reward

  • How to update the Q-table?

    • Bellman equation (TD update):

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left( R(s) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right)$$
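
A tabular Q-learning sketch using the TD update above; the two-state toy environment, the fixed ε, and the reward convention (reward observed on the transition) are illustrative assumptions:

```python
import random
from collections import defaultdict

def q_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Illustrative 2-state chain: "right" from A reaches B (reward 1), everything else
# leads back to A with reward 0.
actions = ["left", "right"]
Q = defaultdict(float)
for episode in range(200):
    s = "A"
    for _ in range(10):
        if random.random() < 0.1:
            a = random.choice(actions)                 # explore
        else:
            a = max(actions, key=lambda x: Q[(s, x)])  # exploit current Q-table
        s2 = "B" if (s == "A" and a == "right") else "A"
        r = 1.0 if s2 == "B" else 0.0
        q_update(Q, s, a, r, s2, actions)
        s = s2
print({k: round(v, 2) for k, v in Q.items()})
```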

Function Approximation

Policy Search

POMDP

Basics

The state is not directly observable, but the agent can receive sensor information for state estimation.

  • Belief State: b, a probability distribution over states.
    • b(s): the probability of being in state s.
      • For example, in a 2×2 grid world, the initial belief state is ⟨1/4, 1/4, 1/4, 1/4⟩ because the agent doesn't know anything about the world.
  • The agent can learn something about its actual state by sensing the environment:

    • Sensor Model: P(e∣s). Probability of sensing evidence e when in state s.
  • POMDPs can be formulated by 3 components:

    • Transition Model: P(s′∣s,a)
    • Reward Function: R(s)
    • Sensor Model: P(e∣s)
  • Belief Update (Filtering):

    • $b'(s') = \alpha\, P(e \mid s') \sum_s P(s' \mid s, a)\, b(s)$ (see the sketch after this list)
    • First term: sensor model, the probability of sensing evidence e in state s′.
    • Second term: sum over all states s that can lead to s′ after performing a.
    • Often written as b′ = FORWARD(b, a, e).
  • Optimal Policy: π∗(b)
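
A sketch of the belief update b′ = FORWARD(b, a, e) mentioned above; the dict-based transition and sensor models (T[s][a][s′], O[s′][e]) and the two-state example are illustrative assumptions:

```python
def forward(b, a, e, T, O):
    # b'(s') = alpha * P(e|s') * sum_s P(s'|s,a) * b(s)
    # T[s][a][s'] = transition probability, O[s'][e] = sensor model P(e|s').
    states = list(b)
    b2 = {}
    for s2 in states:
        pred = sum(T[s][a].get(s2, 0.0) * b[s] for s in states)   # prediction step
        b2[s2] = O[s2][e] * pred                                  # sensor correction
    z = sum(b2.values())                                          # normalizer (alpha)
    return {s: p / z for s, p in b2.items()}

# Illustrative 2-state example: action "go" tends to move s0 -> s1, noisy sensor.
T = {"s0": {"go": {"s0": 0.2, "s1": 0.8}}, "s1": {"go": {"s1": 1.0}}}
O = {"s0": {"beep": 0.1, "quiet": 0.9}, "s1": {"beep": 0.8, "quiet": 0.2}}
b = {"s0": 0.5, "s1": 0.5}
print(forward(b, "go", "beep", T, O))
```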

How to find the optimal policy?

  • POMDP VI
  • Dynamic Decision Network

POMDP to MDP

MDP consists of 2 components

  • Transition Function: P(s′∣s,a)
  • Reward Function: R(s)

Now we can convert POMDP problem into MDP problem in belief space.

  • Transition Function: P(b′∣b,a)

    • $P(b' \mid b, a) = \sum_e P(b' \mid e, a, b)\, P(e \mid a, b) = \sum_e P(b' \mid e, a, b) \sum_{s'} P(e \mid s') \sum_s P(s' \mid s, a)\, b(s)$
    • where P(e∣a,b) is the percept probability:
      • $P(e \mid a, b) = \sum_{s'} P(e \mid a, s', b)\, P(s' \mid a, b) = \sum_{s'} P(e \mid s')\, P(s' \mid a, b) = \sum_{s'} P(e \mid s') \sum_s P(s' \mid s, a)\, b(s)$
      • where P(e∣s′) is the sensor model and P(s′∣s,a) is the transition model.
  • Reward Function: ρ(b)

    • $\rho(b) = \sum_s b(s)\, R(s)$

A POMDP on the physical state space is thus transformed into an MDP on the belief-state space.

  • Belief State Space: the space of probability distributions over the original states, i.e. the space of belief states b.
    • For example, the initial belief state b in a 2×2 grid world is ⟨1/4, 1/4, 1/4, 1/4⟩, which is a point in the belief state space.
    • Thus, the belief state space is continuous and multi-dimensional (e.g., 4 dimensions for a 2×2 grid world). The original VI or PI for MDPs cannot work in a continuous space.
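
A small sketch of the two belief-space quantities above, the percept probability P(e∣a,b) and the belief-MDP reward ρ(b), using the same illustrative dict-based models as the belief-update sketch:

```python
def percept_probability(e, a, b, T, O):
    # P(e|a,b) = sum_s' P(e|s') * sum_s P(s'|s,a) * b(s)
    states = list(b)
    return sum(O[s2][e] * sum(T[s][a].get(s2, 0.0) * b[s] for s in states)
               for s2 in states)

def rho(b, R):
    # rho(b) = sum_s b(s) * R(s): expected immediate reward under belief b
    return sum(b[s] * R[s] for s in b)

# Illustrative models (same shape as in the belief-update sketch):
T = {"s0": {"go": {"s0": 0.2, "s1": 0.8}}, "s1": {"go": {"s1": 1.0}}}
O = {"s0": {"beep": 0.1, "quiet": 0.9}, "s1": {"beep": 0.8, "quiet": 0.2}}
R = {"s0": -0.04, "s1": 1.0}
b = {"s0": 0.5, "s1": 0.5}
print(percept_probability("beep", "go", b, T, O), rho(b, R))
```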

POMDP Value Iteration

Ideas:

  • Original VI (in discrete state space): calculate the utility value of each physical state.
  • POMDP VI (in continuous state space, i.e. belief state space): calculate the utility value of each conditional plan.

Concepts:

  • p: a conditional plan, i.e. a policy at a belief b.
  • αp(s): the utility of executing a fixed conditional plan p starting in physical state s.
  • $\sum_s b(s)\, \alpha_p(s)$: the expected utility of executing p in belief state b, which is linear in the belief state b.

Value iteration in belief state space (where a is the first action of plan p, and p.e is the subplan executed after percept e):
$$\alpha_p(s) = R(s) + \gamma \left( \sum_{s'} P(s' \mid s, a) \sum_e P(e \mid s')\, \alpha_{p.e}(s') \right)$$
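
A sketch of the conditional-plan backup above: given the α-vectors of the subplans p.e (one per percept), compute α_p for a plan whose first action is a. The dict-based models and the depth-1 subplans are illustrative assumptions:

```python
def alpha_backup(a, subplan_alphas, R, T, O, gamma=0.9):
    # alpha_p(s) = R(s) + gamma * sum_s' P(s'|s,a) * sum_e P(e|s') * alpha_{p.e}(s')
    # subplan_alphas[e][s'] is the alpha-vector of the subplan run after percept e.
    states = list(R)
    alpha_p = {}
    for s in states:
        future = sum(T[s][a].get(s2, 0.0) *
                     sum(O[s2][e] * subplan_alphas[e][s2] for e in subplan_alphas)
                     for s2 in states)
        alpha_p[s] = R[s] + gamma * future
    return alpha_p

# Illustrative: depth-1 subplans whose alpha-vectors are just the immediate rewards.
T = {"s0": {"go": {"s0": 0.2, "s1": 0.8}}, "s1": {"go": {"s1": 1.0}}}
O = {"s0": {"beep": 0.1, "quiet": 0.9}, "s1": {"beep": 0.8, "quiet": 0.2}}
R = {"s0": -0.04, "s1": 1.0}
subplans = {"beep": dict(R), "quiet": dict(R)}
print(alpha_backup("go", subplans, R, T, O))
```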


Problem

  • Exact POMDP solvers only work for very small problems
  • Finding optimal policy for general POMDP is PSPACE-hard
  • Approximate solvers can scale to moderate sized problems

Online search tends to scale better.

Dynamic Decision Network

  • Execution of POMDP over time can be represented as a DDN
    • The transition and sensor models are represented by a dynamic Bayesian network (DBN), as described in Chapter 15.
    • The dynamic Bayesian network is extended with decision and utility nodes, as used in decision networks in Chapter 16. The resulting model is called a dynamic decision network, or DDN.
    • A filtering algorithm is used to incorporate each new percept and action and to update the belief state representation.
    • Decisions are made by projecting forward possible action sequences and choosing the best one.

Game Theory

Single-Move Game

  • Components

    • Agents
    • Actions
    • Payoff Function
      • can be represented as a matrix
  • Game Strategies

    • pure strategy: a deterministic policy
    • mixed strategy: a randomized policy that selects actions according to a probability distribution
    • strategy profile: an assignment of a strategy to each player
      • solution: a strategy profile in which each player adopts a rational strategy

E.g. Prisoners’ Dilemma

               Alice: testify   Alice: refuse
Bob: testify   B=-5; A=-5       B=0; A=-10
Bob: refuse    B=-10; A=0       B=-1; A=-1

Alice finds that “testify” is a dominant strategy for the game, so she eliminates “refuse”; the same holds for Bob. They finally reach an equilibrium: both choose “testify”.

  • Domination
    • strictly dominates
      • For one player, if for every choice of strategies by the other players the outcome of adopting strategy s is strictly better than the outcome of adopting any other strategy, then strategy s strictly dominates the other strategies.
        • E.g. in the prisoners’ dilemma, for Alice, “testify” strictly dominates “refuse”: if Bob testifies, A=-5 for “testify” vs. A=-10 for “refuse” (-5 > -10); if Bob refuses, A=0 vs. A=-1 (0 > -1).
    • weakly dominates
  • Iterated elimination of strictly dominated strategies
  • Nash Equilibrium
    • A strategy profile forms an equilibrium if no player can benefit from switching strategies, given that every other player sticks to the same strategies
      • Note: an equilibrium is a local optimum
    • dominant strategy equilibrium
      • When each player has a dominant strategy, the combination of those strategies is called a dominant strategy equilibrium

Although both choose “testify” and reach an equilibrium, the equilibrium is only a local optimum, because both would be better off if they both chose “refuse”.

  • Pareto optimal
    • An outcome is Pareto optimal if there is no other outcome that all players would prefer
    • Pareto dominate
      • An outcome is Pareto dominated by another outcome if all players would prefer the other outcome

(B: testify, A: testify) is the Nash equilibrium, while (B: refuse, A: refuse) Pareto dominates it.

Another Example:

               Acme: bluray   Acme: dvd
Best: bluray   B=+9; A=+9     B=-1; A=-4
Best: dvd      B=-1; A=-3     B=+5; A=+5

We can find two Nash equilibria: (B: bluray, A: bluray) and (B: dvd, A: dvd). We can also find the Pareto optimum: (B: bluray, A: bluray). To choose the best strategy, the agents can either guess or communicate.

  • Coordination Games
    • Games in which the agents need to coordinate (e.g. by communicating), like the example above.

Another Example: two-finger Morra

          O: one       O: two
E: one    E=+2, O=-2   E=-3, O=+3
E: two    E=-3, O=+3   E=+4, O=-4

E and O cannot reach a Nash equilibrium with pure strategies, because in every cell one of them wants to switch: O wants to switch in (E: one, O: one) and (E: two, O: two), and likewise E in the other two cells. It is a zero-sum game.

Thus, there is no pure-strategy Nash equilibrium, which calls for a mixed strategy.

  • A simple algorithm to find pure-strategy Nash equilibria (see the sketch after this list):

    1. For each column, mark cell if it has the maximum payoff for the row player
      • May be more than one
    2. For each row, mark cell if it has the maximum payoff for the column player
      • May be more than one
    3. Cells with both row and column marks are pure-strategy Nash equilibria
  • Zero-Sum Games

    • Games in which the sum of the payoffs is always zero (or a constant); in other words, the agents cannot all benefit at the same time.
    • Minimax Algorithm
      • Minimax Game Tree
      • mixed strategy (with probability)
      • maximin equilibrium
    • See example in Lecture12 slide No. 5-6
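
A sketch of the marking algorithm above for a two-player payoff matrix; the payoff layout (a dict keyed by (row strategy, column strategy) with (row payoff, column payoff) values) is an illustrative assumption:

```python
def pure_nash_equilibria(payoffs):
    # payoffs[(row_strategy, col_strategy)] = (row_payoff, col_payoff)
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    # Step 1: for each column, mark the cell(s) with maximum payoff for the row player.
    row_marks = {(r, c) for c in cols for r in rows
                 if payoffs[(r, c)][0] == max(payoffs[(r2, c)][0] for r2 in rows)}
    # Step 2: for each row, mark the cell(s) with maximum payoff for the column player.
    col_marks = {(r, c) for r in rows for c in cols
                 if payoffs[(r, c)][1] == max(payoffs[(r, c2)][1] for c2 in cols)}
    # Step 3: cells with both marks are pure-strategy Nash equilibria.
    return row_marks & col_marks

# Prisoners' dilemma: rows = Bob, columns = Alice, payoffs = (Bob, Alice).
pd = {("testify", "testify"): (-5, -5), ("testify", "refuse"): (0, -10),
      ("refuse", "testify"): (-10, 0), ("refuse", "refuse"): (-1, -1)}
print(pure_nash_equilibria(pd))   # -> {('testify', 'testify')}
```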

Multiple-Move Game (simplest ver.: Repeated Game)

  • The players face the same choices repeatedly

    • In other words, in each round the history of all players’ previous choices is available
  • Two strategies

    • perpetual punishment
      • suppose that after each round there is a 99% chance that the players will meet again. Then the expected number of rounds is still 100, but neither player knows for sure which round will be the last.
        • If both always choose “refuse”: $\sum_{t=0}^{\infty} 0.99^t \cdot (-1) = -100$
        • If one player chooses “testify” in one round (and is punished forever after): $0 + \sum_{t=1}^{\infty} 0.99^t \cdot (-5) = -495$
    • tit-for-tat
      • calls for starting with “refuse” and then echoing the other player’s previous move on all subsequent moves.
      • So Alice would refuse as long as Bob refuses, would testify on the move after Bob testified, and would go back to refusing once Bob did. Although very simple, this strategy has proven to be highly robust and effective against a wide variety of strategies.