Grid Studio: A Great Tool for Combining Python with Excel-Style Spreadsheets | Alan Hou's Personal Blog

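This project has been trending on GitHub for the past couple of days. It is a web-based spreadsheet application with deep integration of the Python programming language.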

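It provides an integrated workflow for loading, cleaning, manipulating, and visualizing data. The spreadsheet backend is written in Go, and an integrated Python runtime operates on the sheet's contents. As a result, it can take full advantage of powerful Python libraries such as Pandas and NumPy for data analysis while keeping the strengths of an Excel-style spreadsheet.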

Project repository: https://github.com/ricklamers/gridstudio

Test environment: Ubuntu 16.04

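# Docker is required; if it is not installed, install it with
curl -sSL https://get.docker.com/ | sh
# Allow user xxx to run Docker (replace xxx with your username; log out and back in for the group change to take effect)
sudo usermod -aG docker xxx
sudo systemctl start docker
# Install and run
git clone https://github.com/ricklamers/gridstudio
cd gridstudio && sudo ./run.sh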
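
Before running these commands, it can help to confirm that the command-line tools they rely on are available on the PATH. The following is a minimal sketch using only the Python standard library; the helper name `missing_tools` and the list of tools checked are assumptions made for illustration and are not part of Grid Studio.

```python
import shutil


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # git, docker, and curl are the commands used in the setup steps
    missing = missing_tools(["git", "docker", "curl"])
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

An empty result only means the executables exist; it does not confirm that the Docker daemon is running or that the current user may talk to it.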

Once it is running, open http://127.0.0.1:8080 or http://your.ip.addr:8080 in a browser. The default username and password are both admin.
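
If the page does not load, a quick check is whether anything is listening on port 8080 at all. Below is a minimal sketch using only the standard library; the helper name `is_port_open` and the 2-second timeout are assumptions chosen for illustration.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    # 127.0.0.1:8080 is the local address used by this setup
    if is_port_open("127.0.0.1", 8080):
        print("Something is listening on port 8080.")
    else:
        print("Nothing is listening on port 8080 yet.")
```

A successful connection shows only that some process accepts TCP connections on that port, not that it is Grid Studio.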

Examples:

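import numpy as np
import pandas as pd
import math
import time

# number of rows written to the sheet on each iteration
n = 10000

for i in range(9):
    # sample size doubles on each pass: 40, 80, ..., 10240
    sample_n = 2**i * 40
    print(sample_n)

    # draw sample_n values from the standard normal distribution
    normally_dist = np.random.randn(sample_n)
    # repeat each value enough times to cover at least n rows
    rep_count = math.ceil(n / sample_n)
    normally_dist = np.repeat(normally_dist, rep_count)

    # truncate to exactly n rows
    normally_dist = normally_dist[0:n]

    # sheet() is not imported here; it is expected to be provided by the Grid Studio runtime
    sheet("A1", pd.DataFrame(normally_dist))

    # pause so that each update is visible in the sheet
    time.sleep(1)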
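
The sizing arithmetic in this loop can be checked without NumPy or Grid Studio. The sketch below recomputes the sample size and repetition count for each pass and also works out how many of the drawn values actually reach the sheet. It relies on the fact that `np.repeat` repeats each element consecutively, so slicing to the first `n` rows can cut off the last few samples. The function name `sizing` and the constant `N_ROWS` are names introduced here for illustration.

```python
import math

N_ROWS = 10000  # same value as n in the script


def sizing(i, n=N_ROWS):
    """Return (sample_n, rep_count, distinct_kept) for loop index i."""
    sample_n = 2**i * 40
    rep_count = math.ceil(n / sample_n)
    # np.repeat repeats each element rep_count times in a row, so after
    # slicing to n rows only the first ceil(n / rep_count) samples remain
    distinct_kept = math.ceil(n / rep_count)
    return sample_n, rep_count, distinct_kept


if __name__ == "__main__":
    for i in range(9):
        print(i, *sizing(i))
```

For example, at `i = 2` the script draws 160 values and repeats each 63 times, giving 10080 rows; after truncation to 10000 rows, only 159 of the 160 values remain.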

import requests
from bs4 import BeautifulSoup
import pandas as pd

# make HTTPS GET request to HN
request = requests.get("https://news.ycombinator.com/")

# parse HTML with bs4
soup = BeautifulSoup(request.content, features="html.parser")

# select posts from table
posts = soup.select('table.itemlist > tr')

# messy bit: extract relevant information from HTML
post_titles = []
post_authors = []
post_urls = []

for i, post in enumerate(posts):

    # title rows carry the 'athing' class; the row after each one is searched for the author
    if post.get('class') and 'athing' in post.get('class'):
        link = post.select('a.storylink')[0]
        post_titles.append(link.text)
        post_urls.append(link.get('href'))

        # guard against a title row that is the last row in the table
        user_el = posts[i+1].select('a.hnuser') if i + 1 < len(posts) else []
        if len(user_el) > 0:
            post_authors.append(user_el[0].text)
        else:
            post_authors.append('No author')

# construct dataframe from lists
df = pd.DataFrame({
    "titles": post_titles, 
    "authors": post_authors, 
    "urls": post_urls})

# write dataframe to sheet starting at A1 position
sheet("A1", df, headers=True)
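
The loop in this script pairs each title row with the row that follows it. The sketch below isolates that pairing on plain dictionaries instead of bs4 tags, so it runs without network access or third-party packages. The function name `pair_rows`, the dictionary keys, and the sample rows are hypothetical stand-ins for the parsed HTML, not real Hacker News data.

```python
def pair_rows(rows):
    """Pair each title row with the author from the row right after it.

    Each row is a dict; title rows have kind == "athing" plus "title" and
    "url" keys, and the following row may carry a "user" key.
    """
    records = []
    for i, row in enumerate(rows):
        if row.get("kind") != "athing":
            continue
        # the next row may be missing (end of list) or have no author
        nxt = rows[i + 1] if i + 1 < len(rows) else {}
        records.append({
            "titles": row["title"],
            "urls": row["url"],
            "authors": nxt.get("user", "No author"),
        })
    return records


if __name__ == "__main__":
    # made-up rows for illustration only
    sample = [
        {"kind": "athing", "title": "First post", "url": "https://example.com/1"},
        {"kind": "subtext", "user": "alice"},
        {"kind": "athing", "title": "Job ad", "url": "https://example.com/2"},
        {"kind": "subtext"},
    ]
    for record in pair_rows(sample):
        print(record)
```

Building one record per title row keeps the three columns the same length, which `pd.DataFrame` requires when it is constructed from a dictionary of lists.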
